[Predator] Use linear score instead of log linear score.
Issue description

[Predator] Use linear score instead of log linear score.

Previously, a feature value was in (log(0), log(1)]. For example, for a feature like ``MinDistanceFeature``, if the distance between the changed lines and the crashed lines is more than 50, we score it log(0); in that case, no matter what values the other features return, we won't consider this suspect the culprit. However, this way of evaluating features makes the log linear model hard to scale. For example, even though a suspect didn't touch the crashed files in a stacktrace, we still want to check whether it touched files under the same directory as the crashed files. So we need to switch the value range from (log(0), log(1)] to [0, 1].
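To make the problem concrete, here is a minimal sketch of how a single log(0) feature value swamps every other feature in a weighted sum. The function name, the weights, and the two-feature setup are hypothetical and are not taken from the Predator code; log(0) is written as `float('-inf')` because Python's `math.log(0)` raises `ValueError`.

```python
import math

NEG_INF = float('-inf')  # stands in for log(0)


def log_linear_score(weights, feature_values):
    """Unnormalized score: the weighted sum of the feature values."""
    return sum(w * v for w, v in zip(weights, feature_values))


# Hypothetical weights for two features:
# [minimum distance, touched a crashed file].
weights = [1.0, 2.0]

# Old range (log(0), log(1)]: the suspect is more than 50 lines away
# (scored log(0)) but did touch a crashed file (scored log(1) == 0).
old_score = log_linear_score(weights, [NEG_INF, math.log(1.0)])

# New range [0, 1]: the same suspect gets 0 for distance and 1 for
# touching the crashed file, so the second feature still counts.
new_score = log_linear_score(weights, [0.0, 1.0])

print(old_score, new_score)
```

Under the old range the suspect is ruled out regardless of the second feature; under the new range the remaining features still contribute to the score.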
,
Feb 28 2017
Using [0, 1] is for convenience. I think the feature value can just be how much (as a percentage) this crash has of such a feature, and we let the weight determine how important the feature is. For example:
Weight -inf/log(0): if it has this feature, absolutely don't blame this suspect.
Weight 0: this feature doesn't matter at all.
Weight inf: if it has this feature, absolutely blame this suspect.
In this case, the feature is just a percentage, and we can adjust the weight as we want without changing the feature code.
,
Mar 2 2017
The weights do indeed say how much we care about a given feature. But each feature also needs to say how strongly it feels, and *in which direction*. Restricting feature values to [0,1] aka [log(1),log(e)] means features can only ever vote *for* blaming a given suspect, because (log(1),log(e)] is a subset of (log(1),log(inf)). But we also want to allow features to vote *against* blaming a given suspect: i.e., to return values in (log(0),log(1)).

Feel free to ignore the fact that logs are involved at all. When features return negative values, they're voting against blaming the suspect; when they return positive values, they're voting for blaming the suspect. Positive weights mean to trust what the feature says; negative weights mean to trust that the feature is always wrong: i.e., flip the direction of what the feature says re voting for/against. And in all cases the absolute value of the feature value, weight, or score says how strongly to feel that way.

Part of the whole point of loglinear models is that we don't need to worry about computing "percentages". We can just have the feature return whatever value it likes. If we want to return values in [-5,5] we can. If we want values in [0,1] we can. If we want to return values in [-42,0] we can. Each feature can choose its own range of values, and the weights will rescale things so that all the various features contribute an appropriate amount to the overall decision about whether to blame the suspect or not.
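The voting interpretation can be sketched in a few lines. This is only an illustration under assumed names (`contribution` and `vote` are hypothetical helpers, not part of the Predator code base): the sign of weight times value gives the direction of the vote, and a weight can rescale a feature whose range is wider than another's.

```python
import math


def contribution(weight, value):
    """One feature's contribution to the overall score."""
    return weight * value


def vote(weight, value):
    """Direction in which a feature pushes the decision to blame a suspect."""
    c = contribution(weight, value)
    if c > 0:
        return 'for'
    if c < 0:
        return 'against'
    return 'abstain'


# Positive weight: trust the feature's own direction.
print(vote(1.0, 0.8))    # positive value, votes for blaming
print(vote(1.0, -3.0))   # negative value, votes against blaming

# Negative weight: flip whatever the feature says.
print(vote(-1.0, -3.0))

# Zero weight: the feature is ignored.
print(vote(0.0, 5.0))

# Rescaling: a feature with range [-5, 5] and weight 0.2 can contribute
# as much as a feature with range [-1, 1] and weight 1.0.
same_strength = math.isclose(contribution(0.2, 5.0), contribution(1.0, 1.0))
```

Note that with values restricted to [0, 1] and a positive weight, `vote` can only ever return 'for' or 'abstain', which is the limitation described above.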
,
Mar 2 2017
Also, re weights: weight 0 does indeed mean to ignore the feature entirely. However, infinite weights are a bit funny. Whether an infinite weight means to absolutely blame or absolutely not blame depends on the sign of the feature value. A weight of infinity means to absolutely do whatever the feature says; a weight of negative infinity means to absolutely do the opposite of whatever the feature says.
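A small sketch of this behaviour using IEEE 754 floats, as Python implements them. The names are hypothetical. One extra case worth noticing, which is an observation about floating-point arithmetic rather than something stated in this thread: an infinite weight multiplied by a feature value of exactly 0 yields NaN, so a feature that returns 0 gives no usable vote under an infinite weight.

```python
import math

INF = float('inf')


def contribution(weight, value):
    """One feature's contribution to the overall score."""
    return weight * value


# Weight +inf: absolutely do whatever the feature says.
blame = contribution(INF, 0.5)        # feature votes for, result is +inf
do_not_blame = contribution(INF, -0.5)  # feature votes against, result is -inf

# Weight -inf: absolutely do the opposite of what the feature says.
flipped = contribution(-INF, 0.5)     # feature votes for, result is -inf

# Feature value of exactly 0 with an infinite weight is undefined (NaN).
undefined = contribution(INF, 0.0)

print(blame, do_not_blame, flipped, math.isnan(undefined))
```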
,
Mar 2 2017
This change only switches the current negative feature value range from (log(0), log(1)] to [0, 1], so that we can get rid of -inf/log(0). It's possible that some features have a range that is hard to normalize to [0, 1], or have values in both directions. In that case, I am OK with expanding the range from [0, 1] to (-inf, inf).
,
Mar 10 2017
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra/+/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d

commit 0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d
Author: Sharu Jiang <katesonia@google.com>
Date: Fri Mar 10 01:34:59 2017

[Predator] Use linear feature value instead of log linear value.

Previous, a feature value is in (log(0), log(1)], for example, feature like ``MinDistanceFeature``, if distance between changed lines and crashed lines are more than 50, we score it log(0), in this case, no matter what value it gets for other features, we won't consider this suspect as culprit. However, this way of evaluating features will make the log linear model hard to scale. For example, even thought a suspect didn't touch crashed files in a stacktrace, we still want to checkout whether it touched files under the same directory of crashed files. So we need to switch the value range from (log(0), log(1)] to [0, 1].

BUG= 695619
Change-Id: I3c4f0400b08867540d3d96098cd458435aa2df45
Reviewed-on: https://chromium-review.googlesource.com/446528
Reviewed-by: Chan Li <chanli@chromium.org>
Commit-Queue: Sharu Jiang <katesonia@google.com>

[modify] https://crrev.com/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d/appengine/findit/crash/loglinear/changelist_features/test/touch_crashed_file_test.py
[modify] https://crrev.com/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d/appengine/findit/crash/loglinear/changelist_features/top_frame_index.py
[modify] https://crrev.com/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d/appengine/findit/crash/loglinear/changelist_features/test/touch_crashed_directory_test.py
[modify] https://crrev.com/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d/appengine/findit/crash/loglinear/test/changelist_classifier_test.py
[modify] https://crrev.com/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d/appengine/findit/crash/loglinear/changelist_features/test/touch_crashed_file_meta_test.py
[modify] https://crrev.com/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d/appengine/findit/crash/loglinear/changelist_features/test/min_distance_test.py
[modify] https://crrev.com/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d/appengine/findit/crash/loglinear/changelist_features/touch_crashed_file.py
[modify] https://crrev.com/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d/appengine/findit/crash/loglinear/changelist_features/test/top_frame_index_test.py
[modify] https://crrev.com/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d/appengine/findit/crash/loglinear/changelist_features/touch_crashed_directory.py
[modify] https://crrev.com/0ed0c5d627e1d1a69564ad36c6a5cabef96f1d2d/appengine/findit/crash/loglinear/changelist_features/min_distance.py
,
May 15 2017
Comment 1 by wrengr@chromium.org, Feb 28 2017