In the CL classifier we filter out MatchResults or blame reasons that have a "very bad" score. In the old CL classifier, "very bad" meant a score exactly equal to zero, which is unfortunately flaky because of IEEE-754 floating-point rounding. In the new loglinear-model-based CL classifier, "very bad" means a (log-domain) score equal to negative infinity, which is essentially the same test and just as flaky.
For the loglinear-model-based (LLM) CL classifier, at least, we should instead have a parameter that specifies when scores or probabilities are considered "very bad". That way users/clients can decide what counts as "close enough" to zero or log(zero).
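A rough sketch of what such a parameter could look like, assuming the filter works on (item, log-domain score) pairs. The names make_log_threshold and filter_very_bad, the candidate data, and the 1e-12 cutoff are all hypothetical and only illustrate the idea; none of them come from the existing classifier code.

```python
import math


def make_log_threshold(min_log_score):
    """Return a predicate that is True when a log-domain score is 'very bad'."""
    def is_very_bad(log_score):
        return log_score <= min_log_score
    return is_very_bad


def filter_very_bad(scored_items, is_very_bad):
    """Keep only the (item, log_score) pairs whose score is not 'very bad'."""
    return [(item, s) for item, s in scored_items if not is_very_bad(s)]


# Hypothetical candidates with log-domain scores.
candidates = [("cl_a", -0.5), ("cl_b", -745.0), ("cl_c", -math.inf)]

# Current behaviour: only an exact negative infinity is filtered out.
exact = filter_very_bad(candidates, lambda s: s == -math.inf)

# Proposed behaviour: the client chooses what is "close enough" to log(zero).
tolerant = filter_very_bad(candidates, make_log_threshold(math.log(1e-12)))
```

With the exact test, cl_b survives even though its score is vanishingly small; with the client-supplied threshold it is dropped along with cl_c.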
One API question is how exactly to phrase that parameter. For example, should it be a predicate on the score or on the probability? The score is easy to compute (and already available), but because it isn't normalized, a predicate on it would depend indirectly on the weights. Probabilities are normalized, which removes that issue, but they are much more expensive to compute. If we do go with a predicate on the score, should it be on the log-domain score or the normal-domain score?
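To make the trade-off concrete, here is a small sketch under the usual loglinear assumption that the log-domain score is a weighted sum of feature values and that the probability is the score normalized over all candidates. The function names, weights, and feature values are made up for illustration, and the code assumes at least one candidate has a finite score.

```python
import math


def log_score(weights, features):
    """Unnormalized log-domain score: a weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def log_probabilities(log_scores):
    """Normalize log-domain scores with log-sum-exp.

    Needs the scores of ALL candidates, which is why it costs more than
    looking at a single score. Assumes at least one score is finite.
    """
    m = max(log_scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in log_scores))
    return [s - log_z for s in log_scores]


# Hypothetical feature vectors for three candidates.
features = [[-1.0, -1.0], [-0.5, -1.5], [-2.0, -3.0]]

scores_a = [log_score([1.0, 2.0], f) for f in features]
scores_b = [log_score([2.0, 4.0], f) for f in features]  # same model, weights doubled

# A fixed cutoff on the raw score keeps different candidates once the
# weights are rescaled, so the cutoff has to be retuned with the weights.
cutoff = -5.0
kept_a = [s for s in scores_a if s > cutoff]
kept_b = [s for s in scores_b if s > cutoff]

# Probabilities always live on the same scale: they sum to one.
total_a = sum(math.exp(lp) for lp in log_probabilities(scores_a))
total_b = sum(math.exp(lp) for lp in log_probabilities(scores_b))
```

The raw-score cutoff silently changes meaning when the weights change, while a cutoff on the probability is always a statement about a value between zero and one, at the price of the extra normalization pass.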
Comment 1 by mbarbe...@chromium.org, Oct 26 2017