In the UnnormalizedLogLinearModel class there's a question of how best to store the FeatureValue vector and the weight covector. As we add more and more features, performance will become a concern.
The standard approach here is to rely on the weights being *sparse* (i.e., mostly zero), which the training in TrainableLogLinearModel already ensures. However, right now (i.e., githash ec44c2881) we store the weights as a list (or np.ndarray), which is *dense*; thus we're forced to iterate over all the entries rather than automatically skipping the zero ones. So step one is to replace this with some sparse representation; e.g., a dict mapping feature names to their weights, where we (a) never store zero weights, and (b) treat missing keys as features with zero weight. This part will probably be handled by the solution to https://crbug.com/674262
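For concreteness, a minimal sketch of what that representation could look like (the helper names here are hypothetical; the real change belongs to the fix for that bug):

    def sparsify_weights(feature_names, dense_weights):
        """Build a sparse weight dict from a dense vector, dropping exact zeros."""
        return dict((name, w)
                    for name, w in zip(feature_names, dense_weights)
                    if w != 0.0)

    def weight_of(ws, feature_name):
        """Missing keys mean that feature's weight is zero."""
        return ws.get(feature_name, 0.0)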
However, even if we have a sparse representation for the weight covector, there's still a performance issue. If the FV vector is also stored as a dict (so we can zip the dicts along their keys), then we'll first compute *all* the FVs (to store them in the dict) and then throw most of them away, since they end up paired with zero weights. It'd be better to use a *lazy* dict, so that we only compute an FV object once we know it'll be needed. (This is an example of why we want to distinguish "vectors" from "covectors": we want to represent them differently.) Of course, a "lazy dict" is exactly the same thing as a MemoizedFunction! Even better, since we know we will only access each value once, we don't even need to memoize: a plain function will do.
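As a rough illustration (compute_fv here is a hypothetical stand-in for the real per-feature computation), the eager and lazy versions differ only in when the work happens:

    import math

    def dot_eager(ws, compute_fv, x, y, all_feature_names):
        # Eager: compute every FV up front, then throw most of them away,
        # because most weights are zero and therefore absent from ws.
        fvs = dict((i, compute_fv(x, y, i)) for i in all_feature_names)
        return math.fsum(fvs[i] * wi for i, wi in ws.iteritems())

    def dot_lazy(ws, compute_fv, x, y):
        # Lazy: a plain function plays the role of the "lazy dict"; an FV is
        # computed only when the dot product asks for that key, and since each
        # key is asked for at most once, there's nothing worth memoizing.
        fxy = lambda i: compute_fv(x, y, i)
        return math.fsum(fxy(i) * wi for i, wi in ws.iteritems())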
So ultimately we'll want to compute the scores as in the following pseudocode:
import math

def score(ws, f, x, ys):
    """
    ws : dict(FeatureName, float)
    f  : X -> Y -> FeatureName -> FeatureValue
    x  : X
    ys : iterable(Y)
    """
    # Let the feature function do preprocessing work given x.
    fx = f(x)
    scores = {}
    for y in ys:
        # Let the feature function do more preprocessing given y, if desired.
        fxy = fx(y)
        # This fsum(fvi * wi) is the dot product.
        scores[y] = math.fsum(
            # When we apply fxy to i, *that's* when we compute the FV!
            fxy(i) * wi
            # Because ws is sparse, we don't even look at zero weights!
            for i, wi in ws.iteritems())
    return scores
This code avoids all unnecessary work in the dot product, and almost all unnecessary work in computing the FV vector. The one place unnecessary work could slip in is if ``fx(y)`` does a bunch of work, since it may do work towards computing FVs we'll never ask for. We should either (a) accept that as an unavoidable performance loss, or (b) have ``fx(y)`` do no interesting work (i.e., immediately return the fxy function) and only do the real work once ``fxy(i)`` is called.
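A minimal sketch of option (b), again with a hypothetical compute_fv: all of the interesting work lives in the innermost closure, so nothing is computed until ``fxy(i)`` is called for a feature the dot product actually needs.

    def make_feature_function(compute_fv):
        """Curry compute_fv : (X, Y, FeatureName) -> FeatureValue into the
        f : X -> Y -> FeatureName -> FeatureValue shape that score() expects,
        deferring all work to the innermost call."""
        def f(x):
            def fx(y):
                def fxy(i):
                    # Only here do we do any interesting work.
                    return compute_fv(x, y, i)
                return fxy
            return fx
        return f

With that shape, ``score(ws, make_feature_function(compute_fv), x, ys)`` never computes an FV for a feature whose weight is zero.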
Comment 1 by wrengr@chromium.org, Jan 11 2017