This is a meta-bug for various improvements I have in mind for the loglinear-model code. For now I'm listing them below rather than opening separate tickets, since many of these improvements are still putative.
* When creating the model, take a single vector-valued function rather than an array of scalar-valued functions. Of course, this introduces some possibility for type errors, since the vector isn't guaranteed to have the same dimensionality as the weight covector (see the first sketch after this list).
* Use decorators to automate the memoization of properties. For inspiration see (BSD3) https://github.com/estebistec/python-memoized-property and be sure to address the lifetime issues mentioned in http://code.activestate.com/recipes/577452-a-memoize-decorator-for-instance-methods/ . Since they're properties rather than functions/methods, we really just want lazy evaluation here (though we need to be able to un-memoize if/when the weights change); see the sketch after this list.
* Use decorators to automate the other memoization we do.
* Find a way to avoid needing the entire domain of Y when computing the partition function. All we really need is the subset with non-zero features (which should be recoverable from the memoized feature function itself), and the cardinality of the subset with zero features. Though there might be some better way to sparsify things. This is especially crucial for dealing with CL-classification (if we want probabilities rather than scores). A sketch of the support-only computation is given after this list.
* When initializing the model, is there a better way to detect "is zero" than using absolute error? (One option is sketched after this list.)
* As the number of features increases it will become more crucial to switch to a sparse representation for the weight covector (allowing us to compute the inner product sparsely). Alas, NumPy doesn't do sparseness well (and while SciPy does offer sparse representations, they don't play well with NumPy's dot: http://stackoverflow.com/a/13274310). However, we may be able to roll our own with np.flatnonzero (http://stackoverflow.com/a/13274310); see the sketch after this list.
* It may make sense to have a wrapper type around np.array, so we can move the various norms etc. to be methods on that class rather than on LogLinearModel itself (see the sketch after this list).
* The FeatureMap method may need to be smarter about how it memoizes things. Rather than keeping all the memos, we may want to do the LRU trick to reduce memory overhead (sketched after this list).
* Do we want to memoize the exp versions of various things?
* When computing logZ, is there any feasible way to determine the maximum value/score without walking over the list of them twice? (A one-pass variant is sketched after this list.)
* Can we get sparseness information out of ConditionalProbability? That'd speed up computing the expectation.
* Are there any tricks to computing the expectation more precisely or efficiently than using ConditionalProbability directly?
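Rough sketches for a few of the items above (all untested; names like LogLinearModel, feature_function, sparsify, etc. are placeholders rather than identifiers from the actual code).

For the vector-valued feature function: a minimal guard against the dimensionality mismatch, assuming the constructor takes the feature function and the weight covector directly.

import numpy as np

class LogLinearModel(object):
    def __init__(self, feature_function, weights):
        # feature_function: (x, y) -> np.ndarray of shape (d,)
        # weights: the weight covector, np.ndarray of shape (d,)
        self.feature_function = feature_function
        self.weights = np.asarray(weights, dtype=float)

    def score(self, x, y):
        f = np.asarray(self.feature_function(x, y), dtype=float)
        if f.shape != self.weights.shape:
            raise ValueError("feature vector has shape %s but weights have shape %s"
                             % (f.shape, self.weights.shape))
        return float(np.dot(self.weights, f))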
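For the memoized properties: a minimal decorator that caches on the instance (which sidesteps the lifetime issue in the ActiveState recipe, where caching on the decorator keeps instances alive), plus a weights setter that un-memoizes everything derived from the old weights.

import functools

def memoized_property(fn):
    """Evaluate the property lazily and cache the result on the instance."""
    attr = "_memo_" + fn.__name__

    @property
    @functools.wraps(fn)
    def wrapper(self):
        if not hasattr(self, attr):
            setattr(self, attr, fn(self))
        return getattr(self, attr)
    return wrapper

class Model(object):
    def __init__(self, weights):
        self._weights = weights

    @memoized_property
    def weight_norm(self):
        # stands in for any expensive quantity derived from the weights
        return sum(abs(w) for w in self._weights)

    @property
    def weights(self):
        return self._weights

    @weights.setter
    def weights(self, value):
        self._weights = value
        # un-memoize everything that was cached from the old weights
        for name in [a for a in vars(self) if a.startswith("_memo_")]:
            delattr(self, name)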
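For the partition function: assuming we can enumerate just the outcomes whose feature vectors are non-zero (the "support") and merely count the rest, each zero-feature outcome contributes exp(0) = 1 to Z, so logZ only needs the scores on the support plus that count.

import numpy as np

def log_partition(support_scores, num_zero_feature_outcomes):
    """logZ when every outcome outside the support has a zero feature
    vector, hence score 0 and a contribution of exp(0) == 1 to Z."""
    scores = np.asarray(support_scores, dtype=float)
    if scores.size:
        m = scores.max()  # shift by the max for numerical stability
        log_support = m + np.log(np.sum(np.exp(scores - m)))
    else:
        log_support = -np.inf
    if num_zero_feature_outcomes <= 0:
        return log_support
    return np.logaddexp(log_support, np.log(num_zero_feature_outcomes))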
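For the "is zero" test: a purely relative comparison degenerates when the comparand is exactly zero, so one option is to scale the tolerance by the overall magnitude of the weight vector (the default tolerance here is arbitrary).

import numpy as np

def effectively_zero(w, tol=1e-12):
    """Entry-wise "is zero", with the tolerance scaled by the larger of
    the data's magnitude and 1."""
    w = np.asarray(w, dtype=float)
    scale = np.max(np.abs(w)) if w.size else 0.0
    return np.abs(w) <= tol * max(scale, 1.0)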
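For the sparse inner product: a roll-our-own representation along the np.flatnonzero lines, storing (indices, values) so the dot product only touches the non-zero entries.

import numpy as np

def sparsify(features):
    """Convert a mostly-zero feature vector into (indices, values)."""
    features = np.asarray(features, dtype=float)
    idx = np.flatnonzero(features)
    return idx, features[idx]

def sparse_dot(weights, sparse_features):
    """Inner product against a dense weight covector, touching only
    the stored non-zero feature entries."""
    idx, vals = sparse_features
    return float(np.dot(np.asarray(weights)[idx], vals))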
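For the wrapper type: a thin class so the norms live with the weights rather than on LogLinearModel (method names made up).

import numpy as np

class WeightCovector(object):
    """Thin wrapper around an np.array of weights."""
    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)

    def l1_norm(self):
        return float(np.sum(np.abs(self.values)))

    def l2_norm(self):
        return float(np.sqrt(np.sum(self.values ** 2)))

    def dot(self, features):
        return float(np.dot(self.values, features))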
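For the FeatureMap memoization: if bounded memory is all we need, functools.lru_cache (Python 3.2+) already implements the LRU eviction; the arguments just have to be hashable.

import functools

@functools.lru_cache(maxsize=4096)  # least-recently-used entries are evicted
def feature_map(x, y):
    # placeholder for the real feature computation; (x, y) must be hashable
    return (x, y)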
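For logZ in one pass: a streaming log-sum-exp that keeps a running maximum and rescales the running sum whenever a larger score shows up, so the scores are only walked once (at the cost of an extra exp per rescale).

import math

def streaming_logsumexp(scores):
    """One-pass log-sum-exp over an iterable of finite scores."""
    running_max = -math.inf
    running_sum = 0.0
    for s in scores:
        if s <= running_max:
            running_sum += math.exp(s - running_max)
        else:
            # rescale what has been accumulated so far to the new maximum
            running_sum = running_sum * math.exp(running_max - s) + 1.0
            running_max = s
    return -math.inf if running_sum == 0.0 else running_max + math.log(running_sum)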
Can you add references for some of the ideas, like "computing the partition function" and "a sparse representation for the weight covector (allowing us to compute the inner product sparsely)"? That would help me understand those ideas, thanks!
@katesonia: where did you want me to put the references? I could put them here, but they'd end up getting lost in the long run. I can put them either in the code or in the gdoc describing LLMs (or both).
Comment 1 by wrengr@chromium.org, Nov 29 2016