[Predator] Remove dependency on CrashConfig |
|||
Issue descriptionMany objects (e.g., instances of the Findit class) call CrashConfig.Get() directly, which is problematic. We should eliminate (or minimize) this dependency. Here are some reasons why the dependency is a problem in the first place: It's a hidden dependency. Being essentially a global variable, we end up depending on it even if none of the surface code looks like it depends on it. It's antimodular. Having a single enormous config object that everyone depends on makes it extremely difficult to decouple things. It's a lot better to read the config once in order to construct a bunch of objects, and then let them keep track of their own configuration parameters. This frees us up to reorganize the config file, without worrying about touching every file that can be configured. It's poorly typed. Because CrashConfig is (for all intents and purposes) a dict, we end up using strings everywhere in order to access fields. This is very fragile and compounds the issue of being antimodular. (Albeit, this issue can be addressed separately from removing the dependency; e.g., by using an object with properties/attributes, rather than using a dict). It's volatile. Because we cache some parts of CrashConfig.Get() in objects (e.g., regexes, top-N values, etc), but we call it at many different points, it's easy to end up in an inconsistent state. Whereas, if we passed all the relevant parts to the constructors, then each object is guaranteed to be consistent. (There could still be inconsistencies between different objects, but that's easier to address.)
,
Nov 17 2016
,
Jan 30 2017
Per "because self.config is volatile, we need some way of updating the Predator instance whenever the config changes. How to do that cleanly?" Because a new ``CrashWrapperPipeline`` is created for every crash, so I think it's reasonable to assume that the config during an analysis of one crash is consistent. So no need to worry updating the predator instance during analyzing.
,
Jan 31 2017
The big thing is just to make sure that we only ever see a consistent/coherent view of the config. We want both internal consistency and external consistency: Internal consistency: we want to make sure that all the object instances only ever see a single view of the config. Since our processes are short-lived and we don't (afaict) care about dynamically re-running the classifiers whenever the config changes, it should be sufficient to request the config once and cache it, so that everything in the same process always sees the cached version. External consistency: Since the different phases run in different processes, and thus have different object instances, we may also want to take steps to ensure consistency over the multi-process lifetime of "analyzing a single crash report". This basically means serializing the cached config into the pipeline objects, so that when the pipelines become processes they can read from the cache rather than querying the ndb.Model for the config. (We needn't store the raw config itself, we could instead store whatever the objects got from processing that data.) If we have long-lived processes (e.g., server-type processes), then we'd have to worry about actually doing dynamic updates. But I think we should be able to get by without that complexity.
,
Sep 15 2017
Right now, config is passed to predator_app.py |
|||
►
Sign in to add a comment |
|||
Comment 1 by kateso...@chromium.org
, Nov 17 2016