Corpus with representative sample for DOM distiller evaluation
Issue description
Currently we use the "reader-mode-golden-data" dataset for performance evaluation, but it is not representative of what users see, so trade-offs made based on it might be biased. We should build another dataset that represents the real-world distribution. Since it would be used only for performance evaluation, we don't need the "golden answer" part, which means creating the dataset can be automated.
Mar 11 2016
All the articles in "reader-mode-golden-data" have <meta property="og:type" content="article" />, so the measured markup_parsing time would be biased.
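One way to check how skewed a corpus is in this respect is to count the pages that declare og:type as "article". The following is only a minimal sketch using the Python standard library; the names `OgTypeParser`, `has_og_article`, and `og_article_fraction` are hypothetical and not part of DOM Distiller or its eval tooling, and pages are assumed to be available as HTML strings.

```python
from html.parser import HTMLParser


class OgTypeParser(HTMLParser):
    """Records the content of the first <meta property="og:type"> tag seen."""

    def __init__(self):
        super().__init__()
        self.og_type = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta" or self.og_type is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("property", "").strip().lower() == "og:type":
            self.og_type = attr_map.get("content", "").strip().lower()


def has_og_article(html_text):
    """Returns True if the page declares og:type as "article"."""
    parser = OgTypeParser()
    parser.feed(html_text)
    parser.close()
    return parser.og_type == "article"


def og_article_fraction(pages):
    """Returns the fraction of pages (HTML strings) that declare og:type=article."""
    pages = list(pages)
    if not pages:
        return 0.0
    return sum(has_og_article(page) for page in pages) / len(pages)
```

A fraction close to 1.0, as described for "reader-mode-golden-data", would suggest the corpus over-represents pages with this markup compared to what users see.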
Oct 7 2016
Besides performance evaluation, the corpus can also be used for output difference detection. Since a recent bug (issue 654058) could really use a representative corpus with high coverage, and we can support MHTML in our eval server, this might be a good time to make it happen.
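Output difference detection over such a corpus could look roughly like the sketch below. It assumes the distilled text for each page has already been produced by two distiller versions and collected into dictionaries keyed by a page identifier; the function name `diff_distilled_outputs` and that data layout are hypothetical and do not describe how the eval server works.

```python
import difflib


def diff_distilled_outputs(old_outputs, new_outputs):
    """Returns {page_id: unified diff text} for pages whose output changed.

    Both arguments map a page identifier to the distilled text produced by
    one distiller version. A page missing on one side is treated as empty.
    """
    changed = {}
    for page_id in sorted(set(old_outputs) | set(new_outputs)):
        old_text = old_outputs.get(page_id, "")
        new_text = new_outputs.get(page_id, "")
        if old_text == new_text:
            continue
        diff_lines = difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"old/{page_id}",
            tofile=f"new/{page_id}",
            lineterm="",
        )
        changed[page_id] = "\n".join(diff_lines)
    return changed
```

With a representative corpus, the number of pages appearing in the result gives a rough sense of how widely a distiller change affects output.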
Oct 7 2016
The non-mobile-friendly distillable corpus is here: cl/135527281
Oct 24 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a22aa4380f04d2d8aa18f1f3098393516646f181

commit a22aa4380f04d2d8aa18f1f3098393516646f181
Author: wychen <wychen@chromium.org>
Date: Mon Oct 24 19:38:13 2016

Roll DOM Distiller JavaScript distribution package

Diff since last roll:
https://github.com/chromium/dom-distiller/compare/d16a68c1b8...072fe57b48

Picked up changes:
072fe57 Recognize H4 to H6 as headings as well
52047b4 Avoid using getClassName() to avoid issues with <svg>
8cf93ce Bump ChromeDriver version to 2.24
d876125 Add gen_mhtml_corpus.py to convert MHTML to eval corpus
8b33c8b Amend "Fix partially hidden article"
3fd2017 Strip unwanted classNames from all nodes

BUG=593457,599121,647098,658038
Review-Url: https://codereview.chromium.org/2447453002
Cr-Commit-Position: refs/heads/master@{#427118}

[modify] https://crrev.com/a22aa4380f04d2d8aa18f1f3098393516646f181/DEPS
[modify] https://crrev.com/a22aa4380f04d2d8aa18f1f3098393516646f181/third_party/dom_distiller_js/README.chromium
Mar 16 2017
We might also want a corpus that is representative of the iOS Reading List, if we want to measure performance changes.
Feb 15 2018
Comment 1 by wychen@chromium.org, Mar 9 2016