Fuzzy title matching in DOM distiller
Issue description

The title matching algorithm used in DOM distiller requires an exact match, apart from the publisher name, which is stripped away. However, some sites use slightly different titles in <title> and <h1>, causing the match to fail.

Example: https://www-marketwatch-com.cdn.ampproject.org/v/www.marketwatch.com/amp/story/guid/E6CA6E62-F220-11E6-82ED-7800910FCE87?amp_js_v=7

What's in <title>: Tesla could decide to tap capital markets as its shares rally analyst says - MarketWatch
What's in <h1>: Tesla could decide to tap capital markets as its shares rally, analyst says

If the edit distance is short enough (1 in this example), the titles should still match.
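To make the proposal concrete, here is a minimal Python sketch of edit-distance-based matching. It is not DOM distiller's actual code: the function names `edit_distance` and `titles_match`, the `max_distance` threshold, and the publisher stripping (dropping everything after the last " - ") are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def titles_match(page_title: str, heading: str, max_distance: int = 1) -> bool:
    """Hypothetical matcher: strip the publisher, then allow a small edit distance."""
    # Assumption: the publisher is whatever follows the last " - " separator.
    stripped = page_title.rsplit(" - ", 1)[0].strip()
    return edit_distance(stripped, heading.strip()) <= max_distance


title = ("Tesla could decide to tap capital markets as its shares rally "
         "analyst says - MarketWatch")
h1 = ("Tesla could decide to tap capital markets as its shares rally, "
      "analyst says")
print(titles_match(title, h1))
```

With `max_distance=0` this sketch degenerates to exact matching after publisher stripping, so the example above would be rejected. A real implementation would probably want a threshold that scales with title length, so that short titles are not matched too loosely.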
Mar 16 2017, Feb 15 2018
Comment 1 by wychen@chromium.org, Feb 23 2017