Chrome thinks that chase.com/verifycard is Spanish
Issue description

Chrome Version: 56.0.2924.28
OS Version: OS X 10.12.2
URLs (if applicable): chase.com/verifycard (redirects to https://www.chase.com/content/chasecom/en/credit-cards/rtbl/verify-credit-card)

What steps will reproduce the problem?
1. Go to chase.com/verifycard

What is the expected result?
No "This is Spanish; translate?" bar.

What happens instead of that?
The bar does show up. It's true that there is some Spanish on the page, but it's mostly English.

Please provide any additional information below. Attach a screenshot if possible.
UserAgentString: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.28 Safari/537.36
Jan 13 2017
I'm not sure this is a regression, so I don't think a bisect will help much. Maybe it'll find the switch to CLD3, or maybe it won't.
Jan 13 2017
Thanks for reporting this. I did some debugging. Yes, the page has a paragraph in Spanish and some text in English, and the English text is a little longer (note that a couple of images also contain text). To handle gibberish input, the model applies a couple of preprocessing heuristics, such as removing repetitive snippets that occur close to each other. On this page, these heuristics cause short strings like "log in to verify receipt of your card" and "don’t have a chase user id enroll now" to be ignored, so the amount of English text gets reduced. We have started working on a new language identification model that recognizes multiple languages even when the text is in the same script. Currently, CLD2 and CLD3 first split the text by script and then make a prediction for each string of the same script. Another promising way to address this issue is to adjust the heuristics mentioned above.
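The effect described above, where filtering drops enough English text for Spanish to win, can be illustrated with a toy model. This is not CLD2 or CLD3 code: it assumes each snippet has already been labeled by some per-snippet classifier, uses a made-up repetition filter (`drop_repetitive`, which drops a snippet when more than half of its words appeared among the last 50 words seen) as a hypothetical stand-in for the real heuristics, and picks the page language by a byte-weighted vote. The snippets, including the Spanish sentence, are invented for illustration.

```python
from collections import Counter, deque

# Toy input: (text, label) pairs. The labels stand in for the output of a
# per-snippet classifier; the texts are invented for illustration.
SNIPPETS = [
    ("log in to verify receipt of your card", "en"),
    ("log in to verify receipt of your card", "en"),
    ("don't have a chase user id enroll now", "en"),
    ("don't have a chase user id enroll now", "en"),
    ("Inicie sesion para verificar que recibio su tarjeta de credito. "
     "Si no tiene un ID de usuario, inscribase ahora.", "es"),
]


def drop_repetitive(snippets, window=50, max_repeat_ratio=0.5):
    """Hypothetical repetition filter: drop a snippet when more than
    `max_repeat_ratio` of its words occurred among the last `window`
    words seen (whether those words were kept or dropped)."""
    recent = deque(maxlen=window)
    kept = []
    for text, label in snippets:
        words = text.lower().split()
        if not words:
            continue
        repeated = sum(1 for w in words if w in recent)
        if repeated / len(words) <= max_repeat_ratio:
            kept.append((text, label))
        recent.extend(words)  # every snippet counts toward "recent" text
    return kept


def page_language(snippets):
    """Return the label with the most UTF-8 bytes of text behind it."""
    votes = Counter()
    for text, label in snippets:
        votes[label] += len(text.encode("utf-8"))
    return votes.most_common(1)[0][0] if votes else None


print(page_language(SNIPPETS))                   # en
print(page_language(drop_repetitive(SNIPPETS)))  # es
```

With all four English snippets counted, English outweighs the Spanish sentence (148 bytes vs. 111). Once the repeated snippets are filtered out, only 74 English bytes remain and the vote flips to Spanish, even though most of the visible text is English.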
Jan 13 2017
Able to reproduce the issue on Windows 10 and Mac 10.12.2 using the reported Chrome version 56.0.2924.28 and the latest Canary, 57.0.2979.0. This issue is not reproducible on Ubuntu 14.04.

Bisect Information:
=====================
Good build: 44.0.2395.0 (revision 328881)
Bad build: 44.0.2396.0 (revision 329009)
Change Log URL: https://chromium.googlesource.com/chromium/src/+log/83aa5be892018f800328c1b7d03f6fc37bc22d5b..90bb2b934366c595c1c979e0b2363f0a822e1b92

From the above change log, suspecting the following change:
Review URL: https://codereview.chromium.org/1125403004

abakalov@/andrewhayden@ - Could you please check whether this is caused by your change? If not, please help us assign it to the right owner. Thanks!
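A bisect like the one above narrows a known-good/known-bad pair of builds down to two adjacent builds by binary search. The following is a minimal sketch of that idea, not Chromium's actual bisect tooling: `bisect_builds`, `fake_is_bad`, and the culprit revision 328950 are hypothetical, and it assumes every revision in the list has a build and that the bug, once introduced, stays present in all later revisions.

```python
def bisect_builds(revisions, is_bad):
    """Binary-search a sorted list of revisions for the first bad one.

    revisions[0] must be known good and revisions[-1] known bad.
    `is_bad` tests a single build. Returns (last_good, first_bad).
    """
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # bug present: first bad build is at or before mid
        else:
            lo = mid  # bug absent: first bad build is after mid
    return revisions[lo], revisions[hi]


# Hypothetical run over the reported range, pretending the bug first
# appears at revision 328950.
revs = list(range(328881, 329010))
probes = []


def fake_is_bad(rev):
    probes.append(rev)  # record each build we "test"
    return rev >= 328950


print(bisect_builds(revs, fake_is_bad))  # (328949, 328950)
print(len(probes))                       # 7
```

Each probe halves the remaining range, so 129 candidate revisions need only 7 tests. A bisect only localizes where behavior changed; if the problem is not a regression, the range it finds may not point at the real cause.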
Jan 13 2017
I am one of the people working on the new language detector (CLD3) and will look into this issue, so let's keep the ownership as is.
Apr 14 2017
Apr 27 2017
Mar 7 2018
Comment 1 by manoranj...@chromium.org, Jan 13 2017