Issue 680525

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows, Mac
Pri: 3
Type: Bug-Regression



Chrome thinks that chase.com/verifycard is spanish

Project Member Reported by thakis@chromium.org, Jan 12 2017

Issue description

Chrome Version       : 56.0.2924.28
OS Version: OS X 10.12.2
URLs (if applicable) : chase.com/verifycard (redirects to https://www.chase.com/content/chasecom/en/credit-cards/rtbl/verify-credit-card)

What steps will reproduce the problem?
1. Go to chase.com/verifycard


What is the expected result?

No "This is Spanish; translate?" bar

What happens instead of that?

The bar does show up. It's true that there is some Spanish on the page, but it's mostly English.

Please provide any additional information below. Attach a screenshot if
possible.

UserAgentString: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.28 Safari/537.36



Attachment: Screen Shot 2017-01-12 at 10.46.39 AM.png (875 KB)
Labels: Needs-Bisect

Comment 2 by thakis@chromium.org, Jan 13 2017

Cc: andrewhayden@chromium.org abakalov@chromium.org
I'm not sure this is a regression, so I don't think a bisect will help much. Maybe it'll find the CLD3 switch, or maybe it won't.
Cc: -abakalov@chromium.org djweiss@chromium.org riesa@chromium.org
Owner: abakalov@chromium.org
Status: Assigned (was: Untriaged)
Thanks for reporting this. I did some debugging. The page does contain a paragraph in Spanish alongside English text, and the English portion is somewhat longer (note that a couple of images also contain text). To handle gibberish input, the model applies a few preprocessing heuristics, such as removing repetitive snippets that occur close to each other. On this page, that step discards short strings like "log in to verify receipt of your card" and "don’t have a chase user id enroll now", which reduces the amount of English text the model sees.
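To illustrate how a "remove nearby repetitions" step can shrink the text that reaches the classifier, here is a rough Python sketch. It is a hypothetical filter, not the actual CLD2/CLD3 code: the word-level granularity, the case folding, and the `window` size are all assumptions made for the example, and `squeeze_repeats` is an invented name.

```python
from collections import deque


def squeeze_repeats(text, window=8):
    """Hypothetical repetition filter (NOT the real CLD2/CLD3 heuristic).

    Drops any word that already occurred among the previous `window`
    words (case-insensitive). Window size and word granularity are
    illustrative assumptions.
    """
    recent = deque(maxlen=window)  # sliding window of recently seen words
    kept = []
    for word in text.split():
        key = word.lower()
        if key not in recent:
            kept.append(word)  # first occurrence nearby: keep it
        recent.append(key)  # every word, kept or not, enters the window
    return " ".join(kept)


if __name__ == "__main__":
    # Short, repetitive UI-style English collapses to a few words.
    print(squeeze_repeats("your card your card verify your card"))
```

With this sketch, repetitive English UI text such as "your card your card verify your card" collapses to "your card verify", while a longer non-repetitive paragraph passes through unchanged, so the balance between languages shifts toward the longer paragraph.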

We have started working on a new language identification model that can recognize multiple languages even when they share the same script. Currently, CLD2 and CLD3 first split the text by script and then make one prediction for each same-script span, so English and Spanish text (both Latin script) are scored together. Another promising approach is to adjust the heuristics mentioned above.
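To show why a script-first design cannot separate English from Spanish, here is a rough Python sketch that splits text into maximal same-script runs, each of which would then receive a single language prediction. It guesses a character's script from the first word of its Unicode name (e.g. "LATIN", "CYRILLIC"), which is an approximation assumed for illustration; CLD2/CLD3 use their own script tables, and `script_of` and `split_by_script` are hypothetical names.

```python
import unicodedata


def script_of(ch):
    """Crude script guess from the Unicode character name (an assumption,
    not the Unicode Script property). Non-letters have no script."""
    if not ch.isalpha():
        return None  # spaces, digits, punctuation attach to the current run
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def split_by_script(text):
    """Split text into maximal runs of a single script."""
    runs = []  # each entry is [script, chars]
    for ch in text:
        script = script_of(ch)
        if not runs:
            runs.append([script, ch])
        elif script is None or script == runs[-1][0]:
            runs[-1][1] += ch  # same script (or neutral char): extend run
        elif runs[-1][0] is None:
            # The run so far held only neutral chars; adopt this script.
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])  # script change: start a new run
    return [(script, chars) for script, chars in runs]


if __name__ == "__main__":
    # English and Spanish share the Latin script, so they form one run.
    print(split_by_script("Log in. Inicie sesión."))
    print(split_by_script("hello привет"))
```

In this sketch, "Log in. Inicie sesión." comes back as a single LATIN run, so one prediction has to cover both languages, whereas mixed Latin/Cyrillic text is split into two runs that can be classified separately.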
Labels: -Type-Bug -Pri-3 -Needs-Bisect M-57 hasbisect OS-Windows Pri-1 Type-Bug-Regression
Able to reproduce the issue on Windows 10 and Mac OS X 10.12.2 using the reported Chrome version 56.0.2924.28 and the latest Canary, 57.0.2979.0.
This issue is not reproducible on Ubuntu 14.04.

Bisect information:
=====================
Good build: 44.0.2395.0 (revision 328881)
Bad build:  44.0.2396.0 (revision 329009)

Change Log URL: 
https://chromium.googlesource.com/chromium/src/+log/83aa5be892018f800328c1b7d03f6fc37bc22d5b..90bb2b934366c595c1c979e0b2363f0a822e1b92

From the above change log, we suspect the following change:

Review url: https://codereview.chromium.org/1125403004

abakalov@/andrewhayden@: Could you please check whether this is caused by your change? If not, please help us assign it to the right owner.

Thanks...!!
I am one of the people working on the new language detector (CLD3) and will look into this issue, so let's keep the ownership as is.
Cc: yyushkina@chromium.org
Labels: -Pri-1 Hotlist-CLD3 Pri-2
Components: -UI>Browser>Translate UI>Browser>Language>Translate
Labels: -Pri-2 Pri-3