
Issue 750002

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
Closed: Dec 20
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , iOS , Chrome , Mac
Pri: 2
Type: Bug




If a site has both an html lang attribute specified and a content-language header, we should use the lang attribute to determine the page's language.

Reported by dchau...@etouch.net, Jul 28 2017

Issue description

Chrome Version: 61.0.3163.16 (Official Build)742cbfe2e3f95476845193f35d4298335c22a522-refs/branch-heads/3163@{#97} 32/64-bit.
OS: Windows(7,8,10), Mac(10.11.6,10.12.3,10.12.5), Linux(14.04 LTS).

What steps will reproduce the problem?
1. Launch chrome, go to https://www.sogou.com/ and observe.

Translate bubble doesn't appear.
Translate bubble should appear.

This is a non-regression issue, seen from M-45 series.

Kindly review the attached screen-cast for reference.
 
Attachment: Actual behavior.mp4 (683 KB)
Status: Untriaged (was: Unconfirmed)
As this is a non-regression issue, changing the status to Untriaged so that it gets addressed.

Thank You!

Comment 2 by groby@chromium.org, Aug 4 2017

Cc: napper@chromium.org
Components: -UI>Browser>Bubbles UI>Browser>Language>Translate
This is a Translate issue, not a bubble one -> reclassifying.

In general, sogou.com is served with a "content-language:en-us" header, which Translate currently interprets as "This is English content".

You could make a weak argument that this is _not_ what the standard (https://tools.ietf.org/html/rfc2616#section-14.12) says: " The Content-Language entity-header field describes the natural language(s) **of the intended audience** for the enclosed entity" (emphasis added)

I.e. it says the natural language of the reader is en-US. That is... technically correct, the best kind of correct. 

But that's contradicted by the later https://tools.ietf.org/html/rfc3282. Of course, the next RFC goes back to the original definition: https://tools.ietf.org/html/rfc7231#section-3.1.3.2

We have so many standards so we can choose the appropriate one...

But kidding aside, given that the page also serves an <html lang="cn"> I would argue that that attribute should override content-language.
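A minimal sketch of the precedence proposed above, in Python for illustration only. This is not Chromium's code, and the function names are hypothetical. It assumes the Content-Language value is a comma-separated list of language tags (RFC 7231 gives "mi, en" as an example). As a further assumption, it falls back to the first listed tag when the page has no lang attribute, since the header itself does not rank the languages it lists.

```python
from typing import List, Optional


def parse_content_language(header: Optional[str]) -> List[str]:
    """Split a Content-Language value such as "mi, en" into its language tags."""
    if not header:
        return []
    return [tag.strip() for tag in header.split(",") if tag.strip()]


def pick_declared_language(html_lang: Optional[str],
                           content_language: Optional[str]) -> Optional[str]:
    """Prefer the <html lang> attribute; otherwise use the header (hypothetical policy)."""
    if html_lang and html_lang.strip():
        # The page's own markup wins over the HTTP header.
        return html_lang.strip()
    tags = parse_content_language(content_language)
    # Assumption: take the first listed tag when the header names several.
    return tags[0] if tags else None
```

With sogou.com's values, pick_declared_language("cn", "en-us") returns "cn", while a page with no lang attribute and "content-language: en-us" gets "en-us".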


Cc: yyushkina@chromium.org
Owner: napper@chromium.org
Status: Assigned (was: Untriaged)
Jon, let's take a look at this, plus the languages most frequently mismatched between what we detect and what the content-language header and lang attribute say (I think that data should be in now), and decide how or whether to adjust the logic. At the very least, I think we should take groby@'s point to heart: if a site has both a lang attribute and a content-language header, we should use the lang attribute.
Status: Fixed (was: Assigned)
Status: Assigned (was: Fixed)
Update:

Retested this issue on Windows (7, 8, 10), Mac (10.11.6, 10.12.3, 10.12.5) and Linux (14.04 LTS) machines using Stable/Beta (build # 64.0.3282.140), Dev (build # 65.0.3325.31) and Canary (build # 66.0.3340.0). The issue still persists.

Thank you.

Labels: -M-62 OS-Android OS-Chrome OS-iOS
Owner: ----
Status: Available (was: Assigned)
Summary: If a site has both an html lang attribute specified and a content-language header, we should use the lang attribute to determine the page's language. (was: [Non regression] Translate bubble doesn't appear for sogou.com)
Our fix for the most mismatched languages does not address the case where a site's own content-language header and html lang attribute disagree. We should change the logic that decides what language a page is in so that, if a site has both a lang attribute and a content-language header, we use the lang attribute.
Cc: ma...@chromium.org
Labels: Hotlist-GoodFirstBug
@mathp - this would be a good first bug for the new language team folks
Owner: anthonyvd@chromium.org
Status: Assigned (was: Available)
Cc: -yyushkina@chromium.org suproteem@chromium.org
Cc: anthonyvd@chromium.org
Owner: frechette@chromium.org
+frechette@ who's looking at HTML lang
Status: WontFix (was: Assigned)
We do adopt HTML lang over content-language if both are available:
https://cs.chromium.org/chromium/src/components/translate/core/language_detection/language_detection_util.cc?type=cs&sq=package:chromium&g=0&l=182

Plus, I could not reproduce the original issue: sogou.com is identified as CN.

Attachment: Screenshot from 2018-12-19 14-56-28.png (101 KB)
Status: Assigned (was: WontFix)
I can still reproduce translation not being offered. Even though the adopted language resolves to CN, it does not get passed to Translate because CLD3 can't detect the language in my case. See the screenshot of the detection log. I think in these cases, where the CLD3 result is unreliable or "und", we should take the lang attribute as the truth.
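A sketch of the fallback suggested in this comment, assuming the detector reports a language code plus a reliability flag. The parameter names are hypothetical, and this illustrates the proposal rather than Translate's actual logic: trust the detector when it is reliable and not "und"; otherwise use the declared language if there is one.

```python
from typing import Optional

UNDETERMINED = "und"  # code used when the detector cannot decide


def choose_page_language(cld_language: str, cld_reliable: bool,
                         declared_language: Optional[str]) -> str:
    """Pick a page language, trusting the lang attribute when detection is weak."""
    detector_usable = cld_reliable and cld_language != UNDETERMINED
    if detector_usable:
        return cld_language
    if declared_language:
        # Detection failed or was unreliable: fall back to what the page declares.
        return declared_language
    return UNDETERMINED
```

For the sogou.com case described here, choose_page_language("und", False, "cn") yields "cn". Whether Translate can then act on "cn" is a separate question.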
My bad, I didn't realize what the expected behavior was.

From what I gather (from chrome://translate-internals, the 'adopted language' field), the language _is_ detected using html lang tag to be CN. However, CN is not supported, hence we do not offer translation.

Is it possible the website sends an invalid language code through HTML lang? Or should CN be supported?
Status: WontFix (was: Assigned)
Ah yeah that's what's happening. CN is invalid. Site owner error here. Good catch.
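For illustration, a hypothetical check a site owner could run on their lang attribute. "cn" is not an ISO 639-1 language code. CN is the ISO 3166-1 region code for China, so a tag such as "zh-CN" (or just "zh") was probably intended. The supported-language set and the suggestion table below are small, made-up samples, not Translate's real list.

```python
from typing import Optional, Tuple

# Hypothetical, abbreviated set of supported primary language subtags (ISO 639-1).
SUPPORTED_LANGUAGES = {"de", "en", "es", "fr", "ja", "ru", "zh"}

# Region codes (ISO 3166-1) sometimes written where a language code belongs,
# mapped to the tag the author most likely meant.
REGION_MISTAKES = {"cn": "zh-CN", "jp": "ja-JP"}


def check_lang_attribute(value: str) -> Tuple[bool, Optional[str]]:
    """Return (is_supported, suggested_tag) for an <html lang> value."""
    # Use only the primary subtag: "zh-CN" -> "zh"; tolerate "zh_CN" too.
    primary = value.strip().replace("_", "-").split("-")[0].lower()
    if primary in SUPPORTED_LANGUAGES:
        return True, None
    return False, REGION_MISTAKES.get(primary)
```

check_lang_attribute("cn") returns (False, "zh-CN"), matching the conclusion above that the site declares an invalid code.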
