If a site has both an html lang attribute specified and a content-language header, we should use the lang attribute to determine the page's language.
Reported by
dchau...@etouch.net,
Jul 28 2017
Issue description
Chrome Version: 61.0.3163.16 (Official Build) 742cbfe2e3f95476845193f35d4298335c22a522-refs/branch-heads/3163@{#97} 32/64-bit
OS: Windows (7, 8, 10), Mac (10.11.6, 10.12.3, 10.12.5), Linux (14.04 LTS)
What steps will reproduce the problem?
1. Launch Chrome, go to https://www.sogou.com/ and observe.
Observed: the Translate bubble doesn't appear.
Expected: the Translate bubble should appear.
This is a non-regression issue, seen since the M-45 series. Kindly review the attached screencast for reference.
,
Aug 4 2017
This is a Translate issue, not a bubble one -> reclassifying.
In general, sogou.com is served with a "content-language:en-us" header, which Translate currently interprets as "This is English content". You could make a weak argument that this is _not_ what the standard (https://tools.ietf.org/html/rfc2616#section-14.12) says: "The Content-Language entity-header field describes the natural language(s) **of the intended audience** for the enclosed entity" (emphasis added). I.e. it says the natural language of the reader is en-US. That is... technically correct, the best kind of correct.
But that's contradicted by the later https://tools.ietf.org/html/rfc3282. Of course, the next RFC goes back to the original definition: https://tools.ietf.org/html/rfc7231#section-3.1.3.2 We have so many standards so we can choose the appropriate one...
But kidding aside, given that the page also serves an <html lang="cn">, I would argue that that attribute should override content-language.
,
Aug 5 2017
Jon - let's take a look at this + the most frequently mismatched languages b/w what we detect and what content-language header and lang attribute say (I think that data should be in now) and decide how/whether to adjust the logic. I think at the very least we should take to heart groby@'s point that if a site has both a lang attribute specified and a content-language header we should use the lang attribute.
,
Jan 18 2018
,
Feb 5 2018
Update: Retested this issue on Windows (7, 8, 10), Mac (10.11.6, 10.12.3, 10.12.5) and Linux (14.04 LTS) machines using Stable/Beta (build # 64.0.3282.140), Dev (build # 65.0.3325.31) and Canary (build # 66.0.3340.0). The issue still persists. Thank you.
,
Feb 5 2018
Our fix for the most mismatched languages does not address the issue where a site's own content-language header and html lang attribute disagree. We should change the logic deciding what language a page is in so that if a site has both a lang attribute specified and a content-language header, we use the lang attribute.
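A rough sketch of the precedence proposed here, in Python for brevity. This is not the code in language_detection_util.cc; the function name and the "und" fallback for "nothing declared" are assumptions made for illustration.

```python
from typing import Optional


def adopt_page_language(html_lang: Optional[str],
                        content_language: Optional[str]) -> str:
    """Pick the language a page declares for itself (hypothetical helper).

    The html lang attribute is checked first, so it wins whenever both
    sources are present. The Content-Language header is only used when
    the attribute is missing or blank.
    """
    for declared in (html_lang, content_language):
        if declared and declared.strip():
            return declared.strip().lower()
    # Assumption: "und" (undetermined) stands in for "no declaration".
    return "und"
```

With the values from this bug, adopt_page_language("cn", "en-us") picks "cn" rather than "en-us".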
,
Feb 5 2018
,
Feb 5 2018
@mathp - this would be a good first bug for the new language team folks
,
Mar 7 2018
,
Jun 7 2018
,
Nov 6
+frechette@ who's looking at HTML lang
,
Dec 19
We do adopt HTML lang over content-language if both are available: https://cs.chromium.org/chromium/src/components/translate/core/language_detection/language_detection_util.cc?type=cs&sq=package:chromium&g=0&l=182 Plus, I could not reproduce the original issue: sogou.com is identified as CN.
,
Dec 19
,
Dec 19
I can still reproduce no translation being shown. Even though the adopted language resolves to CN, it does not get passed to Translate because CLD3 can't detect the language in my case. See screenshot of the detection log. I think in these cases (if CLD3 is unreliable or returns und), we should take the lang attribute as truth.
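To make the suggestion concrete, a minimal sketch of that fallback. All names are hypothetical, and what happens when detection is reliable (the detected language is simply returned) is an assumption, not a description of what Chrome does today.

```python
def resolve_page_language(adopted: str, detected: str,
                          detection_reliable: bool) -> str:
    """Fall back to the page's declared language when detection fails.

    adopted:            language declared by the page (e.g. html lang)
    detected:           language reported by the content detector
    detection_reliable: whether the detector trusted its own result
    """
    if not detection_reliable or detected == "und":
        # Proposal from this comment: trust the lang attribute here.
        return adopted
    # Assumption for this sketch: a reliable detection is used as-is.
    return detected
```

For the case in the screenshot, resolve_page_language("cn", "und", False) would hand "cn" on to the next step instead of dropping it.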
,
Dec 20
My bad, I didn't realize what the expected behavior was. From what I gather (from chrome://translate-internals, the 'adopted language' field), the language _is_ detected from the html lang attribute as CN. However, CN is not supported, hence we do not offer translation. Is it possible the website sends an invalid language code through html lang? Or should CN be supported?
,
Dec 20
Ah yeah that's what's happening. CN is invalid. Site owner error here. Good catch.
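For anyone landing here later, a small sketch of why "cn" gets no offer: "cn" is the country code for China, while the language code for Chinese is "zh". The supported set below is a made-up sample for illustration, not Chrome's actual list, and the function name is hypothetical.

```python
# Hypothetical sample of supported primary language codes.
SUPPORTED_LANGUAGES = frozenset({"en", "zh", "fr", "de", "ja"})


def can_offer_translation(page_language: str) -> bool:
    """Return True if the primary subtag is a supported language code."""
    normalized = page_language.strip().lower().replace("_", "-")
    primary = normalized.split("-")[0]
    return primary in SUPPORTED_LANGUAGES
```

Under these assumptions can_offer_translation("cn") is False, while "zh" or "zh-CN" would pass, so a page serving <html lang="zh-CN"> would not hit this problem.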
Comment 1 by rbasuvula@chromium.org
, Jul 28 2017