
Issue 647113


Issue metadata

Status: Fixed
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug




Pokedex.org has a translate bar pop up asking to translate from Xhosa to English

Project Member Reported by dominickn@chromium.org, Sep 15 2016

Issue description

Version: 54.0.2840.25
OS: Android

What steps will reproduce the problem?
(1) Visit www.pokedex.org
(2) See a translate infobar appear saying the page is in Xhosa

What is the expected output?

No translate infobar should appear since the site is in English.

Particularly annoying as this will block the app install banner for the PWA.
 
See attached screenshot showing the stacked infobars.
Screenshot_20160915-131155.png (206 KB)

Comment 2 by groby@chromium.org, Sep 15 2016

Owner: abakalov@chromium.org
Status: Assigned (was: Untriaged)
We recently switched our language detector to CLD3, so assigning to abakalov@.
Cc: andrewhayden@chromium.org
CLD2 detects this page as "unknown". The text from translate-internals under CLD2 (Chrome 53.0.2785.101 stable on Linux x64) looks like this:

--- snip ---
Pokedex.org
Pokémon
About

Pokedex.org

BulbasaurIvysaurVenusaurCharmanderCharmeleonCharizardSquirtleWartortleBlastoiseCaterpieMetapodButterfreeWeedleKakunaBeedrillPidgeyPidgeottoPidgeotRattataRaticateSpearowFearowEkansArbokPikachuRaichuSandshrewSandslashNidoran ♀NidorinaNidoqueenNidoran ♂NidorinoNidokingClefairyClefableVulpixNinetalesJigglypuffWigglytuffZubatGolbatOddishGloomVileplumeParasParasectVenonatVenomothDiglettDugtrioMeowthPersianPsyduckGolduckMankeyPrimeapeGrowlitheArcaninePoliwagPoliwhirlPoliwrathAbraKadabraAlakazamMachopMachokeMachampBellsproutWeepinbellVictreebelTentacoolTentacruelGeodudeGravelerGolemPonytaRapidashSlowpokeSlowbroMagnemiteMagnetonFarfetch'dDoduoDodrioSeelDewgongGrimerMukShellderCloysterGastlyHaunterGengarOnixDrowzeeHypnoKrabbyKinglerVoltorb

An open-source site by Nolan Lawson, with help from Pokéapi. 
All content is © Nintendo, Game Freak, and The Pokémon Company.
--- snip ---

To be fair, there's not a lot of English on this page so I think it's reasonable that we're not detecting the site as English. Though it seems odd that it's being detected as anything at all. Maybe CLD3 should do something like give up if the text is one giant blob like that? I don't know. Anton, penny for your thoughts?
Cc: djweiss@chromium.org slav@google.com riesa@chromium.org
Interesting case. As Andrew said, given that there is not a lot of English on the page and it consists mostly of Pokemon names, predicting "unknown" seems preferable to predicting "English".

CLD3 predicts "Xhosa" with probability 0.67. We introduced a threshold below which predictions are marked as unreliable, and set it to 0.53 based on a comparison with CLD2's reliable/unreliable predictions. However, it seems to me that a better approach would be having a per-language threshold. Predictions marked as unreliable by CLD2 or CLD3 result in Chrome using "unknown".
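For anyone following along, here is a minimal sketch of the global-threshold rule described above. It is not CLD3's or Chrome's actual code: the function name `resolve_language` and the use of the ISO code "xh" for Xhosa are assumptions made for illustration. Only the 0.53 threshold and the 0.67 probability come from this thread.

```python
# Illustrative sketch only; not the real CLD3/Chrome implementation.
GLOBAL_THRESHOLD = 0.53  # threshold value quoted in this thread


def resolve_language(predicted: str, probability: float,
                     threshold: float = GLOBAL_THRESHOLD) -> str:
    """Return the predicted language if it is reliable, else "unknown".

    A prediction below the threshold is treated as unreliable.
    """
    return predicted if probability >= threshold else "unknown"


# pokedex.org case: Xhosa ("xh", assumed code) at 0.67 clears the 0.53 bar,
# so the prediction is kept and the translate infobar is offered.
print(resolve_language("xh", 0.67))
```

With a single global cutoff, a 0.67 prediction is accepted regardless of which language it names, which is why the infobar shows up on this page.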
+1 to the idea of per-language probability thresholds. This will not affect the size of the binary/model either.
Additional data point: https://crypto.graphics/ is detected as Danish on stable.
That is the old model, CLD2, but the page is challenging for the new one as well. CLD3 predicts bg-Latn with relatively low probability (0.80), so per-language threshold tuning looks promising for this case as well.
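A rough sketch of what a per-language threshold lookup could look like, under stated assumptions: the override values (0.90) are made up for illustration and are not tuned CLD3 numbers, and the names `PER_LANGUAGE_THRESHOLD` and `resolve_with_per_language_threshold` are hypothetical. The 0.53 default and the 0.67 / 0.80 probabilities are the ones quoted in this thread.

```python
# Illustrative sketch only; not the real CLD3/Chrome implementation.
DEFAULT_THRESHOLD = 0.53  # global value quoted earlier in this thread

# Hypothetical per-language overrides. These numbers are invented for the
# example and do not come from CLD3.
PER_LANGUAGE_THRESHOLD = {
    "xh": 0.90,       # Xhosa (assumed code)
    "bg-Latn": 0.90,  # Bulgarian in Latin script
}


def resolve_with_per_language_threshold(predicted: str,
                                        probability: float) -> str:
    """Return the prediction if it clears its language's threshold."""
    # Fall back to the global threshold for languages without an override.
    threshold = PER_LANGUAGE_THRESHOLD.get(predicted, DEFAULT_THRESHOLD)
    return predicted if probability >= threshold else "unknown"


# The two pages mentioned in this thread, plus a language with no override.
for lang, prob in [("xh", 0.67), ("bg-Latn", 0.80), ("en", 0.67)]:
    print(lang, prob, "->", resolve_with_per_language_threshold(lang, prob))
```

Under these assumed overrides, both problem pages would fall back to "unknown", while a language without an override keeps using the global cutoff. A lookup table like this adds only a handful of constants, which is consistent with the point above about binary/model size.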
FWIW I don't feel this should block us from allowing CLD3 to go to stable as-is. These are edge cases, and while per-language threshold tuning sounds great, I don't think it is critical to success in M54. Anton, WDYT? If we want to do per-language thresholds, can you file a tracking bug and start linking these issues to it?
I agree with Andrew. Also, as mentioned in the other thread, we’ll take a look at the UMA histograms once they collect data with the latest patch for a few days.
Hey Anton,

Did you ever get to per-language threshold tuning?
Hi Andrew,

Apologies for the delayed response! We tuned the global threshold more precisely because we were too generous initially. We also improved the pre-processing and added new features to the model. The prediction for pokedex.org (and crypto.graphics mentioned above) with the update we are preparing is "unknown", which I think is appropriate given the content.

Thanks,
Anton 
Status: Fixed (was: Assigned)
Thanks! Closing as fixed.
Components: -UI>Browser>Translate UI>Browser>Language>Translate
