Pokedex.org has a translate bar pop up asking to translate from Xhosa to English
Issue description
Version: 54.0.2840.25
OS: Android

What steps will reproduce the problem?
(1) Visit www.pokedex.org
(2) See a translate infobar appear saying the page is in Xhosa

What is the expected output?
No translate infobar should appear since the site is in English. Particularly annoying as this will block the app install banner for the PWA.
,
Sep 15 2016
We recently switched our language detector to CLD3 -> abakalov@
,
Sep 15 2016
CLD2 detects this page as "unknown". The text from translate-internals under CLD2 (Chrome 53.0.2785.101 stable on Linux x64) looks like this:

--- snip ---
Pokedex.org Pokémon About Pokedex.org BulbasaurIvysaurVenusaurCharmanderCharmeleonCharizardSquirtleWartortleBlastoiseCaterpieMetapodButterfreeWeedleKakunaBeedrillPidgeyPidgeottoPidgeotRattataRaticateSpearowFearowEkansArbokPikachuRaichuSandshrewSandslashNidoran ♀NidorinaNidoqueenNidoran ♂NidorinoNidokingClefairyClefableVulpixNinetalesJigglypuffWigglytuffZubatGolbatOddishGloomVileplumeParasParasectVenonatVenomothDiglettDugtrioMeowthPersianPsyduckGolduckMankeyPrimeapeGrowlitheArcaninePoliwagPoliwhirlPoliwrathAbraKadabraAlakazamMachopMachokeMachampBellsproutWeepinbellVictreebelTentacoolTentacruelGeodudeGravelerGolemPonytaRapidashSlowpokeSlowbroMagnemiteMagnetonFarfetch'dDoduoDodrioSeelDewgongGrimerMukShellderCloysterGastlyHaunterGengarOnixDrowzeeHypnoKrabbyKinglerVoltorb An open-source site by Nolan Lawson, with help from Pokéapi. All content is © Nintendo, Game Freak, and The Pokémon Company.
--- snip ---

To be fair, there's not a lot of English on this page so I think it's reasonable that we're not detecting the site as English. Though it seems odd that it's being detected as anything at all. Maybe CLD3 should do something like give up if the text is one giant blob like that? I don't know. Anton, penny for your thoughts?
,
Sep 15 2016
Interesting case. As Andrew said, given that there is not a lot of English on the page and mostly Pokemon names, predicting "unknown" seems preferable to me over predicting "English". CLD3 predicts "Xhosa" with probability 0.67. We introduced a threshold below which predictions are marked as unreliable, and set it to 0.53 based on a comparison with CLD2's reliable/unreliable predictions. However, it seems to me that a better approach would be having a per-language threshold. Predictions marked as unreliable by CLD2 or CLD3 result in Chrome using "unknown".
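
To make the difference concrete, here is a minimal sketch of the two policies. It is not CLD3 or Chrome code: the function name `resolve_language`, the `PER_LANGUAGE` table, and the 0.90 cutoff for Xhosa are all hypothetical, chosen only for illustration. The 0.53 global threshold and the 0.67 Xhosa probability are the values quoted above.

```python
from typing import Dict, Optional

UNKNOWN = "unknown"
GLOBAL_THRESHOLD = 0.53  # global reliability threshold quoted in this thread


def resolve_language(
    lang: str,
    prob: float,
    per_language: Optional[Dict[str, float]] = None,
) -> str:
    """Return the predicted language, or "unknown" if the prediction is unreliable.

    A prediction is treated as unreliable when its probability falls below
    the threshold for that language. Languages without an entry in
    `per_language` fall back to the global threshold.
    """
    thresholds = per_language or {}
    threshold = thresholds.get(lang, GLOBAL_THRESHOLD)
    return lang if prob >= threshold else UNKNOWN


# Hypothetical per-language override: demand more confidence for Xhosa.
PER_LANGUAGE = {"xh": 0.90}

# Global threshold only: 0.67 >= 0.53, so the Xhosa prediction is kept.
print(resolve_language("xh", 0.67))
# Per-language threshold: 0.67 < 0.90, so the prediction becomes "unknown".
print(resolve_language("xh", 0.67, PER_LANGUAGE))
```

Under these assumptions the pokedex.org prediction would pass the global threshold but be rejected by the stricter Xhosa-specific one, while languages without an override would behave as before.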
,
Sep 15 2016
+1 to the idea of per-language probability thresholds. This will not affect the size of the binary/model either.
,
Sep 16 2016
Additional data point: https://crypto.graphics/ is detected as Danish on stable.
,
Sep 16 2016
This is the old model, CLD2, but the page is challenging for the new one as well. It predicts bg-Latn with relatively low probability (0.80), so per-language threshold tuning looks promising for this case as well.
,
Sep 16 2016
FWIW I don't feel this should block us from allowing CLD3 to go to stable as-is. These are edge cases, and while per-language threshold tuning sounds great, I don't think it is critical to success in M54. Anton, WDYT? If we want to do per-language thresholds, can you file a tracking bug and start linking these issues to it?
,
Sep 19 2016
I agree with Andrew. Also, as mentioned in the other thread, we’ll take a look at the UMA histograms once they collect data with the latest patch for a few days.
,
Dec 1 2016
Hey Anton, Did you ever get to per-language threshold tuning?
,
Dec 5 2016
Hi Andrew, Apologies for the delayed response! We tuned the global threshold more precisely because we were too generous initially. We also improved the pre-processing and added new features to the model. The prediction for pokedex.org (and crypto.graphics mentioned above) with the update we are preparing is "unknown", which I think is appropriate given the content. Thanks, Anton
,
Dec 7 2016
Thanks! Closing as fixed.
,
Apr 27 2017
Comment 1 by dominickn@chromium.org, Sep 15 2016 (attachment, 206 KB)