TranslateEventProtos with missing features
Issue description

Some TranslateEventProtos (TEPs) have a defined event_type, but feature fields such as source language and target language are not filled in. As far as I understand, this should never happen. It seems to happen most often with USER_CONTEXT_MENU_TRANSLATE events, but is also seen for USER_REVERT, USER_ACCEPT, USER_ALWAYS_TRANSLATE_LANGUAGE, and USER_DECLINE. See the Dremel query: https://plx.corp.google.com/script/#a=qo%7Ci=google%253A%253Ascript_1b._9ca5fc_52d6_4f97_a025_f88d6c35d187 Compared to the total number of events of each type (https://plx.corp.google.com/script/#a=qo%7Ci=google%253A%253Ascript_1f._829f97_fbf1_4071_80eb_22cab0daf66a), this bug affects about 40% of USER_CONTEXT_MENU_TRANSLATE events (529 of 1267 on 2017-01-04) and a much smaller percentage (<10%) of other event types.
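The per-type comparison above (events with missing features divided by the total number of events of that type) can be sketched in C++ as follows. This is a minimal illustration, not the actual Dremel query: `TranslateEvent` is a hypothetical, simplified stand-in for TranslateEventProto, and event_type is a plain string here rather than the proto's enum.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the TranslateEventProto fields
// discussed in this issue; the real proto has many more fields and uses
// an enum for event_type.
struct TranslateEvent {
  std::string event_type;       // e.g. "USER_CONTEXT_MENU_TRANSLATE"
  std::string source_language;  // empty when the feature is missing
  std::string target_language;  // empty when the feature is missing
};

// Per-event-type counters.
struct TypeStats {
  int total = 0;
  int missing = 0;  // events with an empty source or target language
  double MissingRate() const {
    return total == 0 ? 0.0 : static_cast<double>(missing) / total;
  }
};

// Groups events by event_type and counts how many lack a language feature.
std::map<std::string, TypeStats> ComputeMissingFeatureStats(
    const std::vector<TranslateEvent>& events) {
  std::map<std::string, TypeStats> stats;
  for (const TranslateEvent& e : events) {
    TypeStats& s = stats[e.event_type];
    ++s.total;
    if (e.source_language.empty() || e.target_language.empty()) {
      ++s.missing;
    }
  }
  return stats;
}
```

With the figures from 2017-01-04, 529 affected events out of 1267 USER_CONTEXT_MENU_TRANSLATE events give a rate of about 0.418, consistent with the "about 40%" figure.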
Jun 11 2017
Jan 25 2018
A recent count shows that only about 0.2% of the data has an empty source language string for "explicit events" (events used to train the model). However, this happens more often with USER_ACCEPT (0.75%) than with the negative outcomes, which seems odd to me. https://plx.corp.google.com/script/#a=qo%7Ci=google%253A%253Ascript_ae._29b300_09ae_464a_809a_a868cd8952b3 It would be good to understand what these cases are. The fact that the UI is shown to the user and that these events are not "UNSUPPORTED_LANGUAGE" probably points to something wrong in how we get the source_language and target_language for inference.
Jan 25 2018
Looking at the numbers where the ranker is not enabled, an empty source language occurs in 0.1% of explicit events, but in 1% of events whose type is USER_ACCEPT. So there appears to be a correlation between the two, but it is hard to say which one causes the other. I am adding this bug to the issues with TEPs described in b/72396986.
Jan 26 2018
After investigation, we found that most of these events come from TEPs that were never initialized. This means there is a code path that can show the translate UI without going through TranslateManager.InitiateTranslation.
The simplest fix is to verify that the TEP has been initialized when we call RecordTranslateEvent. We can also filter these events out on the server side when computing metrics and building training datasets.
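A minimal sketch of the proposed client-side check, assuming a hypothetical `TranslateEventRecorder` class whose `InitializeEvent` method stands in for the initialization done in TranslateManager.InitiateTranslation. The real Chromium classes are organized differently; this only illustrates refusing to record an event whose TEP was never initialized.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the TranslateEventProto fields relevant here.
struct TranslateEventProto {
  std::string source_language;
  std::string target_language;
  std::string event_type;
};

// Hypothetical recorder, not the actual Chromium class.
class TranslateEventRecorder {
 public:
  // Stand-in for the TEP initialization that normally happens in
  // TranslateManager.InitiateTranslation.
  void InitializeEvent(const std::string& source, const std::string& target) {
    current_.source_language = source;
    current_.target_language = target;
    initialized_ = true;
  }

  // Records the event only if the TEP was initialized first.
  // Returns false and records nothing otherwise.
  bool RecordTranslateEvent(const std::string& event_type) {
    if (!initialized_) return false;
    current_.event_type = event_type;
    recorded_.push_back(current_);
    return true;
  }

  const std::vector<TranslateEventProto>& recorded() const {
    return recorded_;
  }

 private:
  TranslateEventProto current_;
  bool initialized_ = false;
  std::vector<TranslateEventProto> recorded_;
};
```

Silently dropping the event is only one option; a debug-only assertion at the same point would instead help locate the code path that shows the UI without initializing the TEP.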
There is still a small number of events (0.06%) that have an empty source language, but where target_language is not empty. In this case, the TEP has been properly initialized, but for some reason, either TranslateDownloadManager::IsSupportedLanguage('') returns True, or some other code path eventually shows the UI. These events are very rare, so we can probably just ignore them.
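The server-side filtering could separate the two failure modes described above. A minimal sketch, assuming a hypothetical `TranslateEvent` struct with only the two language fields: events where both languages are empty are treated as likely uninitialized TEPs, and events with only an empty source language as the rare 0.06% case. The names `Classify` and `FilterForTraining` are illustrative, not existing functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of a logged TEP.
struct TranslateEvent {
  std::string source_language;
  std::string target_language;
};

enum class LanguageFeatureStatus {
  kComplete,         // both languages present
  kUninitialized,    // both empty: TEP most likely never initialized
  kEmptySourceOnly,  // source empty but target set: the rare case
  kEmptyTargetOnly,  // not discussed in this issue; kept for completeness
};

LanguageFeatureStatus Classify(const TranslateEvent& e) {
  const bool no_source = e.source_language.empty();
  const bool no_target = e.target_language.empty();
  if (no_source && no_target) return LanguageFeatureStatus::kUninitialized;
  if (no_source) return LanguageFeatureStatus::kEmptySourceOnly;
  if (no_target) return LanguageFeatureStatus::kEmptyTargetOnly;
  return LanguageFeatureStatus::kComplete;
}

// Keeps only events with both language features filled in.
std::vector<TranslateEvent> FilterForTraining(
    const std::vector<TranslateEvent>& events) {
  std::vector<TranslateEvent> kept;
  for (const TranslateEvent& e : events) {
    if (Classify(e) == LanguageFeatureStatus::kComplete) kept.push_back(e);
  }
  return kept;
}
```

This sketch drops both kinds of incomplete events from training data; keeping the rare empty-source events instead would only change the condition in `FilterForTraining`.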
Jan 26 2018
Jan 29 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/d59d15f88feaaaa616bbd049f105dc4ceaa46a75

commit d59d15f88feaaaa616bbd049f105dc4ceaa46a75
Author: Mathieu Perreault <mathp@chromium.org>
Date: Mon Jan 29 21:18:02 2018

[Translate] Provide a better default value for proto field

Bug: 678689
Change-Id: I9f47f8f59f86621327d5ace0b2431b1fc48ff900
Reviewed-on: https://chromium-review.googlesource.com/889823
Reviewed-by: Robert Kaplow <rkaplow@chromium.org>
Commit-Queue: Mathieu Perreault <mathp@chromium.org>
Cr-Commit-Position: refs/heads/master@{#532591}
[modify] https://crrev.com/d59d15f88feaaaa616bbd049f105dc4ceaa46a75/third_party/metrics_proto/translate_event.proto
May 11 2018
Comment 1 by hamelphi@chromium.org, Jan 5 2017