trybot log turns non-ASCII characters to sequences of question marks
Issue description

Taken from https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8932223092662122288/+/steps/chrome_public_test_apk_on_Android_device_Nexus_5__with_patch_/0/logs/org.chromium.chrome.browser.payments.CurrencyFormatterTest_testMultipleConversions/0.

The first line has '???'. Each byte of the UTF-8 representation of U+20xx is turned into a question mark.

org.junit.ComparisonFailure: "USD" "1234" ("fr-FR" locale) should be formatted into "1 234,00 $" expected:<1[ ]234,00 $> but was:<1[???]234,00 $>

This could be due to a few things; I don't know which:

1) junit is to blame. It should explicitly set the character encoding to UTF-8 for its output stream instead of relying on the default encoding.
2) Java is run under the C locale instead of a UTF-8 locale. If #1 is done right, this does not matter.
3) The Python code collecting the log garbles it: the ASCII codec is used instead of UTF-8.

I suspect #3 is most likely, so I'm filing this under Tryserver (although I'm not sure this is the right component).
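Hypotheses 1 and 2 both come down to which encoding the output stream ends up using. The following is a minimal Python sketch of that effect, not the actual junit or trybot code. `write_with_encoding` is a hypothetical helper, and U+202F is only an assumed stand-in for the unspecified U+20xx character.

```python
import io


def write_with_encoding(text: str, encoding: str) -> bytes:
    """Write text through a stream using the given encoding and return the bytes.

    Characters the encoding cannot represent are replaced, which mimics a
    stream that silently falls back to a narrow default encoding.
    """
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding=encoding, errors="replace")
    stream.write(text)
    stream.flush()
    data = buf.getvalue()
    stream.detach()  # release buf so the wrapper does not close it
    return data


# Assumption: U+202F stands in for the U+20xx character from the report.
formatted = "1\u202f234,00 $"

utf8_bytes = write_with_encoding(formatted, "utf-8")   # character survives
ascii_bytes = write_with_encoding(formatted, "ascii")  # character is replaced
```

With an explicit UTF-8 stream the three bytes of the character reach the log intact. With an ASCII stream the character is replaced during this single encoding step.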
Oct 23
To be clear: the log collection code (as of issue 889582) now passes bytes from stdout/stderr directly through to the browser (and tells the browser that they are supposed to be UTF-8). If Chrome gets invalid UTF-8 sequences, it will replace them with the Unicode replacement character (� U+FFFD) when rendering.

The original (unreplaced) log data can be obtained via 'raw' mode, by appending ?format=raw to the end of the URL: https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8932223092662122288/+/steps/chrome_public_test_apk_on_Android_device_Nexus_5__with_patch_/0/logs/org.chromium.chrome.browser.payments.CurrencyFormatterTest_testMultipleConversions/0?format=raw

Since your test is showing the literal ASCII characters ???, I would expect that something in JUnit/Java is converting the output stream to ASCII (or maybe to UTF-8 using `?` as the replacement char?) before it gets to the logdog client. I don't have a log stream containing invalid UTF-8 handy, or I would link to it (hinoka can post one, though).
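One way to tell the two cases apart on bytes saved from the raw log is sketched below, under assumptions: `inspect_log_bytes` is a hypothetical helper, not part of logdog, and the sample byte strings are made up for illustration. Literal question marks are the byte 0x3F and are valid UTF-8, whereas invalid sequences only turn into U+FFFD when decoded for display.

```python
def inspect_log_bytes(raw: bytes) -> dict:
    """Report whether raw log bytes are valid UTF-8 and what a viewer would show."""
    try:
        raw.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    # Decoding with replacement approximates how a viewer renders bad bytes.
    rendered = raw.decode("utf-8", errors="replace")
    return {
        "valid_utf8": valid,
        "literal_question_marks": raw.count(b"?"),
        "replacement_chars": rendered.count("\ufffd"),
    }


# Made-up samples: the first already contains ASCII '?', the second a bad byte.
already_replaced = inspect_log_bytes(b"1???234,00 $")
invalid_utf8 = inspect_log_bytes(b"1\xff234,00 $")
```

If the raw bytes already contain 0x3F, as in the first sample, the replacement happened before the bytes reached the log viewer.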
Oct 23
Oct 25
Thank you for the reply. The character in question is U+20xx and is certainly representable in UTF-8. That is, this bug is not about invalid UTF-8 sequences but about a valid UTF-8 sequence (in the example in the bug report, a 3-byte sequence) that is turned into 3 question marks. We have to track down where the loss (conversion to question marks) is happening.
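The count of three may itself be a hint. As a sketch in Python (the actual lossy step has not been identified, `ascii_roundtrip` is a hypothetical helper, and U+202F is an assumed stand-in for U+20xx): replacing an unencodable character yields a single `?`, while pushing the already-encoded UTF-8 bytes through an ASCII codec yields one `?` per byte.

```python
def ascii_roundtrip(raw: bytes) -> bytes:
    """Decode bytes as ASCII, then re-encode as ASCII, replacing on both steps."""
    # Each non-ASCII byte becomes U+FFFD on decode, then '?' on encode.
    return raw.decode("ascii", errors="replace").encode("ascii", errors="replace")


# Every code point in U+2000..U+20FF takes exactly three bytes in UTF-8.
all_three_bytes = all(
    len(chr(cp).encode("utf-8")) == 3 for cp in range(0x2000, 0x2100)
)

ch = "\u202f"  # assumption: illustrative member of the U+20xx block
char_level = ch.encode("ascii", errors="replace")  # b'?'
byte_level = ascii_roundtrip(ch.encode("utf-8"))   # b'???'
```

Under these assumptions, three question marks would point at a step that handles the text as undecoded bytes rather than as characters.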
Oct 29
Yeah, logdog is configured to (mostly) pass the raw bytes through, so if you're seeing question marks instead of invalid UTF-8, that means the underlying test framework is eating the invalid UTF-8 characters.
Comment 1 by no...@chromium.org, Oct 20
Components: -Infra>Platform>Buildbot>TryServer Infra>Platform>LogDog
Owner: hinoka@chromium.org
Status: Assigned (was: Untriaged)