PipelineIntegrationTest.BasicPlayback_VP9_Odd_WebM fails on Fuchsia FYI |
|||||||
Issue descriptionStarting at https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia%20%28dbg%29/17646 this fails on Fuchsia/x64 Debug, and we see a similar failure on Fuchsia/x64 Release. This coincided with https://chromium-review.googlesource.com/945029 landing. Unfortunately the failures don't seem to be generating backtraces, so its not obvious what is going wrong.
,
Mar 27 2018
This may be an interaction between the test and the Fuchsia infrastructure. Strangely, this test only fails when run under TestLauncher, even when configured to run batch jobs sequentially.
,
Mar 27 2018
Running various test combinations again: - Running the test on its own, there is no crash. - Running the VP8A_Odd_WebM and VP9A_Odd_WebM tests (i.e. not VP9_Odd_WebM), the second test crashes, suggesting there is state from the prior test causing issues. - Running just the two VP9*_Odd_WebM tests, there is no crash. - Running the VP8A_Odd_WebM test on its own, I see a nonsense test execution time reported: [1/1] PipelineIntegrationTest.BasicPlayback_VP8A_Odd_WebM (-9223372036854775 ms) So I think the test at fault is PipelineIntegrationTest.BasicPlayback_VP8A_Odd_WebM and that it may be e.g. trashing the stack, which would explain why the backtrace from the crash is unhelpful.
,
Mar 27 2018
,
Mar 27 2018
The only thing (I think) my CL changes in how VP8A might work is with the Reset() changes, but I don't see any issues there. I.e. it now calls Reset(DoNothing()) instead of Reset(bound_reset_cb) which will post task that do nothing, but then we still call bound_reset_cb after that occurs, so execution order should be the same. CancellationHelper doesn't have a defined destructor, but I don't think that should matter.
,
Mar 27 2018
I'm suspicious that the VP8A test is returning NaN as its run-time. I tried reverting your CL and re-running the test and still awaiting NaN but haven't verified whether crash still repros.
,
Mar 27 2018
dalecurtis: Good news! Reverting your CL, this is still broken.
,
Mar 27 2018
Looks like the two VP8A_*_WebM tests have been returning run-time of NaN for a long time, but prior to this build, the VP8A_Odd_WebM test was the last one to run in the batch sub-process.
,
Mar 27 2018
Phew :)
,
Mar 27 2018
Could be libvpx alpha not working properly. You might try replacing the vp8a file with a normal vp8 clip and see what happens.
,
Mar 27 2018
Re the -NaN: Something very strange going on; I added print-out to the XmlUnitTestResultPrinter::OnTestEnd() implementation, printing the elapsed-time cast to double, the elapsed-time as raw TimeInMillis, the elapsed-time cast to double and divided to report seconds, and finally the millis-per-second constant and then ran BasicPlayback_VP*_WebM tests with --test-launcher-jobs=1 (so that we're using TestLauncher XML results files, but not swallowing stdio): Two tests then crashed - both VP8_Odd_WebM and VP8A_Odd_WebM: [00001.934] 04167.04206> [ RUN ] PipelineIntegrationTest.BasicPlayback_VP8A_WebM [00002.046] 04167.04206> [4167:1328009683:0327/212030.904426:2046069:ERROR:gtest_xml_unittest_result_printer.cc(83)] AS DOUBLE MILLIS:-nan [00002.046] 04167.04206> [4167:1328009683:0327/212030.904529:2046172:ERROR:gtest_xml_unittest_result_printer.cc(85)] RAW MILLIS: 111 [00002.047] 04167.04206> PageFault: 425850 free pages [00002.048] 04167.04206> PageFault: MemoryUsed: proc 1104 276M 'fshost' [00002.049] 04167.04206> PageFault: MemoryUsed: proc 1783 8M './media_unittests__exec' [00002.049] 04167.04206> PageFault: MemoryUsed: proc 4167 11M 'media_unittests__exec' [00002.049] 01200.01228> <== fatal exception: process media_unittests__exec[4167] thread initial-thread[4206] ... [00002.415] 04855.04903> [ RUN ] PipelineIntegrationTest.BasicPlayback_VP8A_Odd_WebM [00002.531] 04855.04903> [4855:51282981:0327/212031.389589:2531235:ERROR:gtest_xml_unittest_result_printer.cc(83)] AS DOUBLE MILLIS:-nan [00002.531] 04855.04903> [4855:51282981:0327/212031.389744:2531386:ERROR:gtest_xml_unittest_result_printer.cc(85)] RAW MILLIS: 115 [00002.532] 04855.04903> PageFault: 425904 free pages [00002.533] 04855.04903> PageFault: MemoryUsed: proc 1104 276M 'fshost' [00002.533] 04855.04903> PageFault: MemoryUsed: proc 1783 8M './media_unittests__exec' [00002.534] 04855.04903> PageFault: MemoryUsed: proc 4855 10M 'media_unittests__exec' [00002.534] 01200.01228> <== fatal exception: process media_unittests__exec[4855] thread initial-thread[4903] Somehow the first time we static_cast<double>(TimeInMillis) we get -NaN, and the second time we crash... O_o The TestInfo is being passed-in as a const reference, and we're dereferencing the TestResults to get at the TimeInMillis field, so perhaps they are going out-of-scope or being trashed on the stack..?
,
Mar 27 2018
FPU stuff not working might mean something caused an exception (0 division, etc).
,
Mar 28 2018
It appears that something in these tests is effectively breaking the Fuchsia vsprintf() implementation. Casting from |long long| to |double| creates a result that has the expected values for isnan(), isfinite() & isnormal(), and the bytes of the value look valid. There then seem to be two distinct issues, based on experiment: 1. Writing a double to cout or fprintf() causes it to be cast to a |long double|, and the value of that is -NaN for some reason. (Casting that double to long double before printing leads to an isnan() value). 2. Printing the isnan() value crashes the underlying vsprintf() implementation - cause TBD.
,
Mar 28 2018
+jzern: Is there anything you can think of in libvpx VP8 decode that might mess up subsequent FPU operations?
,
Mar 28 2018
Yes, there was an over ambitious removal of emms while we still have mmx/sse, I didn't realize that had gotten synced. I'll fix this with an update to libvpx.
,
Mar 28 2018
jzern: Could you provide a link to the CL where the overzealous removal(s) landed, for context? Regardless of libvpx leaving things in a funny state, we should not be crashing in vsprintf(), so we'd like to understand what needs fixing.
,
Mar 28 2018
Put emms usage back (external user found it broke clang/mingw win64 builds): https://chromium-review.googlesource.com/c/webm/libvpx/+/978007 I also tried to remove fldcw/fstcw: https://chromium-review.googlesource.com/c/webm/libvpx/+/978402 but that didn't make it to chromium. So probably the emms thing.
,
Mar 28 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b75c75d3c87c585ca36017b88d39b193a36d9cfc commit b75c75d3c87c585ca36017b88d39b193a36d9cfc Author: James Zern <jzern@chromium.org> Date: Wed Mar 28 23:04:32 2018 Roll src/third_party/libvpx/source/libvpx/ 1f82e0612..f4b1eca53 (15 commits) https://chromium.googlesource.com/webm/libvpx.git/+log/1f82e0612293..f4b1eca53e40 $ git log 1f82e0612..f4b1eca53 --date=short --no-merges --format='%ad %ae %s' 2018-03-23 martin Restore emms usage on x86_64 after 726b021a12c1b 2018-03-23 johannkoenig Revert "remove fldcw/fstcw from Win64 builds" 2018-03-05 johannkoenig remove fldcw/fstcw from Win64 builds 2018-03-21 martin configure: Add an arm64-win64-gcc target 2018-03-21 martin test: Check for ARCH_X86_64 in addition to _WIN64 2018-03-21 martin configure: Add an armv7-win32-gcc target 2018-03-21 martin ads2gas: Add a -noelf option 2018-03-23 martin thumb: Remove a brittle, ugly and unused arm->thumb conversion 2018-03-20 jianj vp9 svc frame drop: enable adaptive rd for row mt. 2018-03-22 linfengz Fix implicit-fallthrough warnings 2018-03-22 linfengz Fix dangling-else warnings 2018-03-21 linfengz Fix a strict-overflow warning 2018-03-21 linfengz Rename several static NEON iht functions 2018-03-21 marpan vp9-svc: Fix to sample encoder 2018-03-18 jzern vp9_highbd_iht8x8_add_neon: rm unused functions Created with: roll-dep src/third_party/libvpx/source/libvpx R=johannkoenig@chromium.org Bug: 826345 Change-Id: I2bd69b19e5d6105322e48eb49672376a1aa63742 Reviewed-on: https://chromium-review.googlesource.com/984903 Reviewed-by: Johann Koenig <johannkoenig@google.com> Commit-Queue: James Zern <jzern@google.com> Cr-Commit-Position: refs/heads/master@{#546646} [modify] https://crrev.com/b75c75d3c87c585ca36017b88d39b193a36d9cfc/DEPS [modify] https://crrev.com/b75c75d3c87c585ca36017b88d39b193a36d9cfc/third_party/libvpx/README.chromium [modify] https://crrev.com/b75c75d3c87c585ca36017b88d39b193a36d9cfc/third_party/libvpx/libvpx_srcs.gni [modify] https://crrev.com/b75c75d3c87c585ca36017b88d39b193a36d9cfc/third_party/libvpx/source/config/vpx_version.h
,
Mar 28 2018
Verified that these tests no longer crash after the libvpx roll. Fuchsia platform vsprintf() crashiness is tracked separately. |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by dalecur...@chromium.org
, Mar 27 2018