New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 826345 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 1
Type: Bug



Sign in to add a comment

PipelineIntegrationTest.BasicPlayback_VP9_Odd_WebM fails on Fuchsia FYI

Project Member Reported by w...@chromium.org, Mar 27 2018

Issue description

Starting at https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia%20%28dbg%29/17646 this fails on Fuchsia/x64 Debug, and we see a similar failure on Fuchsia/x64 Release.

This coincided with https://chromium-review.googlesource.com/945029 landing.

Unfortunately the failures don't seem to be generating backtraces, so its not obvious what is going wrong.

 
Hmm, we can disable for now if needed. Otherwise, I'll try spinning up my fuschia vm in a few hours to see if I can repro.

Comment 2 by w...@chromium.org, Mar 27 2018

This may be an interaction between the test and the Fuchsia infrastructure.

Strangely, this test only fails when run under TestLauncher, even when
configured to run batch jobs sequentially.

Comment 3 by w...@chromium.org, Mar 27 2018

Running various test combinations again:
- Running the test on its own, there is no crash.
- Running the VP8A_Odd_WebM and VP9A_Odd_WebM tests (i.e. not VP9_Odd_WebM), the second test crashes, suggesting there is state from the prior test causing issues.
- Running just the two VP9*_Odd_WebM tests, there is no crash.
- Running the VP8A_Odd_WebM test on its own, I see a nonsense test execution time reported:

[1/1] PipelineIntegrationTest.BasicPlayback_VP8A_Odd_WebM (-9223372036854775 ms)

So I think the test at fault is PipelineIntegrationTest.BasicPlayback_VP8A_Odd_WebM and that it may be e.g. trashing the stack, which would explain why the backtrace from the crash is unhelpful.

Comment 4 by w...@chromium.org, Mar 27 2018

Cc: sergeyu@chromium.org
The only thing (I think) my CL changes in how VP8A might work is with the Reset() changes, but I don't see any issues there. I.e. it now calls Reset(DoNothing()) instead of Reset(bound_reset_cb) which will post task that do nothing, but then we still call bound_reset_cb after that occurs, so execution order should be the same.

CancellationHelper doesn't have a defined destructor, but I don't think that should matter.

Comment 6 by w...@chromium.org, Mar 27 2018

I'm suspicious that the VP8A test is returning NaN as its run-time. I tried
reverting your CL and re-running the test and still awaiting NaN but
haven't verified whether crash still repros.

Comment 7 by w...@chromium.org, Mar 27 2018

Owner: w...@chromium.org
Status: Started (was: Assigned)
dalecurtis: Good news!  Reverting your CL, this is still broken.

Comment 8 by w...@chromium.org, Mar 27 2018

Looks like the two VP8A_*_WebM tests have been returning run-time of NaN for a long time, but prior to this build, the VP8A_Odd_WebM test was the last one to run in the batch sub-process.
Phew :)
Cc: vigneshv@chromium.org
Could be libvpx alpha not working properly. You might try replacing the vp8a file with a normal vp8 clip and see what happens.

Comment 11 by w...@chromium.org, Mar 27 2018

Re the -NaN: Something very strange going on; I added print-out to the XmlUnitTestResultPrinter::OnTestEnd() implementation, printing the elapsed-time cast to double, the elapsed-time as raw TimeInMillis, the elapsed-time cast to double and divided to report seconds, and finally the millis-per-second constant and then ran BasicPlayback_VP*_WebM tests with --test-launcher-jobs=1 (so that we're using TestLauncher XML results files, but not swallowing stdio):

Two tests then crashed - both VP8_Odd_WebM and VP8A_Odd_WebM:

[00001.934] 04167.04206> [ RUN      ] PipelineIntegrationTest.BasicPlayback_VP8A_WebM
[00002.046] 04167.04206> [4167:1328009683:0327/212030.904426:2046069:ERROR:gtest_xml_unittest_result_printer.cc(83)] AS DOUBLE MILLIS:-nan
[00002.046] 04167.04206> [4167:1328009683:0327/212030.904529:2046172:ERROR:gtest_xml_unittest_result_printer.cc(85)] RAW MILLIS: 111
[00002.047] 04167.04206> PageFault: 425850 free pages
[00002.048] 04167.04206> PageFault: MemoryUsed: proc  1104  276M 'fshost'
[00002.049] 04167.04206> PageFault: MemoryUsed: proc  1783    8M './media_unittests__exec'
[00002.049] 04167.04206> PageFault: MemoryUsed: proc  4167   11M 'media_unittests__exec'
[00002.049] 01200.01228> <== fatal exception: process media_unittests__exec[4167] thread initial-thread[4206]

...

[00002.415] 04855.04903> [ RUN      ] PipelineIntegrationTest.BasicPlayback_VP8A_Odd_WebM
[00002.531] 04855.04903> [4855:51282981:0327/212031.389589:2531235:ERROR:gtest_xml_unittest_result_printer.cc(83)] AS DOUBLE MILLIS:-nan
[00002.531] 04855.04903> [4855:51282981:0327/212031.389744:2531386:ERROR:gtest_xml_unittest_result_printer.cc(85)] RAW MILLIS: 115
[00002.532] 04855.04903> PageFault: 425904 free pages
[00002.533] 04855.04903> PageFault: MemoryUsed: proc  1104  276M 'fshost'
[00002.533] 04855.04903> PageFault: MemoryUsed: proc  1783    8M './media_unittests__exec'
[00002.534] 04855.04903> PageFault: MemoryUsed: proc  4855   10M 'media_unittests__exec'
[00002.534] 01200.01228> <== fatal exception: process media_unittests__exec[4855] thread initial-thread[4903]

Somehow the first time we static_cast<double>(TimeInMillis) we get -NaN, and the second time we crash... O_o

The TestInfo is being passed-in as a const reference, and we're dereferencing the TestResults to get at the TimeInMillis field, so perhaps they are going out-of-scope or being trashed on the stack..?
FPU stuff not working might mean something caused an exception (0 division, etc).

Comment 13 by w...@chromium.org, Mar 28 2018

It appears that something in these tests is effectively breaking the Fuchsia vsprintf() implementation.

Casting from |long long| to |double| creates a result that has the expected values for isnan(), isfinite() & isnormal(), and the bytes of the value look valid.

There then seem to be two distinct issues, based on experiment:

1. Writing a double to cout or fprintf() causes it to be cast to a |long double|, and the value of that is -NaN for some reason.  (Casting that double to long double before printing leads to an isnan() value).

2. Printing the isnan() value crashes the underlying vsprintf() implementation - cause TBD.

Comment 14 by w...@chromium.org, Mar 28 2018

Cc: jzern@chromium.org
+jzern: Is there anything you can think of in libvpx VP8 decode that might mess up subsequent FPU operations?

Comment 15 by jzern@chromium.org, Mar 28 2018

Yes, there was an over ambitious removal of emms while we still have mmx/sse, I didn't realize that had gotten synced. I'll fix this with an update to libvpx.

Comment 16 by w...@chromium.org, Mar 28 2018

jzern: Could you provide a link to the CL where the overzealous removal(s)
landed, for context? Regardless of libvpx leaving things in a funny state,
we should not be crashing in vsprintf(), so we'd like to understand what
needs fixing.
Cc: johannko...@google.com
Put emms usage back (external user found it broke clang/mingw win64 builds): https://chromium-review.googlesource.com/c/webm/libvpx/+/978007

I also tried to remove fldcw/fstcw:
https://chromium-review.googlesource.com/c/webm/libvpx/+/978402

but that didn't make it to chromium. So probably the emms thing.
Project Member

Comment 18 by bugdroid1@chromium.org, Mar 28 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b75c75d3c87c585ca36017b88d39b193a36d9cfc

commit b75c75d3c87c585ca36017b88d39b193a36d9cfc
Author: James Zern <jzern@chromium.org>
Date: Wed Mar 28 23:04:32 2018

Roll src/third_party/libvpx/source/libvpx/ 1f82e0612..f4b1eca53 (15 commits)

https://chromium.googlesource.com/webm/libvpx.git/+log/1f82e0612293..f4b1eca53e40

$ git log 1f82e0612..f4b1eca53 --date=short --no-merges --format='%ad %ae %s'
2018-03-23 martin Restore emms usage on x86_64 after 726b021a12c1b
2018-03-23 johannkoenig Revert "remove fldcw/fstcw from Win64 builds"
2018-03-05 johannkoenig remove fldcw/fstcw from Win64 builds
2018-03-21 martin configure: Add an arm64-win64-gcc target
2018-03-21 martin test: Check for ARCH_X86_64 in addition to _WIN64
2018-03-21 martin configure: Add an armv7-win32-gcc target
2018-03-21 martin ads2gas: Add a -noelf option
2018-03-23 martin thumb: Remove a brittle, ugly and unused arm->thumb conversion
2018-03-20 jianj vp9 svc frame drop: enable adaptive rd for row mt.
2018-03-22 linfengz Fix implicit-fallthrough warnings
2018-03-22 linfengz Fix dangling-else warnings
2018-03-21 linfengz Fix a strict-overflow warning
2018-03-21 linfengz Rename several static NEON iht functions
2018-03-21 marpan vp9-svc: Fix to sample encoder
2018-03-18 jzern vp9_highbd_iht8x8_add_neon: rm unused functions

Created with:
  roll-dep src/third_party/libvpx/source/libvpx
R=johannkoenig@chromium.org

Bug:  826345 
Change-Id: I2bd69b19e5d6105322e48eb49672376a1aa63742
Reviewed-on: https://chromium-review.googlesource.com/984903
Reviewed-by: Johann Koenig <johannkoenig@google.com>
Commit-Queue: James Zern <jzern@google.com>
Cr-Commit-Position: refs/heads/master@{#546646}
[modify] https://crrev.com/b75c75d3c87c585ca36017b88d39b193a36d9cfc/DEPS
[modify] https://crrev.com/b75c75d3c87c585ca36017b88d39b193a36d9cfc/third_party/libvpx/README.chromium
[modify] https://crrev.com/b75c75d3c87c585ca36017b88d39b193a36d9cfc/third_party/libvpx/libvpx_srcs.gni
[modify] https://crrev.com/b75c75d3c87c585ca36017b88d39b193a36d9cfc/third_party/libvpx/source/config/vpx_version.h

Comment 19 by w...@chromium.org, Mar 28 2018

Status: Fixed (was: Started)
Verified that these tests no longer crash after the libvpx roll.

Fuchsia platform vsprintf() crashiness is tracked separately.

Sign in to add a comment