Issue metadata
Sign in to add a comment
|
Pixel_Video_VP9 fails on Android FYI Release (Nexus 9), Nexus 5, and on Windows Debug with AMD GPU |
||||||||||||||||||||
Issue descriptionStarted in https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5312 Looks like the generated image is from a wrong frame. http://chromium-browser-gpu-tests.commondatastorage.googleapis.com/view_test_results.html?7baa2dc10e393f3ca4d052555f00d12680ca5366_Android_FYI_Release_Nexus_9__telemetry Suspecting https://chromium-review.googlesource.com/1091649 liberato@, please suppress this test if there is no quick fix.
,
Jun 28 2018
i ended up reverting the CL: https://chromium-review.googlesource.com/c/chromium/src/+/1119306 and we'll see if it goes back to normal.
,
Jun 28 2018
actually, it's not a different frame. the width of the displayed frame is different -- it takes the entire output width in the generated column, while it doesn't in the reference. that makes it much less likely to involve my change.
,
Jun 28 2018
i don't think it's my change. https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5315 is the next passing run, but didn't include the revert. i'll try re-landing tomorrow to see what happens.
,
Jun 29 2018
I don't see anything in https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5315 blamelist which could be related. There is a possibility that the failure was flaky. All the builds are green after the revert landed in https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5316. You are welcome to try and reland it, but please watch this bot afterwards.
,
Jun 29 2018
yeah, i'm a little unclear on it myself. on one hand, the timing seems strongly connected. on the other, there are two things that don't fit. one is that the bot went green in 5316, before the revert landed. the other is that, from the failure diffs, it looks like the video layer has changed size in the renderer compositor, rather than due to some decoding artifact. there are some weird things that can happen if the sizes don't match, but i think they'd all clip to the layer size at the end of the day. i started a reland. that will tell us for sure.
,
Jun 29 2018
i just had the same test fail on a win7 trybot, for an entirely unrelated CL (haven't relanded this one yet): https://chromium-review.googlesource.com/c/chromium/src/+/1108195 i think this test has something up with it. i'll see if i can get a local repro.
,
Jun 29 2018
On Win7 it was a different failure mode - a timeout.
,
Jun 29 2018
+lethalantidote the video frame in the failure looks like it's exactly what one would expect if one used the coded size of the video frame rather than the natural size (or maybe visible rect size -- i always get those confused). the former is 384x240, while the latter is 320x240. when i scale the reference image horizontally by 384/320 (1.2), it's almost an exact match for the test output. it seems likely that, somewhere or other, the coded size is being used where the one of the other sizes should be. there are several places in VideoFrameSubmitter / VideoFrameResourceUpdater that do this, which could actually set the layer bounds in SurfaceLayerBridge. i don't know if this test is has the UseSurfaceLayerForVideo flag set or not. i also wasn't able to repro locally with it enabled. i didn't see any other references to coded_size that looked wrong, though.
,
Jun 29 2018
i re-landed the change just now. i'll keep an eye on the bots.
,
Jun 29 2018
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5366, which has the reland, failed.
,
Jun 29 2018
Now I'm thinking that WebglConformance_conformance_textures_image_bitmap_from_video_tex_2d_rgb_rgb_unsigned_short_5_6_5 and WebglConformance_conformance_textures_image_bitmap_from_video_tex_2d_rgba_rgba_unsigned_short_5_5_5_1 failures are also related. Original landing attempt also had flakes in WebglConformance_conformance_textures_image_bitmap_from_video_tex_2d_rgba_rgba_unsigned_short_5_5_5_1 and WebglConformance_conformance_textures_video_tex_2d_rgba_rgba_unsigned_short_4_4_4_4.
,
Jun 29 2018
i've started another revert.
,
Jun 29 2018
i've found a nexus 9. if i can get it to start, then maybe i can repro locally.
,
Jun 29 2018
Note that it's also possible to debug on a swarmed bot. To use the functionality, visit a Swarming task you want to investigate, click the Debug button, specify the duration of the lease and follow the instructions that will show up.
,
Jul 2
i managed to get my n9 working. haven't gotten a repro yet. however, one thing i noticed is that vp9 on a nexus 9 doesn't support hw decoding for vp9. also, for any device, videos smaller than 480 width, such as the video in this test, should also not use the hw path, even if it exists. so, that means the entire cl should be a no-op. literally no code should be executed that it touches. i've verified that content shell does skip both hw decoders for this video. yet it clearly does affect something.
,
Jul 3
run_gpu_integration_tests.py doesn't reproduce the failure locally. it does fail, but with some very small pixel differences that don't seem to depend on whether my patch is applied or not. it also turns out that the hw vp9 decoder on my N5X seems to be the odd one. it reports a size of 384, but the libvpx software decoders all say 320. i haven't seen a 384 come out of MediaCodec on the N9, which uses libvpx internally since there's no hw support for it. it's too bad, since forcing the layer to 384x240 makes the output image look the same as the test output from the failing runs. about all i can think to do is to split up the original CL into pieces and land them, and see which one causes the failure. i don't expect this to tell me much, since none of the code runs on a N9 unless i modify chrome to make it.
,
Jul 3
i tried running all the tests, rather than just pixel_video_vp9. they all pass locally, except for pixel_video_vp9 with or without my patch. not sure if that's related or not. seems odd that it passes on the bots without my patch. there has been a roll of vp9 since then, so maybe it's doing something slightly differently. i'll try at ToT. not that it has anything to do with the actual failure, but i'd feel better if i could at least get something to work properly with this test.
,
Jul 3
when i run with the repro instructions on the bot, using the right isolate, the VP9 test passes.
,
Jul 5
Sorry for the difficulty with this test on this bot. If it looks like anything in this test might be flaky, for example when it reports "SUCCESS" to the test harness: https://cs.chromium.org/chromium/src/content/test/data/gpu/pixel_video_vp9.html?q=pixel_video_vp&sq=package:chromium&g=0&l=1 then please let us know. The intent was that the first frame was available, decoded and displaying when the screenshot is taken. Otherwise if you feel that you're reaching diminishing returns then feel free to suppress this failure and leave this WontFix.
,
Jul 5
,
Jul 6
The same failure mode is now being seen on Windows with an AMD GPU in Debug mode in Issue 860809 . On this bot the first failing build is: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20FYI%20Debug%20%28AMD%29/753 and since it's flaky I think that this build is definitely before the regression: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20FYI%20Debug%20%28AMD%29/747 leaving this regression range: http://crrev.com/b34409ecd41d83dbaaa717994236629ae04a8625..284de34d39b7bc4263c227f0df442d91b6b05618 This includes: Creates the SurfaceLayer early for Video. https://chromium.googlesource.com/chromium/src/+/96c18b698ee5f6bf462fbffdc684bba0bd70fcbe Somehow I think this CL is causing the display of the first video frame to sometimes be incorrectly sized. CJ, can you own this bug? Per these instructions: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/gpu/gpu_testing.md#Running-the-GPU-Tests-Locally you can potentially repro locally with a Debug Windows build by running e.g.: run_gpu_integration_test.py pixel --browser=debug --test-filter=Pixel_Video_VP9 over and over again.
,
Jul 6
,
Jul 6
Issue 860548 has been merged into this issue.
,
Jul 6
,
Jul 6
there was an errant call to set the cc layer bounds in SurfaceLayerBridge, which might cause this. the fix hasn't landed yet (https://chromium-review.googlesource.com/c/chromium/src/+/1126450). the caveat is that i wasn't able to directly link that call with the failures on the bots, since blink would always reset the layer bounds to the correct value before anything was drawn. however, if there is a race, then it certainly could have the effects we've been seeing.
,
Jul 6
also, i just started a re-land of https://chromium-review.googlesource.com/1091649 . kbr: thanks re c#20.
,
Jul 6
Thanks for the info. I updated the description of that CL to point to this bug. Hoping that it will fix the flakiness. Let's please try unmarking this test as flaky once it lands.
,
Jul 6
https://chromium-review.googlesource.com/c/chromium/src/+/1128265 which marks the test as flaky on win7 debug amd is in CQ. liberato@ do you want me to hold off on that?
,
Jul 6
yes please!
,
Jul 9
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b642a3cd111a2b2ca1b312e77183f71890d3a7a3 commit b642a3cd111a2b2ca1b312e77183f71890d3a7a3 Author: Mounir Lamouri <mlamouri@chromium.org> Date: Mon Jul 09 15:45:31 2018 Picture-in-Picture: various cleanups in WMPI and SurfaceLayerBridge. Cleanups are: - WMPI: drops the usage of the callback for delayed Picture-in-Picture; - WMPI: drops pip_surface_id_ and request it from the bridge_ instead; - SurfaceLayerBridge: no longer set primary surface id when surface is created; - SurfaceLayerBridge: no longer sets size when surface is activated. Bug: 858826 Cq-Include-Trybots: luci.chromium.try:linux_layout_tests_slimming_paint_v2;master.tryserver.blink:linux_trusty_blink_rel Change-Id: I7ed0490207aa3966e5dc8d04e796558e8965232f Reviewed-on: https://chromium-review.googlesource.com/1126450 Reviewed-by: Daniel Cheng <dcheng@chromium.org> Reviewed-by: Jochen Eisinger <jochen@chromium.org> Reviewed-by: Frank Liberato <liberato@chromium.org> Reviewed-by: Fady Samuel <fsamuel@chromium.org> Commit-Queue: Mounir Lamouri <mlamouri@chromium.org> Cr-Commit-Position: refs/heads/master@{#573319} [modify] https://crrev.com/b642a3cd111a2b2ca1b312e77183f71890d3a7a3/media/blink/webmediaplayer_impl.cc [modify] https://crrev.com/b642a3cd111a2b2ca1b312e77183f71890d3a7a3/media/blink/webmediaplayer_impl.h [modify] https://crrev.com/b642a3cd111a2b2ca1b312e77183f71890d3a7a3/media/blink/webmediaplayer_impl_unittest.cc [modify] https://crrev.com/b642a3cd111a2b2ca1b312e77183f71890d3a7a3/third_party/blink/public/platform/web_surface_layer_bridge.h [modify] https://crrev.com/b642a3cd111a2b2ca1b312e77183f71890d3a7a3/third_party/blink/renderer/platform/graphics/surface_layer_bridge.cc [modify] https://crrev.com/b642a3cd111a2b2ca1b312e77183f71890d3a7a3/third_party/blink/renderer/platform/graphics/surface_layer_bridge.h
,
Jul 9
gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance_textures_image_bitmap_from_video_tex_2d_rgb_rgb_unsigned_short_5_6_5 is failing now. it started failing in https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5630 which doesn't include my reland. that one showed up two builds earlier in https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5628 which passed. there have been scattered successes since then, so it's possible that we got two in a row after it landed. i'll retry locally.
,
Jul 9
It seems that the failures are flaky. https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5702 WebglConformance_conformance_textures_video_tex_2d_rgba_rgba_unsigned_short_4_4_4_4 https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5701 WebglConformance_conformance_textures_video_tex_2d_rgba_rgba_unsigned_short_5_5_5_1 https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5700 WebglConformance_conformance_textures_video_tex_2d_rgba_rgba_unsigned_short_4_4_4_4 https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Android%20FYI%20Release%20%28Nexus%209%29/5699 WebglConformance_conformance_textures_image_bitmap_from_video_tex_2d_rgb_rgb_unsigned_short_5_6_5 WebglConformance_conformance_textures_image_bitmap_from_video_tex_2d_rgba_rgba_unsigned_short_5_5_5_1 WebglConformance_conformance_textures_video_tex_2d_rgba_rgba_unsigned_short_5_5_5_1
,
Jul 9
,
Jul 9
,
Jul 9
turns out that it depends on the test that ran before. if i run video...4_4_4_4 by itself, then it succeeds. if i run textures_video_tex_2d_rgba_rgba_unsigned_byte then _4_4_4_4, then the latter fails with an invalid enum. further, this goes away if i revert my patch. finally, it's something i can actually debug. :)
,
Jul 9
it's a sync token thing. we're tearing down a video texture before the compositor is done with it, in some cases. i'm not sure that this ever worked reliably. however, i can see how my patch would make it worse. i need to think about it more.
,
Jul 10
when the test ends, the player is torn down => immediately mark all cc resources as lost => drop all video frames in renderer => notify gpu process about each video frame => wait for sync token for that frame => make texture unrenderable by removing the gl image => gl errors. the trick is that the "immediately mark all cc resources as lost" part is wrong. they're still in use by the browser compositor. so, the sync token than the gpu waits on is from a point that's too early. the right solution, i think, is to avoid starting teardown of the renderer cc pieces until any outstanding resources from viz are returned. that's somewhat of a big change, though. simpler is to fix the "make textures unrenderable" part in the gpu. we still have to free up some of the video decoder resourcess that the texture might be holding, else decoding progress could stop in some other, unrelated cases. viz might draw the wrong (earlier) video frame in some cases, but it'll be a valid frame. still technically a race, of course.
,
Jul 10
You're correct that in-flight resources may be destroyed before they're used by the browser and cause GL errors, but that's expected and acceptable normally (causing at worst a black flash while a tab is being closed). I think the issue here is that those errors happen while the next test is running. We should wait for a quiescent state and then clean up any errors between tests.
,
Jul 11
Where's the INVALID_ENUM error showing up? At the compositor level, the video decoding stack, or up at the JavaScript level (appearing in the WebGL test)? It should be OK for the compositor to receive such errors. Probably not OK for the video stack to see them. Definitely not OK for JavaScript to see them. Of course, a future test should not be receiving errors from previous pages. It's difficult to see how to quiesce things between tests. The WebGL conformance tests in particular simply navigate the same renderer process from URL to URL, and it's important that they continue to do so rather than run each test in a new tab because this covers real-world usage of the browser.
,
Jul 11
How about navigating to blank page between tests?
,
Jul 11
We could potentially do that but it diverges from the way that most people use the browser. If these errors are showing up in WebGL tests then one could imagine they could show up and have a user-visible effect when clicking links between pages that contain videos.
,
Jul 11
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1 commit 00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1 Author: liberato@chromium.org <liberato@chromium.org> Date: Wed Jul 11 22:12:36 2018 Don't make MCVD textures unrenderable. Previously, we would make MCVD video frame textures unrenderable after the VideoFrame was destroyed and the sync token had cleared. This prevented the image from holding onto MediaCodec buffers unnecessarily, since the actual destruction of the texture might be delayed until the GL context was made current. However, it turns out that the sync token isn't always correct. If the VideoResourceUpdater is destroyed, then it immediately releases all VideoFrames with whatever sync token they have, without waiting for viz to return the resources (and a new sync token!). This shows up during tests quite a bit. Ideally, it would wait. However, that's nontrivial and likely has other side-effects. This CL frees the MediaCodec buffers when the sync token clears, but doesn't actually disconnect the CodecImage from the texture. So, any existing frame will remain renderable. If the image hadn't been rendered yet, then it will render the most recently rendered one, which is still a race. However, in practice, it works. Bug: 858826 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Change-Id: Ia09b00a68964e886a05acf19f3623b4c87c31115 Reviewed-on: https://chromium-review.googlesource.com/1132115 Reviewed-by: Dale Curtis <dalecurtis@chromium.org> Commit-Queue: Frank Liberato <liberato@chromium.org> Cr-Commit-Position: refs/heads/master@{#574365} [modify] https://crrev.com/00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1/media/gpu/android/codec_image.cc [modify] https://crrev.com/00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1/media/gpu/android/codec_image.h [modify] https://crrev.com/00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1/media/gpu/android/codec_image_group.cc [modify] https://crrev.com/00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1/media/gpu/android/codec_image_group_unittest.cc [modify] https://crrev.com/00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1/media/gpu/android/texture_pool.cc [modify] https://crrev.com/00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1/media/gpu/android/texture_pool.h [modify] https://crrev.com/00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1/media/gpu/android/texture_pool_unittest.cc [modify] https://crrev.com/00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1/media/gpu/android/video_frame_factory_impl.cc [modify] https://crrev.com/00c9a8cb21b0bd26037458b9ebe1d94a8b3204f1/media/gpu/android/video_frame_factory_impl.h
,
Jul 19
Is this fixed now?
,
Jul 19
this has turned into a bit of an umbrella bug. the pixel_test that opened it seems to be mostly stable now. i see only a few failures in the last 200 runs on n9, vs 100% originally. so, i think it is fixed, or at least not p1 anymore. lowering to p2 for now.
,
Jul 19
The test's currently marked flaky on Android: https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/pixel_expectations.py?type=cs&q=Pixel_Video_VP9&sq=package:chromium&g=0&l=118 Should we try removing the suppression? (Reassigning to liberato@)
,
Dec 5
#46 Trying to re-enable the tests failed: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/81074 I'm uploading a change to reenable those tests everywhere and mark as failing on Android as you recommended on https://chromium-review.googlesource.com/c/chromium/src/+/1214364 Also linking issue 716564 here that's tracking the vp9 test being flaky on Android with Nvidia.
,
Dec 11
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1a9264ddffb97e645ae936840846a741dfefee24 commit 1a9264ddffb97e645ae936840846a741dfefee24 Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Tue Dec 11 00:52:48 2018 Re-enable video pixel tests after rebaselining The tests were marked as failing and version numbers were incremented in https://crrev.com/c/1182222, but I forgot to re-enable the tests. Bug: 869677, 913138, 858826, 819635, 774809 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Change-Id: I7fdb28c3dbaa7aad4137b16d8f9bf8fe15fe9a08 Reviewed-on: https://chromium-review.googlesource.com/c/1214364 Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Reviewed-by: Kenneth Russell <kbr@chromium.org> Cr-Commit-Position: refs/heads/master@{#615353} [modify] https://crrev.com/1a9264ddffb97e645ae936840846a741dfefee24/content/test/gpu/gpu_tests/pixel_expectations.py
,
Dec 11
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b188cd82b0f877078b2a7f87169a4a16a1f9507d commit b188cd82b0f877078b2a7f87169a4a16a1f9507d Author: Jamie Madill <jmadill@chromium.org> Date: Tue Dec 11 16:38:53 2018 Revert "Re-enable video pixel tests after rebaselining" This reverts commit 1a9264ddffb97e645ae936840846a741dfefee24. Reason for revert: rebaseline didn't seem to work. Failing builds: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20FYI%20Release%20%28AMD%29/3655 https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20FYI%20Release%20%28NVIDIA%29/3470 https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20Release%20%28Intel%20HD%20630%29/2954 Failing tests: gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_DirectComposition_Video_VP9 gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_Video_VP9 gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_DirectComposition_Video_MP4 gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_Video_MP4 Original change's description: > Re-enable video pixel tests after rebaselining > > The tests were marked as failing and version numbers were incremented in > https://crrev.com/c/1182222, but I forgot to re-enable the tests. > > Bug: 869677, 913138, 858826, 819635, 774809 > Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel > Change-Id: I7fdb28c3dbaa7aad4137b16d8f9bf8fe15fe9a08 > Reviewed-on: https://chromium-review.googlesource.com/c/1214364 > Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> > Reviewed-by: Kenneth Russell <kbr@chromium.org> > Cr-Commit-Position: refs/heads/master@{#615353} TBR=kbr@chromium.org,sunnyps@chromium.org Change-Id: I553c10ca8022a90a0afa73ffafa44aa5bbf4e2d1 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 869677, 913138, 858826, 819635, 774809 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Reviewed-on: https://chromium-review.googlesource.com/c/1372234 Reviewed-by: Jamie Madill <jmadill@chromium.org> Commit-Queue: Jamie Madill <jmadill@chromium.org> Cr-Commit-Position: refs/heads/master@{#615553} [modify] https://crrev.com/b188cd82b0f877078b2a7f87169a4a16a1f9507d/content/test/gpu/gpu_tests/pixel_expectations.py |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by liber...@chromium.org
, Jun 28 2018