
Issue 693758 link

Starred by 5 users

Issue metadata

Status: Verified
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 687868




GPU hang on caroline

Project Member Reported by marc...@chromium.org, Feb 17 2017

Issue description

I got the attached GPU hang on caroline while doing hangouts + video (youtube) + a few tabs overnight.

The crashy command buffer is this:



00000000 :  7a000004 PIPE_CONTROL
00000004 :  00101420 
00000008 :  00000000
0000000c :  00000000
00000010 :  00000000
00000014 :  00000000
00000018 :  69043321 PIPELINE_SELECT
0000001c :  61010011 CMD_STATE_BASE_ADDRESS
00000020 :  00000001
00000024 :  00000000
00000028 :  00000000
0000002c :  0c04d001
00000030 :  00000000
00000034 :  0b626001
00000038 :  00000000
0000003c :  00000001
00000040 :  00000000
00000044 :  00eb9001
00000048 :  00000000
0000004c :  ffff0001
00000050 :  ffff0001
00000054 :  ffff0001
00000058 :  ffff0001
0000005c :  00000001
00000060 :  00000000
00000064 :  fffff000
00000068 :  70000007 CMD_MEDIA_VFE_STATE
0000006c :  00000000
00000070 :  00000000
00000074 :  003b3b00
00000078 :  00000000
0000007c :  000f0020
00000080 :  00000000
00000084 :  00000000
00000088 :  00000000
0000008c :  70010002 CMD_MEDIA_CURBE_LOAD
00000090 :  00000000
00000094 :  00000100
00000098 :  00000000
0000009c :  70040000 CMD_MEDIA_STATE_FLUSH
000000a0 :  00000000
000000a4 :  70020002 CMD_MEDIA_INTERFACE_DESCRIPTOR_LOAD
000000a8 :  00000000
000000ac :  00000020
000000b0 :  00000100
000000b4 :  18800101
000000b8 :  0ad4d000
000000bc :  00000000
000000c0 :  00000000
000000c4 :  05000000 MI_BATCH_BUFFER_END

This was emitted by gen9_pp_pipeline_setup; I will have to dig further, but this is certainly video-related.
 
error.txt
5.3 MB View Download
A few questions for you:

What was the build number?

Since this was Hangouts, can I assume you were using VP8 hardware for decode and libvpx for encode?

It would be great to get some logs from the session.

In the meantime, we can set up a burn-in run to reproduce this with a VA trace.

This is from ToT; the device was doing Hangouts and had a few other tabs open, enough to use ~500 MB of swap.

I looked at the contents of the batch buffer in detail, and it looks correct to me. I have not looked at the sub buffer generated by gen8_pp_object_walker (0x000000000ad4d000) since it isn't part of the error state.
Labels: -Pri-3 videoshortlist Pri-1
Blocking: 687868
Hello, I don't have access to 687868.  Also, is there a way I can add people to this bug?  I don't have the ability to add CC like I do with other bugs.

Thanks,

Sean
@Sean: access to this bug is not restricted (unlike 687868). To receive notifications anyone can go and "star" it (top left).
Great, thanks @marc.
The pipeline is used for video processing, not for decoding. Do you know which kind of video processing is used in your test? Could you try disabling VPP?
Chrome uses VPP for YUV-to-RGB conversion and scaling. You can look at the Chrome code for VA-API support.
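For context, here is a minimal sketch of how such a VPP color-convert/scale pass is typically issued through libva's video-processing API. This is illustrative only, not Chrome's actual code; the function name and parameters are assumptions.

#include <string.h>
#include <va/va.h>
#include <va/va_vpp.h>

/* Illustrative sketch only: one VPP pass that reads a decoded (e.g. NV12)
 * surface and writes a scaled / color-converted output surface. */
VAStatus run_vpp_pass(VADisplay dpy, VAContextID vpp_ctx,
                      VASurfaceID src, VASurfaceID dst,
                      const VARectangle *src_rect, const VARectangle *dst_rect)
{
    VAProcPipelineParameterBuffer pipeline;
    VABufferID pipeline_buf;
    VAStatus status;

    memset(&pipeline, 0, sizeof(pipeline));
    pipeline.surface        = src;       /* decoded source surface */
    pipeline.surface_region = src_rect;  /* region of the source to read */
    pipeline.output_region  = dst_rect;  /* region of dst to write (scaling) */

    status = vaCreateBuffer(dpy, vpp_ctx, VAProcPipelineParameterBufferType,
                            sizeof(pipeline), 1, &pipeline, &pipeline_buf);
    if (status != VA_STATUS_SUCCESS)
        return status;

    /* BeginPicture targets the output surface; RenderPicture submits the
     * pipeline parameters; EndPicture queues the job on the GPU. */
    status = vaBeginPicture(dpy, vpp_ctx, dst);
    if (status == VA_STATUS_SUCCESS)
        status = vaRenderPicture(dpy, vpp_ctx, &pipeline_buf, 1);
    if (status == VA_STATUS_SUCCESS)
        status = vaEndPicture(dpy, vpp_ctx);

    vaDestroyBuffer(dpy, pipeline_buf);
    return status;
}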
Chrome's support for VPP in Google's vaapi wrapper was added by William Xie with help from Yakui some time ago.  I'm attaching the patch for reference.
0001-Ozone-enable-video-scaling-when-playing-on-overlay.patch
14.5 KB Download
It's a shame we have not maintained the V4L/libyami port to ChromeOS.  It would have been nice to verify a known stack against the Chrome wrapper.
@Sean, it may need some effort, but the master branch should work. https://github.com/01org/libyami/tree/master
@haihao, you can try "yamidecode -m 3"; it will use VPP and EGL to draw the decoded output. https://github.com/01org/libyami-utils/blob/master/tests/decodeoutput.cpp#L699
I can't reproduce this issue with 'yamidecode -m 3'.
Tested on both KBL and SKL using VPP through libyami on core Linux, with no issue found.  

@pawel, have you generated a LIBVA_TRACE log of this GPU Hang on the Caroline (SKL) system?
We are able to reproduce the GPU hang on a Caroline system under Chrome.  We are gathering LIBVA_TRACE.  We'll have a look and provide feedback.
When we are able to duplicate the GPU hang, the last command executed before the hang is a 7a000004 CS_DONE. However, none of the failures we are seeing show commands like those in the original report:

CMD_MEDIA_STATE_FLUSH

CMD_MEDIA_INTERFACE_DESCRIPTOR_LOAD

CMD_MEDIA_VFE_STATE

When the hang does happen after hours of runs, we get different command lists depending on when we hit the actual trigger; it is not clear from our testing that the hang is consistently the same. We have therefore asked the Intel Chrome integration team to spend time on these hard-to-duplicate hangs, because we have not been able to reproduce the media commands seen in the original hang nor tie it to a media issue per se. We are also unable to reproduce the hang at all on core Linux media using similar use cases with the same driver.
Status: Available (was: Untriaged)
Owner: posciak@chromium.org
Status: Assigned (was: Available)
I have some comments about this bug.

We are able to duplicate the GPU hang using only a Google Hangouts video call with just 2 participants. We reproduced it on libva/vaapi-intel-driver 1.8.0; we bumped up to this version believing a bisect would be required. Since we can duplicate the problem consistently, we enabled LIBVA_TRACE.

Google Hangouts monitors the network traffic and accordingly drops resolution from 720p to 540p to 360p. The traces show exactly that behavior. Each VP8 decoder is accompanied by a VPP to do the NV12-to-RGB conversion.

The traces look like this:

va_context 0x02000001 --> vp8 dec 640x360
va_context 0x02000002 --> vpp 640x360
va_context 0x02000003 --> vp8 dec 1280x720
va_context 0x02000004 --> vpp 1280x720
va_context 0x02000003 --> vp8 dec 960x540
va_context 0x02000003 --> vp8 dec 1280x720

In the transition from 360p to 720p, the first VPP (360p) has processed 408 frames while the first VP8 decoder has processed 414 frames.

The 6 frames of difference are then processed in sequence by the second VP8 decoder (720p) and the second VPP (720p).

The driver cannot make any assumption about these two streams being rendered on the same screen.

There are two options: when switching to another resolution, the application either has to destroy the context and its associated surfaces, or it has to flush all pending buffers on both the decoder and the VPP before attempting to color-convert (CSC) the new-resolution frames.

I believe that if either of the above is done, the GPU hang will disappear.
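For illustration, a minimal sketch of the first option using plain libva calls. This is not Chrome's actual code; the function name and parameters are assumptions.

#include <va/va.h>

/* Illustrative sketch of option (1): on a resolution change, wait for any
 * work still pending on the old surfaces, destroy the old VA context and
 * surfaces, then recreate them at the new resolution, so the VPP can never
 * be handed a surface the decoder side has already torn down. */
VAStatus recreate_for_new_resolution(VADisplay dpy, VAConfigID config,
                                     VASurfaceID *surfaces,
                                     unsigned int num_surfaces,
                                     VAContextID *context,
                                     unsigned int new_width,
                                     unsigned int new_height)
{
    VAStatus status;
    unsigned int i;

    /* Flush: wait until all outstanding decode/VPP jobs on the old
     * surfaces have completed. */
    for (i = 0; i < num_surfaces; i++)
        vaSyncSurface(dpy, surfaces[i]);

    /* Tear down the old context and its surfaces. */
    vaDestroyContext(dpy, *context);
    vaDestroySurfaces(dpy, surfaces, num_surfaces);

    /* Recreate surfaces and context at the new resolution. */
    status = vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, new_width, new_height,
                              surfaces, num_surfaces, NULL, 0);
    if (status != VA_STATUS_SUCCESS)
        return status;

    return vaCreateContext(dpy, config, new_width, new_height, VA_PROGRESSIVE,
                           surfaces, num_surfaces, context);
}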

Attached are the logs for reference.




201704272248_gpuhang_hangoutsonly.tar.gz
4.6 MB Download
The options mentioned above land on the Google Hangouts stack. @posciak or another person at Google, please comment on this information.
Cc: tfiga@chromium.org
Pawel is ooo. Tomasz, since you work on video, can you weigh in?
Some other observations from different LIBVA traces on different runs with the same use case: 1 Hangouts session, 1 YouTube clip, and 1 WebGL benchmark, all rendering at the same time, with the tabs sized so they all fit on the screen.

As mentioned before, normally the first VP8 decoder and the first VPP will be used once; then, after the resolution change coming from the application, these two elements (va_context 0x02000001 and 0x02000002) are neither used again nor destroyed. In the Chromium VAVDA implementation there is no port reconfiguration coming from the upper class; it will only happen if the accelerator returns a request to flush the surfaces and create them again. In reality this should be fine as long as the VP8 decoder and VPP are properly shut down. Ideally the upper class that creates the VDA would send a port reconfiguration instead of creating a whole new decoder + VPP; unfortunately, the GPU VDA API would need a major rework to accommodate that.

To reiterate: no matter how this is accomplished, it would be better to have only one VP8 decoder and one VPP at a time.

In the logs it can be seen that the second VPP, i.e. va_context 0x02000004, will process the remainder of the VP8 decoder surfaces, no matter how many VP8 decoders were created after the first one and before the GPU hang happened. Reusing the VPP is possible because creating the next VP8 decoder destroys the unowned surfaces that were created and later used by the VPP.








FWIW, while trying to capture traces from the VaapiVideoDecoderThread, it hangs as soon as a chrome://tracing recording of gpu_decoder is started and stopped.

This is with Hangouts only, with no hardware-accelerated JPEG decoding. The decoded GPU hang error log is also included.




2017_05_03_18_22_PDT.tar.bz
7.8 MB Download
Another bit of information on 201704272248_gpuhang_hangoutsonly.tar.gz:

One thing to notice in the log is the following:

[55159.232003][ctx 0x02000003]	frame_count  = #716
[55163.833605][ctx 0x02000003]	frame_count  = #717
[55169.818435][ctx 0x02000003]	frame_count  = #718

Frame #716 renders to one VASurface and uses these references:

[55159.232002][ctx 0x02000003]	render_targets = 0x04000018
[55159.232003][ctx 0x02000003]	frame_count  = #716
[55159.232496][ctx 0x02000003]	last_ref_frame = 4000016
[55159.232497][ctx 0x02000003]	golden_ref_frame = 400001e
[55159.232498][ctx 0x02000003]	alt_ref_frame = 4000017

Then the very next frame, #717, comes to VA-API 4 seconds later, although it still uses the same references:

[55163.833602][ctx 0x02000003]	render_targets = 0x0400001a
[55163.833605][ctx 0x02000003]	frame_count  = #717

[55163.835236][ctx 0x02000003]	last_ref_frame = 4000018
[55163.835240][ctx 0x02000003]	golden_ref_frame = 400001e
[55163.835244][ctx 0x02000003]	alt_ref_frame = 4000017


libvpx on the encoder side may be error resilient, but on the decoder side, when input bitstream frames are being dropped by the VaapiDecoderThread (which is the case here), the next frame should be a key frame or a golden frame so that the first frame after the interruption can be decoded without relying on a reference.

This was tricky to find, as the reference frames follow the natural path until it becomes apparent that the decoder dropped input frames.
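To make the requirement concrete, here is a hypothetical sketch of such a check. The function names are made up; the only fact assumed is the VP8 frame-tag layout from RFC 6386, where bit 0 of the first byte is 0 for a key frame.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The VP8 frame tag (RFC 6386): bit 0 of the first byte is the frame type,
 * 0 = key frame, 1 = interframe. */
static bool vp8_is_key_frame(const uint8_t *data, size_t size)
{
    if (size < 3)
        return false;
    return (data[0] & 0x1) == 0;
}

/* Hypothetical helper: returns 0 if the frame may be submitted, -1 if the
 * caller should instead report an error / request a keyframe, because a
 * delta frame after a drop would reference pictures the decoder never saw. */
int check_frame_after_drop(const uint8_t *data, size_t size,
                           bool frames_were_dropped)
{
    if (frames_were_dropped && !vp8_is_key_frame(data, size))
        return -1;
    return 0;
}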





Cc: holmer@chromium.org pbos@chromium.org
Components: OS>Kernel>Video
May I ask you to clarify the conclusions in comment #24, please?
- What exact kind of reuse (and of what) is fine, and what kind of reuse/scenario is problematic, and why?
- Is the recommendation not to use multiple VP8 decoders at all? If so, this would prevent us from using the HW decoder in multi-party Hangouts, for example.

For comment #26, what is the exact problem that causes the GPU hang? In my understanding, even if we are using old references, shouldn't the worst-case outcome be image corruption rather than a GPU hang?



+holmer@ and +pbos@, please comment on the error resiliency/frame dropping issue in comment #26. Do we, or could we, guarantee providing a keyframe if we are forced to drop input frames from the client/webrtc side? Thanks.

Comment 28 by pbos@chromium.org, May 10 2017

I think it sounds like we effectively leak handles: "va_context 0x02000001 and 0x02000002 will not be used again or destroyed"

If the decoder side returns an error (re comment 26), then webrtc will issue a keyframe request. This can happen for decode failures: https://chromium.googlesource.com/external/webrtc/+/f93752a2ee559ce385ff1ec7486876ccffa11757/webrtc/modules/video_coding/codecs/vp8/vp8_impl.cc#1058

The decoder needs to not silently drop these frames, but report an error. Other than that it should be fine. :)
Reply to comment #27

"- What exact kind of reuse (and of what) if fine, and what kind of reuse/scenario is problematic, and why?"

Reuse can be done for both the decoder and the VPP. The application should take care of surface creation/deletion on resolution change. In this case, the first decoder and VPP can be reused for as long as needed.

" Is the recommendation not to use multiple VP8 decoders at all? If so, this would prevent us from using the HW decoder in multi-party Hangouts for example."

Multiple decoders can be used; the requirement is that each instance has its own config, context, and surfaces. In a multi-party session it is expected that all decoders created are actively used.

I am not yet clear on exactly which surface is causing the GPU hang. It is correct that old references should only cause image corruption, unless a surface was already released and then used again; I haven't found evidence of such behavior yet.

The VA-API decoder is not receiving a key frame after input frames are skipped. It is the Chrome + GPU process that skips frames. From the VA-API perspective, all input bitstream frames are treated as a continuous stream, since that is what the application is commanding.


Currently we create a VA config for each decoder, and on each resolution change we use this config to (re)create a new set of VA surfaces and a VA context associated with these surfaces, destroying previous ones. My understanding is that this should be fine and not be causing issues?

How do we define "port reconfiguration" in this context?

The VPP is separate from the decoder and we don't control it directly from the decoder, as we use it for rendering. Sharing surfaces between decoders and the VPP is done via dmabufs. We should not be giving VA surfaces directly from the decoder to the VPP; we should only be passing dmabufs. It should be possible to share dmabufs across any number of contexts, and the related code should not depend directly on the backing VA surface structures, in my understanding.

Perhaps there is an issue with not taking references on dmabufs and/or related metadata somewhere? Or perhaps some structures related to VASurfaces, which should not be shared if sharing is done via dmabuf, are in fact shared, so that when the surfaces are destroyed by the decoder they are still being used by the VPP afterwards, or the other way around?

Comment 31 by tfiga@chromium.org, May 12 2017

As far as I remember, Intel's libva driver talks to the DRM kernel driver. Is this happening directly or through libdrm? Proper handling of multiple DMAbuf imports with DRM can be a bit tricky, because the import ioctl always returns the same GEM handle if the DMAbuf represents the same kernel GEM object. This GEM handle is not reference counted by the kernel, so userspace must make sure that if the same DMAbuf is imported twice, it is properly deduplicated and/or the GEM handle is refcounted in userspace.
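For illustration, a hedged sketch of the userspace deduplication/refcounting described above, using the libdrm call drmPrimeFDToHandle and the DRM_IOCTL_GEM_CLOSE ioctl. The table-based bookkeeping is purely illustrative and not the actual driver code.

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>   /* drmIoctl, drmPrimeFDToHandle */
#include <drm.h>       /* struct drm_gem_close, DRM_IOCTL_GEM_CLOSE */

#define MAX_HANDLES 64

/* Purely illustrative bookkeeping: one refcount per imported GEM handle. */
static struct {
    uint32_t handle;
    int refcount;
} refs[MAX_HANDLES];

/* Import a dmabuf; if the kernel hands back a GEM handle we already know,
 * just take another reference instead of treating it as a new buffer. */
int import_dmabuf(int drm_fd, int dmabuf_fd, uint32_t *out_handle)
{
    uint32_t handle;
    int i;

    if (drmPrimeFDToHandle(drm_fd, dmabuf_fd, &handle) != 0)
        return -1;

    for (i = 0; i < MAX_HANDLES; i++) {
        if (refs[i].refcount > 0 && refs[i].handle == handle) {
            refs[i].refcount++;          /* already imported: deduplicate */
            *out_handle = handle;
            return 0;
        }
    }
    for (i = 0; i < MAX_HANDLES; i++) {
        if (refs[i].refcount == 0) {
            refs[i].handle = handle;     /* first import of this buffer */
            refs[i].refcount = 1;
            *out_handle = handle;
            return 0;
        }
    }
    return -1;                           /* table full */
}

/* Drop one reference; only close the GEM handle when the last user is gone.
 * Closing it while another importer (e.g. the VPP) still uses it is exactly
 * the kind of premature free that can end in a GPU hang. */
void release_handle(int drm_fd, uint32_t handle)
{
    int i;

    for (i = 0; i < MAX_HANDLES; i++) {
        if (refs[i].refcount > 0 && refs[i].handle == handle) {
            if (--refs[i].refcount == 0) {
                struct drm_gem_close close_args;
                memset(&close_args, 0, sizeof(close_args));
                close_args.handle = handle;
                drmIoctl(drm_fd, DRM_IOCTL_GEM_CLOSE, &close_args);
            }
            return;
        }
    }
}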

The hang happens exactly when the VPP is color-converting (CSC) a decoded surface that is no longer of interest to the decoder, but that the decoder has not reused yet. Indeed, the VPP runs on a separate thread from the VP8 decoder.


A dmabuf reference issue looks like a plausible possibility, given this analysis (from logs not attached here, but the same logic applies):

Surface 0x04000007 goes through the locked sequence BeginPicture -> RenderPicture -> EndPicture; that happens from timestamp 30514.191996 to 30514.200701.

The next decoded surface is processed from 30514.212826 to 30514.215017. This surface uses 0x04000007 as its last reference, so the dmabuf reference is kept.

After this, the decoder will not need the content of 0x04000007 again, and it is posted for recycling by the application. Its contents were not destroyed by the application, but I would assume that no dmabuf reference is kept.

Then the VPP tries to color-convert surface 0x04000007 at 30514.225917 (BeginPicture), which is after the decoder last used it. EndPicture returns at 30519.889848, a little after the 4-second timeout that the driver waits after a GPU hang.

In terms of the use case, this condition only happens when the stream is dropping frames, as mentioned before, although it looks like the application is handling those dropped frames properly.

I also dumped the received stream at the WebRTC level and it does not appear to contain garbage; the IVF is decodable using libyami.
Update:

There is a series of patches to the i915 KMD that were recently backported to kernel 3.18 on Chrome OS. Those patches were identified as fixing a GPU hang after the VP8 decoder was disabled.

With the i915 KMD patch series applied, the GPU hang in the gen9_pp_pipeline_setup function is also gone.

I have been testing this for a while and everything looks OK.

I will be submitting a revert of the VP8-decoder-disable change so that more QA testing can be done.

i915 kmd patch series starts with this patch: https://chromium-review.googlesource.com/c/505705/1

Patch to re-enable vp8 decoder is here: https://chromium-review.googlesource.com/c/507709/


Thank you for the update. Does this also fix the issue described in #26-32, the surface reuse issue? If not, we may not be able to reenable the decoder yet.
Per comment #32, the application is handling the dropped frames properly, and once the GPU hang is gone those can be tracked.

Another important piece of information that might be relevant here, and that helped narrow the investigation, is that the apprtc test with VP8 enc/dec showed a single VP8 VAVDA decoder, which made it simpler to track. In apprtc the GPU hang did happen. That analysis was mentioned in comment #32.

Of course there are things to fix so that Hangouts corrects the handle leak, but those are beyond the scope of the GPU hang reported here, which was fixed as mentioned in #33.

Let me know if more clarification is needed. 
Yesterday I started testing with the KMD patches mentioned above on a pristine sync of Chromium. I am running: 1 YouTube clip (VP9 software decode), 1 WebGL benchmark, 1 VP8 clip decoding, and 1 two-way Hangouts session.

No GPU hang was found in this overnight testing. QA to confirm this observation.
The patch that fixes the gen8+ virtual address issue: https://chromium-review.googlesource.com/c/505704/1

The patch is part of the series and the kernel team is backporting it.
@stephane, benson

https://chromium-review.googlesource.com/c/505704/1 introduces this:
[47835.867581] [drm:i915_hangcheck_elapsed] ERROR Hangcheck timer elapsed... bsd ring idle

This behavior does not actually result in a hang.

However, upstream does not consider this a failure as of:
commit 83348ba84ee0d5d4d982e5382bfbc8b2a2d05e75

Previously this caused other tests to fail (I think graphics_GLBench).

Can we submit this series while we backport another series to incorporate commit 83348ba84ee0d5d4d982e5382bfbc8b2a2d05e75, or do you want that series first?

Cc: puneetster@chromium.org
Project Member

Comment 41 by bugdroid1@chromium.org, Jul 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/75273e812fcd5d51e9b3c01e9f87bee9994ca456

commit 75273e812fcd5d51e9b3c01e9f87bee9994ca456
Author: Daniel Charles <daniel.charles@intel.com>
Date: Wed Jul 12 08:29:38 2017

Revert "libva-intel-driver: Disable VP8 hw decode on skylake"

Re-enabling VP8 h/w decoder on skylake

i915 kmd patch series backport to 3.18 kernel fixes the GPU hang that
was reported.

BUG= chromium:693758 
TEST="run hangouts + youtube + webgl. run autotest on vavda"
TEST="run subjective video testing."
CQ-DEPEND=CL:505705

This reverts commit cecdd7e8ca7f279df86064fe943282041e8e4ba6.

Change-Id: I0990fda64ff20dede327fd27635c16a8457dd26c
Signed-off-by: Daniel Charles <daniel.charles@intel.com>
Reviewed-on: https://chromium-review.googlesource.com/507709
Tested-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Marc Herbert <marc.herbert@intel.com>
Reviewed-by: YH Lin <yueherngl@chromium.org>
Reviewed-by: Pawel Osciak <posciak@chromium.org>

[modify] https://crrev.com/75273e812fcd5d51e9b3c01e9f87bee9994ca456/x11-libs/libva-intel-driver/libva-intel-driver-1.7.1.ebuild
[rename] https://crrev.com/75273e812fcd5d51e9b3c01e9f87bee9994ca456/x11-libs/libva-intel-driver/libva-intel-driver-1.7.1-r4.ebuild
[delete] https://crrev.com/b057a3cf147300cda5cba37bbb3783a4adad9aac/x11-libs/libva-intel-driver/files/CHROMIUM-Disable-hw-VP8-decode-on-skylake.patch

Status: Fixed (was: Assigned)
Cc: vsu...@chromium.org avkodipelli@chromium.org
As per https://chromium-review.googlesource.com/c/487910, this is disabled in M59; I also observed that it is disabled in M60.

Are we going to merge this into M59 and M60?
We will likely not be merging.
Status: Verified (was: Fixed)
Verified on chell device on 9752.0.0, 61.0.3159.0.

Sign in to add a comment