Crash in VP8: Assertion `tot < (1 << 24)' failed. |
||||||||||
Issue descriptionI see this crash occasionally when testing a local build of Chromoting host process on Linux: remoting_me2me_host: ../../third_party/libvpx/source/libvpx/vp8/common/treecoder.c:92: void vp8_tree_probs_from_distribution(int, vp8_token *, const vp8_tree_index *, vp8_prob *, unsigned int (*)[2], const unsigned int *, unsigned int, int): Assertion `tot < (1 << 24)' failed. The repro involves building and running the remoting Me2Me host process. I set up ~/.chrome-remote-desktop-session script to run just "xterm". Then I connect to the remoting session, and run this command inside the xterm: while :; do xsetroot -mod $((RANDOM%14+2)) $((RANDOM%14+2)); done This creates a rapidly changing full-screen repeating pattern, which gets captured and encoded with VP8. This time, I was lucky enough to capture the crash within GDB. Log of 'bt full' attached. Sorry I don't have an easy consistent repro, but maybe someone on VPX might recognize the error?
,
Jan 24 2018
I'm not familiar with some of the chromoting components mentioned - is this crash observed on the remote encoder, or on the local decoding client?
,
Jan 24 2018
+libvpx folks
,
Jan 24 2018
Marco and Jerome are most familiar with this.
,
Jan 24 2018
Crash is in the encoder side. The remoting host process is a separate binary from Chrome, and it handles desktop frame capture, encoding and sending.
,
Jan 25 2018
We also see many crashes on SIGFPE. Maybe it's the same crash on release builds - if the assert would fail, perhaps it causes div-by-zero later on?
,
Jan 25 2018
I think SIGFPE is a separate crash, which has been observed on both Debug and Release builds of host process. I'll file a separate bug, once I have more info and a stack trace. I don't think we have "crash ID" for host process, it's not Chrome browser. I can't remember if we have crash-reporting implemented for this process on Linux.
,
Jan 25 2018
Issue 805683 has been merged into this issue.
,
Jan 29 2018
,
Feb 2 2018
I am seeing this crash more frequently in my testing. I think there might be a correlation with low-bitrate encoding - in particular, I have been setting rc_target_bitrate (https://cs.chromium.org/chromium/src/third_party/libvpx/source/libvpx/vpx/vpx_encoder.h?l=481&rcl=efa786d4649cb39919170d745d4206309b463575) to 1 here: https://cs.chromium.org/chromium/src/remoting/codec/webrtc_video_encoder_vpx.cc?l=499&rcl=cbb1c646b962308f397996719ec263b81a1390c7 (I modified our bitrate_filter_.GetTargetBitrateKbps() to always return 1) With this change, the remoting host typically crashes within a few seconds of connection. The remote (Linux) host desktop was simply displaying the default wallpaper that we use in Corp.
,
Jun 11 2018
We have been hit by this issue as well. In our case, we run multi-hour WebRTC sessions and the process gets killed by this check after 5-15 hours. In our case, it's an input from a regular webcam, not remoting or anything like that, which means that regular users see this issue as well. My lame attempt to fix it for us is: https://cs.chromium.org/chromium/src/third_party/libvpx/source/libvpx/vp8/common/treecoder.c?type=cs&sq=package:chromium&g=0&l=77 @@ -89,6 +89,12 @@ void vp8_tree_probs_from_distribution(int n, /* n = size of alphabet */ const unsigned int *const c = branch_ct[t]; const unsigned int tot = c[0] + c[1]; + if (tot >= (1 << 24)) { + // This is not the best possible fix, but at least it does not crash the train. + fprintf(stderr, "vp8_tree_probs_from_distribution: overflow avoided with a hack\n"); + probs[t] = 1; + continue; + } assert(tot < (1 << 24)); /* no overflow below */ Basically, instead of killing the process, it just gives slightly wrong probabilities and (if I understand correctly), the video gets suboptimally encoded for a short period of time. This is certainly not a proper fix, but it might give you the indication that we suffer enough from the bug.
,
Jun 11 2018
As this issue is proved to be not related to Chromoting, I have updated the components to be just media codecs. Feel free to revert this change back, if I did it wrong.
,
Jun 11 2018
,
Jun 11 2018
marpan@chromium.org jianj@chromium.org - ping for triage
,
Jun 11 2018
,
Jun 11 2018
,
Jun 12 2018
Today, I ran another overnight encoding session. The interesting observation is that once it gets to this condition with the overflow, it never recovers: vp8_tree_probs_from_distribution: overflow avoided with a hack vp8_tree_probs_from_distribution: overflow avoided with a hack vp8_tree_probs_from_distribution: overflow avoided with a hack vp8_tree_probs_from_distribution: overflow avoided with a hack vp8_tree_probs_from_distribution: overflow avoided with a hack vp8_tree_probs_from_distribution: overflow avoided with a hack vp8_tree_probs_from_distribution: overflow avoided with a hack vp8_tree_probs_from_distribution: overflow avoided with a hack It happens 1-2 times per frame. The transmitted frames do not have visible artifacts.
,
Jun 14 2018
The following revision refers to this bug: https://chromium.googlesource.com/webm/libvpx/+/343352b556f5f61833174c08a35d697d280933e3 commit 343352b556f5f61833174c08a35d697d280933e3 Author: Jerome Jiang <jianj@google.com> Date: Thu Jun 14 00:03:01 2018 vp8: remove assertion in tree coder. Cast the counter to uint64_t in case it overflows. The assert was to prevent c[0] * Pfac being overflow beyong unsigned int since Pfac could be 2^8. Thus c[0] needs to be smaller than 2^24. In VP9, the assert was removed and c[0] was casted to uint64_t. Bug: 805277 Change-Id: Ic46a3c5b4af2f267de4e32c1518b64e8d6e9d856 [modify] https://crrev.com/343352b556f5f61833174c08a35d697d280933e3/vp8/common/treecoder.c
,
Jun 14 2018
The fix is a workaround and make the code consistent with VP9, which doesn't have an assert. This fix will be rolled into chromium soon. There might be some deeper issue with the accumulation of variables in long content. We'll look into that later.
,
Jun 15 2018
Thank you very much, Jerome! That allowed us to get rid of a local workaround and stabilize our system.
,
Jun 18 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4afd9c778c947e93bf37e1f9f2fae6d7817277d3 commit 4afd9c778c947e93bf37e1f9f2fae6d7817277d3 Author: Jerome Jiang <jianj@google.com> Date: Mon Jun 18 16:59:09 2018 Roll src/third_party/libvpx/source/libvpx/ 37a0283b1..8648a64c8 (16 commits) https://chromium.googlesource.com/webm/libvpx.git/+log/37a0283b18a1..8648a64c83b5 $ git log 37a0283b1..8648a64c8 --date=short --no-merges --format='%ad %ae %s' 2018-06-15 jianj VP9 HBD: Fix integer overflow problem in variance calc. 2018-06-15 tomfinegan Clean up avx512 compiler support test. 2018-05-23 jingning Prepare motion estimation process for temporal dependency model 2018-05-23 jingning Construct temporal dependency building system 2018-06-14 jianj vp9 svc: add tests for inter layer prediction. 2018-06-14 zoeliu Separate GF structure defining from bit allocation 2018-05-21 jingning Allocate tpl_dep_stats frame buffer 2018-06-14 jingning Add temporal model control as a speed feature 2018-05-21 jingning Add data structure for frame dependent mode decision 2018-06-13 luc [VSX] Optimize PROCESS16 macro 2018-06-13 zoeliu Unify frame_index in defining GF group structure 2018-06-13 jianj vp8: remove assertion in tree coder. 2018-06-13 luc VSX Version of SAD8xN 2018-06-13 luc Add Speed Tests for the SADTest test suite. 2018-06-13 tomfinegan Fix avx512 related MSVC build failure. 2018-06-12 jingning Remove duplicate vp9_twopass_postencode_update def Created with: roll-dep src/third_party/libvpx/source/libvpx R=johannkoenig@google.com Bug: 805277 , webm:1534 Change-Id: I3fe5dc01690c4e2e43905663e5204b7e806e4b1c Reviewed-on: https://chromium-review.googlesource.com/1103855 Reviewed-by: Johann Koenig <johannkoenig@google.com> Commit-Queue: Jerome Jiang <jianj@google.com> Cr-Commit-Position: refs/heads/master@{#568041} [modify] https://crrev.com/4afd9c778c947e93bf37e1f9f2fae6d7817277d3/DEPS [modify] https://crrev.com/4afd9c778c947e93bf37e1f9f2fae6d7817277d3/third_party/libvpx/README.chromium [modify] https://crrev.com/4afd9c778c947e93bf37e1f9f2fae6d7817277d3/third_party/libvpx/source/config/vpx_version.h
,
Jun 18 2018
The workaround fix has been rolled into chromium.
,
Jun 18 2018
Thanks! As for non-mandatory background, we use WebRTC directly, so the fix was available on our side for a while and I fully confident that it solves the issue we had (modulo that accumulation which does not show up yet, but might if we go into even longer sessions)
,
Jun 18 2018
Thanks for the verification. Yeah there might be deeper issues with the accumulation. So I'm gonna keep this issue open for a while and I'll take a look. Hopefully I can find something and fix it.
,
Sep 27
|
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by yini...@chromium.org
, Jan 24 2018Status: Available (was: Untriaged)