New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 805277 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 27
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 3
Type: Bug



Sign in to add a comment

Crash in VP8: Assertion `tot < (1 << 24)' failed.

Project Member Reported by lambroslambrou@chromium.org, Jan 24 2018

Issue description

I see this crash occasionally when testing a local build of Chromoting host process on Linux:

remoting_me2me_host: ../../third_party/libvpx/source/libvpx/vp8/common/treecoder.c:92: void vp8_tree_probs_from_distribution(int, vp8_token 
*, const vp8_tree_index *, vp8_prob *, unsigned int (*)[2], const unsigned int *, unsigned int, int): Assertion `tot < (1 << 24)' failed.

The repro involves building and running the remoting Me2Me host process. I set up ~/.chrome-remote-desktop-session script to run just "xterm".
Then I connect to the remoting session, and run this command inside the xterm:

while :; do xsetroot -mod $((RANDOM%14+2)) $((RANDOM%14+2)); done

This creates a rapidly changing full-screen repeating pattern, which gets captured and encoded with VP8.

This time, I was lucky enough to capture the crash within GDB. Log of 'bt full' attached.

Sorry I don't have an easy consistent repro, but maybe someone on VPX might recognize the error?

 
gdb.txt
36.0 KB View Download
Cc: chcunningham@chromium.org jrumm...@chromium.org
Status: Available (was: Untriaged)
lambroslambrou@, do you have the crash ID?
I'm not familiar with some of the chromoting components mentioned - is this crash observed on the remote encoder, or on the local decoding client?
Cc: fgalligan@chromium.org johannkoenig@chromium.org
+libvpx folks
Cc: marpan@chromium.org jianj@chromium.org
Marco and Jerome are most familiar with this.
Crash is in the encoder side. The remoting host process is a separate binary from Chrome, and it handles desktop frame capture, encoding and sending.

We also see many crashes on SIGFPE. Maybe it's the same crash on release builds - if the assert would fail, perhaps it causes div-by-zero later on?

I think SIGFPE is a separate crash, which has been observed on both Debug and Release builds of host process. I'll file a separate bug, once I have more info and a stack trace.

I don't think we have "crash ID" for host process, it's not Chrome browser. I can't remember if we have crash-reporting implemented for this process on Linux.

 Issue 805683  has been merged into this issue.
Cc: brandtr@chromium.org
I am seeing this crash more frequently in my testing. I think there might be a correlation with low-bitrate encoding - in particular, I have been setting
rc_target_bitrate
(https://cs.chromium.org/chromium/src/third_party/libvpx/source/libvpx/vpx/vpx_encoder.h?l=481&rcl=efa786d4649cb39919170d745d4206309b463575)
to 1 here:
https://cs.chromium.org/chromium/src/remoting/codec/webrtc_video_encoder_vpx.cc?l=499&rcl=cbb1c646b962308f397996719ec263b81a1390c7

(I modified our bitrate_filter_.GetTargetBitrateKbps() to always return 1)

With this change, the remoting host typically crashes within a few seconds of connection. The remote (Linux) host desktop was simply displaying the default wallpaper that we use in Corp.

We have been hit by this issue as well. In our case, we run multi-hour WebRTC sessions and the process gets killed by this check after 5-15 hours. In our case, it's an input from a regular webcam, not remoting or anything like that, which means that regular users see this issue as well.

My lame attempt to fix it for us is:

https://cs.chromium.org/chromium/src/third_party/libvpx/source/libvpx/vp8/common/treecoder.c?type=cs&sq=package:chromium&g=0&l=77

@@ -89,6 +89,12 @@ void vp8_tree_probs_from_distribution(int n, /* n = size of alphabet */
     const unsigned int *const c = branch_ct[t];
     const unsigned int tot = c[0] + c[1];
 
+    if (tot >= (1 << 24)) {
+      // This is not the best possible fix, but at least it does not crash the train.
+      fprintf(stderr, "vp8_tree_probs_from_distribution: overflow avoided with a hack\n");
+      probs[t] = 1;
+      continue;
+    }
     assert(tot < (1 << 24)); /* no overflow below */

Basically, instead of killing the process, it just gives slightly wrong probabilities and (if I understand correctly), the video gets suboptimally encoded for a short period of time. This is certainly not a proper fix, but it might give you the indication that we suffer enough from the bug.
Components: -Services>Chromoting
Summary: Crash in VP8: Assertion `tot < (1 << 24)' failed. (was: [remoting host] Crash in VP8: Assertion `tot < (1 << 24)' failed.)
As this issue is proved to be not related to Chromoting, I have updated the components to be just media codecs. Feel free to revert this change back, if I did it wrong.
Cc: krasin@chromium.org
marpan@chromium.org jianj@chromium.org - ping for triage
Owner: jianj@chromium.org
Cc: jzern@chromium.org
Today, I ran another overnight encoding session. The interesting observation is that once it gets to this condition with the overflow, it never recovers:

vp8_tree_probs_from_distribution: overflow avoided with a hack
vp8_tree_probs_from_distribution: overflow avoided with a hack
vp8_tree_probs_from_distribution: overflow avoided with a hack
vp8_tree_probs_from_distribution: overflow avoided with a hack
vp8_tree_probs_from_distribution: overflow avoided with a hack
vp8_tree_probs_from_distribution: overflow avoided with a hack
vp8_tree_probs_from_distribution: overflow avoided with a hack
vp8_tree_probs_from_distribution: overflow avoided with a hack

It happens 1-2 times per frame. The transmitted frames do not have visible artifacts.
Project Member

Comment 18 by bugdroid1@chromium.org, Jun 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/webm/libvpx/+/343352b556f5f61833174c08a35d697d280933e3

commit 343352b556f5f61833174c08a35d697d280933e3
Author: Jerome Jiang <jianj@google.com>
Date: Thu Jun 14 00:03:01 2018

vp8: remove assertion in tree coder.

Cast the counter to uint64_t in case it overflows.

The assert was to prevent c[0] * Pfac being overflow beyong unsigned int
since Pfac could be 2^8. Thus c[0] needs to be smaller than 2^24.

In VP9, the assert was removed and c[0] was casted to uint64_t.

Bug:  805277 
Change-Id: Ic46a3c5b4af2f267de4e32c1518b64e8d6e9d856

[modify] https://crrev.com/343352b556f5f61833174c08a35d697d280933e3/vp8/common/treecoder.c

Comment 19 by jianj@chromium.org, Jun 14 2018

Status: Started (was: Available)
The fix is a workaround and make the code consistent with VP9, which doesn't have an assert.

This fix will be rolled into chromium soon.

There might be some deeper issue with the accumulation of variables in long content. We'll look into that later.
Thank you very much, Jerome!
That allowed us to get rid of a local workaround and stabilize our system.
Project Member

Comment 21 by bugdroid1@chromium.org, Jun 18 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4afd9c778c947e93bf37e1f9f2fae6d7817277d3

commit 4afd9c778c947e93bf37e1f9f2fae6d7817277d3
Author: Jerome Jiang <jianj@google.com>
Date: Mon Jun 18 16:59:09 2018

Roll src/third_party/libvpx/source/libvpx/ 37a0283b1..8648a64c8 (16 commits)

https://chromium.googlesource.com/webm/libvpx.git/+log/37a0283b18a1..8648a64c83b5

$ git log 37a0283b1..8648a64c8 --date=short --no-merges --format='%ad %ae %s'
2018-06-15 jianj VP9 HBD: Fix integer overflow problem in variance calc.
2018-06-15 tomfinegan Clean up avx512 compiler support test.
2018-05-23 jingning Prepare motion estimation process for temporal dependency model
2018-05-23 jingning Construct temporal dependency building system
2018-06-14 jianj vp9 svc: add tests for inter layer prediction.
2018-06-14 zoeliu Separate GF structure defining from bit allocation
2018-05-21 jingning Allocate tpl_dep_stats frame buffer
2018-06-14 jingning Add temporal model control as a speed feature
2018-05-21 jingning Add data structure for frame dependent mode decision
2018-06-13 luc [VSX] Optimize PROCESS16 macro
2018-06-13 zoeliu Unify frame_index in defining GF group structure
2018-06-13 jianj vp8: remove assertion in tree coder.
2018-06-13 luc VSX Version of SAD8xN
2018-06-13 luc Add Speed Tests for the SADTest test suite.
2018-06-13 tomfinegan Fix avx512 related MSVC build failure.
2018-06-12 jingning Remove duplicate vp9_twopass_postencode_update def

Created with:
  roll-dep src/third_party/libvpx/source/libvpx
R=johannkoenig@google.com

Bug:  805277 , webm:1534
Change-Id: I3fe5dc01690c4e2e43905663e5204b7e806e4b1c
Reviewed-on: https://chromium-review.googlesource.com/1103855
Reviewed-by: Johann Koenig <johannkoenig@google.com>
Commit-Queue: Jerome Jiang <jianj@google.com>
Cr-Commit-Position: refs/heads/master@{#568041}
[modify] https://crrev.com/4afd9c778c947e93bf37e1f9f2fae6d7817277d3/DEPS
[modify] https://crrev.com/4afd9c778c947e93bf37e1f9f2fae6d7817277d3/third_party/libvpx/README.chromium
[modify] https://crrev.com/4afd9c778c947e93bf37e1f9f2fae6d7817277d3/third_party/libvpx/source/config/vpx_version.h

Comment 22 by jianj@chromium.org, Jun 18 2018

The workaround fix has been rolled into chromium.
Thanks!

As for non-mandatory background, we use WebRTC directly, so the fix was available on our side for a while and I fully confident that it solves the issue we had (modulo that accumulation which does not show up yet, but might if we go into even longer sessions)

Comment 24 by jianj@chromium.org, Jun 18 2018

Thanks for the verification.

Yeah there might be deeper issues with the accumulation. So I'm gonna keep this issue open for a while and I'll take a look. Hopefully I can find something and fix it.
Status: Fixed (was: Started)

Sign in to add a comment