Issue 651820

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Nov 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Feature




Create WebRTC performance test that would have caught https://crbug.com/647886

Project Member Reported by phoglund@chromium.org, Sep 30 2016

Issue description

Dropped frame tests did not catch the regression, which presumably went in on Mar 8 2016:

https://chromeperf.appspot.com/report?sid=a51762ed2320cc4ce019c79e221921a0d4deb0533bc4aa092cf3fa195e9bcab6

Nor did the WebRTC CPU tests:

https://chromeperf.appspot.com/report?sid=332ae320b7354102dcb5351483704f3f4de7903a0e54170b2094d7ff8a95b5b6&start_rev=379213&end_rev=382806

Figure out why the tests did not catch the issue and implement a test.
 
emircan@, can you point me to a detailed summary of what the bug is about? What are the gritty technical details here? If we assume the dropped-frame metrics measure correctly (which I doubt), would a 720p 45-second WebRTC loopback call in a single tab, with the spinning green ball fake video, be enough to detect the regression?
The original CL that caused the regression, https://codereview.chromium.org/1737253002, had an impact because it marked the video layer as non-opaque for rendering, in order to accommodate a new feature. It impacted all OSes, but not equally, since the effect depends on rendering performance. It is not related to the decoder or the camera; it relates to how remote video streams are rendered on the page, so it makes sense that it scales with the number of connections. My mistake here was to assume that this change wouldn't have a performance impact, as camera streams always have opaque content. Although the camera stream's content is opaque, marking the layer as non-opaque makes a difference in rendering according to the metrics.

We asked marcheu@ why the effects would differ between platforms, to understand why the current tests didn't catch it. He said that on high-end devices the impact isn't visible, since there is enough bandwidth and GPU horsepower to hide this cost. So far, we know that this only caused a visible difference on the panther CrOS device in kiosk mode. Since panther was already running at nearly full CPU usage, the change dropped some extra frames, triggered CPU adaptation, lowered the frame rate/resolution, and made the regression more visible. The clear problem here is that we don't have any automated testing on CrOS devices with ~10 connections in kiosk mode. But even if we had a test like you described above, simply checking CPU usage or dropped-frame metrics might not give a clear signal, as adaptation and hardware performance would interfere. For instance, if panther had more CPU headroom, we might not see any dropped frames.
Cc: sprang@chromium.org
Your description above suggests there were small changes visible on all platforms, but that their compounded effect happened to show on low-powered CrOS devices.

What if we measured CPU and GPU usage for a WebRTC call: would we be able to pick this difference up? Is it so minuscule that we need several video tags rendering at the same time to spot a difference? Is CPU/GPU even what we should measure, or something else? What if we have one peer connection rendering to 100 video tags in a demo page: will that compound the effect?

I wonder if we can detect those small differences. The goal here is to construct the simplest test possible that would have caught this issue.
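
For illustration, a minimal sketch of the fan-out idea (the function name and page structure are made up, not an actual demo page): attach the same MediaStream to n video tags and see whether usage grows with n or plateaus.

  // Hypothetical fan-out: render one stream in n video tags.
  function fanOut(stream: MediaStream, n: number): void {
    for (let i = 0; i < n; i++) {
      const video = document.createElement('video');
      video.autoplay = true;
      video.muted = true;        // allows autoplay without a user gesture
      video.srcObject = stream;  // the same stream object in every tag
      document.body.appendChild(video);
    }
  }

If rendering cost compounds per tag, CPU/GPU usage should grow roughly linearly in n; if the compositor dedupes the work, it would plateau.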
Yes, the compounded effect happened to show on CrOS panther. According to the bisect logs, the dropped-frame metric scales with the number of connections. I am not sure about having one peer connection with multiple video tags, in case there are rendering optimizations we aren't aware of. We can test and see.

I cannot think of any metric other than CPU/GPU usage for measuring performance here. We need to be aware of CPU adaptation kicking in, but then the frame rate or resolution would change. All of these combined should cover it.
Ok, so maybe let's try CPU/GPU and dropped frames with a bunch of peer connections in a test page. Can I simply revert the fixes in https://crbug.com/647886 to reproduce the regression locally? Do I need to do anything else?
Yes, there is actually only one CL for the fix. Reverting https://codereview.chromium.org/2348903003 would be enough.
Owner: ehmaldonado@chromium.org
Edward is working on a test now.

For a first attempt, he has created a page which creates n peer connections and n*2 video tags. This currently gives a nice signal comparing pre-bug to post-bug Chrome: about a 10% difference in CPU usage. Edward, can you elaborate? What's the GPU usage difference, by the way?

I also have an idea to create a similar page with one pc, but n streams and n video tags. This is more similar to a real load in our "downstream product".
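
For readers without access to the page, here is a rough sketch of what such a loopback stress page could look like, written against today's promise-based WebRTC API (the 2016 page would have used addStream/onaddstream-era idioms; the pairing of peer connections and the tag layout are my assumptions):

  const CONNECTIONS = 10;  // 10 loopback calls -> 20 video tags

  function addVideoTag(stream: MediaStream): void {
    const video = document.createElement('video');
    video.autoplay = true;
    video.muted = true;
    video.srcObject = stream;
    document.body.appendChild(video);
  }

  // One in-page loopback call: pc1 sends the fake camera stream to pc2.
  async function startLoopbackCall(stream: MediaStream): Promise<void> {
    const pc1 = new RTCPeerConnection();
    const pc2 = new RTCPeerConnection();
    pc1.onicecandidate = (e) => { if (e.candidate) pc2.addIceCandidate(e.candidate); };
    pc2.onicecandidate = (e) => { if (e.candidate) pc1.addIceCandidate(e.candidate); };
    pc2.ontrack = (e) => addVideoTag(e.streams[0]);  // remote view
    stream.getTracks().forEach((t) => pc1.addTrack(t, stream));
    await pc1.setLocalDescription(await pc1.createOffer());
    await pc2.setRemoteDescription(pc1.localDescription!);
    await pc2.setLocalDescription(await pc2.createAnswer());
    await pc1.setRemoteDescription(pc2.localDescription!);
  }

  (async () => {
    // With --use-fake-device-for-media-stream this is the spinning green ball.
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    for (let i = 0; i < CONNECTIONS; i++) {
      addVideoTag(stream);              // local view
      await startLoopbackCall(stream);  // remote view
    }
  })();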
So, in a test page with 10 peer connections and 20 video tags running a video call for 45s, I got ~44% CPU usage and ~11% GPU usage with the bug, and ~12% CPU usage and ~6% GPU usage without it.
I ran it three times with and without the bug yesterday, and three times with and without the bug today, and the results seem to be stable.

The 10% difference was obtained with a different number of peerconnections (which I don't remember).
Sweet! Ship it!

Like we discussed offline, let's get this page in the demo github repo (Christoffer can help you), pull it into the Telemetry page set, and start running it in our telemetry tests.

Comment 11 by bugdroid1@chromium.org, Nov 2 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1d38116383dc36f6be57077b86e3deaaae6cba20

commit 1d38116383dc36f6be57077b86e3deaaae6cba20
Author: ehmaldonado <ehmaldonado@chromium.org>
Date: Wed Nov 02 09:31:00 2016

Add a new telemetry benchmark to stress-test WebRTC.

BUG=651820
NOTRY=True
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.perf:android_s5_perf_cq;master.tryserver.chromium.perf:linux_perf_cq;master.tryserver.chromium.perf:mac_retina_perf_cq;master.tryserver.chromium.perf:winx64_10_perf_cq

Review-Url: https://codereview.chromium.org/2463013003
Cr-Commit-Position: refs/heads/master@{#429244}

[modify] https://crrev.com/1d38116383dc36f6be57077b86e3deaaae6cba20/tools/perf/benchmarks/webrtc.py
[modify] https://crrev.com/1d38116383dc36f6be57077b86e3deaaae6cba20/tools/perf/measurements/webrtc.py
[add] https://crrev.com/1d38116383dc36f6be57077b86e3deaaae6cba20/tools/perf/page_sets/data/webrtc_stresstest_cases.json
[add] https://crrev.com/1d38116383dc36f6be57077b86e3deaaae6cba20/tools/perf/page_sets/data/webrtc_stresstest_cases_000.wpr.sha1
[modify] https://crrev.com/1d38116383dc36f6be57077b86e3deaaae6cba20/tools/perf/page_sets/webrtc_cases.py

Cool! Here are some preliminary results:
https://chromeperf.appspot.com/report?sid=18bfbd3c9e3bc99a2bed0ecc8e038d8057fcc961774da2b2f7d3e0c70c85de53

We can see the stress test (45s) uses 12x more CPU than the 720p 45s call for the renderer process. It also uses a lot more GPU. For the browser process, the stress test uses less for some reason; but the renderer is the interesting thing here anyway.

I'm going to get the new test running on CrOS devices as well, maybe we can see an even more pronounced effect there.
Here are some requirements we should aim for:

* It should be possible to disable CPU adaptation
* Input streams should be realistic (movement and colors); ideally a recording should be used
* We need to be able to track test results over time on a dashboard

What to monitor in the test (a getStats sketch follows below):
* FPS sent/received
* Resolution
* Dropped frames

I think we have everything in the current incarnation of the stress test, except 1) disabling CPU adaptation and 2) realistic input video. I think the former is a simple change: we need to pass the peerconnection constraint googCpuOveruseDetection set to false. We can add a checkbox to the demo page and make sure the test unchecks it before starting. The latter is harder, though.
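
To make the monitoring list above concrete, here is a sketch of sampling those metrics from a peer connection with today's spec getStats() field names (framesPerSecond, frameWidth/frameHeight, framesDropped); the tests of this era would instead have read Chrome's legacy goog-prefixed stats, e.g. googFrameRateReceived:

  async function sampleVideoStats(pc: RTCPeerConnection): Promise<void> {
    const report = await pc.getStats();
    report.forEach((stats: any) => {  // `any` since lib typings lag the stats fields
      if (stats.type === 'inbound-rtp' && stats.kind === 'video') {
        console.log('recv fps:', stats.framesPerSecond,
                    'resolution:', `${stats.frameWidth}x${stats.frameHeight}`,
                    'dropped:', stats.framesDropped);
      } else if (stats.type === 'outbound-rtp' && stats.kind === 'video') {
        console.log('send fps:', stats.framesPerSecond);
      }
    });
  }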
Edward, can you add googCpuOveruseDetection support to the test page? Looks like you need something like what's in this page: https://test.webrtc.org/manual/peer2peer/
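
For reference, a sketch of how that constraint would be wired up: it goes in the legacy, Chrome-specific second argument to the RTCPeerConnection constructor (the spec has since dropped that argument, hence the cast; the checkbox id is made up):

  // Hypothetical checkbox on the demo page; unchecked = disable adaptation.
  const box = document.getElementById('cpu-overuse-detection') as HTMLInputElement;
  const legacyConstraints = box.checked
      ? undefined  // default: CPU adaptation stays enabled
      : { optional: [{ googCpuOveruseDetection: false }] };
  // Legacy two-argument form, which the spec no longer includes.
  const pc = new (RTCPeerConnection as any)({ iceServers: [] }, legacyConstraints);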

Comment 17 by bugdroid1@chromium.org, Nov 17 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/2ef2e178bd89e509251bce9a8de627726d8e6866

commit 2ef2e178bd89e509251bce9a8de627726d8e6866
Author: Patrik Höglund <phoglund@chromium.org>
Date: Wed Nov 16 09:07:29 2016

Add webrtc.stress to whitelist.

BUG=chromium:651820
TEST=can't test, config change

Change-Id: Ie811095628ddba6a1fe3348c88534b51c73b072c
Reviewed-on: https://chromium-review.googlesource.com/411881
Reviewed-by: Rohit Makasana <rohitbm@chromium.org>
Commit-Queue: Rohit Makasana <rohitbm@chromium.org>
Tested-by: Rohit Makasana <rohitbm@chromium.org>

[modify] https://crrev.com/2ef2e178bd89e509251bce9a8de627726d8e6866/server/cros/telemetry_runner.py

Ok! The test is now running on CrOS devices. It's especially interesting on panther and buddy, which are Chromebox-for-meetings devices. Edward, did you verify that the test would still have caught the issue after the revisions we made?

If so, the only thing remaining is to get the tests under monitoring. I think we might want to monitor only buddy, panther, and a couple more boards here, to limit the impact of false positives.
Status: Fixed (was: Assigned)
After verifying that the webrtc.stress test would have caught the issue, we've got both webrtc.peerconnection and webrtc.stress metrics under monitoring on the sheriff rotation for Windows 7, Mac Retina, Nexus 5X, Linux, buddy, panther, and auron_paine.

Comment 20 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 21 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 22 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 24 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
