
Issue 673558

Issue metadata

Status: Verified
Owner:
Closed: Feb 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Fine tuning iterations in crosperf nightly tests.

Project Member Reported by laszio@chromium.org, Dec 13 2016

Issue description

crosperf runs a set of benchmarks on several boards every day. Each benchmark is repeated 2 or 3 times. However, some benchmarks are quite stable and others are flaky, so a fixed iteration count gives unreliable results for the flaky ones and wastes time on the stable ones.

To guarantee a given confidence level (p) with a predefined margin of error (e), given the standard deviation (d) of a benchmark, the required number of samples is square(isf((1 - p) / 2) * d / e).

The expected outcomes are:
1) flaky benchmarks made stable, by allocating more iterations.
2) stable benchmarks cost less time, by reducing iterations.

I'll also review how the scores are calculated in these test suites.
 

Comment 1 by laszio@chromium.org, Dec 13 2016

Note that the formula square(isf((1 - p) / 2) * d / e), where isf is the inverse survival function of N(0, 1), assumes that the samples follow a normal distribution or that the number of samples is large.

A bound that makes no distributional assumptions can be obtained from Chebyshev's inequality, but it is much larger: square(d / e) / (1 - p).

For example, to get 90% confidence within a 1% margin of error given a 1.6% standard deviation, 7 samples are required if the samples are assumed to follow a normal distribution. Without any assumptions, 26 samples would be required.
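
The two bounds above can be checked with a short script. This is only a sketch, not crosperf code: the function names are made up for illustration, and isf is computed from the standard library as inv_cdf(1 - q) of N(0, 1). The result is rounded up because a sample count must be a whole number.

```python
import math
from statistics import NormalDist


def samples_normal(p, d, e):
    """Samples needed for confidence p and margin e, assuming normality.

    d and e must use the same unit (here: percent of the mean).
    """
    # isf(q) of N(0, 1) is the same as inv_cdf(1 - q).
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round((z * d / e) ** 2, 9))


def samples_chebyshev(p, d, e):
    """Distribution-free bound from Chebyshev's inequality."""
    return math.ceil(round((d / e) ** 2 / (1 - p), 9))


print(samples_normal(0.90, 1.6, 1.0))     # 7
print(samples_chebyshev(0.90, 1.6, 1.0))  # 26
```

The gap grows quickly as p approaches 1, since the Chebyshev bound scales with 1 / (1 - p).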

Comment 2 by laszio@chromium.org, Feb 20 2017

Standard deviations of 10 iterations on R57-9077.0

======= elm / llvm_trybot_image =======
2.3%  dromaeo.domcoreattr/dom__summary
1.0%  dromaeo.domcoremodify/dom__summary
0.8%  graphics_WebGLAquarium/avg_fps_1000_fishes__summary
1.9%  kraken/Total__summary
0.8%  octane/Total__Score
1.2%  page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
1.7%  page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
0.9%  smoothness.tough_webgl_cases/percentage_smooth__summary
0.7%  speedometer/Total__summary

======= elm / vanilla_image =======
1.7%  dromaeo.domcoreattr/dom__summary
1.1%  dromaeo.domcoremodify/dom__summary
0.7%  graphics_WebGLAquarium/avg_fps_1000_fishes__summary
1.4%  kraken/Total__summary
1.0%  octane/Total__Score
1.5%  page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
1.3%  page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
1.3%  smoothness.tough_webgl_cases/percentage_smooth__summary
0.3%  speedometer/Total__summary

======= peppy / llvm_trybot_image =======
6.4%  dromaeo.domcoreattr/dom__summary
0.7%  dromaeo.domcoremodify/dom__summary
0.4%  graphics_WebGLAquarium/avg_fps_1000_fishes__summary
0.5%  kraken/Total__summary
0.9%  octane/Total__Score
0.6%  page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
0.8%  page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
1.0%  smoothness.tough_webgl_cases/percentage_smooth__summary
0.4%  speedometer/Total__summary

======= peppy / vanilla_image =======
4.9%  dromaeo.domcoreattr/dom__summary
0.7%  dromaeo.domcoremodify/dom__summary
0.4%  graphics_WebGLAquarium/avg_fps_1000_fishes__summary
0.4%  kraken/Total__summary
1.2%  octane/Total__Score
1.2%  page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
1.0%  page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
2.5%  smoothness.tough_webgl_cases/percentage_smooth__summary
0.7%  speedometer/Total__summary

======= squawks / llvm_trybot_image =======
1.8%  dromaeo.domcoreattr/dom__summary
1.0%  dromaeo.domcoremodify/dom__summary
0.5%  graphics_WebGLAquarium/avg_fps_1000_fishes__summary
0.3%  kraken/Total__summary
1.5%  octane/Total__Score
2.0%  page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
1.6%  page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
1.4%  smoothness.tough_webgl_cases/percentage_smooth__summary
0.2%  speedometer/Total__summary

======= squawks / vanilla_image =======
2.0%  dromaeo.domcoreattr/dom__summary
0.9%  dromaeo.domcoremodify/dom__summary
0.6%  graphics_WebGLAquarium/avg_fps_1000_fishes__summary
0.4%  kraken/Total__summary
1.1%  octane/Total__Score
1.9%  page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
2.5%  page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
1.7%  smoothness.tough_webgl_cases/percentage_smooth__summary
0.3%  speedometer/Total__summary

Comment 3 by laszio@chromium.org, Feb 20 2017

The maximum standard deviations across all boards and images are:

6.4%  dromaeo.domcoreattr
1.1%  dromaeo.domcoremodify
0.8%  graphics_WebGLAquarium
1.9%  kraken
1.5%  octane
2.5%  page_cycler_v2.typical_25
2.5%  smoothness.tough_webgl_cases
0.7%  speedometer

To achieve a 90% confidence level with a ±2% margin of error, page_cycler_v2.typical_25 and dromaeo.domcoreattr would require 5 and 28 samples respectively, which is not affordable in terms of time. I'm going to remove outliers and file another bug to investigate why these benchmarks show such high variation. For now, the following standard deviations will be used:

2.3%  dromaeo.domcoreattr
2.1%  page_cycler_v2.typical_25
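
As a rough sketch of what these numbers imply, the script below applies the normal-approximation formula from Comment 1 to the maxima listed above, with the two adjusted values substituted. The function name and the dictionary are illustrative only, and the iteration counts actually committed to crosperf may differ.

```python
import math
from statistics import NormalDist

# Standard deviations (%) from this comment: the per-benchmark maxima,
# with the two outlier-adjusted values substituted.
STDDEV_PERCENT = {
    'dromaeo.domcoreattr': 2.3,
    'dromaeo.domcoremodify': 1.1,
    'graphics_WebGLAquarium': 0.8,
    'kraken': 1.9,
    'octane': 1.5,
    'page_cycler_v2.typical_25': 2.1,
    'smoothness.tough_webgl_cases': 2.5,
    'speedometer': 0.7,
}


def required_iterations(stddev, confidence=0.90, margin=2.0):
    """Iterations for the given confidence and margin (both in percent units)."""
    # isf(q) of N(0, 1) is the same as inv_cdf(1 - q).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always run at least once, even for very stable benchmarks.
    return max(1, math.ceil((z * stddev / margin) ** 2))


for name, stddev in STDDEV_PERCENT.items():
    print(f'{required_iterations(stddev):2d}  {name}')
```

With the unadjusted maxima, required_iterations(6.4) gives 28 and required_iterations(2.5) gives 5, matching the figures quoted above. The adjusted values bring these down to 4 and 3.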

Comment 4 by laszio@chromium.org, Feb 20 2017

Cc: cmt...@chromium.org llozano@chromium.org

Comment 5 by bugdroid1@chromium.org, Feb 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/toolchain-utils/+/8332364c0237ca6c4976c5206346ab9a596c8e98

commit 8332364c0237ca6c4976c5206346ab9a596c8e98
Author: Ting-Yuan Huang <laszio@google.com>
Date: Wed Feb 22 00:16:41 2017

crosperf: set recommended iterations for benchmarks

This CL associates estimated standard deviations to each benchmark,
according to experiments. The recommended iterations can be specified by
setting iterations = 0 in the experiment files. Setting it to numbers
greater than 0 will override the default iterations and behaves exactly
the same as before.

With this change, benchmarks in all_toolchain_perf get no more than 2%
margin of error within 90% of time. See crbug.com/673558 for how the
standard deviations are estimated.

BUG=chromium:673558
TEST=all_toolchain_perf + page_cycler_v2.typical_25 finishes in 3.5
     hours for an image on chell.

Change-Id: Ie2ed07878c1237ad31a8568ae3fd3fb96cd11f3b
Reviewed-on: https://chromium-review.googlesource.com/424915
Commit-Ready: Ting-Yuan Huang <laszio@chromium.org>
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Caroline Tice <cmtice@chromium.org>

[modify] https://crrev.com/8332364c0237ca6c4976c5206346ab9a596c8e98/crosperf/benchmark.py
[modify] https://crrev.com/8332364c0237ca6c4976c5206346ab9a596c8e98/buildbot_test_toolchains.py
[modify] https://crrev.com/8332364c0237ca6c4976c5206346ab9a596c8e98/crosperf/settings_factory.py

Comment 6 by laszio@chromium.org, Feb 27 2017

Status: Verified (was: Started)
