Fine tuning iterations in crosperf nightly tests.
Issue description
crosperf runs a set of benchmarks on several boards every day. Each benchmark is repeated 2 or 3 times. However, some benchmarks are quite stable while others are flaky, so the results are unreliable for the flaky ones and the repetitions are redundant for the stable ones. To guarantee a confidence level (p) with a predefined margin of error (e), given the standard deviation (d) of a benchmark, the required number of samples is square(isf((1 - p) / 2) * d / e), where isf is the inverse survival function of the standard normal distribution. The expected outcomes are:
1) flaky benchmarks made stable, by allocating more iterations.
2) stable benchmarks cost less time, by reducing iterations.
I'll also review how the scores are calculated in these test suites.
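The sample-size formula above can be sketched in Python as follows. This is only an illustration, not crosperf's actual code: the function name required_iterations is hypothetical, and isf is assumed to be the inverse survival function of the standard normal distribution, computed here with the standard library's statistics.NormalDist.

```python
import math
from statistics import NormalDist


def required_iterations(p, e, d):
    """Return the number of samples needed for confidence level p.

    p: confidence level, e.g. 0.90
    e: margin of error relative to the mean, e.g. 0.02 for +-2%
    d: relative standard deviation of the benchmark, e.g. 0.025 for 2.5%
    (Hypothetical helper; the name and signature are assumptions.)
    """
    # isf(q) of the standard normal equals inv_cdf(1 - q).
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    # Round up, since a fractional iteration cannot be run.
    return math.ceil((z * d / e) ** 2)


print(required_iterations(0.90, 0.02, 0.025))
print(required_iterations(0.90, 0.02, 0.064))
```

With p = 90% and e = 2%, a standard deviation of 2.5% calls for 5 iterations and a standard deviation of 6.4% calls for 28.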
Feb 20 2017
Standard deviations of 10 iterations on R57-9077.0

======= elm / llvm_trybot_image =======
2.3% dromaeo.domcoreattr/dom__summary
1.0% dromaeo.domcoremodify/dom__summary
0.8% graphics_WebGLAquarium/avg_fps_1000_fishes__summary
1.9% kraken/Total__summary
0.8% octane/Total__Score
1.2% page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
1.7% page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
0.9% smoothness.tough_webgl_cases/percentage_smooth__summary
0.7% speedometer/Total__summary

======= elm / vanilla_image =======
1.7% dromaeo.domcoreattr/dom__summary
1.1% dromaeo.domcoremodify/dom__summary
0.7% graphics_WebGLAquarium/avg_fps_1000_fishes__summary
1.4% kraken/Total__summary
1.0% octane/Total__Score
1.5% page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
1.3% page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
1.3% smoothness.tough_webgl_cases/percentage_smooth__summary
0.3% speedometer/Total__summary

======= peppy / llvm_trybot_image =======
6.4% dromaeo.domcoreattr/dom__summary
0.7% dromaeo.domcoremodify/dom__summary
0.4% graphics_WebGLAquarium/avg_fps_1000_fishes__summary
0.5% kraken/Total__summary
0.9% octane/Total__Score
0.6% page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
0.8% page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
1.0% smoothness.tough_webgl_cases/percentage_smooth__summary
0.4% speedometer/Total__summary

======= peppy / vanilla_image =======
4.9% dromaeo.domcoreattr/dom__summary
0.7% dromaeo.domcoremodify/dom__summary
0.4% graphics_WebGLAquarium/avg_fps_1000_fishes__summary
0.4% kraken/Total__summary
1.2% octane/Total__Score
1.2% page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
1.0% page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
2.5% smoothness.tough_webgl_cases/percentage_smooth__summary
0.7% speedometer/Total__summary

======= squawks / llvm_trybot_image =======
1.8% dromaeo.domcoreattr/dom__summary
1.0% dromaeo.domcoremodify/dom__summary
0.5% graphics_WebGLAquarium/avg_fps_1000_fishes__summary
0.3% kraken/Total__summary
1.5% octane/Total__Score
2.0% page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
1.6% page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
1.4% smoothness.tough_webgl_cases/percentage_smooth__summary
0.2% speedometer/Total__summary

======= squawks / vanilla_image =======
2.0% dromaeo.domcoreattr/dom__summary
0.9% dromaeo.domcoremodify/dom__summary
0.6% graphics_WebGLAquarium/avg_fps_1000_fishes__summary
0.4% kraken/Total__summary
1.1% octane/Total__Score
1.9% page_cycler_v2.typical_25/pcv1-cold@@timeToOnload_avg__summary
2.5% page_cycler_v2.typical_25/pcv1-warm@@timeToOnload_avg__summary
1.7% smoothness.tough_webgl_cases/percentage_smooth__summary
0.3% speedometer/Total__summary
Feb 20 2017
The maximum standard deviations across all boards and images are:
6.4% dromaeo.domcoreattr
1.1% dromaeo.domcoremodify
0.8% graphics_WebGLAquarium
1.9% kraken
1.5% octane
2.5% page_cycler_v2.typical_25
2.5% smoothness.tough_webgl_cases
0.7% speedometer

To achieve CI = (90%, +-2%), the numbers of samples required by page_cycler_v2.typical_25 and dromaeo.domcoreattr are 5 and 28 respectively, which is not affordable in terms of time. I'm going to remove outliers and file another bug to investigate why they show such high variation. For now the following will be used:
2.3% dromaeo.domcoreattr
2.1% page_cycler_v2.typical_25
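As a rough check of the arithmetic, the maxima listed above (with the two outlier-adjusted values substituted) can be fed through the sample-size formula for a 90% confidence level and a +-2% margin of error. The dictionary and the helper name iterations_for are hypothetical, and the printed counts are only what the formula yields, not necessarily the values that end up in crosperf.

```python
import math
from statistics import NormalDist

# Maximum relative standard deviations in percent, taken from the list
# above, using the outlier-adjusted 2.3% and 2.1% for the two flaky suites.
MAX_STDDEV_PERCENT = {
    'dromaeo.domcoreattr': 2.3,
    'dromaeo.domcoremodify': 1.1,
    'graphics_WebGLAquarium': 0.8,
    'kraken': 1.9,
    'octane': 1.5,
    'page_cycler_v2.typical_25': 2.1,
    'smoothness.tough_webgl_cases': 2.5,
    'speedometer': 0.7,
}


def iterations_for(stddev_percent, confidence=0.90, margin_percent=2.0):
    """Hypothetical helper: samples needed for the given standard deviation."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * stddev_percent / margin_percent) ** 2)


recommended = {name: iterations_for(sd)
               for name, sd in MAX_STDDEV_PERCENT.items()}

for name, count in sorted(recommended.items()):
    print(f'{count:2d}  {name}')
```

Passing the unadjusted maxima instead, iterations_for(2.5) and iterations_for(6.4), reproduces the 5 and 28 samples quoted above.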
Feb 20 2017
Feb 22 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/toolchain-utils/+/8332364c0237ca6c4976c5206346ab9a596c8e98

commit 8332364c0237ca6c4976c5206346ab9a596c8e98
Author: Ting-Yuan Huang <laszio@google.com>
Date: Wed Feb 22 00:16:41 2017

crosperf: set recommended iterations for benchmarks

This CL associates estimated standard deviations to each benchmark, according to experiments. The recommended iterations can be specified by setting iterations = 0 in the experiment files. Setting it to numbers greater than 0 will override the default iterations and behaves exactly the same as before.

With this change, benchmarks in all_toolchain_perf get no more than 2% margin of error within 90% of time. See crbug.com/673558 for how the standard deviations are estimated.

BUG= chromium:673558
TEST=all_toolchain_perf + page_cycler_v2.typical_25 finishes in 3.5 hours for an image on chell.

Change-Id: Ie2ed07878c1237ad31a8568ae3fd3fb96cd11f3b
Reviewed-on: https://chromium-review.googlesource.com/424915
Commit-Ready: Ting-Yuan Huang <laszio@chromium.org>
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Caroline Tice <cmtice@chromium.org>

[modify] https://crrev.com/8332364c0237ca6c4976c5206346ab9a596c8e98/crosperf/benchmark.py
[modify] https://crrev.com/8332364c0237ca6c4976c5206346ab9a596c8e98/buildbot_test_toolchains.py
[modify] https://crrev.com/8332364c0237ca6c4976c5206346ab9a596c8e98/crosperf/settings_factory.py
Feb 27 2017
Comment 1 by laszio@chromium.org, Dec 13 2016