
Issue 629487 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Feature




tweaks to reduce noise in perf benchmarks in Linux (scaling governor, min/max freq, turbo boost, P states)

Project Member Reported by primiano@chromium.org, Jul 19 2016

Issue description

Filing this to keep track of it on a more persistent medium.
In [1] I did some quick research about tuning various parameters (scaling governor, frequency, P states) in order to get less noisy benchmarks.
There are three potential action items here:
 - 1. Having a perf_envsetup script in catapult shared by bots and developers (see nduca suggestion).
 - 2. Improving the kind of tweaks we do in the perf bots today.
 - 3. Figuring out the Windows equivalent of this black magic. I can only do Linux black magic.

TL;DR
On a Z620 cc_perftests's stddev went from 2% to 0.07% using some tweaks. Full data in [2]. The summary is:

- Affinity makes a big difference
- SCHED_FIFO a bit, but not that much
- performance governor seems to make things actually worse
- With the performance governor, scaling_max_freq seems to be ignored. Proof:
$ for cpu in /sys/devices/system/cpu/cpu*; do sudo sh -c "echo performance > $cpu/cpufreq/scaling_governor"; done
$ for cpu in /sys/devices/system/cpu/cpu*; do sudo sh -c "echo 1200000 > $cpu/cpufreq/scaling_max_freq"; done
$ for cpu in /sys/devices/system/cpu/cpu*; do echo -en "$cpu\t"; sudo cat $cpu/cpufreq/scaling_cur_freq; done
  /sys/devices/system/cpu/cpu0    3100015
  /sys/devices/system/cpu/cpu1    3100015
  /sys/devices/system/cpu/cpu10   3099906
  /sys/devices/system/cpu/cpu11   3100015

- With "powersave", by contrast, scaling_max_freq seems to be respected. Proof:
$ for cpu in /sys/devices/system/cpu/cpu*; do echo -en "$cpu\t"; sudo cat $cpu/cpufreq/scaling_cur_freq; done
  /sys/devices/system/cpu/cpu0    1199953
  /sys/devices/system/cpu/cpu1    1199953
  /sys/devices/system/cpu/cpu10   1199953
  /sys/devices/system/cpu/cpu11   1199953
  /sys/devices/system/cpu/cpu12   1199953

- Disabling all the P-states (except P0) makes the biggest difference (one order of magnitude in stddev):
sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"
for cpu in /sys/devices/system/cpu/cpu*; do for p in $(seq 4); do sudo sh -c "echo 1 > $cpu/cpuidle/state$p/disable"; done; done
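
The first two bullets of the summary (affinity and SCHED_FIFO) come without commands. A minimal sketch of how a benchmark could be launched pinned to one core under SCHED_FIFO, using the standard util-linux tools taskset and chrt. The helper name run_pinned, the DRY_RUN switch, the CPU index and the priority are all assumptions for illustration, not what was used to produce the numbers in [2]:

```shell
# Sketch only: run_pinned is a hypothetical helper, not an existing catapult tool.
# Usage: run_pinned CPU FIFO_PRIORITY COMMAND [ARGS...]
# Set DRY_RUN=1 to print the command line instead of executing it.
run_pinned() {
  local cpu="$1" prio="$2"
  shift 2
  # taskset -c pins the process to the given CPU; chrt -f runs the command
  # under SCHED_FIFO with the given priority (1-99), which requires root.
  ${DRY_RUN:+echo} sudo taskset -c "$cpu" chrt -f "$prio" "$@"
}
```

For example, `run_pinned 2 50 ./cc_perftests` would pin the test binary to CPU 2 at FIFO priority 50. Note that this runs the benchmark itself as root, which may not be acceptable for every harness.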


nduca@ suggested in a follow up: 
"""From my very high level view of things, I think we should move toward a world where catapult has a subsystem in it ("environment setup"?) that sets up the local machine for the best possible perf run possible. It might be parameterized in a few ways to accommodate power, for instance, EnvironmentSetup.Init(is_power_test=False). Then we'd use that in all the places, telemetry of course, but other harnesses as well.
We know that local dev machines are varied and often problematic. I think it's great for the environment setup code to detect bad-for-perf things and try to disable them. But, if they can't be disabled, it should at least try to stop the developer from running. I think true success here looks like us steering people to use perf trybots more and use local runs less. E.g., better if environment setup says "we can't make your machine produce good results, you're running some stuff we can't disable. please use the perf bots. if you are sure, you can bypass this by saying --ignore-environment-setup-failure."""
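
To make the "detect, refuse, allow bypass" flow concrete, here is a rough shell sketch. Everything in it is an assumption for illustration: the function names, the SYSFS_CPU override (there so the checks can be pointed at a fake tree), and the choice of checks (turbo disabled via intel_pstate, powersave governor, following the findings above). Only the flag name --ignore-environment-setup-failure comes from the suggestion itself.

```shell
# Sketch only: names and checks are hypothetical, not an existing catapult API.
SYSFS_CPU="${SYSFS_CPU:-/sys/devices/system/cpu}"

# Prints one line per problem found; returns non-zero if any were found.
check_perf_env() {
  local problems=0 f
  f="$SYSFS_CPU/intel_pstate/no_turbo"
  if [ -r "$f" ] && [ "$(cat "$f")" != "1" ]; then
    echo "turbo boost is enabled ($f)"
    problems=$((problems + 1))
  fi
  for f in "$SYSFS_CPU"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue
    if [ "$(cat "$f")" != "powersave" ]; then
      echo "unexpected governor in $f: $(cat "$f")"
      problems=$((problems + 1))
    fi
  done
  [ "$problems" -eq 0 ]
}

# Runs the checks and refuses to continue unless the bypass flag is given.
ensure_perf_env() {
  if check_perf_env; then
    return 0
  fi
  if [ "${1:-}" = "--ignore-environment-setup-failure" ]; then
    echo "continuing despite environment problems" >&2
    return 0
  fi
  echo "environment not suitable for perf runs; please use the perf bots" >&2
  return 1
}
```

A harness would call `ensure_perf_env "$@" || exit 1` before starting a run, so the same gate applies to developers and bots.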

I personally really like the idea of having one perf_envsetup script which is shared by humans and bots, so that everybody tests under the same conditions.

[1] https://groups.google.com/a/chromium.org/d/msg/project-trim/uhtD-UIvjLA/RtAd1k6KBQAJ
[2] https://docs.google.com/spreadsheets/d/1xAxScjSHWht-ftiag1ppiAedM2aLON_BJcafWEC9Xck/edit?usp=sharing 
 
Labels: Performance
Cc: stip@chromium.org friedman@chromium.org iannucci@chromium.org pschmidt@chromium.org
We should definitely do a perf_envsetup!

Adding friedman and pschmidt and iannucci and stip for thoughts on how we could implement it so that we can easily do the same environment setup locally and on the various bots we run on (chromium.perf, chromium.perf.fyi, tryserver.chromium.perf try/bisect)
One thing to keep in mind is that we "sorta have kind of this" for android in 
catapult/devil/devil/android/perf/perf_control.py

That has the right "spirit" (all the tech seems good) but cannot be easily launched by developers. I think this is just missing a frontend (concretely that thing has no __main__), possibly the same frontend that we'd use for all these Linux tricks here.
Components: Tests>Telemetry

Comment 5 by pschm...@google.com, Jul 19 2016

Cc: -iannucci@chromium.org mtrofin@chromium.org
Not sure if this helps but it looks like the v8 folks are doing something similar.   (See https://bugs.chromium.org/p/chromium/issues/detail?id=606804 for context)   


Comment 6 by pschm...@google.com, Jul 19 2016

Cc: iannucci@chromium.org
Cc: perezju@chromium.org
On Z840 / Haswell Xeon v3, there's no intel_pstate/no_turbo. Instead, the following should be used to disable turbo:

sudo sh -c "echo 0 > /sys/devices/system/cpu/cpufreq/boost"
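
Since the knob differs between machines, a shared script would probably need to probe for it. A minimal sketch, assuming only the two sysfs files mentioned in this bug exist as options; the function names and the SYSFS_CPU override are hypothetical:

```shell
# Sketch only: helper names are hypothetical; the two paths are the ones
# mentioned in this bug (intel_pstate/no_turbo and cpufreq/boost).
SYSFS_CPU="${SYSFS_CPU:-/sys/devices/system/cpu}"

# Prints "<path> <value>" for the knob that disables turbo on this machine,
# or returns 1 if neither known knob exists.
turbo_disable_knob() {
  if [ -e "$SYSFS_CPU/intel_pstate/no_turbo" ]; then
    echo "$SYSFS_CPU/intel_pstate/no_turbo 1"
  elif [ -e "$SYSFS_CPU/cpufreq/boost" ]; then
    echo "$SYSFS_CPU/cpufreq/boost 0"
  else
    return 1
  fi
}

# Writes the disabling value to whichever knob was found.
disable_turbo() {
  local knob path value
  knob=$(turbo_disable_knob) || return 1
  path=${knob% *}     # everything before the last space
  value=${knob##* }   # everything after the last space
  if [ -w "$path" ]; then
    echo "$value" > "$path"
  else
    # Real sysfs files are normally root-only, so fall back to sudo.
    echo "$value" | sudo tee "$path" > /dev/null
  fi
}
```

Note the inverted meaning of the two files: no_turbo takes 1 to disable turbo, while boost takes 0.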
Components: Speed>Tracing
Components: -Internals>Tracing

Comment 11 by stip@chromium.org, Feb 10 2017

Cc: -stip@chromium.org
Labels: -Performance
Status: Available (was: Untriaged)
Project Member

Comment 13 by sheriffbot@chromium.org, Apr 30 2018

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Owner: charliea@chromium.org
This is sorta on Charlie's plate now :-)
Status: Assigned (was: Untriaged)
This bug has an owner, thus, it's been triaged. Changing status to "assigned".
Owner: brucedaw...@chromium.org
Kicking this over to Bruce, the new owner of the power benchmarks. He should be the one to decide whether this is worth pursuing.
Cc: -iannucci@chromium.org iannu...@google.com
Cc: -eakuefner@chromium.org
Components: Test>Telemetry
Components: -Tests>Telemetry
