puppet causing noise on Telemetry Mac power benchmarks due to periods of ~80% CPU usage
Issue description
It seems that puppet is causing noise in our Telemetry Mac power benchmarks (mac-rel-retina). Looking at the perf dashboard for average power use while sitting idle on about:blank (see puppet_ran.png), there are clear spikes for particular benchmark runs. Digging into the trace for one such run (see puppet.html), there's a clear section towards the end of the trace where the power spikes. Our CPU snapshotting mechanism shows that, during this time, puppet is running and consuming 80% of the CPU (see puppet_cpu_snapshot.png). Compare this to one of the lower-power traces (see non_puppet.html), where no such high-power area exists.
maruel@, is there any way that we can prevent puppet from running during benchmarks so that it doesn't introduce noise into the results? Do you have any idea who I might be able to talk to about this?
,
Jan 17 2017
,
Jan 17 2017
Robbie worked on a plan for this but that will be only for tasks run on Swarming.
,
Jan 17 2017
Dirk, for background: Charlie is working on power benchmarks using BattOr. Power benchmarks are even more sensitive to background noise than regular benchmarks. We've added a "CPU snapshot" feature into tracing where we can look at spikes in power usage that seem unrelated to Chrome and correlate them with other processes on the machine that were running and using CPU. For this particular issue, it's puppet, but labs recently updated their version of gagent to fix a similar issue. The long term plan for this has been swarming dark mode:
https://docs.google.com/document/d/1xPhPCtXXsRSqY74hh4DjAt2hvNDiq0w1XLe7VmumpkA/edit
Our desktop bots are now swarmed, and Charlie is primarily looking at ways to reduce noise on desktop.
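For readers unfamiliar with the idea, the correlation described above can be sketched roughly as follows. This is only an illustration and not the actual tracing or BattOr code: the `PowerSample` and `CpuSnapshot` types, the function names, and the thresholds are all hypothetical, and the real feature works on trace events rather than plain lists.

```python
from dataclasses import dataclass


# Hypothetical record types; the real trace format is not shown in this thread.
@dataclass
class PowerSample:
    timestamp_s: float
    watts: float


@dataclass
class CpuSnapshot:
    timestamp_s: float
    process: str
    cpu_percent: float


def find_spike_windows(samples, threshold_watts):
    """Return (start, end) time ranges where power stays above the threshold."""
    windows = []
    start = None
    last = None
    for sample in samples:
        if sample.watts > threshold_watts:
            if start is None:
                start = sample.timestamp_s
            last = sample.timestamp_s
        elif start is not None:
            # The spike ended at the previous above-threshold sample.
            windows.append((start, last))
            start = None
    if start is not None:
        windows.append((start, last))
    return windows


def busy_processes_during(windows, snapshots, min_cpu_percent, ignore=("Chrome",)):
    """Map each non-ignored process seen inside a spike window to its peak CPU %."""
    culprits = {}
    for snap in snapshots:
        if snap.process in ignore or snap.cpu_percent < min_cpu_percent:
            continue
        if any(start <= snap.timestamp_s <= end for start, end in windows):
            previous = culprits.get(snap.process, 0.0)
            culprits[snap.process] = max(previous, snap.cpu_percent)
    return culprits
```

Calling `find_spike_windows` on the power samples and passing the result to `busy_processes_during` gives a short list of non-Chrome processes that were busy while power was elevated, which is the kind of evidence that pointed at puppet here.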
,
Jan 17 2017
Given that all perf benchmarks will soon be run on swarmed bots (still waiting on you, Android), a plan that addresses this on only swarmed bots is fine with me. However, it is important to me that this gets resolved in the short term and that we don't have to wait on a distant long-term fix.
,
Jan 24 2017
Assigning to friedman, but he's still out on leave. Given that we need this when Android is on Swarming, let's make sure Ben and Elliott sign off before moving all of Android infra to swarming.
,
Jan 24 2017
This has nothing to do with Android, it's about Mac, so not a blocker for android perf -> swarming.
,
Jan 24 2017
benhenry@, do you have an estimate of when this might be fixed? As the graphs above show, this is adding quite a bit of noise to BattOr power benchmarks.
,
Jan 25 2017
benhenry@ and I talked offline, and he confirmed my suspicion that friedman@ isn't likely to get to this within the next week or two because he's not due back from paternity leave until 1/30. I told benhenry@ that I'd really like this done within a month from now, but it isn't ruining our results so much that it's a P1.
,
Jan 30 2017
Is this not a dupe of crbug.com/416072 ?
,
Jan 31 2017
Sorry - I filed it as a separate bug because it wasn't clear to me if Puppet even *should* be consuming as much CPU as it currently is. I know that when we noticed this for gagent, it was a simple bug that only required a roll of gagent in order to fix it. In an ideal world, the fix would be as easy as that for puppet.
,
Mar 1 2017
Alright, we really need to move the ball forward here. Elliott - what question are you waiting for speed to answer? Please restate explicitly.
,
Mar 1 2017
,
Mar 1 2017
Comment 10, this seems like a dupe of the main issue.
,
Mar 2 2017
Comment 1 by charliea@chromium.org, Jan 17 2017