Can we baseline BattOr measurements against idle system power (no Chrome)?
Issue description
That should make it easier to notice changes to Chrome's power usage, and it seems like the more relevant number.
Dec 16 2016
If BattOr power measurements are stable, can't we just measure for 2 seconds before launching Chrome, with no other code running in the background?
Dec 16 2016
That's exactly what we tried to do with the MSRs. Every time we've tried in the past to report a metric as a difference from a baseline like this (power minus idle baseline, ToT minus stable reference build), we've run into big problems: it made the metric less stable and harder to interpret from the graph. What is the reason for wanting to isolate just the power Chrome consumes? If the system power consumption is stable, it shouldn't affect our metrics, because it's just a constant offset. If it's not stable, I'd prefer to understand that better by looking at which background processes are running, via tools like CPU snapshotting.
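Why subtracting a baseline can make a metric noisier can be shown with a minimal sketch. It assumes the total reading and the idle baseline come from separate measurement windows with independent, roughly Gaussian noise: under that assumption the variances add, so the difference is noisier than either input. A perfectly constant baseline (zero spread) only shifts the level. The function names, the 15 W/5 W levels, and the 0.2 W noise figure are hypothetical.

```python
import math
import random
import statistics


def stddev_of_difference(sd_total_w: float, sd_baseline_w: float) -> float:
    """Std dev of (total - baseline), assuming independent noise: variances add."""
    return math.sqrt(sd_total_w ** 2 + sd_baseline_w ** 2)


def simulated_stddev_of_difference(sd_total_w: float, sd_baseline_w: float,
                                   n: int = 100_000, seed: int = 0) -> float:
    """Empirical check: draw independent noisy readings and measure the spread
    of their difference. The mean levels (hypothetical 15 W total, 5 W idle)
    shift the difference but do not affect its spread."""
    rng = random.Random(seed)
    diffs = [rng.gauss(15.0, sd_total_w) - rng.gauss(5.0, sd_baseline_w)
             for _ in range(n)]
    return statistics.pstdev(diffs)


if __name__ == "__main__":
    # Hypothetical: 0.2 W of noise on both the total and the baseline.
    print(round(stddev_of_difference(0.2, 0.2), 3))  # 0.283
    print(round(simulated_stddev_of_difference(0.2, 0.2), 2))
    # A perfectly stable baseline leaves the noise unchanged.
    print(stddev_of_difference(0.2, 0.0))
```

In this model, subtracting a noisy baseline raises 0.2 W of noise to about 0.28 W. This is the direction of the instability described above, though real BattOr noise need not be independent or Gaussian.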
Dec 16 2016
tl;dr: If we have low noise, we should subtract out the system idle power. If we can't subtract out system idle power, noise is probably higher than we think it is.
I think that with a Chrome-only measurement, we can get a sense of what we should be aiming for. If total draw = 15W, I can't tell whether this is 2W Chrome and 13W system, or vice versa. So is a 0.5W improvement to Chrome a 25% improvement or a 4% improvement?
In my mind, instantaneous power draw = system baseline + system noise + Chrome baseline + Chrome noise. If BattOr power measurements are stable, I expect both system noise and Chrome noise to be low, which is why I was suggesting subtracting out system baseline + system noise. If system noise is high, then I agree we can't subtract it out, but then perhaps the total measurements are not as stable as we think they are?
Looking at: https://chromeperf.appspot.com/report?sid=7559ae1e32600c7b5b27908d1b16217fa8766160cd0e8024db07c9689954ee6e&start_rev=433359&end_rev=439116
There is run-to-run noise [after 436479] of ~0.2W [totally made-up number from rough eyeballing]. The largest single-run difference I found was 0.5W. This looks "stable" because it's relatively small compared to 15W. If Chrome is responsible for 13W, then I agree, we have a low-noise measurement. But if Chrome is responsible for 2W, then we have 10% noise, and it's going to be really hard to discern 5% improvements.
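The arithmetic above fits in a few lines of code. This is a hypothetical helper, not part of Telemetry or TBMv2. The 2 W/13 W split and the ~0.2 W noise are the eyeballed figures from this comment. The rule that an improvement is only discernible if it saves more watts than the run-to-run noise is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical split of total draw between Chrome and the rest of the system."""
    chrome_w: float
    system_w: float

    @property
    def total_w(self) -> float:
        return self.chrome_w + self.system_w

    def improvement_share(self, saved_w: float) -> float:
        """How big a saving of saved_w is relative to Chrome's own draw."""
        return saved_w / self.chrome_w

    def noise_share(self, noise_w: float) -> float:
        """Run-to-run noise expressed as a fraction of Chrome's draw."""
        return noise_w / self.chrome_w

    def detectable(self, improvement_fraction: float, noise_w: float) -> bool:
        """Rule of thumb (an assumption): an improvement is discernible only if
        the watts it saves exceed the run-to-run noise."""
        return improvement_fraction * self.chrome_w > noise_w


if __name__ == "__main__":
    noise_w = 0.2  # eyeballed run-to-run noise from the linked graph
    for s in (Scenario(chrome_w=2.0, system_w=13.0),
              Scenario(chrome_w=13.0, system_w=2.0)):
        print(f"Chrome {s.chrome_w} W of {s.total_w} W: "
              f"0.5 W saved = {s.improvement_share(0.5):.0%}, "
              f"noise = {s.noise_share(noise_w):.0%}, "
              f"5% improvement detectable: {s.detectable(0.05, noise_w)}")
```

With the 2 W split, a 5% Chrome improvement saves only 0.1 W, which is below the 0.2 W noise. With the 13 W split, the same improvement saves 0.65 W.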
Dec 16 2016
Maybe a good place to start is to collect the "baseline" as a separate metric, so we can plot it and use that as a starting point? Charlie, Ned, WDYT?
Dec 16 2016
Baselining to get a sense of how much power Chrome is consuming at the moment sounds good to me, but I'm not convinced it needs to be part of lab monitoring unless Chrome's power usage is only around 1% of the system's. As a first step, could someone just do a local BattOr run to figure out how much Chrome is consuming vs. how much the system is consuming?
Dec 29 2016
I think I agree with Ned here. It seems like there are two possibilities:
1) Chrome has a negligible impact on overall power, so in order to discern improvements, we need to subtract out the baseline power. In this case, a 50% improvement in Chrome's power draw would only result in something like a 5% or 1% improvement in overall power. While I would be thrilled if this were the case, we know from testing that it's generally not.
2) Chrome has a significant impact on overall power, so we don't need to subtract out baseline power in order to discern improvements. A 50% improvement in Chrome's power might result in a 25% improvement in overall power. This seems to be the current state of the world.
A couple of other thoughts:
- Erik, I agree with you that it'd be hard to discern a 5% improvement in Chrome power in the linked graph if Chrome is only responsible for 2W. However, if Chrome is only responsible for 2W in that graph, I'd question why we're spending effort on 5% improvements to that number. That might make sense a ways down the road, but I strongly suspect that we have bigger fish to fry at this point.
- Such a subtraction would require time in the user stories (sitting idle for 1s per user story would add overhead).
- The metric that results from subtracting out the baseline power would only make sense for Telemetry tests, where we could guarantee that we'd have such an idle period. For interactively collected traces, there'd be no such guarantee, so we'd have a hard time computing the same metrics for interactive traces that we compute for Telemetry traces. This goes against the spirit of TBMv2: you shouldn't have to collect a trace in a lab environment in order to compute metrics on it.
- Implementing this would take some time, and we have bigger fish to fry in the foreseeable future. We still have to fix problems where serial communication fails when requesting the firmware version (https://bugs.chromium.org/p/chromium/issues/detail?id=673411), where the BattOr agent on Mac fails to adequately flush the serial stream, resulting in multiple failed test runs in a row (https://bugs.chromium.org/p/chromium/issues/detail?id=677303), and, worst of all, where BattOrs just reset when they're told to start tracing (https://bugs.chromium.org/p/chromium/issues/detail?id=672631). Unfortunately, we have limited resources, so we're forced to prioritize, and each of those comes in at a higher priority than this, IMO.
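The two possibilities differ only in Chrome's share of total power. The sketch below shows that relationship and assumes Chrome's savings don't change the rest of the system's draw. The shares used (10%, 2%, 50%) are the ones implied by the percentages in this comment, not measurements, and the function name is hypothetical.

```python
def overall_improvement(chrome_share: float, chrome_improvement: float) -> float:
    """Fractional drop in total power when Chrome's own draw drops by
    chrome_improvement and Chrome accounts for chrome_share of the total.
    Assumes the rest of the system's draw is unaffected."""
    if not 0.0 <= chrome_share <= 1.0:
        raise ValueError("chrome_share must be in [0, 1]")
    return chrome_share * chrome_improvement


if __name__ == "__main__":
    # Possibility 1: Chrome is a small slice of total power.
    for share in (0.10, 0.02):
        print(f"Chrome = {share:.0%} of total -> "
              f"{overall_improvement(share, 0.5):.0%} overall")
    # Possibility 2: Chrome is roughly half of total power.
    print(f"Chrome = 50% of total -> {overall_improvement(0.5, 0.5):.0%} overall")
```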
Dec 29 2016
+sullivan. Let me try to be more precise. Total BattOr power readings = idle power + Chrome power. Each of these has an underlying distribution that we sample from at BattOr's sampling frequency, and we know both distributions have some noise. Let's define IPSTD as the standard deviation of idle power.
Claim 1: Any improvement to Chrome smaller than IPSTD will be very hard to discern using BattOr [and, likewise, regressions of that size will be hard to track down].
If IPSTD were low, we could easily subtract out idle power from total power. The fact that we can't suggests that IPSTD isn't low. Even if we're not going to subtract out idle power, it's very important that we know what IPSTD is, so we can determine the sensitivity of our tests. When we roll out "improvements" to Chrome power draw, we need to be able to measure them [if we can't, then we also won't know when they regress]. If IPSTD is too high, then we simply can't use BattOr as our single source of truth. We can still use it to measure long-term trends, and it will catch *huge* improvements/regressions, but we'll need another measurement with higher sensitivity [e.g. thread_times TBMv2? Or something else...].
If we are going to use thread_times, then we should get a basic understanding of the effect of CPU usage on power draw: https://bugs.chromium.org/p/chromium/issues/detail?id=674966#c4
I've also been mulling over the idea of a pseudo-idle-wakeups metric, which we could compute with TBMv2 [e.g. if there is an X-second interval with no tasks followed by a task, increment pseudo-idle-wakeups].
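The pseudo-idle-wakeups idea can be computed in a single pass over task intervals sorted by start time. Everything here is hypothetical and is not an existing TBMv2 metric: the function name, the (start, end) tuple input, and the choice to ignore the gap before the first task are all assumptions. X is the min_idle_s parameter.

```python
from typing import Iterable, Tuple

Task = Tuple[float, float]  # hypothetical (start_s, end_s) of one task


def count_pseudo_idle_wakeups(tasks: Iterable[Task], min_idle_s: float) -> int:
    """Count tasks that start after at least min_idle_s with no task running.

    Tasks may overlap (e.g. on different threads), so idleness is measured
    from the latest end time seen so far, not from the previous task's end.
    The gap before the very first task is not counted.
    """
    count = 0
    latest_end = None
    for start, end in sorted(tasks):
        if latest_end is not None and start - latest_end >= min_idle_s:
            count += 1  # a task woke the process after an idle interval
        latest_end = end if latest_end is None else max(latest_end, end)
    return count


if __name__ == "__main__":
    tasks = [(0.0, 1.0), (1.5, 2.0), (5.0, 6.0), (6.1, 7.0), (20.0, 21.0)]
    # Gaps of 3.0 s and 13.0 s exceed X = 2 s; gaps of 0.5 s and 0.1 s do not.
    print(count_pseudo_idle_wakeups(tasks, min_idle_s=2.0))  # 2
```

Tracking the latest end time, rather than only the previous task's end, keeps a short task nested inside a long one from producing a spurious "idle" gap.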
Dec 29 2016
As a relatively easy first step, can we measure and emit idle power draw along with our other metrics?
Dec 29 2016
I think what you're describing sounds very similar to what story:power_min is *supposed* to be. In theory, that metric gives the minimum instantaneous power measurement over the course of the story. If we had this metric, along with a short period of idleness without Chrome running at the start of the benchmark (0.1s? 0.25s? 0.5s?), do you think that'd be sufficient for identifying the idle power draw? (Note: idle:power_avg or idle:power_min seem like good candidates for the machine's idle power draw, but these metrics are intended to give the power while *Chrome* is idle, not the machine.) It's also worth noting that there's an outstanding feature request to make the metric actually work this way (https://github.com/catapult-project/catapult/issues/2990). Right now, power_min reports exactly the same thing as power_avg, for reasons that are sensible in the context of TBMv2 metrics but completely baffling to users.
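Here is why the two numbers should differ once power_min works as intended, shown on hypothetical (timestamp, watts) samples. The sketch assumes the story begins with a short Chrome-free idle period and that samples are evenly spaced. Under those assumptions, the minimum sample lands near the machine's idle draw, whereas the average mixes idle and active power. The function names are made up for illustration and are not the catapult implementation.

```python
from statistics import fmean
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # hypothetical (timestamp_s, power_w)


def story_power_min(samples: Sequence[Sample]) -> float:
    """Minimum instantaneous power over the story (what story:power_min is
    *supposed* to report; illustrative only)."""
    return min(power for _, power in samples)


def story_power_avg(samples: Sequence[Sample]) -> float:
    """Mean power over the story, assuming evenly spaced samples."""
    return fmean(power for _, power in samples)


if __name__ == "__main__":
    # Hypothetical: two idle samples before Chrome launches, then Chrome activity.
    samples = [(0.0, 5.1), (0.05, 5.0), (0.1, 7.4), (0.15, 9.0), (0.2, 7.2)]
    print(story_power_min(samples))  # 5.0
    print(round(story_power_avg(samples), 2))  # 6.74
```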
Dec 29 2016
Yes, I think that story:power_min and an idle period of 0.1s should be enough signal to determine the idle power draw [and noise therein]. Thanks for following up!
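Given samples from a short Chrome-free window, the idle draw and its noise (the IPSTD discussed above) are a mean and a sample standard deviation. The sketch below assumes timestamped samples and a known Chrome launch time. The helper name and sample values are hypothetical; only the 0.1s window comes from this thread.

```python
import statistics
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # hypothetical (timestamp_s, power_w)


def idle_baseline(samples: Sequence[Sample], launch_s: float,
                  window_s: float = 0.1) -> Tuple[float, float]:
    """Mean idle power and its standard deviation (IPSTD), using only samples
    in the window that ends at Chrome's launch time."""
    idle = [p for t, p in samples if launch_s - window_s <= t < launch_s]
    if len(idle) < 2:
        raise ValueError("need at least two idle samples to estimate IPSTD")
    return statistics.fmean(idle), statistics.stdev(idle)


if __name__ == "__main__":
    samples = [(0.0, 5.1), (0.02, 5.0), (0.04, 5.2), (0.06, 5.1), (0.08, 5.1),
               (0.1, 7.3), (0.12, 7.3)]  # Chrome launches at t = 0.1 s
    mean_w, ipstd_w = idle_baseline(samples, launch_s=0.1)
    print(f"idle = {mean_w:.2f} W, IPSTD = {ipstd_w:.3f} W")
```

A window as short as 0.1s holds relatively few samples, so the IPSTD estimate from a single run may itself be noisy. Looking at it across many runs would give a steadier picture.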
Jan 10 2017
No problem! I've verified that my changes to the story:power metric are working as intended, as seen in this graph: https://chromeperf.appspot.com/report?sid=e7ae5b3bb54879cd43f168d8a66265376a24198a611e5a88ad8bb7ef7b7a0374
It seems like on these MacBook Pros, the baseline power is very consistently around 5.1W, whereas Chrome adds about 2.2W when sitting idle on about:blank. Mind if we close this bug, see how this works for now, and come back to it later if necessary?
Jan 17 2017
sgtm.
Comment 1 by sullivan@chromium.org, Dec 16 2016