video_VDAPerf.h264: performance major regression on squawks
Issue description
ChromeOS Version range: 62.9901.32.0 - 63.9964.0.0
Chrome Version range: 62.0.3202.39 - 63.0.3222.0
https://chromeperf.appspot.com/report?masters=ChromeOSVideo&bots=cros-squawks&tests=video_VDAPerf.h264%2Fcrowd2160_h264.cpu_usage.kernel&checked=crowd2160_h264.cpu_usage.kernel%2Ccrowd2160_h264.cpu_usage.kernel_ref%2Cref&rev=32220000996400000
,
Oct 2 2017
,
Oct 3 2017
Taking ownership for investigation.
,
Oct 3 2017
,
Oct 3 2017
Raising priority as this is a quite large regression.
,
Oct 4 2017
Mmm, it seems like I cannot reproduce this on Clapper. I tried 62.9901.32.0, 63.9964.0.0 and a local ToT. For all three, crowd2160_h264.cpu_usage.kernel was consistent, around 0.27-0.28. hiroh@, do we have hard evidence that this is reproducible on Clapper? Otherwise I may need a Squawks device.
,
Oct 4 2017
Could reproduce the result of crowd2160_h264.cpu_usage.kernel == 0.82 on chromeos6-row2-rack12-host22.cros with R63-9997.0.0 (which was already flashed when I locked it). Flashing on the lab is quite slow though, so bisecting may take some time.
,
Oct 4 2017
Although I have never tried it myself, it should be reproducible on clapper according to the following graph. https://chromeperf.appspot.com/report?sid=6bc0bccc3a79a48d50d5dc2e9000a05827685442c74d33bd32b9013b9a425ce9&rev=32220000996400000
,
Oct 4 2017
Mmm, you're right. Let me double-check then, it is strange I could repro this on the lab but not on the local clapper. Btw, the graph timeline is strange: 2017-10-03 comes before 2017-09-23 for some reason. Can someone explain that?
,
Oct 4 2017
> Btw, the graph timeline is strange: 2017-10-03 comes before 2017-09-23 for some reason. Can someone explain that?
I think this is because the measured values are sorted by ChromeOS version together with Chrome version, not by date. The ChromeOS R62 branch stopped at 9901 and only increments the second number, e.g. 9901.36.0 -> 9901.37.0. Therefore, all values measured for 63.*.0.0 are followed by the ones measured for 62.9901.*.0. Otherwise, R62 and R63 ChromeOS builds would be scrambled on the graph.
,
Oct 4 2017
Very weird. I could test on a local Squawks, both R63-9964.0.0 and R63-10000.0.0. crowd2160_h264.cpu_usage.kernel is around 0.28 in both cases, same as the local Clapper. The difference seems to come from whether we are running the test locally or on a lab machine?
,
Oct 5 2017
Turns out the lab and local squawks on which I tested had different memory configurations:
Remote (chromeos6-row2-rack12-host22): MemTotal: 1963636 kB
Local: MemTotal: 3958800 kB
All the lab machines within pool:suite have the sku:squawks_intel_dual_2Gb label. Would be interesting to borrow (lock) one of the ones with sku:squawks_intel_dual_4Gb and see if we can repro on them, but they are not within the suite pool.
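For anyone wanting to compare devices the same way, here is a minimal sketch of reading the MemTotal value quoted above. It assumes the numbers come from /proc/meminfo on the device; the helper names mem_total_kb and read_mem_total_kb are made up for illustration.

```python
def mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:        1963636 kB".
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def read_mem_total_kb(path="/proc/meminfo"):
    """Read MemTotal from the local machine (Linux only)."""
    with open(path) as f:
        return mem_total_kb(f.read())
```

Running read_mem_total_kb() on both the lab host and the local device would make a 2GB vs 4GB mismatch visible before comparing any performance numbers.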
,
Oct 5 2017
We observed when running R63-9964.0.0 on the 4GB squawks that available memory was falling below 1.3GB, which suggests that 2GB devices would run out of memory and could explain the regression. However, after trying R62-9901.32.0 on the same device, I could observe the same metric. So excessive memory consumption does not seem to be the cause of the regression here.
However, after double-checking the metrics, I noticed that the regression *could* be observed on the 4GB device as well, contrary to what I initially stated:
R62-9901.32.0:
cpu_usage.kernel: 0.137
cpu_usage.user: 0.058
R63-9964.0.0:
cpu_usage.kernel: 0.29
cpu_usage.user: 0.09
The numbers are lower than on 2GB devices (probably because of lower memory pressure), but the difference is quite noticeable. So it looks like we have a basis for bisecting locally!
,
Oct 5 2017
Another observation, this time when running the R62-9901.32.0 VDAPerf binary on a R63-9964.0.0 sysroot:
cpu_usage.kernel: 0.136
cpu_usage.user: 0.057
So it looks like the regression is introduced by user-space. Testing VDAPerf is fast, so I am now trying to narrow the range as much as possible.
,
Oct 5 2017
Result of the bisect is that the regression has been introduced by 9964 itself. 9963 still shows good results:
cpu_usage.kernel: 0.136
cpu_usage.user: 0.06
Which means the culprit is hiding within this set of changes, kernel changes aside: https://crosland.corp.google.com/log/9963.0.0..9964.0.0
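For reference, the bisection over builds can be sketched as below. This is only an illustration, not the procedure actually used: first_bad is a made-up helper, the build list is simply every integer from 9901 to 9964, and the is_bad predicate stands in for flashing a build, running VDAPerf, and checking whether cpu_usage.kernel is near 0.29 rather than near 0.136.

```python
def first_bad(builds, is_bad):
    """Return the first bad build.

    Assumes builds is sorted oldest to newest, builds[0] is known good,
    builds[-1] is known bad, and the result flips from good to bad once.
    """
    lo, hi = 0, len(builds) - 1  # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]


# Stand-in predicate: a real one would run the test on the given build.
builds = list(range(9901, 9965))
culprit = first_bad(builds, lambda build: build >= 9964)
```

With 64 candidate builds this needs about six test runs instead of one per build.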
,
Oct 5 2017
Also Chromium version has switched from 63.0.3218.0 to 63.0.3222.0, so that may be the cause too.
,
Oct 5 2017
Bisected culprit to https://chromium-review.googlesource.com/c/chromium/src/+/654459. Looks like this change only affects the tests, so it does not introduce a real-life regression. Still it would be interesting to understand why offscreen surfaces imply more kernel cpu usage.
,
Oct 5 2017
So, it turns out the cpu_usage metrics are computed from the output of the "time" command that is run on video_decode_accelerator_unittest. Since we do not render anymore, the total runtime decreases, but the kernel time and user time remain the same: this is what actually makes these metrics increase. In other words, there is no regression and we just need to come up with a better metric to measure performance. Closing this as not a bug.
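To make the effect concrete, here is a minimal sketch. It assumes the metric is kernel (or user) time divided by elapsed wall-clock time, which is my reading of the description above rather than something checked against the test code; the cpu_usage helper and the timings are hypothetical, picked only so the ratios resemble the values seen earlier.

```python
def cpu_usage(kernel_s, user_s, wall_s):
    """Return (kernel, user) CPU time as fractions of wall-clock time."""
    return kernel_s / wall_s, user_s / wall_s


# Hypothetical timings in seconds: identical CPU work in both runs,
# but the second run finishes sooner because nothing is rendered.
with_rendering = cpu_usage(kernel_s=1.4, user_s=0.6, wall_s=10.0)
without_rendering = cpu_usage(kernel_s=1.4, user_s=0.6, wall_s=5.0)

print(with_rendering)
print(without_rendering)
```

Halving the wall-clock time doubles both ratios even though the kernel and user times are unchanged, which is why the graph showed a jump without any extra CPU work being done.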
,
Oct 5 2017
Opened crbug.com/771919 to track the replacement of these metrics.
Comment 1 by hywu@chromium.org
, Oct 2 2017