platform_BootPerf/seconds_kernel_to_network regression across all(?) boards
Issue description

I noticed that there seems to be a pretty big regression (around 4 to 8 full seconds) in the seconds_kernel_to_network metric in the platform_BootPerf test across a lot of boards (at least Chell, Nyan_Big and Peppy... Hana seems to also be affected but for some reason there are no M66 performance test results for it and many other boards on chromeperf). It seems to be centered right around the M65 branch point, somewhere between 10319.0.0 and 10367.0.0 (can't really tell how to differentiate M65 from M66 after the branch in the graph tool so I'm not sure if M65 is also affected).

https://chromeperf.appspot.com/report?sid=a2adab4bb9595f9a6dc8e27f4f3a728e3219a7fa94530a5d61b260a9befd4ff8&start_rev=32990001026700000&end_rev=33360031037800000

This regression is so big that it's causing FAFT tests to fail (see b/74974982). Who is the right person to look into this?
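For anyone triaging builds against the range quoted above, here is a minimal Python sketch that parses platform version strings and classifies a build relative to the suspected window. It assumes 10319.0.0 is the last build without the regression and 10367.0.0 the first build with it; the report only says the change landed "somewhere between" the two, so those boundary roles, as well as the helper names `parse_build` and `classify`, are illustrative and not part of any Chrome OS tooling.

```python
import re


def parse_build(label):
    """Return (major, minor, patch) from a label like '10319.0.0' or 'R67-10514.0.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)$", label)
    if match is None:
        raise ValueError(f"unrecognized build label: {label!r}")
    return tuple(int(part) for part in match.groups())


# Assumption: these are the boundaries of the suspected window from the report.
LAST_GOOD = parse_build("10319.0.0")
FIRST_BAD = parse_build("10367.0.0")


def classify(label):
    """Place a build relative to the suspected regression window."""
    build = parse_build(label)
    if build <= LAST_GOOD:
        return "before regression window"
    if build < FIRST_BAD:
        return "inside regression window"
    return "at or after first known-bad build"


if __name__ == "__main__":
    for label in ("10319.0.0", "10340.0.0", "R67-10514.0.0"):
        print(label, "->", classify(label))
```

Comparing versions as integer tuples avoids the string-ordering trap where "9999.0.0" would sort after "10319.0.0".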
Mar 16 2018
Mar 22 2018
The following revision refers to this bug:
https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/d9cd68b36b0f66083b608d035e6c2794de4564bd

commit d9cd68b36b0f66083b608d035e6c2794de4564bd
Author: Kirtika Ruchandani <kirtika@google.com>
Date: Thu Mar 22 03:48:35 2018

shill: network-services: no wait for udev-trigger

For release R66, we've resolved the issue of two entities on the system providing cfg80211 functionality by getting rid of stock cfg80211 on boards with Intel wifi. We no longer need to worry about the wrong cfg80211 being loaded on the system, so no need to wait for a wifi device driver (for which udev-trigger was a proxy) before starting network-services (and hence shill).

This should resolve boot time performance regressions seen post-Core31: making shill start-up dependent on udev-trigger meant the network wouldn't come up in the UI until a good 8-10 seconds after boot. On some devices, this was user-visible: user would find wifi turned off in the GUI because shill hadn't started up by the time the login screen came up, and turning the wifi slider to on manually wouldn't work until shill did start up.

This partially reverts commit 6455013e3b00 ("init: Get rid of load_cfg80211").

BUG=chromium:810696, chromium:822485, b:74171245
TEST=Build and boot on Soraka, check that wifi is up on the login screen.

Change-Id: I5f8e25942c27b28a9dc64c891d2ccf1df0adc2fa
Reviewed-on: https://chromium-review.googlesource.com/972764
Commit-Ready: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>

[modify] https://crrev.com/d9cd68b36b0f66083b608d035e6c2794de4564bd/init/network-services.conf
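The effect described in the commit message (a job cannot start until everything it waits on has finished) can be illustrated with a toy model in Python. This is only a sketch: the job graph is simplified, the job name `boot-services` and all durations are hypothetical values chosen for illustration, and "waits on" is modeled as "dependency has finished", which is a simplification of how the init system orders jobs. Only the relationship between `network-services` and `udev-trigger` comes from the commit message.

```python
def earliest_start(job, waits_on, duration, _cache=None):
    """Earliest time (in seconds) a job can start: once every job it waits on has finished."""
    if _cache is None:
        _cache = {}
    if job not in _cache:
        _cache[job] = max(
            (
                earliest_start(dep, waits_on, duration, _cache) + duration[dep]
                for dep in waits_on.get(job, ())
            ),
            default=0.0,  # a job with no dependencies can start immediately
        )
    return _cache[job]


# Hypothetical durations in seconds; not measured on any device.
duration = {"boot-services": 1.0, "udev-trigger": 8.0, "network-services": 0.5}

# Before the change: network-services (and hence shill) waits for udev-trigger.
before = {"udev-trigger": ["boot-services"], "network-services": ["udev-trigger"]}

# After the change: network-services no longer waits for udev-trigger.
after = {"udev-trigger": ["boot-services"], "network-services": ["boot-services"]}

if __name__ == "__main__":
    t_before = earliest_start("network-services", before, duration)
    t_after = earliest_start("network-services", after, duration)
    print(f"start before change: {t_before:.1f}s, after change: {t_after:.1f}s")
```

In this model the start time of `network-services` drops by exactly the assumed duration of `udev-trigger`, which is the shape of the saving the commit message describes.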
Mar 22 2018
4 to 8 seconds is big, but is it really big enough to be worth failing FAFT tests? Seems like somebody needs a bigger timeout...

Anyway, this is fixed. I manually verified on several boards, and chromeperf results should fill in over the next few days.

It remains to be seen whether this will get ported back to M66; a port to M65 is unlikely. This effort will probably happen in one of these bugs:

https://bugs.chromium.org/p/chromium/issues/detail?id=807315
https://bugs.chromium.org/p/chromium/issues/detail?id=810696

But it might not happen, in which case... well, we live with a silly boot-time regression.

Marking M-67 as the 'Fixed' version.
Mar 22 2018
> 4 to 8 seconds is big, but is it really big enough to be worth failing FAFT tests? Seems like somebody needs a bigger timeout...

This is a timeout that needs to fully run out every time a post-2015 board tries to reboot into recovery mode, so I really don't want to make it longer since FAFT already takes forever as it is. And if we added 10 seconds now, there's still no guarantee that it would be enough for the next regression. FAFT relies heavily on many components in the system image working as expected, and there are dozens of ways a bug there could break it... I don't think it's worth trying to anticipate them all.

I would hope that we consider an 8-second boot time regression more "catastrophic" than "silly", so it would be quite unfortunate if the fix doesn't make it back to all affected branches (and maybe we should investigate why this wasn't noticed earlier to begin with).
Mar 22 2018
You can get more than 4 to 8 seconds of variance simply in DHCP server behavior. That's not an excuse, but just a reality. (I'm sourcing grundler's investigations into lab reliability for this.) I agree it will be unfortunate if we don't get this cherry-picked. I'm also confounded why no one noticed earlier. Feel free to track the other bugs I linked. There are also email conversations.
Mar 25 2018
The following revision refers to this bug:
https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/ea3c60b0987369d5cec8dac2ee2a675bed7ca029

commit ea3c60b0987369d5cec8dac2ee2a675bed7ca029
Author: Kirtika Ruchandani <kirtika@google.com>
Date: Sat Mar 24 00:49:16 2018

shill: network-services: no wait for udev-trigger

For release R66, we've resolved the issue of two entities on the system providing cfg80211 functionality by getting rid of stock cfg80211 on boards with Intel wifi. We no longer need to worry about the wrong cfg80211 being loaded on the system, so no need to wait for a wifi device driver (for which udev-trigger was a proxy) before starting network-services (and hence shill).

This should resolve boot time performance regressions seen post-Core31: making shill start-up dependent on udev-trigger meant the network wouldn't come up in the UI until a good 8-10 seconds after boot. On some devices, this was user-visible: user would find wifi turned off in the GUI because shill hadn't started up by the time the login screen came up, and turning the wifi slider to on manually wouldn't work until shill did start up.

This partially reverts commit 6455013e3b00 ("init: Get rid of load_cfg80211").

BUG=chromium:810696, chromium:822485, b:74171245
TEST=Build and boot on Soraka, check that wifi is up on the login screen.

Change-Id: I5f8e25942c27b28a9dc64c891d2ccf1df0adc2fa
Reviewed-on: https://chromium-review.googlesource.com/972764
Commit-Ready: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
(cherry picked from commit d9cd68b36b0f66083b608d035e6c2794de4564bd)

[modify] https://crrev.com/ea3c60b0987369d5cec8dac2ee2a675bed7ca029/init/network-services.conf
Mar 25 2018
Julius, can you check FAFT on R66 after the next canary picks this up?
Mar 30 2018
Bitland successfully finished a FAFT run with R67-10514.0.0, so it looks like the fix works.
Comment 1 by kirtika@google.com, Mar 16 2018
Status: Assigned (was: Untriaged)