Issue 822485

Starred by 4 users

Issue metadata

Status: Verified
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android, Chrome
Pri: 1
Type: Bug-Regression

Blocked on:
issue 810696




platform_BootPerf/seconds_kernel_to_network regression across all(?) boards

Project Member Reported by jwer...@chromium.org, Mar 15 2018

Issue description

I noticed what looks like a large regression (around 4 to 8 full seconds) in the seconds_kernel_to_network metric of the platform_BootPerf test across many boards: at least Chell, Nyan_Big and Peppy. Hana also seems to be affected, but for some reason chromeperf has no M66 performance test results for it or for many other boards. The regression seems to be centered right around the M65 branch point, somewhere between 10319.0.0 and 10367.0.0. I can't tell how to differentiate M65 from M66 after the branch in the graph tool, so I'm not sure whether M65 is also affected.

https://chromeperf.appspot.com/report?sid=a2adab4bb9595f9a6dc8e27f4f3a728e3219a7fa94530a5d61b260a9befd4ff8&start_rev=32990001026700000&end_rev=33360031037800000

This regression is so big that it's causing FAFT tests to fail (see b/74974982). Who is the right person to look into this?
 

Comment 1 by kirtika@google.com, Mar 16 2018

Owner: kirtika@chromium.org
Status: Assigned (was: Untriaged)
We made shill's startup depend on "stopped udev-trigger", so shill now starts much later.
See https://bugs.chromium.org/p/chromium/issues/detail?id=810696

Comment 2 by kirtika@google.com, Mar 16 2018

Blockedon: 810696

Comment 4 by bugdroid1@chromium.org, Mar 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/d9cd68b36b0f66083b608d035e6c2794de4564bd

commit d9cd68b36b0f66083b608d035e6c2794de4564bd
Author: Kirtika Ruchandani <kirtika@google.com>
Date: Thu Mar 22 03:48:35 2018

shill: network-services: no wait for udev-trigger

For release R66, we've resolved the issue of two entities on the system
providing cfg80211 functionality by getting rid of stock cfg80211 on boards
with Intel wifi. We no longer need to worry about the wrong cfg80211 being
loaded on the system, so no need to wait for a wifi device driver (for which
udev-trigger was a proxy) before starting network-services (and hence shill).

This should resolve boot time performance regressions seen post-Core31 :
making shill start-up dependent on udev-trigger meant the network
wouldn't come up in the UI until a good 8-10 seconds after boot.
On some devices, this was user-visible: user would find wifi turned off
in the GUI because shill hadn't started up by the time the login screen
came up, and turning the wifi slider to on manually wouldn't work until
shill did start up.

This partially reverts commit 6455013e3b00 ("init: Get rid of load_cfg80211").

BUG= chromium:810696 ,  chromium:822485 , b:74171245
TEST=Build and boot on Soraka, check that wifi is up on the login screen.

Change-Id: I5f8e25942c27b28a9dc64c891d2ccf1df0adc2fa
Reviewed-on: https://chromium-review.googlesource.com/972764
Commit-Ready: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>

[modify] https://crrev.com/d9cd68b36b0f66083b608d035e6c2794de4564bd/init/network-services.conf

Labels: -M-66 M-67 OS-Chrome
Status: Fixed (was: Assigned)
4 to 8 seconds is big, but is it really big enough to be worth failing FAFT tests? Seems like somebody needs a bigger timeout...

Anyway, this is fixed. I manually verified on several boards, and you can probably see chromeperf results fill in over the next few days. It remains to be seen whether this will get ported back to M66; a port to M65 is unlikely. This effort will probably happen in one of these bugs:

https://bugs.chromium.org/p/chromium/issues/detail?id=807315
https://bugs.chromium.org/p/chromium/issues/detail?id=810696

But it might not happen, in which case... well, we live with a silly boot-time regression. Marking M-67 as the 'Fixed' version.

> 4 to 8 seconds is big, but is it really big enough to be worth failing FAFT tests? Seems like somebody needs a bigger timeout...

This is a timeout that needs to fully run out every time a post-2015 board tries to reboot into recovery mode, so I really don't want to make it longer since FAFT already takes forever as it is. And if we added 10 seconds now, there's still no guarantee that it would be enough for the next regression. FAFT relies heavily on many components in the system image working as expected and there are dozens of ways it could break from a bug there... I don't think it's worth trying to anticipate them all.

I would hope that we consider an 8-second boot time regression more "catastrophic" than "silly", so it would seem quite unfortunate if the fix really doesn't make it back to all revisions (and maybe we should investigate why this wasn't noticed earlier to begin with).

You can get more than 4 to 8 seconds of variance simply from DHCP server behavior. That's not an excuse, but just a reality. (I'm sourcing grundler's investigations into lab reliability for this.)

I agree it will be unfortunate if we don't get this cherry-picked. I'm also confounded as to why no one noticed earlier. Feel free to track the other bugs I linked. There are also email conversations.

Comment 8 by bugdroid1@chromium.org, Mar 25 2018

Labels: merge-merged-release-R66-10452.B
The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/ea3c60b0987369d5cec8dac2ee2a675bed7ca029

commit ea3c60b0987369d5cec8dac2ee2a675bed7ca029
Author: Kirtika Ruchandani <kirtika@google.com>
Date: Sat Mar 24 00:49:16 2018

shill: network-services: no wait for udev-trigger

For release R66, we've resolved the issue of two entities on the system
providing cfg80211 functionality by getting rid of stock cfg80211 on boards
with Intel wifi. We no longer need to worry about the wrong cfg80211 being
loaded on the system, so no need to wait for a wifi device driver (for which
udev-trigger was a proxy) before starting network-services (and hence shill).

This should resolve boot time performance regressions seen post-Core31 :
making shill start-up dependent on udev-trigger meant the network
wouldn't come up in the UI until a good 8-10 seconds after boot.
On some devices, this was user-visible: user would find wifi turned off
in the GUI because shill hadn't started up by the time the login screen
came up, and turning the wifi slider to on manually wouldn't work until
shill did start up.

This partially reverts commit 6455013e3b00 ("init: Get rid of load_cfg80211").

BUG= chromium:810696 ,  chromium:822485 , b:74171245
TEST=Build and boot on Soraka, check that wifi is up on the login screen.

Change-Id: I5f8e25942c27b28a9dc64c891d2ccf1df0adc2fa
Reviewed-on: https://chromium-review.googlesource.com/972764
Commit-Ready: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
(cherry picked from commit d9cd68b36b0f66083b608d035e6c2794de4564bd)

[modify] https://crrev.com/ea3c60b0987369d5cec8dac2ee2a675bed7ca029/init/network-services.conf

Comment 9 by kirtika@google.com, Mar 25 2018

Owner: jwer...@chromium.org
Julius, can you check FAFT on R66 after the next canary picks this up?

Labels: OS-Android
Status: Verified (was: Fixed)
Bitland successfully finished a FAFT run with R67-10514.0.0, so it looks like the fix works.
