New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 807976 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 14
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 827674



Sign in to add a comment

update_engine slowness

Project Member Reported by tnagel@chromium.org, Feb 1 2018

Issue description

Chrome Version: 60.0.3329.0/10359.0.0
OS: Chrome

What steps will reproduce the problem?
(1) Pass through OOBE while connected to ethernet

What is the expected result?
"Checking for updates" should be fast.

What happens instead?
"Checking for updates" takes ~20 seconds.

Full logs attached.
 
debug-logs_20180201-125534
78.5 KB View Download
What kind of file is the attachment? It is not a text file. is it compressed?
cat ~/tmp/debug-logs_20180201-125534 | tar tzvf -

It seems to be tar.gz
No idea why the suffix got lost. Sorry.
Cc: de...@chromium.org senj@chromium.org r...@chromium.org benchan@chromium.org zalcorn@chromium.org
Labels: -Pri-3 Pri-1
Owner: ahass...@chromium.org
Chrome waited for 22 seconds for update_engine to respond.

$ egrep 'EULA_ACCEPTED|UPDATE|setting timezone' chrome/chrome_20180201-035351
[1545:1545:0201/035420.422121:VERBOSE1:wizard_controller.cc(1225)] Wizard screen exit code: EULA_ACCEPTED
[1545:1545:0201/035423.551684:VERBOSE1:wizard_controller.cc(1584)] Resolve TimeZone: setting timezone to 'Europe/Berlin'
[1545:1545:0201/125442.213673:VERBOSE1:wizard_controller.cc(1225)] Wizard screen exit code: UPDATE_NOUPDATE
$ egrep 'Forced update|Processing Done' update_engine/update_engine.20180201-035400 
[0201/035420:INFO:update_attempter.cc(768)] Forced update check requested.
[0201/035420:INFO:chromeos_policy.cc(280)] Forced update signaled (interactive), allowing update check.
[0201/035442:INFO:update_attempter.cc(873)] Processing Done.
$


Assigning to update_engine owners.
Summary: update_engine slowness (was: CrOS OOBE slowness)
So I look at the log, the 22 second delay is coming from these two things:

- 10 seconds from running set-good-firmware. Remember that update_engine is single threaded, and we run this script synchronously. I'm not exactly sure the security consequences of continuing with the update while the firmware is not set as good is, but that is how it works at this point. The fact is that the update check is just coincided with running this script.

- ~10 seconds from setting vpd and sending some UMA metrics. Setting vpd should not take more than 2-3 seconds, and same true for sending UMA metrics (even less since it goes through chrome). Probably there would be some delay on replying back to chrome to set 'no update'.

But I don't think this is a bug, this WAI, but we may be able to move stuff around to reduce the delay.
Let me know if other owners have any opinion.
Cc: keta...@chromium.org
Ketaki, could you dig into this some more and work on how we can drive improvements?

Comment 8 by r...@chromium.org, Feb 1 2018

Why would it take 2-3 seconds to send UMA metrics? Also, that can 100% be made parallel.




Re #8: I have not fully tested it figure out where exactly those delays for the second ~10 seconds coming from. It was based on my observation of the code. 

I don't think we want to change parallelism in update engine internals at this point. UE should be very lightweight and should've consume too much resources. If we wanted to spawn a new thread for each UMA update, that is kind of a not good. I think we have a message loop that gets checked out once in a while to reply to requests, but there are times that UE's thread is busy and cannot respond. We should probably prioritize doing operations better, but I don't think parallelizing them for this instance makes much sense. Although this is just my opinion :P 

Comment 10 by r...@chromium.org, Feb 2 2018

Sure - whatever gets this down to a more sane level. 22 seconds is a bit excessive.
Why do we need to block the update check on set-good-firmware?

Comment 12 by de...@google.com, Feb 2 2018

Re #11: You can't apply an update to the other slot if the current one is not mark as good. Otherwise you will be writing to the good slot. The set-good-firmware step could be delayed until after the update check (omaha request/response) but before start applying the payload... if that helps.
> The set-good-firmware step could be delayed until after the update check (omaha request/response) but before start applying the payload... if that helps.

Based on comment #6 that would save us 10 seconds -- which would help a lot.
Does those times include force updates too (which actually does the update)?
I've been wondering about that as well. Forced updates might cause the bump around the 64s-79s bucket. I doubt that the other bump around 10s-12s is caused by forced updates because that would just be too fast. And the overflow bin is anyone's guess -- I'm trying to extend the range of that histogram right now.
Thus my conclusion in comment #14 is probably not substantiated, however it seems fair to say that the update check takes 20 seconds in the 90% percentile.
Then, this is consistent (20 seconds) with the intent of the origin of this bug. I have this in my TODO list, but there are few higher priority things going on. So this might take a while to get fixed. But feel free to send a patch if you feel like it :)
Comment #17 is still wrong -- I need to subtract the second bump of (assumed) forced updates. Third attempt: 19 seconds in the 95th percentile.
Blockedon: 827674
Project Member

Comment 21 by bugdroid1@chromium.org, Jul 25

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/update_engine/+/d3f4bea600867b4ee85074869d723ba070db0ae7

commit d3f4bea600867b4ee85074869d723ba070db0ae7
Author: Amin Hassani <ahassani@chromium.org>
Date: Wed Jul 25 07:15:01 2018

update_engine: Pass Action ownership to ActionProcessor

Currently, an object that uses an ActionProcessor for processing one or more
actions has to own the Actions. This is problematic, because if we want to
create an action on the fly and use an ActionProcessor to perform it, we have to
own the Action until it is finished. Furthermore, if someone forget to own the
action, there will be memory leaks because ActionProcessor does not delete the
Action.

This patch passes the ownership of the Actions to the ActionProcessor through
unique pointers. If an object wants to have access to the Action, it can get it
when ActionComplete() is called.

BUG= chromium:807976 
TEST=unittests
TEST=cros flash
TEST=precq

Change-Id: I28f7e9fd3425f17cc51b4db4a4abc130a7d6ef8f
Reviewed-on: https://chromium-review.googlesource.com/1065113
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: Xiaochu Liu <xiaochu@chromium.org>

[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/common/action_processor.h
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/payload_consumer/filesystem_verifier_action_unittest.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/omaha_response_handler_action.h
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/common/action_processor.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/payload_consumer/download_action_unittest.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/update_attempter_unittest.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/omaha_request_action_unittest.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/omaha_response_handler_action.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/update_attempter.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/common/action_processor_unittest.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/common/mock_http_fetcher.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/common/mock_action_processor.h
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/omaha_response_handler_action_unittest.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/payload_consumer/postinstall_runner_action_unittest.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/omaha_request_action.h
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/update_attempter.h
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/common/action_unittest.cc
[modify] https://crrev.com/d3f4bea600867b4ee85074869d723ba070db0ae7/common/mock_http_fetcher.h

Project Member

Comment 22 by bugdroid1@chromium.org, Jul 26

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/update_engine/+/0882a5198b58173044e8b4b28eb4d30b1bcfe230

commit 0882a5198b58173044e8b4b28eb4d30b1bcfe230
Author: Amin Hassani <ahassani@chromium.org>
Date: Thu Jul 26 02:45:32 2018

update_engine: Pospone setgoodkernel after update check

Currently, if an update is forced, we first run setgoodkernel and then continue
with the update check. However, this delays the update check screen in the
OOBE. This patch adds a new Action, UpdateBootFlagsAction which is configured to
run AFTER an update check and response. However, now if the update check failed
due for example the response was invalid, then setgoodkernel will not
happen. But we already have a mechanism in SystemState where setgoodkernel is
always called (currently) 45 seconds after the boot. So there would no worries.

BUG= chromium:807976 
TEST=unittests
TEST=powerwashed the device, and the check for update passed faster on OOBE.

Change-Id: I5845d7a4c1393aec568b6279dfbb7d3dfa31cdd8
Reviewed-on: https://chromium-review.googlesource.com/1011244
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: Sen Jiang <senj@chromium.org>

[modify] https://crrev.com/0882a5198b58173044e8b4b28eb4d30b1bcfe230/real_system_state.h
[modify] https://crrev.com/0882a5198b58173044e8b4b28eb4d30b1bcfe230/real_system_state.cc
[modify] https://crrev.com/0882a5198b58173044e8b4b28eb4d30b1bcfe230/update_attempter_unittest.cc
[add] https://crrev.com/0882a5198b58173044e8b4b28eb4d30b1bcfe230/update_boot_flags_action.h
[add] https://crrev.com/0882a5198b58173044e8b4b28eb4d30b1bcfe230/update_boot_flags_action.cc
[modify] https://crrev.com/0882a5198b58173044e8b4b28eb4d30b1bcfe230/Android.mk
[add] https://crrev.com/0882a5198b58173044e8b4b28eb4d30b1bcfe230/update_boot_flags_action_unittest.cc
[modify] https://crrev.com/0882a5198b58173044e8b4b28eb4d30b1bcfe230/update_engine.gyp
[modify] https://crrev.com/0882a5198b58173044e8b4b28eb4d30b1bcfe230/update_attempter.h
[modify] https://crrev.com/0882a5198b58173044e8b4b28eb4d30b1bcfe230/update_attempter.cc

Project Member

Comment 23 by bugdroid1@chromium.org, Jul 31

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/update_engine/+/667cf7b5b3d02cb38b786760c6c1764dfc758b85

commit 667cf7b5b3d02cb38b786760c6c1764dfc758b85
Author: Amin Hassani <ahassani@chromium.org>
Date: Tue Jul 31 23:37:15 2018

update_engine: Reflect ActionProcessor and UpdateBootFlags changes in AOSP

We changed the ActionProcessor interface and how we update boot flags in
CL:1065113 and CL:1011244. This patch reflects those changes in
update_attempter_android.*

BUG= chromium:807976 
TEST=unittests
TEST=build and run in android

Change-Id: Ib06b6c1560e58771e66288208a525130a5811360
Reviewed-on: https://chromium-review.googlesource.com/1150737
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: Sen Jiang <senj@chromium.org>

[modify] https://crrev.com/667cf7b5b3d02cb38b786760c6c1764dfc758b85/update_attempter_android.h
[modify] https://crrev.com/667cf7b5b3d02cb38b786760c6c1764dfc758b85/common/action_processor_unittest.cc
[modify] https://crrev.com/667cf7b5b3d02cb38b786760c6c1764dfc758b85/Android.mk
[modify] https://crrev.com/667cf7b5b3d02cb38b786760c6c1764dfc758b85/update_attempter_android.cc

Status: Fixed (was: Assigned)
I just checked it again, and it seems to just take only 2-3 seconds for an update check, so the previous trick of postponing update boot flags have worked :)

Sign in to add a comment