
Issue 910411

Starred by 1 user

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 883029




flashrom hogs CPU, causing chrome to fail liveness check.

Project Member Reported by xiy...@chromium.org, Nov 29

Issue description

While investigating issue 883029, I found that a busy flashrom command has been a common cause (est. 50%) since M70. I added code to collect "top" output when the session_manager daemon kills chrome.

In M72, the data collected shows the following flashrom command line:
====
http://crash/45ba8a29fc21f977

flashrom -p host -i FMAP -i RW_VPD:/tmp/vpd.flashrom.7eUpZE -r /tmp/vpd.flashrom.kyK2kO
====

More example crashes:
http://crash/063b9faf4d6eaed9
http://crash/c0ba2615b053462f

 
Cc: hungte@chromium.org
Do you know which parent process calls this?
Does it hog the CPU every time, and on which devices do you see this?

I don't have repro steps. The data is collected from crash reports. If we need the parent process, we need to add that in addition to the "top" output and wait for the new reports (could take a while).

No specific device seems to be affected. In M72, I could see it happen on the following devices: zako, falco, monroe (not an exhaustive list).
Was this seen on end users' devices, or only on our lab machines?
Lab machines sometimes do not have proper VPD values provisioned, which causes programs to keep retrying the fetch.

Meanwhile, zako, falco, and monroe are old-generation devices that predate our firmware-based VPD cache, so this may happen if some program fetches VPD too often (especially when reading without a cache).

One thing I can tell is that this should not be related to the firmware updater, since starting from updater5 (and M72) we only use 'vpd' with pre-fetched data.

There's a command 'update_rw_vpd' that calls vpd directly for read+write, so I think that may be related.
I think this is happening in the wild on real users' devices.
In session manager, it may run 'vpd' in the background for DevicePolicyService::UpdateSystemSettings, but that was merged over a year ago.

There are also update_engine's HardwareChromeOS::GetFirstActiveOmahaPingSent and HardwareChromeOS::SetFirstActiveOmahaPingSent, merged in 2017/06.

A very recent one (ToT on 11/16) is DevicePolicyService::ClearCheckEnrollmentVpd.

The one closest to M70 is probably DebugdDBusAdaptor::SetRlzPingSent, merged in 2018/03~04, which is close to the M70 branch date and may be the reason.


Cc: wzang@chromium.org
+wzang

The last one (rlz ping) is quite close to when the master issue 842272 was noticed (2018/05). I wonder what we could do to improve the situation.
Another set of files calling vpd is in chromiumos-overlay/chromeos-base/infineon-firmware-updater/files/tpm-firmware-updater, but that can be traced back to 2017/08.

These are all the callers I found via cs/ that may run in an end user's environment.

To check further, is it possible to find out the cmdline of the parent process of the flashrom call?

It should be something like 'vpd -i RW_VPD ...', which would really help figure out which caller is the problem.
Currently, I collect "top" output when liveness ping fails.

https://cs.corp.google.com/chromeos_public/src/platform2/login_manager/liveness_checker_impl.cc?rcl=fdd4e481c798c0e49c84b901533f835b6be24dba&l=71

I could add code to collect the parent process of flashrom, but the data might take a while to come back.
I think we'll need it for more than just this issue, since we very often see problems caused by boot-time flashrom keeping the CPU busy (for example, audio noise).
It seems that |DebugdDBusAdaptor::SetRlzPingSent| is unlikely to be related. According to issue 842272, the crash happens every time the user signs in. But |SetRlzPingSent| is only called once ever for each device (or up to three retries in a row if the first attempt fails). It is called 24 hours after the user initiates the first omnibox search.
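For illustration only, here is a minimal Python sketch of how the parent's cmdline could be looked up for a given flashrom PID on Linux. It relies on the standard /proc/<pid>/stat and /proc/<pid>/cmdline files; the helper names (parse_ppid, read_cmdline, parent_cmdline) are hypothetical and this is not what session_manager does.

```python
from typing import List, Tuple


def parse_ppid(stat_line: str) -> int:
    """Extract the parent PID from the contents of /proc/<pid>/stat.

    The second field (comm) is wrapped in parentheses and may itself contain
    spaces or ')', so everything up to the LAST ')' is skipped first.
    """
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is the process state, rest[1] is the parent PID.
    return int(rest[1])


def read_cmdline(pid: int, proc_root: str = "/proc") -> List[str]:
    """Return the argv of a process; arguments are NUL-separated in /proc."""
    with open(f"{proc_root}/{pid}/cmdline", "rb") as f:
        raw = f.read()
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def parent_cmdline(pid: int, proc_root: str = "/proc") -> Tuple[int, List[str]]:
    """Return (ppid, argv of the parent) for the given PID."""
    with open(f"{proc_root}/{pid}/stat") as f:
        ppid = parse_ppid(f.read())
    return ppid, read_cmdline(ppid, proc_root)
```

Calling parent_cmdline(flashrom_pid) while flashrom is still running would show whether the parent is a 'vpd -i RW_VPD ...' invocation; repeating the lookup on the returned ppid walks further up the chain.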
Project Member

Comment 12 by bugdroid1@chromium.org, Dec 7

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/f8ab5136f85ea9251b37a0ba4f59e1f42d4a2748

commit f8ab5136f85ea9251b37a0ba4f59e1f42d4a2748
Author: Xiyuan Xia <xiyuan@google.com>
Date: Fri Dec 07 06:06:43 2018

login: Collect pstree of flashrom

If flashrom shows up on the top output, dump
  pstree -sal -p <flashrom_pid>
output to log.

BUG=chromium:883029,chromium:910411
TEST=Manual

Change-Id: I3ed8084763feea188b6b6b19ee64ee834ab22055
Reviewed-on: https://chromium-review.googlesource.com/1362396
Commit-Ready: Xiyuan Xia <xiyuan@chromium.org>
Tested-by: Xiyuan Xia <xiyuan@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/f8ab5136f85ea9251b37a0ba4f59e1f42d4a2748/login_manager/liveness_checker_impl.cc
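
As a rough illustration of the logic in the commit message above (the actual change is C++ in liveness_checker_impl.cc and is not reproduced here), a Python sketch might look like the following. It assumes procps-style batch output such as `top -b -n 1`, with a header row containing PID and COMMAND; the function names are hypothetical.

```python
import os
import subprocess
from typing import List


def flashrom_pids(top_output: str) -> List[int]:
    """Return PIDs of rows in batch-mode top output whose command is flashrom."""
    pids = []
    cmd_col = None
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "PID" in fields and "COMMAND" in fields:
            # Header row: remember which column holds the command.
            cmd_col = fields.index("COMMAND")
            continue
        if cmd_col is None or not fields[0].isdigit():
            continue
        # Split so the command column keeps any arguments (as with top -c).
        row = line.split(None, cmd_col)
        if len(row) <= cmd_col:
            continue
        program = row[cmd_col].split()[0]
        if os.path.basename(program) == "flashrom":
            pids.append(int(row[0]))
    return pids


def pstree_command(pid: int) -> List[str]:
    """Build the pstree invocation named in the commit message."""
    return ["pstree", "-sal", "-p", str(pid)]


def dump_flashrom_pstree(top_output: str) -> List[str]:
    """Run pstree for every flashrom PID found and return the outputs."""
    outputs = []
    for pid in flashrom_pids(top_output):
        result = subprocess.run(
            pstree_command(pid), capture_output=True, text=True, check=False
        )
        outputs.append(result.stdout)
    return outputs
```

With pstree from psmisc, -s shows the parents of the selected process, -a shows command line arguments, -l avoids truncating long lines, and -p prints PIDs, so the output should include the caller's full command line.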

Cc: cjmcdonald@chromium.org
