flashrom hogs CPU and causes chrome to fail liveness check.
Issue description

In investigating issue 883029, a common (est. 50%) cause since M70 is a busy flashrom command. I added code to collect "top" output when the session_manager daemon kills chrome. In M72, the data collected shows the following flashrom command line:

====
http://crash/45ba8a29fc21f977
flashrom -p host -i FMAP -i RW_VPD:/tmp/vpd.flashrom.7eUpZE -r /tmp/vpd.flashrom.kyK2kO
====

More example crashes:
http://crash/063b9faf4d6eaed9
http://crash/c0ba2615b053462f
,
Nov 30
Do you know which parent process calls this? Does it keep the CPU busy every time, and on which devices do you see this?
,
Nov 30
I don't have repro steps. The data is collected from crash reports. If we need the parent process, we need to add that in addition to the "top" output and wait for the new reports (could take a while). It does not seem to be specific to any device. In M72, I could see it happen on the following devices: zako, falco, monroe (not an exhaustive list).
,
Nov 30
Was this seen on end users' devices, or only on our lab machines? Sometimes the lab machines may not have proper VPD values provisioned, which causes programs to keep retrying to fetch data. Meanwhile, zako, falco, and monroe are old-generation devices that predate our firmware-based VPD cache, so if some programs fetch VPD too often (especially reading without a cache), this may happen. One thing I can tell is that this should not be related to the firmware updater, since starting from updater5 (and M72), we only use 'vpd' with pre-fetched data. There's a command 'update_rw_vpd' that calls vpd directly for read+write, so I think that may be related.
,
Nov 30
I think this is happening in the wild on real users' devices.
,
Nov 30
In session manager, it may run 'vpd' in the background for DevicePolicyService::UpdateSystemSettings, but that was merged over a year ago. There's also update_engine:HardwareChromeOS::GetFirstActiveOmahaPingSent and HardwareChromeOS::SetFirstActiveOmahaPingSent, merged in 2017/6. A very recent one (ToT on 11/16) is DevicePolicyService::ClearCheckEnrollmentVpd. The one closest to M70 is probably DebugdDBusAdaptor::SetRlzPingSent, merged in 2018/03~04, which is nearest to the M70 branch date and may be the reason.
,
Nov 30
+wzang The last one (rlz ping) is quite close to when the master issue 842272 was noticed (2018/05). I wonder what we could do to improve the situation.
,
Nov 30
Another set of files calling vpd is in chromiumos-overlay/chromeos-base/infineon-firmware-updater/files/tpm-firmware-updater, but that can be traced back to 2017/08. These are all that I found via cs/ that may run in an end user's environment. To check further, is it possible to find out the cmdline of the parent process of the flashrom call? It should be something like 'vpd -i RW_VPD ...', which would really help to figure out which one is the problem.
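
For illustration only, here is a minimal Python sketch of how the parent chain and cmdlines of a process could be read from /proc on Linux. It is not code from session_manager; the function names (parse_ppid, read_cmdline, ancestor_chain) and the proc_root parameter are hypothetical and exist only for this example.

```python
import os


def parse_ppid(stat_text):
    """Return the parent PID from the contents of /proc/<pid>/stat."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST ')'.
    after_comm = stat_text.rsplit(")", 1)[1].split()
    # after_comm[0] is the process state, after_comm[1] is the ppid.
    return int(after_comm[1])


def read_cmdline(pid, proc_root="/proc"):
    """Return the NUL-separated cmdline of a process as one string."""
    with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as f:
        raw = f.read()
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


def ancestor_chain(pid, proc_root="/proc"):
    """Return [(pid, cmdline), ...] from the given pid up to its root."""
    chain = []
    while pid > 0:
        try:
            with open(os.path.join(proc_root, str(pid), "stat")) as f:
                ppid = parse_ppid(f.read())
            cmdline = read_cmdline(pid, proc_root)
        except OSError:
            # The process exited (or is unreadable) while we were walking.
            break
        chain.append((pid, cmdline))
        pid = ppid
    return chain
```

Called with the flashrom PID, the second entry of the returned chain would be the direct parent, which is where a 'vpd -i RW_VPD ...' cmdline would be expected to show up.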
,
Nov 30
Currently, I collect "top" output when liveness ping fails. https://cs.corp.google.com/chromeos_public/src/platform2/login_manager/liveness_checker_impl.cc?rcl=fdd4e481c798c0e49c84b901533f835b6be24dba&l=71 I could add code to collect parent process for flashrom. But the data might take a while to come back.
,
Nov 30
I think we'll need it, and not just for this issue, since we very often see issues caused by boot-time flashrom keeping the CPU busy (for example, audio noise).
,
Dec 1
It seems that |DebugdDBusAdaptor::SetRlzPingSent| is unlikely to be related. According to issue 842272, the crash happens every time the user signs in. But |SetRlzPingSent| is only called once ever for each device (or up to three retries in a row if it fails on the first attempt). It is called 24 hours after the first omnibox search is initiated by the user.
,
Dec 7
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/f8ab5136f85ea9251b37a0ba4f59e1f42d4a2748

commit f8ab5136f85ea9251b37a0ba4f59e1f42d4a2748
Author: Xiyuan Xia <xiyuan@google.com>
Date: Fri Dec 07 06:06:43 2018

login: Collect pstree of flashrom

If flashrom shows up on the top output, dump pstree -sal -p <flashrom_pid> output to log.

BUG=chromium:883029,chromium:910411
TEST=Manual

Change-Id: I3ed8084763feea188b6b6b19ee64ee834ab22055
Reviewed-on: https://chromium-review.googlesource.com/1362396
Commit-Ready: Xiyuan Xia <xiyuan@chromium.org>
Tested-by: Xiyuan Xia <xiyuan@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/f8ab5136f85ea9251b37a0ba4f59e1f42d4a2748/login_manager/liveness_checker_impl.cc
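
As a rough illustration of the idea in that commit message (the real change is in liveness_checker_impl.cc, and this is not that code), a Python sketch could scan "top" output for flashrom and build the matching pstree command. The helper names find_pids and pstree_argv are hypothetical, and the column layout (PID first, COMMAND twelfth) is an assumption based on the default procps top batch format.

```python
def find_pids(top_output, command="flashrom"):
    """Return the PIDs of rows in top output whose COMMAND matches."""
    pids = []
    for line in top_output.splitlines():
        fields = line.split()
        # Assumed layout: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        # Header and summary lines are skipped because their first field
        # is not a number.
        if len(fields) >= 12 and fields[0].isdigit() and fields[11] == command:
            pids.append(int(fields[0]))
    return pids


def pstree_argv(pid):
    """Build the argv for the command named in the commit message."""
    return ["pstree", "-sal", "-p", str(pid)]
```

Each argv could then be passed to something like subprocess.run(argv, capture_output=True, text=True) and the output appended to the log.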
,
Dec 7
Comment 1 by dlaurie@google.com
, Nov 30