New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 865731 link

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

mosys/flashrom are super unhappy w/ erased SPI flash

Project Member Reported by diand...@chromium.org, Jul 19

Issue description

If you happen to manage to boot up a system and your SPI flash exists but is erased then flashrom churns and churns and churns.  It also blocks bootup in a bunch of cases.

---

Specifically my flash looks like this:

# flashrom -r /tmp/foo.bin
flashrom v0.9.9  : 622128c : Jul 05 2018 14:19:12 UTC on Linux 4.14.55 (aarch64)
Reading flash... SUCCESS

 # hexdump -C /tmp/foo.bin
00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
01000000


In my particular case I'm working on a board that can boot the BIOS from a source other than SPI flash which is why I'm able to boot but my SPI flash is empty.

---

If I look at "top" on my system I almost always see "flashrom" pegged at 100%.  Running "ps auxf", I see things like this:

 /bin/sh -ue /usr/bin/dev_debug_vboot --cleanup
  \_ crossystem --all
      \_ /usr/sbin/mosys nvram vboot read
          \_ flashrom -p host -i RW_NVRAM:/tmp/flashrom_WQ2qL2 -r

---

There appear to be at least two problems here:

1. It sure seems like flashrom shouldn't be invoked quite so many times.

2. It appears that using the "-i RW_NVRAM" in the case of fully erased flash is much worse than just reading flash.  It eats up 10x the CPU time and takes 20x as long.  Specifically:

--

2a) During the following time CPU usage for flashrom hovers at ~8% on my system.

# time flashrom -r /tmp/foo.bin
flashrom v0.9.9  : 622128c : Jul 05 2018 14:19:12 UTC on Linux 4.14.55 (aarch64)
Reading flash... SUCCESS

real    0m16.914s
user    0m0.036s
sys     0m1.535s


2b) During the following time CPU usage for flashrom hovers at ~80% on my system.

# time flashrom -p host -i RW_NVRAM:/tmp/flashrom_WQ2qL2 -r
flashrom v0.9.9  : 622128c : Jul 05 2018 14:19:12 UTC on Linux 4.14.55 (aarch64)
No layout info is available.
FAILED

real    6m36.582s
user    0m5.736s
sys     4m55.899s

 
Cc: vbendeb@chromium.org
yes, when layout option is given and no map is present in the flash, flashrom ends up doing binary search for the fmap pattern in the flash, reading it again and again and again, reducing granularity to 1 byte.

I should at least read it once and then search in the file instead, but it needs to decide when to give up searching in the chip.
@2: It seems like you could just keep the binary search algorithm but:
- Always read a block at a time
- Always cache the results

That would at least not make it take 6 minutes to run.  I wonder if you have any ideas of the other part of this bug: making it so we don't call flashrom over and over again in this case?
Truth be told, system running with the bios flash empty has never happened before, is this something we need to account for now?

Also, I am not quite following: where do we call flashrom all over again? You mean crossystem --all results in multiple invocations of flashrom?
> is this something we need to account for now

It's not a terribly critical bug, but it was pretty annoying to track down and the fix would be easy.  If nobody else wants to do it I might just spend the hour to code it up so I don't have to deal with this again.


> where do we call flashrom all over again?

Right, it seems like crossystem -all keep reading flashrom over again over again.  I guess maybe it's not a big deal if flashrom is fast, but when it's slow like this it causes problems...


I guess the question is: where is the binary search algorithm useful?  It seems like any time we do a significant amount of binary searching flashrom will be slow and thus crossystem will also be slow.  Could this be something affecting real users?


the binary search is slow because the assumption is that fmap can be placed anywhere in the flash. We (chrome os) pretty much always align it at a few KB boundary (citation needed!), so we can just either give up if fmap is not found at say 4K granularity, or read the entire flashrom at that point.

We could generate the layout map on first flashrom invocation and then have the rest use the generated layout instead of looking for it again and again.

Maybe the layout file name could be put in an environment variable and then flashrom would use that file if the variable is set.

I have been trying to improve flashrom for CCD use, I'll look into these enhancements too.
> We (chrome os) pretty much always align it at a few KB boundary (citation needed!)

Maybe we could make a parameter for flashrom to disable the binary search?
> We could generate the layout map on first flashrom invocation and then have the rest use the generated layout instead of looking for it again and again.

What would you store if you have *no* FMAP (per $subject)? Also, you'd need some validation/expiration policy, if you want to support a change in FMAP (e.g., eventually we program a proper FMAP to the flash).

BTW, the funny thing is that flashrom already has a sort of caching methodology, but it only works when an FMAP is actually present. We check the output of 'crossystem fmap_base' to see if there's a *suggested* offset, and we check that first. If the FMAP is there, we're done; if not, only *then* do we fall back to the expensive binary search.

'crossystem fmap_base' gets its info from /proc/device-tree/firmware/chromeos/fmap-offset or similar (which is stashed by the BIOS).


> Maybe we could make a parameter for flashrom to disable the binary search?

And do what instead? Just look at 4KB strides? That probably would work. It's probably worth checking if that is fast enough for you.
> What would you store if you have *no* FMAP (per $subject)?

an invocation requesting to use a certain section should fail in this case (and is probably failing today, I have not checked).

> Maybe we could make a parameter for flashrom to disable the binary search?

or maybe make a parameter to specify the finest granularity for this search.

Also, note that I made a flashrom change recently where the fmap is looked up in the supplied binary image first, and only if not found falls back to looking in the resident image.

This helps a lot in case of write operations when the supplied image is a proper chrome os bios blob.


> an invocation requesting to use a certain section should fail in this case (and is probably failing today, I have not checked).

Yes...but only after failing to find an FMAP. The problem is that we want to optimize the currently-glacial process used to find that there is no FMAP. I'm just saying that I don't think your suggestion (as I understand it) would really improve things.

> or maybe make a parameter to specify the finest granularity for this search.

Sure.

> Also, note that I made a flashrom change recently where the fmap is looked up in the supplied binary image first, and only if not found falls back to looking in the resident image.

Doesn't really affect the original report, which is just about reading regions via mosys/crossystem. Those wouldn't provide a binary image to flash.
Blocking: 883029
Is there anything we could do abou this. This looks like a popular reason to cause chrome to fail liveness ping and get killed (issue 883029).

http://crash/26e3770dd0a3d093

And "top" output when session_manager kills chrome:
====
top - 18:30:19 up 0 min,  0 users,  load average: 5.54, 1.39, 0.46
Tasks: 139 total,   6 running, 132 sleeping,   0 stopped,   1 zombie
%Cpu0  : 64.9 us, 32.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  2.7 si,  0.0 st
%Cpu1  : 36.1 us, 63.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1877.1 total,   1069.4 free,    363.9 used,    443.8 buff/cache
MiB Swap:   2749.7 total,   2749.7 free,      0.0 used.   1155.6 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1334 root      20   0   13224   1232    856 R  82.6   0.1   0:01.40 flashrom
  895 chronos   20   0  594008  92596  24132 R  52.2   4.8   0:00.74 chrome
  658 chronos   12  -8  848404 232488  86908 R  39.1  12.1   0:01.65 chrome
  877 chronos   12  -8  318912  37712  15112 R  21.7   2.0   0:00.27 chrome
   24 root      20   0       0      0      0 S  13.0   0.0   0:00.03 kworker/1:1
   10 root      20   0       0      0      0 R   8.7   0.0   0:00.05 rcu_sched
  353 syslog    20   0  297212   2480   1576 S   8.7   0.1   0:00.05 rsyslogd
  817 root      20   0       0      0      0 S   4.3   0.0   0:00.12 kworker/0:2
 1457 root      20   0   11400   1228    936 S   4.3   0.1   0:00.02 mosys
    1 root      20   0   12024   2504   1340 S   0.0   0.1   0:00.29 init
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/0
    4 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kworker/0:0
    5 root       0 -20       0      0      0 S   0.0   0.0   0:00.00 kworker/0:0H
    6 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kworker/u:0
    7 root       0 -20       0      0      0 S   0.0   0.0   0:00.00 kworker/u:0H
    8 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0
    9 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_bh

without looking into much details, when flashrom is invoked to read a section from the chip, it tries to read the fmap from the chip to find the section boundary. If the chip is erased, looking for fmap takes very long time, because it is done as a binary search ending up reading the entire flash.


is it known what exact flashrom invocation this is caused by?
How to find that out? Would full command line to flashrom help?
yes, this is what I meant to ask: full flash command line.
I could not repro it locally. The "top" output is collected when the crash happens in the field.

I will update the collection code. Last time, it took about 1 week to get the data. I will ping back once I get something.
Got data back. The flashrom command line looks like the following:

flashrom -p host -i FMAP -i RW_VPD:/tmp/vpd.flashrom.7eUpZE -r /tmp/vpd.flashrom.kyK2kO

in http://crash/45ba8a29fc21f977.

Trimmed top output:
====
op - 10:59:15 up 0 min,  0 users,  load average: 5.71, 1.42, 0.48
Tasks: 145 total,   3 running, 141 sleeping,   0 stopped,   1 zombie
%Cpu(s): 68.5 us, 31.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1877.1 total,   1024.4 free,    298.4 used,    554.4 buff/cache
MiB Swap:   2749.7 total,   2749.7 free,      0.0 used.   1142.0 avail Mem 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1358 root      20   0   21480   3440    932 R  44.0   0.2   0:03.15 flashrom -p host -i FMAP -i RW_VPD:/tmp/vpd.flashrom.7eUpZE -r /tmp/vpd.flashrom.kyK2kO
  677 chronos   12  -8  753840 138540  65704 S  32.0   7.2   0:01.54 /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=31.0.0.148 --ui-prioritize-in-gpu-process --use-gl=egl --enable-native-gpu-memory-buffers --gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --enable-features=MachineLearningService --login-profile=user --has-chromeos-keyboard --aura-legacy-power-bu+
  914 chronos   20   0  587096  88120  21644 R  24.0   4.6   0:00.51 /opt/google/chrome/chrome --type=renderer --enable-logging --log-level=1 --use-gl=egl --vmodule=nss_cert_database_chromeos=1,*/assistant/*=1,*chromeos/login/*=1,auto_enrollment_controller=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,update_engine=1,component_updater_service=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 --field-trial-handle=1766717298446023+
  954 chronos   20   0  595264  77936  19020 S  24.0   4.1   0:00.50 /opt/google/chrome/chrome --type=renderer --enable-logging --log-level=1 --use-gl=egl --vmodule=nss_cert_database_chromeos=1,*/assistant/*=1,*chromeos/login/*=1,auto_enrollment_controller=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,update_engine=1,component_updater_service=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 --field-trial-handle=1766717298446023+
   10 root      20   0       0      0      0 S   4.0   0.0   0:00.04 [rcu_sched]
   15 root      20   0       0      0      0 S   4.0   0.0   0:00.21 [kworker/1:0]
   61 root       0 -20       0      0      0 S   4.0   0.0   0:00.06 [kworker/0:1H]
  884 chronos   12  -8  373480  78956  58816 S   4.0   4.1   0:00.51 /opt/google/chrome/chrome --type=gpu-process --field-trial-handle=17667172984460237150,14811314750318420328,131072 --enable-features=MachineLearningService --gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --vmodule=nss_cert_database_chromeos=1,*/assistant/*=1,*chromeos/login/*=1,auto_enrollment_controller=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,update_engine=1,component_updater_service=1,power_bu+
 1580 root      20   0    9544   1404    980 R   4.0   0.1   0:00.01 top -b -c -n1 -w512
    1 root      20   0   12208   2628   1372 S   0.0   0.1   0:00.30 /sbin/init
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [kthreadd]
    3 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [ksoftirqd/0]
    4 root      20   0       0      0      0 S   0.0   0.0   0:00.00 [kworker/0:0]
    5 root       0 -20       0      0      0 S   0.0   0.0   0:00.00 [kworker/0:0H]
Seems like the command line is constructed in flashromPartialRead with partition_name "RW_VPD" ?

https://cs.corp.google.com/chromeos_public/src/platform/vpd/lib/flashrom.c?rcl=4649d42d10a7fa32f18da7e77e019891d1d7d940&l=63
Presumably a firmware image could be found on the target in the chromeos-firmwareupdate self extracting tarball, dump_fmap could be used to extract fmap from it, it can be converted into a layout file which flashrom can use. In this case it would not try retrieving layout from the currently installed flash image.

We could also have the layout generated at build time and added to the chrome os image.
Thanks for the quick response. Is there anything we could do about it? Or we need more info the identify where the problem is?

This has been around a while and now contributing to the current top #1 crash of beta release.
Pardon my ignorance, but how come there are many devices with their onboard SPI flash not programmed? Otherwise this problem would not show up, right?
Sorry, I knew almost nothing about firmware and flash update so not sure why we start to hit this problem.

I was looking at the spike of SIGABRT crash of chrome and found part of the crash is due to chrome fails to respond to the liveness ping. However, the chrome stack in the crash reports shows that chrome is idle at the time of the kill. After I added "top" dump when we kill chrome, I found flashrom is pretty common when it happens. I am here in the hope that people cc'ed on the bug could shed some light on the situation.
re #21 but the problem this bug started with was specific for chrome os devices where the firmware SPI flash is empty, which is extremely unlikely to happen anywhere but in development environment.

maybe the crashes you are looking ag are happening for a different reason.
Re #22: Fair enough. Filed issue 910411 and cc'ed you. Please help to cc relevant people and even better to find an owner for the new issue. And let me know what I should look at to find more clues for you guys. Thanks.
Blocking: -883029

Sign in to add a comment