veyron: unable to install R44 FSI (7077.111.1) on R59 (9390.0.0)
Issue description
WARNING: Primary GPT header is invalid
Segmentation fault (core dumped)
[0323/120249:ERROR:postinstall_runner_action.cc(291)] Postinst command failed with code: 139
[0323/120249:ERROR:postinstall_runner_action.cc(328)] Postinstall action failed.
--
Looking at the core dump with the 7077.111.1 debug symbols, we fail in:
bt
#0 memcmp (s1=0x80, s2=0x8097d <guid_chromeos_kernel>, len=16) at memcmp.c:328
#1 0x000135c8 in GuidEqual ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(I am guessing gdb fails to unwind the stack properly because of compiler optimization.)
On my machine, the primary GPT header is marked as IGNOREME (dump in veyron_begin):
cgpt show /dev/mmcblk0 | less
WARNING: Primary GPT header is being ignored
start size part contents
0 1 PMBR (Boot GUID: 304BCEA3-F781-7949-964C-0CD0F918D42C)
1 1 IGNORED Pri GPT header
30785503 32 Sec GPT table
8671232 22081536 1 Label: "STATE"
Type: Linux data
Looking into vboot_reference history, IGNOREME was introduced in c//340072, which landed after R44.
We have to make sure the FSI images we use in the lab on veyron machines are post-R44.
Mar 23 2017
And why didn't this show up until a few days ago? Shouldn't it have shown up in R44 testing?
Mar 23 2017
See issue 599960 for context. TL;DR: We made an irreversible change to the eMMCs on Veyron devices to mitigate a security vulnerability. Veyron images older than a certain version (IIRC R53?) can no longer be installed on those systems. In the field, any attempt to do so should already be caught by rollback prevention, but I guess in the lab we clear the rollback flags, so that won't help. I tried to keep this from affecting the lab by making sure the eMMC change is never applied on lab devices (by checking 'crossystem debug', which is always 1 for test images)... and that seems to have worked quite well if this is the first time you've hit a problem with it. However, if you add Veyron devices to the lab that were previously used as consumer/dogfood units (or newer ones that come out of the factory like that, depending on the details of the factory process), you will eventually get some units that already have this change applied before you reimage them with a test image. Since the version that added support for this has been stable for a long time, I hope this won't cause a real issue (we're not testing anything older than the current stable channel anyway, right?). Just make sure all the images you use are new enough.
Mar 23 2017
The issue is that we continue to test our ability to update from every FSI version to the current release. To do that, we have to be able to install the FSI version as preparation for the update test. We can block selected FSI versions from testing because of previous issues, but I really hate to use that, since a critical bug in an untested update path would leave users with recovery as their only way to keep a device (probably brand new, out of an old box) working.
Mar 23 2017
The TPMs are the ones who can do the test blocking for these boards via GE. I'd like to ask that we try to make sure each board continues to have at least one FSI listed, even if we have to add a pretend FSI version for this purpose.
Mar 23 2017
Also, this only applies to veyron, right?
Mar 23 2017
Yes, this only applies to Veyron. Restricting the old FSIs is the easiest solution and I think should be good enough. If you really want something better, it's also possible to continue running your FSI tests as long as you make sure that you only run them on DUTs whose eMMCs were never permanently write-protected. I guess you could set up a special lab pool for those or something and then make sure the devices in there will never run non-test images. (You can use any sufficiently recent version of cgpt to identify whether the protection is applied... just look for the "IGNOREME".) Like I said, I'm not sure if newer Veyrons already come out of the factory like this, so you'd have to deal with the ones you (or Chromestop) already have. For the devices so new that they've always had this (Fievel and Tiger), their FSIs should also all be new enough to support it.
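A minimal Python sketch of the screening idea above: it looks for the markers visible in the `cgpt show` dump earlier in this bug (the "IGNORED" row and the "Primary GPT header is being ignored" warning) plus the literal "IGNOREME". The helper names, the default device path /dev/mmcblk0 (taken from the dump), and the assumption that these strings are stable across cgpt versions are all illustrative, not a supported interface.

```python
import subprocess

# Markers copied from the cgpt dump in this bug, plus the "IGNOREME" name.
# Relying on these strings is an assumption about cgpt's output format.
IGNORE_MARKERS = ("IGNOREME", "IGNORED", "Primary GPT header is being ignored")


def primary_gpt_ignored(cgpt_output: str) -> bool:
    """Return True if `cgpt show` output suggests the primary GPT is ignored."""
    return any(marker in cgpt_output for marker in IGNORE_MARKERS)


def device_has_protection(device: str = "/dev/mmcblk0") -> bool:
    """Hypothetical helper: run cgpt on the DUT and inspect its output.

    Assumes a sufficiently recent `cgpt` is on PATH and `device` is readable.
    """
    result = subprocess.run(
        ["cgpt", "show", device],
        capture_output=True,
        text=True,
        check=False,
    )
    # The warning may be printed to stderr, so check both streams.
    return primary_gpt_ignored(result.stdout + result.stderr)


if __name__ == "__main__":
    sample = (
        "WARNING: Primary GPT header is being ignored\n"
        "  start  size  part  contents\n"
        "      0     1        PMBR\n"
        "      1     1        IGNORED Pri GPT header\n"
    )
    print(primary_gpt_ignored(sample))  # True
```

A lab pool could then admit only DUTs for which `device_has_protection()` returns False.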
Mar 23 2017
> Restricting the old FSIs is the easiest solution and I think should be good enough.
Yes, that'll be the way to go in this case.
> If you really want something better, [ ... ] make sure that you only run them
> on DUTs whose eMMCs were never permanently write-protected.
Yup, but that's not really viable. Basically, I think we're stuck with blacklisting the older builds and taking our chances.
Mar 24 2017
To be clear for the TPMs: we need to uncheck "Is FSI Lab stable?" for every veyron-derived FSI from R44 or earlier.
Mar 24 2017
> from R44 or earlier.
This should be R51 or earlier. Or, more precisely (since the change was backported to R51 after the branch point), all FSIs older than 8244.0.0, or older than 8172.59.0 for builds on the R51 branch.
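The precise cutoff above can be written as a version check. This is a hedged Python sketch that assumes veyron FSI versions are plain BUILD.BRANCH.PATCH strings and that every 8172.x.x build is on the R51 branch; the function names are hypothetical and nothing here reflects how GE actually stores FSI entries.

```python
from typing import List, Tuple

# Thresholds from the comment: support landed on the main branch at 8244.0.0
# and was backported to the R51 branch (assumed to be build 8172) at 8172.59.0.
MAIN_THRESHOLD = (8244, 0, 0)
R51_BUILD = 8172
R51_THRESHOLD = (8172, 59, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split "BUILD.BRANCH.PATCH" into a tuple of ints for ordered comparison."""
    build, branch, patch = (int(part) for part in version.split("."))
    return (build, branch, patch)


def supports_ignoreme(version: str) -> bool:
    """True if this image should install on an eMMC marked IGNOREME."""
    v = parse_version(version)
    if v[0] == R51_BUILD:
        return v >= R51_THRESHOLD
    return v >= MAIN_THRESHOLD


def fsis_to_unmark(fsi_versions: List[str]) -> List[str]:
    """FSIs whose "Is FSI Lab stable?" box would be unchecked for veyron."""
    return [v for v in fsi_versions if not supports_ignoreme(v)]


print(fsis_to_unmark(["7077.111.1", "8172.58.0", "8172.59.0", "8530.96.0", "9390.0.0"]))
# ['7077.111.1', '8172.58.0']
```

Comparing tuples keeps the ordering numeric, so 8172.59.0 correctly sorts after 8172.58.0, where a plain string comparison could go wrong for multi-digit components.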
Mar 24 2017
As expected, upgrading to the R53 FSI image (8530.96.0) works; it supports IGNOREME. [However, the image crashes on my veyron. There is a newer FSI image anyway.] I will use samus, another machine that has post-R53 FSI images.
Mar 24 2017
Let's give this bug to the TPMs to do the blacklisting.
Mar 24 2017
OK, I'll go ahead and mark the pre-M53 FSIs as not stable for veyron devices. The chance of running into any issues is small, since I have already declared M53 as an "FSI" for these devices, so the update path for any pre-M53 unit will look like this:
a) pre-M53 -> M53 <--- This transition was already tested when we pushed M53.
b) M53 -> Latest <--- This delta transition will be tested on each release, since M53 is set as an "FSI".
I'll update the bug once all veyron boards are set to not stable.
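To make the stepping-stone reasoning concrete, here is a simplified Python model of the update path described above. It treats milestones as integers and each update as a single hop; real update-engine/Omaha behavior is not modeled, and all names are hypothetical. It shows that once every pre-M53 device is routed through M53, the only hops a new release has to re-test start at M53 or later.

```python
from typing import List, Tuple


def update_path(fsi: int, stepping_stone: int, latest: int) -> List[int]:
    """Milestones a device visits when updating from its FSI (simplified)."""
    path = [fsi]
    if fsi < stepping_stone:
        # Hop a): pre-stepping-stone -> stepping stone, tested once when it shipped.
        path.append(stepping_stone)
    if path[-1] < latest:
        # Hop b): stepping stone (or newer FSI) -> latest, re-tested every release.
        path.append(latest)
    return path


def hops_needing_release_testing(
    fsis: List[int], stepping_stone: int, latest: int
) -> List[Tuple[int, int]]:
    """Distinct final hops into `latest`, i.e. what each new release must test."""
    hops = set()
    for fsi in fsis:
        path = update_path(fsi, stepping_stone, latest)
        if len(path) >= 2:
            hops.add((path[-2], path[-1]))
    return sorted(hops)
```

For example, with FSIs at M44, M51, M53 and M55 and a latest release of M59, `hops_needing_release_testing([44, 51, 53, 55], 53, 59)` yields only the M53 -> M59 and M55 -> M59 hops; nothing starting before M53 needs per-release testing.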
Mar 24 2017
Oh, we have stepping stones? I didn't think we'd deployed any yet. It's always safe to stop testing from versions older than a stepping stone.
Comment 1 by dgarr...@chromium.org, Mar 23 2017