invalid opcode from chrome process on braswell boards
Issue description
CQ fails due to provisioning failure 7 cycles in a row.
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927284551986369504
In this cycle, cyan-paladin, wizpig-paladin, and edgar-paladin all fail with the same error:
TestLabFailure: ** HWTest did not complete due to infrastructure issues (code 3) **
In other cycles, link-paladin and sentry-paladin also failed. This might be related to issue 771257, but it is not limited to wizpig-paladin. Maybe there is a network issue in the lab? Can anyone from Infra look into the issue?
,
Dec 13
I haven't looked at all the failures yet, but the first one I looked at (edgar) at https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927283469920808384 is failing with "Failure in build R73-11395.0.0-rc1: Chrome failed to reach login screen", and is dumping all sorts of crash logs. See for instance https://stainless.corp.google.com/browse/chromeos-autotest-results/266572306-chromeos-test/
,
Dec 13
,
Dec 13
-> sheriff
,
Dec 13
,
Dec 13
Have we seen this on the canaries yet? If so, that would rule out a bad CL in the current CQ being at fault.
,
Dec 13
^ can't tell, canaries are broken prior to hwtest due to Issue 914705
,
Dec 13
FYI: I looked into the logs on cyan-paladin. The same "chromeos-chrome-73.0.3638.0_rc-r1" is used for both successful and failed builds.
good:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927352051002358192
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927338878729688624
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927322974143567712
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927307053279764608
bad:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927294403103862944
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927294403103862944
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927270341872315488
Chrome has been uprev'd to chromeos-chrome-73.0.3638.0_rc-r1. It passed once, but the next build failed.
,
Dec 13
Looks like it started on one of the two builds below:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927477447245955376
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927472092515883456
Chrome wasn't uprev'd during this transition (73.0.3636.0 rc1), so it's probably a Chrome OS-side issue?
,
Dec 13
In all of the failures I've looked at, the skipped chrome coredump is immediately preceded by invalid opcode traps from chrome (sub)processes. Some examples (blank lines separate different runs / different dmesg logs):

[ 189.481125] do_trap: 3 callbacks suppressed
[ 189.481144] traps: chrome[12723] trap invalid opcode ip:5ee6f27488f3 sp:7ffd5aacfa80 error:0 in chrome[5ee6edc00000+6c71000]
[ 189.487410] traps: chrome[12736] trap invalid opcode ip:5ee6f27488f3 sp:7ffd5aacfa80 error:0 in chrome[5ee6edc00000+6c71000]
[ 189.503536] traps: CompositorTileW[12684] trap divide error ip:616d7cbc917c sp:711123b6ddf0 error:0 in chrome[616d78cac000+8c1a000]
[ 189.548173] traps: chrome[12737] trap invalid opcode ip:5ee6f27488f3 sp:7ffd5aacfa80 error:0 in chrome[5ee6edc00000+6c71000]
[ 189.559921] traps: chrome[12743] trap invalid opcode ip:5ee6f27488f3 sp:7ffd5aacfa80 error:0 in chrome[5ee6edc00000+6c71000]
[ 189.560187] Pid 1(chrome) over core_pipe_limit
[ 189.560198] Skipping core dump

[ 23.535511] traps: CompositorTileW[1760] trap divide error ip:5a99e869616c sp:7525c2864da0 error:0 in chrome[5a99e486a000+9bb5000]
[ 23.559163] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 23.571081] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 23.921045] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[ 24.090179] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[ 24.105967] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[ 25.134436] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[ 25.139315] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[ 25.163586] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[ 25.194459] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[ 25.269128] traps: chrome[1918] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[ 25.269259] traps: chrome[1911] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[ 25.288725] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[ 25.289835] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 25.296731] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 25.296939] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 25.302272] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[ 25.302951] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 25.330112] traps: chrome[1909] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[ 25.354729] traps: chrome[1931] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[ 25.380869] traps: CompositorTileW[1869] trap divide error ip:61a472e5016c sp:7e0e0f6b4da0 error:0 in chrome[61a46f024000+9bb5000]
[ 25.423163] traps: chrome[1941] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[ 25.431023] traps: chrome[1943] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[ 25.431198] Pid 1(chrome) over core_pipe_limit
[ 25.431209] Skipping core dump
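
For anyone comparing these dmesg logs across runs, here is a rough Python sketch that pulls out the "traps:" lines and turns each faulting ip into an offset within the mapped binary (ip minus the mapping base), which is the number to compare within one build because the load address changes per process. The names parse_trap, summarize, and Trap are made up for illustration, and the regex only assumes the line format seen in the logs above; it is not an official tool.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Matches kernel lines like the ones quoted in this thread:
#   traps: chrome[12723] trap invalid opcode ip:... sp:... error:0 in chrome[base+size]
TRAP_RE = re.compile(
    r"traps: (?P<comm>[^\[\s]+)\[(?P<pid>\d+)\] trap (?P<trap>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>\d+) "
    r"in (?P<image>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


class Trap(NamedTuple):
    comm: str    # thread/process name reported by the kernel
    pid: int
    trap: str    # e.g. "invalid opcode" or "divide error"
    image: str   # mapped file the ip falls in
    offset: int  # ip relative to the start of the mapping


def parse_trap(line: str) -> Optional[Trap]:
    """Return a Trap for a 'traps:' dmesg line, or None for any other line."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    offset = int(m["ip"], 16) - int(m["base"], 16)
    return Trap(m["comm"], int(m["pid"]), m["trap"], m["image"], offset)


def summarize(lines: Iterable[str]) -> Counter:
    """Count traps by (comm, trap kind, hex offset) so repeats stand out."""
    counts: Counter = Counter()
    for line in lines:
        t = parse_trap(line)
        if t is not None:
            counts[(t.comm, t.trap, hex(t.offset))] += 1
    return counts
```

Feeding a saved dmesg file through summarize(open(path)) would collapse the repeated chrome traps into one entry per distinct offset, and that offset is what a symbolizer would need together with the matching chrome binary.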
,
Dec 13
Re: #10 - The first failure looks unrelated to this, it's in uprev prior to DUT provisioning.
,
Dec 13
caveh@ found https://chromium-review.googlesource.com/c/chromiumos/overlays/board-overlays/+/1351133 which seems highly suspicious as a root cause. He is setting it to verify -1 and we will see how the next CQ run goes.
,
Dec 13
,
Dec 13
,
Dec 14
I had tested celes, which is from the same family as edgar, and that worked fine. Trying to see if I can repro on edgar. Changing to P1 since the next edgar runs look fine.
,
Dec 14
,
Dec 14
Launched a few tryjobs with HWTests:
edgar: All HWTests passed: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927213768345951664
celes: All HWTests passed (except 2 ARC++ tests): https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927208281494186576
,
Dec 14
I flashed the image from https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927213768345951664 on an edgar device locally and it works fine. So I am not sure why the CQ is hitting this problem.
Comment 1 by jclinton@chromium.org, Dec 13
Owner: akes...@chromium.org
Status: Assigned (was: Untriaged)