
Issue 914717

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug
Build-Toolchain




invalid opcode from chrome process on braswell boards

Project Member Reported by fukino@chromium.org, Dec 13

Issue description

CQ has failed due to provisioning failures 7 cycles in a row.

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927284551986369504
In this cycle, cyan-paladin, wizpig-paladin, and edgar-paladin all fail with the same failure:
  TestLabFailure: ** HWTest did not complete due to infrastructure issues (code 3) **

In other cycles, link-paladin and sentry-paladin also failed.
This might be related to issue 771257, but it is not limited to wizpig-paladin.

Maybe there is a network issue in the lab?
Can anyone from Infra look into this?
 
Components: -Infra>Client>ChromeOS>CI Infra>Client>ChromeOS>Test
Owner: akes...@chromium.org
Status: Assigned (was: Untriaged)
I haven't looked at all the failures yet, but the first one I looked at (edgar) at https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927283469920808384 is failing with "Failure in build R73-11395.0.0-rc1: Chrome failed to reach login screen", and is dumping all sorts of crash logs. See for instance https://stainless.corp.google.com/browse/chromeos-autotest-results/266572306-chromeos-test/
Summary: CQ fails frequently with "Chrome failed to reach login screen" (was: CQ continuously fails in "The HWTest [provision] stage".)
Cc: -pho...@chromium.org osh...@chromium.org
Owner: matth...@chromium.org
-> sheriff
Labels: -Pri-1 Pri-0
Have we seen this on the canaries yet? If so, that would rule out a bad CL in the current CQ being at fault.
^ can't tell, canaries are broken prior to hwtest due to Issue 914705
Looks like it started on one of the two builds below:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927477447245955376
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927472092515883456

Chrome wasn't uprevved during this transition (73.0.3636.0 rc1), so it's probably a Chrome OS side issue?

In all of the failures I've looked at, the skipped chrome core dump is immediately preceded by invalid opcode traps from chrome (sub)processes. Some examples (blank lines separate different runs / different dmesg logs):


[  189.481125] do_trap: 3 callbacks suppressed
[  189.481144] traps: chrome[12723] trap invalid opcode ip:5ee6f27488f3 sp:7ffd5aacfa80 error:0 in chrome[5ee6edc00000+6c71000]
[  189.487410] traps: chrome[12736] trap invalid opcode ip:5ee6f27488f3 sp:7ffd5aacfa80 error:0 in chrome[5ee6edc00000+6c71000]
[  189.503536] traps: CompositorTileW[12684] trap divide error ip:616d7cbc917c sp:711123b6ddf0 error:0 in chrome[616d78cac000+8c1a000]
[  189.548173] traps: chrome[12737] trap invalid opcode ip:5ee6f27488f3 sp:7ffd5aacfa80 error:0 in chrome[5ee6edc00000+6c71000]
[  189.559921] traps: chrome[12743] trap invalid opcode ip:5ee6f27488f3 sp:7ffd5aacfa80 error:0 in chrome[5ee6edc00000+6c71000]
[  189.560187] Pid 1(chrome) over core_pipe_limit
[  189.560198] Skipping core dump


[   23.535511] traps: CompositorTileW[1760] trap divide error ip:5a99e869616c sp:7525c2864da0 error:0 in chrome[5a99e486a000+9bb5000]
[   23.559163] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[   23.571081] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[   23.921045] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[   24.090179] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[   24.105967] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[   25.134436] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[   25.139315] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[   25.163586] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[   25.194459] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[   25.269128] traps: chrome[1918] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[   25.269259] traps: chrome[1911] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[   25.288725] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[   25.289835] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[   25.296731] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[   25.296939] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[   25.302272] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[   25.302951] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[   25.330112] traps: chrome[1909] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[   25.354729] traps: chrome[1931] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[   25.380869] traps: CompositorTileW[1869] trap divide error ip:61a472e5016c sp:7e0e0f6b4da0 error:0 in chrome[61a46f024000+9bb5000]
[   25.423163] traps: chrome[1941] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[   25.431023] traps: chrome[1943] trap invalid opcode ip:5eb0dfd30d93 sp:7ffdcb170b00 error:0 in chrome[5eb0dba00000+7c8e000]
[   25.431198] Pid 1(chrome) over core_pipe_limit
[   25.431209] Skipping core dump
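
To compare runs, it can help to turn each trap line into an offset inside the chrome mapping (ip minus the mapping base), since the absolute addresses change with ASLR. The following is only a rough sketch, assuming the dmesg format shown above; the names `TRAP_RE`, `parse_trap`, and `summarize` are made up for illustration and are not part of any Chrome OS tooling.

```python
import re
from collections import Counter

# Matches kernel lines of the form seen in the logs above:
#   traps: <comm>[<pid>] trap <kind> ip:<hex> sp:<hex> error:<n> in <image>[<base>+<size>]
TRAP_RE = re.compile(
    r"traps: (?P<comm>[^\[\s]+)\[(?P<pid>\d+)\] trap (?P<kind>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+) "
    r"in (?P<image>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap(line):
    """Return a dict describing one trap line, or None if the line is not a trap."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "kind": m["kind"],
        "image": m["image"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
    }


def summarize(lines):
    """Count traps grouped by (kind, image, offset-within-mapping)."""
    counts = Counter()
    for line in lines:
        trap = parse_trap(line)
        if trap is not None:
            counts[(trap["kind"], trap["image"], hex(trap["offset"]))] += 1
    return counts


# Three lines copied from the first dmesg excerpt above.
SAMPLE = [
    "[  189.481144] traps: chrome[12723] trap invalid opcode ip:5ee6f27488f3 sp:7ffd5aacfa80 error:0 in chrome[5ee6edc00000+6c71000]",
    "[  189.487410] traps: chrome[12736] trap invalid opcode ip:5ee6f27488f3 sp:7ffd5aacfa80 error:0 in chrome[5ee6edc00000+6c71000]",
    "[  189.503536] traps: CompositorTileW[12684] trap divide error ip:616d7cbc917c sp:711123b6ddf0 error:0 in chrome[616d78cac000+8c1a000]",
]

if __name__ == "__main__":
    for key, count in summarize(SAMPLE).items():
        print(key, count)
```

Within each run above, every invalid opcode trap reports the same ip, so each run collapses to a single offset. That offset is relative to the mapping printed by the kernel, which may not be the start of the file, so any symbolization against a matching unstripped chrome binary would need to account for the mapping's file offset.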
Re: #10 - The first failure looks unrelated to this, it's in uprev prior to DUT provisioning.
Cc: manojgupta@chromium.org
caveh@ found https://chromium-review.googlesource.com/c/chromiumos/overlays/board-overlays/+/1351133 which seems highly suspicious as a root cause. He is setting it to verify -1 and we will see how the next CQ run goes.
Owner: manojgupta@chromium.org
Summary: invalid opcode from chrome process on braswell boards (was: CQ fails frequently with "Chrome failed to reach login screen")
Labels: -Pri-0 Pri-1
I had tested celes, which is from the same family as edgar, and that worked fine.
Trying to see if I can repro on edgar.

Changing to P1 since the next edgar runs look fine.
Components: Tools>ChromeOS-Toolchain
I flashed the image from https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8927213768345951664 on an edgar device locally and it works fine.


So, I am not sure why CQ is hitting this problem.
