Issue 830321

Starred by 24 users

Issue metadata

Status: Verified
Owner:
Closed: Apr 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 830865




M67: Caroline and Terra builds are RED; Chrome crashes at boot

Project Member Reported by kkan...@chromium.org, Apr 9 2018

Issue description

Cc: jkop@chromium.org
Components: -Infra>Client>ChromeOS
Labels: -Pri-1 Pri-0
Owner: alemate@chromium.org
Status: Assigned (was: Untriaged)
This is not an infrastructure bug.

Looking at this build:
    https://uberchromegw.corp.google.com/i/chromeos/builders/caroline-release/builds/1637

You see that the build failed with this error message:
    " ... After update and reboot, Chrome failed to reach login screen within 180 seconds, ..."

That message is caused by a Chrome bug, so the gardener gets the
task.

P0, because caroline is DOA, which will hold up releases.
Summary: M67: Caroline and Terra builds are RED; Chrome crashes at boot (was: M67: Caroline build is RED for the last 1 week Starting R67-10530.0.0)

Comment 4 by dchan@chromium.org, Apr 9 2018

is this dup of https://bugs.chromium.org/p/chromium/issues/detail?id=826163 ?

based on c#33, we need a new chrome to fix the crash loop.
> is this dup of https://bugs.chromium.org/p/chromium/issues/detail?id=826163 ?

For purposes of bug tracking: NO.  The prior bug was fixed, and
caroline and terra both turned green at the time of the fix.  This
is a new failure, requiring a new bug.

It could turn out that this new failure is related to the old failure
in some way, but this would still be a new bug for all that.

Note:  I've checked caroline-chrome-pfq and terra-chrome-pfq:  Neither
of them is showing this failure.  That's highly suspicious.  Really, it
shouldn't be possible.

This is a Chrome failure, but the failure isn't due to a change
in Chrome.  The last green caroline build was here:
    https://uberchromegw.corp.google.com/i/chromeos/builders/caroline-release/builds/1606

The first red build was here:
    https://uberchromegw.corp.google.com/i/chromeos/builders/caroline-release/builds/1607

Both of those builds used Chrome 67.0.3383.0.

The most obvious explanation would be a Chrome OS change that broke
Chrome.  However, that would cause the PFQ to go red, and that hasn't
happened.  So, we have a mystery on our hands.

FTR, cros blamelist between the green/red release build: https://crosland.corp.google.com/log/10529.0.0..10530.0.0

Comment 9 by jkop@chromium.org, Apr 9 2018

Cc: uekawa@chromium.org pbe...@chromium.org alemate@chromium.org sheckylin@chromium.org cmtm@chromium.org dianwa@chromium.org jhorwich@chromium.org hashimoto@chromium.org
Adding current rotations.

Comment 10 by jkop@chromium.org, Apr 9 2018

alemate@, when you've diagnosed the cause, please create a second bug (assigned to me) for how it got through the PFQ, with whatever info you gathered that's relevant.
This bug could be related to Issue 825425?

Cc: ka...@chromium.org sontis@chromium.org
Cc: newcomer@chromium.org
> This bug could be related to Issue 825425?

Looking at the history of bug 825425, that bug is both bug 826163
and this bug.  Recent debug on 825425 (anything after about 3/29) is
probably relevant here.

Please note (in case it's not clear):  This is likely an OS bug that
causes Chrome to crash.  But we need to study the Chrome crash in order
to point the finger somewhere in the OS.

Cc: hoegsberg@chromium.org dcasta...@chromium.org marc...@chromium.org mcasas@chromium.org
I also think this looks like issue 825425 (which is actually issue 827188).

CCing graphics folks.
Labels: ReleaseBlock-Dev
Tagging as a M67 blocker for caroline and terra
Labels: -Restrict-View-Google
This needs to be visible to users (see bug 826163), and there's nothing
secret here.  So, dropping RVG.

I tried 10562 on caroline and it does bring up part of the UI (background image and bottom menu), but not the login prompt in the middle of the screen.

I'm getting this in /var/log/ui/ui.LATEST:

[1566:1566:0410/095306.047447:ERROR:input_method_manager_impl.cc(1080)] IMEEngine for "jkghodnilhceideoidjikpgommlajknk" is not registered
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
[1566:1566:0410/095308.526384:FATAL:login_display_host_webui.cc(841)] Renderer crash on login window

Unexpected crash report id length
System crash_reporter failed to process crash report.
Report Id: 


Blockedon: 830865
I spent the whole day yesterday trying to reproduce this locally, and I could not.

Basically, the dev image with Chrome 67.0.3390.0 on Chrome OS 10561.0.0 cannot start Chrome.
When I deploy a locally built Chrome using the Simple Chrome workflow and sdk --version 10561.0.0, it works OK, no failures.
You can repro after deploying if you reboot.
Re #21: I cannot reproduce this. Are you sure you are not booting to the previous version?
Hm, yes, I may have been doing that...
FTR, yes, I was passing

  --target-dir=/usr/local/chrome --mount-dir=/opt/google/chrome --nostrip

to deploy_chrome and that gets unmounted on reboot, of course. Rebuilding and deploying 3390 without these options, I can't repro either.
Components: OS>Kernel>Graphics
I'm tempted to think that  crbug.com/831649  is similar/same? At least the error on chromeos4-row9-rack9-host2 for https://luci-milo.appspot.com/buildbot/chromeos/veyron_minnie-chrome-pfq/2955 is identical.
And the same for the last few peach_pit-chrome-pfq runs: https://luci-milo.appspot.com/buildbot/chromeos/peach_pit-chrome-pfq/
Components: -OS>Kernel>Graphics
(not a kernel problem)
Cc: kbleicher@chromium.org abodenha@chromium.org craigtumblison@chromium.org trumbull@chromium.org abod...@chromium.org jdufault@chromium.org akes...@chromium.org songsuk@chromium.org dhadd...@chromium.org josa...@chromium.org
Issue 826163 has been merged into this issue.
Issue 831649 has been merged into this issue.
contrary to subject, I see veyron-minnie-chrome-pfq and peach-pit-chrome-pfq blocking chrome PFQ. Is this the right bug? 

> contrary to subject, I see veyron-minnie-chrome-pfq and peach-pit-chrome-pfq blocking chrome PFQ. Is this the right bug? 

My guess is that bug 831649 is an unrelated Chrome crash, but there's
not enough data in the bug report to say this way or that.

I've reopened bug 831649; it's not a duplicate.

I could not reproduce this. My local Chrome OS image build succeeded. The image from the builder is definitely broken, but every Chrome build that I tried to deploy on it worked.
Cc: yunlian@chromium.org cmt...@chromium.org
Cc: llozano@chromium.org manojgupta@chromium.org
Why not download one of the failing canaries, and see what can be
reproduced with that build?

Also, it looks like the reproduction attempts are using the simple
chrome workflow.  It may be that to reproduce it requires building
with the OS workflow.  Certainly, it's necessary to build with the
latest OS bits:  Although it's a Chrome crash, this failure was
caused by an OS change.

Also, we do know the blamelist for the change:
    https://crosland.corp.google.com/log/10529.0.0..10530.0.0

Given that we've been trying to blame graphics for the failures,
we might study the mesa changes.

Finally, every failure in the waterfall produces logs.  This is
the most recent for terra:
    https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/192171038-chromeos-test/chromeos4-row8-rack6-host3

I can't find any crash dumps there, but under the "crashinfo"
directory, there's a "messages" file, and it shows stuff like this:

2018-04-14T21:04:18.559280+00:00 INFO session_manager[1103]: [INFO:child_exit_handler.cc(77)] Handling 1153 exit.
2018-04-14T21:04:18.559559+00:00 ERR session_manager[1103]: [ERROR:child_exit_handler.cc(85)]   Exited with signal 6
2018-04-14T21:04:18.559654+00:00 INFO session_manager[1103]: [INFO:session_manager_service.cc(296)] Exiting process is chrome.

From "crashinfo", digging down through var/log/ui or var/log/chrome,
you can find messages like this:

[6953:6953:0414/140818.466498:FATAL:login_display_host_webui.cc(841)] Renderer crash on login window

So, there's definitely a chrome crash involved.
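
For anyone repeating this dig through the logs, here is a minimal sketch of how the two signatures quoted above could be pulled out of a downloaded log file. The regular expressions are only modeled on the excerpts in this comment, and the function name and sample list are made up for illustration; other log formats would need different patterns.

```python
import re

# Patterns modeled on the log excerpts quoted in this thread (an assumption:
# other Chrome OS versions may format these lines differently).
SIGNAL_RE = re.compile(
    r"child_exit_handler\.cc\(\d+\)\]\s+Exited with signal (\d+)")
FATAL_RE = re.compile(r":FATAL:([\w.]+)\((\d+)\)\]\s*(.*)")


def scan_log_lines(lines):
    """Return (signals, fatals) found in an iterable of log lines.

    signals: list of signal numbers reported by session_manager.
    fatals:  list of (source_file, line_number, message) tuples.
    """
    signals = []
    fatals = []
    for line in lines:
        match = SIGNAL_RE.search(line)
        if match:
            signals.append(int(match.group(1)))
        match = FATAL_RE.search(line)
        if match:
            fatals.append(
                (match.group(1), int(match.group(2)), match.group(3).strip()))
    return signals, fatals


# Sample input copied from the excerpts above.
SAMPLE_LINES = [
    "2018-04-14T21:04:18.559280+00:00 INFO session_manager[1103]: "
    "[INFO:child_exit_handler.cc(77)] Handling 1153 exit.",
    "2018-04-14T21:04:18.559559+00:00 ERR session_manager[1103]: "
    "[ERROR:child_exit_handler.cc(85)]   Exited with signal 6",
    "2018-04-14T21:04:18.559654+00:00 INFO session_manager[1103]: "
    "[INFO:session_manager_service.cc(296)] Exiting process is chrome.",
    "[6953:6953:0414/140818.466498:FATAL:login_display_host_webui.cc(841)] "
    "Renderer crash on login window",
]

if __name__ == "__main__":
    signals, fatals = scan_log_lines(SAMPLE_LINES)
    print("signals:", signals)
    print("fatals:", fatals)
```

Signal 6 is SIGABRT, which is what one would expect to see when Chrome hits a FATAL log line.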

There is a small chance that this is related to CFI (Control Flow Integrity) checking, which is currently enabled only on the terra and caroline release builders.  However, CFI was enabled back on March 9, in Chrome OS R67-10475.0.0, and caroline & terra only really started failing quite a bit after that.

Perhaps some change committed to Chrome since CFI was enabled is causing a CFI failure. It might be worth building caroline & terra without CFI (turning off the USE="cfi" flag) to see if that fixes the issue.
I created a CL to test disabling CFI on terra & caroline, then submitted tryjobs with that CL to the terra & caroline release tryjob builders.  The terra builder succeeded:

https://ci.chromium.org/p/chromeos/builds/b8949167561624961184

The caroline builder will probably fail because there are no working caroline boards in the suites pool, but the builder is here if you want to download & test the build image:

https://ci.chromium.org/p/chromeos/builds/b8949167562946755616

The CL, which I'm guessing we will probably want to commit, to unblock these builders, is here:  

https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1013064


> There is a small chance that this is related to CFI
> (Control Flow Integrity) checking, which is currently
> enabled only on the terra and caroline release builders

If CFI is enabled only on terra-release and caroline-release, then
I'd rate the chance that this is related at well-nigh certain, since
the fact that this failure is restricted to just those two builders
is one of its key characteristics.  Another key characteristic is that
we can't reproduce it with local builds.  Local builds, it would seem,
also don't enable CFI.

We're seeing this now presumably because one of the OS changes in the
blamelist has tripped over an undiscovered problem with CFI.

Given where we are, I'd say the best option will be to commit the CL to
turn off "cfi" in the builders, and see what happens.  If we get that in
before 11:00 today, we'll have a definitive answer this afternoon.
The CL is already on its way through the commit queue...it will go in whenever the CQ gets through with it.
> The CL is already on its way through the commit queue...
> it will go in whenever the CQ gets through with it.

It might be wise to chump the CL; likely, it's important to make that
11:00 deadline.

Project Member

Comment 43 by bugdroid1@chromium.org, Apr 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/d88eaf5315d4963f1a16fc569aa90a5b7be531be

commit d88eaf5315d4963f1a16fc569aa90a5b7be531be
Author: Caroline Tice <cmtice@google.com>
Date: Mon Apr 16 15:49:41 2018

[release builders] Disable CFI on caroline & terra.

caroline & terra release builders have been failing recently. This
Disabling CFI on those two builders seems to fix the issue.

BUG= chromium:830321 
TEST=Tested on terra-release-tryjob builder and it passed.

Change-Id: I4e4709edc9ee2dade6b29486a6857bf2c6f440de
Reviewed-on: https://chromium-review.googlesource.com/1013064
Reviewed-by: Manoj Gupta <manojgupta@chromium.org>
Commit-Queue: Caroline Tice <cmtice@chromium.org>
Tested-by: Caroline Tice <cmtice@chromium.org>
Trybot-Ready: Caroline Tice <cmtice@chromium.org>

[modify] https://crrev.com/d88eaf5315d4963f1a16fc569aa90a5b7be531be/cbuildbot/config_dump.json
[modify] https://crrev.com/d88eaf5315d4963f1a16fc569aa90a5b7be531be/cbuildbot/chromeos_config.py

Ok, the change has been chumped.
Project Member

Comment 45 by bugdroid1@chromium.org, Apr 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b3eb773cd8b17c9aa4f37190d30d1040242d18c0

commit b3eb773cd8b17c9aa4f37190d30d1040242d18c0
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Mon Apr 16 16:58:07 2018

Roll src/third_party/chromite/ c90ccbc26..d88eaf531 (1 commit)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/c90ccbc26d04..d88eaf5315d4

$ git log c90ccbc26..d88eaf531 --date=short --no-merges --format='%ad %ae %s'
2018-04-15 cmtice [release builders] Disable CFI on caroline & terra.

Created with:
  roll-dep src/third_party/chromite
BUG= chromium:830321 


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: I9289723920afdf2dda519c0cb1c750efadc2f29f
Reviewed-on: https://chromium-review.googlesource.com/1014175
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#551013}
[modify] https://crrev.com/b3eb773cd8b17c9aa4f37190d30d1040242d18c0/DEPS

Labels: Hotlist-CrOS-Sheriffing
Owner: jdufault@chromium.org
Assigning to current gardener.
Re #37:

Richard, yes, I built at least two full images locally for caroline and peach_pit, and both of them worked.
alemate, did you use the same USE flags in build_packages/build-image when building local images?

From the build_packages log at
    https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fcaroline-release%2F1606%2F%2B%2Frecipes%2Fsteps%2FBuildPackages__afdo_use_%2F0%2Fstdout
the flags were: 'USE=-cros-debug cfi chrome_internal thinlto afdo_use'
No, I naively expected build_packages to create the correct build for the board.
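
A quick way to spot this kind of mismatch is to diff the USE flags from the builder log against the ones used locally. The sketch below is only an illustration: the helper names are made up, the builder string is the one quoted from the caroline-release log above, and the local string is a hypothetical example, not what anyone in this thread actually ran.

```python
def parse_use_flags(use_line):
    """Split a "USE=..." string into (enabled, disabled) sets of flag names."""
    value = use_line.strip().strip("'\"")
    if value.startswith("USE="):
        value = value[len("USE="):]
    enabled, disabled = set(), set()
    for flag in value.split():
        if flag.startswith("-"):
            disabled.add(flag[1:])  # a leading "-" turns the flag off
        else:
            enabled.add(flag)
    return enabled, disabled


def missing_flags(builder_use, local_use):
    """Return flags enabled on the builder but not enabled locally, sorted."""
    builder_enabled, _ = parse_use_flags(builder_use)
    local_enabled, _ = parse_use_flags(local_use)
    return sorted(builder_enabled - local_enabled)


# Quoted from the build_packages log referenced above.
BUILDER_USE = "'USE=-cros-debug cfi chrome_internal thinlto afdo_use'"
# Hypothetical local invocation, for illustration only.
LOCAL_USE = "USE=-cros-debug chrome_internal"

if __name__ == "__main__":
    print(missing_flags(BUILDER_USE, LOCAL_USE))
```

With these two strings, "cfi" shows up among the flags missing from the local build, which is consistent with local images not reproducing the crash.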
Labels: -Pri-0 Pri-1
Caroline DEV is working again.
Labels: -ReleaseBlock-Dev
(Removing RBD).
Issue 833563 has been merged into this issue.
Should this be a release blocker?
Cc: abodeti@google.com dtapu...@chromium.org sahel@chromium.org pucchakayala@chromium.org mkarkada@chromium.org sdantuluri@google.com matthewjoseph@chromium.org pgangishetty@chromium.org ajha@chromium.org pbath...@chromium.org sdantul...@chromium.org brajkumar@chromium.org
Issue 825425 has been merged into this issue.
Labels: ReleaseBlock-Beta
Tagging as a beta blocker so we don't lose this.

Per alemate@, scope is dependent on feedback from toolchain team.
I mean we probably need Toolchain team feedback to decide on further actions.
> Per alemate@, scope is dependent on feedback from toolchain team.

The confirmed failures were limited to caroline and terra, and the
code change that fixed this problem was limited to caroline and terra.
Also comment #38 says the configuration is limited to caroline and terra.
So, this bug is limited to caroline and terra.

#58: I'm asking about scope since crbug/825425 was tagged as a DUP and it included daisy, Peppy, Kip and Reks.  That bug was perhaps closed as a DUP incorrectly, however.

I need to be absolutely sure of scope if we're tagging blockers.
> #58: I'm asking about scope since crbug/825425 was tagged
> as a DUP and it included daisy, Peppy, Kip and Reks.
> That bug was perhaps closed as a DUP incorrectly, however.

Yeah, bug 825425 seemed to have become an agglomeration of multiple
different bugs.  It was originally the caroline and terra issue that
preceded this one, but it seems to have been confused with other bugs,
including this one.

I've dropped the duplicate tag, for clarity.  This bug is definitely
only caroline and terra, and it's definitely fixed in the canary.

Just to confirm what jrbarnette@ already said:
The issue reported in this bug is limited to caroline and terra release builds. If there are failures on any other boards, they are unrelated issues.
Cc: wzang@chromium.org
It looks like caroline-release and terra-release cycled green [1][2] (but then went red due to [3]) so I'm going to close this as fixed.

Please reopen if there is additional action that needs to be taken here.

1: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8949066683798464880
2: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8949066677820217136
3: https://bugs.chromium.org/p/chromium/issues/detail?id=833886
Status: Fixed (was: Assigned)
Labels: Merge-TBD
[Auto-generated comment by a script] We noticed that this issue is targeted for M-67; it appears the fix may have landed after branch point, meaning a merge might be required. Please confirm if a merge is required here - if so add Merge-Request-67 label, otherwise remove Merge-TBD label. Thanks.
Project Member

Comment 65 by bugdroid1@chromium.org, Apr 17 2018

Labels: merge-merged-testbranch
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b3eb773cd8b17c9aa4f37190d30d1040242d18c0

commit b3eb773cd8b17c9aa4f37190d30d1040242d18c0
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Mon Apr 16 16:58:07 2018

Roll src/third_party/chromite/ c90ccbc26..d88eaf531 (1 commit)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/c90ccbc26d04..d88eaf5315d4

$ git log c90ccbc26..d88eaf531 --date=short --no-merges --format='%ad %ae %s'
2018-04-15 cmtice [release builders] Disable CFI on caroline & terra.

Created with:
  roll-dep src/third_party/chromite
BUG= chromium:830321 


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: I9289723920afdf2dda519c0cb1c750efadc2f29f
Reviewed-on: https://chromium-review.googlesource.com/1014175
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#551013}
[modify] https://crrev.com/b3eb773cd8b17c9aa4f37190d30d1040242d18c0/DEPS

Did this get merged to M67 yet?  Caroline is still failing at build and not making the RCs.
Owner: tbarzic@chromium.org
+tbarzic (current gardener) to check status of release builder.
> Did this get merged to M67 yet?  Caroline is still failing at build and not making the RCs.

It seems it didn't get merged.  However, the failures in caroline and terra
for M67 are different from the failures in this bug.  A new bug should be opened
for M67.

We should close this bug, presumably after merging the fix to M67, and probably
also M66.
Labels: Merge-Request-67
Labels: -Merge-Request-67 Merge-Approved-67
Approving merge to M67 Chrome OS.
Project Member

Comment 72 by bugdroid1@chromium.org, Apr 23 2018

Labels: merge-merged-release-R67-10575.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/a5f2995ea53207dec31ad8597feb31cc8205c6a1

commit a5f2995ea53207dec31ad8597feb31cc8205c6a1
Author: Caroline Tice <cmtice@google.com>
Date: Mon Apr 23 23:27:58 2018

[release builders] Disable CFI on caroline & terra.

caroline & terra release builders have been failing recently. This
Disabling CFI on those two builders seems to fix the issue.

BUG= chromium:830321 
TEST=Tested on terra-release-tryjob builder and it passed.

Change-Id: I4e4709edc9ee2dade6b29486a6857bf2c6f440de
Reviewed-on: https://chromium-review.googlesource.com/1013064
Reviewed-by: Manoj Gupta <manojgupta@chromium.org>
Commit-Queue: Caroline Tice <cmtice@chromium.org>
Tested-by: Caroline Tice <cmtice@chromium.org>
Trybot-Ready: Caroline Tice <cmtice@chromium.org>
(cherry picked from commit d88eaf5315d4963f1a16fc569aa90a5b7be531be)
Reviewed-on: https://chromium-review.googlesource.com/1025130
Reviewed-by: Richard Barnette <jrbarnette@google.com>
Commit-Queue: Bernie Thompson <bhthompson@chromium.org>
Tested-by: Bernie Thompson <bhthompson@chromium.org>

[modify] https://crrev.com/a5f2995ea53207dec31ad8597feb31cc8205c6a1/cbuildbot/config_dump.json
[modify] https://crrev.com/a5f2995ea53207dec31ad8597feb31cc8205c6a1/cbuildbot/chromeos_config.py

Tested on Terra with build 10575.13.0/67.0.3396.17 and was able to sign in successfully after recovery with USB stick.  
Caroline started showing the results on stainless.
Project Member

Comment 75 by sheriffbot@chromium.org, Apr 27 2018

Cc: bhthompson@google.com
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 76 by sheriffbot@chromium.org, Apr 30 2018

This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Owner: cmt...@chromium.org
Assigning to cmtice to verify whether this has to be merged anywhere else.
No, this does not need to be merged anywhere else.  The initial change only went into R67.

Do I mark this as verified now?
Just in case anybody here is interested, we finally figured out (and fixed) the cause of these failures.  It was rather complicated, and a conjunction of multiple things occurring that caused the failure and made it hard to diagnose.

The basic issue involved Goma plus a compiler change that Goma did not know about.  In order to work properly, CFI has a blacklist file of known issues: functions/files not to check.  Goma knows about this and puts/looks for the file in a certain place.  LLVM changed the location of the file, and nobody thought to tell Goma.

CFI was enabled in Chrome OS (on caroline & terra) on March 9, when LLVM & Goma were both still using the old location.  Everything worked properly until LLVM was upgraded around March 20 (to start using the new location).  The CFI files were now in a new location, but Goma was still looking for them in the old location.  So Goma builds (and ONLY Goma builds) with CFI started failing.

The issue was muddied by two green builds on the builders, near the end of March, which made it look like the old issue was fixed and a new issue came up.  In fact, those were two builds where Goma failed, and the build system fell back onto local builds where the files were looked for in the correct location.
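
To make the sequence of events easier to follow, here is a toy model of the lookup mismatch described above. It is not how Goma or LLVM actually resolve files; the two paths are placeholders invented for this sketch, and all it shows is which build path can still find the blacklist before and after the LLVM upgrade.

```python
# Placeholder paths, invented for this sketch; the real locations are not
# given in this thread.
OLD_LOCATION = "old-location/cfi_blacklist.txt"
NEW_LOCATION = "new-location/cfi_blacklist.txt"


def blacklist_found(installed_paths, lookup_path):
    """Return True if the path a build looks at is actually installed."""
    return lookup_path in installed_paths


def scenario(llvm_uses_new_location):
    """Report whether each build path finds the CFI blacklist.

    Assumptions taken from the explanation above: Goma always looks in the
    old location, while a local build looks wherever the installed LLVM
    puts the file.
    """
    llvm_location = NEW_LOCATION if llvm_uses_new_location else OLD_LOCATION
    installed = {llvm_location}
    return {
        "goma": blacklist_found(installed, OLD_LOCATION),
        "local": blacklist_found(installed, llvm_location),
    }


if __name__ == "__main__":
    print("March 9, before the LLVM upgrade:", scenario(False))
    print("After the LLVM upgrade (~March 20):", scenario(True))
```

In this model both paths find the file before the upgrade, and only the local path finds it afterwards, which matches the observation that only Goma builds with CFI failed while the two local-fallback builds went green.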

We now have several fixes either in flight or actually in place, to prevent this particular issue from arising again.  We also are working on some changes in our processes to try to catch these types of issues sooner.
Hi, the Merge-Approved-67 label was never removed after this blocking merge request.  Assume the merge was made and we can remove it?
Yes, it was done.
Labels: -Merge-Approved-67 OS-iOS
Labels: -OS-iOS
unintended
Project Member

Comment 85 by sheriffbot@chromium.org, Jul 18

Labels: -Merge-TBD
