Starred by 4 users
Status: Fixed
Owner:
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 0
Type: Bug

Blocked on:
issue 708693
issue 708758
issue 709696



Frequent pre-cq failures on caroline.
Project Member Reported by wonderfly@google.com, Apr 5 2017
I noticed that many of the recent pre-cq runs on the board have failed. https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/?limit=100

Is the board broken? It's a mandatory pre-cq for chromiumos-overlay changes, so I hope the build sheriffs can help with this.
 
On the face of it, the VMTests are timing out because they're running slowly.

Looking at the messages from one of the tests confirms the suspicion that the image is crashing:

http://shortn/_d9oFc9Q2oA


2017-04-04T21:46:18.292770-05:00 INFO crash_reporter[2162]: Enabling kernel crash handling
2017-04-04T21:46:18.292906-05:00 WARNING crash_reporter[2162]: Last shutdown was not clean

Sadly the kcrash file is empty.


Cc: akes...@chromium.org
Labels: -Pri-1 Pri-0
I believe the pre-cq is blocked on this. Upping to P0.
This bug is very much alive. None of the passing pre-cq runs from today were caroline. The only caroline-pre-cq run today failed the same way.

shchen@ is driving this now.
Cc: benzh@chromium.org
Cc: bhthompson@chromium.org dgarr...@chromium.org
There's a suggestion that this could be related to bug 708693.

crbug.com/708693 was also a kernel crash on caroline.
Blockedon: 708693
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/26218 failed due to:
13:44:50: ERROR: Cannot find prebuilts for chromeos-base/chromeos-chrome on caroline

https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/26209: I'm not sure exactly why this failed from the log, but there is a warning:
12:48:50: WARNING: Patch jashur:*346943:*5de1285a has already been merged.


The previous failures were in VMTest. I think these are related to bug 708693.
Blockedon: 708758
The Chrome prebuilts issue is probably issue 708758. I'll watch the caroline pre-cq status after it's resolved.
We currently believe that this was entirely due to bug 708693.
Revert has been landed and a verification pre-cq run is in-flight.
Status: Verified
pre-cq passed.
Status: Assigned
This is not fixed.

caroline-pre-cq is still failing frequently (just less often than before?).

http://shortn/_RCSUh0JHLU
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/26636/

/var/log/messages from one of the tests contains:
2017-04-07T20:14:16.193125+00:00 WARNING kernel: [    9.192877] ------------[ cut here ]------------
2017-04-07T20:14:16.193128+00:00 WARNING kernel: [    9.192886] WARNING: CPU: 2 PID: 845 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/gpu/drm/ttm/ttm_bo_vm.c:265 ttm_bo_mmap+0x19e/0x1ab [ttm]()
2017-04-07T20:14:16.193129+00:00 WARNING kernel: [    9.192887] Modules linked in: cfg80211 ip6table_filter snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device cirrus ttm
2017-04-07T20:14:16.193130+00:00 WARNING kernel: [    9.192894] CPU: 2 PID: 845 Comm: Chrome_ProcessL Not tainted 3.18.0-14544-g313323ca34e5 #1
2017-04-07T20:14:16.193132+00:00 WARNING kernel: [    9.192895] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
2017-04-07T20:14:16.193133+00:00 WARNING kernel: [    9.192897]  0000000000000000 00000000e94d52a1 ffff88007636fd50 ffffffff93a991c0
2017-04-07T20:14:16.193134+00:00 WARNING kernel: [    9.192899]  0000000000000000 0000000000000000 ffff88007636fd90 ffffffff93463a1a
2017-04-07T20:14:16.193135+00:00 WARNING kernel: [    9.192901]  00007434caf23000 ffffffffc0253cd1 ffff88007b1c9400 ffff88007adac000
2017-04-07T20:14:16.193135+00:00 WARNING kernel: [    9.192903] Call Trace:
2017-04-07T20:14:16.193136+00:00 WARNING kernel: [    9.192908]  [<ffffffff93a991c0>] dump_stack+0x4e/0x71
2017-04-07T20:14:16.193136+00:00 WARNING kernel: [    9.192913]  [<ffffffff93463a1a>] warn_slowpath_common+0x81/0x9b
2017-04-07T20:14:16.193137+00:00 WARNING kernel: [    9.192916]  [<ffffffffc0253cd1>] ? ttm_bo_mmap+0x19e/0x1ab [ttm]
2017-04-07T20:14:16.193138+00:00 WARNING kernel: [    9.192918]  [<ffffffff93463b1d>] warn_slowpath_null+0x1a/0x1c
2017-04-07T20:14:16.193139+00:00 WARNING kernel: [    9.192920]  [<ffffffffc0253cd1>] ttm_bo_mmap+0x19e/0x1ab [ttm]
2017-04-07T20:14:16.193140+00:00 WARNING kernel: [    9.192923]  [<ffffffff9346220a>] copy_process.part.41+0xe11/0x1798
2017-04-07T20:14:16.193141+00:00 WARNING kernel: [    9.192925]  [<ffffffff93462d37>] do_fork+0xc9/0x2b0
2017-04-07T20:14:16.193142+00:00 WARNING kernel: [    9.192928]  [<ffffffff93a9dc53>] ? _raw_spin_unlock_irq+0xe/0x22
2017-04-07T20:14:16.193142+00:00 WARNING kernel: [    9.192931]  [<ffffffff93470752>] ? __set_current_blocked+0x49/0x4e
2017-04-07T20:14:16.193143+00:00 WARNING kernel: [    9.192933]  [<ffffffff93462f98>] SyS_clone+0x16/0x18
2017-04-07T20:14:16.193144+00:00 WARNING kernel: [    9.192935]  [<ffffffff93a9e5e9>] stub_clone+0x69/0x90
2017-04-07T20:14:16.193145+00:00 WARNING kernel: [    9.192937]  [<ffffffff93a9e2dc>] ? system_call_fastpath+0x1c/0x21
2017-04-07T20:14:16.193145+00:00 WARNING kernel: [    9.192939] ---[ end trace e50daafcf694fd2e ]---
Labels: Restrict-View-Google
I'm not entirely sure whether #15 should have been RVG. Someone please advise.
Cc: marc...@chromium.org
+ marcheu@

Hi marcheu@,

There seems to be a GPU-related kernel crash.
Could you take a look at the log to help us find which CL to blame?
Cc: za...@chromium.org
This isn't caroline graphics, this is VM graphics. zachr@ have you seen this?
Blockedon: 709696
Cc: snanda@chromium.org
Labels: -Restrict-View-Google
What's going on here? This looks like it's failing about a quarter of pre-cq runs.

Can we remove the caroline builder from the pre-cq? Leaving it in doesn't appear to be accomplishing anything.
Comment 21 by ihf@chromium.org, Apr 8 2017
The solution is to mark caroline as caroline-no-vmtest-pre-cq here:
https://chromium-review.googlesource.com/#/c/446586/3/lib/constants.py
as argued in issue 709696.
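The idea behind the caroline-no-vmtest-pre-cq workaround is that the pre-cq config name decides which stages run, so caroline keeps building while VMTest coverage moves elsewhere. A minimal Python sketch of that idea, under loudly stated assumptions: the "-no-vmtest-" naming convention as the switch, the `PRE_CQ_CONFIGS` list, the stage names, the `samus-pre-cq` entry, and the `stages_for()` helper are all hypothetical illustrations, not chromite's actual constants.py or builder code.

```python
"""Hypothetical sketch: gating a VMTest stage on the pre-cq config name.

Everything here (list contents, stage names, helper) is an illustrative
assumption, not chromite's real lib/constants.py or builder logic.
"""
from typing import List

# Hypothetical pre-cq config list after the workaround: caroline still
# builds but skips VMTest; VMTest coverage moves to another Intel board.
PRE_CQ_CONFIGS: List[str] = [
    "caroline-no-vmtest-pre-cq",
    "samus-pre-cq",  # hypothetical name for the board that now runs VMTest
]

# Hypothetical stages every pre-cq config runs.
BASE_STAGES: List[str] = ["BuildPackages", "BuildImage", "UnitTest"]


def stages_for(config_name: str) -> List[str]:
    """Return the stages a config would run under the assumed convention."""
    stages = list(BASE_STAGES)  # copy so BASE_STAGES is never mutated
    # Assumed convention: a "-no-vmtest-" infix disables the VMTest stage.
    if "-no-vmtest-" not in config_name:
        stages.append("VMTest")
    return stages


if __name__ == "__main__":
    for name in PRE_CQ_CONFIGS:
        print(name, stages_for(name))
```

Encoding the choice in the config name keeps the switch visible in the pre-cq list itself, which matches the commit's claim that coverage stays practically unchanged: the build still happens on caroline, and only the VM test moves.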
Comment 24 by ihf@chromium.org, Apr 8 2017
[Cleaned up wrong statements about persistence of container.]

More thoughts. There's no guarantee this fixes the cq, but at least it rearranges the chairs. I am baffled that caroline on 3.18 has problems while cyan, which is also on 3.18 and runs vmtest on other builders, is fine. (They should both be the same in the VM.)
https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-release?numbuilds=200

I checked a few caroline failures and they seem to happen around

  results-22-security_EnableChromeTesting/
  results-23-login_OwnershipNotRetaken/
  results-25-security_SandboxLinuxUnittests/
Project Member Comment 25 by bugdroid1@chromium.org, Apr 8 2017
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/5b15f28b8579bdc2f119194c50f793f641788af5

commit 5b15f28b8579bdc2f119194c50f793f641788af5
Author: Ilja H. Friedel <ihf@chromium.org>
Date: Sat Apr 08 05:15:10 2017

Workaround caroline vmtest problems.

With this change we still build vulkan library on caroline. And we run
vmtest on a newer Intel board (samus). Coverage with this change should
be practically unchanged.

TEST=None.
BUG= chromium:708715 

Change-Id: I8ddbc682a5c625b3dc8232559dfb02a13db64bd9
Reviewed-on: https://chromium-review.googlesource.com/472069
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/5b15f28b8579bdc2f119194c50f793f641788af5/lib/constants.py

Comment 26 by ihf@chromium.org, Apr 8 2017
The true cause of the caroline vmtest failures is that the smoke suite times out. It times out because one of the tests (not limited to the ones mentioned in #24) hangs at the login screen and burns suite time.

FAIL	login_OwnershipNotRetaken	login_OwnershipNotRetaken	timestamp=1491628084	localtime=Apr 08 00:08:04	Unhandled LoginException: Timed out going through login screen. Cryptohome not mounted. OOBE not dismissed.
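The failure mode in #26, where one test hanging at login eats a shared suite time budget so the whole suite times out, can be illustrated with a toy simulation on a simulated clock. This is a hypothetical model: the test names come from this thread, but the durations, the 400 s suite budget, the 300 s per-test timeout, and the `run_suite()` helper are invented for illustration and do not describe autotest's real scheduler.

```python
"""Toy model: one hung test consuming a suite-wide time budget.

Durations, budgets, and the helper are hypothetical; only the test
names are taken from the bug thread.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Test:
    name: str
    duration_s: Optional[float]  # None means the test hangs and never finishes


@dataclass
class SuiteResult:
    passed: List[str]
    timed_out: List[str]  # tests killed by their own per-test timeout
    not_run: List[str]    # tests abandoned because the suite budget ran out
    suite_timed_out: bool


def run_suite(tests: List[Test], suite_budget_s: float,
              per_test_timeout_s: float) -> SuiteResult:
    """Run tests sequentially against a shared suite budget (simulated time)."""
    elapsed = 0.0
    passed: List[str] = []
    timed_out: List[str] = []
    for i, test in enumerate(tests):
        hung = test.duration_s is None or test.duration_s > per_test_timeout_s
        # A hung test costs its full per-test timeout before it is killed.
        cost = per_test_timeout_s if hung else test.duration_s
        if elapsed + cost > suite_budget_s:
            # Not enough budget left: the suite aborts, and this test and
            # every later test never complete.
            return SuiteResult(passed, timed_out,
                               [t.name for t in tests[i:]], True)
        elapsed += cost
        (timed_out if hung else passed).append(test.name)
    return SuiteResult(passed, timed_out, [], False)


HEALTHY = [
    Test("security_EnableChromeTesting", 60),
    Test("login_OwnershipNotRetaken", 90),
    Test("security_SandboxLinuxUnittests", 120),
]
HUNG = [
    Test("security_EnableChromeTesting", 60),
    Test("login_OwnershipNotRetaken", None),  # stuck at the login screen
    Test("security_SandboxLinuxUnittests", 120),
]

if __name__ == "__main__":
    print(run_suite(HEALTHY, suite_budget_s=400, per_test_timeout_s=300))
    print(run_suite(HUNG, suite_budget_s=400, per_test_timeout_s=300))
```

With these made-up numbers the healthy run uses 270 s and passes, while the hung login test burns 300 s of the 400 s budget, so the test after it cannot fit and the suite is reported as timed out even though that later test is fine. This is why the suite timeout points at the login hang rather than at any individual slow test.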

Comment 27 by ihf@chromium.org, Apr 8 2017
Stephane determined that the warning in #15 is harmless and is removing it, which means the Chrome login timeouts remain the main suspect.
Project Member Comment 28 by bugdroid1@chromium.org, Apr 8 2017
Labels: merge-merged-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/56dcbb0bfcaa37a239f52344e3dd6d3aa01629f8

commit 56dcbb0bfcaa37a239f52344e3dd6d3aa01629f8
Author: Stéphane Marchesin <marcheu@chromium.org>
Date: Sat Apr 08 09:37:57 2017

CHROMIUM: drm/ttm: Remove wrong warning

When a ttm buffer is created by one process, shared with another
through prime, the buffer carries the address_space of the creator,
but we are using the vma of the importer. Since this case is valid,
it means that this warning is invalid, so let's remove it.

BUG= chromium:708715 
TEST=build and run VM for caroline

Change-Id: I5244a4aa0f9377d5b5f733056ace2cdbfbcf43f7
Reviewed-on: https://chromium-review.googlesource.com/472207
Commit-Ready: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/56dcbb0bfcaa37a239f52344e3dd6d3aa01629f8/drivers/gpu/drm/ttm/ttm_bo_vm.c

Project Member Comment 29 by bugdroid1@chromium.org, Apr 8 2017
Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/258e87b6917b8fab354f36679e8fec5ae121f869

commit 258e87b6917b8fab354f36679e8fec5ae121f869
Author: Stéphane Marchesin <marcheu@chromium.org>
Date: Sat Apr 08 09:37:53 2017

CHROMIUM: drm/ttm: Remove wrong warning

When a ttm buffer is created by one process, shared with another
through prime, the buffer carries the address_space of the creator,
but we are using the vma of the importer. Since this case is valid,
it means that this warning is invalid, so let's remove it.

BUG= chromium:708715 
TEST=build and run VM for caroline

Change-Id: I3cc41d7ad7640c9ee40e6b1d2f794fabc6f154dd
Reviewed-on: https://chromium-review.googlesource.com/472226
Commit-Ready: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/258e87b6917b8fab354f36679e8fec5ae121f869/drivers/gpu/drm/ttm/ttm_bo_vm.c

Project Member Comment 30 by bugdroid1@chromium.org, Apr 8 2017
Labels: merge-merged-chromeos-3.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a9a790a7764ea54a45b0dbfc1e3ea6ba67f04289

commit a9a790a7764ea54a45b0dbfc1e3ea6ba67f04289
Author: Stéphane Marchesin <marcheu@chromium.org>
Date: Sat Apr 08 09:37:56 2017

CHROMIUM: drm/ttm: Remove wrong warning

When a ttm buffer is created by one process, shared with another
through prime, the buffer carries the address_space of the creator,
but we are using the vma of the importer. Since this case is valid,
it means that this warning is invalid, so let's remove it.

BUG= chromium:708715 
TEST=build and run VM for caroline

Change-Id: I702e96a1d995ba38c37ff93d203b709d37bcb63e
Reviewed-on: https://chromium-review.googlesource.com/472246
Commit-Ready: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/a9a790a7764ea54a45b0dbfc1e3ea6ba67f04289/drivers/gpu/drm/ttm/ttm_bo_vm.c

Cc: -philipchen@chromium.org -itspeter@chromium.org jrbarnette@chromium.org dgreid@chromium.org
Owner: reinauer@chromium.org
+this week's sheriffs.
Status: Fixed
I'm pretty sure the symptom is now fixed, in that we're no longer testing on caroline in the Pre-CQ.

There's some discussion regarding the right long-term fix in bug 709696.
