Frequent pre-cq failures on caroline.
Issue description
I noticed that many of the recent pre-cq runs on this board have failed. https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/?limit=100 Is the board broken? It's a mandatory pre-cq for chromiumos-overlay changes, so I hope this is something the build sheriffs can help with.
,
Apr 5 2017
,
Apr 5 2017
I believe pre-cq is blocked on this. Upping to P0.
,
Apr 5 2017
This bug is very much alive. None of the passing pre-cq runs from today were caroline. The only caroline-pre-cq run today failed the same way. shchen@ is driving this now.
,
Apr 5 2017
,
Apr 5 2017
,
Apr 5 2017
There's a suggestion that this could be related to bug 708693.
,
Apr 5 2017
There's also crbug.com/708693, which was a kernel crash on Caroline.
,
Apr 5 2017
,
Apr 5 2017
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/26218 failed due to:
13:44:50: ERROR: Cannot find prebuilts for chromeos-base/chromeos-chrome on caroline
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/26209: I'm not sure exactly why this failed from looking at the log, but there is a warning:
12:48:50: WARNING: Patch jashur:*346943:*5de1285a has already been merged.
The previous failures were in VMTest. I think these are related to bug 708693.
,
Apr 5 2017
The Chrome prebuilts issue is probably issue 708758. Will watch the caroline pre-cq status after it's resolved.
,
Apr 5 2017
We currently believe that this was entirely due to bug 708693. A revert has landed and a verification pre-cq run is in flight.
,
Apr 6 2017
pre-cq passed.
,
Apr 7 2017
This is not fixed. caroline-pre-cq is still failing most of the time (just with lower probability?). http://shortn/_RCSUh0JHLU
,
Apr 7 2017
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/26636/
/var/log/messages from one of the tests contains:
2017-04-07T20:14:16.193125+00:00 WARNING kernel: [ 9.192877] ------------[ cut here ]------------
2017-04-07T20:14:16.193128+00:00 WARNING kernel: [ 9.192886] WARNING: CPU: 2 PID: 845 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/gpu/drm/ttm/ttm_bo_vm.c:265 ttm_bo_mmap+0x19e/0x1ab [ttm]()
2017-04-07T20:14:16.193129+00:00 WARNING kernel: [ 9.192887] Modules linked in: cfg80211 ip6table_filter snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device cirrus ttm
2017-04-07T20:14:16.193130+00:00 WARNING kernel: [ 9.192894] CPU: 2 PID: 845 Comm: Chrome_ProcessL Not tainted 3.18.0-14544-g313323ca34e5 #1
2017-04-07T20:14:16.193132+00:00 WARNING kernel: [ 9.192895] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
2017-04-07T20:14:16.193133+00:00 WARNING kernel: [ 9.192897] 0000000000000000 00000000e94d52a1 ffff88007636fd50 ffffffff93a991c0
2017-04-07T20:14:16.193134+00:00 WARNING kernel: [ 9.192899] 0000000000000000 0000000000000000 ffff88007636fd90 ffffffff93463a1a
2017-04-07T20:14:16.193135+00:00 WARNING kernel: [ 9.192901] 00007434caf23000 ffffffffc0253cd1 ffff88007b1c9400 ffff88007adac000
2017-04-07T20:14:16.193135+00:00 WARNING kernel: [ 9.192903] Call Trace:
2017-04-07T20:14:16.193136+00:00 WARNING kernel: [ 9.192908] [<ffffffff93a991c0>] dump_stack+0x4e/0x71
2017-04-07T20:14:16.193136+00:00 WARNING kernel: [ 9.192913] [<ffffffff93463a1a>] warn_slowpath_common+0x81/0x9b
2017-04-07T20:14:16.193137+00:00 WARNING kernel: [ 9.192916] [<ffffffffc0253cd1>] ? ttm_bo_mmap+0x19e/0x1ab [ttm]
2017-04-07T20:14:16.193138+00:00 WARNING kernel: [ 9.192918] [<ffffffff93463b1d>] warn_slowpath_null+0x1a/0x1c
2017-04-07T20:14:16.193139+00:00 WARNING kernel: [ 9.192920] [<ffffffffc0253cd1>] ttm_bo_mmap+0x19e/0x1ab [ttm]
2017-04-07T20:14:16.193140+00:00 WARNING kernel: [ 9.192923] [<ffffffff9346220a>] copy_process.part.41+0xe11/0x1798
2017-04-07T20:14:16.193141+00:00 WARNING kernel: [ 9.192925] [<ffffffff93462d37>] do_fork+0xc9/0x2b0
2017-04-07T20:14:16.193142+00:00 WARNING kernel: [ 9.192928] [<ffffffff93a9dc53>] ? _raw_spin_unlock_irq+0xe/0x22
2017-04-07T20:14:16.193142+00:00 WARNING kernel: [ 9.192931] [<ffffffff93470752>] ? __set_current_blocked+0x49/0x4e
2017-04-07T20:14:16.193143+00:00 WARNING kernel: [ 9.192933] [<ffffffff93462f98>] SyS_clone+0x16/0x18
2017-04-07T20:14:16.193144+00:00 WARNING kernel: [ 9.192935] [<ffffffff93a9e5e9>] stub_clone+0x69/0x90
2017-04-07T20:14:16.193145+00:00 WARNING kernel: [ 9.192937] [<ffffffff93a9e2dc>] ? system_call_fastpath+0x1c/0x21
2017-04-07T20:14:16.193145+00:00 WARNING kernel: [ 9.192939] ---[ end trace e50daafcf694fd2e ]---
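For anyone triaging similar logs, here is a minimal sketch of how the warning site and the call-trace frames could be pulled out of /var/log/messages text. It is not part of any Chrome OS tooling; the function name `summarize_warning`, the regular expressions, and the shortened `SAMPLE` (an excerpt of the log above with timestamps stripped) are all assumptions made for illustration. Frames that the kernel prefixes with "?" are skipped, since the kernel marks those as unreliable.

```python
import re

# Matches the kernel header "WARNING: CPU: N PID: N at <file>:<line> <symbol>+...".
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>[\w.]+)\+"
)
# Matches call-trace frames such as "[<ffffffff93a991c0>] dump_stack+0x4e/0x71".
# A leading "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (?P<uncertain>\? )?(?P<symbol>[\w.]+)\+0x")

# Excerpt of the log above (timestamps removed for brevity).
SAMPLE = """\
WARNING: CPU: 2 PID: 845 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/gpu/drm/ttm/ttm_bo_vm.c:265 ttm_bo_mmap+0x19e/0x1ab [ttm]()
[<ffffffff93a991c0>] dump_stack+0x4e/0x71
[<ffffffff93463a1a>] warn_slowpath_common+0x81/0x9b
[<ffffffffc0253cd1>] ? ttm_bo_mmap+0x19e/0x1ab [ttm]
[<ffffffff93463b1d>] warn_slowpath_null+0x1a/0x1c
[<ffffffffc0253cd1>] ttm_bo_mmap+0x19e/0x1ab [ttm]
"""


def summarize_warning(log_text):
    """Return the warning site and reliable call-trace frames, or None."""
    header = WARNING_RE.search(log_text)
    if header is None:
        return None
    frames = [
        match.group("symbol")
        for match in FRAME_RE.finditer(log_text)
        if not match.group("uncertain")
    ]
    return {
        "file": header.group("file"),
        "line": int(header.group("line")),
        "symbol": header.group("symbol"),
        "frames": frames,
    }


if __name__ == "__main__":
    print(summarize_warning(SAMPLE))
```

The summary points at ttm_bo_vm.c line 265 in ttm_bo_mmap, which is the ttm code path rather than anything caroline-specific.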
,
Apr 7 2017
I'm not entirely sure if #15 should have been RVG. Someone please advise.
,
Apr 7 2017
+marcheu@ Hi marcheu@, there seems to be a GPU-related kernel crash. Could you take a look at the log to help us find which CL to blame?
,
Apr 7 2017
This isn't caroline graphics, this is VM graphics. zachr@ have you seen this?
,
Apr 8 2017
,
Apr 8 2017
What's going on here? This looks like it's failing about a quarter of pre-cq runs. Can we remove the caroline builder from the pre-cq? Leaving it in doesn't appear to be accomplishing anything.
,
Apr 8 2017
The solution is to mark caroline as caroline-no-vmtest-pre-cq here https://chromium-review.googlesource.com/#/c/446586/3/lib/constants.py as argued in issue 709696.
,
Apr 8 2017
Ok, here is my suggestion https://chromium-review.googlesource.com/#/c/472069/1/lib/constants.py
,
Apr 8 2017
[Cleaned up wrong statements about persistence of container.] More thoughts: no guarantee this fixes the cq, but at least it rearranges the chairs. I am baffled that caroline on 3.18 has problems while cyan, which is also on 3.18 and runs vmtest on other builders, is fine. (They should both be the same in the VM.) https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-release?numbuilds=200
I checked a few caroline failures and they seem to happen around:
results-22-security_EnableChromeTesting/
results-23-login_OwnershipNotRetaken/
results-25-security_SandboxLinuxUnittests/
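To check whether failures really cluster around a few tests, the result directory names from several failed runs could be tallied. This is only a sketch: the helper `tally_failing_tests` and the assumption that each directory is named `results-<index>-<test name>/` come from the three examples above, not from any documented autotest layout, and the input list would have to be collected by hand from the failed builds.

```python
import re
from collections import Counter

# Directory names are assumed to look like "results-23-login_OwnershipNotRetaken/".
RESULT_DIR_RE = re.compile(r"^results-(\d+)-(.+?)/?$")


def tally_failing_tests(result_dirs):
    """Count how often each test name appears across failing result dirs."""
    counts = Counter()
    for name in result_dirs:
        match = RESULT_DIR_RE.match(name.strip())
        if match:
            # group(2) is the test name with the numeric index stripped.
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    # The three directories named in this comment.
    observed = [
        "results-22-security_EnableChromeTesting/",
        "results-23-login_OwnershipNotRetaken/",
        "results-25-security_SandboxLinuxUnittests/",
    ]
    for test, count in tally_failing_tests(observed).most_common():
        print(count, test)
```

Stripping the numeric index matters because the same test may land at a different position in each run.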
,
Apr 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/5b15f28b8579bdc2f119194c50f793f641788af5

commit 5b15f28b8579bdc2f119194c50f793f641788af5
Author: Ilja H. Friedel <ihf@chromium.org>
Date: Sat Apr 08 05:15:10 2017

Workaround caroline vmtest problems.

With this change we still build vulkan library on caroline. And we run vmtest on a newer Intel board (samus). Coverage with this change should be practically unchanged.

TEST=None.
BUG= chromium:708715

Change-Id: I8ddbc682a5c625b3dc8232559dfb02a13db64bd9
Reviewed-on: https://chromium-review.googlesource.com/472069
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/5b15f28b8579bdc2f119194c50f793f641788af5/lib/constants.py
,
Apr 8 2017
The true reason for the caroline vmtest failures is that the smoke suite times out. It times out because one of the tests (not limited to the ones mentioned in #24) hangs at the login screen and burns suite time.
FAIL login_OwnershipNotRetaken login_OwnershipNotRetaken timestamp=1491628084 localtime=Apr 08 00:08:04 Unhandled LoginException: Timed out going through login screen. Cryptohome not mounted. OOBE not dismissed.
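To illustrate how one hung login can sink the whole suite, here is a toy model of tests running back to back against a single shared time budget. Everything in it is an assumption for illustration: the function `simulate_suite`, the status strings, and all the durations and timeouts are made up and do not reflect the real autotest scheduler or the actual smoke suite limits.

```python
def simulate_suite(tests, suite_timeout, login_timeout):
    """Run tests sequentially against one shared suite time budget.

    tests: list of (name, seconds) pairs; seconds is None for a test that
    hangs at the login screen until login_timeout expires.
    Returns (results, suite_timed_out).
    """
    elapsed = 0
    results = {}
    for name, seconds in tests:
        remaining = suite_timeout - elapsed
        if remaining <= 0:
            # The budget is already spent, so this test never starts.
            results[name] = "NOT_RUN"
            continue
        hung = seconds is None
        cost = login_timeout if hung else seconds
        if cost > remaining:
            # The suite deadline hits while this test is still running.
            results[name] = "ABORT"
            elapsed = suite_timeout
        else:
            results[name] = "FAIL" if hung else "PASS"
            elapsed += cost
    suite_timed_out = any(r in ("ABORT", "NOT_RUN") for r in results.values())
    return results, suite_timed_out


if __name__ == "__main__":
    # Hypothetical numbers: 300 s suite budget, 200 s login timeout.
    tests = [
        ("security_EnableChromeTesting", 60),
        ("login_OwnershipNotRetaken", None),  # hangs at the login screen
        ("security_SandboxLinuxUnittests", 60),
    ]
    print(simulate_suite(tests, suite_timeout=300, login_timeout=200))
```

In this model the hung test itself only reports FAIL, but the time it burns pushes a later, otherwise healthy test past the suite deadline, which would match seeing the failure surface around different tests from run to run.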
,
Apr 8 2017
Stephane determined that the warning in #15 is harmless and is removing it, which means the Chrome login timeouts remain the main suspect.
,
Apr 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/56dcbb0bfcaa37a239f52344e3dd6d3aa01629f8

commit 56dcbb0bfcaa37a239f52344e3dd6d3aa01629f8
Author: Stéphane Marchesin <marcheu@chromium.org>
Date: Sat Apr 08 09:37:57 2017

CHROMIUM: drm/ttm: Remove wrong warning

When a ttm buffer is created by one process, shared with another through prime, the buffer carries the address_space of the creator, but we are using the vma of the importer. Since this case is valid, it means that this warning is invalid, so let's remove it.

BUG= chromium:708715
TEST=build and run VM for caroline

Change-Id: I5244a4aa0f9377d5b5f733056ace2cdbfbcf43f7
Reviewed-on: https://chromium-review.googlesource.com/472207
Commit-Ready: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/56dcbb0bfcaa37a239f52344e3dd6d3aa01629f8/drivers/gpu/drm/ttm/ttm_bo_vm.c
,
Apr 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/258e87b6917b8fab354f36679e8fec5ae121f869

commit 258e87b6917b8fab354f36679e8fec5ae121f869
Author: Stéphane Marchesin <marcheu@chromium.org>
Date: Sat Apr 08 09:37:53 2017

CHROMIUM: drm/ttm: Remove wrong warning

When a ttm buffer is created by one process, shared with another through prime, the buffer carries the address_space of the creator, but we are using the vma of the importer. Since this case is valid, it means that this warning is invalid, so let's remove it.

BUG= chromium:708715
TEST=build and run VM for caroline

Change-Id: I3cc41d7ad7640c9ee40e6b1d2f794fabc6f154dd
Reviewed-on: https://chromium-review.googlesource.com/472226
Commit-Ready: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/258e87b6917b8fab354f36679e8fec5ae121f869/drivers/gpu/drm/ttm/ttm_bo_vm.c
,
Apr 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a9a790a7764ea54a45b0dbfc1e3ea6ba67f04289

commit a9a790a7764ea54a45b0dbfc1e3ea6ba67f04289
Author: Stéphane Marchesin <marcheu@chromium.org>
Date: Sat Apr 08 09:37:56 2017

CHROMIUM: drm/ttm: Remove wrong warning

When a ttm buffer is created by one process, shared with another through prime, the buffer carries the address_space of the creator, but we are using the vma of the importer. Since this case is valid, it means that this warning is invalid, so let's remove it.

BUG= chromium:708715
TEST=build and run VM for caroline

Change-Id: I702e96a1d995ba38c37ff93d203b709d37bcb63e
Reviewed-on: https://chromium-review.googlesource.com/472246
Commit-Ready: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/a9a790a7764ea54a45b0dbfc1e3ea6ba67f04289/drivers/gpu/drm/ttm/ttm_bo_vm.c
,
Apr 10 2017
+this week's sheriffs.
,
Apr 10 2017
I'm pretty sure that the underlying symptom is now fixed, in that we're no longer testing on caroline in the Pre-CQ. There's some discussion regarding the right long-term fix in bug 709696.
,
Mar 30 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/4316e542a5b67884134eb9f6980c7ea14d8c4dd2

commit 4316e542a5b67884134eb9f6980c7ea14d8c4dd2
Author: Stéphane Marchesin <marcheu@chromium.org>
Date: Tue Dec 12 06:44:26 2017

CHROMIUM: drm/ttm: Remove wrong warning

When a ttm buffer is created by one process, shared with another through prime, the buffer carries the address_space of the creator, but we are using the vma of the importer. Since this case is valid, it means that this warning is invalid, so let's remove it.

BUG= chromium:708715
TEST=build and run VM for caroline

Change-Id: I3cc41d7ad7640c9ee40e6b1d2f794fabc6f154dd
Reviewed-on: https://chromium-review.googlesource.com/472226
Commit-Ready: Ilja H. Friedel <ihf@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
(cherry picked from commit 258e87b6917b8fab354f36679e8fec5ae121f869)
Reviewed-on: https://chromium-review.googlesource.com/783670
Tested-by: Lann Martin <lannm@chromium.org>
Tested-by: Craig Bergstrom <craigb@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Craig Bergstrom <craigb@chromium.org>

[modify] https://crrev.com/4316e542a5b67884134eb9f6980c7ea14d8c4dd2/drivers/gpu/drm/ttm/ttm_bo_vm.c
Comment 1 by pprabhu@chromium.org
, Apr 5 2017