Celes keeps rebooting w/ latest v3.18 kernel as of Jun. 6th (tip commit: 860bb18c8ec3)
Reported by
gs0...@gmail.com,
Jun 6 2016
Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.63 Safari/537.36
Platform: Celes

Steps to reproduce the problem:
1. repo sync to the latest source as of Jun. 6th
2. Build the kernel, e.g. emerge-celes sys-kernel/chromeos-kernel-3_18
3. Update the kernel, e.g. ./update_kernel.sh --remote=10.5.232.32

What is the expected behavior?
Stable boot to the login window

What went wrong?
The device keeps rebooting, potentially hitting a kernel panic

Did this work before? Yes
commit ef45d91ecae6 2016-06-02 Jaiganesh Narayanan (b1) CHROMIMUM: qcom: rng: enable rng hardware

Chrome version: 49.0.2623.63
Channel: n/a
OS Version: 3.18.0
Flash Version: Shockwave Flash 21.0 r0

I did a naive bisection, and it looks to me like the problematic commit landed on Jun. 3rd:
Bad at the tip commit 860bb18c8ec3
Good at the Jun. 2nd commit ef45d91ecae6
Bad at the Jun. 4th commit 6be651d49e7a

It failed on Celes (BSW), but I did not hit the reboot when I used a Skylake Chromebook.
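The good/bad narrowing described above can be sketched as a binary search over an ordered commit list. This is only an illustration: `first_bad`, the `placeholder-*` commits, and the set of failing commits are hypothetical, and only the three hashes quoted in the report are real. It also assumes that every commit after the first bad one is bad too.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Hypothetical history, oldest first. The "placeholder-*" entries stand in for
# commits the report does not list.
history = ["ef45d91ecae6", "placeholder-a", "placeholder-b",
           "6be651d49e7a", "860bb18c8ec3"]

# Hypothetical result of "build, update, boot on Celes" for each commit.
reboot_loop = {"placeholder-b", "6be651d49e7a", "860bb18c8ec3"}

culprit = first_bad(history, lambda commit: commit in reboot_loop)
print(culprit)
```

In practice `is_bad` would wrap the build, update, and boot steps from the report; here it is a set lookup so the sketch runs on its own.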
,
Jun 6 2016
Rolled back to the safe commit, rebuilt it w/ the tty-enabling options, updated, and captured the UART logs (enclosed). Over several rounds the crash consistently followed the same pattern, as in the example below:

[ 9.269364] BUG: unable to handle kernel paging request at ffffffffc06d4def
[ 9.272681] IP: [<ffffffffba023bf4>] strcpy+0xc/0x18
[ 9.281387] PGD 3aa1b067 PUD 3aa1d067 PMD 604b2067 PTE 800000005f20e161
[ 9.288031] Oops: 0003 [#1] PREEMPT SMP
[ 9.288031] gsmi: Log Shutdown Reason 0x03
<snip>
[ 9.341856] CPU: 1 PID: 132 Comm: udevd Tainted: G W 3.18.0 #5
[ 9.341856] Hardware name: GOOGLE Celes, BIOS Google_Celes.7287.92.34 05/24/2016
<snip>
[ 9.488223] Call Trace:
[ 9.488223] [<ffffffffc06d3389>] init_module+0x2e0389/0x2e03e5 [snd_soc_sst_cht_bsw_rt5645]
[ 9.488223] [<ffffffffba1aa02a>] platform_drv_probe+0x4b/0x91
[ 9.488223] [<ffffffffba1a55fa>] ? devices_kset_move_last+0x60/0x64
[ 9.488223] [<ffffffffba1a86ec>] driver_probe_device+0x109/0x2b1
[ 9.488223] [<ffffffffba1a895a>] __driver_attach+0x5e/0x81
[ 9.488223] [<ffffffffba1a88fc>] ? __device_attach_driver+0x68/0x68
[ 9.488223] [<ffffffffba1a77bd>] bus_for_each_dev+0x8c/0xaf
[ 9.488223] [<ffffffffba1a817a>] driver_attach+0x1e/0x20
[ 9.488223] [<ffffffffba1a7e02>] bus_add_driver+0xeb/0x1e3
[ 9.488223] [<ffffffffba1a913d>] driver_register+0x8f/0xcc
[ 9.488223] [<ffffffffc03f3000>] ? 0xffffffffc03f3000
[ 9.488223] [<ffffffffba1a9fa4>] __platform_driver_register+0x4a/0x4c
[ 9.488223] [<ffffffffc03f3017>] init_module+0x17/0x1000 [snd_soc_sst_cht_bsw_rt5645]
[ 9.488223] [<ffffffffb9e003b5>] do_one_initcall+0x188/0x19d
[ 9.488223] [<ffffffffb9f091ae>] ? __vunmap+0xac/0xb7
[ 9.488223] [<ffffffffb9ea0481>] load_module+0x15e4/0x1bb4
[ 9.488223] [<ffffffffb9ea0bd2>] SyS_finit_module+0x86/0xab
[ 9.488223] [<ffffffffba421edc>] system_call_fastpath+0x1c/0x21
[ 9.488223] Code: 7b ba 31 c0 e8 7a 84 3f 00 48 89 de 48 c7 c7 3a 76 79 ba 31 c0 e8 69 84 3f 00 5b 41 5c 5d c3 55 48 89 f8 31 d2 48 89 e5 8a 0c 16 <88> 0c 10 48 ff c2 84 c9 75 f3 5d c3 55 48 89 f8 31 c9 48 89 e5
[ 9.488223] RIP [<ffffffffba023bf4>] strcpy+0xc/0x18
[ 9.488223] RSP <ffff8800788ffb98>
[ 9.488223] CR2: ffffffffc06d4def
[ 9.488223] ---[ end trace 0c090b7b0e56978c ]---
[ 9.488223] Kernel panic - not syncing: Fatal exception
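As a reading aid for the first lines of the oops, the sketch below decodes the page-fault error code ("Oops: 0003") and the flag bits of the reported PTE using the architectural x86 bit definitions. The helper names are made up for illustration, and the interpretation is not a diagnosis taken from this thread: it only suggests that strcpy attempted a kernel-mode write to a page that is present but not marked writable.

```python
from typing import List

# x86 page-fault error code bits: (meaning when clear, meaning when set).
ERROR_CODE_BITS = {
    0: ("page not present", "protection violation on a present page"),
    1: ("read access", "write access"),
    2: ("kernel mode", "user mode"),
}

# x86-64 page-table-entry flag bits used here, plus NX at bit 63.
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    5: "accessed",
    6: "dirty",
    8: "global",
    63: "no-execute",
}


def decode_error_code(code: int) -> List[str]:
    """Describe the low three bits of a page-fault error code."""
    return [names[(code >> bit) & 1] for bit, names in ERROR_CODE_BITS.items()]


def decode_pte(pte: int) -> List[str]:
    """List the names of the flag bits that are set in a PTE value."""
    return [name for bit, name in PTE_FLAGS.items() if (pte >> bit) & 1]


fault = decode_error_code(0x0003)        # from "Oops: 0003"
flags = decode_pte(0x800000005F20E161)   # from "PTE 800000005f20e161"

print(fault)
print(flags)
```

The "writable" flag is absent from the decoded PTE while the error code reports a write access, which is consistent with a store into read-only memory at the CR2 address.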
,
Jun 6 2016
FYI, I reviewed my original report. I downloaded three Celes CPFE images onto a USB key and was still able to hit the rebooting issue:
R53-8415.0.0 2016-06-05
R53-8418.0.0 2016-06-06
R53-8419.0.0 2016-06-06
,
Jun 8 2016
This is also reproducible on edgar, and possibly all braswell devices. Probably working in 8406.0.0, failing on 8409.0.0. https://crosland.corp.google.com/log/8406.0.0..8409.0.0
,
Jun 8 2016
Issue 618020 has been merged into this issue.
,
Jun 8 2016
This CL is almost certainly the cause: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/9e0680b08d99a49af9ce5238cacfdefd4f57d80f Rajat, can you please take a look?
,
Jun 8 2016
,
Jun 9 2016
Rajat is on paternity leave. Benson, can you PTAL?
,
Jun 9 2016
Sure. Since this is a critical issue across all Braswell systems, let's just revert the patch in question to get Braswell healthy, and then take a closer look at how to enable this more carefully to support the BYT systems coming to 3.18. Revert here: https://chromium-review.googlesource.com/351113 Looking for a reviewer.
,
Jun 9 2016
I have verified that Benson's revert works on my Celes. Meanwhile, using the call stack, I also located a potential fix. Since I only have a Celes, perhaps chromium-dev could review and evaluate my cherry-picked patch on all BSW devices, and then re-submit Rajat's change? https://chromium-review.googlesource.com/#/c/351150
,
Jun 9 2016
I don't have any Braswell-based systems with me, but I'm adding dylan and bernie, who may be able to help verify on Thursday, Mountain View time. The patch looks good to me though. We haven't landed the revert yet, and if this one looks good we can skip the revert and just land this fix instead.
,
Jun 9 2016
I guess it did not fail on Cyan because Cyan uses a different audio codec. I can confirm that an Ultima fails locally on a canary build. At this point it looks like the CL is in the CQ, and comment #10 indicates this is fixed with the revert, so I think we are good to go. We can verify the build after this lands.
,
Jun 10 2016
Didn't see a commit message here, but Harry's CL was merged: https://chromium-review.googlesource.com/#/c/351150 Closing. Let me know if we see something like this again after 8434.0.0.
,
Jul 1 2016
Bulk verified
,
Jul 31 2016
Comment 1 by gs0...@gmail.com, Jun 6 2016
It looks like the same problem to me if I build a whole image onto a USB key and boot Celes w/ it, i.e. /build_image --board=${BOARD} --noenable_rootfs_verification test