[Eve] Kernel crash logs not generated in /var/spool/crash after turning on VMX support
Issue description
Chrome Version: 64.0.3246.0
OS: ChromeOS 10063.0.0
What steps will reproduce the problem?
1) Recover device with R64-10063.0.0 or R63-10032.14.0
2) Log into User Account
3) Open VT2 or crosh shell
4) Execute: echo BUG > /sys/kernel/debug/provoke-crash/DIRECT
a) Or Alt + F10 + X
5) Wait for system to restart
6) Look for crashes in /var/spool/crash
What is the expected result?
.kcrash and .meta files should be present after a crash
What happens instead?
No .kcrash or .meta kernel crash files are in /var/spool/crash
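The check in step 6 can be scripted. The following is a minimal sketch, assuming only the directory and file extensions named in this report; the function name is hypothetical, and any .meta file matches, including ones written for non-kernel crashes.

```python
from pathlib import Path


def find_crash_files(crash_dir="/var/spool/crash"):
    """Return the sorted names of .kcrash and .meta files in crash_dir.

    An empty list means either the directory is missing or no crash
    files were written, which is the failure described in this report.
    """
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.suffix in {".kcrash", ".meta"}
    )
```

Calling find_crash_files() after the reboot in step 5 should return a non-empty list on a build that is not affected.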
Notes:
- Reproduced this on Eve R63-10032.14.0 and R64-10063.0.0
- This issue is *not* observed on M62-9901.51.0 Stable
,
Oct 24 2017
someone feel like bisecting this down between 9901.0.0 & 10032.0.0 ?
,
Oct 24 2017
Could this be the latency in crash file creation we caught with issue 774154?
,
Oct 25 2017
I don't think this is related to issue 774154 because when testing manually, the logs were still not present after more than 30 minutes.
,
Oct 27 2017
Looking at the diff https://crosland.corp.google.com/log/9901.54.0..10006.0.0 I see a few candidates:
https://chromium-review.googlesource.com/c/chromiumos/platform2/+/687806
https://chromium-review.googlesource.com/c/breakpad/breakpad/+/621307
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/653188
There could be more.
,
Oct 27 2017
kernel crash logs don't involve breakpad at all. it's up to the kernel to save/preserve the log so that when crash-reporter starts at the next boot, it can find them under /dev/pstore/. would still be nice if someone has a system that reproduces to bisect a bit more between those ~150 versions.
,
Oct 27 2017
,
Oct 27 2017
After some digging, I've narrowed it down to a change between R63-9935.0.0 and R63-9937.0.0. Kernel crash files are present on 9935.0.0 but not on 9937.0.0 or later. Only one CL that I could find mentions changes to crash reporting: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/659138
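The narrowing above is a bisection over an ordered list of builds. A minimal sketch of that search follows, assuming the oldest build in the list is known good, the newest is known bad, and the failure does not flip back and forth in between. The is_bad callback is hypothetical: in practice it would mean flashing the build, crashing the kernel, and checking for crash files.

```python
def first_bad_build(builds, is_bad):
    """Return the first failing build in an oldest-to-newest list.

    Assumes builds[0] is good and builds[-1] is bad, and that every
    build after the first bad one is also bad.
    """
    if len(builds) < 2:
        raise ValueError("need at least one good and one bad build")
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(builds[middle]):
            bad = middle
        else:
            good = middle
    return builds[bad]
```

Each call to is_bad halves the remaining range, so the ~150 versions mentioned earlier need about eight test boots.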
,
Oct 27 2017
that should only impact lakitu systems as that changes systemd files (which eve doesn't use). we also turned on VMX for eve in that range: https://chromium-review.googlesource.com/657918 i wouldn't expect it to affect this, but i also wouldn't be surprised if it made subtle changes to pstore behavior.
,
Oct 30 2017
#c8 & 9 thanks for narrowing the scope down. It does appear that the VMX change is causing the regression: manually changing the bootarg to disablevmx=on and running platform_KernelErrorPaths.BUG to verify fixes the issue. Dylan, can you have a look?
,
Oct 31 2017
Sure. It doesn't reproduce on samus with v4.4; I'll set up an Eve with a test build and take a look.
,
Oct 31 2017
Certainly does happen on Eve. I don't have any good leads as to why yet. I'll try to determine if it is failing to write or read the ramoops tomorrow.
,
Oct 31 2017
to mitigate, we could move the option to the eve-kvm overlay so we can backport it to branches until we narrow it down more
,
Oct 31 2017
I think we should probably move to the eve-kvm overlay. I noticed that it only happens after a hard reset, so the lock bit is cleared. As long as vmx is locked, ramoops works even if the kernel is told not to disable it. There isn't a lot of code in coreboot that is different, it sets the IA32_FEATURE_CONTROL msr, but that's about it.
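For reference, the lock bit discussed here is bit 0 of IA32_FEATURE_CONTROL (MSR 0x3A); bits 1 and 2 enable VMX inside and outside SMX operation. The sketch below decodes a raw value and, as an assumption about the test environment, reads it through the Linux msr driver at /dev/cpu/N/msr, which needs root and the msr module loaded. The function names are illustrative and not taken from coreboot or the kernel.

```python
import os
import struct

IA32_FEATURE_CONTROL = 0x3A
LOCK_BIT = 1 << 0          # once set, the MSR is read-only until reset
VMX_INSIDE_SMX = 1 << 1    # VMX allowed in SMX operation
VMX_OUTSIDE_SMX = 1 << 2   # VMX allowed outside SMX operation


def decode_feature_control(value):
    """Split a raw IA32_FEATURE_CONTROL value into its VMX-related bits."""
    return {
        "locked": bool(value & LOCK_BIT),
        "vmx_inside_smx": bool(value & VMX_INSIDE_SMX),
        "vmx_outside_smx": bool(value & VMX_OUTSIDE_SMX),
    }


def read_msr(cpu, register):
    """Read a 64-bit MSR via /dev/cpu/<cpu>/msr (assumes the msr driver)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)
```

On a device with the driver available, decode_feature_control(read_msr(0, IA32_FEATURE_CONTROL)) would show whether firmware left the register locked before the kernel ran.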
,
Nov 4 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/b4bc5ea348016eef2fd2caa50e1e59ceed46482f
commit b4bc5ea348016eef2fd2caa50e1e59ceed46482f
Author: Dylan Reid <dgreid@chromium.org>
Date: Sat Nov 04 04:56:47 2017
Eve: Disable VMX extensions
Enabling VMX caused ramoops to fail for an unknown reason. Turn it off for now, the VMX testing has moved to an experimental board.
CQ-DEPEND=CL:*496617
BUG= 777985
TEST=crash kernel, see ramoops in /dev/pstore/ after it reboots.
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Change-Id: Id5a0f4c7cd189bf16d0f042166ae0ea786c506d4
Reviewed-on: https://chromium-review.googlesource.com/753139
Commit-Ready: Dylan Reid <dgreid@chromium.org>
Tested-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
[delete] https://crrev.com/aac16598b16d59ae4a55ec2cf2a695f712d20a97/overlay-eve/scripts/build_kernel_image.sh
,
Nov 4 2017
this landed for M63. merging back should be safe.
,
Nov 4 2017
This bug requires manual review: M63 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), gkihumba@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 6 2017
,
Nov 10 2017
,
Nov 10 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/7c7a99f6c9399c45455e951ab539e6d550bcc2d7
commit 7c7a99f6c9399c45455e951ab539e6d550bcc2d7
Author: Dylan Reid <dgreid@chromium.org>
Date: Fri Nov 10 02:09:30 2017
Eve: Disable VMX extensions
Enabling VMX caused ramoops to fail for an unknown reason. Turn it off for now, the VMX testing has moved to an experimental board.
CQ-DEPEND=CL:*496617
BUG= 777985
TEST=crash kernel, see ramoops in /dev/pstore/ after it reboots.
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Change-Id: Id5a0f4c7cd189bf16d0f042166ae0ea786c506d4
Reviewed-on: https://chromium-review.googlesource.com/753139
Commit-Ready: Dylan Reid <dgreid@chromium.org>
Tested-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
(cherry picked from commit b4bc5ea348016eef2fd2caa50e1e59ceed46482f)
Reviewed-on: https://chromium-review.googlesource.com/762408
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Commit-Queue: Dylan Reid <dgreid@chromium.org>
[delete] https://crrev.com/b9c86f29bd75b5f7acc61ab12cd2a61171013ac0/overlay-eve/scripts/build_kernel_image.sh
,
Nov 10 2017
we haven't really fixed this issue have we ? we can't turn on VMX or crosvm or any of that on eve devices until this is understood & fixed. moving to eve-kvm was more a mitigation than a fix.
,
Nov 10 2017
No, it's not really fixed. But I wanted to close this and get it out of Eve's 63 release path. I'll open a new one for tracking turning it back on.
,
Dec 14 2017
Since R64-10132.0.0 this issue has resurfaced on Eve. The diff between the last build where it worked and the first failure: https://crosland.corp.google.com/log/10132.0.0..10137.0.0 It appears that VMX, which the original fix for this bug disabled (Disable VMX extensions), was recently re-enabled with: https://chromium.googlesource.com/chromiumos/third_party/coreboot/+/4c7496b123895690a18ddd242ce52345ec50d4b9
,
Dec 14 2017
,
Jan 4 2018
pgeorgi@, PTAL per #27. Thanks... Tagged as a stable blocker.
,
Jan 4 2018
,
Jan 4 2018
I doubt this fix will make it into the last M63 stable release next week.
,
Jan 5 2018
From Duncan's description in https://chromium.googlesource.com/chromiumos/third_party/coreboot/+/4c7496b123895690a18ddd242ce52345ec50d4b9, could it be that kernel crash logs are only kept on cold boots?
,
Jan 6 2018
Interesting. Does it now not work with VMX disabled? We can turn it back on if that's the case.
,
Jan 6 2018
Adding disablevmx=off back to the command line fixes it for me locally. I'll upload a patch.
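To confirm which value a running device actually booted with, the kernel command line can be inspected. This is a minimal sketch that assumes simple space-separated tokens with no quoting; disablevmx is the parameter named in this thread, and the helper name is made up for illustration.

```python
def get_cmdline_param(cmdline, name):
    """Return the value of name=value on a kernel command line.

    Returns "" for a bare flag, None when the parameter is absent.
    If the parameter repeats, the last occurrence is returned.
    """
    result = None
    for token in cmdline.split():
        key, separator, value = token.partition("=")
        if key == name:
            result = value if separator else ""
    return result


def read_booted_param(name, path="/proc/cmdline"):
    """Look up a parameter in the command line of the running kernel."""
    with open(path) as handle:
        return get_cmdline_param(handle.read(), name)
```

On the device, read_booted_param("disablevmx") would report the setting in effect for the current boot.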
,
Jan 6 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/fde38aae79098fe199a5003a49e8dfb7066a9cc0
commit fde38aae79098fe199a5003a49e8dfb7066a9cc0
Author: Dylan Reid <dgreid@chromium.org>
Date: Sat Jan 06 05:14:18 2018
Revert "Eve: Disable VMX extensions"
After firmware update, enabling VMX no longer causes ramoops to fail. In fact, not enabling it causes ramoops to fail.
This reverts commit b4bc5ea348016eef2fd2caa50e1e59ceed46482f.
BUG= 777985
TEST=alt-volup-x until it reboots, check ramoops is present.
Change-Id: Iea13e4293d8a0161dc4d3ab8d34d020d50427e2d
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/853312
Reviewed-by: Mike Frysinger <vapier@chromium.org>
[add] https://crrev.com/fde38aae79098fe199a5003a49e8dfb7066a9cc0/overlay-eve/scripts/build_kernel_image.sh
,
Jan 8 2018
Yes we have to match the VMX enable state with a BIOS update (and not just by enabling in the kernel) because of the way FSP tries to disable it and forces a reboot. This wasn't an issue before FSP, but we were also not properly forcing a cold reboot in the BIOS on a disable transition like FSP does... The upstream coreboot behavior for VMX changed recently and VMX is enabled by default in all the boards that come after Eve.
,
Jan 8 2018
For completeness: Eve BIOS 9584.107.0 has VMX enabled by default now, so if you are still running into this ensure you have updated to this release. (this update is present in R64 but not R63)
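Checking a device against that minimum firmware release can be sketched as below. It assumes the firmware ID has the form Google_Eve.9584.107.0, as reported by crossystem, and compares the numeric parts as a tuple; the helper names are hypothetical.

```python
def parse_fwid(fwid):
    """Split 'Google_Eve.9584.107.0' into ('Google_Eve', (9584, 107, 0))."""
    board, _, version = fwid.partition(".")
    return board, tuple(int(part) for part in version.split("."))


def firmware_at_least(fwid, minimum):
    """True when the numeric version in fwid is >= minimum ('9584.107.0')."""
    _, version = parse_fwid(fwid)
    return version >= tuple(int(part) for part in minimum.split("."))
```

A matching version only shows that the update is installed; it does not by itself prove how the VMX lock was left on a given boot.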
,
Jan 23 2018
No more M63 releases. Marking as fixed, since it should be resolved in 64 as per c#37.
,
Jan 23 2018
[Auto-generated comment by a script] We noticed that this issue is targeted for M-65; it appears the fix may have landed after branch point, meaning a merge might be required. Please confirm if a merge is required here - if so add Merge-Request-65 label, otherwise remove Merge-TBD label. Thanks.
,
Feb 22 2018
,
Feb 22 2018
This bug seems not to be fixed as eve continues not creating crash files for M64 builds. Can someone triage?
,
Feb 22 2018
This is high priority given our stable schedule. Thanks
,
Feb 22 2018
I'm removing this as a RBS for M64 since it's not a regression in M64 and too late to consider a new merge at this point.
,
Feb 23 2018
Weird ... just tested locally on 10176.73.0 and it works for me; however, if I test on a device in the lab w/ the same SW & FW it fails. Digging a little deeper ...
PASSING device:
localhost ~ # crossystem | grep fw
ecfw_act = RW # Active EC firmware
fwb_tries = 0 # Try firmware B count (writable)
fw_vboot2 = 1 # 1 if firmware was selected by vboot2 or 0 otherwise
fwid = Google_Eve.9584.107.0 # Active firmware ID
fwupdate_tries = 0 # Times to try OS firmware update (writable, inside kern_nv)
fw_tried = B # Firmware tried this boot (vboot2)
fw_try_count = 0 # Number of times to try fw_try_next (writable)
fw_try_next = B # Firmware to try next (vboot2,writable)
fw_result = success # Firmware result this boot (vboot2,writable)
fw_prev_tried = B # Firmware tried on previous boot (vboot2)
fw_prev_result = success # Firmware result of previous boot (vboot2)
mainfw_act = B # Active main firmware
mainfw_type = normal # Active main firmware type
ro_fwid = Google_Eve.9584.107.0 # Read-only firmware ID
tpm_fwver = 0x00010001 # Firmware version stored in TPM
tried_fwb = 0 # Tried firmware B before A this boot
localhost ~ # grep DESCRIP /etc/lsb-release
CHROMEOS_RELEASE_DESCRIPTION=10176.73.0 (Official Build) dev-channel eve test
localhost ~ # dmesg | grep VMX
[ 0.000000] can not disable VMX on CPU0 (already locked)
[ 0.051766] can not disable VMX on CPU1 (already locked)
[ 0.060218] can not disable VMX on CPU2 (already locked)
[ 0.068687] can not disable VMX on CPU3 (already locked)
FAILING device:
localhost ~ # crossystem | grep fw
ecfw_act = RW # Active EC firmware
fwb_tries = 0 # Try firmware B count (writable)
fw_vboot2 = 1 # 1 if firmware was selected by vboot2 or 0 otherwise
fwid = Google_Eve.9584.107.0 # Active firmware ID
fwupdate_tries = 0 # Times to try OS firmware update (writable, inside kern_nv)
fw_tried = A # Firmware tried this boot (vboot2)
fw_try_count = 0 # Number of times to try fw_try_next (writable)
fw_try_next = A # Firmware to try next (vboot2,writable)
fw_result = success # Firmware result this boot (vboot2,writable)
fw_prev_tried = A # Firmware tried on previous boot (vboot2)
fw_prev_result = unknown # Firmware result of previous boot (vboot2)
mainfw_act = A # Active main firmware
mainfw_type = normal # Active main firmware type
ro_fwid = Google_Eve.9584.107.0 # Read-only firmware ID
tpm_fwver = 0x00010001 # Firmware version stored in TPM
tried_fwb = 0 # Tried firmware B before A this boot
localhost ~ # grep DESCRIP /etc/lsb-release
CHROMEOS_RELEASE_DESCRIPTION=10176.73.0 (Official Build) dev-channel eve test
localhost ~ # dmesg | grep VMX
[ 0.000000] Disabling VMX on cpu 0
[ 0.051702] Disabling VMX on cpu 1
[ 0.060163] Disabling VMX on cpu 2
[ 0.068608] Disabling VMX on cpu 3
So while the SW appears to be the same, one device is still able to 'disable VMX'. Not sure how to proceed, however. I tried 'chromeos-firmwareupdate --mode=autoupdate', but it said the device had current FW. I then tried 'chromeos-firmwareupdate --mode=factory', which proceeded, but now the device will not boot :(
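Comparing the two crossystem dumps by eye is error-prone, so a small diff helps. The sketch below assumes the "key = value # comment" layout shown in the output above; both function names are made up for illustration.

```python
def parse_crossystem(text):
    """Parse 'key = value # comment' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        setting = line.split("#", 1)[0]  # drop the trailing description
        key, separator, value = setting.partition("=")
        if separator:
            values[key.strip()] = value.strip()
    return values


def diff_settings(passing, failing):
    """Return {key: (passing_value, failing_value)} for keys that differ."""
    keys = sorted(set(passing) | set(failing))
    return {
        key: (passing.get(key), failing.get(key))
        for key in keys
        if passing.get(key) != failing.get(key)
    }
```

Applied to the two dumps above, this would flag the active firmware slot (B on the passing device, A on the failing one) and fw_prev_result, while fwid and ro_fwid match.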
,
Mar 2 2018
,
Mar 7 2018
If this is not a regression, it will not block stable. We can still consider merging a fix; however, time runs short, and this might be better suited to 66 or later if we don't have a fix in the next few days.
,
Apr 19 2018
,
Apr 26 2018
There are no crash reports on chrome://crashes. Chrome 67.0.3396.19/10575.17.0 - Eve.
,
Aug 17
afaik, we're all set now
Comment 1 by matthewjoseph@chromium.org
, Oct 24 2017