
Issue 777985

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 17
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




[Eve] Kernel crash logs not generated in /var/spool/crash after turning on VMX support

Project Member Reported by matthewjoseph@chromium.org, Oct 24 2017

Issue description

Chrome Version: 64.0.3246.0
OS: ChromeOS 10063.0.0

What steps will reproduce the problem?
1) Recover device with R64-10063.0.0 or R63-10032.14.0
2) Log into User Account
3) Open VT2 or crosh shell
4) Execute: echo BUG > /sys/kernel/debug/provoke-crash/DIRECT
    a) Or Alt + F10 + X
5) Wait for system to restart
6) Look for crashes in /var/spool/crash
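
Step 6 can be scripted. The following is a minimal sketch, not part of the original report: the helper names `classify_crash_files` and `scan_crash_dir` are made up for illustration, and it only assumes that kernel crashes show up as files ending in `.kcrash` and `.meta` in /var/spool/crash, as described in the expected result below.

```python
import os


def classify_crash_files(names):
    """Split a list of file names into .kcrash and .meta entries."""
    return {
        "kcrash": sorted(n for n in names if n.endswith(".kcrash")),
        "meta": sorted(n for n in names if n.endswith(".meta")),
    }


def scan_crash_dir(crash_dir="/var/spool/crash"):
    """List crash_dir and classify its contents.

    A missing or unreadable directory is treated as "no crash files".
    """
    try:
        names = os.listdir(crash_dir)
    except OSError:
        names = []
    return classify_crash_files(names)


def kernel_crash_recorded(found):
    """True only if at least one .kcrash and one .meta file are present."""
    return bool(found["kcrash"]) and bool(found["meta"])
```

Running `kernel_crash_recorded(scan_crash_dir())` after the reboot in step 5 would return False on a device showing this bug. Note that `.meta` files can also belong to non-kernel crashes, so the `.kcrash` list is the more telling of the two.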

What is the expected result?
.kcrash and .meta files should be present after a crash

What happens instead?
No .kcrash or .meta kernel crash files are in /var/spool/crash

Notes:
- Reproduced this on Eve R63-10032.14.0 and R64-10063.0.0
- This issue is *not* observed on M62-9901.51.0 Stable 

 

Comment 2 by vapier@chromium.org, Oct 24 2017

someone feel like bisecting this down between 9901.0.0 & 10032.0.0 ?

Comment 3 by ka...@chromium.org, Oct 24 2017

Could this be the latency for crash files creation we caught with issue 774154?
I don't think this is related to issue 774154 because when testing manually, the logs were still not present after more than 30 minutes.  

Comment 6 by vapier@chromium.org, Oct 27 2017

kernel crash logs don't involve breakpad at all.  it's up to the kernel to save/preserve the log so that when crash-reporter starts at the next boot, it can find them under /dev/pstore/.

would still be nice if someone has a system that reproduces to bisect a bit more between those ~150 versions.

Comment 7 by snanda@chromium.org, Oct 27 2017

Owner: tbroch@chromium.org
After some digging, I've narrowed it down to a change between R63-9935.0.0 and R63-9937.0.0.  Kernel crash files are present on 9935.0.0 but not on 9937.0.0 or later.

Only one CL that I could find mentions changes to crash reporting:
https://chromium-review.googlesource.com/c/chromiumos/platform2/+/659138
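
The narrowing described in this comment amounts to a bisection over an ordered list of builds. A minimal sketch of that search follows; it is an illustration only, not the procedure actually used. The function name `first_bad_build` and the `is_bad` callback are hypothetical, and in practice `is_bad` would mean flashing the build, provoking a crash and checking for crash files by hand.

```python
def first_bad_build(builds, is_bad):
    """Return the first build for which is_bad(build) is true.

    Assumptions: builds is ordered oldest to newest, has at least two
    entries, builds[0] is known good, builds[-1] is known bad, and every
    build after the first bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]


# Stub predicate standing in for the manual test, based on the result
# reported in this comment (9935.0.0 good, 9937.0.0 and later bad).
def crash_files_missing(build):
    return int(build.split(".")[0]) >= 9937


candidates = ["9901.0.0", "9935.0.0", "9937.0.0", "10032.0.0"]
culprit = first_bad_build(candidates, crash_files_missing)
```

With roughly 150 candidate builds, a search like this needs about eight manual test cycles rather than one per build.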

Comment 9 by vapier@chromium.org, Oct 27 2017

that should only impact lakitu systems as that changes systemd files (which eve doesn't use)

we also turned on VMX for eve in that range:
  https://chromium-review.googlesource.com/657918
i wouldn't expect it to affect this, but i also wouldn't be surprised if it made subtle changes to pstore behavior.

Comment 10 Deleted

Comment 11 Deleted

Cc: tbroch@chromium.org
Owner: dgreid@chromium.org
Status: Assigned (was: Untriaged)
#c8 & 9: thanks for narrowing the scope down.  It does appear that the VMX change is causing the regression: manually changing the boot arg to disablevmx=on and then running platform_KernelErrorPaths.BUG to verify fixes the issue.

Dylan can you have a look?
Sure. It doesn't reproduce on samus with v4.4; I'll set up an Eve with a test build and take a look.
It certainly does happen on Eve. I don't have any good leads as to why yet. I'll try to determine tomorrow whether it is failing to write or to read the ramoops.
Summary: [Eve] Kernel crash logs not generated in /var/spool/crash after turning on VMX support (was: [Eve] Kernel crash logs not generated in /var/spool/crash)
to mitigate, we could move the option to the eve-kvm overlay so we can backport it to branches until we narrow it down more
I think we should probably move to the eve-kvm overlay.

I noticed that it only happens after a hard reset, so the lock bit is cleared. As long as vmx is locked, ramoops works even if the kernel is told not to disable it.

There isn't a lot of code in coreboot that is different, it sets the IA32_FEATURE_CONTROL msr, but that's about it.
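
For reference, the lock bit mentioned above is bit 0 of IA32_FEATURE_CONTROL (MSR 0x3A); bits 1 and 2 enable VMX inside and outside SMX operation, and once the lock bit is set the register cannot be changed until the next reset. The sketch below only decodes a raw value; how that value is read on a given device is left open, and the function name `decode_feature_control` is made up for illustration.

```python
IA32_FEATURE_CONTROL = 0x3A  # MSR address

LOCK_BIT = 1 << 0          # register is locked until the next reset
VMX_INSIDE_SMX = 1 << 1    # VMXON allowed inside SMX operation
VMX_OUTSIDE_SMX = 1 << 2   # VMXON allowed outside SMX operation


def decode_feature_control(value):
    """Decode the lock and VMX-enable bits of an IA32_FEATURE_CONTROL value."""
    return {
        "locked": bool(value & LOCK_BIT),
        "vmx_inside_smx": bool(value & VMX_INSIDE_SMX),
        "vmx_outside_smx": bool(value & VMX_OUTSIDE_SMX),
    }
```

A value of 0x5, for example, decodes as locked with VMX enabled outside SMX, while 0x0 is the unlocked state in which software can still choose either setting.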

Comment 17 Deleted


Comment 18 by bugdroid1@chromium.org, Nov 4 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/b4bc5ea348016eef2fd2caa50e1e59ceed46482f

commit b4bc5ea348016eef2fd2caa50e1e59ceed46482f
Author: Dylan Reid <dgreid@chromium.org>
Date: Sat Nov 04 04:56:47 2017

Eve: Disable VMX extensions

Enabling VMX caused ramoops to fail for an unknown reason. Turn it off
for now, the VMX testing has moved to an experimental board.

CQ-DEPEND=CL:*496617
BUG= 777985 
TEST=crash kernel, see ramoops in /dev/pstore/ after it reboots.
Signed-off-by: Dylan Reid <dgreid@chromium.org>

Change-Id: Id5a0f4c7cd189bf16d0f042166ae0ea786c506d4
Reviewed-on: https://chromium-review.googlesource.com/753139
Commit-Ready: Dylan Reid <dgreid@chromium.org>
Tested-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[delete] https://crrev.com/aac16598b16d59ae4a55ec2cf2a695f712d20a97/overlay-eve/scripts/build_kernel_image.sh

Labels: -Restrict-View-Google Merge-Request-63
this landed for M63.  merging back should be safe.

Comment 20 by sheriffbot@chromium.org, Nov 4 2017

Labels: -Merge-Request-63 Merge-Review-63 Hotlist-Merge-Review
This bug requires manual review: M63 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), gkihumba@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Hotlist-Merge-Review -Merge-Review-63 Merge-Approved-63
Labels: -Merge-Approved-63 Merge-Merged
Status: Fixed (was: Assigned)

Comment 23 by bugdroid1@chromium.org, Nov 10 2017

Labels: merge-merged-release-R63-10032.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/7c7a99f6c9399c45455e951ab539e6d550bcc2d7

commit 7c7a99f6c9399c45455e951ab539e6d550bcc2d7
Author: Dylan Reid <dgreid@chromium.org>
Date: Fri Nov 10 02:09:30 2017

Eve: Disable VMX extensions

Enabling VMX caused ramoops to fail for an unknown reason. Turn it off
for now, the VMX testing has moved to an experimental board.

CQ-DEPEND=CL:*496617
BUG= 777985 
TEST=crash kernel, see ramoops in /dev/pstore/ after it reboots.
Signed-off-by: Dylan Reid <dgreid@chromium.org>

Change-Id: Id5a0f4c7cd189bf16d0f042166ae0ea786c506d4
Reviewed-on: https://chromium-review.googlesource.com/753139
Commit-Ready: Dylan Reid <dgreid@chromium.org>
Tested-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
(cherry picked from commit b4bc5ea348016eef2fd2caa50e1e59ceed46482f)
Reviewed-on: https://chromium-review.googlesource.com/762408
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Commit-Queue: Dylan Reid <dgreid@chromium.org>

[delete] https://crrev.com/b9c86f29bd75b5f7acc61ab12cd2a61171013ac0/overlay-eve/scripts/build_kernel_image.sh

we haven't really fixed this issue have we ?  we can't turn on VMX or crosvm or any of that on eve devices until this is understood & fixed.  moving to eve-kvm was more a mitigation than a fix.
No, it's not really fixed. But I wanted to close this and get it out of Eve's 63 release path. I'll open a new one for tracking turning it back on.

Comment 26 Deleted

Since R64-10132.0.0 this issue has again resurfaced on Eve.

The changes between the last build where it worked and the first failing build:
https://crosland.corp.google.com/log/10132.0.0..10137.0.0

It appears that VMX, which the original fix for this bug turned off ("Disable VMX extensions"), was recently re-enabled by:
https://chromium.googlesource.com/chromiumos/third_party/coreboot/+/4c7496b123895690a18ddd242ce52345ec50d4b9
Cc: pgeorgi@chromium.org
pgeorgi@, PTAL per #27.  Thanks...  Tagged as a stable blocker.  
Labels: -M-63
Doubt if this fix will make it to last M63 stable release next week.
From Duncan's description in https://chromium.googlesource.com/chromiumos/third_party/coreboot/+/4c7496b123895690a18ddd242ce52345ec50d4b9, could it be that kernel crash logs are only kept on cold boots?
Interesting. Does it now not work with VMX disabled? We can turn it back on if that's the case.
Adding disablevmx=off back to the command line fixes it for me locally. I'll upload a patch.
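
To confirm which setting a running device actually booted with, the kernel command line in /proc/cmdline can be checked for the `disablevmx` argument. This is a rough sketch only: the helper name `kernel_param` is hypothetical, and it splits on whitespace, so quoted values containing spaces are not handled.

```python
def kernel_param(cmdline, name):
    """Return the value of name=value on a kernel command line.

    Returns "" for a bare flag, None if the parameter is absent.
    If the parameter appears more than once, the last occurrence wins.
    """
    result = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            result = value if sep else ""
    return result


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line."""
    with open(path) as f:
        return f.read().strip()
```

On a device, `kernel_param(read_cmdline(), "disablevmx")` would return "off", "on", or None if the argument is not set at all.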

Comment 35 by bugdroid1@chromium.org, Jan 6 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/fde38aae79098fe199a5003a49e8dfb7066a9cc0

commit fde38aae79098fe199a5003a49e8dfb7066a9cc0
Author: Dylan Reid <dgreid@chromium.org>
Date: Sat Jan 06 05:14:18 2018

Revert "Eve: Disable VMX extensions"

After firmware update, enabling VMX no longer causes ramoops to fail. In
fact, not enabling it causes ramoops to fail.

This reverts commit b4bc5ea348016eef2fd2caa50e1e59ceed46482f.

BUG= 777985 
TEST=alt-volup-x until it reboots, check ramoops is present.

Change-Id: Iea13e4293d8a0161dc4d3ab8d34d020d50427e2d
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/853312
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[add] https://crrev.com/fde38aae79098fe199a5003a49e8dfb7066a9cc0/overlay-eve/scripts/build_kernel_image.sh

Yes we have to match the VMX enable state with a BIOS update (and not just by enabling in the kernel) because of the way FSP tries to disable it and forces a reboot.  This wasn't an issue before FSP, but we were also not properly forcing a cold reboot in the BIOS on a disable transition like FSP does...

The upstream coreboot behavior for VMX changed recently and VMX is enabled by default in all the boards that come after Eve.
For completeness: Eve BIOS 9584.107.0 has VMX enabled by default now, so if you are still running into this ensure you have updated to this release.  (this update is present in R64 but not R63)
Status: Fixed (was: Assigned)
No more M63 releases
Marking as fixed since it should be resolved in 64 as per c#37 
Labels: Merge-TBD
[Auto-generated comment by a script] We noticed that this issue is targeted for M-65; it appears the fix may have landed after branch point, meaning a merge might be required. Please confirm if a merge is required here - if so add Merge-Request-65 label, otherwise remove Merge-TBD label. Thanks.

Comment 40 by ka...@chromium.org, Feb 22 2018

Cc: kbleicher@chromium.org dgreid@chromium.org
 Issue 814826  has been merged into this issue.

Comment 41 by ka...@chromium.org, Feb 22 2018

Status: Assigned (was: Fixed)
This bug does not seem to be fixed: Eve still does not create crash files on M64 builds.

Can someone triage?
This is high priority given our stable schedule.  Thanks
Labels: -M-64
I'm removing this as a RBS for M64 since it's not a regression in M64 and too late to consider a new merge at this point.
Weird ... just tested locally on 10176.73.0 and works for me however if I test on device in lab w/ same SW & FW it fails.


Digging a little deeper ...

PASSING device:

localhost ~ # crossystem | grep fw
ecfw_act               = RW                             # Active EC firmware
fwb_tries              = 0                              # Try firmware B count (writable)
fw_vboot2              = 1                              # 1 if firmware was selected by vboot2 or 0 otherwise
fwid                   = Google_Eve.9584.107.0          # Active firmware ID
fwupdate_tries         = 0                              # Times to try OS firmware update (writable, inside kern_nv)
fw_tried               = B                              # Firmware tried this boot (vboot2)
fw_try_count           = 0                              # Number of times to try fw_try_next (writable)
fw_try_next            = B                              # Firmware to try next (vboot2,writable)
fw_result              = success                        # Firmware result this boot (vboot2,writable)
fw_prev_tried          = B                              # Firmware tried on previous boot (vboot2)
fw_prev_result         = success                        # Firmware result of previous boot (vboot2)
mainfw_act             = B                              # Active main firmware
mainfw_type            = normal                         # Active main firmware type
ro_fwid                = Google_Eve.9584.107.0          # Read-only firmware ID
tpm_fwver              = 0x00010001                     # Firmware version stored in TPM
tried_fwb              = 0                              # Tried firmware B before A this boot
localhost ~ # grep DESCRIP /etc/lsb-release 
CHROMEOS_RELEASE_DESCRIPTION=10176.73.0 (Official Build) dev-channel eve test
localhost ~ # dmesg | grep VMX
[    0.000000] can not disable VMX on CPU0 (already locked)
[    0.051766] can not disable VMX on CPU1 (already locked)
[    0.060218] can not disable VMX on CPU2 (already locked)
[    0.068687] can not disable VMX on CPU3 (already locked)

FAILING device:

localhost ~ # crossystem | grep fw
ecfw_act               = RW                             # Active EC firmware
fwb_tries              = 0                              # Try firmware B count (writable)
fw_vboot2              = 1                              # 1 if firmware was selected by vboot2 or 0 otherwise
fwid                   = Google_Eve.9584.107.0          # Active firmware ID
fwupdate_tries         = 0                              # Times to try OS firmware update (writable, inside kern_nv)
fw_tried               = A                              # Firmware tried this boot (vboot2)
fw_try_count           = 0                              # Number of times to try fw_try_next (writable)
fw_try_next            = A                              # Firmware to try next (vboot2,writable)
fw_result              = success                        # Firmware result this boot (vboot2,writable)
fw_prev_tried          = A                              # Firmware tried on previous boot (vboot2)
fw_prev_result         = unknown                        # Firmware result of previous boot (vboot2)
mainfw_act             = A                              # Active main firmware
mainfw_type            = normal                         # Active main firmware type
ro_fwid                = Google_Eve.9584.107.0          # Read-only firmware ID
tpm_fwver              = 0x00010001                     # Firmware version stored in TPM
tried_fwb              = 0                              # Tried firmware B before A this boot
localhost ~ # grep DESCRIP /etc/lsb-release 
CHROMEOS_RELEASE_DESCRIPTION=10176.73.0 (Official Build) dev-channel eve test
localhost ~ # dmesg | grep VMX
[    0.000000] Disabling VMX on cpu 0
[    0.051702] Disabling VMX on cpu 1
[    0.060163] Disabling VMX on cpu 2
[    0.068608] Disabling VMX on cpu 3

So while the SW appears to be the same, one device is still able to 'disable VMX'.
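
The comparison above was done by eye. A small sketch for doing it mechanically is below, assuming the `crossystem` and `dmesg | grep VMX` output of each device has been saved as text. All function names are made up for illustration, and the two regular expressions simply match the dmesg lines quoted above.

```python
import re

LOCKED_RE = re.compile(r"can not disable VMX on CPU(\d+) \(already locked\)")
DISABLING_RE = re.compile(r"Disabling VMX on cpu (\d+)")


def parse_crossystem(text):
    """Parse 'key = value  # comment' lines into a dict; skip other lines."""
    values = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, rest = line.partition("=")
        values[key.strip()] = rest.split("#", 1)[0].strip()
    return values


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def vmx_state_per_cpu(dmesg_text):
    """Map CPU number to 'locked' or 'disabling' based on the dmesg lines."""
    states = {}
    for line in dmesg_text.splitlines():
        locked = LOCKED_RE.search(line)
        if locked:
            states[int(locked.group(1))] = "locked"
            continue
        disabling = DISABLING_RE.search(line)
        if disabling:
            states[int(disabling.group(1))] = "disabling"
    return states
```

Applied to the two dumps above, the crossystem keys that differ are fw_tried, fw_try_next, fw_prev_tried, fw_prev_result and mainfw_act; the firmware IDs and the OS build are identical.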

Not sure how to proceed however.

I tried chromeos-firmwareupdate --mode=autoupdate 

But it said device had current FW

I then tried 'chromeos-firmwareupdate --mode=factory'

which proceeded but now the device will not boot :(


Cc: dlaurie@chromium.org
Labels: -ReleaseBlock-Stable
If this is not a regression, it will not block stable, we can still consider a merge of a fix though, however time runs short and this might be better suited to 66 or later if we don't have a fix in the next few days.

Comment 47 by sheriffbot@chromium.org, Apr 19 2018

Labels: -Merge-TBD
There are no crash reports on chrome://crashes.  Chrome 67.0.3396.19/10575.17.0 - Eve.
Components: -Internals>CrashReporting OS>Systems>CrashReporting
Status: Fixed (was: Assigned)
afaik, we're all set now
