
Issue 888089

Starred by 1 user

Issue metadata

Status: Duplicate
Owner:
Closed: Oct 16
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




security_SandboxLinuxUnittests failing on peach_pit-tot-chrome-pfq-informational

Project Member Reported by steve...@chromium.org, Sep 21

Issue description

Builder:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934790789490023936

Failure output:
security_SandboxLinuxUnittests            FAIL: Autotest client terminated unexpectedly: DUT rebooted during the test run.

This failed twice in a row. There appears to be a kernel crash file in the test output, 'kernel.20180921.111627.0.kcrash':
https://stainless.corp.google.com/browse/chromeos-autotest-results/240301181-chromeos-test/
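
For anyone digging into the crash locally: on a Chrome OS DUT, kernel crash reports like this are collected by crash_reporter after the reboot and then copied into the autotest results. A minimal sketch of inspecting them on the device (the spool path is an assumption from memory):

# On the DUT, after it comes back up (path is an assumption):
ls -l /var/spool/crash/
cat /var/spool/crash/kernel.20180921.111627.0.kcrash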


 
The failure appears between CrOS R71-11084.0.0 and R71-11086.0.0, but it does not appear to be failing on the release builders.

Attempting a bisect:

10087  [2018-09-21 12:30:57 -0700] cros tryjob peach_pit-tot-chrome-pfq-informational-tryjob --chrome_version 52cae12
10088  [2018-09-21 12:31:11 -0700] cros tryjob peach_pit-tot-chrome-pfq-informational-tryjob --chrome_version b731e53
10089  [2018-09-21 12:31:25 -0700] cros tryjob peach_pit-tot-chrome-pfq-informational-tryjob --chrome_version 3eba7fd
10090  [2018-09-21 12:31:36 -0700] cros tryjob peach_pit-tot-chrome-pfq-informational-tryjob --chrome_version 86b3922

https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=peach_pit-tot-chrome-pfq-informational-tryjob&buildBranch
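
The candidate Chrome revisions above were picked by hand; a minimal sketch of how one midpoint step of this bisect could be automated from a chromium/src checkout (GOOD/BAD are placeholders for the Chrome revisions shipped in R71-11084.0.0 and R71-11086.0.0):

# Placeholders: the Chrome revisions in the last-good and first-bad CrOS builds.
GOOD=<chrome rev in R71-11084.0.0>
BAD=<chrome rev in R71-11086.0.0>
# Pick the midpoint commit of the suspect range and launch a tryjob on it.
COUNT=$(git rev-list --count "$GOOD..$BAD")
MID=$(git rev-list "$GOOD..$BAD" | sed -n "$(( (COUNT + 1) / 2 ))p")
cros tryjob peach_pit-tot-chrome-pfq-informational-tryjob --chrome_version "$MID"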

Am I right that the kernel is OOPSing? Is this due to some sort of in-kernel unit test, or is this in normal kernel code? If it's a kernel OOPS there's a much deeper problem than userspace code.
This test just runs sandbox_linux_unittests from Chrome on a Chrome OS device/VM (see the sketch below).
Is the bisect finished? How can I read it?
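
(Expanding on the comment above, a minimal sketch of running the test by hand from the chroot; the DUT IP and the on-device binary path are assumptions:)

# Run the autotest wrapper against a DUT (IP is a placeholder).
test_that --board=peach_pit 172.16.0.2 security_SandboxLinuxUnittests
# On the device the wrapper effectively just executes the bundled gtest binary,
# roughly (path is an assumption):
/usr/local/libexec/chrome-binary-tests/sandbox_linux_unittests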

The original crash report includes a bunch of kernel panics, which my CL might have triggered, but the actual buggy code is unlikely to be mine. I can look into it anyway.
I think I forgot to pass the flag to make the tryjobs run autotests (something like the command sketched below); I seem to recall that they do not run them by default :(

I will start them again shortly.
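
For the record, the re-run would look something like this; --hwtest is from memory as the option that enables the HWTest stage, so treat it as an assumption and check cros tryjob --help:

# Re-launch one of the tryjobs above with hardware tests enabled
# (--hwtest is assumed to be the flag that turns on the HWTest stage).
cros tryjob peach_pit-tot-chrome-pfq-informational-tryjob --hwtest --chrome_version 52cae12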

Cc: minch@chromium.org
OK, I have narrowed the bisect range to #593052 - #593109. Continuing the bisect.

Note: We had one successful run, then another failure, so there may be some flake in the test, which will not help the bisect process :(

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934453936527756384

It also looks like we are seeing a similar failure in security_SandboxLinuxUnittests on tricky:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934413669237741792

Owner: mpdenton@chromium.org
Status: Assigned (was: Untriaged)
This test keeps failing on peach_pit-tot-chrome-pfq-informational (with one recent success) and occasionally on tricky-tot-chrome-pfq-informational.

mpdenton@, could you help take a look? Thanks.
FWIW, I ran a bunch more tryjobs in the failure range and they all succeeded, and the test hasn't failed on peach_pit recently. So either this is directly related to a Chrome OS change rather than a Chrome change, or the test is flakey in a way that is affected by load or other factors.

Another similar test failed with the same failure output.
security_ProfilePermissions.guest: FAIL: Autotest client terminated unexpectedly: DUT rebooted during the test run.

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934322558516573104
It looks to me like the tests stopped failing, am I right? The crash was a kernel NULL-pointer dereference, which my userspace code was triggering. Since userspace code should never be able to crash the kernel, I'm fairly certain the issue wasn't mine, and it seems it must have been fixed by kernel commits.
Status: Fixed (was: Assigned)
The security_SandboxLinuxUnittests failure hasn't been seen in recent related informational builds. Closing this for now. Thanks.
Cc: sammiequon@chromium.org
Status: Assigned (was: Fixed)
Reopening, since we're seeing it again: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8933915799093632016

This test is flaky, but it seems like it will not break the PFQ?
The bisect was not conclusive because the test is very flakey. For some reason it is failing less often now, but still occasionally failing.
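
Since the flake is what's blocking the bisect, a rough per-image failure rate can be estimated by just repeating the test against a DUT running the suspect image (test_that as in the earlier sketch; IP and iteration count are placeholders, and this assumes test_that exits non-zero on a failed run):

# Rough flake-rate check: repeat the test 20 times and count failures.
FAILS=0
for i in $(seq 1 20); do
  test_that --board=peach_pit 172.16.0.2 security_SandboxLinuxUnittests || FAILS=$((FAILS + 1))
done
echo "failed $FAILS/20 runs"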

Cc: agawronska@chromium.org
OK, I finally figured out how to get useful failure history from Stainless (I had to uncheck 'Exclude non-release builds').

The failures go back to at least R68-10718.104.0.
They are not limited to peach_pit, but are far more frequent on peach_pit for some reason.
The test does not appear to be significantly flakier now than it was in R69.

Failures only across all boards:
https://stainless.corp.google.com/search?view=matrix&row=build&col=model&test=%5Esecurity%5C_SandboxLinuxUnittests%24&status=FAIL&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=false&exclude_acts=false&exclude_retried=false&exclude_non_production=false&days=28

All peach_pit runs:
https://stainless.corp.google.com/search?view=matrix&row=build&col=model&builder_name=peach_pit&test=%5Esecurity%5C_SandboxLinuxUnittests%24&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false&days=28

Cc: -agawronska@chromium.org -sammiequon@chromium.org newcomer@chromium.org
If it's oopsing, I suspect this is a dupe of issue 871915.
Mergedinto: 871915
Status: Duplicate (was: Assigned)
Yes, most certainly a dupe: these failures are also limited to kernel 3.8, oopsing on a NULL-ptr dereference in the same function.
