security_SandboxLinuxUnittests failing on peach_pit-tot-chrome-pfq-informational
Issue description
Builder: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934790789490023936
Failure output: security_SandboxLinuxUnittests FAIL: Autotest client terminated unexpectedly: DUT rebooted during the test run.
This failed twice in a row. There appears to be a kernel crash file in the test output, 'kernel.20180921.111627.0.kcrash': https://stainless.corp.google.com/browse/chromeos-autotest-results/240301181-chromeos-test/
,
Sep 21
Chrome changes: https://chromium.googlesource.com/chromium/src/+log/16acd1cbfadccc05fe68cec64526d788a2554fa2..b731e532b76383ac9296967e973862fd99ae8e59 Maybe this one? +mpdenton@ https://chromium.googlesource.com/chromium/src/+/3eba7fd09df5997c765aeb5aad2ce28c2f890e49
,
Sep 21
Attempting a bisect:
10087 [2018-09-21 12:30:57 -0700] cros tryjob peach_pit-tot-chrome-pfq-informational-tryjob --chrome_version 52cae12
10088 [2018-09-21 12:31:11 -0700] cros tryjob peach_pit-tot-chrome-pfq-informational-tryjob --chrome_version b731e53
10089 [2018-09-21 12:31:25 -0700] cros tryjob peach_pit-tot-chrome-pfq-informational-tryjob --chrome_version 3eba7fd
10090 [2018-09-21 12:31:36 -0700] cros tryjob peach_pit-tot-chrome-pfq-informational-tryjob --chrome_version 86b3922
https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=peach_pit-tot-chrome-pfq-informational-tryjob&buildBranch
,
Sep 21
Am I right that the kernel is OOPSing? Is this due to some sort of in-kernel unit test, or is this in normal kernel code? If it's a kernel OOPS there's a much deeper problem than userspace code.
,
Sep 21
This test just runs sandbox_linux_unittests from Chrome on a Chrome OS device/VM.
,
Sep 24
Is the bisect finished? How can I read it? The original crash report includes a bunch of kernel panics, which my CL might have triggered, but the actual buggy code is unlikely to be mine. I can look into it anyway.
,
Sep 24
I think I forgot to pass the flag that makes the tryjobs run autotests; I seem to recall that they do not run them by default :( I will start them again shortly.
,
Sep 24
,
Sep 25
OK, I have narrowed the bisect range to #593052 - #593109. Continuing the bisect.
,
Sep 25
Note: We had one successful run, then another failure, so there may be some flake in the test, which will not help the bisect process :( https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934453936527756384
,
Sep 25
It also looks like we are seeing a similar failure in security_SandboxLinuxUnittests on tricky: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934413669237741792
,
Sep 25
This test keeps failing on peach_pit-tot-chrome-pfq-informational (it succeeded once recently) and occasionally fails on tricky-tot-chrome-pfq-informational. mpdenton@, could you help take a look? Thanks.
,
Sep 26
FWIW, I ran a bunch more tryjobs in the failure range and they all succeeded, and the test hasn't failed on peach_pit recently, so this is either directly related to a Chrome OS change and not a Chrome change, or the test is flakey in a way that is impacted by load or other factors.
,
Sep 26
Another similar test failed with the same failure output. security_ProfilePermissions.guest: FAIL: Autotest client terminated unexpectedly: DUT rebooted during the test run. https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934322558516573104
,
Sep 28
It looks to me like the tests stopped failing, am I right? The crash was a kernel NULL-pointer dereference, which my userspace code was triggering. Since userspace code should never be able to crash the kernel I'm fairly certain the issue wasn't mine, and it seems it must have been fixed with kernel commits.
,
Sep 28
The security_SandboxLinuxUnittests failure hasn't been seen in recent related informational builds. Closing it for now. Thanks.
,
Oct 1
Reopening, since we are seeing it again: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8933915799093632016 This test is flaky, and it seems it will not break the pfq?
,
Oct 1
The bisect was not conclusive because the test is very flakey. For some reason it is failing less often now, but still occasionally failing.
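A rough way to see why a flaky failure makes a bisect inconclusive: a single passing run at a revision says little, so each bisect step needs several clean runs before the revision can be treated as good. The sketch below is only illustrative. The function name and the flake rates are hypothetical (no failure rate was measured in this bug), and it assumes runs fail independently at a fixed rate, which may not hold if load or other factors matter.

```python
import math


def clean_runs_needed(flake_rate: float, max_miss_prob: float = 0.05) -> int:
    """Return the smallest n such that (1 - flake_rate)**n <= max_miss_prob.

    flake_rate: assumed probability that one run of a bad revision fails.
    max_miss_prob: acceptable chance that a bad revision passes all n runs
    and is wrongly marked good during the bisect.
    """
    if not 0.0 < flake_rate < 1.0:
        raise ValueError("flake_rate must be strictly between 0 and 1")
    if not 0.0 < max_miss_prob < 1.0:
        raise ValueError("max_miss_prob must be strictly between 0 and 1")
    # Probability that n independent runs all pass is (1 - flake_rate)**n.
    return math.ceil(math.log(max_miss_prob) / math.log(1.0 - flake_rate))


if __name__ == "__main__":
    # Hypothetical flake rates, chosen only to show the trend.
    for rate in (0.5, 0.1):
        print(rate, clean_runs_needed(rate))
```

The lower the failure rate, the more tryjobs each step needs, which is why a test that fails only occasionally is expensive to bisect.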
,
Oct 1
,
Oct 4
OK, I finally figured out how to get useful failure history from Stainless (I had to uncheck 'Exclude non-release builds'). The failures go back to at least R68-10718.104.0. They are not limited to peach_pit, but are far more frequent on peach_pit for some reason. The test does not appear to be especially more flakey now than it was in 69.
Failures only, across all boards:
https://stainless.corp.google.com/search?view=matrix&row=build&col=model&test=%5Esecurity%5C_SandboxLinuxUnittests%24&status=FAIL&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=false&exclude_acts=false&exclude_retried=false&exclude_non_production=false&days=28
All peach_pit runs:
https://stainless.corp.google.com/search?view=matrix&row=build&col=model&builder_name=peach_pit&test=%5Esecurity%5C_SandboxLinuxUnittests%24&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false&days=28
,
Oct 8
,
Oct 16
if it's oopsing, i suspect this is a dupe of issue 871915
,
Oct 16
Yes, most certainly a dupe: these failures here are also limited to kernel 3.8, oopsing on a NULL-ptr dereference in the same function.
Comment 1 by steve...@chromium.org, Sep 21