
Issue 734801

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug
Hotlist-MemoryInfra




Mac ASan errors on chromium.memory for unit_tests don't turn the bot red

Project Member Reported by rsesek@chromium.org, Jun 19 2017

Issue description

Chrome Version: N/A
OS: Mac

What steps will reproduce the problem?
(1) Go to this build result page and see that all steps are green and passing: https://luci-milo.appspot.com/buildbot/chromium.memory/Mac%20ASan%2064%20Tests%20%281%29/31460
(2) Open the test log for unit_tests: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.memory%2FMac_ASan_64_Tests__1_%2F31460%2F%2B%2Frecipes%2Fsteps%2Funit_tests%2F0%2Fstdout
(3) Search for "heap-use-after-free" and find results!!
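
Step (3) can also be done with a script instead of a manual search. The following is a minimal sketch, assuming the log has already been downloaded as text and that ASan reports begin with the usual "==PID==ERROR: AddressSanitizer: <kind>" header line. The function name and the sample log are hypothetical and are not taken from the bot's output.

```python
import re

# Header line that AddressSanitizer prints at the start of a report, e.g.
# "==1234==ERROR: AddressSanitizer: heap-use-after-free on address ...".
ASAN_ERROR_RE = re.compile(r"==\d+==\s*ERROR: AddressSanitizer: (\S+)")


def find_asan_errors(log_text):
    """Return the error kind of every ASan report header in log_text, in order."""
    return ASAN_ERROR_RE.findall(log_text)


# Hypothetical log excerpt, for illustration only.
sample_log = "\n".join([
    "[ RUN      ] ExampleTest.Hypothetical",
    "==1234==ERROR: AddressSanitizer: heap-use-after-free on address 0x000000000000",
])

print(find_asan_errors(sample_log))
```

A non-empty result for a step that is reported as green is the symptom described in this issue.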

What is the expected result?
"heap-use-after-free" should fail the unit_tests step and turn the bot red.

What happens instead?
The unit_tests step is passing.


Comment 1 by rsesek@chromium.org, Jun 19 2017

Cc: dpranke@chromium.org
Looking at the output.json file from the tests (https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=1c25ebca62685d8dd1abe70cb3a2841308dbf031&as=output.json), "CRASH" is reported for the first run. I'm guessing that the "SUCCESS" on the retry run means the failure is treated as flaky. I'm not sure that's the right behavior for ASan tests.

Your diagnosis is correct: the test crashes during the initial run and passes on a retry, so it is treated as a flake.

It's not clear to me that we should have a different policy for sanitizer failures than any other; obviously, such a crash is bad, but so is any crash ...
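
The retry behavior described above can be sketched as follows. This is an illustration only, assuming each test has an ordered list of attempt statuses using the "CRASH" and "SUCCESS" strings seen in output.json. The function name, the "PASS"/"FLAKY"/"FAIL" labels, and the exact rule are assumptions, not the test launcher's actual implementation.

```python
def classify(statuses):
    """Classify one test from its ordered attempt statuses (first run, then retries).

    Assumed rule: any mix of failing and succeeding attempts counts as flaky,
    and only a test that never succeeds counts as a real failure.
    """
    if not statuses:
        return "NOT_RUN"
    if all(s == "SUCCESS" for s in statuses):
        return "PASS"
    if "SUCCESS" in statuses:
        return "FLAKY"
    return "FAIL"


# The case from this issue: crash on the first run, success on the retry.
print(classify(["CRASH", "SUCCESS"]))  # prints "FLAKY"
```

Under a rule like this, the kind of failure ("CRASH" versus any other non-success status) never affects the outcome, which is why an ASan crash is handled like any other flake.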
Cc: -erikc...@chromium.org
Owner: erikc...@chromium.org
Status: Assigned (was: Untriaged)
[mac triage] erikchen@ for memory

Comment 4 by rsesek@chromium.org, Jun 20 2017

The problem is that these memory errors occur across several runs, and the retry-in-isolation is papering over that fact. It even looks like some of these errors are totally reliable when not run in isolation: https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=unit_tests&builder=chromium.memory%3AMac%20ASan%2064%20Tests%20(1).

So these are real errors, and they're either problems with the tests themselves (which could then cause corruption in other tests) or, even worse, memory corruption in the non-test code/product. I think part of the problem with retry-in-isolation is that some memory errors require a degree of heap activity in order to trigger.

I only noticed this because someone reported an ASan failure as bug 734019. That test does have a clear memory error in the code, so I don't think that treating this as a flake is correct.
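
One way to express the stricter policy argued for here is sketched below. It is a hypothetical illustration, not existing infrastructure: it assumes each attempt is available as a (status, output) pair, and it fails the step whenever any attempt printed an ASan report, even if a later retry succeeded. The function name and the marker string are assumptions.

```python
def step_should_fail(attempts, sanitizer_markers=("ERROR: AddressSanitizer:",)):
    """Decide whether a test should fail the step.

    attempts: list of (status, output) tuples for one test, in run order.
    A sanitizer report in any attempt fails the step regardless of retries;
    otherwise the last attempt must be "SUCCESS". An empty list fails.
    """
    final_ok = bool(attempts) and attempts[-1][0] == "SUCCESS"
    saw_sanitizer_report = any(
        marker in output
        for _, output in attempts
        for marker in sanitizer_markers
    )
    return saw_sanitizer_report or not final_ok


# Crash with an ASan report, followed by a passing retry: still a failure.
print(step_should_fail([
    ("CRASH", "==1==ERROR: AddressSanitizer: heap-use-after-free"),
    ("SUCCESS", ""),
]))  # prints True
```

An ordinary failure that passes on retry would still be tolerated under this sketch. Whether sanitizer failures deserve a different policy from other failures is the open question in this thread.
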
That same argument can be made about any other kind of failure. I'm not disagreeing with you; it's simply true (a known tradeoff) that retrying failures will lead you to ignore classes of failures, which is why people need to be looking at the flakiness dashboard, which *doesn't* ignore these failures.

Unfortunately, we don't currently have effective mechanisms for actively working on these sorts of failures. This is something the ops team is looking at. It will likely require changes in processes like sheriffing as well as just tooling changes, and I am totally fine w/ making such changes if need be.
