
Issue 676030


Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Re-enable core.minidump_unittest.BrowserMinidumpTest.testSymbolizeMinidump

Project Member Reported by eyaich@chromium.org, Dec 20 2016

Issue description

Our only integration tests for minidumps and stack symbolization were reverted in crbug.com/633761.

Fix and re-enable these tests.  

Thoughts on the failures and debugging from the time these tests were failing:

The issue was that the test often found more minidumps than expected.

- I wonder if this has something to do with other tests running in parallel, and with how the directories that Chrome uses differ between user profiles and processes.
  File "/b/swarm_slave/w/ir39C4lL/tools/perf/core/minidump_unittest.py", line 57, in testMultipleCrashMinidumps
    self.assertEquals(len(self._browser.GetAllMinidumpPaths()), 2)
AssertionError: 3 != 2
 
The error above indicates that @decorators.isolated isn't working correctly.

Thoughts on debugging that:
1) When the browser is started, have it write a random file to a folder with a fixed name, and delete the file when the browser is stopped. Add a line to the browser log that prints out the contents of that folder. Then we can use that log to check whether testMultipleCrashMinidumps is actually run in isolation.

2) Another thing to add is a better failure message for the unittest, so that when it fails it prints out the timestamp, and maybe the content, of every minidump path.
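
A rough sketch of both ideas, in case it helps. Everything here is hypothetical and not existing Telemetry code: the names isolation_marker and describe_minidumps are made up, the marker folder is whatever fixed path the harness picks, and it assumes Python 3. The context manager drops a uniquely named file for the lifetime of a browser and reports what else is in the folder; the helper builds a failure message from the minidump paths.

```python
import contextlib
import os
import time
import uuid


@contextlib.contextmanager
def isolation_marker(marker_dir):
    """Keep a uniquely named file in marker_dir while a browser is alive.

    Yields the sorted names of all markers present right after ours is
    written. More than one name means another browser overlapped with us.
    """
    os.makedirs(marker_dir, exist_ok=True)
    marker = os.path.join(marker_dir, uuid.uuid4().hex)
    with open(marker, 'w') as f:
        f.write(str(os.getpid()))
    try:
        yield sorted(os.listdir(marker_dir))
    finally:
        # Runs on normal exit and on exceptions, like a browser Close().
        os.remove(marker)


def describe_minidumps(paths):
    """Return one line per minidump with its mtime and size."""
    lines = []
    for path in sorted(paths):
        st = os.stat(path)
        stamp = time.strftime('%Y-%m-%d %H:%M:%S',
                              time.localtime(st.st_mtime))
        lines.append('%s  mtime=%s  size=%d' % (path, stamp, st.st_size))
    return '\n'.join(lines)
```

In the test, the second helper could be passed as the assertion message, e.g. self.assertEqual(len(paths), 2, describe_minidumps(paths)), so a 3 != 2 failure also shows which dumps were found and when they were written.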

Another thought: even when Chrome is closed, it sometimes leaves zombie processes lingering on the machine. Maybe something like that is happening when running Chrome here.


Maybe add some additional debug tools to Telemetry to print things out when necessary: things like pstree, to be able to understand the state of the system when the test is run and what else is running.
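
As a possible starting point for a pstree-like dump, here is a small sketch. It is not an existing Telemetry tool; read_process_table and format_process_tree are hypothetical names, and the first one assumes a POSIX ps that accepts -A and repeated -o options (so it would not work on Windows as written).

```python
import subprocess
from collections import defaultdict


def read_process_table():
    """Return (pid, ppid, command) tuples for all processes via POSIX ps."""
    # One -o per column; the trailing '=' suppresses the header line.
    out = subprocess.check_output(
        ['ps', '-A', '-o', 'pid=', '-o', 'ppid=', '-o', 'comm='])
    rows = []
    for line in out.decode('utf-8', 'replace').splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append((int(parts[0]), int(parts[1]), parts[2]))
    return rows


def format_process_tree(rows):
    """Render (pid, ppid, name) rows as an indented tree, one line each."""
    children = defaultdict(list)
    names = {}
    for pid, ppid, name in rows:
        names[pid] = name
        children[ppid].append(pid)
    # A root is a process whose parent is not in the table (or is itself).
    roots = sorted(pid for pid, ppid, _ in rows
                   if ppid not in names or ppid == pid)
    lines = []

    def walk(pid, depth):
        lines.append('%s%d %s' % ('  ' * depth, pid, names[pid]))
        for child in sorted(children[pid]):
            if child != pid:
                walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return '\n'.join(lines)
```

Logging format_process_tree(read_process_table()) at browser start and again when an unexpected minidump count is seen would show whether stray Chrome or crashpad_handler processes were alive at the time.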


 
Cc: dpranke@chromium.org
Cc: mark@chromium.org
Mark, before we even get to fetching symbols and symbolizing stack traces, we are seeing issues with minidumps not being present, or with more minidumps than we expect.

See crbug.com/671049 for a case where we can't find them on Macs.

Can you say a little more about how dumps get written to the database that crashpad_database_util reads, and the steps that happen before we query it for the current dumps? We are wondering whether our running in isolation isn't working correctly, or whether we are cutting off processing too soon for the dumps to be generated.
Cc: nednguyen@chromium.org

Comment 4 by mark@chromium.org, Jan 6 2017

crashpad_handler writes the dumps to the Crashpad database in the “pending” state before letting the crashy program go on its way and finish crashing in a way that’s visible to the rest of the system, such as to its parent process via waitpid(). Subsequently, another thread in crashpad_handler is supposed to pick up the pending crash and decide what to do with it, which basically means that it’ll either try to upload it or not. At some point, the report will transition from “pending” to “completed”. That means that no more upload attempts will be made. This either happens quickly the first time that the pending report is examined (because uploads are off), or it can happen because the upload succeeds, or it can happen because the upload fails but won’t be retried. Even when it happens quickly, though, it’s a bit racy with any action that you might take as soon as the crash becomes visible to the rest of the system.

If you’re the parent of a test, and the test crashes, and you learn of the crash when you call waitpid(), and you immediately take action to run crashpad_database_util to find the crash, then you should be aware that this action can race crashpad_handler transitioning the report from “pending” to “completed.”

If you’re only asking crashpad_database_util to tell you about completed reports, then you won’t find it in the list if crashpad_handler hasn’t had a chance to move it to completed yet.

That’s probably what would cause reports to not show up. As for more reports than expected, you should bear in mind that Crashpad is watching more than just your main test process, it’s potentially watching all of that process’ children as well, and if any of them crash, nobody’s necessarily going to tell you about it (the crash would be reported to the parent process, the main test process, via waitpid()), but Crashpad will produce a report anyway. If you run multiple test processes in parallel as your own children and they share the same database, that’ll be another way that multiple processes can (roughly) simultaneously wind up producing reports in the same database.
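
One way the test harness could tolerate the pending-to-completed race described above is to poll for a short while instead of querying once. The sketch below is only an illustration under assumptions: wait_for_reports is a hypothetical helper, and list_completed stands in for whatever function the harness already uses to ask crashpad_database_util for completed reports. The timeout values are arbitrary.

```python
import time


def wait_for_reports(list_completed, expected, timeout=10.0, interval=0.25,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll list_completed() until it returns at least `expected` reports.

    Returns the last list seen. If the timeout expires first, the list is
    shorter than `expected`, so the caller can tell the two cases apart.
    """
    deadline = clock() + timeout
    while True:
        reports = list_completed()
        if len(reports) >= expected or clock() >= deadline:
            return reports
        sleep(interval)
```

This only addresses reports that show up late. It does nothing about extra reports from child processes or from parallel tests sharing a database, which would still need separate databases or stricter isolation.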
Cc: scottmg@chromium.org
Components: Test>Telemetry
Components: -Tests>Telemetry
