Tests that crash renderers occasionally time out on win clang dbg bots
Issue description

This is currently failing reliably on the 32-bit clang tot win dbg bot: https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin%28dbg%29%20tester/builds/4276

It failed recently on the 64-bit bot, also with a timeout, although it has passed more recently as well: https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin64%28dbg%29%20tester/builds/3669

My guess is that there's something wrong with the PDBs produced by clang + link.exe, and that is causing dbghelp to hang while symbolizing the crash stack trace. At least, that's more or less what happened in the OOMRenderers test that Hans recently fixed: http://crbug.com/692564

Perhaps we want to disable stack traces in all render-crashing tests by sinking the previous fix (https://codereview.chromium.org/2879793003/diff/20001/chrome/browser/metrics/metrics_service_browsertest.cc) down into OpenTabsAndNavigateToCrashyUrl?
,
Jun 1 2017
The pinned bots seem happy though: https://build.chromium.org/p/chromium.fyi/console?category=win%20clang Could something have changed in clang, are they configured differently, or could it somehow be related to the ToT bots doing full builds each time?
,
Jun 1 2017
The pinned bots appear to use symbol_level=1, and the tot bots use the default, which on Windows is 2: https://build.chromium.org/p/chromium.fyi/builders/CrWinClang%28dbg%29/builds/15642 https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin%28dbg%29/builds/8189 That seems like more evidence that this is clang-pdb related. This might be something we really need to fix before the switch. We can disable stack traces in these tests on bots, but developers need fast symbolized stack traces for local development.
,
Jun 1 2017
I've reproduced this locally, and this dialog box came up. I'll try attaching in a debugger and see what I can get.
,
Jun 1 2017
Uh, is this intentional behavior? This looks like it would happen even without clang...

 # ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
00 0b2f94a0 7630040c ntdll!NtRaiseHardError+0xc
01 0b2f9608 7630027c USER32!MessageBoxW+0x16c
02 0b2f968c 762fffeb USER32!MessageBoxTimeoutW+0x6c
03 0b2f96ac 763002b8 USER32!MessageBoxExW+0x1b
*** WARNING: Unable to verify checksum for C:\src\chromium\src\out\clang\base.dll
04 0b2f96c8 1015131e USER32!MessageBoxW+0x18
05 0b2f9728 1014ffdf base!logging::DisplayDebugMessageInDialog+0xae [C:\src\chromium\src\base\logging.cc @ 502]
*** WARNING: Unable to verify checksum for C:\src\chromium\src\out\clang\content.dll
06 0b2f9d80 15dff449 base!logging::LogMessage::~LogMessage+0x62f [C:\src\chromium\src\base\logging.cc @ 779]
07 0b2fa588 15e4a534 content!content::MaybeHandleDebugURL+0xed9 [C:\src\chromium\src\content\renderer\render_frame_impl.cc @ 801]
,
Jun 1 2017
IIRC we pop up a message box with the stack on win (but it shouldn't be empty).
,
Jun 2 2017
I built browser_tests with MSVC and ran this test locally, and it has the same behavior: it pops a message box and times out if you don't click it.

However, we *do* run this test on the normal waterfall, and it passes: https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20(dbg)(1)

The logs of a recent run indicate that the test passed on the first try:
...
[80/305] MetricsServiceBrowserTest.CheckCrashRenderers (13996 ms)

Both our ToT bot and the main waterfall bots are Win7, but my workstation is Win10. That might still be related. Does the main waterfall bot do some kind of dialog box suppression that we aren't?
,
Jun 2 2017
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.win%2FWin7_Tests__dbg__1_%2F60515%2F%2B%2Frecipes%2Fsteps%2Fbrowser_tests%2F0%2Fstdout

Command: .\browser_tests.exe --brave-new-test-launcher --test-launcher-bot-mode --test-launcher-summary-output=e:\b\swarm_slave\w\iofjg2ft\output.json

Maybe --test-launcher-bot-mode?
,
Jun 2 2017
I tried all of those flags, and still got the error box. =/
,
Jun 2 2017
CHROME_HEADLESS=1
,
Jun 2 2017
,
Jun 2 2017
(which suppresses the dialog via https://cs.chromium.org/chromium/src/chrome/common/logging_chrome.cc?rcl=819697cf41f496d914415cc88a39e39afe4dfe29&l=307; CHROME_HEADLESS is set on bots, I believe)

I don't know why your bots are timing out, though. A local debug clang build with symbol_level=2 takes ~8s to run that test for me.
,
Jun 3 2017
Do these tests require CHROME_HEADLESS to be set? Should they early-out if it isn't? Or at least print a note?
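
For illustration, an early-out could look roughly like the sketch below. This is not Chromium code: EnvVarIsSet and CrashTestSkipReason are hypothetical helpers, and it assumes that the mere presence of CHROME_HEADLESS in the environment is what matters, which I haven't checked against logging_chrome.cc.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper, not a Chromium API: true if the environment variable
// `name` is present in the process environment.
bool EnvVarIsSet(const char* name) {
  assert(name != nullptr);
  return std::getenv(name) != nullptr;
}

// Hypothetical guard for a renderer-crashing test. Returns an empty string
// when the test may run, or a note explaining why it should be skipped.
// Assumption: without CHROME_HEADLESS the crash may pop a blocking dialog.
std::string CrashTestSkipReason(bool headless) {
  if (headless)
    return std::string();
  return "CHROME_HEADLESS is not set: a crashing renderer may show a "
         "blocking dialog, so this test is skipped.";
}
```

A test body could call CrashTestSkipReason(EnvVarIsSet("CHROME_HEADLESS")) first, print the note if it is non-empty, and return before navigating to the crash URL. That way a local run fails fast with an explanation instead of waiting on a dialog until the timeout.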
,
Jun 3 2017
I guess nobody runs all browser_tests locally anymore. =/

That doesn't explain the clang bot timeout, though: https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin%28dbg%29%20tester/builds/4276

I checked various logs, and I think this test is just on the edge of timing out normally:
- MSVC dbg bots: test passes in ~14s
- Clang pinned release bot: test passes in ~3s
- Clang pinned dbg bot (symbol_level=1): test passes in ~9s
- Clang tot dbg bot (symbol_level=2): test times out after 45s

So, while clang PDBs aren't broken, we might be doing something inefficient. If I had to guess, I'd say we emit more inlined call site line tables. That's where I've seen us blow out format limitations (overflowing 16 bit table sizes) in the past.
,
Jun 3 2017
scottmg: Are you using is_win_fastlink=true and symbol_level=2? The bots don't seem to use /debug:fastlink.
,
Jun 5 2017
Yes, I was using both of those.

is_debug = true
is_component_build = true
enable_nacl = false
is_chrome_branded = true
symbol_level = 2
target_cpu = "x86"
is_win_fastlink = true
is_clang = true
win_console_app = true
win_linker_timing = true
use_goma = true
,
Sep 27 2017
Adding 'TE-NeedsTriageHelp' for moving this out of TE Unconfirmed triaging bucket.
,
Nov 10 2017
,
Nov 12
Issue has not been modified or commented on in the last 365 days, please re-open or file a new bug if this is still an issue. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Comment 1 by r...@chromium.org
, Jun 1 2017