
Issue 728690

Starred by 3 users

Issue metadata

Status: Archived
Owner: ----
Closed: Nov 12
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 3
Type: Bug




Tests that crash renderers occasionally time out on win clang dbg bots

Project Member Reported by r...@chromium.org, Jun 1 2017

Issue description

This is currently failing reliably on the 32-bit clang tot win dbg bot:
https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin%28dbg%29%20tester/builds/4276

It failed recently on the 64-bit bot, also with a timeout, although it has passed more recently as well:
https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin64%28dbg%29%20tester/builds/3669

My guess is that something is wrong with the PDBs produced by clang + link.exe, and that this causes dbghelp to hang while symbolizing the crash stack trace. At least, that's more or less what happened in the OOMRenderers test that Hans recently fixed: http://crbug.com/692564 

Perhaps we want to disable stack traces in all render-crashing tests by sinking the previous fix (https://codereview.chromium.org/2879793003/diff/20001/chrome/browser/metrics/metrics_service_browsertest.cc) down into OpenTabsAndNavigateToCrashyUrl?
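The proposal amounts to moving a per-test workaround into the shared helper so that every caller inherits it. The following is a minimal standalone sketch of that refactoring pattern only. `TestEnv`, `SetStackTracesEnabled` and `CrashRendererViaHelper` are hypothetical stand-ins (not Chromium APIs, and not the contents of the linked review), and the crash itself is merely simulated.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for process-wide test state; not a Chromium API.
struct TestEnv {
  bool stack_traces_enabled = true;
  std::vector<std::string> crashed_urls;
};

// Hypothetical switch a test would flip before crashing a renderer so that
// the crash handler skips (potentially slow) stack-trace symbolization.
void SetStackTracesEnabled(TestEnv& env, bool enabled) {
  env.stack_traces_enabled = enabled;
}

// Hypothetical shared helper, standing in for OpenTabsAndNavigateToCrashyUrl.
// Because the workaround lives here, every test that crashes a renderer
// through this helper gets it, instead of each test repeating it.
void CrashRendererViaHelper(TestEnv& env, const std::string& crashy_url) {
  assert(!crashy_url.empty());
  SetStackTracesEnabled(env, false);
  env.crashed_urls.push_back(crashy_url);  // Simulates the crashing navigation.
}
```

Any test that calls the helper, e.g. `CrashRendererViaHelper(env, "chrome://crash")`, ends up with stack traces disabled without needing its own copy of the workaround.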
 

Comment 1 by r...@chromium.org, Jun 1 2017

These are some crashing tests that I see time out:
PrerenderBrowserTest.PrerenderRendererCrash (https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin%28dbg%29%20tester/builds/4276)
ECKEncryptedMediaTest.CDMCrashDuringDecode (https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin%28dbg%29%20tester/builds/4269)
MetricsServiceBrowserTest.CheckCrashRenderers (https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin%28dbg%29%20tester/builds/4276)

Comment 2 by h...@chromium.org, Jun 1 2017

The pinned bots seem happy though: https://build.chromium.org/p/chromium.fyi/console?category=win%20clang

Could something have changed in clang? Are they configured differently? Or could it somehow be related to the ToT bots doing full builds each time?

Comment 3 by r...@chromium.org, Jun 1 2017

The pinned bots appear to use symbol_level=1, and the tot bots use the default, which on Windows is 2:
https://build.chromium.org/p/chromium.fyi/builders/CrWinClang%28dbg%29/builds/15642
https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin%28dbg%29/builds/8189

That seems like more evidence that this is clang-pdb related. This might be something we really need to fix before the switch. We can disable stack traces in these tests on bots, but developers need fast symbolized stack traces for local development.

Comment 4 by r...@chromium.org, Jun 1 2017

I've reproduced this locally, and this dialog box came up. I'll try attaching in a debugger and see what I can get.

Attachment: error-after-crash.png (492 KB)

Comment 5 by r...@chromium.org, Jun 1 2017

Uh, is this intentional behavior? This looks like it would happen even without clang...

 # ChildEBP RetAddr  
WARNING: Stack unwind information not available. Following frames may be wrong.
00 0b2f94a0 7630040c ntdll!NtRaiseHardError+0xc
01 0b2f9608 7630027c USER32!MessageBoxW+0x16c
02 0b2f968c 762fffeb USER32!MessageBoxTimeoutW+0x6c
03 0b2f96ac 763002b8 USER32!MessageBoxExW+0x1b
*** WARNING: Unable to verify checksum for C:\src\chromium\src\out\clang\base.dll
04 0b2f96c8 1015131e USER32!MessageBoxW+0x18
05 0b2f9728 1014ffdf base!logging::DisplayDebugMessageInDialog+0xae [C:\src\chromium\src\base\logging.cc @ 502]
*** WARNING: Unable to verify checksum for C:\src\chromium\src\out\clang\content.dll
06 0b2f9d80 15dff449 base!logging::LogMessage::~LogMessage+0x62f [C:\src\chromium\src\base\logging.cc @ 779]
07 0b2fa588 15e4a534 content!content::MaybeHandleDebugURL+0xed9 [C:\src\chromium\src\content\renderer\render_frame_impl.cc @ 801]

IIRC, we pop up a message box with the stack trace on Windows (but it shouldn't be empty).

Comment 7 by r...@chromium.org, Jun 2 2017

I built browser_tests with MSVC and ran this test locally, and it has the same behavior: it pops a message box and times out if you don't click it.

However, we *do* run this test on the normal waterfall, and it passes:
https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20(dbg)(1)

The logs of a recent run indicate that the test passed on the first try:
...
[80/305] MetricsServiceBrowserTest.CheckCrashRenderers (13996 ms)

Both our ToT bot and the main waterfall bots are Win7, but my workstation is Win10. That difference might still be related.

Does the main waterfall bot do some kind of dialog box suppression that we aren't?
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.win%2FWin7_Tests__dbg__1_%2F60515%2F%2B%2Frecipes%2Fsteps%2Fbrowser_tests%2F0%2Fstdout

Command: .\browser_tests.exe --brave-new-test-launcher --test-launcher-bot-mode --test-launcher-summary-output=e:\b\swarm_slave\w\iofjg2ft\output.json

Maybe --test-launcher-bot-mode?

Comment 9 by r...@chromium.org, Jun 2 2017

I tried all of those flags, and still got the error box. =/

CHROME_HEADLESS=1
Cc: scottmg@chromium.org
(which suppresses the dialog via https://cs.chromium.org/chromium/src/chrome/common/logging_chrome.cc?rcl=819697cf41f496d914415cc88a39e39afe4dfe29&l=307; CHROME_HEADLESS is set on bots, I believe)

I don't know why your bots are timing out, though. A local debug clang build with symbol_level=2 takes ~8s to run that test for me.

Do these tests require CHROME_HEADLESS to be set? Should they early-out if it isn't, or at least print a note?
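The environment-variable check and the proposed early-out can be sketched roughly as follows. This is a simplified standalone illustration, not the code in logging_chrome.cc. `ShouldSuppressErrorDialogs` and `ShouldSkipCrashTest` are hypothetical names, and treating any non-empty value of CHROME_HEADLESS as "set" is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Simplified sketch: returns true if CHROME_HEADLESS is set to a non-empty
// value. (Assumption: any non-empty value counts as "set".)
bool ShouldSuppressErrorDialogs() {
  const char* value = std::getenv("CHROME_HEADLESS");
  return value != nullptr && value[0] != '\0';
}

// Hypothetical guard for tests that intentionally crash a renderer. Without
// dialog suppression, a modal error box would block until the test times out,
// so print a note and tell the caller to skip the test.
bool ShouldSkipCrashTest() {
  if (ShouldSuppressErrorDialogs())
    return false;
  std::fprintf(stderr,
               "Note: CHROME_HEADLESS is not set; this test may block on an "
               "error dialog. Set CHROME_HEADLESS=1 to run it.\n");
  return true;
}
```

A crash test would call `ShouldSkipCrashTest()` first and return early when it yields true, so a local run prints a note instead of hanging on the dialog.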

Comment 14 by r...@chromium.org, Jun 3 2017

I guess nobody runs all browser_tests locally anymore. =/

That doesn't explain the clang bot timeout, though:
https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin%28dbg%29%20tester/builds/4276

I checked various logs, and I think this test is just on the edge of timing out normally:
- MSVC dbg bots: test passes in ~14s
- Clang pinned release bot: test passes in ~3s
- Clang pinned dbg bot (symbol_level=1): test passes in ~9s
- Clang tot dbg bot (symbol_level=2): test times out after 45s
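To put these numbers on a common scale, the sketch below computes each configuration's slowdown relative to the pinned release bot (~3s) and the fraction of the 45s timeout it uses. The figures are the approximate ones quoted in this comment; the ToT entry is only a lower bound because the test was killed at the timeout. For example, the ToT debug run is at least 45 / 9 = 5 times slower than the pinned debug run.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One data point quoted in Comment 14.
struct Run {
  std::string config;
  double seconds;    // Approximate wall time, in seconds.
  bool lower_bound;  // True if the run was killed at the timeout.
};

constexpr double kTimeoutSeconds = 45.0;
constexpr double kReleaseBaselineSeconds = 3.0;  // Clang pinned release bot.

// How many times slower than the pinned release bot a run is.
double Slowdown(const Run& run) { return run.seconds / kReleaseBaselineSeconds; }

// Fraction of the 45s timeout that a run consumes.
double TimeoutFraction(const Run& run) { return run.seconds / kTimeoutSeconds; }

std::vector<Run> QuotedRuns() {
  return {
      {"MSVC dbg", 14.0, false},
      {"Clang pinned release", 3.0, false},
      {"Clang pinned dbg (symbol_level=1)", 9.0, false},
      {"Clang ToT dbg (symbol_level=2)", 45.0, true},  // Timed out.
  };
}

// Prints one line per run; ">=" marks values that are only lower bounds.
void PrintRuns() {
  for (const Run& run : QuotedRuns()) {
    const char* prefix = run.lower_bound ? ">=" : "";
    std::printf("%-34s %s%.1fx release, %s%.0f%% of timeout\n",
                run.config.c_str(), prefix, Slowdown(run), prefix,
                100.0 * TimeoutFraction(run));
  }
}
```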

So, while clang PDBs aren't broken, we might be doing something inefficient. If I had to guess, I'd say we emit more inlined call-site line tables. That's where I've seen us blow out format limitations (overflowing 16-bit table sizes) in the past.
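As an illustration of the kind of format limitation meant here: CodeView symbol records in a PDB store their length in a 16-bit field, so a record whose size grows past 65,535 bytes cannot be represented, and naively narrowing the size silently wraps. The sketch below is a standalone illustration, not code from clang or LLVM; `NarrowLengthUnchecked` and `CheckedRecordLength` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Naive narrowing: sizes above 0xFFFF wrap modulo 65536.
std::uint16_t NarrowLengthUnchecked(std::uint32_t size) {
  return static_cast<std::uint16_t>(size);
}

// Checked narrowing: reports that the size does not fit instead of wrapping,
// so the caller can split the record or drop data explicitly.
std::optional<std::uint16_t> CheckedRecordLength(std::uint32_t size) {
  if (size > std::numeric_limits<std::uint16_t>::max())
    return std::nullopt;
  return static_cast<std::uint16_t>(size);
}
```

For example, `NarrowLengthUnchecked(70000)` yields 4464 (70000 - 65536), while `CheckedRecordLength(70000)` returns an empty optional.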

Comment 15 by r...@chromium.org, Jun 3 2017

scottmg: Are you using is_win_fastlink=true and symbol_level=2? The bots don't seem to use /debug:fastlink.

Yes, I was using both of those.

is_debug = true
is_component_build = true
enable_nacl = false
is_chrome_branded = true
symbol_level = 2
target_cpu = "x86"
is_win_fastlink = true
is_clang = true
win_console_app = true
win_linker_timing = true
use_goma = true


Comment 17 by ajha@chromium.org, Sep 27 2017

Labels: TE-NeedsTriageHelp
Adding 'TE-NeedsTriageHelp' to move this out of the TE Unconfirmed triage bucket.
Components: Build

Comment 19 by sheriffbot@chromium.org, Nov 12

Status: Archived (was: Unconfirmed)
Issue has not been modified or commented on in the last 365 days. Please re-open or file a new bug if this is still an issue.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
