shutdown DCHECKs in interactive_ui_tests' renderers are swallowed
Issue description

Suddenly as of r520525 or thereabouts (https://ci.chromium.org/buildbot/chromium.win/Win7%20Tests%20(dbg)(1)/64917), various interactive_ui_tests have started timing out as a result of something throwing up an empty MessageBox with the descriptive title "Error"; see attached screenshot. A typical failure can be found at https://chromium-swarm.appspot.com/task?id=3a261778b09db410&refresh=10&show_raw=1, though the log doesn't provide any useful information. "Error" is the default caption for a call to ::MessageBox, so I think what we have is a call to MessageBox(nullptr, nullptr, nullptr, 0); (0 == MB_OK).
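For illustration, a minimal Win32 snippet of that shape (an assumption about the call, not a confirmed call site) produces exactly the observed dialog: an empty body with the default "Error" caption.

#include <windows.h>

// Sketch only: a null caption makes ::MessageBox fall back to the default
// "Error" title, a null text string leaves the body empty, and MB_OK is 0,
// so a zeroed flags argument is equivalent to MB_OK.
int main() {
  ::MessageBoxW(nullptr, /*lpText=*/nullptr, /*lpCaption=*/nullptr, MB_OK);
  return 0;
}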
Comment 1 by grt@chromium.org, Dec 1 2017 (Status: Started, was: Available)
+sky@ (general guru) and wfh@ (sandbox guru)

I was mistaken. It's a DCHECK being hit in a sandboxed renderer (specifically, WebServiceWorkerRegistrationImpl::OnConnectionError; see issue 791041). The details of the failure are somehow eaten by the sandbox. I was only able to see what was happening if I repro'd locally with:

out\Deb\interactive_ui_tests.exe --gtest_filter=PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection --single_process --no-sandbox

(over, and over, and over again)

It took a long, long time for me to put the pieces together to get to this point. Did I simply miss a PSA? Since I just burned a full day on this one failure, I wonder how we can make this sort of thing on the bots more actionable in the future. Yes, I learned a lot in the process (CHROME_HEADLESS regressed, --no-sandbox is needed to see failures in renderers, ETW can be magical at times), but I don't plan to be the eternal guardian of interactive_ui_tests. Thoughts?

(dropping to P2 since https://chromium-review.googlesource.com/c/chromium/src/+/803516 should fix the tests)
Dec 1 2017
The DCHECK message box behavior on bots bit someone else recently and wasted at least a day or two of their time as well. This affects other multiprocess content test suites, not just interactive_ui_tests. It would be nice if there were a way for us to plumb sandboxed process logs to the console and disable the message box behavior, at least behind a flag so the bots could use it. Another option might be to capture a screenshot of the bot on test failure and include it as part of the logs, but a) that's probably harder and b) it's probably a huge waste of space most of the time.
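A rough sketch of what "behind a flag" could look like; neither this helper nor the --no-error-dialogs switch exists in Chromium as written, and the CHROME_HEADLESS check simply mirrors the bot behavior discussed later in this thread.

#include <cstdlib>
#include <string>

// Hypothetical sketch: decide whether a fatal DCHECK/CHECK should pop a
// dialog. Purely illustrative of the "suppress the message box on bots"
// idea; the switch name is made up.
bool ShouldShowFatalErrorDialog(const std::string& command_line) {
  if (std::getenv("CHROME_HEADLESS") != nullptr)
    return false;  // Bots set CHROME_HEADLESS; log the failure instead.
  if (command_line.find("--no-error-dialogs") != std::string::npos)
    return false;  // Explicit opt-out for local automation.
  return true;     // Interactive developer run: a dialog is acceptable.
}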
Dec 1 2017
grt did add logic to take a screenshot in certain cases (an always-on-top window present before/after a test), and I would expect that logic to trigger here. jam knows more about crashes in renderers. I do agree that for tests we would really like the renderers' logs to be dumped, or at least to see some mention that a renderer is crashing unexpectedly, and ideally a stack.
Dec 1 2017
Once issue 790985 is fixed, we should no longer have message boxes appearing for DCHECK failures on the bots. With that out of the way, I'm not sure what would happen to the details in this case. I'll try it out.
Dec 1 2017
Re crashes in renderers: we do print the stack trace in browser tests when that happens. We have a test for this: ContentBrowserTest.RendererCrashCallStack :)
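For context, a sketch of the kind of test being referred to; this is an approximation, not the actual ContentBrowserTest.RendererCrashCallStack source, and the crash URL shown is the conventional chrome://crash.

// Approximate sketch of a browser test whose renderer is crashed on purpose,
// so the harness's "print the renderer's stack" behavior can be exercised.
// Fixture, URL, and body are illustrative rather than a verbatim excerpt.
IN_PROC_BROWSER_TEST_F(ContentBrowserTest, MANUAL_RendererCrash) {
  // chrome://crash deliberately kills the renderer; the test launcher is then
  // expected to surface the renderer's call stack in the test log.
  NavigateToURL(shell(), GURL("chrome://crash"));
}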
Dec 4 2017
For this particular DCHECK (in WebServiceWorkerRegistrationImpl::OnConnectionError; see issue 791041), I find that the failure does appear in the log with CHROME_HEADLESS=1, but it doesn't fail the test.

Methodology (in case someone wants to repeat my findings):
- sync to position r520889 (or anywhere between r520479 and r521008)
- build debug interactive_ui_tests
- set CHROME_HEADLESS=1
- out\Deb\interactive_ui_tests.exe --gtest_filter=PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection --gtest_repeat=10

The output that streams by sometimes has:

[17120:13220:1204/120105.054:FATAL:web_service_worker_registration_impl.cc(225)] Check failed: LifecycleState::kDetached == state_ (2 vs. 1)
Backtrace:
	base::debug::StackTrace::StackTrace [0x101FE856+102]
	base::debug::StackTrace::StackTrace [0x101FD3F3+35]
	logging::LogMessage::~LogMessage [0x10273F55+149]
	content::WebServiceWorkerRegistrationImpl::OnConnectionError [0x15F90B49+457]
	....

but the test never fails. When I add in "--brave-new-test-launcher --test-launcher-bot-mode" (as is used on the bots), this output is swallowed:

out\Deb\interactive_ui_tests.exe --cfi-diag=0 --snapshot-output-dir=c:\src\chromium\src\out\Deb --gtest_filter=PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection --gtest_repeat=10 --brave-new-test-launcher --test-launcher-bot-mode

IMPORTANT DEBUGGING NOTE: each test is run inside its own process. For debugging a test inside a debugger, use the --gtest_filter=<your_test_name> flag along with either --single_process (to run the test in one launcher/browser process) or --single-process (to do the above, and also run Chrome in single-process mode).
Using sharding settings from environment. This is shard 0/1
Using 1 parallel jobs.
[1/1] PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection (4553 ms)
SUCCESS: all tests passed.
[1/1] PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection (4496 ms)
SUCCESS: all tests passed.
[1/1] PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection (4653 ms)
SUCCESS: all tests passed.
[1/1] PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection (4511 ms)
SUCCESS: all tests passed.
[1/1] PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection (4549 ms)
SUCCESS: all tests passed.
[1/1] PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection (4547 ms)
SUCCESS: all tests passed.
[1/1] PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection (4440 ms)
SUCCESS: all tests passed.
[1/1] PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection (4633 ms)
SUCCESS: all tests passed.
[1/1] PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection (4688 ms)
SUCCESS: all tests passed.
[1/1] PlatformNotificationServiceBrowserTest.WebNotificationOptionsReflection (4544 ms)
SUCCESS: all tests passed.
Summary of all test iterations:
End of the summary.

How can we have late-process DCHECK failures in subprocesses lead to overall test failure?
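Judging from the FATAL log line above, the failing check presumably has roughly this shape (a reconstruction from the message, not the actual web_service_worker_registration_impl.cc source):

// Reconstructed from "Check failed: LifecycleState::kDetached == state_
// (2 vs. 1)"; an approximation of the real code, not a verbatim excerpt.
void WebServiceWorkerRegistrationImpl::OnConnectionError() {
  // The Mojo connection to the browser has gone away. By this point the
  // registration is expected to already be detached; the DCHECK fires when a
  // shutdown race leaves it in some other lifecycle state.
  DCHECK_EQ(LifecycleState::kDetached, state_);
  // (teardown follows)
}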
Dec 6 2017
I added a DCHECK(false) to the start of this test and found that I got call stacks and "1 test crashed" both with and without "--brave-new-test-launcher --test-launcher-bot-mode". However, --test-launcher-bot-mode apparently tells the test runner to retry a test if it fails, so a flaky test will end up being reported as passing. Could our controversial test retry logic be the only explanation needed for why the test never shows up as failing?
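For reference, the experiment described above amounts to something like the following; the fixture and test names are the ones from this thread, the body is illustrative, and the point is that the failure happens in the browser/test process itself.

// Illustration of the experiment above, not a proposed change: a DCHECK
// failure placed at the top of the test body, i.e. in the browser/test
// process, where the launcher does notice it and reports a crash.
IN_PROC_BROWSER_TEST_F(PlatformNotificationServiceBrowserTest,
                       WebNotificationOptionsReflection) {
  DCHECK(false);
  // ... original test body ...
}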
Dec 6 2017
In your case, the CHECK is in the main test process -- the one running the test -- and will naturally be handled properly. In the WebServiceWorkerRegistrationImpl case, however, it's in a sandboxed renderer. To replicate, put a DCHECK(false) somewhere in a renderer codepath. Maybe during renderer startup the error would be handled (since the test wouldn't be able to do anything at all). For the failure I was looking at, I think the DCHECK was hit during shutdown in a racy way such that the test passed no matter what.
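By contrast, replicating the swallowed case means putting the failing check into code that runs inside the sandboxed renderer, for example (the class and method names here are placeholders):

// Placeholder renderer-side code: this DCHECK fires inside the sandboxed
// renderer process rather than the test process, so -- per this bug -- the
// test can still pass even though the check failed, especially when the
// check is hit late in shutdown.
void SomeRendererSideObserver::OnConnectionError() {
  DCHECK(false);
}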
Nov 19
*** UI Mass Triage *** Just checking: is there any pending work here? If not, please feel free to close this issue.
Nov 22
I don't know that anything has changed to resolve the issue described in comments 7 and 9. I don't have cycles to work on this anymore. If the status quo is good enough, this can be closed out. I suspect that we're still missing certain failures, though.
Nov 22
(updating the summary a bit)