browser_tests failing to exit cleanly on Windows, causing many failures on win_chromium_rel_ng
Issue description

Individual shards of browser_tests are intermittently but frequently failing to clean up their temporary directories on win_chromium_rel_ng, causing the step to fail. This is happening on the win_chromium_rel_ng tryserver with a high failure rate -- perhaps as high as 20% of the time -- causing large slowdowns of the CQ due to retries.

There is some significant flakiness of the CloudPolicyTest.InvalidatePolicy test which chromium-try-flakes auto-detected in Issue 722246, but it doesn't seem to have found this overall flakiness of the harness.

Here are just a few affected builds:
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/449315
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/449296
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/449284
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/449280

There are surely many more here:
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/?limit=200

I only discovered this because I was trying to motivate raising Issue 722246 to P0.

Log excerpt:

SUCCESS: all tests passed.
Failed to delete e:\b\swarm_slave\w\ir (3 files remaining). Maybe the test has a subprocess outliving it. Sleeping 2 seconds.
Failed to delete e:\b\swarm_slave\w\ir (3 files remaining). Maybe the test has a subprocess outliving it. Sleeping 4 seconds.
Failed to delete e:\b\swarm_slave\w\ir. The following files remain:
- \\?\e:\b\swarm_slave\w\ir\out\Release
- \\?\e:\b\swarm_slave\w\ir\out
- \\?\e:\b\swarm_slave\w\ir
Enumerating processes:
Failed to delete e:\b\swarm_slave\w\ir. The following files remain:
- \\?\e:\b\swarm_slave\w\ir\out\Release
- \\?\e:\b\swarm_slave\w\ir\out
- \\?\e:\b\swarm_slave\w\ir
3676 2017-05-19 01:56:42.769 E: Failure with [Error 32] The process cannot access the file because it is being used by another process: u'\\\\?\\e:\\b\\swarm_slave\\w\\ir\\out\\Release'
Failed to delete the run directory, thus failing the task. This may be due to a subprocess outliving the main task process, holding on to resources. Please fix the task so that it releases resources and cleans up subprocesses.

Does this directory have a special purpose for Swarming? I'm more used to seeing obvious temporary directory names and this one looks like a well-known one.
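For readers unfamiliar with the cleanup behavior visible in the log, here is a rough Python sketch of a delete-with-backoff loop that reports leftover paths. It is only an illustration of the pattern the log suggests (retry, sleep 2 then 4 seconds, then list what remains); the function name `delete_with_retries` and its parameters are made up for this example and are not Swarming's actual code.

```python
import os
import shutil
import tempfile
import time


def delete_with_retries(path, delays=(2, 4), sleep=time.sleep):
    """Try to remove `path`, sleeping between attempts.

    Returns an empty list on success, or the paths still on disk
    after the final attempt. Hypothetical helper, not Swarming's API.
    """
    # One attempt per delay, plus a final attempt with no sleep after it.
    for delay in list(delays) + [None]:
        shutil.rmtree(path, ignore_errors=True)
        if not os.path.exists(path):
            return []
        if delay is not None:
            sleep(delay)
    # Walk bottom-up so the deepest leftovers are listed first.
    remaining = []
    for root, dirs, files in os.walk(path, topdown=False):
        remaining.extend(os.path.join(root, name) for name in files + dirs)
    remaining.append(path)
    return remaining
```

On Windows, a directory that is the working directory of a still-running child process (or that contains a file the child has open) cannot be removed, which is one way a loop like this ends up reporting only empty directories as remaining.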
,
May 19 2017
Thanks maruel@ for the feedback. It seems to be new behavior that browser_tests are this flaky on the tryserver. How can we go about tracking down what's going on?
,
May 19 2017
Run one of the fastest shards you can find that exhibited the problem locally, keeping procexp open, and figure out which child process outlives the parent.
---
I meant to eventually implement fancy tracking inside the swarming bot, but that's highly OS specific. This is still blocked on Windows 7's inability to use nested job objects. https://msdn.microsoft.com/en-us/library/windows/desktop/hh448388.aspx So until the bulk of the load is transferred to Windows 10, this won't help.
,
May 26 2017
Issue 724588 has been merged into this issue.
,
May 26 2017
What's the plan here?
,
May 31 2017
Having a difficult time triaging the current browser_tests failures on win_chromium_rel_ng. I see this failure:
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/457559

[0530/180331.011:FATAL:registry.cc(242)] Check failed: key_.
Backtrace:
base::debug::StackTrace::StackTrace [0x026407B7+55]
base::debug::StackTrace::StackTrace [0x026083CA+10]
base::win::RegKey::DeleteKey [0x025ECAE4+100]
ChromeTestLauncherDelegate::PreSharding [0x04BD0287+167]
content::LaunchTests [0x027275A1+456]
LaunchChromeTests [0x04BD01C4+62]
main [0x04BCFFD9+63]
__scrt_common_main_seh [0x04B7FC2B+249] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:253)
BaseThreadInitThunk [0x7580336A+18]
RtlInitializeExceptionChain [0x777F9902+99]
RtlInitializeExceptionChain [0x777F98D5+54]

which I think may be related to earlier problems with registry keys sticking around between runs. To be honest, the rest of the browser_tests failures look legitimate at this point. If the current sheriff can confirm this then perhaps this should be closed as WontFix.
,
May 31 2017
Oops, this is a bug in that cleanup code. It should try to DeleteKey() if result == FILE_NOT_FOUND. Fixing immediately.
,
May 31 2017
s/should/shouldn't/
,
May 31 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/978ad0704f0ab11c422088c6b468aabdf10ff4b5

commit 978ad0704f0ab11c422088c6b468aabdf10ff4b5
Author: gab <gab@chromium.org>
Date: Wed May 31 18:36:42 2017

Do not delete sub key if main key was not found in ChromeTestLauncherDelegate::PreSharding()

Otherwise DeleteKey hits DCHECK(key_) when distrubution_key.Open() results in ERROR_FILE_NOT_FOUND.

BUG= 724350
Review-Url: https://codereview.chromium.org/2919523002
Cr-Commit-Position: refs/heads/master@{#475965}

[modify] https://crrev.com/978ad0704f0ab11c422088c6b468aabdf10ff4b5/chrome/test/base/chrome_test_launcher.cc
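To make the guard described in the commit message concrete, here is a small Python model of it. This is a sketch under stated assumptions, not the Chromium change itself: `FakeRegKey` is a hypothetical stand-in for `base::win::RegKey`, `clear_sub_key` is an invented name, and the key paths in the usage note are made up. The only detail taken from Windows itself is that `ERROR_FILE_NOT_FOUND` has the value 2.

```python
ERROR_SUCCESS = 0
ERROR_FILE_NOT_FOUND = 2  # Win32 error code for a missing key or file.


class FakeRegKey:
    """Hypothetical in-memory stand-in for a registry key wrapper."""

    def __init__(self, existing_keys):
        self._existing = existing_keys  # Set of key paths that "exist".
        self._opened = None

    def open(self, path):
        if path not in self._existing:
            return ERROR_FILE_NOT_FOUND
        self._opened = path
        return ERROR_SUCCESS

    def delete_key(self, name):
        # Models the DCHECK(key_): deleting through an unopened key is a bug.
        assert self._opened is not None, "Check failed: key_"
        self._existing.discard(self._opened + "\\" + name)
        return ERROR_SUCCESS


def clear_sub_key(key, main_path, sub_name):
    """Delete sub_name under main_path, skipping the delete if main_path is absent."""
    result = key.open(main_path)
    if result == ERROR_FILE_NOT_FOUND:
        # Main key does not exist, so there is nothing to clean up.
        return result
    return key.delete_key(sub_name)
```

With the guard in place, `clear_sub_key(FakeRegKey(set()), "Software\\Missing", "Sub")` returns `ERROR_FILE_NOT_FOUND` instead of tripping the assertion in `delete_key`.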
,
May 31 2017
Not sure it was the only failure, but will call this Fixed for now.
,
May 31 2017
,
May 31 2017
Thanks for the fix gab@. Relating to earlier bug.
Comment 1 by mar...@chromium.org
, May 19 2017