Issue 724350

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug-Regression

Blocked on:
issue 721245

Blocking:
issue 722246




browser_tests failing to exit cleanly on Windows, causing many failures on win_chromium_rel_ng

Project Member Reported by kbr@chromium.org, May 19 2017

Issue description

Individual shards of browser_tests are intermittently but frequently failing to clean up their temporary directories on win_chromium_rel_ng, causing the step to fail. This is happening on the win_chromium_rel_ng tryserver with a high failure rate -- perhaps as high as 20% of the time -- causing large slowdowns of the CQ due to retries.

There is some significant flakiness of the CloudPolicyTest.InvalidatePolicy test, which chromium-try-flakes auto-detected in Issue 722246, but it doesn't seem to have found this overall flakiness of the harness.

Here are just a few affected builds:

https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/449315
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/449296
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/449284
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/449280

There are surely many more here:

https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/?limit=200

I only discovered this because I was trying to motivate raising Issue 722246 to P0.

Log excerpt:

SUCCESS: all tests passed.
Failed to delete e:\b\swarm_slave\w\ir (3 files remaining).
  Maybe the test has a subprocess outliving it.
  Sleeping 2 seconds.
Failed to delete e:\b\swarm_slave\w\ir (3 files remaining).
  Maybe the test has a subprocess outliving it.
  Sleeping 4 seconds.
Failed to delete e:\b\swarm_slave\w\ir. The following files remain:
- \\?\e:\b\swarm_slave\w\ir\out\Release
- \\?\e:\b\swarm_slave\w\ir\out
- \\?\e:\b\swarm_slave\w\ir
Enumerating processes:
Failed to delete e:\b\swarm_slave\w\ir. The following files remain:
- \\?\e:\b\swarm_slave\w\ir\out\Release
- \\?\e:\b\swarm_slave\w\ir\out
- \\?\e:\b\swarm_slave\w\ir
3676 2017-05-19 01:56:42.769 E: Failure with [Error 32] The process cannot access the file because it is being used by another process: u'\\\\?\\e:\\b\\swarm_slave\\w\\ir\\out\\Release'
Failed to delete the run directory, thus failing the task.
This may be due to a subprocess outliving the main task
process, holding on to resources. Please fix the task so
that it releases resources and cleans up subprocesses.
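
The retry pattern visible in the log excerpt above (two failed attempts followed by 2 s and 4 s sleeps, then a hard failure on the third attempt) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Swarming's actual cleanup code: `DeleteWithBackoff`, its parameters, and the injected callables are hypothetical names invented for this example.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper, NOT Swarming's real implementation: retries a
// deletion with doubling sleeps, mirroring the 2 s / 4 s pattern in the log.
// `try_delete` returns true once the directory is gone; `sleep_for` is
// injected so the sketch can be exercised without really sleeping.
bool DeleteWithBackoff(
    const std::function<bool()>& try_delete,
    const std::function<void(std::chrono::seconds)>& sleep_for,
    int max_attempts = 3,
    std::chrono::seconds first_delay = std::chrono::seconds(2)) {
  std::chrono::seconds delay = first_delay;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (try_delete())
      return true;
    // No sleep after the final attempt: the caller fails the task instead.
    if (attempt < max_attempts) {
      sleep_for(delay);
      delay *= 2;
    }
  }
  return false;
}
```

If a child process keeps a handle open on the directory for longer than the total backoff, every attempt fails and the task is marked as failed even though all tests passed, which is the situation reported here.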


Does this directory have a special purpose for Swarming? I'm more used to seeing obvious temporary directory names and this one looks like a well-known one.

 

Comment 1 by mar...@chromium.org, May 19 2017

Components: -Infra>Platform>Swarming
"ir" used to be random, but because of goma we had to make it deterministic. That doesn't change the underlying issue: a child process outlives the parent process.

This error only surfaces on Windows, but technically the underlying problem (child process outliving parent) is not Windows-specific.

Comment 2 by kbr@chromium.org, May 19 2017

Thanks maruel@ for the feedback.

It seems to be new behavior that browser_tests are this flaky on the tryserver. How can we go about tracking down what's going on?

Comment 3 by mar...@chromium.org, May 19 2017

Locally run one of the fastest shards you can find that exhibited the problem, keeping procexp open, and figure out which child process outlives the parent.

---

I meant to eventually implement fancy tracking inside the swarming bot, but that's highly OS-specific. This is still blocked on Windows 7's inability to use nested job objects. https://msdn.microsoft.com/en-us/library/windows/desktop/hh448388.aspx
So until the bulk of the load is transferred to Windows 10, this won't help.
Issue 724588 has been merged into this issue.
What's the plan here?

Comment 6 by kbr@chromium.org, May 31 2017

Having a difficult time triaging the current browser_tests failures on win_chromium_rel_ng.

I see this failure:

https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_chromium_rel_ng/457559

[0530/180331.011:FATAL:registry.cc(242)] Check failed: key_. 
Backtrace:
	base::debug::StackTrace::StackTrace [0x026407B7+55]
	base::debug::StackTrace::StackTrace [0x026083CA+10]
	base::win::RegKey::DeleteKey [0x025ECAE4+100]
	ChromeTestLauncherDelegate::PreSharding [0x04BD0287+167]
	content::LaunchTests [0x027275A1+456]
	LaunchChromeTests [0x04BD01C4+62]
	main [0x04BCFFD9+63]
	__scrt_common_main_seh [0x04B7FC2B+249] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:253)
	BaseThreadInitThunk [0x7580336A+18]
	RtlInitializeExceptionChain [0x777F9902+99]
	RtlInitializeExceptionChain [0x777F98D5+54]

which I think may be related to earlier problems with registry keys sticking around between runs.

To be honest, the rest of the browser_tests failures look legitimate at this point. If the current sheriff can confirm this then perhaps this should be closed as WontFix.

Comment 7 by gab@chromium.org, May 31 2017

Owner: gab@chromium.org
Status: Started (was: Untriaged)
Oops, this is a bug in that cleanup code. It should try to DeleteKey() if result == FILE_NOT_FOUND. Fixing immediately.

Comment 8 by gab@chromium.org, May 31 2017

s/should/shouldn't/

Comment 9 by bugdroid1@chromium.org, May 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/978ad0704f0ab11c422088c6b468aabdf10ff4b5

commit 978ad0704f0ab11c422088c6b468aabdf10ff4b5
Author: gab <gab@chromium.org>
Date: Wed May 31 18:36:42 2017

Do not delete sub key if main key was not found in ChromeTestLauncherDelegate::PreSharding()

Otherwise DeleteKey hits DCHECK(key_) when distrubution_key.Open() results
in ERROR_FILE_NOT_FOUND.

BUG= 724350 

Review-Url: https://codereview.chromium.org/2919523002
Cr-Commit-Position: refs/heads/master@{#475965}

[modify] https://crrev.com/978ad0704f0ab11c422088c6b468aabdf10ff4b5/chrome/test/base/chrome_test_launcher.cc
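
The idea behind the commit above (skip the sub key deletion when opening the main key fails) can be illustrated with a small, portable sketch. Everything here is an assumption made for illustration: `FakeRegKey` is a simplified stand-in and not `base::win::RegKey`, `CleanUpSubKey` is a hypothetical name rather than the code in `chrome_test_launcher.cc`, and the registry is modeled as a plain set of path strings.

```cpp
#include <cassert>
#include <set>
#include <string>

// Stand-ins for the Win32 status codes (values match winerror.h).
constexpr long kErrorSuccess = 0;       // ERROR_SUCCESS
constexpr long kErrorFileNotFound = 2;  // ERROR_FILE_NOT_FOUND

// Simplified, hypothetical model of a registry key wrapper. It is NOT
// base::win::RegKey; it only mimics the "must be open before DeleteKey" rule.
class FakeRegKey {
 public:
  explicit FakeRegKey(std::set<std::string>* registry) : registry_(registry) {}

  long Open(const std::string& path) {
    if (registry_->count(path) == 0)
      return kErrorFileNotFound;
    path_ = path;
    open_ = true;
    return kErrorSuccess;
  }

  long DeleteKey(const std::string& name) {
    assert(open_);  // Plays the role of the DCHECK(key_) in the backtrace.
    return registry_->erase(path_ + "\\" + name) ? kErrorSuccess
                                                 : kErrorFileNotFound;
  }

 private:
  std::set<std::string>* registry_;
  std::string path_;
  bool open_ = false;
};

// Only attempts the sub key deletion when the parent key was actually opened.
// Returns true if a sub key was deleted.
bool CleanUpSubKey(std::set<std::string>* registry,
                   const std::string& parent,
                   const std::string& sub_key) {
  FakeRegKey key(registry);
  if (key.Open(parent) != kErrorSuccess)
    return false;  // Parent missing: nothing to clean up, no check failure.
  return key.DeleteKey(sub_key) == kErrorSuccess;
}
```

In this model, calling `DeleteKey()` on a key that was never opened trips the assertion, which corresponds to the `Check failed: key_` crash in Comment 6. Guarding on the result of `Open()` avoids that path entirely.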

Comment 10 by gab@chromium.org, May 31 2017

Status: Fixed (was: Started)
Not sure it was the only failure, but will call this Fixed for now.

Comment 11 by kbr@chromium.org, May 31 2017

Blockedon: 721245

Comment 12 by kbr@chromium.org, May 31 2017

Thanks for the fix gab@. Relating to earlier bug.
