
Issue 834736

Starred by 2 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Dec 14
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug




Flaky check in ANGLE dEQP FYI testers: [FATAL:test_suite.cc(246)] Check failed: printer_->Initialize(output_path).

Project Member Reported by jmad...@chromium.org, Apr 19 2018

Issue description

This seems to be causing flaky timeouts on the GPU.FYI dEQP bots. Once there is a single instance of this crash, this triggers this existing bug 788031 which makes the test step time out after a few crashes, because the test executor switches to a mode where it runs a single test per batch.

Example timeout builds:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20dEQP%20Release%20%28NVIDIA%29/2829

Error text:

[396:2056:0419/004213.971:79375187:FATAL:test_suite.cc(246)] Check failed: printer_->Initialize(output_path). 
Backtrace:
	base::debug::StackTrace::StackTrace [0x01097740+32]
	base::debug::StackTrace::StackTrace [0x0107C05D+13]
	logging::LogMessage::~LogMessage [0x01079643+83]
	base::TestSuite::AddTestLauncherResultPrinter [0x010E689E+334]
	base::TestSuite::Initialize [0x010E6D41+721]
	base::TestSuite::Run [0x010E64B2+26]
	main [0x0105C6D1+277]
	base::internal::Invoker<base::internal::BindState<int (__cdecl*)(base::TestSuite *),base::internal::UnretainedWrapper<base::TestSuite> >,int __cdecl(void)>::Run [0x010B699C+12]
	base::OnceCallback<int __cdecl(void)>::Run [0x010E81CE+44]
	std::unique_ptr<logging::ScopedLogAssertHandler,std::default_delete<logging::ScopedLogAssertHandler> >::reset [0x010E741D+299]
	base::LaunchUnitTestsSerially [0x010E7882+157]
	main [0x0105C672+182]
	__scrt_common_main_seh [0x01135FDA+248] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:283)
	BaseThreadInitThunk [0x75028654+36]
	RtlGetAppContainerNamedObjectPath [0x77D84B17+311]
	RtlGetAppContainerNamedObjectPath [0x77D84AE7+263]

Ken, do you know the right labels for this kind of issue? I'm unsure whether it happens in other configurations at the moment.
 

Comment 1 by kbr@chromium.org, Apr 19 2018

Cc: dcheng@chromium.org
Components: Internals>Core
Hoping someone who watches Internals>Core can help route this. dcheng@ do you have any suggestions as well?

Owner: dcheng@chromium.org
Status: Available (was: Unconfirmed)
Cc: dpranke@chromium.org grt@chromium.org
Still happening.
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20dEQP%20Release%20%28NVIDIA%29/3683

Looks like what fails is creating the output file. Maybe we are running out of file descriptors or something?
Still happening
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20dEQP%20Release%20(NVIDIA)/4785

Seems like our bug handling process has a flaw if a P1 bug isn't being taken care of for 2 months.

Comment 6 by kbr@chromium.org, Jun 27 2018

Cc: wfh@chromium.org brucedaw...@chromium.org
Status: Assigned (was: Available)
+wfh, brucedawson

Could a couple of Windows experts please help us understand whether this is a Windows-specific bug in Chromium's GTest harness?

Comment 7 by grt@chromium.org, Jun 29 2018

OpenFile is failing. The question is "why"? We could change XmlUnitTestResultPrinter so that it uses an API from which we could get more fine-grained failure info. base::File, for example, has base::File::error_details() to get the Windows last error code, which may provide some insights.
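
To illustrate the kind of fine-grained failure info meant here, below is a minimal standard-library sketch, not the Chromium code. OpenForWriteWithDetails and DescribeOpenFailure are hypothetical names, and the sketch assumes the C runtime sets errno when fopen fails (POSIX requires this; the C standard itself does not). In Chromium the equivalent information would come from base::File rather than errno.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Result of an attempted open: the handle (nullptr on failure) plus the
// errno value captured right after the failing call.
struct OpenResult {
  std::FILE* file = nullptr;
  int error_code = 0;
};

// Hypothetical helper, not part of Chromium's base library.
// Assumption: the C runtime sets errno when fopen fails.
OpenResult OpenForWriteWithDetails(const std::string& path) {
  OpenResult result;
  errno = 0;
  result.file = std::fopen(path.c_str(), "w");
  if (!result.file) {
    // Capture errno before any other library call can overwrite it.
    result.error_code = errno;
  }
  return result;
}

// Builds a diagnostic line that names both the path and the reason.
std::string DescribeOpenFailure(const std::string& path, int error_code) {
  return "Failed to open " + path + ": " + std::strerror(error_code) +
         " (errno " + std::to_string(error_code) + ")";
}
```

A caller would log DescribeOpenFailure(path, result.error_code) when result.file is null, so the log shows the path and the reason instead of only a failed check.
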
Debugging this is a bit of a nuisance because the code that creates a temporary directory and passes the test-launcher-output is not invoked if you are debugging. This is the call stack where I found it being created in the normal case, although I'm not certain if this is what is happening on the bots:

>	base.dll!base::CreateTemporaryDirInDir
 	base.dll!base::CreateNewTempDirectory
 	base_unittests.exe!base::`anonymous namespace'::DefaultUnitTestPlatformDelegate::CreateResultsFile
 	base_unittests.exe!base::RunUnitTestsBatch
 	base_unittests.exe!base::UnitTestLauncherDelegate::RunTests

This code creates the temporary directory for the child processes and leaves it to them to create the output file in it. All steps of this process are properly CHECKed so it should be guaranteed that the directory is created.
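
For readers unfamiliar with the pattern, here is a rough C++17 sketch of "create a directory that is guaranteed to be new". It is not the Chromium implementation of CreateTemporaryDirInDir; CreateUniqueDirIn, the prefix, and the retry count are made up for illustration. It relies on std::filesystem::create_directory returning true only when it actually created the directory.

```cpp
#include <filesystem>
#include <random>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: creates a fresh, empty directory under `parent`.
// create_directory() returns true only if this call created the directory,
// so a true result means no other process made it first.
fs::path CreateUniqueDirIn(const fs::path& parent, const std::string& prefix) {
  std::random_device rd;
  std::mt19937_64 rng(rd());
  for (int attempt = 0; attempt < 100; ++attempt) {
    fs::path candidate = parent / (prefix + std::to_string(rng()));
    if (fs::create_directory(candidate)) {
      return candidate;  // Newly created by this call.
    }
    // The name already existed as a directory; try another random suffix.
  }
  throw std::runtime_error("could not create a unique directory");
}
```

With a directory created this way, a child process that is the only writer should find no pre-existing output file in it, which is why a failure to create the file is surprising.
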

Running out of file descriptors seems unlikely, but Windows does have some other odd behaviors in its file system so ??? I think we need more diagnostics. It's easy enough to print out the path when the CHECK fails - maybe that will help. If not then we may need to plumb through the error code or add some extra diagnostics to figure out why the failure is happening.

I created crrev.com/c/1121252 to add some diagnostics - any thoughts? Switching to base::File to open the file sounds like a good idea also.

Fixing bug 788031 might be the most productive thing since that would minimize the cost of this bug, and of other failures.

Comment 10 by kbr@chromium.org, Jun 30 2018

Seems useful to me. I have a feeling it'll wind up being a file which exists and is not writable because it's stale for some reason.

The CreateTemporaryDirInDir code is supposed to guarantee that it creates a new directory and I couldn't see any flaws in that. On the other hand there's this comment just before the change that I made:

  // Do not add the result printer if output path already exists. It's an
  // indicator there is a process printing to that file, and we're likely
  // its child. Do not clobber the results in that case.
  if (PathExists(output_path)) {
    LOG(WARNING) << "Test launcher output path " << output_path.AsUTF8Unsafe()
                 << " exists. Not adding test launcher result printer.";
    return;
  }

That makes me wonder if a race condition is possible, but I can't really tell because the code changes its behavior so much when being debugged.
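
To make the suspected race concrete: a PathExists check followed by a separate open leaves a window in which another process can create the file. The sketch below is only an illustration of closing that window, not a claim about the actual cause here or about what test_suite.cc should do. CreateExclusive is a hypothetical helper, and it assumes a C runtime that supports the C11 "x" mode flag for fopen.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: opens `path` for writing only if it does not exist.
// With the "x" flag the existence check and the creation happen in one
// step, so there is no gap between "check" and "open" for another process
// to slip into. Returns nullptr if the file already exists or the open
// fails for any other reason.
std::FILE* CreateExclusive(const std::string& path) {
  return std::fopen(path.c_str(), "wx");
}
```

If two processes race on the same path with this approach, exactly one gets a valid handle and the other sees a failure it can report, rather than both passing an existence check.
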

I'll see about landing that CL. No harm anyway.

Project Member

Comment 12 by bugdroid1@chromium.org, Jul 6

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/66cce109c5d3e0d593e32a0d9aa37b84114984d3

commit 66cce109c5d3e0d593e32a0d9aa37b84114984d3
Author: Bruce Dawson <brucedawson@chromium.org>
Date: Fri Jul 06 19:24:22 2018

Add extra output to diagnose file creation failure

printer->Initialize(output_path) sometimes fails and we don't know why
so step one is to print some extra information when it fails.

Bug:  834736 
Change-Id: I65bda362d38c0899542da9234e0b47ef57d1aa56
Reviewed-on: https://chromium-review.googlesource.com/1121252
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Commit-Queue: Bruce Dawson <brucedawson@chromium.org>
Cr-Commit-Position: refs/heads/master@{#573038}
[modify] https://crrev.com/66cce109c5d3e0d593e32a0d9aa37b84114984d3/base/test/test_suite.cc

Owner: jmad...@chromium.org
GPU Triage: Ping, this P1 has been sitting for a while.  jmadill@ could you take a look?
Owner: ----
Status: Available (was: Assigned)
I'm not sure I'm a good owner for this. Sorry for letting it idle. Could probably lower it to P2 if no one is going to look at it.
Status: WontFix (was: Available)
The last 1000 builds on https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20dEQP%20Release%20%28NVIDIA%29 had 2 failures, neither of which was this issue, so let's close this as WontFix.
