context_lost_tests failed with FATAL:filesystem_policy.cc(87): Check failed: false

Issue description
This assertion failure started showing up multiple times today on Chromium's commit queue:
https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKgsSBUZsYWtlIh9jb250ZXh0X2xvc3RfdGVzdHMgKHdpdGggcGF0Y2gpDA

    [0610/131706:FATAL:filesystem_policy.cc(87)] Check failed: false.
    Backtrace:
        base::debug::StackTrace::StackTrace [0x00EE8C77+23]
        logging::LogMessage::~LogMessage [0x00ED4551+49]
        sandbox::FileSystemPolicy::GenerateRules [0x00F0C060+176]
        sandbox::PolicyBase::AddRuleInternal [0x00F021F0+256]
        sandbox::PolicyBase::AddRule [0x00F0201E+30]
        ...

Example failing try jobs:
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/237333
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/237250
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/237208

Running "git log" in the containing directories doesn't show any relevant recent changes (there was an rpath change in 3ea22b3a4855512be84da4bc186392434e14de7c, surely unrelated), so can anyone think of why this assertion would have started firing?

Marking P1 because this is affecting commit queue jobs.
Jun 13 2016
This is weird; I was diagnosing a crash like this a few weeks ago. I think it was related to junction points?
Jun 13 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a9f54f04db87c747ae932552fa49d88967c98032

commit a9f54f04db87c747ae932552fa49d88967c98032
Author: kbr <kbr@chromium.org>
Date: Mon Jun 13 20:08:01 2016

Mark ContextLost.WebGLContextLostFromGPUProcessExit failing on Win7.

BUG=619196
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel
TBR=zmo@chromium.org
NOTRY=true

Review-Url: https://codereview.chromium.org/2068503002
Cr-Commit-Position: refs/heads/master@{#399518}

[modify] https://crrev.com/a9f54f04db87c747ae932552fa49d88967c98032/content/test/gpu/gpu_tests/context_lost_expectations.py
Jun 13 2016
Yep. A junction is about the only reason to fail there.
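To check a bot for the condition described here, one can walk each component of a suspect path (for example the profile or temp directory) and flag any that is a reparse point. The sketch below is a hypothetical Python diagnostic, not the sandbox's own check; `find_reparse_component` and `is_reparse_point` are illustrative names. On Windows it tests `FILE_ATTRIBUTE_REPARSE_POINT`, which NTFS junctions and symlinks both carry; on other systems it falls back to the symlink bit so it can be exercised anywhere. Per-file hardlinks are ordinary directory entries and are not flagged.

```python
import os
import stat
from pathlib import Path
from typing import Optional


def is_reparse_point(path: Path) -> bool:
    """Return True if `path` itself (not its target) is a link-like entry."""
    st = os.lstat(path)  # lstat: inspect the entry, do not follow it
    # st_file_attributes exists only on Windows; NTFS junctions and
    # symlinks both carry FILE_ATTRIBUTE_REPARSE_POINT.
    attrs = getattr(st, "st_file_attributes", 0)
    if attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT:
        return True
    # On other systems, a symlink is the closest analogue to a junction.
    return stat.S_ISLNK(st.st_mode)


def find_reparse_component(path: str) -> Optional[Path]:
    """Return the first existing component of `path` that is a reparse
    point (or symlink), or None if there is no such component.

    Hypothetical diagnostic helper; not Chromium's sandbox code.
    """
    # abspath normalizes without resolving links, so links stay visible.
    full = Path(os.path.abspath(path))
    current = Path(full.anchor)
    # Walk from the root down so a link in any ancestor is caught.
    for part in full.parts[1:]:
        current = current / part
        if not os.path.lexists(current):
            # Nothing below a missing component exists on disk yet.
            break
        if is_reparse_point(current):
            return current
    return None
```

Checking every ancestor rather than only the final component is deliberate: a junction anywhere along the path redirects everything beneath it. On Windows, a directory created with `mklink /J` would be reported, while a tree of per-file hardlinks would not.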
Jun 13 2016
Has the bot configuration been changed to run Chrome from a junction point?
Jun 13 2016
Thanks for looking at these failures. These bots have been running tests via Swarming for a long time now. During each test run, Swarming sets up a directory tree of hardlinks into a persistent cache directory. I don't think that mechanism has changed recently, and to the best of my knowledge it only sets up file-by-file links, not directory-level links. (maruel@, vadimsh@, can you confirm?)
Jun 13 2016
Where are the minidumps from these test-run crashes stored? Maybe I can take a look at them. Perhaps the user-data-dir has been moved to a junction point; the Chromium sandbox does not support that configuration.
Jun 13 2016
Re #7: correct. There were no changes there.
Jun 13 2016
+nednguyen as Telemetry TL, though I doubt anything has changed recently in how it creates temporary user profiles. Unfortunately, we don't upload these minidumps to cloud storage yet. I'm in the middle of replacing this Telemetry-based test harness with another one (Issue 352807), so I'll block this bug on that work rather than add minidump uploading to the old harness.
Jun 15 2016
jmadill@ and jbauman@ pointed out that this is very likely caused by the new sandbox rule added in https://chromium.googlesource.com/chromium/src/+/4015b488f743a7399e3362fd49917f494ff7caaf .
Jun 15 2016
jmadill@ also pointed out that this bot has been pretty solidly red since that CL landed: https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28NVIDIA%20GeForce%20730%29?numbuilds=200
Reverting it; it looks like more work is needed.
Jun 15 2016
Assigning to stanisc@ for further investigation.
Jun 15 2016
For the record: the full stack trace in the original report above isn't symbolized well:

    [0610/131706:FATAL:filesystem_policy.cc(87)] Check failed: false.
    Backtrace:
        base::debug::StackTrace::StackTrace [0x00EE8C77+23]
        logging::LogMessage::~LogMessage [0x00ED4551+49]
        sandbox::FileSystemPolicy::GenerateRules [0x00F0C060+176]
        sandbox::PolicyBase::AddRuleInternal [0x00F021F0+256]
        sandbox::PolicyBase::AddRule [0x00F0201E+30]
        GetHandleVerifier [0x66290BE9+12202569]
        GetHandleVerifier [0x65EA4B37+8090007]
        GetHandleVerifier [0x661C24D0+11356976]
        GetHandleVerifier [0x661C29B2+11358226]
        GetHandleVerifier [0x6573CB37+323991]
        GetHandleVerifier [0x656EFE0B+9323]
        GetHandleVerifier [0x656EF145+6053]
        GetHandleVerifier [0x6573EA8B+332011]
        GetHandleVerifier [0x656EF947+8103]
        GetHandleVerifier [0x6572AA99+250105]
        GetHandleVerifier [0x656EF8D2+7986]
        GetHandleVerifier [0x65707BAB+107019]
        GetHandleVerifier [0x660F27EF+10505807]
        GetHandleVerifier [0x660F29A4+10506244]
        GetHandleVerifier [0x65708227+108679]
        GetHandleVerifier [0x6571E632+199826]
        BaseThreadInitThunk [0x76AC337A+18]
        RtlInitializeExceptionChain [0x76FF9882+99]
        RtlInitializeExceptionChain [0x76FF9855+54]

It's not 100% clear to me which process (GPU or renderer) this even came from. These tests forcibly crash and restart the GPU process. Is this coming from the GPU process that's launched after the first one crashes?
Jun 16 2016
Does Infra have the temp directory on a junction? That's a less-than-desirable situation, given that it breaks the sandbox. How hard would it be to change that?
Jun 16 2016
maruel@: does Swarming use NTFS junction points for anything related to the setup of the directory tree, or the temporary directory? nednguyen@: I thought Telemetry was responsible for profile creation and setting --user-data-dir, and that that happened in a vanilla directory, not an NTFS junction. Is that correct?
Jun 16 2016
Swarming doesn't use junction points, only hardlinks. You should try the reproduce command and reassign the bug to someone else.
Jun 16 2016
Thanks for checking. stanisc@, it sounds like you may be reimplementing this another way, so assigning back to you to close as necessary. These failures were flaky, so I'm not sure reproduction is going to be easy.
Aug 16 2016
Since there doesn't seem to have been any progress on this, and these flaky failures are no longer seen, closing this as WontFix (not reproducible).
Comment 1 by dpranke@chromium.org, Jun 10 2016