All tests failing on Win10 GPU bots with an exception: "Exception while processing test results: Invalid data given"
Issue description
All tests on the Win10 Debug (Intel HD 530) GPU FYI bot are failing with an exception. json.output (exception) shows "No JSON object could be decoded".
Example build: https://build.chromium.org/p/chromium.gpu.fyi/builders/Win10%20Debug%20%28Intel%20HD%20530%29/builds/644
It looks like there was a roll of the build repository before the first failure. Log: https://chromium.googlesource.com/chromium/tools/build/+log/a081bfaddc1..590c75a
Assigning to John Budorick because he made a change to JSON output on testers.
,
May 8 2017
Yeah, this isn't my change. From the stdout (https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.gpu.fyi%2FWin10_Debug__Intel_HD_530_%2F644%2F%2B%2Frecipes%2Fsteps%2Fcontext_lost_tests%2F0%2Fstdout):
Failed to delete C:\b\c\b\Win10_Debug__Intel_HD_530_\irxa34jx (163 files remaining). Maybe the test has a subprocess outliving it. Sleeping 2 seconds.
Failed to delete C:\b\c\b\Win10_Debug__Intel_HD_530_\irxa34jx (163 files remaining). Maybe the test has a subprocess outliving it. Sleeping 4 seconds.
...
Failed to delete the run directory, forcibly failing the task because of it. No zombie process can outlive a successful task run and still be marked as successful. Fix your stuff.
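For readers unfamiliar with the cleanup behaviour in that log, here is a minimal Python sketch of a delete-with-backoff loop that fails the task if the directory never goes away. It is not the actual run_isolated.py code: the function name, its parameters, and the attempt count are assumptions made for illustration; only the doubling delays (2 seconds, then 4) come from the log above.

```python
import shutil
import time


def remove_tree_with_backoff(path, attempts=4, first_delay=2.0,
                             sleep=time.sleep, rmtree=shutil.rmtree):
    """Try to delete `path`, doubling the wait between failed attempts.

    Hypothetical helper, not the real run_isolated.py logic.
    Returns True if the directory is gone, False if it still could not be
    deleted after the final attempt (the caller would then fail the task).
    """
    delay = first_delay
    for attempt in range(attempts):
        try:
            rmtree(path)
            return True
        except FileNotFoundError:
            # Already gone, which is the outcome we wanted.
            return True
        except OSError:
            # Typically a file still held open by a process that outlived
            # the test. Give it time to exit, then retry.
            if attempt == attempts - 1:
                return False
            sleep(delay)
            delay *= 2
    return False
```

A caller would mark the task as failed when this returns False, which matches the "forcibly failing the task" message in the log.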
,
May 8 2017
Thanks for taking a look. Also seeing similar failures on a few other bots (example: https://build.chromium.org/p/chromium.gpu.fyi/builders/Win10%20Debug%20%28NVIDIA%29/builds/1033) These ones also fail to delete files after execution.
,
May 8 2017
Pri-0 since we lost coverage on these platforms.
,
May 8 2017
,
May 8 2017
All Win 10 FYI bots started failing around May 05 11:50. Could this be related to work started on Issue 711839?
,
May 8 2017
,
May 8 2017
Re: 711839 I haven't rolled out the build1703 image to any slaves yet.
,
May 8 2017
Hi, in general FYI bots should not be labeled as P0. If these bots are critical, they should be moved off of an FYI master.
,
May 8 2017
OK, set to P1. Still, we lose coverage on Win10.
,
May 8 2017
Further, it's only Telemetry tests launching Chrome that have this issue. Other unit tests are running fine. Can https://chromium-review.googlesource.com/c/468527/ be related?
,
May 8 2017
#11: this build does not use kitchen, so no. The code modified in https://chromium-review.googlesource.com/c/495011 runs after build completion, not during the build.
,
May 8 2017
FWIU, the output of the step process (run_isolated.py) was not valid JSON.
,
May 8 2017
run_isolated.py output was never JSON. Why is the step trying to parse the output as JSON? Unless I am missing something.
,
May 8 2017
Ignore my previous comment; the last green build had JSON output with run_isolated.py: https://build.chromium.org/p/chromium.gpu.fyi/builders/Win10%20Debug%20%28Intel%20HD%20530%29/builds/643
,
May 8 2017
I believe this is the actual error:
INFO:root:Starting Chrome ['C:\\b\\c\\b\\Win10_Debug__Intel_HD_530_\\irxa34jx\\out\\Debug\\chrome.exe', '--js-flags=--expose-gc', '--enable-logging=stderr', '--disable-domain-blocking-for-3d-apis', '--disable-gpu-process-crash-limit', '--enable-gpu-benchmarking', '--enable-net-benchmarking', '--metrics-recording-only', '--no-default-browser-check', '--no-first-run', '--enable-gpu-benchmarking', '--disable-background-networking', '--proxy-server=socks://localhost:53527', '--ignore-certificate-errors', '--disable-component-extensions-with-background-pages', '--disable-default-apps', '--disable-search-geolocation-disclosure', '--remote-debugging-port=0', '--enable-crash-reporter-for-testing', '--disable-component-update', '--window-size=1280,1024', '--user-data-dir=c:\\b\\c\\b\\win10_debug__intel_hd_530_\\itwi3dyi\\tmp2nseim', 'about:blank']
[5536:2392:0505/153401.074:ERROR:memory_mapped_file.cc(52)] Couldn't open C:\b\c\b\Win10_Debug__Intel_HD_530_\irxa34jx\out\Debug\chrome_200_percent.pak
[5536:2392:0505/153401.074:ERROR:data_pack.cc(164)] Failed to mmap datapack
INFO:root:Discovered ephemeral port 53529
[0505/153403.403:ERROR:memory_mapped_file.cc(52)] Couldn't open C:\b\c\b\Win10_Debug__Intel_HD_530_\irxa34jx\out\Debug\chrome_200_percent.pak
[0505/153403.403:ERROR:data_pack.cc(164)] Failed to mmap datapack
Logging into the bot, chrome_200_percent.pak indeed does not exist in another recent debug folder: /cygdrive/c/b/c/b/Win10_Debug__Intel_HD_530_/irfzjvpa/out/Debug
,
May 8 2017
This is basically a "failed to launch" chrome failure though, I'm not sure why it's purple.
,
May 8 2017
The .pak file is in gs://chromium-gpu-fyi-archive/chromium.gpu.fyi/GPU Win Builder (dbg)/full-build-win32_34718d4879fbba5182af5611438da97f17058142.zip
Did it disappear in between the extract build step and the failing step?
,
May 8 2017
The build is extracted into: C:\b\c\b\Win10_Debug__Intel_HD_530_\src\out\Debug\...
But the test is trying to run from: C:\b\c\b\Win10_Debug__Intel_HD_530_\irxa34jx\out\Debug\...
Why?
,
May 8 2017
This seems to be a new error: [0505/115733.926:ERROR:target_services.cc(58)] Failed to find CSR Port heap handle
,
May 8 2017
Looks like the build extraction is a red herring; this is an isolated run. The chrome_200 pak is indeed not in the isolate: https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=b4e8cff03ce160cd196b17f99ab77be7418bf5ef
chrome_100 is there instead.
,
May 8 2017
#20: I believe that's run_isolated.
,
May 8 2017
#18: it's purple because it expects to be able to create its results from the JSON file and can't do so: https://codesearch.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/steps.py?rcl=fa6566763ff505e21cb7a012ae31b363dc08aad6&l=876
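To illustrate the distinction between a test failure and a step that cannot build results at all, here is a minimal Python sketch of parsing a step's output as JSON and reporting invalid data. It is not the recipe code linked above: the function name and return shape are hypothetical, and the error text only echoes the message in this issue's title.

```python
import json


def load_test_results(raw_output):
    """Parse a step's raw output as a JSON object.

    Hypothetical helper, not the code in steps.py.
    Returns (results, error); exactly one of the two is None.
    """
    try:
        results = json.loads(raw_output)
    except ValueError as exc:
        # json.JSONDecodeError is a subclass of ValueError. Empty or
        # truncated output from a crashed runner ends up here.
        return None, "Invalid data given: %s" % exc
    if not isinstance(results, dict):
        return None, "Invalid data given: expected a JSON object"
    return results, None
```

A caller that receives an error here has no pass/fail data to report, so it can only flag the step as an infrastructure-style exception rather than an ordinary red failure.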
,
May 8 2017
Is this the reason?
Failed to hardlink, falling back to copy \\?\C:\b\c\b\Win10_Release__NVIDIA_Quadro_P400_\cache\da39a3ee5e6b4b0d3255bfef95601890afd80709 to C:\b\c\b\Win10_Release__NVIDIA_Quadro_P400_\ir4rlzgd
,
May 8 2017
That file was not there in the succeeding run either https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=aefb8cabe7802e2c336c7c8da731a7f1ddb876aa
,
May 8 2017
#25: no, that bot was seeing hardlink failures before. As long as it successfully copies, that shouldn't be an issue.
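The hardlink-then-copy fallback described in that log message can be sketched in Python as follows. This is an illustration only, assuming a hypothetical helper name; it is not how run_isolated itself is implemented.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Hardlink `src` to `dst`, falling back to a copy if linking fails.

    Hypothetical helper for illustration. Returns "hardlink" or "copy"
    so the caller can log which path was taken.
    """
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        # Linking can fail across volumes, on filesystems without hardlink
        # support, or when `dst` already exists. Copying is slower but
        # leaves the same content in place.
        shutil.copy2(src, dst)
        return "copy"
```

Either branch leaves the file contents available at the destination, which is why the fallback message on its own does not indicate a broken run.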
,
May 8 2017
(not sure how OS-Linux got there)
,
May 8 2017
Actually that error is present in the passing build too: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.gpu.fyi%2FWin10_Debug__Intel_HD_530_%2F643%2F%2B%2Frecipes%2Fsteps%2Fcontext_lost_tests%2F0%2Fstdout These log outputs are awful, I've wasted half an hour chasing false error messages.
,
May 8 2017
The error in #21 is associated with this change, https://codereview.chromium.org/2859273005, which just changes a LOG to DLOG. There's a comment on the associated change https://codereview.chromium.org/2726733003/:
"I just pulled and this is causing all my tabs to be sad as soon as they start up. Replacing "return false" with "return true" in CsrssDisconnectCleanup makes it stop."
,
May 8 2017
Please see Comment #30.
,
May 8 2017
The build crashed purple because the run_isolated step did not return valid JSON.
The run_isolated step did not return valid JSON because run_gpu_integration_test.py crashed.
run_gpu_integration_test.py crashed because the underlying runner run_browser_tests.py crashed.
run_browser_tests.py crashed due to:
DevtoolsTargetCrashException: Web content with index 0 may have crashed. filtered_context_ids = []
At this point I'm not really sure what that means.
,
May 8 2017
I see, so the crash chain probably starts with #21 / #30. This looks like a src-side bug at this point; is there anything the trooper can help with?
,
May 8 2017
The start of failures seems to line up well with when the feature in #30 got turned on for testing.
https://chromium.googlesource.com/chromium/src/+/1290a798ea209beefb9c00e8836aa91e0cf8b87f
This creates a feature "EnableCsrssLockdown" so that this capability can be finched.
BUG=464430
Review-Url: https://codereview.chromium.org/2862563004
Cr-Commit-Position: refs/heads/master@{#469712}
I'm going to revert this.
,
May 8 2017
,
May 8 2017
What exact version of Windows is running on the Win10 GPU bots?
,
May 8 2017
,
May 8 2017
They are running Microsoft Windows [Version 10.0.10586].
,
May 8 2017
,
May 11 2017
Comment 1 by jbudorick@chromium.org
, May 8 2017