
Issue 719547


Issue metadata

Status: Fixed
Owner: ----
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocked on:
issue 464430




All tests failing on Win10 GPU bots with an exception: "Exception while processing test results: Invalid data given"

Project Member Reported by geoffl...@chromium.org, May 8 2017

Issue description

All tests on the Win10 Debug (Intel HD 530) GPU FYI bot are failing with an exception. json.output (exception) shows "No JSON object could be decoded".

Example build: https://build.chromium.org/p/chromium.gpu.fyi/builders/Win10%20Debug%20%28Intel%20HD%20530%29/builds/644

It looks like there was a roll of the build repository before the first failure. Log: https://chromium.googlesource.com/chromium/tools/build/+log/a081bfaddc1..590c75a

Assigning to John Budorick because he made a change related to JSON output on testers.


 
At a glance, this doesn't appear to be due to my change, but I'll dig in further in a bit before releasing this bug.
Cc: jbudorick@chromium.org
Owner: ----
Status: Available (was: Assigned)
Yeah, this isn't my change. From the stdout (https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.gpu.fyi%2FWin10_Debug__Intel_HD_530_%2F644%2F%2B%2Frecipes%2Fsteps%2Fcontext_lost_tests%2F0%2Fstdout):

Failed to delete C:\b\c\b\Win10_Debug__Intel_HD_530_\irxa34jx (163 files remaining).
  Maybe the test has a subprocess outliving it.
  Sleeping 2 seconds.
Failed to delete C:\b\c\b\Win10_Debug__Intel_HD_530_\irxa34jx (163 files remaining).
  Maybe the test has a subprocess outliving it.
  Sleeping 4 seconds.
...
Failed to delete the run directory, forcibly failing
the task because of it. No zombie process can outlive a
successful task run and still be marked as successful.
Fix your stuff.
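The log above shows the cleanup retrying with a doubling delay (2 s, then 4 s, ...) before giving up and failing the task. A minimal Python sketch of that retry-with-backoff pattern follows; `delete_with_retries` and its parameters are hypothetical, and this is not run_isolated.py's actual code.

```python
import os
import shutil
import time


def delete_with_retries(path, max_attempts=4, initial_delay=2.0, sleep=time.sleep):
    """Hypothetical helper: try to remove `path`, doubling the wait each retry.

    Returns True if the directory is gone, False if it survived every attempt,
    in which case the caller should fail the task (as run_isolated.py does).
    """
    delay = initial_delay
    for attempt in range(max_attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Nothing left to delete.
            return True
        except OSError:
            # On Windows, a subprocess that outlives the test keeps files open.
            if attempt == max_attempts - 1:
                break
            remaining = sum(len(files) for _, _, files in os.walk(path))
            print(f"Failed to delete {path} ({remaining} files remaining).")
            print(f"  Sleeping {delay:g} seconds.")
            sleep(delay)
            delay *= 2
    return not os.path.exists(path)
```

Injecting `sleep` keeps the helper testable without real waits.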
Thanks for taking a look.

Also seeing similar failures on a few other bots (example: https://build.chromium.org/p/chromium.gpu.fyi/builders/Win10%20Debug%20%28NVIDIA%29/builds/1033)

These also fail to delete files after execution.

Comment 4 by zmo@chromium.org, May 8 2017

Labels: -Pri-3 Infra-Troopers Pri-0
Pri-0 since we lost coverage on these platforms.

Comment 5 by zmo@chromium.org, May 8 2017

Labels: OS-Windows
All Win10 FYI bots have been failing since approximately 11:50 on May 5.

Could this be related to work started on Issue 711839?

Comment 7 by zmo@chromium.org, May 8 2017

Cc: pschmidt@chromium.org
Summary: All tests failing on Win10 GPU bots with an exception: "Exception while processing test results: Invalid data given" (was: All tests failing with an exception: "Exception while processing test results: Invalid data given")
Re: 711839: I haven't rolled out the build1703 image to any slaves yet.
Hi,

In general, FYI bots should not be labeled as P0. If these bots are critical, they should be moved off of an FYI master.

Comment 10 by zmo@chromium.org, May 8 2017

Labels: -Pri-0 Pri-1
OK, set to P1. Still, we are losing coverage on Win10.

Comment 11 by zmo@chromium.org, May 8 2017

Cc: no...@chromium.org
Further, it's only Telemetry tests launching Chrome that have this issue. Other unit tests are running fine.

Can https://chromium-review.googlesource.com/c/468527/ be related?
#11: this build does not use kitchen, so no.
The code modified in https://chromium-review.googlesource.com/c/495011 runs after the build completes, not during the build.

Comment 13 Deleted

FWIU, the output of the step process (run_isolated.py) was not valid JSON.
run_isolated.py output was never JSON. Why is the step trying to parse the output as JSON? Unless I am missing something.
Ignore the previous comment; the last green build had JSON output with run_isolated.py: https://build.chromium.org/p/chromium.gpu.fyi/builders/Win10%20Debug%20%28Intel%20HD%20530%29/builds/643
I believe this is the actual error:

INFO:root:Starting Chrome ['C:\\b\\c\\b\\Win10_Debug__Intel_HD_530_\\irxa34jx\\out\\Debug\\chrome.exe', '--js-flags=--expose-gc', '--enable-logging=stderr', '--disable-domain-blocking-for-3d-apis', '--disable-gpu-process-crash-limit', '--enable-gpu-benchmarking', '--enable-net-benchmarking', '--metrics-recording-only', '--no-default-browser-check', '--no-first-run', '--enable-gpu-benchmarking', '--disable-background-networking', '--proxy-server=socks://localhost:53527', '--ignore-certificate-errors', '--disable-component-extensions-with-background-pages', '--disable-default-apps', '--disable-search-geolocation-disclosure', '--remote-debugging-port=0', '--enable-crash-reporter-for-testing', '--disable-component-update', '--window-size=1280,1024', '--user-data-dir=c:\\b\\c\\b\\win10_debug__intel_hd_530_\\itwi3dyi\\tmp2nseim', 'about:blank']
[5536:2392:0505/153401.074:ERROR:memory_mapped_file.cc(52)] Couldn't open C:\b\c\b\Win10_Debug__Intel_HD_530_\irxa34jx\out\Debug\chrome_200_percent.pak
[5536:2392:0505/153401.074:ERROR:data_pack.cc(164)] Failed to mmap datapack
INFO:root:Discovered ephemeral port 53529
[0505/153403.403:ERROR:memory_mapped_file.cc(52)] Couldn't open C:\b\c\b\Win10_Debug__Intel_HD_530_\irxa34jx\out\Debug\chrome_200_percent.pak
[0505/153403.403:ERROR:data_pack.cc(164)] Failed to mmap datapack

After logging into the bot, I confirmed that chrome_200_percent.pak does not exist in another recent debug folder either: /cygdrive/c/b/c/b/Win10_Debug__Intel_HD_530_/irfzjvpa/out/Debug
This is basically a "failed to launch" chrome failure though, I'm not sure why it's purple.
The .pak file is in gs://chromium-gpu-fyi-archive/chromium.gpu.fyi/GPU Win Builder (dbg)/full-build-win32_34718d4879fbba5182af5611438da97f17058142.zip

Did it disappear in between the extract build step and the failing step?
The build is extracted into:
C:\b\c\b\Win10_Debug__Intel_HD_530_\src\out\Debug\...

But the test is trying to run from:
C:\b\c\b\Win10_Debug__Intel_HD_530_\irxa34jx\out\Debug\...

Why?
This seems to be a new error:

[0505/115733.926:ERROR:target_services.cc(58)] Failed to find CSR Port heap handle

Looks like the build extraction is a red herring; this is an isolated run.

Looks like the chrome_200 pack is indeed not in the isolate: https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=b4e8cff03ce160cd196b17f99ab77be7418bf5ef

chrome_100 is there instead.
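One way to catch this kind of gap is to check the isolate's file list for the resources Chrome needs at startup. The Python sketch below assumes the .isolated manifest is JSON with a top-level "files" dict keyed by paths relative to the run directory; `missing_from_isolated` is a hypothetical helper, not part of the isolate tooling.

```python
import json


def missing_from_isolated(isolated_json, required_paths):
    """Hypothetical check: return required paths absent from an .isolated manifest.

    Assumes entries live under a top-level "files" dict keyed by relative path.
    Backslashes are normalized so Windows-style paths compare equal.
    """
    manifest = json.loads(isolated_json)
    present = {p.replace("\\", "/") for p in manifest.get("files", {})}
    return sorted(p for p in required_paths if p.replace("\\", "/") not in present)
```

For the isolate linked above, asking for both out/Debug/chrome_100_percent.pak and out/Debug/chrome_200_percent.pak would report only the 200 percent pack as missing.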
#20: I believe that's run_isolated.
#18: it's purple because it expects to be able to create its results from the JSON file and can't do so: https://codesearch.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/steps.py?rcl=fa6566763ff505e21cb7a012ae31b363dc08aad6&l=876
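The purple-versus-red distinction can be sketched as follows: when the results file cannot be parsed, the step raises an infrastructure exception (purple) instead of reporting test failures (red). This Python sketch is not the code in steps.py; `InfraFailure` and `classify_step` are hypothetical, and it assumes results follow Chromium's JSON Test Results Format, which reports counts under `num_failures_by_type`.

```python
import json


class InfraFailure(Exception):
    """Hypothetical stand-in for the exception that turns a step purple."""


def load_test_results(raw_output):
    """Parse the JSON results a test step wrote.

    A crashed runner may leave the file empty or truncated; that is reported
    as an infrastructure exception rather than as ordinary test failures.
    """
    try:
        results = json.loads(raw_output)
    except ValueError as e:  # json.JSONDecodeError subclasses ValueError
        raise InfraFailure(f"Invalid data given: {e}") from e
    if not isinstance(results, dict):
        raise InfraFailure("Invalid data given: expected a JSON object")
    return results


def classify_step(raw_output):
    """Return 'exception' (purple), 'failure' (red) or 'success' (green)."""
    try:
        results = load_test_results(raw_output)
    except InfraFailure:
        return "exception"
    failures = results.get("num_failures_by_type", {}).get("FAIL", 0)
    return "failure" if failures else "success"
```

A runner that crashes before writing anything yields an empty string, which this sketch classifies as "exception", matching the purple builds seen here.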

Comment 25 by zmo@chromium.org, May 8 2017

Is this the reason?

Failed to hardlink, falling back to copy \\?\C:\b\c\b\Win10_Release__NVIDIA_Quadro_P400_\cache\da39a3ee5e6b4b0d3255bfef95601890afd80709 to C:\b\c\b\Win10_Release__NVIDIA_Quadro_P400_\ir4rlzgd
Labels: OS-Linux
#25: no, that bot was seeing hardlink failures before. As long as it successfully copies, that shouldn't be an issue.
Labels: -OS-Linux
(not sure how OS-Linux got there)
Actually that error is present in the passing build too:
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.gpu.fyi%2FWin10_Debug__Intel_HD_530_%2F643%2F%2B%2Frecipes%2Fsteps%2Fcontext_lost_tests%2F0%2Fstdout

These log outputs are awful, I've wasted half an hour chasing false error messages.
The error in #21 is associated with this change, https://codereview.chromium.org/2859273005, which just changes a LOG to a DLOG.

There's a comment on the associated change https://codereview.chromium.org/2726733003/ "I just pulled and this is causing all my tabs to be sad as soon as they start up. Replacing "return false" with "return true" in CsrssDisconnectCleanup makes it stop."

Cc: liamjm@chromium.org bsep@chromium.org
Please see Comment #30.
The build went purple because the run_isolated step did not return valid JSON.

The run_isolated step did not return valid JSON because run_gpu_integration_test.py crashed.

run_gpu_integration_test.py crashed because the underlying runner, run_browser_tests.py, crashed.

run_browser_tests.py crashed due to:
DevtoolsTargetCrashException: Web content with index 0 may have crashed. filtered_context_ids = []
At this point I'm not really sure what that means.
I see; the crash chain probably starts with the change in #21 / #30.

This looks like a src-side bug at this point, is there anything the trooper can help with?
The start of failures seems to line up well with when the feature in #30 got turned on for testing.

https://chromium.googlesource.com/chromium/src/+/1290a798ea209beefb9c00e8836aa91e0cf8b87f

This creates a feature "EnableCsrssLockdown" so that this capability can be finched.

BUG=464430

Review-Url: https://codereview.chromium.org/2862563004
Cr-Commit-Position: refs/heads/master@{#469712}

I'm going to revert this.
Labels: -Infra-Troopers

Comment 36 by wfh@chromium.org, May 8 2017

Cc: wfh@chromium.org
What exact version of Windows is running on the Win10 GPU bots?
Components: -Infra
They are running Microsoft Windows [Version 10.0.10586].
Blockedon: 464430
Status: Fixed (was: Available)
