Issue 328249 Linux Debug (NVIDIA) bot became flaky recently
Project Member Reported by kbr@chromium.org, Dec 13, 2013
Status: Fixed
Owner: phajdan.jr@chromium.org
Closed: Dec 2013
Cc: phajdan.jr@chromium.org, alokp@chromium.org, jln@chromium.org, briander...@chromium.org, bajones@chromium.org, dtu@chromium.org, jorgelo@chromium.org, zmo@chromium.org
Components:
OS: All
Pri: 1
Type: Bug

Blocked on:
issue 309093

Blocking:
issue 328925


The Linux Debug (NVIDIA) bot became flaky recently:

http://build.chromium.org/p/chromium.gpu/builders/Linux%20Debug%20%28NVIDIA%29?numbuilds=200

It was solid green up until around December 10 and has started failing about half its builds since then.

The first failing build was:

http://build.chromium.org/p/chromium.gpu/builders/Linux%20Debug%20%28NVIDIA%29/builds/22614

and, going back several green builds to:

http://build.chromium.org/p/chromium.gpu/builders/Linux%20Debug%20%28NVIDIA%29/builds/22608

results in the following regression ranges:

http://build.chromium.org/f/chromium/perf/dashboard/ui/changelog.html?url=/trunk&range=239801:239827

http://build.chromium.org/f/chromium/perf/dashboard/ui/changelog_blink.html?url=/trunk&range=163535:163543

The most likely change in my opinion to have caused this issue is:

http://src.chromium.org/viewvc/chrome?revision=239811&view=revision

because that affects exactly this configuration (Linux Debug).

The symptom appears to be that we're leaking processes again (see Issue 309093), or the test harness is hanging upon exit. Note that recent builds complete various test steps but then report:

command timed out: 1200 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=2386.957749
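
For readers unfamiliar with the "seconds without output" message above, here is a minimal sketch of how such a watchdog can work. It is not the buildbot implementation; the names run_with_output_timeout and demo are hypothetical, and the short timeout is only for illustration.

```python
import queue
import subprocess
import sys
import threading


def run_with_output_timeout(cmd, idle_timeout):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Returns (exit_code, lines, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines_q = queue.Queue()

    def pump():
        # Forward each output line to the watchdog; None marks end of output.
        for line in proc.stdout:
            lines_q.put(line)
        lines_q.put(None)

    threading.Thread(target=pump, daemon=True).start()

    lines = []
    timed_out = False
    while True:
        try:
            line = lines_q.get(timeout=idle_timeout)
        except queue.Empty:
            # No output within the idle window: treat the command as hung.
            timed_out = True
            proc.kill()
            break
        if line is None:
            break
        lines.append(line.rstrip("\n"))
    return proc.wait(), lines, timed_out


def demo():
    # Hypothetical child that prints once and then goes silent,
    # like a harness hanging on exit.
    hang = [sys.executable, "-c",
            "import time; print('hi', flush=True); time.sleep(60)"]
    return run_with_output_timeout(hang, idle_timeout=2)
```

Note that a watchdog of this shape only sees end of output when every holder of the pipe has closed it, so a leaked child process that inherited stdout can trigger the timeout even after the main test process has exited.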

@brianderson, please work with @phajdan.jr to try reverting his change and see whether it clears up the issue.

 
Comment 1 by jln@chromium.org, Dec 13, 2013
Would a DCHECK in the GPU process be clearly visible here?

It seems unlikely, but r239894 / 50cecd8abd85598a850671033224b61994913294 is not completely out of the question.

Could you temporarily have this bot run with --disable-gpu-sandbox to rule out anything sandboxing related?

The fact that this bot is one of the only ones that will actually have the gpu sandbox enabled is interesting.

I could also try and revert 50cecd8abd85598a850671033224b61994913294 tomorrow.
Comment 2 by jln@chromium.org, Dec 13, 2013
Cc: jln@chromium.org
Comment 3 by jln@chromium.org, Dec 13, 2013
Cc: jorgelo@chromium.org
The night made me suspicious. Reverted in r240670. (https://codereview.chromium.org/106903012)
Project Member Comment 4 by bugdroid1@chromium.org, Dec 13, 2013
------------------------------------------------------------------------
r240670 | jln@chromium.org | 2013-12-13T16:25:53.999282Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/sandbox/linux/seccomp-bpf/sandbox_bpf.cc?r1=240670&r2=240669&pathrev=240670
   M http://src.chromium.org/viewvc/chrome/trunk/src/sandbox/linux/services/broker_process.cc?r1=240670&r2=240669&pathrev=240670

Revert 239894 "Linux Sandbox: check no threads before fork()."

BUG= 327241 ,  328249 

> Linux Sandbox: check no threads before fork().
> 
> Always check that no threads are running before fork().
> 
> BUG= 327241 
> NOTRY=true
> 
> Review URL: https://codereview.chromium.org/108173008

TBR=jln@chromium.org

Review URL: https://codereview.chromium.org/106903012
------------------------------------------------------------------------
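
As background on the change reverted above: fork() copies only the calling thread into the child, so a lock held by any other thread at that moment stays locked forever in the child. The sketch below illustrates one common Linux way to check for extra threads, by counting entries in /proc/self/task. It is not the code in sandbox_bpf.cc or broker_process.cc, and the function names are hypothetical.

```python
import os


def count_threads():
    """Return the number of kernel threads in this process (Linux only)."""
    # Each thread of the process appears as one directory entry here.
    return len(os.listdir("/proc/self/task"))


def fork_if_single_threaded():
    """Call os.fork() only when no other threads are running."""
    if count_threads() != 1:
        raise RuntimeError("refusing to fork(): other threads are running")
    return os.fork()
```

A check of this kind that fails in a debug build would terminate the process that runs it, which is why a silently dying GPU process made the change a suspect here.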
Comment 5 by jorgelo@chromium.org, Dec 13, 2013
jln: why do you think your check could be causing this? The issues with the bots are leaked processes/hanging harnesses.
This is not obvious.

http://build.chromium.org/p/chromium.gpu/builders/Linux%20Debug%20%28NVIDIA%29/builds/22730/steps/memory_test/logs/stdio seems to be an example of a hang or leaked process.

But http://build.chromium.org/p/chromium.gpu/builders/Linux%20Debug%20%28NVIDIA%29/builds/22728/steps/gpu_process_launch_tests/logs/stdio says this:

Traceback (most recent call last):
  File "/b/build/slave/Linux_Debug__NVIDIA_/build/src/content/test/gpu/../../../tools/telemetry/telemetry/page/page_runner.py", line 446, in _RunPage
    test.Run(finder_options, page, page_state.tab, results)
  File "/b/build/slave/Linux_Debug__NVIDIA_/build/src/content/test/gpu/../../../tools/telemetry/telemetry/page/page_test.py", line 228, in Run
    self._test_method(page, tab, results)
  File "/b/build/slave/Linux_Debug__NVIDIA_/build/src/content/test/gpu/gpu_tests/gpu_process.py", line 33, in ValidatePage
    raise page_test.Failure('No GPU process detected')
Failure: No GPU process detected

There are other timeouts there as well, likely related to the browser just misbehaving.

Just in case, my change http://src.chromium.org/viewvc/chrome?revision=239811&view=revision is in the blamelist for build http://build.chromium.org/p/chromium.gpu/builders/Linux%20Debug%20%28NVIDIA%29/builds/22612 and the first failures started appearing in build #22614.

This is indeed suspicious, but I'd rather first try to identify and fix the problem. If libstdc++ debug mode flags a problem, it is a real problem, and it is also a problem that the harness doesn't give full information about what is happening.

If that doesn't work within a reasonable time, I'll revert. If anyone has a strong opinion that http://src.chromium.org/viewvc/chrome?revision=239811&view=revision should be reverted, please go ahead and do so, although I'd prefer to make a smaller change instead that disables the libstdc++ debug mode in build/common.gypi. In fact, if possible, we could disable it only for GPU bots while the problem is investigated, so we continue to catch regressions on the main waterfall.
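
To make the proposal concrete, here is a rough sketch of a per-bot switch for libstdc++ debug mode. Only the _GLIBCXX_DEBUG macro is real (it is what turns on libstdc++'s checked containers and iterators); the function and the disable_glibcxx_debug parameter are hypothetical and do not reproduce the contents of build/common.gypi.

```python
def debug_defines(os_name, disable_glibcxx_debug=False):
    """Return preprocessor defines for a Debug build (illustrative only)."""
    defines = []
    if os_name == "linux" and not disable_glibcxx_debug:
        # _GLIBCXX_DEBUG selects the checked "debug mode" versions of
        # libstdc++ containers and iterators.
        defines.append("_GLIBCXX_DEBUG=1")
    return defines


# Main waterfall keeps the checks; a GPU bot could opt out.
main_waterfall = debug_defines("linux")
gpu_bot = debug_defines("linux", disable_glibcxx_debug=True)
```

With a switch like this, only the bots that set the opt-out lose the extra checking, and the default configuration is unchanged.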
Comment 7 by jln@chromium.org, Dec 13, 2013
Four green in a row since I reverted. Suspicion is growing.

kbr: a DCHECK should be more clearly visible and not fail so silently though. I think this may be related to issue 328471 (graceful fallback of GPU process on crash).
Note that even before the revert there was one occurrence of ~8 greens in a row.

Still, thanks for the revert, and fingers crossed.
Thanks for making my job easy @jln. I'll keep my eye on the bot to see if it flakes again.
Comment 10 by jln@chromium.org, Dec 13, 2013
brianderson: what could we do to make DCHECKs more visible? The DCHECK might have been flaky, but a DCHECK should still yield a clear error.

Could these bots expose stderr somehow?
Comment 11 by kbr@chromium.org, Dec 13, 2013
Cc: dtu@chromium.org
Thanks @jln and @phajdan.jr for jumping on this issue so quickly, and @jln for the quick speculative revert. Let's keep an eye on the bot.

The Telemetry harness should be forwarding output already, but it's definitely possible that some sub-processes' output is being squelched. Do we need more flags like --enable-logging or --disable-breakpad?

It looks like the flake has returned. @phajdan.jr is going to disable his patch for the GPU bots while we figure out how to get stdout/stderr from the GPU process.

If the flake persists after disabling the patch, we need to keep looking.
If the flake goes away, we can re-enable the patch once we have stdout/stderr for us to debug with.
Project Member Comment 13 by bugdroid1@chromium.org, Dec 13, 2013
------------------------------------------------------------------------
r240769 | phajdan.jr@chromium.org | 2013-12-13T22:10:27.784521Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/build/common.gypi?r1=240769&r2=240768&pathrev=240769

Add a flag to force disable libstdc++ debug mode.

BUG= 328249 ,  65151 
R=kbr@chromium.org

Review URL: https://codereview.chromium.org/102903005
------------------------------------------------------------------------
Project Member Comment 14 by bugdroid1@chromium.org, Dec 13, 2013
------------------------------------------------------------------------
r240789 | phajdan.jr@chromium.org | 2013-12-13T23:00:19.455571Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/linux_debug_tryserver.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/mac_debug_tryserver.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/win_release.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/linux_release.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/mac_release.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_upload.expected/win_release.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/mac_release_git.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/win_debug.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/mac_release_tryserver_blink.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_upload.expected/linux_release.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/linux_debug.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_upload.expected/mac_release.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/mac_debug.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/mac_release_skip_checkout.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/win_release_blink.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipe_modules/gpu/api.py?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/linux_release_blink.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/mac_release_blink.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/win_release_tryserver.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/win_debug_blink.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/linux_debug_blink.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/linux_release_tryserver.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/mac_debug_blink.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/mac_release_tryserver.json?r1=240789&r2=240788&pathrev=240789
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/slave/recipes/gpu/build_and_test.expected/win_debug_tryserver.json?r1=240789&r2=240788&pathrev=240789

Disable libstdc++ debug mode on GPU bots.

BUG= 328249 ,  65151 

Review URL: https://codereview.chromium.org/115583002
------------------------------------------------------------------------
Comment 15 by kbr@chromium.org, Dec 16, 2013
Cc: alokp@chromium.org briander...@chromium.org
Owner: phajdan.jr@chromium.org
Status: Fixed
Disabling libstdc++ debug mode on this bot seems to have brought it back to reliability. Closing as fixed. Filed follow-on Issue 328925 to track re-enabling this debugging facility on this bot.

Comment 16 by kbr@chromium.org, Dec 16, 2013
Blocking: chromium:328925
Comment 17 by jln@chromium.org, Dec 16, 2013
Excellent! FYI, I'm going to revert my revert (r240670).
Project Member Comment 18 by bugdroid1@chromium.org, Dec 16, 2013
------------------------------------------------------------------------
r240961 | jln@chromium.org | 2013-12-16T19:09:48.865181Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/sandbox/linux/seccomp-bpf/sandbox_bpf.cc?r1=240961&r2=240960&pathrev=240961
   M http://src.chromium.org/viewvc/chrome/trunk/src/sandbox/linux/services/broker_process.cc?r1=240961&r2=240960&pathrev=240961

Revert 240670 "Revert 239894 "Linux Sandbox: check no threads be..."

> Revert 239894 "Linux Sandbox: check no threads before fork()."
> 
> BUG= 327241 ,  328249 
> 
> > Linux Sandbox: check no threads before fork().
> > 
> > Always check that no threads are running before fork().
> > 
> > BUG= 327241 
> > NOTRY=true
> > 
> > Review URL: https://codereview.chromium.org/108173008
> 
> TBR=jln@chromium.org
> 
> Review URL: https://codereview.chromium.org/106903012

TBR=jln@chromium.org

Review URL: https://codereview.chromium.org/100623014
------------------------------------------------------------------------