New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 674213 link

Starred by 7 users

Issue metadata

Status: Fixed
Owner:
Last visit 26 days ago
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug

Blocking:
issue 672985



Sign in to add a comment

generate_build_files flaked a few times on Builder: Android Arm64 Builder (dbg)

Project Member Reported by thakis@chromium.org, Dec 14 2016

Issue description

On
https://build.chromium.org/p/chromium.linux/builders/Android%20Arm64%20Builder%20(dbg) , generate_build_files failed once, then passed, then failed 4 times, then passed again, without any related-looking change.

The failure each time was:


Writing """\
ffmpeg_branding = "Chrome"
goma_dir = "/b/c/cipd/goma"
is_component_build = false
is_debug = true
proprietary_codecs = true
symbol_level = 1
target_cpu = "arm64"
target_os = "android"
use_goma = true
""" to /b/c/b/Android_Arm64_Builder__dbg_/src/out/Debug/args.gn.

/b/c/b/Android_Arm64_Builder__dbg_/src/buildtools/linux64/gn gen //out/Debug --check
  -> returned 1
ERROR at build arg file (use "gn args <out_dir>" to edit):1:19: Build argument has no effect.
ffmpeg_branding = "Chrome"
                  ^-------
The variable "ffmpeg_branding" was set as a build argument
but never appeared in a declare_args() block in any buildfile.

To view all possible args, run "gn args --list <builddir>"
ERROR No targets.
I could not find any targets to write, so I'm doing nothing.
ERROR 
GN gen failed: 1
step returned non-z


This doesn't repro locally.


Failing builds:

https://build.chromium.org/p/chromium.linux/builders/Android%20Arm64%20Builder%20%28dbg%29/builds/38833
https://build.chromium.org/p/chromium.linux/builders/Android%20Arm64%20Builder%20%28dbg%29/builds/38834 (this one passed)
https://build.chromium.org/p/chromium.linux/builders/Android%20Arm64%20Builder%20%28dbg%29/builds/38835
https://build.chromium.org/p/chromium.linux/builders/Android%20Arm64%20Builder%20%28dbg%29/builds/38836
https://build.chromium.org/p/chromium.linux/builders/Android%20Arm64%20Builder%20%28dbg%29/builds/38837
https://build.chromium.org/p/chromium.linux/builders/Android%20Arm64%20Builder%20%28dbg%29/builds/38838

https://build.chromium.org/p/chromium.linux/builders/Android%20Arm64%20Builder%20%28dbg%29/builds/38839 passed again.



cc'ing gn and ffmpeg folks.

If it never happens again, then I guess we don't need to worry about it.
 
Cc: scottmg@chromium.org
Hmm, not sure how this could happen, ffmpeg_branding defined here:
https://cs.chromium.org/chromium/src/third_party/ffmpeg/ffmpeg_options.gni

Always included here:
https://cs.chromium.org/chromium/src/third_party/ffmpeg/BUILD.gn

Which is always included here:
https://cs.chromium.org/chromium/src/media/BUILD.gn
Cc: machenb...@chromium.org thakis@chromium.org
Components: -Internals>Media>FFmpeg Build
Owner: brettw@chromium.org
Status: Assigned (was: Unconfirmed)
I'm increasingly suspecting that something has regressed / broken in GN. We've seen a few flaky reports of things like this that shouldn't be happening lately.

I'm suspecting that my fix for  bug 542846  (r433944, https://codereview.chromium.org/2509333003) might be the culprit. We could revert it without breaking anything, I think, but I'm reluctant to do so unless we get better luck at reproducing these flakes.

brettw@, any thoughts? Maybe you could take another look at that CL and see if I screwed up something?

Removing the ffmpeg component, I doubt this is actually an ffmpeg thing at all.
Labels: -Pri-3 Pri-1
Cc: smut@chromium.org
 Issue 674802  has been merged into this issue.
Blocking: 672985
Project Member

Comment 7 by bugdroid1@chromium.org, Dec 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8d1f54effdc07f29a6fd00411e453ae3d67f9a86

commit 8d1f54effdc07f29a6fd00411e453ae3d67f9a86
Author: dpranke <dpranke@chromium.org>
Date: Sun Dec 18 05:13:02 2016

Revert GN declare_args() change.

We've seen a bunch of bug reports for flaky build failures
complaining about unused or undeclared arguments; it seems
possible that this CL might be the cause, so I'm speculatively
reverting it and we can see if things get better; reverting the
change makes GN more permissive, so this should be reasonably safe.

This reverts r433944 (f6fd4d747c77e1c7968eb5af8b195a94a7ed2fac),
i.e. https://codereview.chromium.org/2509333003. This CL leaves
the build files changes that were in that CL because they
are backwards-compatible and improvements regardless.

TBR=brettw@chromium.org
BUG= 674213 

Review-Url: https://codereview.chromium.org/2586073002
Cr-Commit-Position: refs/heads/master@{#439366}

[modify] https://crrev.com/8d1f54effdc07f29a6fd00411e453ae3d67f9a86/tools/gn/docs/reference.md
[modify] https://crrev.com/8d1f54effdc07f29a6fd00411e453ae3d67f9a86/tools/gn/functions.cc
[modify] https://crrev.com/8d1f54effdc07f29a6fd00411e453ae3d67f9a86/tools/gn/functions.h
[modify] https://crrev.com/8d1f54effdc07f29a6fd00411e453ae3d67f9a86/tools/gn/functions_unittest.cc
[modify] https://crrev.com/8d1f54effdc07f29a6fd00411e453ae3d67f9a86/tools/gn/parse_tree.cc
[modify] https://crrev.com/8d1f54effdc07f29a6fd00411e453ae3d67f9a86/tools/gn/scope.cc
[modify] https://crrev.com/8d1f54effdc07f29a6fd00411e453ae3d67f9a86/tools/gn/scope.h

Owner: dpranke@chromium.org
Status: Fixed (was: Assigned)
This is hopefully fixed now; please re-open if you see this again.
Status: Assigned (was: Fixed)
We started seeing the same issue downstream around gn r436733 (not 100% sure here). Updating to r439377 did not help, builds still occasionally fail with "ERROR No targets" and a reference to a random build argument being unused.

Usually retrying the gn run succeeds, so the issue is not easy to track. We run  gn with tracing enabled and it looks like the failure happened in such a way that the log was not written at all.

We're currently trying to see if rolling back to gn r436326 is enough to make the issues go away.
Correction: we're trying r422996 as the "probabaly good" version.
Cc: brettw@chromium.org
 Issue 672598  has been merged into this issue.
Owner: brettw@chromium.org
Hm. If the revert didn't fix things (and I'm not super-surprised by this, since I didn't know what in that CL would've broken it), then I'm not sure what's going on.

I'm punting this over to brettw@, who's probably better suited to debug this sort of thing.

Brett, can you take a look at this, since it's causing flaky build failures?
The issue reproduces more often for me when the machine is under load, I'm able to get a ~10% reproduction rate when running our downstream build script up till the GN point under high background load (and <1% when no load). 
Reproduces nearly 100% with "stress --cpu 8" running on the builder. I should have enough cycles to try and bisect this today.
Cc: fdoray@chromium.org
Bisects down to https://codereview.chromium.org/2515383005 aka "gn: Remove Scheduler::pool()."

Confirmed by checking tip of master behavior: fails as-is, works after reverting that CL.
Awesome work!

After studying this, I understand what's going on. Lemme post a patch.
Status: Started (was: Assigned)
What I think happened is that there's a race condition in noticing there's no work (work count == 0) and adding more work.

Previously loading files (the first part of the build) didn't change the work count since they were posted to the pool. Since these never decremented the work count, we never checked for 0 when they completed. Later, writing targets would modify the work count, but there would be lots of these happening so the race condition was exceedingly rare.

With the identified change, we started doing work counting for file loading. So whenever a file load finishes and there's no other work, GN will declare that the build is done and exit (resulting in confusing errors).

The race condition happens when we post the declared targets back to the main thread for hooking up into the build graph. And mostly it will happen after processing the root BUILD file. There will only be one file load at this point as all other loads will be triggered by the main thread noticing unresolved dependencies. So if the file load finishes before the main thread can hook up any targets and post more loads to the work queue, we'll exit.

Most of the time, the main thread wins this race because it doesn't have anything else to do and everything will be fine. But if the file load wins because there is high thread contention, the bug will trigger.
Project Member

Comment 18 by bugdroid1@chromium.org, Jan 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/717182cd794aae4577fbae651a88e3ba6d973dc8

commit 717182cd794aae4577fbae651a88e3ba6d973dc8
Author: brettw <brettw@chromium.org>
Date: Tue Jan 10 00:25:19 2017

Fix a GN race condition.

The loader maintains a reference count on the "pending work". But the
Setup object was starting the load before incrementing this reference count.
It the loader happened to load anything on the background thread before the
main thread could increment the reference count, GN would think there is no
work left and exit early.

There is another possible race condition between defining items and doing
work on the thread pool can cause the build to be marked complete before
everything is actually done. The race condition is described in more detail
in the added comment in setup.cc.

BUG= 674213 

Review-Url: https://codereview.chromium.org/2617253002
Cr-Commit-Position: refs/heads/master@{#442416}

[modify] https://crrev.com/717182cd794aae4577fbae651a88e3ba6d973dc8/tools/gn/setup.cc

That CL seems to fix the issue for me. I guess we can now un-revert the speculative declare_args change that was reverted earlier.
Status: Fixed (was: Started)
This is rolled in in

https://codereview.chromium.org/2635713002
https://chromium.googlesource.com/chromium/src/+/2592262cc9dc8ea6cbe00a69a9b25b37970dae13
Cr-Commit-Position: refs/heads/master@{#443805}

and hopefully fixed.

Sign in to add a comment