Issue 595798

Starred by 3 users

Issue metadata

Status: Fixed
Owner: ----
Closed: May 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: ----



compile occasionally hits 20 minute buildbot timeout on Android GN

Reported by hush@chromium.org, Mar 17 2016

Issue description

Build is broken:
compile

Revision range:
chromium 381687 : 381701

Failing builders:
Android GN: https://build.chromium.org/p/chromium.linux/builders/Android%20GN


This bot is taking unusually long to compile (about 2 hours) even when it succeeds. When it fails, the error message is:

command timed out: 1200 seconds without output, attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=4677.154063
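The failure comes from buildbot's no-output timeout: the step is killed once it has printed nothing for 1200 seconds (20 minutes), regardless of how long it has been running in total (here elapsedTime is about 4677 s, roughly 78 minutes). A minimal Python sketch of that kind of inactivity watchdog follows, assuming a simple poll loop; it is not buildbot's actual implementation, and run_with_output_watchdog is a hypothetical name.

```python
import subprocess
import sys
import threading
import time


def run_with_output_watchdog(cmd, timeout_s, poll_s=0.05):
    """Run cmd and kill it if it prints nothing for timeout_s seconds.

    Returns (returncode, timed_out). On POSIX a killed child reports
    returncode -9, which corresponds to "killed by signal 9" in the bot log.
    """
    last_output = [time.monotonic()]  # shared with the reader thread

    def pump(stream):
        # Each line of output resets the inactivity clock.
        for _ in iter(stream.readline, b""):
            last_output[0] = time.monotonic()

    timed_out = False
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT) as proc:
        reader = threading.Thread(target=pump, args=(proc.stdout,), daemon=True)
        reader.start()
        while proc.poll() is None:
            if time.monotonic() - last_output[0] > timeout_s:
                timed_out = True
                proc.kill()  # SIGKILL on POSIX
                break
            time.sleep(poll_s)
        proc.wait()
        reader.join(timeout=1)
    return proc.returncode, timed_out


if __name__ == "__main__":
    # A child that sleeps silently for 5 s is killed after 0.5 s of silence.
    print(run_with_output_watchdog(
        [sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

The point the sketch illustrates: a long but chatty compile never trips this timeout; only a step that goes quiet (for example, one stuck in a huge link or swapping) does.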

 
Labels: -Pri-2 Infra-Android OS-Android Pri-1
+Infra-Android FYI, not sure if this is an infra issue or not.
Note that this bot builds 'all', unlike most of the other builders on chromium.linux, and it runs tests, unlike Android GN (dbg), so incremental compilation has more CLs to handle.
Summary: compile occasionally hits 20 minute buildbot timeout on Android GN (was: Build failure)
Cc: dpranke@chromium.org
Dirk: wdyt about removing 'all' from Android GN but leaving it on Android GN (dbg)?
Cc: primiano@chromium.org
We do build 'all' intentionally, because it catches breakages, so I'd be a bit reluctant to turn that off.

It seems more like there's been a significant regression; most compiles are closer to ~40 min, and I don't know why any compile would go for 20 min w/o output, unless we've recently turned on some sort of expensive link-time optimization?
FWIW, we're seeing a big spike in the 1-day median this week. The 1-day 90th percentile similarly spiked last week and this week.
[Attachment: Android_GN.png]
The builds are pretty darn large:
--- /b/build/slave/Android_GN/build/src -----------------------------------------------------------
  102.6GiB [##########] /out
    7.2GiB [          ] /third_party
    1.0GiB [          ] /build
  418.5MiB [          ] /chrome
  111.8MiB [          ] /v8
   79.9MiB [          ] /components
   70.5MiB [          ] /content
   61.8MiB [          ] /tools
   61.3MiB [          ] /media

We're not specifying anything for fastbuild. What does it default to?
It depends: https://code.google.com/p/chromium/codesearch#chromium/src/build/config/compiler/compiler.gni&l=26

That (by itself) wouldn't explain the recent change, though -- this bot has been on mb with its current configuration for a while AFAICT.
We probably should be (and currently aren't) specifying the `minimal_symbols` config on those bots. I feel like we hit this in the past and I did fix this once, but maybe that change got lost ...
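A hedged sketch of how a mixin such as minimal_symbols could compose into GN args in an mb-style config. The dictionary layout loosely imitates tools/mb/mb_config.pyl, but the config name and the gn_args_for helper are hypothetical, and mapping minimal_symbols to symbol_level=1 is an assumption (in Chromium's GN build, symbol_level 0 means no symbols, 1 minimal, 2 full).

```python
# Hypothetical fragment modeled loosely on tools/mb/mb_config.pyl;
# names are illustrative, not copied from the real file.
MIXINS = {
    "android": {"gn_args": 'target_os="android"'},
    "release": {"gn_args": "is_debug=false"},
    # Assumed mapping: minimal symbols keep object files and links smaller.
    "minimal_symbols": {"gn_args": "symbol_level=1"},
}

CONFIGS = {
    "android_release_bot_minimal_symbols": ["android", "release", "minimal_symbols"],
}


def gn_args_for(config_name):
    """Join each mixin's gn_args, in listed order, into one space-separated string."""
    return " ".join(
        MIXINS[m]["gn_args"] for m in CONFIGS[config_name] if "gn_args" in MIXINS[m]
    )


if __name__ == "__main__":
    print(gn_args_for("android_release_bot_minimal_symbols"))
```

Keeping symbol settings in a named mixin means a bot picks them up by listing one entry, which is the kind of change that is easy to lose if a config is later rewritten without it.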

Comment 11 by ukai@chromium.org, Mar 18 2016

Cc: brettw@chromium.org krasin@chromium.org
Issue 595226 has been merged into this issue.
Issue 595962 has been merged into this issue.
Also, to be clear, it seems like this is causing a serious number of real failures:

I am fine with us changing the build target or doing whatever else we need to do to stop causing tree failures while we continue figuring out what's going on.

However, looking at, e.g.,

https://build.chromium.org/p/chromium.linux/builders/Android%20GN/builds/33366

that build timed out after 57 minutes, and it's clearly in the middle of a build. Something weird is going on ...

Comment 14 by ukai@chromium.org, Mar 18 2016

Setting GYP_LINK_CONCURRENCY to 1 might help (if this is caused by memory thrashing)?

Comment 15 by ukai@chromium.org, Mar 18 2016

(Not sure whether GN uses build/toolchain/get_concurrent_links.py to determine link_pool's depth, though...)

Comment 16 by ukai@chromium.org, Mar 18 2016

Hmm. In GN, is the toolchain's concurrent_links value generated by executing get_concurrent_links.py by default?

It seems get_concurrent_links.py has an --lto option; is that driven by GN's is_lto value?
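The idea behind a memory-derived link pool depth can be sketched as: allow only as many simultaneous links as physical memory can hold, never fewer than one, so that links do not push the machine into swap. This is a minimal illustration, not the contents of get_concurrent_links.py; the per-link budgets (8 GiB normal, 16 GiB LTO) are hypothetical placeholders, and honoring GYP_LINK_CONCURRENCY here only mirrors the suggestion above, not confirmed behavior of the real script.

```python
import os

GIB = 2 ** 30


def mem_total_bytes(meminfo_text):
    """Parse MemTotal from /proc/meminfo-style text (values are in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def concurrent_links(total_bytes, cpu_count, is_lto=False,
                     per_link_gib=8, per_lto_link_gib=16, env=None):
    """Return a link pool depth bounded by memory and CPU count.

    An explicit GYP_LINK_CONCURRENCY in env wins (hypothetical behavior).
    LTO links are assumed to need more memory each, so fewer run at once.
    """
    env = os.environ if env is None else env
    override = env.get("GYP_LINK_CONCURRENCY")
    if override:
        return max(1, int(override))
    per_link = (per_lto_link_gib if is_lto else per_link_gib) * GIB
    return max(1, min(cpu_count, total_bytes // per_link))


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        total = mem_total_bytes(f.read())
    print(concurrent_links(total, os.cpu_count() or 1))
```

With these placeholder budgets, a 64 GiB, 32-core machine would get a depth of 8 for normal links and 4 for LTO links; a depth of 1 is the fully serialized case suggested above.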
Happened again now and closed the tree:

[17439/19540] STAMP obj/android_webview/test/android_webview_test_apk__apk_dist_ijar.stamp
[17440/19540] STAMP obj/android_webview/test/android_webview_test_apk.stamp
[17441/19540] STAMP obj/android_webview/test/android_webview_test_apk_incremental.stamp
[17442/19540] STAMP obj/android_webview/test/test.stamp


--------------------------------------------------------------------------------
started: Fri Mar 18 06:11:23 2016
ended: Fri Mar 18 07:11:50 2016
duration: 1 hrs, 26 secs
status: FAILURE
status reason: return code was -1.

Just abruptly dies in the middle of building.
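How far the build got can be read from the last ninja status line: by default ninja prefixes each step with "[finished/total] ". A small sketch that extracts it from a log tail follows; last_progress is a hypothetical helper, not part of any Chromium tool.

```python
import re

# ninja's default status prefix is "[finished/total] ".
NINJA_STATUS = re.compile(r"^\[(\d+)/(\d+)\]")


def last_progress(log_lines):
    """Return (finished, total) from the last ninja status line, or None."""
    for line in reversed(log_lines):
        match = NINJA_STATUS.match(line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


if __name__ == "__main__":
    tail = [
        "[17441/19540] STAMP obj/android_webview/test/android_webview_test_apk_incremental.stamp",
        "[17442/19540] STAMP obj/android_webview/test/test.stamp",
        "",
        "status: FAILURE",
    ]
    done, total = last_progress(tail)
    print(f"{done}/{total} = {100 * done / total:.1f}%")  # prints 17442/19540 = 89.3%
```

For the log above, ninja had finished 17442 of 19540 steps (about 89%) when the step was killed, consistent with a build that dies mid-run rather than one that never started.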
Dropping the 'all' target and adding minimal_symbols explicitly.

Comment 19 by bugdroid1@chromium.org, Mar 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/210a55755d89d07f60f09645960bbf9dfd1d67b0

commit 210a55755d89d07f60f09645960bbf9dfd1d67b0
Author: jbudorick <jbudorick@chromium.org>
Date: Fri Mar 18 16:28:53 2016

[Android] Remove all target from and add minimal_symbols to Android GN.

BUG=595798
TBR=dpranke@chromium.org

Review URL: https://codereview.chromium.org/1807013005

Cr-Commit-Position: refs/heads/master@{#381987}

[modify] https://crrev.com/210a55755d89d07f60f09645960bbf9dfd1d67b0/testing/buildbot/chromium.linux.json
[modify] https://crrev.com/210a55755d89d07f60f09645960bbf9dfd1d67b0/tools/mb/mb_config.pyl

Components: Infra>Client>Android
Labels: -Infra-Android
Status: Fixed (was: Available)
Looks like John dropped the 'all' target, and I think this problem went away.

Compile now takes ~20 min.
(also we want to turn this bot down now)
