
Issue 665492

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug

Blocked on:
issue 596622




Flaky slow compile causes android_optional_gpu_tests_rel timeout

Reported by ynovikov@chromium.org, Nov 15 2016

Issue description

In https://codereview.chromium.org/2498873005:
https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel/builds/1011 failed, but
https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel/builds/1013 succeeded.

The error appears at the end of the trace_test output:

(INFO) 2016-11-14 17:40:36,202 atexit_with_log._wrapped_function:10  Try running <bound method ChromeTracingAgent._RemoveTraceConfigFile of <telemetry.internal.platform.tracing_agent.chrome_tracing_agent.ChromeTracingAgent object at 0x7f1dca1815d0>>
(INFO) 2016-11-14 17:40:36,202 atexit_with_log._wrapped_function:12  Did run <bound method ChromeTracingAgent._RemoveTraceConfigFile of <telemetry.internal.platform.tracing_agent.chrome_tracing_agent.ChromeTracingAgent object at 0x7f1dca1815d0>>
(INFO) 2016-11-14 17:40:36,202 atexit_with_log._wrapped_function:10  Try running <bound method AndroidForwarder.Close of <telemetry.internal.forwarders.android_forwarder.AndroidForwarder object at 0x7f1dca173950>>
(INFO) 2016-11-14 17:40:36,202 atexit_with_log._wrapped_function:12  Did run <bound method AndroidForwarder.Close of <telemetry.internal.forwarders.android_forwarder.AndroidForwarder object at 0x7f1dca173950>>
(INFO) 2016-11-14 17:40:36,202 atexit_with_log._wrapped_function:10  Try running <bound method TsProxyServer.StopServer of <telemetry.internal.util.ts_proxy_server.TsProxyServer object at 0x7f1dca173710>>
(WARNING) 2016-11-14 17:40:36,202 ts_proxy_server.StopServer:126  Attempting to stop TsProxy server that is not running.

This block is repeated many times, after which:
command timed out: 6900 seconds elapsed, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=6900.012828

Components: Infra
The Swarming log at https://chromium-swarm.appspot.com/task?id=327e727192744d10&refresh=10&show_raw=1 (the "Shard 0" link in https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel/builds/1011) shows that the test actually succeeds. I think this is an infra issue.
trace_test is marked as succeeded, but the next test, webgl_conformance_tests, is marked as failed.
The swarming link I gave in #1 is about webgl_conformance_tests. The "exit code" is 0.

Comment 4 by kbr@chromium.org, Nov 15 2016

Components: -Infra Internals>GPU>Testing Infra>Client>Android
Summary: Flaky slow compile causes android_optional_gpu_tests_rel timeout (was: Flaky TsProxy failure on android_optional_gpu_tests_rel bot)
This just happened again.
https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel/builds/1023
I think TsProxy is a red herring, and what happens is that the whole bot times out.
Successful run is 1 hrs, 28 mins, 28 secs, while both 1011 and 1023 terminated after 1 hrs, 55 mins, 0 secs.
There is also the "command timed out: 6900 seconds elapsed, attempting to kill" message.

I think what stalls it is compilation. In the successful build 1013, compile takes 33 mins, 19 secs, while in 1011 it takes 1 hrs, 32 mins, 25 secs and in 1023 it takes 1 hrs, 19 mins, 21 secs.
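As a rough check of this theory, the Python sketch below replays the three builds' compile times against the 6900-second limit. It assumes, hypothetically, that all non-compile steps take as long as they did in the successful build 1013 (1 hrs, 28 mins, 28 secs in total, minus 33 mins, 19 secs of compile). The helper names are illustrative only.

```python
# Hypothetical replay of the compile timings against the observed 6900 s limit.
# Assumption: every non-compile step takes as long as in successful build 1013.

def to_seconds(hms: str) -> int:
    """Convert 'H:MM:SS' or 'MM:SS' into a number of seconds."""
    total = 0
    for part in hms.split(":"):
        total = total * 60 + int(part)
    return total


def fmt(seconds: int) -> str:
    """Format a number of seconds as 'H:MM:SS'."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


TIMEOUT_S = 6900  # from "command timed out: 6900 seconds elapsed"

# Build 1013 (success): total run time and compile time.
OK_TOTAL = to_seconds("1:28:28")
OK_COMPILE = to_seconds("33:19")
NON_COMPILE = OK_TOTAL - OK_COMPILE  # time spent outside the compile step

COMPILE_TIMES = {
    1013: "33:19",
    1011: "1:32:25",
    1023: "1:19:21",
}


def projected_total(compile_hms: str) -> int:
    """Projected run time if only the compile step changes."""
    return to_seconds(compile_hms) + NON_COMPILE


if __name__ == "__main__":
    print(f"limit: {fmt(TIMEOUT_S)}")
    for build, compile_hms in COMPILE_TIMES.items():
        total = projected_total(compile_hms)
        verdict = "fits" if total <= TIMEOUT_S else "times out"
        print(f"build {build}: compile {compile_hms}, projected {fmt(total)} -> {verdict}")
```

Under that assumption, builds 1011 and 1023 project to about 2:27:34 and 2:14:30, both past the 1:55:00 budget, while 1013 fits.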

Maybe we should increase the bot timeout while we figure out why compilation is sometimes slow?
I agree with increasing the bot timeout for now.

Do we have monitoring for compile time?

Comment 8 by kbr@chromium.org, Nov 17 2016

Cc: phajdan.jr@chromium.org stip@chromium.org bpastene@chromium.org
I'm not sure where this timeout (apparently 1 hour 55 minutes, or possibly ~2 hours) is specified. I grepped through the sources in tools/build/scripts/slave and didn't find anything obvious. bpastene@ / stip@ / phajdan.jr@, do you know?

As for compile time monitoring, the per-step graphs for this bot show that compile duration has jumped a bit this week:

https://viceroy.corp.google.com/chrome_infra/Buildbot/per_builder?builder=android_optional_gpu_tests_rel&duration=30d&master=master.tryserver.chromium.android&refresh=-1

Comment 11 by kbr@chromium.org, Nov 17 2016

Thanks Ben for the pointer.

Yuly, would you like to put up a CL doubling the timeout for this bot, assuming we won't be hogging Swarming resources for that long?

Comment 12 by kbr@chromium.org, Nov 17 2016

Owner: ynovikov@chromium.org
Status: Assigned (was: Untriaged)

Comment 13 by kbr@chromium.org, Dec 5 2016

Cc: jbudorick@chromium.org mar...@chromium.org ynovikov@chromium.org
Issue 671377 has been merged into this issue.

Comment 14 by kbr@chromium.org, Dec 5 2016

Blocking: 596622
Labels: -Pri-3 Pri-2
As it happens, this is one of the last visible sources of flakiness affecting Issue 596622. Bumping to P2.


Comment 15 by bugdroid1@chromium.org, Dec 6 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/13fbeebd3f1ee6e2618ed679c8900dce0ea7b085

commit 13fbeebd3f1ee6e2618ed679c8900dce0ea7b085
Author: Yuly Novikov <ynovikov@google.com>
Date: Mon Dec 05 23:12:55 2016

Increase android_optional_gpu_tests_rel builder timeout.

This bot's compile step takes 15 minutes on average,
and in these cases the bot finishes in about 1 hour.
Sometimes a CL will make goma's cache irrelevant,
and compile time increases approximately by 1 hour, causing bot to time out.
Thus, increasing bot timeout from 2 hours to 3 hours.

BUG=chromium:665492

Change-Id: I476fcc3b812a17092fac13cd46fa9792d4d50c01
Reviewed-on: https://chromium-review.googlesource.com/416198
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Yuly Novikov <ynovikov@chromium.org>

[modify] https://crrev.com/13fbeebd3f1ee6e2618ed679c8900dce0ea7b085/masters/master.tryserver.chromium.android/builders.pyl
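The sizing argument in the commit message can be checked with a little arithmetic. This sketch uses only the commit's approximate figures (about 1 hour per run with a warm goma cache, about 1 extra hour of compile when the cache is cold, and a 2-hour versus 3-hour timeout); the `headroom` helper is hypothetical.

```python
# Hypothetical headroom check using the approximate figures from the commit message.
TYPICAL_TOTAL_MIN = 60       # bot finishes "in about 1 hour" with a warm goma cache
COLD_CACHE_PENALTY_MIN = 60  # compile time grows "approximately by 1 hour"
OLD_TIMEOUT_MIN = 120        # old timeout: 2 hours
NEW_TIMEOUT_MIN = 180        # new timeout: 3 hours


def headroom(timeout_min: int, penalty_min: int = COLD_CACHE_PENALTY_MIN) -> int:
    """Minutes left before the timeout in the slow-compile case (negative means killed)."""
    worst_case = TYPICAL_TOTAL_MIN + penalty_min
    return timeout_min - worst_case


if __name__ == "__main__":
    for label, timeout in (("old", OLD_TIMEOUT_MIN), ("new", NEW_TIMEOUT_MIN)):
        print(f"{label} timeout {timeout} min: headroom {headroom(timeout)} min")
```

With the stated 1-hour penalty, the old 2-hour limit leaves no margin at all, so any slightly slower run is killed. Build 1011's compile (1 hrs, 32 mins, 25 secs) was about 77 minutes over the 15-minute average: `headroom(OLD_TIMEOUT_MIN, 77)` is -17, while `headroom(NEW_TIMEOUT_MIN, 77)` is 43.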

Comment 16 by kbr@chromium.org, Dec 6 2016

Restart of tryserver.chromium.android scheduled for ~7 PM Pacific time today per https://chrome-internal-review.googlesource.com/309605.


Comment 17 by bugdroid1@chromium.org, Dec 6 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager.git/+/0f6b668644c982de438a702d36c31532028877f5

commit 0f6b668644c982de438a702d36c31532028877f5
Author: Kenneth Russell <kbr@chromium.org>
Date: Tue Dec 06 00:11:51 2016

Comment 18 by kbr@chromium.org, Dec 6 2016

Status: Started (was: Assigned)

Comment 19 by kbr@chromium.org, Dec 6 2016

I think the master restart took effect, but I'm not 100% sure how to tell. I don't think there were any active builds on android_optional_gpu_tests_rel, so none of them reported "slave lost":
https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel

There were several on the main N5X tryserver:
https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel?numbuilds=200

Yuly, would you be willing to watch android_optional_gpu_tests_rel for another day or two and see if this problem happens again? Or, feel free to close this as fixed and please watch chromium-try-flakes (the link is in Issue 596622, but for reference it's this):
https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiR3ZWJnbF9jb25mb3JtYW5jZV90ZXN0cyAod2l0aCBwYXRjaCkM

and see if android_optional_gpu_tests_rel shows up again.

Thanks.

Sure, I'll watch it.

Comment 21 by kbr@chromium.org, Dec 19 2016

Blocking: -596622

Comment 22 by kbr@chromium.org, Dec 19 2016

Blockedon: 596622
I think this is fixed, but for some reason we didn't have slow compiles last week, so the bot finished in about an hour.

Comment 24 by kbr@chromium.org, Dec 19 2016

Status: Fixed (was: Started)
Thanks Yuly for following up. Let's close this to keep the bug list manageable, and re-open if it seems necessary.
