Flaky slow compile causes android_optional_gpu_tests_rel timeout
Issue description

In https://codereview.chromium.org/2498873005:
https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel/builds/1011 failed, but
https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel/builds/1013 succeeded.

The error is at the end of trace_test:

(INFO) 2016-11-14 17:40:36,202 atexit_with_log._wrapped_function:10 Try running <bound method ChromeTracingAgent._RemoveTraceConfigFile of <telemetry.internal.platform.tracing_agent.chrome_tracing_agent.ChromeTracingAgent object at 0x7f1dca1815d0>>
(INFO) 2016-11-14 17:40:36,202 atexit_with_log._wrapped_function:12 Did run <bound method ChromeTracingAgent._RemoveTraceConfigFile of <telemetry.internal.platform.tracing_agent.chrome_tracing_agent.ChromeTracingAgent object at 0x7f1dca1815d0>>
(INFO) 2016-11-14 17:40:36,202 atexit_with_log._wrapped_function:10 Try running <bound method AndroidForwarder.Close of <telemetry.internal.forwarders.android_forwarder.AndroidForwarder object at 0x7f1dca173950>>
(INFO) 2016-11-14 17:40:36,202 atexit_with_log._wrapped_function:12 Did run <bound method AndroidForwarder.Close of <telemetry.internal.forwarders.android_forwarder.AndroidForwarder object at 0x7f1dca173950>>
(INFO) 2016-11-14 17:40:36,202 atexit_with_log._wrapped_function:10 Try running <bound method TsProxyServer.StopServer of <telemetry.internal.util.ts_proxy_server.TsProxyServer object at 0x7f1dca173710>>
(WARNING) 2016-11-14 17:40:36,202 ts_proxy_server.StopServer:126 Attempting to stop TsProxy server that is not running.

Repeated many times. After which:

command timed out: 6900 seconds elapsed, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=6900.012828
,
Nov 15 2016
trace_test is marked as succeeded, but the next test, webgl_conformance_tests, is marked as failed.
,
Nov 15 2016
The swarming link I gave in #1 is about webgl_conformance_tests. The "exit code" is 0.
,
Nov 15 2016
,
Nov 15 2016
This just happened again: https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel/builds/1023

I think TsProxy is a red herring, and what actually happens is that the whole bot times out. The successful run took 1 hrs, 28 mins, 28 secs, while both 1011 and 1023 were terminated after 1 hrs, 55 mins, 0 secs. There is also the "command timed out: 6900 seconds elapsed, attempting to kill" message.

I think what stalls it is compilation. In the successful build 1013 it takes 33 mins, 19 secs, while in 1011 it takes 1 hrs, 32 mins, 25 secs and in 1023 it takes 1 hrs, 19 mins, 21 secs.

Maybe we should increase the bot timeout while we figure out why compilation is sometimes slow?
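The arithmetic behind this diagnosis can be checked with a short sketch. It uses only the durations quoted in this comment, and it assumes (this is not measured in the thread) that the non-compile steps of the slow builds would have taken as long as they did in the successful build 1013. The helper names are illustrative.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


def fmt(total):
    """Format whole seconds as H:MM:SS."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


# Timeout reported in the log: "command timed out: 6900 seconds elapsed".
TIMEOUT_S = 6900

# Durations quoted for the successful build 1013.
good_total = to_seconds(1, 28, 28)
good_compile = to_seconds(0, 33, 19)

# Assumption: everything except compile takes the same time in every build.
non_compile = good_total - good_compile

# Compile durations quoted for the two timed-out builds.
slow_compiles = {
    1011: to_seconds(1, 32, 25),
    1023: to_seconds(1, 19, 21),
}


def projected_total(compile_s):
    """Estimated full build time for a given compile duration."""
    return non_compile + compile_s


if __name__ == "__main__":
    print("timeout:", fmt(TIMEOUT_S))
    for build, compile_s in slow_compiles.items():
        total = projected_total(compile_s)
        print(build, fmt(total), "exceeds timeout:", total > TIMEOUT_S)
```

Under that assumption, both slow builds would need well over the 1 hour 55 minutes available, which is consistent with the compile step, rather than TsProxy, being what pushes the bot past its timeout.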
,
Nov 15 2016
I agree with increasing bot timeout for now. Do we have monitoring for compile time?
,
Nov 17 2016
This is starting to be quite frequent: https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel/builds/1037 https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel/builds/1049 Ken, do you know how to increase bot timeout?
,
Nov 17 2016
I'm not sure where this timeout (apparently of 1 hour 55 minutes, or possibly, ~2 hours) is specified. I grepped through the sources in tools/build/scripts/slave and didn't find anything obvious. bpastene@ / stip@ / phajdan.jr@, do you know?
,
Nov 17 2016
It's configured here: https://chromium.googlesource.com/chromium/tools/build/+/84cafd0fe7a6ec51a0d4f81fe928adc6e828c610/masters/master.tryserver.chromium.android/builders.pyl#201 Note that changing it would require a master restart.
,
Nov 17 2016
As for compile time monitoring, you can see how compile has jumped a bit in duration this week in the per-step graphs for this bot: https://viceroy.corp.google.com/chrome_infra/Buildbot/per_builder?builder=android_optional_gpu_tests_rel&duration=30d&master=master.tryserver.chromium.android&refresh=-1
,
Nov 17 2016
Thanks Ben for the pointer. Yuly, would you like to put up a CL doubling the timeout for this bot, assuming we won't be hogging Swarming resources for that long?
,
Nov 17 2016
,
Dec 5 2016
Issue 671377 has been merged into this issue.
,
Dec 5 2016
As it happens, this is one of the last sources of flakiness visibly affecting Issue 596622. Bumping to P2.
,
Dec 6 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build.git/+/13fbeebd3f1ee6e2618ed679c8900dce0ea7b085

commit 13fbeebd3f1ee6e2618ed679c8900dce0ea7b085
Author: Yuly Novikov <ynovikov@google.com>
Date: Mon Dec 05 23:12:55 2016

Increase android_optional_gpu_tests_rel builder timeout.

This bot's compile step takes 15 minutes on average, and in these cases the bot finishes in about 1 hour. Sometimes a CL will make goma's cache irrelevant, and compile time increases approximately by 1 hour, causing bot to time out. Thus, increasing bot timeout from 2 hours to 3 hours.

BUG= chromium:665492
Change-Id: I476fcc3b812a17092fac13cd46fa9792d4d50c01
Reviewed-on: https://chromium-review.googlesource.com/416198
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Yuly Novikov <ynovikov@chromium.org>

[modify] https://crrev.com/13fbeebd3f1ee6e2618ed679c8900dce0ea7b085/masters/master.tryserver.chromium.android/builders.pyl
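For readers unfamiliar with .pyl files: they hold a Python literal, so a per-builder timeout can be read back with `ast.literal_eval`. The sketch below is illustrative only. The excerpt is made up for this example, and the key name `builder_timeout_s` and the nesting are assumptions, not copied from the actual builders.pyl. Only the values (2 hours before, 3 hours after) come from the commit message.

```python
import ast

# Hypothetical excerpt in the style of a builders.pyl file. The key name
# "builder_timeout_s" and the layout are assumptions made for illustration.
PYL_TEXT = """
{
  'builders': {
    'android_optional_gpu_tests_rel': {
      'builder_timeout_s': 10800,  # 3 hours, up from 2 hours (7200)
    },
  },
}
"""


def builder_timeout_seconds(pyl_text, builder):
    """Return the timeout, in seconds, configured for one builder."""
    # literal_eval only accepts Python literals, so no code is executed.
    config = ast.literal_eval(pyl_text.strip())
    return config['builders'][builder]['builder_timeout_s']


if __name__ == "__main__":
    timeout = builder_timeout_seconds(PYL_TEXT, 'android_optional_gpu_tests_rel')
    print(timeout / 3600, "hours")
```

As noted earlier in the thread, a change like this only takes effect after the master is restarted.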
,
Dec 6 2016
Restart of tryserver.chromium.android scheduled for ~7 PM Pacific time today per https://chrome-internal-review.googlesource.com/309605 .
,
Dec 6 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/master-manager.git/+/0f6b668644c982de438a702d36c31532028877f5

commit 0f6b668644c982de438a702d36c31532028877f5
Author: Kenneth Russell <kbr@chromium.org>
Date: Tue Dec 06 00:11:51 2016
,
Dec 6 2016
,
Dec 6 2016
I think the master restart took effect but am not 100% sure how to tell. I don't think there were any active builds on android_optional_gpu_tests_rel so there were none which reported "slave lost":
https://build.chromium.org/p/tryserver.chromium.android/builders/android_optional_gpu_tests_rel

There were several on the main N5X tryserver:
https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel?numbuilds=200

Yuly, would you be willing to watch android_optional_gpu_tests_rel for another day or two and see if this problem happens again? Or, feel free to close this as fixed and please watch chromium-try-flakes (the link is in Issue 596622, but for reference it's this):
https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiR3ZWJnbF9jb25mb3JtYW5jZV90ZXN0cyAod2l0aCBwYXRjaCkM
and see if android_optional_gpu_tests_rel shows up again. Thanks.
,
Dec 6 2016
Sure, I'll watch it.
,
Dec 19 2016
,
Dec 19 2016
,
Dec 19 2016
I think this is fixed, but for some reason we didn't have slow compiles last week, so the bot finishes in about an hour.
,
Dec 19 2016
Thanks Yuly for following up. Let's close this to keep the bug list manageable, and re-open if it seems necessary.
Comment 1 by nedngu...@google.com
, Nov 15 2016