
Issue 590943

Starred by 2 users

Issue metadata

Status: Fixed
Owner: ----
Closed: May 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Intermittent goma failure on multiple tryservers

Project Member Reported by kbr@chromium.org, Mar 1 2016

Issue description

https://build.chromium.org/p/tryserver.chromium.android/builders/cast_shell_android/builds/29154

/usr/bin/python /b/build/goma/goma_ctl.py ensure_start
creating crash dump dir (/tmp/goma_crash.chrome-bot).
17512
Using goma VERSION=99 (no_auto_update)
GOMA version b1b0d518a1273b5ae920bd4805b899acc23a9e15@1455784174
waiting for compiler_proxy...
waiting for compiler_proxy...
waiting for compiler_proxy...
waiting for compiler_proxy...
waiting for compiler_proxy...
waiting for compiler_proxy...
waiting for compiler_proxy...
waiting for compiler_proxy...
waiting for compiler_proxy...
waiting for compiler_proxy...
waiting for compiler_proxy...
compiler proxy (pid=17512,17497) status: http://127.0.0.1:8088 error: timed out to send request to backend servers
Traceback (most recent call last):
  File "/b/build/goma/goma_ctl.py", line 2292, in <module>
    sys.exit(main())
  File "/b/build/goma/goma_ctl.py", line 2287, in main
    goma.Dispatch(sys.argv[1:])
  File "/b/build/goma/goma_ctl.py", line 1009, in Dispatch
    self._action_mappings.get(args[0], self._DefaultAction)()
  File "/b/build/goma/goma_ctl.py", line 645, in _EnsureStartCompilerProxy
    self._GenericStartCompilerProxy(ensure=True)
  File "/b/build/goma/goma_ctl.py", line 639, in _GenericStartCompilerProxy
    raise Error('Failed to start compiler_proxy successfully.')
__main__.Error: Failed to start compiler_proxy successfully.

/usr/bin/python /b/build/goma/goma_ctl.py jsonstatus /tmp/tmpGd6VlI.json

/usr/bin/python /b/build/goma/goma_ctl.py stop
Killing compiler proxy.
compiler proxy status: http://127.0.0.1:8088 quit!

/b/build/scripts/slave/gsutil cp file:///tmp/tmpS0pqMg gs://chrome-goma-log/2016/03/01/slave106-c4/compiler_proxy.slave106-c4.chrome-bot.log.INFO.20160229-165705.17512.gz
Copying file:///tmp/tmpS0pqMg [Content-Type=application/octet-stream]...
Uploading   ...-c4.chrome-bot.log.INFO.20160229-165705.17512.gz: 0 B/73.26 KiB    
Uploading   ...-c4.chrome-bot.log.INFO.20160229-165705.17512.gz: 72 KiB/73.26 KiB    
Uploading   ...-c4.chrome-bot.log.INFO.20160229-165705.17512.gz: 73.26 KiB/73.26 KiB    
Copied log file to gs://chrome-goma-log/2016/03/01/slave106-c4/compiler_proxy.slave106-c4.chrome-bot.log.INFO.20160229-165705.17512.gz
Visualization at http://chromium-build-stats.appspot.com/compiler_proxy_log/2016/03/01/slave106-c4/compiler_proxy.slave106-c4.chrome-bot.log.INFO.20160229-165705.17512.gz

/usr/bin/python /opt/infra-python/run.py infra.tools.send_monitoring_event --event-mon-run-type prod --build-event-type BUILD --event-mon-timestamp-kind POINT --event-logrequest-path /b/build/slave/cast_shell_android/build/.recipe_runtime/tmpIRVar6/build_data/log_request_proto --build-event-goma-stats-path /b/build/slave/cast_shell_android/build/.recipe_runtime/tmpIRVar6/build_data/goma_stats_proto
error: failed to start goma; fallback has been disabled
Traceback (most recent call last):
  File "/b/build/scripts/slave/compile.py", line 1317, in <module>
    sys.exit(real_main())
  File "/b/build/scripts/slave/compile.py", line 1313, in real_main
    return main(options, args)
  File "/b/build/scripts/slave/compile.py", line 841, in main_ninja
    goma_ready = goma_setup(options, env)
  File "/b/build/scripts/slave/compile.py", line 200, in goma_setup
    raise Exception('failed to start goma')
Exception: failed to start goma
step returned non-zero exit code: 1
json.output:
{
  "notice": [
    {
      "infra_status": {
        "num_compiler_info_fail": 0,
        "num_compiler_info_miss": 0,
        "num_exec_compiler_proxy_failure": 0,
        "num_exec_fail_fallback": 0,
        "num_http_active": 0,
        "num_http_error": 0,
        "num_http_retry": 0,
        "num_http_sent": 3,
        "num_http_timeout": 0,
        "num_network_error": 0,
        "num_network_recovered": 0,
        "num_user_error": 0,
        "num_user_warning": 0,
        "ping_status_code": 408
      },
      "version": 1
    }
  ]
}
@@@STEP_EXCEPTION@@@
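Every counter in the json.output log is zero; the only sign of trouble is `"ping_status_code": 408`, which is HTTP 408 Request Timeout and agrees with the `timed out to send request to backend servers` message from `ensure_start`. Below is a minimal Python sketch for flagging this from a `goma_ctl.py jsonstatus` file. It uses a trimmed copy of the JSON above, assumes a healthy ping reports 200, and `summarize_infra_status` is a hypothetical helper, not part of goma_ctl.py.

```python
import json

# Trimmed copy of the json.output step log above (goma_ctl.py jsonstatus).
STATUS_JSON = """
{
  "notice": [
    {
      "infra_status": {
        "num_http_error": 0,
        "num_http_sent": 3,
        "num_http_timeout": 0,
        "num_network_error": 0,
        "ping_status_code": 408
      },
      "version": 1
    }
  ]
}
"""


def summarize_infra_status(status_text):
    """Return human-readable problems found in each notice's infra_status."""
    problems = []
    for notice in json.loads(status_text).get("notice", []):
        infra = notice.get("infra_status", {})
        code = infra.get("ping_status_code")
        # Assumption: a healthy backend ping reports HTTP 200.
        if code is not None and code != 200:
            # HTTP 408 is "Request Timeout": the backend ping did not complete.
            problems.append(f"backend ping failed with HTTP {code}")
        # Non-zero error counters are reported as well.
        for key in ("num_http_error", "num_http_timeout", "num_network_error"):
            if infra.get(key, 0) > 0:
                problems.append(f"{key}={infra[key]}")
    return problems


print(summarize_infra_status(STATUS_JSON))  # → ['backend ping failed with HTTP 408']
```

Checking the ping status separately matters here because, in this failure, the HTTP counters alone would suggest nothing went wrong.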


From https://build.chromium.org/p/tryserver.chromium.android/builders/cast_shell_android?numbuilds=200, it looks like there are several such failures. Needs investigation. CC'ing sheriffs. Not sure which label to use; there's no auto-complete for Sheriff-related labels.

 

Comment 1 by kbr@chromium.org, Mar 1 2016

Labels: Infra-CommitQueue

Comment 2 by kbr@chromium.org, Mar 1 2016

Summary: Intermittent goma failure on multiple tryservers (was: Intermittent goma failure on cast_shell_android tryserver)
Updating synopsis. Also seen on linux_chromium_asan_rel_ng:

https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/123653

Labels: -Infra-CommitQueue Infra-Goma
Cc: ukai@chromium.org shinyak@chromium.org yyanagisawa@chromium.org

Comment 5 by ukai@chromium.org, Mar 25 2016

I believe VERSION=102 handles this case better.
Is this still happening?
It seems to happen less often since the version 102 release? See https://goto.google.com/ytlnx

I realize this question comes a bit late, but...
Which is preferred?
a. The goma client gives up and warns about a network/server error as soon as possible.
b. The goma client retries on network errors for as long as possible.

In other words, how long can people wait for goma setup? When is it OK to give up? Since goma is a client/server application, it sometimes needs to give up and warn when the network and/or the server has an issue.

On March 1st, the behavior was:
give up after 10 seconds, retry at 4-second intervals, and never reuse an IP address that had an issue.

Currently:
give up after 30 seconds, retry at 10-second intervals, and avoid IP addresses that had an issue until all IP addresses have been used up, after which they may be used again.
(In practice, a certain number of cases take 10 seconds: https://goto.google.com/ocyno)
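To make the two retry policies in this comment concrete, here is a minimal Python sketch. It is not goma client code: `plan_attempts`, `AddressPicker`, `connect_with_policy`, and `flaky_backend` are hypothetical names, and the sketch assumes attempts start at t=0, are spaced exactly one retry interval apart, take no time themselves, and that the picker simply takes the first address that has not had an issue.

```python
def plan_attempts(give_up_after, retry_interval):
    """Return scheduled start times (seconds) of attempts before the deadline.

    Assumes the first attempt is at t=0 and retries are exactly retry_interval
    apart; an attempt scheduled at or after give_up_after is not made.
    """
    times = []
    t = 0
    while t < give_up_after:
        times.append(t)
        t += retry_interval
    return times


class AddressPicker:
    """Pick backend IP addresses, avoiding ones that had an issue.

    avoid_forever=True models the March 1st behavior (never reuse a bad
    address); avoid_forever=False models the current behavior (reuse bad
    addresses once every address has had an issue).
    """

    def __init__(self, addresses, avoid_forever):
        self.addresses = list(addresses)
        self.avoid_forever = avoid_forever
        self.bad = set()

    def mark_bad(self, addr):
        self.bad.add(addr)

    def pick(self):
        good = [a for a in self.addresses if a not in self.bad]
        if good:
            return good[0]
        if self.avoid_forever:
            return None  # No usable address left.
        # Every address has had an issue: forget the history and start over.
        self.bad.clear()
        return self.addresses[0]


def connect_with_policy(try_connect, addresses, give_up_after, retry_interval,
                        avoid_forever):
    """Return (address, scheduled_time) on success, or (None, time) on give-up."""
    picker = AddressPicker(addresses, avoid_forever)
    for t in plan_attempts(give_up_after, retry_interval):
        addr = picker.pick()
        if addr is None:
            return None, t
        if try_connect(addr):
            return addr, t
        picker.mark_bad(addr)
    return None, give_up_after


def flaky_backend(fail_first):
    """Fake connector whose first fail_first calls fail (a transient outage)."""
    calls = {"n": 0}

    def try_connect(addr):
        calls["n"] += 1
        return calls["n"] > fail_first

    return try_connect


ADDRS = ["10.0.0.1", "10.0.0.2"]
print(plan_attempts(10, 4))   # → [0, 4, 8]
print(plan_attempts(30, 10))  # → [0, 10, 20]
# March 1st parameters: both addresses are marked bad, so it gives up at t=8.
print(connect_with_policy(flaky_backend(2), ADDRS, 10, 4, True))    # → (None, 8)
# Current parameters: a previously bad address is retried at t=20 and works.
print(connect_with_policy(flaky_backend(2), ADDRS, 30, 10, False))  # → ('10.0.0.1', 20)
```

With a backend whose first two requests fail, the March 1st parameters run out of addresses and give up, while the current parameters survive the transient outage. This is the trade-off between options a and b above: tolerating transient errors costs up to 30 seconds of waiting before the client reports a failure.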

Comment 7 by bugdroid1@chromium.org, Apr 4 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/goma/client/+/9fd3d9653daa3669e7e7c8e302097d0dc2b5a321

commit 9fd3d9653daa3669e7e7c8e302097d0dc2b5a321
Author: Yoshisato Yanagisawa <yyanagisawa@google.com>
Date: Fri Apr 01 01:09:51 2016

Comment 8 by benhenry@google.com, Apr 27 2016

Components: Infra>Goma
Labels: -Infra-Goma

Comment 9 by ukai@chromium.org, May 20 2016

Still failing?

Comment 10 by kbr@chromium.org, May 20 2016

Not as far as I can tell. Shall we call this fixed by the above commit?

Comment 11 by ukai@chromium.org, May 20 2016

Status: Fixed (was: Untriaged)
Let's mark this as fixed.
If it happens again, please reopen it or file a new bug.
