CanaryCompletion failing due to GoB flakiness |
|||||||
Issue description
As previously observed in issue 800949, I'm seeing CanaryCompletion failures like this:
00:48:25: INFO: Updating manifest-versions checkout.
[1;31m00:50:20: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:50:20: ERROR: Retrying to update the status: Retry 1/20[0m
00:50:20: INFO: Updating manifest-versions checkout.
[1;31m00:50:58: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:50:58: ERROR: Retrying to update the status: Retry 2/20[0m
00:50:58: INFO: Updating manifest-versions checkout.
[1;31m00:51:36: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:51:36: ERROR: Retrying to update the status: Retry 3/20[0m
00:51:36: INFO: Updating manifest-versions checkout.
[1;31m00:52:14: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:52:14: ERROR: Retrying to update the status: Retry 4/20[0m
00:52:14: INFO: Updating manifest-versions checkout.
[1;31m00:52:52: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:52:52: ERROR: Retrying to update the status: Retry 5/20[0m
00:52:52: INFO: Updating manifest-versions checkout.
[1;31m00:53:30: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:53:30: ERROR: Retrying to update the status: Retry 6/20[0m
00:53:30: INFO: Updating manifest-versions checkout.
[1;31m00:54:07: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:54:07: ERROR: Retrying to update the status: Retry 7/20[0m
00:54:07: INFO: Updating manifest-versions checkout.
[1;31m00:54:45: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:54:45: ERROR: Retrying to update the status: Retry 8/20[0m
00:54:45: INFO: Updating manifest-versions checkout.
[1;31m00:55:23: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:55:23: ERROR: Retrying to update the status: Retry 9/20[0m
00:55:23: INFO: Updating manifest-versions checkout.
[1;31m00:56:01: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:56:01: ERROR: Retrying to update the status: Retry 10/20[0m
00:56:01: INFO: Updating manifest-versions checkout.
[1;31m00:56:40: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:56:40: ERROR: Retrying to update the status: Retry 11/20[0m
00:56:40: INFO: Updating manifest-versions checkout.
[1;31m00:57:19: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:57:19: ERROR: Retrying to update the status: Retry 12/20[0m
00:57:19: INFO: Updating manifest-versions checkout.
[1;31m00:57:56: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:57:56: ERROR: Retrying to update the status: Retry 13/20[0m
00:57:57: INFO: Updating manifest-versions checkout.
[1;31m00:58:34: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:58:34: ERROR: Retrying to update the status: Retry 14/20[0m
00:58:34: INFO: Updating manifest-versions checkout.
[1;31m00:59:12: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:59:12: ERROR: Retrying to update the status: Retry 15/20[0m
00:59:12: INFO: Updating manifest-versions checkout.
[1;31m00:59:49: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m00:59:49: ERROR: Retrying to update the status: Retry 16/20[0m
00:59:49: INFO: Updating manifest-versions checkout.
[1;31m01:00:27: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m01:00:27: ERROR: Retrying to update the status: Retry 17/20[0m
01:00:27: INFO: Updating manifest-versions checkout.
[1;31m01:01:04: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m01:01:04: ERROR: Retrying to update the status: Retry 18/20[0m
01:01:04: INFO: Updating manifest-versions checkout.
[1;31m01:01:42: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m01:01:42: ERROR: Retrying to update the status: Retry 19/20[0m
01:01:42: INFO: Updating manifest-versions checkout.
[1;31m01:02:19: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m01:02:19: ERROR: Retrying to update the status: Retry 20/20[0m
01:02:19: INFO: Updating manifest-versions checkout.
[1;31m01:02:57: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal[0m
[1;31m01:02:57: ERROR: Retrying to update the status: Retry 21/20[0m
01:02:57: INFO: Updating manifest-versions checkout.
[1;31m01:03:18: ERROR: Traceback (most recent call last):
File "/b/c/cbuild/repository/chromite/cbuildbot/stages/generic_stages.py", line 701, in Run
self.PerformStage()
File "/b/c/cbuild/repository/chromite/cbuildbot/stages/completion_stages.py", line 264, in PerformStage
super(MasterSlaveSyncCompletionStage, self).PerformStage()
File "/b/c/cbuild/repository/chromite/cbuildbot/stages/completion_stages.py", line 84, in PerformStage
message=self.message)
File "/b/c/cbuild/repository/chromite/cbuildbot/manifest_version.py", line 780, in UpdateStatus
raise StatusUpdateException(last_error)
StatusUpdateException: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
[0m
Note that these errors don't look like the expected flakes due to concurrent commit attempts from multiple builders, but something else. Too bad that we don't see any diagnostics / error codes from git.
This affects multiple builders, here are the runs I could find from the red blobs I'm seeing on the waterfall:
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fauron_yuna-release%2F2068%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fchell-release%2F2048%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Felm-release%2F2083%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fenguarde-release%2F2213%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fnami-release%2F345%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Forco-release%2F2086%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fscarlet-release%2F1237%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fveyron_minnie-release%2F2085%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
Adding people on previous bug to CC.
I'll watch the situation to see whether this persists.
,
Apr 18 2018
See this again: https://luci-logdog.appspot.com/v/?s=chromeos/bb/chromeos/eve-arcnext-release/510/+/recipes/steps/CanaryCompletion/0/stdout https://luci-logdog.appspot.com/v/?s=chromeos/bb/chromeos/eve-arcnext-release/510/+/recipes/steps/CanaryCompletion/0/stdout https://luci-logdog.appspot.com/v/?s=chromeos/bb/chromeos/nautilus-release/545/+/recipes/steps/CanaryCompletion/0/stdout ...
,
Apr 18 2018
This looks like another example of Git-on-Borg not meeting its SLO's. We should probably have better logging in this section of the CBuildBot code. Mike recently touched this code; Mike can you improve the error logging here a little bit so we have something to go to the GoB team with?
,
Apr 18 2018
I can look to add some timing to have a better idea of what is going on. The challenge may be that we recently removed the git retry logic, as it is being handled by the git wrapper itself, which may be the better option for logging the timings. I look into both options for the best approach. -- Mike
,
Apr 26 2018
,
Apr 26 2018
,
Apr 26 2018
,
May 11 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/7657336de212e6dba55511f3f07457f9add6e3e3 commit 7657336de212e6dba55511f3f07457f9add6e3e3 Author: Mike Nichols <mikenichols@chromium.org> Date: Fri May 11 19:10:53 2018 [chromite] Capture job_name and git timing for ts_mon in cbuildbot runs. Simple overload of existing configuration to provide a means to capture the builder config as part of the metric schema. It is easy to overload the schema but using the Task schema limits options for capturing more fields. Will need to examine options for passing metric fields to the lower level calls. Wraps RunGit command in SecondsTimer metric to collect overall durations of git calls in metric distributions. BUG= chromium:829289 TEST=`./cbuildbot/run_tests` PASS Change-Id: I23f0fd6e8f4870dffdb87a06b4384f351d93e6c4 Reviewed-on: https://chromium-review.googlesource.com/1050954 Commit-Ready: Mike Nichols <mikenichols@chromium.org> Tested-by: Mike Nichols <mikenichols@chromium.org> Reviewed-by: Jason Clinton <jclinton@chromium.org> [modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/lib/ts_mon_config.py [modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/scripts/cbuildbot.py [modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/scripts/cbuildbot_launch.py [modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/lib/git.py
,
May 11 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f812c075051dd302e9bae4572715bce9a78f8bd9 commit f812c075051dd302e9bae4572715bce9a78f8bd9 Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Date: Fri May 11 21:16:45 2018 Roll src/third_party/chromite/ 746c45d5d..7657336de (3 commits) https://chromium.googlesource.com/chromiumos/chromite.git/+log/746c45d5d8b1..7657336de212 $ git log 746c45d5d..7657336de --date=short --no-merges --format='%ad %ae %s' 2018-05-08 mikenichols [chromite] Capture job_name and git timing for ts_mon in cbuildbot runs. 2018-05-10 khmel arc: Retry copying resource in case race condition for android uprev 2018-05-09 achuith cros_run_vm_test: Support for --host-cmd Created with: roll-dep src/third_party/chromite BUG= chromium:829289 , chromium:841937 , chromium:832374 The AutoRoll server is located here: https://chromite-chromium-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. TBR=chrome-os-gardeners@chromium.org Change-Id: I7712f93eb43ad2661af0fe11a71a9b228ce916ad Reviewed-on: https://chromium-review.googlesource.com/1055816 Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#558020} [modify] https://crrev.com/f812c075051dd302e9bae4572715bce9a78f8bd9/DEPS
,
May 14 2018
,
May 15 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/c8e4e795ed7527807910153dea581ee6ad9a2698 commit c8e4e795ed7527807910153dea581ee6ad9a2698 Author: Ben Pastene <bpastene@chromium.org> Date: Tue May 15 04:29:56 2018 Revert "[chromite] Capture job_name and git timing for ts_mon in cbuildbot runs." This reverts commit 7657336de212e6dba55511f3f07457f9add6e3e3. Reason for revert: Suspected to have blown out our metric memory quota See https://bugs.chromium.org/p/chromium/issues/detail?id=842423 Original change's description: > [chromite] Capture job_name and git timing for ts_mon in cbuildbot runs. > > Simple overload of existing configuration to provide a means to capture > the builder config as part of the metric schema. It is easy to overload > the schema but using the Task schema limits options for capturing more > fields. Will need to examine options for passing metric fields to the > lower level calls. > > Wraps RunGit command in SecondsTimer metric to collect overall durations > of git calls in metric distributions. > > BUG= chromium:829289 > TEST=`./cbuildbot/run_tests` PASS > > Change-Id: I23f0fd6e8f4870dffdb87a06b4384f351d93e6c4 > Reviewed-on: https://chromium-review.googlesource.com/1050954 > Commit-Ready: Mike Nichols <mikenichols@chromium.org> > Tested-by: Mike Nichols <mikenichols@chromium.org> > Reviewed-by: Jason Clinton <jclinton@chromium.org> Bug: chromium:829289 Change-Id: I23497c75a6803b7274d9a40e5584c43aa3bf0616 Reviewed-on: https://chromium-review.googlesource.com/1058207 Commit-Ready: Ben Pastene <bpastene@chromium.org> Tested-by: Ben Pastene <bpastene@chromium.org> Reviewed-by: Jason Clinton <jclinton@chromium.org> Reviewed-by: Mike Nichols <mikenichols@chromium.org> [modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/lib/ts_mon_config.py [modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/scripts/cbuildbot.py [modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/scripts/cbuildbot_launch.py [modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/lib/git.py
,
May 15 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0295b45b3351d71cf8366784d3c0762bbf870e2d commit 0295b45b3351d71cf8366784d3c0762bbf870e2d Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Date: Tue May 15 07:01:20 2018 Roll src/third_party/chromite/ 7e9e7b065..c8e4e795e (1 commit) https://chromium.googlesource.com/chromiumos/chromite.git/+log/7e9e7b065a98..c8e4e795ed75 $ git log 7e9e7b065..c8e4e795e --date=short --no-merges --format='%ad %ae %s' 2018-05-14 bpastene Revert "[chromite] Capture job_name and git timing for ts_mon in cbuildbot runs." Created with: roll-dep src/third_party/chromite BUG= chromium:829289 The AutoRoll server is located here: https://chromite-chromium-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. TBR=chrome-os-gardeners@chromium.org Change-Id: Ifcc9ffe22c45cf58facf0522b1e9604c3638b9b9 Reviewed-on: https://chromium-review.googlesource.com/1058980 Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#558632} [modify] https://crrev.com/0295b45b3351d71cf8366784d3c0762bbf870e2d/DEPS
,
Jun 6 2018
This is a transient issue that is related to Git-on-Borg flakiness. We've moved to a git wrapper with retries capabilities but even at 20 retries we are not receiving a valid response from Git. The addition of some timing metrics may help identify build issues but it is part of a large project to improve monitoring and it is not something that can be done in a short amount of time; for example, durations today are captured as a distribution which means outliers are naturally lost in the rollup, creating a need for absolute timings. Rather than hold this bug open indefinitely, I've opened up http://crbug.com/850217 to track the request for further visibility. -- Mike
,
Jun 6 2018
We've also escalated to the GoB team: https://b.corp.google.com/issues/68258327 |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by akes...@chromium.org
, Apr 5 2018Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>CI