New issue
Advanced search Search tips

Issue 829289 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jun 6
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocking:
issue 842831



Sign in to add a comment

CanaryCompletion failing due to GoB flakiness

Project Member Reported by mnissler@chromium.org, Apr 5 2018

Issue description

As previously observed in issue 800949, I'm seeing CanaryCompletion failures like this:

00:48:25: INFO: Updating manifest-versions checkout.
00:50:20: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:50:20: ERROR: Retrying to update the status:  Retry 1/20
00:50:20: INFO: Updating manifest-versions checkout.
00:50:58: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:50:58: ERROR: Retrying to update the status:  Retry 2/20
00:50:58: INFO: Updating manifest-versions checkout.
00:51:36: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:51:36: ERROR: Retrying to update the status:  Retry 3/20
00:51:36: INFO: Updating manifest-versions checkout.
00:52:14: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:52:14: ERROR: Retrying to update the status:  Retry 4/20
00:52:14: INFO: Updating manifest-versions checkout.
00:52:52: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:52:52: ERROR: Retrying to update the status:  Retry 5/20
00:52:52: INFO: Updating manifest-versions checkout.
00:53:30: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:53:30: ERROR: Retrying to update the status:  Retry 6/20
00:53:30: INFO: Updating manifest-versions checkout.
00:54:07: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:54:07: ERROR: Retrying to update the status:  Retry 7/20
00:54:07: INFO: Updating manifest-versions checkout.
00:54:45: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:54:45: ERROR: Retrying to update the status:  Retry 8/20
00:54:45: INFO: Updating manifest-versions checkout.
00:55:23: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:55:23: ERROR: Retrying to update the status:  Retry 9/20
00:55:23: INFO: Updating manifest-versions checkout.
00:56:01: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:56:01: ERROR: Retrying to update the status:  Retry 10/20
00:56:01: INFO: Updating manifest-versions checkout.
00:56:40: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:56:40: ERROR: Retrying to update the status:  Retry 11/20
00:56:40: INFO: Updating manifest-versions checkout.
00:57:19: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:57:19: ERROR: Retrying to update the status:  Retry 12/20
00:57:19: INFO: Updating manifest-versions checkout.
00:57:56: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:57:56: ERROR: Retrying to update the status:  Retry 13/20
00:57:57: INFO: Updating manifest-versions checkout.
00:58:34: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:58:34: ERROR: Retrying to update the status:  Retry 14/20
00:58:34: INFO: Updating manifest-versions checkout.
00:59:12: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:59:12: ERROR: Retrying to update the status:  Retry 15/20
00:59:12: INFO: Updating manifest-versions checkout.
00:59:49: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:59:49: ERROR: Retrying to update the status:  Retry 16/20
00:59:49: INFO: Updating manifest-versions checkout.
01:00:27: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
01:00:27: ERROR: Retrying to update the status:  Retry 17/20
01:00:27: INFO: Updating manifest-versions checkout.
01:01:04: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
01:01:04: ERROR: Retrying to update the status:  Retry 18/20
01:01:04: INFO: Updating manifest-versions checkout.
01:01:42: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
01:01:42: ERROR: Retrying to update the status:  Retry 19/20
01:01:42: INFO: Updating manifest-versions checkout.
01:02:19: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
01:02:19: ERROR: Retrying to update the status:  Retry 20/20
01:02:19: INFO: Updating manifest-versions checkout.
01:02:57: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
01:02:57: ERROR: Retrying to update the status:  Retry 21/20
01:02:57: INFO: Updating manifest-versions checkout.
01:03:18: ERROR: Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/generic_stages.py", line 701, in Run
    self.PerformStage()
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/completion_stages.py", line 264, in PerformStage
    super(MasterSlaveSyncCompletionStage, self).PerformStage()
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/completion_stages.py", line 84, in PerformStage
    message=self.message)
  File "/b/c/cbuild/repository/chromite/cbuildbot/manifest_version.py", line 780, in UpdateStatus
    raise StatusUpdateException(last_error)
StatusUpdateException: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal



Note that these errors don't look like the expected flakes due to concurrent commit attempts from multiple builders, but something else. Too bad that we don't see any diagnostics / error codes from git.

This affects multiple builders, here are the runs I could find from the red blobs I'm seeing on the waterfall:

https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fauron_yuna-release%2F2068%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fchell-release%2F2048%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Felm-release%2F2083%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fenguarde-release%2F2213%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fnami-release%2F345%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Forco-release%2F2086%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fscarlet-release%2F1237%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fveyron_minnie-release%2F2085%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout

Adding people on previous bug to CC.

I'll watch the situation to see whether this persists.
 
Cc: -akes...@chromium.org -jrbarnette@chromium.org -pprabhu@chromium.org jclinton@chromium.org
Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>CI
Cc: mikenichols@chromium.org
Labels: -Pri-3 Pri-2
Owner: mikenichols@chromium.org
Status: Assigned (was: Unconfirmed)
This looks like another example of Git-on-Borg not meeting its SLO's. We should probably have better logging in this section of the CBuildBot code. Mike recently touched this code; Mike can you improve the error logging here a little bit so we have something to go to the GoB team with?
I can look to add some timing to have a better idea of what is going on.  The challenge may be that we recently removed the git retry logic, as it is being handled by the git wrapper itself, which may be the better option for logging the timings.  I look into both options for the best approach.

-- Mike
Status: Started (was: Assigned)
Summary: CanaryCompletion failing do to GoB flakiness (was: CanaryCompletion failing on multiple builders)
Summary: CanaryCompletion failing due to GoB flakiness (was: CanaryCompletion failing do to GoB flakiness)
Project Member

Comment 8 by bugdroid1@chromium.org, May 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/7657336de212e6dba55511f3f07457f9add6e3e3

commit 7657336de212e6dba55511f3f07457f9add6e3e3
Author: Mike Nichols <mikenichols@chromium.org>
Date: Fri May 11 19:10:53 2018

[chromite] Capture job_name and git timing for ts_mon in cbuildbot runs.

Simple overload of existing configuration to provide a means to capture
the builder config as part of the metric schema.  It is easy to overload
the schema but using the Task schema limits options for capturing more
fields.  Will need to examine options for passing metric fields to the
lower level calls.

Wraps RunGit command in SecondsTimer metric to collect overall durations
of git calls in metric distributions.

BUG= chromium:829289 
TEST=`./cbuildbot/run_tests` PASS

Change-Id: I23f0fd6e8f4870dffdb87a06b4384f351d93e6c4
Reviewed-on: https://chromium-review.googlesource.com/1050954
Commit-Ready: Mike Nichols <mikenichols@chromium.org>
Tested-by: Mike Nichols <mikenichols@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>

[modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/lib/ts_mon_config.py
[modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/scripts/cbuildbot.py
[modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/scripts/cbuildbot_launch.py
[modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/lib/git.py

Project Member

Comment 9 by bugdroid1@chromium.org, May 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f812c075051dd302e9bae4572715bce9a78f8bd9

commit f812c075051dd302e9bae4572715bce9a78f8bd9
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Fri May 11 21:16:45 2018

Roll src/third_party/chromite/ 746c45d5d..7657336de (3 commits)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/746c45d5d8b1..7657336de212

$ git log 746c45d5d..7657336de --date=short --no-merges --format='%ad %ae %s'
2018-05-08 mikenichols [chromite] Capture job_name and git timing for ts_mon in cbuildbot runs.
2018-05-10 khmel arc: Retry copying resource in case race condition for android uprev
2018-05-09 achuith cros_run_vm_test: Support for --host-cmd

Created with:
  roll-dep src/third_party/chromite
BUG= chromium:829289 , chromium:841937 , chromium:832374 


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: I7712f93eb43ad2661af0fe11a71a9b228ce916ad
Reviewed-on: https://chromium-review.googlesource.com/1055816
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#558020}
[modify] https://crrev.com/f812c075051dd302e9bae4572715bce9a78f8bd9/DEPS

Blocking: 842831
Project Member

Comment 11 by bugdroid1@chromium.org, May 15

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/c8e4e795ed7527807910153dea581ee6ad9a2698

commit c8e4e795ed7527807910153dea581ee6ad9a2698
Author: Ben Pastene <bpastene@chromium.org>
Date: Tue May 15 04:29:56 2018

Revert "[chromite] Capture job_name and git timing for ts_mon in cbuildbot runs."

This reverts commit 7657336de212e6dba55511f3f07457f9add6e3e3.

Reason for revert: Suspected to have blown out our metric memory quota
See https://bugs.chromium.org/p/chromium/issues/detail?id=842423

Original change's description:
> [chromite] Capture job_name and git timing for ts_mon in cbuildbot runs.
>
> Simple overload of existing configuration to provide a means to capture
> the builder config as part of the metric schema.  It is easy to overload
> the schema but using the Task schema limits options for capturing more
> fields.  Will need to examine options for passing metric fields to the
> lower level calls.
>
> Wraps RunGit command in SecondsTimer metric to collect overall durations
> of git calls in metric distributions.
>
> BUG= chromium:829289 
> TEST=`./cbuildbot/run_tests` PASS
>
> Change-Id: I23f0fd6e8f4870dffdb87a06b4384f351d93e6c4
> Reviewed-on: https://chromium-review.googlesource.com/1050954
> Commit-Ready: Mike Nichols <mikenichols@chromium.org>
> Tested-by: Mike Nichols <mikenichols@chromium.org>
> Reviewed-by: Jason Clinton <jclinton@chromium.org>

Bug:  chromium:829289 
Change-Id: I23497c75a6803b7274d9a40e5584c43aa3bf0616
Reviewed-on: https://chromium-review.googlesource.com/1058207
Commit-Ready: Ben Pastene <bpastene@chromium.org>
Tested-by: Ben Pastene <bpastene@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>
Reviewed-by: Mike Nichols <mikenichols@chromium.org>

[modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/lib/ts_mon_config.py
[modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/scripts/cbuildbot.py
[modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/scripts/cbuildbot_launch.py
[modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/lib/git.py

Project Member

Comment 12 by bugdroid1@chromium.org, May 15

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0295b45b3351d71cf8366784d3c0762bbf870e2d

commit 0295b45b3351d71cf8366784d3c0762bbf870e2d
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Tue May 15 07:01:20 2018

Roll src/third_party/chromite/ 7e9e7b065..c8e4e795e (1 commit)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/7e9e7b065a98..c8e4e795ed75

$ git log 7e9e7b065..c8e4e795e --date=short --no-merges --format='%ad %ae %s'
2018-05-14 bpastene Revert "[chromite] Capture job_name and git timing for ts_mon in cbuildbot runs."

Created with:
  roll-dep src/third_party/chromite
BUG= chromium:829289 


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: Ifcc9ffe22c45cf58facf0522b1e9604c3638b9b9
Reviewed-on: https://chromium-review.googlesource.com/1058980
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#558632}
[modify] https://crrev.com/0295b45b3351d71cf8366784d3c0762bbf870e2d/DEPS

Status: WontFix (was: Started)
This is a transient issue that is related to Git-on-Borg flakiness.  We've moved to a git wrapper with retries capabilities but even at 20 retries we are not receiving a valid response from Git.  

The addition of some timing metrics may help identify build issues but it is part of a large project to improve monitoring and it is not something that can be done in a short amount of time; for example, durations today are captured as a distribution which means outliers are naturally lost in the rollup, creating a need for absolute timings. 

Rather than hold this bug open indefinitely, I've opened up http://crbug.com/850217 to track the request for further visibility.  

-- Mike
We've also escalated to the GoB team: https://b.corp.google.com/issues/68258327

Sign in to add a comment