New issue
Advanced search Search tips
Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocking:
issue 842831



Sign in to add a comment
link

Issue 829289: CanaryCompletion failing due to GoB flakiness

Reported by mnissler@chromium.org, Apr 5 2018 Project Member

Issue description

As previously observed in issue 800949, I'm seeing CanaryCompletion failures like this:

00:48:25: INFO: Updating manifest-versions checkout.
00:50:20: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:50:20: ERROR: Retrying to update the status:  Retry 1/20
00:50:20: INFO: Updating manifest-versions checkout.
00:50:58: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:50:58: ERROR: Retrying to update the status:  Retry 2/20
00:50:58: INFO: Updating manifest-versions checkout.
00:51:36: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:51:36: ERROR: Retrying to update the status:  Retry 3/20
00:51:36: INFO: Updating manifest-versions checkout.
00:52:14: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:52:14: ERROR: Retrying to update the status:  Retry 4/20
00:52:14: INFO: Updating manifest-versions checkout.
00:52:52: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:52:52: ERROR: Retrying to update the status:  Retry 5/20
00:52:52: INFO: Updating manifest-versions checkout.
00:53:30: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:53:30: ERROR: Retrying to update the status:  Retry 6/20
00:53:30: INFO: Updating manifest-versions checkout.
00:54:07: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:54:07: ERROR: Retrying to update the status:  Retry 7/20
00:54:07: INFO: Updating manifest-versions checkout.
00:54:45: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:54:45: ERROR: Retrying to update the status:  Retry 8/20
00:54:45: INFO: Updating manifest-versions checkout.
00:55:23: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:55:23: ERROR: Retrying to update the status:  Retry 9/20
00:55:23: INFO: Updating manifest-versions checkout.
00:56:01: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:56:01: ERROR: Retrying to update the status:  Retry 10/20
00:56:01: INFO: Updating manifest-versions checkout.
00:56:40: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:56:40: ERROR: Retrying to update the status:  Retry 11/20
00:56:40: INFO: Updating manifest-versions checkout.
00:57:19: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:57:19: ERROR: Retrying to update the status:  Retry 12/20
00:57:19: INFO: Updating manifest-versions checkout.
00:57:56: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:57:56: ERROR: Retrying to update the status:  Retry 13/20
00:57:57: INFO: Updating manifest-versions checkout.
00:58:34: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:58:34: ERROR: Retrying to update the status:  Retry 14/20
00:58:34: INFO: Updating manifest-versions checkout.
00:59:12: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:59:12: ERROR: Retrying to update the status:  Retry 15/20
00:59:12: INFO: Updating manifest-versions checkout.
00:59:49: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
00:59:49: ERROR: Retrying to update the status:  Retry 16/20
00:59:49: INFO: Updating manifest-versions checkout.
01:00:27: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
01:00:27: ERROR: Retrying to update the status:  Retry 17/20
01:00:27: INFO: Updating manifest-versions checkout.
01:01:04: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
01:01:04: ERROR: Retrying to update the status:  Retry 18/20
01:01:04: INFO: Updating manifest-versions checkout.
01:01:42: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
01:01:42: ERROR: Retrying to update the status:  Retry 19/20
01:01:42: INFO: Updating manifest-versions checkout.
01:02:19: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
01:02:19: ERROR: Retrying to update the status:  Retry 20/20
01:02:19: INFO: Updating manifest-versions checkout.
01:02:57: ERROR: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal
01:02:57: ERROR: Retrying to update the status:  Retry 21/20
01:02:57: INFO: Updating manifest-versions checkout.
01:03:18: ERROR: Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/generic_stages.py", line 701, in Run
    self.PerformStage()
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/completion_stages.py", line 264, in PerformStage
    super(MasterSlaveSyncCompletionStage, self).PerformStage()
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/completion_stages.py", line 84, in PerformStage
    message=self.message)
  File "/b/c/cbuild/repository/chromite/cbuildbot/manifest_version.py", line 780, in UpdateStatus
    raise StatusUpdateException(last_error)
StatusUpdateException: Failed to update the status for chell-release during remote command: cmd=['git', 'push', 'origin', 'temp_auto_checkin_branch:refs/heads/master'], cwd=/b/c/cbuild/repository/manifest-versions-internal



Note that these errors don't look like the expected flakes due to concurrent commit attempts from multiple builders, but something else. Too bad that we don't see any diagnostics / error codes from git.

This affects multiple builders, here are the runs I could find from the red blobs I'm seeing on the waterfall:

https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fauron_yuna-release%2F2068%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fchell-release%2F2048%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Felm-release%2F2083%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fenguarde-release%2F2213%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fnami-release%2F345%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Forco-release%2F2086%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fscarlet-release%2F1237%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fveyron_minnie-release%2F2085%2F%2B%2Frecipes%2Fsteps%2FCanaryCompletion%2F0%2Fstdout

Adding people on previous bug to CC.

I'll watch the situation to see whether this persists.
 

Comment 1 by akes...@chromium.org, Apr 5 2018

Cc: -akes...@chromium.org -jrbarnette@chromium.org -pprabhu@chromium.org jclinton@chromium.org
Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>CI

Comment 3 by jclinton@chromium.org, Apr 18 2018

Cc: mikenichols@chromium.org
Labels: -Pri-3 Pri-2
Owner: mikenichols@chromium.org
Status: Assigned (was: Unconfirmed)
This looks like another example of Git-on-Borg not meeting its SLO's. We should probably have better logging in this section of the CBuildBot code. Mike recently touched this code; Mike can you improve the error logging here a little bit so we have something to go to the GoB team with?

Comment 4 by mikenichols@chromium.org, Apr 18 2018

I can look to add some timing to have a better idea of what is going on.  The challenge may be that we recently removed the git retry logic, as it is being handled by the git wrapper itself, which may be the better option for logging the timings.  I look into both options for the best approach.

-- Mike

Comment 5 by mikenichols@chromium.org, Apr 26 2018

Status: Started (was: Assigned)

Comment 6 by jclinton@chromium.org, Apr 26 2018

Summary: CanaryCompletion failing do to GoB flakiness (was: CanaryCompletion failing on multiple builders)

Comment 7 by jclinton@chromium.org, Apr 26 2018

Summary: CanaryCompletion failing due to GoB flakiness (was: CanaryCompletion failing do to GoB flakiness)

Comment 8 by bugdroid1@chromium.org, May 11 2018

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/7657336de212e6dba55511f3f07457f9add6e3e3

commit 7657336de212e6dba55511f3f07457f9add6e3e3
Author: Mike Nichols <mikenichols@chromium.org>
Date: Fri May 11 19:10:53 2018

[chromite] Capture job_name and git timing for ts_mon in cbuildbot runs.

Simple overload of existing configuration to provide a means to capture
the builder config as part of the metric schema.  It is easy to overload
the schema but using the Task schema limits options for capturing more
fields.  Will need to examine options for passing metric fields to the
lower level calls.

Wraps RunGit command in SecondsTimer metric to collect overall durations
of git calls in metric distributions.

BUG= chromium:829289 
TEST=`./cbuildbot/run_tests` PASS

Change-Id: I23f0fd6e8f4870dffdb87a06b4384f351d93e6c4
Reviewed-on: https://chromium-review.googlesource.com/1050954
Commit-Ready: Mike Nichols <mikenichols@chromium.org>
Tested-by: Mike Nichols <mikenichols@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>

[modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/lib/ts_mon_config.py
[modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/scripts/cbuildbot.py
[modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/scripts/cbuildbot_launch.py
[modify] https://crrev.com/7657336de212e6dba55511f3f07457f9add6e3e3/lib/git.py

Comment 9 by bugdroid1@chromium.org, May 11 2018

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f812c075051dd302e9bae4572715bce9a78f8bd9

commit f812c075051dd302e9bae4572715bce9a78f8bd9
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Fri May 11 21:16:45 2018

Roll src/third_party/chromite/ 746c45d5d..7657336de (3 commits)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/746c45d5d8b1..7657336de212

$ git log 746c45d5d..7657336de --date=short --no-merges --format='%ad %ae %s'
2018-05-08 mikenichols [chromite] Capture job_name and git timing for ts_mon in cbuildbot runs.
2018-05-10 khmel arc: Retry copying resource in case race condition for android uprev
2018-05-09 achuith cros_run_vm_test: Support for --host-cmd

Created with:
  roll-dep src/third_party/chromite
BUG= chromium:829289 , chromium:841937 , chromium:832374 


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: I7712f93eb43ad2661af0fe11a71a9b228ce916ad
Reviewed-on: https://chromium-review.googlesource.com/1055816
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#558020}
[modify] https://crrev.com/f812c075051dd302e9bae4572715bce9a78f8bd9/DEPS

Comment 10 by mikenichols@chromium.org, May 14 2018

Blocking: 842831

Comment 11 by bugdroid1@chromium.org, May 15 2018

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/c8e4e795ed7527807910153dea581ee6ad9a2698

commit c8e4e795ed7527807910153dea581ee6ad9a2698
Author: Ben Pastene <bpastene@chromium.org>
Date: Tue May 15 04:29:56 2018

Revert "[chromite] Capture job_name and git timing for ts_mon in cbuildbot runs."

This reverts commit 7657336de212e6dba55511f3f07457f9add6e3e3.

Reason for revert: Suspected to have blown out our metric memory quota
See https://bugs.chromium.org/p/chromium/issues/detail?id=842423

Original change's description:
> [chromite] Capture job_name and git timing for ts_mon in cbuildbot runs.
>
> Simple overload of existing configuration to provide a means to capture
> the builder config as part of the metric schema.  It is easy to overload
> the schema but using the Task schema limits options for capturing more
> fields.  Will need to examine options for passing metric fields to the
> lower level calls.
>
> Wraps RunGit command in SecondsTimer metric to collect overall durations
> of git calls in metric distributions.
>
> BUG= chromium:829289 
> TEST=`./cbuildbot/run_tests` PASS
>
> Change-Id: I23f0fd6e8f4870dffdb87a06b4384f351d93e6c4
> Reviewed-on: https://chromium-review.googlesource.com/1050954
> Commit-Ready: Mike Nichols <mikenichols@chromium.org>
> Tested-by: Mike Nichols <mikenichols@chromium.org>
> Reviewed-by: Jason Clinton <jclinton@chromium.org>

Bug:  chromium:829289 
Change-Id: I23497c75a6803b7274d9a40e5584c43aa3bf0616
Reviewed-on: https://chromium-review.googlesource.com/1058207
Commit-Ready: Ben Pastene <bpastene@chromium.org>
Tested-by: Ben Pastene <bpastene@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>
Reviewed-by: Mike Nichols <mikenichols@chromium.org>

[modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/lib/ts_mon_config.py
[modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/scripts/cbuildbot.py
[modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/scripts/cbuildbot_launch.py
[modify] https://crrev.com/c8e4e795ed7527807910153dea581ee6ad9a2698/lib/git.py

Comment 12 by bugdroid1@chromium.org, May 15 2018

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0295b45b3351d71cf8366784d3c0762bbf870e2d

commit 0295b45b3351d71cf8366784d3c0762bbf870e2d
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Tue May 15 07:01:20 2018

Roll src/third_party/chromite/ 7e9e7b065..c8e4e795e (1 commit)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/7e9e7b065a98..c8e4e795ed75

$ git log 7e9e7b065..c8e4e795e --date=short --no-merges --format='%ad %ae %s'
2018-05-14 bpastene Revert "[chromite] Capture job_name and git timing for ts_mon in cbuildbot runs."

Created with:
  roll-dep src/third_party/chromite
BUG= chromium:829289 


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: Ifcc9ffe22c45cf58facf0522b1e9604c3638b9b9
Reviewed-on: https://chromium-review.googlesource.com/1058980
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#558632}
[modify] https://crrev.com/0295b45b3351d71cf8366784d3c0762bbf870e2d/DEPS

Comment 13 by mikenichols@chromium.org, Jun 6 2018

Status: WontFix (was: Started)
This is a transient issue that is related to Git-on-Borg flakiness.  We've moved to a git wrapper with retries capabilities but even at 20 retries we are not receiving a valid response from Git.  

The addition of some timing metrics may help identify build issues but it is part of a large project to improve monitoring and it is not something that can be done in a short amount of time; for example, durations today are captured as a distribution which means outliers are naturally lost in the rollup, creating a need for absolute timings. 

Rather than hold this bug open indefinitely, I've opened up  http://crbug.com/850217  to track the request for further visibility.  

-- Mike

Comment 14 by jclinton@chromium.org, Jun 6 2018

We've also escalated to the GoB team: https://b.corp.google.com/issues/68258327

Sign in to add a comment