New issue
Advanced search Search tips

Issue 892733 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Nov 6
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: ----



Sign in to add a comment

`git cl upload` fails with server RPC failures

Project Member Reported by c...@chromium.org, Oct 5

Issue description

I'm trying to upload a CL using `git cl upload`.  The upload is failing with this output:

error: RPC failed; HTTP 502 curl 22 The requested URL returned error: 502 Bad Gateway
fatal: the remote end hung up unexpectedly
fatal: the remote end hung up unexpectedly
Everything up-to-date

`git cl creds-check` says "No problems detected in your .gitcookies file."

Is this a problem local to my system or is there a server issue happening?
 
I ran `git cl upload` a second time which hung for 5+ mins after printing the change info and before it would show git uploading the commit to the server:

Presubmit checks passed.
 content/renderer/BUILD.gn                                                                                                           |  14 ----
 ...
 third_party/blink/renderer/platform/testing/testing_platform_support.h                                                              |   1 -
 61 files changed, 1682 insertions(+), 1070 deletions(-)
<hanging here, press CTRL-c after 5+ mins>
^C
Error after CL description prompt -- saving description to /usr/local/google/home/cmp/.git_cl_description_backup

Failed to create a change. Please examine output above for the reason of the failure.
Hint: run command below to diagnose common Git/Gerrit credential problems:
  git cl creds-check

Labels: Pri-1
Trying this again, still seeing issues.  Here's my current git cl upload ps output:

cmp@cmp:~/co/chromium/src$ ps auxwf|grep git
cmp       10795  0.0  0.0  19468  3540 pts/0    S+   14:26   0:00  |   \_ git cl upload
cmp       10796  0.5  0.0 140160 34272 pts/0    S+   14:26   0:01  |       \_ python /usr/local/google/home/cmp/co/depot_tools/git_cl.py upload
cmp       11593  0.0  0.1 402216 276500 pts/0   S+   14:27   0:00  |           \_ git push https://chromium.googlesource.com/chromium/src.git b534d91b4de02c5483ff1612055cdb5913b2fb32:refs/for/refs/heads/master%wip,m=Initial_upload,hashtag=indexeddb
cmp       11594  0.3  0.0 177952 17052 pts/0    S+   14:27   0:01  |               \_ /usr/lib/git-core/git-remote-https https://chromium.googlesource.com/chromium/src.git https://chromium.googlesource.com/chromium/src.git
cmp       11597  0.1  0.1 410440 276864 pts/0   Sl+  14:27   0:00  |                   \_ /usr/lib/git-core/git send-pack --stateless-rpc --helper-status --thin --no-progress https://chromium.googlesource.com/chromium/src.git/ --stdin
cmp       12737  0.0  0.0  14664   968 pts/1    S+   14:32   0:00  |   \_ grep --color=auto git

This shows the upload has taken ~6 mins so far.  There's no progress output in the git cl upload command.

Setting Pri-1 in case that helps someone see it.
OK, the last git cl upload completed after another 4 mins.  Took 10 mins total:

 61 files changed, 1680 insertions(+), 1070 deletions(-)
remote: Replicating objects... done        
remote: Processing changes: refs: 1, new: 1, done            
remote: commit b534d91: warning: subject >50 characters; use shorter first paragraph        
remote: 
remote: SUCCESS        
remote: 
remote: New Changes:        
remote:   https://chromium-review.googlesource.com/c/chromium/src/+/1265900 IndexedDB: Move content/renderer/indexed_db/ to Blink [WIP]        
To https://chromium.googlesource.com/chromium/src.git
 * [new branch]                b534d91b4de02c5483ff1612055cdb5913b2fb32 -> refs/for/refs/heads/master%wip,m=Initial_upload,hashtag=indexeddb

I'd like to know where to look in the future to understand if I'm hitting a known issue.

Do you have telemetry for when `git cl upload` takes longer than, say, 20s to complete / is that available to view?
Components: -Infra Infra>Codereview>Gerrit Infra>SDK
Cc: ajp@chromium.org
The hanging upload sounds similar to https://crbug.com/887712, but I think it may be hanging earlier than for those people. 

A bunch of questions to help me understand this issue and hopefully get to a resolution.

Do you know if the 502 happened as part of git push, as opposed to the subsequent POST requests we make to Gerrit?

How consistently is the upload hanging for you?
 * Every time on this CL?
 * Does it happen on other CLs?

Do you know if other people are experiencing similar issues?

As to your questions:

We do have very broad metrics on the timing for git cl upload. I'm sad to report that the last collected average is above 20s (30-35s). It is not easily viewable now (we manually write and run queries) but we are starting on a project to collect and display a variety of project health metrics and git cl upload metrics are on the list of things to collect and display. I don't know how broadly viewable that dashboard will be, it's a good question, but I hope reasonably.

I expect git cl upload to get a bit faster with the fix to https://crbug.com/881860, but one of the slower pieces has always been the initial git push and that is not changing with that bug fix.

As for where to look when hitting a known issue, also not as straight forward as I'd like. For Chromium the components I've put this bug into are a great start:
Infra>Codereview>Gerrit
Infra>SDK

If the problem is not in our git cl upload scripts, but in Gerrit itself there is also a separate Monorail project for Gerrit:
https://bugs.chromium.org/p/gerrit/issues/list

However, the Gerrit team primarily uses the internal only Google bug tracker that most of Google uses. Their use issues in the Monorail project is minimal. So if you're a Googler looking at our internal bug tracker is also an option. I work to make the Gerrit team aware of all the issues in Monorail that are important to Chromium (as I learn of them).
> The hanging upload sounds similar to https://crbug.com/887712, but I think
> it may be hanging earlier than for those people.

Yes, I think so, too.

> Do you know if the 502 happened as part of git push, as opposed to the
> subsequent POST requests we make to Gerrit?

I only looked at `ps output` during the last 2 times (which both took 9-10 mins, completed, and never 502'd).  In both of those cases, the hang happened during `git push`.  Not sure if the hang happened during `git push` in the 502 case.  I didn't get any `git` output in the 502 case, though, which implies it was hanging during `git push` or earlier.

> How consistently is the upload hanging for you?
>  * Every time on this CL?
>  * Does it happen on other CLs?

So far consistently on this CL.  Not sure if it happens on other CLs, haven't uploaded any others in the last 3 days or so.  Will update this bug if I see it on other CLs.

> Do you know if other people are experiencing similar issues?

No, but when I asked around the kinds of responses I got implied that most people are used to `git cl upload` taking a long time.  If you asked for a show of hands I suspect you'd see a lot of people who experience similar issues.
I uploaded a new patchset to that CL just now which took 10 mins to complete:

cmp@cmp:~/co/chromium/src/third_party/blink$ ps auxwf|grep git
cmp      226083  0.0  0.0  19468  3604 pts/0    S+   16:30   0:00  |   \_ git cl upload
cmp      226084  0.1  0.0 139820 34412 pts/0    S+   16:30   0:01  |       \_ python /usr/local/google/home/cmp/co/depot_tools/git_cl.py upload
cmp      230326  0.0  0.1 402248 277104 pts/0   S+   16:38   0:00  |           \_ git push https://chromium.googlesource.com/chromium/src.git 0a5672b4a47844f36758c126729784c464970e72:refs/for/refs/heads/master%notify=NONE,m=initial_comments_addressed
cmp      230327  0.2  0.0 177952 17296 pts/0    S+   16:38   0:01  |               \_ /usr/lib/git-core/git-remote-https https://chromium.googlesource.com/chromium/src.git https://chromium.googlesource.com/chromium/src.git
cmp      230402  0.1  0.1 475980 277512 pts/0   Sl+  16:39   0:00  |                   \_ /usr/lib/git-core/git send-pack --stateless-rpc --helper-status --thin --no-progress https://chromium.googlesource.com/chromium/src.git/ --stdin
cmp      233871  0.0  0.0  14664   988 pts/1    S+   16:48   0:00  |   \_ grep --color=auto git

(ps|grep ran at 4:48p, git push started at 4:38p)
Cc: ehmaldonado@chromium.org
I looked in more detail into git cl upload times. Of ~25k git cl upload runs we have metrics for in the last month it looks like only 3 are over 5 minutes, and only 17 are over 1 minute. These timings include more work than just git push, but git push is from what I know usually the slowest step (aside from any waiting for user input when necessary). I don't think this is a pervasive problem, but definitely not good for the people it does affect!

Can you think of anything usual about your CL? Is it unusually large, etc?

The more information for debugging the better, especially as the git/Gerrit team isn't that familiar with our usage.

> ~25k git cl upload runs we have metrics for in the last month it
> looks like only 3 are over 5 minutes, and only 17 are over 1 minute

I experienced this 3 times, so that would mean that my git cl pushes are 100% of the "over 5 mins" numbers.  This may be true but it's hard to believe.  Another possibility is the metric gathering isn't capturing all of the data somehow, but that seems like a slim possibility, too.

> Can you think of anything usual about your CL? Is it unusually large, etc?

Maybe it's big and that causes a problem.  The chromium-review site claims it's "XL" but that site also claims 2 other CLs I've uploaded in the past are XL and I didn't experience issues with those.

Do you see anything that's unusual about it?  The CL https://chromium-review.googlesource.com/c/chromium/src/+/1265900 should be publicly available.  If you download it to your checkout and git cl upload, do you see the same long time to upload?
I uploaded patchset 5 today, this took 11 mins to complete:

cmp@cmp:~/co/chromium/src$ ps auxwf|grep git
cmp      198135  0.0  0.0  19468  3776 pts/0    S+   11:21   0:00  |   \_ git cl upload
cmp      198137  0.2  0.0 140236 34532 pts/0    S+   11:21   0:01  |       \_ python /usr/local/google/home/cmp/co/depot_tools/git_cl.py upload
cmp      198963  0.0  0.1 402248 276684 pts/0   S+   11:22   0:00  |           \_ git push https://chromium.googlesource.com/chromium/src.git e57e6458d8fd29b27004ad070cb3408de5ecc312:refs/for/refs/heads/master%notify=NONE,m=fix_array%5Ftraits%5Fweb%5Fvector%2Eh_BUILD%2Egn_entry
cmp      198964  0.2  0.0 177952 17308 pts/0    S+   11:22   0:01  |               \_ /usr/lib/git-core/git-remote-https https://chromium.googlesource.com/chromium/src.git https://chromium.googlesource.com/chromium/src.git
cmp      198968  0.1  0.1 475980 277508 pts/0   Sl+  11:22   0:00  |                   \_ /usr/lib/git-core/git send-pack --stateless-rpc --helper-status --thin --no-progress https://chromium.googlesource.com/chromium/src.git/ --stdin
cmp      203539  0.0  0.0  14664   972 pts/1    S+   11:32   0:00  |   \_ grep --color=auto git

(`git cl upload` shows "remote: Replicating objects... <spinner>")

cmp@cmp:~/co/chromium/src$ ps auxwf|grep git
cmp      198135  0.0  0.0  19468  3776 pts/0    S+   11:21   0:00  |   \_ git cl upload
cmp      198137  0.2  0.0 140236 34532 pts/0    S+   11:21   0:01  |       \_ python /usr/local/google/home/cmp/co/depot_tools/git_cl.py upload
cmp      198963  0.0  0.1 402248 276684 pts/0   S+   11:22   0:00  |           \_ git push https://chromium.googlesource.com/chromium/src.git e57e6458d8fd29b27004ad070cb3408de5ecc312:refs/for/refs/heads/master%notify=NONE,m=fix_array%5Ftraits%5Fweb%5Fvector%2Eh_BUILD%2Egn_entry
cmp      198964  0.2  0.0 177952 17308 pts/0    S+   11:22   0:01  |               \_ /usr/lib/git-core/git-remote-https https://chromium.googlesource.com/chromium/src.git https://chromium.googlesource.com/chromium/src.git
cmp      198968  0.1  0.1 475980 277508 pts/0   Sl+  11:22   0:00  |                   \_ /usr/lib/git-core/git send-pack --stateless-rpc --helper-status --thin --no-progress https://chromium.googlesource.com/chromium/src.git/ --stdin
cmp      203552  0.0  0.0  14664   944 pts/1    S+   11:32   0:00  |   \_ grep --color=auto git

(`git cl upload` shows "remote: SUCCESS" plus ... "* [new branch]    ..." output)

cmp@cmp:~/co/chromium/src$ ps auxwf|grep git
cmp      198135  0.0  0.0  19468  3776 pts/0    S+   11:21   0:00  |   \_ git cl upload
cmp      198137  0.3  0.0 141160 35340 pts/0    R+   11:21   0:01  |       \_ python /usr/local/google/home/cmp/co/depot_tools/git_cl.py upload
cmp      203562  0.0  0.0  14664   948 pts/1    S+   11:32   0:00  |   \_ grep --color=auto git
My latest upload (patchset 7) took <10s between entering "title for patchset" and "remote: Processing changes: ...".  The difference was that the branch was now pointing to commit ac2feb00c which is from today.

My recommendation to help others is to modify `git cl upload` to check if the base commit is > X commits behind remote master or > Y days and warn that upload times may take considerably longer with a suggestion to rebase onto a recent master to fix.

Also, were 100% of the slow cases from my `git cl upload` calls?  You might want to update the telemetry to include some more stats like "# of commits behind remote master" etc since the slow cases might neatly fall into groups using those stats.

I don't have anything in this bug that's RVG, I propose we remove that label.  WDYT?
Labels: -Restrict-View-Google
> I experienced this 3 times, so that would mean that my git cl pushes are 100% of the "over 5 mins" numbers.  This may be true but it's hard to believe.  Another possibility is the metric gathering isn't capturing all of the data somehow, but that seems like a slim possibility, too.

It might be that metrics are disabled for some reason.
Do you have metrics reporting enabled? And are you running on corp?

> My recommendation to help others is to modify `git cl upload` to check if the base commit is > X commits behind remote master or > Y days and warn that upload times may take considerably longer with a suggestion to rebase onto a recent master to fix.
> You might want to update the telemetry to include some more stats like "# of commits behind remote master" etc since the slow cases might neatly fall into groups using those stats.

That's an interesting observation, I'm not sure why that would be the case though. It would be good to pursue this further.

> I don't have anything in this bug that's RVG, I propose we remove that label.  WDYT?

I agree, removed.
@cmp - I think this bug is possibly conflating two issues. I'd love to clean it up so we can triage effectively.

You started by asking about RPC failures. Is that still happening or can we consider that issue resolved or not reproducing?

The second is issue is git cl upload performance and how slow it was for your CL. We going to add more metrics to at least identify what specific requests within cl upload are slow so we can escalate as appropriate with the Gerrit team. The distance behind master as a cause of slowness is an interesting theory. Have your uploads stayed fast while your branch isn't far behind master?
@ajp - yeah, I'm fine with splitting the bugs up.

I haven't seen the RPC failures for a while.  Makes sense to consider it resolved or not reproducing.

Re: slowness, yes, `git cl upload` has stayed fast as I've kept my branch rebased onto a recent master.
Status: WontFix (was: Untriaged)
Closing this bug as will not reproduce for the server RPC failures. I've filed a new tracking bug 902412, that I will dedupe slow upload bugs to and use for communication purposes. Any actual work to address will most likely be tracked separately by the Gerrit/Git teams whose APIs are the primary bottleneck.

Sign in to add a comment