New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 725233 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature

Blocked on:
issue 725968



Sign in to add a comment

cbuildbot: cherry-pick changes faster

Project Member Reported by diand...@chromium.org, May 22 2017

Issue description

cbuildbot currently cherry-picks changes very slowly and inefficiently.  We can do at least an order of magnitude better.

An example is guado moblab build 5814.  For that build we had 133 changes and 106 of them were kernel changes.

We spent about 50 minutes cherry-picking changes.

As per  bug #725220 , picking a kernel changes from gerrit one-at-a-time takes between 20 and 25 seconds.

===

So how can we be faster?  One trick I personally do as an individual developer is to "FETCH" the top change in the chain, then cherry-pick by git hash.

That particular guado moblab had a huge chain of CLs in it.  I count 100 kernel changes from mcchou, and it's clear that many of these are in one big chain in gerrit.  It turns out that doing a "git fetch" of the highest change in the list actually fetches all the earlier ones too but still only takes between 20 and 25 seconds.


Let's take an example w/ just two patches.  Let's say I want to cherry-pick these two:

  https://chromium-review.googlesource.com/c/474435/6
  https://chromium-review.googlesource.com/c/474436/6

474435 is the parent of 474436.


I will say that these two commands produce equivalent results but one is twice as fast:

  git fetch https://chromium.googlesource.com/chromiumos/third_party/kernel refs/changes/35/474435/6
  git cherry-pick FETCH_HEAD
  git fetch https://chromium.googlesource.com/chromiumos/third_party/kernel refs/changes/36/474436/6
  git cherry-pick FETCH_HEAD

vs.

  git fetch https://chromium.googlesource.com/chromiumos/third_party/kernel refs/changes/36/474436/6
  git cherry-pick FETCH_HEAD~
  git cherry-pick FETCH_HEAD


NOTE: it's important to be careful in the above.  If 474435 has been revved then you'll need to fetch it separately.  AKA if 474435/6 is a parent of 474436/6 and #6 is the newest patch set for 474436, but there's a patch set #7 for 474435, then we need to do the extra fetch.


If it's fast, we could easily validate git hashes.  There's probably a better way, but this shows the concept:

  hash_474436=$(curl -s https://chromium-review.googlesource.com/changes/474436/revisions/6/patch?download  | base64 -d | head -1 | awk '{print $2;}')
  hash_474435=$(curl -s https://chromium-review.googlesource.com/changes/474435/revisions/6/patch?download  | base64 -d | head -1 | awk '{print $2;}')

  if [[ "$(git show ${hash_474436}~)" == "$(git show ${hash_474435})" ]]; then
     echo "Good"; 
  else 
    echo "Bad, need a separate fetch"; 
  fi

===

Another possible solution is much easier but is a bigger change.  ...that's to just use the "base 64" patch download.  That's always super fast.  I'm a bit hesitant to just suggest we do this because it's possible (???) it could give some different results, I think.

If we do the full "git fetch" we get the full git history for a given change.  That should give "git" more context for how the change should be applied.  With just the "PATCH" file I don't think we're guaranteed that git will make the same decisions.  Can someone who knows git better confirm?

In any case, if we can do this, you'd just need to run:

  curl -s https://chromium-review.googlesource.com/changes/474433/revisions/6/patch?download | base64 -d | git am
  curl -s https://chromium-review.googlesource.com/changes/474436/revisions/6/patch?download | base64 -d | git am

That's pretty darn fast.


 
Cc: semenzato@chromium.org
OK, I think I have proved that the "git am" isn't as smart as "git fetch / git cherry-pick" and probably can't be used for the builder.

Basically: having full git history and cherry-picking can handle renames.  ...and git am can't.  In the below, I do this:

1. Make a change in a file (change text "Doug" -> "Frank")
2. Rename the file
3. Make the opposite change in the file (change text "Frank" -> "Doug")

If I cherry-pick the original change ("Doug" -> "Frank") then it actually works and makes the change in the renamed file.

If I do a "git format-patch" and "git am" of the original change, it doesn't work properly and doesn't change the renamed file.

---

dianders@tictac:tmp$ mkdir gittest

dianders@tictac:tmp$ cd gittest

dianders@tictac:gittest$ git init .
Initialized empty Git repository in /b/tip/tmp/gittest/.git/

dianders@tictac:gittest (master)$ echo "My name is Doug" > name

dianders@tictac:gittest (master)$ git add name

dianders@tictac:gittest (master)$ git commit -am "my name is doug"
[master (root-commit) c933037bac73] my name is doug
 1 file changed, 1 insertion(+)
 create mode 100644 name

dianders@tictac:gittest (master)$ echo "My name is Frank" > name

dianders@tictac:gittest (master)$ git add name

dianders@tictac:gittest (master)$ git commit -am "my name is now frank"
[master 92e041a7545a] my name is now frank
 1 file changed, 1 insertion(+), 1 deletion(-)

dianders@tictac:gittest (master)$ git mv name name.txt

dianders@tictac:gittest (master)$ git commit -m "oops, forgot .txt"
[master 1dbd5ec0f2b1] oops, forgot .txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename name => name.txt (100%)

dianders@tictac:gittest (master)$ echo "My name is Doug" > name.txt

dianders@tictac:gittest (master)$ git add -u

dianders@tictac:gittest (master)$ git commit -am "Just kidding, my name is really Doug"
[master b8adb72f1a23] Just kidding, my name is really Doug
 1 file changed, 1 insertion(+), 1 deletion(-)

dianders@tictac:gittest (master)$ git log --oneline
b8adb72f1a23 (HEAD -> master) Just kidding, my name is really Doug
1dbd5ec0f2b1 oops, forgot .txt
92e041a7545a my name is now frank
c933037bac73 my name is doug

dianders@tictac:gittest (master)$ git cherry-pick 92e041a7545a
[master 56b089e4e30c] my name is now frank
 Date: Mon May 22 14:25:34 2017 -0700
 1 file changed, 1 insertion(+), 1 deletion(-)

dianders@tictac:gittest (master)$ git reset --hard HEAD~
HEAD is now at b8adb72f1a23 Just kidding, my name is really Doug

dianders@tictac:gittest (master)$ git format-patch 92e041a7545a~..92e041a7545a
0001-my-name-is-now-frank.patch

dianders@tictac:gittest (master)$ git am 0001-my-name-is-now-frank.patch
Applying: my name is now frank
error: name: does not exist in index
Patch failed at 0001 my name is now frank
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

I very much like where this is going, but as nxia@ pointed out in email earlier... how do we detect that a new revision has been uploaded for a subset of the git stack?

For example, I upload a stack of three CLs:
  A - Rev 1
  B - Rev 1
  C - Rev 1

I then create a new branch locally. Cherry pick B to this new branch, amend it, and upload it to Gerrit. This upgrades B to Rev 2 of the CL (which changes the SHA, resets flags, etc).

If you fetch change C, you'll get rev 1 of B, not rev 2.

We never want to test Rev 1 after 2 has been uploaded.
Yes you can absolutely run cherry-pick an order-of-magnitude faster.

The real cost here is the 'git fetch', not the 'cherry-pick'. You can absolutely speed up the 'git fetch' by skipping all-but-the-last 'git fetch' operations. Note that in order for this to work the whole stack of patches needs to be rebased exactly on top of each other, but we already have code in Android TreeHugger to check for this that you could borrow.

I'd probably also recommend implementing some multiprocessing/threading for this code, if you haven't already.
Status: Available (was: Untriaged)
@3: I think I was suggesting a workaround for your exact case above.  Let me try to summarize it like this:

1. Find out all git hashes of proper revisions of all CLs you intend to commit.  If there's no better way to do this, this can be done semi quickly with the command:

  curl -s https://chromium-review.googlesource.com/changes/${CLNUM}/revisions/${REVNUM}/patch?download | base64 -d 

2. Run this for each CL in _reverse_ order of what we have today:
   if 'git show ${GITHASH}' works: do nothing
   else: do the same "git fetch" we'd do today, but not the cherry pick.

3. Run this for each CL in _the same_ order we have today:
   git cherry-pick ${GITHASH}


That's a pretty simple way to get started.

---

David James is right that threading would be ideal here.  I'm not terribly familiar with the above, but one easy thing to do would be to fork off a thread per project.  Then you wouldn't need to mess with the above logic.

---

Does anyone have the cbuildbot knowhow to do this quickly?  It seems like for someone familiar with the code this wouldn't be crazy hard.
I'm NOT super familiar with the code in question, but when selecting CLs to put into the CQ, we make a Gerrit REST API request per CL. Think Gerrit the review tool, not git. This is used to fetch flag information and such, and we store off that information.

That includes CL #, revision, and the "Change-ID", and probably includes the git hash as well.

I think we just need to verify (on the master, not per-slave) that the git hash we fetched matches that Gerrit information.

A few comments:
1. I think threading will be insufficient.  If you have 100 patches in one chain in one project, one thread per project won't help.
2. We're doubly bit by this by doing stuff for the master and the slave builds.  The flipside of that is we can do stuff on the master and tell the slave the optimal set of things that need to be done.
3. There's another option that is potentially more work, but maybe it's just a week or two in total:
- master: ask gerrit to create a new branch from ToT called VERSION-cq via Gerrit REST API
- master: ask gerrit to cherry-pick all of the changes being tested to VERSION-cq via REST
- master/slave: sync VERSION-cq
- master: upon commitqueuecompletion, if ToT has not moved and all changes are being accepted, do the equivalent of reset --hard/push cros/master to VERSION-cq via REST; otherwise, use REST to cherry-pick submitted changes to cros/master
Here's my alternative proposal:
  A. Combine fetches per-project (e.g. fetch 200 refs in one run). According to local tests, it takes 6 seconds to fetch 1 ref and 12 seconds to fetch 200 refs, so it scales very well to fetch that way.
  B. Thread fetches across projects.

With A and B, you'll have the whole fetch happening in ~12 seconds, and then just have to contend with cherry-pick time. Testing locally, it takes ~10-12 seconds to cherry-pick 200 patches (2 seconds less if you combine into a single command). So really, with the threading & the combined fetch, we're talking about 24 seconds for the whole fetch + cherry-pick on the master, and 24 seconds on the slave. If we could get there there'd be no reason to create branches (which I think would be more work/complexity and could lead to new kinds of flake as we try to get the logic right)
+1 to c#8.  I think in the long run, I still prefer doing this server side once (especially if we can optimize the commit), but I think if we can collapse all the fetches, it's much simpler and easier to do that now.
Labels: -Type-Bug Type-Feature
Owner: davidri...@chromium.org
Status: Started (was: Available)
Talked with David and he's already started on this :)
Blockedon: 725968
Project Member

Comment 12 by bugdroid1@chromium.org, May 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/78ea03cd8699bd6fda622fb9f8ea8a09df53dfec

commit 78ea03cd8699bd6fda622fb9f8ea8a09df53dfec
Author: David Riley <davidriley@chromium.org>
Date: Thu May 25 12:55:29 2017

validation_pool: Pre-fetch patches when syncing slaves.

BUG= chromium:725220 
BUG= chromium:725233 
TEST=validation_pool_unittest; run_tests

Change-Id: I7be8749aaf018f69c75ad0bd6f9f58108dd585ed
Reviewed-on: https://chromium-review.googlesource.com/513397
Tested-by: David Riley <davidriley@chromium.org>
Trybot-Ready: David Riley <davidriley@chromium.org>
Reviewed-by: David James <davidjames@chromium.org>

[modify] https://crrev.com/78ea03cd8699bd6fda622fb9f8ea8a09df53dfec/cbuildbot/patch_series.py
[modify] https://crrev.com/78ea03cd8699bd6fda622fb9f8ea8a09df53dfec/cbuildbot/validation_pool.py
[modify] https://crrev.com/78ea03cd8699bd6fda622fb9f8ea8a09df53dfec/cbuildbot/validation_pool_unittest.py

Change appears to be good and is doing multiple all the fetches for a repo at one time  The builds I've looked at haven't been the best examples since they're submitting the same changes repeatedly due to guado_moblab-paladin failing and don't require a fetch in any case.

Still waiting to land is the change to get rid of sleeps between cherry-picks https://chromium-review.googlesource.com/#/c/513421/ which will save 1s per patch.

The following build ran into some repo sync errors:
https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fbeaglebone-paladin%2F13461%2F%2B%2Frecipes%2Fsteps%2FCommitQueueSync%2F0%2Fstdout
 Issue 725220  has been merged into this issue.
Project Member

Comment 15 by bugdroid1@chromium.org, May 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/5c15b33f5f051da0715f3d769692b28ad6bd26c9

commit 5c15b33f5f051da0715f3d769692b28ad6bd26c9
Author: David Riley <davidriley@chromium.org>
Date: Fri May 26 00:16:02 2017

Revert "Put sleep between apply_change operations."

This reverts commit 250bc41528aa5f90b8aadab998ad4818ddb97499.

By pre-fetching the sleep hopefully isn't necessary.

BUG= chromium:725233 
BUG= chromium:725220 

Change-Id: Ibb95ca96daa6fd66d53de7553e60f600b8f4700c
Reviewed-on: https://chromium-review.googlesource.com/513421
Commit-Ready: David Riley <davidriley@chromium.org>
Tested-by: David Riley <davidriley@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/5c15b33f5f051da0715f3d769692b28ad6bd26c9/cbuildbot/validation_pool.py

The changes I landed might be causing some issues:
https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fbeaglebone-paladin%2F13480%2F%2B%2Frecipes%2Fsteps%2FCommitQueueSync%2F0%2Fstdout

17:36:24: ERROR: <class 'chromite.lib.cros_build_lib.RunCommandError'>: return code: 128; command: git fetch -f https://chromium-review.googlesource.com/chromiumos/platform/crosvm refs/changes/20/509920/6 refs/changes/82/510782/5
fatal: unable to access 'https://chromium-review.googlesource.com/chromiumos/platform/crosvm/': Operation too slow. Less than 1000 bytes/sec transferred the last 300 seconds
cmd=['git', 'fetch', '-f', u'https://chromium-review.googlesource.com/chromiumos/platform/crosvm', u'refs/changes/20/509920/6', u'refs/changes/82/510782/5'], cwd=/b/cbuild/src/platform/crosvm
Traceback (most recent call last):
  File "/b/cbuild/chromite/lib/parallel.py", line 602, in TaskRunner
    task(*x, **task_kwargs)
  File "/b/cbuild/chromite/lib/parallel.py", line 800, in <lambda>
    fn = lambda idx, task_args: out_queue.put((idx, task(*task_args)))
  File "/b/cbuild/chromite/cbuildbot/patch_series.py", line 138, in _FetchChangesForRepo
    git.RunGit(repo, cmd, print_cmd=True)
  File "/b/cbuild/chromite/lib/git.py", line 817, in RunGit
    ['git'] + cmd, **kwargs)
  File "/b/cbuild/chromite/lib/retry_util.py", line 109, in GenericRetry
    ret = functor(*args, **kwargs)
  File "/b/cbuild/chromite/lib/cros_build_lib.py", line 645, in RunCommand
    raise RunCommandError(msg, cmd_result)
RunCommandError: return code: 128; command: git fetch -f https://chromium-review.googlesource.com/chromiumos/platform/crosvm refs/changes/20/509920/6 refs/changes/82/510782/5
fatal: unable to access 'https://chromium-review.googlesource.com/chromiumos/platform/crosvm/': Operation too slow. Less than 1000 bytes/sec transferred the last 300 seconds
cmd=['git', 'fetch', '-f', u'https://chromium-review.googlesource.com/chromiumos/platform/crosvm', u'refs/changes/20/509920/6', u'refs/changes/82/510782/5'], cwd=/b/cbuild/src/platform/crosvm


Suggest filing a bug against Gerrit on Borg folks for that -- that failure seems like a Gerrit on borg problem rather than a problem with your CL. On our end, suggest we add a retry to lib/git.py on the error message 'Operation too slow' ? (Currently we don't retry on that)
Cc: pho...@chromium.org
+phobbs:

re c#17: Can infra deputy (phobbs/dgarrett) take on dealing with GoB team?
Pinging deputies.
Re #19: Are you still seeing the slowness issues?
- If we've only seen them on one day:
  - The primary deputy should have gotten involved on the same day, as this is an operational problem.
- If we've seen them multiple times intermittently
  - Let's collect evidence that this is the case (ping this bug say three times this happens)
  - At that point, the correct response is to add retry / metrics / involve the GoB team in a non-critical issue.
  - For this case, someone from infra team is on the hook, but it's not the deputies' job.

My reading of the issue in #16 so far is that we've only seen this one. I'd even argue that we shouldn't (yet) add the retry requested in #17 so that we don't start ignoring these failures too early (it'll still cost time).
Cc: d...@chromium.org
Right now, we have a double set of retries on git commands (long story). I wonder if we were going through all/most of them here, or if they were causing issues, and/or saving us from what would have been GoB based failures?

Comment 22 by d...@chromium.org, Jun 2 2017

FYI: If there are Git wrapper retries, some text indicating this would have been printed to STDERR.
Don, I believe we're not doing retries here, because we don't whitelist "Operation too slow" as a known error condition. Seems worth whitelisting.
Owner: dgarr...@chromium.org
I've no problem with that change, but in the cases above we are already succeeding. How will more retries make us faster?
Owner: davidri...@chromium.org
Produced a CL with that change, but I don't think it's the core problem.
Status: Verified (was: Started)
This looks like a Gerrit bug -- please file against Gerrit. In those examples, cherry-pick is going very fast so let's file a bug in buganizer against Gerrit team to look at those issues (they look legitimate and the reason why Sync takes 5 minutes is because the timeout for 'too slow' syncs is 5 minutes)
Re: c#24: Your fix addresses builds failures like: https://uberchromegw.corp.google.com/i/chromeos/builders/beaglebone-paladin/builds/13480

It doesn't address the slow sync's listed in c#26.
Spun off b/62300477.
Project Member

Comment 30 by bugdroid1@chromium.org, Jun 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/d2a44898b7ae8d8f755af8d0c3e0e0e73b2d9510

commit d2a44898b7ae8d8f755af8d0c3e0e0e73b2d9510
Author: Don Garrett <dgarrett@google.com>
Date: Sat Jun 03 01:39:39 2017

git: Retry "Operation too slow".

Adding another retry condition.

BUG= chromium:725233 
TEST=None

Change-Id: I19c9f605ae6164705e9357579100ce68ab928b88
Reviewed-on: https://chromium-review.googlesource.com/522945
Commit-Ready: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: David Riley <davidriley@chromium.org>
Reviewed-by: David James <davidjames@chromium.org>

[modify] https://crrev.com/d2a44898b7ae8d8f755af8d0c3e0e0e73b2d9510/lib/git.py

Project Member

Comment 31 by bugdroid1@chromium.org, Jun 3 2017

Sign in to add a comment