
Issue 892292

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 15
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




Multiple release builds failed to fetch kernel: unknown SHA1

Project Member Reported by apronin@chromium.org, Oct 4

Issue description

Several release builders failed with SrcCheckOutException during ManifestVersionedSync.

Their logs contain:

error: Cannot fetch chromiumos/third_party/kernel (GitError: chromiumos/third_party/kernel update-ref: fatal: 136704e905fc41e8ec97d2c3422e11fc88e2c54d^0: not a valid SHA1
)

Sample builders:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8933573584903527248
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8933573547859159104
and more
 
Cc: briannorris@chromium.org
Labels: -Pri-3 OS-Chrome Pri-2
Summary: Multiple release builds failed to fetch kernel: unknown SHA1 (was: Multiple release builds failed to fetch : unknown SHA1)
Labels: Hotlist-CrOS-Sheriffing
Labels: -Pri-2 Pri-1
It appears to affect all release builders that have finished so far (except for beaglebone*, which has its own unique issues). So, raising to P1 in case it's not just some flake.
Owner: apronin@chromium.org
Status: Assigned (was: Untriaged)
Did someone chump a bad manifest change to ToT? Are the CQ builders about to break the same way?

What would such a change look like? I don't see anything similar chumped, but maybe I'm just missing it.
The current paladins are building R71-11126.0.0-rc3 and doing ok with it.
Release builders failed on 11127.0.0 (and were successful on 11126.0.0).
So far, CQ runs fine on 11127.0.0. E.g. gale-paladin #7098 on R71-11127.0.0-rc1 is green, even though gale-release from bug descr was red with this issue.
If the CQ didn't fail, then that wasn't the cause. It's possible that someone temporarily was modifying the kernel repo in a destructive way. There are a few people with that permission. Has this recurred? If not, I suggest closing.
Yes, I was waiting for release builders to succeed. CQ didn't see this issue. The release builders that saw it once (gale, octopus, ...) were green on the next build. About 5 builders failed, all started around the same time (the short red runs near the end of the list in https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8933574730886724448). So, yes, I suspect some temporary flake. Closing.
Status: WontFix (was: Assigned)
Owner: dgarr...@chromium.org
The buildspec for these builds includes the entry:

  <project name="chromiumos/third_party/kernel" path="src/third_party/kernel/next" revision="136704e905fc41e8ec97d2c3422e11fc88e2c54d" upstream="refs/heads/next/chromeos-next"/>

If every single release builder got the same error, then we generated an invalid buildspec.

More likely GoB had intermittent errors. Possibly, the ref was brand new, and not all shards had it yet.

One way to test would be to run a tryjob against one of the boards that failed.

  cros tryjob --version 11127.0.0 gale-release-tryjob

Running here:
  https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8933474803285517968


If it gets past ManifestVersions, then this was flake, probably in GoB.
Status: Started (was: WontFix)
groeck is constantly playing with the next/chromeos-next branch, so that could easily tickle problems there.
Oh... if someone is rewriting branch history, then that could cause it. And it could be seen by some of the builders (but not all) if it happened as they were starting up. You would even end up with a clear line defining the problems based on when the respective syncs hit that repo.
Cc: groeck@chromium.org
Yeah, there's definitely been some rewriting of history. +groeck FYI, and to make sure we're on the same page -- either we need this to be robustly supported, or we need to not do that. AFAIK, kernel-next stuff is totally experimental, and is basically only there for running trybots before we promote a branch to become a real kernel release.
We cannot robustly support rewriting of history, especially for any repository in the manifest. It just breaks too many assumptions.

It almost certainly has broken our ability to reproduce builds against some versions of ChromeOS. That's something we mostly won't notice, but is super critical in the rare cases where it comes up (mostly on branches).
That said... the tryjob was able to sync using the SHA1 in question. It's currently valid.
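
For anyone who wants to repeat that check by hand, here is a minimal sketch of testing whether a SHA1 currently resolves to a commit in a local checkout. It assumes `git` is on PATH and that the repository has already been fetched; the function name `commit_exists` and its `run` parameter are made up for illustration and are not part of chromite or repo.

```python
import subprocess


def commit_exists(repo_dir, sha, run=subprocess.run):
    """Return True if `sha` resolves to a commit object in the repo at repo_dir.

    Hypothetical helper: `run` is injectable only so the function can be
    exercised without a real git checkout.
    """
    result = run(
        # `git cat-file -e` exits 0 when the object exists; `^{commit}`
        # additionally requires that it peels to a commit.
        ["git", "-C", repo_dir, "cat-file", "-e", f"{sha}^{{commit}}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

Called as `commit_exists("src/third_party/kernel/next", "136704e905fc41e8ec97d2c3422e11fc88e2c54d")`, it only reports on the local object store, so a True result says nothing about whether every GoB shard serves the object.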

Owner: jclinton@chromium.org
Jason, I strongly believe this was GoB flake, unless history was rewritten to remove and then re-add that SHA1.
Hmm. Kind of odd that builders would care about branches unrelated to the build in question. We live and learn.

Ok, I'll try to point the manifest for kernel-next to the kernel-next repository instead. That was done before, so hopefully it won't cause problems if the history in that repository is rewritten.

Labels: -Pri-1 Pri-2
Owner: groeck@chromium.org
Status: Assigned (was: Started)
Status: Started (was: Assigned)
Cc: zwisler@chromium.org
Project Member

Comment 24 by bugdroid1@chromium.org, Oct 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/c0b8703f5900064d860de1ca62f2a29bb928dcc1

commit c0b8703f5900064d860de1ca62f2a29bb928dcc1
Author: Guenter Roeck <groeck@chromium.org>
Date: Wed Oct 10 08:23:53 2018

Remove support for kernel-next

This reverts commits 6097e42c684a64cb2363c2f719cbff65d29b1e5a
and 27761a51c322c3b347d13bb1a18da103120066da.

We have to remove all remnants of kernel-next before we can point it to
another repository.

BUG= chromium:892292 

Change-Id: I5957ecdf6be35928eddd341eb88dbd7171004487
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1271816
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[delete] https://crrev.com/0bd4918213607e291e0340cec65c945a3bfc7b97/sys-kernel/chromeos-kernel-next/chromeos-kernel-next-4.19_rc6-r13.ebuild
[rename] https://crrev.com/c0b8703f5900064d860de1ca62f2a29bb928dcc1/virtual/linux-sources/linux-sources-1-r16.ebuild
[delete] https://crrev.com/0bd4918213607e291e0340cec65c945a3bfc7b97/sys-kernel/chromeos-kernel-next/chromeos-kernel-next-9999.ebuild
[delete] https://crrev.com/0bd4918213607e291e0340cec65c945a3bfc7b97/sys-kernel/chromeos-kernel-next/files/chromeos-version.sh
[modify] https://crrev.com/c0b8703f5900064d860de1ca62f2a29bb928dcc1/virtual/linux-sources/linux-sources-1.ebuild
[delete] https://crrev.com/0bd4918213607e291e0340cec65c945a3bfc7b97/sys-kernel/chromeos-kernel-next/metadata.xml

Could you copy the commits referenced by the manifest files between 11085.0.0 and 11145.0.0 from kernel-next to the kernel git repo? Otherwise we cannot 'repo sync' using those release version manifests.

To find the SHAs:

1) git clone https://chrome-internal.googlesource.com/chromeos/manifest-versions
2) Look in <git>/buildspecs for pinned manifests in XML format.
3) Look at the kernel repo entries.
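
Steps 2 and 3 could be scripted roughly as follows. This is only a sketch: it takes the text of one pinned manifest and returns the kernel project entries, and it assumes the entries look like the `<project .../>` line quoted earlier in this bug. The function name `kernel_revisions` and the second project in the sample are hypothetical; walking the files under buildspecs is left out because the directory layout is not described here.

```python
import xml.etree.ElementTree as ET

# Project name taken from the buildspec entry quoted in this bug.
KERNEL_PROJECT = "chromiumos/third_party/kernel"


def kernel_revisions(manifest_xml, project=KERNEL_PROJECT):
    """Return (path, revision, upstream) for each matching <project> entry."""
    root = ET.fromstring(manifest_xml)
    return [
        (p.get("path"), p.get("revision"), p.get("upstream"))
        for p in root.iter("project")
        if p.get("name") == project
    ]


if __name__ == "__main__":
    # Sample manifest: the first entry is from this bug, the second is a
    # made-up project included only to show the filtering.
    sample = """<manifest>
  <project name="chromiumos/third_party/kernel" path="src/third_party/kernel/next" revision="136704e905fc41e8ec97d2c3422e11fc88e2c54d" upstream="refs/heads/next/chromeos-next"/>
  <project name="chromiumos/platform/example" path="src/platform/example" revision="0000000000000000000000000000000000000000"/>
</manifest>"""
    for path, revision, upstream in kernel_revisions(sample):
        print(path, revision, upstream)
```

Running it over each manifest in the 11085.0.0 to 11145.0.0 range and collecting the revisions whose path is the kernel-next checkout would give the list of commits to copy.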
Status: Fixed (was: Started)
Immediate issue with CI system is resolved. Please use a new bug to track kernel and kernel-next history discussion.
