
Issue metadata

Status: WontFix
Owner:
Closed: Apr 16
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




android-container-nyc failure got into tree due to ebuild skipping testing

Project Member Reported by jorgelo@chromium.org, Apr 12

Issue description

I have a CL that modifies an Autotest test: https://chromium-review.googlesource.com/c/1010518/

android-container-nyc fails:
=== Start output for job android-container-nyc-4717008-r1 (0m13.7s) ===
android-container-nyc-4717008-r1: >>> Emerging (1 of 1) chromeos-base/android-container-nyc-4717008-r1::cheets-private for /build/betty/
android-container-nyc-4717008-r1:  * cheets_x86_userdebug-target_files-4717008.zip SHA256 SHA512 WHIRLPOOL size ;-) ...            [ ok ]
android-container-nyc-4717008-r1: 13:24:12: INFO: RunCommand: /mnt/host/source/.cache/common/gsutil_4.30.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' cp -v -- gs://chromeos-arc-images/builds/git_nyc-mr1-arc-linux-static_sdk_tools/4717008/aapt /var/cache/chromeos-cache/distfiles/target/aapt.tmp
android-container-nyc-4717008-r1: !!! Fetched file: aapt VERIFY FAILED!
android-container-nyc-4717008-r1: !!! Reason: Failed on SHA256 verification
android-container-nyc-4717008-r1: !!! Got:      a3d82bca505fdded001a33cc347697cdb8c5f50ec3281f65b4e60c05f01d39b5
android-container-nyc-4717008-r1: !!! Expected: 1f314e854fca30643c3a244ce55a75ca4b963f8ef45092d8d28d37e3a5852e17
android-container-nyc-4717008-r1: Refetching... File renamed to '/var/cache/chromeos-cache/distfiles/target/aapt._checksum_failure_.C6ey7E'
android-container-nyc-4717008-r1: 
android-container-nyc-4717008-r1: !!! Couldn't download 'aapt'. Aborting.
android-container-nyc-4717008-r1:  * Fetch failed for 'chromeos-base/android-container-nyc-4717008-r1', Log file:
android-container-nyc-4717008-r1:  *  '/build/betty/tmp/portage/logs/chromeos-base:android-container-nyc-4717008-r1:20180412-202359.log'
android-container-nyc-4717008-r1: >>> Failed to emerge chromeos-base/android-container-nyc-4717008-r1 for /build/betty/, Log file:
android-container-nyc-4717008-r1: >>>  '/build/betty/tmp/portage/logs/chromeos-base:android-container-nyc-4717008-r1:20180412-202359.log'
android-container-nyc-4717008-r1: 
android-container-nyc-4717008-r1:  * Messages for package chromeos-base/android-container-nyc-4717008-r1 merged to /build/betty/:
android-container-nyc-4717008-r1: 
android-container-nyc-4717008-r1:  * Fetch failed for 'chromeos-base/android-container-nyc-4717008-r1', Log file:
android-container-nyc-4717008-r1:  *  '/build/betty/tmp/portage/logs/chromeos-base:android-container-nyc-4717008-r1:20180412-202359.log'
=== Complete: job android-container-nyc-4717008-r1 (0m13.7s) ===
Failed chromeos-base/android-container-nyc-4717008-r1 (in 0m13.7s). Your build has failed.

This seems to be blocking the PreCQ for me.
 
Labels: -Pri-1 Pri-0
Owner: jkop@chromium.org
Status: Assigned
The fix is being tested here:

http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8949420733278071712

But we need to record the series of related events, and figure out how things broke.
Cc: pbe...@chromium.org
Labels: Hotlist-CrOS-Sheriffing
The change that introduced the problem (or at least the one whose revert fixed it) was crrev.com/i/604790.

Comment 5 by dgarr...@chromium.org, Apr 13 (6 days ago)

Can we explain why the Android PFQ passed and performed an uprev after the initial change landed?

Comment 6 by jorgelo@chromium.org, Apr 13 (6 days ago)

I was curious about the same thing that Don is asking about in c#5.

Comment 7 by jkop@chromium.org, Apr 13 (6 days ago)

Cc: jkop@chromium.org
Labels: -Pri-0 Chase-Pending Pri-1
Owner: ----
Status: Available
Summary: android-container-nyc failure got into tree (was: android-container-nyc failing BuildPackages on an unrelated CL)
A good question, which needs followup.

Comment 8 by akes...@chromium.org, Apr 16 (3 days ago)

Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>CI
Labels: -Chase-Pending
Owner: jclinton@chromium.org
Status: Assigned
-> jclinton@

Comment 9 by jclinton@chromium.org, Apr 16 (3 days ago)

Cc: jclinton@chromium.org
Owner: jorgelo@chromium.org
Need more data. How can https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1010518 be related to the SHA256 of aapt changing?
 
The Comment #1 fix being tested doesn't really tell me anything either.

Comment 10 by jorgelo@chromium.org, Apr 16 (3 days ago)

Cc: jorgelo@chromium.org
Owner: jclinton@chromium.org
That was the point, essentially. A change in Autotest should not fail in BuildPackages.  The relevant CL was https://chrome-internal-review.googlesource.com/c/chromeos/overlays/project-cheets-private/+/604790. Reverting that CL fixed the problem. So, how did that CL (604790) make it in the first place? Aviv seems to think you should own figuring this out.

Comment 11 by jclinton@chromium.org, Apr 16 (3 days ago)

I don't understand this section of OS platform. What's going on here? How can these two things possibly be related? You made the change to the software; do you have any guesses?

Comment 12 by jclinton@chromium.org, Apr 16 (3 days ago)

Labels: OS-Chrome
Owner: dgarr...@chromium.org
From offline conversation: the initial bug report was simply that the PreCQ was broken, not that the CL mentioned was the cause of that breakage. It seems that Don knew at the time what the root cause was (Comment #1). Over to him to see if he knows of another bug that tracked that root cause or more information. Assign back to me once info. is added. Thank you.

Comment 13 by jkop@chromium.org, Apr 16 (3 days ago)

Cc: -jkop@chromium.org

Comment 14 by dgarr...@chromium.org, Apr 16 (3 days ago)

Owner: jclinton@chromium.org
From #4, this is the CL that caused the break. https://crrev.com/i/604790

I don't know if the script change or the ebuild change caused the problem (no idea what the actual problem was), just got dragged into reverting a .9999 ebuild revert.

.9999 ebuilds are really just templates that are unused until the next time the relevant ebuild is uprevved. Since the ebuild in question is only uprevved by the Android PFQ, it couldn't have caused a problem until the Android PFQ passed.

So.... how did the Android PFQ pass when the result broke everything else?

Or was the script change the problem?

Or is my analysis wrong somehow?

Comment 15 by jorgelo@chromium.org, Apr 16 (3 days ago)

I think those are the right questions, especially "how did the Android PFQ pass when the result broke everything else?".

Comment 16 by jclinton@chromium.org, Apr 16 (3 days ago)

Cc: khmel@chromium.org
Components: -Infra>Client>ChromeOS>CI Platform>ARC
Owner: khmel@chromium.org
Summary: android-container-nyc failure got into tree due to ebuild skipping testing (was: android-container-nyc failure got into tree)
This ebuild specifically skips SRC_URI in the CQ if the 9999.ebuild is being modified. https://chrome-internal-review.googlesource.com/c/chromeos/overlays/project-cheets-private/+/604790/2/chromeos-base/android-container-nyc/android-container-nyc-9999.ebuild#30 . Later, when the uprev happens, we are no longer hitting the skip logic.

Over to the team who owns this ebuild and the CL in question to fix this. This ebuild should not have a 9999 PV short-circuit of all testing in the CQ. Just delete that entire if statement; I can't think of any reason it should be there.
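
For readers unfamiliar with the pattern, here is a minimal Python model of the kind of version-gated skip described above. It is not the actual ebuild code (that is bash and lives in the linked CL); the function name `src_uri_for`, the `BASE_URL` value, and the file list are hypothetical, chosen only to show why a check run against the 9999 template exercises no fetches while the uprevved ebuild does.

```python
from typing import List

# Hypothetical placeholder, not the real bucket layout.
BASE_URL = "gs://example-bucket/builds"


def src_uri_for(pv: str, build_id: str, files: List[str]) -> List[str]:
    """Return the URIs a package at version `pv` would fetch (simplified model)."""
    if pv == "9999":
        # Template ebuild: nothing is fetched, so no checksum is ever verified.
        return []
    return [f"{BASE_URL}/{build_id}/{name}" for name in files]


# CQ run against the modified 9999 template: empty list, nothing to verify.
template_uris = src_uri_for("9999", "4717008", ["aapt"])

# After the uprev, the versioned ebuild fetches (and verifies) every file.
uprevved_uris = src_uri_for("4717008", "4717008", ["aapt"])

print(template_uris)
print(uprevved_uris)
```

In this model, any breakage in the fetched files stays invisible until the version is no longer 9999, which matches the sequence described in this comment.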

Comment 17 by jclinton@chromium.org, Apr 16 (3 days ago)

Cc: elijahtaylor@chromium.org

Comment 18 by khmel@chromium.org, Apr 16 (3 days ago)

>> So.... how did the Android PFQ pass when the result broke everything else?

The Android PFQ had a non-clean $DISTDIR and did not fetch fresh tools (aapt) from the new Android build. That is why the Manifest was not updated. However, during the build_packages phase the build system fetched the new tool and failed because the Manifest pointed to the old file.

I traced how the ebuild was updating the Manifest, and this is portage behavior. On the one hand, it uses $DISTDIR during the Manifest update. On the other hand, it tries to download new binaries during the build_packages phase.

We have an alternative solution: use a build suffix for such tools. That should avoid this problem.
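
To make the sequence concrete, here is a small Python simulation of the failure mode described in this comment. It is a simplified sketch, not portage code: the helpers `update_manifest` and `fetch_and_verify`, the dictionaries standing in for $DISTDIR and the remote bucket, and the suffixed file name `aapt-4717008` are all assumptions made for illustration.

```python
import hashlib
from typing import Dict


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def update_manifest(distdir: Dict[str, bytes], remote: Dict[str, bytes], name: str) -> str:
    """Record a digest for `name`, reusing a cached copy if one exists (assumed behavior)."""
    if name not in distdir:
        distdir[name] = remote[name]
    return sha256(distdir[name])


def fetch_and_verify(remote: Dict[str, bytes], name: str, expected: str) -> bool:
    """Fetch `name` fresh, as a builder with a clean cache would, and check the digest."""
    return sha256(remote[name]) == expected


old_aapt = b"aapt from the previous Android build"
new_aapt = b"aapt from the new Android build"
remote = {"aapt": new_aapt}

# PFQ builder: a stale cache still holds the old aapt under the same name.
stale_distdir = {"aapt": old_aapt}
manifest_digest = update_manifest(stale_distdir, remote, "aapt")

# The PFQ verifies its cached file against the digest it just recorded: passes.
pfq_ok = sha256(stale_distdir["aapt"]) == manifest_digest

# Any other builder fetches the new file and compares against the old digest: fails.
other_ok = fetch_and_verify(remote, "aapt", manifest_digest)

# With a build suffix in the name, the stale cache entry can no longer be reused.
suffixed = "aapt-4717008"
remote[suffixed] = new_aapt
suffixed_digest = update_manifest(stale_distdir, remote, suffixed)
suffixed_ok = fetch_and_verify(remote, suffixed, suffixed_digest)

print(pfq_ok, other_ok, suffixed_ok)
```

Under these assumptions, the unsuffixed name lets the PFQ pass while every clean builder fails, and the suffixed name forces a fresh fetch so the recorded digest matches what other builders download.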





Comment 19 by khmel@chromium.org, Apr 16 (3 days ago)

Status: WontFix

Comment 20 by khmel@chromium.org, Apr 16 (3 days ago)

I think the problem with the CQ was resolved last week. I have also explained why the Android PFQ passed but the build failed.
