
Issue 794276

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Jul 24
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Repair failing because of missing R63 artifacts.

Project Member Reported by dgarr...@chromium.org, Dec 12 2017

Issue description

We have a number of DUT repairs failing with messages like the following:

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row11-rack6-host6/380400-repair/20171212124258/

          500 Internal Server Error
          The server encountered an unexpected condition which prevented it from fulfilling the request.
          Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 656, in respond
      response.body = self.handler()
    File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 188, in __call__
      self.body = self.oldhandler(*args, **kwargs)
    File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 34, in __call__
      return self.callable(*self.args, **self.kwargs)
    File "/home/chromeos-test/chromiumos/src/platform/dev/devserver.py", line 796, in is_staged
      response = str(dl.IsStaged(factory))
    File "/home/chromeos-test/chromiumos/src/platform/dev/downloader.py", line 216, in IsStaged
      raise DownloaderException(exceptions)
  DownloaderException: Could not find full_dev_part_*.bin.gz in Google Storage at gs://chromeos-image-archive/lumpy-release/R63-10032.68.0
  Traceback (most recent call last):
    File "/home/chromeos-test/chromiumos/src/platform/dev/build_artifact.py", line 338, in Process
      self.name, self.is_regex_name, timeout)
    File "/home/chromeos-test/chromiumos/src/platform/dev/downloader.py", line 336, in Wait
      (name, self._archive_url))
 
As a test, we manually created an empty file artifact for the lumpy build in question with:

gsutil copy /dev/null gs://chromeos-image-archive/lumpy-release/R63-10032.68.0/full_dev_part_FAKE.bin.gz
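A minimal sketch of why an empty placeholder could satisfy the check, assuming the staging check only looks for an object name matching the wildcard from the error message. The directory listings below are hypothetical, and this is not the devserver's actual lookup code:

```python
import fnmatch

# Wildcard taken from the DownloaderException message above.
PATTERN = "full_dev_part_*.bin.gz"


def matching_artifacts(names, pattern=PATTERN):
    """Return the object names that match the wildcard pattern."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


# Hypothetical listings of the build directory before and after the
# manual gsutil copy; the real contents of the bucket are not shown here.
before = ["chromiumos_test_image.tar.xz", "stateful.tgz"]
after = before + ["full_dev_part_FAKE.bin.gz"]

print(matching_artifacts(before))
print(matching_artifacts(after))
```

Under that assumption, the name alone is enough for the lookup to succeed, regardless of the file's contents.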

Repair appears to be working, but waiting to see for sure.
FYI: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/810021 was implicated and reverted.
The revert is being pushed to prod in an emergency push now.


Re #2: This should have been caught in test_push.

Was it the case that the DUTs in the staging lab did not have this flow turned on, but some in prod did?
If so, we need to make sure that all changes related to the new quick provision work are actually being tested in the staging lab before deployment.
How well does test_push test the repair flow?
So the change to switch over to quick provisioning has not been flipped yet, and I wasn't thinking about older branches (e.g. the repair images) that don't have the new artifacts used by quick provisioning.

I simplified the code by always attempting to stage the artifact, which then failed when the artifact didn't exist.

I reverted the change (https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/822180/1), which Don is emergency pushing. The new version of the change (https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/823231) will only stage the artifact during the normal provisioning flow, and will make staging it optional. The actual quick provisioning code should fall back to the old provisioning method if it fails (e.g. when the artifacts don't exist).
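As a rough illustration of the "optional staging with fallback" idea described above, here is a minimal sketch. It is not the autotest code: `StageError`, `stage_artifacts`, `choose_provision_method`, `fake_stage`, the artifact names, and the R65 build string are all hypothetical names made up for the example.

```python
class StageError(Exception):
    """Hypothetical error raised when an artifact cannot be staged."""


def stage_artifacts(stage, build, required, optional=()):
    """Stage required artifacts, tolerating failures for optional ones.

    `stage` is any callable taking (build, artifact_name). Failures for
    required artifacts propagate; failures for optional ones are skipped.
    Returns the list of optional artifacts that were staged.
    """
    for name in required:
        stage(build, name)
    staged = []
    for name in optional:
        try:
            stage(build, name)
        except StageError:
            continue  # missing optional artifact: carry on without it
        staged.append(name)
    return staged


def choose_provision_method(staged_optional):
    """Use quick provision only if its payload was staged, else fall back."""
    if "quick_provision" in staged_optional:
        return "quick_provision"
    return "autoupdate"


def fake_stage(build, name):
    """Stand-in for a devserver call: old R63 builds lack the new payload."""
    if name == "quick_provision" and build.startswith("R63"):
        raise StageError("%s not found for %s" % (name, build))


old = stage_artifacts(fake_stage, "R63-10032.68.0",
                      ["full_payload"], ["quick_provision"])
print(choose_provision_method(old))
```

With this shape, an old build that lacks the optional payload still provisions through the old path instead of failing outright.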

I can't say whether or not we do a repair against old builds on the staging instance.
Note: We will also do some provisions against fairly old branches because of payload testing, which does provision to the source version tested.
Re #5: 
My intention here is to figure out if there is a testing gap in staging that we can plug.

I don't think I fully understood this, but the takeaway is that this was only failing when installing builds against fairly old images, which we did when staging the repair image.
We still should have staged the repair image in the staging lab.
Repair has many, many different paths, and this only happens for some of them. I don't know whether all repair stages are equivalent.
Project Member

Comment 9 by bugdroid1@chromium.org, Dec 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/8cd4edc7c732cf887c21d80de929a08689654012

commit 8cd4edc7c732cf887c21d80de929a08689654012
Author: David Riley <davidriley@chromium.org>
Date: Wed Dec 13 11:08:53 2017

autotest: Stage quick provision payload

Only stage quick provision payload as part of normal provision workflow
and make stage of quick provision payload optional.

BUG= chromium:794276 
TEST=None

Change-Id: Ic3c367c1b44193c0e35829dd0dea500dad22cd4b
Reviewed-on: https://chromium-review.googlesource.com/823231
Commit-Ready: David Riley <davidriley@chromium.org>
Tested-by: David Riley <davidriley@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/8cd4edc7c732cf887c21d80de929a08689654012/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py

Status: Archived (was: Untriaged)
