New issue
Advanced search Search tips

Issue 901565 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

autoupdate_Cellular* tests fail to start the update on some devices in pool:cellular but not others

Project Member Reported by dhaddock@google.com, Nov 3

Issue description

Both regular update and forced OOBE over cellular are failing:

https://stainless.corp.google.com/search?view=list&first_date=2018-10-20&last_date=2018-11-02&suite=%5Ecellular%5C_au%24&model=nautilus&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false

autoupdate_CannedOmahaUpdate client test did not pass (reason: Update attempt failed: Command <update_engine_client -update -omaha_url=http://127.0.0.1:39709/update > failed, rc=1, Command returned non-zero exit status).
autoupdate_StartOOBEUpdate client test did not pass (reason: Update status was unexpectedly IDLE when we were waiting for the update to complete. Please check the update engine logs.).
 
Example update_engine.log

[1101/001437:INFO:omaha_request_action.cc(847)] Request: <?xml version="1.0" encoding="UTF-8"?>
<request protocol="3.0" version="ChromeOSUpdateEngine-0.1.0.0" updaterversion="ChromeOSUpdateEngine-0.1.0.0" installsource="ondemandupdate" ismachine="1">
    <os version="Indy" platform="Chrome OS" sp="ForcedUpdate_x86_64"></os>
    <app appid="{85F8FA82-F276-4EA6-8980-93FE091F6D25}" version="ForcedUpdate" track="stable-channel" lang="en-US" board="nautilus" hardware_class="NAUTILUS F3B-A2D-B5W-394-Q8M" delta_okay="true" fw_version="" ec_version="" >
        <event eventtype="13" eventresult="1"></event>
    </app>
</request>

[1101/001437:INFO:libcurl_http_fetcher.cc(146)] Starting/Resuming transfer
[1101/001437:INFO:libcurl_http_fetcher.cc(165)] Using proxy: no
[1101/001437:INFO:libcurl_http_fetcher.cc(286)] Not setting http(s) curl options because we are running a dev/test image
[1101/001437:INFO:libcurl_http_fetcher.cc(433)] HTTP response code: 200
[1101/001437:INFO:libcurl_http_fetcher.cc(509)] Transfer completed (200), 1365 bytes downloaded
[1101/001437:INFO:omaha_request_action.cc(1210)] Omaha request response: 
          <response protocol="3.0">
            <daystart elapsed_seconds="44801"/>
              <app appid="{85F8FA82-F276-4EA6-8980-93FE091F6D25}" status="ok">
              <ping status="ok"/>
                <updatecheck status="ok">
                <urls>
                  <url codebase="https://storage.googleapis.com/"/><url codebase="https://storage.googleapis.com/"/>
                </urls>
                <manifest version="999999.0.0">
                  <packages>
                    <package hash_sha256="71c4b1ac12a01d537204f0663618dfdedacaf9c7f400d8beafe298d9b10043e1" name="chromeos-throw-away-bucket/CrOSPayloads/Cellular/chromeos_R72-11197.0.0_nautilus_full_dev.bin"
                    size="1396035064" required="true"/>
                  </packages>
                  <actions>
                    <action event="postinstall"
                    ChromeOSVersion="999999.0.0"
                    sha256="71c4b1ac12a01d537204f0663618dfdedacaf9c7f400d8beafe298d9b10043e1"
                    needsadmin="false"
                    IsDeltaPayload="false"
                    MaxFailureCountPerUrl="1"
                    DisablePayloadBackoff="true"
                    deadline="now"
                    />
                  </actions>
                </manifest>
              </updatecheck>
            </app>
          </response>
        
[1101/001437:INFO:omaha_request_action.cc(1899)] No device policy is loaded. Assuming rollback enabled.
[1101/001437:INFO:omaha_request_action.cc(1929)] Rollback is enabled. Setting kernel_max_rollforward to 65537
[1101/001437:INFO:metrics_reporter_omaha.cc(609)] Sending 65537 for metric UpdateEngine.KernelKey.MinVersion
[1101/001437:INFO:metrics_reporter_omaha.cc(614)] Sending 65537 for metric UpdateEngine.KernelKey.MaxRollforwardVersion
[1101/001437:INFO:metrics_reporter_omaha.cc(619)] Sending 1 for metric UpdateEngine.KernelKey.MaxRollforwardSetSuccess (bool)
[1101/001437:INFO:action_processor.cc(116)] ActionProcessor: finished OmahaRequestAction with code ErrorCode::kSuccess
[1101/001437:INFO:action_processor.cc(143)] ActionProcessor: starting DownloadAction
[1101/001437:INFO:install_plan.cc(83)] InstallPlan: resume, version: 999999.0.0, source_slot: B, target_slot: A, url: https://storage.googleapis.com/chromeos-throw-away-bucket/CrOSPayloads/Cellular/chromeos_R72-11197.0.0_nautilus_full_dev.bin, payload: (size: 1396035064, metadata_size: 0, metadata signature: , hash: 71C4B1AC12A01D537204F0663618DFDEDACAF9C7F400D8BEAFE298D9B10043E1, payload type: full), hash_checks_mandatory: false, powerwash_required: false, switch_slot_on_reboot: true, run_post_install: true
[1101/001437:INFO:download_action.cc(200)] Marking new slot as unbootable
[1101/001437:INFO:boot_control_chromeos.cc(201)] Marking slot A unbootable
[1101/001437:INFO:multi_range_http_fetcher.cc(45)] starting first transfer
[1101/001437:INFO:multi_range_http_fetcher.cc(74)] starting transfer of range 0+66039
[1101/001437:INFO:libcurl_http_fetcher.cc(146)] Starting/Resuming transfer
[1101/001437:INFO:libcurl_http_fetcher.cc(165)] Using proxy: no
[1101/001437:INFO:libcurl_http_fetcher.cc(286)] Not setting http(s) curl options because we are running a dev/test image
[1101/001437:ERROR:libcurl_http_fetcher.cc(436)] Unable to get http response code.
[1101/001437:INFO:libcurl_http_fetcher.cc(468)] Transfer resulted in an error (0), 0 bytes downloaded
[1101/001437:INFO:libcurl_http_fetcher.cc(482)] No further proxies, indicating transfer complete
[1101/001437:INFO:multi_range_http_fetcher.cc(168)] Received transfer complete.
[1101/001437:INFO:multi_range_http_fetcher.cc(125)] TransferEnded w/ code 0
[1101/001437:INFO:multi_range_http_fetcher.cc(140)] Didn't get enough bytes. Ending w/ failure.
[1101/001437:INFO:action_processor.cc(116)] ActionProcessor: finished DownloadAction with code ErrorCode::kDownloadTransferError
[1101/001437:INFO:action_processor.cc(121)] ActionProcessor: Aborting processing due to failure.
[1101/001437:INFO:update_attempter.cc(993)] Processing Done.
[1101/001437:INFO:cpu_limiter.cc(71)] Setting cgroup cpu shares to  1024
[1101/001437:INFO:cpu_limiter.cc(79)] CPU shares = 1024
[1101/001437:ERROR:update_attempter.cc(1407)] Update failed.
Cc: benchan@chromium.org
The only thing I can see is that it can't receive any data over the cellular. Benchan@ would be the expert on this :)
I have tested at my desk and I cannot reproduce this

autoupdate_Cellular.delta
  astronaut: 5/5 pass
  nautilus: 3/3 pass

autoupdate_ForcedOOBEUpdate.cellular.delta
  astronaut: 3/3 pass
  nautilus: 3/3 pass
However it is still happening in the lab. It looks to be flaky as it is only happening sometimes. 

Latest build the test ran on two different devices: One passed, one failed with update did not start 

Passing: https://stainless.corp.google.com/browse/chromeos-autotest-results/256146464-chromeos-test/

Failing: https://stainless.corp.google.com/browse/chromeos-autotest-results/256146011-chromeos-test/


Cc: -benchan@chromium.org
Owner: benchan@chromium.org
A bit more investigation...

I kicked off the test from my desk against some devices in pool:cellular

chromeos1-row2-rack7-host1 (daisy): Fail 3/3 with the failure in this bug 
chromeos1-row2-rack7-host4 (nautilus): Pass 3/3
chromeos1-row1-rack8-host2 (quawks): Pass 2/3 1 failure for INIT_ERROR: org.chromium.flimflam.Error.Failure: Serial command timed out

chromeos1-row2-rack7-host3 (nautilus): Fail 2/2 with the failure in this bug 
Cc: harpreet@chromium.org
Interestingly the two devices that failed for me are also the two that have last failed in the lab. 

chromeos1-row2-rack7-host1 (daisy)
chromeos1-row2-rack7-host3 (nautilus)
Summary: autoupdate_Cellular* tests fail to start the update on some devices in pool:cellular but not others (was: Cellular AU tests fail to start the update )
Status: Assigned (was: Untriaged)
This issue has an owner, a component and a priority, but is still listed as untriaged or unconfirmed. By definition, this bug is triaged. Changing status to "assigned". Please reach out to me if you disagree with how I've done this.

Sign in to add a comment