
Issue 898509


Issue metadata

Status: Fixed
Owner:
Closed: Dec 12
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Debug symbols failing for M71 and M72/ToT

Project Member Reported by kbleicher@google.com, Oct 24

Issue description

Debug symbols failing for a number of M71 boards with the last two M71 builds and recent M72 ToT Builds.

M71:
2018-10-24 00:51	11151.11.0	71.0.3578.21
2018-10-23 00:30	11151.10.0	71.0.3578.18

M72:
Recent as well

Also note that it's not failing for all boards this time, but for most.


 
Labels: ReleaseBlock-Beta
Once resolved we'll need to backfill for 11151.11.0 / 71.0.3578.21
As a person going through this process for the first few times, is there an alert that announces that the debug symbol upload process is failing?  

It seems that, so far, we usually find this out at release time when looking at a dashboard. An alert at a minimum, and ideally automatic bug filing, would be helpful.
Owner: dgarr...@chromium.org
Status: Assigned (was: Untriaged)
Over to current oncaller.
Is this the problem with the symbols server crashing during symbol upload? Looking for logs of the failure.

Sample build:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8931791811345421664

DebugInfoCheck:

This is only a warning, but seems relevant.

06:46:01: INFO: RunCommand: cros_sdk -- debug_info_test /build/eve/usr/lib/debug
  /build/eve/usr/lib/debug/opt/intel/fw_parser.debug failed check: check_exist: check_debug_info
  07:07:20: WARNING: Traceback (most recent call last):
    File "/b/swarming/w/ir/cache/cbuild/repository/chromite/cbuildbot/stages/generic_stages.py", line 702, in Run
      self.PerformStage()
    File "/b/swarming/w/ir/cache/cbuild/repository/chromite/cbuildbot/stages/test_stages.py", line 592, in PerformStage
      cros_build_lib.RunCommand(cmd, enter_chroot=True)
    File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/cros_build_lib.py", line 647, in RunCommand
      raise RunCommandError(msg, cmd_result)
  RunCommandError: return code: 1; command: cros_sdk -- debug_info_test /build/eve/usr/lib/debug


Debug Symbols:

The only real failure I see is this:

  07:35:04: INFO: Uploading symbol_file: chrome/5215AF1973DA1BBE933AC2F407D00CD80/chrome.sym
  08:16:52: WARNING: could not upload: chrome.sym: HTTP 400: Bad Request

Everything else appears to have uploaded normally. I believe that means this is the problem which we have already escalated to the crash server team, but I don't have a link to the relevant bugs.
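For anyone wanting to spot these failures without reading the whole stage log, here is a minimal sketch that scans log text for warning lines shaped like the one quoted above. The regular expression is inferred from that single sample line, so the exact format is an assumption; the function name `find_failed_uploads` is hypothetical and not part of chromite.

```python
import re

# Pattern inferred from the one warning line quoted in this thread, e.g.
#   08:16:52: WARNING: could not upload: chrome.sym: HTTP 400: Bad Request
# Other failure formats may exist; this is an assumption, not a spec.
FAILED_UPLOAD = re.compile(
    r"^\s*(?P<time>\d{2}:\d{2}:\d{2}): WARNING: could not upload: "
    r"(?P<name>[^:]+): HTTP (?P<status>\d{3}): (?P<reason>.*)$"
)


def find_failed_uploads(log_text):
    """Return (time, file name, HTTP status, reason) for each failed upload."""
    failures = []
    for line in log_text.splitlines():
        match = FAILED_UPLOAD.match(line)
        if match:
            failures.append((
                match["time"],
                match["name"],
                int(match["status"]),
                match["reason"].strip(),
            ))
    return failures


# The two log lines quoted in the comment above.
SAMPLE_LOG = """\
  07:35:04: INFO: Uploading symbol_file: chrome/5215AF1973DA1BBE933AC2F407D00CD80/chrome.sym
  08:16:52: WARNING: could not upload: chrome.sym: HTTP 400: Bad Request
"""

if __name__ == "__main__":
    for failure in find_failed_uploads(SAMPLE_LOG):
        print(failure)
```

Grouping the results by HTTP status would also make it easy to tell the earlier 500s apart from 400s like this one.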
Re #2) No alerts.

We intentionally treat these failures as warnings only, because the crash server upload process is much too flaky to consider this a real failure. I would love to revamp the upload process to be MUCH faster and more efficient. We upload the symbols tarball to GS in about a minute, but spend about an hour attempting to upload them to the crash server.
Hrm, I guess the difference between the reports boils down to the HTTP status: it was 500s before and now it's 400s. I think we should just reopen b/117235960.

Status: Started (was: Assigned)
I reopened that bug and asked for feedback.
b/117235960 is reporting this as resolved, and confirmed via M72: 11191.0.0 / 72.0.3589.0

We'll need to backfill symbols for 11151.11.0 / 71.0.3578.21 since it's the DEV / Beta Candidate, however.  Let's keep this open until the backfill is complete.  Thanks

Owner: kbleicher@chromium.org
Kevin, the backfill instructions are in the DebugSymbolsUpload stage logs for each of the failed uploads that you are interested in. Can you run those on your workstation?
Respectfully, this was an infra issue so hoping infra can resolve. We're quite time pressured on other aspects of the release at the moment...  Thanks
Cc: dgarr...@chromium.org
If this is an infra failure, would this be in the CI Bobby's jurisdiction?
We own most build related infra. Debug symbols is a bit on the fuzzy side.

I would say that this is something that is scripted at the build level, but not at the release level. Having a script for that would be really helpful for situations like this.
vapier@, can you assist with the backfill as you did last time?  Critical we get these in place.  For beta too when we go to push that.
Sample instructions:

  08:07:01: NOTICE: upload_symbols --failed-list gs://chromeos-image-archive/expresso-release/R72-11185.0.0/failed_upload_symbols.list gs://chromeos-image-archive/expresso-release/R72-11185.0.0/debug_breakpad.tar.xz

This can be bulk uploaded with a command like:

gsutil ls gs://chromeos-image-archive/*-release/R71-11151.11.0/debug_breakpad.tar.xz | xargs -n 20 bin/upload_symbols

I would expect that to take somewhere between 8 and 150 hours to run.


Note: I'm running that now.
Slightly revised command:

gsutil ls gs://chromeos-image-archive/*-release/R71-11151.11.0/debug_breakpad.tar.xz | xargs bin/upload_symbols --dedupe --yes
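As a rough illustration of what the two pipelines above do, the sketch below splits a list of archive URLs into batches (like `xargs -n 20` in the first command) and prefixes each batch with the flags from the revised command. Combining the batch size with `--dedupe --yes` is an assumption made for illustration, since the revised command drops `-n 20`. The helper name `build_commands` and the `example-board-*` URLs are hypothetical. The code only builds argument lists and does not run anything.

```python
def build_commands(archive_urls, batch_size=20):
    """Split archive URLs into upload_symbols invocations, xargs -n style.

    Only flags that appear in this thread are used (--dedupe, --yes).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    base = ["bin/upload_symbols", "--dedupe", "--yes"]
    return [
        base + archive_urls[start:start + batch_size]
        for start in range(0, len(archive_urls), batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical stand-ins for the output of `gsutil ls`; real board
    # names would come from the bucket listing.
    urls = [
        "gs://chromeos-image-archive/example-board-%d-release/"
        "R71-11151.11.0/debug_breakpad.tar.xz" % n
        for n in range(45)
    ]
    for command in build_commands(urls):
        print(len(command) - 3, "archives:", " ".join(command[:3]), "...")
```

Smaller batches mean a single failure interrupts less work, at the cost of more invocations.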

Don, should I expect to see this resolve / update in GoldenEye once it completes?

https://cros-goldeneye.corp.google.com/chromeos/console/viewRelease?releaseName=M71-DEV-CHROMEOS-7

Thx
Nope. That's based only on the success of the upload by the build.

BTW: The upload is still running. Very, very roughly, the current rate suggests about 60 hours of total run time.
Still backfilling, I assume? Is it progressing, stuck, ...?
Still backfilling, but it seems to have finished uploading symbols for Chrome and is now backfilling a bunch of small system binaries.
For what it's worth, it's going at roughly one file per 7 seconds.
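To put the two figures mentioned in this thread side by side, here is a back-of-the-envelope calculation. It assumes the 7-seconds-per-file rate holds for the whole run, which is unlikely to be exact: the two numbers were reported at different times, and the large Chrome symbol files probably uploaded more slowly. Treat the result as an order of magnitude only.

```python
SECONDS_PER_FILE = 7   # "roughly one file per 7 seconds"
TOTAL_HOURS = 60       # "about 60 hours total run time"


def implied_file_count(total_hours, seconds_per_file):
    """Number of files a run of this length would cover at a steady rate."""
    return round(total_hours * 3600 / seconds_per_file)


def hours_for(file_count, seconds_per_file):
    """Run time in hours for a given number of files at a steady rate."""
    return file_count * seconds_per_file / 3600


if __name__ == "__main__":
    files = implied_file_count(TOTAL_HOURS, SECONDS_PER_FILE)
    print(files)  # 30857
    print(round(hours_for(files, SECONDS_PER_FILE), 1))  # 60.0
```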
Got it, should I expect to see the symbols when I view the RC in GoldenEye?

https://cros-goldeneye.corp.google.com/chromeos/console/viewRelease?releaseName=M71-DEV-CHROMEOS-7
No, that's based on build results, not based on the symbols actually being uploaded.
Thanks; nice to confirm :-)
Labels: -ReleaseBlock-Beta
I'm going to remove as a blocker since the issue is resolved and the backfill is progressing.
Status: Fixed (was: Started)
