Issue 864590

Issue metadata

Status: Fixed
Owner:
Closed: Jul 17
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Release builders are failing DebugSymbols with 403: Forbidden when uploading

Project Member Reported by bhthompson@google.com, Jul 17

Issue description

Some builders pass and some fail. For example, paine and yuna should be functionally identical, yet yuna passes and paine fails.

Failing case:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8940761309377842448
https://luci-logdog.appspot.com/v/?s=chromeos/buildbucket/cr-buildbucket.appspot.com/8940761309377842448/+/steps/DebugSymbols/0/stdout
...
06:05:01: INFO: Starting new HTTPS connection (1): isolateserver.appspot.com
06:05:02: INFO: Queried 100 files, 92 cache hit
06:05:02: INFO: Uploading symbol_file: chrome/DAAC907241B34718FC7BEC0BA1F5D27D0/chrome.sym
06:57:48: WARNING: could not upload: chrome.sym: HTTP 403: Forbidden
06:57:48: INFO: Uploading symbol_file: nacl_helper/2E6FF4CAFB0BEB0F6EC0AAEEE03D6D510/nacl_helper.sym
07:03:51: WARNING: could not upload: nacl_helper.sym: HTTP 403: Forbidden
07:03:51: INFO: Uploading symbol_file: chromedriver/513F138808B9573F178412A8187AE45D0/chromedriver.sym
07:06:42: WARNING: could not upload: chromedriver.sym: HTTP 403: Forbidden
07:06:42: INFO: Uploading symbol_file: shill/0B92C6811C22C42935C0CE449369A0FA0/shill.sym
07:09:09: WARNING: could not upload: shill.sym: HTTP 403: Forbidden
...

Passing case:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8940761308492253520
https://luci-logdog.appspot.com/v/?s=chromeos/buildbucket/cr-buildbucket.appspot.com/8940761308492253520/+/steps/DebugSymbols/0/stdout
...
06:06:30: INFO: Starting new HTTPS connection (1): isolateserver.appspot.com
06:06:35: INFO: Queried 100 files, 23 cache hit
06:06:36: INFO: Uploading symbol_file: chrome/0664D0DDFC820EBACEF9321BD61C01470/chrome.sym
06:16:16: INFO: upload of  605699110 bytes took 0:09:40.027021
06:16:16: INFO: Push state size: 93
06:16:16: INFO: Resetting dropped connection: isolateserver.appspot.com
06:16:16: INFO: sent chrome.sym
06:16:16: INFO: Uploading symbol_file: nacl_helper/A6D21C68F4278C9CD8E59B6EB2B3413F0/nacl_helper.sym
06:17:49: INFO: upload of   84720608 bytes took 0:01:32.867301
06:17:49: INFO: Push state size: 98
06:17:49: INFO: sent nacl_helper.sym
06:17:50: INFO: Uploading symbol_file: deqp-gles3/D295A8F45E6201EA46A0A2084B11397F0/deqp-gles3.sym
06:17:51: INFO: upload of   27382370 bytes took 0:00:00.903543
06:17:51: INFO: Push state size: 97
06:17:51: INFO: sent deqp-gles3.sym
...

Do we have some problem talking to the isolate server? Is something on the build machine itself failing to authenticate?
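To compare builders quickly, the failed uploads can be pulled out of a DebugSymbols log with a small script. This is only a sketch: it assumes the warning lines always look like the ones quoted above, and the function name and sample text are made up for illustration.

```python
import re

# Matches warning lines in the format quoted in the failing log, e.g.:
#   06:57:48: WARNING: could not upload: chrome.sym: HTTP 403: Forbidden
FAILED_UPLOAD = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}): WARNING: could not upload: "
    r"(?P<name>\S+): HTTP (?P<status>\d{3}): (?P<reason>.+)$"
)


def failed_uploads(log_text):
    """Return (symbol file name, HTTP status) pairs for failed uploads.

    Hypothetical helper; it only understands the line format shown above.
    """
    failures = []
    for line in log_text.splitlines():
        match = FAILED_UPLOAD.match(line.strip())
        if match:
            failures.append((match.group("name"), int(match.group("status"))))
    return failures


# Sample lines copied from the failing log in this report.
SAMPLE_LOG = """\
06:05:02: INFO: Uploading symbol_file: chrome/DAAC907241B34718FC7BEC0BA1F5D27D0/chrome.sym
06:57:48: WARNING: could not upload: chrome.sym: HTTP 403: Forbidden
06:57:48: INFO: Uploading symbol_file: nacl_helper/2E6FF4CAFB0BEB0F6EC0AAEEE03D6D510/nacl_helper.sym
07:03:51: WARNING: could not upload: nacl_helper.sym: HTTP 403: Forbidden
"""

for name, status in failed_uploads(SAMPLE_LOG):
    print(f"{name}: HTTP {status}")
```

A log from a passing builder should produce an empty list.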
 
looks like a flake on the crash side, not the isolate side.  this doesn't block the build, and you can recover the symbols by manually running upload_symbols, pointing the --failed-list option at the uploaded failed list and specifying the breakpad tarball.  both should be in the gs bucket for the specific build.
I believe there is an IP address whitelist that needs to be updated to include:

104.154.112/24

We have 256 new GCE swarm bots in that IP range. If I'm right, then builds which happen to land on a builder with one of the new IPs will fail, but builds that land on a builder with an old IP will pass.

It's hard to identify which IP address range a builder is in from its name; the only way is to look up the name in pantheon.
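Once a builder's address is known, checking it against the new range is simple. A minimal sketch, assuming the shorthand 104.154.112/24 above means 104.154.112.0/24; the sample addresses are invented and do not belong to any real builder.

```python
import ipaddress

# Assumption: "104.154.112/24" is shorthand for 104.154.112.0/24.
NEW_BOT_RANGE = ipaddress.ip_network("104.154.112.0/24")


def in_new_bot_range(address):
    """Return True if the address falls inside the new GCE bot range."""
    return ipaddress.ip_address(address) in NEW_BOT_RANGE


# Hypothetical addresses, for illustration only.
print(in_new_bot_range("104.154.112.37"))  # True
print(in_new_bot_range("104.154.113.5"))   # False
print(NEW_BOT_RANGE.num_addresses)         # 256
```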

When we figure out where/how to update the IP whitelist, we should document that in go/cros-builder-address-whitelisting
I didn't look through all examples, but the failing build linked above is from the new IP address range while the passing build is not.
We started failing around 10840.0.0 entirely. 

It looks like these got into this state, with some builders still working, around 10853.0.0 on 7/6.

Is this something we need to talk to the crash folks about then?

Normally I would expect no debug symbols to block the release itself; we want to be able to see crashes and such, right? We can recover in this case. It sounds like it may be painful to run upload_symbols by hand on every one of these boards, but that may be worth it here.
Owner: dgarr...@chromium.org
The IP whitelisting sounds like the issue, my comment is outdated. 
> Normally I would expect no debug symbols to block the release itself

TPMs are the ones who designed this policy because of random symbol upload flakes, and because it's trivial to recover.  you have to run the upload_symbols command once per failing bot.  we prob could even add a logging line so people can copy & paste the command.

> we want to be able to see crashes

you still get crashes, you just won't get symbols until the symbol files are uploaded.  and crashes don't happen until the images are pushed, so you've got time.
i don't think that's where the whitelist is maintained.  i think you want to file a b/ here:
  https://b.corp.google.com/issues/new?component=25229
That makes sense. Isolate was working, but I was looking at isolate whitelists.
Status: Fixed (was: Untriaged)
This should be fixed. Please ping if it keeps happening.


I updated our docs as well.
if you need help recovering the symbols for those builds rather than trying to kick off another set, let us know
I think this particular build is dead for a separate blocking bug, so we are probably ok without any manual symbol intervention here.
Project Member

Comment 15 by bugdroid1@chromium.org, Jul 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/f85b60ef1e734ec4af62eb1b93f7ed700b3e9a03

commit f85b60ef1e734ec4af62eb1b93f7ed700b3e9a03
Author: Mike Frysinger <vapier@chromium.org>
Date: Thu Jul 26 22:52:07 2018

DebugSymbolsStage: fix & improve handling of failed uploads

It's possible to use upload_symbols with the failed symbol list to
recover a partial symbol upload failure.  However, the exact way to
run that command is not obvious, especially across multiple builds.
Add a logging line so users can just copy & paste the right command.

While adding unittests for this, I noticed that while we intended to
upload the failed file list when the upload command failed, we threw
the exception too early for that.  This was compounded by the unittest
also throwing the wrong exception, so while we overall made sure the
stage continued to pass in face of failures, we swallowed too many.
So change UploadSymbols to delay throwing DebugSymbolsUploadException
until the end, and fix the unittest that checks this behavior to throw
failures_lib.BuildScriptFailure.

BUG= chromium:864590 
TEST=unittests pass

Change-Id: I6df08e3be7c3df08badab7cf9c15a900fcbdec8f
Reviewed-on: https://chromium-review.googlesource.com/1142595
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/f85b60ef1e734ec4af62eb1b93f7ed700b3e9a03/cbuildbot/stages/artifact_stages_unittest.py
[modify] https://crrev.com/f85b60ef1e734ec4af62eb1b93f7ed700b3e9a03/cbuildbot/stages/artifact_stages.py
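The "delay throwing until the end" idea from the commit message can be sketched as below. This is not the chromite code: UploadError, run_symbol_upload, and both callbacks are hypothetical stand-ins. It only shows the ordering, where the failure is remembered, the failed list is published, and only then is the exception raised.

```python
class UploadError(Exception):
    """Hypothetical stand-in for a symbol upload failure."""


def run_symbol_upload(upload, publish_failed_list):
    """Run the upload; on failure, publish the failed list before raising.

    Both arguments are caller-supplied callables (assumptions for this
    sketch, not real chromite APIs).
    """
    failure = None
    try:
        upload()
    except UploadError as exc:
        # Remember the failure instead of propagating it immediately.
        failure = exc

    if failure is not None:
        # The failed list is still published, so the upload can be
        # retried by hand later.
        publish_failed_list()
        raise failure


# Usage: a failing upload still publishes the failed list.
calls = []


def failing_upload():
    calls.append("upload")
    raise UploadError("HTTP 403: Forbidden")


try:
    run_symbol_upload(failing_upload, lambda: calls.append("publish"))
except UploadError as exc:
    print("upload failed:", exc)

print(calls)
```

Raising too early, before the publish step, is the ordering bug the commit message describes.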