Release builders are failing DebugSymbols with 403: Forbidden when uploading
Issue description

Some builders pass, some builders fail. As an example, we can look at paine and yuna, which should be functionally identical, yet yuna passes and paine fails.

Failing case:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8940761309377842448
https://luci-logdog.appspot.com/v/?s=chromeos/buildbucket/cr-buildbucket.appspot.com/8940761309377842448/+/steps/DebugSymbols/0/stdout

...
06:05:01: INFO: Starting new HTTPS connection (1): isolateserver.appspot.com
06:05:02: INFO: Queried 100 files, 92 cache hit
06:05:02: INFO: Uploading symbol_file: chrome/DAAC907241B34718FC7BEC0BA1F5D27D0/chrome.sym
06:57:48: WARNING: could not upload: chrome.sym: HTTP 403: Forbidden
06:57:48: INFO: Uploading symbol_file: nacl_helper/2E6FF4CAFB0BEB0F6EC0AAEEE03D6D510/nacl_helper.sym
07:03:51: WARNING: could not upload: nacl_helper.sym: HTTP 403: Forbidden
07:03:51: INFO: Uploading symbol_file: chromedriver/513F138808B9573F178412A8187AE45D0/chromedriver.sym
07:06:42: WARNING: could not upload: chromedriver.sym: HTTP 403: Forbidden
07:06:42: INFO: Uploading symbol_file: shill/0B92C6811C22C42935C0CE449369A0FA0/shill.sym
07:09:09: WARNING: could not upload: shill.sym: HTTP 403: Forbidden
...

Passing case:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8940761308492253520
https://luci-logdog.appspot.com/v/?s=chromeos/buildbucket/cr-buildbucket.appspot.com/8940761308492253520/+/steps/DebugSymbols/0/stdout

...
06:06:30: INFO: Starting new HTTPS connection (1): isolateserver.appspot.com
06:06:35: INFO: Queried 100 files, 23 cache hit
06:06:36: INFO: Uploading symbol_file: chrome/0664D0DDFC820EBACEF9321BD61C01470/chrome.sym
06:16:16: INFO: upload of 605699110 bytes took 0:09:40.027021
06:16:16: INFO: Push state size: 93
06:16:16: INFO: Resetting dropped connection: isolateserver.appspot.com
06:16:16: INFO: sent chrome.sym
06:16:16: INFO: Uploading symbol_file: nacl_helper/A6D21C68F4278C9CD8E59B6EB2B3413F0/nacl_helper.sym
06:17:49: INFO: upload of 84720608 bytes took 0:01:32.867301
06:17:49: INFO: Push state size: 98
06:17:49: INFO: sent nacl_helper.sym
06:17:50: INFO: Uploading symbol_file: deqp-gles3/D295A8F45E6201EA46A0A2084B11397F0/deqp-gles3.sym
06:17:51: INFO: upload of 27382370 bytes took 0:00:00.903543
06:17:51: INFO: Push state size: 97
06:17:51: INFO: sent deqp-gles3.sym
...

It looks like we have some problem talking to the isolate server? Is there something on the build machine itself that is failing to authenticate?
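
The failed uploads in a DebugSymbols log like the failing case above can be pulled out mechanically, which helps when comparing many builders. The following is a minimal sketch, not part of chromite: the function name and regular expression are illustrative assumptions, and the only thing taken from the bug is the "WARNING: could not upload: <file>: HTTP <status>" line format seen in the log.

```python
import re

# Assumed pattern, derived only from the warning lines quoted in this bug, e.g.
# "06:57:48: WARNING: could not upload: chrome.sym: HTTP 403: Forbidden".
FAILED_UPLOAD_RE = re.compile(
    r"WARNING: could not upload: (?P<name>\S+): HTTP (?P<status>\d{3})"
)


def failed_uploads(log_text):
    """Return (symbol file name, HTTP status) for each failed upload line."""
    return [
        (match.group("name"), int(match.group("status")))
        for match in FAILED_UPLOAD_RE.finditer(log_text)
    ]


# Lines copied from the failing build's log in this bug.
SAMPLE_LOG = """\
06:05:02: INFO: Uploading symbol_file: chrome/DAAC907241B34718FC7BEC0BA1F5D27D0/chrome.sym
06:57:48: WARNING: could not upload: chrome.sym: HTTP 403: Forbidden
06:57:48: INFO: Uploading symbol_file: nacl_helper/2E6FF4CAFB0BEB0F6EC0AAEEE03D6D510/nacl_helper.sym
07:03:51: WARNING: could not upload: nacl_helper.sym: HTTP 403: Forbidden
"""

if __name__ == "__main__":
    for name, status in failed_uploads(SAMPLE_LOG):
        print(name, status)
```

Because the pattern is searched for anywhere in the line, it still matches if the terminal color codes around the warning are left in the log.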
,
Jul 17
I believe there is an IP address whitelist that needs to be updated to include: 104.154.112/24. We have 256 new GCE swarm bots in that IP range. If I'm right, then builds which happen to land on a builder with one of the new IPs will fail, but builds that land on a builder with an old IP will pass. It's hard to identify which IP address range a builder is in from the name; the only way is to look the name up in Pantheon. When we figure out where/how to update the IP whitelist, we should document that in go/cros-builder-address-whitelisting
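
Once a builder's IP address is known, checking whether it falls in the new range is straightforward. This is a rough sketch using Python's standard ipaddress module. It assumes the shorthand 104.154.112/24 means the network 104.154.112.0/24, and the function name is made up for illustration.

```python
import ipaddress

# Assumption: "104.154.112/24" in the comment above is shorthand for this network.
NEW_BOT_RANGE = ipaddress.ip_network("104.154.112.0/24")


def in_new_bot_range(addr):
    """Return True if addr (a dotted-quad string) is inside the new bot range."""
    return ipaddress.ip_address(addr) in NEW_BOT_RANGE


if __name__ == "__main__":
    # A /24 covers 256 addresses, matching the 256 new bots mentioned above.
    print(NEW_BOT_RANGE.num_addresses)
    print(in_new_bot_range("104.154.112.17"))
```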
,
Jul 17
I didn't look through all examples, but the failing build linked above is from the new IP address range while the passing build is not.
,
Jul 17
We started failing entirely around 10840.0.0. It looks like these got into this state with some working around 10853.0.0 on 7/6. Is this something we need to talk to the crash folks about, then? Normally I would expect no debug symbols to block the release itself; we want to be able to see crashes and such, right? We can recover in this case. It sounds like it may be painful to run upload_symbols by hand on every one of these boards, but that may be worth it here.
,
Jul 17
The IP whitelisting sounds like the issue; my comment is outdated.
,
Jul 17
> Normally I would expect no debug symbols to block the release itself
TPMs are the ones who designed this policy because of random symbol upload flakes, and because it's trivial to recover. you have to run the upload_symbols command once per failing bot. we prob could even add a logging line so people can copy & paste the command.

> we want to be able to see crashes
you still get crashes, you just won't get symbols until the symbol files are uploaded. and crashes don't happen until the images are pushed, so you've got time.
,
Jul 17
Something in this file needs updating: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chrome-infra-auth/ip_whitelist.cfg I just don't see what.
,
Jul 17
i don't think that's where the whitelist is maintained. i think you want to file a b/ here: https://b.corp.google.com/issues/new?component=25229
,
Jul 17
That makes sense. Isolate was working, but I was looking at isolate whitelists.
,
Jul 17
,
Jul 17
This should be fixed. Please ping if it keeps happening.
,
Jul 17
I updated our docs as well.
,
Jul 17
if you need help recovering the symbols for those builds rather than try to kick off another set, let us know
,
Jul 17
I think this particular build is dead for a separate blocking bug, so we are probably ok without any manual symbol intervention here.
,
Jul 26
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/f85b60ef1e734ec4af62eb1b93f7ed700b3e9a03

commit f85b60ef1e734ec4af62eb1b93f7ed700b3e9a03
Author: Mike Frysinger <vapier@chromium.org>
Date: Thu Jul 26 22:52:07 2018

DebugSymbolsStage: fix & improve handling of failed uploads

It's possible to use upload_symbols with the failed symbol list to recover a partial symbol upload failure. However, the exact way to run that command is not obvious, especially across multiple builds. Add a logging line so users can just copy & paste the right command.

While adding unittests for this, I noticed that while we intended to upload the failed file list when the upload command failed, we threw the exception too early for that. This was compounded by the unittest also throwing the wrong exception, so while we overall made sure the stage continued to pass in face of failures, we swallowed too many.

So change UploadSymbols to delay throwing DebugSymbolsUploadException until the end, and fix the unittest that checks this behavior to throw failures_lib.BuildScriptFailure.

BUG= chromium:864590
TEST=unittests pass

Change-Id: I6df08e3be7c3df08badab7cf9c15a900fcbdec8f
Reviewed-on: https://chromium-review.googlesource.com/1142595
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/f85b60ef1e734ec4af62eb1b93f7ed700b3e9a03/cbuildbot/stages/artifact_stages_unittest.py
[modify] https://crrev.com/f85b60ef1e734ec4af62eb1b93f7ed700b3e9a03/cbuildbot/stages/artifact_stages.py
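
The "delay throwing until the end" idea in the commit message above can be illustrated with a small sketch. This is not the chromite implementation: UploadError, upload_all, and the two callbacks are hypothetical names chosen for the example. It only shows the general pattern of recording failures first and raising afterwards, so the failed-file list is not lost.

```python
class UploadError(Exception):
    """Hypothetical stand-in for an exception like DebugSymbolsUploadException."""


def upload_all(symbol_files, upload_one, record_failures):
    """Try every upload, record the failures, and only then raise.

    upload_one and record_failures are hypothetical callbacks: the first
    uploads a single file and raises on error, the second persists the list
    of files that failed so a later manual retry can use it.
    """
    failed = []
    for path in symbol_files:
        try:
            upload_one(path)
        except Exception:  # Broad on purpose in this sketch: keep going.
            failed.append(path)

    if failed:
        # Record the failed list BEFORE raising; raising first would skip it.
        record_failures(failed)
        raise UploadError("failed to upload %d symbol file(s)" % len(failed))
```

Raising inside the loop, or before record_failures runs, would reproduce the kind of bug the commit describes: the stage reports a failure, but the list needed to recover from it is never saved.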
Comment 1 by vapier@chromium.org
, Jul 17