Issue 626650

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 626755




Cloud Endpoints proxy server flakes with 404 when buildbot talks to buildbucket

Project Member Reported by machenb...@chromium.org, Jul 8 2016

Issue description

A bunch of CLs got stuck in CQ around 13-15 CET on July 8.

E.g.:
https://codereview.chromium.org/2130293002/

CQ is stuck on this builder:
https://build.chromium.org/p/tryserver.v8/builders/v8_win_rel_ng/builds/10201

The step triggering the triggered trybot went purple, so CQ seems to be blocked on the triggered trybot not running.

Here, the purple trigger step should probably have made the build fail, or the trigger should have retried internally.
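One way the trigger could "retry internally" is to wrap its HTTP call in a bounded retry loop with exponential backoff. This is a minimal sketch, assuming the trigger step can express its request as a function that returns an HTTP status code; `send_with_retry` and `TRANSIENT_STATUSES` are hypothetical names, not part of the real buildbot or buildbucket client.

```python
import time
from typing import Callable

# Hypothetical set of statuses treated as transient. Retrying a 404 only makes
# sense here because the 404s in this bug came from a flaky proxy, not from
# resources that were genuinely missing.
TRANSIENT_STATUSES = frozenset({404, 500, 502, 503})


def send_with_retry(send: Callable[[], int], attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Calls send() until it returns a non-transient status or attempts run out.

    Returns the last status seen, so the caller can still fail the step
    (red) instead of leaving it in an exception state (purple).
    """
    status = send()
    for attempt in range(1, attempts):
        if status not in TRANSIENT_STATUSES:
            break
        sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...
        status = send()
    return status


if __name__ == "__main__":
    responses = iter([404, 404, 200])
    print(send_with_retry(lambda: next(responses), sleep=lambda s: None))
```

The `sleep` parameter is injectable so the backoff can be exercised without real delays; a persistent failure still surfaces as the final status rather than being swallowed.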
 
Components: Infra>CQ
Filed two specific buildbucket bugs: http://crbug.com/626652 and http://crbug.com/626652. This bug should be fixed ASAP to avoid impact on V8 developers, while the ones I filed would fix the root cause (I think).
Also here:
https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2023503002/1890001

The bot v8_linux_mips64el_compile_rel is marked not-started despite lots of idle bots on the tryserver.
The reason for the revert is that, as tAndrii noticed, CQ started failing at about the same time the new version was deployed.
Labels: -Restrict-View-Google
Making this public. Nothing internal.
FTR, the revert didn't really help, at least not visibly. There are still weird 404s in the log for URLs that are certainly valid and return 202 in my browser:

/_ah/api/buildbucket/v1/search?max_builds=40&alt=json&tag=buildset%3Apatch%2Frietveld%2Fcodereview.chromium.org%2F2034083002%2F120001&bucket=master.tryserver.v8&bucket=master.tryserver.chromium.linux
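The query string of that request is percent-encoded. Decoding it with the standard library shows what the failing call asked for: a search in two buckets for builds tagged with a buildset that, by the `patch/rietveld/<host>/<issue>/<patchset>` convention, names Rietveld issue 2034083002, patchset 120001. `describe_search` is a hypothetical helper for illustration only.

```python
from urllib.parse import parse_qs, urlsplit

# The search request from the log above, split across lines for readability.
SEARCH_URL = (
    "/_ah/api/buildbucket/v1/search?max_builds=40&alt=json"
    "&tag=buildset%3Apatch%2Frietveld%2Fcodereview.chromium.org"
    "%2F2034083002%2F120001"
    "&bucket=master.tryserver.v8&bucket=master.tryserver.chromium.linux"
)


def describe_search(url: str) -> tuple[str, dict[str, list[str]]]:
    """Splits a request URL into its path and decoded query parameters.

    parse_qs percent-decodes values (%3A -> ':', %2F -> '/') and keeps
    repeated keys such as 'bucket' as lists, in their original order.
    """
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


if __name__ == "__main__":
    path, params = describe_search(SEARCH_URL)
    print(path)
    for key, values in params.items():
        print(key, values)
```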



Went back to the newest version, 1f0d017.
Labels: -Pri-1 Pri-0
Looks like a typical Cloud Endpoints outage. Pinged the internal bug. Will try a workaround.

Comment 11 by d...@chromium.org, Jul 8 2016

I also observed a wave of 404s last night around 11PM PST. Once the leases expired, BuildBucket recovered and the 404s stopped. The 404s did show up in the GAE logs, but with no additional information. Is that possible if this was an Endpoints outage?

Either way, if this is not still ongoing, we should probably lower the priority.

Do we have an internal bug to track this?
Owner: no...@chromium.org
Status: Started (was: Untriaged)
The internal bug is b/25147957.

I am landing https://codereview.chromium.org/2137583002/, which bypasses the Cloud Endpoints API server proxy and should permanently fix the Endpoints 404s. The context of that change is https://codereview.chromium.org/2117833003/

Comment 14 by bugdroid1@chromium.org, Jul 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/01a9273aee506a2463a290c407eb4cd3599d53d3

commit 01a9273aee506a2463a290c407eb4cd3599d53d3
Author: nodir <nodir@chromium.org>
Date: Fri Jul 08 17:56:58 2016

buildbucket: bypass Cloud Endpoints API server

Cloud Endpoints API server causes occasional 404s and increases in latency.
We don't use the benefits that it provides, so bypass it.

This CL also vendors discovery doc from buildbucket, primarily because
apiclient API does not allow to manipulate API baseURL, but also it
simplifies code.

R=dnj@chromium.org
BUG= 626650 

Review-Url: https://codereview.chromium.org/2137583002

[modify] https://crrev.com/01a9273aee506a2463a290c407eb4cd3599d53d3/scripts/master/buildbucket/__init__.py
[modify] https://crrev.com/01a9273aee506a2463a290c407eb4cd3599d53d3/scripts/master/buildbucket/client.py
[add] https://crrev.com/01a9273aee506a2463a290c407eb4cd3599d53d3/scripts/master/buildbucket/discovery_doc.json
[modify] https://crrev.com/01a9273aee506a2463a290c407eb4cd3599d53d3/scripts/master/buildbucket/status.py
[add] https://crrev.com/01a9273aee506a2463a290c407eb4cd3599d53d3/scripts/master/buildbucket/update_discovery_doc.sh
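The commit message only says the discovery doc was vendored so that the API base URL could be changed. A minimal sketch of what that can look like: a discovery document locates its API through `rootUrl` plus `servicePath` (and the older, combined `baseUrl` field), so rewriting those fields in a local copy redirects every generated request without touching the method definitions. `rebase_discovery_doc` and the `example.com` hosts are hypothetical placeholders, not the code or the backend address used by the real change.

```python
import json


def rebase_discovery_doc(doc_json: str, root_url: str) -> dict:
    """Returns a discovery document whose requests go to root_url.

    Only the location fields are rewritten; resources and methods are
    left exactly as vendored.
    """
    doc = json.loads(doc_json)  # a fresh dict, so the input is never mutated
    if not root_url.endswith("/"):
        root_url += "/"
    doc["rootUrl"] = root_url
    # baseUrl is the older field combining rootUrl and servicePath.
    doc["baseUrl"] = root_url + doc.get("servicePath", "")
    return doc


if __name__ == "__main__":
    # Hypothetical, heavily trimmed vendored document.
    vendored = json.dumps({
        "name": "buildbucket",
        "version": "v1",
        "rootUrl": "https://proxy.example.com/_ah/api/",
        "servicePath": "buildbucket/v1/",
        "baseUrl": "https://proxy.example.com/_ah/api/buildbucket/v1/",
    })
    doc = rebase_discovery_doc(vendored, "https://backend.example.com/api")
    print(doc["baseUrl"])
```

With google-api-python-client, a rewritten document like this would typically be handed to `googleapiclient.discovery.build_from_document` instead of fetching the live discovery doc through the proxy.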

Summary: Cloud Endpoints proxy server flakes with 404 when buildbot talks to buildbucket (was: Several buildbucket errors on V8 CQ)

Comment 17 by bugdroid1@chromium.org, Jul 8 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager.git/+/b26d9382bf13c6a2bb29ac7e57e1e02acf925b3a

commit b26d9382bf13c6a2bb29ac7e57e1e02acf925b3a
Author: nodir <nodir@google.com>
Date: Fri Jul 08 18:01:35 2016

The problem is fixed for master.tryserver.infra.
https://codereview.chromium.org/2133263002/ removes the whitelist of buckets, so the fix will apply to all masters (the masters still need to be restarted).

Comment 20 by bugdroid1@chromium.org, Jul 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/98030827a18cd840b8853d38ddd3c27c2f85aafb

commit 98030827a18cd840b8853d38ddd3c27c2f85aafb
Author: nodir <nodir@chromium.org>
Date: Fri Jul 08 18:26:53 2016

buildbucket: always bypass cloud endpoints server

The whitelisted master.tryserver.infra just worked, so apply the change
to all masters to fix them

R=dnj@chromium.org, tandrii@chromium.org
BUG= 626650 

Review-Url: https://codereview.chromium.org/2133263002

[modify] https://crrev.com/98030827a18cd840b8853d38ddd3c27c2f85aafb/scripts/master/buildbucket/client.py

Cc: shey...@chromium.org machenb...@chromium.org
Labels: -Pri-0 Pri-1
Workarounds have landed, but so far only master.tryserver.infra has picked them up.

The worst impact this problem may cause is:
1) longer cycle time because **some** completed builds are retried
2) longer pending queues

However, we don't see unusually long pending queues in http://shortn/_WYRUuURIGt. Restarting the Chromium tryservers now would restart **all** builds, which would probably be worse than not restarting.

Let's restart tryservers in the evening.
Blockedon: 626755

Comment 23 by no...@chromium.org, Jul 18 2016

Status: Fixed (was: Started)
The change in #20 helped.
Issue 626768 has been merged into this issue.
Labels: CloudEndpoints404
