
Issue 843393

Starred by 3 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Internal server error when viewing a build

Project Member Reported by vadimsh@chromium.org, May 15 2018

Issue description

Owner: tandrii@chromium.org
Status: Assigned (was: Untriaged)
I bet this is my recent refactoring.
Labels: -Pri-3 Pri-0
Looks like it is affecting even more "normal" builds, e.g. this CI build:

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/47286

Please revert the deployment first.
Labels: -Pri-0 Pri-1
Status: Started (was: Assigned)
hinoka@ rolled back.

final fix in https://chromium-review.googlesource.com/#/c/infra/luci/luci-go/+/1060580
Project Member

Comment 4 by bugdroid1@chromium.org, May 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/5ca4cf2123d3ec046d94e09e8cb37f451836297f

commit 5ca4cf2123d3ec046d94e09e8cb37f451836297f
Author: Andrii Shyshkalov <tandrii@chromium.org>
Date: Wed May 16 00:17:02 2018

[milo] fix regression for builds w/o associated gitiles/commit buildset.

Regression was introduced in https://crrev.com/c/1060467.

R=hinoka@chromium.org

Bug: 843393, 843245
Change-Id: If5c51bc609562ffb6be0af792ad30c0b8d0ccbbf
Reviewed-on: https://chromium-review.googlesource.com/1060580
Reviewed-by: Ryan Tseng <hinoka@chromium.org>
Commit-Queue: Andrii Shyshkalov <tandrii@chromium.org>

[modify] https://crrev.com/5ca4cf2123d3ec046d94e09e8cb37f451836297f/milo/buildsource/buildbucket/build.go

Status: Fixed (was: Started)
Deployed 3063-5ca4cf2 which has this fixed.

Comment 6 by no...@chromium.org, May 16 2018

Were we paged? I don't see it in https://o.corp.google.com/#Tickets:chrome-infra::::chrome-ops-foundation
We should have been.

Comment 7 by estaab@chromium.org, May 16 2018

Cc: estaab@chromium.org
We weren't. Some users complained in Hangouts. My guess is that QPS from builder page views (done manually by humans) is too small (compared to various automated calls) for the LuciMilo5xxRateHigh alert to fire :-/

Comment 9 by no...@chromium.org, May 16 2018

Maybe we should have a separate metric of panics? Our code should never panic. I think we want to be alerted on at least one panic. WDYT?
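
A minimal sketch of the "alert on at least one panic" idea, assuming a plain net/http handler chain. The names countPanics, panicCount, panicAlert, serveOnce and the two demo handlers are hypothetical; this is not Milo's actual middleware, and a real service would report to its monitoring system instead of a package-level counter.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// panicCount is a hypothetical stand-in for a real monitoring counter.
var panicCount int64

// countPanics wraps a handler: it recovers from any panic, increments the
// counter, and replies with HTTP 500 instead of dropping the connection.
func countPanics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				atomic.AddInt64(&panicCount, 1)
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// panicAlert reports whether at least one panic has been recorded.
func panicAlert() bool { return atomic.LoadInt64(&panicCount) > 0 }

// build is a toy type used only to trigger a nil dereference below.
type build struct{ ID string }

// crashing simulates a handler bug: it dereferences a nil pointer.
var crashing = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	var b *build
	fmt.Fprintln(w, b.ID) // panics before anything is written
})

// healthy is a handler that always succeeds.
var healthy = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "ok")
})

// serveOnce sends one GET request to h and returns the response status code.
func serveOnce(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code
}

func main() {
	fmt.Println("healthy status:", serveOnce(countPanics(healthy), "/ok"))
	fmt.Println("crashing status:", serveOnce(countPanics(crashing), "/build"))
	fmt.Println("alert:", panicAlert())
}
```

Because the counter is independent of total request volume, a single panic on a rarely visited page is enough to trip it.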

Comment 10 by no...@chromium.org, May 16 2018

Or rather, should we have a lower threshold for URL paths accessed by humans? Humans are less tolerant of HTTP 500s.
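
To illustrate why a single rate-based alert can miss this kind of outage, here is a rough sketch that computes the 5xx rate per traffic class instead of over all requests. The request counts and the 1% threshold are made up for illustration, and the counts type and firing function are hypothetical, not the real LuciMilo5xxRateHigh definition.

```go
package main

import "fmt"

// counts holds request and 5xx totals for one traffic class in one window.
type counts struct {
	requests int
	errors   int
}

// rate returns the fraction of requests that failed with a 5xx.
func (c counts) rate() float64 {
	if c.requests == 0 {
		return 0
	}
	return float64(c.errors) / float64(c.requests)
}

// firing reports whether the error rate exceeds the given threshold.
func firing(c counts, threshold float64) bool {
	return c.rate() > threshold
}

func main() {
	const threshold = 0.01 // assumed 1% threshold, for illustration only

	human := counts{requests: 40, errors: 20}         // page views by people
	automated := counts{requests: 100000, errors: 30} // scripted calls
	combined := counts{
		requests: human.requests + automated.requests,
		errors:   human.errors + automated.errors,
	}

	fmt.Printf("combined: rate=%.5f firing=%v\n", combined.rate(), firing(combined, threshold))
	fmt.Printf("human:    rate=%.5f firing=%v\n", human.rate(), firing(human, threshold))
}
```

With these numbers half of the human page views fail, yet the combined rate stays far below the threshold; only the per-class rate fires.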
Cc: vadimsh@chromium.org
Owner: ----
Status: Available (was: Fixed)
Do we have 500s that are expected and acceptable? Maybe non-zero is enough to alert if not.

(Technically this should stay Fixed and we should open another bug to track fixing the alerting, but I didn't want to lose this or the context.)
500s on backends (cron / pubsub) aren't expected, but are acceptable. They're usually due to datastore (or memcache, if ds_cache is on strict) flakes.

500s on prpc endpoints are neither expected nor ideal, but usually aren't user-visible.

500s on frontends are neither expected nor acceptable, but setting the threshold too low will alert us on things that aren't always actionable (e.g. a short-term datastore flake or gerrit flake).
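
One possible way to reconcile "frontend 500s are not acceptable" with "short flakes are not actionable" is to require several consecutive bad windows before alerting, with a different policy per endpoint class. The sketch below assumes that approach; the policy values, the class names used as map keys, and the sustained function are all hypothetical.

```go
package main

import "fmt"

// policy describes when a class of endpoints should alert.
type policy struct {
	minErrors  int // 5xx count at or above which a window counts as bad
	minWindows int // consecutive bad windows required before alerting
}

// policies holds made-up values: frontends are strictest, backends loosest.
var policies = map[string]policy{
	"frontend": {minErrors: 1, minWindows: 3},
	"prpc":     {minErrors: 5, minWindows: 3},
	"backend":  {minErrors: 50, minWindows: 6},
}

// sustained reports whether errorsPerWindow (oldest first) ends with at
// least p.minWindows consecutive bad windows.
func sustained(p policy, errorsPerWindow []int) bool {
	run := 0
	for _, n := range errorsPerWindow {
		if n >= p.minErrors {
			run++
		} else {
			run = 0 // a good window resets the streak
		}
	}
	return run >= p.minWindows
}

func main() {
	flake := []int{0, 4, 0, 0}  // one bad window, then recovery
	outage := []int{0, 2, 3, 1} // errors in every recent window

	fmt.Println("frontend flake alerts: ", sustained(policies["frontend"], flake))
	fmt.Println("frontend outage alerts:", sustained(policies["frontend"], outage))
	fmt.Println("backend outage alerts: ", sustained(policies["backend"], outage))
}
```

The trade-off is detection latency: an alert can fire no sooner than minWindows windows after the errors start.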
Status: Fixed (was: Available)
Filed issue 849289 for improved monitoring.
