Internal server error when viewing a build
Issue description

This one: https://ci.chromium.org/p/fuchsia/builds/b8946418250004949856
Initiated by Scheduler here: https://luci-scheduler.appspot.com/jobs/fuchsia/zircon-x64-asan-qemu_kvm/9111848312310527376

From the server log:

Caught panic during handling of "/p/fuchsia/builds/b8946418255666079952": runtime error: invalid memory address or nil pointer dereference
  at go.chromium.org/luci/common/runtime/paniccatcher.Catch (catch.go:41)
  at panic (go/src/runtime/panic.go:489)
  at go.chromium.org/luci/buildbucket/proto.(*GitilesCommit).RepoURL (buildset.go:49)
  at go.chromium.org/luci/milo/buildsource/buildbucket.simplisticBlamelist (build.go:145)
  at go.chromium.org/luci/milo/buildsource/buildbucket.getBlame (build.go:453)
  at go.chromium.org/luci/milo/buildsource/buildbucket.(*BuildID).Get (build.go:494)
  at go.chromium.org/luci/milo/frontend.BuildHandler (view_build.go:16)
  at go.chromium.org/luci/milo/frontend.Run.func1 (routes.go:86)
  at go.chromium.org/luci/milo/frontend.handleError.func1 (routes.go:251)
  at go.chromium.org/luci/server/router.run (handler.go:95)
  at go.chromium.org/luci/server/router.run.func2 (handler.go:90)
  at go.chromium.org/luci/milo/frontend.projectACLMiddleware (middleware.go:419)
  ...
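The crash shape in the trace is a classic Go nil-pointer dereference: a method that reads fields off its receiver is invoked on a nil `*GitilesCommit` when a build has no associated gitiles commit buildset. A minimal sketch (the type and `safeRepoURL` helper here are simplified stand-ins, not the real luci proto API):

```go
package main

import "fmt"

// GitilesCommit is a simplified stand-in for the buildbucket proto type.
type GitilesCommit struct {
	Host    string
	Project string
}

// RepoURL mirrors the failing method's shape: calling it on a nil
// receiver is legal in Go, but reading c.Host then panics.
func (c *GitilesCommit) RepoURL() string {
	return fmt.Sprintf("https://%s/%s", c.Host, c.Project)
}

// safeRepoURL shows the fix pattern: guard against nil before use.
func safeRepoURL(c *GitilesCommit) string {
	if c == nil {
		return ""
	}
	return c.RepoURL()
}

func main() {
	var c *GitilesCommit // nil, as for a build with no gitiles buildset
	fmt.Println(safeRepoURL(c)) // prints an empty string instead of panicking
}
```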
May 15 2018
Looks like it is affecting even more "normal" builds, e.g. this CI build: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/47286 Please revert the deployment first.
May 16 2018
hinoka@ rolled back. Final fix is in https://chromium-review.googlesource.com/#/c/infra/luci/luci-go/+/1060580
May 16 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-go.git/+/5ca4cf2123d3ec046d94e09e8cb37f451836297f

commit 5ca4cf2123d3ec046d94e09e8cb37f451836297f
Author: Andrii Shyshkalov <tandrii@chromium.org>
Date: Wed May 16 00:17:02 2018

[milo] fix regression for builds w/o associated gitiles/commit buildset.

Regression was introduced in https://crrev.com/c/1060467.

R=hinoka@chromium.org
Bug: 843393, 843245
Change-Id: If5c51bc609562ffb6be0af792ad30c0b8d0ccbbf
Reviewed-on: https://chromium-review.googlesource.com/1060580
Reviewed-by: Ryan Tseng <hinoka@chromium.org>
Commit-Queue: Andrii Shyshkalov <tandrii@chromium.org>

[modify] https://crrev.com/5ca4cf2123d3ec046d94e09e8cb37f451836297f/milo/buildsource/buildbucket/build.go
May 16 2018
Deployed 3063-5ca4cf2, which has this fixed.
May 16 2018
Were we paged? I don't see it in https://o.corp.google.com/#Tickets:chrome-infra::::chrome-ops-foundation. We should have been.
May 16 2018
We weren't. Some users complained in Hangouts. My guess is that the QPS from builder page views (done manually by humans) is too small (compared to various automated calls) for the LuciMilo5xxRateHigh alert to fire :-/
May 16 2018
Maybe we should have a separate metric for panics? Our code should never panic. I think we want to be alerted on even a single panic. WDYT?
May 16 2018
Or rather, should we have a lower threshold for URL paths accessed by humans? Humans are less tolerant of HTTP 500s.
May 16 2018
Do we have 500s that are expected and acceptable? If not, maybe non-zero is enough to alert on. (Technically this bug should stay fixed and we should open another bug to track fixing the alerting, but I didn't want to lose this or the context.)
May 16 2018
500s on backends (cron / pubsub) aren't expected, but are acceptable. They're usually due to datastore (or memcache, if ds_cache is on strict) flakes. 500s on prpc endpoints aren't expected nor ideal, but aren't usually user-visible. 500s on frontends aren't expected nor acceptable, but setting the threshold too low will alert us on things that aren't always actionable (e.g. a short-term datastore flake, a gerrit flake).
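The three surface classes above suggest per-surface alert thresholds. A rough sketch with purely illustrative numbers (the threshold values and the `shouldAlert` helper are made up for this example, not real Milo alerting config):

```go
package main

import "fmt"

// surface classifies where a 500 was served, per the comment above.
type surface string

const (
	backend  surface = "backend"  // cron / pubsub: 500s acceptable
	prpc     surface = "prpc"     // not ideal, rarely user-visible
	frontend surface = "frontend" // human-facing pages: least tolerable
)

// maxAcceptable500sPerMin holds illustrative thresholds only; real
// values would be tuned against observed flake rates.
var maxAcceptable500sPerMin = map[surface]int{
	backend:  20,
	prpc:     5,
	frontend: 0,
}

// shouldAlert reports whether the observed 500 count exceeds the
// threshold for its surface.
func shouldAlert(s surface, count int) bool {
	return count > maxAcceptable500sPerMin[s]
}

func main() {
	fmt.Println(shouldAlert(frontend, 1)) // prints true
	fmt.Println(shouldAlert(backend, 1))  // prints false
}
```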
Jun 4 2018
Comment 1 by tandrii@chromium.org, May 15 2018
Status: Assigned (was: Untriaged)