Infrequent LUCI/GAE settings panic(s), probably race. |
|||
Issue descriptionI've seen these two errors occur regularly, but very infrequently. Both stem from the ToT LUCI/GAE module. Neither settings nor the code have been modified recently, and the nature of this suggests a race of some sort. Posting here for visibility. (1) panic: runtime error: invalid memory address or nil pointer dereference goroutine 30066 [running]: google.golang.org/appengine/github.com/luci/luci-go/server/settings.(*Bundle).get(0xc8220306f0, 0xcdabb8, 0x3, 0xa00740, 0xc822000b90, 0x0, 0x0) /go/src/app/_gopath/src/github.com/luci/luci-go/server/settings/settings.go:45 +0x7e github.com/luci/luci-go/server/settings.(*Settings).Get(0xc820217770, 0x7fac27bcfad8, 0xc8216cb980, 0xcdabb8, 0x3, 0xa00740, 0xc822000b90, 0x0, 0x0) /go/src/app/_gopath/src/github.com/luci/luci-go/server/settings/settings.go:148 +0x11d github.com/luci/luci-go/server/settings.Get(0x7fac27bcfad8, 0xc8216cb980, 0xcdabb8, 0x3, 0xa00740, 0xc822000b90, 0x0, 0x0) /go/src/app/_gopath/src/github.com/luci/luci-go/server/settings/context.go:33 +0x92 github.com/luci/luci-go/appengine/gaemiddleware.fetchCachedSettings(0x7fac27bcfad8, 0xc8216cb980, 0x0, 0xa3c700) /go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/settings.go:74 +0xa1 github.com/luci/luci-go/appengine/gaemiddleware.WithProd(0x7fac27bcfad8, 0xc8216cb980, 0xc8321775e0, 0x0, 0x0) /go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/context.go:55 +0x287 github.com/luci/luci-go/appengine/gaemiddleware.BaseProd.func1(0x7fac27bcfaa0, 0xc820724c60, 0xc8321775e0, 0x0, 0x0, 0x0) /go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/context.go:74 +0x74 github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc820286d00, 0x7fac27bcfaa0, 0xc820724c60, 0xc8321775e0) ( Referencing: https://github.com/luci/luci-go/blob/38c86425c374d50a55261f2d67ab579d7506ac6f/server/settings/settings.go#L45 ) ----- (2) panic: could not fetch GAE settings - Canceled: Deadline exceeded (timeout) goroutine 1022247 [running]: google.golang.org/appengine/github.com/luci/luci-go/appengine/gaemiddleware.fetchCachedSettings(0x7fe961648dc8, 0xc844282ff0, 0x0, 0xa3c700) /go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/settings.go:84 +0x244 github.com/luci/luci-go/appengine/gaemiddleware.WithProd(0x7fe961648dc8, 0xc844282ff0, 0xc8470168c0, 0x0, 0x0) /go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/context.go:55 +0x287 github.com/luci/luci-go/appengine/gaemiddleware.BaseProd.func1(0x7fe961648c38, 0xc82896b440, 0xc8470168c0, 0x0, 0x0, 0x0) /go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/context.go:74 +0x74 github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc820286b40, 0x7fe961648c38, 0xc82896b440, 0xc8470168c0) ( Referencing https://github.com/luci/luci-go/blob/38c86425c374d50a55261f2d67ab579d7506ac6f/appengine/gaemiddleware/settings.go#L84 ) (1) Seems like a race. AFAICT the only thing that could panic at that line would be "b == nil", which suggests that the value retrieved from the lazyValue is (*Bundle)(nil). (2) The request related to the timeout lived for 5 seconds. That seems like a really long period of time to wait for a datastore Get of an uncontested entity. Maybe this is a race in lazyValue? Is this something we can replace with common/promise, which I've hammered pretty thoroughly?
,
Feb 29 2016
(1): I'll take a look. common/promise and common/lazyslot are not interchangable. common/lazyslot doesn't block if there's already a cached (perhaps stale) value. (2) It is sort of WAI. If first call to FetchSettings in new instance process fails, it panics. Otherwise all places where settings are needed should do error handling, and its PITA. If process had called FetchSettings successfully at least once, it won't ever panic. Datastore RPC (any GAE RPC) occasionally timeout regardless of contention. It has been happening since beginning of times.
,
Mar 1 2016
The following revision refers to this bug: https://chromium.googlesource.com/external/github.com/luci/luci-go.git/+/e3dd046f72f64f6213a504101b26f345834a903f commit e3dd046f72f64f6213a504101b26f345834a903f Author: vadimsh <vadimsh@chromium.org> Date: Tue Mar 01 02:41:44 2016 common/lazyslot: Simplify code a little bit. Be more strict about nil return value. BUG= 590594 R=dnj@chromium.org, iannucci@chromium.org Review URL: https://codereview.chromium.org/1750023002 [modify] https://crrev.com/e3dd046f72f64f6213a504101b26f345834a903f/common/lazyslot/lazyslot.go [modify] https://crrev.com/e3dd046f72f64f6213a504101b26f345834a903f/common/lazyslot/lazyslot_test.go
,
Mar 2 2016
The following revision refers to this bug: https://chromium.googlesource.com/external/github.com/luci/luci-go.git/+/af6c7eaa57f2d353a135df9b61c6401144bb5c9e commit af6c7eaa57f2d353a135df9b61c6401144bb5c9e Author: vadimsh <vadimsh@chromium.org> Date: Wed Mar 02 23:18:57 2016 Retry RPC deadlines when fetching GAE settings. Should eliminate rare panics during initial settings fetch. R=dnj@chromium.org BUG= 590594 Review URL: https://codereview.chromium.org/1748933002 [modify] https://crrev.com/af6c7eaa57f2d353a135df9b61c6401144bb5c9e/common/lazyslot/lazyslot.go [modify] https://crrev.com/af6c7eaa57f2d353a135df9b61c6401144bb5c9e/server/settings/settings.go
,
Mar 25 2016
I don't see it anymore in my apps. Reopen if you see it again. |
|||
►
Sign in to add a comment |
|||
Comment 1 by d...@chromium.org
, Feb 28 2016Status: Available (was: Untriaged)