New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 590594 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

Infrequent LUCI/GAE settings panic(s), probably race.

Project Member Reported by d...@chromium.org, Feb 28 2016

Issue description

I've seen these two errors occur regularly, but very infrequently. Both stem from the ToT LUCI/GAE module. Neither settings nor the code have been modified recently, and the nature of this suggests a race of some sort. Posting here for visibility.

(1) panic: runtime error: invalid memory address or nil pointer dereference

goroutine 30066 [running]:
google.golang.org/appengine/github.com/luci/luci-go/server/settings.(*Bundle).get(0xc8220306f0, 0xcdabb8, 0x3, 0xa00740, 0xc822000b90, 0x0, 0x0)
	/go/src/app/_gopath/src/github.com/luci/luci-go/server/settings/settings.go:45 +0x7e
github.com/luci/luci-go/server/settings.(*Settings).Get(0xc820217770, 0x7fac27bcfad8, 0xc8216cb980, 0xcdabb8, 0x3, 0xa00740, 0xc822000b90, 0x0, 0x0)
	/go/src/app/_gopath/src/github.com/luci/luci-go/server/settings/settings.go:148 +0x11d
github.com/luci/luci-go/server/settings.Get(0x7fac27bcfad8, 0xc8216cb980, 0xcdabb8, 0x3, 0xa00740, 0xc822000b90, 0x0, 0x0)
	/go/src/app/_gopath/src/github.com/luci/luci-go/server/settings/context.go:33 +0x92
github.com/luci/luci-go/appengine/gaemiddleware.fetchCachedSettings(0x7fac27bcfad8, 0xc8216cb980, 0x0, 0xa3c700)
	/go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/settings.go:74 +0xa1
github.com/luci/luci-go/appengine/gaemiddleware.WithProd(0x7fac27bcfad8, 0xc8216cb980, 0xc8321775e0, 0x0, 0x0)
	/go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/context.go:55 +0x287
github.com/luci/luci-go/appengine/gaemiddleware.BaseProd.func1(0x7fac27bcfaa0, 0xc820724c60, 0xc8321775e0, 0x0, 0x0, 0x0)
	/go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/context.go:74 +0x74
github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc820286d00, 0x7fac27bcfaa0, 0xc820724c60, 0xc8321775e0)

( Referencing: https://github.com/luci/luci-go/blob/38c86425c374d50a55261f2d67ab579d7506ac6f/server/settings/settings.go#L45 )

-----

(2) panic: could not fetch GAE settings - Canceled: Deadline exceeded (timeout)

goroutine 1022247 [running]:
google.golang.org/appengine/github.com/luci/luci-go/appengine/gaemiddleware.fetchCachedSettings(0x7fe961648dc8, 0xc844282ff0, 0x0, 0xa3c700)
	/go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/settings.go:84 +0x244
github.com/luci/luci-go/appengine/gaemiddleware.WithProd(0x7fe961648dc8, 0xc844282ff0, 0xc8470168c0, 0x0, 0x0)
	/go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/context.go:55 +0x287
github.com/luci/luci-go/appengine/gaemiddleware.BaseProd.func1(0x7fe961648c38, 0xc82896b440, 0xc8470168c0, 0x0, 0x0, 0x0)
	/go/src/app/_gopath/src/github.com/luci/luci-go/appengine/gaemiddleware/context.go:74 +0x74
github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc820286b40, 0x7fe961648c38, 0xc82896b440, 0xc8470168c0)

( Referencing https://github.com/luci/luci-go/blob/38c86425c374d50a55261f2d67ab579d7506ac6f/appengine/gaemiddleware/settings.go#L84 )


(1) Seems like a race. AFAICT the only thing that could panic at that line would be "b == nil", which suggests that the value retrieved from the lazyValue is (*Bundle)(nil). 
(2) The request related to the timeout lived for 5 seconds. That seems like a really long period of time to wait for a datastore Get of an uncontested entity.

Maybe this is a race in lazyValue? Is this something we can replace with common/promise, which I've hammered pretty thoroughly?
 

Comment 1 by d...@chromium.org, Feb 28 2016

Labels: -Pri-2 Infra Pri-1
Status: Available (was: Untriaged)
+Labels.
Owner: vadimsh@chromium.org
Status: Assigned (was: Available)
(1): I'll take a look. common/promise and common/lazyslot are not interchangable. common/lazyslot doesn't block if there's already a cached (perhaps stale) value.

(2) It is sort of WAI. If first call to FetchSettings in new instance process fails, it panics. Otherwise all places where settings are needed should do error handling, and its PITA. If process had called FetchSettings successfully at least once, it won't ever panic.

Datastore RPC (any GAE RPC) occasionally timeout regardless of contention. It has been happening since beginning of times.
Status: Fixed (was: Assigned)
I don't see it anymore in my apps. Reopen if you see it again.

Sign in to add a comment