datastore_v3: TIMEOUT when retrieving many builds

Issue description
E.g. for https://luci-milo.appspot.com/buildbot/chromium.perf/Win%207%20x64%20Perf/?limit=100 I consistently see:
Error: 500 API error 5 (datastore_v3: TIMEOUT): The datastore operation timed out, or the data was temporarily unavailable. Request ID: 5890cd5e00ff0d3ffb7e1e60b80001737e6c7563692d6d696c6f0001313430372d31386161383434000100
,
Jan 31 2017
Log here: https://pantheon.corp.google.com/logs/viewer?project=luci-milo&minLogLevel=0&expandAll=false&resource=gae_app%2Fmodule_id%2Fdefault&logName=projects%2Fluci-milo%2Flogs%2Fappengine.googleapis.com%252Frequest_log&filters=text:5890cd5e00ff0d3ffb7e1e60b80001737e6c7563692d6d696c6f0001313430372d31386161383434000100&timestamp=2017-01-31T17:46:06.868347000Z
It's taking exactly 10s, so it's probably hitting some sort of timeout while fetching a lot of builds. The first thing that comes to mind is that unmarshalling a build is expensive.
,
Jan 31 2017
Ok. Do we have to unmarshal everything? 10 seconds seems long for rendering this page.
,
Jan 31 2017
Because of the way the build struct was designed, it has to be unmarshalled on load :(. In this case each of the 100 items goes through load -> decompress -> base64 decode -> JSON unmarshal. I do want to refactor it into something more sensible (split summary and detail), but that will be a little tricky/risky. I think this will be a blocker for the console view, so it'll probably happen sooner rather than later.
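To make the per-build cost concrete, here is a minimal Go sketch of the decode pipeline described above (decompress -> base64 decode -> JSON unmarshal). It is not the Milo code: the `build` struct, `encodeBuild`, and `decodeBuild` are hypothetical names, and gzip is an assumed compression format, since the thread does not say which one is used.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
)

// build is a hypothetical, heavily trimmed stand-in for the real build struct.
type build struct {
	Number int    `json:"number"`
	Result string `json:"result"`
}

// encodeBuild produces a blob in the assumed stored form: gzip(base64(json)).
// gzip is an assumption; the thread does not name the compression format.
func encodeBuild(b build) ([]byte, error) {
	raw, err := json.Marshal(b)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(base64.StdEncoding.EncodeToString(raw))); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeBuild runs the three steps named in the comment, in order:
// decompress, base64 decode, JSON unmarshal.
func decodeBuild(blob []byte) (build, error) {
	var b build
	zr, err := gzip.NewReader(bytes.NewReader(blob))
	if err != nil {
		return b, err
	}
	defer zr.Close()
	encoded, err := io.ReadAll(zr)
	if err != nil {
		return b, err
	}
	raw, err := base64.StdEncoding.DecodeString(string(encoded))
	if err != nil {
		return b, err
	}
	err = json.Unmarshal(raw, &b)
	return b, err
}

func main() {
	blob, err := encodeBuild(build{Number: 42, Result: "SUCCESS"})
	if err != nil {
		panic(err)
	}
	// Loading 100 builds means running the whole pipeline 100 times.
	for i := 0; i < 100; i++ {
		if _, err := decodeBuild(blob); err != nil {
			panic(err)
		}
	}
	b, _ := decodeBuild(blob)
	fmt.Printf("decoded build %d (%s)\n", b.Number, b.Result)
}
```

Every field is decoded even if the page only needs a summary, which is why splitting summary and detail would help.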
,
Jan 31 2017
Propose using a Batcher: https://codereview.chromium.org/2668763002
,
Jan 31 2017
The following revision refers to this bug:
https://chromium.googlesource.com/external/github.com/luci/luci-go.git/+/b7c33500af911388323e6e612e933ea0f7f57888

commit b7c33500af911388323e6e612e933ea0f7f57888
Author: dnj <dnj@chromium.org>
Date: Tue Jan 31 19:27:53 2017

Use a datastore batcher for build queries.

Queries are timing out. This is because processing elements is CPU-intensive, and datastore queries have a maximum lifetime of 30 seconds. Use batching to break the single query/deserialize process into a series of consecutive queries so any individual query doesn't run into the timeout limit.

Note that really high limits will still bump into the actual AppEngine request timeout.

BUG=chromium:687236
TEST=None
R=estaab@chromium.org, hinoka@chromium.org
Review-Url: https://codereview.chromium.org/2668763002

[modify] https://crrev.com/b7c33500af911388323e6e612e933ea0f7f57888/milo/appengine/buildbot/builder.go
[modify] https://crrev.com/b7c33500af911388323e6e612e933ea0f7f57888/milo/appengine/buildbot/console.go
[add] https://crrev.com/b7c33500af911388323e6e612e933ea0f7f57888/milo/appengine/buildbot/datastore.go
[modify] https://crrev.com/b7c33500af911388323e6e612e933ea0f7f57888/milo/appengine/buildbot/grpc.go
[modify] https://crrev.com/b7c33500af911388323e6e612e933ea0f7f57888/milo/appengine/buildbot/master.go
[modify] https://crrev.com/b7c33500af911388323e6e612e933ea0f7f57888/milo/appengine/buildbot/pubsub.go
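The batching idea in the commit message can be sketched roughly as follows. This is not the luci-go Batcher and does not use its API: `queryFunc`, `fetchBatched`, and `sliceQuery` are hypothetical names, and an in-memory slice with an integer cursor stands in for the datastore. The point is only that one large query becomes a series of short, consecutive queries, with the expensive per-item processing happening between them.

```go
package main

import "fmt"

// queryFunc is a hypothetical stand-in for one bounded datastore query.
// It returns up to size items starting at cursor, the cursor to resume
// from, and whether the result set is exhausted.
type queryFunc func(cursor, size int) (items []int, next int, done bool)

// fetchBatched pulls at most limit items using queries of at most
// batchSize items each, calling process on every item. It returns the
// number of items processed.
func fetchBatched(q queryFunc, limit, batchSize int, process func(int)) int {
	fetched, cursor := 0, 0
	for fetched < limit {
		size := batchSize
		if remaining := limit - fetched; remaining < size {
			size = remaining
		}
		items, next, done := q(cursor, size)
		// The CPU-heavy work runs here, outside any open query.
		for _, it := range items {
			process(it)
		}
		fetched += len(items)
		cursor = next
		if done || len(items) == 0 {
			break
		}
	}
	return fetched
}

// sliceQuery builds a queryFunc over an in-memory slice and counts how
// many queries were issued.
func sliceQuery(data []int, calls *int) queryFunc {
	return func(cursor, size int) ([]int, int, bool) {
		*calls++
		end := cursor + size
		if end > len(data) {
			end = len(data)
		}
		return data[cursor:end], end, end == len(data)
	}
}

func main() {
	calls := 0
	data := make([]int, 250) // pretend these are 250 stored builds
	n := fetchBatched(sliceQuery(data, &calls), 100, 30, func(int) {})
	fmt.Printf("fetched %d builds in %d queries\n", n, calls)
}
```

As the commit message notes, this only keeps each individual query short; the total work is unchanged, so a very high limit can still exceed the overall request timeout.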
,
Jan 31 2017
This works now but it's hella slow. Marking as fixed for now and will open another bug for the speed issue.
,
Jan 31 2017
Cool! Yeah nothing I did speeds anything up at all :P
,
Jan 31 2017
Nice! Thanks Ryan for the quick debugging and Dan for the fix!
Comment 1 by estaab@chromium.org, Jan 31 2017
Labels: -Pri-2 Pri-1
Owner: hinoka@chromium.org
Status: Assigned (was: Untriaged)