Investigate cause of Milo pubsub OOM
Issue description
Tracking bug to investigate the root cause of 800530
Jan 11 2018
Jan 12 2018
Here's what we know:
* It's an OOM issue.
* Concurrency wasn't actually turned off (oops), but the instances still recovered, and the only mitigation done was to stop processing client.art builds.
* With a bit of logging, we know that the request that ultimately logs a critical OOM error might not be the actual failing request.
* This build looks particularly suspicious:
  https://uberchromegw.corp.google.com/i/client.art/builders/bullhead-armv8-gcstress-debug/builds/238
  https://screenshot.googleplex.com/BGmyjzuEv1F
  This is a finished build, but its status is running.

TODO: Try to repro this locally by generating large builds similar to the one above.
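For the local repro, something like the following could be a starting point. It is only a rough sketch, assuming the trigger is the sheer size of the build payload. The field names (`steps`, `log`, `finished`, `status`) and both helper functions are hypothetical and do not reflect the real buildbot or Milo schema; only the master, builder, and build number are taken from the suspicious build above.

```python
import json


def make_synthetic_build(num_steps, log_lines_per_step,
                         finished=True, status="running"):
    """Return a large dict loosely shaped like a build with many steps.

    All field names are hypothetical, chosen for illustration only.
    The defaults mimic the odd state seen above: finished, yet "running".
    """
    steps = []
    for i in range(num_steps):
        steps.append({
            "name": f"step-{i}",
            "status": "success",
            # Pad each step with log lines to inflate the payload.
            "log": [f"step {i} line {j}" for j in range(log_lines_per_step)],
        })
    return {
        "master": "client.art",
        "builder": "bullhead-armv8-gcstress-debug",
        "number": 238,
        "finished": finished,
        "status": status,
        "steps": steps,
    }


def serialized_size(build):
    """Return the size in bytes of the JSON-encoded build."""
    return len(json.dumps(build).encode("utf-8"))


if __name__ == "__main__":
    # Scale these two knobs up until the local handler starts to struggle.
    build = make_synthetic_build(num_steps=2000, log_lines_per_step=500)
    print(f"payload size: {serialized_size(build) / 1e6:.1f} MB")
```

Feeding payloads of increasing size to a local instance while watching its memory should show whether size alone is enough to reproduce the OOM, or whether the finished-but-running state matters too.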
Jan 12 2018
Talked it over with nodir@. It's still somewhat speculative, but I think we can reasonably treat this as the canonical root cause, and it has been written up in the PM.
Feb 28 2018
Jun 2 2018
Comment 1 by hinoka@chromium.org, Jan 11 2018
Status: Assigned (was: Unconfirmed)