
Issue 801056


Issue metadata

Status: Fixed
Owner: hinoka@chromium.org
Closed: Jan 2018
Cc:
Components: Infra>Platform>Milo
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Investigate cause of Milo pubsub OOM

Reported by hinoka@chromium.org (project member), Jan 11 2018

Issue description

Tracking bug to investigate the root cause of issue 800530.
 

Comment 1 by hinoka@chromium.org, Jan 11 2018

Owner: hinoka@chromium.org
Status: Assigned (was: Unconfirmed)

Comment 2 by hinoka@chromium.org, Jan 11 2018

Components: Infra>Platform>Milo

Comment 3 by hinoka@chromium.org, Jan 12 2018

Here's what we know:

* It's an OOM issue.
* Concurrency wasn't actually turned off (oops), yet the instances still recovered; the only mitigation applied was to stop processing client.art builds.
* Additional logging shows that the request that ultimately logs the critical OOM error is not necessarily the request that caused it.
* This build looks particularly suspicious:
https://uberchromegw.corp.google.com/i/client.art/builders/bullhead-armv8-gcstress-debug/builds/238
https://screenshot.googleplex.com/BGmyjzuEv1F

This is a finished build, but its status is still shown as running.
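A toy model (not Milo's actual code) of why the request that logs the OOM need not be the culprit: with concurrency enabled, all in-flight requests share one instance's memory, so whichever request makes the allocation that crosses the limit is the one that gets blamed. The request names, allocation sizes, and the 512 MB limit below are hypothetical.

```go
package main

import "fmt"

// step is one allocation made by a named request (hypothetical model).
type step struct {
	req string
	mb  int
}

// instance models one server instance whose memory limit is shared by all
// in-flight requests when concurrency is enabled.
type instance struct {
	limitMB int
	usedMB  int
	perReq  map[string]int
}

// alloc charges mb to req and reports whether the instance is now over its limit.
func (in *instance) alloc(req string, mb int) bool {
	in.usedMB += mb
	in.perReq[req] += mb
	return in.usedMB > in.limitMB
}

// firstOOM replays steps in order. It returns the request whose allocation
// first pushed the instance over its limit (the one that would log the OOM)
// and the request holding the most memory at that moment. Both are empty if
// the limit is never exceeded.
func firstOOM(limitMB int, steps []step) (blamed, largest string) {
	in := &instance{limitMB: limitMB, perReq: map[string]int{}}
	for _, s := range steps {
		if !in.alloc(s.req, s.mb) {
			continue
		}
		most := -1
		for r, mb := range in.perReq {
			// Break ties by name so the result does not depend on map order.
			if mb > most || (mb == most && r < largest) {
				most, largest = mb, r
			}
		}
		return s.req, largest
	}
	return "", ""
}

func main() {
	// A large build render makes most of the allocations, but a small,
	// unrelated request makes the one that crosses the 512 MB limit.
	steps := []step{
		{"huge-build", 200},
		{"small-build", 10},
		{"huge-build", 250},
		{"small-build", 60},
	}
	blamed, largest := firstOOM(512, steps)
	fmt.Printf("OOM logged by %s; most memory held by %s\n", blamed, largest)
}
```

Running it prints "OOM logged by small-build; most memory held by huge-build", which matches the observation that the logged request can be an innocent bystander.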

TODO: Try to repro this locally by generating large builds similar to the one above.

Comment 4 by hinoka@chromium.org, Jan 12 2018

Status: Fixed (was: Assigned)
Talked it over with nodir@. It's still somewhat speculative, but I think we can reasonably treat this as the canonical root cause, and it has been written up in the postmortem (PM).

Comment 5 by efoo@chromium.org, Feb 28 2018

Labels: LUCI-Chromium-CQSets LUCI-Blocker-Chromium-CQSets

Comment 6 by efoo@chromium.org, Jun 2 2018

Labels: cit-pm-67
