Issue 868066

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Milo thinks a master is dead if it sends giant PubSub messages

Project Member Reported by no...@chromium.org, Jul 26

Issue description

On the night of 2018-07-25, the chromium.clang master sent large PubSub messages. Milo rejected them all for fear of OOM. Four hours later, that master was declared dead because its entity had not been modified for 4 hours.

This is incorrect. Whether a master is dead should be computed from its "last seen time", not its "modified time", because we don't always modify the master entity when we see a message from it.

This resulted in the migration app closing all chromium.clang bugs.
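
A minimal sketch of the difference between the two checks, not Milo's actual code. `MasterState`, `is_dead_by_modified`, `is_dead_by_last_seen`, and the `last_seen` field are hypothetical names used only for illustration; the 4-hour window is the one mentioned in this report.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: a master is considered dead after 4 hours of silence,
# matching the window described in this issue.
DEAD_AFTER = timedelta(hours=4)


@dataclass
class MasterState:
    """Hypothetical stand-in for the stored master entity."""
    name: str
    modified: datetime   # updated only when the stored state is rewritten
    last_seen: datetime  # updated whenever a message arrives, even a rejected one


def is_dead_by_modified(master: MasterState, now: datetime) -> bool:
    # The behavior reported here: rejected messages never touch `modified`,
    # so a live master that only sends oversized messages looks dead.
    return now - master.modified > DEAD_AFTER


def is_dead_by_last_seen(master: MasterState, now: datetime) -> bool:
    # The proposed behavior: any message, stored or not, counts as a sign of life.
    return now - master.last_seen > DEAD_AFTER


now = datetime(2018, 7, 26, 6, 0)
clang = MasterState(
    name="chromium.clang",
    modified=now - timedelta(hours=5),     # last message that was stored
    last_seen=now - timedelta(minutes=1),  # last message received at all
)
print(is_dead_by_modified(clang, now))   # True
print(is_dead_by_last_seen(clang, now))  # False
```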
 
That's buildbot only, right? Why not just WontFix?
Owner: hinoka@chromium.org
Status: Assigned (was: Untriaged)
Labels: -Pri-2 Pri-1
Yes, it's buildbot only, but we need the migration app to be accurate during migrations. That is why this needs to be fixed even though it's buildbot.

From Nodir's description, the fix should be simple: use a different time measurement to determine whether a master is alive.
As part of this, is it possible to refresh the state of the builders under the following masters listed in the migration app? They are all still marked as migrated because of this bug.

chromium.fyi
chromium.chromedriver
chromium.clang
Thinking about this a bit more: to record a last seen time we'd have to modify the entity, at which point we'd also be updating the Modified time :) So in the potential-OOM case, we could just bump the Modified time without changing anything else.
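
A rough sketch of that idea, under stated assumptions: `MasterEntity`, `handle_pubsub_message`, and `MAX_MESSAGE_BYTES` are hypothetical names and the size limit is an arbitrary placeholder, not Milo's real threshold or handler.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: an arbitrary cutoff above which a payload is rejected to avoid OOM.
MAX_MESSAGE_BYTES = 1_000_000


@dataclass
class MasterEntity:
    """Hypothetical stand-in for the stored master entity."""
    name: str
    modified: datetime
    data: bytes = b""


def handle_pubsub_message(entity: MasterEntity, payload: bytes, now: datetime) -> bool:
    """Return True if the payload was stored, False if it was rejected as too large."""
    # Always bump the modified time: receiving any message shows the master is alive.
    entity.modified = now
    if len(payload) > MAX_MESSAGE_BYTES:
        # Oversized message: keep the previously stored data untouched.
        return False
    entity.data = payload
    return True


entity = MasterEntity("chromium.clang", modified=datetime(2018, 7, 25, 20, 0), data=b"old")
received_at = datetime(2018, 7, 26, 0, 0)
stored = handle_pubsub_message(entity, b"x" * (MAX_MESSAGE_BYTES + 1), received_at)
print(stored, entity.modified == received_at, entity.data)
```

With this shape the existing "not modified for 4 hours" check keeps working unchanged, since a rejected message still refreshes the timestamp.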
