Milo thinks master is dead if it sends giant PubSub messages
Issue description
On the night of 2018-07-25, the chromium.clang master sent large PubSub messages. Milo rejected all of them for fear of running out of memory (OOM). Four hours later the master was declared dead because its entity had not been modified for 4 hours. This is incorrect. Whether a master is dead should be computed from its "last seen" time, not its "modified" time, because we don't always modify the master entity when we see a message from it. As a result, the migration app closed all chromium.clang bugs.
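The difference between the two liveness rules can be sketched as follows. This is a minimal Python illustration, not Milo's actual code. `MasterEntity`, its `modified`/`last_seen` fields, `DEAD_AFTER` and both `is_dead_*` functions are hypothetical names. The timestamps are invented to mirror the scenario: a master that keeps sending (rejected) messages while its stored data stops changing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical threshold mirroring the 4 hours mentioned in the issue.
DEAD_AFTER = timedelta(hours=4)


@dataclass
class MasterEntity:
    """Hypothetical stand-in for the stored buildbot master record."""
    name: str
    modified: datetime   # last time the stored master data changed
    last_seen: datetime  # last time any PubSub message arrived from the master


def is_dead_by_modified(m: MasterEntity, now: datetime) -> bool:
    # Rule that caused the bug: if every message is rejected, the data is
    # never modified, so the master looks dead even though it is still sending.
    return now - m.modified > DEAD_AFTER


def is_dead_by_last_seen(m: MasterEntity, now: datetime) -> bool:
    # Proposed rule: liveness depends only on when we last heard from it.
    return now - m.last_seen > DEAD_AFTER


now = datetime(2018, 7, 26, 4, 0, tzinfo=timezone.utc)
clang = MasterEntity(
    name="chromium.clang",
    modified=now - timedelta(hours=5),     # data untouched since rejections began
    last_seen=now - timedelta(minutes=1),  # still sending (oversized) messages
)
print(is_dead_by_modified(clang, now))   # True
print(is_dead_by_last_seen(clang, now))  # False
```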
Jul 26
Yes, this is buildbot-only, but we need the migration app to be accurate during migrations. That is why this needs to be fixed even though it only affects buildbot. From Nodir's description, the fix should be simple: switch to a different time measurement to decide whether a master is alive.
Jul 26
As part of this, is it possible to refresh the state of the builders under the following masters listed in the migration app? They are all still marked as migrated because of this bug:
chromium.fyi
chromium.chromedriver
chromium.clang
Jul 26
Thinking about this a bit more: to record a last-seen time we'd have to modify the entity, at which point we'd have to update its Modified time anyway :) So in the potential-OOM case, we could just bump the Modified time without changing anything else.
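A minimal sketch of that workaround, assuming a hypothetical `MasterEntity` record with a `modified` timestamp and a hypothetical `MAX_MESSAGE_BYTES` cutoff standing in for Milo's OOM guard (the real limit is not given in this issue). Oversized payloads are skipped without being decoded, but the Modified time is still bumped, so a modified-time liveness check keeps treating the master as alive.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical size cutoff; Milo's actual OOM guard is not specified here.
MAX_MESSAGE_BYTES = 10 * 1024 * 1024


@dataclass
class MasterEntity:
    """Hypothetical stand-in for the stored master record."""
    name: str
    modified: datetime
    data: dict = field(default_factory=dict)


def handle_message(m: MasterEntity, payload: bytes, now: datetime) -> bool:
    """Apply a PubSub payload to m; return True if the payload was decoded.

    In both branches m.modified is bumped, so any received message
    counts as a sign of life.
    """
    if len(payload) > MAX_MESSAGE_BYTES:
        # Too large to decode safely: keep the old data, but record
        # that the master is still talking to us.
        m.modified = now
        return False
    m.data = json.loads(payload)
    m.modified = now
    return True


t0 = datetime(2018, 7, 25, 22, 0, tzinfo=timezone.utc)
t1 = datetime(2018, 7, 26, 3, 0, tzinfo=timezone.utc)
master = MasterEntity("chromium.clang", modified=t0, data={"builders": {}})
ok = handle_message(master, b"x" * (MAX_MESSAGE_BYTES + 1), t1)
print(ok, master.modified == t1, master.data)  # False True {'builders': {}}
```

In this sketch the Modified timestamp doubles as a last-seen marker, so no new field is needed. The tradeoff is that Modified no longer strictly means "the master's data changed".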
Comment 1 by mar...@chromium.org, Jul 26