MasterSchedulerTickRateLow not solved by push to prod
Issue description
Usually this alert is solved by a push to prod; this time, however, the push made no change. Moreover, the master scheduler tick rate (http://shortn/_HjnVXweuIP) and the master host-scheduler tick rate (http://shortn/_jGO9ObiPr4) are usually in sync, and both increase after a push to prod. Starting Oct 6, however, the master scheduler tick rate deteriorated while the master host-scheduler tick rate stayed high. The latest push to prod (4pm Oct 9) raised the master host-scheduler tick rate but not the master scheduler tick rate.
Oct 10
I've been moving boards around between shards. That means they are intermittently shardless, and thus fall to the master. That might temporarily cause master slowness.
Oct 10
The tick rate dropped off dramatically on Saturday and has stayed low ever since. Is that when the skunk-3 issue started?
Oct 10
Actually, it does align with skunk-3 dying pretty well, so it is probably the root cause: http://shortn/_Hoo7jT2sDG
Oct 10
Oh wait, I just remembered (from your observation) that if a shard goes down, the master will still try to talk to it somehow. I don't remember if we chased it last time it came up, or where exactly it was, but I remember it being depressing.
Comment 1 by zamorzaev@chromium.org, Oct 10