tryserver.chromium.linux is completely bricked.
Issue description
https://build.chromium.org/p/tryserver.chromium.linux/
P0 for obvious reasons.
Result: HTTP 503
Noticed by nodir@, probably not related to his BuildBucket API changes (see why below).
I'm not actively working on this, but observations so far are:
- Last log in "twistd.log" was at 18:39 PST.
- The process is still running.
- The Postgres database for this master is using 100% CPU and has the following command-line:
  postgres: tryserverchromiumlinux tryserverchromiumlinux 127.0.0.1(50095) DELETE
This is almost certainly the waterfall's various builders all becoming blocked on database operations. This may suggest that the previous revert that stip@ rolled out did not actually fix this.
,
Jul 9 2016
Still down this morning. The same query yields the same IDs, which suggests that the database is not successfully deleting them.
,
Jul 9 2016
I killed the database process and restarted the master, and it seems to be humming along nicely. If this is systemic, we will probably see it again in a future prune.
,
Jul 9 2016
The master is up and building. Decreasing priority, but leaving this open for trooper to look at on Monday.
,
Jul 11 2016
/var/log/postgresql didn't log anything for that time range:
2016-07-07 11:32:50 PDT WARNING: pgstat wait timeout
2016-07-08 22:12:03 PDT FATAL: role "chrome-bot" does not exist
The query from #c1
"2016-07-09 10:03:02 PDT STATEMENT: DELETE FROM changes WHERE changes.changeid IN (4416339, 4416338, 4416337, 4416336, 4416335, 4416334, 4416333, 4416332, 4416331,"
is actually super huge - I didn't reach the end after pressing the Down arrow for a couple of minutes. Its size is probably the reason for the db outage. Where do we get the change id list from?
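If the single oversized DELETE ... IN (...) statement is indeed the problem, one possible mitigation is to issue the delete in smaller batches, committing after each one. The following is only a minimal sketch of that idea, not the master's actual pruning code: the helper names chunked and delete_changes_in_batches and the batch size of 500 are hypothetical, and Python's built-in sqlite3 module stands in for Postgres so that the example is self-contained.

```python
import sqlite3


def chunked(ids, size):
    """Yield successive lists of at most `size` ids (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def delete_changes_in_batches(conn, change_ids, batch_size=500):
    """Delete rows from `changes` in small batches instead of one huge IN list.

    Assumption: `conn` is a DB-API connection using '?' placeholders
    (sqlite3 here). A Postgres driver such as psycopg2 uses '%s' instead.
    Returns the total number of rows deleted.
    """
    deleted = 0
    for batch in chunked(list(change_ids), batch_size):
        placeholders = ",".join("?" for _ in batch)
        cur = conn.execute(
            "DELETE FROM changes WHERE changeid IN (%s)" % placeholders,
            batch,
        )
        # Commit per batch so each statement stays short-lived.
        conn.commit()
        deleted += cur.rowcount
    return deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE changes (changeid INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO changes (changeid) VALUES (?)",
        [(i,) for i in range(1, 2001)],
    )
    conn.commit()
    print(delete_changes_in_batches(conn, range(1, 1501), batch_size=400))
```

Whether batching would actually have avoided this outage is not established in this thread; the sketch only shows how the size of each statement could be bounded.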
,
Jul 12 2016
,
Jul 14 2016
Comment 1 by d...@chromium.org, Jul 9 2016