
Issue 626871

Starred by 3 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Jul 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




tryserver.chromium.linux is completely bricked.

Project Member Reported by d...@chromium.org, Jul 9 2016

Issue description

https://build.chromium.org/p/tryserver.chromium.linux/

P0 for obvious reasons.
Result: HTTP 503
Noticed by nodir@, probably not related to his BuildBucket API changes (see why below).

I'm not actively working on this, but observations so far are:

- Last log in "twistd.log" was at 18:39 PST.
- The process is still running.
- The Postgres database for this master is using 100% CPU and has the following command-line: postgres: tryserverchromiumlinux tryserverchromiumlinux 127.0.0.1(50095) DELETE

This is almost certainly the waterfall's various builders all becoming blocked on database operations. This may suggest that the previous revert that stip@ rolled out did not actually fix this.
 

Comment 1 by d...@chromium.org, Jul 9 2016

Looking at database activity:

$ SELECT count(*) as cnt, usename, current_query FROM pg_stat_activity GROUP BY usename,current_query ORDER BY cnt DESC;

tryserverchromiumlinux          | DELETE FROM changes WHERE changes.changeid IN (4416339, 4416338, 4416337, 4416336, 4416335, 4416334, 4416333, 4416332, 4416331, 4416330, 4416329, 4416328, 4416327, 4416326, 4416325, 4416324, 4416323, 4416322, 4416321, 4416320, 4416319, 4416318, 4416317, 4416316, 4416315, 4416314, 4416313, 4416312, 4416311, 4416310, 4416309, 4416308, 4416307, 4416306, 4416305, 4416304, 4416303, 4416302, 4416301, 4416300, 4416299, 4416298, 4416297, 4416296, 4416295, 4416294, 4416293, 4416292, 4416291, 4416290, 4416289, 4416288, 4416287, 4416286, 4416285, 4416284, 4416283, 4416282, 4416281, 4416280, 4416279, 4416278, 4416277, 4416276, 4416275, 4416274, 4416273, 4416272, 4416271, 4416270, 4416269, 4416268, 4416267, 4416266, 4416265, 4416264, 4416263, 4416262, 4416261, 4416260, 4416259, 4416258, 4416257, 4416256, 4416255, 4416254, 4416253, 4416252, 4416251, 4416250, 4416249, 4416248, 4416247, 4416246, 4416245, 4416244, 4416243, 4416242, 4416241, 4416240, 4416239, 4416238, 4416237, 4416236, 4416235, 4416234, 4416233, 4416232, 4416

The query result is truncated. This looks bad. The code that's executing this is here:
https://chromium.googlesource.com/chromium/tools/build/+/master/third_party/buildbot_8_4p1/buildbot/db/changes.py#258

(I assume so; it's the only BuildBot code that deletes from that table.)

I manually ran a delete of the first two entries and it was really fast. I can't tell for sure, but maybe this delete covers an absurdly large number of rows?
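
If the single huge IN list is indeed the problem, one possible mitigation is to prune in small batches, each in its own short transaction. The following is only a sketch under stated assumptions: it is not the code in changes.py and not necessarily what was done to fix this issue. The function name delete_changes_in_batches and the batch size of 100 are hypothetical, and an in-memory SQLite database stands in for the master's Postgres database so that the example runs on its own.

```python
import sqlite3


def delete_changes_in_batches(conn, change_ids, batch_size=100):
    """Delete rows from `changes` in fixed-size batches.

    Hypothetical helper (not BuildBot code). Each batch is committed
    separately so that no single statement carries the full ID list.
    Returns the total number of rows deleted.
    """
    deleted = 0
    for start in range(0, len(change_ids), batch_size):
        batch = change_ids[start:start + batch_size]
        # One placeholder per ID in this batch only.
        placeholders = ",".join("?" for _ in batch)
        cur = conn.execute(
            "DELETE FROM changes WHERE changeid IN (%s)" % placeholders,
            batch)
        deleted += cur.rowcount
        conn.commit()
    return deleted


def demo():
    # Stand-in database: SQLite instead of the master's Postgres.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE changes (changeid INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO changes VALUES (?)",
                     [(i,) for i in range(1, 1001)])
    conn.commit()
    # Prune the 900 oldest changes, 100 IDs per statement.
    deleted = delete_changes_in_batches(conn, list(range(1, 901)))
    remaining = conn.execute("SELECT count(*) FROM changes").fetchone()[0]
    return deleted, remaining
```

Calling demo() prunes 900 of 1000 rows in nine statements. Whether batching would actually have avoided the stall on this master is untested.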

Comment 2 by d...@chromium.org, Jul 9 2016

Still down this morning. The same query yields the same IDs, which suggests that the database is not successfully deleting them.

Comment 3 by d...@chromium.org, Jul 9 2016

I killed the database process and restarted the master, and it seems to be humming along nicely. If this is systemic, we will probably see it again in a future prune.

Comment 4 by d...@chromium.org, Jul 9 2016

Labels: -Pri-0 Pri-1
The master is up and building. Decreasing priority, but leaving this open for trooper to look at on Monday.
/var/log/postgresql didn't log anything for that time range:

2016-07-07 11:32:50 PDT WARNING:  pgstat wait timeout
2016-07-08 22:12:03 PDT FATAL:  role "chrome-bot" does not exist

The query from #c1 "2016-07-09 10:03:02 PDT STATEMENT:  DELETE FROM changes WHERE changes.changeid IN (4416339, 4416338, 4416337, 4416336, 4416335, 4416334, 4416333, 4416332, 4416331," is actually huge: I didn't reach the end after pressing the Down arrow for a couple of minutes. That size is probably the reason for the db outage. Where do we get the change ID list from?
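
Rather than scrolling through the statement, the size of the ID list could be measured directly from the logged text. A minimal sketch, assuming the STATEMENT line has been copied into a string; count_in_list_ids is a hypothetical helper, not an existing tool:

```python
import re


def count_in_list_ids(statement):
    """Count the integer literals in the first IN (...) list.

    Hypothetical helper. Tolerates a truncated statement with no closing
    parenthesis; a partially cut-off final ID is still counted as one.
    """
    match = re.search(r"\bIN\s*\(([^)]*)", statement, re.IGNORECASE)
    if match is None:
        return 0
    return len(re.findall(r"\d+", match.group(1)))


# Shortened sample in the shape of the logged statement above.
sample = ("DELETE FROM changes WHERE changes.changeid IN "
          "(4416339, 4416338, 4416337,")
print(count_in_list_ids(sample))
```

The sample here contains only three IDs; the real count for the logged statement is not known from this thread.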

Components: -Infra

Comment 7 by no...@chromium.org, Jul 14 2016

Status: Fixed (was: Untriaged)
