Issue metadata
Sign in to add a comment
|
CQ commits the same CL twice |
||||||||||||||||||||||||
Issue descriptionOn Tue, 08 Nov 09:41, tree is closed due to a compilation error: http://build.chromium.org/p/chromium.webkit/builders/WebKit%20Mac%20Builder/builds/184044 The strange thing about this compilation error is that, the CL that causes this problem is the same CL that's been landed twice. Check up https://codereview.chromium.org/2485013002, patchset is landed at https://crrev.com/b2fd274ec5b20ace720420106a13b8c90b61bed1 Cr-Commit-Position: refs/heads/master@{#430607} and https://crrev.com/93ed99a0238f25fe391874d3f6dfa02c01acb781 Cr-Commit-Position: refs/heads/master@{#430608} . Even more strangely, the second land only lands a partial code of the first land. We have to manually revert the second landed commit, because using CodeReview tool to do the revert will wrongly reverts the first landed commit, which is not the culprit. Please investigate. This is quite a serious problem.
,
Nov 8 2016
Looking. While indeed serious, my first hunch is that it's likely a low probability error, will evaluate if it needs to stay trooper-P0 (but should probably stay CQ P0).
,
Nov 8 2016
Stats: the two commits happened within 1 second of each other (time in UTC): b2fd274ec5b20ace720420106a13b8c90b61bed1 (original, good): Tue Nov 08 14:35:53 2016 93ed99a0238f25fe391874d3f6dfa02c01acb781 (additional, bad): Tue Nov 08 14:35:54 2016
,
Nov 8 2016
From CQ logs: 2016-11-08 14:32:18 UTC [pid:24867 tid:139871004825344 pending_manager.base:320] Committer thread is processing 2485013002. 2016-11-08 14:31:05 UTC [pid:13058 tid:140511909295872 pending_manager.base:320] Committer thread is processing 2485013002. Apparently, the same patch got dispatched for commit twice. I have a theory as to why, looking to confirm it (will post later).
,
Nov 8 2016
CQ restarted at 14:31:43 - right in between the two commits: http://chromium-cq-status.appspot.com/recent#action=cq_start,project=chromium So, my theory is confirmed: - CQ relies on Rietveld's Datastore (eventual) consistency to know if the issue has been committed. - CQ keeps internal cache in memory for committed issues to avoid duplicate commits, but this cache is flushed and repopulated from Rietveld on restart. - Datastore's eventual consistency may lag for minutes, and occasionally hours. SLO: 24h. What actually happened: - So, CQ committed the patch, then immediately restarted. - Rietveld's Datastore didn't catch up, so CQ loaded the issue as not-yet-closed, but ready to commit (since all try jobs passed). - CQ committed the patchset, which happen to apply nontrivially (usually it fails to apply twice, or resolve to no changes). - Result: erroneous extra commit. This was a known possible scenario at the time the feature was implemented (~2 years ago), and this is the first known incident when it caused a problem. I'd say, this is a very rare event, and is not likely to repeat very often. I'm downgrading to P1 and sending to the CQ team for triage. I believe that the upcoming transition to Gerrit will solve this problem. In the meantime, if AppEngine continues to give us trouble with degraded eventual consistency, we may do something about it. In particular, running CQ longer without interruption will definitely help mitigate this issue, so consider running it under the service_manager.
,
Nov 8 2016
This is very known issue. Your analysis is almost right: datastore isn't to blame. CQ was restarted as it was waiting on the git push to finish, and hence Rietveld issue was never closed. That said, this is happening with elevated frequencies lately - see issue 432905 and issue 662140 . Maybe we should fix this, or add trooper book manual item? Gerrit indeed solves this by ensuring CL isn't committed twice.
,
Nov 8 2016
Also, the stats of commits in gnumbd-enabled repo are meaningful: timestamp there correcpond to local gnumbd machine clock && it ensures they are monotonically increasing. One has to look at refs/pending/heads/master equivalents - timestamps there are actually those from cq machine. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by estaab@chromium.org
, Nov 8 2016Labels: Infra-Troopers