Some build ids are missing from the cq_attempts BQ table but present in cq_events
Issue description

In the first patchset of this CL: https://chromium-review.googlesource.com/c/chromium/src/+/1214081/1, the builder android-kitkat-arm-rel has a failed build: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/77142, and its build id is 8935882956476713904.

However, when running the following query:

SELECT * FROM `chrome-infra-events.aggregated.cq_attempts` AS ca WHERE ca.issue = '1214081' AND ca.patchset = '1'

8935882956476713904 is not present in the contributing_buildbucket_ids.
,
Sep 11
,
Sep 12
Interesting. If I query raw CQ events, I can see it present in a bunch of CQ sent events:

SELECT issue, patchset, timestamp_millis, contributing_buildbucket_ids FROM [chrome-infra-events:raw_events.cq] WHERE issue = '1214081' AND patchset = '1' AND contributing_buildbucket_ids = 8935882956476713904 ORDER BY timestamp_millis ASC LIMIT 1000

However, I also don't see it in the cq_attempts table:

SELECT * FROM [chrome-infra-events:aggregated.cq_attempts] AS ca WHERE ca.issue = "1214081" AND ca.patchset = "1" AND contributing_bbucket_ids = 8935882956476713904;

This suggests a bug in the dataflow job itself. I wonder if the failures in creating new dataflow jobs on Sunday the 9th have something to do with this: https://screenshot.googleplex.com/V3DyzXpP4Qo

If not, then the bug is in the source code here: https://cs.chromium.org/chromium/infra/packages/dataflow/cq_attempts.py?q=cq+dataflow+cq_attempt&sq=package:chromium&dr=C&l=10

Do you see any bug there?
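The comparison in this comment boils down to a set difference: build ids seen in raw CQ events minus those recorded in cq_attempts. Below is a minimal sketch in Python, assuming the rows from both queries have already been fetched into lists of dicts. The function name `missing_build_ids`, the row layout, and the sample rows are hypothetical; only the column names and the build id 8935882956476713904 come from this thread.

```python
def missing_build_ids(raw_event_rows, attempt_rows):
    """Return build ids seen in raw CQ events but absent from cq_attempts.

    Both arguments are lists of dicts, one per query result row. The row
    layout is an assumption for illustration, not the actual BQ schema.
    """
    seen_in_events = set()
    for row in raw_event_rows:
        seen_in_events.update(row.get("contributing_buildbucket_ids", []))

    seen_in_attempts = set()
    for row in attempt_rows:
        seen_in_attempts.update(row.get("contributing_bbucket_ids", []))

    return sorted(seen_in_events - seen_in_attempts)


# Hypothetical sample rows; the id 1001 is made up.
raw_rows = [
    {"issue": "1214081", "patchset": "1",
     "contributing_buildbucket_ids": [8935882956476713904, 1001]},
]
attempt_rows = [
    {"issue": "1214081", "patchset": "1",
     "contributing_bbucket_ids": [1001]},
]
print(missing_build_ids(raw_rows, attempt_rows))
```

Running the same difference over a wider range of issues might show whether the gap is limited to particular dates or builders.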
,
Sep 16
Lowering priority and de-assigning. Among CQ bugs, this is Pri2, and I must fix a few Pri1s first. Unless someone volunteers to look into this, I intend to revisit this bug together with the rework of the CQ BQ schema.
,
Sep 17
Hi Andrii, I just found out that for all CQ attempts that have retries, only the build id of the first build is recorded in the cq_attempts table. For example, https://chromium-review.googlesource.com/c/chromium/src/+/1227495/1 has 3 builds on linux-chromeos-rel; however, only one of them is present in the cq_attempts table. Is this by design?
,
Sep 17
And this seems to apply to all try jobs: if multiple builds of a builder are associated with an issue, then only the first build is recorded in the cq_attempts table.
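For comparison, the behavior a consumer would want is every build id per builder, retries included. Here is a sketch of that aggregation over raw events, assuming each event carries an issue, patchset, builder name, and build id. The field names and the function `builds_per_builder` are hypothetical, and this is not how the actual dataflow job is written.

```python
from collections import defaultdict


def builds_per_builder(events):
    """Group every build id by (issue, patchset, builder), keeping retries.

    `events` is an iterable of dicts with hypothetical field names.
    Build ids are kept in first-seen order, without duplicates.
    """
    grouped = defaultdict(list)
    for event in events:
        key = (event["issue"], event["patchset"], event["builder"])
        if event["build_id"] not in grouped[key]:
            grouped[key].append(event["build_id"])
    return dict(grouped)


# Made-up build ids standing in for the three linux-chromeos-rel builds.
events = [
    {"issue": "1227495", "patchset": "1",
     "builder": "linux-chromeos-rel", "build_id": 101},
    {"issue": "1227495", "patchset": "1",
     "builder": "linux-chromeos-rel", "build_id": 102},
    {"issue": "1227495", "patchset": "1",
     "builder": "linux-chromeos-rel", "build_id": 101},  # repeated event
    {"issue": "1227495", "patchset": "1",
     "builder": "linux-chromeos-rel", "build_id": 103},
]
result = builds_per_builder(events)
print(result)
```

Keeping a list rather than a single value per key is the whole point: an aggregation that keeps one build id per builder would drop the retries.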
,
Sep 17
liaoyuke@ I don't recall whether it was by design or not. In any case, your team is the primary consumer of this data, and I'm personally in favor of changing it to suit your needs. So I encourage you to modify the dataflow job to produce better data in cq_attempts (and btw, the history of that code may provide a clue, maybe...)
,
Sep 17
Sure, I'll look into it. Interestingly, the build ids of the retries do appear in the cq_attempts table now, just with a delay of a few hours.
,
Sep 18
tandrii@: what's the expected latency here for a build id to show up in cq_attempts?
,
Sep 18
The dataflow job is triggered every 3 hours. Looking at https://pantheon.corp.google.com/dataflow?project=chrome-infra-events, it appears the job runs for 40 minutes. So a latency of approximately 4 hours is expected. TBH, I don't know why the job is so slow. It seems to load the whole existing cq_events dataset and then compute cq_attempts, which appears quite wasteful to me.
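The "approximately 4 hours" figure can be reproduced with a small calculation. This sketch assumes the job reads its input when it starts and publishes results only when it finishes, which is an assumption about the job and not something confirmed in this thread. The function name `staleness_bounds` is hypothetical.

```python
from datetime import timedelta

TRIGGER_INTERVAL = timedelta(hours=3)  # job is triggered every 3 hours
JOB_RUNTIME = timedelta(minutes=40)    # approximate observed runtime


def staleness_bounds(interval, runtime):
    """Return (best_case, worst_case) delay before an event becomes visible.

    Best case: the event lands just before a run starts, so it only waits
    for that run to finish. Worst case: it lands just after a run starts,
    so it waits a full interval for the next run plus that run's runtime.
    """
    return runtime, interval + runtime


best, worst = staleness_bounds(TRIGGER_INTERVAL, JOB_RUNTIME)
print("best case:", best)
print("worst case:", worst)
```

Under these assumptions the worst case is 3 hours 40 minutes, consistent with the rough 4 hour estimate.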
,
Sep 18
3 hours is too long for us. Maybe it's better to derive the needed info from cq_events directly via a SQL query? Is that possible?
,
Sep 18
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/4d7ff593e51a42149410366410129be75f925acf

commit 4d7ff593e51a42149410366410129be75f925acf
Author: Yuke Liao <liaoyuke@chromium.org>
Date: Tue Sep 18 22:51:40 2018

[Findit] Change sql query to use cq raw events table

This CL changes sql query to use cq raw events table instead of the aggregated one because the aggregated one has about 4 days delay.

TBR=stgao@chromium.org
Bug: 882940
Change-Id: I96838bd91ddb33daa8d57defb24e55482840884f
Reviewed-on: https://chromium-review.googlesource.com/1231876
Commit-Queue: Yuke Liao <liaoyuke@chromium.org>
Reviewed-by: Andrii Shyshkalov <tandrii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#17680}

[modify] https://crrev.com/4d7ff593e51a42149410366410129be75f925acf/appengine/findit/services/flake_detection/flaky_tests.cq_false_rejection.sql
Comment 1 by liaoyuke@chromium.org
, Sep 11