chromium-try-flakes seems to simply ignore some flakes and only report others
Issue description
Example: https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyEgsSBUZsYWtlIgdhbmFseXplDA
The flake at 2016-09-08 09:03:24 UTC does not have a bug number shown next to it. In datastore the recorded bug number is "0"; see https://pantheon.corp.google.com/datastore/entities/edit?key=0%2F%7C8%2FFlakyRun%7C19%2Fid:5695705886752768&project=chromium-try-flakes&ns=&kind=FlakyRun&filter=11%2Ffailure_run%7CKEY%7CEQ%7C156%2FKey(Namespace(%27%27),%20PatchsetBuilderRuns,%20%272314763002.40001.master.tryserver.chromium.linux.linux_chromium_chromeos_ozone_rel_ng%27,%20BuildRun,%205629499534213120)
From the code it seems as if the Monorail API returned id=0 for the created/updated issues, but this needs to be investigated more deeply.
Sep 11 2016
Hm. I've just realized that "0" is expected, since this is the default value. It seems that we simply never updated the record with the actual bug number.
Sep 11 2016
I've studied the logs for the flake "org.chromium.chrome.browser.sync.BookmarksTest#testDownloadMovedBookmark" (from the link in #1) and tried to reconstruct what happened (all times are in UTC):
1. The test was flaky before August 2016 and there are 12 recorded FlakyRuns (all from 2015).
2. The first 3 FlakyRuns after that are recorded at 2016-08-23 15:22:12, 2016-08-23 22:52:09 and 2016-08-23 23:44:11.
3. The cron job /cron/update_issue_tracker starts to schedule processing of the Flake every 30 minutes, and the first processing call happens at 2016-08-24 14:53:00.481.
4. Many more FlakyRuns are recorded on 2016-08-24. The first 3 runs are at 16:01:15, 15:45:47 and 15:36:09.
5. Issue 640728 is created at 2016-08-24 20:23:08 and only 8 flakes are reported on it.
6. At 2016-08-24 21:38:58, the issue is marked as a duplicate of issue 640669, which was already marked as fixed by then (at 21:33:06).
In datastore the Flake properties are:
- issue_id == 0
- old_issue_id == 640728
- count_all == len(occurrences) == 100
- num_reported_flaky_runs == 23
Several questions arise out of this:
1. Why didn't processing start on 2016-08-23, when there were already 3 FlakyRuns?
2. What caused processing to start at 2016-08-24 14:53:00? The first flake on 2016-08-24 was recorded >1 hour later.
3. Why were only 8 flakes reported on the issue? There were 77 FlakyRuns in the previous 24 hours.
4. It is clear that issue_id is set to 0 since issue 640669 was marked as Fixed. But why is old_issue_id set to 640728?
5. Why is the number of reported FlakyRuns equal to 23?
This is all rather confusing and I wonder if it could all be caused by race conditions. OTOH, they should not happen since I've limited max concurrent tasks to 1. Another cause could be datastore eventual consistency, but I'm just guessing wildly...
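The property values listed above can be checked mechanically. The following is only a minimal sketch, not the chromium-try-flakes code: `FlakeSnapshot` and `find_inconsistencies` are hypothetical names, and the class is a plain-Python stand-in that merely mirrors the property names quoted in this comment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlakeSnapshot:
    """Hypothetical stand-in for the Flake entity; not the real datastore model."""
    name: str
    issue_id: int = 0  # 0 is the default value, i.e. no issue currently attached
    old_issue_id: int = 0
    count_all: int = 0
    num_reported_flaky_runs: int = 0
    occurrences: List[str] = field(default_factory=list)


def find_inconsistencies(flake: FlakeSnapshot) -> List[str]:
    """Return human-readable descriptions of suspicious property combinations."""
    problems = []
    if flake.count_all != len(flake.occurrences):
        problems.append("count_all does not match len(occurrences)")
    if flake.num_reported_flaky_runs < flake.count_all:
        unreported = flake.count_all - flake.num_reported_flaky_runs
        problems.append("%d occurrences were never reported" % unreported)
    if flake.issue_id == 0 and flake.old_issue_id != 0:
        problems.append(
            "no current issue, but old_issue_id is %d" % flake.old_issue_id)
    return problems


# Values taken from the datastore properties quoted in this comment; the
# occurrence keys themselves are placeholders.
example = FlakeSnapshot(
    name="org.chromium.chrome.browser.sync.BookmarksTest#testDownloadMovedBookmark",
    issue_id=0,
    old_issue_id=640728,
    count_all=100,
    num_reported_flaky_runs=23,
    occurrences=["placeholder-run-key"] * 100,
)
problems = find_inconsistencies(example)
for problem in problems:
    print(problem)
```

A check like this only describes the symptoms; it does not explain why the counters diverged.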
Sep 11 2016
IMHO, this should all just be solved in Dremel. The current design is just too complicated.
Sep 11 2016
I've manually set num_reported_flaky_runs = 100 on the Flake entity.
Sep 12 2016
Nov 2 2016
For example flakes that were not reported on the bug, see issue 661434.
Jan 18 2017
If this is really a Pri-1, find an owner and update the priority. This is the result of a bulk edit that moved high priority available bugs to a lower priority in an attempt to be more honest with bug filers.
Apr 24 2017
Comment 1 by serg...@chromium.org, Sep 11 2016