A bunch of tests incorrectly marked flaky because of swarming failures
Issue description
"PolicyPrefIndicatorTestInstance/PolicyPrefIndicatorTest.CheckPolicyIndicators/22" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 4 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyWwsSBUZsYWtlIlBQb2xpY3lQcmVmSW5kaWNhdG9yVGVzdEluc3RhbmNlL1BvbGljeVByZWZJbmRpY2F0b3JUZXN0LkNoZWNrUG9saWN5SW5kaWNhdG9ycy8yMgw. Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs
,
Oct 31 2016
,
Oct 31 2016
Issue 660712 has been merged into this issue.
,
Oct 31 2016
Issue 660710 has been merged into this issue.
,
Oct 31 2016
Issue 660691 has been merged into this issue.
,
Oct 31 2016
,
Oct 31 2016
,
Oct 31 2016
,
Oct 31 2016
,
Nov 1 2016
,
Nov 2 2016
,
Nov 2 2016
Re #1: We are aware that some flakes are infra-related, but it's non-trivial to automatically detect this. Since most flakes are caused by tests, we've made a decision to route bugs to sheriffs by default and expect sheriffs to investigate the issue and re-route to troopers if needed. This is also why we need human judgment and do not simply automate disabling tests. In fact, we report flakes in some known infra steps directly to troopers, e.g. see issue 594867 . If you think we can further improve automated detection of infra-related flakes, please file a bug describing the suggested approach and add a label Infra>Flakiness>Pipeline to it. Thank you.
,
Nov 2 2016
,
Nov 8 2016
It looks like each of the tests listed as failing also produced "excessive output," but still passed. (I'm assuming that lines like the following indicate the test passed: [ OK ] ContentSettingBubbleModelMediaStreamTest.ManageLink (9285 ms)) I'm a little bit confused as to why these tests are producing excess output. @phoglund, who is the owner of these tests? Can we CC that person here to get some insight? I'm assigning this to MA as owner of swarming. Why are these tests being marked as INFRA FAILURE? Is this the desired outcome? If so, why?
,
Nov 9 2016
Re #12: I understand, it's a hard problem. Assigning to sheriffs is perhaps the most reasonable thing to do. Also, I don't "own" browser_tests; I was just the sheriff when this happened. Unfortunately, I don't know who to talk to about general browser_tests problems. Pawel, do you know? I know you've worked with the test launchers and what they print; it appears browser_tests is printing too much data for swarming in this case.
,
Nov 9 2016
We are now discussing who should own the test launchers, like browser_tests or webkit_tests here: https://groups.google.com/a/google.com/d/msg/chrome-infra/dVpYIDMsH2M/mC0woRPWCQAJ.
,
Nov 23 2016
@maruel, if these tests are normally fine, I'm guessing they are producing excessive output because they are failing in some way. In that case, these should be reported as red, correct?
,
Nov 23 2016
https://chromium-swarm.appspot.com/tasklist?f=buildername%3Alinux_chromium_chromeos_rel_ng&f=buildnumber%3A305492 I looked at each of the individual failures, and each generated a JSON file of around 100MB. The files were successfully stored, so Swarming and Isolate worked fine. I "suspect" it is the recipes that choked on trying to load all these JSON files at once and failed. I could be wrong, but still, the tasks ran successfully (from Swarming's perspective). So yes, they should have been reported as normal failures, and the fact that they are reported as infrastructure failures is a bug.
,
Nov 29 2016
Thanks @maruel! Assigned to @iannucci for recipe expertise.
,
Dec 8 2016
,
Dec 16 2016
If the recipe author requested that the json file be loaded with `api.json.output`, then yes, it will attempt to read the json file into the recipe. If those JSON files are enormous, then I can see it overwhelming the recipe engine process. I would recommend changing the recipe to not load multiple 100MB JSON files into memory, but I'm not familiar with the recipes in question. The obvious solution, of course, is to have the test harness spit out a smaller summary json document which only has the information needed by the recipe. If that's for some reason impossible, a second thing to do would be to immediately trim the document by implementing a custom Placeholder; it would read the document, and then trim it down to just the details the recipe needs before retaining it. I would be willing to augment json.output() to take a trim function that could be used to implement that. If simply reading and parsing the document is too much work for the recipe engine (and we can't emit less data), then we'd have to investigate adding support for an external tool, such as jq, to do a streaming filter of the json while reading it from disk before the recipe engine ever sees it. This would be a lot more work and would add a binary dependency to the recipe engine runtime (something I'm planning to support, but am not yet working on).
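The "read, then trim" idea above could look roughly like the following. This is only a minimal sketch, not the recipe engine's API: `trim_test_results` and `load_trimmed` are hypothetical helpers, and the `per_iteration_data` / `status` field names are an assumption about the shape of the test launcher's JSON. It keeps just the per-test statuses and discards everything else (such as captured output) before the data is retained.

```python
import json


def trim_test_results(doc):
    """Keep only test names and run statuses; drop bulky fields such as output.

    Assumes (hypothetically) that `doc` looks like:
      {"per_iteration_data": [{"Suite.Test": [{"status": "...", ...}, ...]}]}
    """
    trimmed = {}
    for iteration in doc.get("per_iteration_data", []):
        for test_name, runs in iteration.items():
            statuses = [run.get("status", "UNKNOWN") for run in runs]
            trimmed.setdefault(test_name, []).extend(statuses)
    return trimmed


def load_trimmed(path):
    """Parse the full file once, then retain only the trimmed summary."""
    with open(path) as f:
        return trim_test_results(json.load(f))
```

Note that this still parses the whole document once, so it only helps with what is retained afterwards; it does not address the case where parsing itself is too expensive.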
,
Dec 16 2016
Oh, and while we're at it, I would probably recommend making api.raw_io (and thus api.json) hard-fail when being asked to handle documents > 256KB. "Doc! It hurts when I do this!" "... then don't do that."
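A hard size limit of this kind might be sketched as below. The names `MAX_DOC_BYTES`, `DocumentTooLargeError`, and `read_json_with_limit` are hypothetical and are not part of api.raw_io or api.json; the sketch only illustrates checking the file size before any parsing is attempted.

```python
import json
import os

# 256KB limit, as suggested in the comment above.
MAX_DOC_BYTES = 256 * 1024


class DocumentTooLargeError(Exception):
    """Raised when a document exceeds the allowed size (hypothetical)."""


def read_json_with_limit(path, max_bytes=MAX_DOC_BYTES):
    """Refuse to load documents larger than max_bytes instead of parsing them."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise DocumentTooLargeError(
            f"{path} is {size} bytes; limit is {max_bytes}")
    with open(path) as f:
        return json.load(f)
```

Failing before the read makes the oversized output show up as an explicit, attributable error rather than as an overwhelmed process.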
,
Dec 16 2016
Thanks @iannucci! Do you know someone who is familiar with the recipe who would be a good fit to take this on?
,
Dec 16 2016
,
Dec 16 2016
,
Dec 16 2016
I think we actually duped those two bugs into each other
,
Dec 16 2016
Heh, ok. Either way!
,
Dec 17 2016
Dirk, could you figure out priority/owner and such for this? I'm out for the next few weeks, so I don't want to be holding it during that time anyhow.
,
Dec 17 2016
(oh and see comments #21, #22 for diagnosis/proposed solution(s))
,
Dec 18 2016
Yup, assigning to me is fine.
,
Dec 19 2016
There is an owner on this bug, but the status was not "Assigned" or "Started". Fixing. If you do not own this bug, please remove yourself as the owner and make the status "Available".
,
Jan 15 2017
,
Jan 25 2017
Comment 1 by phoglund@chromium.org, Oct 31 2016
Labels: -Sheriff-Chromium Infra-Troopers
Summary: "PolicyPrefIndicatorTestInstance/PolicyPrefIndicatorTest.CheckPolicyIndicators/22" is flaky, probably because of swarming failures (was: "PolicyPrefIndicatorTestInstance/PolicyPrefIndicatorTest.CheckPolicyIndicators/22" is flaky)