Time-out in chromedriver_py_tests results in INVALID_TEST_RESULTS
Issue description:
Task 'with patch': https://chromium-swarm.appspot.com/task?id=415947063d08e310&refresh=10&show_raw=1
Task 'retry with patch': https://chromium-swarm.appspot.com/task?id=4159812e20a01710&refresh=10&show_raw=1
This caused an otherwise unrelated CL to fail: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_rel_ng/239041
Comment 1 by johnchen@chromium.org, Nov 30
Status: Duplicate (was: Untriaged)

Comment 2, Nov 30
This is a different problem than issue 904061.
1) There are many failing tests.
2) The point of this crbug is that a timeout in chromedriver_py_tests should not result in INVALID_TEST_RESULTS. Timeouts should be handled gracefully, and the test should still emit a JSON file with a list of all tests that would have run and the status of each [including TIMED_OUT or NOT_RUN] if necessary.
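For concreteness, here is a minimal sketch of the kind of result file meant above, roughly following the per-test layout of the Chromium JSON Test Results Format as I understand it; the exact keys and status strings are assumptions, not taken from the chromedriver_py_tests code:

    # Hedged sketch only: approximate shape of the JSON that should still be
    # written when the run is cut short. Keys and status strings are assumptions.
    import json

    results = {
        'interrupted': True,  # the run did not finish normally
        'tests': {
            'testSwitchToWindow': {'expected': 'PASS', 'actual': 'PASS'},
            'testSwitchToStaleFrame': {'expected': 'PASS', 'actual': 'TIMEOUT'},
            'testSwitchToParentFrame': {'expected': 'PASS', 'actual': 'NOTRUN'},
        },
    }
    with open('output.json', 'w') as f:
        json.dump(results, f)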
Comment 3, Nov 30
The root cause of the failing tests appears to be identical to issue 904061, though I agree the failures should be handled more gracefully. I think the problem is that the failed tests were not properly cleaned up; after a large number of failures, system resources became exhausted and things started to fail in other ways. Eventually the test script either crashed or was killed before it could reach the point of generating the test result file, hence the INVALID_TEST_RESULTS outcome. I agree the test script should handle the failures better.
Comment 4, Nov 30
> Eventually the test script either crashed or was killed before it could reach the point to generate the test result file

It most likely hit the Swarming task timeout [1 hr]. Swarming will first send a SIGTERM, wait 30 seconds, and then do a hard kill. The script should ideally handle SIGTERM and use that signal to emit the JSON results, and then immediately exit.
Comment 5, Dec 4

Comment 6, Dec 5

Comment 7, Dec 12
Happening again recently:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20Tests%20%281%29/86149
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20Tests%20%281%29/86129
Comment 8, Dec 13
More examples: https://chromium-swarm.appspot.com/task?id=41b92414f5462410&refresh=10&show_raw=1
I'm seeing timeouts in testSwitchToParentFrame, testSwitchToStaleFrame, testSwitchToWindow, testSwitchesToTopFrameAfterGoingBack, testSwitchesToTopFrameAfterNavigation, testSwitchesToTopFrameAfterRefresh, and many more.
Comment 9, Dec 13
This task normally completes in 5-10 minutes on Windows: https://chromium-swarm.appspot.com/tasklist?c=name&c=state&c=created_ts&c=duration&c=pending_time&c=pool&c=bot&et=1544736480000&f=name-tag%3Achromedriver_py_tests&f=master-tag%3Atryserver.chromium.win&l=50&n=true&q=mast&s=created_ts%3Adesc&st=1544650080000
But when it flakes, it regularly hits the 30 (?) minute timeout. This is likely due to excessive waits in the code before each test times out.
Comment 10, Dec 13
I've separated out the flakiness issue into https://monorail-prod.appspot.com/p/chromium/issues/detail?id=899886
+stgao: Timeouts are not being caught by Findit, it seems (?).
I've filed a separate bug to disable tests that time out: https://bugs.chromium.org/p/chromium/issues/detail?id=910584
This bug will continue to track graceful recovery from timeouts, which currently cause the shard [and all other shards] to fail.
Comment 11, Dec 13
Step-level timeouts or test-level timeouts? chromedriver_py_tests is not supported by Findit yet.
Comment 12, Dec 13
Clarification: chromedriver_py_tests is not supported by Findit for flake analysis yet, but it is supported for flake detection.
Comment 13, Dec 13
These are test-level timeouts which, when strung together, cause a step-level timeout [the Swarming task produces INVALID_TEST_RESULTS due to an incorrect implementation]. I guess that answers my own question -- the latter means there's no way for Findit to figure out which tests are flaky. :(
Comment 14, Dec 13
I don't know how we implement the test runner for chromedriver_py_tests on Swarming, but some gtest-based suites still produce output.json even when the task shard times out, so it depends. As long as the test results are available in the BigQuery table test-results, flake detection is supported.
Comment 15, Dec 13
Currently chromedriver_py_tests only writes output.json at the very end, so if it is interrupted by a timeout, no output.json is generated. I can modify it to incrementally write out output.json as each test case completes, but it seems tricky to do incremental output efficiently while keeping the file valid JSON at all times. How do other gtest suites produce output.json? Does the framework accept a truncated output.json file?
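One way to sidestep the truncation concern would be to rewrite the whole file after every test and swap it into place atomically, so output.json on disk is always complete, valid JSON. A rough sketch under that assumption (record_result and the flat results dict are hypothetical, not the existing runner code):

    # Hedged sketch: atomically replace output.json after each test so the file
    # on disk is never a half-written JSON document. All names are illustrative.
    import json
    import os
    import tempfile

    results = {}  # test name -> status, kept in memory

    def record_result(test_name, status, path='output.json'):
        results[test_name] = status
        # Write a temp file in the same directory, then atomically swap it in.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
        with os.fdopen(fd, 'w') as f:
            json.dump(results, f)
        # os.replace needs Python 3.3+; on Python 2, os.rename on Windows would
        # refuse to overwrite an existing file.
        os.replace(tmp_path, path)

This trades a full rewrite of the document per test for the guarantee that no reader ever sees a truncated file.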
Comment 16, Dec 14
+maruel: IIUC, the Swarming runner will send the process a TERM signal, with a grace period before killing it. If we register a handler for the TERM signal, we may have a chance to write the output.json file for all tests executed up to that point. But anyway, we should still set the result as invalid.
Comment 17, Dec 14
https://chromium.googlesource.com/infra/luci/luci-py.git/+/master/appengine/swarming/doc/Bot.md#graceful-termination_aka-the-sigterm-and-sigkill-dance

The grace period is currently 30 seconds, but this can be fine-tuned per task: https://cs.chromium.org/chromium/infra/luci/appengine/swarming/swarming_rpcs.py?l=355

Yes, do something like:

    def handler(signum, frame):
        write_output()
        cancel_all()
        kill_kill_💥()

    signal.signal(signal.SIGTERM, handler)
Comment 18, Dec 14
c#15: the JSON file must be correctly formatted. Most of our test runners do this:
1) Prior to starting any tests, write output with placeholder iteration data [status = NOTRUN].
2) Keep test state in memory as tests are run.
3) On the SIGTERM signal, write output using (2), continuing to use placeholder data (e.g. status = NOTRUN) for tests that haven't run.
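A rough sketch of those three steps combined with the SIGTERM handler from comment 17; all names (run_test, write_output, the flat status dict, the JSON layout) are illustrative placeholders, not the actual chromedriver_py_tests implementation:

    # Hedged sketch of steps 1-3 above; names and output format are illustrative.
    import json
    import signal
    import sys

    ALL_TESTS = ['testSwitchToWindow', 'testSwitchToStaleFrame']  # placeholder list

    # 1) Placeholder data before anything runs: every test starts as NOTRUN.
    statuses = {name: 'NOTRUN' for name in ALL_TESTS}

    def write_output(interrupted, path='output.json'):
        with open(path, 'w') as f:
            json.dump({'interrupted': interrupted, 'tests': statuses}, f)

    def on_sigterm(signum, frame):
        # 3) Swarming's SIGTERM: flush the in-memory state and exit before the
        # hard kill that follows the grace period.
        write_output(interrupted=True)
        sys.exit(1)

    def run_test(name):
        return 'PASS'  # stub standing in for running one real test case

    write_output(interrupted=True)           # initial file with placeholder data
    signal.signal(signal.SIGTERM, on_sigterm)

    for name in ALL_TESTS:
        # 2) Keep per-test state in memory as each test finishes.
        statuses[name] = run_test(name)

    write_output(interrupted=False)          # normal completion

Note that CPython only runs signal handlers on the main thread between bytecode instructions, so a test blocked inside a native call could still delay the write past the grace period; keeping the SIGTERM work minimal, as comment 4 suggests, matters.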
Comment 19, Dec 15
Sounds good. I'll implement the algorithm in comments 16 to 18.
Comment 20, Dec 15
Awesome!