
Issue 910584


Issue metadata

Status: Started
Merged: issue 904061
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 881991




Time-out in chromedriver_py_tests results in INVALID_TEST_RESULTS

Project Member Reported by erikc...@chromium.org, Nov 30

Issue description

Mergedinto: 904061
Status: Duplicate (was: Untriaged)
This appears to be the same as issue 904061, though it had only been observed on Mac before. Looks like it's not platform-dependent as previously thought. I'll submit a CL to disable the culprit test on all platforms.
Status: Available (was: Duplicate)
This is a different problem than issue 904061.

1) There are many failing tests.
2) The point of this crbug is that a timeout in chromedriver_py_tests should not result in INVALID_TEST_RESULTS. Timeouts should be handled gracefully and the test should continue to emit a JSON w/ a list of all tests that would have run, and the statuses for each [including TIMED_OUT, or NOT_RUN] if necessary.
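A minimal sketch of what "emit a JSON w/ a list of all tests, and the statuses for each" could look like. The schema and field names below are hypothetical illustrations, not the actual Chromium JSON test results format:

```python
import json

# Assumed: the full list of tests is discoverable before any of them run.
ALL_TESTS = ['testSwitchToWindow', 'testSwitchToStaleFrame']

def emit_results(path, statuses):
    # Tests that were never reached keep the NOT_RUN placeholder, so the
    # output is complete even if the run was interrupted by a timeout.
    results = {t: statuses.get(t, 'NOT_RUN') for t in ALL_TESTS}
    with open(path, 'w') as f:
        json.dump({'tests': results,
                   'interrupted': 'TIMED_OUT' in results.values()}, f)

emit_results('output.json', {'testSwitchToWindow': 'TIMED_OUT'})
```

With this shape, a downstream consumer can distinguish TIMED_OUT from NOT_RUN instead of seeing an empty or missing results file.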
The root cause of the failing tests appears to be identical to issue 904061, though I agree the failures should be handled more gracefully. I think the problem is that the failed tests were not properly cleaned up; after a large number of failures, system resources became exhausted and things started to fail in other ways. Eventually the test script either crashed or was killed before it could reach the point of generating the test result file, hence the INVALID_TEST_RESULTS outcome.

I agree the test script should better handle the failures.
>  Eventually the test script either crashed or was killed before it could reach the point to generate the test result file

It most likely hit the swarming task timeout [1hr]. Swarming will first send a SIGTERM, wait 30 seconds, and then do a hard kill. The script should ideally handle SIGTERM and use that signal to emit the JSON results, and then immediately exit.
Labels: Infra-Platform-Test
Blocking: 881991
More examples:
https://chromium-swarm.appspot.com/task?id=41b92414f5462410&refresh=10&show_raw=1

I'm seeing timeouts in testSwitchToParentFrame, testSwitchToStaleFrame, testSwitchToWindow, testSwitchesToTopFrameAfterGoingBack, testSwitchesToTopFrameAfterNavigation, testSwitchesToTopFrameAfterRefresh, and many more.
This task normally completes in 5-10 minutes on Windows:
https://chromium-swarm.appspot.com/tasklist?c=name&c=state&c=created_ts&c=duration&c=pending_time&c=pool&c=bot&et=1544736480000&f=name-tag%3Achromedriver_py_tests&f=master-tag%3Atryserver.chromium.win&l=50&n=true&q=mast&s=created_ts%3Adesc&st=1544650080000

But when it flakes, it regularly hits the 30 (?) minute timeout. This is likely due to excessive waits in the code before the test times out.
Cc: st...@chromium.org
Labels: -Pri-3 Pri-1
Owner: johnchen@chromium.org
Status: Assigned (was: Available)
I've separated out the flakiness issue into: https://monorail-prod.appspot.com/p/chromium/issues/detail?id=899886

+stgao. Timeouts are not being caught by Find-It, it seems (?). I've filed a separate bug to disable tests that time out:
https://bugs.chromium.org/p/chromium/issues/detail?id=910584

This bug will continue to track graceful recovery from time outs, which currently cause the shard [and all other shards] to fail.
Step-level timeouts or test-level timeouts?

chromedriver_py_tests is not supported by Findit yet.
clarification: chromedriver_py_tests is not supported by Findit for flake analysis yet, but supported for flake detection.
These are test-level timeouts which, when strung together, cause a step-level timeout [the swarming task produces INVALID_TEST_RESULTS due to an incorrect implementation]. I guess that answers my own question -- the latter means that there's no way for Findit to figure out which tests are flaky. :(
I don't know how we implement the test runner for chromedriver_py_tests on Swarming, but some gtests still produce output.json even when the task shard times out. So it depends.

As long as the test results are available in the bigquery table test-results, flake detection is supported.
Currently chromedriver_py_tests only writes output.json at the very end, so if it is interrupted due to a timeout, no output.json is generated. I can modify it to incrementally write out output.json as each test case completes, but it seems tricky to do incremental output efficiently while keeping the output valid JSON at all times. How do other gtests produce output.json? Does the framework accept a truncated output.json file?
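One common way around the "valid JSON at all times" concern (an illustration, not the actual chromedriver_py_tests implementation): rewrite the full results to a temporary file after each test, then atomically rename it over output.json. Readers then always see a complete, parseable document, never a truncated one:

```python
import json
import os
import tempfile

def write_results_atomically(path, results):
    """Rewrite `results` to `path` so readers never see a partial file."""
    dir_ = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(results, f)
        # os.replace is atomic on both POSIX and Windows (Python 3.3+).
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Calling this after every completed test keeps the cost at one small file rewrite per test, which is negligible next to the tests themselves.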
Cc: mar...@chromium.org
+maruel

IIUC, Swarming runner will send a process TERM signal with a grace period before killing the process. If we register a handler for the TERM signal, maybe we will have the chance to write the output.json file for all executed tests up to then.

But anyway, we should still set the result as invalid.
https://chromium.googlesource.com/infra/luci/luci-py.git/+/master/appengine/swarming/doc/Bot.md#graceful-termination_aka-the-sigterm-and-sigkill-dance

Grace period is currently 30 seconds, but this can be fine-tuned per task:
https://cs.chromium.org/chromium/infra/luci/appengine/swarming/swarming_rpcs.py?l=355


Yes, do something like:

import os
import signal

def handler(signum, frame):
  write_output()   # flush JSON results for the tests run so far
  cancel_all()     # stop any in-flight work
  os._exit(1)      # hard exit, skipping further cleanup

signal.signal(signal.SIGTERM, handler)
c#15: the JSON file must be correctly formatted. Most of our test runners do this:

1) Prior to starting any tests, write output with placeholder iteration data [status = NOTRUN].
2) Keep test state in memory as tests are run
3) On SIGTERM signal, write output using (2), continuing to use placeholder data e.g. status=NOTRUN for tests that haven't run.
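The three steps above can be sketched as a minimal runner. All names here (state layout, status strings, write_output) are hypothetical, chosen only to show the shape of the approach:

```python
import json
import signal
import sys

state = {}  # step 2: per-test status kept in memory as tests run

def write_output(path='output.json'):
    with open(path, 'w') as f:
        json.dump(state, f)

def run_all(tests):
    # Step 1: before any test starts, write placeholder NOTRUN statuses.
    for name, _ in tests:
        state[name] = 'NOTRUN'
    write_output()
    # Step 3: on SIGTERM, flush current state; unfinished tests stay NOTRUN.
    signal.signal(signal.SIGTERM,
                  lambda signum, frame: (write_output(), sys.exit(1)))
    for name, fn in tests:
        try:
            fn()
            state[name] = 'PASS'
        except Exception:
            state[name] = 'FAIL'
    write_output()

run_all([('test_ok', lambda: None), ('test_bad', lambda: 1 / 0)])
```

Because the file always contains an entry for every test, an interrupted run still yields valid, attributable results rather than INVALID_TEST_RESULTS.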

Status: Started (was: Assigned)
Sounds good. I'll implement the algorithm in comments 16 to 18.
Awesome!
