Issue 862194

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue v8:8328



Dump the list of tests that are running when we are close to overall shard timeout

Project Member Reported by serg...@chromium.org, Jul 10

Issue description

Currently, when a shard times out, e.g. see https://chromium-swarm.appspot.com/task?id=3e9820d1959edc10, we can't tell which tests are taking too long. It would be nice if we could dump the list of tests that are running when that happens.

One way to implement this is to catch the SIGTERM sent by swarming. Alternatively, we can have our own watchdog thread in the test runner that prints the list of running tests when we are close to 45 minutes.
 
Nice to have. Currently, this is non-trivial since the multiprocess workers pull the tests from a queue, so the main thread doesn't know what has been pulled. We'd need to either implement some feedback mechanism, or indeed handle SIGTERM better on the worker side and print what's currently running. I tried the latter before, though, and it somehow doesn't work for a particular class of hanging tests that can't be killed, in particular on Mac.
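One possible shape for the feedback mechanism, as a sketch only: each worker posts a "start" event before running a test and a "done" event afterwards, and the main process folds those events into a map of what each worker is doing. `worker`, `RunningTests` and the event tuple layout are hypothetical. The queues are assumed to be `multiprocessing.Queue` objects in a real runner, but any queue with `get`, `put` and `get_nowait` works.

```python
import queue  # only for the queue.Empty exception


def worker(task_queue, event_queue, run_test, worker_id):
    """Pull test names until a None sentinel, reporting progress."""
    while True:
        name = task_queue.get()
        if name is None:
            break
        # Report *before* running, so a hanging test is still recorded.
        event_queue.put(("start", worker_id, name))
        run_test(name)
        event_queue.put(("done", worker_id, name))


class RunningTests:
    """Main-process view of which test each worker is currently running."""

    def __init__(self):
        self._by_worker = {}

    def apply(self, event):
        kind, worker_id, name = event
        if kind == "start":
            self._by_worker[worker_id] = name
        elif kind == "done":
            self._by_worker.pop(worker_id, None)

    def drain(self, event_queue):
        # Consume every event that is available right now, without blocking.
        while True:
            try:
                self.apply(event_queue.get_nowait())
            except queue.Empty:
                return

    def snapshot(self):
        return dict(self._by_worker)
```

Because the "start" event is sent before the test runs, a worker that hangs and cannot be killed has already told the main process what it picked up.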
Labels: -Restrict-View-Google Type-Bug
Status: Available (was: Untriaged)
Cc: machenb...@chromium.org
Michael, do you have a draft CL for your failed attempt?
I think I only tried locally. But on Mac, on failed death-tests we can see in some builds that the text we usually print on caught SIGTERM is not printed. So we couldn't print anything else either...
Then it seems we need to go with the watchdog thread approach, with some way to communicate with the worker threads to get the list of currently running tests. One disadvantage is that we'll likely have to duplicate the timeout information in cr-buildbucket.cfg and in the logic for this watchdog thread, unless we find a way to request the current timeout from the parent swarming process on the bot, or to read cr-buildbucket.cfg via Gitiles APIs.
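A rough sketch of such a watchdog, assuming the timeout is simply passed in by the caller, so that wherever the value ends up coming from, the watchdog itself doesn't hard-code it. `ShardWatchdog`, `get_running` and the 60-second margin are hypothetical; the 45-minute default is the figure mentioned above.

```python
import threading

SHARD_TIMEOUT_SEC = 45 * 60   # figure from this issue; duplicated here by hand
WARNING_MARGIN_SEC = 60       # hypothetical safety margin


class ShardWatchdog:
    """Fires once, shortly before the shard timeout, and reports running tests."""

    def __init__(self, get_running, timeout_sec=SHARD_TIMEOUT_SEC,
                 margin_sec=WARNING_MARGIN_SEC, report=print):
        self._get_running = get_running   # callable returning test names
        self._report = report
        delay = max(0.0, timeout_sec - margin_sec)
        self._timer = threading.Timer(delay, self._fire)
        self._timer.daemon = True         # never keep the runner alive

    def _fire(self):
        names = sorted(self._get_running())
        self._report("Close to shard timeout; still running: "
                     + (", ".join(names) if names else "(nothing)"))

    def start(self):
        self._timer.start()

    def cancel(self):
        # Call when the run finishes normally.
        self._timer.cancel()

    def wait(self, timeout=None):
        self._timer.join(timeout)
```

In the runner, `get_running` would be whatever exposes the main process's view of the workers, and `cancel()` would be called once all tests have completed.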
Cc: mslekova@chromium.org
Blocking: v8:8328