New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 649831 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 1
Type: Feature

Blocking:
issue 649391
issue 649841



Sign in to add a comment

Change test_launcher to write output.json at the start of the execution and on-going

Project Member Reported by mar...@chromium.org, Sep 23 2016

Issue description

Current state:
- Chromium recipe isolate and triggers a test as a swarming task.
- In the Swarming task:
  - test_launcher starts.
  - When it runs in parallel (bot) mode, it starts N child processes.
    - Each child processes runs their chunk of test cases.
    - They write the results in a temporary file upon exit as instructed by the master. See test_launcher.cc:DoLaunchChildTestProcess()
  - The main test_launcher processes gather results and writes it to ${ISOLATED_OUTDIR}/output.json.
- The recipe collects the results.

Relevant code: //base/test/launcher


Problems:
Source:
- When the master test_launcher process is forcibly killed mid-execution, ${ISOLATED_OUTDIR}/output.json is not written at all.
Effects:
- When the recipe collects the task results and there is no output.json, it doesn't have enough information to report anything else than InfraFailure.
- We lose all information about which test cases had time to succeed, fail and which one weren't run.


Notes:
- It doesn't really matter why the process was killed in the context of this issue, the goal here is to know that at least some test cases ran or not.
- The query written by Katie shown that this is happening on a continuous basis.
- This doesn't fix the problem when the bot reboots mid-task, this will be addressed separately in  issue 645280 .


AI:
- Change //base/test/launcher to write output.json right at the start and mark it clear that there are test cases pending execution.
  - This likely require change in the format.
- Optional, needs discussion: every time there's more information (or buffer for N seconds), update the file. This could cause a small (large?) perf hit but would make the data much more useful. This is trickier since it needs collaboration between the children and master processes.

Changes to the recipes are outside the scope of this bug, will file another one.
 

Comment 1 by mar...@chromium.org, Sep 23 2016

Labels: OS-Linux OS-Mac OS-Windows

Comment 2 by mar...@chromium.org, Sep 23 2016

Blocking: 649841

Comment 3 by kbr@chromium.org, Sep 23 2016

Cc: nedngu...@google.com dpranke@chromium.org eyaich@chromium.org tansell@chromium.org
What should we do about Telemetry based tests (SwarmingIsolatedScriptTest)? The same policy change should take effect there as well?

I assume webkit_tests should do the same once they're Swarmed?

Hmm ... I'm not sure I agree with this basic idea, though I'd have to think about it more. 

Maybe it would be better if the test launcher was streaming results back over a socket or a pipe? 

Alternatively maybe we should just make the individual jobs small enough that we don't have to wait too long for them to die and it's not that painful to retry them?

Generally in my test harnesses I've tried hard to make sure that the harness itself never dies, even though the underlying tests might, but of course we can't make that always work.
json format is not very streaming friendly. They way we do it in telemetry is for each page (~ test) run in a benchmark, we stream gtest output that looks s.t like:
[ Run  ]  text_10000_pixels_per_second 
....

[       OK ] text_10000_pixels_per_second (18481 ms)
[ Run  ]  text_15000_pixels_per_second 
.....
[       OK ] text_15000_pixels_per_second (14745 ms)
...

(example log: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Win%208%20Perf%20%281%29/builds/7742/steps/thread_times.tough_scrolling_cases/logs/stdio)

Then we output json file at the end that contains detailed info of tests which is more bot friendly.

I think this is enough to figure out: which tests have run, for how long, and which ones are currently stuck.
The JSON format was not designed to be streamable, but it wouldn't be too hard to come up with a variant that was.
- We lose all information about which test cases had time to succeed, fail and which one weren't run.

This isn't totally true, correct? We can see that information in the log.
Right, but the whole point of writing the output json is to provide the data in a known format that gets uploaded somewhere, so that we don't have to parse logs (that are potentially in ad-hoc or varying formats).
Labels: Restrict-View-Google
Why did you mark this Restrict-View-Google? There's nothing that needs to be private in this ...
Labels: -Restrict-View-Google
Removing RVG, there's no reason for this flag AFAIK.

Re comment #4: streaming may become an option eventually but this is longer term anyway. This issue is about short term improvement. This is the wrong channel to talk about this, ping me directly for more details.

Re comment #6: in the proposal I note that test_launcher could write periodically. It's up to Pawel to decide on this.
Status: Started (was: Assigned)
Uploaded https://codereview.chromium.org/2411003003 . We could make it more sophisticated later.
Blocking: 649391
Project Member

Comment 14 by bugdroid1@chromium.org, Oct 12 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5dc1ee3fccf13062e1ee9b1fbb25189f617c74c8

commit 5dc1ee3fccf13062e1ee9b1fbb25189f617c74c8
Author: phajdan.jr <phajdan.jr@chromium.org>
Date: Wed Oct 12 10:46:14 2016

Save an early test summary in case the test launcher crashes or gets killed

This may turn some infra failures (purple) into INVALID_TEST_RESULTS (red)
in case the test launcher is for example early terminated on swarming.

See bug for more details.

BUG= 649831 

Review-Url: https://codereview.chromium.org/2411003003
Cr-Commit-Position: refs/heads/master@{#424709}

[modify] https://crrev.com/5dc1ee3fccf13062e1ee9b1fbb25189f617c74c8/base/test/launcher/test_launcher.cc
[modify] https://crrev.com/5dc1ee3fccf13062e1ee9b1fbb25189f617c74c8/base/test/launcher/test_launcher.h
[modify] https://crrev.com/5dc1ee3fccf13062e1ee9b1fbb25189f617c74c8/base/test/launcher/test_results_tracker.cc
[modify] https://crrev.com/5dc1ee3fccf13062e1ee9b1fbb25189f617c74c8/base/test/launcher/test_results_tracker.h

Issue 654074 has been merged into this issue.
Cc: jbudorick@chromium.org stip@chromium.org
 Issue 652787  has been merged into this issue.
Cc: phajdan@google.com
Status: Fixed (was: Started)
Closing. We can revisit this later in case a more sophisticated fix is needed.

Sign in to add a comment