
Issue 652787

Starred by 1 user

Issue metadata

Status: Duplicate
Merged: issue 649831
Owner:
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug




StartInstrumentation hanging on content_browsertests

Project Member Reported by katthomas@chromium.org, Oct 4 2016

Issue description

In the past 24 hours, the content_browsertests (with patch) step timed out on 14 tryserver.chromium.android linux_android_rel_ng builds. This step usually takes less than 8 minutes, but these runs hit the 16-minute timeout. They appear to have spent most of that time in a hanging "StartInstrumentation", although this is my uninformed analysis.

A look at http://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=content_browsertests%20%28with%20patch%29&builder=tryserver.chromium.android%3Alinux_android_rel_ng, http://shortn/_61BQ4pIlCz, and http://shortn/_hcPPjzwo28 shows nothing out of the ordinary.

List of builds:

https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/153539/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/153458/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/153432/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/153373/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/153371/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/153185/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/153190/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/153047/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/153045/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/152915/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/152656/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/152621/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/152474/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/152372/steps/content_browsertests%20%28with%20patch%29%20on%20Android/logs/stdio/text
 
StartInstrumentation is the test execution. One or more tests are flakily taking a long time.

I don't think we have any results to upload to the flakiness dashboard in the event of a swarming timeout.
Labels: -Test-
Status: Available (was: Untriaged)
Do you mean, in this case, we don't upload anything to the flakiness dashboard?

Can you identify from the logs which tests are flakily taking a long time? It's not really clear to me.
Cc: katthomas@chromium.org
#3: correct.

And no, not in this case. We lose too much of the log w/ the timeout kill, and we don't get the logcat at all.
er, to clarify: we do upload a JSON to the dashboard, but it's empty: http://test-results.appspot.com/testfile?builder=linux_android_rel_ng&name=full_results.json&master=tryserver.chromium.android&testtype=content_browsertests%20%28with%20patch%29&buildnumber=153539

I think we're *supposed* to be getting a SIGTERM from swarming when we hit the timeout, but I don't see evidence of that in the log (which I would expect to see, even if it's just a log message). I'm not sure if that's because our SIGTERM handling is broken or because swarming's not sending it correctly.
Swarming sends the SIGTERM to the process it started. If you are a few processes down the tree, it's possible the signal is not being propagated properly (?)
Owner: jbudorick@chromium.org
Status: Started (was: Available)
Yeah, that's exactly what's going on - SIGTERM is getting delivered to the wrapper script, but that's not propagating it down to the test runner.
Project Member

Comment 9 by bugdroid1@chromium.org, Oct 5 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/626bdde804d615c4582b5c1a78bd8929f83b1ab9

commit 626bdde804d615c4582b5c1a78bd8929f83b1ab9
Author: jbudorick <jbudorick@chromium.org>
Date: Wed Oct 05 18:12:05 2016

[Android] Use os.exec in the test wrapper scripts.

BUG= 652787 

Review-Url: https://codereview.chromium.org/2397523003
Cr-Commit-Position: refs/heads/master@{#423220}

[modify] https://crrev.com/626bdde804d615c4582b5c1a78bd8929f83b1ab9/build/android/gyp/create_test_runner_script.py
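
To illustrate why exec'ing helps here, below is a minimal sketch, not the actual contents of create_test_runner_script.py. It assumes a POSIX system and uses two hypothetical wrappers (`EXEC_WRAPPER`, `SPAWN_WRAPPER`) plus a hypothetical helper `wrapper_and_runner_pids`. A wrapper that spawns the runner as a child leaves the runner with a different PID, so a SIGTERM sent to the wrapper does not reach the runner unless the wrapper forwards it. A wrapper that calls `os.execv` is replaced in place by the runner, which keeps the wrapper's PID and therefore receives the signal directly.

```python
import subprocess
import sys
import textwrap

# Hypothetical wrapper: prints its PID, then replaces itself with the "runner".
# The flush matters because exec discards any unflushed stdio buffer.
EXEC_WRAPPER = textwrap.dedent("""
    import os, sys
    print(os.getpid(), flush=True)
    runner = [sys.executable, '-c', 'import os; print(os.getpid())']
    os.execv(runner[0], runner)
""")

# Hypothetical wrapper: prints its PID, then runs the "runner" as a child.
SPAWN_WRAPPER = textwrap.dedent("""
    import os, subprocess, sys
    print(os.getpid(), flush=True)
    runner = [sys.executable, '-c', 'import os; print(os.getpid())']
    sys.exit(subprocess.call(runner))
""")


def wrapper_and_runner_pids(wrapper_source):
    """Runs a wrapper and returns (wrapper PID, runner PID) as ints."""
    out = subprocess.check_output([sys.executable, '-c', wrapper_source])
    wrapper_pid, runner_pid = out.decode().split()
    return int(wrapper_pid), int(runner_pid)


if __name__ == '__main__':
    print('exec :', wrapper_and_runner_pids(EXEC_WRAPPER))
    print('spawn:', wrapper_and_runner_pids(SPAWN_WRAPPER))
```

With the exec variant the two PIDs match, so whatever swarming signals is the test runner itself. With the spawn variant they differ, which is the situation described in the earlier comments.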

What effect do we expect this change to have with respect to the original issue? With the test runner receiving the SIGTERM, will we be able to see more info in the logs? Will the results be uploaded to the flakiness dashboard? 
#11: we should see:
 1) 'Received SIGTERM. Stopping test execution.' in the log (from https://codesearch.chromium.org/chromium/src/build/android/pylib/local/device/local_device_test_run.py?rcl=0&l=85)
 2) the test runner complete with an exit code other than -15. I expect it to be nonzero, but I'm not sure what it'll be.
 3) Some results reported in the log and uploaded to the flakiness dashboard in the subsequent step.
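
The expected behavior in 1) and 2) can be sketched roughly as follows. This is an illustration under stated assumptions, not the real handler in local_device_test_run.py: the `RUNNER` script, the `StopRequested` exception, the helper `run_and_terminate`, and the exit code 1 are all hypothetical (the actual exit code is unknown, as noted above). The point is only that a process which handles SIGTERM can log, wind down, and exit with its own code, whereas an unhandled SIGTERM shows up to the parent as return code -15.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical stand-in for a test runner that handles SIGTERM gracefully.
RUNNER = textwrap.dedent("""
    import signal, sys, time

    class StopRequested(Exception):
        pass

    def on_sigterm(signum, frame):
        raise StopRequested()

    signal.signal(signal.SIGTERM, on_sigterm)
    try:
        print('ready', flush=True)
        while True:
            time.sleep(0.1)  # stand-in for running tests
    except StopRequested:
        print('Received SIGTERM. Stopping test execution.', flush=True)
        sys.exit(1)  # arbitrary nonzero code chosen for this sketch
""")


def run_and_terminate():
    """Starts the runner, sends it SIGTERM, returns (returncode, stdout)."""
    proc = subprocess.Popen([sys.executable, '-c', RUNNER],
                            stdout=subprocess.PIPE, text=True)
    # Wait until the handler is installed before signalling.
    assert proc.stdout.readline().strip() == 'ready'
    proc.send_signal(signal.SIGTERM)
    output, _ = proc.communicate(timeout=30)
    return proc.returncode, output


if __name__ == '__main__':
    code, output = run_and_terminate()
    print(code, output.strip())
```

Because the runner exits on its own terms, it also has a chance to write out whatever results it collected before the timeout, which is what 3) relies on.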
Mergedinto: 649831
Status: Duplicate (was: Started)
Issue 649831 should have fixed execution timeout errors being marked as infra failures.
Project Member

Comment 14 by bugdroid1@chromium.org, Oct 27 2016

Labels: merge-merged-2840
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/626bdde804d615c4582b5c1a78bd8929f83b1ab9


Comment 15 by dimu@google.com, Nov 4 2016

Labels: -merge-merged-2840
[Automated comment] removing mislabelled merge-merged-2840
