New issue
Advanced search Search tips

Issue 895027 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 13
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug



Sign in to add a comment

Ensure the android test runner still emits test results json even if a SIGTERM is caught

Project Member Reported by bpastene@chromium.org, Oct 12

Issue description

See https://chromium-swarm.appspot.com/task?id=407d3fa49b092e10

The test got sent a sigterm from swarming due to timeout. It caught the signal and quit execution... except it never printed the test results it had so far. That can wreak havoc w/ the recipe result-handling/retry-logic.

If we can, let's dump the results if we get sigterm'ed.
 
I was under the impression that context managers got a chance to exit after we get a sigterm & that we'd thus write the JSON via https://codesearch.chromium.org/chromium/src/build/android/test_runner.py?rcl=c8e7788f4f786a59d39bbe9720548f5ac796d507&l=793 -- perhaps either I'm mistaken or all_raw_results isn't populated at all until the run completes.
Owner: erikc...@chromium.org
Status: Assigned (was: Available)
Investigating.
Cc: mar...@chromium.org
I've confirmed that //build/android/test_runner.py will correctly populate the file sent to --test-launcher-summary-output= on SIGTERM. This suggests that swarming is likely not picking up the output from the file after it sends SIGTERM. maruel@, any ideas? I'll continue to dig in the meanwhile.
The properties passed to the swarming remote client can be seen here:
https://apis-explorer.appspot.com/apis-explorer/?base=https://chromium-swarm.appspot.com/_ah/api#p/swarming/v1/swarming.task.request?task_id=407d3fa49b092e10&_h=2&

To repro a local run with a 100s hard timeout, 30s grace period:

1) Patch https://chromium-review.googlesource.com/c/infra/luci/luci-py/+/1301720.

2) 
"""
python ~/projects/infra/infra/luci/client/isolateserver.py download -I https://isolateserver.appspot.com --namespace default-gzip -s 7e65e8492576d1a2763ad8642d39ac3162116948 --target bar
"""

3) 
"""
~/projects/infra/infra/luci/client/run_isolated.py -I https://isolateserver.appspot.com -s 7e65e8492576d1a2763ad8642d39ac3162116948 --hard-timeout=100 --grace-period=30 --test-launcher-summary-output=\${ISOLATED_OUTDIR}/output.json --leak-temp-dir
"""

Output from local run:
"""
C   99.319s Main  Received SIGTERM. Stopping test execution.
C   99.334s Main  ********************************************************************************
C   99.334s Main  Detailed Logs
C   99.334s Main  ********************************************************************************
C   99.361s Main  ********************************************************************************
C   99.361s Main  Summary
C   99.361s Main  ********************************************************************************
C   99.366s Main  [==========] 2552 tests ran.
C   99.366s Main  [  PASSED  ] 15 tests.
C   99.366s Main  ********************************************************************************
I   99.367s TimeoutThread-1-for-individual_device_tear_down(FA79E1A05388)  [host]> /usr/local/google/home/erikchen/projects/test_isolates/bar/ir/third_party/android_tools/sdk/platform-tools/adb -s FA79E1A05388 shell '( su 0 sh -c '"'"'echo -n '"'"'"'"'"'"'"'"'_ --strict-mode=testing --enable-test-intents'"'"'"'"'"'"'"'"' > /data/local/tmp/test-cmdline-file'"'"' );echo %$?'
I   99.483s TimeoutThread-1-for-individual_device_tear_down(FA79E1A05388)  [host]> /usr/local/google/home/erikchen/projects/test_isolates/bar/ir/third_party/android_tools/sdk/platform-tools/adb -s FA79E1A05388 shell '( test -e /data/local/tmp/test-cmdline-file );echo %$?'
I   99.486s TimeoutThread-1-for-run_tests_on_device(FA79E1A05388)  Setting permissions for org.chromium.chrome.
I   99.487s TimeoutThread-1-for-run_tests_on_device(FA79E1A05388)  [host]> /usr/local/google/home/erikchen/projects/test_isolates/bar/ir/third_party/android_tools/sdk/platform-tools/adb -s FA79E1A05388 shell '( ( p=org.chromium.chrome;for q in android.permission.ACCESS_COARSE_LOCATION android.permission.ACCESS_FINE_LOCATION android.permission.CAMERA android.permission.GET_ACCOUNTS android.permission.READ_EXTERNAL_STORAGE android.permission.RECORD_AUDIO android.permission.WRITE_EXTERNAL_STORAGE;do pm grant "$p" "$q";echo "~X~$q~X~$?~X~";done )>/data/local/tmp/temp_file-57d76c4e6a40 2>&1 );echo %$?'
I   99.548s TimeoutThread-1-for-individual_device_tear_down(FA79E1A05388)  [host]> /usr/local/google/home/erikchen/projects/test_isolates/bar/ir/third_party/android_tools/sdk/platform-tools/adb -s FA79E1A05388 shell '( su 0 sh -c '"'"'TZ=utc ls -a -l /data/local/tmp/'"'"' );echo %$?'
I   99.604s TimeoutThread-1-for-individual_device_tear_down(FA79E1A05388)  [host]> /usr/local/google/home/erikchen/projects/test_isolates/bar/ir/third_party/android_tools/sdk/platform-tools/adb -s FA79E1A05388 shell '( su 0 sh -c '"'"'cat /data/local/tmp/test-cmdline-file'"'"' );echo %$?'
I   99.657s individual_device_tear_down(FA79E1A05388)  Flags now set on the device: ['--strict-mode=testing', '--enable-test-intents']
I   99.658s TimeoutThread-1-for-individual_device_tear_down(FA79E1A05388)  [host]> /usr/local/google/home/erikchen/projects/test_isolates/bar/ir/third_party/android_tools/sdk/platform-tools/adb -s FA79E1A05388 shell '( am clear-debug-app );echo %$?'
I   99.758s TimeoutThread-1-for-individual_device_tear_down(FA79E1A05388)  [host]> /usr/local/google/home/erikchen/projects/test_isolates/bar/ir/third_party/android_tools/sdk/platform-tools/adb -s FA79E1A05388 shell '( su 0 sh -c '"'"'rm -f /data/local/tmp/chrome_timeout_scale'"'"' );echo %$?'
I   99.837s Main  Total time spent symbolizing: 0.32s
I   99.986s tear_down_device(FA79E1A05388)  Wrote device cache: /usr/local/google/home/erikchen/projects/test_isolates/bar/ir/out/Release/device_cache_FA79E1A05388.json
I  100.062s Main  Generated json results file at /usr/local/google/home/erikchen/projects/test_isolates/bar/itToVaZS/tmpau0JkL
I  100.062s Main  Finishing archiving output.
[run_isolated_out_hack]{"hash":"35896db86fa9875984d2cfbcc133f4113f2f3d0d","namespace":"default-gzip","storage":"https://isolateserver.appspot.com"}[/run_isolated_out_hack]
"""

Output from swarming run above:
"""
C 3598.871s Main  Received SIGTERM. Stopping test execution.
C 3598.876s Main  ********************************************************************************
C 3598.876s Main  Detailed Logs
C 3598.876s Main  ********************************************************************************
C 3598.877s Main  ********************************************************************************
C 3598.877s Main  Summary
C 3598.877s Main  ********************************************************************************
C 3598.880s Main  [==========] 203 tests ran.
C 3598.880s Main  [  PASSED  ] 177 tests.
C 3598.880s Main  [  SKIPPED ] Skipped 10 tests, listed below:
C 3598.880s Main  [  SKIPPED ] org.chromium.chrome.browser.browserservices.TrustedWebActivityClientTest#doesntUseIconFromServiceIfStatusBarBitmapsSupported
C 3598.880s Main  [  SKIPPED ] org.chromium.chrome.browser.compositor.overlays.strip.TabStripTest#testSwitchStripStackersWithFirstTabSelected
C 3598.881s Main  [  SKIPPED ] org.chromium.chrome.browser.externalnav.ExternalNavigationHandlerTest#testCanExternalAppHandleUrl
C 3598.881s Main  [  SKIPPED ] org.chromium.chrome.browser.externalnav.ExternalNavigationHandlerTest#testInstantAppsIntent_incomingIntentRedirect
C 3598.881s Main  [  SKIPPED ] org.chromium.chrome.browser.externalnav.ExternalNavigationHandlerTest#testLaunchWebApk_ShowIntentPickerInvalidWebApk
C 3598.881s Main  [  SKIPPED ] org.chromium.chrome.browser.externalnav.ExternalNavigationHandlerTest#testPageTransitionType
C 3598.881s Main  [  SKIPPED ] org.chromium.chrome.browser.externalnav.ExternalNavigationHandlerTest#testRedirectToMarketWithReferrer
C 3598.881s Main  [  SKIPPED ] org.chromium.chrome.browser.externalnav.ExternalNavigationHandlerTest#testSameDomainDifferentApps
C 3598.881s Main  [  SKIPPED ] org.chromium.chrome.browser.notifications.NotificationPlatformBridgeTest#testReplyToNotification
C 3598.881s Main  [  SKIPPED ] org.chromium.chrome.browser.notifications.channels.ChannelsInitializerTest#testDeleteLegacyChannels_noopOnCurrentDefinitions
C 3598.881s Main  [  FAILED  ] 15 tests, listed below:
C 3598.882s Main  [  FAILED  ] org.chromium.chrome.browser.download.DownloadNotificationServiceTest#testServiceWillNotStopOnInterruptedDownload (TIMEOUT)
C 3598.882s Main  [  FAILED  ] org.chromium.chrome.browser.feature_engagement.ScreenshotTabObserverTest#testScreenshotActionReporting (TIMEOUT)
C 3598.882s Main  [  FAILED  ] org.chromium.chrome.browser.firstrun.FirstRunIntegrationTest#testAbortFirstRun (TIMEOUT)
C 3598.882s Main  [  FAILED  ] org.chromium.chrome.browser.ntp.cards.NewTabPageRecyclerViewTest#testClickSuggestion (TIMEOUT)
C 3598.882s Main  [  FAILED  ] org.chromium.chrome.browser.offlinepages.OfflinePageBridgeTest#testOfflinePageBridgeDisabledInIncognito (TIMEOUT)
C 3598.882s Main  [  FAILED  ] org.chromium.chrome.browser.offlinepages.OfflinePageBridgeTest#testRemoveRequestsFromQueue (TIMEOUT)
C 3598.882s Main  [  FAILED  ] org.chromium.chrome.browser.omnibox.OmniboxTest#testDefaultText with --disable-features=SpannableInlineAutocomplete (TIMEOUT)
C 3598.882s Main  [  FAILED  ] org.chromium.chrome.browser.omnibox.OmniboxTest#testRequestZeroSuggestAfterDelete with --disable-features=SpannableInlineAutocomplete (TIMEOUT)
C 3598.883s Main  [  FAILED  ] org.chromium.chrome.browser.omnibox.UrlBarTest#testBatchModeChangesTriggerCorrectSuggestions with --disable-features=SpannableInlineAutocomplete (TIMEOUT)
C 3598.883s Main  [  FAILED  ] org.chromium.chrome.browser.partnercustomizations.PartnerDisableIncognitoModeIntegrationTest#testIncognitoMenuItemEnabledBasedOnParentalControls (TIMEOUT)
C 3598.883s Main  [  FAILED  ] org.chromium.chrome.browser.payments.CurrencyFormatterTest#testMultipleConversions (TIMEOUT)
C 3598.883s Main  [  FAILED  ] org.chromium.chrome.browser.payments.PaymentRequestEmailAndFreeShippingTest#testPaymentRequestEventsMetric (TIMEOUT)
C 3598.883s Main  [  FAILED  ] org.chromium.chrome.browser.tabmodel.UndoTabModelTest#testTwoTabs (TIMEOUT)
C 3598.883s Main  [  FAILED  ] org.chromium.chrome.browser.webapps.WebappModeTest#testWebappLaunches (TIMEOUT)
C 3598.883s Main  [  FAILED  ] org.chromium.chrome.browser.webapps.WebappNavigationTest#testNewTabLinkToExternalApp (TIMEOUT)
C 3598.883s Main  
C 3598.883s Main  15 FAILED TESTS
C 3598.883s Main  ********************************************************************************
I 3598.884s TimeoutThread-1-for-individual_device_tear_down(04d6c3b700f4400b)  [host]> /b/swarming/w/ir/third_party/android_tools/sdk/platform-tools/adb -s 04d6c3b700f4400b shell '( echo -n '"'"'_ --strict-mode=testing --disable-features=SpannableInlineAutocomplete --enable-test-intents'"'"' > /data/local/tmp/test-cmdline-file );echo %$?'
C 3601.282s run_tests_on_device(04d6c3b700f4400b)  (TimeoutThread-2-for-run_tests_on_device(04d6c3b700f4400b)) Exception on _RunAdbCmd(<class 'devil.android.sdk.adb_wrapper.AdbWrapper'>, ['shell', '( mkdir -p /storage/emulated/legacy/tmp-c58ebcf96e5de );echo %$?'], check_error=False, retries=2, device_serial=04d6c3b700f4400b, ensure_logs_on_timeout=False, timeout=30), attempt 2 of 3: TimeoutError()
I 3601.283s TimeoutThread-3-for-run_tests_on_device(04d6c3b700f4400b)  [host]> /b/swarming/w/ir/third_party/android_tools/sdk/platform-tools/adb -s 04d6c3b700f4400b shell '( mkdir -p /storage/emulated/legacy/tmp-c58ebcf96e5de );echo %$?'
"""
And now, looking at the diff in these logs we see a problem:

"""
C 3601.282s run_tests_on_device(04d6c3b700f4400b)  (TimeoutThread-2-for-run_tests_on_device(04d6c3b700f4400b)) Exception on _RunAdbCmd(<class 'devil.android.sdk.adb_wrapper.AdbWrapper'>, ['shell', '( mkdir -p /storage/emulated/legacy/tmp-c58ebcf96e5de );echo %$?'], check_error=False, retries=2, device_serial=04d6c3b700f4400b, ensure_logs_on_timeout=False, timeout=30), attempt 2 of 3: TimeoutError()
"""

ADB is timing out. We retry up to 3 times, waiting 30s each. This causes us to hit the 30s grace period timeout, at which point the process is killed.
Note to self: The easiest way to test is: 

1) following instructions in c#4 to create isolated python files.
2) Apply the patch below, and then directly make changes to the python files. The changes will not be overwritten and can be directly tested.
disable_isolate_overwrite.diff
1.4 KB Download
There are two problems.

1) The SIGTERM signal handler in test_runner.py:
https://cs.chromium.org/chromium/src/build/android/test_runner.py?l=745

Is being temporarily overwritten by the SIGTERM signal handler in local_device_test_run.py:
https://cs.chromium.org/chromium/src/build/android/pylib/local/device/local_device_test_run.py?l=116

Which is eventually catching, and then ignoring the exception:
https://cs.chromium.org/chromium/src/build/android/pylib/local/device/local_device_test_run.py?l=174

2) LocalDeviceEnvironment.Teardown:
https://cs.chromium.org/chromium/src/build/android/pylib/local/device/local_device_environment.py?type=cs&q=local_device_environment.py+&g=0&l=234

LocalDeviceInstrumentationTestRun.Teardown:
https://cs.chromium.org/chromium/src/build/android/pylib/local/device/local_device_instrumentation_test_run.py?type=cs&q=local_device_instrumentation_test_run.py&g=0&l=333

Both attempt to invoke ADB. If ADB is borked and timing out, then these may never return before the process is hard killed.


If we get a SIGTERM, then we should attempt to write the output results and exit. We should not attempt to make any ADB calls, since ADB may timeout. I'll investigate to figure out the best way to get this desired behavior.

For (1), we may want to use signal_handler.AddSignalHandler rather than signal_handler.SignalHandler.

For (2), we'll need to teach timeout_retry to not retry in the event of a SIGTERM: https://codesearch.chromium.org/chromium/src/third_party/catapult/devil/devil/utils/timeout_retry.py?rcl=687f318e303c48b791fa608da923c23b31cd6afe&l=168
Status: Started (was: Assigned)
Thanks. I believe that the following 4 changes should fix the problem:

1) signal_handler.SignalHandler -> signal_handler.AddSignalHandler
2) teach timeout_retry to not retry in the event of a SIGTERM
3) Change Teardown methods to not call ADB on SIGTERM
4) Propagate test results to test_runner.py even on SIGTERM
Project Member

Comment 10 by bugdroid1@chromium.org, Oct 29

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d4d9c16f1a163d205ac631236dea509d30e65ac4

commit d4d9c16f1a163d205ac631236dea509d30e65ac4
Author: erikchen <erikchen@chromium.org>
Date: Mon Oct 29 18:10:42 2018

Fix status/string conversion for android json NOTRUN status.

Previously, the if/else block was missing the NOTRUN status. Since the statuses
are already represented by strings, this CL converts the strings to be expected
outputs. This allows us to remove the entire if/else block.

Bug:  895027 
Change-Id: Ieb426eb268c30dd07a6642dcfd97122cdb07c926
Reviewed-on: https://chromium-review.googlesource.com/c/1305414
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#603562}
[modify] https://crrev.com/d4d9c16f1a163d205ac631236dea509d30e65ac4/build/android/pylib/base/base_test_result.py
[modify] https://crrev.com/d4d9c16f1a163d205ac631236dea509d30e65ac4/build/android/pylib/results/json_results.py

Project Member

Comment 11 by bugdroid1@chromium.org, Oct 29

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/43aaf3b8b671171f29ffc9bb74917ebb16bce5ea

commit 43aaf3b8b671171f29ffc9bb74917ebb16bce5ea
Author: erikchen <erikchen@chromium.org>
Date: Mon Oct 29 20:10:41 2018

Propagate Android test results through exceptions.

Previously, Android test results were propagated after all tests finished
running. In the event of an Exception, none of the test results would be written
to file. This CL slightly changes the control flow of TestRun.RunTests().
Instead of returning a results array, it now receives a results array as input
and modifies the array as it receives test results. This allows the json_writer
context TearDown() to serialize the partial test results.

This CL updates local_device_test_run.py to have the correct flow. The flow of
local_device_perf_test_run.py and local_machine_junit_test_run.py are left
untouched -- they can be modified in the future if it's deemed necessary. This
minimizes the size of this CL.

Bug:  895027 
Change-Id: I60fe78da993de366d678252b84a91aaff044c10b
Reviewed-on: https://chromium-review.googlesource.com/c/1305396
Commit-Queue: Erik Chen <erikchen@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#603599}
[modify] https://crrev.com/43aaf3b8b671171f29ffc9bb74917ebb16bce5ea/build/android/pylib/base/test_run.py
[modify] https://crrev.com/43aaf3b8b671171f29ffc9bb74917ebb16bce5ea/build/android/pylib/local/device/local_device_perf_test_run.py
[modify] https://crrev.com/43aaf3b8b671171f29ffc9bb74917ebb16bce5ea/build/android/pylib/local/device/local_device_test_run.py
[modify] https://crrev.com/43aaf3b8b671171f29ffc9bb74917ebb16bce5ea/build/android/pylib/local/machine/local_machine_junit_test_run.py
[modify] https://crrev.com/43aaf3b8b671171f29ffc9bb74917ebb16bce5ea/build/android/test_runner.py

Project Member

Comment 12 by bugdroid1@chromium.org, Oct 30

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/5e1c1c293b07ef04a247dd8dff50972d207663a4

commit 5e1c1c293b07ef04a247dd8dff50972d207663a4
Author: erikchen <erikchen@chromium.org>
Date: Tue Oct 30 15:10:31 2018

ADB commands should only retry errors, not exceptions.

By default, all exceptions were swallowed and retried, including exceptions
generated by SIGTERM handler. This is undesired.

Bug:  chromium:895027 
Change-Id: Ic6ce39eae466399e2543410a16906f154de47179
Reviewed-on: https://chromium-review.googlesource.com/c/1305714
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>

[modify] https://crrev.com/5e1c1c293b07ef04a247dd8dff50972d207663a4/devil/devil/android/sdk/adb_wrapper.py

Project Member

Comment 13 by bugdroid1@chromium.org, Oct 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c5466799a1a5b20be34cc4c2a3b90f52d6d70660

commit c5466799a1a5b20be34cc4c2a3b90f52d6d70660
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Tue Oct 30 17:52:33 2018

Roll src/third_party/catapult d15a3be3ab45..5e1c1c293b07 (1 commits)

https://chromium.googlesource.com/catapult.git/+log/d15a3be3ab45..5e1c1c293b07


git log d15a3be3ab45..5e1c1c293b07 --date=short --no-merges --format='%ad %ae %s'
2018-10-30 erikchen@chromium.org ADB commands should only retry errors, not exceptions.


Created with:
  gclient setdep -r src/third_party/catapult@5e1c1c293b07

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:895027 
TBR=sullivan@chromium.org

Change-Id: Ifa4ab1888a5a3fd36652fbda11af4e269463b18c
Reviewed-on: https://chromium-review.googlesource.com/c/1307753
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#603954}
[modify] https://crrev.com/c5466799a1a5b20be34cc4c2a3b90f52d6d70660/DEPS

Project Member

Comment 14 by bugdroid1@chromium.org, Oct 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bf9552573217f0ca53fa4a57fd6de9af5c82881b

commit bf9552573217f0ca53fa4a57fd6de9af5c82881b
Author: erikchen <erikchen@chromium.org>
Date: Tue Oct 30 20:48:02 2018

local_device_test_run.py should not swallow SIGTERM signal.

Previously, local_device_test_run.py used SignalHandler() to catch SIGTERM,
raise an exception, and then handle the exception. This prevents other SIGTERM
handlers from ever receiving the exception.

This CL changes local_device_test_run.py to instead use AddSignalHandler(), so
that the exception is propagated to the previously registered SIGTERM handler.

Bug:  895027 
Change-Id: Ia6f84022ee4cf54e7dc36afad012acb3212b51e2
Reviewed-on: https://chromium-review.googlesource.com/c/1308753
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#604004}
[modify] https://crrev.com/bf9552573217f0ca53fa4a57fd6de9af5c82881b/build/android/pylib/local/device/local_device_test_run.py

Project Member

Comment 15 by bugdroid1@chromium.org, Nov 1

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b16aeb4251477d70bf27694c52597fdb33d37359

commit b16aeb4251477d70bf27694c52597fdb33d37359
Author: erikchen <erikchen@chromium.org>
Date: Thu Nov 01 14:56:20 2018

Context TearDown should not invoke ADB if process receives SIGTERM.

The TearDown method on several contexts will invokes ADB. In some instances
[e.g. on receiving a SIGTERM due to timeout] there's a high probability that ADB
is non-responsive. In these cases, sending an ADB command will potentially take
a long time to time out. Before this happens, the process will be hard-killed
for not responding to SIGTERM fast enough.

This CL makes the test_runner.py SIGTERM handler set
allow_adb_on_teardown=False. This prevents these contexts from invoking ADB,
increasing the likelihood that SIGTERM will handled within the 30 second limit
and emit JSON test results.

Bug:  895027 
Change-Id: Ic30cff73a82a5415c5f531eae5c8f62fb4c953aa
Reviewed-on: https://chromium-review.googlesource.com/c/1311259
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#604584}
[modify] https://crrev.com/b16aeb4251477d70bf27694c52597fdb33d37359/build/android/pylib/base/environment.py
[modify] https://crrev.com/b16aeb4251477d70bf27694c52597fdb33d37359/build/android/pylib/base/test_run.py
[modify] https://crrev.com/b16aeb4251477d70bf27694c52597fdb33d37359/build/android/pylib/local/device/local_device_environment.py
[modify] https://crrev.com/b16aeb4251477d70bf27694c52597fdb33d37359/build/android/pylib/local/device/local_device_gtest_run.py
[modify] https://crrev.com/b16aeb4251477d70bf27694c52597fdb33d37359/build/android/pylib/local/device/local_device_instrumentation_test_run.py
[modify] https://crrev.com/b16aeb4251477d70bf27694c52597fdb33d37359/build/android/test_runner.py

Project Member

Comment 16 by bugdroid1@chromium.org, Nov 1

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/971689c0597c1b92614e0700e3c5d2518f0e70ee

commit 971689c0597c1b92614e0700e3c5d2518f0e70ee
Author: Ben Pastene <bpastene@chromium.org>
Date: Thu Nov 01 22:33:17 2018

Revert "ADB commands should only retry errors, not exceptions."

This reverts commit 5e1c1c293b07ef04a247dd8dff50972d207663a4.

Reason for revert: speculative fix for crbug.com/900511

Original change's description:
> ADB commands should only retry errors, not exceptions.
>
> By default, all exceptions were swallowed and retried, including exceptions
> generated by SIGTERM handler. This is undesired.
>
> Bug:  chromium:895027 
> Change-Id: Ic6ce39eae466399e2543410a16906f154de47179
> Reviewed-on: https://chromium-review.googlesource.com/c/1305714
> Reviewed-by: John Budorick <jbudorick@chromium.org>
> Commit-Queue: Erik Chen <erikchen@chromium.org>

TBR=erikchen@chromium.org,jbudorick@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  chromium:895027 
Change-Id: I9b76f5fda03d1aed1f569652e34770170914e9e7
Reviewed-on: https://chromium-review.googlesource.com/c/1313251
Commit-Queue: Ben Pastene <bpastene@chromium.org>
Reviewed-by: Ben Pastene <bpastene@chromium.org>
Reviewed-by: Erik Chen <erikchen@chromium.org>

[modify] https://crrev.com/971689c0597c1b92614e0700e3c5d2518f0e70ee/devil/devil/android/sdk/adb_wrapper.py

Project Member

Comment 17 by bugdroid1@chromium.org, Nov 2

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9ce5a6743ef251e6509ff1eae579367fe7485964

commit 9ce5a6743ef251e6509ff1eae579367fe7485964
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Fri Nov 02 03:01:04 2018

Roll src/third_party/catapult b281d52b301d..971689c0597c (1 commits)

https://chromium.googlesource.com/catapult.git/+log/b281d52b301d..971689c0597c


git log b281d52b301d..971689c0597c --date=short --no-merges --format='%ad %ae %s'
2018-11-01 bpastene@chromium.org Revert "ADB commands should only retry errors, not exceptions."


Created with:
  gclient setdep -r src/third_party/catapult@971689c0597c

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:895027 
TBR=sullivan@chromium.org

Change-Id: I805d1b998211fe410ea719e9a3486f8f90dc61ae
Reviewed-on: https://chromium-review.googlesource.com/c/1313689
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#604825}
[modify] https://crrev.com/9ce5a6743ef251e6509ff1eae579367fe7485964/DEPS

Project Member

Comment 18 by bugdroid1@chromium.org, Nov 2

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/4527bffcdfd7d9f9e1dff0781f3824681eb3a0b0

commit 4527bffcdfd7d9f9e1dff0781f3824681eb3a0b0
Author: Erik Chen <erikchen@chromium.org>
Date: Fri Nov 02 14:24:39 2018

[Reland 1]  ADB commands should only retry errors, not exceptions.

Previously, TimeoutErrors were exceptions rather than errors, so they were not
being retried. This has been fixed.

> ADB commands should only retry errors, not exceptions.
>
> By default, all exceptions were swallowed and retried, including exceptions
> generated by SIGTERM handler. This is undesired.
>
> Bug:  chromium:895027 
> Change-Id: Ic6ce39eae466399e2543410a16906f154de47179
> Reviewed-on: https://chromium-review.googlesource.com/c/1305714
> Reviewed-by: John Budorick <jbudorick@chromium.org>
> Commit-Queue: Erik Chen <erikchen@chromium.org>

Bug:  chromium:895027 
Change-Id: Icec85af8b978942ec7699636e68b7500df32e0ba
Reviewed-on: https://chromium-review.googlesource.com/c/1313609
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>

[modify] https://crrev.com/4527bffcdfd7d9f9e1dff0781f3824681eb3a0b0/devil/devil/utils/cmd_helper.py
[modify] https://crrev.com/4527bffcdfd7d9f9e1dff0781f3824681eb3a0b0/devil/devil/android/sdk/adb_wrapper.py
[modify] https://crrev.com/4527bffcdfd7d9f9e1dff0781f3824681eb3a0b0/devil/devil/utils/reraiser_thread.py

Project Member

Comment 19 by bugdroid1@chromium.org, Nov 2

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/661fddf6f497b0392411d47198312c0b39608736

commit 661fddf6f497b0392411d47198312c0b39608736
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Fri Nov 02 18:11:14 2018

Roll src/third_party/catapult a6ef719b69f4..a8f088180a1c (2 commits)

https://chromium.googlesource.com/catapult.git/+log/a6ef719b69f4..a8f088180a1c


git log a6ef719b69f4..a8f088180a1c --date=short --no-merges --format='%ad %ae %s'
2018-11-02 chiniforooshan@chromium.org Telemetry: migrate thread_*_cpu_time_per_frame
2018-11-02 erikchen@chromium.org [Reland 1]  ADB commands should only retry errors, not exceptions.


Created with:
  gclient setdep -r src/third_party/catapult@a8f088180a1c

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:890757 , chromium:895027 
TBR=sullivan@chromium.org

Change-Id: I070af3f6172541300ce6dc4d5ac183f117dd1db6
Reviewed-on: https://chromium-review.googlesource.com/c/1315408
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#604985}
[modify] https://crrev.com/661fddf6f497b0392411d47198312c0b39608736/DEPS

Cc: bpastene@chromium.org martiniss@chromium.org
This still isn't fully functional. Looks like logdog/butler is not transmitting the output when the subprocess exits via sys.exit(constants.INFRA_EXIT_CODE).

Example build:
https://chromium-swarm.appspot.com/task?id=40e6b33438f2be10&refresh=10&show_raw=1

Output:
"""
I 3599.583s Main  Generated json results file at /b/swarming/w/itYFNqM4/tmpkDNAgX
I 3599.583s Main  Finishing archiving output.
E 3599.583s Main  Unrecognized error occurred.
Traceback (most recent call last):
  File "/b/swarming/w/ir/build/android/test_runner.py", line 1018, in main
    return RunTestsCommand(args)
  File "/b/swarming/w/ir/build/android/test_runner.py", line 715, in RunTestsCommand
    return RunTestsInPlatformMode(args)
  File "/b/swarming/w/ir/build/android/test_runner.py", line 854, in RunTestsInPlatformMode
    test_run.RunTests(raw_results)
  File "/b/swarming/w/ir/build/android/pylib/local/device/local_device_test_run.py", line 155, in RunTests
    run_tests_on_device, tc, try_results).pGet(None)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/parallelizer.py", line 263, in pMap
    r.pFinish(None)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/parallelizer.py", line 135, in pFinish
    self._objs.JoinAll()
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 196, in JoinAll
    self._JoinAll(watcher, timeout)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 163, in _JoinAll
    thread.join(0.1)
  File "/b/swarming/w/ir/.swarming_module/lib/python2.7/threading.py", line 951, in join
    self.__block.wait(delay)
  File "/b/swarming/w/ir/.swarming_module/lib/python2.7/threading.py", line 359, in wait
    _sleep(delay)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/signal_handler.py", line 41, in handler
    existing_handler(signum, frame)
  File "/b/swarming/w/ir/build/android/test_runner.py", line 758, in unexpected_sigterm
    infra_error('\n'.join(msg))
  File "/b/swarming/w/ir/build/android/test_runner.py", line 738, in infra_error
    sys.exit(constants.INFRA_EXIT_CODE)
SystemExit: 87
[W2018-11-01T07:07:02.603707Z 2412 0 butler.go:240] Butler Context was cancelled. Initiating shutdown. {"error":"context canceled"}
INFO:devil.utils.timeout_retry:condition 'logdog_stopped' not met
[E2018-11-01T07:07:02.603855Z 2412 0 main.go:205] Butler terminated with error.               {"error":"context canceled"}
[E2018-11-01T07:07:02.604048Z 2412 0 main.go:233] Failed to serve.
original error: context canceled

[I2018-11-01T07:07:02.604084Z 2412 0 main.go:349] Terminating.                                {"returnCode":250}
INFO:devil.utils.timeout_retry:condition 'logdog_stopped' met
"""
jbudorick investigated. This is likely because of incorrect handling for state == 'TIMED_OUT' in standard_gtest_merge.py. I'm working on a fix.
Project Member

Comment 22 by bugdroid1@chromium.org, Nov 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/cf68f1d78354078932322dc24b192d262f6b7b4e

commit cf68f1d78354078932322dc24b192d262f6b7b4e
Author: Erik Chen <erikchen@chromium.org>
Date: Mon Nov 12 17:24:37 2018

Recipes should parse results output regardless of failure status.

When a bot times out, it may still output json results. This CL updates the
gtest_merge logic to always parse json results, if they exist.

In addition to adding a test, this CL refactors the tests to make string
comparisons independent of unicode vs str type, which otherwise causes failures.

Bug:  895027 
Change-Id: Ifc6b65ead4879c9ccb136a2ca31796b186b234c0
Reviewed-on: https://chromium-review.googlesource.com/c/1330161
Commit-Queue: Erik Chen <erikchen@chromium.org>
Auto-Submit: Erik Chen <erikchen@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>

[modify] https://crrev.com/cf68f1d78354078932322dc24b192d262f6b7b4e/scripts/slave/recipe_modules/swarming/resources/standard_gtest_merge.py
[modify] https://crrev.com/cf68f1d78354078932322dc24b192d262f6b7b4e/scripts/slave/recipe_modules/swarming/unittests/standard_gtest_merge_test.py

Comment 23 Deleted

Cc: nedngu...@google.com estaab@chromium.org
Labels: Infra-Platform-Test
Oops, c#23 had some bad links. Trying again.

This build:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/145816

Had a shard time out:
https://chromium-swarm.appspot.com/task?id=41bc71b36a7b8310&refresh=10&show_raw=1

The output.json from the shard looks correct:
https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=87247384865ee6a3464527b1d8b7d4a835ec4107&as=output.json

So there's probably a bug causing the chromium_test recipe to treat the step as "TEST RESULTS WERE INVALID".
Note to future self:

To test this, I ran the following commands:
"""
./tools/swarming_client/swarming.py collect -S chromium-swarm.appspot.com --task-output-dir=foo 41bc71b36a7b8310

build/android/pylib/results/presentation/test_results_presentation.py --test-name "blah" --build-number 10 --builder-name asdf --output-json /tmp/output2.json --summary-json ./foo/summary.json ./foo/0/output.json --bucket None
"""

This bug is caused by my failure to land an earlier fix, which had already been code reviewed. :(
https://chromium-review.googlesource.com/c/chromium/tools/build/+/1330161

Project Member

Comment 28 by bugdroid1@chromium.org, Dec 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1b497b3b3e089f9149228b92998493d62d395408

commit 1b497b3b3e089f9149228b92998493d62d395408
Author: erikchen <erikchen@chromium.org>
Date: Thu Dec 13 21:08:19 2018

Update gtest_merge logic to parse outputs regardless of failure status.

When a bot times out, it may still output json results. This CL updates the
gtest_merge logic to always parse json results, if they exist.

Bug:  895027 
Change-Id: I4c78e5f8a7270a100a97fb8d6f9ffe8f3cf30b83
Reviewed-on: https://chromium-review.googlesource.com/c/1330066
Commit-Queue: Erik Chen <erikchen@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616430}
[modify] https://crrev.com/1b497b3b3e089f9149228b92998493d62d395408/build/android/pylib/results/presentation/standard_gtest_merge.py

Status: Fixed (was: Started)
I think this problem should be fixed. I'm closing for now and will reopen if I observe this again in the future.

Sign in to add a comment