New issue
Advanced search Search tips

Issue 713539 link

Starred by 1 user

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

Restore stacktrace on SIGALRM for timeout.

Project Member Reported by hidehiko@chromium.org, Apr 20 2017

Issue description

Chrome Version: ToT
OS: ChromeOS

Example;

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/113404183-chromeos-test/chromeos4-row9-rack9-host21/ssp_logs/debug/

04/19 17:17:10.223 ERROR|         traceback:0013| Traceback (most recent call last):
04/19 17:17:10.225 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 530, in run_autoserv
04/19 17:17:10.226 ERROR|         traceback:0013|     machines)
04/19 17:17:10.227 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 163, in _run_with_ssp
04/19 17:17:10.228 ERROR|         traceback:0013|     dut_name=dut_name)
04/19 17:17:10.229 ERROR|         traceback:0013|   File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 274, in wrapper
04/19 17:17:10.230 ERROR|         traceback:0013|     return fn(*args, **kwargs)
04/19 17:17:10.231 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 235, in func_cleanup_if_fail
04/19 17:17:10.233 ERROR|         traceback:0013|     return func(*args, **kwargs)
04/19 17:17:10.233 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 946, in setup_test
04/19 17:17:10.234 ERROR|         traceback:0013|     download_extract(server_package_url, autotest_pkg_path, usr_local_path)
04/19 17:17:10.235 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 240, in func_retry
04/19 17:17:10.236 ERROR|         traceback:0013|     raise error.TimeoutException(exception_message)
04/19 17:17:10.237 ERROR|         traceback:0013| TimeoutException: retry exception (function="download_extract()"), timeout = 300s

In case of timeout, the stacktrace in the wrapped function is lost, which makes debugging difficult.
This looks just a python work.
 
Cc: pprabhu@chromium.org
Update:
So, in the code review
https://chromium-review.googlesource.com/c/483220/
pprabhu@ recommended me to use chromite.lib, instead of fixing the existing one to avoid dup-maintaining.

Looking at chromite.lib, then, conceptually cros.retry can be migrated somehow with chromite.lib.{retry_util,timeout_util}.
timeout_util had some minor bug. Fixing it, and the CL is under review.

BTW, retry_util.GenericRetry family is a function, rather than decorator.
IMHO, decorator looks better because it has many optional args, which means we cannot apply the retry to the function using the names for its param.
The only cons is, we cannot directly call GenericRetry inline. Instead we'd need some local function if we'd like to run inline:

@GenericRetry(max_retry=1)
def _run():
  # ... do something ...
_run()

GenericRetry currently requires a function for its argument, so considering the current usage, it looks acceptable change, IMHO.
pprabhu@, any thoughts?

Hm, so let's move forward with decorator approach, then migrate retry in autotest by chromite's. Will send CLs.
Status: Started (was: Assigned)
I'm not sure I follow what you mean in #1.
If you send me a quick CL (don't put too much effort into unittests etc at first), we can discuss there.
Sure. Here it is.
https://chromium-review.googlesource.com/c/491586/ (no test, no real use).

Note: I'm not planning to send this to review directly, because some clean up is needed. So, let's discuss only on rough direction now, and on details in real code review process.
Project Member

Comment 6 by bugdroid1@chromium.org, May 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/e55af7fa61b43047927d4af51e06b1d8c9d9bb49

commit e55af7fa61b43047927d4af51e06b1d8c9d9bb49
Author: Hidehiko Abe <hidehiko@chromium.org>
Date: Mon May 08 19:43:43 2017

Rework RunCurl.

This simplifies RunCurl implementation.
- Remove fail option. This can be a part of args.
- Do not overwrite capture_output. Let caller set it.
- Flatten several nested statements.

This is preparation to make GenericRetry a decorator,
to migrate with autotest's.

BUG=chromium:713539
TEST=Ran bots.

Change-Id: I65740bdb9ffa94d0764ed9e06cdf916ffd8bb3d3
Reviewed-on: https://chromium-review.googlesource.com/491228
Commit-Ready: Hidehiko Abe <hidehiko@chromium.org>
Tested-by: Hidehiko Abe <hidehiko@chromium.org>
Reviewed-by: Hidehiko Abe <hidehiko@chromium.org>

[modify] https://crrev.com/e55af7fa61b43047927d4af51e06b1d8c9d9bb49/lib/cache.py
[modify] https://crrev.com/e55af7fa61b43047927d4af51e06b1d8c9d9bb49/lib/retry_util.py
[modify] https://crrev.com/e55af7fa61b43047927d4af51e06b1d8c9d9bb49/scripts/cros_sdk.py

Project Member

Comment 7 by bugdroid1@chromium.org, Jun 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/9bacfb6b7d6531669a23893864a1b652f5d7c98e

commit 9bacfb6b7d6531669a23893864a1b652f5d7c98e
Author: Hidehiko Abe <hidehiko@chromium.org>
Date: Fri Jun 02 06:49:31 2017

Extract retry_util related tests from cros_build_lib_unittest.

Now retry_util is extracted from cros_build_lib, so tests
are also extracted.

BUG=chromium:713539
TEST=Ran retry_util_unittest, cros_build_lib_unittest locally. Ran bots.

Change-Id: I09bf4b892e0b14e80d3d1ceb3fcab224a7fbe4c5
Reviewed-on: https://chromium-review.googlesource.com/519042
Commit-Ready: Hidehiko Abe <hidehiko@chromium.org>
Tested-by: Hidehiko Abe <hidehiko@chromium.org>
Reviewed-by: Shuhei Takahashi <nya@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[add] https://crrev.com/9bacfb6b7d6531669a23893864a1b652f5d7c98e/lib/retry_util_unittest
[modify] https://crrev.com/9bacfb6b7d6531669a23893864a1b652f5d7c98e/lib/cros_build_lib_unittest.py
[add] https://crrev.com/9bacfb6b7d6531669a23893864a1b652f5d7c98e/lib/retry_util_unittest.py

Project Member

Comment 8 by bugdroid1@chromium.org, Jun 9 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/a2ed3a15a0d9401824945d23a60f59908b952f12

commit a2ed3a15a0d9401824945d23a60f59908b952f12
Author: Hidehiko Abe <hidehiko@chromium.org>
Date: Fri Jun 09 14:10:42 2017

Extract main implementation of GenericRetry into a class.

Now, WithRetry can be used as a decorator of a function.

BUG=chromium:713539
TEST=Ran unittests locally. Ran try.

Change-Id: I832d66ef9adf35ceb79d543599c8decac08d03c1
Reviewed-on: https://chromium-review.googlesource.com/523888
Commit-Ready: Hidehiko Abe <hidehiko@chromium.org>
Tested-by: Hidehiko Abe <hidehiko@chromium.org>
Reviewed-by: Hidehiko Abe <hidehiko@chromium.org>

[modify] https://crrev.com/a2ed3a15a0d9401824945d23a60f59908b952f12/lib/retry_util.py
[modify] https://crrev.com/a2ed3a15a0d9401824945d23a60f59908b952f12/lib/retry_util_unittest.py

Components: Internals>CrashReporting OS>Kernel

Sign in to add a comment