
Issue 891625


Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Set appropriate timeout for some unstable tests

Project Member Reported by tikuta@chromium.org, Oct 3

Issue description

Sometimes test execution fails by timing out because of machine or test flakiness.
In such cases, I think it is better to fail fast instead of waiting 1 hour.
Such a long timeout makes builds wait a very long time in the CQ and turns them into outliers.

Some patches that hit a timeout were even treated as successful by the "with patch" retry steps.
e.g.
browser_tests in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_asan_rel_ng/108863
browser_tests in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_asan_rel_ng/108935
content_browsertests in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/96173
chrome_public_test_apk in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/91231

The builds above were found with this query:
https://pantheon.corp.google.com/bigquery?organizationId=433637338589&project=cr-buildbucket&j=bquxjob_ddcb518_16638efa172&page=queryresults
 
Owner: tikuta@chromium.org
Status: Assigned (was: Untriaged)
Let me set a shorter timeout for some CQ builders.
Cc: erikc...@chromium.org
+erikchen, who's been doing a bunch of work lately in retries.
I generally agree with the sentiment of failing fast, but please be careful. We also wait 1 hour to lease a bot from swarming [e.g. if the fleet is over capacity]. In this case, I think the 1-hour wait is justified and should not cause an early failure.
The webkit_layout_tests have per-test timeouts and a maximum number of failures allowed per test suite, which are enforced by the runner inside of the swarming task. For example, if all of the first 50 tests in a task fail, it can be better to abort early than to run the remaining 4500.

Ideally every test suite should support something like this, because it's much better to optimize the handling inside the swarming task and bail out cleanly than it is to let swarming do it for us, since swarming has no knowledge of what the task is doing.
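
To make the "maximum number of failures" idea concrete, here is a minimal sketch in Python. It is not the webkit_layout_tests runner; `RunSummary`, `run_with_failure_budget` and `max_failures` are hypothetical names chosen for illustration, and each test is assumed to be a plain callable that returns a truthy value on success.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class RunSummary:
    """Outcome of one run (hypothetical structure, for illustration only)."""
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    aborted: bool = False


def run_with_failure_budget(
    tests: Iterable[Tuple[str, Callable[[], bool]]],
    max_failures: int,
) -> RunSummary:
    """Run tests in order and stop once max_failures tests have failed.

    Assumption: a test is a callable returning truthy on success; an
    exception is counted as a failure.
    """
    summary = RunSummary()
    remaining = iter(tests)
    for name, test_fn in remaining:
        try:
            ok = bool(test_fn())
        except Exception:
            ok = False
        (summary.passed if ok else summary.failed).append(name)
        if len(summary.failed) >= max_failures:
            # Budget exhausted: record everything left as skipped and
            # bail out cleanly instead of waiting for the task timeout.
            summary.aborted = True
            summary.skipped.extend(n for n, _ in remaining)
            break
    return summary
```

With a budget of 50, a task whose first 50 tests all fail would report the other tests as skipped and exit right away, so the step fails in minutes rather than at the swarming timeout.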
#4

I see.
Currently, chrome_public_test_apk is the test that times out most frequently.
In the last 7 days, 126 chrome_public_test_apk runs and 55 content_browsertests runs on the android-kitkat-arm-rel builder failed and took more than 30 minutes.
https://pantheon.corp.google.com/bigquery?organizationId=433637338589&project=cr-buildbucket&j=bquxjob_1c7377f5_1663d3a4248&page=queryresults

If we want to handle such timeouts in the test harness, I would like someone who knows the Android tests to take a look at these timeouts.

tikuta: I don't have access to your query.
"""
Access Denied: Project cr-buildbucket: The user erikchen@google.com does not have bigquery.jobs.list permission in project cr-buildbucket.
"""

Who should I talk to to get access?


I looked at the chrome_public_test_apk link you posted in the opening comment. Looking at the swarming task:
https://chromium-swarm.appspot.com/task?id=4035f1cb0119c010&refresh=10&show_raw=1

Around 5 minutes in, all tests started timing out. This continued for 55 minutes. My guess for what happened is that the device was disconnected/ADB started having issues, but the test runner just kept trying until the swarming task timeout was hit. Bug filed: https://bugs.chromium.org/p/chromium/issues/detail?id=892161
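
One way a runner could notice this situation itself is to give up after several timeouts in a row. The sketch below is only an illustration under stated assumptions, not the Android test runner: `run_until_stuck`, `per_test_timeout_s` and `max_consecutive_timeouts` are hypothetical names, and each test is assumed to be a standalone command line.

```python
import subprocess
from typing import Dict, Iterable, List, Tuple


def run_until_stuck(
    commands: Iterable[Tuple[str, List[str]]],
    per_test_timeout_s: float,
    max_consecutive_timeouts: int,
) -> Tuple[Dict[str, str], bool]:
    """Run each command with a timeout; abort after N timeouts in a row.

    Returns (results, aborted). A long streak of timeouts is treated as
    a sign that the environment (e.g. the device connection) is broken,
    so retrying the remaining tests is unlikely to help.
    """
    results: Dict[str, str] = {}
    consecutive = 0
    for name, cmd in commands:
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # when the timeout elapses.
            proc = subprocess.run(cmd, timeout=per_test_timeout_s)
        except subprocess.TimeoutExpired:
            results[name] = "timeout"
            consecutive += 1
            if consecutive >= max_consecutive_timeouts:
                return results, True
            continue
        results[name] = "pass" if proc.returncode == 0 else "fail"
        consecutive = 0  # any test that finishes resets the streak
    return results, False
```

Resetting the counter whenever a test finishes keeps isolated slow tests from triggering the abort; only an unbroken streak does.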
Cc: no...@chromium.org
#6

Thank you for filing a bug.
Currently, members of the chrome-infrastructure-team or chrome-troopers groups have the right to access the data.

Nodir:
Is there a policy that allows other Googlers to access cr-buildbucket, especially its BigQuery tables?
If that is OK, I'd like to give the BigQuery User role to erikchen.
Cc: st...@chromium.org
+stgao - didn't you do something in this area? I'm guessing maybe there's something tikuta@ needs to do to be able to share the query?
The following tables are available to all Googlers:
- cr-buildbucket.chromium.completed_builds_BETA, scoped to project='chromium'
- cr-buildbucket.chrome.completed_builds_BETA,   scoped to project IN ('chromium', 'chrome')

Please use them in chromium/chrome-specific queries.

https://groups.google.com/a/google.com/forum/#!topic/chops-data/2nPrgkkwFh0
thread represents the latest state of the "policy"
