Issue 907852

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 817842



Infra failure on perf.fyi webview bots

Project Member Reported by perezju@chromium.org, Nov 22

Issue description

The go webview bot has been constantly failing with Infra Failure:
https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/android-go_webview-perf

And the pixel 2 one has recently started doing the same, e.g.:
https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/android-pixel2_webview-perf/778

Any idea why that is? It looks like it may be a timeout: the (huge) logs for the performance_webview_test_suite step show "Waiting for results from the following shards" close to the end.
 
Labels: Performance-Sheriff-BotHealth
Like the Windows bot (https://bugs.chromium.org/p/chromium/issues/detail?id=906654), the build is hitting its configured 7-hour timeout. I feel that the timeout should be raised and the increase in run time then investigated; perf folks disagree.
Those webview bots have plenty of shards, so it's different. The problem here is that the swarming shards' timeout exceeds the LUCI build timeout (7hr iirc). I suggest that, for this bug, we reduce the swarming shard timeout of the webview bots to 3-4 hours.
Cc: eyaich@chromium.org crouleau@chromium.org
#2: I'm not following -- how is this different? In both cases, the perf tests take too long and wind up timing out the entire build. I'm also not following how your suggestion would help...
In the other case, there are very few Windows machines, which suggests the shard timeout is due to the lack of hardware.

In this case, iirc, there is no lack of hardware. If the 7hr mark is crossed, it's more likely because some shards have tests that are stuck. If the shard timeout limit is reduced so that the build step does not time out, we can easily find out which shards exceed the timeout limit and look into them further to debug.
Issue 908126 has been merged into this issue.
908126 seems related to this. Merged it into this.
Owner: sergeybe...@chromium.org
Status: Assigned (was: Untriaged)
It appears #c2 and #c5 are accurate. The recipe quickly triggers a bunch of shards (which don't seem to be pending much - so the hardware is indeed not an issue), and then times out waiting for the shards. The individual shard timeout is set to 7h, just as the build, so the recipe has no chance to collect the timed out shards.

I'll take this as the current trooper and see if I can update the shard timeout tomorrow.
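
The failure mode described in this comment can be sketched as a simple consistency check. This is only a minimal sketch, not the recipe's or the generator's actual code: the `find_unsafe_shard_timeouts` helper, the 30-minute margin, the `example-builder` entry, and the layout of the config dictionary (a `swarming` entry with `hard_timeout` in seconds under each test) are assumptions made for illustration. Only the 7h build timeout and the 7h shard timeout come from this thread.

```python
# Hypothetical sketch: flag tests whose swarming shard timeout leaves the
# build no time to collect results. The config layout below is an assumption.

BUILD_TIMEOUT_SECONDS = 7 * 60 * 60  # 7h build timeout mentioned in this thread


def find_unsafe_shard_timeouts(config, build_timeout=BUILD_TIMEOUT_SECONDS,
                               margin=30 * 60):
    """Return (builder, test, hard_timeout) for shards that can outlive the build.

    A shard counts as unsafe when hard_timeout + margin exceeds the build
    timeout, i.e. the build could be killed before it collects the shard.
    The margin value is an arbitrary illustrative choice.
    """
    unsafe = []
    for builder, spec in sorted(config.items()):
        for test in spec.get("isolated_scripts", []):
            hard_timeout = test.get("swarming", {}).get("hard_timeout")
            if hard_timeout is None:
                continue  # no explicit shard timeout to compare
            if hard_timeout + margin > build_timeout:
                unsafe.append((builder, test.get("name"), hard_timeout))
    return unsafe


example_config = {
    "android-go_webview-perf": {
        "isolated_scripts": [
            {"name": "performance_webview_test_suite",
             "swarming": {"hard_timeout": 25200}},  # 7h, same as the build
        ],
    },
    "example-builder": {  # hypothetical builder with a shorter shard timeout
        "isolated_scripts": [
            {"name": "example_suite",
             "swarming": {"hard_timeout": 14400}},  # 4h
        ],
    },
}

unsafe = find_unsafe_shard_timeouts(example_config)
for builder, test, timeout in unsafe:
    print(f"{builder}/{test}: shard timeout {timeout}s is too close to the build timeout")
```

With a shard timeout equal to the build timeout, the first entry is flagged; the 4h entry leaves the build enough time to collect a timed-out shard and report it.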
Labels: -Pri-2 Pri-1
Ping - this is now the only remaining blocker of issue 817842 :)
Sergey, did you have a chance to look at this?
Ping
Sorry, didn't have a chance to get to it. I'm a trooper again today, will take a look very soon.
Status: Started (was: Assigned)
Found a place where the timeout is configured: https://cs.chromium.org/chromium/src/testing/buildbot/chromium.perf.fyi.json?l=86&rcl=f4728fa7e63e608a59684c6a3cadf43138f961e8

The scary bit is that most of these *.json files are autogenerated from *.pyl specs, but this file apparently isn't.
Project Member

Comment 15 by bugdroid1@chromium.org, Dec 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/27287febd7142c4100044813a057b653850e3a8b

commit 27287febd7142c4100044813a057b653850e3a8b
Author: Sergey Berezin <sergeyberezin@google.com>
Date: Tue Dec 11 23:13:40 2018

[chromium.perf.fyi] Update swarming shards timeout to <7h

The main build always runs with a timeout of 7h. Make sure the individual shards
always complete sooner, so any failed or timed out shards are correctly
indicated on the build UI.

Bug: 907852
Change-Id: I1580ee0a92f371536f1435e56596ee3d8aeb861b
Reviewed-on: https://chromium-review.googlesource.com/c/1372533
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: Sergey Berezin <sergeyberezin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#615714}
[modify] https://crrev.com/27287febd7142c4100044813a057b653850e3a8b/testing/buildbot/chromium.perf.fyi.json

Status: Fixed (was: Started)
Thanks! I think this looks better now.
The fix seems to work: https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/android-go_webview-perf/2736
has a shard that timed out and was correctly reported as such in a build step.
Status: Assigned (was: Fixed)
Sergey, unfortunately your change https://chromium-review.googlesource.com/c/chromium/src/+/1372533 doesn't work because https://cs.chromium.org/chromium/src/testing/buildbot/chromium.perf.fyi.json is actually autogenerated by the src/tools/perf/generate_perf_data script, so the next time that script was run, it removed your changes. I have committed that removal in https://chromium-review.googlesource.com/c/chromium/src/+/1388663 along with some code that will help prevent similar things from happening in the future.

Sorry about this!

I'm not sure what should be done now. The timeout tuning can be done in src/tools/perf/core/perf_data_generator.py, and then you can re-run src/tools/perf/generate_perf_data to apply it. But I'm not sure the tuning is as fine-grained as you want.
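
To illustrate why a hand edit to a generated file gets lost, and how an up-to-date check can catch it, here is a minimal sketch. It is not the real perf_data_generator.py or the check added in the CL above: `SPEC`, `generate`, `is_up_to_date`, the `shard_timeout_hours` key, the 6h value, and the output layout are all hypothetical names and values chosen for illustration.

```python
import json

# Hypothetical stand-in for a generator spec; the real generator's data
# structures are not shown in this thread.
SPEC = {"android-go_webview-perf": {"shard_timeout_hours": 6}}


def generate(spec):
    """Render the spec as deterministic JSON text (sorted keys, fixed indent)."""
    output = {}
    for builder, settings in spec.items():
        output[builder] = {
            "swarming": {"hard_timeout": settings["shard_timeout_hours"] * 3600},
        }
    return json.dumps(output, indent=2, sort_keys=True) + "\n"


def is_up_to_date(checked_in_text, spec):
    """True when the checked-in file matches what the generator would emit."""
    return checked_in_text == generate(spec)


generated = generate(SPEC)
# Simulate editing the generated JSON by hand instead of changing the spec.
hand_edited = generated.replace("21600", "14400")

print(is_up_to_date(generated, SPEC))
print(is_up_to_date(hand_edited, SPEC))
```

The design point is that the timeout lives in one place (the spec), so a change made there survives regeneration, while a direct edit to the output is detected as stale.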
Thanks for finding this out and adding a presubmit check! (I missed it originally because the generator was in an unusual place). I'll look into it later to see what can be done with the timeouts.
Cc: -nedngu...@google.com