telemetry_perf_unittests timing out when re-enabling smoke tests
Issue description
telemetry_perf_unittests failing on chromium.linux/Linux Tests

Builders failed on:
- Linux Tests: https://build.chromium.org/p/chromium.linux/builders/Linux%20Tests

Following a pair of commits:
- https://chromium-review.googlesource.com/654868
- https://chromium-review.googlesource.com/655299
,
Sep 8 2017
The error was "shard #6 timed out, took too much time to complete". I assume we tipped over some time limit when re-enabling the stories. For now I've disabled multitab:misc:typical24 again; let's see if that helps the bot recover. I guess we'll need to decide between increasing the time limit and keeping that story permanently disabled.
,
Sep 8 2017
,
Sep 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/ed15e687ba59854b2124a8f6c2bda9e4bc5616e1

commit ed15e687ba59854b2124a8f6c2bda9e4bc5616e1
Author: Roger McFarlane <rogerm@chromium.org>
Date: Fri Sep 08 15:06:59 2017

Revert "[tools/perf] Re-enable meadia system health smoke tests"

This reverts commit 359bafdab6c2f69932618e2879f1f5689e08af59.

Reason for revert:

Seeing telemetry-perf bot failures.

Original change's description:
> [tools/perf] Re-enable meadia system health smoke tests
>
> These are running fine on bots now.
>
> Bug: 726439
> Change-Id: I97a5f9e85d873c685af14085f9534081fd3a5ee5
> Reviewed-on: https://chromium-review.googlesource.com/655299
> Reviewed-by: Ned Nguyen <nednguyen@google.com>
> Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#500564}

TBR=perezju@chromium.org,nednguyen@google.com

Change-Id: Ib3d38babf138c04ea32fa62cf257d20727593888
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 726439, 763379
Reviewed-on: https://chromium-review.googlesource.com/657917
Reviewed-by: Roger McFarlane <rogerm@chromium.org>
Commit-Queue: Roger McFarlane <rogerm@chromium.org>
Cr-Commit-Position: refs/heads/master@{#500594}

[modify] https://crrev.com/ed15e687ba59854b2124a8f6c2bda9e4bc5616e1/tools/perf/benchmarks/system_health_smoke_test.py
,
Sep 8 2017
Status update: The second revert has fixed the bots, so I'm removing the Sheriff-Chromium label. I've synced with perezju@, who will investigate/fix/reland after the weekend.
,
Sep 11 2017
,
Sep 26 2017
Ned, do you know how tests are assigned to shards for telemetry_perf_unittests? From a recent run [1] I see 12 shards with times ranging from 112 to 205 seconds; if these smoke tests could be added to the shards with lower load, would it all be fine? Also, what is the time limit for each shard?

[1]: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.linux%2FLinux_Tests%2F62708%2F%2B%2Frecipes%2Fsteps%2Ftelemetry_perf_unittests%2F0%2Flogs%2Fswarming.summary%2F0
,
Sep 26 2017
,
Sep 26 2017
The number of shards and the timeout for Linux are configured in https://cs.chromium.org/chromium/src/testing/buildbot/chromium.linux.json?rcl=6569f043431d49051467dd9cc52df361034e3771&l=4180

+John/Dirk: can we just increase the number of shards here?
,
Sep 26 2017
That 960 s timeout looks reasonable enough to hold the extra tests; I wonder why it failed on the previous attempt. I'm going to go ahead and re-enable the media smoke tests, which should fit without much trouble. I'll wait to re-enable multitab:misc:typical24 until we see the impact of the media tests.
,
Sep 26 2017
Before doing that, I'm also interested in what Juan asked in #7: how does t_p_u do sharding? The current task execution timeout for each shard on this suite is 16 minutes. Over the past month, the 90th percentile shard execution time for this suite has never cracked 5 minutes. There was only one day where the maximum shard execution time for this suite was above 10 minutes, and that was this event. My suspicion is that we should look at how we're sharding before arbitrarily increasing the shard count; we're not close to the currently allocated time.
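The headroom argument above is simple arithmetic. A minimal Python sketch, assuming per-shard wall times are available in seconds; `SHARD_TIMEOUT_S` and `headroom` are hypothetical names, and only the two shard times quoted in #7 (112 s and 205 s) are used rather than inventing the other ten.

```python
# Hypothetical helper: compare per-shard wall times (seconds) against the
# per-shard task execution timeout to see how much headroom is left.
SHARD_TIMEOUT_S = 16 * 60  # 16-minute timeout, i.e. 960 s


def headroom(shard_times_s, timeout_s=SHARD_TIMEOUT_S):
    """Return (slowest shard time, fraction of the timeout it uses)."""
    slowest = max(shard_times_s)
    return slowest, slowest / timeout_s


if __name__ == "__main__":
    # Only the fastest and slowest shard times from #7 are listed here.
    slowest, used = headroom([112, 205])
    print(f"slowest shard: {slowest} s, {used:.0%} of the {SHARD_TIMEOUT_S} s timeout")
```

This prints `slowest shard: 205 s, 21% of the 960 s timeout`, which is consistent with the observation that the suite normally sits far below the allocated time.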
,
Sep 26 2017
(task data for #11 from an internal tool that I'm happy to share in a non-public setting)
,
Sep 26 2017
... and my "before doing that" from #11 was in re increasing the number of shards as Ned proposed in #9. No objections to reenabling media smoke tests as Juan proposed in #10.
,
Sep 26 2017
telemetry_perf_unittests are sharded in contiguous chunks (https://github.com/catapult-project/catapult/blob/master/third_party/typ/typ/runner.py#L367). It would be great if we could use a smarter sharding algorithm: either the greedy one (used by the GPU tests) or the cutting algorithm that Stephen proposed.
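To illustrate why contiguous chunking can leave one shard much slower than the rest, here is a minimal Python sketch comparing count-based contiguous chunks with a longest-processing-time-first greedy assignment. Both functions are hypothetical illustrations, not typ's or the GPU tests' actual code, and the durations in the usage note are made up.

```python
import heapq


def contiguous_shards(durations, num_shards):
    """One reading of 'contiguous chunks': split the ordered test list into
    num_shards runs of near-equal test *count*, ignoring durations."""
    n = len(durations)
    return [durations[i * n // num_shards:(i + 1) * n // num_shards]
            for i in range(num_shards)]


def greedy_shards(durations, num_shards):
    """Longest-processing-time-first: take tests slowest first and give each
    one to the shard with the smallest total time so far."""
    heap = [(0, i) for i in range(num_shards)]  # (total seconds, shard index)
    shards = [[] for _ in range(num_shards)]
    for d in sorted(durations, reverse=True):
        total, i = heapq.heappop(heap)
        shards[i].append(d)
        heapq.heappush(heap, (total + d, i))
    return shards


def makespan(shards):
    """Wall time of the slowest shard, which is what hits the timeout."""
    return max(sum(s) for s in shards)


if __name__ == "__main__":
    d = [200, 200, 10, 10, 10, 10, 10, 10]  # made-up per-test seconds
    print(f"contiguous: {makespan(contiguous_shards(d, 4))} s, "
          f"greedy: {makespan(greedy_shards(d, 4))} s")
```

With these made-up durations and 4 shards, the contiguous split puts both slow tests in the first shard (400 s) while the greedy split caps the slowest shard at 200 s. The greedy approach needs historical per-test timings, and no sharding scheme helps when a single test exceeds the timeout on its own.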
,
Sep 26 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/71921771dba55d265b7f953a4c0809c36ee180a0

commit 71921771dba55d265b7f953a4c0809c36ee180a0
Author: Juan Antonio Navarro Pérez <perezju@chromium.org>
Date: Tue Sep 26 13:30:46 2017

Reland "[tools/perf] Re-enable meadia system health smoke tests"

This reverts commit ed15e687ba59854b2124a8f6c2bda9e4bc5616e1.

Reason for revert: Tests should now fit within the allotted time.

Original change's description:
> Revert "[tools/perf] Re-enable meadia system health smoke tests"
>
> This reverts commit 359bafdab6c2f69932618e2879f1f5689e08af59.
>
> Reason for revert:
>
> Seeing telemetry-perf bot failures.
>
> Original change's description:
> > [tools/perf] Re-enable meadia system health smoke tests
> >
> > These are running fine on bots now.
> >
> > Bug: 726439
> > Change-Id: I97a5f9e85d873c685af14085f9534081fd3a5ee5
> > Reviewed-on: https://chromium-review.googlesource.com/655299
> > Reviewed-by: Ned Nguyen <nednguyen@google.com>
> > Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#500564}
>
> TBR=perezju@chromium.org,nednguyen@google.com
>
> Change-Id: Ib3d38babf138c04ea32fa62cf257d20727593888
> No-Presubmit: true
> No-Tree-Checks: true
> No-Try: true
> Bug: 726439, 763379
> Reviewed-on: https://chromium-review.googlesource.com/657917
> Reviewed-by: Roger McFarlane <rogerm@chromium.org>
> Commit-Queue: Roger McFarlane <rogerm@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#500594}

TBR=rogerm@chromium.org,perezju@chromium.org,nednguyen@google.com

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: 726439, 763379
Change-Id: I309dfdd0f7b32bd4dc275904fc24ef1e251f8c85
Reviewed-on: https://chromium-review.googlesource.com/684024
Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
Cr-Commit-Position: refs/heads/master@{#504352}

[modify] https://crrev.com/71921771dba55d265b7f953a4c0809c36ee180a0/tools/perf/benchmarks/system_health_smoke_test.py
,
Sep 26 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/2f37bb9a989ac2130617f13b47984a8e972b7e81

commit 2f37bb9a989ac2130617f13b47984a8e972b7e81
Author: Marc Treib <treib@chromium.org>
Date: Tue Sep 26 15:48:01 2017

Revert "Reland "[tools/perf] Re-enable meadia system health smoke tests""

This reverts commit 71921771dba55d265b7f953a4c0809c36ee180a0.

Reason for revert: telemetry_perf_unittests failing again:
https://uberchromegw.corp.google.com/i/chromium.linux/builders/Linux%20Tests

Original change's description:
> Reland "[tools/perf] Re-enable meadia system health smoke tests"
>
> This reverts commit ed15e687ba59854b2124a8f6c2bda9e4bc5616e1.
>
> Reason for revert: Tests should now fit within the allotted time.
>
> Original change's description:
> > Revert "[tools/perf] Re-enable meadia system health smoke tests"
> >
> > This reverts commit 359bafdab6c2f69932618e2879f1f5689e08af59.
> >
> > Reason for revert:
> >
> > Seeing telemetry-perf bot failures.
> >
> > Original change's description:
> > > [tools/perf] Re-enable meadia system health smoke tests
> > >
> > > These are running fine on bots now.
> > >
> > > Bug: 726439
> > > Change-Id: I97a5f9e85d873c685af14085f9534081fd3a5ee5
> > > Reviewed-on: https://chromium-review.googlesource.com/655299
> > > Reviewed-by: Ned Nguyen <nednguyen@google.com>
> > > Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
> > > Cr-Commit-Position: refs/heads/master@{#500564}
> >
> > TBR=perezju@chromium.org,nednguyen@google.com
> >
> > Change-Id: Ib3d38babf138c04ea32fa62cf257d20727593888
> > No-Presubmit: true
> > No-Tree-Checks: true
> > No-Try: true
> > Bug: 726439, 763379
> > Reviewed-on: https://chromium-review.googlesource.com/657917
> > Reviewed-by: Roger McFarlane <rogerm@chromium.org>
> > Commit-Queue: Roger McFarlane <rogerm@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#500594}
>
> TBR=rogerm@chromium.org,perezju@chromium.org,nednguyen@google.com
>
> # Not skipping CQ checks because original CL landed > 1 day ago.
>
> Bug: 726439, 763379
> Change-Id: I309dfdd0f7b32bd4dc275904fc24ef1e251f8c85
> Reviewed-on: https://chromium-review.googlesource.com/684024
> Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
> Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#504352}

TBR=rogerm@chromium.org,perezju@chromium.org,nednguyen@google.com

Change-Id: Ib027b20abc609bab8a6269fe658ad5b135eb66b7
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 726439, 763379
Reviewed-on: https://chromium-review.googlesource.com/685094
Reviewed-by: Marc Treib <treib@chromium.org>
Commit-Queue: Marc Treib <treib@chromium.org>
Cr-Commit-Position: refs/heads/master@{#504378}

[modify] https://crrev.com/2f37bb9a989ac2130617f13b47984a8e972b7e81/tools/perf/benchmarks/system_health_smoke_test.py
,
Sep 27 2017
OK, after some digging, I found that the problem lies specifically in play:media:soundcloud, which blows up the time of shard #6 from 3 minutes to over the 15-minute timeout. So there is clearly something wrong with that story. I'll re-enable the others for now before investigating a bit more.
,
Sep 27 2017
,
Sep 27 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/d3994c22b44a40d3aef04fb383c3ca76bf884d11

commit d3994c22b44a40d3aef04fb383c3ca76bf884d11
Author: Juan A. Navarro Perez <perezju@chromium.org>
Date: Wed Sep 27 13:35:57 2017

Re-enable media system health smoke tests

Re-enable both:
- load:media:soundcloud
- play:media:google_play_music

The following remains disabled as it causes timeouts:
- play:media:soundcloud

Bug: 763379
Change-Id: I8bf40d45ec747aa027540dc4f6cc637b47f35c32
Reviewed-on: https://chromium-review.googlesource.com/686874
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
Cr-Commit-Position: refs/heads/master@{#504646}

[modify] https://crrev.com/d3994c22b44a40d3aef04fb383c3ca76bf884d11/tools/perf/benchmarks/system_health_smoke_test.py
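Re-enabling two stories while keeping one disabled amounts to maintaining a deny-list of story names. A minimal Python sketch, assuming stories are identified by their names as in the commit message above; `DISABLED_STORIES` and `stories_to_run` are hypothetical and do not reproduce the actual contents of system_health_smoke_test.py.

```python
# Hypothetical deny-list: the story name comes from the commit message, but
# this data structure and helper are illustrative only.
DISABLED_STORIES = frozenset({
    "play:media:soundcloud",  # still disabled: causes shard timeouts
})


def stories_to_run(all_stories, disabled=DISABLED_STORIES):
    """Return the stories to smoke-test, keeping the original order."""
    return [s for s in all_stories if s not in disabled]


if __name__ == "__main__":
    print(stories_to_run([
        "load:media:soundcloud",
        "play:media:google_play_music",
        "play:media:soundcloud",
    ]))
```

A frozenset keeps membership checks cheap and makes re-enabling a story a one-line removal.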
,
Sep 28 2017
The media stories started running fine on this bot: https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=telemetry_perf_unittests&builder=chromium.linux%3ALinux%20Tests

I'll move on to re-enabling multitab:misc:typical24; from what I can gather, that test didn't actually fail when we tried to re-enable it last time.
,
Sep 28 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/2b4933396a571e992ca6b474949624153edc79ae

commit 2b4933396a571e992ca6b474949624153edc79ae
Author: Juan Antonio Navarro Pérez <perezju@chromium.org>
Date: Thu Sep 28 13:42:05 2017

Revert "Revert "[tools/perf] Reenable multitab:misc:typical24 smoke test""

This reverts commit e996f894d9e0a2a9cf7dfe67a2efda60cc8543df.

Reason for revert: Story should be able to run now.

Original change's description:
> Revert "[tools/perf] Reenable multitab:misc:typical24 smoke test"
>
> This reverts commit 748a8867042a160ddbec5945c19c9afe14e1362c.
>
> Reason for revert: made telemetry_perf_unittests tip over some time limit
>
> Original change's description:
> > [tools/perf] Reenable multitab:misc:typical24 smoke test
> >
> > The story may no longer be failing.
> >
> > TBR=nednguyen@google.com
> >
> > Bug: 698499
> > Change-Id: I9383bfee2e5882d75459e63a882b4e8fc10b2d3e
> > Reviewed-on: https://chromium-review.googlesource.com/654868
> > Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
> > Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#500567}
>
> TBR=perezju@chromium.org
>
> Change-Id: I7105c8fb94af1be724b1a13b82c7da81f9c9aecf
> No-Presubmit: true
> No-Tree-Checks: true
> No-Try: true
> Bug: 698499, 763379
> Reviewed-on: https://chromium-review.googlesource.com/657658
> Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
> Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#500581}

TBR=perezju@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: 698499, 763379
Change-Id: I201f175ac822d6b51ad881a8e220b6e4f30e0397
Reviewed-on: https://chromium-review.googlesource.com/690194
Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
Cr-Commit-Position: refs/heads/master@{#505004}

[modify] https://crrev.com/2b4933396a571e992ca6b474949624153edc79ae/tools/perf/benchmarks/system_health_smoke_test.py
,
Sep 29 2017
... aaand multitab:misc:typical24 is now running too, with no issues. Closing this; will follow up on issue 769263 about play:media:soundcloud.
,
Sep 29 2017
Comment 1 by bugdroid1@chromium.org, Sep 8 2017