
Issue 785291 link


Issue metadata

Status: Fixed
Owner:
Closed: Aug 6
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: ----




angle_perftests timing out on chromium.perf/Win 7 ATI GPU Perf

Project Member Reported by charliea@chromium.org, Nov 15 2017

Issue description

angle_perftests failing on chromium.perf/Win 7 ATI GPU Perf

Builders failed on: 
- Win 7 ATI GPU Perf: 
  https://build.chromium.org/p/chromium.perf/builders/Win%207%20ATI%20GPU%20Perf

Going to disable the test and kick off a bisect.
 
That's unfortunate. How did this happen? The logs cut off after the first couple of tests.
I suspect that the test is getting stuck. Then, when the swarming infra kills it, it tries to flush the buffer, which shows some incomplete logs.
Cc: kbr@chromium.org crouleau@chromium.org
Note that the prior run took 36 minutes to complete:

https://chromium-swarm.appspot.com/task?id=39d5be1cb0748610&refresh=10&show_raw=1

And the new test is timing out after 10 minutes.

Is this possibly related to https://chromium-review.googlesource.com/761402 ?

(Make the tests isolated and deterministic)
Note the above CL is in the regression range for the broken build:

https://build.chromium.org/p/chromium.perf/builders/Win%207%20ATI%20GPU%20Perf/builds/1454
jmadill@, if the test is timing out after 10 minutes, my guess is that it's hitting the I/O timeout rather than the hard timeout. The I/O timeout fires when swarming hasn't seen any additional output from the test for 10 minutes; swarming then kills the test, assuming that it is stuck. You can do one of two things:

1) Make it so that the test issues some heartbeat output when it's still running
2) Add an I/O timeout override to angle_perftests here: https://cs.chromium.org/chromium/src/tools/perf/core/perf_data_generator.py?type=cs&sq=package:chromium&q=package:%5E(chromium)$+file:(/%7C%5E)core/perf_data_generator(%5C.(swig%7Cpy%7Cspt)$%7C/(__init__%5C.(swig%7Cpy%7Cspt))?$)&l=849
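
For illustration, option 1 could be approximated without touching the test binary by wrapping it in a small launcher that prints a line at a fixed interval while the child process is still alive. This is only a sketch: `run_with_heartbeat`, its parameters, and the heartbeat message format are hypothetical names, and it assumes (per the description above) that any new output is enough to keep swarming from treating the task as stuck.

```python
import subprocess
import sys
import threading
import time


def run_with_heartbeat(cmd, interval_s=60.0, out=sys.stdout):
    """Run cmd and write a heartbeat line every interval_s seconds until it exits.

    Hypothetical helper: the name, interval, and message are illustrative only.
    Returns the exit code of the child process.
    """
    done = threading.Event()

    def beat():
        start = time.monotonic()
        # Event.wait() returns False on timeout and True once the event is set,
        # so this loop emits one line per interval until the child has finished.
        while not done.wait(interval_s):
            elapsed = time.monotonic() - start
            out.write("[heartbeat] still running after %.0fs\n" % elapsed)
            out.flush()

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        return subprocess.call(cmd)
    finally:
        done.set()
        thread.join()
```

The interval would need to stay well under the 10 minute I/O timeout. Note that a wrapper like this also hides genuine hangs from the I/O timeout, so the hard timeout becomes the only backstop.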
Reverting https://chromium-review.googlesource.com/761402 : https://chromium-review.googlesource.com/c/chromium/src/+/772190

It seems very likely that my change is the culprit, since it stops the tests from running in parallel, so they take longer.
The strange thing is that angle-perf-tests was already using the --single-process-tests flag for some of its subtests... I guess if the revert doesn't fix this then we'll know it wasn't the culprit.
Project Member

Comment 10 by bugdroid1@chromium.org, Nov 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e9a70b2c096050415e2d10b711d9552360b57f5f

commit e9a70b2c096050415e2d10b711d9552360b57f5f
Author: Jamie Madill <jmadill@chromium.org>
Date: Wed Nov 15 19:38:51 2017

Run angle_perftests on Windows AMD and Intel.

This will prevent regressions on the perf bots.

BUG= 785291 
TBR=kbr@chromium.org

Cq-Include-Trybots: master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: I58e435466cef8450fe04f1775577dcabb84523e7
Reviewed-on: https://chromium-review.googlesource.com/771547
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Reviewed-by: Jamie Madill <jmadill@chromium.org>
Cr-Commit-Position: refs/heads/master@{#516787}
[modify] https://crrev.com/e9a70b2c096050415e2d10b711d9552360b57f5f/content/test/gpu/generate_buildbot_json.py
[modify] https://crrev.com/e9a70b2c096050415e2d10b711d9552360b57f5f/testing/buildbot/chromium.gpu.fyi.json

Project Member

Comment 11 by 42576172...@developer.gserviceaccount.com, Nov 15 2017


=== BISECT JOB RESULTS ===
NO Test failure found

Bisect Details
  Configuration: winx64ati_perf_bisect
  Benchmark    : angle_perftests
  Metric       : BitSetIteratorPerf_run/score

Revision             Exit Code      N
chromium@516439      0 +- N/A       2      good
chromium@516536      0 +- N/A       2      bad

To Run This Test
  .\src\out\Release_x64\angle_perftests.exe --test-launcher-print-test-stdio=always --test-launcher-jobs=1

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8962844784548974896


For feedback, file a bug with component Speed>Bisection
So it seems even on successful builds there were timeouts:

https://chromium-swarm.appspot.com/task?id=39d5be1cb0748610&refresh=10&show_raw=1

Here 3 tests timed out:

[49/86] InterleavedAttributeDataBenchmark.Run/d3d11_9_3 (TIMED OUT)
[52/86] LinkProgramBenchmark.Run/d3d9 (TIMED OUT)
[58/86] PointSpritesBenchmark.Run/d3d9_10_3px_3vars (TIMED OUT)

However, the tests are retried later in the run and all pass, which is why no error was reported:

[90/90] InterleavedAttributeDataBenchmark.Run/d3d11_9_3 (5068 ms)
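
To check other runs for the same pattern, a short script can scan the launcher output for tests that first report TIMED OUT and later report a passing time. This is a rough sketch that assumes result lines look exactly like the ones quoted above; `masked_timeouts` is a hypothetical helper name and the regular expression is not taken from the test launcher itself.

```python
import re

# Matches result lines in the shape quoted in this bug, for example:
#   [49/86] InterleavedAttributeDataBenchmark.Run/d3d11_9_3 (TIMED OUT)
#   [90/90] InterleavedAttributeDataBenchmark.Run/d3d11_9_3 (5068 ms)
RESULT_RE = re.compile(r"^\[\d+/\d+\]\s+(\S+)\s+\((TIMED OUT|\d+ ms)\)\s*$")


def masked_timeouts(log_text):
    """Return tests that timed out and then passed later in the same log."""
    timed_out = set()
    masked = []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if not match:
            continue  # Not a result line (test stdio, summary, etc.).
        name, outcome = match.groups()
        if outcome == "TIMED OUT":
            timed_out.add(name)
        elif name in timed_out and name not in masked:
            # A passing time after an earlier timeout: the retry hid the timeout.
            masked.append(name)
    return masked
```

Running it over the raw swarming output of a "green" build would list the tests whose timeouts were hidden by a retry.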

My guess is that https://chromium-review.googlesource.com/761402 changed the tests to run in single-process mode, which then meant the timeouts caused the IO to fail entirely.

These tests were timing out as far back as I could see:

https://chromium-swarm.appspot.com/task?id=38f0562ab579c510&refresh=10&show_raw=1

I even went back to build 800, where I could see one test timing out while the others did not. It might be related to some kind of driver bug.

The drivers for the AMD perf bots are slightly different from the mainline Chromium try bots. I could not repro the timeouts locally, but can try logging into the bot. I'm also not sure whether the OS version matters; I was trying on Win 10.

Perf bots driver version: 21.19.137.1 (9-16-2016)
GPU bots driver version: 21.19.407.0 (12-23-2016)

Unfortunately, the differences in how we run on our CQ might explain why this timeout does not repro there.

If the timeouts repro on the bot, I am going to try upgrading the drivers. I won't be able to roll back the driver version without help since I don't have the older version, but I think upgrading is worth a try (the drivers are a bit old by now). If I can't repro even on the bot, I'm unsure what action to take.

Also, I think we should disable automatic retries for failing tests: we want to catch any and all flakiness immediately.
+1 to disabling automatic retries for failing tests. It's best to deal with flakiness immediately rather than hiding it.
Project Member

Comment 14 by bugdroid1@chromium.org, Nov 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ac9bfc69a284300117f8a718fc05462364e0c867

commit ac9bfc69a284300117f8a718fc05462364e0c867
Author: Charlie Andrews <charliea@chromium.org>
Date: Wed Nov 15 20:27:26 2017

Disable angle_perftests on Win 7 ATI GPU perf

The test is timing out (see bug).

TBR=nednguyen@google.com, jmadill@chromium.org

Bug:  785291 
Change-Id: I4665c454b34cd791998402353766d54b3022d75a
Reviewed-on: https://chromium-review.googlesource.com/771971
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Charlie Andrews <charliea@chromium.org>
Cr-Commit-Position: refs/heads/master@{#516808}
[modify] https://crrev.com/ac9bfc69a284300117f8a718fc05462364e0c867/testing/buildbot/chromium.perf.json
[modify] https://crrev.com/ac9bfc69a284300117f8a718fc05462364e0c867/tools/perf/core/perf_data_generator.py

Cc: dpranke@chromium.org
Seems we can just add the flag --test-launcher-retry-limit=0 to disable retries. We need a good way to test this before submitting it, though, since I bet a lot of things will break.

+Dirk enabled retries by default for issue 402089.
I was thinking of just setting that flag for angle_perftests, rather than everything. Seems lower risk at least.
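
As a sketch of that narrower approach, the flag could be appended only when the command line for angle_perftests is built. The `launcher_args` helper and the `no_retry_tests` set below are hypothetical and are not part of perf_data_generator.py; only the flag itself comes from this thread.

```python
RETRY_FLAG = "--test-launcher-retry-limit=0"


def launcher_args(test_name, base_args, no_retry_tests=frozenset({"angle_perftests"})):
    """Return base_args, plus the no-retry flag for tests listed in no_retry_tests.

    Hypothetical helper: an explicit retry limit already present in base_args wins.
    """
    args = list(base_args)
    has_limit = any(a.startswith("--test-launcher-retry-limit") for a in args)
    if test_name in no_retry_tests and not has_limit:
        args.append(RETRY_FLAG)
    return args
```

For example, `launcher_args("angle_perftests", ["--test-launcher-jobs=1"])` adds the flag, while the same call for any other test name returns the arguments unchanged.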
Project Member

Comment 17 by bugdroid1@chromium.org, Nov 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ee00f335cb479d1bb4f18739369aee3107586320

commit ee00f335cb479d1bb4f18739369aee3107586320
Author: Charlie Andrews <charliea@chromium.org>
Date: Wed Nov 15 22:31:03 2017

Disable gpu_perftests on Android One

It's been failing since at least September 20, and no one can figure
out why.

TBR=nednguyen@google.com, reveman@chromium.org

Bug:  785291 
Change-Id: I57a1ca702ba52e56c5848216864aba0f794c10f2
Reviewed-on: https://chromium-review.googlesource.com/772033
Commit-Queue: Charlie Andrews <charliea@chromium.org>
Reviewed-by: Charlie Andrews <charliea@chromium.org>
Cr-Commit-Position: refs/heads/master@{#516864}
[modify] https://crrev.com/ee00f335cb479d1bb4f18739369aee3107586320/testing/buildbot/chromium.perf.json
[modify] https://crrev.com/ee00f335cb479d1bb4f18739369aee3107586320/tools/perf/core/perf_data_generator.py

Issue 785554 filed for disabling retries.
Project Member

Comment 19 by bugdroid1@chromium.org, Nov 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5097c6d9c94b45f578f7f1d96036336ab0a16b68

commit 5097c6d9c94b45f578f7f1d96036336ab0a16b68
Author: Jamie Madill <jmadill@chromium.org>
Date: Fri Nov 17 04:26:32 2017

Revert "Run angle_perftests on Windows AMD and Intel."

This reverts commit e9a70b2c096050415e2d10b711d9552360b57f5f.

Reason for revert: Seems to break on non-swarming configs:

https://build.chromium.org/p/chromium.gpu.fyi/builders/Win10%20Release%20%28Intel%20HD%20630%29/builds/1038
https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20Release%20%28AMD%20R7%20240%29/builds/1773

Bug:  785291 

Original change's description:
> Run angle_perftests on Windows AMD and Intel.
> 
> This will prevent regressions on the perf bots.
> 
> BUG= 785291 
> TBR=kbr@chromium.org
> 
> Cq-Include-Trybots: master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
> Change-Id: I58e435466cef8450fe04f1775577dcabb84523e7
> Reviewed-on: https://chromium-review.googlesource.com/771547
> Commit-Queue: Jamie Madill <jmadill@chromium.org>
> Reviewed-by: Jamie Madill <jmadill@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#516787}

TBR=jmadill@chromium.org,kbr@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  785291 
Change-Id: I629eb2aced90fe2c9d4a72f003d17679fe267fe3
Cq-Include-Trybots: master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Reviewed-on: https://chromium-review.googlesource.com/775793
Reviewed-by: Jamie Madill <jmadill@chromium.org>
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Cr-Commit-Position: refs/heads/master@{#517297}
[modify] https://crrev.com/5097c6d9c94b45f578f7f1d96036336ab0a16b68/content/test/gpu/generate_buildbot_json.py
[modify] https://crrev.com/5097c6d9c94b45f578f7f1d96036336ab0a16b68/testing/buildbot/chromium.gpu.fyi.json

Status: Assigned (was: Available)
Status: Fixed (was: Assigned)
