Issue 616629

Issue metadata

Status: Duplicate
Merged: issue 599776
Owner: ----
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug-Regression

Blocked on:
issue 614394



GPU tests failing randomly on Mac Retinas with AMD GPU

Project Member Reported by kbr@chromium.org, Jun 1 2016

Issue description

Comment 1 by kbr@chromium.org, Jun 1 2016

Cc: mar...@chromium.org
Components: Infra>Platform>Swarming
Summary: Disk apparently full on build53-b1 and build486-m4 (was: Disk apparently full on build53-b1)
Similarly here:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/1089/steps/webgl2_conformance_tests%20on%20ATI%20GPU%20on%20Mac%20Retina%20%28with%20patch%29%20on%20Mac-10.10/logs/stdio

build486-m4 this time.

[  FAILED  ] WebglConformance.conformance2_textures_image_bitmap_from_video_tex_3d_rgb16f_rgb_float (155796 ms)
[ RUN      ] WebglConformance.conformance2_textures_image_bitmap_from_video_tex_3d_rgb32f_rgb_float
(INFO) 2016-06-01 14:17:40,638 cache_temperature.EnsurePageCacheTemperature:55  PageCacheTemperature: any
[9134:56835:0601/141833:WARNING:important_file_writer.cc(55)] temp file failure: /b/swarm_slave/work/isolated/tmpEGJOVF/tmpHI_Tix/Local State : could not create temporary file: No such file or directory


Is there a way to easily check on the health of multiple slaves in the Swarming pool?

Comment 2 by d...@chromium.org, Jun 1 2016

FYI, the disk is not full on either of these:

chrome-bot@build53-b1:(Mac 10.10.5):~$ df -h
Filesystem      Size   Used  Avail Capacity  iused    ifree %iused  Mounted on
/dev/disk0s2   465Gi   90Gi  375Gi    20% 23676122 98252107   19%   /

chrome-bot@build486-m4:(Mac 10.10.5):~$ df -h
Filesystem      Size   Used  Avail Capacity  iused    ifree %iused  Mounted on
/dev/disk1     465Gi   91Gi  373Gi    20% 23982741 97856873   20%   /
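
On the earlier question about checking the health of several bots at once: a minimal sketch, assuming the `df -h` output has already been collected from each machine as text. It parses the macOS column layout shown above and flags volumes whose disk or inode usage crosses a threshold. The names `parse_df`, `nearly_full`, and the 90% threshold are hypothetical and not part of Swarming or any Chromium tooling.

```python
# Sketch only: parses macOS-style `df -h` text like the output pasted above.
# parse_df, nearly_full and the 90% default are illustrative assumptions.

def parse_df(text):
    """Return one dict per data row of macOS `df -h` output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, shell prompts and the header row; data rows
        # have nine columns when the filesystem name contains no spaces.
        if len(fields) < 9 or fields[0] == "Filesystem":
            continue
        rows.append({
            "filesystem": fields[0],
            "capacity_pct": int(fields[4].rstrip("%")),  # "Capacity" column
            "inode_pct": int(fields[7].rstrip("%")),     # "%iused" column
            "mount": " ".join(fields[8:]),
        })
    return rows


def nearly_full(rows, threshold_pct=90):
    """Rows whose disk or inode usage is at or above threshold_pct."""
    return [r for r in rows
            if max(r["capacity_pct"], r["inode_pct"]) >= threshold_pct]


# The two data rows reported in this comment.
SAMPLE = """\
Filesystem      Size   Used  Avail Capacity  iused    ifree %iused  Mounted on
/dev/disk0s2   465Gi   90Gi  375Gi    20% 23676122 98252107   19%   /
/dev/disk1     465Gi   91Gi  373Gi    20% 23982741 97856873   20%   /
"""
```

For the two bots above, `nearly_full(parse_df(SAMPLE))` returns an empty list, which matches the conclusion that neither disk is full. Rows whose filesystem name contains spaces would need extra handling, and pseudo-filesystems may need to be filtered out before applying the threshold.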

Comment 3 by kbr@chromium.org, Jun 1 2016

Cc: -mar...@chromium.org ericrk@chromium.org
Components: -Infra>Platform>Swarming
Hmm. Thanks for checking, Bryce. The warning from important_file_writer.cc must be spurious. It's probably because the isolate's file system is read-only.

I'll continue digging and remove the Troopers and Infra labels in the next edit.

Comment 4 by kbr@chromium.org, Jun 1 2016

Cc: -pschmidt@chromium.org
Components: -Infra
Labels: -Infra-Troopers Hotlist-PixelWrangler

Comment 5 by kbr@chromium.org, Jun 1 2016

Summary: webgl2_conformance_tests failing randomly on Mac Retinas with AMD GPU (was: Disk apparently full on build53-b1 and build486-m4)
The first test failure in https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/1089/steps/webgl2_conformance_tests%20on%20ATI%20GPU%20on%20Mac%20Retina%20%28with%20patch%29%20on%20Mac-10.10/logs/stdio is as follows:

[ RUN      ] WebglConformance.conformance2_textures_image_bitmap_from_video_tex_3d_r11f_g11f_b10f_rgb_unsigned_int_10f_11f_11f_rev
(INFO) 2016-06-01 13:56:52,169 cache_temperature.EnsurePageCacheTemperature:55  PageCacheTemperature: any
[9138:1299:0601/135652:WARNING:webmediaplayer_impl.cc(345)] Using MultibufferDataSource
Traceback (most recent call last):
  File "/b/swarm_slave/work/isolated/run3kxECH/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 84, in _RunStoryAndProcessErrorIfNeeded
    state.RunStory(results)
  File "/b/swarm_slave/work/isolated/run3kxECH/content/test/gpu/gpu_tests/gpu_test_base.py", line 122, in RunStory
    RunStoryWithRetries(DesktopGpuSharedPageState, self, results)
  File "/b/swarm_slave/work/isolated/run3kxECH/content/test/gpu/gpu_tests/gpu_test_base.py", line 72, in RunStoryWithRetries
    super(cls, shared_page_state).RunStory(results)
  File "/b/swarm_slave/work/isolated/run3kxECH/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 304, in RunStory
    self._current_page.Run(self)
  File "/b/swarm_slave/work/isolated/run3kxECH/third_party/catapult/telemetry/telemetry/page/__init__.py", line 95, in Run
    shared_state.page_test.RunNavigateSteps(self, current_tab)
  File "/b/swarm_slave/work/isolated/run3kxECH/third_party/catapult/telemetry/telemetry/page/legacy_page_test.py", line 191, in RunNavigateSteps
    page.RunNavigateSteps(action_runner)
  File "/b/swarm_slave/work/isolated/run3kxECH/content/test/gpu/gpu_tests/webgl_conformance.py", line 192, in RunNavigateSteps
    'webglTestHarness._finished', timeout_in_seconds=300)
  File "/b/swarm_slave/work/isolated/run3kxECH/third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py", line 186, in WaitForJavaScriptCondition
    self._tab.WaitForJavaScriptExpression(condition, timeout_in_seconds)
  File "/b/swarm_slave/work/isolated/run3kxECH/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 136, in WaitForJavaScriptExpression
    e.message + '\n' + debug_message)
TimeoutException: Timed out while waiting 300s for IsJavaScriptExpressionTrue.
Console output:


[  FAILED  ] WebglConformance.conformance2_textures_image_bitmap_from_video_tex_3d_r11f_g11f_b10f_rgb_unsigned_int_10f_11f_11f_rev (313767 ms)


All subsequent tests also failed, until the tryjob itself failed after an hour.

Mo, your laptop is the same model as these -- are you seeing this kind of failure?
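
For triaging logs like the one above, where one failure is followed by a run of further failures, here is a rough sketch that finds the first failing test in a gtest-style log and reports whether everything after it failed too. It assumes result lines of the form `[  FAILED  ] Name (N ms)` or `[       OK ] Name (N ms)`, as seen in this stdio log. The function names `parse_results` and `first_failure` are hypothetical and not part of Telemetry or the GPU test harness.

```python
import re

# Matches per-test result lines such as:
#   [  FAILED  ] WebglConformance.some_test (313767 ms)
#   [       OK ] WebglConformance.other_test (1200 ms)
# Requiring the "(N ms)" suffix skips "[ RUN ]" lines and summary lines.
RESULT_RE = re.compile(r"^\[\s*(OK|FAILED)\s*\]\s+(\S+)\s+\((\d+) ms\)")


def parse_results(log_text):
    """Return (test_name, passed, duration_ms) tuples in log order."""
    results = []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            status, name, ms = match.groups()
            results.append((name, status == "OK", int(ms)))
    return results


def first_failure(results):
    """Describe the first failing test, or return None if all passed.

    "cascade" is True when at least one test ran afterwards and every
    one of those later tests failed as well.
    """
    for index, (name, passed, ms) in enumerate(results):
        if not passed:
            later = results[index + 1:]
            cascade = bool(later) and all(not ok for _, ok, _ in later)
            return {"name": name, "duration_ms": ms, "cascade": cascade}
    return None
```

Running `first_failure(parse_results(log_text))` on a saved stdio log gives the name and duration of the first failure. A duration just over 300000 ms, as in the excerpt above, is consistent with the 300-second `WaitForJavaScriptCondition` timeout in the traceback.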

Comment 6 by kbr@chromium.org, Jun 2 2016

Blocking: 615548

Comment 7 by kbr@chromium.org, Jun 2 2016

Blockedon: 615044
Cc: vmi...@chromium.org
Summary: GPU tests failing randomly on Mac Retinas with AMD GPU (was: webgl2_conformance_tests failing randomly on Mac Retinas with AMD GPU)
It looks like many tests are failing randomly on these machines now. I don't know whether there's a pattern to the failures -- i.e., whether they're happening on specific machines.

The symptom seems to be that the browser hangs during launch. This is really serious.

Browser hangs on startup are seen on other platforms too; see Issue 615044. I have a feeling they're all related, so I am blocking this on the other bug. The Linux Intel bots on the chromium.gpu.fyi waterfall seem to fail one test on nearly every run with this symptom, so I think taking one of them offline and debugging directly on it is the best way to proceed.

Comment 8 by kbr@chromium.org, Jun 2 2016

Labels: -Restrict-View-Google
Unrestricting access.

Blocking: -615548
Blockedon: -615044 614394
I suspect the root cause is screenshot capture: Issue 614394

Issue 615044 and older reports of hangs on browser start are likely unrelated.

Comment 11 by kbr@chromium.org, Jun 7 2016

Mergedinto: 599776
Status: Duplicate (was: Untriaged)
Duplicating all screenshot-related timeouts on the AMD-based Retina MacBook Pros into Issue 599776. The root cause is known and a fix / workaround is underway.
