
Issue 648369

Starred by 2 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug

Blocked on:
issue 352807




screenshot_sync_tests are flaky on Linux Debug (New Intel)

Project Member Reported by ynovikov@chromium.org, Sep 19 2016

Issue description

https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4004
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4003
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/3992
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/3988

Earliest I see is https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/3826
Earliest with log is https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/3875

A different test fails each time, for example:
GPURasterWithDivs
GPURasterWithCanvas
SWRasterWithCanvas
SWRasterWithDivs

I think the browser fails to start, for example from https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4004/steps/screenshot_sync_tests/logs/stdio:
[ RUN      ] ScreenshotSync.GPURasterWithCanvas
(INFO) 2016-09-19 11:11:03,274 desktop_browser_backend.GetBrowserStartupArgs:250  Requested remote debugging port: 0
(INFO) 2016-09-19 11:11:03,274 desktop_browser_backend.Start:285  Starting Chrome ['/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/out/Debug/chrome', '--js-flags=--expose-gc', '--enable-logging=stderr', '--force-gpu-rasterization', '--test-type=gpu', '--enable-net-benchmarking', '--metrics-recording-only', '--no-default-browser-check', '--no-first-run', '--enable-gpu-benchmarking', '--disable-background-networking', '--proxy-server=socks://localhost:41736', '--ignore-certificate-errors', '--disable-component-extensions-with-background-pages', '--disable-default-apps', '--remote-debugging-port=0', '--enable-crash-reporter-for-testing', '--window-size=1280,1024', '--user-data-dir=/b/c/b/Linux_Debug__New_Intel_/itON9j5R/tmpUkbj6D', 'about:blank']
[1:1:0919/111103:ERROR:memory_mapped_file.cc(52)] Couldn't open /b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/out/Debug/chrome_200_percent.pak
[1:1:0919/111103:ERROR:data_pack.cc(79)] Failed to mmap datapack
[21977:21977:0919/111104:ERROR:memory_mapped_file.cc(52)] Couldn't open /b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/out/Debug/chrome_200_percent.pak
[21977:21977:0919/111104:ERROR:data_pack.cc(79)] Failed to mmap datapack
[21977:21977:0919/111104:WARNING:persistent_histogram_allocator.cc(485)] Creating the results-histogram inside persistent memory can cause future allocations to crash if that memory is ever released (for testing).
[21977:21977:0919/111104:WARNING:password_store_factory.cc(248)] Using basic (unencrypted) store for password storage. See https://chromium.googlesource.com/chromium/src/+/master/docs/linux_password_storage.md for more information about password storage options.
(INFO) 2016-09-19 11:11:04,746 desktop_browser_backend.HasBrowserFinishedLaunching:237  Discovered ephemeral port 44131
[22060:22067:0919/111105:WARNING:persistent_histogram_allocator.cc(485)] Creating the results-histogram inside persistent memory can cause future allocations to crash if that memory is ever released (for testing).
(INFO) 2016-09-19 11:11:15,906 desktop_browser_backend.HasBrowserFinishedLaunching:237  Discovered ephemeral port 44131
(INFO) 2016-09-19 11:11:28,183 desktop_browser_backend.HasBrowserFinishedLaunching:237  Discovered ephemeral port 44131
(INFO) 2016-09-19 11:11:41,689 desktop_browser_backend.HasBrowserFinishedLaunching:237  Discovered ephemeral port 44131
(INFO) 2016-09-19 11:11:56,548 desktop_browser_backend.HasBrowserFinishedLaunching:237  Discovered ephemeral port 44131
(WARNING) 2016-09-19 11:12:06,559 desktop_browser_backend._GetAllCrashpadMinidumps:349  No path to crashpad_database_util found
(INFO) 2016-09-19 11:12:06,560 desktop_browser_backend._GetMostRecentMinidump:417  No minidump found via crashpad_database_util
(WARNING) 2016-09-19 11:12:06,560 desktop_browser_backend._GetAllCrashpadMinidumps:349  No path to crashpad_database_util found
(INFO) 2016-09-19 11:12:06,561 desktop_browser_backend._GetMostRecentMinidump:417  No minidump found via crashpad_database_util
Can't get standard output with --show-stdout
(WARNING) 2016-09-19 11:12:11,643 desktop_browser_backend.Close:589  Failed to gracefully shutdown.
(WARNING) 2016-09-19 11:12:11,643 desktop_browser_backend.Close:593  Proceed to kill the browser.
(ERROR) 2016-09-19 11:12:11,645 browser.__init__:62  Failure while starting browser backend.
Traceback (most recent call last):
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
    self._browser_backend.Start()
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py", line 294, in Start
    self._WaitForBrowserToComeUp()
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 161, in _WaitForBrowserToComeUp
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: Timed out while waiting 60s for HasBrowserFinishedLaunching.
Found Minidump: False
Stack Trace:
********************************************************************************
	No crash dump found.
********************************************************************************
Standard output:
********************************************************************************
********************************************************************************
(WARNING) 2016-09-19 11:12:11,646 shared_page_state.DumpStateUponFailure:142  Cannot dump browser state: No browser.
(WARNING) 2016-09-19 11:12:11,646 shared_page_state.DumpStateUponFailure:150  Taking screenshots upon failures disabled.
Traceback (most recent call last):
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 79, in _RunStoryAndProcessErrorIfNeeded
    state.WillRunStory(story)
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 224, in WillRunStory
    self._StartBrowser(page)
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 184, in _StartBrowser
    self._browser = self._possible_browser.Create(self._finder_options)
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_finder.py", line 68, in Create
    browser_backend, self._platform_backend, self._credentials_path)
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
    self._browser_backend.Start()
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py", line 294, in Start
    self._WaitForBrowserToComeUp()
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 161, in _WaitForBrowserToComeUp
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: Timed out while waiting 60s for HasBrowserFinishedLaunching.
Found Minidump: False
Stack Trace:
********************************************************************************
	No crash dump found.
********************************************************************************
Standard output:
********************************************************************************
********************************************************************************

[  FAILED  ] ScreenshotSync.GPURasterWithCanvas (68375 ms)


Possibly related  crbug.com/626987 
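
For readers unfamiliar with the failure mode: the "Timed out while waiting 60s for HasBrowserFinishedLaunching" message in the log is the kind of error a poll-until-ready loop produces. Below is a minimal sketch of that pattern. It is not Telemetry's actual code; `wait_for`, `WaitTimeout` and the parameter names are hypothetical and only illustrate how a slow browser start turns into a timeout rather than a crash dump.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes true (hypothetical name)."""


def wait_for(condition, timeout_s, poll_interval_s=0.1):
    """Poll `condition` until it returns a truthy value or `timeout_s` elapses.

    Returns the truthy value on success, raises WaitTimeout otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            name = getattr(condition, "__name__", repr(condition))
            raise WaitTimeout(
                "Timed out while waiting %ds for %s." % (timeout_s, name))
        time.sleep(poll_interval_s)
```

With a loop like this, a browser that is merely slow (for example a Debug build on a spinning disk) and a browser that never opens its window are indistinguishable to the caller: both end in the same timeout with no minidump.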
 

Comment 1 by kbr@chromium.org, Sep 19 2016

Blockedon: 352807
Components: Internals>GPU>Testing
We need to switch this test over to the new test harness in  Issue 352807 . Then browser launching will immediately become more reliable.

I think pixel_test on the same bot suffers from the same issue.
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4026
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4023
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4008

Also trace_test, gpu_process_launch_tests, gpu_rasterization_tests and hardware_accelerated_feature_tests.
I didn't count, but it seems this bot is red half of the time.

Comment 3 by kbr@chromium.org, Sep 20 2016

Cc: yunchao...@intel.com jo...@chromium.org pschmidt@chromium.org qiankun....@intel.com
Components: Infra>Labs
The logs indicate the browser is failing to launch.

I'm not sure whether the machine is so slow that Debug builds can't be run effectively on it, or whether something is actually wrong with the browser.

Would appreciate feedback from the Labs team indicating whether this one-off machine configuration is expected to be somewhat slow, and from the Intel folks whether decommissioning it and focusing solely on Release build testing for the Intel GPU is reasonable. If this bot's red half the time it's useless.

I think it's a pretty recent (6 weeks) regression. I don't have notes on this bot being this flaky in my previous ANGLE wrangling rotations.

Comment 5 by jo...@chromium.org, Sep 21 2016

Specs for this bot:
Intel Core i5-4590T Processor (Quad Core, 6MB, 2.00GHz w/HD4600 Graphics)
16GB (2x8GB) 1600MHz DDR3L
500GB 2.5inch Serial ATA (7,200 RPM) Hard Drive

Comment 6 by pschm...@google.com, Sep 21 2016

Does it make sense to switch the underlying hosts between Linux Release and Linux Debug to see if the problem follows the host?   

Comment 7 by kbr@chromium.org, Sep 21 2016

Yuly owns the chromium.gpu.fyi waterfall this week so let's let him make the call on that.

Maybe it's a race condition in the OpenGL driver (these machines are running Mesa 11.2.0, not the top-of-tree 12.0.x series) that occasionally prevents the browser's window from opening correctly. Qiankun, Yunchao, do you think that hypothesis is possible?

Re #6 - if this will help narrow down the cause of the problem, then it does make sense. Is it easy to do?

Comment 9 by kbr@chromium.org, Sep 21 2016

pschmidt@: should I switch the entries in tools/build/masters/master.chromium.gpu.fyi/slaves.cfg ? Or did you have a different way to do the host swap?

Make the change in the slaves.cfg and I'll restart the slaves. Note that this also effectively does a clobber (which in this case should have nothing to do with this problem?)

Comment 11 by kbr@chromium.org, Sep 22 2016

pschmidt@: please feel free to restart the chromium.gpu.fyi waterfall once https://codereview.chromium.org/2359753004 lands.

The build directories for the two slaves will switch anyway, so yes, it'll be a clobber build -- no problem.

Project Member

Comment 12 by bugdroid1@chromium.org, Sep 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/87d90e635544e95fcdd88919ed81d29858a6c802

commit 87d90e635544e95fcdd88919ed81d29858a6c802
Author: kbr <kbr@chromium.org>
Date: Thu Sep 22 01:59:58 2016

Switch the physical slaves for the Linux Intel Release/Debug bots.

This is to help diagnose whether failures are machine-specific.

BUG= 648369 
TBR=zmo@chromium.org

Review-Url: https://codereview.chromium.org/2359753004

[modify] https://crrev.com/87d90e635544e95fcdd88919ed81d29858a6c802/masters/master.chromium.gpu.fyi/slaves.cfg

Cc: yang...@intel.com
Project Member

Comment 14 by bugdroid1@chromium.org, Sep 22 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager.git/+/c733517daad5578ef8695bad8d63147e16930488

commit c733517daad5578ef8695bad8d63147e16930488
Author: pschmidt <pschmidt@google.com>
Date: Thu Sep 22 14:40:00 2016

You probably know this but the slave switch is in effect.

Comment 16 by kbr@chromium.org, Sep 22 2016

Owner: ynovikov@chromium.org
Status: Assigned (was: Untriaged)
Thanks Peter. Yuly, could you please own the task of watching these two bots and seeing if the unreliability has switched hosts?

Owner: ----
Status: Available (was: Assigned)
I think we have ruled out hardware being a possible cause of the problem.
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4063
Any other ideas?
Cc: kbr@chromium.org
Maybe increase the timeout for debug builds to 120 sec?

Comment 19 by kbr@chromium.org, Sep 22 2016

I think we need to get these tests cut over to the new harness before trying to change timeouts. Those timeouts are buried deep in Telemetry's internals and are not easy to change.

I'll try to mark them all flaky.
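
As a rough illustration of what marking a test flaky amounts to: the runner retries a failing test a bounded number of times and only reports failure if every attempt fails. The sketch below assumes that behaviour; `run_with_retries` and `max_num_retries` are hypothetical names, not the GPU test harness's API.

```python
def run_with_retries(test_fn, max_num_retries=2):
    """Run `test_fn`; if it raises, retry up to `max_num_retries` more times.

    Returns a (passed, attempts) tuple.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            test_fn()
            return True, attempts
        except Exception:
            # Give up once the initial run plus all retries have failed.
            if attempts > max_num_retries:
                return False, attempts
```

A retry only helps if the failing step (here, launching the browser) is repeated on each attempt.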

Project Member

Comment 20 by bugdroid1@chromium.org, Sep 23 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/699db621b922b4c7b0eafa6abe63f7598c08f3de

commit 699db621b922b4c7b0eafa6abe63f7598c08f3de
Author: kbr <kbr@chromium.org>
Date: Fri Sep 23 00:53:51 2016

Mark all the pixel-related tests flaky on Linux Intel Debug.

I'm not optimistic that this will work with the current test harness,
but it's worth a try. These tests need to be cut over to the new
harness.

BUG= 648369 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=zmo@chromium.org

Review-Url: https://codereview.chromium.org/2359373002
Cr-Commit-Position: refs/heads/master@{#420512}

[modify] https://crrev.com/699db621b922b4c7b0eafa6abe63f7598c08f3de/content/test/gpu/gpu_tests/pixel_expectations.py
[modify] https://crrev.com/699db621b922b4c7b0eafa6abe63f7598c08f3de/content/test/gpu/gpu_tests/screenshot_sync_expectations.py
[modify] https://crrev.com/699db621b922b4c7b0eafa6abe63f7598c08f3de/content/test/gpu/gpu_tests/trace_test_expectations.py

Could be related to crbug.com/649904

Comment 23 by kbr@chromium.org, Sep 24 2016

Issue 649904 is Windows-only; definitely not related.

Comment 24 by kbr@chromium.org, Oct 24 2016

At this point, the Linux Release (New Intel) bot is demonstrating flakiness:
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29?numbuilds=200

while the Linux Debug (New Intel) bot looks pretty good (a few flakes, but mostly green):
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29?numbuilds=200

Could the problem be bad hardware after all? Should we ask the Labs team to replace the Linux Release (New Intel) host?

I took a look at the console of both of these and there is something up with the dcim connection on build73-b1.

Let me get that fixed up first.
The flakiness on Linux Release (New Intel) is hard to see now, because of WebGL failures (going to suppress those soon, so it will be easier to see the other errors). Here is what I found:
Oct 07 13:32 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3103 - trace_test.WebGLGreenTriangle.AA.Alpha
Oct 09 11:21 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3114 - GpuProcess.driver_bug_workarounds_upon_gl_renderer
Oct 11 07:56 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3125 - GpuProcess.equal_bug_workarounds_in_browser_and_gpu_process
Oct 12 05:18 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3131 - HardwareAcceleratedFeature.canvas_accelerated
Oct 14 06:21 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3143 - GpuProcess.driver_bug_workarounds_upon_gl_renderer
Oct 16 03:24 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3154 - GpuProcess.software_gpu_process
Oct 17 15:41 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3163 - trace_test.SolidColorBackground
Oct 19 04:15 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3172 - trace_test.2DCanvasWebGL
Oct 19 20:29 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3176 - GpuProcess.identify_active_gpu4, trace_test.OffscreenCanvasWebGLGreenBox
Oct 20 00:34 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3177 - trace_test.CSS3DBlueBox
Oct 22 09:38 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3191 - GpuProcess.driver_bug_workarounds_in_gpu_process
Oct 23 11:04 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3194 - trace_test.WebGLGreenTriangle.AA.NoAlpha

Comparing Linux Debug (New Intel) to that, we have:
Oct 13 17:54 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4435 - GpuProcess.identify_active_gpu1
Oct 14 13:12 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4450 - GpuProcess.identify_active_gpu3
Oct 17 09:03 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4499 - trace_test.OffscreenCanvasAccelerated2DWorker
Oct 18 10:41 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4519 - trace_test.OffscreenCanvasAccelerated2DWorker
Oct 19 19:11 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4543 - trace_test.OffscreenCanvasTransferToImageBitmapWorker
Oct 20 10:23 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4555 - ScreenshotSync.SWRasterWithCanvas
Oct 24 12:35 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4608 - ScreenshotSync.SWRasterWithDivs

I'd say there is more flakiness on Release build-wise, but about the same flakiness time-wise - 8 compared to 7 from Oct 13th to 24th. Strangely, Debug gets more builds than Release, so I can't see Debug history before Oct 12th.
Also, the Release and Debug flakes don't match.
The failure looks similar on both Release and Debug, though: "Timed out while waiting 60s for HasBrowserFinishedLaunching". Maybe we need faster CPU + SSD :)
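
The 8-versus-7 comparison can be reproduced from the two lists above. A small sketch, using only the failure dates listed in this comment (the helper name `count_in_window` is made up for illustration):

```python
from datetime import date

# Dates of the flaky builds listed above (all October 2016).
release_days = [7, 9, 11, 12, 14, 16, 17, 19, 19, 20, 22, 23]
debug_days = [13, 14, 17, 18, 19, 20, 24]

release_failures = [date(2016, 10, d) for d in release_days]
debug_failures = [date(2016, 10, d) for d in debug_days]


def count_in_window(failures, start, end):
    """Count failures whose date falls within [start, end], inclusive."""
    return sum(1 for d in failures if start <= d <= end)


start, end = date(2016, 10, 13), date(2016, 10, 24)
print(count_in_window(release_failures, start, end))  # 8
print(count_in_window(debug_failures, start, end))    # 7
```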
The Windows and Linux "New Intel" bots are Dell Optiplex 9020 desktops with an i5-4590T CPU (HD 4600 integrated graphics).

Should we replace these with something faster?

Comment 28 by kbr@chromium.org, Oct 27 2016

It'd be great to qualify new hardware -- would it be possible to bring it up side-by-side with the current hardware?

Now that you mention it (comment #27), I think all Windows and Linux "New Intel" bots are flaky. See issue 653541, issue 659810, issue 649904.
Maybe the GPU overheats from our tests and in turn heats the CPU, and then the whole system is thermally throttled and starts to flake? Can we get some throttling diagnostics from these machines?
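
On the Linux bots, one way such diagnostics could be gathered is from sysfs. The sketch below is an assumption about how to collect them, not something that was run on these machines: it reads the thermal zone temperatures (`/sys/class/thermal/thermal_zone*/temp`, in millidegrees Celsius) and, where the kernel exposes them on Intel CPUs, the per-core throttle counters (`thermal_throttle/core_throttle_count`). The function names and the `sysfs_root` parameter are hypothetical; the parameter exists only so the code can be pointed at a test directory.

```python
import glob
import os


def read_int(path):
    """Return the integer stored in `path`, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def thermal_snapshot(sysfs_root="/sys"):
    """Collect zone temperatures (deg C) and per-CPU throttle counts."""
    zone_temps_c = {}
    zone_glob = os.path.join(sysfs_root, "class", "thermal", "thermal_zone*")
    for zone_dir in sorted(glob.glob(zone_glob)):
        millideg = read_int(os.path.join(zone_dir, "temp"))
        if millideg is not None:
            zone_temps_c[os.path.basename(zone_dir)] = millideg / 1000.0

    throttle_counts = {}
    count_glob = os.path.join(
        sysfs_root, "devices", "system", "cpu", "cpu[0-9]*",
        "thermal_throttle", "core_throttle_count")
    for path in sorted(glob.glob(count_glob)):
        count = read_int(path)
        if count is not None:
            # .../cpu/<cpuN>/thermal_throttle/core_throttle_count
            throttle_counts[path.split(os.sep)[-3]] = count

    return {"zone_temps_c": zone_temps_c,
            "core_throttle_counts": throttle_counts}
```

Taking a snapshot before and after a test step and comparing the throttle counts would show whether throttling events occurred during that step.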

Comment 30 by kbr@chromium.org, Apr 27 2017

Status: WontFix (was: Available)
These machines don't exist any more.
