screenshot_sync_tests are flaky on Linux Debug (New Intel)
Issue description
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4004
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4003
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/3992
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/3988
Earliest I see is https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/3826
Earliest with log is https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/3875
A different test fails each time, for example:
GPURasterWithDivs
GPURasterWithCanvas
SWRasterWithCanvas
SWRasterWithDivs
I think the browser fails to start, for example from https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4004/steps/screenshot_sync_tests/logs/stdio:
[ RUN ] ScreenshotSync.GPURasterWithCanvas
(INFO) 2016-09-19 11:11:03,274 desktop_browser_backend.GetBrowserStartupArgs:250 Requested remote debugging port: 0
(INFO) 2016-09-19 11:11:03,274 desktop_browser_backend.Start:285 Starting Chrome ['/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/out/Debug/chrome', '--js-flags=--expose-gc', '--enable-logging=stderr', '--force-gpu-rasterization', '--test-type=gpu', '--enable-net-benchmarking', '--metrics-recording-only', '--no-default-browser-check', '--no-first-run', '--enable-gpu-benchmarking', '--disable-background-networking', '--proxy-server=socks://localhost:41736', '--ignore-certificate-errors', '--disable-component-extensions-with-background-pages', '--disable-default-apps', '--remote-debugging-port=0', '--enable-crash-reporter-for-testing', '--window-size=1280,1024', '--user-data-dir=/b/c/b/Linux_Debug__New_Intel_/itON9j5R/tmpUkbj6D', 'about:blank']
[1:1:0919/111103:ERROR:memory_mapped_file.cc(52)] Couldn't open 
/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/out/Debug/chrome_200_percent.pak [1:1:0919/111103:ERROR:data_pack.cc(79)] Failed to mmap datapack [21977:21977:0919/111104:ERROR:memory_mapped_file.cc(52)] Couldn't open /b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/out/Debug/chrome_200_percent.pak [21977:21977:0919/111104:ERROR:data_pack.cc(79)] Failed to mmap datapack [21977:21977:0919/111104:WARNING:persistent_histogram_allocator.cc(485)] Creating the results-histogram inside persistent memory can cause future allocations to crash if that memory is ever released (for testing). [21977:21977:0919/111104:WARNING:password_store_factory.cc(248)] Using basic (unencrypted) store for password storage. See https://chromium.googlesource.com/chromium/src/+/master/docs/linux_password_storage.md for more information about password storage options. (INFO) 2016-09-19 11:11:04,746 desktop_browser_backend.HasBrowserFinishedLaunching:237 Discovered ephemeral port 44131 [22060:22067:0919/111105:WARNING:persistent_histogram_allocator.cc(485)] Creating the results-histogram inside persistent memory can cause future allocations to crash if that memory is ever released (for testing). 
(INFO) 2016-09-19 11:11:15,906 desktop_browser_backend.HasBrowserFinishedLaunching:237 Discovered ephemeral port 44131 (INFO) 2016-09-19 11:11:28,183 desktop_browser_backend.HasBrowserFinishedLaunching:237 Discovered ephemeral port 44131 (INFO) 2016-09-19 11:11:41,689 desktop_browser_backend.HasBrowserFinishedLaunching:237 Discovered ephemeral port 44131 (INFO) 2016-09-19 11:11:56,548 desktop_browser_backend.HasBrowserFinishedLaunching:237 Discovered ephemeral port 44131 (WARNING) 2016-09-19 11:12:06,559 desktop_browser_backend._GetAllCrashpadMinidumps:349 No path to crashpad_database_util found (INFO) 2016-09-19 11:12:06,560 desktop_browser_backend._GetMostRecentMinidump:417 No minidump found via crashpad_database_util (WARNING) 2016-09-19 11:12:06,560 desktop_browser_backend._GetAllCrashpadMinidumps:349 No path to crashpad_database_util found (INFO) 2016-09-19 11:12:06,561 desktop_browser_backend._GetMostRecentMinidump:417 No minidump found via crashpad_database_util Can't get standard output with --show-stdout (WARNING) 2016-09-19 11:12:11,643 desktop_browser_backend.Close:589 Failed to gracefully shutdown. (WARNING) 2016-09-19 11:12:11,643 desktop_browser_backend.Close:593 Proceed to kill the browser. (ERROR) 2016-09-19 11:12:11,645 browser.__init__:62 Failure while starting browser backend. 
Traceback (most recent call last): File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__ self._browser_backend.Start() File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py", line 294, in Start self._WaitForBrowserToComeUp() File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 161, in _WaitForBrowserToComeUp raise exceptions.BrowserConnectionGoneException(self.browser, e) BrowserConnectionGoneException: Timed out while waiting 60s for HasBrowserFinishedLaunching. Found Minidump: False Stack Trace: ******************************************************************************** No crash dump found. ******************************************************************************** Standard output: ******************************************************************************** ******************************************************************************** (WARNING) 2016-09-19 11:12:11,646 shared_page_state.DumpStateUponFailure:142 Cannot dump browser state: No browser. (WARNING) 2016-09-19 11:12:11,646 shared_page_state.DumpStateUponFailure:150 Taking screenshots upon failures disabled. 
Traceback (most recent call last): File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 79, in _RunStoryAndProcessErrorIfNeeded state.WillRunStory(story) File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 224, in WillRunStory self._StartBrowser(page) File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 184, in _StartBrowser self._browser = self._possible_browser.Create(self._finder_options) File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_finder.py", line 68, in Create browser_backend, self._platform_backend, self._credentials_path) File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__ self._browser_backend.Start() File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py", line 294, in Start self._WaitForBrowserToComeUp() File "/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File 
"/b/c/b/Linux_Debug__New_Intel_/irg0Y9qS/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 161, in _WaitForBrowserToComeUp raise exceptions.BrowserConnectionGoneException(self.browser, e) BrowserConnectionGoneException: Timed out while waiting 60s for HasBrowserFinishedLaunching. Found Minidump: False Stack Trace: ******************************************************************************** No crash dump found. ******************************************************************************** Standard output: ******************************************************************************** ******************************************************************************** [ FAILED ] ScreenshotSync.GPURasterWithCanvas (68375 ms) Possibly related crbug.com/626987
,
Sep 20 2016
I think pixel_test on the same bot suffers from the same issue: https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4026 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4023 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4008 The same goes for trace_test, gpu_process_launch_tests, gpu_rasterization_tests and hardware_accelerated_feature_tests. I didn't count, but it seems this bot is red half of the time.
,
Sep 20 2016
The logs indicate the browser is failing to launch. I'm not sure whether the machine is so slow that Debug builds can't be run effectively on it, or whether something is actually wrong with the browser. Would appreciate feedback from the Labs team indicating whether this one-off machine configuration is expected to be somewhat slow, and from the Intel folks whether decommissioning it and focusing solely on Release build testing for the Intel GPU is reasonable. If this bot's red half the time it's useless.
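For triage, the startup failures in a stdio log like the one quoted in the issue description could be picked out mechanically. The following is only a rough sketch: the function name and regexes are made up for illustration, and it assumes the raw log has one entry per line with the `[ RUN ]` / `[ FAILED ]` markers and the timeout message exactly as they appear in the excerpt.

```python
import re

# Line shapes assumed from the log excerpt in the issue description.
RUN_RE = re.compile(r"\[\s*RUN\s*\]\s+(\S+)")
FAILED_RE = re.compile(r"\[\s*FAILED\s*\]\s+(\S+)")
TIMEOUT_MARKER = "Timed out while waiting 60s for HasBrowserFinishedLaunching"


def find_startup_timeouts(log_text):
    """Return names of failed tests whose log section contains the startup timeout.

    Hypothetical helper, not part of Telemetry or the GPU test harness.
    """
    current = None        # test named by the most recent [ RUN ] line
    saw_timeout = False   # whether the timeout message appeared for that test
    hits = []
    for line in log_text.splitlines():
        run = RUN_RE.search(line)
        if run:
            current = run.group(1)
            saw_timeout = False
            continue
        if TIMEOUT_MARKER in line:
            saw_timeout = True
        failed = FAILED_RE.search(line)
        if failed and failed.group(1) == current and saw_timeout:
            hits.append(current)
    return hits
```

Running this over the logs of several red builds would show whether every failure on the bot is the same browser-launch timeout or whether some tests fail for other reasons.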
,
Sep 21 2016
I think it's a pretty recent (6 weeks) regression. I don't have notes on this bot being this flaky in my previous ANGLE wrangling rotations.
,
Sep 21 2016
Specs for this bot:
Intel Core i5-4590T Processor (Quad Core, 6MB, 2.00GHz w/HD4600 Graphics)
16GB (2x8GB) 1600MHz DDR3L
500GB 2.5inch Serial ATA (7,200 RPM) Hard Drive
,
Sep 21 2016
Does it make sense to switch the underlying hosts between Linux Release and Linux Debug to see if the problem follows the host?
,
Sep 21 2016
Yuly owns the chromium.gpu.fyi waterfall this week so let's let him make the call on that. Maybe it's a race condition in the OpenGL driver (these machines are running Mesa 11.2.0, not the top-of-tree 12.0.x series) that occasionally prevents the browser's window from opening correctly. Qiankun, Yunchao, do you think that hypothesis is possible?
,
Sep 21 2016
Re #6 - if this will help narrow down the cause of the problem, then it does make sense. Is it easy to do?
,
Sep 21 2016
pschmidt@: should I switch the entries in tools/build/masters/master.chromium.gpu.fyi/slaves.cfg ? Or did you have a different way to do the host swap?
,
Sep 21 2016
Make the change in the slaves.cfg and I'll restart the slaves. Note that this effectively does a clobber as well (which in this case should have nothing to do with this problem?)
,
Sep 22 2016
pschmidt@: please feel free to restart the chromium.gpu.fyi waterfall once https://codereview.chromium.org/2359753004 lands. The build directories for the two slaves will switch anyway, so yes, it'll be a clobber build -- no problem.
,
Sep 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build.git/+/87d90e635544e95fcdd88919ed81d29858a6c802 commit 87d90e635544e95fcdd88919ed81d29858a6c802 Author: kbr <kbr@chromium.org> Date: Thu Sep 22 01:59:58 2016 Switch the physical slaves for the Linux Intel Release/Debug bots. This is to help diagnose whether failures are machine-specific. BUG= 648369 TBR=zmo@chromium.org Review-Url: https://codereview.chromium.org/2359753004 [modify] https://crrev.com/87d90e635544e95fcdd88919ed81d29858a6c802/masters/master.chromium.gpu.fyi/slaves.cfg
,
Sep 22 2016
,
Sep 22 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/infradata/master-manager.git/+/c733517daad5578ef8695bad8d63147e16930488 commit c733517daad5578ef8695bad8d63147e16930488 Author: pschmidt <pschmidt@google.com> Date: Thu Sep 22 14:40:00 2016
,
Sep 22 2016
You probably know this but the slave switch is in effect.
,
Sep 22 2016
Thanks Peter. Yuly, could you please own the task of watching these two bots and seeing if the unreliability has switched hosts?
,
Sep 22 2016
I think we have ruled out hardware being a possible cause of the problem. https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4063 Any other ideas?
,
Sep 22 2016
Maybe increase the timeout for debug builds to 120 sec?
,
Sep 22 2016
I think we need to get these tests cut over to the new harness before trying to change timeouts. Those timeouts are buried deep in Telemetry's internals and are not easy to change. I'll try to mark them all flaky.
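For reference, the idea of a longer startup timeout for Debug builds amounts to something like the sketch below. This is not Telemetry's code: `wait_for` and `startup_timeout_s` are hypothetical names, and the only values taken from this thread are the 60 s timeout seen in the log and the proposed 120 s.

```python
import time


def wait_for(condition, timeout_s, poll_interval_s=0.5,
             clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns True or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


def startup_timeout_s(is_debug_build):
    """Pick a browser startup timeout (hypothetical policy).

    60 s is the timeout reported in the failing logs; 120 s is the value
    suggested for Debug builds in this thread.
    """
    return 120 if is_debug_build else 60
```

A caller would use it roughly as `wait_for(browser_has_launched, startup_timeout_s(is_debug_build=True))`, where `browser_has_launched` is a placeholder for whatever check the harness performs.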
,
Sep 23 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/699db621b922b4c7b0eafa6abe63f7598c08f3de commit 699db621b922b4c7b0eafa6abe63f7598c08f3de Author: kbr <kbr@chromium.org> Date: Fri Sep 23 00:53:51 2016 Mark all the pixel-related tests flaky on Linux Intel Debug. I'm not optimistic that this will work with the current test harness, but it's worth a try. These tests need to be cut over to the new harness. BUG= 648369 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel TBR=zmo@chromium.org Review-Url: https://codereview.chromium.org/2359373002 Cr-Commit-Position: refs/heads/master@{#420512} [modify] https://crrev.com/699db621b922b4c7b0eafa6abe63f7598c08f3de/content/test/gpu/gpu_tests/pixel_expectations.py [modify] https://crrev.com/699db621b922b4c7b0eafa6abe63f7598c08f3de/content/test/gpu/gpu_tests/screenshot_sync_expectations.py [modify] https://crrev.com/699db621b922b4c7b0eafa6abe63f7598c08f3de/content/test/gpu/gpu_tests/trace_test_expectations.py
,
Sep 23 2016
,
Sep 24 2016
Could be related to crbug.com/649904
,
Sep 24 2016
Issue 649904 is Windows-only; definitely not related.
,
Oct 24 2016
At this point, the Linux Release (New Intel) bot is demonstrating flakiness: https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29?numbuilds=200 while the Linux Debug (New Intel) bot looks pretty good (a few flakes, but mostly green): https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29?numbuilds=200 Could the problem be bad hardware after all? Should we ask the Labs team to replace the Linux Release (New Intel) host?
,
Oct 24 2016
I took a look at the console of both of these and there is something up with the dcim connection on build73-b1. Let me get that fixed up first.
,
Oct 24 2016
The flakiness on Linux Release (New Intel) is hard to see now because of WebGL failures (going to suppress those soon, so it will be easier to see the other errors). Here is what I found:
Oct 07 13:32 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3103 - trace_test.WebGLGreenTriangle.AA.Alpha
Oct 09 11:21 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3114 - GpuProcess.driver_bug_workarounds_upon_gl_renderer
Oct 11 07:56 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3125 - GpuProcess.equal_bug_workarounds_in_browser_and_gpu_process
Oct 12 05:18 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3131 - HardwareAcceleratedFeature.canvas_accelerated
Oct 14 06:21 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3143 - GpuProcess.driver_bug_workarounds_upon_gl_renderer
Oct 16 03:24 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3154 - GpuProcess.software_gpu_process
Oct 17 15:41 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3163 - trace_test.SolidColorBackground
Oct 19 04:15 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3172 - trace_test.2DCanvasWebGL
Oct 19 20:29 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3176 - GpuProcess.identify_active_gpu4, trace_test.OffscreenCanvasWebGLGreenBox
Oct 20 00:34 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3177 - trace_test.CSS3DBlueBox
Oct 22 09:38 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3191 - GpuProcess.driver_bug_workarounds_in_gpu_process
Oct 23 11:04 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28New%20Intel%29/builds/3194 - trace_test.WebGLGreenTriangle.AA.NoAlpha
Comparing Linux Debug (New Intel) to that, we have:
Oct 13 17:54 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4435 - GpuProcess.identify_active_gpu1
Oct 14 13:12 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4450 - GpuProcess.identify_active_gpu3
Oct 17 09:03 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4499 - trace_test.OffscreenCanvasAccelerated2DWorker
Oct 18 10:41 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4519 - trace_test.OffscreenCanvasAccelerated2DWorker
Oct 19 19:11 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4543 - trace_test.OffscreenCanvasTransferToImageBitmapWorker
Oct 20 10:23 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4555 - ScreenshotSync.SWRasterWithCanvas
Oct 24 12:35 https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Debug%20%28New%20Intel%29/builds/4608 - ScreenshotSync.SWRasterWithDivs
I'd say there is more flakiness on Release build-wise, but about the same flakiness time-wise: 8 compared to 7 from Oct 13th to 24th. Strangely, Debug gets more builds than Release, so I can't see Debug history before Oct 12th. Also, the Release and Debug flakes don't match. The failure looks similar on both Release and Debug: "Timed out while waiting 60s for HasBrowserFinishedLaunching". Maybe we need a faster CPU + SSD :)
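The 8-versus-7 comparison can be reproduced from the dates listed in this comment. A small sketch, with the day numbers transcribed by hand from the two lists (one entry per failing build, October 2016) and a helper name chosen for illustration:

```python
from datetime import date

# Days in October 2016 with a flaky build, transcribed from the lists above.
RELEASE_FLAKES = [date(2016, 10, d) for d in (7, 9, 11, 12, 14, 16, 17, 19, 19, 20, 22, 23)]
DEBUG_FLAKES = [date(2016, 10, d) for d in (13, 14, 17, 18, 19, 20, 24)]


def count_in_window(days, start, end):
    """Count entries that fall within [start, end], inclusive."""
    return sum(1 for d in days if start <= d <= end)


# Window used in the comment: Oct 13th to 24th.
start, end = date(2016, 10, 13), date(2016, 10, 24)
release_count = count_in_window(RELEASE_FLAKES, start, end)
debug_count = count_in_window(DEBUG_FLAKES, start, end)
print(release_count, debug_count)  # 8 7
```

Since Debug runs more builds than Release over the same period, dividing each count by the number of builds in the window would give a per-build flake rate; those build totals are not given in this thread, so they are left out here.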
,
Oct 27 2016
The Windows and Linux "New Intel" bots are Dell Optiplex 9020 desktops with an i5-4590T CPU (HD 4600 integrated graphics). Should we replace these with something faster?
,
Oct 27 2016
It'd be great to qualify new hardware -- would it be possible to bring it up side-by-side with the current hardware?
,
Oct 28 2016
Now that you mention it (comment #27), I think all Windows and Linux "New Intel" bots are flaky. See issue 653541, issue 659810, issue 649904. Maybe the GPU overheats from our tests and in turn heats the CPU, and then the whole system is thermally throttled and starts to flake? Can we get some throttling diagnostics from these machines?
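On the Linux bots, one possible source of throttling diagnostics is the kernel's per-CPU thermal throttle counters in sysfs. The sketch below assumes the machines expose `thermal_throttle/*_throttle_count` files under `/sys/devices/system/cpu/cpuN/` (typical for Intel CPUs on Linux, but not verified on these hosts); the function name is made up for illustration.

```python
import glob
import os


def read_throttle_counts(sysfs_root="/sys"):
    """Return {file path: count} for per-CPU thermal throttle counters.

    Returns an empty dict if the counters are not present on this machine.
    """
    pattern = os.path.join(sysfs_root, "devices", "system", "cpu",
                           "cpu[0-9]*", "thermal_throttle", "*_throttle_count")
    counts = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                counts[path] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip counters that cannot be read or parsed.
            continue
    return counts
```

Calling `read_throttle_counts()` before and after a test run and comparing the two results would show whether any throttle events happened during the run.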
,
Apr 27 2017
These machines don't exist any more.
Comment 1 by kbr@chromium.org, Sep 19 2016
Components: Internals>GPU>Testing