Issue metadata
Sign in to add a comment
|
"telemetry_unittests (with patch)" is flaky |
||||||||||||||||||||||||
Issue description"telemetry_unittests (with patch)" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This flaky test/step was previously tracked in issue 608296 .
,
Jun 1 2016
,
Jun 2 2016
So here's what happens in the failing test (https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend_unittest.py#L42): self.assertRaises(Exception, self._browser.DumpMemory) # This passes so tracing was not running before the test (at least according # to Telemetry), as EXPECTED. ... self._tracing_controller.StartTracing(config) # This starts tracing with the CORRECT config that enables memory-infra # (taken from the stdio log): # { # 'params': { # 'transferMode': 'ReturnAsStream', # 'traceConfig': { # 'enableSystrace': False, # 'recordMode': 'recordAsMuchAsPossible', # 'includedCategories': ['disabled-by-default-memory-infra'] # } # }, # 'method': 'Tracing.start' # } ... dump_id = self._browser.DumpMemory() self.assertIsNotNone(dump_id) # This FAILS indicating that the memory dump manager on Chrome's side # failed to create a memory dump. As far as I can tell, a memory dump can fail for several reasons: 1. Tracing is not enabled: * when the dump is requested (https://code.google.com/p/chromium/codesearch#chromium/src/base/trace_event/memory_dump_manager.cc&l=362), * during the dump (https://code.google.com/p/chromium/codesearch#chromium/src/base/trace_event/memory_dump_manager.cc&l=446), * when the dump is finalized (https://code.google.com/p/chromium/codesearch#chromium/src/base/trace_event/memory_dump_manager.cc&l=620). 2. Dump request from child process failed (https://code.google.com/p/chromium/codesearch#chromium/src/components/tracing/child/child_memory_dump_manager_delegate_impl.cc&l=18). Unless there's a bug in Telemetry, this CANNOT be the cause of this issue because we send the Tracing.start request to the browser process. 3. Another memory dump is in progress (https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/tracing/tracing_controller_impl.cc&l=928). 4. Some child process failed to dump (https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/tracing/tracing_controller_impl.cc&l=1028). I think that one of the following two things might have caused the flakiness: * The custom browser flags required by TracingBackendTest (https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend_unittest.py#L30) are not passed to the browser on which the test runs. nenguyen@: Have there been any changes to the way Telemetry tests are run (e.g. swarming) that could have caused this? * There might have been a recent change in Chrome that causes memory dumps to fail for reasons 1 or, more likely, 4. In either case, the problem is that there's no way to determine *why* a memory dump failed: primiano@: Could you please add DLOG calls on failures or, even better, extend the DevTools protocol with an optional string error message result parameter? nednguyen@: Does Telemetry display Chrome's (D)LOGs upon unittest failures?
,
Jun 2 2016
We don't display DLOG on every unittest failure cuz not all unittest in telemetry involve the browser. You can get the browser's stdout upon failure using browser.GetStandardOutput(): https://code.google.com/p/chromium/codesearch#chromium/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py&l=272 or add a method to expose the log_file_path on browser API (https://code.google.com/p/chromium/codesearch#chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py&l=143)
,
Jun 2 2016
I don't think the custom browser flags is the cause cuz this is a flaky failure. You can also check the flag in the log here: https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_rel_ng/builds/223360/steps/telemetry_unittests%20%28with%20patch%29%20on%20Windows-7-SP1/logs/stdio Retrying failed tests (attempt #3 of 3)... [1/1] telemetry.internal.backends.chrome_inspector.tracing_backend_unittest.TracingBackendTest.testDumpMemorySuccess failed unexpectedly 0.5690s: Chrome build location for win_AMD64 not found. Browser will be run without Flash. Requested remote debugging port: 0 Starting Chrome ['../../out\\Release_x64\\chrome.exe', '--no-sandbox', '--enable-memory-benchmarking', '--enable-net-benchmarking', '--metrics-recording-only', '--no-default-browser-check', '--no-first-run', '--enable-gpu-benchmarking', '--disable-background-networking', '--no-proxy-server', '--disable-component-extensions-with-background-pages', '--disable-default-apps', '--remote-debugging-port=0', '--enable-crash-reporter-for-testing', '--window-size=1280,1024', '--user-data-dir=e:\\b\\swarm_slave\\work\\isolated\\tmpbuiesn\\tmpbeqotx', '--trace-config-file=e:\\b\\swarm_slave\\work\\isolated\\tmpbuiesn\\tmp2rhrdv\\chrome-trace-config.json', 'about:blank'] Discovered ephemeral port 61085 OS: win win7
,
Jun 3 2016
Detected 5 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
,
Jun 3 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/138c1557c50620eb1bbc567e1938a658909ff061 commit 138c1557c50620eb1bbc567e1938a658909ff061 Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Fri Jun 03 02:24:56 2016 Roll src/third_party/catapult/ 857c0b7e4..3a308b4e6 (12 commits). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/857c0b7e4db6..3a308b4e6d56 $ git log 857c0b7e4..3a308b4e6 --date=short --no-merges --format='%ad %ae %s' BUG= 352807 ,616832, 608414 , 616483 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2033963002 Cr-Commit-Position: refs/heads/master@{#397578} [modify] https://crrev.com/138c1557c50620eb1bbc567e1938a658909ff061/DEPS
,
Jun 4 2016
Detected 11 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
,
Jun 6 2016
Detected 3 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
,
Jun 7 2016
Detected 7 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
,
Jun 8 2016
Detected 6 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
,
Jun 8 2016
Petr, this is blocking many other's CLs already. Can you find someone to help with fixing this?
,
Jun 8 2016
Ned: The logs now contain standard output from the browser, but it's always empty, which I suppose is due to the fact that the perfbots use release builds. Primiano: It seems that adding DLOGs to all code locations mentioned above (https://bugs.chromium.org/p/chromium/issues/detail?id=616483#c3) will NOT work here (since we're using release builds on perfbots). Can you think of any other way than plumbing the dump failure reasons through the DevTools protocol?
,
Jun 8 2016
To the bot support --enable-logging -v? If so we could add a VLOG (verbose log). Does this issue reproduce locally by any chance?
,
Jun 8 2016
I'll land a patch tomorrow that logs reasons for memory dump failures: https://codereview.chromium.org/2049143002/ Hopefully, it will help us figure out what's going on.
,
Jun 9 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8303e43ab4eea56f6e173e92215e36f5ecfec0eb commit 8303e43ab4eea56f6e173e92215e36f5ecfec0eb Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Thu Jun 09 01:35:10 2016 Roll src/third_party/catapult/ 901f7edd7..8ff23141e (14 commits). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/901f7edd757b..8ff23141e1df $ git log 901f7edd7..8ff23141e --date=short --no-merges --format='%ad %ae %s' BUG=552376,552376, 613706 ,617849, 616483 ,616832 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2048953003 Cr-Commit-Position: refs/heads/master@{#398760} [modify] https://crrev.com/8303e43ab4eea56f6e173e92215e36f5ecfec0eb/DEPS
,
Jun 9 2016
Detected 17 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
,
Jun 9 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7d8aadbdcca5302d1c84637aeca865a791afd5da commit 7d8aadbdcca5302d1c84637aeca865a791afd5da Author: petrcermak <petrcermak@chromium.org> Date: Thu Jun 09 11:13:04 2016 [memory-infra] Log reasons for memory dump failure All memory dump failures will now lead to VLOG(1) error messages so that the reason for failure could be determined from outside chrome (e.g. in Telemetry). BUG= 616483 TBR=shatch Review-Url: https://codereview.chromium.org/2049143002 Cr-Commit-Position: refs/heads/master@{#398838} [modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/base/trace_event/memory_dump_manager.cc [modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/base/trace_event/memory_dump_manager.h [modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/components/tracing/child/child_memory_dump_manager_delegate_impl.cc [modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/content/browser/tracing/tracing_controller_impl.cc
,
Jun 9 2016
Now that logging is enabled (see previous comment), we have to wait for the next flake. Hopefully, it'll provide us with more info.
,
Jun 10 2016
Detected 19 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
,
Jun 10 2016
It turns out that all messages from Chrome are stored in the browser's log file (rather than printed to standard output). Unfortunately, as far as I can tell, the logs are thrown away because "Environment variable DISABLE_CLOUD_STORAGE_IO is set to 1" :-\ I uploaded a Telemetry patch that, in the case of a test failure, prints the browser log contents as well (https://codereview.chromium.org/2053313002/). Furtmermore, I manually checked that the printed log is non-empty on Windows (it's actually quite verbose) if I force an exception in the flaky test. The only thing (that I can think of right now) that could prevent this approach from working is if, for some reason, the error message is not flushed into the log before it's read by Telemetry. Let's hope that's not the case.
,
Jun 10 2016
,
Jun 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/cbd90f2b8b145f79e09cb0848c6d34df214898aa commit cbd90f2b8b145f79e09cb0848c6d34df214898aa Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Fri Jun 10 14:43:37 2016 Roll src/third_party/catapult/ 2c5cde737..e37097a1c (1 commit). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/2c5cde737888..e37097a1cbf9 $ git log 2c5cde737..e37097a1c --date=short --no-merges --format='%ad %ae %s' BUG= 616483 , 612779 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2058073002 Cr-Commit-Position: refs/heads/master@{#399180} [modify] https://crrev.com/cbd90f2b8b145f79e09cb0848c6d34df214898aa/DEPS
,
Jun 10 2016
And now we play the waiting game (for another flake)...
,
Jun 10 2016
,
Jun 10 2016
See also Issue 619106 for a possible cause.
,
Jun 10 2016
#26: That is very likely. Thanks! I'll wait until the flake happens again and mark it as a duplicate of 619106 if appropriate.
,
Jun 10 2016
OK, I can confirm that the issue is due to a GPU failure (https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20%28dbg%29%281%29/builds/49551/steps/telemetry_unittests%20on%20Windows-7-SP1/logs/stdio): [2228:1488:0610/115003:FATAL:gpu_info_collector.cc(104)] Check failed: gl::GetGLImplementation() != gl::kGLImplementationNone (0 vs. 0) Backtrace: base::debug::StackTrace::StackTrace [0x000007FEE17C5571+33] logging::LogMessage::~LogMessage [0x000007FEE1722FAC+76] gpu::CollectGraphicsInfoGL [0x000007FEDE67E06B+335] gpu::CollectContextGraphicsInfo [0x000007FEDE67FFDC+452] content::GpuChildThread::OnCollectGraphicsInfo [0x000007FEE12B27E1+149] ??$Dispatch@VGpuChildThread@content@@V12@XP812@EAAXXZ@?$MessageT@UGpuMsg_CollectGraphicsInfo_Meta@@V?$tuple@$$V@std@@X@IPC@@SA_NPEBVMessage@1@PEAVGpuChildThread@content@@1PEAXP834@EAAXXZ@Z [0x000007FEE12B06F6+190] content::GpuChildThread::OnControlMessageReceived [0x000007FEE12B2DBF+943] content::ChildThreadImpl::OnMessageReceived [0x000007FEDFF7056B+171] content::GpuChildThread::OnMessageReceived [0x000007FEE12B3C56+34] IPC::ChannelProxy::Context::OnDispatchMessage [0x000007FEDDC86000+40] base::debug::TaskAnnotator::RunTask [0x000007FEE17C67CC+316] base::MessageLoop::RunTask [0x000007FEE1730F6C+876] base::MessageLoop::DoWork [0x000007FEE172FF10+752] base::MessagePumpForGpu::DoRunLoop [0x000007FEE178A01A+186] base::MessagePumpWin::Run [0x000007FEE178AE64+84] base::RunLoop::Run [0x000007FEE17BC17E+46] base::MessageLoop::Run [0x000007FEE1730B0B+107] content::GpuMain [0x000007FEE12B54AE+2426] content::RunNamedProcessTypeMain [0x000007FEE16FB0F0+448] content::ContentMainRunnerImpl::Run [0x000007FEE16FAED6+358] content::ContentMain [0x000007FEE16F9FA8+48] ChromeMain [0x000007FEDD9C8424+148] MainDllLoader::Launch [0x000000013FDB9826+838] wWinMain [0x000000013FDB7BD3+611] __scrt_common_main_seh [0x00000001408989B6+286] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:255) BaseThreadInitThunk [0x0000000076C859CD+13] RtlUserThreadStart [0x0000000076EBB891+33] [5996:4596:0610/115004:VERBOSE1:message_pipe_dispatcher.cc(109)] Closing message pipe 15119425933333389701 endpoint 0 [port=73FE752643878BDA.623B449E3C6F492E] [5996:4596:0610/115004:VERBOSE1:message_pipe_dispatcher.cc(109)] Closing message pipe 13771432344605787599 endpoint 1 [port=EB929AFD4AAEE753.3DC39E98B382D4DB] [5996:4596:0610/115004:VERBOSE1:message_pipe_dispatcher.cc(109)] Closing message pipe 9187201950435737471 endpoint 0 [port=BA16779B9BB3478.43C06E4F0FC3F95D] [5996:4596:0610/115004:VERBOSE1:node_controller.cc(445)] Dropped peer DAD12464AC277DEC.C58CD361D2B5AB11 [5996:4612:0610/115004:VERBOSE1:tracing_controller_impl.cc(1005)] Memory-infra dump failed because of NACK from child 2228
,
Jun 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7d8aadbdcca5302d1c84637aeca865a791afd5da commit 7d8aadbdcca5302d1c84637aeca865a791afd5da Author: petrcermak <petrcermak@chromium.org> Date: Thu Jun 09 11:13:04 2016 [memory-infra] Log reasons for memory dump failure All memory dump failures will now lead to VLOG(1) error messages so that the reason for failure could be determined from outside chrome (e.g. in Telemetry). BUG= 616483 TBR=shatch Review-Url: https://codereview.chromium.org/2049143002 Cr-Commit-Position: refs/heads/master@{#398838} [modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/base/trace_event/memory_dump_manager.cc [modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/base/trace_event/memory_dump_manager.h [modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/components/tracing/child/child_memory_dump_manager_delegate_impl.cc [modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/content/browser/tracing/tracing_controller_impl.cc
,
Jun 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/cbd90f2b8b145f79e09cb0848c6d34df214898aa commit cbd90f2b8b145f79e09cb0848c6d34df214898aa Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Fri Jun 10 14:43:37 2016 Roll src/third_party/catapult/ 2c5cde737..e37097a1c (1 commit). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/2c5cde737888..e37097a1cbf9 $ git log 2c5cde737..e37097a1c --date=short --no-merges --format='%ad %ae %s' BUG= 616483 , 612779 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2058073002 Cr-Commit-Position: refs/heads/master@{#399180} [modify] https://crrev.com/cbd90f2b8b145f79e09cb0848c6d34df214898aa/DEPS |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by pkasting@chromium.org
, Jun 1 2016Labels: -Sheriff-Chromium OS-Windows
Owner: petrcermak@chromium.org
I don't know whether it's a coincidence, but all three failing runs have telemetry.internal.backends.chrome_inspector.tracing_backend_unittest.TracingBackendTest.testDumpMemorySuccess as a failing test. Here's a sample failure line for that test: Traceback (most recent call last): File "e:\b\swarm_slave\work\isolated\runpnnbth\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\tracing_backend_unittest.py", line 58, in testDumpMemorySuccess self.assertIsNotNone(dump_id) AssertionError: unexpectedly None This portion of the test has not been touched in some time, so the bug is presumably not in the test itself. However, knowing nothing about telemetry, I'm not sure how to proceed further. I checked the recent catapult DEPS rolls, but saw nothing obviously wrong (but, again, since I don't know much, that doesn't mean much). ->petrcermak, the person who last touched the failing line, in hopes of understanding the test well enough to triage better.