New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 616483 link

Starred by 0 users

Issue metadata

Status: Duplicate
Merged: issue 619106
Owner:
Last visit > 30 days ago
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocked on:
issue 612779
issue 619106



Sign in to add a comment

"telemetry_unittests (with patch)" is flaky

Project Member Reported by chromium...@appspot.gserviceaccount.com, Jun 1 2016

Issue description

"telemetry_unittests (with patch)" is flaky.

This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label.

We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw.



This flaky test/step was previously tracked in  issue 608296 .
 
Components: Tests>Telemetry
Labels: -Sheriff-Chromium OS-Windows
Owner: petrcermak@chromium.org
I don't know whether it's a coincidence, but all three failing runs have telemetry.internal.backends.chrome_inspector.tracing_backend_unittest.TracingBackendTest.testDumpMemorySuccess as a failing test.

Here's a sample failure line for that test:

  Traceback (most recent call last):
    File "e:\b\swarm_slave\work\isolated\runpnnbth\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\tracing_backend_unittest.py", line 58, in testDumpMemorySuccess
      self.assertIsNotNone(dump_id)
  AssertionError: unexpectedly None

This portion of the test has not been touched in some time, so the bug is presumably not in the test itself.  However, knowing nothing about telemetry, I'm not sure how to proceed further.  I checked the recent catapult DEPS rolls, but saw nothing obviously wrong (but, again, since I don't know much, that doesn't mean much).

->petrcermak, the person who last touched the failing line, in hopes of understanding the test well enough to triage better.
Cc: primiano@chromium.org perezju@chromium.org
So here's what happens in the failing test (https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend_unittest.py#L42):

  self.assertRaises(Exception, self._browser.DumpMemory)
  # This passes so tracing was not running before the test (at least according
  # to Telemetry), as EXPECTED.

  ...

  self._tracing_controller.StartTracing(config)
  # This starts tracing with the CORRECT config that enables memory-infra
  # (taken from the stdio log):
  # {
  #   'params': {
  #     'transferMode': 'ReturnAsStream',
  #     'traceConfig': {
  #       'enableSystrace': False,
  #       'recordMode': 'recordAsMuchAsPossible',
  #       'includedCategories': ['disabled-by-default-memory-infra']
  #     }
  #   },
  #   'method': 'Tracing.start'
  # }

  ...

  dump_id = self._browser.DumpMemory()
  self.assertIsNotNone(dump_id)
  # This FAILS indicating that the memory dump manager on Chrome's side
  # failed to create a memory dump.

As far as I can tell, a memory dump can fail for several reasons:

  1. Tracing is not enabled:
       * when the dump is requested (https://code.google.com/p/chromium/codesearch#chromium/src/base/trace_event/memory_dump_manager.cc&l=362),
       * during the dump (https://code.google.com/p/chromium/codesearch#chromium/src/base/trace_event/memory_dump_manager.cc&l=446),
       * when the dump is finalized (https://code.google.com/p/chromium/codesearch#chromium/src/base/trace_event/memory_dump_manager.cc&l=620).

  2. Dump request from child process failed (https://code.google.com/p/chromium/codesearch#chromium/src/components/tracing/child/child_memory_dump_manager_delegate_impl.cc&l=18).
     Unless there's a bug in Telemetry, this CANNOT be the cause of this issue
     because we send the Tracing.start request to the browser process.

  3. Another memory dump is in progress (https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/tracing/tracing_controller_impl.cc&l=928).

  4. Some child process failed to dump (https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/tracing/tracing_controller_impl.cc&l=1028).

I think that one of the following two things might have caused the flakiness:

  * The custom browser flags required by TracingBackendTest (https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend_unittest.py#L30)
    are not passed to the browser on which the test runs.

    nenguyen@: Have there been any changes to the way Telemetry tests are run
    (e.g. swarming) that could have caused this?

  * There might have been a recent change in Chrome that causes memory dumps
    to fail for reasons 1 or, more likely, 4.

In either case, the problem is that there's no way to determine *why* a memory
dump failed:

  primiano@: Could you please add DLOG calls on failures or, even better, extend
  the DevTools protocol with an optional string error message result parameter?

  nednguyen@: Does Telemetry display Chrome's (D)LOGs upon unittest failures?
We don't display DLOG on every unittest failure cuz not all unittest in telemetry involve the browser. You can get the browser's stdout upon failure using browser.GetStandardOutput(): https://code.google.com/p/chromium/codesearch#chromium/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py&l=272 or add a method to expose the log_file_path on browser API (https://code.google.com/p/chromium/codesearch#chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py&l=143) 
I don't think the custom browser flags is the cause cuz this is a flaky failure. You can also check the flag in the log here:
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_rel_ng/builds/223360/steps/telemetry_unittests%20%28with%20patch%29%20on%20Windows-7-SP1/logs/stdio

Retrying failed tests (attempt #3 of 3)...

[1/1] telemetry.internal.backends.chrome_inspector.tracing_backend_unittest.TracingBackendTest.testDumpMemorySuccess failed unexpectedly 0.5690s:
  Chrome build location for win_AMD64 not found. Browser will be run without Flash.
  Requested remote debugging port: 0
  Starting Chrome ['../../out\\Release_x64\\chrome.exe', '--no-sandbox', '--enable-memory-benchmarking', '--enable-net-benchmarking', '--metrics-recording-only', '--no-default-browser-check', '--no-first-run', '--enable-gpu-benchmarking', '--disable-background-networking', '--no-proxy-server', '--disable-component-extensions-with-background-pages', '--disable-default-apps', '--remote-debugging-port=0', '--enable-crash-reporter-for-testing', '--window-size=1280,1024', '--user-data-dir=e:\\b\\swarm_slave\\work\\isolated\\tmpbuiesn\\tmpbeqotx', '--trace-config-file=e:\\b\\swarm_slave\\work\\isolated\\tmpbuiesn\\tmp2rhrdv\\chrome-trace-config.json', 'about:blank']
  Discovered ephemeral port 61085
  OS: win win7

Project Member

Comment 6 by chromium...@appspot.gserviceaccount.com, Jun 3 2016

Detected 5 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
Project Member

Comment 7 by bugdroid1@chromium.org, Jun 3 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/138c1557c50620eb1bbc567e1938a658909ff061

commit 138c1557c50620eb1bbc567e1938a658909ff061
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Fri Jun 03 02:24:56 2016

Roll src/third_party/catapult/ 857c0b7e4..3a308b4e6 (12 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/857c0b7e4db6..3a308b4e6d56

$ git log 857c0b7e4..3a308b4e6 --date=short --no-merges --format='%ad %ae %s'

BUG= 352807 ,616832, 608414 , 616483 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2033963002
Cr-Commit-Position: refs/heads/master@{#397578}

[modify] https://crrev.com/138c1557c50620eb1bbc567e1938a658909ff061/DEPS

Project Member

Comment 8 by chromium...@appspot.gserviceaccount.com, Jun 4 2016

Detected 11 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
Project Member

Comment 9 by chromium...@appspot.gserviceaccount.com, Jun 6 2016

Detected 3 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
Project Member

Comment 10 by chromium...@appspot.gserviceaccount.com, Jun 7 2016

Detected 7 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
Project Member

Comment 11 by chromium...@appspot.gserviceaccount.com, Jun 8 2016

Detected 6 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
Petr, this is blocking many other's CLs already. Can you find someone to help with fixing this?
Ned: The logs now contain standard output from the browser, but it's always empty, which I suppose is due to the fact that the perfbots use release builds.

Primiano: It seems that adding DLOGs to all code locations mentioned above (https://bugs.chromium.org/p/chromium/issues/detail?id=616483#c3) will NOT work here (since we're using release builds on perfbots). Can you think of any other way than plumbing the dump failure reasons through the DevTools protocol?
To the bot support --enable-logging -v? If so we could add a VLOG (verbose log).
Does this issue reproduce locally by any chance?
Status: Started (was: Untriaged)
I'll land a patch tomorrow that logs reasons for memory dump failures: https://codereview.chromium.org/2049143002/
Hopefully, it will help us figure out what's going on.
Project Member

Comment 16 by bugdroid1@chromium.org, Jun 9 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8303e43ab4eea56f6e173e92215e36f5ecfec0eb

commit 8303e43ab4eea56f6e173e92215e36f5ecfec0eb
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Thu Jun 09 01:35:10 2016

Roll src/third_party/catapult/ 901f7edd7..8ff23141e (14 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/901f7edd757b..8ff23141e1df

$ git log 901f7edd7..8ff23141e --date=short --no-merges --format='%ad %ae %s'

BUG=552376,552376, 613706 ,617849, 616483 ,616832

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2048953003
Cr-Commit-Position: refs/heads/master@{#398760}

[modify] https://crrev.com/8303e43ab4eea56f6e173e92215e36f5ecfec0eb/DEPS

Project Member

Comment 17 by chromium...@appspot.gserviceaccount.com, Jun 9 2016

Detected 17 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
Project Member

Comment 18 by bugdroid1@chromium.org, Jun 9 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7d8aadbdcca5302d1c84637aeca865a791afd5da

commit 7d8aadbdcca5302d1c84637aeca865a791afd5da
Author: petrcermak <petrcermak@chromium.org>
Date: Thu Jun 09 11:13:04 2016

[memory-infra] Log reasons for memory dump failure

All memory dump failures will now lead to VLOG(1) error messages so that
the reason for failure could be determined from outside chrome (e.g. in
Telemetry).

BUG= 616483 
TBR=shatch

Review-Url: https://codereview.chromium.org/2049143002
Cr-Commit-Position: refs/heads/master@{#398838}

[modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/base/trace_event/memory_dump_manager.cc
[modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/base/trace_event/memory_dump_manager.h
[modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/components/tracing/child/child_memory_dump_manager_delegate_impl.cc
[modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/content/browser/tracing/tracing_controller_impl.cc

Now that logging is enabled (see previous comment), we have to wait for the next flake. Hopefully, it'll provide us with more info.
Project Member

Comment 20 by chromium...@appspot.gserviceaccount.com, Jun 10 2016

Detected 19 new flakes for test/step "telemetry_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKwsSBUZsYWtlIiB0ZWxlbWV0cnlfdW5pdHRlc3RzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
It turns out that all messages from Chrome are stored in the browser's log file (rather than printed to standard output). Unfortunately, as far as I can tell, the logs are thrown away because "Environment variable DISABLE_CLOUD_STORAGE_IO is set to 1" :-\

I uploaded a Telemetry patch that, in the case of a test failure, prints the browser log contents as well (https://codereview.chromium.org/2053313002/). Furtmermore, I manually checked that the printed log is non-empty on Windows (it's actually quite verbose) if I force an exception in the flaky test.

The only thing (that I can think of right now) that could prevent this approach from working is if, for some reason, the error message is not flushed into the log before it's read by Telemetry. Let's hope that's not the case.
Blockedon: 612779
Project Member

Comment 23 by bugdroid1@chromium.org, Jun 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cbd90f2b8b145f79e09cb0848c6d34df214898aa

commit cbd90f2b8b145f79e09cb0848c6d34df214898aa
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Fri Jun 10 14:43:37 2016

Roll src/third_party/catapult/ 2c5cde737..e37097a1c (1 commit).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/2c5cde737888..e37097a1cbf9

$ git log 2c5cde737..e37097a1c --date=short --no-merges --format='%ad %ae %s'

BUG= 616483 , 612779 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2058073002
Cr-Commit-Position: refs/heads/master@{#399180}

[modify] https://crrev.com/cbd90f2b8b145f79e09cb0848c6d34df214898aa/DEPS

And now we play the waiting game (for another flake)...

Comment 25 by kbr@chromium.org, Jun 10 2016

Blockedon: 619106

Comment 26 by kbr@chromium.org, Jun 10 2016

Cc: zmo@chromium.org kbr@chromium.org piman@chromium.org
See also  Issue 619106  for a possible cause.

#26: That is very likely. Thanks! I'll wait until the flake happens again and mark it as a duplicate of 619106 if appropriate.
Mergedinto: 619106
Status: Duplicate (was: Started)
OK, I can confirm that the issue is due to a GPU failure (https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20%28dbg%29%281%29/builds/49551/steps/telemetry_unittests%20on%20Windows-7-SP1/logs/stdio):

[2228:1488:0610/115003:FATAL:gpu_info_collector.cc(104)] Check failed: gl::GetGLImplementation() != gl::kGLImplementationNone (0 vs. 0)
  Backtrace:
  	base::debug::StackTrace::StackTrace [0x000007FEE17C5571+33]
  	logging::LogMessage::~LogMessage [0x000007FEE1722FAC+76]
  	gpu::CollectGraphicsInfoGL [0x000007FEDE67E06B+335]
  	gpu::CollectContextGraphicsInfo [0x000007FEDE67FFDC+452]
  	content::GpuChildThread::OnCollectGraphicsInfo [0x000007FEE12B27E1+149]
  	??$Dispatch@VGpuChildThread@content@@V12@XP812@EAAXXZ@?$MessageT@UGpuMsg_CollectGraphicsInfo_Meta@@V?$tuple@$$V@std@@X@IPC@@SA_NPEBVMessage@1@PEAVGpuChildThread@content@@1PEAXP834@EAAXXZ@Z [0x000007FEE12B06F6+190]
  	content::GpuChildThread::OnControlMessageReceived [0x000007FEE12B2DBF+943]
  	content::ChildThreadImpl::OnMessageReceived [0x000007FEDFF7056B+171]
  	content::GpuChildThread::OnMessageReceived [0x000007FEE12B3C56+34]
  	IPC::ChannelProxy::Context::OnDispatchMessage [0x000007FEDDC86000+40]
  	base::debug::TaskAnnotator::RunTask [0x000007FEE17C67CC+316]
  	base::MessageLoop::RunTask [0x000007FEE1730F6C+876]
  	base::MessageLoop::DoWork [0x000007FEE172FF10+752]
  	base::MessagePumpForGpu::DoRunLoop [0x000007FEE178A01A+186]
  	base::MessagePumpWin::Run [0x000007FEE178AE64+84]
  	base::RunLoop::Run [0x000007FEE17BC17E+46]
  	base::MessageLoop::Run [0x000007FEE1730B0B+107]
  	content::GpuMain [0x000007FEE12B54AE+2426]
  	content::RunNamedProcessTypeMain [0x000007FEE16FB0F0+448]
  	content::ContentMainRunnerImpl::Run [0x000007FEE16FAED6+358]
  	content::ContentMain [0x000007FEE16F9FA8+48]
  	ChromeMain [0x000007FEDD9C8424+148]
  	MainDllLoader::Launch [0x000000013FDB9826+838]
  	wWinMain [0x000000013FDB7BD3+611]
  	__scrt_common_main_seh [0x00000001408989B6+286] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:255)
  	BaseThreadInitThunk [0x0000000076C859CD+13]
  	RtlUserThreadStart [0x0000000076EBB891+33]
  
  [5996:4596:0610/115004:VERBOSE1:message_pipe_dispatcher.cc(109)] Closing message pipe 15119425933333389701 endpoint 0 [port=73FE752643878BDA.623B449E3C6F492E]
  [5996:4596:0610/115004:VERBOSE1:message_pipe_dispatcher.cc(109)] Closing message pipe 13771432344605787599 endpoint 1 [port=EB929AFD4AAEE753.3DC39E98B382D4DB]
  [5996:4596:0610/115004:VERBOSE1:message_pipe_dispatcher.cc(109)] Closing message pipe 9187201950435737471 endpoint 0 [port=BA16779B9BB3478.43C06E4F0FC3F95D]
  [5996:4596:0610/115004:VERBOSE1:node_controller.cc(445)] Dropped peer DAD12464AC277DEC.C58CD361D2B5AB11
  [5996:4612:0610/115004:VERBOSE1:tracing_controller_impl.cc(1005)] Memory-infra dump failed because of NACK from child 2228
Project Member

Comment 29 by bugdroid1@chromium.org, Jun 15 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7d8aadbdcca5302d1c84637aeca865a791afd5da

commit 7d8aadbdcca5302d1c84637aeca865a791afd5da
Author: petrcermak <petrcermak@chromium.org>
Date: Thu Jun 09 11:13:04 2016

[memory-infra] Log reasons for memory dump failure

All memory dump failures will now lead to VLOG(1) error messages so that
the reason for failure could be determined from outside chrome (e.g. in
Telemetry).

BUG= 616483 
TBR=shatch

Review-Url: https://codereview.chromium.org/2049143002
Cr-Commit-Position: refs/heads/master@{#398838}

[modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/base/trace_event/memory_dump_manager.cc
[modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/base/trace_event/memory_dump_manager.h
[modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/components/tracing/child/child_memory_dump_manager_delegate_impl.cc
[modify] https://crrev.com/7d8aadbdcca5302d1c84637aeca865a791afd5da/content/browser/tracing/tracing_controller_impl.cc

Project Member

Comment 30 by bugdroid1@chromium.org, Jun 15 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cbd90f2b8b145f79e09cb0848c6d34df214898aa

commit cbd90f2b8b145f79e09cb0848c6d34df214898aa
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Fri Jun 10 14:43:37 2016

Roll src/third_party/catapult/ 2c5cde737..e37097a1c (1 commit).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/2c5cde737888..e37097a1cbf9

$ git log 2c5cde737..e37097a1c --date=short --no-merges --format='%ad %ae %s'

BUG= 616483 , 612779 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2058073002
Cr-Commit-Position: refs/heads/master@{#399180}

[modify] https://crrev.com/cbd90f2b8b145f79e09cb0848c6d34df214898aa/DEPS

Sign in to add a comment