
Issue 721072


Issue metadata

Status: Fixed
Owner:
Closed: Jun 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




MSAN: MemoryDumpManagerTest.TestBackgroundTracingSetup occasionally taking 40+ minutes

Project Member Reported by mbjorge@chromium.org, May 10 2017

Issue description



When run under MSAN, base_unittests occasionally takes significantly longer (40-60 minutes) than the usual 10-20 seconds.

Example https://build.chromium.org/p/chromium.memory/builders/Linux%20MSan%20Tests/builds/704


When this happens, the logs are characterized by the presence of:

...
Still waiting for the following processes to finish:
	./base_unittests --brave-new-test-launcher --gtest_flagfile=/b/s/w/itw8rsKu/.org.chromium.Chromium.6Coy45/.org.chromium.Chromium.LfjDMI --single-process-tests --test-launcher-bot-mode --test-launcher-output=/b/s/w/itw8rsKu/.org.chromium.Chromium.zUyPdp/test_results.xml --test-launcher-print-test-stdio=always --test-launcher-summary-output=/b/s/w/io7eUJhl/output.json
Still waiting for the following processes to finish:
	./base_unittests --brave-new-test-launcher --gtest_flagfile=/b/s/w/itw8rsKu/.org.chromium.Chromium.6Coy45/.org.chromium.Chromium.LfjDMI --single-process-tests --test-launcher-bot-mode --test-launcher-output=/b/s/w/itw8rsKu/.org.chromium.Chromium.zUyPdp/test_results.xml --test-launcher-print-test-stdio=always --test-launcher-summary-output=/b/s/w/io7eUJhl/output.json
Still waiting for the following processes to finish:
	./base_unittests --brave-new-test-launcher --gtest_flagfile=/b/s/w/itw8rsKu/.org.chromium.Chromium.6Coy45/.org.chromium.Chromium.LfjDMI --single-process-tests --test-launcher-bot-mode --test-launcher-output=/b/s/w/itw8rsKu/.org.chromium.Chromium.zUyPdp/test_results.xml --test-launcher-print-test-stdio=always --test-launcher-summary-output=/b/s/w/io7eUJhl/output.json
...
[2525/2534] MemoryDumpManagerTest.TestBackgroundTracingSetup (TIMED OUT)

MemoryDumpManagerTest.TestBackgroundTracingSetup then retries and completes successfully very quickly.

This happens somewhat regularly on internal chromecast builds for x86/linux. Looking at our metrics, it looks like our msan test builds started slowing down around 4/30 or 5/1 (but we often lag behind the Chromium tree by a few days).
 

Comment 2 by hjd@chromium.org, May 10 2017

Status: Assigned (was: Untriaged)
Interesting. I thought I had fixed a hang that sometimes occurred when running that test in https://codereview.chromium.org/2861133002/, which landed on 5/5, but it seems not. I can take a look tomorrow.
Ah, our next roll will pick up that change probably tomorrow (we are sync'd up to 5/4 as of today), so that will probably help. Though build 704 in the original post is from today, so maybe not.

I will update the bug when we pick up that change though. We're hitting this in ~50% of our base_unittests MSan runs, so it should be pretty obvious if that fix is helping.

Comment 4 by hjd@chromium.org, May 11 2017

Managed to replicate on ToT:
- Follow the instructions here to get an MSan build directory: https://www.chromium.org/developers/testing/memorysanitizer#TOC-How-to-build-and-run
- Run: ninja -C out/msan base_unittests -j 320 && out/msan/base_unittests --gtest_filter='MemoryDumpManagerTest.TestBackgroundTracingSetup' --gtest_repeat=100

Comment 5 by hjd@chromium.org, May 11 2017

Cc: primiano@chromium.org

Comment 6 by hjd@chromium.org, May 11 2017

Have identified the problem (thanks primiano!) and I'm uploading a fix now: https://codereview.chromium.org/2881563002

Comment 7 by hjd@chromium.org, May 11 2017

Okay, I've sent the fix to the CQ, so hopefully that will land shortly and then you'll pick up the fix in a few days. Sorry about that, I know how much long cycle times suck.

Thanks for pointing out the bug though, super helpful! :)
Awesome, thanks for the quick fix!

Comment 9 by bugdroid1@chromium.org, May 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f1610b19b843e8cdc00a215074f9c9e04de9dd17

commit f1610b19b843e8cdc00a215074f9c9e04de9dd17
Author: hjd <hjd@chromium.org>
Date: Thu May 11 17:16:18 2017

memory-infra: Fix deadlock in MemoryDumpManager

When we destroy a memory dump manager (MDM is a singleton so this only
happens in tests) we attempt to stop the thread under the lock. This
can lead to a deadlock if there are outstanding tasks which also try to
take the lock. Instead we should get a pointer to the thread under the
lock, stop the thread, then finally reset the pointer which is what this
CL does.

BUG= 721072 

Review-Url: https://codereview.chromium.org/2881563002
Cr-Commit-Position: refs/heads/master@{#470989}

[modify] https://crrev.com/f1610b19b843e8cdc00a215074f9c9e04de9dd17/base/trace_event/memory_dump_manager.cc
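
For readers who want to see the shape of the fix described in the commit message above, here is a minimal standalone sketch of the pattern: take a pointer to the thread under the lock, stop the thread without holding the lock, then reset the pointer. This is not the code from the CL. `Worker`, `Manager`, `PostLockedTask`, `Shutdown`, and `tasks_run` are hypothetical names, and `std::thread`/`std::mutex` stand in for Chromium's own thread and lock types.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a thread that runs posted tasks.
class Worker {
 public:
  Worker() : thread_([this] { Run(); }) {}
  ~Worker() { Stop(); }

  void Post(std::function<void()> task) {
    assert(task);  // reject empty tasks
    {
      std::lock_guard<std::mutex> guard(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

  // Lets already-queued tasks finish, then joins the thread.
  void Stop() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    if (thread_.joinable()) thread_.join();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
      if (tasks_.empty()) return;  // stopping and fully drained
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      lock.unlock();  // never run a task while holding the queue mutex
      task();
      lock.lock();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread thread_;  // declared last so the members above exist first
};

// Hypothetical manager whose tasks also need the manager's lock.
class Manager {
 public:
  Manager() : worker_(std::make_unique<Worker>()) {}

  void PostLockedTask() {
    std::lock_guard<std::mutex> guard(lock_);
    if (!worker_) return;
    worker_->Post([this] {
      std::lock_guard<std::mutex> guard(lock_);  // task takes the lock too
      ++tasks_run_;
    });
  }

  // Calling worker_->Stop() while holding lock_ could deadlock: the join
  // waits for tasks that are themselves waiting for lock_.
  void Shutdown() {
    Worker* worker = nullptr;
    {
      std::lock_guard<std::mutex> guard(lock_);
      worker = worker_.get();  // 1. get the pointer under the lock
    }
    if (worker) worker->Stop();  // 2. stop the thread with the lock released
    std::lock_guard<std::mutex> guard(lock_);
    worker_.reset();  // 3. finally reset the pointer
  }

  int tasks_run() {
    std::lock_guard<std::mutex> guard(lock_);
    return tasks_run_;
  }

 private:
  std::mutex lock_;
  std::unique_ptr<Worker> worker_;
  int tasks_run_ = 0;
};
```

In this sketch, posting a batch of tasks with `PostLockedTask()` and then calling `Shutdown()` returns once every queued task has run. Moving the `Stop()` call inside the first locked scope would reintroduce the hang whenever a task is still pending. The sketch assumes `Shutdown()` is only called from one thread at a time, which matches the "only happens in tests" situation the commit message describes.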

Comment 10 by hjd@chromium.org, Jun 1 2017

Status: Fixed (was: Assigned)
Hopefully everything is fixed at this point and the cycle time of the internal bots has recovered. Please reopen if not :)
Ah, sorry, yes everything appears to be good now. Thanks!
