MSAN: MemoryDumpManagerTest.TestBackgroundTracingSetup occasionally taking 40+ minutes
Issue description

When run under MSAN, base_unittests occasionally takes significantly longer (40-60 minutes) than the usual 10-20 seconds. Example: https://build.chromium.org/p/chromium.memory/builders/Linux%20MSan%20Tests/builds/704

The logs when this happens are characterized by the presence of:

...
Still waiting for the following processes to finish:
./base_unittests --brave-new-test-launcher --gtest_flagfile=/b/s/w/itw8rsKu/.org.chromium.Chromium.6Coy45/.org.chromium.Chromium.LfjDMI --single-process-tests --test-launcher-bot-mode --test-launcher-output=/b/s/w/itw8rsKu/.org.chromium.Chromium.zUyPdp/test_results.xml --test-launcher-print-test-stdio=always --test-launcher-summary-output=/b/s/w/io7eUJhl/output.json
Still waiting for the following processes to finish:
./base_unittests --brave-new-test-launcher --gtest_flagfile=/b/s/w/itw8rsKu/.org.chromium.Chromium.6Coy45/.org.chromium.Chromium.LfjDMI --single-process-tests --test-launcher-bot-mode --test-launcher-output=/b/s/w/itw8rsKu/.org.chromium.Chromium.zUyPdp/test_results.xml --test-launcher-print-test-stdio=always --test-launcher-summary-output=/b/s/w/io7eUJhl/output.json
Still waiting for the following processes to finish:
./base_unittests --brave-new-test-launcher --gtest_flagfile=/b/s/w/itw8rsKu/.org.chromium.Chromium.6Coy45/.org.chromium.Chromium.LfjDMI --single-process-tests --test-launcher-bot-mode --test-launcher-output=/b/s/w/itw8rsKu/.org.chromium.Chromium.zUyPdp/test_results.xml --test-launcher-print-test-stdio=always --test-launcher-summary-output=/b/s/w/io7eUJhl/output.json
...
[2525/2534] MemoryDumpManagerTest.TestBackgroundTracingSetup (TIMED OUT)

MemoryDumpManagerTest.TestBackgroundTracingSetup then retries and runs successfully very quickly.

This happens somewhat regularly on internal chromecast builds for x86/linux. Looking at our metrics, our msan test builds started slowing down around 4/30 or 5/1 (but we often lag behind the Chromium tree by a few days).
,
May 10 2017
Interesting. I thought I had fixed a hang that sometimes occurred when running that test in https://codereview.chromium.org/2861133002/ (landed on 5/5), but it seems not. I can take a look tomorrow.
,
May 10 2017
Ah, our next roll will pick up that change, probably tomorrow (we are synced up to 5/4 as of today), so that will probably help. Though build 704 in the original post is from today, so maybe not. I will update the bug when we pick up that change. We're hitting this in ~50% of our base_unittests msan runs, so it should be pretty obvious if that fix is helping.
,
May 11 2017
Managed to replicate on ToT:
- Follow instructions here to get a msan directory: https://www.chromium.org/developers/testing/memorysanitizer#TOC-How-to-build-and-run
- Run: ninja -C out/msan base_unittests -j 320 && out/msan/base_unittests --gtest_filter='MemoryDumpManagerTest.TestBackgroundTracingSetup' --gtest_repeat=100
,
May 11 2017
,
May 11 2017
Have identified the problem (thanks primiano!) and I'm uploading a fix now: https://codereview.chromium.org/2881563002
,
May 11 2017
Okay, I've sent the fix to the CQ, so hopefully that will land shortly and then you'll pick up the fix in a few days. Sorry about that, I know how much long cycle times suck. Thanks for pointing out the bug though, super helpful! :)
,
May 11 2017
Awesome, thanks for the quick fix!
,
May 11 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/f1610b19b843e8cdc00a215074f9c9e04de9dd17

commit f1610b19b843e8cdc00a215074f9c9e04de9dd17
Author: hjd <hjd@chromium.org>
Date: Thu May 11 17:16:18 2017

memory-infra: Fix deadlock in MemoryDumpManager

When we destroy a memory dump manager (MDM is a singleton so this only happens in tests) we attempt to stop the thread under the lock. This can lead to a deadlock if there are outstanding tasks which also try to take the lock. Instead we should get a pointer to the thread under the lock, stop the thread, then finally reset the pointer which is what this CL does.

BUG= 721072
Review-Url: https://codereview.chromium.org/2881563002
Cr-Commit-Position: refs/heads/master@{#470989}

[modify] https://crrev.com/f1610b19b843e8cdc00a215074f9c9e04de9dd17/base/trace_event/memory_dump_manager.cc
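
For readers unfamiliar with this class of bug, the locking pattern the commit message describes can be sketched roughly as follows. This is a minimal standalone illustration using only the C++ standard library, not the actual memory_dump_manager.cc change. WorkerThread, Manager, PostLockedIncrement, Shutdown and RunShutdownDemo are hypothetical names invented for the example, and it assumes Shutdown is called from a single thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for the dump thread: runs posted tasks in order.
class WorkerThread {
 public:
  WorkerThread() : thread_([this] { Run(); }) {}
  ~WorkerThread() { Stop(); }

  void PostTask(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

  // Blocks until every queued task has run, then joins the thread.
  void Stop() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    if (thread_.joinable()) thread_.join();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // Stopping and nothing left to run.
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // Run outside the queue mutex.
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread thread_;  // Declared last so the other members exist first.
};

// Hypothetical manager whose tasks take the same lock that guards worker_.
class Manager {
 public:
  Manager() : worker_(std::make_unique<WorkerThread>()) {}

  void PostLockedIncrement() {
    std::lock_guard<std::mutex> guard(lock_);
    if (!worker_) return;
    worker_->PostTask([this] {
      std::lock_guard<std::mutex> task_guard(lock_);  // Task needs lock_.
      ++counter_;
    });
  }

  // Deadlock-prone shape (not used here): holding lock_ while calling
  // worker_->Stop() waits on tasks that are themselves waiting for lock_.
  void Shutdown() {
    WorkerThread* worker = nullptr;
    {
      std::lock_guard<std::mutex> guard(lock_);
      worker = worker_.get();  // 1. Get the pointer under the lock.
    }
    if (worker) worker->Stop();  // 2. Stop with the lock released.
    std::lock_guard<std::mutex> guard(lock_);
    worker_.reset();  // 3. Finally reset the pointer.
  }

  int counter() {
    std::lock_guard<std::mutex> guard(lock_);
    return counter_;
  }

 private:
  std::mutex lock_;
  std::unique_ptr<WorkerThread> worker_;
  int counter_ = 0;
};

// Posts tasks that need the lock, shuts down, and reports how many ran.
int RunShutdownDemo() {
  Manager manager;
  for (int i = 0; i < 100; ++i) manager.PostLockedIncrement();
  manager.Shutdown();
  return manager.counter();
}
```

In this sketch, moving the Stop() call inside the first locked scope would reproduce the hang whenever a queued task is still pending at shutdown, which matches the intermittent nature of the timeout reported above.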
,
Jun 1 2017
Hopefully everything is fixed at this point and the cycle time of the internal bots has recovered. Please reopen if not :)
,
Jun 1 2017
Ah, sorry, yes everything appears to be good now. Thanks!
Comment 1 by mbjorge@chromium.org
, May 10 2017