New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 757162 link

Starred by 8 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug



Sign in to add a comment

Data race in pthread_mutex_lock

Project Member Reported by ClusterFuzz, Aug 19 2017

Issue description

Detailed report: https://clusterfuzz.com/testcase?key=6030249841393664

Fuzzer: tokenfuzz_pdf_april16
Job Type: linux_tsan_chrome_mp
Platform Id: linux

Crash Type: Data race ATOMIC READ 1
Crash Address: 0x7b0c00068be0
Crash State:
  pthread_mutex_lock
  pa_mutex_lock
  media::pulse::InitPulse
  
Sanitizer: thread (TSAN)

Regressed: https://clusterfuzz.com/revisions?job=linux_tsan_chrome_mp&range=495506:495507

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=6030249841393664

Issue filed automatically.

See https://github.com/google/clusterfuzz-tools for more information.
 
 Issue 758877  has been merged into this issue.
Status: WontFix (was: Untriaged)
As per Bug ID -- 747664, ClusterFuzz detected the Crash state as flaky, so marking the issue as Won't Fix.
Thank You.
Project Member

Comment 3 by ClusterFuzz, Sep 5 2017

Labels: ReleaseBlock-Stable ClusterFuzz-Top-Crash
Testcase 5568827730690048 is a top crash on ClusterFuzz for linux platform. Marking this crash as a stable release blocker. If this is incorrect, remove the ReleaseBlock label.
Status: Untriaged (was: WontFix)
c#1 testcase still reproduces.
Project Member

Comment 5 by sheriffbot@chromium.org, Sep 6 2017

This issue is marked as a release blocker with no milestone associated. Please add an appropriate milestone.

All release blocking issues should have milestones associated to it, so that the issue can tracked and the fixes can be pushed promptly.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Cc: xhw...@chromium.org
Components: Blink>Media
Owner: dalecur...@chromium.org
Status: Assigned (was: Untriaged)
Dale, can you help to triage.
Hmm, seems like a bug in pulse audio? Not sure there's anything we can do here. Maybe try reporting to Pulse upstream or upgrading the Pulse version on the CF bots?
Labels: -ReleaseBlock-Stable
Actually, maybe we're supposed to acquire the lock before start. I'll review the pulse examples.
Cc: tommi@chromium.org olka@chromium.org
Components: -Blink>Media Internals>Media>Audio
They mix and match inside the pulse source tree, so I'm not sure which is most correct, but the simple test case does things differently then us.

https://github.com/pulseaudio/pulseaudio/blob/5de4b652cb578f6386e550eba4c5dec7dc0a81b0/src/pulse/simple.c

I'll try that and see if that helps here. Still dropping RBS since this has been around forever.

cc: some other audio folk who might know.

Comment 11 by aarya@google.com, Sep 8 2017

Cc: msrchandra@chromium.org alokp@chromium.org
 Issue 761660  has been merged into this issue.
Project Member

Comment 13 by ClusterFuzz, Sep 15 2017

Labels: ClusterFuzz-Verified
Status: Verified (was: Started)
ClusterFuzz testcase 5568827730690048 is verified as fixed, so closing issue as verified.

If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
Cc: pnangunoori@chromium.org dalecur...@chromium.org
 Issue 765982  has been merged into this issue.
Labels: ClusterFuzz-Wrong
Status: Assigned (was: Verified)
Project Member

Comment 16 by bugdroid1@chromium.org, Sep 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5f5a83284c3a137e1a848c7590996c1f637526fe

commit 5f5a83284c3a137e1a848c7590996c1f637526fe
Author: Dale Curtis <dalecurtis@chromium.org>
Date: Mon Sep 18 20:51:49 2017

Reorder pulseaudio initialization routines to match pa_simple_new().

Changes the order of where we start the pa_threaded_mainloop to be
after we've configured the mainloop and context callbacks. This is
the same model used by pa_simple_new().

https://github.com/pulseaudio/pulseaudio/blob/master/src/pulse/simple.c#L137

BUG= 757162 
TEST=audio still works.

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: Iec04fae58ba0d2b1849f5a95e030b8bb76ef9fc0
Reviewed-on: https://chromium-review.googlesource.com/664362
Reviewed-by: Olga Sharonova <olka@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502670}
[modify] https://crrev.com/5f5a83284c3a137e1a848c7590996c1f637526fe/media/audio/pulse/pulse_util.cc

Labels: -ClusterFuzz-Wrong
ClusterFuzz can't do much with verification here since it is a flaky data race crash. To verify this, we just need to open https://clusterfuzz.com/v2/testcase-detail/6030249841393664?noredirect=1 and click on stats button to see crash trends/frequency and see if it gets fixed with that.
Cc: kkaluri@chromium.org
 Issue 766647  has been merged into this issue.
Seems the crash has moved along a little bit, but still of the same type:

https://cs.chromium.org/chromium/src/media/audio/pulse/pulse_util.cc?l=162

Not sure how to fix this one. There are some differences in our signal callback, i.e. we may be waking up the signal too early. I'll see if that helps, but if not our code matches what pulse has so I'm not sure what to do here otherwise.
Labels: ClusterFuzz-Ignore
Adding ClusterFuzz-Ignore so that this is not refiled.
 Issue 767592  has been merged into this issue.
Cc: kbr@chromium.org rtoy@chromium.org hongchan@chromium.org
 Issue 768603  has been merged into this issue.
Cc: glider@chromium.org
Actually it looks like this might be a dupe of issue 244856, which has a TSAN suppression present for it:

https://cs.chromium.org/chromium/src/build/sanitizers/tsan_suppressions.cc?l=86

+glider. I have one more fix I'll try, but if that doesn't work I think we'll need to suppress since we're otherwise following the recommended model from Pulse.
Hmm, finally got this to repro on my desktop. Looking at the pulse audio 4.0 code (current version 11.0!!), the bit we're failing at (I think) is:

https://github.com/pulseaudio/pulseaudio/blob/v4.0/src/pulsecore/thread-posix.c#L150

I'm not sure whether this is a case of pulseaudio having bad threading or if it's usage of atomics is confusing to TSAN, but in either case I haven't been able to find a workaround, so I'm just going to expand the existing suppression for this case.
Are we still talking about the bug reported at https://clusterfuzz.com/v2/testcase-detail/6030249841393664?noredirect=1? Or is it https://clusterfuzz.com/v2/testcase-detail/6170133147156480?noredirect=1 already?

Both report data races on a pthread_mutex_t, so I doubt pa_thread_self() is related.
The first one; the stack is incomplete, but pa_thread_self is the only place from pa_threaded_mainloop_wait() which calls pa_xmalloc AFAICT. Which, unless I'm misreading, is racing with pthread_mutex_lock per the trace.
Project Member

Comment 28 by bugdroid1@chromium.org, Sep 27 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4419eeffc1defc72a1c1cab8404f43e7af95316d

commit 4419eeffc1defc72a1c1cab8404f43e7af95316d
Author: Dale Curtis <dalecurtis@chromium.org>
Date: Wed Sep 27 17:13:04 2017

Update suppression for pulse audio tsan issues.

Seems the signature for this has changed slightly for some reason.

BUG=244856, 757162 
TEST=no more race

Change-Id: Ia733eb93e65e27a93d978c2b15e5901549335e5b
Reviewed-on: https://chromium-review.googlesource.com/683156
Reviewed-by: Alexander Potapenko <glider@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#504713}
[modify] https://crrev.com/4419eeffc1defc72a1c1cab8404f43e7af95316d/build/sanitizers/tsan_suppressions.cc

Hmm, suppression doesn't seem to be taking affect, but did appear to work locally.

@glider, what do you recommend as a suppression for:

  Atomic read of size 1 at 0x7b0c00009930 by thread T1:
    #0 pthread_mutex_lock ??:? (audio_unittests+0x47399e)
    #1 pa_mutex_lock ??:? (libpulsecommon-4.0.so+0x4508d)

Just pa_mutex_lock? Not a lot of stack unfortunately :/

https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.gpu.fyi%2FLinux_GPU_TSAN_Release%2F9668%2F%2B%2Frecipes%2Fsteps%2Faudio_unittests_on_NVIDIA_GPU_on_Linux_on_Ubuntu%2F0%2Flogs%2FAudioInputTest.Record%2F0
libpulsecommon-4.0.so, maybe?
Oh neat, I didn't know you could create suppressions like that:

https://chromium-review.googlesource.com/c/chromium/src/+/690578
Project Member

Comment 32 by ClusterFuzz, Oct 1 2017

Components: Blink>Media>Audio
Labels: Test-Predator-AutoComponents
Automatically applying components based on information from OWNERS files. If this seems incorrect, please apply the Test-Predator-Wrong-Components label.
Project Member

Comment 33 by bugdroid1@chromium.org, Oct 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bef716c8e4935748cd5a5f226786608401bbfc1c

commit bef716c8e4935748cd5a5f226786608401bbfc1c
Author: Dale Curtis <dalecurtis@chromium.org>
Date: Mon Oct 02 17:26:04 2017

Suppress all of libpulsecommon*.so, seems no smaller suppression.

More targeted suppressions don't seem sufficient and there's not
a lot of stack to go on, so lets add a blanket suppression for
pulseaudio, since our bots and dev machines are running ancient
versions.

BUG=244856, 757162 
TEST=none

Change-Id: Ife022ff1584296279dde9bc4a9831fba01138f83
Reviewed-on: https://chromium-review.googlesource.com/690578
Reviewed-by: Alexander Potapenko <glider@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#505661}
[modify] https://crrev.com/bef716c8e4935748cd5a5f226786608401bbfc1c/build/sanitizers/tsan_suppressions.cc

Cc: ajha@chromium.org
 Issue 767904  has been merged into this issue.
Status: Fixed (was: Assigned)
Stats have gone to zero, so it seems fixed/suppressed now.
Labels: -Test-Predator-AutoComponents Test-Predator-Auto-Components

Sign in to add a comment