Crash in rtc::FatalMessage::~FatalMessage
Issue description
Detailed report: https://clusterfuzz.com/testcase?key=6244586442981376
Fuzzer: inferno_layout_test_unmodified
Job Type: android_asan_chrome_latest
Platform Id: android:hammerhead:l
Crash Type: UNKNOWN
Crash Address:
Crash State:
  rtc::FatalMessage::~FatalMessage
  rtc::PlatformThread::Start
  webrtc::internal::Call::Call
Sanitizer: address (ASAN)
Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=6244586442981376
Issue filed automatically.
See https://dev.chromium.org/Home/chromium-security/bugs/reproducing-clusterfuzz-bugs for more information.
Jul 31 2017
I did a bisect and the problem started with this WebRTC roll: https://chromium.googlesource.com/chromium/src/+/714a743dbf866eb7bd2084c58217d36ce6f4881b WebRTC range: https://chromium.googlesource.com/external/webrtc/trunk/webrtc.git/+log/d6b3a36..b2a8855
Jul 31 2017
A bisect of the WebRTC regression range shows that the crash starts with this CL: https://chromium.googlesource.com/external/webrtc/trunk/webrtc.git/+/c0ff88b15124baa1dbcd671f4b7f8ffeba5b7144 Assigning to nisse@, who authored it.
Jul 31 2017
Jul 31 2017
Jul 31 2017
Aug 20 2017
Sep 5 2017
ClusterFuzz testcase 6244586442981376 is flaky and no longer crashes, so closing issue. If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
Sep 7 2017
nisse@: Can you take a look at this? Apparently it's flaky, but a few weeks ago I could reproduce it reliably and tracked it to the CL in #3.
Sep 7 2017
Issue 762366 has been merged into this issue.
Sep 7 2017
I'll investigate.
Sep 8 2017
Log messages just prior to the crash:
[1:18:0908/113418.666550:ERROR:platform_thread_posix.cc(123)] pthread_create: Resource temporarily unavailable (11)
[1:18:0908/113418.667014:ERROR:thread.cc(117)] failed to create thread
[1:18:0908/113418.667369:FATAL:task_queue.cc(104)] Check failed: result.
#0 0x7f4375c7869d base::debug::StackTrace::StackTrace()
#1 0x7f4375c76a6c base::debug::StackTrace::StackTrace()
#2 0x7f4375d0762a logging::LogMessage::~LogMessage()
#3 0x7f4370446d07 rtc::TaskQueue::TaskQueue()
#4 0x7f437162126f webrtc::internal::Call::Call()
#5 0x7f43716205a3 webrtc::Call::Create()
#6 0x7f437163a679 webrtc::CallFactory::CreateCall()
#7 0x7f4371dfd93b webrtc::PeerConnectionFactory::CreateCall_w()
My interpretation is as follows: the ClusterFuzz script creates lots of peer connections (I could add some logging to the PeerConnection constructor and destructor to confirm this). These peer connections consume a lot of threads. Eventually, creating a new thread fails, and a CHECK in Chrome's TaskQueue constructor then crashes the process. Improving error handling in this case seems difficult. It could also be that we have a thread leak when a peer connection is destroyed.
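For context on the first log line: pthread_create returns the error number directly rather than setting errno, and EAGAIN (11 on Linux, "Resource temporarily unavailable") means insufficient resources or a system-imposed thread limit. A minimal sketch, using a hypothetical helper name TryStartThread that is not part of WebRTC or Chromium, of reporting that failure to the caller instead of CHECK-failing. It does not solve the harder part noted above: a constructor such as TaskQueue's has no return value through which to propagate the error.

```cpp
#include <cerrno>
#include <cstdio>
#include <pthread.h>

// Hypothetical helper: starts a joinable thread and reports failure to the
// caller instead of aborting. On success, *out holds the new thread handle.
bool TryStartThread(void* (*entry)(void*), void* arg, pthread_t* out) {
  // pthread_create returns the error code; it does not set errno.
  int err = pthread_create(out, nullptr, entry, arg);
  if (err != 0) {
    // EAGAIN: insufficient resources or a system-imposed thread limit.
    std::fprintf(stderr, "pthread_create failed: %d%s\n", err,
                 err == EAGAIN ? " (thread/resource limit reached)" : "");
    return false;
  }
  return true;
}
```

A caller that can fail, for example a factory function, could then return an error (or a null object) up the stack rather than crashing the renderer.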
Sep 8 2017
I've added logging of the number of peer connections and the number of threads to the PeerConnection constructor and destructor. The test creates 10402 peer connections before crashing and never destroys any. Log excerpt:
[133474:133525:0908/134015.685199:WARNING:peerconnection.cc(420)] PC, pid 133474: created 1, destroyed 0, #threads 26
...
[133474:133525:0908/135021.434895:WARNING:peerconnection.cc(420)] PC, pid 133474: created 10402, destroyed 0, #threads 31246
[133474:133526:0908/135021.443370:WARNING:rtc_event_log.cc(833)] Denied creation of additional WebRTC event logs. 5 logs open already.
[133474:133526:0908/135021.443974:ERROR:platform_thread_posix.cc(123)] pthread_create: Resource temporarily unavailable (11)
[133474:133526:0908/135021.444316:ERROR:thread.cc(117)] failed to create thread
[133474:133526:0908/135021.444500:FATAL:task_queue.cc(104)] Check failed: result.
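The first and last "PC" log lines bound the average thread cost of one peer connection. A small sketch of that arithmetic, with a hypothetical helper name, assuming nothing else in the process created or destroyed threads during the run: (31246 - 26) / (10402 - 1) = 31220 / 10401, which is about 3.0, i.e. roughly three threads per peer connection.

```cpp
// Hypothetical helper: average number of threads added per peer connection
// between two (peer connections created, thread count) samples from the log.
double ThreadsPerConnection(long pcs_a, long threads_a,
                            long pcs_b, long threads_b) {
  return static_cast<double>(threads_b - threads_a) /
         static_cast<double>(pcs_b - pcs_a);
}

// Usage with the first and last log lines above:
//   ThreadsPerConnection(1, 26, 10402, 31246)  // about 3.0016
```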
So it seems the process can't spawn more than approximately 32,000 threads (tested on my Linux workstation).
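The thread does not establish which Linux limit was actually hit. A sketch, assuming a Linux host with /proc mounted, that reads the usual candidates: the system-wide kernel.threads-max, kernel.pid_max (thread IDs are allocated from the same ID space as process IDs), vm.max_map_count (each thread stack typically occupies its own memory mappings), and the RLIMIT_NPROC soft limit, which on Linux counts threads per real user ID. The struct and function names are hypothetical. Like the other /proc-based checks in this comment, it is meant for a standalone process or for Chrome run with --no-sandbox.

```cpp
#include <fstream>
#include <string>
#include <sys/resource.h>

// Reads a single integer from a /proc file; returns -1 if unavailable.
long ReadProcLong(const std::string& path) {
  std::ifstream in(path);
  long value = -1;
  if (!(in >> value)) return -1;
  return value;
}

// Hypothetical snapshot of limits that can cap the number of threads.
struct ThreadLimits {
  long threads_max;    // /proc/sys/kernel/threads-max (system-wide)
  long pid_max;        // /proc/sys/kernel/pid_max (shared PID/TID space)
  long max_map_count;  // /proc/sys/vm/max_map_count (mappings per process)
  rlim_t nproc_soft;   // RLIMIT_NPROC soft limit; 0 if getrlimit failed
};

ThreadLimits GetThreadLimits() {
  ThreadLimits limits{};
  limits.threads_max = ReadProcLong("/proc/sys/kernel/threads-max");
  limits.pid_max = ReadProcLong("/proc/sys/kernel/pid_max");
  limits.max_map_count = ReadProcLong("/proc/sys/vm/max_map_count");
  struct rlimit rl;
  limits.nproc_soft = (getrlimit(RLIMIT_NPROC, &rl) == 0) ? rl.rlim_cur : 0;
  return limits;
}
```

Comparing these values against the roughly 31,000 threads seen in the log would show which ceiling is closest; the sketch does not decide that on its own.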
There's little we can do in WebRTC about exhaustion of OS threads. It would be good if some of you on the Chrome team could take over the issue.
Maybe Chromium could enforce some arbitrary limit on the number of peer connections per render process, so that it fails in a nicer manner? Otherwise, ClusterFuzz needs to stop creating 10,000 peer connections.
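A minimal sketch of the first suggestion, assuming a hypothetical process-wide counter that the peer connection creation path consults before any threads are allocated. The class name and whatever cap is passed in are illustrative, not an existing Chromium API. A compare-and-swap loop keeps the check and the increment atomic, so concurrent creations cannot overshoot the cap.

```cpp
#include <atomic>

// Hypothetical per-process cap on live peer connections.
class PeerConnectionLimiter {
 public:
  explicit PeerConnectionLimiter(int max) : max_(max) {}

  // Reserves a slot. Returns false when the cap has been reached, so the
  // caller can fail creation cleanly instead of exhausting OS threads.
  bool TryAcquire() {
    int current = count_.load(std::memory_order_relaxed);
    while (current < max_) {
      // On failure, compare_exchange_weak reloads `current`; loop re-checks.
      if (count_.compare_exchange_weak(current, current + 1,
                                       std::memory_order_relaxed)) {
        return true;
      }
    }
    return false;
  }

  // Frees a slot when a peer connection is destroyed.
  void Release() { count_.fetch_sub(1, std::memory_order_relaxed); }

  int count() const { return count_.load(std::memory_order_relaxed); }

 private:
  const int max_;
  std::atomic<int> count_{0};
};
```

With roughly three threads per peer connection on this setup, any cap well below 10,000 would keep a renderer far from the thread ceiling; the exact value is a policy choice.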
BTW: This function returns the number of threads on Linux (requires --no-sandbox when running Chrome):
#include <sys/stat.h>

// Returns the number of threads in the current process, or 0 on failure.
int GetThreadCount() {
  struct stat st;
  if (stat("/proc/self/task/", &st) < 0)
    return 0;
  // 2 for "." and "..", plus one link per thread subdirectory.
  return st.st_nlink - 2;
}
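An alternative way to obtain the same number, sketched here under the assumption of a Linux /proc filesystem, is to parse the "Threads:" field of /proc/self/status rather than relying on the link count of /proc/self/task. The function name is hypothetical, and the same sandbox caveat applies.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns the "Threads:" value from /proc/self/status, or -1 if unavailable.
int GetThreadCountFromStatus() {
  std::ifstream status("/proc/self/status");
  std::string line;
  while (std::getline(status, line)) {
    if (line.rfind("Threads:", 0) == 0) {   // line starts with "Threads:"
      std::istringstream fields(line.substr(8));  // skip "Threads:"
      int n = -1;
      if (fields >> n) return n;
      return -1;
    }
  }
  return -1;
}
```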
Sep 12 2017
I've filed bug https://bugs.chromium.org/p/chromium/issues/detail?id=764265 about adding a limit on the number of peer connections. Closing as WontFix, since we don't try to recover gracefully from resource exhaustion.
Sep 20 2017
ClusterFuzz testcase 5386304924942336 is still reproducing on tip-of-tree build (trunk). If this testcase was not reproducible locally or unworkable, ignore this notification and we will file another bug soon with hopefully a better and workable testcase. Otherwise, if this is not intended to be fixed (e.g. this is an intentional crash), please add ClusterFuzz-Ignore label to prevent future bug filing with similar crash stacktrace.
Comment 1 by msrchandra@chromium.org, Jul 10 2017
Components: Blink>WebRTC
Labels: Test-Predator-Correct-CLs