[TaskScheduler] Slow thread creation results in browser hangs [was: Chrome_ChromeOS: Crash Report - base::`anonymous namespace'::CreateThread]
Issue description
reporter: mkarkada@google.com
crash_analysis_section:start
crash_analysis_section:end
Magic Signature: base::`anonymous namespace'::CreateThread
Crash link: https://crash.corp.google.com//browse?q=ReportID%3D'0f8387e722d4f8a1'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'base%3A%3A%60anonymous%20namespace%5C'%3A%3ACreateThread'&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3
-------------------------------------------------------------------------------
Sample Report
-------------------------------------------------------------------------------
Product name: Chrome_ChromeOS
Magic Signature: base::`anonymous namespace'::CreateThread
Product Version: 62.0.3202.55
Process type: browser
Report ID: 0f8387e722d4f8a1
Report Url: https://crash.corp.google.com/0f8387e722d4f8a1
Report Time: 2017-10-16T16:05:37-07:00
Upload Time: 2017-10-16T16:06:14.967-07:00
Uptime: 258617134 ms
CumulativeProductUptime: 0 ms
OS Name: Linux
OS Version: 0.0.0 Linux 3.18.0-16036-g30f3f9ed6ff1 #1 SMP PREEMPT Thu Oct 12 22:01:26 PDT 2017 x86_64
CPU Architecture: amd64
CPU Info: family 6 model 78 stepping 3
-------------------------------------------------------------------------------
Crashing thread: Thread index: 11. Stack Quality: 99%. Thread id: 26957.
-------------------------------------------------------------------------------
0x00007dfd16ce2f6f (libc-2.23.so + 0x000f6f6f) clone
0x00007dfd17d8ca38 (libpthread-2.23.so - pthread_create.c: 679) __pthread_create_2_1
0x00000b80af677e54 (chrome - platform_thread_posix.cc: 114) base::(anonymous namespace)::CreateThread(unsigned long, bool, base::PlatformThread::Delegate*, base::PlatformThreadHandle*, base::ThreadPriority)
0x00000b80af6b092a (chrome - scheduler_worker.cc: 118) base::internal::SchedulerWorker::Thread::Create(scoped_refptr<base::internal::SchedulerWorker>)
0x00000b80af6b0801 (chrome - scheduler_worker.cc: 206) base::internal::SchedulerWorker::Start()
0x00000b80af6b1f67 (chrome - scheduler_worker_pool_impl.cc: 612) base::internal::SchedulerWorkerPoolImpl::CreateRegisterAndStartSchedulerWorkerLockRequired()
0x00000b80af6b2d0b (chrome - scheduler_worker_pool_impl.cc: 565) base::internal::SchedulerWorkerPoolImpl::WakeUpOneWorkerLockRequired()
0x00000b80af6b2297 (chrome - scheduler_worker_pool_impl.cc: 553) base::internal::SchedulerWorkerPoolImpl::ScheduleSequence(scoped_refptr<base::internal::Sequence>)
0x00000b80af6b11a3 (chrome - scheduler_worker_pool.cc: 186) base::internal::SchedulerWorkerPool::PostTaskWithSequenceNow(std::unique_ptr<base::internal::Task, std::default_delete<base::internal::Task> >, scoped_refptr<base::internal::Sequence>)
0x00000b80af6b10de (chrome - scheduler_worker_pool.cc: 132) base::internal::SchedulerWorkerPool::PostTaskWithSequence(std::unique_ptr<base::internal::Task, std::default_delete<base::internal::Task> >, scoped_refptr<base::internal::Sequence>)
0x00000b80af6b14f6 (chrome - scheduler_worker_pool.cc: 85) base::internal::SchedulerSequencedTaskRunner::PostDelayedTask(tracked_objects::Location const&, base::OnceCallback<void ()>, base::TimeDelta)
0x00000b80af66cd36 (chrome - task_runner.cc: 47) base::(anonymous namespace)::PostTaskAndReplyTaskRunner::PostTask(tracked_objects::Location const&, base::OnceCallback<void ()>)
0x00000b80af6782a2 (chrome - post_task_and_reply_impl.cc: 91) base::internal::PostTaskAndReplyImpl::PostTaskAndReply(tracked_objects::Location const&, base::OnceCallback<void ()>, base::OnceCallback<void ()>)
0x00000b80af66cc80 (chrome - task_runner.cc: 53) base::TaskRunner::PostTaskAndReply(tracked_objects::Location const&, base::OnceCallback<void ()>, base::OnceCallback<void ()>)
0x00000b80ae9ed80b (chrome - crash_handler_host_linux.cc: 394) breakpad::CrashHandlerHostLinux::FindCrashingThreadAndDump(int, std::string const&, std::unique_ptr<char [], std::default_delete<char []> >, std::unique_ptr<google_breakpad::NonAllocatingMap<256ul, 256ul, 64ul>, std::default_delete<google_breakpad::NonAllocatingMap<256ul, 256ul, 64ul> > >, unsigned long, unsigned long, int, int)
0x00000b80ae9ed315 (chrome - crash_handler_host_linux.cc: 297) breakpad::CrashHandlerHostLinux::OnFileCanReadWithoutBlocking(int)
0x00000b80af635e58 (chrome - message_pump_libevent.cc: 97) base::MessagePumpLibevent::OnLibeventNotification(int, short, void*)
0x00000b80af6be50b (chrome - event.c: 381) event_base_loop
0x00000b80af636184 (chrome - message_pump_libevent.cc: 257) base::MessagePumpLibevent::Run(base::MessagePump::Delegate*)
0x00000b80af656485 (chrome - run_loop.cc: 123) base::RunLoop::Run()
0x00000b80aded7010 (chrome - browser_thread_impl.cc: 278) content::BrowserThreadImpl::IOThreadRun(base::RunLoop*)
0x00000b80aded70f8 (chrome - browser_thread_impl.cc: 313) content::BrowserThreadImpl::Run(base::RunLoop*)
0x00000b80af67d149 (chrome - thread.cc: 338) base::Thread::ThreadMain()
0x00000b80af67816c (chrome - platform_thread_posix.cc: 75) base::(anonymous namespace)::ThreadFunc(void*)
0x00007dfd17d8c2b7 (libpthread-2.23.so - pthread_create.c: 333) start_thread
0x00007dfd16ce2fac (libc-2.23.so + 0x000f6fac) clone
,
Oct 18 2017
+fukino@ - FYI
,
Oct 19 2017
From which place did you copy the file?
,
Oct 19 2017
I tried copying a 1 GB file from the Downloads folder back into the Downloads folder, and did this 8 times consecutively. Although the copy operations kept progressing, I observed intermittent hangs and eventually a browser crash.
,
Jan 12 2018
Issue 801199 has been merged into this issue.
,
Jan 26 2018
,
Jan 28 2018
It looks like a Linux kernel bug to me (can you easily repro? can you grab a system trace of some sort? (I'm not familiar with Linux..)).

The thread count is only 47 (so this isn't TaskScheduler being in a CreateThread() frenzy per having many threads blocked on I/O per some file observer going nuts with notifications of these new files -- which was my first guess when seeing this report).

The SIGABRT is in __pthread_create(), perhaps it has a timeout of its own and perhaps Linux uses multi-threaded file copy and bogs down itself while doing this massive copy?

Now, while we can't do much about the kernel deciding to SIGABRT: I think we can help with the hangs caused by pthread_create being slow. On the TaskScheduler end it looks like we're holding the SchedulerWorkerPoolImpl::lock_ while creating a new worker. In this crash we see that two other worker threads are blocked on grabbing this lock to continue making progress. This lock inter-dependency is mostly unintentional actually as it was intended for the wake-up scenario, not creation [1].

Creating the physical thread can totally happen without holding the pool's lock IMO, the tricky part will be that we'll have an un-started worker in the list of |workers_|. This should mostly be fine (I think) but we'll need to support SchedulerWorker::WakeUp() not being a no-op if the thread isn't alive but rather guarantee that when it does come alive, it starts active rather than waiting for the first wake-up as it does today [2].

@fdoray per this being related to dynamic worker creation.

[1]
// SchedulerWorker needs |lock_| as a predecessor for its thread lock
// because in WakeUpOneWorker, |lock_| is first acquired and then
// the thread lock is acquired when WakeUp is called on the worker.
https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_pool_impl.cc?rcl=43e41f18912dc2cf5445b51b451490ef9d495d62&l=779

[2] ThreadMain :
// A SchedulerWorker starts out waiting for work.
https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker.cc?rcl=f25aebd1286cbcea060c7b3b3551d58e203f857c&l=42
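
For illustration only, here is a minimal sketch of the idea in this comment: register the worker and record its wake-up while holding the pool lock, then create the OS thread after releasing the lock. This is not Chromium code. SketchWorker and SketchPool are hypothetical stand-ins for SchedulerWorker and SchedulerWorkerPoolImpl, written against the C++ standard library only, and the "task" is reduced to incrementing a counter.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for SchedulerWorker (assumption, not the real class).
class SketchWorker {
 public:
  ~SketchWorker() { Join(); }

  // Creates the OS thread. This is the potentially slow call, so the caller
  // is expected NOT to hold the pool lock here.
  void Start() {
    assert(!thread_.joinable());
    thread_ = std::thread([this] { ThreadMain(); });
  }

  // Safe to call before Start(): the wake-up is latched in |wake_pending_|,
  // so the thread starts active instead of waiting for a later wake-up.
  void WakeUp() {
    {
      std::lock_guard<std::mutex> guard(thread_lock_);
      wake_pending_ = true;
    }
    cv_.notify_one();
  }

  void Join() {
    if (thread_.joinable()) thread_.join();
  }

  int runs() const { return runs_.load(); }

 private:
  void ThreadMain() {
    std::unique_lock<std::mutex> guard(thread_lock_);
    // Returns immediately if WakeUp() already happened before the thread
    // came alive.
    cv_.wait(guard, [this] { return wake_pending_; });
    wake_pending_ = false;
    ++runs_;  // Placeholder for "run one task".
  }

  std::mutex thread_lock_;
  std::condition_variable cv_;
  bool wake_pending_ = false;
  std::atomic<int> runs_{0};
  std::thread thread_;
};

// Hypothetical stand-in for the worker pool (assumption, not the real class).
class SketchPool {
 public:
  SketchWorker* AddWorker() {
    SketchWorker* worker = nullptr;
    {
      // Pool lock first, then the worker's own lock inside WakeUp(): the
      // same lock order as the comment quoted in [1].
      std::lock_guard<std::mutex> guard(lock_);
      workers_.push_back(std::make_unique<SketchWorker>());
      worker = workers_.back().get();
      worker->WakeUp();  // The worker is registered but not yet started.
    }
    worker->Start();  // Slow part; |lock_| is no longer held.
    return worker;
  }

  std::size_t NumWorkers() {
    std::lock_guard<std::mutex> guard(lock_);
    return workers_.size();
  }

 private:
  std::mutex lock_;
  std::vector<std::unique_ptr<SketchWorker>> workers_;
};
```

In this sketch, other threads that need the pool lock are only blocked for the short registration step, not for the duration of thread creation. A real change would also have to handle thread-creation failure and workers that are detached or cleaned up while un-started, which the sketch ignores.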
,
Jan 28 2018
s/Linux kernel/ChromeOS kernel/ in #7 (misread OP)
,
Jan 28 2018
I actually think there's a kernel issue on CrOS (which makes pthread_create slow and eventually crash) which exercises a suboptimal impl in TaskScheduler. Perhaps this should be two bugs?
,
Mar 26 2018
Users experienced this crash on the following builds: Linux Dev 67.0.3377.1 - 1.01 CPM, 1 reports, 1 clients (signature base::`anonymous namespace'::CreateThread) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Apr 10 2018
+scottmg@ for breakpad

Observation: breakpad%FindCrashingThreadAndDump() is involved in 63457/107952 (~59%) of crash call stacks. Maybe it's not a good idea to create a thread (via a post task to TaskScheduler) when the process is already crashing?

Reports with FindCrashingThreadAndDump: https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3A%60anonymous%20namespace%5C%27%3A%3ACreateThread%27%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(CrashedStackTrace.StackFrame)%20WHERE%20FunctionName%20LIKE%20%27%25breakpad%25FindCrashingThreadAndDump%25%27)&stbtiq=&reportid=44fa1ab25b43c97c&index=1#3

All reports: https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3A%60anonymous%20namespace%5C%27%3A%3ACreateThread%27&stbtiq=&reportid=ae02727483d29f47&index=1#
,
Apr 10 2018
I'm not too familiar with Breakpad and even less so with Linux Breakpad, but it doesn't seem like a good idea to be creating a thread during a crash. (One example where it looks like we do that https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3A%60anonymous%20namespace%5C%27%3A%3ACreateThread%27%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(CrashedStackTrace.StackFrame)%20WHERE%20FunctionName%20LIKE%20%27%25breakpad%25FindCrashingThreadAndDump%25%27)&stbtiq=&reportid=46566d5ed6bbcee4&index=2#3 ) I don't think there will be a reasonable solution for this until we switch to Crashpad on Linux, which does out-of-process capture.
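
As a rough illustration of why posting from a crash-handling path can end up in pthread_create, and one generic way to avoid it, here is a sketch of a task thread that is created up front so that posting only enqueues work and never creates a thread. This is not what Chromium or Breakpad does and it is not the Crashpad approach mentioned above. PrestartedTaskThread is a hypothetical helper written against the C++ standard library only.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper (assumption): a single thread created at construction
// time, long before any crash handling needs it.
class PrestartedTaskThread {
 public:
  PrestartedTaskThread() : thread_([this] { Run(); }) {}

  ~PrestartedTaskThread() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      stopping_ = true;
    }
    cv_.notify_one();
    thread_.join();  // Run() drains any queued tasks before returning.
  }

  // Only takes a lock and enqueues; no thread is created on this path.
  void Post(std::function<void()> task) {
    assert(task);  // Reject empty callbacks.
    {
      std::lock_guard<std::mutex> guard(lock_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> guard(lock_);
    for (;;) {
      cv_.wait(guard, [this] { return stopping_ || !tasks_.empty(); });
      if (tasks_.empty()) return;  // Stopping and nothing left to run.
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      guard.unlock();  // Do not hold the queue lock while running the task.
      task();
      guard.lock();
    }
  }

  std::mutex lock_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread thread_;  // Declared last so the members above exist first.
};
```

The trade-off in this sketch is one permanently allocated thread in exchange for a posting path that does not depend on the system being able to create a new thread at the worst possible moment.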
,
Apr 13 2018
(unassigning from myself, but keeping on the Internals>TaskScheduler radar) scottmg@: When can we expect Crashpad on Linux? Can you update NextAction field with this date? Thanks!
,
Apr 13 2018
I'm currently finishing up the last bits to enable Crashpad for Android. After that I'll be moving on to ChromeOS and then Linux, generally. There is still at least one more feature needed to enable Crashpad for Chrome with pid namespacing, but I'm still expecting this sometime this quarter.
,
May 3 2018
Not TaskScheduler's fault (removing from Internals>TaskScheduler). As mentioned above: posting a task during crash reporting is problematic. @jperaza to address as part of change in #14.
,
Aug 23
Comment 1 by mkarkada@chromium.org, Oct 16 2017
Components: OS>Kernel Internals
Labels: M-63