
Issue 775274

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Chrome
Pri: 2
Type: Bug




[TaskScheduler] Slow thread creation results in browser hangs [was: Chrome_ChromeOS: Crash Report - base::`anonymous namespace'::CreateThread]

Project Member Reported by cr...@system.gserviceaccount.com, Oct 16 2017

Issue description

reporter:mkarkada@google.com

crash_analysis_section:start
crash_analysis_section:end

Magic Signature: base::`anonymous namespace'::CreateThread

Crash link: https://crash.corp.google.com//browse?q=ReportID%3D'0f8387e722d4f8a1'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'base%3A%3A%60anonymous%20namespace%5C'%3A%3ACreateThread'&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3

-------------------------------------------------------------------------------
Sample Report
-------------------------------------------------------------------------------
Product name: Chrome_ChromeOS
Magic Signature : base::`anonymous namespace'::CreateThread
Product Version: 62.0.3202.55
Process type: browser
Report ID: 0f8387e722d4f8a1
Report Url: https://crash.corp.google.com/0f8387e722d4f8a1
Report Time: 2017-10-16T16:05:37-07:00
Upload Time: 2017-10-16T16:06:14.967-07:00
Uptime: 258617134 ms
CumulativeProductUptime: 0 ms
OS Name: Linux
OS Version: 0.0.0 Linux 3.18.0-16036-g30f3f9ed6ff1 #1 SMP PREEMPT Thu Oct 12 22:01:26 PDT 2017 x86_64
CPU Architecture: amd64
CPU Info: family 6 model 78 stepping 3

-------------------------------------------------------------------------------
Crashing thread: Thread index: 11. Stack Quality: 99%. Thread id: 26957.
-------------------------------------------------------------------------------
0x00007dfd16ce2f6f (libc-2.23.so + 0x000f6f6f)	clone
0x00007dfd17d8ca38 (libpthread-2.23.so - pthread_create.c: 679)	__pthread_create_2_1
0x00000b80af677e54 (chrome - platform_thread_posix.cc: 114)	base::(anonymous namespace)::CreateThread(unsigned long, bool, base::PlatformThread::Delegate*, base::PlatformThreadHandle*, base::ThreadPriority)
0x00000b80af6b092a (chrome - scheduler_worker.cc: 118)	base::internal::SchedulerWorker::Thread::Create(scoped_refptr<base::internal::SchedulerWorker>)
0x00000b80af6b0801 (chrome - scheduler_worker.cc: 206)	base::internal::SchedulerWorker::Start()
0x00000b80af6b1f67 (chrome - scheduler_worker_pool_impl.cc: 612)	base::internal::SchedulerWorkerPoolImpl::CreateRegisterAndStartSchedulerWorkerLockRequired()
0x00000b80af6b2d0b (chrome - scheduler_worker_pool_impl.cc: 565)	base::internal::SchedulerWorkerPoolImpl::WakeUpOneWorkerLockRequired()
0x00000b80af6b2297 (chrome - scheduler_worker_pool_impl.cc: 553)	base::internal::SchedulerWorkerPoolImpl::ScheduleSequence(scoped_refptr<base::internal::Sequence>)
0x00000b80af6b11a3 (chrome - scheduler_worker_pool.cc: 186)	base::internal::SchedulerWorkerPool::PostTaskWithSequenceNow(std::unique_ptr<base::internal::Task, std::default_delete<base::internal::Task> >, scoped_refptr<base::internal::Sequence>)
0x00000b80af6b10de (chrome - scheduler_worker_pool.cc: 132)	base::internal::SchedulerWorkerPool::PostTaskWithSequence(std::unique_ptr<base::internal::Task, std::default_delete<base::internal::Task> >, scoped_refptr<base::internal::Sequence>)
0x00000b80af6b14f6 (chrome - scheduler_worker_pool.cc: 85)	base::internal::SchedulerSequencedTaskRunner::PostDelayedTask(tracked_objects::Location const&, base::OnceCallback<void ()>, base::TimeDelta)
0x00000b80af66cd36 (chrome - task_runner.cc: 47)	base::(anonymous namespace)::PostTaskAndReplyTaskRunner::PostTask(tracked_objects::Location const&, base::OnceCallback<void ()>)
0x00000b80af6782a2 (chrome - post_task_and_reply_impl.cc: 91)	base::internal::PostTaskAndReplyImpl::PostTaskAndReply(tracked_objects::Location const&, base::OnceCallback<void ()>, base::OnceCallback<void ()>)
0x00000b80af66cc80 (chrome - task_runner.cc: 53)	base::TaskRunner::PostTaskAndReply(tracked_objects::Location const&, base::OnceCallback<void ()>, base::OnceCallback<void ()>)
0x00000b80ae9ed80b (chrome - crash_handler_host_linux.cc: 394)	breakpad::CrashHandlerHostLinux::FindCrashingThreadAndDump(int, std::string const&, std::unique_ptr<char [], std::default_delete<char []> >, std::unique_ptr<google_breakpad::NonAllocatingMap<256ul, 256ul, 64ul>, std::default_delete<google_breakpad::NonAllocatingMap<256ul, 256ul, 64ul> > >, unsigned long, unsigned long, int, int)
0x00000b80ae9ed315 (chrome - crash_handler_host_linux.cc: 297)	breakpad::CrashHandlerHostLinux::OnFileCanReadWithoutBlocking(int)
0x00000b80af635e58 (chrome - message_pump_libevent.cc: 97)	base::MessagePumpLibevent::OnLibeventNotification(int, short, void*)
0x00000b80af6be50b (chrome - event.c: 381)	event_base_loop
0x00000b80af636184 (chrome - message_pump_libevent.cc: 257)	base::MessagePumpLibevent::Run(base::MessagePump::Delegate*)
0x00000b80af656485 (chrome - run_loop.cc: 123)	base::RunLoop::Run()
0x00000b80aded7010 (chrome - browser_thread_impl.cc: 278)	content::BrowserThreadImpl::IOThreadRun(base::RunLoop*)
0x00000b80aded70f8 (chrome - browser_thread_impl.cc: 313)	content::BrowserThreadImpl::Run(base::RunLoop*)
0x00000b80af67d149 (chrome - thread.cc: 338)	base::Thread::ThreadMain()
0x00000b80af67816c (chrome - platform_thread_posix.cc: 75)	base::(anonymous namespace)::ThreadFunc(void*)
0x00007dfd17d8c2b7 (libpthread-2.23.so - pthread_create.c: 333)	start_thread
0x00007dfd16ce2fac (libc-2.23.so + 0x000f6fac)	clone

 
Cc: abod...@chromium.org mkarkada@chromium.org weifangsun@chromium.org dhadd...@chromium.org sdantul...@chromium.org
Components: OS>Kernel Internals
Labels: M-63
This crash was observed while copying a large set of files (around 8GB in total) into the Downloads folder of the Files app. An intermittent device hang was also seen for about a minute, during which the touch keyboard and UI were not responding. The device later recovered from the hang.
Cc: fukino@chromium.org
+fukino@ - FYI

Comment 3 by oka@chromium.org, Oct 19 2017

From which place did you copy the file?
I tried copying a 1GB file from Downloads back into the Downloads folder, 8 times consecutively. Though the copy operation was progressing, intermittent hangs were observed and the browser eventually crashed.
Cc: rbasuvula@chromium.org
Issue 801199 has been merged into this issue.

Comment 6 by bauerb@chromium.org, Jan 26 2018

Components: Internals>TaskScheduler

Comment 7 by gab@chromium.org, Jan 28 2018

Cc: gab@chromium.org robliao@chromium.org
Labels: -Restrict-View-EditIssue Stability-Hang M-64 M-65 M-66
Owner: fdoray@chromium.org
Status: Assigned (was: Untriaged)
Summary: [TaskScheduler] Slow thread creation results in browser hangs [was: Chrome_ChromeOS: Crash Report - base::`anonymous namespace'::CreateThread] (was: Chrome_ChromeOS: Crash Report - base::`anonymous namespace'::CreateThread)
It looks like a Linux kernel bug to me (can you easily repro? can you grab a system trace of some sort? I'm not familiar with Linux). The thread count is only 47, so this isn't TaskScheduler being in a CreateThread() frenzy because many threads are blocked on I/O due to some file observer going nuts with notifications for these new files (which was my first guess when seeing this report).

The SIGABRT is in __pthread_create(). Perhaps it has a timeout of its own, and perhaps Linux uses multi-threaded file copy and bogs itself down during this massive copy?

Now, while we can't do much about the kernel deciding to SIGABRT, I think we can help with the hangs caused by pthread_create being slow.

On the TaskScheduler end, it looks like we're holding SchedulerWorkerPoolImpl::lock_ while creating a new worker. In this crash we see that two other worker threads are blocked on grabbing this lock to continue making progress. This lock inter-dependency is actually mostly unintentional, as it was intended for the wake-up scenario, not creation [1]. Creating the physical thread can totally happen without holding the pool's lock IMO; the tricky part is that we'll then have an un-started worker in the list of |workers_|. This should mostly be fine (I think), but SchedulerWorker::WakeUp() can no longer be a no-op when the thread isn't alive: it must instead guarantee that when the thread does come alive, it starts active rather than waiting for the first wake-up as it does today [2].

@fdoray per this being related to dynamic worker creation.

[1]
  // SchedulerWorker needs |lock_| as a predecessor for its thread lock
  // because in WakeUpOneWorker, |lock_| is first acquired and then
  // the thread lock is acquired when WakeUp is called on the worker.
https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_pool_impl.cc?rcl=43e41f18912dc2cf5445b51b451490ef9d495d62&l=779

[2]
  ThreadMain :  // A SchedulerWorker starts out waiting for work.
https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker.cc?rcl=f25aebd1286cbcea060c7b3b3551d58e203f857c&l=42
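
The change proposed above could look roughly like the following. This is a minimal standalone sketch, not the actual TaskScheduler code: `Worker`, `Pool`, `AddWorkerAndWakeUp()` and `pending_wake_up_` are hypothetical names, and `std::thread` stands in for base::PlatformThread. It only shows the two points made above: the worker is registered under the pool lock but its thread is created after the lock is released, and a WakeUp() issued before the thread exists is remembered so the thread starts active.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a pool worker (not Chromium's SchedulerWorker).
class Worker {
 public:
  ~Worker() { Join(); }

  // Creates the OS thread. This is the potentially slow call, so it must be
  // made without holding the pool lock.
  void Start() { thread_ = std::thread([this] { ThreadMain(); }); }

  // Safe to call before Start(): the wake-up is recorded under the worker's
  // own lock and consumed once the thread comes alive.
  void WakeUp() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      pending_wake_up_ = true;
    }
    cv_.notify_one();
  }

  void Join() {
    if (thread_.joinable())
      thread_.join();
  }

  bool did_work() const {
    std::lock_guard<std::mutex> guard(lock_);
    return did_work_;
  }

 private:
  void ThreadMain() {
    std::unique_lock<std::mutex> guard(lock_);
    // Returns immediately if WakeUp() already ran, so no wake-up is lost.
    cv_.wait(guard, [this] { return pending_wake_up_; });
    pending_wake_up_ = false;
    did_work_ = true;  // Placeholder for running one task.
  }

  mutable std::mutex lock_;
  std::condition_variable cv_;
  bool pending_wake_up_ = false;
  bool did_work_ = false;
  std::thread thread_;
};

// Hypothetical stand-in for the worker pool.
class Pool {
 public:
  Worker* AddWorkerAndWakeUp() {
    Worker* worker = nullptr;
    {
      // Pool lock first, then worker lock (inside WakeUp), matching the
      // lock ordering described in [1].
      std::lock_guard<std::mutex> guard(lock_);
      workers_.push_back(std::make_unique<Worker>());
      worker = workers_.back().get();
      worker->WakeUp();  // Recorded even though the thread is not alive yet.
    }
    // Pool lock released: other threads can use the pool while the OS
    // creates the thread, however long that takes.
    worker->Start();
    return worker;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(lock_);
    return workers_.size();
  }

 private:
  mutable std::mutex lock_;
  std::vector<std::unique_ptr<Worker>> workers_;
};
```

In this sketch every worker is woken exactly once and exits after one placeholder task, which keeps teardown simple; a real pool would also need to handle workers that are never woken and thread creation that fails.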

Comment 8 by gab@chromium.org, Jan 28 2018

s/Linux kernel/ChromeOS kernel/ in #7 (misread OP)

Comment 9 by gab@chromium.org, Jan 28 2018

I actually think there's a kernel issue on CrOS (which makes pthread_create slow and eventually crash) which exercises a suboptimal impl in TaskScheduler.

Perhaps this should be two bugs?

Comment 10 by sheriffbot@chromium.org, Mar 26 2018

Labels: FoundIn-67 Fracas OS-Linux
Users experienced this crash on the following builds:

Linux Dev 67.0.3377.1 -  1.01 CPM, 1 reports, 1 clients (signature base::`anonymous namespace'::CreateThread)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Cc: mark@chromium.org jperaza@chromium.org
Components: Internals>CrashReporting
I'm not too familiar with Breakpad and even less so with Linux Breakpad, but it doesn't seem like a good idea to be creating a thread during a crash.

(One example where it looks like we do that https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27base%3A%3A%60anonymous%20namespace%5C%27%3A%3ACreateThread%27%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(CrashedStackTrace.StackFrame)%20WHERE%20FunctionName%20LIKE%20%27%25breakpad%25FindCrashingThreadAndDump%25%27)&stbtiq=&reportid=46566d5ed6bbcee4&index=2#3 )

I don't think there will be a reasonable solution for this until we switch to Crashpad on Linux, which does out-of-process capture.
Owner: ----
Status: Available (was: Assigned)
(unassigning from myself, but keeping on the Internals>TaskScheduler radar)

scottmg@: When can we expect Crashpad on Linux? Can you update NextAction field with this date? Thanks!
I'm currently finishing up the last bits to enable Crashpad for Android. After that I'll be moving on to ChromeOS and then Linux, generally. There is still at least one more feature needed to enable Crashpad for Chrome with pid namespacing, but I'm still expecting this sometime this quarter.

Comment 15 by gab@chromium.org, May 3 2018

Components: -Internals>TaskScheduler
Owner: jperaza@chromium.org
Status: Assigned (was: Available)
Not TaskScheduler's fault (removing from Internals>TaskScheduler). As mentioned above: posting a task during crash reporting is problematic.

@jperaza to address as part of change in #14.
Components: -Internals
