ChromeOS shutdown hangs due to blocked net::FileStream::Context::ReadImpl task.
Issue description
ChromeOS (reports 95d9b6070b507b23, 6ce6e2d4c67d45df, e8b0722ee3a89683): These all have SchedulerWorker threads in base::File::ReadAtCurrentPosNoBestEffort(char*, int) calls from net::FileStream::Context::Read(). +mmenke for //net//base OWNERship.
Example stack:
0x00007acfb024942d (libpthread-2.23.so + 0x0001042d) __read_nocancel
0x00005725e002bd4e (chrome -unistd.h:37) base::File::ReadAtCurrentPosNoBestEffort(char*, int)
0x00005725e033fa61 (chrome -file_stream_context_posix.cc:90) net::FileStream::Context::ReadFileImpl(scoped_refptr<net::IOBuffer>, int)
0x00005725e033fe5c (chrome -bind_internal.h:447) base::internal::Invoker<base::internal::BindState<net::FileStream::Context::IOResult (net::FileStream::Context::*)(scoped_refptr<net::IOBuffer>, int), base::internal::UnretainedWrapper<net::FileStream::Context>, scoped_refptr<net::IOBuffer>, int>, net::FileStream::Context::IOResult ()>::RunOnce(base::internal::BindStateBase*)
0x00005725e033f70d (chrome -callback.h:95) void base::internal::ReturnAsParamAdapter<net::FileStream::Context::IOResult>(base::OnceCallback<net::FileStream::Context::IOResult ()>, net::FileStream::Context::IOResult*)
0x00005725dde805d2 (chrome -bind_internal.h:402) base::internal::Invoker<base::internal::BindState<void (*)(base::OnceCallback<GURL ()>, GURL*), base::OnceCallback<GURL ()>, GURL*>, void ()>::RunOnce(base::internal::BindStateBase*)
0x00005725e007dccd (chrome -callback.h:95) base::(anonymous namespace)::PostTaskAndReplyRelay::RunTaskAndPostReply(base::(anonymous namespace)::PostTaskAndReplyRelay)
0x00005725e007df12 (chrome -bind_internal.h:402) base::internal::Invoker<base::internal::BindState<void (*)(base::(anonymous namespace)::PostTaskAndReplyRelay), base::(anonymous namespace)::PostTaskAndReplyRelay>, void ()>::RunOnce(base::internal::BindStateBase*)
0x00005725dd95c89f (chrome -callback.h:95) base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
0x00005725e0077e9f (chrome -task_tracker.cc:478) base::internal::TaskTracker::RunOrSkipTask(base::internal::Task, base::internal::Sequence*, bool)
0x00005725e0079072 (chrome -task_tracker_posix.cc:23) base::internal::TaskTrackerPosix::RunOrSkipTask(base::internal::Task, base::internal::Sequence*, bool)
0x00005725e0077867 (chrome -task_tracker.cc:371) base::internal::TaskTracker::RunAndPopNextTask(scoped_refptr<base::internal::Sequence>, base::internal::CanScheduleSequenceObserver*)
0x00005725e00b6d90 (chrome -scheduler_worker.cc:85) base::internal::SchedulerWorker::Thread::ThreadMain()
0x00005725e007da42 (chrome -platform_thread_posix.cc:76) base::(anonymous namespace)::ThreadFunc(void*)
0x00007acfb02402b7 (libpthread-2.23.so -pthread_create.c:333) start_thread
0x00007acfaf398fac (libc-2.23.so + 0x000f6fac) clone
,
May 8 2018
I don't suppose we know whether there's a single hanging read task (in which case I'd blame the platform, unless some FileStream::Context consumer is reading an absurd amount of data at once) or multiple reads? Unfortunately, this is just a utility class with a number of consumers. There's only one ChromeOS-specific consumer (chromeos/dbus/pipe_reader.cc), but there's no way to tell if that's the relevant consumer here.
,
May 8 2018
Re #2: Looking at the other threads in one of the reports, there is definitely a thread actively in Bus::ProcessAllIncomingDataIfAny(), which would hint at DBus pipe reader being the issue here.
,
May 8 2018
CCing dbus owners, for a potential dbus connection, per comments #2 and #3, though I'm certainly not convinced that's the responsible consumer here. That having been said, tearing down network requests, and the network stack itself, should stop all network-stack activity, unless we're blocked on a single long read (in which case the OS is probably to blame, unless we're reading from a particularly funky stream source). Removing myself as an owner - I have no idea how to investigate ChromeOS crashes without a repro or a useful stack trace, and I don't consider a shutdown hang worth dropping everything to learn how to use ChromeOS crash dumps.
,
May 8 2018
Re #4: No problem; thanks!
,
May 9 2018
Probably, we should convert all PipeReader users to use base::TaskShutdownBehavior::CONTINUE_ON_SHUTDOWN to avoid blocking the shutdown. Another possible option is modifying net::FileStream to abort the read() syscall when the FileStream object is destructed.
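For illustration, the second option (making a blocked read abortable) is commonly done on POSIX by waiting in poll() on both the data descriptor and a separate "cancel" descriptor, and only calling read() once data is known to be available. The sketch below is a minimal standalone example of that pattern, not net::FileStream code: CancellableRead, kCancelled and Demo are hypothetical names, and it assumes the descriptor is a pipe or socket (poll() always reports regular files as readable, so this would not help there).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <unistd.h>

// Hypothetical sentinel: distinct from read()'s own results (>= 0 or -1).
constexpr ssize_t kCancelled = -2;

// Hypothetical helper, not part of net::FileStream. Blocks until data_fd is
// readable or cancel_fd is signalled. Returns the number of bytes read
// (0 on EOF), -1 on error, or kCancelled if the cancel descriptor fired.
ssize_t CancellableRead(int data_fd, int cancel_fd, char* buf, size_t len) {
  assert(buf != nullptr && len > 0);
  pollfd fds[2] = {{data_fd, POLLIN, 0}, {cancel_fd, POLLIN, 0}};
  for (;;) {
    int rv = poll(fds, 2, -1);  // -1: wait without a timeout.
    if (rv < 0) {
      if (errno == EINTR) continue;  // Interrupted by a signal; retry.
      return -1;
    }
    if (fds[1].revents != 0) return kCancelled;  // Cancellation wins.
    if (fds[0].revents != 0) return read(data_fd, buf, len);
  }
}

// Small demonstration using two pipes. With signal_cancel == false, two bytes
// are made available on the data pipe. With signal_cancel == true, nothing is
// written to the data pipe (a plain read() would block forever) and the
// cancel pipe is signalled instead.
ssize_t Demo(bool signal_cancel) {
  int data[2];
  int cancel[2];
  if (pipe(data) != 0) return -1;
  if (pipe(cancel) != 0) {
    close(data[0]);
    close(data[1]);
    return -1;
  }
  bool ok = signal_cancel ? write(cancel[1], "x", 1) == 1
                          : write(data[1], "hi", 2) == 2;
  char buf[16];
  ssize_t n = ok ? CancellableRead(data[0], cancel[0], buf, sizeof(buf)) : -1;
  close(data[0]);
  close(data[1]);
  close(cancel[0]);
  close(cancel[1]);
  return n;
}
```

In a real consumer, the owner of the stream would write to (or close) the cancel descriptor from its destructor so the worker thread returns promptly. Whether that is worth the cross-platform complexity compared with CONTINUE_ON_SHUTDOWN is a separate question.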
,
May 15 2018
,
May 16 2018
,
May 21 2018
Moving this to OS>Systems (for lack of a better place to put these DBus calls).
From the //net peanut gallery, CONTINUE_ON_SHUTDOWN seems more sensible for this case, although we could revisit the abort if we found it comes up in other contexts. That's just a large-enough change that I'd be uncomfortable making it without having a lot of cross-platform stability monitoring for regressions, and I'm not sure folks have the bandwidth or drive for that :)
,
May 21 2018
CONTINUE_ON_SHUTDOWN usually seems like a good idea to me too. :-P
PipeReader is used by DebugDaemonClient and LorgnetteManagerClient, both also in //chromeos/dbus. I'm not familiar with this code, but both of the classes look like they're already using CONTINUE_ON_SHUTDOWN in conjunction with PipeReader:
debug_daemon_client.cc:
class PipeReaderWrapper : public base::SupportsWeakPtr<PipeReaderWrapper> {
 public:
  explicit PipeReaderWrapper(const DebugDaemonClient::GetLogsCallback& callback)
      : pipe_reader_(base::CreateTaskRunnerWithTraits(
            {base::MayBlock(),
             base::TaskShutdownBehavior::CONTINUE_ON_SHUTDOWN})),
        callback_(callback) {}
lorgnette_manager_client.cc:
  // Creates a pipe to read the scan data from the D-Bus service.
  // Returns a write-side FD.
  base::ScopedFD Start() {
    DCHECK(!pipe_reader_.get());
    DCHECK(!data_.has_value());
    pipe_reader_ = std::make_unique<chromeos::PipeReader>(
        base::CreateTaskRunnerWithTraits(
            {base::MayBlock(),
             base::TaskShutdownBehavior::CONTINUE_ON_SHUTDOWN}));
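For readers less familiar with the three base::TaskShutdownBehavior values, here is a toy standalone model of which pending or running tasks hold up shutdown, as I understand the documented semantics. It is only a sketch: the ShutdownBehavior enum, the Task struct and both functions are hypothetical stand-ins, not the real base:: types or TaskTracker logic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for base::TaskShutdownBehavior.
enum class ShutdownBehavior {
  kContinueOnShutdown,  // Never waited on at shutdown.
  kSkipOnShutdown,      // Skipped if not started; waited on if already running.
  kBlockShutdown,       // Always waited on if posted before shutdown.
};

// Hypothetical record for a task that has been posted but not yet finished.
struct Task {
  ShutdownBehavior behavior;
  bool started;  // True if the task is currently executing.
};

// Returns true if shutdown has to wait for this task to finish.
bool BlocksShutdown(const Task& task) {
  switch (task.behavior) {
    case ShutdownBehavior::kContinueOnShutdown:
      return false;
    case ShutdownBehavior::kSkipOnShutdown:
      return task.started;
    case ShutdownBehavior::kBlockShutdown:
      return true;
  }
  return false;  // Unreachable for valid enum values.
}

// Counts the tasks shutdown would wait on.
int CountShutdownBlockers(const std::vector<Task>& tasks) {
  int blockers = 0;
  for (const Task& task : tasks) {
    if (BlocksShutdown(task)) ++blockers;
  }
  return blockers;
}
```

Under this model, a read() that is already blocked inside a SKIP_ON_SHUTDOWN or BLOCK_SHUTDOWN task would hold up shutdown indefinitely, while the same read in a CONTINUE_ON_SHUTDOWN task would not.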
,
May 21 2018
Hrm... So assuming that's respected, I guess PipeReaders aren't the issue. Strange that this is only really showing up on ChromeOS.
,
May 22 2018
Probably the same thing is happening on all platforms, but is only reported on Chrome OS? AFAIK Chrome OS is the only platform that reports unsafe shutdowns to the crash server.
,
May 22 2018
We've been getting alerts on shutdown hangs on Windows in UDPSocketWin::SetDiffServCodePoint, which happens on the IOThread, at least.
Comment 1 by w...@chromium.org, May 8 2018