New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 660114 link

Starred by 1 user

Issue metadata

Status: WontFix
Merged: issue 493726
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

powerd crash -> Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service

Project Member Reported by djkurtz@chromium.org, Oct 27 2016

Issue description

Chrome OS Version: 8872.22.0
Chrome OS Platform: elm

Example crashes (all three from the same device at the same time):
https://crash.corp.google.com/browse?q=ReportID=cd1a7e0700000000
https://crash.corp.google.com/browse?q=ReportID=07a696b900000000
https://crash.corp.google.com/browse?q=ReportID=566ba40700000000


Thread 0 CRASHED [SIGABRT @ 0x00000000 ] MAGIC SIGNATURE THREAD
0xf7163636	(libc-2.19.so + 0x00016636 )	__libc_do_syscall
0xf7173313	(libc-2.19.so -raise.c:56 )	raise
0xf71742ff	(libc-2.19.so -abort.c:89 )	abort
0xf7499ef9	(libbase-core-395517.so -debugger_posix.cc:249 )	base::debug::BreakDebugger
0xf74ada8d	(libbase-core-395517.so -logging.cc:755 )	logging::LogMessage::~LogMessage
0xf75c69e1	(powerd -daemon.cc:932 )	power_manager::Daemon::InitDBus
0xf75c7085	(powerd -daemon.cc:290 )	power_manager::Daemon::Init
0xf75b7e47	(powerd -main.cc:320 )	main
0xf7163307	(libc-2.19.so -libc-start.c:285 )	__libc_start_main
0xf75b8697	(powerd + 0x0000b697 )	_start
0xf7616f53	(powerd -elf-init.c:87 )	__libc_csu_init
0xf75969df	(ld-2.19.so + 0x0000b9df )	_dl_sort_fini
Thread 1
0xf71e6042	(libc-2.19.so + 0x00099042 )	__select
0xf74a26b5	(libbase-core-395517.so -file_path_watcher_linux.cc:215 )	InotifyReaderCallback
0xf749addd	(libbase-core-395517.so -callback.h:397 )	base::debug::TaskAnnotator::RunTask
0xf74b2d91	(libbase-core-395517.so -message_loop.cc:478 )	base::MessageLoop::RunTask
0xf74b309d	(libbase-core-395517.so -message_loop.cc:487 )	base::MessageLoop::DeferOrRunPendingTask
0xf74b33cd	(libbase-core-395517.so -message_loop.cc:604 )	base::MessageLoop::DoWork
0xf74b55a3	(libbase-core-395517.so -message_pump_default.cc:33 )	base::MessagePumpDefault::Run
0xf74d0ea5	(libbase-core-395517.so -run_loop.cc:35 )	base::RunLoop::Run
0xf74b24ab	(libbase-core-395517.so -message_loop.cc:294 )	base::MessageLoop::Run
0xf74ec747	(libbase-core-395517.so -thread.cc:252 )	base::Thread::ThreadMain
0xf74e884b	(libbase-core-395517.so -platform_thread_posix.cc:70 )	ThreadFunc
0xf73a46b9	(libpthread-2.19.so -pthread_create.c:311 )	start_thread
0xf71eb65b	(libc-2.19.so + 0x0009e65b )	clone

log file contents:
===powerd.20161025-102800 (tail)===
[1025/102800:INFO:main.cc(310)] vcsid 0.0.2-r1563-05dfd21b41959276abfbf1a01439b078b95b6336
[1025/102800:ERROR:bus.cc(576)] Failed to get the ownership of org.chromium.PowerManager: 
[1025/102800:FATAL:daemon.cc(932)] Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service

[1025/102800:INFO:main.cc(310)] vcsid 0.0.2-r1563-05dfd21b41959276abfbf1a01439b078b95b6336
[1025/102800:ERROR:bus.cc(576)] Failed to get the ownership of org.chromium.PowerManager: 
[1025/102800:FATAL:daemon.cc(932)] Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service

EOF
===powerd.20161025-102801 (tail)===
[1025/102801:INFO:main.cc(310)] vcsid 0.0.2-r1563-05dfd21b41959276abfbf1a01439b078b95b6336
[1025/102801:ERROR:bus.cc(576)] Failed to get the ownership of org.chromium.PowerManager: 
[1025/102801:FATAL:daemon.cc(932)] Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service

EOF

How frequently does this problem reproduce? (Always, sometimes, hard to
reproduce?)

I see 3 of these on one device.
However, I see 1440 other crashes in power_manager::Daemon::InitDBus on the crash server.

This is very similar to the crash tracked in  issue 493726 , but that one is about a segfault in socket_set_epoll_poll.
 
Mergedinto: 493726
Status: Duplicate (was: Assigned)
Summary: powerd crash -> Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service (was: elm: powerd crash -> Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service)
Nothing elm specific here, I see this on many other devices as well, and on many versions, all the way back to 5500.100.7.

But the only three instances on elm were all from the same device on version 8872.22.0.

Comment 2 by derat@chromium.org, Oct 27 2016

Cc: satorux@chromium.org jrbarnette@chromium.org hashimoto@chromium.org
Labels: -Restrict-View-Google
Status: Started (was: Duplicate)
This probably means that two copies of powerd are running at the same time. It's unrelated to  issue 493726  (note that that one is a dbus-daemon segfault, not a powerd abort -- the powerd abort there is intentional and caused by dbus-daemon dying).

powerd starts on "started boot-services" and stops on "stopping boot-services", so it should never restart unless it crashes. Maybe there can be a race (under heavy load or on certain hardware?) where the new powerd process tries to register itself with dbus-daemon before dbus-daemon has realized the old one is gone. hashimoto@ or satorux@, have you ever seen this happen?

The logs that you linked support this theory. In 566ba40700000000, the last entry from the previous log was written just two seconds before the crashing process started:

===powerd.20161025-102602 (tail)===
...
[1025/102753:INFO:daemon.cc(1319)] Saw user activity
[1025/102758:INFO:daemon.cc(1319)] Saw user activity
EOF
===powerd.20161025-102800 (tail)===
[1025/102800:INFO:main.cc(310)] vcsid 0.0.2-r1563-05dfd21b41959276abfbf1a01439b078b95b6336
[1025/102800:ERROR:bus.cc(576)] Failed to get the ownership of org.chromium.PowerManager:
[1025/102800:FATAL:daemon.cc(932)] Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service
EOF

These crashes are harmless, assuming that Upstart doesn't give up on starting powerd before the D-Bus name becomes available, but it'd probably be cleaner to make powerd retry taking ownership of the name a few times before aborting.

Comment 3 by derat@chromium.org, Oct 27 2016

Alternately, powerd could use REQUIRE_PRIMARY_ALLOW_REPLACEMENT rather than REQUIRE_PRIMARY when first acquiring the name. I believe that this would allow another powerd process to steal ownership of the name.

We definitely don't want to be running two functional copies of powerd at the same time, but if this just being triggered by dbus-daemon not having released an old, dead process's name yet, it'd probably be safe. org.chromium.PowerManager.conf restricts ownership of the "org.chromium.PowerManager" name to the "power" user, so I don't think there are security concerns around doing this.

libbrillo/brillo/daemons/dbus_daemon.cc also uses REQUIRE_PRIMARY, so I'm a bit surprised if we aren't seeing this with other daemons.

Comment 4 by derat@chromium.org, Oct 27 2016

I ran "while true; do killall -9 powerd; sleep 4; done" for ~15 minutes on a samus device but wasn't able to cause any SIGABRTs. (A shorter delay is likely to trigger the Upstart job's respawn limit.)

Comment 5 by derat@chromium.org, Oct 27 2016

Just as an aside, searching for the stable signature ("raise-01af9528") is probably more accurate, since it ought to just match PublishService() failures rather than all crashes within InitDBus(). The latter search also includes  issue 493726 , which powerd can't work around (the system is unusable if dbus-daemon isn't running). The number of powerd ABRTs with raise-01af9528 is much lower.
> Maybe there can be a race (under heavy load or on certain hardware?) where the new powerd process tries to register itself with dbus-daemon before dbus-daemon has realized the old one is gone. hashimoto@ or satorux@, have you ever seen this happen?

No I've never seen that.

Comment 7 by derat@chromium.org, Oct 28 2016

I'm uneasy about using REQUIRE_PRIMARY_ALLOW_REPLACEMENT, since we really don't want to run two copies of powerd simultaneously and it feels like that would give up one (maybe our only?) guarantee of atomicity.

Another option might be giving the powerd job a post-stop stanza with a loop that waits for a command like the following to fail:

dbus-send --system --type=method_call --print-reply \
  --dest=org.freedesktop.DBus \
  /org/freedesktop/DBus \
  org.freedesktop.DBus.GetNameOwner \
  string:org.chromium.PowerManager

That should guarantee that dbus-daemon no longer believes the name is owned.

Comment 8 by derat@chromium.org, Oct 28 2016

I'm still unable to repro this. With the powerd Upstart job modified locally to check whether the name is still owned in post-stop, I've still never seen this happen. I've tried both on a very fast device (samus) and a very slow one (cyan image running in a VM).

So I guess I could check in any of a variety of potential workarounds, but I'm not convinced it'll help. Could these crashes just be caused by people messing around on devices that are in dev mode?
Cc: ka...@chromium.org
@#8 ... quite possibly!  I just confirmed that the three crashes on elm are from the same manual tester device.

That same device also generated this feedback report, about 2 days after the crashes:

https://feedback.corp.google.com/product/208/neutron?lView=rd&lReport=14809536349

Maybe kalin remembers what was happening on the device when powerd was crashing on 10/25.
Status: WontFix (was: Started)
Please reopen if/when there's data suggesting that this isn't due to multiple copies of powerd being manually run.

Sign in to add a comment