Issue metadata
Sign in to add a comment
|
powerd crash -> Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service |
||||||||||||||||||||||||
Issue descriptionChrome OS Version: 8872.22.0 Chrome OS Platform: elm Example crashes (all three from the same device at the same time): https://crash.corp.google.com/browse?q=ReportID=cd1a7e0700000000 https://crash.corp.google.com/browse?q=ReportID=07a696b900000000 https://crash.corp.google.com/browse?q=ReportID=566ba40700000000 Thread 0 CRASHED [SIGABRT @ 0x00000000 ] MAGIC SIGNATURE THREAD 0xf7163636 (libc-2.19.so + 0x00016636 ) __libc_do_syscall 0xf7173313 (libc-2.19.so -raise.c:56 ) raise 0xf71742ff (libc-2.19.so -abort.c:89 ) abort 0xf7499ef9 (libbase-core-395517.so -debugger_posix.cc:249 ) base::debug::BreakDebugger 0xf74ada8d (libbase-core-395517.so -logging.cc:755 ) logging::LogMessage::~LogMessage 0xf75c69e1 (powerd -daemon.cc:932 ) power_manager::Daemon::InitDBus 0xf75c7085 (powerd -daemon.cc:290 ) power_manager::Daemon::Init 0xf75b7e47 (powerd -main.cc:320 ) main 0xf7163307 (libc-2.19.so -libc-start.c:285 ) __libc_start_main 0xf75b8697 (powerd + 0x0000b697 ) _start 0xf7616f53 (powerd -elf-init.c:87 ) __libc_csu_init 0xf75969df (ld-2.19.so + 0x0000b9df ) _dl_sort_fini Thread 1 0xf71e6042 (libc-2.19.so + 0x00099042 ) __select 0xf74a26b5 (libbase-core-395517.so -file_path_watcher_linux.cc:215 ) InotifyReaderCallback 0xf749addd (libbase-core-395517.so -callback.h:397 ) base::debug::TaskAnnotator::RunTask 0xf74b2d91 (libbase-core-395517.so -message_loop.cc:478 ) base::MessageLoop::RunTask 0xf74b309d (libbase-core-395517.so -message_loop.cc:487 ) base::MessageLoop::DeferOrRunPendingTask 0xf74b33cd (libbase-core-395517.so -message_loop.cc:604 ) base::MessageLoop::DoWork 0xf74b55a3 (libbase-core-395517.so -message_pump_default.cc:33 ) base::MessagePumpDefault::Run 0xf74d0ea5 (libbase-core-395517.so -run_loop.cc:35 ) base::RunLoop::Run 0xf74b24ab (libbase-core-395517.so -message_loop.cc:294 ) base::MessageLoop::Run 0xf74ec747 (libbase-core-395517.so -thread.cc:252 ) base::Thread::ThreadMain 0xf74e884b (libbase-core-395517.so -platform_thread_posix.cc:70 ) ThreadFunc 0xf73a46b9 (libpthread-2.19.so -pthread_create.c:311 ) start_thread 0xf71eb65b (libc-2.19.so + 0x0009e65b ) clone log file contents: ===powerd.20161025-102800 (tail)=== [1025/102800:INFO:main.cc(310)] vcsid 0.0.2-r1563-05dfd21b41959276abfbf1a01439b078b95b6336 [1025/102800:ERROR:bus.cc(576)] Failed to get the ownership of org.chromium.PowerManager: [1025/102800:FATAL:daemon.cc(932)] Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service [1025/102800:INFO:main.cc(310)] vcsid 0.0.2-r1563-05dfd21b41959276abfbf1a01439b078b95b6336 [1025/102800:ERROR:bus.cc(576)] Failed to get the ownership of org.chromium.PowerManager: [1025/102800:FATAL:daemon.cc(932)] Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service EOF ===powerd.20161025-102801 (tail)=== [1025/102801:INFO:main.cc(310)] vcsid 0.0.2-r1563-05dfd21b41959276abfbf1a01439b078b95b6336 [1025/102801:ERROR:bus.cc(576)] Failed to get the ownership of org.chromium.PowerManager: [1025/102801:FATAL:daemon.cc(932)] Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service EOF How frequently does this problem reproduce? (Always, sometimes, hard to reproduce?) I see 3 of these on one device. However, I see 1440 other crashes in power_manager::Daemon::InitDBus on the crash server. This is very similar to the crash tracked in issue 493726 , but that one is about a segfault in socket_set_epoll_poll.
,
Oct 27 2016
This probably means that two copies of powerd are running at the same time. It's unrelated to issue 493726 (note that that one is a dbus-daemon segfault, not a powerd abort -- the powerd abort there is intentional and caused by dbus-daemon dying). powerd starts on "started boot-services" and stops on "stopping boot-services", so it should never restart unless it crashes. Maybe there can be a race (under heavy load or on certain hardware?) where the new powerd process tries to register itself with dbus-daemon before dbus-daemon has realized the old one is gone. hashimoto@ or satorux@, have you ever seen this happen? The logs that you linked support this theory. In 566ba40700000000, the last entry from the previous log was written just two seconds before the crashing process started: ===powerd.20161025-102602 (tail)=== ... [1025/102753:INFO:daemon.cc(1319)] Saw user activity [1025/102758:INFO:daemon.cc(1319)] Saw user activity EOF ===powerd.20161025-102800 (tail)=== [1025/102800:INFO:main.cc(310)] vcsid 0.0.2-r1563-05dfd21b41959276abfbf1a01439b078b95b6336 [1025/102800:ERROR:bus.cc(576)] Failed to get the ownership of org.chromium.PowerManager: [1025/102800:FATAL:daemon.cc(932)] Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service EOF These crashes are harmless, assuming that Upstart doesn't give up on starting powerd before the D-Bus name becomes available, but it'd probably be cleaner to make powerd retry taking ownership of the name a few times before aborting.
,
Oct 27 2016
Alternately, powerd could use REQUIRE_PRIMARY_ALLOW_REPLACEMENT rather than REQUIRE_PRIMARY when first acquiring the name. I believe that this would allow another powerd process to steal ownership of the name. We definitely don't want to be running two functional copies of powerd at the same time, but if this just being triggered by dbus-daemon not having released an old, dead process's name yet, it'd probably be safe. org.chromium.PowerManager.conf restricts ownership of the "org.chromium.PowerManager" name to the "power" user, so I don't think there are security concerns around doing this. libbrillo/brillo/daemons/dbus_daemon.cc also uses REQUIRE_PRIMARY, so I'm a bit surprised if we aren't seeing this with other daemons.
,
Oct 27 2016
I ran "while true; do killall -9 powerd; sleep 4; done" for ~15 minutes on a samus device but wasn't able to cause any SIGABRTs. (A shorter delay is likely to trigger the Upstart job's respawn limit.)
,
Oct 27 2016
Just as an aside, searching for the stable signature ("raise-01af9528") is probably more accurate, since it ought to just match PublishService() failures rather than all crashes within InitDBus(). The latter search also includes issue 493726 , which powerd can't work around (the system is unusable if dbus-daemon isn't running). The number of powerd ABRTs with raise-01af9528 is much lower.
,
Oct 28 2016
> Maybe there can be a race (under heavy load or on certain hardware?) where the new powerd process tries to register itself with dbus-daemon before dbus-daemon has realized the old one is gone. hashimoto@ or satorux@, have you ever seen this happen? No I've never seen that.
,
Oct 28 2016
I'm uneasy about using REQUIRE_PRIMARY_ALLOW_REPLACEMENT, since we really don't want to run two copies of powerd simultaneously and it feels like that would give up one (maybe our only?) guarantee of atomicity. Another option might be giving the powerd job a post-stop stanza with a loop that waits for a command like the following to fail: dbus-send --system --type=method_call --print-reply \ --dest=org.freedesktop.DBus \ /org/freedesktop/DBus \ org.freedesktop.DBus.GetNameOwner \ string:org.chromium.PowerManager That should guarantee that dbus-daemon no longer believes the name is owned.
,
Oct 28 2016
I'm still unable to repro this. With the powerd Upstart job modified locally to check whether the name is still owned in post-stop, I've still never seen this happen. I've tried both on a very fast device (samus) and a very slow one (cyan image running in a VM). So I guess I could check in any of a variety of potential workarounds, but I'm not convinced it'll help. Could these crashes just be caused by people messing around on devices that are in dev mode?
,
Oct 28 2016
@#8 ... quite possibly! I just confirmed that the three crashes on elm are from the same manual tester device. That same device also generated this feedback report, about 2 days after the crashes: https://feedback.corp.google.com/product/208/neutron?lView=rd&lReport=14809536349 Maybe kalin remembers what was happening on the device when powerd was crashing on 10/25.
,
Mar 7 2017
Please reopen if/when there's data suggesting that this isn't due to multiple copies of powerd being manually run. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by djkurtz@chromium.org
, Oct 27 2016Status: Duplicate (was: Assigned)
Summary: powerd crash -> Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service (was: elm: powerd crash -> Check failed: dbus_wrapper_->PublishService(). Failed to publish D-Bus service)