Consider making session_manager emit login-prompt-visible Upstart signal asynchronously |
|||||||
Issue description
session_manager currently makes a synchronous D-Bus method call to Upstart to ask it to emit the login-prompt-visible signal:
----
session_manager_impl.cc:
void SessionManagerImpl::EmitLoginPromptVisible(Error* error) {
login_metrics_->RecordStats("login-prompt-visible");
dbus_emitter_->EmitSignal(kLoginPromptVisibleSignal);
if (!init_controller_->TriggerImpulse("login-prompt-visible", {})) {
static const char msg[] =
"Emitting login-prompt-visible upstart signal failed.";
LOG(ERROR) << msg;
error->Set(dbus_error::kEmitFailed, msg);
}
}
----
upstart_signal_emitter.cc:
const char UpstartSignalEmitter::kServiceName[] = "com.ubuntu.Upstart";
const char UpstartSignalEmitter::kPath[] = "/com/ubuntu/Upstart";
const char UpstartSignalEmitter::kInterface[] = "com.ubuntu.Upstart0_6";
const char UpstartSignalEmitter::kMethodName[] = "EmitEvent";
...
scoped_ptr<dbus::Response> UpstartSignalEmitter::EmitSignal(
const std::string& signal_name,
const std::vector<std::string>& args_keyvals) {
DLOG(INFO) << "Emitting " << signal_name << " Upstart signal";
dbus::MethodCall method_call(UpstartSignalEmitter::kInterface,
UpstartSignalEmitter::kMethodName);
dbus::MessageWriter writer(&method_call);
writer.AppendString(signal_name);
writer.AppendArrayOfStrings(args_keyvals);
writer.AppendBool(true);
return upstart_dbus_proxy_->CallMethodAndBlock(
&method_call, dbus::ObjectProxy::TIMEOUT_USE_DEFAULT);
}
----
We've sometimes seen Chrome's D-Bus method call asking session_manager to emit the signal time out (issue 631640, issue 669097), and one theory is that session_manager's call to Upstart is blocking on one or more login-prompt-visible-triggered jobs. Attempts to determine which job(s) are responsible haven't been successful, but Mattias suggested that we could make session_manager call Upstart asynchronously and either:
a) reply immediately to Chrome's request, or
b) reply asynchronously to Chrome's request after Upstart replies to session_manager.
I don't think that we'd get any additional benefit from b). session_manager at least wouldn't be blocked while waiting on Upstart, but from looking at chromeos/dbus/session_manager_client.cc, Chrome is already just making an async call to session_manager and discarding the response.
Given that Chrome doesn't care about waiting for the signal, I'm pretty comfortable switching session_manager's call to be async (as long as we don't do it right before a branch point). :-P
There's a risk that other parts of session_manager could be dependent on the login-prompt-visible signal being fully processed before moving on, but after looking at the dependent jobs (boot-complete, cups-clear-state, send-uptime-metrics, ui-init-late) that seemed unlikely to Mattias and me.
,
Dec 6 2016
I've uploaded changes: https://chromium-review.googlesource.com/#/c/417126/ https://chromium-review.googlesource.com/#/c/417181/ (The systemd portion of the second change is untested.)
,
Dec 7 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/2e22a9bec6e886e89e472e906ce74b8e6ae58dd4 commit 2e22a9bec6e886e89e472e906ce74b8e6ae58dd4 Author: Daniel Erat <derat@chromium.org> Date: Tue Dec 06 17:54:46 2016 login: Make UpstartSignalEmitter::EmitSignal private. UpstartSignalEmitter's EmitSignal method is only called by the same class's TriggerImpulse method, and overridden (but not called) by the derived StubUpstartSignalEmitter class. Make it be private rather than public. BUG= chromium:671662 TEST=compiles and unit tests still pass Change-Id: If7fdf9c383e2de2fd608550b7c85561ae1b03b58 Reviewed-on: https://chromium-review.googlesource.com/417126 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Mattias Nissler <mnissler@chromium.org> [modify] https://crrev.com/2e22a9bec6e886e89e472e906ce74b8e6ae58dd4/login_manager/stub_upstart_signal_emitter.h [modify] https://crrev.com/2e22a9bec6e886e89e472e906ce74b8e6ae58dd4/login_manager/upstart_signal_emitter.cc [modify] https://crrev.com/2e22a9bec6e886e89e472e906ce74b8e6ae58dd4/login_manager/upstart_signal_emitter.h
,
Dec 7 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/cc827123b9ccc0a686698bb83d6ae8adbff4b863 commit cc827123b9ccc0a686698bb83d6ae8adbff4b863 Author: Daniel Erat <derat@chromium.org> Date: Tue Dec 06 18:29:40 2016 login: Emit login-prompt-visible Upstart signal async. Make session_manager's D-Bus method call to Upstart to emit the login-prompt-visible signal be asynchronous. This is a speculative fix for cases where session_manager appears to block on this call, possibly due to other Upstart jobs hanging at startup. Continue to use synchronous requests for all other Upstart signals. I'm also updating systemd to support starting units asynchronously, although I'm unsure of whether those D-Bus calls to systemd are ever expected to block. BUG= chromium:671662 TEST=unit tests pass; also deployed to a samus device, ran "initctl log-priority info", and verified that that emit-login-prompt is still emitted and that boot and login appears to work as before Change-Id: I661a9dc03c2430622a85ce4d51c47ce22ed5d63e Reviewed-on: https://chromium-review.googlesource.com/417181 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Mattias Nissler <mnissler@chromium.org> [modify] https://crrev.com/cc827123b9ccc0a686698bb83d6ae8adbff4b863/login_manager/init_daemon_controller.h [modify] https://crrev.com/cc827123b9ccc0a686698bb83d6ae8adbff4b863/login_manager/upstart_signal_emitter.h [modify] https://crrev.com/cc827123b9ccc0a686698bb83d6ae8adbff4b863/login_manager/session_manager_impl.cc [modify] https://crrev.com/cc827123b9ccc0a686698bb83d6ae8adbff4b863/login_manager/session_manager_impl_unittest.cc [modify] https://crrev.com/cc827123b9ccc0a686698bb83d6ae8adbff4b863/login_manager/upstart_signal_emitter.cc [modify] https://crrev.com/cc827123b9ccc0a686698bb83d6ae8adbff4b863/login_manager/systemd_unit_starter.cc [modify] https://crrev.com/cc827123b9ccc0a686698bb83d6ae8adbff4b863/login_manager/session_manager_dbus_adaptor.cc [modify] https://crrev.com/cc827123b9ccc0a686698bb83d6ae8adbff4b863/login_manager/stub_upstart_signal_emitter.h [modify] https://crrev.com/cc827123b9ccc0a686698bb83d6ae8adbff4b863/login_manager/session_manager_impl.h [modify] https://crrev.com/cc827123b9ccc0a686698bb83d6ae8adbff4b863/login_manager/systemd_unit_starter.h
,
Dec 13 2016
Just checking on this bug since it is blocking https://bugs.chromium.org/p/chromium/issues/detail?id=669097 - This seems to have a huge effect on the VC fleet at Google. https://screenshot.googleplex.com/O9PvVOcYCSo - We seemed to lose about 300 units over the weekend and techs have to go to each room to reset them. Can we get an overview of current troubleshooting and if possible an eta of fix?
,
Dec 13 2016
I checked in this speculative workaround for the Upstart-job-is-hanging theory. I don't think I've seen any solid repro steps for issue 669097, though.
,
Dec 13 2016
How risky is this change ? - Can we merge it to 56 or 55 ?
,
Dec 13 2016
I haven't heard of any problems but feel like it's fairly risky. I'd prefer not to merge it unless we have reason to believe that it actually fixes the problem. (I think that Mattias and I agree that it's probably the right thing to do, but it's still just based on speculation that an Upstart job may be hanging... and we still don't have any ideas about exactly which job is hanging or why; this just makes session_manager not block on Upstart.)
,
Dec 13 2016
Xiyuan mentioned that state keys are sent to session_manager in response to the login-prompt-visible signal: - ui-init-late.conf is "start on login-prompt-visible or stopped ui-collect-machine-info" and runs the ui-init-late script. - ui-init-late uses dbus-send to make a D-Bus call to session_manager's InitMachineInfo method (but only if /run/session_manager/machine-info exists). I was initially thinking that we could have a circular dependency here: - session_manager blocks on its call to Upstart to emit login-prompt-visible. - That signal runs ui-init-late, which tries to call session_manager but times out since session_manager is still blocked. If ui-collect-machine-info typically only completes after login-prompt-visible, that would explain why we don't usually hit this. However, after some more testing (with #4 reverted), I don't think that this theory holds water: - ui-init-late's dbus-send call doesn't use --print-reply, so it just queues the method call instead of blocking on it, which means that ui-init-late returns even if session_manager is blocked. - Even when I make ui-init-late use --print-reply, session_manager's call to Upstart isn't blocked on this call finishing -- I see session_manager's call complete first, and then ui-init-late call completes. So I still don't see any evidence that this fixes the CfM issue. :-(
,
Aug 1 2017
,
Jan 22 2018
|
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by derat@chromium.org
, Dec 6 2016