New issue
Advanced search Search tips

Issue 791587 link

Starred by 2 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

Chrome GetServerBackedStateKeys D-Bus call to session_manager timing out

Project Member Reported by igorcov@chromium.org, Dec 4 2017

Issue description

This is from investigating the
https://bugs.chromium.org/p/chromium/issues/detail?id=778757

We have a D-Bus call issued by session_manager_client to request state keys. It has infinite timeout, but on new devices like Eve it fails with timeout error.

The D-Bus call:
https://cs.chromium.org/chromium/src/chromeos/dbus/session_manager_client.cc?l=393
 
Cc: derat@chromium.org
Dan, I think you've worked in D-Bus calls black-box?

Comment 2 by derat@chromium.org, Dec 4 2017

Cc: satorux@chromium.org mnissler@chromium.org hashimoto@chromium.org
Summary: Chrome GetServerBackedStateKeys D-Bus call to session_manager timing out (was: D-Bus call with infinite timeout failing with timeout on new devices)
Maybe dbus-daemon enforces a maximum on timeouts? I'm not sure.

session_manager definitely shouldn't be blocking for a long time. Do you know what it's doing?
It's waiting for the time sync from the network. Without network the user is now allowed to proceed in that flow.

Comment 4 by derat@chromium.org, Dec 4 2017

Okay, so session_manager isn't blocked; it just saves the request so it can reply to it asynchronously later.

I think there was some discussion about this before, but why can't Chrome just wait for the time to be synced before requesting state keys? chromeos::SystemClockClient has a SystemClockUpdated method on its observer interface to hear about changes and a GetLastSyncInfo method that can be called to check whether the time is currently synced. That's cleaner than having a pending D-Bus method call sitting around.
FWIW, the default timeout of D-Bus is 25 seconds [1]. I thought passing dbus::ObjectProxy::TIMEOUT_INFINITE would override this value. I don't see why the time out error could happen only with new devices like eve.


[1] https://cgit.freedesktop.org/dbus/dbus/tree/dbus/dbus-connection-internal.h#n45

/** default timeout value when waiting for a message reply, 25 seconds */
#define _DBUS_DEFAULT_TIMEOUT_VALUE (25 * 1000)

Exactly, the issue is about the TIMEOUT_INFINITE not working properly on new devices. For state keys generation there's a fix in place already to catch it and react accordingly.

But if there are other places where infinite on D-Bus call is used or will be used in the future, it will fail.
The error message on https://bugs.chromium.org/p/chromium/issues/detail?id=778757#c4 is "Message did not receive a reply (timeout by message bus)" which is different from what you get on a normal method call timeout (pasted below).
FWIW this error message comes from this line. https://cgit.freedesktop.org/dbus/dbus/tree/bus/connection.c?id=56847ae818e8c144b9c7a8102efd1c926ead2f62#n1763
I guess you are seeing that timeout because something is wrong with the dbus daemon itself or communication between the session manager and the dbus daemon.


~/chrome/src % out/gn/dbus_unittests --gtest_filter="EndToEndAsyncTest.Timeout"
IMPORTANT DEBUGGING NOTE: batches of tests are run inside their
own process. For debugging a test inside a debugger, use the
--gtest_filter=<your_test_name> flag along with
--single-process-tests.
Using sharding settings from environment. This is shard 0/1
Using 1 parallel jobs.
Note: Google Test filter = EndToEndAsyncTest.Timeout
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from EndToEndAsyncTest
[ RUN      ] EndToEndAsyncTest.Timeout
[138661:138661:1206/164607.505798:681944916356:ERROR:object_proxy.cc(626)] Failed to call method: org.chromium.TestInterface.SlowEcho: object_path= /org/chromium/TestObject: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
[       OK ] EndToEndAsyncTest.Timeout (112 ms)
[----------] 1 test from EndToEndAsyncTest (112 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (112 ms total)
[  PASSED  ] 1 test.
[1/1] EndToEndAsyncTest.Timeout (112 ms)
SUCCESS: all tests passed.
Tests took 0 seconds.

As derat@ has said, an obvious fix is stop using the infinite timeout and it'll also make things cleaner.
Components: OS>Systems

Sign in to add a comment