New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 707986 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 1
Type: Bug-Regression

Blocked on:
issue 708142

Blocking:
issue 653916



Sign in to add a comment

dbus-related assertion failure: FATAL:message_loop.h(568): Check failed: loop.

Project Member Reported by kbr@chromium.org, Apr 3 2017

Issue description

There are dbus-related assertion failures all over the GPU bots right now which all have the same stack trace:

[7328:7360:0403/144258.965030:FATAL:message_loop.h(568)] Check failed: loop. 
#0 0x7f5e48b322e7 base::debug::StackTrace::StackTrace()
#1 0x7f5e48b4bf1b logging::LogMessage::~LogMessage()
#2 0x7f5e48521844 base::MessageLoopForIO::current()
#3 0x7f5e49ab3a49 dbus::(anonymous namespace)::Watch::StartWatching()
#4 0x7f5e49ab398f dbus::Bus::OnAddWatch()
#5 0x7f5e49ab142f dbus::Bus::OnAddWatchThunk()
#6 0x7f5e420eae28 <unknown>
#7 0x7f5e420d302f dbus_connection_set_watch_functions
#8 0x7f5e49ab1301 dbus::Bus::SetUpAsyncOperations()
#9 0x7f5e49ac4422 dbus::ObjectProxy::StartAsyncMethodCall()
#10 0x7f5e49ac7f2c _ZN4base8internal13FunctorTraitsIMN4dbus11ObjectProxyEFviP11DBusMessageNS_8CallbackIFvPNS2_8ResponseEELNS0_8CopyModeE1ELNS0_10RepeatModeE1EEENS6_IFvPNS2_13ErrorResponseEELSA_1ELSB_1EEENS_9TimeTicksEEvE6InvokeIRK13scoped_refptrIS3_EJRKiRKS5_RKSC_RKSG_RKSH_EEEvSJ_OT_DpOT0_
#11 0x7f5e49ac7e2d _ZN4base8internal7InvokerINS0_9BindStateIMN4dbus11ObjectProxyEFviP11DBusMessageNS_8CallbackIFvPNS3_8ResponseEELNS0_8CopyModeE1ELNS0_10RepeatModeE1EEENS7_IFvPNS3_13ErrorResponseEELSB_1ELSC_1EEENS_9TimeTicksEEJ13scoped_refptrIS4_EiS6_SD_SH_SI_EEEFvvEE3RunEPNS0_13BindStateBaseE
#12 0x7f5e46e03ac1 _ZNO4base8CallbackIFvvELNS_8internal8CopyModeE1ELNS2_10RepeatModeE1EE3RunEv
#13 0x7f5e48bdfed3 base::debug::TaskAnnotator::RunTask()
#14 0x7f5e48b96644 base::internal::TaskTracker::PerformRunTask()
#15 0x7f5e48b969da base::internal::TaskTrackerPosix::PerformRunTask()
#16 0x7f5e48b95e37 base::internal::TaskTracker::RunTask()
#17 0x7f5e48bea95c base::internal::SchedulerWorker::Thread::ThreadMain()
#18 0x7f5e48b9e81c base::(anonymous namespace)::ThreadFunc()
#19 0x7f5e45e28182 start_thread
#20 0x7f5e3f89e47d clone

Please see the reported test failures on for example these bots:

https://chromium-swarm.appspot.com/bot?id=build148-m4&sort_stats=total%3Adesc
https://chromium-swarm.appspot.com/bot?id=build146-m4&sort_stats=total%3Adesc

These are just a couple of the Swarmed bots that are behind this critical tryserver, linux_chromium_rel_ng:
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng?numbuilds=200

I suspect that *all* of the recent failures of the GPU tests are this assertion failure.

Please investigate this immediately. It's causing random failures in unrelated CLs like https://codereview.chromium.org/2793913003/ . Thanks.

 

Comment 1 by kbr@chromium.org, Apr 3 2017

Note: I suspect this is a regression caused by one of the recent changes from  Issue 705051 .

Comment 2 by kbr@chromium.org, Apr 3 2017

Follow-up: my analysis is probably wrong -- it looks like those changes really were CrOS-specific -- but need help from people who work on dbus to understand what changes might have introduced this flakiness.

Cc: friedman@chromium.org
Don't see any changes in our infrastructure around dbus. cc-ing a sys admin who made puppet changes recently.

Comment 4 by kbr@chromium.org, Apr 3 2017

I don't think this was an Infra-related change; martiniss@, cc'd you mainly because your CL was affected by this flakiness. I'm 99% sure a recent Chromium code change caused these intermittent failures.

Comment 5 by derat@chromium.org, Apr 3 2017

Cc: vasi...@chromium.org satorux@chromium.org benchan@chromium.org
Owner: ----
Status: Available (was: Assigned)
Yes, Ben's changes were Chrome-OS-specific (and also to a system daemon rather than Chrome itself).

I don't see any recent changes in //dbus. I don't know much about D-Bus use on desktop Linux, but by grepping for "CallMethod" in *_unittest.cc, I see code in //chrome/browser/password_manager, //components/os_crypt, //device/battery, and //device/geolocation. Most of those don't have recent changes either, but password_manager has a few.
Cc: gab@chromium.org robliao@chromium.org
Maybe there's some change to the scheduler that made the task be posted to an unexpected thread? I can't see from the backtrace what sort of thread it's actually run on.
To confirm, does the above code expect to run on a named BrowserThread?
Blocking: -705051
Components: Internals>TaskScheduler
In either case, the code in question is expecting a MessageLoop. The TaskScheduler does not provide one (since you're supposed to post to the TaskScheduler or getting the current task runner handle instead of getting the loop).

We did enable redirection recently, but that would only matter if the above code expects to be running in a named Browser Thread. I'm attempting to chase it down, but it seems like most callers create their own DBus thread to handle this.


Mostly it runs on a specified dbus thread, but it looks like it sometimes just runs on the message loop of whatever created the object. For example with PowerSaveBlocker::Delegate::ApplyBlock I think it tries to create it on the FILE thread?
Owner: robliao@chromium.org
Status: Assigned (was: Available)
That would certainly do it. I'll start a revert.
Project Member

Comment 12 by bugdroid1@chromium.org, Apr 4 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f07732ec0e39ad7e9f1ad279f2624dbdf261803d

commit f07732ec0e39ad7e9f1ad279f2624dbdf261803d
Author: robliao <robliao@chromium.org>
Date: Tue Apr 04 00:33:33 2017

Revert of RedirectNonUINonIOBrowserThreads to TaskScheduler by default on trunk. (patchset #3 id:40001 of https://codereview.chromium.org/2690183002/ )

Reason for revert:
Breaks some DBUS code that requests for a message loop

Original issue's description:
> RedirectNonUINonIOBrowserThreads to TaskScheduler by default on trunk.
>
> The addition of AssertWaitAllowed in BrowserMainLoop::ShutdownThreadsAndCleanUp()
> is required because joining is a "wait" operation. It wasn't required
> before because PlatformThread::Join (platform_thread_posix.cc) actually still
> does AssertIOAllowed() instead of AssertWaitAllowed() for legacy reasons
> (and I/O is allowed in shutdown already). But this CL enables the redirection
> which uses a WaitableEvent to mimic thread join (ref. BrowserThreadImpl::StopRedirectionOfThreadID():
> waiting on the exact same set of tasks it did prior to redirection).
>
> BUG= 653916 
>
> Review-Url: https://codereview.chromium.org/2690183002
> Cr-Commit-Position: refs/heads/master@{#461481}
> Committed: https://chromium.googlesource.com/chromium/src/+/b6d0c9a06ab573af760763fcd7ecf76ae2c87695

TBR=jam@chromium.org,rkaplow@chromium.org,gab@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 653916 ,  707986 

Review-Url: https://codereview.chromium.org/2791933004
Cr-Commit-Position: refs/heads/master@{#461588}

[modify] https://crrev.com/f07732ec0e39ad7e9f1ad279f2624dbdf261803d/base/threading/thread_restrictions.h
[modify] https://crrev.com/f07732ec0e39ad7e9f1ad279f2624dbdf261803d/content/browser/browser_main_loop.cc
[modify] https://crrev.com/f07732ec0e39ad7e9f1ad279f2624dbdf261803d/testing/variations/fieldtrial_testing_config.json

Comment 13 by kbr@chromium.org, Apr 4 2017

Thanks John for the essential diagnosis uncovering the problem and Rob for your quick action!

Comment 14 by gab@chromium.org, Apr 4 2017

Blocking: 653916
Owner: gab@chromium.org
Thanks, any chance a test can be added to catch this use case on CQ?
Owner: robliao@chromium.org
Blockedon: 708142
Project Member

Comment 18 by bugdroid1@chromium.org, May 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d4be08b81e51bbf512152939dce9955133f74864

commit d4be08b81e51bbf512152939dce9955133f74864
Author: robliao <robliao@chromium.org>
Date: Mon May 01 23:32:37 2017

Revert of Reland RedirectNonUINonIOBrowserThreads to TaskScheduler by default on trunk (patchset #1 id:1 of https://codereview.chromium.org/2856583002/ )

Reason for revert:
Possible failure: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.chromiumos%2FLinux_ChromiumOS_Tests__dbg__1_%2F25521%2F%2B%2Frecipes%2Fsteps%2Fbrowser_tests%2F0%2Flogs%2FCaptivePortalWindowTest.OnRedirected%2F0

Original issue's description:
> Reland RedirectNonUINonIOBrowserThreads to TaskScheduler by default on trunk
>
> This reverts commit f07732ec0e39ad7e9f1ad279f2624dbdf261803d
> (https://codereview.chromium.org/2791933004)
> and reapplies b6d0c9a06ab573af760763fcd7ecf76ae2c87695
> (https://codereview.chromium.org/2690183002/).
>
> BUG= 653916 ,  707986 
> TBR=jam@chromium.org, rkaplow@chromium.org
> Previously reviewed at https://codereview.chromium.org/2690183002/
>
> Review-Url: https://codereview.chromium.org/2856583002
> Cr-Commit-Position: refs/heads/master@{#468376}
> Committed: https://chromium.googlesource.com/chromium/src/+/3e27cb6e501ae55d614b56d566487590518ec223

TBR=jam@chromium.org,rkaplow@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 653916 ,  707986 

Review-Url: https://codereview.chromium.org/2848413002
Cr-Commit-Position: refs/heads/master@{#468481}

[modify] https://crrev.com/d4be08b81e51bbf512152939dce9955133f74864/base/threading/thread_restrictions.h
[modify] https://crrev.com/d4be08b81e51bbf512152939dce9955133f74864/content/browser/browser_main_loop.cc
[modify] https://crrev.com/d4be08b81e51bbf512152939dce9955133f74864/testing/variations/fieldtrial_testing_config.json

Status: Fixed (was: Assigned)
Fixed by virtue of http://crbug.com/708142

Sign in to add a comment