New issue
Advanced search Search tips

Issue 740221 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug



Sign in to add a comment

sync_integration_tests failing

Project Member Reported by a...@chromium.org, Jul 7 2017

Issue description

sync_integration_tests failing on chromium.mac/Mac10.9 Tests (dbg)

Builders failed on: 
- Mac10.9 Tests (dbg): 
  https://build.chromium.org/p/chromium.mac/builders/Mac10.9%20Tests%20%28dbg%29

First failure on https://build.chromium.org/p/chromium.mac/builders/Mac10.9%20Tests%20%28dbg%29/builds/42425 . What failed:

TwoClientSearchEnginesSyncTest.Duplicates
TwoClientPreferencesSyncTest.BothClientsEnabledEncryptionAndChangedMultipleTimes_E2ETest
TwoClientPasswordsSyncTest.Merge_E2ETest
TwoClientBookmarksSyncTest.MC_BootStrapEmptyStateEverywhere
TwoClientSearchEnginesSyncTest.UpdateKeyword_E2ETest
TwoClientPreferencesSyncTest.SingleClientEnabledEncryptionBothChanged_E2ETest
TwoClientBookmarksSyncTest.SC_ReverseTheOrderOfTwoBMFolders
TwoClientBookmarksSyncTest.SingleClientEnabledEncryption
TwoClientBookmarksSyncTest.MC_MergeSimpleBMHierarchyEqualSetsUnderBMBar
TwoClientBookmarksSyncTest.MC_DuplicateFolders
TwoClientBookmarksSyncTest.MC_Merge_SimpleBMHierarchy_Under_BMFolder
TwoClientThemesSyncTest.CycleOptions_E2ETest
TwoClientBookmarksSyncTest.MC_SimpleMergeOfDifferentBMModels
TwoClientBookmarksSyncTest.MC_Merge_SimpleBMHierarchy_DisjointSets_Under_BookmarkBar
TwoClientBookmarksSyncTest.SingleClientEnabledEncryptionAndChanged
TwoClientBookmarksSyncTest.SC_DelEmptyBMFoldNonEmptyAccountAfterwards
TwoClientBookmarksSyncTest.MC_Merge_SimpleBMHierarchy_DisjointSets_Under_BMFolder
TwoClientSearchEnginesSyncTest.Delete_E2ETest
TwoClientPreferencesSyncTest.UnsyncableBooleanPref_E2ETest
TwoClientBookmarksSyncTest.SC_DelBMFoldWithBMsNonEmptyAccountAfterwards
TwoClientBookmarksSyncTest.SimultaneousURLChanges
TwoClientBookmarksSyncTest.SC_DelEmptyBMFoldEmptyAccountAfterwards
TwoClientTypedUrlsSyncTest.AddExpiredThenUpdate
TwoClientSearchEnginesSyncTest.MergeMultiple
TwoClientThemesSyncTest.CustomThenSyncDefault_E2ETest
TwoClientSearchEnginesSyncTest.UpdateUrl_E2ETest
TwoClientSearchEnginesSyncTest.AddMultiple_E2ETest
TwoClientBookmarksSyncTest.SC_AddSeveralBMsAndFolders
TwoClientBookmarksSyncTest.MC_BiDirectionalPushAddingBM
TwoClientBookmarksSyncTest.SC_UpdatingTitleDoesNotUpdateFaviconLastUpdatedTime
TwoClientSessionsSyncTest.BothClientsEnabledEncryption
TwoClientBookmarksSyncTest.SC_RenameBMFolderThatHasParentAndChildren
TwoClientBookmarksSyncTest.SC_DelMiddleBMUnderBMFoldNonEmptyFoldAfterwards
TwoClientBookmarksSyncTest.SC_DelLastBMUnderBMFoldNonEmptyFoldAfterwards
TwoClientPreferencesSyncTest.Bidirectional_E2ETest
TwoClientBookmarksSyncTest.MC_MergeDifferentBMModelsModeratelyComplex
TwoClientBookmarksSyncTest.RacyPositionChanges
TwoClientSessionsSyncTest.SingleClientEnabledEncryption
TwoClientBookmarksSyncTest.SC_RenameBMNameAndURL
TwoClientBookmarksSyncTest.SC_RenameEmptyBMFolder
LegacyTwoClientBookmarksSyncTest.MC_DeleteBookmark
TwoClientBookmarksSyncTest.SC_DelBMsUnderBMFoldEmptyFolderAfterwards
TwoClientBookmarksSyncTest.SC_RenameBMURL
TwoClientBookmarksSyncTest.MC_Merge_CaseInsensitivity_InNames

From the log:

[6267:771:0707/125637.565289:ERROR:child_account_info_fetcher_impl.cc(143)] ChildAccountInfoFetcherImpl::OnGetUserInfoSuccess: GetUserInfo response didn't include allServices field.

and then things go south.
 

Comment 1 by a...@chromium.org, Jul 7 2017

[ RUN      ] TwoClientBookmarksSyncTest.SimultaneousURLChanges
[7435:771:0707/130058.914738:ERROR:child_account_info_fetcher_impl.cc(143)] ChildAccountInfoFetcherImpl::OnGetUserInfoSuccess: GetUserInfo response didn't include allServices field.
[7435:108551:0707/130059.441541:WARNING:syncer_proto_util.cc(338)] Error posting from syncer: Response Code (bogus on error): -1 Content-Length (bogus on error): -1 Server Status: SYNC_AUTH_ERROR
[7435:108551:0707/130059.443975:WARNING:get_updates_processor.cc(103)] Skipping unexpected progress marker for non-enabled type Bookmarks
[7435:108551:0707/130059.444997:WARNING:sync_encryption_handler_impl.cc(971)] Nigori had empty encryption keybag.
[7435:108551:0707/130059.446598:WARNING:sync_encryption_handler_impl.cc(971)] Nigori had empty encryption keybag.
[7435:108551:0707/130059.460599:WARNING:sync_encryption_handler_impl.cc(347)] Ignoring new implicit passphrase. Keystore migration already performed.
[7435:108551:0707/130059.596106:WARNING:get_updates_processor.cc(103)] Skipping unexpected progress marker for non-enabled type Bookmarks
[7435:771:0707/130059.783992:ERROR:child_account_info_fetcher_impl.cc(143)] ChildAccountInfoFetcherImpl::OnGetUserInfoSuccess: GetUserInfo response didn't include allServices field.
[7435:101667:0707/130100.307602:WARNING:syncer_proto_util.cc(338)] Error posting from syncer: Response Code (bogus on error): -1 Content-Length (bogus on error): -1 Server Status: SYNC_AUTH_ERROR
[7435:101667:0707/130100.309707:WARNING:get_updates_processor.cc(103)] Skipping unexpected progress marker for non-enabled type Bookmarks
[7435:101667:0707/130100.309746:WARNING:get_updates_processor.cc(103)] Skipping unexpected progress marker for non-enabled type Preferences
[7435:101667:0707/130100.309789:WARNING:get_updates_processor.cc(103)] Skipping unexpected progress marker for non-enabled type Themes
[7435:101667:0707/130100.309837:WARNING:get_updates_processor.cc(103)] Skipping unexpected progress marker for non-enabled type Search Engines
[7435:101667:0707/130100.309886:WARNING:get_updates_processor.cc(103)] Skipping unexpected progress marker for non-enabled type Sessions
[7435:101667:0707/130100.309938:WARNING:get_updates_processor.cc(103)] Skipping unexpected progress marker for non-enabled type Apps
[7435:101667:0707/130100.310013:WARNING:get_updates_processor.cc(103)] Skipping unexpected progress marker for non-enabled type Device Info
[7435:101667:0707/130100.324497:WARNING:sync_encryption_handler_impl.cc(347)] Ignoring new implicit passphrase. Keystore migration already performed.

Comment 2 by a...@chromium.org, Jul 7 2017

Components: Services>Sync

Comment 3 by s...@chromium.org, Jul 7 2017

None of the logs pasted in #1 looks particularly concerning to me. Could this maybe just be a thread [checker] issue? They seem to end with

[4855:32003:0707/121801.347071:FATAL:system_hotkey_helper_mac.mm(48)] Check failed: ::content::BrowserThread::CurrentlyOn(BrowserThread::FILE). Must be called on Chrome_FileThread; actually called on TaskSchedulerSingleThreadForegroundBlocking1.
0   libbase.dylib                       0x000000012485375e base::debug::StackTrace::StackTrace(unsigned long) + 174
1   libbase.dylib                       0x00000001248537fd base::debug::StackTrace::StackTrace(unsigned long) + 29
2   libbase.dylib                       0x0000000124851c8c base::debug::StackTrace::StackTrace() + 28
3   libbase.dylib                       0x00000001248f089f logging::LogMessage::~LogMessage() + 479
4   libbase.dylib                       0x00000001248ee215 logging::LogMessage::~LogMessage() + 21
5   libcontent.dylib                    0x000000012c01bdbd content::SystemHotkeyHelperMac::LoadSystemHotkeys() + 349
6   libcontent.dylib                    0x000000012c01cbed void base::internal::FunctorTraits<void (content::SystemHotkeyHelperMac::*)(), void>::Invoke<content::SystemHotkeyHelperMac*>(void (content::SystemHotkeyHelperMac::*)(), content::SystemHotkeyHelperMac*&&) + 125
7   libcontent.dylib                    0x000000012c01cb34 void base::internal::InvokeHelper<false, void>::MakeItSo<void (content::SystemHotkeyHelperMac::* const&)(), content::SystemHotkeyHelperMac*>(void (content::SystemHotkeyHelperMac::* const&&&)(), content::SystemHotkeyHelperMac*&&) + 68
8   libcontent.dylib                    0x000000012c01cac3 void base::internal::Invoker<base::internal::BindState<void (content::SystemHotkeyHelperMac::*)(), base::internal::UnretainedWrapper<content::SystemHotkeyHelperMac> >, void ()>::RunImpl<void (content::SystemHotkeyHelperMac::* const&)(), std::__1::tuple<base::internal::UnretainedWrapper<content::SystemHotkeyHelperMac> > const&, 0ul>(void (content::SystemHotkeyHelperMac::* const&&&)(), std::__1::tuple<base::internal::UnretainedWrapper<content::SystemHotkeyHelperMac> > const&&&, base::IndexSequence<0ul>) + 99
9   libcontent.dylib                    0x000000012c01c9fc base::internal::Invoker<base::internal::BindState<void (content::SystemHotkeyHelperMac::*)(), base::internal::UnretainedWrapper<content::SystemHotkeyHelperMac> >, void ()>::Run(base::internal::BindStateBase*) + 44
10  libbase.dylib                       0x00000001247f108f base::Callback<void (), (base::internal::CopyMode)0, (base::internal::RepeatMode)0>::Run() + 95
11  libbase.dylib                       0x0000000124855cb0 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 1040
12  libbase.dylib                       0x0000000124b10107 base::internal::TaskTracker::PerformRunTask(std::__1::unique_ptr<base::internal::Task, std::__1::default_delete<base::internal::Task> >, base::internal::Sequence*) + 3511
13  libbase.dylib                       0x0000000124b1261d base::internal::TaskTrackerPosix::PerformRunTask(std::__1::unique_ptr<base::internal::Task, std::__1::default_delete<base::internal::Task> >, base::internal::Sequence*) + 685
14  libbase.dylib                       0x0000000124b0e7da base::internal::TaskTracker::RunNextTask(base::internal::Sequence*) + 1034
15  libbase.dylib                       0x0000000124af589d base::internal::SchedulerWorker::Thread::ThreadMain() + 2157
16  libbase.dylib                       0x0000000124b1edf8 base::(anonymous namespace)::ThreadFunc(void*) + 632
17  libsystem_pthread.dylib             0x00007fff89979899 _pthread_body + 138
18  libsystem_pthread.dylib             0x00007fff8997972a _pthread_struct_init + 0
19  libsystem_pthread.dylib             0x00007fff8997dfc9 thread_start + 13

Comment 4 by s...@chromium.org, Jul 7 2017

Labels: OS-Mac
Owner: s...@chromium.org
Status: Started (was: Assigned)
Well, system_hotkey_helper_mac.mm hasn't been changed in a long time https://chromium.googlesource.com/chromium/src/+log/f37e7fd6eb873016a023acc58a121161128aeabd/content/browser/cocoa/system_hotkey_helper_mac.mm

Could this be part of the BrowserThread deprecation? Just looking at the file I don't see how LoadSystemHotkeys() doesn't get run on BrowserThread::FILE.

Taking ownership of this.

Comment 5 by a...@chromium.org, Jul 7 2017

Components: -Services>Sync Internals>TaskScheduler
I don't know how the TaskScheduler is doing its thing, but I would agree on your conclusions.

Re https://chromium.googlesource.com/chromium/src/+/master/docs/task_scheduler_migration.md if you switch it to  base::ThreadRestrictions::AssertIOAllowed() I'll review.

Comment 7 by s...@chromium.org, Jul 7 2017

So far we have two test cases that saw ~40 sync_integration_tests failures.

https://build.chromium.org/p/chromium.mac/builders/Mac10.9%20Tests%20%28dbg%29/builds/42425
https://build.chromium.org/p/chromium.mac/builders/Mac10.9%20Tests%20%28dbg%29/builds/42427

They're both on Mac10.9, and they have quite different sets of failing test cases.

The other mac builders have not seen this failure.

Wasn't able to find any smoking gun in either of 
https://chromium.googlesource.com/chromium/src/+log/e28db825e1f83e72777a670a448b7218022e9745/content/browser/browser_thread_impl.cc
https://chromium.googlesource.com/chromium/src/+log/96cb9e71e6049ebab48ae85f3e405366c3e21e35/content/browser/browser_main_loop.cc

First failure was Jul 7 ~19:00 +0000.

Comment 8 by a...@chromium.org, Jul 7 2017

I would say the solution is to convert that file to the TaskScheduler. Links in comments 5 and 6.

Comment 9 by a...@chromium.org, Jul 7 2017

If you want I can do it.
Any content::BrowserThread::CurrentlyOn(BrowserThread::FILE) calls need to be converted to using SequenceChecker.

Directly converting to the TaskScheduler while keeping these calls is an error.

Comment 11 by a...@chromium.org, Jul 7 2017

Cc: robliao@chromium.org
robliao: re the stack trace in comment 3, *somehow* this task that's supposed to be scheduled on the FILE thread ends up in the TaskScheduler. That's kinda the question here. This code wasn't converted yet, but somehow ended up running in new circumstances.

Comment 12 by s...@chromium.org, Jul 7 2017

My understanding of why we're seeing TaskSchedulerSingleThreadForegroundBlocking1 is because, reading https://chromium.googlesource.com/chromium/src/+/master/docs/task_scheduler_migration.md , it says 

> The SequencedWorkerPools and BrowserThreads (not UI/IO) are already being redirected to TaskScheduler under the hood so it's now “merely” a matter of updating the actual call sites.

My suspicion was it had to do with the BrowserThread::FILE being destroyed at the very end of a test case, but if there's a race condition, it's very hard to hit. A couple hundred test cases later on my Macbook, trying to tweak the delay to just the right time, did not hit any DCHECKs.

Blindly updating the code to use post_task.h now.
The thread name in this case is a red herring.
The DCHECK checks against the task runner, and it's picked up one that it hasn't expected.
https://cs.chromium.org/chromium/src/content/browser/browser_thread_impl.cc?rcl=e1466a707d3d4bc35f3bfa38169820edf9ee6060&l=571

Race conditions in BrowserThread initialization and teardown would do that. Do these tests not join against the BrowserThreads on termination?

Okay, now I see that the test brings up a full instance of Chrome for each test, this is a little more interesting.

For the record, I've yet to repro this DCHECK locally on my MBP at the first failing change, making the race condition theory more likely.
Project Member

Comment 15 by bugdroid1@chromium.org, Jul 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b356df4356092534a75811b9c31fb91f50ef012d

commit b356df4356092534a75811b9c31fb91f50ef012d
Author: Sky Malice <skym@chromium.org>
Date: Fri Jul 07 22:59:56 2017

Update SystemHotkeyHelperMac to use post_task instead of FILE thread.

Bug:  740221 
Change-Id: I59bcfd1a9a6a27b00d41ba9fee65f24b7dd0fc73
Reviewed-on: https://chromium-review.googlesource.com/564177
Reviewed-by: Avi Drissman <avi@chromium.org>
Commit-Queue: Sky Malice <skym@chromium.org>
Cr-Commit-Position: refs/heads/master@{#485082}
[modify] https://crrev.com/b356df4356092534a75811b9c31fb91f50ef012d/content/browser/cocoa/system_hotkey_helper_mac.mm

I think we've hit something similar to http://crbug.com/698140 , but not 
exactly the same as it can still repro because of the non-atomic shutdown.

The actors:
1) The BrowserThread::PostDelayedTask
2) ResetThread_FILE
3) BrowserMainLoop::ShutdownThreadsAndCleanUp
4) BrowserThread::CurrentlyOn

The PostDelayedTask is crucial for this to repro (and it's a very narrow repro).

First let's take a look at the call.
  BrowserThread::PostDelayedTask(
      BrowserThread::FILE,
      FROM_HERE,
      base::Bind(&SystemHotkeyHelperMac::LoadSystemHotkeys,
                 base::Unretained(this)),
      base::TimeDelta::FromSeconds(kLoadHotkeysDelaySeconds = 10));
This gets run basically at Chrome startup and effectively puts the 
LoadSystemHotkeys call into the delayed task queue.

Next, let's take a look at ResetThread_FILE, which does the following
  Post a WaitableEvent to drain the tasks _currently_ in flight
  Wait for that WaitableEvent to run
  Mark g_globals.Get().task_runners[FILE] = nullptr;

Now, we'll take a step back and look at
BrowserMainLoop::ShutdownThreadsAndCleanUp which does the following relevant 
actions non-atomically:
  ResetThread_FILE();
  base::TaskScheduler::GetInstance()->Shutdown();

Finally, we look at the BrowserThread::CurrentlyOn, which is short enough to 
show here.

bool BrowserThread::CurrentlyOn(ID identifier) {
  BrowserThreadGlobals& globals = g_globals.Get();
  base::AutoLock lock(globals.lock);
  DCHECK_GE(identifier, 0);
  DCHECK_LT(identifier, ID_COUNT);
  return globals.task_runners[identifier] &&
         globals.task_runners[identifier]->RunsTasksInCurrentSequence();
}


The stage is now set for the race.


If Chrome starts shutdown at around 10 seconds from when we posted 
LoadSystemHotKeys, the following sequence of events can happen.

1) Post the LoadSystemHotkeys delayed task.
2) ResetThread_FILE, which waits for _currently_ in flight tasks to finish
   execution and sets globals.Get().task_runners[FILE] to nullptr.
3) The 10 seconds delayed task fires and submits the LoadSystemHotkeys task to
   the FILE TaskRunner. Because delayed tasks keep track of their own task
   runner, the global FILE task_runner isn't needed to post this task.
   
   Note that we haven't even attempted to shut down the task scheduler by this
   point (and even if we did, we would instead hit http://crbug.com/698140).
   
   Now, have a FILE task in flight.
4) The task runs. It checks CurrentlyOn and very likely notes that
   globals.Get().task_runners[FILE] is nullptr, return false and triggering 
   the DCHECK.
   
If we were to keep the LoadSystemHotkeys code the same, the fix here would 
involve stopping Task Scheduler based delayed tasks before shutting down the 
browser threads.

Comment 17 by bsep@chromium.org, Jul 10 2017

Labels: -Sheriff-Chromium
Haven't seen this come up lately and it's being worked on, so removing from sheriff queue.

Comment 18 by s...@chromium.org, Jul 11 2017

Status: Fixed (was: Started)
Hasn't happened since the fix in #15 landed, resolving.

Sign in to add a comment