sync_integration_tests failing |
|||||||
Issue descriptionsync_integration_tests failing on chromium.mac/Mac10.9 Tests (dbg) Builders failed on: - Mac10.9 Tests (dbg): https://build.chromium.org/p/chromium.mac/builders/Mac10.9%20Tests%20%28dbg%29 First failure on https://build.chromium.org/p/chromium.mac/builders/Mac10.9%20Tests%20%28dbg%29/builds/42425 . What failed: TwoClientSearchEnginesSyncTest.Duplicates TwoClientPreferencesSyncTest.BothClientsEnabledEncryptionAndChangedMultipleTimes_E2ETest TwoClientPasswordsSyncTest.Merge_E2ETest TwoClientBookmarksSyncTest.MC_BootStrapEmptyStateEverywhere TwoClientSearchEnginesSyncTest.UpdateKeyword_E2ETest TwoClientPreferencesSyncTest.SingleClientEnabledEncryptionBothChanged_E2ETest TwoClientBookmarksSyncTest.SC_ReverseTheOrderOfTwoBMFolders TwoClientBookmarksSyncTest.SingleClientEnabledEncryption TwoClientBookmarksSyncTest.MC_MergeSimpleBMHierarchyEqualSetsUnderBMBar TwoClientBookmarksSyncTest.MC_DuplicateFolders TwoClientBookmarksSyncTest.MC_Merge_SimpleBMHierarchy_Under_BMFolder TwoClientThemesSyncTest.CycleOptions_E2ETest TwoClientBookmarksSyncTest.MC_SimpleMergeOfDifferentBMModels TwoClientBookmarksSyncTest.MC_Merge_SimpleBMHierarchy_DisjointSets_Under_BookmarkBar TwoClientBookmarksSyncTest.SingleClientEnabledEncryptionAndChanged TwoClientBookmarksSyncTest.SC_DelEmptyBMFoldNonEmptyAccountAfterwards TwoClientBookmarksSyncTest.MC_Merge_SimpleBMHierarchy_DisjointSets_Under_BMFolder TwoClientSearchEnginesSyncTest.Delete_E2ETest TwoClientPreferencesSyncTest.UnsyncableBooleanPref_E2ETest TwoClientBookmarksSyncTest.SC_DelBMFoldWithBMsNonEmptyAccountAfterwards TwoClientBookmarksSyncTest.SimultaneousURLChanges TwoClientBookmarksSyncTest.SC_DelEmptyBMFoldEmptyAccountAfterwards TwoClientTypedUrlsSyncTest.AddExpiredThenUpdate TwoClientSearchEnginesSyncTest.MergeMultiple TwoClientThemesSyncTest.CustomThenSyncDefault_E2ETest TwoClientSearchEnginesSyncTest.UpdateUrl_E2ETest TwoClientSearchEnginesSyncTest.AddMultiple_E2ETest TwoClientBookmarksSyncTest.SC_AddSeveralBMsAndFolders TwoClientBookmarksSyncTest.MC_BiDirectionalPushAddingBM TwoClientBookmarksSyncTest.SC_UpdatingTitleDoesNotUpdateFaviconLastUpdatedTime TwoClientSessionsSyncTest.BothClientsEnabledEncryption TwoClientBookmarksSyncTest.SC_RenameBMFolderThatHasParentAndChildren TwoClientBookmarksSyncTest.SC_DelMiddleBMUnderBMFoldNonEmptyFoldAfterwards TwoClientBookmarksSyncTest.SC_DelLastBMUnderBMFoldNonEmptyFoldAfterwards TwoClientPreferencesSyncTest.Bidirectional_E2ETest TwoClientBookmarksSyncTest.MC_MergeDifferentBMModelsModeratelyComplex TwoClientBookmarksSyncTest.RacyPositionChanges TwoClientSessionsSyncTest.SingleClientEnabledEncryption TwoClientBookmarksSyncTest.SC_RenameBMNameAndURL TwoClientBookmarksSyncTest.SC_RenameEmptyBMFolder LegacyTwoClientBookmarksSyncTest.MC_DeleteBookmark TwoClientBookmarksSyncTest.SC_DelBMsUnderBMFoldEmptyFolderAfterwards TwoClientBookmarksSyncTest.SC_RenameBMURL TwoClientBookmarksSyncTest.MC_Merge_CaseInsensitivity_InNames From the log: [6267:771:0707/125637.565289:ERROR:child_account_info_fetcher_impl.cc(143)] ChildAccountInfoFetcherImpl::OnGetUserInfoSuccess: GetUserInfo response didn't include allServices field. and then things go south.
,
Jul 7 2017
,
Jul 7 2017
None of the logs pasted in #1 looks particularly concerning to me. Could this maybe just be a thread [checker] issue? They seem to end with [4855:32003:0707/121801.347071:FATAL:system_hotkey_helper_mac.mm(48)] Check failed: ::content::BrowserThread::CurrentlyOn(BrowserThread::FILE). Must be called on Chrome_FileThread; actually called on TaskSchedulerSingleThreadForegroundBlocking1. 0 libbase.dylib 0x000000012485375e base::debug::StackTrace::StackTrace(unsigned long) + 174 1 libbase.dylib 0x00000001248537fd base::debug::StackTrace::StackTrace(unsigned long) + 29 2 libbase.dylib 0x0000000124851c8c base::debug::StackTrace::StackTrace() + 28 3 libbase.dylib 0x00000001248f089f logging::LogMessage::~LogMessage() + 479 4 libbase.dylib 0x00000001248ee215 logging::LogMessage::~LogMessage() + 21 5 libcontent.dylib 0x000000012c01bdbd content::SystemHotkeyHelperMac::LoadSystemHotkeys() + 349 6 libcontent.dylib 0x000000012c01cbed void base::internal::FunctorTraits<void (content::SystemHotkeyHelperMac::*)(), void>::Invoke<content::SystemHotkeyHelperMac*>(void (content::SystemHotkeyHelperMac::*)(), content::SystemHotkeyHelperMac*&&) + 125 7 libcontent.dylib 0x000000012c01cb34 void base::internal::InvokeHelper<false, void>::MakeItSo<void (content::SystemHotkeyHelperMac::* const&)(), content::SystemHotkeyHelperMac*>(void (content::SystemHotkeyHelperMac::* const&&&)(), content::SystemHotkeyHelperMac*&&) + 68 8 libcontent.dylib 0x000000012c01cac3 void base::internal::Invoker<base::internal::BindState<void (content::SystemHotkeyHelperMac::*)(), base::internal::UnretainedWrapper<content::SystemHotkeyHelperMac> >, void ()>::RunImpl<void (content::SystemHotkeyHelperMac::* const&)(), std::__1::tuple<base::internal::UnretainedWrapper<content::SystemHotkeyHelperMac> > const&, 0ul>(void (content::SystemHotkeyHelperMac::* const&&&)(), std::__1::tuple<base::internal::UnretainedWrapper<content::SystemHotkeyHelperMac> > const&&&, base::IndexSequence<0ul>) + 99 9 libcontent.dylib 0x000000012c01c9fc base::internal::Invoker<base::internal::BindState<void (content::SystemHotkeyHelperMac::*)(), base::internal::UnretainedWrapper<content::SystemHotkeyHelperMac> >, void ()>::Run(base::internal::BindStateBase*) + 44 10 libbase.dylib 0x00000001247f108f base::Callback<void (), (base::internal::CopyMode)0, (base::internal::RepeatMode)0>::Run() + 95 11 libbase.dylib 0x0000000124855cb0 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 1040 12 libbase.dylib 0x0000000124b10107 base::internal::TaskTracker::PerformRunTask(std::__1::unique_ptr<base::internal::Task, std::__1::default_delete<base::internal::Task> >, base::internal::Sequence*) + 3511 13 libbase.dylib 0x0000000124b1261d base::internal::TaskTrackerPosix::PerformRunTask(std::__1::unique_ptr<base::internal::Task, std::__1::default_delete<base::internal::Task> >, base::internal::Sequence*) + 685 14 libbase.dylib 0x0000000124b0e7da base::internal::TaskTracker::RunNextTask(base::internal::Sequence*) + 1034 15 libbase.dylib 0x0000000124af589d base::internal::SchedulerWorker::Thread::ThreadMain() + 2157 16 libbase.dylib 0x0000000124b1edf8 base::(anonymous namespace)::ThreadFunc(void*) + 632 17 libsystem_pthread.dylib 0x00007fff89979899 _pthread_body + 138 18 libsystem_pthread.dylib 0x00007fff8997972a _pthread_struct_init + 0 19 libsystem_pthread.dylib 0x00007fff8997dfc9 thread_start + 13
,
Jul 7 2017
Well, system_hotkey_helper_mac.mm hasn't been changed in a long time https://chromium.googlesource.com/chromium/src/+log/f37e7fd6eb873016a023acc58a121161128aeabd/content/browser/cocoa/system_hotkey_helper_mac.mm Could this be part of the BrowserThread deprecation? Just looking at the file I don't see how LoadSystemHotkeys() doesn't get run on BrowserThread::FILE. Taking ownership of this.
,
Jul 7 2017
I don't know how the TaskScheduler is doing its thing, but I would agree on your conclusions. Re https://chromium.googlesource.com/chromium/src/+/master/docs/task_scheduler_migration.md if you switch it to base::ThreadRestrictions::AssertIOAllowed() I'll review.
,
Jul 7 2017
Also, mark that file in https://docs.google.com/spreadsheets/d/18x9PGMlfgWcBr4fDz2SEEtIwTpSjcBFT2Puib47ZF1w/edit#gid=0 when you're done.
,
Jul 7 2017
So far we have two test cases that saw ~40 sync_integration_tests failures. https://build.chromium.org/p/chromium.mac/builders/Mac10.9%20Tests%20%28dbg%29/builds/42425 https://build.chromium.org/p/chromium.mac/builders/Mac10.9%20Tests%20%28dbg%29/builds/42427 They're both on Mac10.9, and they have quite different sets of failing test cases. The other mac builders have not seen this failure. Wasn't able to find any smoking gun in either of https://chromium.googlesource.com/chromium/src/+log/e28db825e1f83e72777a670a448b7218022e9745/content/browser/browser_thread_impl.cc https://chromium.googlesource.com/chromium/src/+log/96cb9e71e6049ebab48ae85f3e405366c3e21e35/content/browser/browser_main_loop.cc First failure was Jul 7 ~19:00 +0000.
,
Jul 7 2017
I would say the solution is to convert that file to the TaskScheduler. Links in comments 5 and 6.
,
Jul 7 2017
If you want I can do it.
,
Jul 7 2017
Any content::BrowserThread::CurrentlyOn(BrowserThread::FILE) calls need to be converted to using SequenceChecker. Directly converting to the TaskScheduler while keeping these calls is an error.
,
Jul 7 2017
robliao: re the stack trace in comment 3, *somehow* this task that's supposed to be scheduled on the FILE thread ends up in the TaskScheduler. That's kinda the question here. This code wasn't converted yet, but somehow ended up running in new circumstances.
,
Jul 7 2017
My understanding of why we're seeing TaskSchedulerSingleThreadForegroundBlocking1 is because, reading https://chromium.googlesource.com/chromium/src/+/master/docs/task_scheduler_migration.md , it says > The SequencedWorkerPools and BrowserThreads (not UI/IO) are already being redirected to TaskScheduler under the hood so it's now “merely” a matter of updating the actual call sites. My suspicion was it had to do with the BrowserThread::FILE being destroyed at the very end of a test case, but if there's a race condition, it's very hard to hit. A couple hundred test cases later on my Macbook, trying to tweak the delay to just the right time, did not hit any DCHECKs. Blindly updating the code to use post_task.h now.
,
Jul 7 2017
The thread name in this case is a red herring. The DCHECK checks against the task runner, and it's picked up one that it hasn't expected. https://cs.chromium.org/chromium/src/content/browser/browser_thread_impl.cc?rcl=e1466a707d3d4bc35f3bfa38169820edf9ee6060&l=571 Race conditions in BrowserThread initialization and teardown would do that. Do these tests not join against the BrowserThreads on termination?
,
Jul 7 2017
Okay, now I see that the test brings up a full instance of Chrome for each test, this is a little more interesting. For the record, I've yet to repro this DCHECK locally on my MBP at the first failing change, making the race condition theory more likely.
,
Jul 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b356df4356092534a75811b9c31fb91f50ef012d commit b356df4356092534a75811b9c31fb91f50ef012d Author: Sky Malice <skym@chromium.org> Date: Fri Jul 07 22:59:56 2017 Update SystemHotkeyHelperMac to use post_task instead of FILE thread. Bug: 740221 Change-Id: I59bcfd1a9a6a27b00d41ba9fee65f24b7dd0fc73 Reviewed-on: https://chromium-review.googlesource.com/564177 Reviewed-by: Avi Drissman <avi@chromium.org> Commit-Queue: Sky Malice <skym@chromium.org> Cr-Commit-Position: refs/heads/master@{#485082} [modify] https://crrev.com/b356df4356092534a75811b9c31fb91f50ef012d/content/browser/cocoa/system_hotkey_helper_mac.mm
,
Jul 7 2017
I think we've hit something similar to http://crbug.com/698140 , but not exactly the same as it can still repro because of the non-atomic shutdown. The actors: 1) The BrowserThread::PostDelayedTask 2) ResetThread_FILE 3) BrowserMainLoop::ShutdownThreadsAndCleanUp 4) BrowserThread::CurrentlyOn The PostDelayedTask is crucial for this to repro (and it's a very narrow repro). First let's take a look at the call. BrowserThread::PostDelayedTask( BrowserThread::FILE, FROM_HERE, base::Bind(&SystemHotkeyHelperMac::LoadSystemHotkeys, base::Unretained(this)), base::TimeDelta::FromSeconds(kLoadHotkeysDelaySeconds = 10)); This gets run basically at Chrome startup and effectively puts the LoadSystemHotkeys call into the delayed task queue. Next, let's take a look at ResetThread_FILE, which does the following Post a WaitableEvent to drain the tasks _currently_ in flight Wait for that WaitableEvent to run Mark g_globals.Get().task_runners[FILE] = nullptr; Now, we'll take a step back and look at BrowserMainLoop::ShutdownThreadsAndCleanUp which does the following relevant actions non-atomically: ResetThread_FILE(); base::TaskScheduler::GetInstance()->Shutdown(); Finally, we look at the BrowserThread::CurrentlyOn, which is short enough to show here. bool BrowserThread::CurrentlyOn(ID identifier) { BrowserThreadGlobals& globals = g_globals.Get(); base::AutoLock lock(globals.lock); DCHECK_GE(identifier, 0); DCHECK_LT(identifier, ID_COUNT); return globals.task_runners[identifier] && globals.task_runners[identifier]->RunsTasksInCurrentSequence(); } The stage is now set for the race. If Chrome starts shutdown at around 10 seconds from when we posted LoadSystemHotKeys, the following sequence of events can happen. 1) Post the LoadSystemHotkeys delayed task. 2) ResetThread_FILE, which waits for _currently_ in flight tasks to finish execution and sets globals.Get().task_runners[FILE] to nullptr. 3) The 10 seconds delayed task fires and submits the LoadSystemHotkeys task to the FILE TaskRunner. Because delayed tasks keep track of their own task runner, the global FILE task_runner isn't needed to post this task. Note that we haven't even attempted to shut down the task scheduler by this point (and even if we did, we would instead hit http://crbug.com/698140). Now, have a FILE task in flight. 4) The task runs. It checks CurrentlyOn and very likely notes that globals.Get().task_runners[FILE] is nullptr, return false and triggering the DCHECK. If we were to keep the LoadSystemHotkeys code the same, the fix here would involve stopping Task Scheduler based delayed tasks before shutting down the browser threads.
,
Jul 10 2017
Haven't seen this come up lately and it's being worked on, so removing from sheriff queue.
,
Jul 11 2017
Hasn't happened since the fix in #15 landed, resolving. |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by a...@chromium.org
, Jul 7 2017