KitchenSync timeouts at StatusChangeChecker::StartBlockingWait() |
||||||
Issue descriptionKitchen Sync consistently fails on all platforms. Sample builds: https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Linux%20Canary/builds/40477 https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Linux%20Dev/builds/33156 https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Mac%20Dev/builds/15652 https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Windows%20Dev/builds/99062 The relevant bit of log is ======================================================= [9187:9187:0420/025946.614469:VERBOSE1:login_ui_test_utils.cc(359)] Wait for login UI to be ready. [9187:9187:0420/025947.797758:VERBOSE1:login_ui_test_utils.cc(363)] Sign in user: cs.ks.canary@gmail.com [9187:9187:0420/025948.662188:WARNING:CONSOLE(2411)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://chrome-signin/gaia_auth_host.js (2411) [9187:9187:0420/025948.741004:WARNING:CONSOLE(2411)] "Unrecognized message from GAIA: clearOldAttempts", source: chrome://chrome-signin/gaia_auth_host.js (2411) [9187:9187:0420/025950.665692:INFO:CONSOLE(0)] "[DOM] Found 2 elements with non-unique id #ca: (More info: https://goo.gl/9p2vKq) %o %o", source: https://accounts.google.com/signin/v2/challenge/pwd?client_id=77185425430.apps.googleusercontent.com&hl=en-US&source=chrome&flow=signin&flowName=EmbeddedSetupChromeUsermenu&TL=AHnYQLw7hlQ30NK72vBd-0pSH_10tCJXAD6UO07mncxe0s2xE0kskZQYhJMTA37ZSSw4hgYfzMHsqA2hqZsEs4HFsNpqjo_Jw2y6VpogcXYhBSMLCmuSBULQf6ObvjnIvsl1HvdMy_-EQxbXDEdsz-fhZve7NKBaxg&cid=1&navigationDirection=forward (0) [9187:9187:0420/025951.017610:WARNING:CONSOLE(2411)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://chrome-signin/gaia_auth_host.js (2411) [9187:9187:0420/025952.183625:VERBOSE1:login_ui_test_utils.cc(79)] Google signin succeeded. [9187:9408:0420/025952.308156:WARNING:syncer_proto_util.cc(338)] Error posting from syncer: Response Code (bogus on error): -1 Content-Length (bogus on error): -1 Server Status: SYNC_AUTH_ERROR [9187:9187:0420/025959.731220:INFO:CONSOLE(1)] "Uncaught ReferenceError: inline is not defined", source: (1) [9187:9187:0420/025959.985543:WARNING:CONSOLE(0)] "Styling master document from stylesheets defined in HTML Imports is deprecated. Please refer to https://goo.gl/EGXzpw for possible migration paths.", source: (0) [9187:9187:0420/030000.817391:ERROR:CONSOLE(1)] "Uncaught ReferenceError: inline is not defined", source: chrome://sync-confirmation/ (1) [9187:9408:0420/030004.591216:WARNING:sync_encryption_handler_impl.cc(346)] Ignoring new implicit passphrase. Keystore migration already performed. [1:1:0420/030004.867991:ERROR:render_process_impl.cc(197)] WebFrame LEAKED 1 TIMES [9187:9249:0420/030006.973727:WARNING:client_tag_based_model_type_processor.cc(534)] Received remote delete for a non-existing item. client_tag_hash: Ab6T9wRSGvCB6fvvDohEvrWML5A= BrowserTestBase received signal: Terminated. Backtrace: #0 0x7fc95236c23d base::debug::StackTrace::StackTrace() #1 0x7fc95236a7fc base::debug::StackTrace::StackTrace() #2 0x000008ee1a14 content::(anonymous namespace)::DumpStackTraceSignalHandler() #3 0x7fc933661d40 <unknown> #4 0x7fc93371812d __poll #5 0x7fc936f2dfe4 <unknown> #6 0x7fc936f2e0ec g_main_context_iteration #7 0x7fc952421f5f base::MessagePumpGlib::Run() #8 0x7fc95241cf5b base::MessageLoop::Run() #9 0x7fc9524d719d base::RunLoop::Run() #10 0x000003793552 StatusChangeChecker::StartBlockingWait() #11 0x00000379336c StatusChangeChecker::Wait() #12 0x000001856fa4 TwoClientTypedUrlsSyncTest_AddThenDelete_E2ETest_Test::RunTestOnMainThread() #13 0x000008ee1433 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop() #14 0x0000019da3cd _ZN4base8internal13FunctorTraitsIMN18OAuth2TokenService7FetcherEFvvEvE6InvokeIPS3_JEEEvS5_OT_DpOT0_ #15 0x0000019da314 _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKMN18OAuth2TokenService7FetcherEFvvEJPS5_EEEvOT_DpOT0_ #16 0x0000019da2c5 _ZN4base8internal7InvokerINS0_9BindStateIMN18OAuth2TokenService7FetcherEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE7RunImplIRKS6_RKNSt3__15tupleIJS8_EEEJLm0EEEEvOT_OT0_NSF_16integer_sequenceImJXspT1_EEEE #17 0x0000019da20c _ZN4base8internal7InvokerINS0_9BindStateIMN18OAuth2TokenService7FetcherEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE #18 0x000001a1026d _ZNKR4base17RepeatingCallbackIFvvEE3RunEv #19 0x000003d146bf ChromeBrowserMainParts::PreMainMessageLoopRunImpl() #20 0x000003d1254e ChromeBrowserMainParts::PreMainMessageLoopRun() #21 0x7fc94bbba14c content::BrowserMainLoop::PreMainMessageLoopRun() #22 0x7fc94a99e3bd _ZN4base8internal13FunctorTraitsIMN7content12ChildProcessEFvvEvE6InvokeIPS3_JEEEvS5_OT_DpOT0_ #23 0x7fc94a99e334 _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIMN7content12ChildProcessEFvvEJPS5_EEEvOT_DpOT0_ #24 0x7fc94a99e2e5 _ZN4base8internal7InvokerINS0_9BindStateIMN7content12ChildProcessEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE7RunImplIS6_NSt3__15tupleIJS8_EEEJLm0EEEEvOT_OT0_NSD_16integer_sequenceImJXspT1_EEEE #25 0x7fc94a9a11bc _ZN4base8internal7InvokerINS0_9BindStateIMN7content14ServiceFactoryEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE #26 0x7fc94a8a95dd _ZNKR4base17RepeatingCallbackIFvvEE3RunEv #27 0x7fc94c8899fd content::StartupTaskRunner::RunAllTasksNow() #28 0x7fc94bbb62ef content::BrowserMainLoop::CreateStartupTasks() #29 0x7fc94bbc2043 content::BrowserMainRunnerImpl::Initialize() #30 0x7fc94bbae41b content::BrowserMain() #31 0x7fc94d8c16a7 content::RunNamedProcessTypeMain() #32 0x7fc94d8c5074 content::ContentMainRunnerImpl::Run() #33 0x7fc94d8b83b5 content::ContentServiceManagerMainDelegate::RunEmbedderProcess() #34 0x7fc9463dee9c service_manager::Main() #35 0x7fc94d8be5c5 content::ContentMain() #36 0x000008ee094c content::BrowserTestBase::SetUp() #37 0x0000037c8131 InProcessBrowserTest::SetUp() #38 0x0000037a8502 SyncTest::SetUp() #39 0x0000018f634e testing::internal::HandleSehExceptionsInMethodIfSupported<>() #40 0x0000018e7252 testing::internal::HandleExceptionsInMethodIfSupported<>() #41 0x0000018ca5b4 testing::Test::Run() #42 0x0000018cb030 testing::TestInfo::Run() #43 0x0000018cbadf testing::TestCase::Run() #44 0x0000018dde98 testing::internal::UnitTestImpl::RunAllTests() #45 0x0000018f63de testing::internal::HandleSehExceptionsInMethodIfSupported<>() #46 0x0000018e8ab2 testing::internal::HandleExceptionsInMethodIfSupported<>() #47 0x0000018ddaf7 testing::UnitTest::Run() #48 0x0000039a1b71 RUN_ALL_TESTS() #49 0x00000399e53b base::TestSuite::Run() #50 0x0000037c07a8 ChromeTestSuiteRunner::RunTestSuite() #51 0x0000037c08b3 ChromeTestLauncherDelegate::RunTestSuite() #52 0x000008f074c9 content::LaunchTests() #53 0x0000037c1ba2 LaunchChromeTests() #54 0x00000374473b main #55 0x7fc93364cec5 __libc_start_main #56 0x0000016fa02a _start [73/76] TwoClientTypedUrlsSyncTest.AddThenDelete_E2ETest (TIMED OUT) ======================================================= Unfortunately, I am not able to reproduce locally on Linux. This may be related to Dice (i.e. similar to issue 828729 and issue 833324).
,
Apr 20 2018
Oh, probably I run the wrong test as well, because I've tried again and was able to reproduce locally with TwoClientTypedUrlsSyncTest.AddThenDelete_E2ETest test. It does time out with precisely the same log.
,
Apr 20 2018
Increasing the time out does not help. Second sync does not work. Unfortunately, I don't have time to look more into this now. Note that other tests fail even without deletion, thus, this is broader then just deletion sync issue.
,
Apr 25 2018
I've investigated a bit; the problem is that SyncTest::ClearServerData never finishes. SyncManagerImpl::ClearServerData does schedule the clear, but apparently it never gets executed. I'll continue looking.
,
Apr 25 2018
A bit more detail on what happens: SyncManagerImpl::ClearServerData puts the SyncSchedulerImpl into CLEAR_SERVER_DATA_MODE, and schedules a ClearServerData task which gets posted to the message loop. However, before that task runs, the scheduler is put back into configuration mode by SyncBackendHostCore::DoConfigureSyncer, posted from SyncBackendHostImpl::ConfigureDataTypes, which in turn is called from ModelTypeController::OnProcessorStarted. It looks to me like this is a race condition that's probably always been there. I guess some recent unrelated change happened to trigger it.
,
Apr 25 2018
I have a potential fix; CL coming soon.
,
Apr 25 2018
Great, thank you a lot for looking into this!
,
Apr 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/082307a4269557866884e470ff7e0847482fe713 commit 082307a4269557866884e470ff7e0847482fe713 Author: Marc Treib <treib@chromium.org> Date: Wed Apr 25 16:08:16 2018 ProfileSyncServiceHarness: Always call AwaitSyncSetupCompletion in SetupSync Before this CL, SetupSync would early-out if skip_passphrase_verification was set to true. I don't see a good reason for this, so now it always gets called. For reference, this logic was added in https://codereview.chromium.org/2716413003/, but I couldn't find an explanation for it there. This is an attempt to fix Sync's E2E tests. Locally, it fixes some of them, though others still fail. The problem was that a ClearServerData task got scheduled in SyncSchedulerImpl, but then before that task actually got a chance to run, the scheduler was put back into CONFIGURATION mode via SyncBackendHostImpl::ConfigureDataTypes (race condition). This change makes sure that ConfigureDataTypes (and its async tasks) gets completed before we try to clear server data. More details on the failure can be found on the bug. Bug: 835250 Change-Id: I392376d23930540ce9442911ec16e00a42abf043 Reviewed-on: https://chromium-review.googlesource.com/1028050 Reviewed-by: vitaliii <vitaliii@chromium.org> Commit-Queue: Marc Treib <treib@chromium.org> Cr-Commit-Position: refs/heads/master@{#553588} [modify] https://crrev.com/082307a4269557866884e470ff7e0847482fe713/chrome/browser/sync/test/integration/profile_sync_service_harness.cc [modify] https://crrev.com/082307a4269557866884e470ff7e0847482fe713/chrome/browser/sync/test/integration/profile_sync_service_harness.h [modify] https://crrev.com/082307a4269557866884e470ff7e0847482fe713/chrome/browser/sync/test/integration/sync_errors_test.cc
,
Apr 25 2018
,
Apr 26 2018
The NextAction date has arrived: 2018-04-26
,
Apr 26 2018
I think it got a bit better, but a bunch of tests are still failing. Seems like it's lots of apps/extensions tests, but also a bunch of others. Example log: https://logs.chromium.org/v/?s=chrome%2Fbb%2Finternal.client.kitchensync%2FLinux_Canary%2F40483%2F%2B%2Frecipes%2Fsteps%2Fsync_integration_tests%2F0%2Fstdout I guess it's time for a bisect, but I likely won't get to it today. Vitalii, if you have time, please feel free to pick this up.
,
Apr 27 2018
Okay, this is weird. Here's the first failing build on Linux Canary: https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Linux%20Canary/builds/40475 It got revision #551923. However, the corresponding builds on Linux Dev and Windows Dev got the exact same revision, but ran through fine: https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Linux%20Dev/builds/33154 https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Windows%20Dev/builds/99060 So either the Linux Canary failure was an unrelated flake, or maybe the revision numbers don't mean what I think (or what they usually do)? What's the difference between Linux Dev and Linux Canary anyway?!
,
Apr 30 2018
Issue 820122 has been merged into this issue.
,
May 8 2018
|
||||||
►
Sign in to add a comment |
||||||
Comment 1 by vitaliii@chromium.org
, Apr 20 2018