New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 835250 link

Starred by 3 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Chrome , Mac
Pri: 3
Type: Bug-Regression



Sign in to add a comment

KitchenSync timeouts at StatusChangeChecker::StartBlockingWait()

Project Member Reported by vitaliii@chromium.org, Apr 20 2018

Issue description

Kitchen Sync consistently fails on all platforms.

Sample builds:
https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Linux%20Canary/builds/40477
https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Linux%20Dev/builds/33156
https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Mac%20Dev/builds/15652
https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Windows%20Dev/builds/99062

The relevant bit of log is
=======================================================
[9187:9187:0420/025946.614469:VERBOSE1:login_ui_test_utils.cc(359)] Wait for login UI to be ready.
[9187:9187:0420/025947.797758:VERBOSE1:login_ui_test_utils.cc(363)] Sign in user: cs.ks.canary@gmail.com
[9187:9187:0420/025948.662188:WARNING:CONSOLE(2411)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://chrome-signin/gaia_auth_host.js (2411)
[9187:9187:0420/025948.741004:WARNING:CONSOLE(2411)] "Unrecognized message from GAIA: clearOldAttempts", source: chrome://chrome-signin/gaia_auth_host.js (2411)
[9187:9187:0420/025950.665692:INFO:CONSOLE(0)] "[DOM] Found 2 elements with non-unique id #ca: (More info: https://goo.gl/9p2vKq) %o %o", source: https://accounts.google.com/signin/v2/challenge/pwd?client_id=77185425430.apps.googleusercontent.com&hl=en-US&source=chrome&flow=signin&flowName=EmbeddedSetupChromeUsermenu&TL=AHnYQLw7hlQ30NK72vBd-0pSH_10tCJXAD6UO07mncxe0s2xE0kskZQYhJMTA37ZSSw4hgYfzMHsqA2hqZsEs4HFsNpqjo_Jw2y6VpogcXYhBSMLCmuSBULQf6ObvjnIvsl1HvdMy_-EQxbXDEdsz-fhZve7NKBaxg&cid=1&navigationDirection=forward (0)
[9187:9187:0420/025951.017610:WARNING:CONSOLE(2411)] "Unrecognized message from GAIA: showConfirmCancel", source: chrome://chrome-signin/gaia_auth_host.js (2411)
[9187:9187:0420/025952.183625:VERBOSE1:login_ui_test_utils.cc(79)] Google signin succeeded.
[9187:9408:0420/025952.308156:WARNING:syncer_proto_util.cc(338)] Error posting from syncer: Response Code (bogus on error): -1 Content-Length (bogus on error): -1 Server Status: SYNC_AUTH_ERROR
[9187:9187:0420/025959.731220:INFO:CONSOLE(1)] "Uncaught ReferenceError: inline is not defined", source:  (1)
[9187:9187:0420/025959.985543:WARNING:CONSOLE(0)] "Styling master document from stylesheets defined in HTML Imports is deprecated. Please refer to https://goo.gl/EGXzpw for possible migration paths.", source:  (0)
[9187:9187:0420/030000.817391:ERROR:CONSOLE(1)] "Uncaught ReferenceError: inline is not defined", source: chrome://sync-confirmation/ (1)
[9187:9408:0420/030004.591216:WARNING:sync_encryption_handler_impl.cc(346)] Ignoring new implicit passphrase. Keystore migration already performed.
[1:1:0420/030004.867991:ERROR:render_process_impl.cc(197)] WebFrame LEAKED 1 TIMES
[9187:9249:0420/030006.973727:WARNING:client_tag_based_model_type_processor.cc(534)] Received remote delete for a non-existing item. client_tag_hash: Ab6T9wRSGvCB6fvvDohEvrWML5A=
BrowserTestBase received signal: Terminated. Backtrace:
#0 0x7fc95236c23d base::debug::StackTrace::StackTrace()
#1 0x7fc95236a7fc base::debug::StackTrace::StackTrace()
#2 0x000008ee1a14 content::(anonymous namespace)::DumpStackTraceSignalHandler()
#3 0x7fc933661d40 <unknown>
#4 0x7fc93371812d __poll
#5 0x7fc936f2dfe4 <unknown>
#6 0x7fc936f2e0ec g_main_context_iteration
#7 0x7fc952421f5f base::MessagePumpGlib::Run()
#8 0x7fc95241cf5b base::MessageLoop::Run()
#9 0x7fc9524d719d base::RunLoop::Run()
#10 0x000003793552 StatusChangeChecker::StartBlockingWait()
#11 0x00000379336c StatusChangeChecker::Wait()
#12 0x000001856fa4 TwoClientTypedUrlsSyncTest_AddThenDelete_E2ETest_Test::RunTestOnMainThread()
#13 0x000008ee1433 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop()
#14 0x0000019da3cd _ZN4base8internal13FunctorTraitsIMN18OAuth2TokenService7FetcherEFvvEvE6InvokeIPS3_JEEEvS5_OT_DpOT0_
#15 0x0000019da314 _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKMN18OAuth2TokenService7FetcherEFvvEJPS5_EEEvOT_DpOT0_
#16 0x0000019da2c5 _ZN4base8internal7InvokerINS0_9BindStateIMN18OAuth2TokenService7FetcherEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE7RunImplIRKS6_RKNSt3__15tupleIJS8_EEEJLm0EEEEvOT_OT0_NSF_16integer_sequenceImJXspT1_EEEE
#17 0x0000019da20c _ZN4base8internal7InvokerINS0_9BindStateIMN18OAuth2TokenService7FetcherEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE
#18 0x000001a1026d _ZNKR4base17RepeatingCallbackIFvvEE3RunEv
#19 0x000003d146bf ChromeBrowserMainParts::PreMainMessageLoopRunImpl()
#20 0x000003d1254e ChromeBrowserMainParts::PreMainMessageLoopRun()
#21 0x7fc94bbba14c content::BrowserMainLoop::PreMainMessageLoopRun()
#22 0x7fc94a99e3bd _ZN4base8internal13FunctorTraitsIMN7content12ChildProcessEFvvEvE6InvokeIPS3_JEEEvS5_OT_DpOT0_
#23 0x7fc94a99e334 _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIMN7content12ChildProcessEFvvEJPS5_EEEvOT_DpOT0_
#24 0x7fc94a99e2e5 _ZN4base8internal7InvokerINS0_9BindStateIMN7content12ChildProcessEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE7RunImplIS6_NSt3__15tupleIJS8_EEEJLm0EEEEvOT_OT0_NSD_16integer_sequenceImJXspT1_EEEE
#25 0x7fc94a9a11bc _ZN4base8internal7InvokerINS0_9BindStateIMN7content14ServiceFactoryEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE
#26 0x7fc94a8a95dd _ZNKR4base17RepeatingCallbackIFvvEE3RunEv
#27 0x7fc94c8899fd content::StartupTaskRunner::RunAllTasksNow()
#28 0x7fc94bbb62ef content::BrowserMainLoop::CreateStartupTasks()
#29 0x7fc94bbc2043 content::BrowserMainRunnerImpl::Initialize()
#30 0x7fc94bbae41b content::BrowserMain()
#31 0x7fc94d8c16a7 content::RunNamedProcessTypeMain()
#32 0x7fc94d8c5074 content::ContentMainRunnerImpl::Run()
#33 0x7fc94d8b83b5 content::ContentServiceManagerMainDelegate::RunEmbedderProcess()
#34 0x7fc9463dee9c service_manager::Main()
#35 0x7fc94d8be5c5 content::ContentMain()
#36 0x000008ee094c content::BrowserTestBase::SetUp()
#37 0x0000037c8131 InProcessBrowserTest::SetUp()
#38 0x0000037a8502 SyncTest::SetUp()
#39 0x0000018f634e testing::internal::HandleSehExceptionsInMethodIfSupported<>()
#40 0x0000018e7252 testing::internal::HandleExceptionsInMethodIfSupported<>()
#41 0x0000018ca5b4 testing::Test::Run()
#42 0x0000018cb030 testing::TestInfo::Run()
#43 0x0000018cbadf testing::TestCase::Run()
#44 0x0000018dde98 testing::internal::UnitTestImpl::RunAllTests()
#45 0x0000018f63de testing::internal::HandleSehExceptionsInMethodIfSupported<>()
#46 0x0000018e8ab2 testing::internal::HandleExceptionsInMethodIfSupported<>()
#47 0x0000018ddaf7 testing::UnitTest::Run()
#48 0x0000039a1b71 RUN_ALL_TESTS()
#49 0x00000399e53b base::TestSuite::Run()
#50 0x0000037c07a8 ChromeTestSuiteRunner::RunTestSuite()
#51 0x0000037c08b3 ChromeTestLauncherDelegate::RunTestSuite()
#52 0x000008f074c9 content::LaunchTests()
#53 0x0000037c1ba2 LaunchChromeTests()
#54 0x00000374473b main
#55 0x7fc93364cec5 __libc_start_main
#56 0x0000016fa02a _start
[73/76] TwoClientTypedUrlsSyncTest.AddThenDelete_E2ETest (TIMED OUT)
=======================================================

Unfortunately, I am not able to reproduce locally on Linux.

This may be related to Dice (i.e. similar to issue 828729  and issue 833324).


 
Summary: KitchenSync timeouts at StatusChangeChecker::StartBlockingWait() (was: KitchenSync fails when signing in.)
Now I understood that this is probably not sign-in related.

Oh, probably I run the wrong test as well, because I've tried again and was able to reproduce locally with TwoClientTypedUrlsSyncTest.AddThenDelete_E2ETest test. It does time out with precisely the same log. 
Increasing the time out does not help. Second sync does not work. Unfortunately, I don't have time to look more into this now. Note that other tests fail even without deletion, thus, this is broader then just deletion sync issue.

Comment 4 by treib@chromium.org, Apr 25 2018

Cc: treib@chromium.org vitaliii@chromium.org
I've investigated a bit; the problem is that SyncTest::ClearServerData never finishes. SyncManagerImpl::ClearServerData does schedule the clear, but apparently it never gets executed. I'll continue looking.

Comment 5 by treib@chromium.org, Apr 25 2018

A bit more detail on what happens:
SyncManagerImpl::ClearServerData puts the SyncSchedulerImpl into CLEAR_SERVER_DATA_MODE, and schedules a ClearServerData task which gets posted to the message loop. However, before that task runs, the scheduler is put back into configuration mode by SyncBackendHostCore::DoConfigureSyncer, posted from SyncBackendHostImpl::ConfigureDataTypes, which in turn is called from ModelTypeController::OnProcessorStarted.

It looks to me like this is a race condition that's probably always been there. I guess some recent unrelated change happened to trigger it.

Comment 6 by treib@chromium.org, Apr 25 2018

Cc: -treib@chromium.org
Owner: treib@chromium.org
Status: Started (was: Available)
I have a potential fix; CL coming soon.
Great, thank you a lot for looking into this!
Project Member

Comment 8 by bugdroid1@chromium.org, Apr 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/082307a4269557866884e470ff7e0847482fe713

commit 082307a4269557866884e470ff7e0847482fe713
Author: Marc Treib <treib@chromium.org>
Date: Wed Apr 25 16:08:16 2018

ProfileSyncServiceHarness: Always call AwaitSyncSetupCompletion in SetupSync

Before this CL, SetupSync would early-out if skip_passphrase_verification
was set to true. I don't see a good reason for this, so now it always gets
called.
For reference, this logic was added in
https://codereview.chromium.org/2716413003/, but I couldn't find an
explanation for it there.

This is an attempt to fix Sync's E2E tests. Locally, it fixes some of them,
though others still fail.
The problem was that a ClearServerData task got scheduled in
SyncSchedulerImpl, but then before that task actually got a chance to run,
the scheduler was put back into CONFIGURATION mode via
SyncBackendHostImpl::ConfigureDataTypes (race condition). This change makes
sure that ConfigureDataTypes (and its async tasks) gets completed before
we try to clear server data.
More details on the failure can be found on the bug.

Bug: 835250
Change-Id: I392376d23930540ce9442911ec16e00a42abf043
Reviewed-on: https://chromium-review.googlesource.com/1028050
Reviewed-by: vitaliii <vitaliii@chromium.org>
Commit-Queue: Marc Treib <treib@chromium.org>
Cr-Commit-Position: refs/heads/master@{#553588}
[modify] https://crrev.com/082307a4269557866884e470ff7e0847482fe713/chrome/browser/sync/test/integration/profile_sync_service_harness.cc
[modify] https://crrev.com/082307a4269557866884e470ff7e0847482fe713/chrome/browser/sync/test/integration/profile_sync_service_harness.h
[modify] https://crrev.com/082307a4269557866884e470ff7e0847482fe713/chrome/browser/sync/test/integration/sync_errors_test.cc

Comment 9 by treib@chromium.org, Apr 25 2018

NextAction: 2018-04-26
The NextAction date has arrived: 2018-04-26

Comment 11 by treib@chromium.org, Apr 26 2018

NextAction: ----
I think it got a bit better, but a bunch of tests are still failing. Seems like it's lots of apps/extensions tests, but also a bunch of others. Example log:
https://logs.chromium.org/v/?s=chrome%2Fbb%2Finternal.client.kitchensync%2FLinux_Canary%2F40483%2F%2B%2Frecipes%2Fsteps%2Fsync_integration_tests%2F0%2Fstdout

I guess it's time for a bisect, but I likely won't get to it today. Vitalii, if you have time, please feel free to pick this up.

Comment 12 by treib@chromium.org, Apr 27 2018

Okay, this is weird. Here's the first failing build on Linux Canary:
https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Linux%20Canary/builds/40475
It got revision #551923. However, the corresponding builds on Linux Dev and Windows Dev got the exact same revision, but ran through fine:
https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Linux%20Dev/builds/33154
https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Windows%20Dev/builds/99060

So either the Linux Canary failure was an unrelated flake, or maybe the revision numbers don't mean what I think (or what they usually do)? What's the difference between Linux Dev and Linux Canary anyway?!

Comment 13 by treib@chromium.org, Apr 30 2018

Issue 820122 has been merged into this issue.
Cc: treib@chromium.org
Labels: -OS-Android -Pri-1 OS-Chrome Pri-3
Owner: ----
Status: Available (was: Started)

Sign in to add a comment