New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 810408 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 1
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug-Regression



Sign in to add a comment

TwoClientPrintersSyncTest.ConflictResolution is Flaky

Project Member Reported by Findit, Feb 8 2018

Issue description

Comment 1 by mcnee@chromium.org, Feb 8 2018

Components: Services>Sync
Labels: -Pri-3 OS-Chrome Pri-1 Type-Bug-Regression
Owner: fdoray@chromium.org
Findit's conclusion appears to be accurate.

This is flaking on linux-chromeos-dbg, but it's fine on linux-chromeos-rel.
https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=sync_integration_tests&tests=TwoClientPrintersSyncTest.ConflictResolution

Comment 2 by olka@chromium.org, Feb 9 2018

Cc: robliao@chromium.org zea@chromium.org
Status: Assigned (was: Available)
Culprit CL https://chromium-review.googlesource.com/c/chromium/src/+/890000

Comment 3 by olka@chromium.org, Feb 9 2018

I have not figured if any dependencies landed, so not reverting the CL, but disabling the test here:
https://chromium-review.googlesource.com/c/chromium/src/+/911473
Project Member

Comment 4 by bugdroid1@chromium.org, Feb 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1b4b187c7354112db591b845abd06a0a35d664dc

commit 1b4b187c7354112db591b845abd06a0a35d664dc
Author: Olga Sharonova <olka@chromium.org>
Date: Fri Feb 09 15:11:55 2018

Disabling TwoClientPrintersSyncTest.ConflictResolution: flaky on ChromeOS

TBR=zea@chromium.org

Bug:  810408 
Change-Id: I7f8fed6314d1a4653a95303602290c158e22f742
Reviewed-on: https://chromium-review.googlesource.com/911473
Reviewed-by: Olga Sharonova <olka@chromium.org>
Commit-Queue: Olga Sharonova <olka@chromium.org>
Cr-Commit-Position: refs/heads/master@{#535715}
[modify] https://crrev.com/1b4b187c7354112db591b845abd06a0a35d664dc/chrome/browser/sync/test/integration/two_client_printers_sync_test.cc

Comment 5 by olka@chromium.org, Feb 9 2018

Labels: -Sheriff-Chromium Test-Disabled

Comment 6 by fdoray@chromium.org, Feb 12 2018

Status: Started (was: Assigned)
Started: https://chromium-review.googlesource.com/c/chromium/src/+/914070
Project Member

Comment 7 by bugdroid1@chromium.org, Feb 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1f494356e38b755bc14e0eb043e28e7bb1cd9d3b

commit 1f494356e38b755bc14e0eb043e28e7bb1cd9d3b
Author: Francois Doray <fdoray@chromium.org>
Date: Mon Feb 12 22:27:38 2018

Fix TwoClientPrintersSyncTest.ConflictResolution.

Since https://chromium-review.googlesource.com/c/chromium/src/+/890000,
there is no implicit calls to content::RunAllTasksUntilIdle() to wait
for all pending tasks to run when GetPrinterStore() is called.

With this CL, an explicit call to content::RunAllTasksUntilIdle() is
added to TwoClientPrintersSyncTest.ConflictResolution to make sure
that both descriptions are taken into consideration and that conflict
resolution runs before expectations are verified.

Bug:  810408 
Change-Id: I73c86cf14ef992261d0c7ee0740270a630dbf596
Reviewed-on: https://chromium-review.googlesource.com/914070
Reviewed-by: Nicolas Zea <zea@chromium.org>
Commit-Queue: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#536215}
[modify] https://crrev.com/1f494356e38b755bc14e0eb043e28e7bb1cd9d3b/chrome/browser/sync/test/integration/two_client_printers_sync_test.cc

Seems still not fixed?

https://luci-milo.appspot.com/buildbot/chromium.chromiumos/linux-chromeos-dbg/4281

which grabbed revision
"refs/heads/master@{#537024}"

which is after your cl landed
Labels: Hotlist-DisableReview
fdoray@ Friendly ping! Any updates on this?
I'm not able to reproduce locally.

The conflict should be resolved by:
https://cs.chromium.org/chromium/src/chrome/browser/chromeos/printing/printers_sync_bridge.cc?l=294&rcl=d8280c4c838badf64d9d7781dce11005094cae41

The update timestamp is set by:
https://cs.chromium.org/chromium/src/chrome/browser/chromeos/printing/printers_sync_bridge.cc?l=334&rcl=802652a497f0d7bcce2388e6820984e1133b8fdd

Since I'm not a sync expert, I'm tempted to add log statements to be able to better debug failures on bots.

Any suggestions?
Project Member

Comment 14 by bugdroid1@chromium.org, Apr 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/797cc86ad91fdb00004d692fb491f4c778f41360

commit 797cc86ad91fdb00004d692fb491f4c778f41360
Author: Francois Doray <fdoray@chromium.org>
Date: Tue Apr 10 14:30:03 2018

Wait for  description propagation in TwoClientPrintersSyncTest.ConflictResolution.

Previously, the test ran all pending tasks, waited for all stores to be
equal, and then verified that the printer description was equal in all
stores.

It is better practice not to wait for all pending tasks to run, and
instead wait for the specific condition that we want to verify.

Bug:  810408 
Change-Id: Ifa5fe80368285786664498991c82c1ccdecb5072
Reviewed-on: https://chromium-review.googlesource.com/1003224
Commit-Queue: François Doray <fdoray@chromium.org>
Reviewed-by: Nicolas Zea (slow) <zea@chromium.org>
Cr-Commit-Position: refs/heads/master@{#549522}
[modify] https://crrev.com/797cc86ad91fdb00004d692fb491f4c778f41360/chrome/browser/sync/test/integration/two_client_printers_sync_test.cc

fdoray@: is there a status update? Thanks.
Yet another ping - this is still flaky. Any updates?
Created a CL that reverts most of the change that created the original flakyness https://chromium-review.googlesource.com/c/chromium/src/+/1090753

It is problematic to have a call to content::RunAllTasksUntilIdle() inside GetPrinterStore(), because it is called from a task, and we want to prevent calls to content::RunAllTasksUntilIdle() from within tasks (a tasks can't wait for all tasks to run, because it would be waiting for its own completion...).

However, I don't have the bandwidth to debug this more at this time.
I think a cleaner solution to this problem is to avoid RunAll-like functions altogether and expose appropriate methods in PrintersSyncBridge. For example, PrinterSyncBridge::IsLoaded(), in combination with slight modification of when the observers are invoked in StoreProxy::OnReadAllData() during startup (ideally IMO sync metadata should be read prior to notifying observers).

Having said that, I'm fine with any temporary workaround to avoid flakiness.
 
Yes, I totally agree that RunAll-like functions should be avoided. They make tests very hard to debug (we don't know what the code is trying to wait for).
fdoray@: friendly ping, thanks.
fdoray@: friendly ping, thanks.
Ping again!
Triage ping, this being a P1 it should have regular status update.
Triage ping, is this really a P1?

Comment 26 Deleted

A potential fix would be to add a call to content::RunAllTasksUntilIdle() in the test body to mimic the behavior prior to the CL that broke things (content::RunAllTasksUntilIdle() was called from within GetPrinterStore()). I wrote a CL that does that: https://chromium-review.googlesource.com/c/chromium/src/+/1165800 However, I think it's wrong to do that. If we wait for tasks to run between the 2 description changes, we are not testing a real conflict.

Note: A revert of the CL that broke things is not possible, because calling content::RunAllTasksUntilIdle() from within a task is no longer supported (and GetPrinterStore() is called from within tasks).

I'll let the sync team decide whether this is a P1.
Project Member

Comment 28 by bugdroid1@chromium.org, Aug 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/dbdc5ea0f31e37af9ef880d26e5e0a5e362fd914

commit dbdc5ea0f31e37af9ef880d26e5e0a5e362fd914
Author: Francois Doray <fdoray@chromium.org>
Date: Fri Aug 10 14:31:02 2018

Fix flakiness in TwoClientPrintersSyncTest.ConflictResolution with RunAllTasksUntilIdle().

This CL adds a call to RunAllTasksUntilIdle() between the 2 description changes
in TwoClientPrintersSyncTest.ConflictResolution. This mimics what was
done prior to https://chromium-review.googlesource.com/c/chromium/src/+/890000
and should eliminate flakiness.

Bug:  810408 
Change-Id: I52b7bc3a0ac76b4b2ddb0881b60ff137ba93e2c8
Reviewed-on: https://chromium-review.googlesource.com/1165800
Reviewed-by: Mikel Astiz <mastiz@chromium.org>
Commit-Queue: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#582152}
[modify] https://crrev.com/dbdc5ea0f31e37af9ef880d26e5e0a5e362fd914/chrome/browser/sync/test/integration/two_client_printers_sync_test.cc

Sync triage ping. Is there any updates?
Cc: fdoray@chromium.org
Owner: mastiz@chromium.org
The CL https://chromium-review.googlesource.com/1165800 definitely reduced flakiness, but we still see a few timeouts.
https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=sync_integration_tests&showAllRuns=true&tests=TwoClientPrintersSyncTest.ConflictResolution

Is it fine if I re-assign to mastiz@ who suggested "to avoid RunAll-like functions altogether"?
sync-triage ping
sync-triage ping: mastiz@ any progress here?
sync-triage ping: mastiz@ any progress here?
I think this case is another one where ConflictResolution tests expect something that is not real: the outcome of the scenario where two clients edit the same entity simultaneously is just non-deterministic.

I'll change the test to reflect that.
Project Member

Comment 35 by bugdroid1@chromium.org, Sep 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/268c3258284b0e4ef2be508284cca7c6ef8ced97

commit 268c3258284b0e4ef2be508284cca7c6ef8ced97
Author: Mikel Astiz <mastiz@chromium.org>
Date: Thu Sep 27 21:22:34 2018

Fix flakiness in TwoClientPrintersSyncTest.ConflictResolution

The hypothesis here is that the test outcome is, strictly speaking,
non-deterministic, because a server with an eventual consistency model
cannot guarantee how exactly a conflict will be resolved (although it
will guarantee that clients converge). In this model, the last commit
will win.

We fix the previously existing test by making sure which of the two
clients commits first to the server. This is achieved by mimic-ing an
offline state (or poor connectivity compared to the other client).

In order to verify the hypothesis, we add a second test that enables
strong consistency model on the server (implemented as part of this
patch), and hence relies on that mechanism to expect a deterministic
outcome without having to put clients offline.

Bug:  810408 
Change-Id: I2fa3fd8c941674344030a76e76f7855131c1908e
Reviewed-on: https://chromium-review.googlesource.com/1249482
Commit-Queue: Mikel Astiz <mastiz@chromium.org>
Reviewed-by: Tim Schumann <tschumann@chromium.org>
Cr-Commit-Position: refs/heads/master@{#594865}
[modify] https://crrev.com/268c3258284b0e4ef2be508284cca7c6ef8ced97/chrome/browser/sync/test/integration/two_client_printers_sync_test.cc
[modify] https://crrev.com/268c3258284b0e4ef2be508284cca7c6ef8ced97/components/sync/engine_impl/loopback_server/loopback_server.cc
[modify] https://crrev.com/268c3258284b0e4ef2be508284cca7c6ef8ced97/components/sync/engine_impl/loopback_server/loopback_server.h
[modify] https://crrev.com/268c3258284b0e4ef2be508284cca7c6ef8ced97/components/sync/test/fake_server/fake_server.cc
[modify] https://crrev.com/268c3258284b0e4ef2be508284cca7c6ef8ced97/components/sync/test/fake_server/fake_server.h

Status: Fixed (was: Started)
Flakiness dashboard confirms the test is no longer flaky, which includes the newly introduced tests.

This confirms the hypothesis that the original test was not realistic and exercised a scenario that has a non-deterministic outcome, at least without a server with a strong consistency model.
Flakiness dashboard confirms the test is no longer flaky, which includes the newly introduced tests.

This confirms the hypothesis that the original test was not realistic and exercised a scenario that has a non-deterministic outcome, at least without a server with a strong consistency model.

Sign in to add a comment