Issue metadata
Sign in to add a comment
|
TwoClientPrintersSyncTest.ConflictResolution is Flaky |
||||||||||||||||||||||
Issue descriptionFindit has detected a flake at test TwoClientPrintersSyncTest.ConflictResolution. Culprit (93.9% confidence): https://chromium-review.googlesource.com/q/I61aa695adf297beb14ec4e96fbb08a30516afd7b Regression range: https://crrev.com/3aafe3004a00abc46eab9cdffe7d0f9c448692fc..b99dae9f756ee45daedaa0c04be14d033383a5d4?pretty=fuller Analysis: https://findit-for-me.appspot.com/waterfall/flake?key=ag9zfmZpbmRpdC1mb3ItbWVytQELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCJ_Y2hyb21pdW0uY2hyb21pdW1vcy9saW51eC1jaHJvbWVvcy1kYmcvNDE0OC9zeW5jX2ludGVncmF0aW9uX3Rlc3RzL1ZIZHZRMnhwWlc1MFVISnBiblJsY25OVGVXNWpWR1Z6ZEM1RGIyNW1iR2xqZEZKbGMyOXNkWFJwYjI0PQwLEhNNYXN0ZXJGbGFrZUFuYWx5c2lzGAEM If this result was incorrect, apply the label Findit-Incorrect-Result, mark the bug as Untriaged and the component Tools>Test>Findit>Flakiness.
,
Feb 9 2018
Culprit CL https://chromium-review.googlesource.com/c/chromium/src/+/890000
,
Feb 9 2018
I have not figured if any dependencies landed, so not reverting the CL, but disabling the test here: https://chromium-review.googlesource.com/c/chromium/src/+/911473
,
Feb 9 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1b4b187c7354112db591b845abd06a0a35d664dc commit 1b4b187c7354112db591b845abd06a0a35d664dc Author: Olga Sharonova <olka@chromium.org> Date: Fri Feb 09 15:11:55 2018 Disabling TwoClientPrintersSyncTest.ConflictResolution: flaky on ChromeOS TBR=zea@chromium.org Bug: 810408 Change-Id: I7f8fed6314d1a4653a95303602290c158e22f742 Reviewed-on: https://chromium-review.googlesource.com/911473 Reviewed-by: Olga Sharonova <olka@chromium.org> Commit-Queue: Olga Sharonova <olka@chromium.org> Cr-Commit-Position: refs/heads/master@{#535715} [modify] https://crrev.com/1b4b187c7354112db591b845abd06a0a35d664dc/chrome/browser/sync/test/integration/two_client_printers_sync_test.cc
,
Feb 9 2018
,
Feb 12 2018
Started: https://chromium-review.googlesource.com/c/chromium/src/+/914070
,
Feb 12 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1f494356e38b755bc14e0eb043e28e7bb1cd9d3b commit 1f494356e38b755bc14e0eb043e28e7bb1cd9d3b Author: Francois Doray <fdoray@chromium.org> Date: Mon Feb 12 22:27:38 2018 Fix TwoClientPrintersSyncTest.ConflictResolution. Since https://chromium-review.googlesource.com/c/chromium/src/+/890000, there is no implicit calls to content::RunAllTasksUntilIdle() to wait for all pending tasks to run when GetPrinterStore() is called. With this CL, an explicit call to content::RunAllTasksUntilIdle() is added to TwoClientPrintersSyncTest.ConflictResolution to make sure that both descriptions are taken into consideration and that conflict resolution runs before expectations are verified. Bug: 810408 Change-Id: I73c86cf14ef992261d0c7ee0740270a630dbf596 Reviewed-on: https://chromium-review.googlesource.com/914070 Reviewed-by: Nicolas Zea <zea@chromium.org> Commit-Queue: François Doray <fdoray@chromium.org> Cr-Commit-Position: refs/heads/master@{#536215} [modify] https://crrev.com/1f494356e38b755bc14e0eb043e28e7bb1cd9d3b/chrome/browser/sync/test/integration/two_client_printers_sync_test.cc
,
Feb 15 2018
Seems still not fixed? https://luci-milo.appspot.com/buildbot/chromium.chromiumos/linux-chromeos-dbg/4281 which grabbed revision "refs/heads/master@{#537024}" which is after your cl landed
,
Mar 15 2018
,
Mar 29 2018
The test is still quite flaky on ChromeOS dbg: https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=sync_integration_tests&tests=TwoClientPrintersSyncTest.ConflictResolution Any updates on this?
,
Apr 9 2018
fdoray@ Friendly ping! Any updates on this?
,
Apr 9 2018
I'm not able to reproduce locally. The conflict should be resolved by: https://cs.chromium.org/chromium/src/chrome/browser/chromeos/printing/printers_sync_bridge.cc?l=294&rcl=d8280c4c838badf64d9d7781dce11005094cae41 The update timestamp is set by: https://cs.chromium.org/chromium/src/chrome/browser/chromeos/printing/printers_sync_bridge.cc?l=334&rcl=802652a497f0d7bcce2388e6820984e1133b8fdd Since I'm not a sync expert, I'm tempted to add log statements to be able to better debug failures on bots. Any suggestions?
,
Apr 9 2018
Tentative fix uploaded at https://chromium-review.googlesource.com/c/chromium/src/+/1003224
,
Apr 10 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/797cc86ad91fdb00004d692fb491f4c778f41360 commit 797cc86ad91fdb00004d692fb491f4c778f41360 Author: Francois Doray <fdoray@chromium.org> Date: Tue Apr 10 14:30:03 2018 Wait for description propagation in TwoClientPrintersSyncTest.ConflictResolution. Previously, the test ran all pending tasks, waited for all stores to be equal, and then verified that the printer description was equal in all stores. It is better practice not to wait for all pending tasks to run, and instead wait for the specific condition that we want to verify. Bug: 810408 Change-Id: Ifa5fe80368285786664498991c82c1ccdecb5072 Reviewed-on: https://chromium-review.googlesource.com/1003224 Commit-Queue: François Doray <fdoray@chromium.org> Reviewed-by: Nicolas Zea (slow) <zea@chromium.org> Cr-Commit-Position: refs/heads/master@{#549522} [modify] https://crrev.com/797cc86ad91fdb00004d692fb491f4c778f41360/chrome/browser/sync/test/integration/two_client_printers_sync_test.cc
,
Apr 27 2018
Looks like this is still flaky, both in debug and release: https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=sync_integration_tests&tests=TwoClientPrintersSyncTest.ConflictResolution
,
May 14 2018
fdoray@: is there a status update? Thanks.
,
Jun 7 2018
Yet another ping - this is still flaky. Any updates?
,
Jun 7 2018
Created a CL that reverts most of the change that created the original flakyness https://chromium-review.googlesource.com/c/chromium/src/+/1090753 It is problematic to have a call to content::RunAllTasksUntilIdle() inside GetPrinterStore(), because it is called from a task, and we want to prevent calls to content::RunAllTasksUntilIdle() from within tasks (a tasks can't wait for all tasks to run, because it would be waiting for its own completion...). However, I don't have the bandwidth to debug this more at this time.
,
Jun 7 2018
I think a cleaner solution to this problem is to avoid RunAll-like functions altogether and expose appropriate methods in PrintersSyncBridge. For example, PrinterSyncBridge::IsLoaded(), in combination with slight modification of when the observers are invoked in StoreProxy::OnReadAllData() during startup (ideally IMO sync metadata should be read prior to notifying observers). Having said that, I'm fine with any temporary workaround to avoid flakiness.
,
Jun 7 2018
Yes, I totally agree that RunAll-like functions should be avoided. They make tests very hard to debug (we don't know what the code is trying to wait for).
,
Jun 20 2018
fdoray@: friendly ping, thanks.
,
Jul 2
fdoray@: friendly ping, thanks.
,
Jul 16
Ping again!
,
Jul 30
Triage ping, this being a P1 it should have regular status update.
,
Aug 6
Triage ping, is this really a P1?
,
Aug 7
A potential fix would be to add a call to content::RunAllTasksUntilIdle() in the test body to mimic the behavior prior to the CL that broke things (content::RunAllTasksUntilIdle() was called from within GetPrinterStore()). I wrote a CL that does that: https://chromium-review.googlesource.com/c/chromium/src/+/1165800 However, I think it's wrong to do that. If we wait for tasks to run between the 2 description changes, we are not testing a real conflict. Note: A revert of the CL that broke things is not possible, because calling content::RunAllTasksUntilIdle() from within a task is no longer supported (and GetPrinterStore() is called from within tasks). I'll let the sync team decide whether this is a P1.
,
Aug 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/dbdc5ea0f31e37af9ef880d26e5e0a5e362fd914 commit dbdc5ea0f31e37af9ef880d26e5e0a5e362fd914 Author: Francois Doray <fdoray@chromium.org> Date: Fri Aug 10 14:31:02 2018 Fix flakiness in TwoClientPrintersSyncTest.ConflictResolution with RunAllTasksUntilIdle(). This CL adds a call to RunAllTasksUntilIdle() between the 2 description changes in TwoClientPrintersSyncTest.ConflictResolution. This mimics what was done prior to https://chromium-review.googlesource.com/c/chromium/src/+/890000 and should eliminate flakiness. Bug: 810408 Change-Id: I52b7bc3a0ac76b4b2ddb0881b60ff137ba93e2c8 Reviewed-on: https://chromium-review.googlesource.com/1165800 Reviewed-by: Mikel Astiz <mastiz@chromium.org> Commit-Queue: François Doray <fdoray@chromium.org> Cr-Commit-Position: refs/heads/master@{#582152} [modify] https://crrev.com/dbdc5ea0f31e37af9ef880d26e5e0a5e362fd914/chrome/browser/sync/test/integration/two_client_printers_sync_test.cc
,
Aug 24
Sync triage ping. Is there any updates?
,
Aug 24
The CL https://chromium-review.googlesource.com/1165800 definitely reduced flakiness, but we still see a few timeouts. https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=sync_integration_tests&showAllRuns=true&tests=TwoClientPrintersSyncTest.ConflictResolution Is it fine if I re-assign to mastiz@ who suggested "to avoid RunAll-like functions altogether"?
,
Sep 9
sync-triage ping
,
Sep 17
sync-triage ping: mastiz@ any progress here?
,
Sep 26
sync-triage ping: mastiz@ any progress here?
,
Sep 27
I think this case is another one where ConflictResolution tests expect something that is not real: the outcome of the scenario where two clients edit the same entity simultaneously is just non-deterministic. I'll change the test to reflect that.
,
Sep 27
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/268c3258284b0e4ef2be508284cca7c6ef8ced97 commit 268c3258284b0e4ef2be508284cca7c6ef8ced97 Author: Mikel Astiz <mastiz@chromium.org> Date: Thu Sep 27 21:22:34 2018 Fix flakiness in TwoClientPrintersSyncTest.ConflictResolution The hypothesis here is that the test outcome is, strictly speaking, non-deterministic, because a server with an eventual consistency model cannot guarantee how exactly a conflict will be resolved (although it will guarantee that clients converge). In this model, the last commit will win. We fix the previously existing test by making sure which of the two clients commits first to the server. This is achieved by mimic-ing an offline state (or poor connectivity compared to the other client). In order to verify the hypothesis, we add a second test that enables strong consistency model on the server (implemented as part of this patch), and hence relies on that mechanism to expect a deterministic outcome without having to put clients offline. Bug: 810408 Change-Id: I2fa3fd8c941674344030a76e76f7855131c1908e Reviewed-on: https://chromium-review.googlesource.com/1249482 Commit-Queue: Mikel Astiz <mastiz@chromium.org> Reviewed-by: Tim Schumann <tschumann@chromium.org> Cr-Commit-Position: refs/heads/master@{#594865} [modify] https://crrev.com/268c3258284b0e4ef2be508284cca7c6ef8ced97/chrome/browser/sync/test/integration/two_client_printers_sync_test.cc [modify] https://crrev.com/268c3258284b0e4ef2be508284cca7c6ef8ced97/components/sync/engine_impl/loopback_server/loopback_server.cc [modify] https://crrev.com/268c3258284b0e4ef2be508284cca7c6ef8ced97/components/sync/engine_impl/loopback_server/loopback_server.h [modify] https://crrev.com/268c3258284b0e4ef2be508284cca7c6ef8ced97/components/sync/test/fake_server/fake_server.cc [modify] https://crrev.com/268c3258284b0e4ef2be508284cca7c6ef8ced97/components/sync/test/fake_server/fake_server.h
,
Oct 1
Flakiness dashboard confirms the test is no longer flaky, which includes the newly introduced tests. This confirms the hypothesis that the original test was not realistic and exercised a scenario that has a non-deterministic outcome, at least without a server with a strong consistency model.
,
Oct 1
Flakiness dashboard confirms the test is no longer flaky, which includes the newly introduced tests. This confirms the hypothesis that the original test was not realistic and exercised a scenario that has a non-deterministic outcome, at least without a server with a strong consistency model. |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by mcnee@chromium.org
, Feb 8 2018Labels: -Pri-3 OS-Chrome Pri-1 Type-Bug-Regression
Owner: fdoray@chromium.org