Migrate drive::JobScheduler to NetworkConnectionTracker |
|||||||
Issue descriptiondrive::JobScheduler currently uses net::NetworkChangeNotifier to receive network changes. With network service, that will need to be converted to using NetworkConnectionTracker's observer APIs.
,
Oct 5
,
Nov 17
,
Dec 6
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e2d41b1393eb57790a1634640b1aa149a7c5fc24 commit e2d41b1393eb57790a1634640b1aa149a7c5fc24 Author: Jun Cai <juncai@chromium.org> Date: Thu Dec 06 17:15:52 2018 Network Service: Migrate drive::JobScheduler to NetworkConnectionTracker This CL migrates drive::JobScheduler from NetworkChangeNotifier to NetworkConnectionTracker, which works with the network service enabled. The "Initial upload" patch of this CL is the same as: https://chromium-review.googlesource.com/c/chromium/src/+/1236558 Bug: 887038 Change-Id: Ib19f501f6a85a0c32dc81d47ea4b161b5a726efe Reviewed-on: https://chromium-review.googlesource.com/c/1287040 Reviewed-by: Ryo Hashimoto <hashimoto@chromium.org> Reviewed-by: Naoki Fukino <fukino@chromium.org> Commit-Queue: Jun Cai <juncai@chromium.org> Cr-Commit-Position: refs/heads/master@{#614393} [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/chrome/browser/chromeos/drive/drive_integration_service.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/chrome/browser/chromeos/file_manager/file_manager_browsertest_base.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/about_resource_loader_unittest.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/change_list_loader_unittest.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/directory_loader_unittest.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/file_system/operation_test_base.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/file_system_unittest.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/job_scheduler.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/job_scheduler.h [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/job_scheduler_unittest.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/start_page_token_loader_unittest.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/sync_client_unittest.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/team_drive_change_list_loader_unittest.cc [modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/team_drive_list_loader_unittest.cc
,
Dec 6
,
Dec 6
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/bee8665911f1e87abbd90b9aa4e0ccb141ee2575 commit bee8665911f1e87abbd90b9aa4e0ccb141ee2575 Author: Jun Cai <juncai@chromium.org> Date: Thu Dec 06 19:06:24 2018 Revert "Network Service: Migrate drive::JobScheduler to NetworkConnectionTracker" This reverts commit e2d41b1393eb57790a1634640b1aa149a7c5fc24. Reason for revert: linux-chromeos-rel bot failed at: https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/linux-chromeos-rel/16932 Original change's description: > Network Service: Migrate drive::JobScheduler to NetworkConnectionTracker > > This CL migrates drive::JobScheduler from NetworkChangeNotifier to > NetworkConnectionTracker, which works with the network service enabled. > > The "Initial upload" patch of this CL is the same as: > https://chromium-review.googlesource.com/c/chromium/src/+/1236558 > > Bug: 887038 > Change-Id: Ib19f501f6a85a0c32dc81d47ea4b161b5a726efe > Reviewed-on: https://chromium-review.googlesource.com/c/1287040 > Reviewed-by: Ryo Hashimoto <hashimoto@chromium.org> > Reviewed-by: Naoki Fukino <fukino@chromium.org> > Commit-Queue: Jun Cai <juncai@chromium.org> > Cr-Commit-Position: refs/heads/master@{#614393} TBR=hashimoto@chromium.org,fukino@chromium.org,juncai@chromium.org Change-Id: I72e70bae035cab91ec3d325d7aec66d9f90b5499 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 887038 Reviewed-on: https://chromium-review.googlesource.com/c/1366260 Reviewed-by: Jun Cai <juncai@chromium.org> Commit-Queue: Jun Cai <juncai@chromium.org> Cr-Commit-Position: refs/heads/master@{#614436} [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/chrome/browser/chromeos/drive/drive_integration_service.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/chrome/browser/chromeos/file_manager/file_manager_browsertest_base.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/about_resource_loader_unittest.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/change_list_loader_unittest.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/directory_loader_unittest.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/file_system/operation_test_base.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/file_system_unittest.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/job_scheduler.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/job_scheduler.h [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/job_scheduler_unittest.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/start_page_token_loader_unittest.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/sync_client_unittest.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/team_drive_change_list_loader_unittest.cc [modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/team_drive_list_loader_unittest.cc
,
Dec 6
,
Dec 12
I tried landing the above CL, but it was reverted due to some tests failure at linux-chromeos-rel build bot, build 16932, 16933, and 16934 failed: https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/linux-chromeos-rel/16932 https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/linux-chromeos-rel/16933 https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/linux-chromeos-rel/16934 But the linux-chromeos-rel trybot seems ok: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-chromeos-rel/155855 The failed browser tests are FilesAppBrowserTest related tests. I ran those failed browser tests locally and they all succeeded. I tried reproducing the test failure locally. I tried the individual swarming tasks on the "Swarming Task Page" from the builder, downloaded the archive, and ran the task locally, the test took a very long time to run and each run produced different sets of test failures, so it provided little information why the test failed and how to debug them. I also tried running the failed individual browser test locally using the same set of command flags by the builder, but the test all succeeded. Not sure how to consistently reproduce those test failure locally to help debugging the issue.
,
Dec 13
(Chrome Ops trooper here) It sounds like flaky failures, possibly due to test interactions. I'm not familiar with browser tests, so unfortunately can't suggest anything specific. Generally speaking, to debug flaky tests, one needs to figure out what the source of flakiness is: - Is it nondeterministic test itself? (try running the same test many times in isolation) - Is it one or more passing tests which somehow change a persistent state and make the subsequent tests fail? (this may require controlling the order of test runs and observing the tests prior to the failing ones - not sure if it's easily doable) - Are the tests sensitive to machine load? To available network ports? (running all tests sequentially vs. a bunch in parallel may give a hint) Another source to rule out is compilation / SDK version. If the downloaded binary reproduces the flakiness, would a locally compiled binary reproduce the same level of flakiness? (if it doesn't, maybe compile flags / system-wide packages play a role?) I'm obviously shooting in the dark here. Others with better knowledge of browser tests, please chime in.
,
Dec 13
Thanks for the suggestions! I did the following testing:
1. I ran the failed test locally many times in isolation, and all tests succeeded.
2. I ran some failed test locally in parallel using "--test-launcher-jobs" parameter, and the test succeeded.
3. The downloaded binary reproduces the flakiness, but different run produced different sets of test flakiness. I also tried running FilesAppBrowserTest test using binary built from top of the tree that is without my change, the test still produces flakiness:
10 tests failed:
TabIndex/FilesAppBrowserTest.Test/tabindexFocus (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
TabIndex/FilesAppBrowserTest.Test/tabindexFocusDirectorySelected (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
TabIndex/FilesAppBrowserTest.Test/tabindexFocusDirectorySelected_MyFilesVolume (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
TabIndex/FilesAppBrowserTest.Test/tabindexFocus_MyFilesVolume (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
TabIndex/FilesAppBrowserTest.Test/tabindexOpenDialogDrive_DriveFs (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
TabIndex/FilesAppBrowserTest.Test/tabindexOpenDialogDrive_DriveFs_MyFilesVolume (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
TabIndex/FilesAppBrowserTest.Test/tabindexSaveFileDialogDrive_DriveFs (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
TabIndex/FilesAppBrowserTest.Test/tabindexSaveFileDialogDrive_DriveFs_MyFilesVolume (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
TabIndex/FilesAppBrowserTest.Test/tabindexSearchBoxFocus (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
TabIndex/FilesAppBrowserTest.Test/tabindexSearchBoxFocus_MyFilesVolume (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
,
Dec 14
Thanks! So, IIUC, (1) suggests that tests themselves are OK. (2) suggests that the potential test interaction on a lightly loaded machine still is not enough to trigger flakiness. But (3) reproduces it (IIUC both with a bot-built binary and locally built binary?) - so it seems the flake is triggered by a sufficiently high test load. To confirm this, could you tell how many tests were run in (2) - both the total #tests and the value for --test-launcher-jobs? And the same numbers for (3) - for comparison. If the numbers between (2) and (3) are very different, maybe the next experiment is to try to reduce the parallelism in (3) (e.g. an extreme is to run all tests sequentially), and see if it eliminates the flake. If the root cause turns out to be high load, it's probably the hardest to debug... But at least we can mitigate it by reducing the parallelism on an offending bot temporarily (at the expense of runtime), and then find the right people to debug the flakiness.
,
Dec 14
Thanks! For (3), both the downloaded binary and locally built binary had some browser tests failure. For (2), I ran one test, but set the "--gtest_repeat" to be 20 and "--test-launcher-jobs" to be 10. For (3), I ran all FilesAppBrowserTest related browser tests using "--gtest_filter=*FilesAppBrowserTest*", and I believe it ran the tests sequentially.
,
Dec 17
Thanks! From this, I'm guessing that (2) didn't have a chance to show any "interesting" interactions (since it's multiple instances of the same test). But if a sequential run of many tests reproduces flakes, it means some tests leave turds behind that break other tests. Unfortunately, that's a pretty hard one to debug - the faulty tests may actually pass, but cause other "good" tests to fail later. Is it feasible to list the order of tests as they run, and try to eliminate tests that immediately precede the failing ones? It's basically a heuristic to find suspected tests and checking if removing them fixes the flakes. However, if a state can persist for multiple runs, this may turn into a full enumeration problem...
,
Dec 17
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e53b4732fadd692a8335ce3f7a505f2557f57537 commit e53b4732fadd692a8335ce3f7a505f2557f57537 Author: Robbie McElrath <rmcelrath@chromium.org> Date: Mon Dec 17 23:52:12 2018 Add content::NetworkConnectionChangeSimulator helper. This moves the NetworkConnectionChangeSimulator from chrome/browser/chrome_browser_main_browsertest.cc to its own file in content/public/test, since other browsertests need the same functionality. Bug: 887038 Change-Id: Ia43603cf06563524b45a3bc6c644cbab60d280fc Reviewed-on: https://chromium-review.googlesource.com/c/1378971 Reviewed-by: Scott Violet <sky@chromium.org> Commit-Queue: Robbie McElrath <rmcelrath@chromium.org> Cr-Commit-Position: refs/heads/master@{#617295} [modify] https://crrev.com/e53b4732fadd692a8335ce3f7a505f2557f57537/chrome/browser/chrome_browser_main_browsertest.cc [add] https://crrev.com/e53b4732fadd692a8335ce3f7a505f2557f57537/content/public/test/network_connection_change_simulator.cc [add] https://crrev.com/e53b4732fadd692a8335ce3f7a505f2557f57537/content/public/test/network_connection_change_simulator.h [modify] https://crrev.com/e53b4732fadd692a8335ce3f7a505f2557f57537/content/test/BUILD.gn
,
Dec 18
I was able to reproduce the failure on my P920. The issue is that there's some async test setup that wasn't always completing before the test ran. NetworkChangeNotifierChromeos defaults its connection type to NONE, and then a call downstream of BrowserTestBase::SetUp sets its connection type to ETHERNET. After a 1 second delay, NetworkChangeNotifier will send a connection changed notification, which will eventually make its way to NetworkConnectionTracker and then JobScheduler. The test itself will enqueue a job to JobScheduler that is considered USER_INITIATED, and will be cancelled if the JobScheduler thinks there's no connection. On slower machines (like Jun's and apparently the trybots), the network change notification arrives before the test starts running, so the jobs the test submits get run. On faster machines, the jobs get enqueued before the connection type change has been propagated, so they get cancelled and the test times out. To fix this we need to make sure the connection type is set correctly before we start the test. The CL from #14 should help with that.
,
Dec 18
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/065916a6adeb8e3c51ca9593824db8f53c207589 commit 065916a6adeb8e3c51ca9593824db8f53c207589 Author: Jun Cai <juncai@chromium.org> Date: Tue Dec 18 03:06:43 2018 Reland: Network Service: Migrate drive::JobScheduler to NetworkConnectionTracker The "Initial upload" patch of this CL is the same as: https://chromium-review.googlesource.com/c/chromium/src/+/1287040 This CL migrates drive::JobScheduler from NetworkChangeNotifier to NetworkConnectionTracker, which works with the network service enabled. The diff from "Initial upload": some code is added to make sure the connection type is set correctly before test starts running. TBR=hashimoto@chromium.org, fukino@chromium.org Bug: 887038 Change-Id: I7b6dd94c23de8238ee4c6253e63023ea0511ce77 Reviewed-on: https://chromium-review.googlesource.com/c/1368035 Reviewed-by: Jun Cai <juncai@chromium.org> Commit-Queue: Jun Cai <juncai@chromium.org> Cr-Commit-Position: refs/heads/master@{#617358} [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/chrome/browser/chromeos/drive/drive_integration_service.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/chrome/browser/chromeos/file_manager/file_manager_browsertest_base.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/about_resource_loader_unittest.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/change_list_loader_unittest.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/directory_loader_unittest.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/file_system/operation_test_base.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/file_system_unittest.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/job_scheduler.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/job_scheduler.h [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/job_scheduler_unittest.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/start_page_token_loader_unittest.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/sync_client_unittest.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/team_drive_change_list_loader_unittest.cc [modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/team_drive_list_loader_unittest.cc
,
Dec 18
|
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by rmcelrath@chromium.org
, Sep 20Status: Started (was: Available)