New issue
Advanced search Search tips

Issue 887038 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 18
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 821009



Sign in to add a comment

Migrate drive::JobScheduler to NetworkConnectionTracker

Project Member Reported by rmcelrath@chromium.org, Sep 19

Issue description

drive::JobScheduler currently uses net::NetworkChangeNotifier to receive network changes. 

With network service, that will need to be converted to using NetworkConnectionTracker's observer APIs.
 
Owner: rmcelrath@chromium.org
Status: Started (was: Available)
Cc: rmcelrath@chromium.org
Owner: juncai@chromium.org
Status: Assigned (was: Started)
Status: Started (was: Assigned)
Project Member

Comment 4 by bugdroid1@chromium.org, Dec 6

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e2d41b1393eb57790a1634640b1aa149a7c5fc24

commit e2d41b1393eb57790a1634640b1aa149a7c5fc24
Author: Jun Cai <juncai@chromium.org>
Date: Thu Dec 06 17:15:52 2018

Network Service: Migrate drive::JobScheduler to NetworkConnectionTracker

This CL migrates drive::JobScheduler from NetworkChangeNotifier to
NetworkConnectionTracker, which works with the network service enabled.

The "Initial upload" patch of this CL is the same as:
https://chromium-review.googlesource.com/c/chromium/src/+/1236558

Bug:  887038 
Change-Id: Ib19f501f6a85a0c32dc81d47ea4b161b5a726efe
Reviewed-on: https://chromium-review.googlesource.com/c/1287040
Reviewed-by: Ryo Hashimoto <hashimoto@chromium.org>
Reviewed-by: Naoki Fukino <fukino@chromium.org>
Commit-Queue: Jun Cai <juncai@chromium.org>
Cr-Commit-Position: refs/heads/master@{#614393}
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/chrome/browser/chromeos/drive/drive_integration_service.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/chrome/browser/chromeos/file_manager/file_manager_browsertest_base.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/about_resource_loader_unittest.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/change_list_loader_unittest.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/directory_loader_unittest.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/file_system/operation_test_base.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/file_system_unittest.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/job_scheduler.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/job_scheduler.h
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/job_scheduler_unittest.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/start_page_token_loader_unittest.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/sync_client_unittest.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/team_drive_change_list_loader_unittest.cc
[modify] https://crrev.com/e2d41b1393eb57790a1634640b1aa149a7c5fc24/components/drive/team_drive_list_loader_unittest.cc

Status: Fixed (was: Started)
Project Member

Comment 6 by bugdroid1@chromium.org, Dec 6

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bee8665911f1e87abbd90b9aa4e0ccb141ee2575

commit bee8665911f1e87abbd90b9aa4e0ccb141ee2575
Author: Jun Cai <juncai@chromium.org>
Date: Thu Dec 06 19:06:24 2018

Revert "Network Service: Migrate drive::JobScheduler to NetworkConnectionTracker"

This reverts commit e2d41b1393eb57790a1634640b1aa149a7c5fc24.

Reason for revert:
linux-chromeos-rel bot failed at:
https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/linux-chromeos-rel/16932

Original change's description:
> Network Service: Migrate drive::JobScheduler to NetworkConnectionTracker
> 
> This CL migrates drive::JobScheduler from NetworkChangeNotifier to
> NetworkConnectionTracker, which works with the network service enabled.
> 
> The "Initial upload" patch of this CL is the same as:
> https://chromium-review.googlesource.com/c/chromium/src/+/1236558
> 
> Bug:  887038 
> Change-Id: Ib19f501f6a85a0c32dc81d47ea4b161b5a726efe
> Reviewed-on: https://chromium-review.googlesource.com/c/1287040
> Reviewed-by: Ryo Hashimoto <hashimoto@chromium.org>
> Reviewed-by: Naoki Fukino <fukino@chromium.org>
> Commit-Queue: Jun Cai <juncai@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#614393}

TBR=hashimoto@chromium.org,fukino@chromium.org,juncai@chromium.org

Change-Id: I72e70bae035cab91ec3d325d7aec66d9f90b5499
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  887038 
Reviewed-on: https://chromium-review.googlesource.com/c/1366260
Reviewed-by: Jun Cai <juncai@chromium.org>
Commit-Queue: Jun Cai <juncai@chromium.org>
Cr-Commit-Position: refs/heads/master@{#614436}
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/chrome/browser/chromeos/drive/drive_integration_service.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/chrome/browser/chromeos/file_manager/file_manager_browsertest_base.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/about_resource_loader_unittest.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/change_list_loader_unittest.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/directory_loader_unittest.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/file_system/operation_test_base.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/file_system_unittest.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/job_scheduler.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/job_scheduler.h
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/job_scheduler_unittest.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/start_page_token_loader_unittest.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/sync_client_unittest.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/team_drive_change_list_loader_unittest.cc
[modify] https://crrev.com/bee8665911f1e87abbd90b9aa4e0ccb141ee2575/components/drive/team_drive_list_loader_unittest.cc

Status: Started (was: Fixed)
Cc: sergeybe...@chromium.org jam@chromium.org fukino@chromium.org dxie@chromium.org
I tried landing the above CL, but it was reverted due to some tests failure at linux-chromeos-rel build bot, build 16932, 16933, and 16934 failed:
https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/linux-chromeos-rel/16932
https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/linux-chromeos-rel/16933
https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/linux-chromeos-rel/16934
But the linux-chromeos-rel trybot seems ok:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-chromeos-rel/155855

The failed browser tests are FilesAppBrowserTest related tests. I ran those failed browser tests locally and they all succeeded.

I tried reproducing the test failure locally. I tried the individual swarming tasks on the "Swarming Task Page" from the builder, downloaded the archive, and ran the task locally, the test took a very long time to run and each run produced different sets of test failures, so it provided little information why the test failed and how to debug them.

I also tried running the failed individual browser test locally using the same set of command flags by the builder, but the test all succeeded.

Not sure how to consistently reproduce those test failure locally to help debugging the issue.

(Chrome Ops trooper here)

It sounds like flaky failures, possibly due to test interactions. I'm not familiar with browser tests, so unfortunately can't suggest anything specific. 

Generally speaking, to debug flaky tests, one needs to figure out what the source of flakiness is:

- Is it nondeterministic test itself? (try running the same test many times in isolation)
- Is it one or more passing tests which somehow change a persistent state and make the subsequent tests fail? (this may require controlling the order of test runs and observing the tests prior to the failing ones - not sure if it's easily doable)
- Are the tests sensitive to machine load? To available network ports? (running all tests sequentially vs. a bunch in parallel may give a hint)

Another source to rule out is compilation / SDK version. If the downloaded binary reproduces the flakiness, would a locally compiled binary reproduce the same level of flakiness? (if it doesn't, maybe compile flags / system-wide packages play a role?)

I'm obviously shooting in the dark here. Others with better knowledge of browser tests, please chime in.
Thanks for the suggestions! I did the following testing:

1. I ran the failed test locally many times in isolation, and all tests succeeded.

2. I ran some failed test locally in parallel using "--test-launcher-jobs" parameter, and the test succeeded.

3. The downloaded binary reproduces the flakiness, but different run produced different sets of test flakiness. I also tried running FilesAppBrowserTest test using binary built from top of the tree that is without my change, the test still produces flakiness:
10 tests failed:
    TabIndex/FilesAppBrowserTest.Test/tabindexFocus (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
    TabIndex/FilesAppBrowserTest.Test/tabindexFocusDirectorySelected (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
    TabIndex/FilesAppBrowserTest.Test/tabindexFocusDirectorySelected_MyFilesVolume (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
    TabIndex/FilesAppBrowserTest.Test/tabindexFocus_MyFilesVolume (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
    TabIndex/FilesAppBrowserTest.Test/tabindexOpenDialogDrive_DriveFs (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
    TabIndex/FilesAppBrowserTest.Test/tabindexOpenDialogDrive_DriveFs_MyFilesVolume (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
    TabIndex/FilesAppBrowserTest.Test/tabindexSaveFileDialogDrive_DriveFs (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
    TabIndex/FilesAppBrowserTest.Test/tabindexSaveFileDialogDrive_DriveFs_MyFilesVolume (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
    TabIndex/FilesAppBrowserTest.Test/tabindexSearchBoxFocus (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)
    TabIndex/FilesAppBrowserTest.Test/tabindexSearchBoxFocus_MyFilesVolume (../../chrome/browser/chromeos/file_manager/file_manager_browsertest.cc:188)

Thanks! So, IIUC, (1) suggests that tests themselves are OK. (2) suggests that the potential test interaction on a lightly loaded machine still is not enough to trigger flakiness. But (3) reproduces it (IIUC both with a bot-built binary and locally built binary?) - so it seems the flake is triggered by a sufficiently high test load.

To confirm this, could you tell how many tests were run in (2) - both the total #tests and the value for --test-launcher-jobs? And the same numbers for (3) - for comparison.

If the numbers between (2) and (3) are very different, maybe the next experiment is to try to reduce the parallelism in (3) (e.g. an extreme is to run all tests sequentially), and see if it eliminates the flake.

If the root cause turns out to be high load, it's probably the hardest to debug... But at least we can mitigate it by reducing the parallelism on an offending bot temporarily (at the expense of runtime), and then find the right people to debug the flakiness.

Thanks! For (3), both the downloaded binary and locally built binary had some browser tests failure.

For (2), I ran one test, but set the "--gtest_repeat" to be 20 and "--test-launcher-jobs" to be 10.
For (3), I ran all FilesAppBrowserTest related browser tests using "--gtest_filter=*FilesAppBrowserTest*", and I believe it ran the tests sequentially.

Thanks! From this, I'm guessing that (2) didn't have a chance to show any "interesting" interactions (since it's multiple instances of the same test).

But if a sequential run of many tests reproduces flakes, it means some tests leave turds behind that break other tests. Unfortunately, that's a pretty hard one to debug - the faulty tests may actually pass, but cause other "good" tests to fail later.

Is it feasible to list the order of tests as they run, and try to eliminate tests that immediately precede the failing ones? 

It's basically a heuristic to find suspected tests and checking if removing them fixes the flakes. However, if a state can persist for multiple runs, this may turn into a full enumeration problem...

Project Member

Comment 14 by bugdroid1@chromium.org, Dec 17

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e53b4732fadd692a8335ce3f7a505f2557f57537

commit e53b4732fadd692a8335ce3f7a505f2557f57537
Author: Robbie McElrath <rmcelrath@chromium.org>
Date: Mon Dec 17 23:52:12 2018

Add content::NetworkConnectionChangeSimulator helper.

This moves the NetworkConnectionChangeSimulator from
chrome/browser/chrome_browser_main_browsertest.cc to its own file in
content/public/test, since other browsertests need the same
functionality.

Bug:  887038 
Change-Id: Ia43603cf06563524b45a3bc6c644cbab60d280fc
Reviewed-on: https://chromium-review.googlesource.com/c/1378971
Reviewed-by: Scott Violet <sky@chromium.org>
Commit-Queue: Robbie McElrath <rmcelrath@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617295}
[modify] https://crrev.com/e53b4732fadd692a8335ce3f7a505f2557f57537/chrome/browser/chrome_browser_main_browsertest.cc
[add] https://crrev.com/e53b4732fadd692a8335ce3f7a505f2557f57537/content/public/test/network_connection_change_simulator.cc
[add] https://crrev.com/e53b4732fadd692a8335ce3f7a505f2557f57537/content/public/test/network_connection_change_simulator.h
[modify] https://crrev.com/e53b4732fadd692a8335ce3f7a505f2557f57537/content/test/BUILD.gn

I was able to reproduce the failure on my P920. The issue is that there's some async test setup that wasn't always completing before the test ran.

NetworkChangeNotifierChromeos defaults its connection type to NONE, and then a call downstream of BrowserTestBase::SetUp sets its connection type to ETHERNET. After a 1 second delay, NetworkChangeNotifier will send a connection changed notification, which will eventually make its way to NetworkConnectionTracker and then JobScheduler. The test itself will enqueue a job to JobScheduler that is considered USER_INITIATED, and will be cancelled if the JobScheduler thinks there's no connection. On slower machines (like Jun's and apparently the trybots), the network change notification arrives before the test starts running, so the jobs the test submits get run. On faster machines, the jobs get enqueued before the connection type change has been propagated, so they get cancelled and the test times out.

To fix this we need to make sure the connection type is set correctly before we start the test. The CL from #14 should help with that.
Project Member

Comment 16 by bugdroid1@chromium.org, Dec 18

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/065916a6adeb8e3c51ca9593824db8f53c207589

commit 065916a6adeb8e3c51ca9593824db8f53c207589
Author: Jun Cai <juncai@chromium.org>
Date: Tue Dec 18 03:06:43 2018

Reland: Network Service: Migrate drive::JobScheduler to NetworkConnectionTracker

The "Initial upload" patch of this CL is the same as:
https://chromium-review.googlesource.com/c/chromium/src/+/1287040

This CL migrates drive::JobScheduler from NetworkChangeNotifier to
NetworkConnectionTracker, which works with the network service enabled.

The diff from "Initial upload": some code is added to make sure the connection
type is set correctly before test starts running.

TBR=hashimoto@chromium.org, fukino@chromium.org

Bug:  887038 
Change-Id: I7b6dd94c23de8238ee4c6253e63023ea0511ce77
Reviewed-on: https://chromium-review.googlesource.com/c/1368035
Reviewed-by: Jun Cai <juncai@chromium.org>
Commit-Queue: Jun Cai <juncai@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617358}
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/chrome/browser/chromeos/drive/drive_integration_service.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/chrome/browser/chromeos/file_manager/file_manager_browsertest_base.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/about_resource_loader_unittest.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/change_list_loader_unittest.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/directory_loader_unittest.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/file_system/operation_test_base.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/file_system_unittest.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/job_scheduler.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/job_scheduler.h
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/job_scheduler_unittest.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/start_page_token_loader_unittest.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/sync_client_unittest.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/team_drive_change_list_loader_unittest.cc
[modify] https://crrev.com/065916a6adeb8e3c51ca9593824db8f53c207589/components/drive/team_drive_list_loader_unittest.cc

Status: Fixed (was: Started)

Sign in to add a comment