DemoExtensionsExternalLoaderTest.LoadApp is sometimes deterministically flaky.
Issue description

In this build, it fails multiple times: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-chromeos-rel/144853

In this build [retry], it passes: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-chromeos-rel/144903

I can repro the failures locally, but only ~10% of the time. GN args:
"""
dcheck_always_on = true
ffmpeg_branding = "ChromeOS"
is_component_build = false
is_debug = false
proprietary_codecs = true
strip_absolute_paths_from_debug_symbols = true
symbol_level = 0
target_os = "chromeos"
use_goma = true
use_vaapi = true
"""
On the build bots, it appears to consistently fail on retry [e.g. 40 times in a row].
,
Nov 28
The test in question has been disabled here: https://monorail-prod.appspot.com/p/chromium/issues/detail?id=904644 I'd like to keep this crbug open though, because I want to know why the flakiness sometimes appears deterministic. That likely implies that state is leaking from the parent test runner process into the child test processes.
,
Nov 28
+ wzang, michaelpg I'm aware that the flakiness is being tracked in issue 904644. I'm concerned about why the flakiness sometimes appears deterministic [e.g. 40 failures in a row]. This suggests that there may be problems in the test suite runner itself, where state is perhaps leaking from the parent test runner process into child test processes. Do you have any insight into the nature of the failure and what might cause it?
,
Nov 28
,
Nov 28
I explicitly asked for this not to be duped into 904644.
,
Nov 28
Sorry.
,
Nov 28
I can't reproduce the failures locally after running the tests 200 times using the GN args in #1. Does it repro 10% of the time in your case?
,
Nov 28
,
Nov 28
We're setting the SharedURLLoaderFactory on the testing BrowserProcess, requesting to load an extension, and expecting a URL request to be made. Even when unit tests run in parallel, they're being executed in different processes (with their own g_browser_process), right?
,
Nov 28
When I attach a keyboard/mouse/monitor to my Linux device, the failure never repros. When I CRD into the device [no keyboard/mouse/monitor], the test fails > 10% of the time.
,
Nov 28
The implementation of TestURLLoaderFactory::NumPending calls "base::RunLoop().RunUntilIdle" exactly once and then looks for the # of pending requests. This seems quite brittle.
When I change the implementation to the following:
"""
int TestURLLoaderFactory::NumPending() {
  int pending = 0;
  while (pending != 1) {
    pending = 0;
    base::RunLoop().RunUntilIdle();
    for (const auto& candidate : pending_requests_) {
      if (!candidate.client.encountered_error())
        ++pending;
    }
  }
  return pending;
}
"""
I can no longer reproduce the test failure. Continuing to dig to see what's going on.
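The loop above never returns if the count never reaches 1, so it only works as a debugging aid. As a rough sketch of a bounded alternative, the same "retry until the condition holds" idea can be given a deadline. `PollUntil` is a hypothetical helper written in plain standard C++ for illustration. It is not a Chromium API, and a real fix would more likely wait on an explicit signal than poll.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of Chromium): re-checks `condition` until it
// returns true or `timeout` elapses. Returns whether the condition was met.
bool PollUntil(const std::function<bool()>& condition,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds interval =
                   std::chrono::milliseconds(1)) {
  assert(condition && "condition must be callable");
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  for (;;) {
    if (condition())
      return true;  // Condition met before the deadline.
    if (std::chrono::steady_clock::now() >= deadline)
      return false;  // Give up instead of spinning forever.
    std::this_thread::sleep_for(interval);
  }
}
```

A caller would pass a lambda that drains the loop and checks the pending count, and then treat a `false` return as a test failure rather than a hang.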
,
Nov 28
The test attempts to hop to a background task and back to the UI thread: https://cs.chromium.org/chromium/src/chrome/browser/extensions/updater/local_extension_cache.cc?type=cs&q=local_extension_cache.cc&sq=package:chromium&g=0&l=305

See LocalExtensionCache::CheckCacheContents and LocalExtensionCache::BackendCheckCacheContents. If the background thread processes LocalExtensionCache::BackendCheckCacheContents and reposts onto the UI thread before the call to TestURLLoaderFactory::NumPending(), then the test passes. Otherwise, the test fails.

Whether or not the test passes depends on the OS scheduler and the availability of other cores on the device. I imagine that if there are no other available cores, then the test will very likely fail. This seems more likely to happen on the trybots, as those are run on VM images with hardware sharing.

I'm going to close this bug, since the unit_tests test suite runner appears unrelated to the flakiness.
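The ordering dependence described above can be modeled with a toy example. This is only a sketch in plain standard C++: `ToyLoop` and `ObservePendingAfterOneDrain` are hypothetical names standing in for the UI-thread queue and the test's single drain-then-count step, and none of it is Chromium code. The promise/future pair forces the two orderings that the OS scheduler would otherwise pick at random.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a UI-thread task queue.
class ToyLoop {
 public:
  void Post(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mu_);
    tasks_.push_back(std::move(task));
  }
  // Runs queued tasks and returns as soon as the queue is observed empty.
  void RunUntilIdle() {
    for (;;) {
      std::function<void()> task;
      {
        std::lock_guard<std::mutex> lock(mu_);
        if (tasks_.empty())
          return;
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }

 private:
  std::mutex mu_;
  std::deque<std::function<void()>> tasks_;
};

// Drains the loop exactly once and reports how many "requests" were seen.
// `background_replies_first` selects which side of the race wins.
int ObservePendingAfterOneDrain(bool background_replies_first) {
  ToyLoop ui;
  int pending = 0;  // Only modified by tasks run on the calling thread.
  std::promise<void> go, replied;
  std::future<void> go_signal = go.get_future();
  std::future<void> replied_signal = replied.get_future();
  std::thread background([&] {
    go_signal.wait();                    // Models background-thread latency.
    ui.Post([&pending] { ++pending; });  // Reply posted back to the "UI" loop.
    replied.set_value();
  });
  if (background_replies_first) {
    go.set_value();
    replied_signal.wait();  // Reply is queued before the drain.
  }
  ui.RunUntilIdle();
  const int observed = pending;
  assert(observed == 0 || observed == 1);
  if (!background_replies_first)
    go.set_value();  // Reply arrives only after the count was taken.
  background.join();
  return observed;
}
```

Calling `ObservePendingAfterOneDrain(true)` observes one pending request, while `ObservePendingAfterOneDrain(false)` observes none, which corresponds to the passing and failing orderings respectively.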
,
Dec 4
Comment 1 by erikc...@chromium.org, Nov 27