New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 910029 link

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 5
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 1
Type: Bug-Regression

Blocked on:
issue 910027

Blocking:
issue 909074



Sign in to add a comment

Raciness in content_unittests between ScopedTempDir teardown and SimpleBackendImpl initialization

Project Member Reported by tikuta@chromium.org, Nov 29

Issue description

Many tests failing with

[453226:1063433171:1129/071219.026473:170349682:ERROR:render_widget_host_view_base.cc(193)] Not implemented reached in virtual uint32_t content::RenderWidgetHostViewBase::GetCaptureSequenceNumber() const
[453226:1063433171:1129/071219.056807:170380016:WARNING:scoped_temp_dir.cc(23)] Could not delete temp dir in dtor.
[453226:980595369:1129/071219.059212:170382421:FATAL:simple_backend_impl.cc(732)] Check failed: mtime_result.

e.g.
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/fuchsia_x64/160034
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/fuchsia_x64/160029
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/fuchsia_x64/160027
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/fuchsia_x64/160026
 
Cc: sergeyu@chromium.org w...@chromium.org scottmg@chromium.org
Added Fuchsia people.
Cc: morlovich@chromium.org
This DCHECK verifies that stat(2) returns zero (i.e. success) for an existing directory. Is this racy in Fuchsia?
This is also causing my patches to be rejected.
Components: Internals>PlatformIntegration
Labels: M-73
Owner: w...@chromium.org
Status: Assigned (was: Untriaged)
Auto-assigning, to take an initial look before handing off to Gardener kmarshall@.
Owner: sergeyu@chromium.org
sergeyu: You landed https://chromium-review.googlesource.com/c/chromium/src/+/1351888 yesterday (Nov 28th), which altered how we propagate the /data directory to test-launcher sub-processes - I can't see anything obviously wrong in the CL, but could you take a look, please?
Status: Started (was: Assigned)
investigating
Actually, it is noticable that the tests which fail in the retry-with-patch are the same ones which fail in the original run of the content_unittests step. But each of the failed tries listed by tikuta@ have different tests failing in them.

IMO this looks like some earlier test in the same batch is doing something to break later tests.
Ah, nm, the retry steps aren't running anything at all, because --test-launcher-retry-limit isn't respected by the runner script.
Blockedon: 910027
OK, we should be able to do try runs to repro this, now.

Looking at the log output, the two lines:

[453226:1063433171:1129/071219.056807:170380016:WARNING:scoped_temp_dir.cc(23)] Could not delete temp dir in dtor.
[453226:980595369:1129/071219.059212:170382421:FATAL:simple_backend_impl.cc(732)] Check failed: mtime_result.

are suspicious - the first indicates that recursive deletion of a scoped temporary directory is failing, while the latter suggests that the SimpleBackendImpl cache directory is being deleted in-between being created, and us reaching the stat() call -> i.e. seemingly a race between ScopedTempDir teardown and SimpleBackendImpl initialization.

We'll need to work out who is creating and owning the ScopedTempDir in this case, to be able to diagnose further, I think.
Also, if I'm reading the logs correctly, this is a race between two threads in the same process, i.e. it's two sequences in the test itself racing against one another.
Labels: -Sheriff-Chromium
Removing Sheriff-Chromium since this has an owner and is underway.
There are now other tests failing with the same message.

https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8928190746169793264/+/steps/content_unittests/0/logs/ForwardingAudioStreamFactoryTest.Destruction_CallsOnSourceGoneOnRegisteredLoopbackSinks/0

It looks like cache is cleaned up asynchronously, which races with cache initialization in the next test. The cleanup task is posted here:

https://codesearch.chromium.org/chromium/src/net/disk_cache/simple/simple_backend_impl.cc?gsn=SetWorkerPoolForTesting&l=389

The cleanup task is posted to a TaskScheduler task runner. The TaskScheduler for many of these is supposed to be created and destroyed in TestBrowserThreadBundle, which uses ScopedTaskEnvironment: https://codesearch.chromium.org/chromium/src/content/public/test/test_browser_thread_bundle.cc?dr=CSs&g=0&l=117 , but it's likely that some other tests don't use ScopedTaskEnvironment leaving TaskScheduler for the tests that run afterwards.
Re #13 tasks leaking between tests would be bad, but I don't think that's possible.  Some tests have a ScopedTaskEnvironment in their fixture, others a TestBrowserThreadBundle but these get torn down after each test run so I'm not seeing any way to for tasks to leak.  Also without a ScopedTaskEnvironment or a MessageLoop (some tests might use those directly, but it shouldn't matter) PostTask will fail.
Cc: nednguyen@chromium.org jbudorick@chromium.org
 Issue 911159  has been merged into this issue.
Labels: -Pri-1 -Type-Bug Pri-0 Type-Bug-Regression
sergeyu: These flakes are causing considerable disruption. Are we seeing them from suites other than content_unittests? If not then let's temporarily take content_unittests off the CQ until we can identify the cause.

Tagging as P0, since this is causing bogus CQ failures.
I can reproduce it locally . Will disable it on CQ if I don't figure out the root cause soon
Looks like the problem is that in TestBrowserContext destructor the ScopedTestDir that contains the disk cache is destroyed while TaskScheduler is still running. The TaskScheduler is executing SimpleBackendImpl::InitCacheStructureOnDisk(), which races with directory destruction.
Project Member

Comment 19 by bugdroid1@chromium.org, Dec 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8e99628583e99028dac8a8433f76d91d68965d41

commit 8e99628583e99028dac8a8433f76d91d68965d41
Author: Sergey Ulanov <sergeyu@chromium.org>
Date: Mon Dec 03 21:01:20 2018

Disable content_unittests on Fuchsia x64 bot

content_unittests are failing on Fuchsia.

TBR=thakis@chromium.org

Bug:  910029 
Change-Id: Ie567ad03904e3ec27143c3a1bec1a86562c74c18
Reviewed-on: https://chromium-review.googlesource.com/c/1359057
Reviewed-by: Sergey Ulanov <sergeyu@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Sergey Ulanov <sergeyu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613241}
[modify] https://crrev.com/8e99628583e99028dac8a8433f76d91d68965d41/testing/buildbot/chromium.linux.json
[modify] https://crrev.com/8e99628583e99028dac8a8433f76d91d68965d41/testing/buildbot/test_suite_exceptions.pyl

sergeyu: Your explanation in #18 makes sense, but why is this suddenly flaking now, when it previously didn't? Has something changed about the ScopedTaskEnvironment vs ScopedTempDir dtor ordering?
Cc: est...@chromium.org
Looking at estade@'s https://chromium-review.googlesource.com/c/chromium/src/+/1352969 (landed Nov 28th, just before we observed these flakes starting), I notice that a RenderViewHostTestEnabler was added to a whole set of different test bases. RenderViewHostTestEnabler does own a ScopedTaskEnvironment internally.
Summary: Raciness in content_unittests between ScopedTempDir teardown and SimpleBackendImpl initialization (was: content_unittests is flaky on fuchsia_x64 builder)
Blocking: 909074
Project Member

Comment 25 by bugdroid1@chromium.org, Dec 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9320bb77f51270de86de19a37a6295186ab6c9bc

commit 9320bb77f51270de86de19a37a6295186ab6c9bc
Author: Sergey Ulanov <sergeyu@chromium.org>
Date: Wed Dec 05 10:07:42 2018

Fix race in TestBrowserContext shutdown sequence.

disk_cache::SimpleBackendImpl performs disk IO asynchronously on
TaskScheduler. These operations can race with ScopedTempDir destruction in
TestBrowserContext. Updated ~TestBrowserContext() to ensure that all
pending IO ops complete before it attempts to destroy ScopedTempDir.

Bug:  910029 
Change-Id: I69145a395c81a042d8c3880e9917a07a972df9ab
Reviewed-on: https://chromium-review.googlesource.com/c/1359464
Commit-Queue: Sergey Ulanov <sergeyu@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613925}
[modify] https://crrev.com/9320bb77f51270de86de19a37a6295186ab6c9bc/content/public/test/test_browser_context.cc

Labels: -Pri-0 Pri-1
FYI, queued a CL for restoration of content_unittests on the waterfall:
https://chromium-review.googlesource.com/c/chromium/src/+/1363640

Project Member

Comment 27 by bugdroid1@chromium.org, Dec 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/40a8e2c9f07c35937bbe629ffc6be3268e89626c

commit 40a8e2c9f07c35937bbe629ffc6be3268e89626c
Author: Wez <wez@chromium.org>
Date: Wed Dec 05 19:22:03 2018

Revert "Disable content_unittests on Fuchsia x64 bot"

This reverts commit 8e99628583e99028dac8a8433f76d91d68965d41.

Reason for revert: Underlying race-condition leading to flake has been resolved.

Original change's description:
> Disable content_unittests on Fuchsia x64 bot
> 
> content_unittests are failing on Fuchsia.
> 
> TBR=thakis@chromium.org
> 
> Bug:  910029 
> Change-Id: Ie567ad03904e3ec27143c3a1bec1a86562c74c18
> Reviewed-on: https://chromium-review.googlesource.com/c/1359057
> Reviewed-by: Sergey Ulanov <sergeyu@chromium.org>
> Reviewed-by: Nico Weber <thakis@chromium.org>
> Commit-Queue: Sergey Ulanov <sergeyu@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#613241}

TBR=thakis@chromium.org,sergeyu@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  910029 
Change-Id: Id3c04cb5efa2c875cad61ffae78805b04738f9cc
Reviewed-on: https://chromium-review.googlesource.com/c/1363640
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#614055}
[modify] https://crrev.com/40a8e2c9f07c35937bbe629ffc6be3268e89626c/testing/buildbot/chromium.linux.json
[modify] https://crrev.com/40a8e2c9f07c35937bbe629ffc6be3268e89626c/testing/buildbot/test_suite_exceptions.pyl

Cc: fdegans@chromium.org
 Issue 912278  has been merged into this issue.
Status: Fixed (was: Started)

Sign in to add a comment