Issue 828407

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



"CvoxBrailleUtilUnitTest.TextField" is flaky (Multiple chromevox_tests are flaky)

Project Member Reported by chromium...@appspot.gserviceaccount.com, Apr 3 2018

Issue description

"CvoxBrailleUtilUnitTest.TextField" is flaky.

This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label.

We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLAsSBUZsYWtlIiFDdm94QnJhaWxsZVV0aWxVbml0VGVzdC5UZXh0RmllbGQM.

Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs

This flaky test/step was previously tracked in issue 540070.
 

Comment 1 by zmin@chromium.org, Apr 3 2018

Labels: -Sheriff-Chromium
Owner: dmazz...@chromium.org
Status: Assigned (was: Untriaged)
Assigning to Dominic to take a look.

Comment 2 by zmin@chromium.org, Apr 3 2018

 Issue 828342  has been merged into this issue.

Comment 3 by zmin@chromium.org, Apr 3 2018

 Issue 828320  has been merged into this issue.

Comment 4 by zmin@chromium.org, Apr 3 2018

 Issue 828319  has been merged into this issue.

Comment 5 by zmin@chromium.org, Apr 3 2018

 Issue 828289  has been merged into this issue.

Comment 6 by zmin@chromium.org, Apr 3 2018

 Issue 828106  has been merged into this issue.

Comment 7 by zmin@chromium.org, Apr 3 2018

 Issue 828080  has been merged into this issue.

Comment 8 by zmin@chromium.org, Apr 3 2018

 Issue 828070  has been merged into this issue.

Comment 9 by zmin@chromium.org, Apr 3 2018

 Issue 827824  has been merged into this issue.

Comment 10 by zmin@chromium.org, Apr 3 2018

 Issue 827823  has been merged into this issue.

Comment 11 by zmin@chromium.org, Apr 3 2018

 Issue 827818  has been merged into this issue.

Comment 12 by zmin@chromium.org, Apr 3 2018

 Issue 827808  has been merged into this issue.

Comment 13 by zmin@chromium.org, Apr 3 2018

 Issue 827766  has been merged into this issue.

Comment 14 by zmin@chromium.org, Apr 3 2018

 Issue 827795  has been merged into this issue.

Comment 15 by zmin@chromium.org, Apr 3 2018

 Issue 827765  has been merged into this issue.

Comment 16 by zmin@chromium.org, Apr 3 2018

 Issue 827657  has been merged into this issue.

Comment 17 by zmin@chromium.org, Apr 3 2018

 Issue 827655  has been merged into this issue.

Comment 18 by zmin@chromium.org, Apr 3 2018

 Issue 827636  has been merged into this issue.
Cc: jamescook@chromium.org
Sheriff here. There's tons of flake across chromevox_tests and it's been happening for a few days now. There is no clear regression CL that I can find.

Here's an example log which has no real useful information:

https://logs.chromium.org/v/?s=chromium%2Fbb%2Ftryserver.chromium.chromiumos%2Flinux-chromeos-rel%2F91089%2F%2B%2Frecipes%2Fsteps%2Fchromevox_tests__with_patch_%2F0%2Fstdout

I can repro locally, but flake is only seen when tests are run in batch. It seems to affect tests randomly, probably unrelated to specific test code.

Here's the stack I'm seeing (below). +James, does this look potentially like the Ozone initialization race you were telling me about before?

#0  0x00007fc76d739655 in base::internal::Invoker<base::internal::BindState<content::GpuProcessHost::InitOzone()::$_1, int, base::RepeatingCallback<void (IPC::Message*)> >, void (ui::OzonePlatform*)>::RunOnce(base::internal::BindStateBase*, ui::OzonePlatform*) () from /work/chrome/src/out/cros/./libcontent.so
(gdb) bt
#0  0x00007fc76d739655 in base::internal::Invoker<base::internal::BindState<content::GpuProcessHost::InitOzone()::$_1, int, base::RepeatingCallback<void (IPC::Message*)> >, void (ui::OzonePlatform*)>::RunOnce(base::internal::BindStateBase*, ui::OzonePlatform*) () from /work/chrome/src/out/cros/./libcontent.so
#1  0x00007fc76d738ead in base::internal::Invoker<base::internal::BindState<content::(anonymous namespace)::OzoneRegisterStartupCallbackHelper(base::OnceCallback<void (ui::OzonePlatform*)>)::$_3, base::internal::RetainedRefWrapper<base::SingleThreadTaskRunner>, base::internal::PassedWrapper<base::OnceCallback<void (ui::OzonePlatform*)> > >, void (ui::OzonePlatform*)>::RunOnce(base::internal::BindStateBase*, ui::OzonePlatform*) () from /work/chrome/src/out/cros/./libcontent.so
#2  0x00007fc769fb6b4a in ui::OzonePlatform::RegisterStartupCallback(base::OnceCallback<void (ui::OzonePlatform*)>) () from /work/chrome/src/out/cros/./libozone.so
#3  0x00007fc76d735825 in content::(anonymous namespace)::OzoneRegisterStartupCallbackHelper(base::OnceCallback<void (ui::OzonePlatform*)>) () from /work/chrome/src/out/cros/./libcontent.so
#4  0x00007fc76d735730 in content::GpuProcessHost::InitOzone() () from /work/chrome/src/out/cros/./libcontent.so
#5  0x00007fc76d73387e in content::GpuProcessHost::Init() () from /work/chrome/src/out/cros/./libcontent.so
#6  0x00007fc76d73320b in content::GpuProcessHost::Get(content::GpuProcessHost::GpuProcessKind, bool) () from /work/chrome/src/out/cros/./libcontent.so
#7  0x00007fc76d721a36 in content::BrowserGpuChannelHostFactory::EstablishRequest::EstablishOnIO() () from /work/chrome/src/out/cros/./libcontent.so
#8  0x00007fc76d185a3e in base::internal::Invoker<base::internal::BindState<base::internal::IgnoreResultHelper<bool (content::PlatformNotificationContextImpl::*)()>, scoped_refptr<content::PlatformNotificationContextImpl> >, void ()>::Run(base::internal::BindStateBase*) () from /work/chrome/src/out/cros/./libcontent.so
#9  0x00007fc7702da654 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) () from /work/chrome/src/out/cros/./libbase.so
#10 0x00007fc77030ccb9 in base::internal::IncomingTaskQueue::RunTask(base::PendingTask*) () from /work/chrome/src/out/cros/./libbase.so
#11 0x00007fc77031080b in base::MessageLoop::RunTask(base::PendingTask*) () from /work/chrome/src/out/cros/./libbase.so
#12 0x00007fc770310baa in base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) () from /work/chrome/src/out/cros/./libbase.so
#13 0x00007fc770310e0c in base::MessageLoop::DoWork() () from /work/chrome/src/out/cros/./libbase.so
#14 0x00007fc770313239 in base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) () from /work/chrome/src/out/cros/./libbase.so
#15 0x00007fc770310139 in base::MessageLoop::Run(bool) () from /work/chrome/src/out/cros/./libbase.so
#16 0x00007fc770346e59 in base::RunLoop::Run() () from /work/chrome/src/out/cros/./libbase.so
#17 0x00007fc770386b07 in base::Thread::Run(base::RunLoop*) () from /work/chrome/src/out/cros/./libbase.so
#18 0x00007fc76d59c374 in content::BrowserProcessSubThread::IOThreadRun(base::RunLoop*) () from /work/chrome/src/out/cros/./libcontent.so
#19 0x00007fc76d59c314 in content::BrowserProcessSubThread::Run(base::RunLoop*) () from /work/chrome/src/out/cros/./libcontent.so
#20 0x00007fc7703870bd in base::Thread::ThreadMain() () from /work/chrome/src/out/cros/./libbase.so

Comment 20 by zmin@chromium.org, Apr 3 2018

 Issue 827471  has been merged into this issue.

Comment 21 by zmin@chromium.org, Apr 3 2018

 Issue 827466  has been merged into this issue.

Comment 22 by zmin@chromium.org, Apr 3 2018

 Issue 827472  has been merged into this issue.

Comment 23 by zmin@chromium.org, Apr 3 2018

 Issue 827524  has been merged into this issue.

Comment 24 by zmin@chromium.org, Apr 3 2018

 Issue 827590  has been merged into this issue.

Comment 25 by zmin@chromium.org, Apr 3 2018

 Issue 827581  has been merged into this issue.
Owner: rjkroege@chromium.org
Robert, the source of flake appears to be this line of code:
https://cs.chromium.org/chromium/src/content/browser/gpu/gpu_process_host.cc?rcl=d134a09bfac89f0e5e9961403fa3425f6731f302&l=736

Specifically I can confirm that local repros have GetGpuPlatformSupportHost() racily returning null. Any ideas?

I think it's obviously a bit late to revert the change which introduced this problem, but we should get it fixed ASAP.
Cc: sadrul@chromium.org
Also +sadrul to get more eyes on this

Comment 28 by zmin@chromium.org, Apr 3 2018

 Issue 828488  has been merged into this issue.
Cc: rjkroege@chromium.org
Owner: sky@chromium.org
OK, did more investigation and the issue appears to be Mus. GpuProcessHost::InitOzone() assumes OzonePlatform::InitializeForUI() has already been called by the time it runs. This is always true except when Mus is enabled, where we don't call OzonePlatform::InitializeForUI() until the ui service is started on its own background thread.

I think we need to defer GpuProcessHost::InitOzone()[1] (or ui::OzonePlatform::RegisterStartupCallback's invocation of its callback[2]) until after the ui service has been brought up. I defer to sky@ and/or rjkroege@ to help figure out the best way to do that.

[1] https://cs.chromium.org/chromium/src/content/browser/gpu/gpu_process_host.cc?rcl=ae910bccac3f28ce18316a1a308a6a66a5f8b993&l=742
[2] https://cs.chromium.org/chromium/src/ui/ozone/public/ozone_platform.cc?rcl=a6aadcce39e544094dc0ec25d538d4636462d03e&l=110
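
The deferral idea in [2] can be pictured with a minimal sketch in standard C++. This is not Chromium code: StartupGate, Register and MarkReady are hypothetical names invented for illustration, and the sketch only assumes that "ready" is announced once, after everything the callbacks read has been set up.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical helper, for illustration only: callbacks registered before
// MarkReady() are queued; callbacks registered afterwards run immediately.
class StartupGate {
 public:
  using Callback = std::function<void()>;

  // May be called from any thread.
  void Register(Callback cb) {
    {
      std::lock_guard<std::mutex> hold(lock_);
      if (!ready_) {
        pending_.push_back(std::move(cb));  // Not ready yet: run later.
        return;
      }
    }
    cb();  // Already ready: run outside the lock.
  }

  // Call once, and only after every object the callbacks read is fully set
  // up. Queued callbacks run on the calling thread.
  void MarkReady() {
    std::vector<Callback> to_run;
    {
      std::lock_guard<std::mutex> hold(lock_);
      assert(!ready_ && "MarkReady() must be called only once");
      ready_ = true;
      to_run.swap(pending_);
    }
    for (auto& cb : to_run) cb();
  }

 private:
  std::mutex lock_;
  bool ready_ = false;             // Guarded by lock_.
  std::vector<Callback> pending_;  // Guarded by lock_.
};
```

In this sketch the flag is read and written under one lock, so a caller that sees it set also sees the state that was set up before MarkReady(). The lock covers only the flag and the queue, so slow initialization work can stay outside it.
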
I have an alternative proposal for a fix which I'll express in CL form shortly.

Comment 31 by zmin@chromium.org, Apr 3 2018

 Issue 828514  has been merged into this issue.

Comment 32 by zmin@chromium.org, Apr 3 2018

 Issue 828534  has been merged into this issue.

Comment 33 by zmin@chromium.org, Apr 3 2018

 Issue 828548  has been merged into this issue.

Comment 34 by zmin@chromium.org, Apr 3 2018

 Issue 828555  has been merged into this issue.
OK, I've exhausted my available energy for looking into this bug.

It is definitely a race between ui service initialization (OnStart) and GpuProcessHost initialization, as explained in c#29.

I was going to suggest that we simply let Aura initialize OzonePlatform in the Mus case, but this is complicated by global state management around both InputDeviceManager and DeviceDataManager. Not sure what the best path forward is, but Mus-enabled Chrome will be flakily crashy on startup until this is resolved.
This sounds vaguely similar to issue 807781 "DCHECK gfx::ClientNativePixmapFactory::GetInstance() in ui::Service::OnStart()", where both ui service OnStart and aura init were fighting over initialization, in that case of a pixmap factory.

There is a separate issue 824809 "mus: Startup crash on device in ui::DrmDisplayHostManager::DrmDisplayHostManager", but that's on-device only.

Comment 37 by sky@chromium.org, Apr 3 2018

Owner: sadrul@chromium.org
Sadrul is more familiar with this than I am. Sadrul, is there an easy fix for this? If not, two options:

1. Make ChromeVox tests force mus off.
2. disable mus again for dev builds.

I'm happy to do either of these, let me know what you think.

Comment 38 by zmin@chromium.org, Apr 4 2018

 Issue 828980  has been merged into this issue.

Comment 39 by bugdroid1@chromium.org, Apr 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a51e3935c81c5572eb1439a1f8d83c9184f3f128

commit a51e3935c81c5572eb1439a1f8d83c9184f3f128
Author: kylechar <kylechar@chromium.org>
Date: Thu Apr 05 17:57:00 2018

Fix two Ozone initialization races.

The first race is only for Ozone DRM. In DrmDisplayHostManager
|primary_drm_device_handle_| was being accessed from multiple threads.
The value was changed by the IO thread while the OzoneUI thread was
still using it.

Change calls to GpuThreadObserver OnGpuProcessLaunched() so it happens
on the OzoneUI thread. This delays the call slightly but it removes
the possibility for a race modifying |primary_drm_device_handle_|.

The second race was with OzonePlatform::RegisterStartupCallback(). This
is called from the IO thread and it checks if there is an OzonePlatform
instance and if Ozone UI initialization has happened. If both of those
things are true, then it runs a callback immediately, otherwise it runs
a callback after those things become true. The problem is that
|g_platform_initialized_ui| was set true on the UI thread before Ozone
UI initialization was fully finished. The callback accessed a null
OzonePlatform member variable and crashed.

Make sure that |g_platform_initialized_ui| is set after initialization
is finished and that variable change is protected by the same lock used
in RegisterStartupCallback(). The lock only protects changing
|g_platform_initialized_ui|, not all of initialization, so the IO thread
won't block for an extended period.

Bug: 824809, 828407
Change-Id: I73b9404c823c9eeaaeaba99feb1e113953a5bb1b
Reviewed-on: https://chromium-review.googlesource.com/980574
Reviewed-by: Robert Kroeger <rjkroege@chromium.org>
Commit-Queue: kylechar <kylechar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#548481}
[modify] https://crrev.com/a51e3935c81c5572eb1439a1f8d83c9184f3f128/ui/ozone/platform/drm/host/drm_display_host_manager.cc
[modify] https://crrev.com/a51e3935c81c5572eb1439a1f8d83c9184f3f128/ui/ozone/platform/drm/host/drm_gpu_platform_support_host.cc
[modify] https://crrev.com/a51e3935c81c5572eb1439a1f8d83c9184f3f128/ui/ozone/platform/drm/host/drm_gpu_platform_support_host.h
[modify] https://crrev.com/a51e3935c81c5572eb1439a1f8d83c9184f3f128/ui/ozone/platform/drm/host/gpu_thread_observer.h
[modify] https://crrev.com/a51e3935c81c5572eb1439a1f8d83c9184f3f128/ui/ozone/public/ozone_platform.cc
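
For readers unfamiliar with the first fix in the commit message above (moving the OnGpuProcessLaunched() notification onto the thread that owns the handle), here is a rough sketch of that thread-confinement pattern in standard C++. It is not the actual change: TaskQueue, DisplayManager and NotifyGpuProcessLaunched are hypothetical stand-ins, the handle values are made up, and what the handler does to the handle is invented purely for illustration.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical task queue standing in for the owning thread's task runner.
// Any thread may Post(); only the owning thread calls RunPending().
class TaskQueue {
 public:
  void Post(std::function<void()> task) {
    std::lock_guard<std::mutex> hold(lock_);
    tasks_.push_back(std::move(task));
  }

  void RunPending() {
    std::deque<std::function<void()>> batch;
    {
      std::lock_guard<std::mutex> hold(lock_);
      batch.swap(tasks_);
    }
    for (auto& task : batch) task();  // Run outside the lock.
  }

 private:
  std::mutex lock_;
  std::deque<std::function<void()>> tasks_;  // Guarded by lock_.
};

// Hypothetical stand-in for the display manager: the handle is read and
// written only on the owning thread, so it needs no lock of its own.
class DisplayManager {
 public:
  explicit DisplayManager(TaskQueue* ui_queue) : ui_queue_(ui_queue) {
    assert(ui_queue_ != nullptr);
  }

  // Safe to call from another thread (for example an IO thread): it never
  // touches the handle, it only posts the notification to the owner.
  void NotifyGpuProcessLaunched() {
    ui_queue_->Post([this] { OnGpuProcessLaunched(); });
  }

  // Owning thread only.
  int primary_handle() const { return primary_handle_; }

 private:
  // Owning thread only. Resetting the handle here is an invented effect.
  void OnGpuProcessLaunched() { primary_handle_ = -1; }

  TaskQueue* ui_queue_;
  int primary_handle_ = 7;  // Made-up initial value.
};
```

The notification arrives slightly later than a direct call would, which matches the trade-off the commit message describes, but the handle is never modified while its owning thread is in the middle of using it.
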

Comment 40 by chromium...@appspot.gserviceaccount.com, Apr 5 2018

Labels: Sheriff-Chromium
Detected 3 new flakes for test/step "BackgroundTest.OptionChildIndexCount". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiRCYWNrZ3JvdW5kVGVzdC5PcHRpb25DaGlsZEluZGV4Q291bnQM. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
Cc: dmazz...@chromium.org
 Issue 829001  has been merged into this issue.
Summary: "CvoxBrailleUtilUnitTest.TextField" is flaky (Multiple chromevox_tests are flaky) (was: "CvoxBrailleUtilUnitTest.TextField" is flaky)
Cc: st...@chromium.org
 Issue 829489  has been merged into this issue.
 Issue 829502  has been merged into this issue.
 Issue 829515  has been merged into this issue.
chromevox_tests has been pretty flaky since 3/29/2018

https://findit-for-me.appspot.com/waterfall/list-flakes?step_name=chromevox_tests
Owner: kylec...@chromium.org
Status: Fixed (was: Assigned)
Looks like the vast majority of the flake is fixed?

https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=chromevox_tests
