New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 896034 link

Starred by 1 user

Autotest - Telemetry connection failures on policy tests

Project Member Reported by kathrelkeld@google.com, Oct 16

Issue description

Affects about 75% of policy test runs.  E.g. https://stainless.corp.google.com/search?view=list&test=%5Epolicy_CookiesBlockedForUrls%5C.notset_allow%24&build=%5ER72%5C-11162%5C.0%5C.0%24

Failures include:
"Devtools client not yet ready: [Errno 111] Connection refused"
"Failed with DevToolsClientConnectionError while starting the browser backend"


The timing is right for it to have been the latest Catapult roll: https://chromium.googlesource.com/chromium/src/+/f4e88862a721eb17ed886760a45ee8292aae94a0

Only affects tests which set policies through fake DMS.
 
Cc: timkovich@chromium.org
Labels: ent-automation OS-Chrome
Cc: -achuith@chromium.org rcui@chromium.org
Owner: achuith@chromium.org
Status: Assigned (was: Untriaged)
That catapult roll looks innocuous. The catapult roll would need to propagate through the PFQ before it caused problems.
Status: Started (was: Assigned)
I did a bisect, and the first failing LKGM is 11152.0.0, which is Chromium 71.0.3578.0. LKGM 11151.0.0 passes, and that's Chromium 71.0.3575.1.

This is the full change log:
https://chromium.googlesource.com/chromium/src/+log/71.0.3575.0..71.0.3578.0?pretty=fuller&n=10000


Cc: kathrelk...@chromium.org
Owner: khorimoto@chromium.org
Status: Assigned (was: Started)
Bisect has revealed that this change is to blame:
https://chromium-review.googlesource.com/c/chromium/src/+/1275307


You can run the test with: 
test_that --board=amd64-generic localhost:9222 policy_CookiesBlockedForUrls.notset_allow

Cc: hsuregan@chromium.org jlklein@chromium.org jordynass@chromium.org nohle@chromium.org jhawkins@chromium.org hansberry@chromium.org
Components: UI>Multidevice
Status: Started (was: Assigned)
kathrelkeld@: I'm looking into this issue, but I'm having a hard time figuring out where to start and how I can look into what's going wrong (sorry - I've never worked on Autotest or Chromium OS). Can you point me to the test that's failing? I'm not sure why my CL would cause any changes to Devtools - can you help point me in the right direction?
Update: kathrelkeld@ helped me get the test running locally. I'll see what I can do to fix this.

Unfortunately, my team is moving to another building, so we need to unplug our computers and will not have access until Monday, so I won't be able to make progress with this issue until then. Stay tuned...
Cc: malaykeshav@chromium.org
Cc: amoylan@chromium.org
Cc: achuith@chromium.org
Update:

*I didn't have a Chromium OS environment set up, so I started going through through the setup flow here: https://chromium.googlesource.com/chromiumos/docs/+/master/developer_guide.md

*I didn't have enough space on my hard drive to finish setup. I talked to Techstop, and said I needed to re-image my machine in order to partition my disk so that it would support the Chromium OS setup.

*With the help of Techstop, I attempted to re-image my machine, but we ran into issues. Apparently, my machine has some sort of hardware issue which prevented it from re-imaging correctly. Techstop concluded that my machine can no longer boot, so they ordered me a new machine.

*Unfortunately, I won't be able to start working on this again until I get a new machine. They said that they hope to get me a new machine within a week.

Sorry for the delay :(
I would offer to work with you on my workstation on this, but looks like you're in LA :( Is there anyone else in your office with a setup? I feel like it shouldn't be hard to reproduce or debug.
Cc: wutao@chromium.org sa...@chromium.org
Turns out the problem is a Chrome crash at login:

#0  (anonymous namespace)::do_malloc(unsigned long) () at ../../third_party/tcmalloc/gperftools-2.0/chromium/src/free_list.h:91
#1  0x00005b60932860ed in tc_malloc () at ../../third_party/tcmalloc/gperftools-2.0/chromium/src/tcmalloc.cc:1045
#2  0x00005b608e86e0ee in operator new(unsigned long, std::nothrow_t const&) () at ../../base/allocator/allocator_shim.cc:159
#3  0x00005b608ce53729 in content::RenderWidgetHostImpl::SetActive(bool) () at ../../content/browser/renderer_host/render_widget_host_impl.cc:1101
#4  0x00005b608ce661ac in content::RenderWidgetHostViewAura::OnWindowFocused(aura::Window*, aura::Window*) () at ../../content/browser/renderer_host/render_widget_host_view_aura.cc:1837
#5  0x00005b60922cf861 in wm::FocusController::SetFocusedWindow(aura::Window*) () at ../../ui/wm/core/focus_controller.cc:267
#6  0x00005b608fc85022 in aura::Window::NotifyWindowVisibilityChangedAtReceiver(aura::Window*, bool) () at ../../ui/aura/window.cc:1073
#7  0x00005b608fc84c50 in aura::Window::NotifyWindowVisibilityChangedDown(aura::Window*, bool) () at ../../ui/aura/window.cc:1079
#8  0x00005b608fc819ff in aura::Window::SetVisible(bool) () at ../../ui/aura/window.cc:1060
#9  0x00005b608d6e7b4d in chromeos::LoginDisplayHostWebUI::ResetLoginWindowAndView() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:1056
#10 0x00005b608d6e79a6 in chromeos::LoginDisplayHostWebUI::~LoginDisplayHostWebUI() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:473
#11 0x00005b608d6e7cae in chromeos::LoginDisplayHostWebUI::~LoginDisplayHostWebUI() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:448
#12 0x00005b608e876391 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) () at ../../base/callback.h:99
#13 0x00005b608e7e983f in base::MessageLoop::RunTask(base::PendingTask*) () at ../../base/message_loop/message_loop_impl.cc:545
#14 0x00005b608e7e9be2 in base::MessageLoop::DoWork() () at ../../base/message_loop/message_loop_impl.cc:556
#15 0x00005b608e8732f9 in base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) () at ../../base/message_loop/message_pump_libevent.cc:210
#16 0x00005b608e808405 in base::RunLoop::Run() () at ../../base/run_loop.cc:102
#17 0x00005b608e44adca in ChromeBrowserMainParts::MainMessageLoopRun(int*) () at ../../chrome/browser/chrome_browser_main.cc:1908
#18 0x00005b608cb52344 in content::BrowserMainLoop::RunMainMessageLoopParts() () at ../../content/browser/browser_main_loop.cc:994
#19 0x00005b608cb54b52 in content::BrowserMainRunnerImpl::Run() () at ../../content/browser/browser_main_runner_impl.cc:165
#20 0x00005b608cb4eca8 in content::BrowserMain(content::MainFunctionParams const&) () at ../../content/browser/browser_main.cc:47
#21 0x00005b608e43d3a6 in content::ContentMainRunnerImpl::RunServiceManager(content::MainFunctionParams&, bool) () at ../../content/app/content_main_runner_impl.cc:547
#22 0x00005b608e43d269 in content::ContentMainRunnerImpl::Run(bool) () at ../../content/app/content_main_runner_impl.cc:868
#23 0x00005b608e4439f2 in service_manager::Main(service_manager::MainParams const&) () at ../../services/service_manager/embedder/main.cc:472
#24 0x00005b608e43b701 in content::ContentMain(content::ContentMainParams const&) () at ../../content/app/content_main.cc:19
#25 0x00005b608bdd993f in ChromeMain () at ../../chrome/app/chrome_main.cc:102
#26 0x00007e040e3c7736 in __libc_start_main (main=0x5b608bdd98a0 <main>, argc=54, argv=0x7ffc7c43bb88, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc7c43bb78) at #27 0x00005b608bdd9769 in _start ()

Nothing jumps out at me as an obvious link to my CL, so I'll do some more digging.
Cc: alemate@chromium.org xiy...@chromium.org jdufault@chromium.org
After further investigation, I've realized that the crash does not occur at the same spot every time. Another sample stack trace is below:

#0  0x000079770d61e947 in __libc_send (fd=187, buf=0x17e3db355cc0, n=136, flags=16384) at ../sysdeps/unix/sysv/linux/x86_64/send.c:26
#1  0x0000568eaf8bc4b8 in mojo::core::(anonymous namespace)::ChannelPosix::WriteNoLock(mojo::core::(anonymous namespace)::MessageView) () at ../../mojo/core/channel_posix.cc:551
#2  0x0000568eaf8baba5 in mojo::core::(anonymous namespace)::ChannelPosix::Write(std::__1::unique_ptr<mojo::core::Channel::Message, std::__1::default_delete<mojo::core::Channel::Message> >) () at ../../mojo/core/channel_posix.cc:168
#3  0x0000568eaf8a9064 in mojo::core::NodeChannel::WriteChannelMessage(std::__1::unique_ptr<mojo::core::Channel::Message, std::__1::default_delete<mojo::core::Channel::Message> >) () at ../../mojo/core/node_channel.cc:727
#4  0x0000568eaf8a9a3d in mojo::core::NodeChannel::SendChannelMessage(std::__1::unique_ptr<mojo::core::Channel::Message, std::__1::default_delete<mojo::core::Channel::Message> >) () at ../../mojo/core/node_channel.cc:362
#5  0x0000568eaf8adb16 in mojo::core::NodeController::SendPeerEvent(mojo::core::ports::NodeName const&, std::__1::unique_ptr<mojo::core::ports::Event, std::__1::default_delete<mojo::core::ports::Event> >) () at ../../mojo/core/node_controller.cc:616
#6  0x0000568eaf8ae064 in mojo::core::NodeController::ForwardEvent(mojo::core::ports::NodeName const&, std::__1::unique_ptr<mojo::core::ports::Event, std::__1::default_delete<mojo::core::ports::Event> >) () at ../../mojo/core/node_controller.cc:694
#7  0x0000568eb2d4ecab in mojo::core::ports::Node::SendUserMessageInternal(mojo::core::ports::PortRef const&, std::__1::unique_ptr<mojo::core::ports::UserMessageEvent, std::__1::default_delete<mojo::core::ports::UserMessageEvent> >*) () at ../../mojo/core/ports/node.cc:841
#8  0x0000568eb2d4eb0d in mojo::core::ports::Node::SendUserMessage(mojo::core::ports::PortRef const&, std::__1::unique_ptr<mojo::core::ports::UserMessageEvent, std::__1::default_delete<mojo::core::ports::UserMessageEvent> >) () at ../../mojo/core/ports/node.cc:362
#9  0x0000568eaf8ac7f1 in mojo::core::NodeController::SendUserMessage(mojo::core::ports::PortRef const&, std::__1::unique_ptr<mojo::core::ports::UserMessageEvent, std::__1::default_delete<mojo::core::ports::UserMessageEvent> >) () at ../../mojo/core/node_controller.cc:257
#10 0x0000568eaf8a8042 in mojo::core::MessagePipeDispatcher::WriteMessage(std::__1::unique_ptr<mojo::core::ports::UserMessageEvent, std::__1::default_delete<mojo::core::ports::UserMessageEvent> >) () at ../../mojo/core/message_pipe_dispatcher.cc:143
#11 0x0000568eaf89e0ae in mojo::core::Core::WriteMessage(unsigned int, unsigned long, MojoWriteMessageOptions const*) () at ../../mojo/core/core.cc:572
#12 0x0000568eb1829cd8 in mojo::Connector::Accept(mojo::Message*) () at ../../mojo/public/cpp/system/message_pipe.h:97
#13 0x0000568eb182b48d in mojo::InterfaceEndpointClient::AcceptWithResponder(mojo::Message*, std::__1::unique_ptr<mojo::MessageReceiver, std::__1::default_delete<mojo::MessageReceiver> >) () at ../../mojo/public/cpp/bindings/lib/interface_endpoint_client.cc:269
#14 0x0000568eaf7c6dcb in viz::mojom::InputTargetClientProxy::FrameSinkIdAt(gfx::PointF const&, unsigned long, base::OnceCallback<void (viz::FrameSinkId const&, gfx::PointF const&)>) () at gen/services/viz/public/interfaces/hit_test/input_target_client.mojom.cc:95
#15 0x0000568eafdc7481 in content::RenderWidgetTargeter::QueryClientInternal(content::RenderWidgetHostViewBase*, content::RenderWidgetHostViewBase*, blink::WebInputEvent const&, ui::LatencyInfo const&, gfx::PointF const&, content::RenderWidgetHostViewBase*, gfx::PointF const&, viz::FrameSinkId const&) () at ../../content/browser/renderer_host/render_widget_targeter.cc:257
#16 0x0000568eafdc682f in content::RenderWidgetTargeter::FindTargetAndDispatch(content::RenderWidgetHostViewBase*, blink::WebInputEvent const&, ui::LatencyInfo const&) () at ../../content/browser/renderer_host/render_widget_targeter.cc:275
#17 0x0000568eaff46609 in content::RenderWidgetHostViewEventHandler::OnMouseEvent(ui::MouseEvent*) () at ../../content/browser/renderer_host/render_widget_host_view_event_handler.cc:396
#18 0x0000568eb202b6f6 in ui::EventDispatcher::ProcessEvent(ui::EventTarget*, ui::Event*) () at ../../ui/events/event_dispatcher.cc:193
#19 0x0000568eb202b4d1 in ui::EventDispatcherDelegate::DispatchEventToTarget(ui::EventTarget*, ui::Event*) () at ../../ui/events/event_dispatcher.cc:86
#20 0x0000568eb202b413 in ui::EventDispatcherDelegate::DispatchEvent(ui::EventTarget*, ui::Event*) () at ../../ui/events/event_dispatcher.cc:58
#21 0x0000568eb2b50792 in aura::WindowEventDispatcher::DispatchMouseEnterOrExit(aura::Window*, ui::MouseEvent const&, ui::EventType) () at ../../ui/aura/window_event_dispatcher.cc:336
#22 0x0000568eb2b510cf in aura::WindowEventDispatcher::DispatchMouseExitToHidingWindow(aura::Window*) () at ../../ui/aura/window_event_dispatcher.cc:195
#23 0x0000568eb2b4a25e in aura::Window::SetVisible(bool) () at ../../ui/aura/window.cc:868
#24 0x0000568eb064808d in chromeos::LoginDisplayHostWebUI::ResetLoginWindowAndView() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:1056
#25 0x0000568eb0647ee6 in chromeos::LoginDisplayHostWebUI::~LoginDisplayHostWebUI() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:473
#26 0x0000568eb06481ee in chromeos::LoginDisplayHostWebUI::~LoginDisplayHostWebUI() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:448
#27 0x0000568eb17d5e51 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) () at ../../base/callback.h:99
#28 0x0000568eb174385f in base::MessageLoopImpl::RunTask(base::PendingTask*) () at ../../base/message_loop/message_loop_impl.cc:404
#29 0x0000568eb1743da2 in base::MessageLoopImpl::DoWork() () at ../../base/message_loop/message_loop_impl.cc:415
#30 0x0000568eb17d27b9 in base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) () at ../../base/message_loop/message_pump_libevent.cc:210
#31 0x0000568eb1763a75 in base::RunLoop::Run() () at ../../base/run_loop.cc:102
#32 0x0000568eb13a125a in ChromeBrowserMainParts::MainMessageLoopRun(int*) () at ../../chrome/browser/chrome_browser_main.cc:1908
#33 0x0000568eafa9e404 in content::BrowserMainLoop::RunMainMessageLoopParts() () at ../../content/browser/browser_main_loop.cc:993
#34 0x0000568eafaa0cb2 in content::BrowserMainRunnerImpl::Run() () at ../../content/browser/browser_main_runner_impl.cc:165
#35 0x0000568eafa9ad68 in content::BrowserMain(content::MainFunctionParams const&) () at ../../content/browser/browser_main.cc:47
#36 0x0000568eb1393728 in content::ContentMainRunnerImpl::Run(bool) () at ../../content/app/content_main_runner_impl.cc:537
#37 0x0000568eb1399dbd in service_manager::Main(service_manager::MainParams const&) () at ../../services/service_manager/embedder/main.cc:472
#38 0x0000568eb1391951 in content::ContentMain(content::ContentMainParams const&) () at ../../content/app/content_main.cc:19
#39 0x0000568eaed6a95f in ChromeMain () at ../../chrome/app/chrome_main.cc:102
#40 0x000079770ca03736 in __libc_start_main (main=0x568eaed6a8c0 <main>, argc=54, argv=0x7ffe6d3e2e28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe6d3e2e18) at ../csu/libc-start.c:289
#41 0x0000568eaed6a789 in _start ()

That being said, I have noticed that each stack trace starts due to the fact that chromeos::LoginDisplayHostWebUI's destructor calls ResetLoginWindowAndView(), which invokes aura::Window::SetVisible() via the Hide() call here: https://cs.chromium.org/chromium/src/chrome/browser/chromeos/login/ui/login_display_host_webui.cc?l=1056&q=Hide

Do any //chrome/browser/chromeos/login OWNERS have any insights here? I'm still having a hard time figuring out how this is related to my CL, which didn't touch any of these files.
Sounds like  issue 900615 , where we could double delete LoginDisplayHostWebUI.

The fix on truck should be available after 72.0.3600.0 and merge to M71 after 71.0.3578.38. 

Does the version you test have the fix?
Cc: r...@chromium.org khorimoto@chromium.org
Owner: ----
Status: Available (was: Started)
xiyuan@ and I dug through this issue for a few hours - here are our findings:

(1) The autotest failure is caused by a Chrome crash at login. The crash occurs with or without my CL (see comment #4).
(2) Just before the crash, we see this log: "Policy fetch failed for the user. Aborting profile initialization", which is produced by [1].
(3) When the policy fetch fails, a callback is invoked, which ends up calling the OnUserPolicyFatalError() function at [2].
(4) This function results in chrome::AttemptUserExit() (i.e., a user sign-out).
(5) This user signout causes the login window to be deleted, but the LoginDisplayHostWebui class at [3] does not know that the window is deleted.
(6) Thus, when LoginDisplayHostWebui is deleted, it causes a segfault by trying to hide a window that has been deleted.
(7) I tried to work around this issue by making LoginDisplayHostWebui observe |login_window_| and clearing the reference in OnWidgetDestroyed(). This did bypass the original crash in ~LoginDisplayHostWebui(), but it led to another crash, this time in ash.

Since I don't work on login or ash, I don't think I'm the best person to continue looking into this issue considering that the crash exists with or without my original CL in question.

[1] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/policy/user_cloud_policy_manager_chromeos.cc?q="Policy+fetch+failed+for+the+user"
[2] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/policy/user_policy_manager_factory_chromeos.cc?q=OnUserPolicyFatalError
[3] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/login/ui/login_display_host_webui.h
Owner: jdufault@chromium.org
Status: Assigned (was: Available)
Is this a possible duplicate of  issue 897278 ?
Re: comment #20: It does seem to be failing consistently for me on local test runs.
Owner: alemate@chromium.org
alemate@ can you investigate?
Cc: updowndota@chromium.org weidongg@chromium.org
Cc: nverne@chromium.org agawronska@chromium.org
Cc: -agawronska@chromium.org -nverne@chromium.org
Components: -UI>Multidevice

Sign in to add a comment