Autotest - Telemetry connection failures on policy tests |
|||||||||||||||||
Issue descriptionAffects about 75% of policy test runs. E.g. https://stainless.corp.google.com/search?view=list&test=%5Epolicy_CookiesBlockedForUrls%5C.notset_allow%24&build=%5ER72%5C-11162%5C.0%5C.0%24 Failures include: "Devtools client not yet ready: [Errno 111] Connection refused" "Failed with DevToolsClientConnectionError while starting the browser backend" The timing is right for it to have been the latest Catapult roll: https://chromium.googlesource.com/chromium/src/+/f4e88862a721eb17ed886760a45ee8292aae94a0 Only affects tests which set policies through fake DMS.
,
Oct 16
That catapult roll looks innocuous. The catapult roll would need to propagate through the PFQ before it caused problems.
,
Oct 17
I did a bisect, and the first failing LKGM is 11152.0.0, which is Chromium 71.0.3578.0. LKGM 11151.0.0 passes, and that's Chromium 71.0.3575.1. This is the full change log: https://chromium.googlesource.com/chromium/src/+log/71.0.3575.0..71.0.3578.0?pretty=fuller&n=10000
,
Oct 18
Bisect has revealed that this change is to blame: https://chromium-review.googlesource.com/c/chromium/src/+/1275307
,
Oct 18
You can run the test with: test_that --board=amd64-generic localhost:9222 policy_CookiesBlockedForUrls.notset_allow
,
Oct 18
,
Oct 18
kathrelkeld@: I'm looking into this issue, but I'm having a hard time figuring out where to start and how I can look into what's going wrong (sorry - I've never worked on Autotest or Chromium OS). Can you point me to the test that's failing? I'm not sure why my CL would cause any changes to Devtools - can you help point me in the right direction?
,
Oct 18
Update: kathrelkeld@ helped me get the test running locally. I'll see what I can do to fix this. Unfortunately, my team is moving to another building, so we need to unplug our computers and will not have access until Monday, so I won't be able to make progress with this issue until then. Stay tuned...
,
Oct 22
,
Oct 22
,
Oct 25
,
Oct 26
Update: *I didn't have a Chromium OS environment set up, so I started going through through the setup flow here: https://chromium.googlesource.com/chromiumos/docs/+/master/developer_guide.md *I didn't have enough space on my hard drive to finish setup. I talked to Techstop, and said I needed to re-image my machine in order to partition my disk so that it would support the Chromium OS setup. *With the help of Techstop, I attempted to re-image my machine, but we ran into issues. Apparently, my machine has some sort of hardware issue which prevented it from re-imaging correctly. Techstop concluded that my machine can no longer boot, so they ordered me a new machine. *Unfortunately, I won't be able to start working on this again until I get a new machine. They said that they hope to get me a new machine within a week. Sorry for the delay :(
,
Oct 26
I would offer to work with you on my workstation on this, but looks like you're in LA :( Is there anyone else in your office with a setup? I feel like it shouldn't be hard to reproduce or debug.
,
Nov 5
,
Nov 7
Turns out the problem is a Chrome crash at login: #0 (anonymous namespace)::do_malloc(unsigned long) () at ../../third_party/tcmalloc/gperftools-2.0/chromium/src/free_list.h:91 #1 0x00005b60932860ed in tc_malloc () at ../../third_party/tcmalloc/gperftools-2.0/chromium/src/tcmalloc.cc:1045 #2 0x00005b608e86e0ee in operator new(unsigned long, std::nothrow_t const&) () at ../../base/allocator/allocator_shim.cc:159 #3 0x00005b608ce53729 in content::RenderWidgetHostImpl::SetActive(bool) () at ../../content/browser/renderer_host/render_widget_host_impl.cc:1101 #4 0x00005b608ce661ac in content::RenderWidgetHostViewAura::OnWindowFocused(aura::Window*, aura::Window*) () at ../../content/browser/renderer_host/render_widget_host_view_aura.cc:1837 #5 0x00005b60922cf861 in wm::FocusController::SetFocusedWindow(aura::Window*) () at ../../ui/wm/core/focus_controller.cc:267 #6 0x00005b608fc85022 in aura::Window::NotifyWindowVisibilityChangedAtReceiver(aura::Window*, bool) () at ../../ui/aura/window.cc:1073 #7 0x00005b608fc84c50 in aura::Window::NotifyWindowVisibilityChangedDown(aura::Window*, bool) () at ../../ui/aura/window.cc:1079 #8 0x00005b608fc819ff in aura::Window::SetVisible(bool) () at ../../ui/aura/window.cc:1060 #9 0x00005b608d6e7b4d in chromeos::LoginDisplayHostWebUI::ResetLoginWindowAndView() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:1056 #10 0x00005b608d6e79a6 in chromeos::LoginDisplayHostWebUI::~LoginDisplayHostWebUI() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:473 #11 0x00005b608d6e7cae in chromeos::LoginDisplayHostWebUI::~LoginDisplayHostWebUI() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:448 #12 0x00005b608e876391 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) () at ../../base/callback.h:99 #13 0x00005b608e7e983f in base::MessageLoop::RunTask(base::PendingTask*) () at ../../base/message_loop/message_loop_impl.cc:545 #14 0x00005b608e7e9be2 in base::MessageLoop::DoWork() () at ../../base/message_loop/message_loop_impl.cc:556 #15 0x00005b608e8732f9 in base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) () at ../../base/message_loop/message_pump_libevent.cc:210 #16 0x00005b608e808405 in base::RunLoop::Run() () at ../../base/run_loop.cc:102 #17 0x00005b608e44adca in ChromeBrowserMainParts::MainMessageLoopRun(int*) () at ../../chrome/browser/chrome_browser_main.cc:1908 #18 0x00005b608cb52344 in content::BrowserMainLoop::RunMainMessageLoopParts() () at ../../content/browser/browser_main_loop.cc:994 #19 0x00005b608cb54b52 in content::BrowserMainRunnerImpl::Run() () at ../../content/browser/browser_main_runner_impl.cc:165 #20 0x00005b608cb4eca8 in content::BrowserMain(content::MainFunctionParams const&) () at ../../content/browser/browser_main.cc:47 #21 0x00005b608e43d3a6 in content::ContentMainRunnerImpl::RunServiceManager(content::MainFunctionParams&, bool) () at ../../content/app/content_main_runner_impl.cc:547 #22 0x00005b608e43d269 in content::ContentMainRunnerImpl::Run(bool) () at ../../content/app/content_main_runner_impl.cc:868 #23 0x00005b608e4439f2 in service_manager::Main(service_manager::MainParams const&) () at ../../services/service_manager/embedder/main.cc:472 #24 0x00005b608e43b701 in content::ContentMain(content::ContentMainParams const&) () at ../../content/app/content_main.cc:19 #25 0x00005b608bdd993f in ChromeMain () at ../../chrome/app/chrome_main.cc:102 #26 0x00007e040e3c7736 in __libc_start_main (main=0x5b608bdd98a0 <main>, argc=54, argv=0x7ffc7c43bb88, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc7c43bb78) at #27 0x00005b608bdd9769 in _start () Nothing jumps out at me as an obvious link to my CL, so I'll do some more digging.
,
Nov 7
After further investigation, I've realized that the crash does not occur at the same spot every time. Another sample stack trace is below: #0 0x000079770d61e947 in __libc_send (fd=187, buf=0x17e3db355cc0, n=136, flags=16384) at ../sysdeps/unix/sysv/linux/x86_64/send.c:26 #1 0x0000568eaf8bc4b8 in mojo::core::(anonymous namespace)::ChannelPosix::WriteNoLock(mojo::core::(anonymous namespace)::MessageView) () at ../../mojo/core/channel_posix.cc:551 #2 0x0000568eaf8baba5 in mojo::core::(anonymous namespace)::ChannelPosix::Write(std::__1::unique_ptr<mojo::core::Channel::Message, std::__1::default_delete<mojo::core::Channel::Message> >) () at ../../mojo/core/channel_posix.cc:168 #3 0x0000568eaf8a9064 in mojo::core::NodeChannel::WriteChannelMessage(std::__1::unique_ptr<mojo::core::Channel::Message, std::__1::default_delete<mojo::core::Channel::Message> >) () at ../../mojo/core/node_channel.cc:727 #4 0x0000568eaf8a9a3d in mojo::core::NodeChannel::SendChannelMessage(std::__1::unique_ptr<mojo::core::Channel::Message, std::__1::default_delete<mojo::core::Channel::Message> >) () at ../../mojo/core/node_channel.cc:362 #5 0x0000568eaf8adb16 in mojo::core::NodeController::SendPeerEvent(mojo::core::ports::NodeName const&, std::__1::unique_ptr<mojo::core::ports::Event, std::__1::default_delete<mojo::core::ports::Event> >) () at ../../mojo/core/node_controller.cc:616 #6 0x0000568eaf8ae064 in mojo::core::NodeController::ForwardEvent(mojo::core::ports::NodeName const&, std::__1::unique_ptr<mojo::core::ports::Event, std::__1::default_delete<mojo::core::ports::Event> >) () at ../../mojo/core/node_controller.cc:694 #7 0x0000568eb2d4ecab in mojo::core::ports::Node::SendUserMessageInternal(mojo::core::ports::PortRef const&, std::__1::unique_ptr<mojo::core::ports::UserMessageEvent, std::__1::default_delete<mojo::core::ports::UserMessageEvent> >*) () at ../../mojo/core/ports/node.cc:841 #8 0x0000568eb2d4eb0d in mojo::core::ports::Node::SendUserMessage(mojo::core::ports::PortRef const&, std::__1::unique_ptr<mojo::core::ports::UserMessageEvent, std::__1::default_delete<mojo::core::ports::UserMessageEvent> >) () at ../../mojo/core/ports/node.cc:362 #9 0x0000568eaf8ac7f1 in mojo::core::NodeController::SendUserMessage(mojo::core::ports::PortRef const&, std::__1::unique_ptr<mojo::core::ports::UserMessageEvent, std::__1::default_delete<mojo::core::ports::UserMessageEvent> >) () at ../../mojo/core/node_controller.cc:257 #10 0x0000568eaf8a8042 in mojo::core::MessagePipeDispatcher::WriteMessage(std::__1::unique_ptr<mojo::core::ports::UserMessageEvent, std::__1::default_delete<mojo::core::ports::UserMessageEvent> >) () at ../../mojo/core/message_pipe_dispatcher.cc:143 #11 0x0000568eaf89e0ae in mojo::core::Core::WriteMessage(unsigned int, unsigned long, MojoWriteMessageOptions const*) () at ../../mojo/core/core.cc:572 #12 0x0000568eb1829cd8 in mojo::Connector::Accept(mojo::Message*) () at ../../mojo/public/cpp/system/message_pipe.h:97 #13 0x0000568eb182b48d in mojo::InterfaceEndpointClient::AcceptWithResponder(mojo::Message*, std::__1::unique_ptr<mojo::MessageReceiver, std::__1::default_delete<mojo::MessageReceiver> >) () at ../../mojo/public/cpp/bindings/lib/interface_endpoint_client.cc:269 #14 0x0000568eaf7c6dcb in viz::mojom::InputTargetClientProxy::FrameSinkIdAt(gfx::PointF const&, unsigned long, base::OnceCallback<void (viz::FrameSinkId const&, gfx::PointF const&)>) () at gen/services/viz/public/interfaces/hit_test/input_target_client.mojom.cc:95 #15 0x0000568eafdc7481 in content::RenderWidgetTargeter::QueryClientInternal(content::RenderWidgetHostViewBase*, content::RenderWidgetHostViewBase*, blink::WebInputEvent const&, ui::LatencyInfo const&, gfx::PointF const&, content::RenderWidgetHostViewBase*, gfx::PointF const&, viz::FrameSinkId const&) () at ../../content/browser/renderer_host/render_widget_targeter.cc:257 #16 0x0000568eafdc682f in content::RenderWidgetTargeter::FindTargetAndDispatch(content::RenderWidgetHostViewBase*, blink::WebInputEvent const&, ui::LatencyInfo const&) () at ../../content/browser/renderer_host/render_widget_targeter.cc:275 #17 0x0000568eaff46609 in content::RenderWidgetHostViewEventHandler::OnMouseEvent(ui::MouseEvent*) () at ../../content/browser/renderer_host/render_widget_host_view_event_handler.cc:396 #18 0x0000568eb202b6f6 in ui::EventDispatcher::ProcessEvent(ui::EventTarget*, ui::Event*) () at ../../ui/events/event_dispatcher.cc:193 #19 0x0000568eb202b4d1 in ui::EventDispatcherDelegate::DispatchEventToTarget(ui::EventTarget*, ui::Event*) () at ../../ui/events/event_dispatcher.cc:86 #20 0x0000568eb202b413 in ui::EventDispatcherDelegate::DispatchEvent(ui::EventTarget*, ui::Event*) () at ../../ui/events/event_dispatcher.cc:58 #21 0x0000568eb2b50792 in aura::WindowEventDispatcher::DispatchMouseEnterOrExit(aura::Window*, ui::MouseEvent const&, ui::EventType) () at ../../ui/aura/window_event_dispatcher.cc:336 #22 0x0000568eb2b510cf in aura::WindowEventDispatcher::DispatchMouseExitToHidingWindow(aura::Window*) () at ../../ui/aura/window_event_dispatcher.cc:195 #23 0x0000568eb2b4a25e in aura::Window::SetVisible(bool) () at ../../ui/aura/window.cc:868 #24 0x0000568eb064808d in chromeos::LoginDisplayHostWebUI::ResetLoginWindowAndView() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:1056 #25 0x0000568eb0647ee6 in chromeos::LoginDisplayHostWebUI::~LoginDisplayHostWebUI() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:473 #26 0x0000568eb06481ee in chromeos::LoginDisplayHostWebUI::~LoginDisplayHostWebUI() () at ../../chrome/browser/chromeos/login/ui/login_display_host_webui.cc:448 #27 0x0000568eb17d5e51 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) () at ../../base/callback.h:99 #28 0x0000568eb174385f in base::MessageLoopImpl::RunTask(base::PendingTask*) () at ../../base/message_loop/message_loop_impl.cc:404 #29 0x0000568eb1743da2 in base::MessageLoopImpl::DoWork() () at ../../base/message_loop/message_loop_impl.cc:415 #30 0x0000568eb17d27b9 in base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) () at ../../base/message_loop/message_pump_libevent.cc:210 #31 0x0000568eb1763a75 in base::RunLoop::Run() () at ../../base/run_loop.cc:102 #32 0x0000568eb13a125a in ChromeBrowserMainParts::MainMessageLoopRun(int*) () at ../../chrome/browser/chrome_browser_main.cc:1908 #33 0x0000568eafa9e404 in content::BrowserMainLoop::RunMainMessageLoopParts() () at ../../content/browser/browser_main_loop.cc:993 #34 0x0000568eafaa0cb2 in content::BrowserMainRunnerImpl::Run() () at ../../content/browser/browser_main_runner_impl.cc:165 #35 0x0000568eafa9ad68 in content::BrowserMain(content::MainFunctionParams const&) () at ../../content/browser/browser_main.cc:47 #36 0x0000568eb1393728 in content::ContentMainRunnerImpl::Run(bool) () at ../../content/app/content_main_runner_impl.cc:537 #37 0x0000568eb1399dbd in service_manager::Main(service_manager::MainParams const&) () at ../../services/service_manager/embedder/main.cc:472 #38 0x0000568eb1391951 in content::ContentMain(content::ContentMainParams const&) () at ../../content/app/content_main.cc:19 #39 0x0000568eaed6a95f in ChromeMain () at ../../chrome/app/chrome_main.cc:102 #40 0x000079770ca03736 in __libc_start_main (main=0x568eaed6a8c0 <main>, argc=54, argv=0x7ffe6d3e2e28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe6d3e2e18) at ../csu/libc-start.c:289 #41 0x0000568eaed6a789 in _start () That being said, I have noticed that each stack trace starts due to the fact that chromeos::LoginDisplayHostWebUI's destructor calls ResetLoginWindowAndView(), which invokes aura::Window::SetVisible() via the Hide() call here: https://cs.chromium.org/chromium/src/chrome/browser/chromeos/login/ui/login_display_host_webui.cc?l=1056&q=Hide Do any //chrome/browser/chromeos/login OWNERS have any insights here? I'm still having a hard time figuring out how this is related to my CL, which didn't touch any of these files.
,
Nov 7
Sounds like issue 900615 , where we could double delete LoginDisplayHostWebUI. The fix on truck should be available after 72.0.3600.0 and merge to M71 after 71.0.3578.38. Does the version you test have the fix?
,
Nov 8
xiyuan@ and I dug through this issue for a few hours - here are our findings: (1) The autotest failure is caused by a Chrome crash at login. The crash occurs with or without my CL (see comment #4). (2) Just before the crash, we see this log: "Policy fetch failed for the user. Aborting profile initialization", which is produced by [1]. (3) When the policy fetch fails, a callback is invoked, which ends up calling the OnUserPolicyFatalError() function at [2]. (4) This function results in chrome::AttemptUserExit() (i.e., a user sign-out). (5) This user signout causes the login window to be deleted, but the LoginDisplayHostWebui class at [3] does not know that the window is deleted. (6) Thus, when LoginDisplayHostWebui is deleted, it causes a segfault by trying to hide a window that has been deleted. (7) I tried to work around this issue by making LoginDisplayHostWebui observe |login_window_| and clearing the reference in OnWidgetDestroyed(). This did bypass the original crash in ~LoginDisplayHostWebui(), but it led to another crash, this time in ash. Since I don't work on login or ash, I don't think I'm the best person to continue looking into this issue considering that the crash exists with or without my original CL in question. [1] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/policy/user_cloud_policy_manager_chromeos.cc?q="Policy+fetch+failed+for+the+user" [2] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/policy/user_policy_manager_factory_chromeos.cc?q=OnUserPolicyFatalError [3] https://cs.chromium.org/chromium/src/chrome/browser/chromeos/login/ui/login_display_host_webui.h
,
Nov 8
,
Nov 8
I'm confused by this; is this still failing? Using the stainless link in #1 shows all runs (except 1 likely unrelated flake) passing from 11/08 to 11/02. https://stainless.corp.google.com/search?view=list&first_date=2018-11-02&last_date=2018-11-08&test=%5Epolicy_CookiesBlockedForUrls%5C.notset_allow%24&exclude_cts=false&exclude_not_run=true&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Nov 8
Is this a possible duplicate of issue 897278 ?
,
Nov 8
Re: comment #20: It does seem to be failing consistently for me on local test runs.
,
Nov 8
alemate@ can you investigate?
,
Nov 12
,
Nov 19
,
Nov 26
,
Dec 26
|
|||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||
Comment 1 by kathrelkeld@google.com
, Oct 16Labels: ent-automation OS-Chrome