Issue metadata
Sign in to add a comment
|
chrome://tracing hangs after recording |
||||||||||||||||||||||
Issue descriptionChrome Version: tested w/ 52.0.2734.0 & 52.0.2739.0 Chrome OS Version: 8343.0.0 Chrome OS Platform: oak & peach_pi Steps To Reproduce: (1) chrome://tracing (2) Record (3) Select anything in "Record a new trace..." (4) click Record (5) click Stop Expected Result: Chrome process buffer and shows a trace. Actual Result: " Recording... Buffer usage: N% " No progress... How frequently does this problem reproduce? (Always, sometimes, hard to reproduce?) Always. What is the impact to the user, and is there a workaround? If so, what is it? Cannot record a trace. Note: This was working yesterday, but I no longer remember what Chrome OS version I was using. However, it is still broken with 8343.0.0 chrome OS and an older (yesterday's) Chrome. So, I think this may be Chrome OS related (somehow).
,
May 19 2016
,
May 19 2016
,
May 20 2016
Bisected to this change in chromiumos-overlay: 95eb890655de64cbff74eeb0e8297471074f8643 Uprev all AOSP ebuilds to the version in Chromium OS. This change updates the following packages: chromeos-base/apmanager/apmanager-0.0.1-r1069.ebuild chromeos-base/attestation/attestation-0.0.1-r1064.ebuild chromeos-base/chromeos-dbus-bindings/chromeos-dbus-bindings-0.0.1-r1080.ebuild chromeos-base/chromeos-minijail/chromeos-minijail-0.0.1-r1464.ebuild chromeos-base/firewalld/firewalld-0.0.1-r858.ebuild chromeos-base/libbrillo/libbrillo-0.0.1-r22.ebuild chromeos-base/shill-client/shill-client-0.0.1-r54.ebuild chromeos-base/shill-test-scripts/shill-test-scripts-0.0.1-r1804.ebuild chromeos-base/shill/shill-0.0.3-r13.ebuild chromeos-base/tpm_manager/tpm_manager-0.0.1-r346.ebuild chromeos-base/trunks/trunks-0.0.1-r1195.ebuild chromeos-base/update_engine-client/update_engine-client-0.0.1-r39.ebuild chromeos-base/update_engine/update_engine-0.0.3-r1899.ebuild chromeos-base/webserver/webserver-0.0.1-r681.ebuild
,
May 20 2016
Small breakthrough... chrome://tracing will complete if you de-select the "System tracing" checkbox.
,
May 20 2016
Fwiw, I was able to capture chrome traces and ftraces with https://chromium-review.googlesource.com/#/c/314413/. I used: simple_chrome_tracing --chrome-trace --ftrace -t 10 http://webglsamples.org/aquarium/aquarium.html to capture 10 seconds of traces on webgl aquarium. I do notice that it took noticeably longer than before for the script to collect the chrome traces from the browser though.
,
May 20 2016
I traced to chromeos/dbus/debug_daemon_client.cc:
// Called when a response for StopAgentTracing() is received.
void OnStopAgentTracing(dbus::Response* response) {
if (!response) { <===== DJK: replace this with "true" [0]
LOG(ERROR) << "Failed to request systrace stop";
// If debugd crashes or completes I/O before this message is processed
// then pipe_reader_ can be NULL, see OnIOComplete().
if (pipe_reader_.get())
pipe_reader_->OnDataReady(-1); // terminate data stream
}
// NB: requester is signaled when i/o completes
}
[0] if I force OnStopAgentTracing to terminate the systrace data stream, then tracing can continue processing the trace on "Stop".
So, for some reason, it looks like debugd is not terminating the data pipe by itself when it receives a "debugd::kSystraceStop" event after Chrome has finished reading all of the data it has.
,
May 20 2016
(I don't know anything about tracing or debugd.)
,
May 20 2016
Most recent change to debugd is https://chromium.git.corp.google.com/chromiumos/platform2/+/16d356558338a7cd852901eddf5866bb5b0f33bd +dhsharp
,
May 23 2016
@afakhry - reverting that last change doesn't fix the issue. Actually, if you look above, debugd is not in the list of ebuilds that changed with the breaking patch. However, libbrillo did change, and debugd depends on it. In fact, reverting this recent change to libbrillo fixes the issue: commit 476d59a8142cd756de99f8811124ea00f333eafc Author: Alex Deymo <deymo@google.com> Date: Tue Mar 29 17:10:10 2016 -0700 Fix Process::BindFd when passing the same fd to the child. Process::BindFd() allows to map a file descriptor in the parent to a file descriptor in the child. This was normally used to redirect the child's stdout or stderr to a given file descriptor in the parent, like an open file or a pipe. This didn't work when binding a file descriptor in the parent to the same number in the child, used for example to make available to the child a file descriptor opened in the parent, but close all the other file descriptors with SetCloseUnusedFileDescriptors. Bug: 27880754 TEST=Added unittests for that case. Revert uploaded as: https://chromium-review.googlesource.com/346453
,
May 24 2016
As suggested by deymo, CC'ing avakulenko@chromium.org (the only libbrillo owner not currently on this issue).
,
May 24 2016
Some logging, showing the interaction between debugd and libbrillo: [ERROR:process_with_output.cc(109)] RunProcess:109 [ERROR:process_with_output.cc(112)] RunProcess:112 [ERROR:process_with_output.cc(157)] DoRunProcess:157/usr/sbin/is_developer_end_user [ERROR:process_with_output.cc(46)] Init:46 [ERROR:process_with_output.cc(48)] Init:48 [ERROR:sandboxed_process.cc(50)] Init:50 [ERROR:process_with_output.cc(160)] DoRunProcess:160 [ERROR:debug_daemon.cc(105)] SystraceStart:105 [ERROR:systrace_tool.cc(39)] Start:39 [ERROR:sandboxed_process.cc(40)] GetHelperPath:40 path: /usr/libexec/debugd/helpers/systrace.sh [ERROR:systrace_tool.cc(44)] Start:44 path: /usr/libexec/debugd/helpers/systrace.sh [ERROR:process_with_output.cc(46)] Init:46 [ERROR:process_with_output.cc(48)] Init:48 [ERROR:sandboxed_process.cc(50)] Init:50 [ERROR:systrace_tool.cc(22)] AddCategoryArgs:22 [ERROR:process_with_output.cc(93)] GetOutput:93 [ERROR:debug_daemon.cc(111)] SystraceStop:111 [ERROR:systrace_tool.cc(60)] Stop:60 [ERROR:sandboxed_process.cc(40)] GetHelperPath:40 path: /usr/libexec/debugd/helpers/systrace.sh [ERROR:systrace_tool.cc(66)] Stop:66 path: /usr/libexec/debugd/helpers/systrace.sh [ERROR:process_with_output.cc(46)] Init:46 [ERROR:process_with_output.cc(48)] Init:48 [ERROR:sandboxed_process.cc(50)] Init:50 [ERROR:process.cc(313)] first: 1 parent_fd: -1 child_fd: 8 second_is_bound: 1 [ERROR:process.cc(242)] first: 1 parent_fd: -1 child_fd: 8 [ERROR:process.cc(254)] first: 1 parent_fd: -1 child_fd: 8 [ERROR:systrace_tool.cc(76)] Stop:76 The patch in #10 changed libbrillo such that it no longer closes "child_fd" (line "process.cc(313)") if it is bound.
,
May 24 2016
For posterity, here is a full trace with lots of debugging spew: localhost ~ # /sbin/debugd [ERROR:process_with_output.cc(114)] RunProcess:114 [ERROR:process_with_output.cc(117)] RunProcess:117 [ERROR:process_with_output.cc(162)] DoRunProcess:162 /usr/sbin/is_developer_end_user [ERROR:process_with_output.cc(46)] Init:46 [ERROR:process_with_output.cc(48)] Init:48 [ERROR:sandboxed_process.cc(50)] Init:50 [ERROR:sandboxed_process.cc(79)] Init:79 DONE [ERROR:process_with_output.cc(59)] Init:59 outfile: 7 [ERROR:process_with_output.cc(66)] Init:66 errfile: 8 [ERROR:process_with_output.cc(75)] Init BindFd(7, stdout:1) [ERROR:process.cc(78)] BindFd(7, 1) [ERROR:process_with_output.cc(78)] Init BindFd(8, stderr:2) [ERROR:process.cc(78)] BindFd(8, 2) [ERROR:process.cc(213)] Start close_unused_file_descriptors: 0 [ERROR:process.cc(221)] argv[0] = /sbin/minijail0 [ERROR:process.cc(221)] argv[1] = -v [ERROR:process.cc(221)] argv[2] = -- [ERROR:process.cc(221)] argv[3] = /usr/sbin/is_developer_end_user [ERROR:process.cc(326)] first: 1 parent_fd: -1 child_fd: 7 second_is_bound: 1 [ERROR:process.cc(326)] first: 2 parent_fd: -1 child_fd: 8 second_is_bound: 1 [ERROR:process.cc(250)] first: 1 parent_fd: -1 child_fd: 7 [ERROR:process.cc(250)] first: 2 parent_fd: -1 child_fd: 8 ### start chrome://tracing [ERROR:debug_daemon.cc(105)] SystraceStart:105 [ERROR:systrace_tool.cc(39)] Start:39 [ERROR:sandboxed_process.cc(40)] GetHelperPath:40 path: /usr/libexec/debugd/helpers/systrace.sh [ERROR:systrace_tool.cc(44)] Start:44 path: /usr/libexec/debugd/helpers/systrace.sh [ERROR:process_with_output.cc(46)] Init:46 [ERROR:process_with_output.cc(48)] Init:48 [ERROR:sandboxed_process.cc(50)] Init:50 [ERROR:sandboxed_process.cc(79)] Init:79 DONE [ERROR:process_with_output.cc(59)] Init:59 outfile: 7 [ERROR:process_with_output.cc(75)] Init BindFd(7, stdout:1) [ERROR:process.cc(78)] BindFd(7, 1) [ERROR:process_with_output.cc(78)] Init BindFd(7, stderr:2) [ERROR:process.cc(78)] BindFd(7, 2) [ERROR:systrace_tool.cc(22)] AddCategoryArgs:22 [ERROR:process.cc(213)] Start close_unused_file_descriptors: 0 [ERROR:process.cc(221)] argv[0] = /sbin/minijail0 [ERROR:process.cc(221)] argv[1] = -v [ERROR:process.cc(221)] argv[2] = -u [ERROR:process.cc(221)] argv[3] = debugd [ERROR:process.cc(221)] argv[4] = -g [ERROR:process.cc(221)] argv[5] = debugfs-access [ERROR:process.cc(221)] argv[6] = -- [ERROR:process.cc(221)] argv[7] = /usr/libexec/debugd/helpers/systrace.sh [ERROR:process.cc(221)] argv[8] = start [ERROR:process.cc(221)] argv[9] = all [ERROR:process.cc(326)] first: 1 parent_fd: -1 child_fd: 7 second_is_bound: 1 [ERROR:process.cc(326)] first: 2 parent_fd: -1 child_fd: 7 second_is_bound: 1 [ERROR:process.cc(250)] first: 1 parent_fd: -1 child_fd: 7 [ERROR:process.cc(250)] first: 2 parent_fd: -1 child_fd: 7 [ERROR:process_with_output.cc(98)] GetOutput:98 ### stop chrome://tracing [ERROR:debug_daemon.cc(111)] SystraceStop:111 [ERROR:systrace_tool.cc(60)] Stop:60 [ERROR:sandboxed_process.cc(40)] GetHelperPath:40 path: /usr/libexec/debugd/helpers/systrace.sh [ERROR:systrace_tool.cc(66)] Stop:66 path: /usr/libexec/debugd/helpers/systrace.sh [ERROR:process_with_output.cc(46)] Init:46 [ERROR:process_with_output.cc(48)] Init:48 [ERROR:sandboxed_process.cc(50)] Init:50 [ERROR:sandboxed_process.cc(79)] Init:79 DONE [ERROR:process_with_output.cc(59)] Init:59 outfile: 9 [ERROR:process_with_output.cc(75)] Init BindFd(9, stdout:1) [ERROR:process.cc(78)] BindFd(9, 1) [ERROR:process_with_output.cc(78)] Init BindFd(9, stderr:2) [ERROR:process.cc(78)] BindFd(9, 2) [ERROR:systrace_tool.cc(74)] Stop BindFd(8, stdout:1) [ERROR:process.cc(78)] BindFd(8, 1) [ERROR:process.cc(213)] Start close_unused_file_descriptors: 0 [ERROR:process.cc(221)] argv[0] = /sbin/minijail0 [ERROR:process.cc(221)] argv[1] = -v [ERROR:process.cc(221)] argv[2] = -u [ERROR:process.cc(221)] argv[3] = debugd [ERROR:process.cc(221)] argv[4] = -g [ERROR:process.cc(221)] argv[5] = debugfs-access [ERROR:process.cc(221)] argv[6] = -- [ERROR:process.cc(221)] argv[7] = /usr/libexec/debugd/helpers/systrace.sh [ERROR:process.cc(221)] argv[8] = stop [ERROR:process.cc(326)] first: 1 parent_fd: -1 child_fd: 8 second_is_bound: 1 [ERROR:process.cc(326)] first: 2 parent_fd: -1 child_fd: 9 second_is_bound: 1 [ERROR:process.cc(250)] first: 1 parent_fd: -1 child_fd: 8 [ERROR:process.cc(250)] first: 2 parent_fd: -1 child_fd: 9 [ERROR:systrace_tool.cc(77)] Stop:77 The trace is identical in both the broken and fixed cases. The difference is that: in the old case, the parent process after the fork() in ProcessImpl::Start() always closed its copy of child side pipes (like the comment). in the new case (libbrillo patch in #10), the parent process doesn't close its copies of the child side pipes, because it finds that they are "bound" (second_is_bound == 1).
,
May 24 2016
Here is a partial revert of #10: https://chromium-review.googlesource.com/346841 This patch restores the old behavior where the parent process always closes its copy of all child side pipes, whether they are bound or not.
,
May 25 2016
Some updates: (1) The revert [0] has been stuck in CQ, and hasn't landed yet, so chrome://tracing is still broken on ToT. [0] https://chromium-review.googlesource.com/#/c/346453/1 (2) jcliang is working on patch [1] to have debugd manually close fds after calling libbrillo's BindFd(): [1] https://chromium-review.googlesource.com/#/c/346762/ So, at this point, we can probably just abandon the revert and go with (2).
,
May 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/1ef73e543200fb7150d2bfb223f4a398067e8599 commit 1ef73e543200fb7150d2bfb223f4a398067e8599 Author: Ricky Liang <jcliang@chromium.org> Date: Tue May 24 08:32:34 2016 debugd: Close bound FDs when debugd is done with them Commit 476d59a ("Fix Process::BindFd when passing the same fd to the child.") in libbrillo changed to not close a FD in the parent process after fork if the FD was bound. As a result it causes FD leaks in several places in debugd. For example, chrome://tracing is broken because the pipe FD is not closed after debugd handles kSystraceStop DBus call. Another exmaple is the "Store Logs" in chrome://net-internals/#chromeos where the logs are stored but debugd never closes the output FD. We should close the bound FDs in the parent process when debugd is done with them. This patch overrides BindFd() in SandboxedProcess to record all the bound FDs and close all the bound FDs in the destructor. The only place that SandboxedProcess is not used is in DebugLogsTool::GetDebugLogs(), where the FD is closed explicitly after the subprocess finishes. Also the process that runs `systrace.sh stop` does not need to be ProcessWithOutput, so change it to SandboxedProcess. BUG= chromium:613122 TEST=On Chrome OS: chrome://tracing start a trace, be sure to enable "System tracing" stop trace => Chrome successfully imports trace data (before this patch, Chrome would hang, waiting for debugd to respond) TEST=On Chrome OS: chrome://net-internals/#chromeos click on "Store Debug Logs" and wait for logs to be created run `lsof -p $(pgrep debugd)` => no FD leak in debugd (before this patch, the debugd would leave the logs output file opened) Change-Id: I78ca534dc654e4dbc79d37a5e0278e2eb18c0e4e Signed-off-by: Ricky Liang <jcliang@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/346762 Reviewed-by: Ahmed Fakhry <afakhry@chromium.org> [modify] https://crrev.com/1ef73e543200fb7150d2bfb223f4a398067e8599/debugd/src/debug_logs_tool.cc [modify] https://crrev.com/1ef73e543200fb7150d2bfb223f4a398067e8599/debugd/src/sandboxed_process.cc [modify] https://crrev.com/1ef73e543200fb7150d2bfb223f4a398067e8599/debugd/src/systrace_tool.cc [modify] https://crrev.com/1ef73e543200fb7150d2bfb223f4a398067e8599/debugd/src/sandboxed_process.h
,
May 27 2016
,
May 27 2016
Your change meets the bar and is auto-approved for M52 (branch: 2743)
,
May 28 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/509e6db266df8986804b0a0101afa8071047602d commit 509e6db266df8986804b0a0101afa8071047602d Author: Ricky Liang <jcliang@chromium.org> Date: Tue May 24 08:32:34 2016 debugd: Close bound FDs when debugd is done with them Commit 476d59a ("Fix Process::BindFd when passing the same fd to the child.") in libbrillo changed to not close a FD in the parent process after fork if the FD was bound. As a result it causes FD leaks in several places in debugd. For example, chrome://tracing is broken because the pipe FD is not closed after debugd handles kSystraceStop DBus call. Another exmaple is the "Store Logs" in chrome://net-internals/#chromeos where the logs are stored but debugd never closes the output FD. We should close the bound FDs in the parent process when debugd is done with them. This patch overrides BindFd() in SandboxedProcess to record all the bound FDs and close all the bound FDs in the destructor. The only place that SandboxedProcess is not used is in DebugLogsTool::GetDebugLogs(), where the FD is closed explicitly after the subprocess finishes. Also the process that runs `systrace.sh stop` does not need to be ProcessWithOutput, so change it to SandboxedProcess. BUG= chromium:613122 TEST=On Chrome OS: chrome://tracing start a trace, be sure to enable "System tracing" stop trace => Chrome successfully imports trace data (before this patch, Chrome would hang, waiting for debugd to respond) TEST=On Chrome OS: chrome://net-internals/#chromeos click on "Store Debug Logs" and wait for logs to be created run `lsof -p $(pgrep debugd)` => no FD leak in debugd (before this patch, the debugd would leave the logs output file opened) Change-Id: I78ca534dc654e4dbc79d37a5e0278e2eb18c0e4e Signed-off-by: Ricky Liang <jcliang@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/346762 Reviewed-by: Ahmed Fakhry <afakhry@chromium.org> (cherry picked from commit 1ef73e543200fb7150d2bfb223f4a398067e8599) Reviewed-on: https://chromium-review.googlesource.com/348120 [modify] https://crrev.com/509e6db266df8986804b0a0101afa8071047602d/debugd/src/debug_logs_tool.cc [modify] https://crrev.com/509e6db266df8986804b0a0101afa8071047602d/debugd/src/sandboxed_process.cc [modify] https://crrev.com/509e6db266df8986804b0a0101afa8071047602d/debugd/src/systrace_tool.cc [modify] https://crrev.com/509e6db266df8986804b0a0101afa8071047602d/debugd/src/sandboxed_process.h
,
May 31 2016
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 31 2016
,
Jul 1 2016
,
Aug 29 2016
,
Oct 7 2016
,
Nov 19 2016
,
Jan 21 2017
,
Mar 4 2017
,
Apr 17 2017
,
May 3 2017
Verified on build 9517.0.0 |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by djkurtz@chromium.org
, May 19 2016