New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 613122 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: May 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug-Regression



Sign in to add a comment

chrome://tracing hangs after recording

Project Member Reported by djkurtz@chromium.org, May 19 2016

Issue description

Chrome Version: tested w/ 52.0.2734.0 & 52.0.2739.0
Chrome OS Version: 8343.0.0
Chrome OS Platform: oak & peach_pi

Steps To Reproduce:
(1) chrome://tracing
(2) Record
(3) Select anything in "Record a new trace..."
(4) click Record
(5) click Stop

Expected Result:

Chrome process buffer and shows a trace.

Actual Result:

"
Recording...
Buffer usage: N%
"

No progress...

How frequently does this problem reproduce? (Always, sometimes, hard to
reproduce?)

Always.

What is the impact to the user, and is there a workaround? If so, what is
it?

Cannot record a trace.

Note: This was working yesterday, but I no longer remember what Chrome OS version I was using.  However, it is still broken with 8343.0.0 chrome OS and an older (yesterday's) Chrome.  So, I think this may be Chrome OS related (somehow).
 
Chrome OS bisect (these are the only working versions listed on goldeneye for oak):
8318.0.0 / 52.0.2734.0	OK
8321.0.0 / 52.0.2734.0	BAD
Note the same Chromium version...

https://crosland.corp.google.com/log/8318.0.0..8321.0.0

Comment 2 by derat@chromium.org, May 19 2016

Cc: -derat@chromium.org

Comment 3 by osh...@chromium.org, May 19 2016

Cc: w...@chromium.org
Cc: vapier@chromium.org
Owner: de...@chromium.org
Bisected to this change in chromiumos-overlay:
 95eb890655de64cbff74eeb0e8297471074f8643
Uprev all AOSP ebuilds to the version in Chromium OS.

This change updates the following packages:

chromeos-base/apmanager/apmanager-0.0.1-r1069.ebuild
chromeos-base/attestation/attestation-0.0.1-r1064.ebuild
chromeos-base/chromeos-dbus-bindings/chromeos-dbus-bindings-0.0.1-r1080.ebuild
chromeos-base/chromeos-minijail/chromeos-minijail-0.0.1-r1464.ebuild
chromeos-base/firewalld/firewalld-0.0.1-r858.ebuild
chromeos-base/libbrillo/libbrillo-0.0.1-r22.ebuild
chromeos-base/shill-client/shill-client-0.0.1-r54.ebuild
chromeos-base/shill-test-scripts/shill-test-scripts-0.0.1-r1804.ebuild
chromeos-base/shill/shill-0.0.3-r13.ebuild
chromeos-base/tpm_manager/tpm_manager-0.0.1-r346.ebuild
chromeos-base/trunks/trunks-0.0.1-r1195.ebuild
chromeos-base/update_engine-client/update_engine-client-0.0.1-r39.ebuild
chromeos-base/update_engine/update_engine-0.0.3-r1899.ebuild
chromeos-base/webserver/webserver-0.0.1-r681.ebuild

Small breakthrough...  chrome://tracing will complete if you de-select the "System tracing" checkbox.
Fwiw, I was able to capture chrome traces and ftraces with https://chromium-review.googlesource.com/#/c/314413/. I used:

  simple_chrome_tracing --chrome-trace --ftrace -t 10 http://webglsamples.org/aquarium/aquarium.html

to capture 10 seconds of traces on webgl aquarium.

I do notice that it took noticeably longer than before for the script to collect the chrome traces from the browser though.
Cc: derat@chromium.org
I traced to chromeos/dbus/debug_daemon_client.cc:

  // Called when a response for StopAgentTracing() is received.
  void OnStopAgentTracing(dbus::Response* response) {
    if (!response) {          <===== DJK: replace this with "true" [0]
      LOG(ERROR) << "Failed to request systrace stop";
      // If debugd crashes or completes I/O before this message is processed
      // then pipe_reader_ can be NULL, see OnIOComplete().
      if (pipe_reader_.get())
        pipe_reader_->OnDataReady(-1); // terminate data stream
    }
    // NB: requester is signaled when i/o completes
  }

[0] if I force OnStopAgentTracing to terminate the systrace data stream, then tracing can continue processing the trace on "Stop".

So, for some reason, it looks like debugd is not terminating the data pipe by itself when it receives a "debugd::kSystraceStop" event after Chrome has finished reading all of the data it has.

Comment 8 by derat@chromium.org, May 20 2016

Cc: -derat@chromium.org afakhry@chromium.org
(I don't know anything about tracing or debugd.)
Cc: dhsharp@chromium.org
Most recent change to debugd is https://chromium.git.corp.google.com/chromiumos/platform2/+/16d356558338a7cd852901eddf5866bb5b0f33bd

+dhsharp
@afakhry - reverting that last change doesn't fix the issue.
Actually, if you look above, debugd is not in the list of ebuilds that changed with the breaking patch.

However, libbrillo did change, and debugd depends on it.

In fact, reverting this recent change to libbrillo fixes the issue:

commit 476d59a8142cd756de99f8811124ea00f333eafc
Author: Alex Deymo <deymo@google.com>
Date:   Tue Mar 29 17:10:10 2016 -0700

    Fix Process::BindFd when passing the same fd to the child.
    
    Process::BindFd() allows to map a file descriptor in the parent to a
    file descriptor in the child. This was normally used to redirect the
    child's stdout or stderr to a given file descriptor in the parent, like
    an open file or a pipe. This didn't work when binding a file descriptor
    in the parent to the same number in the child, used for example to make
    available to the child a file descriptor opened in the parent, but
    close all the other file descriptors with SetCloseUnusedFileDescriptors.
    
    Bug: 27880754
    TEST=Added unittests for that case.

Revert uploaded as: https://chromium-review.googlesource.com/346453
Cc: avakulenko@chromium.org
As suggested by deymo, CC'ing avakulenko@chromium.org (the only libbrillo owner not currently on this issue).
Some logging, showing the interaction between debugd and libbrillo:

[ERROR:process_with_output.cc(109)] RunProcess:109
[ERROR:process_with_output.cc(112)] RunProcess:112
[ERROR:process_with_output.cc(157)] DoRunProcess:157/usr/sbin/is_developer_end_user
[ERROR:process_with_output.cc(46)] Init:46
[ERROR:process_with_output.cc(48)] Init:48
[ERROR:sandboxed_process.cc(50)] Init:50
[ERROR:process_with_output.cc(160)] DoRunProcess:160
[ERROR:debug_daemon.cc(105)] SystraceStart:105
[ERROR:systrace_tool.cc(39)] Start:39
[ERROR:sandboxed_process.cc(40)] GetHelperPath:40 path: /usr/libexec/debugd/helpers/systrace.sh
[ERROR:systrace_tool.cc(44)] Start:44 path: /usr/libexec/debugd/helpers/systrace.sh
[ERROR:process_with_output.cc(46)] Init:46
[ERROR:process_with_output.cc(48)] Init:48
[ERROR:sandboxed_process.cc(50)] Init:50
[ERROR:systrace_tool.cc(22)] AddCategoryArgs:22
[ERROR:process_with_output.cc(93)] GetOutput:93
[ERROR:debug_daemon.cc(111)] SystraceStop:111
[ERROR:systrace_tool.cc(60)] Stop:60
[ERROR:sandboxed_process.cc(40)] GetHelperPath:40 path: /usr/libexec/debugd/helpers/systrace.sh
[ERROR:systrace_tool.cc(66)] Stop:66 path: /usr/libexec/debugd/helpers/systrace.sh
[ERROR:process_with_output.cc(46)] Init:46
[ERROR:process_with_output.cc(48)] Init:48
[ERROR:sandboxed_process.cc(50)] Init:50
[ERROR:process.cc(313)] first: 1 parent_fd: -1 child_fd: 8 second_is_bound: 1
[ERROR:process.cc(242)] first: 1 parent_fd: -1 child_fd: 8
[ERROR:process.cc(254)] first: 1 parent_fd: -1 child_fd: 8
[ERROR:systrace_tool.cc(76)] Stop:76


The patch in #10 changed libbrillo such that it no longer closes "child_fd" (line "process.cc(313)") if it is bound.
For posterity, here is a full trace with lots of debugging spew:

localhost ~ # /sbin/debugd
[ERROR:process_with_output.cc(114)] RunProcess:114
[ERROR:process_with_output.cc(117)] RunProcess:117
[ERROR:process_with_output.cc(162)] DoRunProcess:162 /usr/sbin/is_developer_end_user
[ERROR:process_with_output.cc(46)] Init:46
[ERROR:process_with_output.cc(48)] Init:48
[ERROR:sandboxed_process.cc(50)] Init:50
[ERROR:sandboxed_process.cc(79)] Init:79 DONE
[ERROR:process_with_output.cc(59)] Init:59 outfile: 7
[ERROR:process_with_output.cc(66)] Init:66 errfile: 8
[ERROR:process_with_output.cc(75)] Init BindFd(7, stdout:1)
[ERROR:process.cc(78)] BindFd(7, 1)
[ERROR:process_with_output.cc(78)] Init BindFd(8, stderr:2)
[ERROR:process.cc(78)] BindFd(8, 2)
[ERROR:process.cc(213)] Start close_unused_file_descriptors: 0
[ERROR:process.cc(221)] argv[0] = /sbin/minijail0
[ERROR:process.cc(221)] argv[1] = -v
[ERROR:process.cc(221)] argv[2] = --
[ERROR:process.cc(221)] argv[3] = /usr/sbin/is_developer_end_user
[ERROR:process.cc(326)] first: 1 parent_fd: -1 child_fd: 7 second_is_bound: 1
[ERROR:process.cc(326)] first: 2 parent_fd: -1 child_fd: 8 second_is_bound: 1
[ERROR:process.cc(250)] first: 1 parent_fd: -1 child_fd: 7
[ERROR:process.cc(250)] first: 2 parent_fd: -1 child_fd: 8

### start chrome://tracing

[ERROR:debug_daemon.cc(105)] SystraceStart:105
[ERROR:systrace_tool.cc(39)] Start:39
[ERROR:sandboxed_process.cc(40)] GetHelperPath:40 path: /usr/libexec/debugd/helpers/systrace.sh
[ERROR:systrace_tool.cc(44)] Start:44 path: /usr/libexec/debugd/helpers/systrace.sh
[ERROR:process_with_output.cc(46)] Init:46
[ERROR:process_with_output.cc(48)] Init:48
[ERROR:sandboxed_process.cc(50)] Init:50
[ERROR:sandboxed_process.cc(79)] Init:79 DONE
[ERROR:process_with_output.cc(59)] Init:59 outfile: 7
[ERROR:process_with_output.cc(75)] Init BindFd(7, stdout:1)
[ERROR:process.cc(78)] BindFd(7, 1)
[ERROR:process_with_output.cc(78)] Init BindFd(7, stderr:2)
[ERROR:process.cc(78)] BindFd(7, 2)
[ERROR:systrace_tool.cc(22)] AddCategoryArgs:22
[ERROR:process.cc(213)] Start close_unused_file_descriptors: 0
[ERROR:process.cc(221)] argv[0] = /sbin/minijail0
[ERROR:process.cc(221)] argv[1] = -v
[ERROR:process.cc(221)] argv[2] = -u
[ERROR:process.cc(221)] argv[3] = debugd
[ERROR:process.cc(221)] argv[4] = -g
[ERROR:process.cc(221)] argv[5] = debugfs-access
[ERROR:process.cc(221)] argv[6] = --
[ERROR:process.cc(221)] argv[7] = /usr/libexec/debugd/helpers/systrace.sh
[ERROR:process.cc(221)] argv[8] = start
[ERROR:process.cc(221)] argv[9] = all
[ERROR:process.cc(326)] first: 1 parent_fd: -1 child_fd: 7 second_is_bound: 1
[ERROR:process.cc(326)] first: 2 parent_fd: -1 child_fd: 7 second_is_bound: 1
[ERROR:process.cc(250)] first: 1 parent_fd: -1 child_fd: 7
[ERROR:process.cc(250)] first: 2 parent_fd: -1 child_fd: 7
[ERROR:process_with_output.cc(98)] GetOutput:98

### stop chrome://tracing

[ERROR:debug_daemon.cc(111)] SystraceStop:111
[ERROR:systrace_tool.cc(60)] Stop:60
[ERROR:sandboxed_process.cc(40)] GetHelperPath:40 path: /usr/libexec/debugd/helpers/systrace.sh
[ERROR:systrace_tool.cc(66)] Stop:66 path: /usr/libexec/debugd/helpers/systrace.sh
[ERROR:process_with_output.cc(46)] Init:46
[ERROR:process_with_output.cc(48)] Init:48
[ERROR:sandboxed_process.cc(50)] Init:50
[ERROR:sandboxed_process.cc(79)] Init:79 DONE
[ERROR:process_with_output.cc(59)] Init:59 outfile: 9
[ERROR:process_with_output.cc(75)] Init BindFd(9, stdout:1)
[ERROR:process.cc(78)] BindFd(9, 1)
[ERROR:process_with_output.cc(78)] Init BindFd(9, stderr:2)
[ERROR:process.cc(78)] BindFd(9, 2)
[ERROR:systrace_tool.cc(74)] Stop BindFd(8, stdout:1)
[ERROR:process.cc(78)] BindFd(8, 1)
[ERROR:process.cc(213)] Start close_unused_file_descriptors: 0
[ERROR:process.cc(221)] argv[0] = /sbin/minijail0
[ERROR:process.cc(221)] argv[1] = -v
[ERROR:process.cc(221)] argv[2] = -u
[ERROR:process.cc(221)] argv[3] = debugd
[ERROR:process.cc(221)] argv[4] = -g
[ERROR:process.cc(221)] argv[5] = debugfs-access
[ERROR:process.cc(221)] argv[6] = --
[ERROR:process.cc(221)] argv[7] = /usr/libexec/debugd/helpers/systrace.sh
[ERROR:process.cc(221)] argv[8] = stop
[ERROR:process.cc(326)] first: 1 parent_fd: -1 child_fd: 8 second_is_bound: 1
[ERROR:process.cc(326)] first: 2 parent_fd: -1 child_fd: 9 second_is_bound: 1
[ERROR:process.cc(250)] first: 1 parent_fd: -1 child_fd: 8
[ERROR:process.cc(250)] first: 2 parent_fd: -1 child_fd: 9
[ERROR:systrace_tool.cc(77)] Stop:77


The trace is identical in both the broken and fixed cases.

The difference is that:
  in the old case, the parent process after the fork() in ProcessImpl::Start() always closed its copy of child side pipes (like the comment).

  in the new case (libbrillo patch in #10), the parent process doesn't close its copies of the child side pipes, because it finds that they are "bound" (second_is_bound == 1).
Here is a partial revert of #10:
https://chromium-review.googlesource.com/346841

This patch restores the old behavior where the parent process always closes its copy of all child side pipes, whether they are bound or not.
Owner: jcliang@chromium.org
Some updates:

(1) The revert [0] has been stuck in CQ, and hasn't landed yet, so chrome://tracing is still broken on ToT.
[0] https://chromium-review.googlesource.com/#/c/346453/1

(2) jcliang is working on patch [1] to have debugd manually close fds after calling libbrillo's BindFd():
[1] https://chromium-review.googlesource.com/#/c/346762/

So, at this point, we can probably just abandon the revert and go with (2).
Project Member

Comment 16 by bugdroid1@chromium.org, May 26 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/1ef73e543200fb7150d2bfb223f4a398067e8599

commit 1ef73e543200fb7150d2bfb223f4a398067e8599
Author: Ricky Liang <jcliang@chromium.org>
Date: Tue May 24 08:32:34 2016

debugd: Close bound FDs when debugd is done with them

Commit 476d59a ("Fix Process::BindFd when passing the same fd
to the child.") in libbrillo changed to not close a FD in
the parent process after fork if the FD was bound. As a result
it causes FD leaks in several places in debugd.

For example, chrome://tracing is broken because the pipe FD is
not closed after debugd handles kSystraceStop DBus call. Another
exmaple is the "Store Logs" in chrome://net-internals/#chromeos
where the logs are stored but debugd never closes the output FD.

We should close the bound FDs in the parent process when debugd
is done with them. This patch overrides BindFd() in SandboxedProcess
to record all the bound FDs and close all the bound FDs in the
destructor. The only place that SandboxedProcess is not used is
in DebugLogsTool::GetDebugLogs(), where the FD is closed explicitly
after the subprocess finishes.

Also the process that runs `systrace.sh stop` does not need to
be ProcessWithOutput, so change it to SandboxedProcess.

BUG= chromium:613122 
TEST=On Chrome OS: chrome://tracing
  start a trace, be sure to enable "System tracing"
  stop trace
  => Chrome successfully imports trace data
     (before this patch, Chrome would hang, waiting for debugd to respond)
TEST=On Chrome OS: chrome://net-internals/#chromeos
  click on "Store Debug Logs" and wait for logs to be created
  run `lsof -p $(pgrep debugd)`
  => no FD leak in debugd
     (before this patch, the debugd would leave the logs output file opened)

Change-Id: I78ca534dc654e4dbc79d37a5e0278e2eb18c0e4e
Signed-off-by: Ricky Liang <jcliang@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/346762
Reviewed-by: Ahmed Fakhry <afakhry@chromium.org>

[modify] https://crrev.com/1ef73e543200fb7150d2bfb223f4a398067e8599/debugd/src/debug_logs_tool.cc
[modify] https://crrev.com/1ef73e543200fb7150d2bfb223f4a398067e8599/debugd/src/sandboxed_process.cc
[modify] https://crrev.com/1ef73e543200fb7150d2bfb223f4a398067e8599/debugd/src/systrace_tool.cc
[modify] https://crrev.com/1ef73e543200fb7150d2bfb223f4a398067e8599/debugd/src/sandboxed_process.h

Labels: Merge-Request-52

Comment 18 by tin...@google.com, May 27 2016

Labels: -Merge-Request-52 Merge-Approved-52 Hotlist-Merge-Approved
Your change meets the bar and is auto-approved for M52 (branch: 2743)
Project Member

Comment 19 by bugdroid1@chromium.org, May 28 2016

Labels: merge-merged-release-R52-8350.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/509e6db266df8986804b0a0101afa8071047602d

commit 509e6db266df8986804b0a0101afa8071047602d
Author: Ricky Liang <jcliang@chromium.org>
Date: Tue May 24 08:32:34 2016

debugd: Close bound FDs when debugd is done with them

Commit 476d59a ("Fix Process::BindFd when passing the same fd
to the child.") in libbrillo changed to not close a FD in
the parent process after fork if the FD was bound. As a result
it causes FD leaks in several places in debugd.

For example, chrome://tracing is broken because the pipe FD is
not closed after debugd handles kSystraceStop DBus call. Another
exmaple is the "Store Logs" in chrome://net-internals/#chromeos
where the logs are stored but debugd never closes the output FD.

We should close the bound FDs in the parent process when debugd
is done with them. This patch overrides BindFd() in SandboxedProcess
to record all the bound FDs and close all the bound FDs in the
destructor. The only place that SandboxedProcess is not used is
in DebugLogsTool::GetDebugLogs(), where the FD is closed explicitly
after the subprocess finishes.

Also the process that runs `systrace.sh stop` does not need to
be ProcessWithOutput, so change it to SandboxedProcess.

BUG= chromium:613122 
TEST=On Chrome OS: chrome://tracing
  start a trace, be sure to enable "System tracing"
  stop trace
  => Chrome successfully imports trace data
     (before this patch, Chrome would hang, waiting for debugd to respond)
TEST=On Chrome OS: chrome://net-internals/#chromeos
  click on "Store Debug Logs" and wait for logs to be created
  run `lsof -p $(pgrep debugd)`
  => no FD leak in debugd
     (before this patch, the debugd would leave the logs output file opened)

Change-Id: I78ca534dc654e4dbc79d37a5e0278e2eb18c0e4e
Signed-off-by: Ricky Liang <jcliang@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/346762
Reviewed-by: Ahmed Fakhry <afakhry@chromium.org>
(cherry picked from commit 1ef73e543200fb7150d2bfb223f4a398067e8599)
Reviewed-on: https://chromium-review.googlesource.com/348120

[modify] https://crrev.com/509e6db266df8986804b0a0101afa8071047602d/debugd/src/debug_logs_tool.cc
[modify] https://crrev.com/509e6db266df8986804b0a0101afa8071047602d/debugd/src/sandboxed_process.cc
[modify] https://crrev.com/509e6db266df8986804b0a0101afa8071047602d/debugd/src/systrace_tool.cc
[modify] https://crrev.com/509e6db266df8986804b0a0101afa8071047602d/debugd/src/sandboxed_process.h

Project Member

Comment 20 by sheriffbot@chromium.org, May 31 2016

This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Approved-52
Status: Fixed (was: Available)
Labels: VerifyIn-53
Labels: VerifyIn-54
Labels: VerifyIn-55

Comment 25 by dchan@google.com, Nov 19 2016

Labels: VerifyIn-56

Comment 26 by dchan@google.com, Jan 21 2017

Labels: VerifyIn-57

Comment 27 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 28 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 29 by son...@google.com, May 3 2017

Status: Verified (was: Fixed)
Verified on build 9517.0.0

Sign in to add a comment