New issue
Advanced search Search tips

Issue 850654 link

Starred by 7 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Aug 27
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

crosh: w/NativeCrxBindings enabled: close one crosh, all other croshes freeze

Reported by agrif...@gmail.com, Jun 7 2018

Issue description

Chrome Version       : 68.0.3440.15
OS Version: 10718.13.0
URLs (if applicable) : chrome-extension://okddffdblfhhnmhodogpojmfkjmhinfp/html/crosh.html

What steps will reproduce the problem?
1. On dev channel, open two crosh windows
2. Close one of the crosh windows
3. The other crosh window is frozen, no longer accepts input
4. Same is true of crostini terminals

What is the expected result?
The open crosh window should still work.

What happens instead of that?
No input is recognized, the only recourse is to close the window.

Please provide any additional information below. Attach a screenshot if
possible.

UserAgentString: Mozilla/5.0 (X11; CrOS x86_64 10718.13.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.15 Safari/537.36


 
screencast.webm
6.2 MB View Download
Cc: za...@chromium.org
Components: Platform>Apps>Default>Hterm
it'd be better if we could focus on crosh and not get Terminal/crostini in the mix.  i think you're saying that if you run more than one crosh and close one, the others hang ?

this is the builtin crosh path:
  chrome-extension://nkoccljplnhpfnfiajclkommnmllphnl/html/crosh.html
if you load multiple tabs of that directly, does it fail in the same way ?

if you go into guest mode, can you reproduce it there ?

Comment 2 by agrif...@gmail.com, Jun 8 2018

> it'd be better if we could focus on crosh and not get Terminal/crostini in the mix.  i think you're saying that if you run more than one crosh and close one, the others hang ?

right 

> this is the builtin crosh path:
>  chrome-extension://nkoccljplnhpfnfiajclkommnmllphnl/html/crosh.html
> if you load multiple tabs of that directly, does it fail in the same way ?

yes. Interestingly, this experiment didn't affect my other crosh instances, presumably because they're running chrome-extension://okddffdblfhhnmhodogpojmfkjmhinfp/html/crosh.html

> if you go into guest mode, can you reproduce it there ?

yes, it does the same thing in guest mode.
interesting that only the same extension set was affected.  while the html files are diff, the backend C++/JS code is the same as all crosh instances use the same privateTerminal JS API.

this bug really sounds like an issue with Chrome itself, but i'm not as familiar there, and i haven't been able to reproduce it :/.

the fact that it reproduces in guest mode is helpful.  Zach: i forget if you said the same, or if guest mode fixed it?  if it also fails, might be a way of doing a bisect across CrOS & Chrome releases.

Comment 4 by za...@chromium.org, Jun 8 2018

It reproduced in guest mode but was not reproducible on other machines with identical versions, but different users.
maybe set that pixelbook down until next time i'm in MTV ? :)

i think Aron is still in the northeast ...

Comment 6 by agrif...@gmail.com, Jun 8 2018

yup, I'm in providence. are you still in worcester?

Isn't there a way for you to remote debug if I set it up for you? Tag me on freenode if that's easier.
i'm in Boston now.  i'm not sure i could effectively remote in (for GUI), and if we're going to bisect images, in person is only way to go.

this is a pixelbook right ?

Comment 8 by agrif...@gmail.com, Jun 8 2018

yup, pixelbook i5
This bug affects my Samsung Chromebook Pro as well, and I tried in guest mode and tried rebooting but it affects me in both those cases as well.
This occurs on the internal extension chrome-extension://nkoccljplnhpfnfiajclkommnmllphnl/html/crosh.html
as well as the chrome web store version https://chrome.google.com/webstore/detail/pnhechapfaindjhompbnflcldabbghjo

Also note that I'm on dev mode, and I have loaded custom fonts in crosh (Source Code Pro for Powerline)

Chrome OS Version: 68.0.3440.15 (Official Build) dev (64-bit)
Revision: 97fda87c8faca0ed36579987237d552dd92d1fcd-refs/branch-heads/3440@{#179}
Platform: 10718.13.0 (Official Build) dev-channel caroline

Comment 10 by agrif...@gmail.com, Jun 10 2018

another experiment... it's not just input, it's output too that's frozen.

If I start "while true; do date; sleep 1; done" in one crosh, then close the other crosh, the output from the loop stops.
Status: Started (was: Unconfirmed)
Summary: crosh: close one crosh, all other croshes freeze (was: Close one crosh, all other croshes freeze)
while working on an unrelated issue, i managed to reproduce this on my Chromebook Pixel (link)

the problem seems to be in Chrome itself:
- create two croshes (A & B)
- ssh into the system
- close crosh B
- see crosh A seems to freeze
- while watching `ps` in ssh session, start typing commands into crosh A
- "shell" spawns a new shell.  "top" starts up top.  Ctrl+C kills top.  leave "top" running because it'll keep generating output.
- put a breakpoint on crosh.js:Crosh.prototype.onProcessOutput_
  - verify it's the right entrypoint by starting a new crosh and seeing this fire for every line
  - this callback is registered with Chrome via chrome.terminalProviate.onProcessOutput.addListener
- breakpoint is never hit even though top is still running

ok, lets dig a little deeper:
- in a working crosh shell, run: dash -c 'while sleep 0.01; do date; done'
  - use `dash` so it's easier to find via ssh to strace quickly
- in ssh session, find dash pid and run: strace -qq -f -ewrite -p <pid>
- watch output in crosh & ssh session work correctly
- start & close a new crosh session
- see crosh output halt
- strace via ssh continues outputting ... until it halts at a write
- most likely some buffer has filled up and the kernel has stopped accepting anymore output
- pstree now looks like:
init,1
  |-session_manager,18093 
  |   `-chrome,18346 --gpu-sandbox-failures-fatal=yes --enable-logging ....
  |       |-bash,12949 /usr/bin/crosh
  |       |   `-bash,13045 /usr/bin/crosh
  |       |       `-bash,13046 -l
  |       |           `-dash,13099 -c while sleep 0.01; do date; done
  |       |               `-date,15544 --coreutils-prog-shebang=date /bin/date
- so date is stuck on its write() call and is in S (sleeping) state

date normally writes to stdout, so lets trace that back:
# ls -l /proc/15544/fd/1
lrwx------ 1 chronos chronos 64 Jun 27 16:03 /proc/15544/fd/1 -> /dev/pts/4
ls -l /proc/12949/fd/1
lrwx------ 1 chronos chronos 64 Jun 27 16:01 /proc/12949/fd/1 -> /dev/pts/4
so date & bash are connected to a pty that was created by chrome.  that's to be expected.

we know Chrome is responsible for creating the pseudo terminal in the first place.  here we see it's pid 18346 which is the main process w/lots of threads.  it doesn't seem to be broken down further though as bash's ppid is set to 18346 (and not another thread).  its fd/ output is pretty large (not surprising).  it has two handles open to /dev/ptmx, but i'm not sure if there's a way to extract more details out of the kernel as to the status of these ptys.

lets poke some more.
- close all crosh instances
- start a new crosh & shell and run: while date; do :; done
- in the ssh session run: strace -qq -f -eread -p 18346
  - basically watch the storm of read calls from all the threads until we see one that looks like the `date` output:
    [pid 25399] read(274, "Wed Jun 27 16:22:21 PDT 2018\r\nWe"..., 4095) = 120
- so now we know thread 25399 is supposed to be watching the output of crosh all the time
- kill the `date` spam in crosh
- in the ssh session, start a new strace: strace -qq -p 25399
- see that the process is quiet (good)
- hit enter in crosh and see the process handle it (good)
  epoll_wait(244, ...) 
    [{EPOLLIN, {u32=274, u64=274}}], 32, -1) = 1
  epoll_ctl(244, EPOLL_CTL_DEL, 274, 0x7f8eb5e6ed18) = 0
  write(249, "\0", 1)                     = 1
  read(274, "\r\n", 4095)                 = 2
  futex(0x7f8eb3ea7cfc, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f8eb3ea7cd0, 2) = 1
  futex(0x7f8eb3ea7cd0, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x285df26959b8, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x285ded2b95d8, FUTEX_WAKE_PRIVATE, 1) = 1
  epoll_wait(244, [{EPOLLIN, {u32=248, u64=248}}], 32, 0) = 1
  read(248, "\0", 1)                      = 1
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [{EPOLLIN, {u32=248, u64=248}}], 32, -1) = 1
  read(248, "\0", 1)                      = 1
  epoll_ctl(244, EPOLL_CTL_ADD, 274, {EPOLLIN, {u32=274, u64=274}}) = 0
  epoll_wait(244, [{EPOLLIN, {u32=274, u64=274}}], 32, 0) = 1
  epoll_ctl(244, EPOLL_CTL_DEL, 274, 0x7f8eb5e6ed18) = 0
  epoll_wait(244, [], 32, 0)              = 0
  read(274, "\33]0;chronos@localhost:/\7\33[01;32m"..., 4095) = 67
  futex(0x7f8eb3ea7cfc, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f8eb3ea7cd0, 2) = 1
  futex(0x7f8eb3ea7cd0, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x285df26959b8, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x285ded2b95d8, FUTEX_WAKE_PRIVATE, 1) = 1
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [{EPOLLIN, {u32=248, u64=248}}], 32, -1) = 1
  read(248, "\0", 1)                      = 1
  epoll_ctl(244, EPOLL_CTL_ADD, 274, {EPOLLIN, {u32=274, u64=274}}) = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244,

  - fd/244 is anon_inode:[eventpoll]
  - fd/249 is pipe:[112269]
  - fd/274 is /dev/ptmx (prob master side of this pty)
- start a new crosh
- see this thread is responsible for all the crosh instances (which is fine)
- hit enter in this crosh to see the results
  epoll_wait(244, ...) 
    [{EPOLLIN, {u32=270, u64=270}}], 32, -1) = 1
  epoll_ctl(244, EPOLL_CTL_DEL, 270, 0x7f8eb5e6ed18) = 0
  write(249, "\0", 1)                     = 1
  read(270, "\r\n", 4095)                 = 2
  futex(0x7f8eb3ea7cfc, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f8eb3ea7cd0, 2) = 1
  futex(0x7f8eb3ea7cd0, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x285df26959b8, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x285ded2b95d8, FUTEX_WAKE_PRIVATE, 1) = 1
  epoll_wait(244, [{EPOLLIN, {u32=248, u64=248}}], 32, 0) = 1
  read(248, "\0", 1)                      = 1
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [{EPOLLIN, {u32=248, u64=248}}], 32, -1) = 1
  read(248, "\0", 1)                      = 1
  epoll_ctl(244, EPOLL_CTL_ADD, 270, {EPOLLIN, {u32=270, u64=270}}) = 0
  epoll_wait(244, [{EPOLLIN, {u32=270, u64=270}}], 32, 0) = 1
  epoll_ctl(244, EPOLL_CTL_DEL, 270, 0x7f8eb5e6ed18) = 0
  epoll_wait(244, [], 32, 0)              = 0
  read(270, "\33[1;33mcrosh>\33[0m ", 4095) = 18
  futex(0x7f8eb3ea7cfc, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f8eb3ea7cd0, 2) = 1
  futex(0x7f8eb3ea7cd0, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x285df26959b8, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x285ded2b95d8, FUTEX_WAKE_PRIVATE, 1) = 1
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [{EPOLLIN, {u32=248, u64=248}}], 32, -1) = 1
  read(248, "\0", 1)                      = 1
  epoll_ctl(244, EPOLL_CTL_ADD, 270, {EPOLLIN, {u32=270, u64=270}}) = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244,

  - all the fds are the same except 270
  - fd/270 is /dev/ptmx (prob master side of this pty)
- close the new crosh and watch strace
  epoll_ctl(244, EPOLL_CTL_DEL, 270, 0x7f8eb5e6ed18) = 0
  write(249, "\0", 1)                     = 1
  read(270, 0x285df02ff000, 4095)         = -1 EIO (Input/output error)
  epoll_wait(244, [{EPOLLIN, {u32=248, u64=248}}], 32, 0) = 1
  read(248, "\0", 1)                      = 1
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [{EPOLLIN, {u32=248, u64=248}}], 32, -1) = 1
  read(248, "\0", 1)                      = 1
  close(270)                              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, 
- looks good -- we basically got an EOF on 270 (terminal master), so we went ahead and closed it and didn't add it back to the watch list
- press enter once on the (now frozen) crosh
- we get a single event
  [{EPOLLIN, {u32=274, u64=274}}], 32, -1) = 1
  epoll_ctl(244, EPOLL_CTL_DEL, 274, 0x7f8eb5e6ed18) = 0
  write(249, "\0", 1)                     = 1
  read(274, "\r\n\33]0;chronos@localhost:/\7\33[01;3"..., 4095) = 69
  futex(0x7f8eb3ea7cfc, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f8eb3ea7cd0, 2) = 1
  futex(0x7f8eb3ea7cd0, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x285df26959b8, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x285ded2b95d8, FUTEX_WAKE_PRIVATE, 1) = 1
  epoll_wait(244, [{EPOLLIN, {u32=248, u64=248}}], 32, 0) = 1
  read(248, "\0", 1)                      = 1
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, [], 32, 0)              = 0
  epoll_wait(244, 
- see how we called EPOLL_CTL_DEL on 274 but not the corresponding EPOLL_CTL_ADD
- entering things into crosh doesn't trigger anything anymore

next step is to look at the Chrome source code

Comment 12 by agrif...@gmail.com, Jun 28 2018

FYI vapier, I had to powerwash and switch to stable because of https://bugs.chromium.org/p/chromium/issues/detail?id=848540

After that, I'm not seeing this problem, so unfortunately I won't be able to test a fix for you.
I'm seeing this on my powerwashed dev channel Chromebook Pro (Caroline) on chrome OS 69.0.3494.0, so it occurs even after a powerwash for me (on the dev channel mind you).
Cc: dgreid@chromium.org smbar...@chromium.org vapier@chromium.org nverne@chromium.org
 Issue 867161  has been merged into this issue.
Cc: amistry@chromium.org
I was debugging  crbug.com/868763  and came across this while looking for other similar bugs. I can repro this reliably by enabling the "NativeCrxBindings" feature, and "fix" by disabling it.

To repro:
1. Place device into dev mode and mount / rw.
2. Edit /etc/chrome_dev.conf and add:
--enable-features=NativeCrxBindings
3. Reboot
4. Login and open 2 terminal tabs, close one of them.

In that bug, when an app window is closed, the extensions system unregisters the app pages from receiving events.
This doesn't seem to be a problem for me anymore on Caroline on the Canary Channel

Google Chrome	70.0.3530.0 (Official Build) canary (64-bit)
Revision	a20250ff3690485ecc1cfaf289802499a7e611aa-refs/branch-heads/3530@{#1}
Platform	11005.0.0 (Official Build) canary-channel caroline
Firmware Version	Google_Caroline.7820.384.0
Summary: crosh: w/NativeCrxBindings enabled: close one crosh, all other croshes freeze (was: crosh: close one crosh, all other croshes freeze)
Status: Fixed (was: Started)
It was fixed in crrev.com/582793 and also merged into 69. It also doesn't affect stable. Unless we get a report to the contrary, I think we should consider this fixed.

Sign in to add a comment