Issue 598088

Starred by 19 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug




Mach shared memory can cause renderer hang.

Reported by alli...@saucelabs.com, Mar 25 2016

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0

Example URL:
http://google.com

Steps to reproduce the problem:
Here are the steps I follow to trigger my problem:
1. Using kvm-enabled qemu, launch a Mac VM (such as 10.9 - this issue was seen to affect OS X 10.8 through 10.11).
2. Download Chrome 49 (https://www.google.com/chrome/browser/desktop/) using Safari/Firefox
3. Launch Chrome

What is the expected behavior?
The two starting tabs load:
"Chrome": chrome://chrome-signin/?access_point=0&reason=0
"Getting Started": https://www.google.com/intl/en/chrome/browser/welcome.html

Additional user-created tabs can also request urls and load them

What went wrong?
Of the two starting tabs, the chrome://chrome-signin tab either never loads, or appears to have loaded, but nothing is rendered on the screen. (In these cases, the Network tab shows that the page resources have been successfully retrieved, although they are not visible.) The "Getting Started" tab either never loads and spins forever, or successfully loads. When it successfully loads, any links on the page may be successfully clicked and loaded, and it's possible to edit elements via "Inspect element" to serve links to any desired webpage, and these successfully load. Sometimes, when this "Getting Started" tab loads successfully, inputting a new url in the address bar is also successful; other times, inputting a url and hitting enter appears to do nothing whatsoever - the page doesn't appear to be loading, and external network traffic logging indicates that no request is issued. Regardless of how the first two tabs behave, additional user-created tabs are unable to load anything.

I've uploaded two gifs here that demonstrate the issue, as well as the Guest profile workaround described below: http://imgur.com/a/QqMg9

Does it occur on multiple sites: Yes

Is it a problem with a plugin? No 

Did this work before? Yes Chrome 48 and below

Does this work in other browsers? Yes 

Chrome version: 49.0.2623.108  Channel: stable
OS Version: OS X 10.9
Flash Version: Shockwave Flash 21.0 r0

Interestingly, the issue goes away when switching the profile to a Guest profile - in this case, everything loads as expected. Switching away from this Guest profile causes the issue to come back.

A copy of Chrome 48 on the same machine works without issue, as do Firefox and Safari.

Google's search suggestions also appear successfully as text is written to the address bar. Hitting enter doesn't seem to cause the url to load.

Launching Chrome with the switches  "--enable-logging --v=1" and inspecting the resulting chrome_debug.log, there was only one error, which may or may not be related:

[4303:1287:0324/171016:WARNING:vt_video_decode_accelerator_mac.cc(205)] Failed to create hardware VideoToolbox session. Hardware accelerated video decoding will be disabled.
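
To scan a larger chrome_debug.log for non-verbose entries, a small parser can help. The following is a minimal sketch: the field layout (`[pid:tid:MMDD/HHMMSS:LEVEL:file(line)] message`) is inferred only from the single line quoted above, and the helper names are hypothetical.

```python
import re

# Layout inferred from the one log line quoted in this report:
#   [pid:tid:MMDD/HHMMSS:LEVEL:file(line)] message
# This is an assumption, not a documented format.
LOG_LINE = re.compile(
    r"^\[(?P<pid>\d+):(?P<tid>\d+):(?P<date>\d{4})/(?P<time>\d{6}):"
    r"(?P<level>[A-Z]+\d*):(?P<file>[^()]+)\((?P<line>\d+)\)\]\s*(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"])
    return record


def filter_by_level(lines, levels=("WARNING", "ERROR", "FATAL")):
    """Keep only parsed records whose level is in `levels`."""
    records = []
    for line in lines:
        record = parse_log_line(line)
        if record is not None and record["level"] in levels:
            records.append(record)
    return records


SAMPLE = (
    "[4303:1287:0324/171016:WARNING:vt_video_decode_accelerator_mac.cc(205)] "
    "Failed to create hardware VideoToolbox session. "
    "Hardware accelerated video decoding will be disabled."
)
```

Calling `filter_by_level(open("chrome_debug.log"))` would then return only the warning-or-worse records, skipping lines that do not fit the assumed layout.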

The Guest profile can load chrome://net-internals (and any other chrome:// pages), but the Default profile cannot.

I verified that the latest dotrelease (49.0.2623.108) did not solve the issue.

I'm more than happy to provide any additional details that would help. Many thanks for your time!
 
Labels: Needs-Feedback
Can you purge your chrome user profile? Sounds like it is corrupt.

https://support.google.com/chrome/answer/142059?hl=en

Comment 2 by al...@saucelabs.com, Mar 28 2016

Hi there!

We actually came upon that suggestion when looking for a fix on our own, but the VMs that this is failing on don't have a Default user profile to begin with (it gets generated on the first run of Chrome 49).

We've also tried having Chrome 48 generate a Default user profile for us (which gives us a working Chrome 48 instance), and then quitting out of Chrome 48 and starting a Chrome 49 instance afterwards to no avail. Interestingly enough, the opposite seems to work fine (having Chrome 49 generate the default profile and then using Chrome 48 to browse the web).

I'm wondering if there is anything else we can do on our end to give you more information? I know getting a data dump from chrome://net-internals has been requested before, but we can really only access the chrome:// pages right now if we use the Guest profile, so I'm not sure how much use that would be. One other option is that we can set you up with remote access to one of our VMs that are exhibiting this problem so that you can see it first hand and poke around on the machine yourselves if you would like!

Thanks in advance for all of your help! It's much appreciated :)
Thanks for the response! I'm afraid that removing the default user profile didn't solve the issue, as my colleague mentioned (we're trying to solve this together). Unfortunately it's happening on a fresh install. 

We're following this similar issue: https://bugs.chromium.org/p/chromium/issues/detail?id=595968 It mentions old hardware, and the virtualized CPU we're using is also a Core 2 Duo, for what it's worth.

If there's anything we can provide to help investigate this further, please do let us know! Thanks again for your help.

Comment 4 by al...@saucelabs.com, Mar 29 2016

One thing that was mentioned in  issue 595968  was taking a look at the task manager to see if processes were being spawned for each tab, and it looks like in our case they are. I've attached another .gif that shows this behavior. One thing that might be worth noting is that the browser doesn't seem to respond to the "Stop" and "Refresh" commands.
Project Member

Comment 5 by sheriffbot@chromium.org, Mar 30 2016

Labels: -Needs-Feedback Needs-Review
Owner: dtapu...@chromium.org
Thank you for providing more feedback. Adding requester "dtapuska@chromium.org" for another review and adding "Needs-Review" label for tracking.

For more details visit https://sites.google.com/a/chromium.org/dev/issue-tracking/autotriage - Your friendly Sheriffbot
Owner: ----

Comment 7 by mmenke@chromium.org, Mar 30 2016

Could one of you provide a network log as well?  Instructions:  https://sites.google.com/a/chromium.org/dev/for-testers/providing-network-details

I don't think this is a network bug, but best to be sure.

Comment 8 by al...@saucelabs.com, Mar 30 2016

Hi!

Unfortunately the chrome://net-internals page doesn't load using the default profile, but I was able to start a Guest session and use net-internals to capture information from the default profile window (which I didn't know was possible before). Please see the attached :)
net-internals-log.json
547 KB View Download

Comment 9 by al...@saucelabs.com, Mar 30 2016

Also, if you would like to see this behavior first hand, we can still set you up with a VM to try out :)
Components: UI>Browser>Navigation
The fact that net-internals fails to load strongly indicates this is an issue starting up / setting up the child process.

The log you uploaded also supports that theory - the only network requests are for google.com, from the omnibox, as you typed "twitter.com", and one for http://hello/.  None for twitter, so the new process never made any network requests.
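
The same check (which hosts were actually requested) can be done programmatically on an exported log. This is only a sketch: it assumes the export is a JSON document in which request URLs appear as string values under a `"url"` key somewhere in the tree, and it walks the whole structure rather than relying on any particular schema. The function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def iter_urls(node):
    """Yield every string stored under a "url" key anywhere in the tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                yield value
            else:
                yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)


def hosts_requested(log):
    """Count requests per hostname in an already-parsed log object."""
    counts = Counter()
    for url in iter_urls(log):
        host = urlsplit(url).hostname
        if host:
            counts[host] += 1
    return counts
```

With `log = json.load(open("net-internals-log.json"))`, a host that never appears in `hosts_requested(log)` was never requested, which is the observation made above for twitter.com.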

I assume devtools shows the same behavior?

If you run Chrome with "--no-sandbox" or "--single-process", does it work?  You should never use either of these command line flags in production; this is purely for diagnostic purposes.
And thanks for the offer of a VM, and your helpfulness in general.  Someone may take you up on it, I'm just trying to determine who to pass this bug to.
You're very welcome! We really appreciate the time you are taking to look into this as it's a fairly big issue for us considering having Chrome 49 available is a big part of our product offering.

I tried out the --no-sandbox and --single-process switches both individually and together, and unfortunately the problem still persists.

You are right about devtools showing no network activity when I use the omnibox to try and request a new URL. I've attached a screenshot where I've asked it to get hello.com, and as you can see, nothing new showed on the network monitor.
Screen Shot 2016-03-30 at 1.39.21 PM.png
192 KB View Download
Status: Untriaged (was: Unconfirmed)
That's unexpected - I would have thought devtools would be blank and hanging, too, just like the webpage, since chrome pages are hanging.  Going to keep the navigation and blink labels, because that seems the place to start.

Comment 14 by creis@chromium.org, Mar 30 2016

Does chrome://settings work?  If the only chrome:// URL failing is the initial sign-in page, that's a special case that involves loading some content from the web.

I'm not sure what's special about the Getting Started process (and Guest profile), but it sounds like most other renderer processes are unable to make network requests for some reason.  Typing in a cross-site URL like cnn.com will normally start a new process, just like going to it in a different tab.  Clicking a link will normally stay in the same process, and that's working for you (at least in a process that is already working, like the Getting Started page).

You can confirm this by running with --process-per-tab and typing cnn.com into the omnibox in the Getting Started tab.  That won't do a process swap and it should work.  Conversely, you can run with --site-per-process and we'll do a process swap on link clicks as well, which I would expect to fail.

Unfortunately, I have no idea why network requests wouldn't work in most renderer processes.  This would just help narrow down the cause a bit.
chrome://net-internals/ is also apparently not working with the default profile.
Just to clarify, all chrome:// pages don't actually load when using the default profile (including chrome://settings).

Also, I decided to try a few more instances of opening up devtools, and it seems to only intermittently open up the devtools box.

I've attached another .gif showing chrome://settings not loading, even though I'm trying it in a tab that can load other web pages without problems (the .gif also shows devtools not opening).

Going to try the --process-per-tab switch and report back my findings.
settings.gif
2.7 MB View Download
One interesting finding that we had was that if the working "Getting Started" tab spawns other tabs through link clicks, those tabs seem to work without any problems. spawnedtabs.gif shows this in action.

@creis regarding the --process-per-tab and --site-per-process switches, the info I gathered is as follows:

1. --process-per-tab works as you expected with loading cnn.com in the working "Getting Started" tab (this pretty much behaves the same, as if I were not to use the switch)
2. Using --site-per-process, I was still able to load links, webpages and spawn new working tabs from the "Getting Started" tab (which, again is similar behavior to what would happen if I were not to use the switch). I attached the siteperprocess.gif where I show what I was doing with the --site-per-process switch on.
spawnedtabs.gif
2.5 MB View Download
siteperprocess.gif
4.3 MB View Download

Comment 18 by creis@chromium.org, Mar 30 2016

Sounds like my theory might be wrong.

Then again, you're only browsing google.com pages in all those videos, so we wouldn't be doing process swaps in any of them.  What happens if you actually go cross-site to cnn.com or chromium.org?

My theory was that any time you switch to a new process, it's unlikely to work.  You can verify whether a new process is created or not using Chrome's Task Manager and seeing if the Process ID for the tab changes.
I gave the --site-per-process thing another shot; this time trying to go to cnn.com and it does indeed hang on anything that isn't a google domain. I had the task manager open while doing this, and it doesn't appear to be able to spawn a new process for cnn.com. You can see this in siteperprocesscnn.gif 

I also included a gif of a --process-per-tab session as well for good measure.
siteperprocesscnn.gif
6.9 MB View Download
processpertab.gif
3.2 MB View Download
@creis in  issue 595968  which seems to be very related (especially since using the Guest Profile also fixes the issue for the OP) the OP mentions that he's on a laptop using a Core 2 Duo as does another poster in that thread. 

This is a little suspicious since in our VMs we're currently emulating Core 2 Duo architecture as well. Off the top of your head, do you know of any recent changes in 49 that might cause this sort of incompatibility?
Cc: mark@chromium.org a...@chromium.org nasko@chromium.org
I'm afraid I don't know enough in that area.  Adding Avi and Mark in case they have any ideas about these two bugs, and how Core 2 Duo on Mac might have somehow regressed in M49.  So far, it does sound like certain renderer processes end up getting stuck unable to navigate.
I appreciate all of your help so far in this anyways @creis :) Hoping we can figure out what's going on soon!

On our end, we're going to try cloning one of our 10.9 VMs and having it run on a Sandy Bridge emulation instead to see if that fixes the issue. Will report back if there's any progress.
Cc: rdsmith@chromium.org
 Issue 595968  has been merged into this issue.
More reports from  issue 595968  that users running into this are using a Core 2 Duo.  It could be a race that is more likely to surface when there are only two cores, I suppose.
Summary: Pages are randomly failing to load starting with Chrome 49 (was: Websites and chrome:// pages won't load on Mac unless using Guest profile in Chrome 49)
Updating title, to make issue more discoverable.
Owner: mark@chromium.org
Mark, could you help triage this?  Avi said you might know more about the lower levels where we're having trouble with Core 2 Duo users.

Comment 27 by mark@chromium.org, Apr 5 2016

Is this a VM-only problem or has anyone been able to reproduce on a bare machine?
@mark This is not only a VM problem. I originally reported it on my 2010 MacBook Air if you look at the other ticket that was just merged into this one.
My Late 2009 Mac Mini is affected by this bug. Not a VM.

Comment 30 by mark@chromium.org, Apr 5 2016

Got it, thanks. I think that comment 24 is the most likely explanation so far, rather than it being a problem with the Core 2 architecture itself.
We're actually only simulating 1 core in our VMs where this error occurs, so having two cores MIGHT not be the issue? 
Screen Shot 2016-04-05 at 1.23.19 PM.png
91.2 KB View Download
I apologize for stating earlier that we're using Core 2 Duos; we're actually using Core 2 Solos. That was my mistake.

Comment 33 by mark@chromium.org, Apr 5 2016

Can we get samples of all of the involved processes when you’re seeing this?
Hi Mark!

I've attached two .gifs displaying the task manager while I work through different tabs. This is what you were looking for, right?

The first run is in processes.gif, and I figured that it might help to include a .gif of a subsequent run where it wasn't doing the "first run" stuff in processes2.gif

I've offered this before, but if you like we can easily set you up with a trial account on our service so that you can see this behavior firsthand yourself and poke around on the VM.

processes.gif
7.6 MB View Download
processes2.gif
1.4 MB View Download
@mark just wanted to point out since I was the person who created the ticket that got merged into this one. The behavior I have been noticing (as well as the other people who commented on that ticket) is slightly different than the one described in this ticket.

Occasionally I have seen the new tab page fail to load any page or do anything at all like in the gif attached here, but more often random pages will hang and fail to load and spin for a while. They show up in the task manager, but never actually make any network requests to load the page. It definitely feels like a race condition since it seems completely random when it happens. 

During the time when it is stuck the stop and reload buttons do not work, but killing the task in the task manager allows you to reload it at which point it will work again.

It also applies to loading of extensions (certain extensions will randomly fail to load) when the browser is started up. After killing them and getting the message that they crashed and click to reload, they work again.

I am pretty sure that is the more common behavior people are seeing.

I can attach a GIF later to illustrate the behavior if you would like.

Comment 36 by mark@chromium.org, Apr 5 2016

No, I want you to run the “sample” tool and attach its output. You can use the Task Manager to find the relevant PIDs, and run “sample 1234 > sample_1234.txt” substituting the PID you’re interested in for 1234. Do this for each wedged process, and attach the captured samples here.
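
For several wedged processes, the command above can be repeated from a short script. This is a sketch that mirrors the `sample <pid> > sample_<pid>.txt` invocation as written in this comment; the helper names are hypothetical, and the PIDs must still be read from the Task Manager by hand.

```python
import subprocess


def sample_command(pid):
    """Build the argument list for `sample <pid>` as quoted in the comment."""
    return ["sample", str(pid)]


def output_name(pid):
    """File name matching the sample_1234.txt convention from the comment."""
    return f"sample_{pid}.txt"


def capture_samples(pids):
    """Run `sample` once per PID, redirecting stdout to sample_<pid>.txt."""
    for pid in pids:
        with open(output_name(pid), "w") as out:
            subprocess.run(sample_command(pid), stdout=out, check=True)
```

For example, `capture_samples([1234, 5678])` would run the tool twice and leave `sample_1234.txt` and `sample_5678.txt` in the current directory.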
Hi Mark!

I've attached three sample files that I obtained using the following steps:

1. Started up Google Chrome and opened Task Manager
2. Opened a new Tab
3. Started all three sample instances to capture the processes "Browser", "GPU Process" and the "new Tab" over a period of 120 seconds
4. Typed CNN.com in the new tab and hit enter (tab didn't respond as usual)

Hopefully these are what you're looking for, but if not please let me know what else I can grab for you.

Cheers!
tab.txt
51.7 KB View Download
gpu.txt
69.3 KB View Download
browser.txt
2.2 MB View Download

Comment 38 by mark@chromium.org, Apr 5 2016

Symbolized.
tab.sym.txt
54.2 KB View Download
browser.sym.txt
3.0 MB View Download

Comment 39 by mark@chromium.org, Apr 5 2016

Fixed symbolized gpu sample.
gpu.sym.txt
77.9 KB View Download

Comment 40 by wob...@gmail.com, Apr 6 2016

While I've starred this bug and 595968 for some time, I thought I had no additional information. Being on OS X 10.9.5, Intel Core 2 Duo (natively, not a VM), I also noticed pages randomly failing to load, suddenly starting with Chrome 49.

However, I just noticed that this also happens when I start Chrome anew and it loads the local start page listing the 8 most visited pages. I think nobody mentioned this before, but it looks like the very same issue. Sometimes this start page is shown; other times the window remains completely empty. No network access should be needed to display this page. Also, no spinner is shown in the tab, whereas a spinner is shown for web pages that fail to load. Everything else is identical to the failing web page loads. It sometimes works, it sometimes doesn't. Really strange and very annoying.
The exact same issue is happening to me (natively) on Macbook Pro Mid 2009 Core 2 Duo and 10.9.5. I have the same version of Chrome and Os on a newer Mac Mini 2012 or so (so a different architecture) and I don't have the issue there.
Just an update from our side: we managed to get Chrome 49 working on a test VM running OS X 10.9.5 by switching over to Sandy Bridge emulation rather than using the Core 2 Solo we were using before. 

This isn't really anything new, but I just thought it might be useful to have another data point that supports this issue being related to compatibility between the Core 2 architecture and Chrome 49.
Hi guys! I just confirmed that unfortunately this problem still exists on the new stable released today (50.0.2661.75). Is there currently still an effort to get this resolved? Thanks!
Yeah. I pointed out in the other ticket that it still was not working correctly in Canary (51 at the time).
Hi Mark,

In case additional samples could be helpful, I have attached samples from this happening on a non VM machine.
Google_Chrome_Helper_2016-04-19_002513_k6jk.sample.txt
48.7 KB View Download
Google_Chrome_Helper_2016-04-19_002634_ak9O.sample.txt
49.0 KB View Download

Comment 46 by dr.d...@gmail.com, Apr 19 2016

Hi,

I'm also seeing this bug since Chrome 49. I'm running OS X 10.11.4 on a Mid-2010 MacBook Pro, which uses a Core 2 Duo processor. Let me know if you need any further debug info; this issue is prevalent enough to be annoying, i.e. multiple times per hour. No specific URL is affected; I agree that it feels like a race condition. Closing the tab and reloading the URL usually fixes the issue on a per-incident basis.

Cheers,

Ian

Comment 47 by yosin@chromium.org, Apr 20 2016

Components: -Blink Blink>Loader
I just did a bisect of all of the builds between Chrome 48 and Chrome 49 to try to figure out where this broke. I have narrowed it down to a couple of commits.

Build 361868 works fine. 
Build 361871 does not work.
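
For anyone repeating this kind of search, the narrowing step is a plain binary search over an ordered list of builds. The sketch below is generic and makes assumptions: the build list is hypothetical, and `is_good` stands in for manually launching a build and checking whether pages load.

```python
def bisect_builds(builds, is_good):
    """Return (last_good, first_bad) for a sorted list of build numbers.

    Assumes builds[0] is good, builds[-1] is bad, and that every build
    after the first bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1
    if not is_good(builds[lo]) or is_good(builds[hi]):
        raise ValueError("need a good build at the start and a bad one at the end")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            lo = mid  # regression landed after builds[mid]
        else:
            hi = mid  # regression landed at or before builds[mid]
    return builds[lo], builds[hi]


# Hypothetical snapshot list; only 361868 and 361871 come from this comment.
SNAPSHOTS = [361850, 361860, 361868, 361871, 361880, 361890]
```

Because the failure is intermittent, a single clean run does not prove a build is good; `is_good` would likely need to repeat the check several times before returning `True`.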

I am fairly sure this is the commit that broke it

https://chromium.googlesource.com/chromium/src/+/8e53d1be210193655b682ab381c3377700705067

Looking at the notes in the commit:

> Speculative revert: 3 mac perf bots started timing out in random tests around the time this patch landed. I'll keep an eye on the waterfall and reland if things don't improve.

Then it was relanded because

> Relanding. Looks like the failures may have been unrelated after all.

I really hope someone from the Chromium team can now take a look at this since I spent hours narrowing this down for you :)

Owner: skyos...@chromium.org
Moving skyostil@ to owners. Really appreciate that you ran the bisect.
Cc: skyos...@chromium.org
Owner: erikc...@chromium.org
Status: Assigned (was: Untriaged)
Thanks for the bisect! Passing this over to erikchen@ who is the original author of that CL.
Just wanted to clarify that I was testing the behavior mentioned in  issue 595968 . Someone from saucelabs will have to verify my findings for this ticket since I don't have a way to test that, but it is likely the same issue :)
Will do, Craig! I still feel like our issues stem from the same problem, even though our symptoms seem to be slightly different.

Thanks for getting the ball rolling again on this. Great find! :)
Labels: -Pri-2 Pri-1
iamcraig: Please follow the steps in Comment 36 to get samples of "Browser" after it has wedged.

Both iamcraig and aliao: 

Follow the steps in https://www.chromium.org/developers/how-tos/trace-event-profiling-tool/recording-tracing-runs to record a tracing run. After you've started the browser, make sure that starting a trace is the first thing you do. Then cause the browser to wedge and stop the tracing run.

aliao: If you can give me access to a VM, I'll take a look directly as well.
@erikc My samples are attached in comment #45. I will try to do the tracing run later tonight.
iamcraig: Unfortunately, those are samples of the wrong process. Use Chrome's Task Manager to find out the PID of the "Browser" process and grab a sample of that. It should say "Google Chrome" instead of "Google Chrome Helper"
@erikc could that be related to the issue? I am almost certain that I copied the correct Process IDs from the Task Manager. Could it be that when the tabs are in this “uninitialized spinning” state they show up as Chrome Helper processes? 

I will try it again later tonight.
Let's try this:

Open your task manager and take a screenshot using cmd+shift+4. Then get samples for "Browser", whose name should be "Google Chrome" and a hung renderer, whose name should be "Google Chrome Helper".


Hi Erik!

I've attached a zip file containing the trace that you requested, along with a .gif showing what I was doing along with the trace. Please let me know if this isn't sufficient and if you need anything else from me.

I'm sending you an email right now with some instructions on getting you set up with a VM that exhibits this behavior so you can poke around yourself if you like! :)

Cheers! 
trace.zip
1.3 MB Download
trace.gif
2.2 MB View Download
@erikc Sorry I totally misread that you were asking for “Browser” samples. The instructions in comment 36 mentioned finding the pids for the individual stuck processes. I will get that and the tracing run for you later, but looks like aliao has some additional info in the meantime :)
Interesting observations:

The problem in aliao's VMs is definitely related to Mach shared memory. You can verify this by adding the command line flags for Chrome 49. Adding the first causes Chrome to break. Adding the second causes Chrome to work.
--force-fieldtrials=MacMemoryMechanism/Mach
--force-fieldtrials=MacMemoryMechanism/Posix
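
To switch between the two mechanisms repeatedly, the launch command can be built by a small helper. This is a sketch: the two flag values are the ones listed above, while the binary path and the function names are assumptions for illustration.

```python
import subprocess

MECHANISMS = ("Mach", "Posix")

# Assumed default install location; adjust for the machine under test.
CHROME_BINARY = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def field_trial_flag(mechanism):
    """Return the --force-fieldtrials flag for the given memory mechanism."""
    if mechanism not in MECHANISMS:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    return f"--force-fieldtrials=MacMemoryMechanism/{mechanism}"


def launch_args(mechanism, binary=CHROME_BINARY, extra=()):
    """Build the full argument list for launching Chrome with one mechanism."""
    return [binary, field_trial_flag(mechanism), *extra]


def launch(mechanism):
    """Start Chrome with the chosen mechanism and return the process handle."""
    return subprocess.Popen(launch_args(mechanism))
```

`launch("Mach")` would start the configuration reported to break, and `launch("Posix")` the one reported to work.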

iamcraig: Can you verify this with your non-VM machine?

The NTP doesn't appear to have any trouble loading. Furthermore, launching a site with "./out/Debug/Chromium.app/... wikipedia.org" also works. When it loads, navigating by clicking on links works fine. 

No new tab can perform any navigation. If you open a new tab, all navigations fail. If you then press the "X" button and wait, Chrome will put up the "Page not responding" dialog, which implies that the browser process is having trouble communicating with the new renderer.

I added logging around all direct uses of Mach ports (MapAt, Duplicate, Create, etc.) but everything seems to work fine.
@erikc I can confirm that when using --force-fieldtrials=MacMemoryMechanism/Posix everything works, and when using Mach, tabs occasionally hang.

I have attached the new samples you have requested. Unfortunately, I cannot get tracing to work. When I press the record button, nothing happens. It's supposed to bring up a dialog with options from what I understand. I don't get anything. No errors in the console either.
Google_Chrome_2016-04-20_214252_NysM.sample.txt
737 KB View Download
Google_Chrome_Helper_2016-04-20_214331_PzZH.sample.txt
48.7 KB View Download
I tried again and was able to grab a trace. For some reason the tracing behavior also seems to exhibit the same tendencies (clicking record and it hanging intermittently). Although it could potentially be related to  issue 604885 .
trace_chrome-hanging.json.gz
7.0 MB Download
Thanks for all the help, iamcraig and aliao. 

The problem is a race condition that is *almost* never hit, because it requires some very odd thread scheduling.
Thread A posts a task onto thread B, and is immediately interrupted. Thread B runs the task to completion, and then Thread A continues its task.

Working on a fix.
Cc: shrike@chromium.org
I'm going to merge the fix to M51, but we may want to consider using Finch to turn off Mach ports for M49 and M50. I'm inclined to do so.

shrike@, opinion?
To give slightly more details about the bug: We were connecting IPC channels, and then registering them with the attachment broker. 

This means that if a Mach port sent to a newly created process is waiting to be processed, and connecting the IPC channel pauses the current task immediately [instead of executing the next line of code, which would register the channel with the attachment broker], and the scheduler then switches to the newly posted task and processes it, all future IPC messages to the new process will be hosed, because we will have effectively lost an attachment brokering message.
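
The interleaving described here can be modelled without real threads. The following is a deterministic sketch, not Chromium code: all names are hypothetical, "thread B" is reduced to a task queue, and `preempted_after_connect` stands in for the scheduler interrupting thread A right after it connects the channel.

```python
def start_channel(register_first, preempted_after_connect):
    """Return True if the pending brokering message was handled.

    Thread A performs two steps: connect the channel (which posts a task
    to thread B) and register the channel with the broker. The task on
    thread B can only handle the pending message if the channel is
    already registered when it runs.
    """
    registered = set()    # channels known to the attachment broker
    thread_b_tasks = []   # tasks posted to thread B
    handled = []          # outcome of each brokering message

    def process_pending_message():  # runs on thread B
        handled.append("renderer" in registered)

    def connect():  # thread A: posts work to thread B
        thread_b_tasks.append(process_pending_message)

    def register():  # thread A: makes the channel known to the broker
        registered.add("renderer")

    def run_thread_b():
        while thread_b_tasks:
            thread_b_tasks.pop(0)()

    steps = [register, connect] if register_first else [connect, register]
    for step in steps:
        step()
        # Model thread A being interrupted immediately after connecting.
        if step is connect and preempted_after_connect:
            run_thread_b()
    run_thread_b()  # thread B eventually runs in every schedule
    return handled[0]
```

In this model only one combination loses the message: connecting first and being preempted before registration. Registering first is safe under either schedule.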
Cc: pinkerton@chromium.org
Seems like using Finch to prevent MacMemoryMechanism/Mach for M49 and M50 is the right thing. Can you provide a bit more info about this experiment/pointer to bug or design doc?

Labels: ReleaseBlock-Stable M-51
Thanks for the design doc links (very well written!).
Labels: -Needs-Review
Removing from triaging queue.
Yeah, i'm in favor of turning this off for M49 and M50 via finch, and having the fix in M51. Thanks for all the research!
Project Member

Comment 73 by bugdroid1@chromium.org, Apr 25 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9097190cafdecfb1b5796b65c83af92e861ccaf8

commit 9097190cafdecfb1b5796b65c83af92e861ccaf8
Author: erikchen <erikchen@chromium.org>
Date: Mon Apr 25 23:45:31 2016

IPC: Fix attachment brokering race condition.

A channel must be registered as a broker communication channel before it is
connected. When possible, invert the sequence of the call to connect a channel,
and the call to register the channel as a broker. In some cases, the channel
constructor and the channel initializer had to be separated, so that the
registration could happen in between.

This requirement is now enforced by a CHECK, which verifies that a channel
cannot be registered after it is connected.

BUG= 598088 

Review URL: https://codereview.chromium.org/1903663004

Cr-Commit-Position: refs/heads/master@{#389618}

[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/components/nacl/loader/nacl_listener.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/content/browser/renderer_host/render_process_host_impl.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/content/child/child_thread_impl.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/content/common/child_process_host_impl.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/attachment_broker_mac_unittest.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/attachment_broker_privileged_win_unittest.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/ipc_channel.h
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/ipc_channel_common.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/ipc_channel_nacl.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/ipc_channel_posix.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/ipc_channel_proxy.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/ipc_channel_proxy.h
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/ipc_channel_win.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/ipc_endpoint.h
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/ipc_test_base.h
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/ipc/mojo/ipc_channel_mojo.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/remoting/host/desktop_process.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/remoting/host/desktop_session_agent.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/remoting/host/ipc_util.h
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/remoting/host/ipc_util_posix.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/remoting/host/ipc_util_win.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/remoting/host/remoting_me2me_host.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/remoting/host/win/unprivileged_process_delegate.cc
[modify] https://crrev.com/9097190cafdecfb1b5796b65c83af92e861ccaf8/remoting/host/win/wts_session_process_delegate.cc
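
The ordering rule stated in the commit message above (register before connect, enforced by a CHECK) can be illustrated with a toy model. This is a sketch under assumptions: the class and method names are hypothetical and are not taken from the changed files, and a raised exception stands in for the CHECK.

```python
class Channel:
    """Toy IPC channel: construction and connection are separate steps."""

    def __init__(self, name):
        self.name = name
        self.connected = False

    def connect(self):
        self.connected = True


class AttachmentBroker:
    """Toy broker that refuses channels which are already connected."""

    def __init__(self):
        self._channels = []

    def register_communication_channel(self, channel):
        # Stand-in for the CHECK: registration after connect is a hard error.
        if channel.connected:
            raise RuntimeError("channel must be registered before it is connected")
        self._channels.append(channel)

    def is_registered(self, channel):
        return channel in self._channels


def start_channel(broker, name):
    """Construct, register, then connect, in that order."""
    channel = Channel(name)
    broker.register_communication_channel(channel)
    channel.connect()
    return channel
```

Splitting construction from connection is what leaves room for the registration call in between, which matches the commit's note about separating the channel constructor from the channel initializer.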

Labels: Merge-Request-51
Summary: Mach shared memory can cause renderer hang. (was: Pages are randomly failing to load starting with Chrome 49)
Transitioning to Mach shared memory fixed a common GPU hang. After turning off Mach shared memory, we saw a sudden spike (thousands) of GPU hangs:
https://bugs.chromium.org/p/chromium/issues/detail?id=560875

This may be worse than the very rare users who see a stuck renderer. So we may want to consider turning on Mach shared memory for M49 and M50 again...
Yikes!

If I'm following this correctly, the real fix for this issue is going to be put into M51, but the hotpatch to fix this in M50 and M49 was to have Chrome default to using Posix instead of Mach, and this is causing  issue 560875  to become worse?
That's right. Switching to Mach substantially reduced GPU hangs. 

Unfortunately, we don't have a good metric for the number of users who are seeing hung renderers so it's hard to make a call on this one. 

Comment 79 by tin...@google.com, Apr 26 2016

Labels: -Merge-Request-51 Merge-Approved-51 Hotlist-Merge-Approved
Your change meets the bar and is auto-approved for M51 (branch: 2704)
Note that the crash "spike" used to be default behavior (M47 and older)
so we're just delaying the milestone that fixes it.
Project Member

Comment 81 by bugdroid1@chromium.org, Apr 26 2016

Labels: -merge-approved-51 merge-merged-2704
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a132709b08846c8ac75bead8d97a0fec2cb30ace

commit a132709b08846c8ac75bead8d97a0fec2cb30ace
Author: erikchen <erikchen@chromium.org>
Date: Tue Apr 26 23:28:02 2016

IPC: Fix attachment brokering race condition.

A channel must be registered as a broker communication channel before it is
connected. When possible, invert the sequence of the call to connect a channel,
and the call to register the channel as a broker. In some cases, the channel
constructor and the channel initializer had to be separated, so that the
registration could happen in between.

This requirement is now enforced by a CHECK, which verifies that a channel
cannot be registered after it is connected.

BUG= 598088 

Review URL: https://codereview.chromium.org/1903663004

Cr-Commit-Position: refs/heads/master@{#389618}
(cherry picked from commit 9097190cafdecfb1b5796b65c83af92e861ccaf8)

Review URL: https://codereview.chromium.org/1917333002 .

Cr-Commit-Position: refs/branch-heads/2704@{#258}
Cr-Branched-From: 6e53600def8f60d8c632fadc70d7c1939ccea347-refs/heads/master@{#386251}

[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/components/nacl/loader/nacl_listener.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/content/browser/renderer_host/render_process_host_impl.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/content/child/child_thread_impl.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/content/common/child_process_host_impl.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/attachment_broker_mac_unittest.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/attachment_broker_privileged_win_unittest.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/ipc_channel.h
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/ipc_channel_common.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/ipc_channel_nacl.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/ipc_channel_posix.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/ipc_channel_proxy.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/ipc_channel_proxy.h
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/ipc_channel_win.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/ipc_endpoint.h
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/ipc_test_base.h
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/ipc/mojo/ipc_channel_mojo.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/remoting/host/desktop_process.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/remoting/host/desktop_session_agent.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/remoting/host/ipc_util.h
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/remoting/host/ipc_util_posix.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/remoting/host/ipc_util_win.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/remoting/host/remoting_me2me_host.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/remoting/host/win/unprivileged_process_delegate.cc
[modify] https://crrev.com/a132709b08846c8ac75bead8d97a0fec2cb30ace/remoting/host/win/wts_session_process_delegate.cc
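
The ordering requirement described in the commit message above (register the channel with the broker first, connect second, and reject registration after connection) can be sketched roughly as follows. This is a minimal illustration, not the code from the CL: `SketchChannel`, `SketchBroker`, and `CreateRegisterAndConnect` are hypothetical names, and a `bool` return value stands in for the CHECK so the late-registration case can be observed without aborting the process.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-in for an IPC channel. Construction and connection are
// separate steps so that registration can happen in between.
class SketchChannel {
 public:
  bool connected() const { return connected_; }
  void Connect() { connected_ = true; }

 private:
  bool connected_ = false;
};

// Hypothetical stand-in for the attachment broker.
class SketchBroker {
 public:
  // Refuses a channel that is already connected. The commit message says the
  // real code enforces this with a CHECK; this sketch returns false instead.
  bool RegisterCommunicationChannel(const SketchChannel* channel) {
    if (channel->connected()) {
      return false;
    }
    registered_.insert(channel);
    return true;
  }

  bool IsRegistered(const SketchChannel* channel) const {
    return registered_.count(channel) != 0;
  }

 private:
  std::unordered_set<const SketchChannel*> registered_;
};

// Register first, connect second, so the channel never processes messages
// while the broker is still unaware of it.
bool CreateRegisterAndConnect(SketchBroker& broker, SketchChannel& channel) {
  if (!broker.RegisterCommunicationChannel(&channel)) {
    return false;
  }
  assert(broker.IsRegistered(&channel));
  channel.Connect();
  return true;
}
```

Calling `CreateRegisterAndConnect` on a freshly constructed channel succeeds, while calling `RegisterCommunicationChannel` on a channel that has already been connected is rejected, which is the situation the CHECK is described as guarding against.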

Cc: erikc...@chromium.org gov...@chromium.org tinazh@chromium.org manoranj...@chromium.org
Issue 560875 has been merged into this issue.
Mach shared memory is fixed on M51+, and turned off via Finch on M50 and older.
Status: Fix (was: Assigned)
Status: Fixed (was: Fix)
Project Member

Comment 86 by bugdroid1@chromium.org, May 6 2016

Labels: merge-merged-2661
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/75ca2da31f0449be76215e6c4cddd371d2631e50

commit 75ca2da31f0449be76215e6c4cddd371d2631e50
Author: erikchen <erikchen@chromium.org>
Date: Fri May 06 17:43:17 2016

[Merge to M50] IPC: Fix attachment brokering race condition.

This CL is a merge of the safest and most important pieces of
https://codereview.chromium.org/1917333002/ to M50.

There is a race condition where an IPC::Channel can start processing messages
before it is registered as a communication channel with the attachment broker.
This CL reorders channel connection and registration.

BUG= 598088 , 609262
R=avi@chromium.org, mseaborn@chromium.org, tsepez@chromium.org

Review URL: https://codereview.chromium.org/1960513002 .

Cr-Commit-Position: refs/branch-heads/2661@{#660}
Cr-Branched-From: ef6f6ae5e4c96622286b563658d5cd62a6cf1197-refs/heads/master@{#378081}

[modify] https://crrev.com/75ca2da31f0449be76215e6c4cddd371d2631e50/components/nacl/loader/nacl_listener.cc
[modify] https://crrev.com/75ca2da31f0449be76215e6c4cddd371d2631e50/content/browser/renderer_host/render_process_host_impl.cc
[modify] https://crrev.com/75ca2da31f0449be76215e6c4cddd371d2631e50/content/child/child_thread_impl.cc
[modify] https://crrev.com/75ca2da31f0449be76215e6c4cddd371d2631e50/content/common/child_process_host_impl.cc
[modify] https://crrev.com/75ca2da31f0449be76215e6c4cddd371d2631e50/ipc/ipc_channel_proxy.h

erikchen@ - the patch in c#73 doesn't seem to touch any files that are directly related to the Mac. But given that you landed it here and marked the bug as fixed I guess that change does affect the Mac? Does it also prevent a similar race condition on Windows and Linux?

It affects a race condition on Windows and Mac when brokering HANDLEs and Mach ports, respectively. Note that this mostly affects releases around M51, since MojoChannel has been turned on for most channels in tip of tree.
I'm hunting a regression, and this change (landed in beta) stands out the most. But can you tell me more about the Mojo stuff? I guess it handles the attachment brokering so that a regression caused by this change might no longer exist? Is MojoChannel turned on on the Mac?

It's really unlikely that the change in c#73 causes a performance regression. Furthermore, almost all Chrome IPC channels are now layered on top of MojoChannels, including on Mac.
I'm hunting a regression in SessionRestore.ForegroundTabFirstLoaded in beta:

https://uma.googleplex.com/timeline_v2?sid=4277d44b761dc7f6fb8a15703790d74b

It appears to occur within this range of cls, and there's nothing else that stands out:

https://chromium.googlesource.com/chromium/src/+log/51.0.2704.22..51.0.2704.29?pretty=fuller&n=10000

When I look at where this change landed in Canary there is also a spike in SessionRestore.ForegroundTabFirstLoaded time.

I'm asking about Mojo because there's a chance that the metric improved when we switched to it/away from this patch. That would perhaps help confirm that this was the source of the regression.

rockot@ is the right person to talk to about the timing of MojoChannel changes.
