New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 698759 link

Starred by 7 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug-Regression



Sign in to add a comment

Crash: ProcessSingleton::NotifyOtherProcessWithTimeout

Project Member Reported by sheriffbot@chromium.org, Mar 6 2017

Issue description

Crash Signature: ProcessSingleton::NotifyOtherProcessWithTimeout
Process Type: Browser
Platform: Mac
Channel: Canary
Version: 59.0.3030.0
Distinct Clients: 8
CPM: 1.99
Crash Reports: 21
Median Uptime: startup
Infected Clients: 0.0%

Sample Reports:
https://crash.corp.google.com/browse?q=reportid=%27139e699e20000000%27
https://crash.corp.google.com/browse?q=reportid=%27648846b300000000%27
https://crash.corp.google.com/browse?q=reportid=%2770550eb300000000%27
https://crash.corp.google.com/browse?q=reportid=%2784e226b300000000%27
https://crash.corp.google.com/browse?q=reportid=%27de2b93a480000000%27

Crash Link:
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20AND%20product.version%3D%2759.0.3030.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27ProcessSingleton%3A%3ANotifyOtherProcessWithTimeout%27

Crash Link (with version impact distribution):
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27ProcessSingleton%3A%3ANotifyOtherProcessWithTimeout%27

Crash Stacktrace:
EXC_BREAKPOINT (0x101c0711f)
#0 0x101c0711f in ProcessSingleton::NotifyOtherProcessWithTimeout chrome/browser/process_singleton_posix.cc:223
#1 0x101c077bc in ProcessSingleton::NotifyOtherProcessWithTimeoutOrCreate chrome/browser/process_singleton_posix.cc:885
#2 0x101c07769 in ProcessSingleton::NotifyOtherProcessOrCreate chrome/browser/process_singleton_posix.cc:874
#3 0x1018b1839 in ChromeBrowserMainParts::PreMainMessageLoopRunImpl chrome/browser/chrome_browser_main.cc:1535
#4 0x1018b149d in ChromeBrowserMainParts::PreMainMessageLoopRun chrome/browser/chrome_browser_main.cc:1242
#5 0x100786742 in content::BrowserMainLoop::PreMainMessageLoopRun content/browser/browser_main_loop.cc:1166
#6 0x100ab2085 in content::StartupTaskRunner::RunAllTasksNow base/callback.h:85
#7 0x100784a21 in content::BrowserMainLoop::CreateStartupTasks content/browser/browser_main_loop.cc:974
#8 0x1007896d3 in content::BrowserMainRunnerImpl::Initialize content/browser/browser_main_runner.cc:125
#9 0x100782473 in content::BrowserMain content/browser/browser_main.cc:42
#10 0x10186bb4f in content::ContentMainRunnerImpl::Run content/app/content_main_runner.cc:836
#11 0x10186ae65 in content::ContentMain content/app/content_main.cc:20
#12 0x1002dda9e in ChromeMain chrome/app/chrome_main.cc:121
#13 0x100260d99 in Google Chrome Canary+0xd99 
#14 0x7fffb212c254 in start 
#15 0x7fffb212c254 in start 


Reporter: ajha

 

Comment 1 by ajha@chromium.org, Mar 6 2017

Cc: -ajha@google.com jamescook@chromium.org ajha@chromium.org
Labels: -Type-Bug ReleaseBlock-Stable M-58 OS-Mac Type-Bug-Regression
Owner: sky@chromium.org
Status: Assigned (was: Untriaged)
Crashes are seen on Mac and first spiked in M-58 from chrome version: 58.0.3029.0.

Considering below as the changelog:
===================================
https://chromium.googlesource.com/chromium/src/+log/58.0.3028.0..58.0.3029.0?pretty=fuller&n=10000

Suspecting: https://codereview.chromium.org/2729633002 from the above regression range.

As James@ is out on Sick leave, assigning to sky@ for help in further investigation of these crashes.
Project Member

Comment 2 by sheriffbot@chromium.org, Mar 6 2017

Labels: FoundIn-M-59
Users experienced this crash on the following builds:

Mac Canary 59.0.3030.0 -  1.96 CPM, 21 reports, 8 clients (signature ProcessSingleton::NotifyOtherProcessWithTimeout)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
 Issue 698620  has been merged into this issue.

Comment 4 by ajha@chromium.org, Mar 7 2017

Cc: roc...@chromium.org
Labels: -ReleaseBlock-Stable ReleaseBlock-Beta
sky@: Could you please take a look at these crashes and confirm if the suspected CL is related. This has been amongst the top browser crash on Mac canary.
It seems unlike this is related to my CL. My CL should only change before for chrome --mash (which doesn't run on Mac). It does touch a posix file, but only in code that installs signal handlers that run at shutdown, and only in response to friendly signals (not crash exceptions). Also, my code runs in PostMainMessageLoopRun() but the crash looks like it happens before that, in PreMessageLoopRun().

I didn't see anything obvious in the change list, though. :-(

Cc: ligim...@chromium.org
Components: Internals
Labels: -ReleaseBlock-Beta ReleaseBlock-Stable
Owner: mattm@chromium.org
Looks like the crash existed in old milestones as well, the reports are same in M54 and M59. Unfortunately no changes in process_singleton_posix.cc since the last few months too. Below is the historical data of crash rate. 

59.0.3033.0	7.69%	7	
59.0.3032.0	26.37%	24	
59.0.3030.0	24.18%	22	
58.0.3029.3	3.30%	3	
58.0.3029.0	18.68%	17	
56.0.2924.87	5.49%	5	
55.0.2883.95	6.59%	6	
54.0.2840.98	5.49%	5	
54.0.2840.71	2.20%	2	

OS
===

10.12 (Sierra)	        60.44%	55	
10.11 (El Capitan)	17.58%	16	
OSX ??? - 0.0.0	        13.19%	12	
10.10 (Yosemite)	8.79%	8	

Majority of the crashes are from Sierra.

Only possible suspect from code search is - https://chromium.googlesource.com/chromium/src/+/a92250eb824c4f23ca2dab70d3029a52650591c1 , which is a pretty old CL.Assigning to Matt, please feel to unassign if its not yours.

Also looping to Stability sheriff for further investigation.

Comment 7 by mattm@chromium.org, Mar 7 2017

These are all hitting the max socket length check in process_singleton_posix.cc: CHECK(path.length() < arraysize(addr->sun_path))
Nothing has changed there recently, any correlation with chrome versions must be purely coincidental.
Looking at some minidumps, it appears to be people with a non-standard TMPDIR variable that is longer than the default.
So far they all look like "/Users/<FOO>/Library/Application Support/Slack/temp". I wonder if this occurs for people who launch chrome from Slack? (If chrome is already running beforehand, then clicking a link in slack would not trigger it.)

There are some simple changes we could do to reduce the length of the parts of the socket path that we control, and make this harder to hit.

Not sure if there is anything we could/should do about chrome getting launched from other apps that have set a custom TMPDIR variable.

Comment 8 by mattm@chromium.org, Mar 7 2017

Note for that particular pattern of Slack TMPDIR:
With canary would hit the CHECK with username of 9 or more chars, but with other channels would need a username of 16 or more chars. I suspect that is why it is showing up on Canary mostly.
Cc: mark@chromium.org mattm@chromium.org
Owner: iannucci@chromium.org
This CL recently landed to start consulting the use of $TMPDIR: https://codereview.chromium.org/2713293002/

Comment 10 by ajha@chromium.org, Mar 10 2017

Cc: miguelg@chromium.org
Just to update, Crashes are seen only on the canary channel builds and no crashes seen on the latest Dev(58.0.3029.6).

From the regression range:
===========================
https://chromium.googlesource.com/chromium/src/+log/58.0.3028.0..58.0.3029.0?pretty=fuller&n=10000

Could this https://codereview.chromium.org/2727003002 also be related change. cc'ing miguelg@ as well. 

Comment 11 by ajha@chromium.org, Mar 15 2017

Labels: Stability-Sheriff-Desktop
(stability sheriff here)

It sounds as this fully breaks Chrome for some users. This is likely a case where the metric will "improve" on its own if we do nothing, which would be bad. 

#7 seems to say there are some simple things we could try?
Labels: -Stability-Sheriff-Desktop
I'll remove the stability sheriff label as the issue is understood and being looked at.

Comment 14 by mattm@chromium.org, Mar 15 2017

Re: #12, my suggestion in #7:

So the socket path looks something like:
$TMPDIR/<ScopedTempDir>/<socketname>

Currently that works out to something like:
$TMPDIR/.com.google.Chrome.canary.q87kKe/SingletonSocket

So we can easily shave some length off by shortening the socket name. The ScopedTempDir name should also be possible to shorten though it doesn't currently have an API for that so would be a slightly more involved code change. These doesn't completely solve the issue but would make it less likely to be hit.

I'll make a CL first for the socket name change since that is easy enough.
Oh, wow, I'm just seeing this now >_<.

I'll coordinate with mattm.
Read the bug... the approach suggested in #14 sgtm, fwiw.

Would it make sense to make ScopedTempDir used a fixed-size string for scope? Maybe a truncated hash of the app id?
The other thing we could do would be to take the shorter of NSTemporaryDirectory() and $TMPDIR. That would fix the issue here, and continue to allow infra to manage a path for $TMPDIR (the plan it to have it be even shorter than NSTemporaryDirectory()).

The downside is, of course, that this behavior would be a bit weirdo/unpredictable.
Labels: -Restrict-View-EditIssue
Cc: rsesek@chromium.org
 Issue 702345  has been merged into this issue.
Project Member

Comment 20 by sheriffbot@chromium.org, Mar 19 2017

Labels: FoundIn-M-58
Users experienced this crash on the following builds:

Mac Canary 59.0.3045.0 -  1.53 CPM, 6 reports, 5 clients (signature ProcessSingleton::NotifyOtherProcessWithTimeout)
Mac Beta 58.0.3029.19 -  0.10 CPM, 1 reports, 1 clients (signature ProcessSingleton::Create)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Labels: -M-58 M-59
Latest behaviour of crashes on latest channels are as below.This crash is still seeing on latest dev and beta channels.

59.0.3046.0	7.89%	28	
59.0.3045.0	2.25%	8	
59.0.3044.0	5.63%	20	
59.0.3043.0	8.17%	29	
59.0.3042.0	7.89%	28	

Link to the list of builds
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27ProcessSingleton%3A%3ANotifyOtherProcessWithTimeout%27#samplereports:5,productversion:1000

Changing this to M-59 as no crashes seen on latest beta.

iannucci@ Could you please look into this stable blocker issue.

Thanks,

Comment 22 by mattm@chromium.org, Mar 22 2017

Fwiw, on my CL thakis suggested an alternative approach: https://codereview.chromium.org/2751123003/#msg11

"- use CHROMIUM_TEMPDIR_FOR_BOTS if it's set
- use NSTemporaryDirectory() if not as before
- set CHROMIUM_TEMPDIR_FOR_BOTS on bots

That way, we don't change behavior in prod and probably dodge a whole list of
issues, and bots still get to override this dir."

That sgtm
Should I make that change?
Commented on CL in #22
CL to use different envvar: https://codereview.chromium.org/2770753006

Comment 27 by raja...@gmail.com, Mar 31 2017

Hi,
Any updates on this issue? I still can not use Canary.
Project Member

Comment 28 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/16abc4a0f8b419fe03dcfba8799cae52f28993b1

commit 16abc4a0f8b419fe03dcfba8799cae52f28993b1
Author: iannucci <iannucci@chromium.org>
Date: Mon Apr 03 20:04:32 2017

[base/files] Respect MAC_CHROMIUM_TMPDIR instead of TMPDIR on macOS.

Some folks are currently setting $TMPDIR to very long values on macOS,
which causes socket name length issues when using $TMPDIR.

R=mattm@chromium.org
BUG= 698759 ,  696803 

Review-Url: https://codereview.chromium.org/2770753006
Cr-Commit-Position: refs/heads/master@{#461515}

[modify] https://crrev.com/16abc4a0f8b419fe03dcfba8799cae52f28993b1/base/files/file_util_mac.mm
[modify] https://crrev.com/16abc4a0f8b419fe03dcfba8799cae52f28993b1/chrome/browser/extensions/test_extension_dir.cc

Hey raja48a, the CL to change to the new envvar (and stop respecting $TMPDIR) has just landed on master, so it should show up in the next Canary, I suspect.

Comment 30 by raja...@gmail.com, Apr 3 2017

Awesome, thanks so much!
Can we confirm that this is fixed now?

Comment 32 by ajha@chromium.org, Apr 11 2017

Labels: Needs-Feedback
Crash server still shows few crashes on Mac canary. Last Mac canary(59.0.3067.0) reported 11 crashes from 7 clients.

raja48a@: Could you please confirm if the canary now works for you?


Comment 33 by raja...@gmail.com, Apr 11 2017

Downloading the latest Canary, while maintaining the old profile I have on disc, does not work. Canary keeps bouncing off the dock, but never opens. When I delete the entire  "Chrome Canary" folder under ~/Library/Application Support/Google, Canary works.
I wonder if there's some persistent gunk on machines which experienced this, then?

With the change in 16abc4a0f8b419fe03dcfba8799cae52f28993b1 the behavior should be exactly equivalent to how it was before the original change.

Comment 35 by mattm@chromium.org, Apr 11 2017

Cc: -mattm@chromium.org iannucci@chromium.org
Owner: mattm@chromium.org
Oh, yeah :(

When ProcessSingleton starts, if there is no running process, it creates the SingletonSocket symlink in the user-data-dir before attempting to create the actual socket file. If creating the socket hits the path length CHECK, the dangling symlink will be left there.

If there already is a symlink there on startup, ProcessSingleton will try to connect to it to start a new window. Normally if there isn't actually a chrome running there it would delete the symlink and create a new one, but in this case it hits the max path length CHECK again before getting to that point.

So the workaround is just to delete SingletonSocket symlink from your user data dir.

There are at least two things that could be fixed in the code:
1. Avoid leaving the garbage symlink in the first place: Try to create the socket before creating the symlink to it.
2. Recover users stuck in that state: When trying to connect to an existing socket, if the path is too long, just delete it instead of failing on a CHECK().

I guess I can take that.
Project Member

Comment 36 by bugdroid1@chromium.org, Apr 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/916edf164d5d8a60473dcdcf46e13ab0a646d7f3

commit 916edf164d5d8a60473dcdcf46e13ab0a646d7f3
Author: mattm <mattm@chromium.org>
Date: Thu Apr 13 21:26:51 2017

ProcessSingletonPosix: don't CHECK if trying to connect to existing process with too long socket symlink target.

Also don't leave a dangling socket symlink if startup fails because the tmpdir path is too long.

BUG= 698759 

Review-Url: https://codereview.chromium.org/2811173003
Cr-Commit-Position: refs/heads/master@{#464549}

[modify] https://crrev.com/916edf164d5d8a60473dcdcf46e13ab0a646d7f3/chrome/browser/process_singleton_posix.cc
[modify] https://crrev.com/916edf164d5d8a60473dcdcf46e13ab0a646d7f3/chrome/browser/process_singleton_posix_unittest.cc

Comment 37 by ajha@chromium.org, Apr 18 2017

Just to update, fix first landed on Mac canary version: 59.0.3071.0 and apart from 60.0.3072.0(single digit crash), there have been no crashes seen on recent canary builds.

Tracking this for few more canary before any conclusion or adding the verified label here.
Labels: TE-Verified-59.0.3071.15 TE-Verified-M59
No crashes observed on chrome latest dev and canary channels. Last crash observed in M59 is #59.0.3070.0 with 8 instances and in M60 is #60.0.3072.0 with 1 instance.

mattm@ Could you please confirm is this issue is fixed completely? If yes, can we add the verification label for this issue.

Thanks!

Comment 39 Deleted

Comment 40 by mattm@chromium.org, Apr 25 2017

Status: Fixed (was: Assigned)
The crash on 60.0.3072.0 is different, it's failing on socket(). Possibly due to too many files open? Anyway, unrelated to this.
Labels: Merge-TBD
[Auto-generated comment by a script] We noticed that this issue is targeted for M-59; it appears the fix may have landed after branch point, meaning a merge might be required. Please confirm if a merge is required here - if so add Merge-Request-59 label, otherwise remove Merge-TBD label. Thanks.

Comment 42 by mattm@chromium.org, Apr 25 2017

Labels: -Merge-TBD
Looks to me like it made the branch.
Issue 717869 has been merged into this issue.
This crash is still seeing on M58 as below.on latest stable 58.0.3029.96 so far seeing 236 from 114 different clients.

58.0.3029.96	14.30%	236	
58.0.3029.81	43.45%	717	
58.0.3029.54	0.06%	1	
58.0.3029.41	0.30%	5	
58.0.3029.3	0.18%	3	
58.0.3029.0	1.03%	17	

mattm@ could you please look into this issue.

Thanks,
Labels: -M-59 Merge-Request-58 M-58
Status: Assigned (was: Fixed)
Oh. Somehow I didn't notice 95fcf47ca01054ffe38e8609525a6ca8579b7824 was in M58 also :/  (Maybe because the bug was marked M-59?)

I guess we should merge 16abc4a0f8b419fe03dcfba8799cae52f28993b1 and 916edf164d5d8a60473dcdcf46e13ab0a646d7f3 to M58.
Labels: -Needs-Feedback
The crash seems to be fixed no reports after- 60.0.3072.0, currently 60.0.3090.0 is in production.

Crash rate in latest M58 - 58.0.3029.96	,28.46%	576 reports with 201 instances and ranks #2.Looks like we may need take this merge in case of any future 
respin.
mattm@, are we having enough automation test coverage for CLs listed at #45 and will be absolutely safe merge to M58?
Is "automation test" a distinct kind of test or just a general term?
The ProcessSingletonPosix changes have reasonable unittest coverage.

Cc: bhthompson@chromium.org
Labels: -Merge-Request-58 Merge-Approved-58
Thanks for confirming over chat mattm@. Confirmed it's a safe merge, with enough automated testing, and that this affects Linux and Mac. Approving merge for M58. 

+ bernie@ as this may impact ChromeOS as well. 
Project Member

Comment 50 by bugdroid1@chromium.org, May 8 2017

Labels: -merge-approved-58 merge-merged-3029
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fc1487fed94d1b5989094220dbbbb80c05841b51

commit fc1487fed94d1b5989094220dbbbb80c05841b51
Author: Matt Mueller <mattm@chromium.org>
Date: Mon May 08 21:56:46 2017

[base/files] Respect MAC_CHROMIUM_TMPDIR instead of TMPDIR on macOS.

Some folks are currently setting $TMPDIR to very long values on macOS,
which causes socket name length issues when using $TMPDIR.

R=mattm@chromium.org
BUG= 698759 ,  696803 

Review-Url: https://codereview.chromium.org/2770753006
Cr-Commit-Position: refs/heads/master@{#461515}
(cherry picked from commit 16abc4a0f8b419fe03dcfba8799cae52f28993b1)

Review-Url: https://codereview.chromium.org/2867943002 .
Cr-Commit-Position: refs/branch-heads/3029@{#823}
Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471}

[modify] https://crrev.com/fc1487fed94d1b5989094220dbbbb80c05841b51/base/files/file_util_mac.mm
[modify] https://crrev.com/fc1487fed94d1b5989094220dbbbb80c05841b51/chrome/browser/extensions/test_extension_dir.cc

Project Member

Comment 51 by bugdroid1@chromium.org, May 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8bd8b3c759f5846a4c28cde8c97b55bd0a424724

commit 8bd8b3c759f5846a4c28cde8c97b55bd0a424724
Author: Matt Mueller <mattm@chromium.org>
Date: Mon May 08 21:58:18 2017

ProcessSingletonPosix: don't CHECK if trying to connect to existing process with too long socket symlink target.

Also don't leave a dangling socket symlink if startup fails because the tmpdir path is too long.

BUG= 698759 

Review-Url: https://codereview.chromium.org/2811173003
Cr-Commit-Position: refs/heads/master@{#464549}
(cherry picked from commit 916edf164d5d8a60473dcdcf46e13ab0a646d7f3)

Review-Url: https://codereview.chromium.org/2869933003 .
Cr-Commit-Position: refs/branch-heads/3029@{#824}
Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471}

[modify] https://crrev.com/8bd8b3c759f5846a4c28cde8c97b55bd0a424724/chrome/browser/process_singleton_posix.cc
[modify] https://crrev.com/8bd8b3c759f5846a4c28cde8c97b55bd0a424724/chrome/browser/process_singleton_posix_unittest.cc

Status: Fixed (was: Assigned)
This crash is fixed in latest M58 Stable - 58.0.3029.110

Sign in to add a comment