Issue metadata
Sign in to add a comment
|
Crash: ProcessSingleton::NotifyOtherProcessWithTimeout |
||||||||||||||||||||||
Issue descriptionCrash Signature: ProcessSingleton::NotifyOtherProcessWithTimeout Process Type: Browser Platform: Mac Channel: Canary Version: 59.0.3030.0 Distinct Clients: 8 CPM: 1.99 Crash Reports: 21 Median Uptime: startup Infected Clients: 0.0% Sample Reports: https://crash.corp.google.com/browse?q=reportid=%27139e699e20000000%27 https://crash.corp.google.com/browse?q=reportid=%27648846b300000000%27 https://crash.corp.google.com/browse?q=reportid=%2770550eb300000000%27 https://crash.corp.google.com/browse?q=reportid=%2784e226b300000000%27 https://crash.corp.google.com/browse?q=reportid=%27de2b93a480000000%27 Crash Link: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20AND%20product.version%3D%2759.0.3030.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27ProcessSingleton%3A%3ANotifyOtherProcessWithTimeout%27 Crash Link (with version impact distribution): https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27ProcessSingleton%3A%3ANotifyOtherProcessWithTimeout%27 Crash Stacktrace: EXC_BREAKPOINT (0x101c0711f) #0 0x101c0711f in ProcessSingleton::NotifyOtherProcessWithTimeout chrome/browser/process_singleton_posix.cc:223 #1 0x101c077bc in ProcessSingleton::NotifyOtherProcessWithTimeoutOrCreate chrome/browser/process_singleton_posix.cc:885 #2 0x101c07769 in ProcessSingleton::NotifyOtherProcessOrCreate chrome/browser/process_singleton_posix.cc:874 #3 0x1018b1839 in ChromeBrowserMainParts::PreMainMessageLoopRunImpl chrome/browser/chrome_browser_main.cc:1535 #4 0x1018b149d in ChromeBrowserMainParts::PreMainMessageLoopRun chrome/browser/chrome_browser_main.cc:1242 #5 0x100786742 in content::BrowserMainLoop::PreMainMessageLoopRun content/browser/browser_main_loop.cc:1166 #6 0x100ab2085 in content::StartupTaskRunner::RunAllTasksNow base/callback.h:85 #7 0x100784a21 in content::BrowserMainLoop::CreateStartupTasks content/browser/browser_main_loop.cc:974 #8 0x1007896d3 in content::BrowserMainRunnerImpl::Initialize content/browser/browser_main_runner.cc:125 #9 0x100782473 in content::BrowserMain content/browser/browser_main.cc:42 #10 0x10186bb4f in content::ContentMainRunnerImpl::Run content/app/content_main_runner.cc:836 #11 0x10186ae65 in content::ContentMain content/app/content_main.cc:20 #12 0x1002dda9e in ChromeMain chrome/app/chrome_main.cc:121 #13 0x100260d99 in Google Chrome Canary+0xd99 #14 0x7fffb212c254 in start #15 0x7fffb212c254 in start Reporter: ajha
,
Mar 6 2017
Users experienced this crash on the following builds: Mac Canary 59.0.3030.0 - 1.96 CPM, 21 reports, 8 clients (signature ProcessSingleton::NotifyOtherProcessWithTimeout) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Mar 6 2017
Issue 698620 has been merged into this issue.
,
Mar 7 2017
sky@: Could you please take a look at these crashes and confirm if the suspected CL is related. This has been amongst the top browser crash on Mac canary.
,
Mar 7 2017
It seems unlike this is related to my CL. My CL should only change before for chrome --mash (which doesn't run on Mac). It does touch a posix file, but only in code that installs signal handlers that run at shutdown, and only in response to friendly signals (not crash exceptions). Also, my code runs in PostMainMessageLoopRun() but the crash looks like it happens before that, in PreMessageLoopRun(). I didn't see anything obvious in the change list, though. :-(
,
Mar 7 2017
Looks like the crash existed in old milestones as well, the reports are same in M54 and M59. Unfortunately no changes in process_singleton_posix.cc since the last few months too. Below is the historical data of crash rate. 59.0.3033.0 7.69% 7 59.0.3032.0 26.37% 24 59.0.3030.0 24.18% 22 58.0.3029.3 3.30% 3 58.0.3029.0 18.68% 17 56.0.2924.87 5.49% 5 55.0.2883.95 6.59% 6 54.0.2840.98 5.49% 5 54.0.2840.71 2.20% 2 OS === 10.12 (Sierra) 60.44% 55 10.11 (El Capitan) 17.58% 16 OSX ??? - 0.0.0 13.19% 12 10.10 (Yosemite) 8.79% 8 Majority of the crashes are from Sierra. Only possible suspect from code search is - https://chromium.googlesource.com/chromium/src/+/a92250eb824c4f23ca2dab70d3029a52650591c1 , which is a pretty old CL.Assigning to Matt, please feel to unassign if its not yours. Also looping to Stability sheriff for further investigation.
,
Mar 7 2017
These are all hitting the max socket length check in process_singleton_posix.cc: CHECK(path.length() < arraysize(addr->sun_path)) Nothing has changed there recently, any correlation with chrome versions must be purely coincidental. Looking at some minidumps, it appears to be people with a non-standard TMPDIR variable that is longer than the default. So far they all look like "/Users/<FOO>/Library/Application Support/Slack/temp". I wonder if this occurs for people who launch chrome from Slack? (If chrome is already running beforehand, then clicking a link in slack would not trigger it.) There are some simple changes we could do to reduce the length of the parts of the socket path that we control, and make this harder to hit. Not sure if there is anything we could/should do about chrome getting launched from other apps that have set a custom TMPDIR variable.
,
Mar 7 2017
Note for that particular pattern of Slack TMPDIR: With canary would hit the CHECK with username of 9 or more chars, but with other channels would need a username of 16 or more chars. I suspect that is why it is showing up on Canary mostly.
,
Mar 8 2017
This CL recently landed to start consulting the use of $TMPDIR: https://codereview.chromium.org/2713293002/
,
Mar 10 2017
Just to update, Crashes are seen only on the canary channel builds and no crashes seen on the latest Dev(58.0.3029.6). From the regression range: =========================== https://chromium.googlesource.com/chromium/src/+log/58.0.3028.0..58.0.3029.0?pretty=fuller&n=10000 Could this https://codereview.chromium.org/2727003002 also be related change. cc'ing miguelg@ as well.
,
Mar 15 2017
,
Mar 15 2017
(stability sheriff here) It sounds as this fully breaks Chrome for some users. This is likely a case where the metric will "improve" on its own if we do nothing, which would be bad. #7 seems to say there are some simple things we could try?
,
Mar 15 2017
I'll remove the stability sheriff label as the issue is understood and being looked at.
,
Mar 15 2017
Re: #12, my suggestion in #7: So the socket path looks something like: $TMPDIR/<ScopedTempDir>/<socketname> Currently that works out to something like: $TMPDIR/.com.google.Chrome.canary.q87kKe/SingletonSocket So we can easily shave some length off by shortening the socket name. The ScopedTempDir name should also be possible to shorten though it doesn't currently have an API for that so would be a slightly more involved code change. These doesn't completely solve the issue but would make it less likely to be hit. I'll make a CL first for the socket name change since that is easy enough.
,
Mar 15 2017
Oh, wow, I'm just seeing this now >_<. I'll coordinate with mattm.
,
Mar 15 2017
Read the bug... the approach suggested in #14 sgtm, fwiw. Would it make sense to make ScopedTempDir used a fixed-size string for scope? Maybe a truncated hash of the app id?
,
Mar 15 2017
The other thing we could do would be to take the shorter of NSTemporaryDirectory() and $TMPDIR. That would fix the issue here, and continue to allow infra to manage a path for $TMPDIR (the plan it to have it be even shorter than NSTemporaryDirectory()). The downside is, of course, that this behavior would be a bit weirdo/unpredictable.
,
Mar 17 2017
,
Mar 17 2017
,
Mar 19 2017
Users experienced this crash on the following builds: Mac Canary 59.0.3045.0 - 1.53 CPM, 6 reports, 5 clients (signature ProcessSingleton::NotifyOtherProcessWithTimeout) Mac Beta 58.0.3029.19 - 0.10 CPM, 1 reports, 1 clients (signature ProcessSingleton::Create) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Mar 21 2017
Latest behaviour of crashes on latest channels are as below.This crash is still seeing on latest dev and beta channels. 59.0.3046.0 7.89% 28 59.0.3045.0 2.25% 8 59.0.3044.0 5.63% 20 59.0.3043.0 8.17% 29 59.0.3042.0 7.89% 28 Link to the list of builds https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27ProcessSingleton%3A%3ANotifyOtherProcessWithTimeout%27#samplereports:5,productversion:1000 Changing this to M-59 as no crashes seen on latest beta. iannucci@ Could you please look into this stable blocker issue. Thanks,
,
Mar 22 2017
Fwiw, on my CL thakis suggested an alternative approach: https://codereview.chromium.org/2751123003/#msg11 "- use CHROMIUM_TEMPDIR_FOR_BOTS if it's set - use NSTemporaryDirectory() if not as before - set CHROMIUM_TEMPDIR_FOR_BOTS on bots That way, we don't change behavior in prod and probably dodge a whole list of issues, and bots still get to override this dir."
,
Mar 24 2017
That sgtm
,
Mar 24 2017
Should I make that change?
,
Mar 24 2017
Commented on CL in #22
,
Mar 25 2017
CL to use different envvar: https://codereview.chromium.org/2770753006
,
Mar 31 2017
Hi, Any updates on this issue? I still can not use Canary.
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/16abc4a0f8b419fe03dcfba8799cae52f28993b1 commit 16abc4a0f8b419fe03dcfba8799cae52f28993b1 Author: iannucci <iannucci@chromium.org> Date: Mon Apr 03 20:04:32 2017 [base/files] Respect MAC_CHROMIUM_TMPDIR instead of TMPDIR on macOS. Some folks are currently setting $TMPDIR to very long values on macOS, which causes socket name length issues when using $TMPDIR. R=mattm@chromium.org BUG= 698759 , 696803 Review-Url: https://codereview.chromium.org/2770753006 Cr-Commit-Position: refs/heads/master@{#461515} [modify] https://crrev.com/16abc4a0f8b419fe03dcfba8799cae52f28993b1/base/files/file_util_mac.mm [modify] https://crrev.com/16abc4a0f8b419fe03dcfba8799cae52f28993b1/chrome/browser/extensions/test_extension_dir.cc
,
Apr 3 2017
Hey raja48a, the CL to change to the new envvar (and stop respecting $TMPDIR) has just landed on master, so it should show up in the next Canary, I suspect.
,
Apr 3 2017
Awesome, thanks so much!
,
Apr 10 2017
Can we confirm that this is fixed now?
,
Apr 11 2017
Crash server still shows few crashes on Mac canary. Last Mac canary(59.0.3067.0) reported 11 crashes from 7 clients. raja48a@: Could you please confirm if the canary now works for you?
,
Apr 11 2017
Downloading the latest Canary, while maintaining the old profile I have on disc, does not work. Canary keeps bouncing off the dock, but never opens. When I delete the entire "Chrome Canary" folder under ~/Library/Application Support/Google, Canary works.
,
Apr 11 2017
I wonder if there's some persistent gunk on machines which experienced this, then? With the change in 16abc4a0f8b419fe03dcfba8799cae52f28993b1 the behavior should be exactly equivalent to how it was before the original change.
,
Apr 11 2017
Oh, yeah :( When ProcessSingleton starts, if there is no running process, it creates the SingletonSocket symlink in the user-data-dir before attempting to create the actual socket file. If creating the socket hits the path length CHECK, the dangling symlink will be left there. If there already is a symlink there on startup, ProcessSingleton will try to connect to it to start a new window. Normally if there isn't actually a chrome running there it would delete the symlink and create a new one, but in this case it hits the max path length CHECK again before getting to that point. So the workaround is just to delete SingletonSocket symlink from your user data dir. There are at least two things that could be fixed in the code: 1. Avoid leaving the garbage symlink in the first place: Try to create the socket before creating the symlink to it. 2. Recover users stuck in that state: When trying to connect to an existing socket, if the path is too long, just delete it instead of failing on a CHECK(). I guess I can take that.
,
Apr 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/916edf164d5d8a60473dcdcf46e13ab0a646d7f3 commit 916edf164d5d8a60473dcdcf46e13ab0a646d7f3 Author: mattm <mattm@chromium.org> Date: Thu Apr 13 21:26:51 2017 ProcessSingletonPosix: don't CHECK if trying to connect to existing process with too long socket symlink target. Also don't leave a dangling socket symlink if startup fails because the tmpdir path is too long. BUG= 698759 Review-Url: https://codereview.chromium.org/2811173003 Cr-Commit-Position: refs/heads/master@{#464549} [modify] https://crrev.com/916edf164d5d8a60473dcdcf46e13ab0a646d7f3/chrome/browser/process_singleton_posix.cc [modify] https://crrev.com/916edf164d5d8a60473dcdcf46e13ab0a646d7f3/chrome/browser/process_singleton_posix_unittest.cc
,
Apr 18 2017
Just to update, fix first landed on Mac canary version: 59.0.3071.0 and apart from 60.0.3072.0(single digit crash), there have been no crashes seen on recent canary builds. Tracking this for few more canary before any conclusion or adding the verified label here.
,
Apr 25 2017
No crashes observed on chrome latest dev and canary channels. Last crash observed in M59 is #59.0.3070.0 with 8 instances and in M60 is #60.0.3072.0 with 1 instance. mattm@ Could you please confirm is this issue is fixed completely? If yes, can we add the verification label for this issue. Thanks!
,
Apr 25 2017
The crash on 60.0.3072.0 is different, it's failing on socket(). Possibly due to too many files open? Anyway, unrelated to this.
,
Apr 25 2017
[Auto-generated comment by a script] We noticed that this issue is targeted for M-59; it appears the fix may have landed after branch point, meaning a merge might be required. Please confirm if a merge is required here - if so add Merge-Request-59 label, otherwise remove Merge-TBD label. Thanks.
,
Apr 25 2017
Looks to me like it made the branch.
,
May 3 2017
Issue 717869 has been merged into this issue.
,
May 4 2017
This crash is still seeing on M58 as below.on latest stable 58.0.3029.96 so far seeing 236 from 114 different clients. 58.0.3029.96 14.30% 236 58.0.3029.81 43.45% 717 58.0.3029.54 0.06% 1 58.0.3029.41 0.30% 5 58.0.3029.3 0.18% 3 58.0.3029.0 1.03% 17 mattm@ could you please look into this issue. Thanks,
,
May 5 2017
Oh. Somehow I didn't notice 95fcf47ca01054ffe38e8609525a6ca8579b7824 was in M58 also :/ (Maybe because the bug was marked M-59?) I guess we should merge 16abc4a0f8b419fe03dcfba8799cae52f28993b1 and 916edf164d5d8a60473dcdcf46e13ab0a646d7f3 to M58.
,
May 5 2017
The crash seems to be fixed no reports after- 60.0.3072.0, currently 60.0.3090.0 is in production. Crash rate in latest M58 - 58.0.3029.96 ,28.46% 576 reports with 201 instances and ranks #2.Looks like we may need take this merge in case of any future respin.
,
May 5 2017
mattm@, are we having enough automation test coverage for CLs listed at #45 and will be absolutely safe merge to M58?
,
May 5 2017
Is "automation test" a distinct kind of test or just a general term? The ProcessSingletonPosix changes have reasonable unittest coverage.
,
May 8 2017
Thanks for confirming over chat mattm@. Confirmed it's a safe merge, with enough automated testing, and that this affects Linux and Mac. Approving merge for M58. + bernie@ as this may impact ChromeOS as well.
,
May 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/fc1487fed94d1b5989094220dbbbb80c05841b51 commit fc1487fed94d1b5989094220dbbbb80c05841b51 Author: Matt Mueller <mattm@chromium.org> Date: Mon May 08 21:56:46 2017 [base/files] Respect MAC_CHROMIUM_TMPDIR instead of TMPDIR on macOS. Some folks are currently setting $TMPDIR to very long values on macOS, which causes socket name length issues when using $TMPDIR. R=mattm@chromium.org BUG= 698759 , 696803 Review-Url: https://codereview.chromium.org/2770753006 Cr-Commit-Position: refs/heads/master@{#461515} (cherry picked from commit 16abc4a0f8b419fe03dcfba8799cae52f28993b1) Review-Url: https://codereview.chromium.org/2867943002 . Cr-Commit-Position: refs/branch-heads/3029@{#823} Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471} [modify] https://crrev.com/fc1487fed94d1b5989094220dbbbb80c05841b51/base/files/file_util_mac.mm [modify] https://crrev.com/fc1487fed94d1b5989094220dbbbb80c05841b51/chrome/browser/extensions/test_extension_dir.cc
,
May 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8bd8b3c759f5846a4c28cde8c97b55bd0a424724 commit 8bd8b3c759f5846a4c28cde8c97b55bd0a424724 Author: Matt Mueller <mattm@chromium.org> Date: Mon May 08 21:58:18 2017 ProcessSingletonPosix: don't CHECK if trying to connect to existing process with too long socket symlink target. Also don't leave a dangling socket symlink if startup fails because the tmpdir path is too long. BUG= 698759 Review-Url: https://codereview.chromium.org/2811173003 Cr-Commit-Position: refs/heads/master@{#464549} (cherry picked from commit 916edf164d5d8a60473dcdcf46e13ab0a646d7f3) Review-Url: https://codereview.chromium.org/2869933003 . Cr-Commit-Position: refs/branch-heads/3029@{#824} Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471} [modify] https://crrev.com/8bd8b3c759f5846a4c28cde8c97b55bd0a424724/chrome/browser/process_singleton_posix.cc [modify] https://crrev.com/8bd8b3c759f5846a4c28cde8c97b55bd0a424724/chrome/browser/process_singleton_posix_unittest.cc
,
May 8 2017
,
May 9 2017
This crash is fixed in latest M58 Stable - 58.0.3029.110 |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by ajha@chromium.org
, Mar 6 2017Labels: -Type-Bug ReleaseBlock-Stable M-58 OS-Mac Type-Bug-Regression
Owner: sky@chromium.org
Status: Assigned (was: Untriaged)