Chrome causes system to go out of memory after resuming from sleep
Reported by
darkbaha...@gmail.com,
Sep 10 2017
Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.9 Safari/537.36

Steps to reproduce the problem:
1. Set a hard commit limit (fixed page file size).
2. Sleep the computer for several hours with a Chrome session running.
3. Upon resume the system goes OOM and Windows terminates various processes to attempt recovery, leading to various application crashes.

What is the expected behavior?
System resumes in the state it went to sleep.

What went wrong?
My system runs a fixed commit limit of 21GB: 16GB physical RAM with a 5GB page file. On average it runs with about 11GB commit used on the desktop, with a large Chrome session (~70 tabs) and a couple of background applications.

As of 3192, when the system is put to sleep and then resumed some time later, Chrome appears to try allocating lots of memory to something, and as this system runs a fixed commit limit this causes the system to go OOM and various applications crash as a result. Usually a complete reboot is the only way to recover the system.

The amount of time the system is asleep is definitely a factor here. If I only sleep it for 1 hour then I notice a spike on the memory graph, but there is no OOM condition. If I sleep the system overnight (~10 hours sleep time) then the system is always OOM on resume, even though it had more than 10GB of commit free at the time the system went to sleep.

If Chrome is closed before the system goes to sleep then the issue doesn't occur and the system resumes normally. It only happens if Chrome is running when the system enters sleep.

Did this work before? Yes 62.0.3188.4

Chrome version: 62.0.3202.9 Channel: dev
OS Version: 10.0
Flash Version:

The issue has been occurring since 62.0.3192.0. Build 3188 appeared to work with no problems, so the issue seems to have come up in that release window.
,
Sep 12 2017
Tested this on Windows 10 with Chrome version 62.0.3202.9 using the test steps below: 1. Opened 20/30 tabs in 2 user profiles. 2. Put the system to sleep. This is being investigated; I will resume the laptop from sleep tomorrow and update the result. darkbahamut@: Could you please confirm whether any crash is encountered when the system is resumed from sleep? If yes, please attach the crash ID from chrome://crashes. Thank you!
,
Sep 12 2017
Hi! Thanks for taking a look. Yes, there are crashes if it's been asleep long enough to go OOM. Sometimes Chrome locks up and doesn't seem to generate a crash report for it, but it seems one was sent a couple of days ago. Uploaded Crash Report ID af4fb3e91ed7a302 (Local Crash ID: a08cfb60-e2b3-4b31-ba50-737b26ecfe00). Crash report captured on Saturday, September 9, 2017 at 9:01:40 PM, uploaded on Saturday, September 9, 2017 at 9:05:29 PM. I grabbed this screenshot of last night's memory usage. The system was asleep for 8 hours and 5 minutes and woken up this morning. The spike on the graph is the system waking up; the period before it is pre-sleep, and the period after it is the memory usage recovering. High CPU usage (one core under full load) was observed in the main browser process while the memory usage recovered (but the process's own memory usage was normal).
,
Sep 12 2017
Thank you for providing more feedback. Adding requester "sc00335628@techmahindra.com" to the cc list and removing "Needs-Feedback" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Sep 13 2017
@Reporter: Thank you for providing crash id. Updating appropriate labels for further triaging.
,
Sep 14 2017
@darkbahamut: The crash ID which you provided in comment #3 doesn't have meaningful stack trace information. Could you please update your Chrome to the latest dev #63.0.3213.3 and check whether you still face the issue? If so, please navigate to chrome://crashes and add a sample crash server ID for further action on this. This is being investigated on the latest dev #63.0.3213.3; I will resume the laptop from sleep after some time and update the behavior in a while. Thanks!!
,
Sep 14 2017
I'm not sure if my Chrome had updated to 63.0.3213.3 or not before I put the system to sleep last night (I think it had), but upon waking today the system went right back to OOM and Chrome was forcibly terminated by Windows. No crash log was generated, likely because it got killed by Windows rather than crashing. I'm not sure the crash logs will provide much information really, as when it does 'crash' it's only crashing because it runs out of address space. The real issue is whatever is happening to cause the huge spike in memory usage after sleep, which leads to the OOM if left long enough. From what I can see the memory usage seems to increase at roughly 900MB/hour while it's sleeping, so on this system more than about 10 hours asleep pushes it to OOM (but any time asleep is enough for the issue to occur).
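The arithmetic behind the "about 10 hours" figure can be sketched as follows. This is only a back-of-the-envelope check using the numbers quoted in the report (16GB RAM, 5GB page file, about 11GB commit in use, roughly 900MB/hour growth); the function name and the 1GB = 1024MB conversion are assumptions made for illustration.

```python
# Figures are taken from the report; the function name and the
# 1 GB = 1024 MB conversion are assumptions for this sketch.
PHYSICAL_RAM_GB = 16
PAGE_FILE_GB = 5
BASELINE_COMMIT_GB = 11     # typical commit charge before sleep
GROWTH_MB_PER_HOUR = 900    # growth observed per hour asleep


def hours_until_oom(ram_gb, page_file_gb, baseline_gb, growth_mb_per_hour):
    """Hours of sleep before the commit charge would reach a fixed limit."""
    commit_limit_gb = ram_gb + page_file_gb          # fixed page file => fixed limit
    headroom_mb = (commit_limit_gb - baseline_gb) * 1024
    return headroom_mb / growth_mb_per_hour


if __name__ == "__main__":
    hours = hours_until_oom(PHYSICAL_RAM_GB, PAGE_FILE_GB,
                            BASELINE_COMMIT_GB, GROWTH_MB_PER_HOUR)
    print(f"Estimated sleep time before OOM: {hours:.1f} hours")
```

With these inputs the estimate comes out a little over 11 hours, which is in the same range as the overnight sleeps that reliably trigger the OOM.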
,
Sep 14 2017
Thank you for providing more feedback. Adding requester "sandeepkumars@chromium.org" to the cc list and removing "Needs-Feedback" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Sep 15 2017
Unable to reproduce the issue on Win-10 using latest chrome dev version #63.0.3213.3 and the crash id: af4fb3e91ed7a302 (comment#3) doesn't show any meaningful stack trace. Hence, unable to proceed further with crash triaging. Removing the Needs-Bisect label as of now as it is not reproducible from TE-end. Please feel free to add the same if required. Thanks...!!
,
Sep 15 2017
This is an odd one for sure. I've done more testing on this machine and it's definitely Chrome related. I also tested on my tablet but it didn't occur there, so it appears to be setup dependent (which isn't going to make working out the cause very easy..). This is where I'm at so far with testing over the past 2 days.

Dev (63.0.3213.3) - Issue occurs
Canary (63.0.3215.0) - Issue occurs (clean install)
Stable (61.0.3163.79) - No issue (clean install)

All tested with the same tabs running. So a clean install does nothing, but as soon as I revert to a build from before 62.0.3192.0 the issue disappears. I've updated a 2nd desktop here to the latest dev now so will keep an eye on that. It's much closer in setup to this PC than the tablet was, so I'll see if that shows any signs of the same issue when it gets woken up later today.

I've attached 3 screenshots of the memory usage on wake-up from the 3 builds tested above. The filename says which channel and the time asleep. Both Dev and Canary show a visible bump in usage, followed by high CPU usage and high I/O on the Chrome main process. Stable is completely flat and shows nothing at all.
,
Sep 17 2017
Still getting this issue. Been trying to get more information on it. Taking a look at the elevated CPU usage after resuming, it appears the high CPU usage is in a message loop on the main browser process. One theory is that there might be a backlog of messages building up while the system is asleep, which causes the spike in system memory usage on wakeup, and the message loop thread then clears these out, which shows up as the recovery of the memory usage and the high CPU usage. As to why any of that is happening is still unclear though! Attached a stack trace of the message loop in question while the system was recovering memory usage. Not sure if it provides anything helpful, but more information can't hurt!
,
Oct 11 2017
I've spent some time trying to find the issue. After bisecting the builds I've found the commit which causes the issue. It's this: https://chromium.googlesource.com/chromium/src/+/7be7aceb006f81a3638bba4249c4367ddf659351 Without that commit the memory usage is fine; with it the spikes on resuming are observed. I'm just checking whether it's still present in the very latest Canary today to make sure it's not been fixed in the meantime, but the commit above is the source of the problems I observe.
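For readers unfamiliar with bisecting, the idea is a binary search over an ordered list of revisions between a known-good and a known-bad build. The sketch below is illustrative only: the revision labels and the `is_bad` check are hypothetical stand-ins (in this case the real check was building each revision, sleeping the machine, and looking for the memory spike on resume).

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first revision for which is_bad() is True.

    Assumes revisions[0] is known good, revisions[-1] is known bad,
    and that every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1    # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid                  # regression is at mid or earlier
        else:
            lo = mid                  # regression is after mid
    return revisions[hi]


if __name__ == "__main__":
    # Hypothetical revision labels; pretend the regression landed in r5.
    revisions = [f"r{i}" for i in range(8)]
    culprit = first_bad(revisions, lambda rev: int(rev[1:]) >= 5)
    print("First bad revision:", culprit)
```

Each test halves the remaining range, which matters here because a single test needs hours of sleep time to show the symptom.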
,
Oct 11 2017
Based on C#12 bisect assigning to the suspected CL owner/reviewers for more inputs and debugging of this. Tagging with M-62 RB-Stable for tracking, feel free to remove if this should not be blocking.
,
Oct 11 2017
Assigning to ojan@, as I think he took the lead on GRC. Feel free to dispatch to more appropriate person.
,
Oct 11 2017
assign to myself. This issue should be fixed in https://chromium-review.googlesource.com/c/chromium/src/+/627031
,
Oct 11 2017
@15 It looks like that commit has already been merged to master from what I can see. The issue is still present in every build since 62.0.3189 including Canary 63.0.3236.0 so I don't think that has resolved the issue unfortunately.
,
Oct 11 2017
Maybe let me try to disable the code you pointed to today, and let's see if that helps once Canary rolls out. :) CC the owner of related metrics collected by that code.
,
Oct 11 2017
,
Oct 11 2017
Add memory expert for insight :)
,
Oct 12 2017
Given that we have a bisect in c#12, and c#15 didn't fix the problem, please revert the CLs in question and investigate further.
,
Oct 16 2017
lpy@ This issue is marked as RB-Stable for M62. Could you please let us know whether there is any update on this issue? Thanks!
,
Oct 16 2017
+oysteine - this is a RB-S bug, that has a bisect pointing to a CL, see c#12. Is there any reason we can't revert and investigate further?
,
Oct 16 2017
I doubt the commit could just be reverted outright; it's two months old and has some base functionality that's also used for other features now. We could disable or Finch-gate the actual sending of the EQT from renderer_scheduler_impl.cc though. lpy, have you had any luck reproducing this?
,
Oct 16 2017
I don't have a Windows machine to reproduce it, but I can remove the code that collects EQT.
,
Oct 16 2017
#24: Great, yep given that this is a stable release blocker that's probably a good idea.
,
Oct 16 2017
I attempted to revert it myself yesterday but as mentioned, given the age there are quite a few conflicts so it's not so easy it seems. I tested the latest Canary last night (63.0.3239.6) and still getting the issue. Right now I'm on 64.0.3241.2 built from source so I'll confirm the issue is still present on this build tonight and I'll try disabling the EQT code tomorrow if so. I think it seems likely that should resolve it (I hope), assuming that won't cause any issues with anything else.
,
Oct 16 2017
Cool, I am going to remove the code here: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc?l=2002 The patch is https://chromium-review.googlesource.com/c/chromium/src/+/721613 Maybe you can try to patch it and compile a chrome and try it :)
,
Oct 16 2017
As discussed with erikchen@, we will continue with our m62 stable push tomorrow. We will monitor for this issue and include fix/revert in our next M62 Respin.
,
Oct 17 2017
,
Oct 17 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/de184fe0065ceca2a3c8e65f91cc7ac48b7b288c

commit de184fe0065ceca2a3c8e65f91cc7ac48b7b288c
Author: Peiyong Lin <lpy@chromium.org>
Date: Tue Oct 17 19:21:48 2017

Remove EQT plumbing to GRC.

The patch that added EQT plumbing to GRC was suspected as a cause of OOM crash when Windows was waken up, thus revert it speculatively.

BUG= 763710
Change-Id: I4695328752aa5c9a19f100dbeaaefd2afb1b350b
Reviewed-on: https://chromium-review.googlesource.com/721613
Commit-Queue: lpy <lpy@chromium.org>
Reviewed-by: Alexander Timin <altimin@chromium.org>
Reviewed-by: Timothy Dresser <tdresser@chromium.org>
Cr-Commit-Position: refs/heads/master@{#509483}

[modify] https://crrev.com/de184fe0065ceca2a3c8e65f91cc7ac48b7b288c/third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc
,
Oct 17 2017
Hello darkbahamut@, could you please help us verify the fix once the new Canary rolls out? :) Btw, could you please also share some reproduction steps with us?
,
Oct 18 2017
Hello! I gave this a test last night and today. I built a 64.0.3241.2 build and put the computer to sleep overnight. Memory issue observed upon waking. I applied the patch above to that branch this morning and then put the system back to sleep while I was at work (~9 hours), and the issue is no longer present. I tried it again for another 3.5 hours asleep and again no issues. I can confirm the patch resolves the issue on my end :)

Reproducing the issue is an odd one. On this machine I can reproduce it 100% as long as that code is present. It persists across all channels (beta/dev/canary) and even when building from source, so there is no issue with the installation. But on a 2nd machine here the issue is not present at all, so there must be something specific it doesn't like about this machine's setup.

At a bit of a guess, it looks like the intention of that code is to check something once per second. It looks like when the system is put to sleep and then resumed, the browser notices the time has changed, queues up all the work that should have happened while the system wasn't running, and tries to do it all at once instead of discarding the time the system was off. This causes the memory usage to spike, and the message loop goes to 100% CPU load to clear out the work, which is the recovery of the memory usage observed in the images above. The longer the system is asleep the worse it is. It's somewhere in the region of 900MB/hour usage on an 80 tab session. That's only a bit of a guess from the behaviour observed though.

Steps to reproduce on this machine are:
1. Chrome session open with ~80 tabs
2. Running any build with the EQT code/plumbing in place
3. Put system to sleep then wake up at a later point in time.

Upon wakeup memory/commit usage will have increased by 900MB/hour, and if the system has a fixed commit limit like mine then it is possible for the system to go OOM and for applications to crash on resume.

It's possible other machines are affected, but if they run a normal dynamic page file then that will grow to avoid the OOM condition, so it may not be as noticeable for some.
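The reporter's guess (a once-per-second task that "catches up" on every interval missed during sleep) can be illustrated with a small calculation. This is a sketch of the hypothesis only, not Chromium's actual scheduler code: the function names, the one-task-per-tab-per-second model, and the 1-second interval are all assumptions.

```python
def pending_ticks(last_run_s: float, now_s: float, interval_s: float = 1.0) -> int:
    """Ticks a catch-up style timer would queue for the elapsed wall-clock gap."""
    return max(0, int((now_s - last_run_s) // interval_s))


def backlog_after_sleep(sleep_hours: float, tabs: int) -> int:
    """Queued tasks on resume, assuming one task per tab per second (hypothetical)."""
    sleep_s = sleep_hours * 3600
    return tabs * pending_ticks(0.0, sleep_s)


if __name__ == "__main__":
    for hours in (1, 3.5, 10):
        print(f"{hours:>4} h asleep, 80 tabs -> {backlog_after_sleep(hours, 80):,} queued tasks")
```

Under these assumptions the backlog grows linearly with sleep time and tab count, which would be consistent with the roughly constant per-hour memory growth described above and with the single busy core draining the queue after resume.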
,
Oct 18 2017
cc Tim
,
Oct 18 2017
,
Oct 23 2017
,
Oct 24 2017
Was the fix in #30 merged to M62?
,
Oct 24 2017
The fix was not requested for M-62. Lpy@ can you confirm if this is ready for M62? Has this been well tested in lower level channels? Is it a safe merge overall?
,
Oct 24 2017
,
Oct 24 2017
This bug requires manual review: Request affecting a post-stable build Please contact the milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), bhthompson@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 24 2017
,
Oct 24 2017
As discussed with lpy@, it's a fairly rare scenario, I recommend we consider this for M63. Removing M62.
,
Oct 25 2017
Your change meets the bar and is auto-approved for M63. Please go ahead and merge the CL to branch 3239 manually. Please contact milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), gkihumba@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 26 2017
Please merge your change to M63 branch 3239 by 4:00 PM PT, today(Thursday). Thank you.
,
Oct 26 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a0771ddd35addd512f8ae728e4c48177c602d74b

commit a0771ddd35addd512f8ae728e4c48177c602d74b
Author: Peiyong Lin <lpy@chromium.org>
Date: Thu Oct 26 17:37:47 2017

Remove EQT plumbing to GRC.

The patch that added EQT plumbing to GRC was suspected as a cause of OOM crash when Windows was waken up, thus revert it speculatively.

BUG= 763710
Change-Id: I4695328752aa5c9a19f100dbeaaefd2afb1b350b
Reviewed-on: https://chromium-review.googlesource.com/721613
Commit-Queue: lpy <lpy@chromium.org>
Reviewed-by: Alexander Timin <altimin@chromium.org>
Reviewed-by: Timothy Dresser <tdresser@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#509483}
(cherry picked from commit de184fe0065ceca2a3c8e65f91cc7ac48b7b288c)
Reviewed-on: https://chromium-review.googlesource.com/739762
Reviewed-by: lpy <lpy@chromium.org>
Cr-Commit-Position: refs/branch-heads/3239@{#245}
Cr-Branched-From: adb61db19020ed8ecee5e91b1a0ea4c924ae2988-refs/heads/master@{#508578}

[modify] https://crrev.com/a0771ddd35addd512f8ae728e4c48177c602d74b/third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc
,
Oct 30 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/bc1a27994f49199f4f193489c5fe36dfb22a3057

commit bc1a27994f49199f4f193489c5fe36dfb22a3057
Author: Peiyong Lin <lpy@chromium.org>
Date: Mon Oct 30 18:49:22 2017

Re-add EQT plumbing to GRC.

In https://chromium-review.googlesource.com/c/chromium/src/+/737430, we introduced an approach to avoid calculating EQT for long idle period. Thus, we re-add EQT plumbing to GRC in this patch.

Previously it was removed in: https://chromium-review.googlesource.com/c/chromium/src/+/721613

BUG= 763710
Change-Id: I511f4ad6ea0f6e35d16d299248053e19f4834d21
Reviewed-on: https://chromium-review.googlesource.com/740013
Commit-Queue: lpy <lpy@chromium.org>
Reviewed-by: Alexander Timin <altimin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#512556}

[modify] https://crrev.com/bc1a27994f49199f4f193489c5fe36dfb22a3057/third_party/WebKit/Source/platform/scheduler/renderer/renderer_scheduler_impl.cc
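The commit message above mentions avoiding the EQT calculation for long idle periods. The general shape of such a guard might look like the sketch below. This is not the code from the referenced change: the function name, the 5-second threshold, and the choice to process a single tick after a long gap are hypothetical, chosen only to show the idea of discarding time spent suspended.

```python
MAX_GAP_S = 5.0  # hypothetical threshold; the real change may use a different rule


def ticks_to_process(last_run_s: float, now_s: float,
                     interval_s: float = 1.0,
                     max_gap_s: float = MAX_GAP_S) -> int:
    """Number of periodic steps to run after waking up.

    Short gaps are caught up normally. A gap longer than max_gap_s is
    treated as an idle or suspended period: the missed intervals are
    discarded and only one step is run for the current moment.
    """
    gap_s = now_s - last_run_s
    if gap_s > max_gap_s:
        return 1
    return max(0, int(gap_s // interval_s))


if __name__ == "__main__":
    print(ticks_to_process(0.0, 3.0))        # short delay: catch up
    print(ticks_to_process(0.0, 10 * 3600))  # overnight sleep: skip the gap
```

With a guard of this kind the work done on resume no longer scales with the length of the sleep, so the backlog described earlier in the thread would not build up.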
,
Oct 30 2017
Issue 779391 has been merged into this issue.
,
Oct 30 2017
[Bulk Edit] URGENT - PTAL. M63 Stable promotion is coming soon and your bug is labelled as Stable ReleaseBlock, pls make sure to land the fix and get it merged into the release branch ASAP. Thank you.
,
Nov 1 2017
Comment 1 by ligim...@chromium.org, Sep 11 2017
Components: Internals>Core
Labels: Needs-Triage-M62 Needs-Bisect