Large virtual memory usage causing crash in Chrome and Chromium
Reported by
larskuls...@gmail.com,
May 15 2018
|
|||||||||||||||
Issue descriptionChrome Version (from the about:version page): 66.0.3359.170 (Official Build) (64-bit) Is this the most recent version: yes OS + version: Linux Mint 18.3 CPU architecture (32-bit / 64-bit): 64 bit Window manager: Xfce URLs (if relevant): https://bvaughn.github.io/redux-search/ Behavior in Linux Chrome and Chromium: I am seeing large VIRT-memory usage on this demo when using Chrome or Chromium on Linux: https://bvaughn.github.io/redux-search/ Each time you click "Generate Data", about 1 GB of additional VIRT memory is consumed, and the tab crashes when it reaches 16 GB on a few machines I've tried. According to the Chrome task manager, the memory usage of the tab is orders of magnitude less, but I think this is of little help since the problem seems to affect virtual memory. This does not happen in Firefox or Chrome/Firefox in Windows. I have tried this on Chrome 66.0.3359.170 and Chromium 66.0.3359.139 on Linux Mint 18.3, and Chrome 65.0.3325.181 on CentOS 7.4.1708 Behavior in Linux Firefox and Windows Chrome: Memory usage virt/res grows, but not in the manner that Chrome/Chromium in Linux does. What steps will reproduce the problem? (1) Visit https://bvaughn.github.io/redux-search/ (2) Watch memory usage using htop or similar tool (3) Each click on "Generate Data" will grow the VIRT-memory by about 1 GB, until the tab crashes. Took about 14 clikcs in my case. What is the expected result? Memory usage should perhaps grow, but not this fast. Firefox and Chrome on Windows will eventually grow to this point, but after a very long time. What happens instead? Each time you click "Generate Data", about 1 GB of additional VIRT memory is consumed, and the tab crashes when it reaches 16 GB on a few machines I've tried. Here is a gif of what I see (please excuse the framerate as the machine starts struggling as I click): https://user-images.githubusercontent.com/827702/39926823-afb9b210-5530-11e8-833e-9551578b66b7.gif
,
May 15 2018
What type of storage in the machine? HDD or SSD!
,
May 15 2018
,
May 15 2018
n my case, there is no swapping at all, and I think it may be unrelated to this issue.
,
May 15 2018
,
May 15 2018
I managed to make some code that will reproduce the problem. I'm not sure if this is minimal or not, but it removes the need for the external library.
////// JAVASCRIPT
/*
* Have a web worker loaded from a string.
* Make a couple of instances of this in a class
* Every so often overwrite the instances with new ones.
*/
// Worker
var SearchWorkerStr = function() {
var alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
var chars = "a";
for (let k = 0; k < 10; k++) {
chars += chars+chars;
}
onmessage = (ev) => {
if (ev.data === "search") {
console.log("searching...");
// Generate some result
let fakeRes = [];
for (let i = 0; i < 100; i++) {
let id = ""
for (var j = 0; j < 37; j++) {
id += alpha.charAt(Math.floor(Math.random() * alpha.length));
}
fakeRes.push({
id: id,
name: "Bob",
title: "Sales"
});
}
// Send to browser
postMessage({
"results": fakeRes,
"id": "unique"
});
}
};
}.toString();
class Searcher {
constructor(WorkerStr) {
// blob stuff, load worker from string
SearchWorker = WorkerStr.slice(WorkerStr.indexOf('{') + 1, -1).trim();
var blob = new Blob([SearchWorker])
var blobUrl = window.URL.createObjectURL(blob)
this._worker = new Worker(blobUrl)
this._worker.onmessage = function(event) {
// Result
console.log("result:", event.data.results.length);
}
}
doWork = (str) => {
this._worker.postMessage(str);
}
}
class Parent {
constructor() {
this.registry = {};
}
addSearcher(name){
this.registry[name] = new Searcher(SearchWorkerStr);
}
}
// Browser
let pInst = new Parent();
// Click
setInterval(() => {
pInst.addSearcher("a")
pInst.addSearcher("b")
pInst.registry["a"].doWork("search")
pInst.registry["b"].doWork("search")
}, 1000);
////// END JAVASCRIPT
,
May 16 2018
> What type of storage in the machine? HDD or SSD! SSD
,
May 16 2018
Able to reproduce this issue on reported version 66.0.3359.170 and latest build 68.0.3431.0 using Ubuntu 17.10 with link given in comment#0. NOTE: 1. In M66 on clicking upto 16 times crashed the tab where as in M60 and M68 35 clicks triggered the crash. 2. In Mac and Windows tried clicking for almost 50 times and no crash is seen but memory in task manager increased. As this issue is seen from M60 considering this issue as Non-Regression and marking as Untriaged. Thanks!
,
May 23 2018
to memory sheriff for further triage.
,
May 23 2018
Interesting, I Can reproduce.
,
May 23 2018
The leak in the JS code stems from workers not being terminated, but the impact is much higher on Linux.
,
May 25 2018
Slightly modified repro:
var SearchWorkerStr = function() {
var alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
var chars = "a";
for (let k = 0; k < 10; k++) {
chars += chars+chars;
}
onmessage = (ev) => {
if (ev.data === "search") {
console.log("searching...");
// Generate some result
let fakeRes = [];
for (let i = 0; i < 100; i++) {
let id = ""
for (var j = 0; j < 37; j++) {
id += alpha.charAt(Math.floor(Math.random() * alpha.length));
}
fakeRes.push({
id: id,
name: "Bob",
title: "Sales"
});
}
// Send to browser
postMessage({
"results": fakeRes,
"id": "unique"
});
}
};
}.toString();
class Searcher {
constructor(WorkerStr) {
// blob stuff, load worker from string
var SearchWorker = WorkerStr.slice(WorkerStr.indexOf('{') + 1, -1).trim();
var blob = new Blob([SearchWorker])
var blobUrl = window.URL.createObjectURL(blob)
this._worker = new Worker(blobUrl)
this._worker.onmessage = function(event) {
// Result
console.log("result:", event.data.results.length);
}
}
doWork(str) {
this._worker.postMessage(str);
}
}
class Parent {
constructor() {
this.registry = {};
}
addSearcher(name){
this.registry[name] = new Searcher(SearchWorkerStr);
}
}
// Browser
let pInst = new Parent();
// Click
setInterval(() => {
pInst.addSearcher("a")
pInst.addSearcher("b")
pInst.registry["a"].doWork("search")
pInst.registry["b"].doWork("search")
}, 1000);
</script>
We are not finalizing the Worker. Looking into why...
,
May 25 2018
Since the worker is not explicitly closing itself and not explicitly terminated it will be retained forever. Looking at the spec, they should actually get collected if there is no reference holding them alive.
I got terminate() to work by butting add the end of setInterval the termination statements:
setInterval(() => {
pInst.addSearcher("a")
pInst.addSearcher("b")
pInst.registry["a"].doWork("search")
pInst.registry["b"].doWork("search")
pInst.registry["a"].terminate()
pInst.registry["a"] = null
pInst.registry["b"].terminate()
pInst.registry["b"] = null
}, 1000);
I did not get close() to work.
To collect a worker DedicatedWorkerMessagingProxy::HasPendingActivity() has to be false which is only the case when it was explicitly killed.
Kentaro, can you have a look and forward it to the folks working on Worker?
One more thing I noticed, if the main thread creates a lot of works but does not have much allocation pressure workers will never get collected (independent of close/terminating them manually). Should we call AdjustAmountOfExternalAllocatedMemory here? This is funky: an isolate keeps another isolate alive :)
,
May 25 2018
An expected behavior is that when the worker object is collected in the main thread, the main thread notifies the worker. Then the worker shuts down. However, this does not happen if HasPendingActivity returns true. I think that's the problem of this bug... nhiroki: Would you mind taking a look? > One more thing I noticed, if the main thread creates a lot of works but does not have much allocation pressure workers will never get collected (independent of close/terminating them manually). Should we call AdjustAmountOfExternalAllocatedMemory here? This is funky: an isolate keeps another isolate alive :) Yeah, you're right... I agree that we should call AdjustExternalMemory(a common size of one isolate and one context).
,
Jun 11 2018
,
Jul 4
Upping the priority since issue 848674 is a P1 (currently by far the most significant crasher on linux)
,
Jul 4
,
Jul 5
haraken@, mlippautz@: Regarding c#13 and c#14, this seems intentional according to the following CL. Do you think we need to resurrect a mechanism to precisely track the number of active tasks on workers? "Keep WorkerGlobalScope wrapper alive as long as its context is alive" https://chromium-review.googlesource.com/c/chromium/src/+/584875#message-5dc2f046e540055610222485dc10a9e45bde074d > Looking at the spec, they should actually get collected if there is no reference holding them alive. To be more preceise, a worker needs to be alive as far as active tasks are running on that (protected worker) even if a worker object in the main thread is collected. "A worker is said to be a protected worker if it is an active needed worker and either it has outstanding timers, database transactions, or network connections, or its list of the worker's ports is not empty, or its WorkerGlobalScope is actually a SharedWorkerGlobalScope object (i.e. the worker is a shared worker)." https://html.spec.whatwg.org/multipage/workers.html#protected-worker "22. Closing orphan workers: Start monitoring the worker such that no sooner than it stops being a protected worker, and no later than it stops being a permissible worker, worker global scope's closing flag is set to true." https://html.spec.whatwg.org/multipage/workers.html#worker-processing-model
,
Jul 5
Yeah, that was an intentional change, but it seems like it was not a right change. To implement the spec correctly, yeah, we'll need to implement the "closing orphan workers" mechanism...
,
Jul 13
,
Jul 13
nhiroki@, could you provide an update on this issue? As mentioned in c#16, this issue is blocking issue 848674, which is currently by far the most significant crasher on linux.
,
Jul 31
Hi folks, any plan to move this forward?
,
Aug 7
Friendly ping get an update as it is blocking another issue#848674. Kindly take a look and update the thread accordingly. Thanks..!
,
Aug 8
Sorry for my late reply. It's on my list to implement the "closing orphan workers" mechanism, but I haven't started it yet. I'll take a look at this tomorrow. By the way, regarding issue 848674, I'm still not sure why the crash rate has increased on M67 because the suspicious change (c#18) was landed in M62. Also, the crash rate is not so high on M68[1]. There could be another factor to raise it...? [1] https://crash.corp.google.com/browse?q=product_name%3D%27Chrome_Linux%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27v8%3A%3Ainternal%3A%3AIsolate%3A%3AInit%27#-property-selector,-productname:1000,+productversion:1010,-magicsignature:50,-magicsignature2:50,-stablesignature:50,-magicsignaturesorted:50
,
Aug 30
This just came up today again and really seems to be hurting us these days. I cannot exactly recall the reasoning back then. It might have been (a) in-flight events, or (b) some missing wrapper tracing reference. I guess for in-flight event, i.e., events that have been sent but not yet dispatched to a listener we lack a tracking mechanism?
,
Aug 31
nhiroki: Can we consider prioritizing this work? :)
,
Sep 3
,
Sep 3
haraken: Filed a dedicated issue (issue 879989). I'm now targeting M71. |
|||||||||||||||
►
Sign in to add a comment |
|||||||||||||||
Comment 1 by illia.si...@gmail.com
, May 15 2018