Issue metadata
28 or more web workers crashes a tab.
Reported by apos...@atlassian.com, Nov 9 2017
Issue description
IMPORTANT: Your crash has already been automatically reported to our crash system. Please file this bug only if you can provide more information about it.
Chrome Version: 62.0.3202.89
Operating System: Linux 4.13.10-200.fc26.x86_64
URL (if applicable) where crash occurred: N/A
The crash is caused by creating 28 or more web workers.
Can you reproduce this crash? yes
What steps will reproduce this crash? (If it's not reproducible, what were you doing just before the crash?)
1. The following JS replicates the issue. Note that it creates an empty blob for the workers; the workers do nothing, as simply existing is enough to cause the crash.
```
<!DOCTYPE html>
<html><body><script>
var worker_count = 28;
var blob = new Blob([''], {type: 'text/javascript'});
var empty_worker = window.URL.createObjectURL(blob);
// Note: <= means this creates worker_count + 1 workers; creating the 29th crashes the tab.
for (var i = 0; i <= worker_count; i++) {
  new Worker(empty_worker);
}
</script></body></html>
```
****DO NOT CHANGE BELOW THIS LINE****
Crash ID: crash/928a265f36678279
Nov 9 2017
Yep, this is an expected out-of-memory crash.
Nov 9 2017
On Windows I can open hundreds of web workers without issue, with the same amount of RAM. Why does this only affect the Linux version of Chrome?
Nov 9 2017
Hm, not sure why Linux vs Windows are different here.
Nov 9 2017
Is this ticket likely to be re-opened? For reference, under Linux, Firefox takes over 500 web workers before it hits an OOM. I really don't think 28 web workers crashing a tab should be expected behaviour, unless it is an artificial limit that has been put in place.
Nov 9 2017
GC timing issue? What happens if you keep references to the created workers, like this? Is Windows still able to create hundreds of workers?
var workers = [];
for (var i = 0; i <= worker_count; i++) {
  workers.push(new Worker(empty_worker));
}
Nov 9 2017
That change doesn't seem to make a difference. See screenshots below.
Nov 9 2017
Just for completeness, here is the same behaviour on Ubuntu. Notice how it crashes only after hitting 29.
Nov 9 2017
Thank you for the investigation. Then there must be another factor making the difference. I ran the same script on a powerful Linux machine with a 64-bit arch and 64 GB of RAM. It can create 28 workers, but cannot create 29. Looks like there is some hard limit regardless of available memory. CC: memory folks (keishi@ and bashi@) who may know something about the limit.
Nov 9 2017
Looks like we are running out of virtual memory space: CodeRange::SetUp() returns false.
https://cs.chromium.org/chromium/src/v8/src/heap/spaces.cc?type=cs&q=AlignedAllocVirtualMemory&l=121
haraken@: is this by design, or is there anything we can do?
Nov 9 2017
+Toon +Hannes: Any thoughts? We cannot create more than 28 workers because we run out of virtual memory space at CodeRange::SetUp().
Nov 9 2017
+jorgelo: more strange OOM in V8?
Nov 9 2017
Issue 783260 has been merged into this issue.
Nov 9 2017
This is by design. We impose an 8 GB per-process RLIMIT_DATA limit and a 16 GB per-process RLIMIT_AS limit on Linux. If WebAssembly is being used, the RLIMIT_AS limit is raised to 4 TiB. (https://codesearch.chromium.org/chromium/src/services/service_manager/sandbox/linux/sandbox_linux.cc?l=390) The objective of this is to limit heap spraying a little bit. I am not intimately familiar with the Chrome sandbox on Windows; +wfh for details on Windows. If the workers are just reserving address space but not actually mapping anything to those regions, we can try increasing the 16 GB limit to 32 GB. Is this a theoretical exercise, or do you have cases where you want 28+ workers in the same process? That will help with prioritization for this issue, and help inform the address-space-size vs. exploitation tradeoff.
Nov 9 2017
It's strange that this does not happen on Windows: on Win (64-bit) we hard-cap the renderer memory limit at 4 GB, which seems below the numbers I see in #15, so perhaps this is a V8 memory limit being hit? Bear in mind that if you are developing a website which relies on using this much memory, you're likely to hit issues on 32-bit clients, which are still quite a sizable population on Windows.
Nov 9 2017
Will: is this mapped memory or just reserved address space that's limited at 4 GB?
Nov 9 2017
On Windows, the 4 GB hard Job-enforced limit is on "working set size", which is defined by MS as the "subset of process virtual memory that can be accessed without incurring a page fault."
Nov 9 2017
That sounds like RLIMIT_DATA, not RLIMIT_AS, which explains why you can create a bunch of workers without a problem -- IIUC the V8 isolates that are created when you create a worker are mostly reserving address space but not populating that virtual memory in a way that "can be accessed without incurring a page fault." That is, of course, until you have the worker actually do something useful. This can be easily tested by increasing the RLIMIT_AS limit to 4 TB and seeing what happens.
Nov 9 2017
Here is the bisect result, pointing to: https://chromium.googlesource.com/chromium/src/+/27480dd9ef24537450b3776e37cb3fb2d982040d
Good: 62.0.3202.0
Bad: 62.0.3202.62
Change log: https://chromium.googlesource.com/chromium/src/+log/62.0.3202.0..62.0.3202.62?pretty=fuller&n=10000
Thank you!
Nov 9 2017
I'm happy to own this but I would like to know if folks are using 28+ workers in one process for legitimate reasons.
Nov 9 2017
So I am talking with the team that develops the code that spawns the web workers, to try and understand their use case (I am not on that team; I just discovered the bug). I would also point out that this doesn't affect OS X/macOS either; Chrome is able to reach hundreds of web workers there, like on Windows. Also, would it be possible to open this ticket to the @atlassian email domain? It might speed things up in getting answers. I am also fine with just making the ticket public.
Nov 9 2017
So, in terms of our use case: a web worker was being spawned every time a component was added to the page, with the expectation that Chrome would GC the web worker once the component was removed. We are now planning to refactor the component to use a global pool of web workers. We didn't notice this being an issue before, since the hard crash only affected Linux (not Windows/macOS).
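A minimal sketch of the global-pool approach described above (the names and the pool size are illustrative, not from the actual codebase):
```
// Cap the page at a fixed number of workers and hand them out round-robin,
// instead of spawning one worker per component instance.
var POOL_SIZE = 4;
var pool = [];
var next = 0;

function getPooledWorker(scriptUrl) {
  if (pool.length < POOL_SIZE) {
    var w = new Worker(scriptUrl); // lazily grow the pool up to the cap
    pool.push(w);
    return w;
  }
  var worker = pool[next];         // reuse an existing worker
  next = (next + 1) % POOL_SIZE;
  return worker;
}
```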
Nov 9 2017
Also, we have observed that Chrome generates SIGILL signals during the web worker crash, as well as the expected SIGABRT.
Nov 9 2017
Opening the bug up based on #22.
Nov 9 2017
Thanks, that's very valuable context. Re: platforms, that is completely expected: this is a Linux restriction that, as far as I can tell (I'm not an expert on our sandboxing/security policies on other arches), we're not setting anywhere else. I haven't been able to confirm that my hunch from c#19 is 100% right, but I will try to do that tomorrow.
Nov 10 2017
Issue 782530 has been merged into this issue.
Nov 13 2017
Workers not being terminated/garbage collected when expected is a different (and IMO more serious) kind of bug than opening 28 workers and running out of memory. Can you explain why you expect the worker to be terminated? What is the "component" in comment #23? Could you call WorkerGlobalScope#close() or Worker#terminate() to terminate the worker when done?
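For illustration, terminating from the page side would look like this (a minimal sketch; `empty_worker` is the blob URL from the repro above):
```
var worker = new Worker(empty_worker);
worker.postMessage('start');
// ... later, when the owning component is removed:
worker.terminate(); // immediately frees the worker thread and its isolate
```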
Nov 13 2017
So the component is a React component, and my understanding is that it wasn't explicitly calling close or terminate. I believe the expectation was that once the reference to the worker variable went out of scope, the worker would be garbage collected after it finished doing useful work, since it couldn't receive any new messages. I can get more clarification if that is unclear.
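A hypothetical sketch of the pattern being described (assuming a React class component; names are illustrative, not the actual code):
```
class ChartComponent extends React.Component {
  componentDidMount() {
    // One worker per component instance; workerUrl is a placeholder.
    this.worker = new Worker(workerUrl);
  }
  componentWillUnmount() {
    // Dropping the reference does NOT free the worker (see the next comment);
    // calling this.worker.terminate() here is what actually releases it.
    this.worker = null;
  }
}
```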
Nov 14 2017
Re c#29: I'm starting to remember how GC works with workers. An unreferenced worker object is never GC'ed until the owner Document is closed or WorkerGlobalScope#close() is called. See my comment at [1] for details. Therefore, my suggestion at c#6 doesn't make sense, sorry! [1] https://chromium-review.googlesource.com/c/chromium/src/+/584875#message-5dc2f046e540055610222485dc10a9e45bde074d
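As an illustration of the close() path (a minimal sketch; doWork is a placeholder for the worker's real task):
```
// Inside the worker script: shut down the WorkerGlobalScope once the work is
// done, so the owning document can eventually collect the Worker object.
self.onmessage = function (e) {
  var result = doWork(e.data);
  self.postMessage(result);
  self.close();
};
```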
Nov 14 2017
On my workstation, I can go up to 54 workers with the repro in the OP. Granted, this is a beefy workstation, but this suggests that we're not hitting an absolute per-process limit (unless some of those 54 workers are being created in a different process). I'm happy to increase the limit arbitrarily, but that sounds like kicking the can down the road rather than figuring out the right way to fix this issue. I agree with falken@ that knowing whether properly terminating the workers makes a difference would be useful.
Nov 14 2017
In terms of our actual use case, we have moved from per-component workers to a global pool for the entire page, so that we have a known, fixed number of workers. So I think our current issue is resolved. However, I (and the dev team) didn't realise that workers were not automatically cleaned up after they went out of scope. The main issue for us is still the varying behaviour across platforms, though I realise it can't be uniform everywhere. Would it be possible to just return an error when worker creation fails, instead of crashing the tab?
Nov 14 2017
That's a fair point, though at this point this becomes a V8 issue, IIUC.
Dec 8 2017
We also have a legitimate use case for having more than 28 workers open. Our internal dashboarding system leans heavily on web workers to perform data fetching, data manipulation, etc., on a per-chart basis for our streaming/live charts. This works *really* well in terms of performance and memory usage, and it also gives us a great architecture for generating a very performant, responsive UI that can deal with large amounts of data and that also happens to be fault tolerant (transformations are user-provided, so they could have bugs). Internally we are 2000+ engineers with "beefy" machines (16 GB of memory); the only people having issues are Linux users.
Dec 8 2017
Might be worth mentioning that Firefox on Linux does not have this issue.
Dec 11 2017
There are two aspects to this bug:
- Figuring out the right RLIMIT_DATA limits.
- Having V8 return something reasonable when it cannot create a worker (and avoid crashing the process).
I'll keep this bug for the discussion on limits, and have filed issue 793825 to track not crashing the process.
Jan 18 2018
Issue 795871 has been merged into this issue.
Jan 18 2018
Issue 776420 has been merged into this issue.
Jan 18 2018
Issue 761845 has been merged into this issue.
Feb 1 2018
I don't have cycles to work on this until remediation work for Spectre/Meltdown dies down; if somebody wants to take it, they should. I still think the V8 fix is the most important thing here: failing web worker creation because of this would be totally in spec and would prevent the crashes.
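For illustration, a sketch of what in-spec failure handling could look like from the page side, assuming a failed Worker() either throws or fires an error event instead of crashing the renderer (the proposed fix, not current behaviour):
```
function tryCreateWorker(url) {
  try {
    var w = new Worker(url);
    w.onerror = function (e) {
      console.warn('Worker error after creation:', e.message);
    };
    return w;
  } catch (e) {
    // With the proposed fix, resource exhaustion would surface here,
    // and the page could fall back to doing the work on the main thread.
    console.warn('Worker creation failed:', e);
    return null;
  }
}
```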
Apr 12 2018
Issue 832013 has been merged into this issue.
Apr 26 2018
In Isolate::Init at line 2944: "The initialization process does not handle memory exhaustion." https://cs.chromium.org/chromium/src/v8/src/isolate.cc?q=isolate.cc&dr&l=2944
AlwaysAllocateScope is at https://cs.chromium.org/chromium/src/v8/src/heap/heap-inl.h?l=561
which is presumably getting an error during the heap setup at https://cs.chromium.org/chromium/src/v8/src/heap/heap.cc?type=cs&q=SetUp&l=4613
which is returning false all the way from CodeRange::SetUp at https://cs.chromium.org/chromium/src/v8/src/heap/spaces.cc?type=cs&q=AlignedAllocVirtualMemory&l=121
Jun 13 2018
hpayer: Is there someone on V8 who could look into the fix to avoid crashing when we don't have enough memory to create a new Worker? There are a load of sites that OOM-crash on machines with huge numbers of cores, thanks to (ab)using WASM + Workers to do crypto-mining; see issue 851626.
Jul 5
Hi Rodrigo, this is the bug we were talking about. Let's figure out how to integrate this into the external memory reporting mechanism.
Jul 5