
Issue 782982

Starred by 23 users

Issue metadata

Status: Duplicate
Merged: issue 800348
Owner:
Last visit > 30 days ago
Closed: Jul 5
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug




Creating 28 or more web workers crashes a tab.

Reported by apos...@atlassian.com, Nov 9 2017

Issue description

IMPORTANT: Your crash has already been automatically reported to our crash system. Please file this bug only if you can provide more information about it.


Chrome Version: 62.0.3202.89
Operating System: Linux 4.13.10-200.fc26.x86_64

URL (if applicable) where crash occurred: N/A
The crash is caused by creating 28 or more web workers.

Can you reproduce this crash? yes

What steps will reproduce this crash? (If it's not reproducible, what were you doing just before the crash?)
1. The following JS reproduces the issue.
Note: it creates an empty blob for the workers; they do nothing, and simply existing is enough to cause the crash.
```
<!DOCTYPE html>
<html><body><script>
var worker_count = 28;
var blob = new Blob([''], {type: 'text/javascript'});
var empty_worker = window.URL.createObjectURL(blob);
// Note: <= attempts worker_count + 1 (i.e. 29) workers;
// the crash fires when the 29th is created.
for (var i = 0; i <= worker_count; i++) {
  new Worker(empty_worker);
}
</script></body></html>
```

****DO NOT CHANGE BELOW THIS LINE****
Crash ID: crash/928a265f36678279

 
Components: Blink>Workers
Status: WontFix (was: Unconfirmed)
Yep, this is an expected out-of-memory crash.
On Windows I can open hundreds of web workers without issue with the same amount of RAM. Why does this only affect the Linux version of Chrome?
Hm, not sure why Linux vs. Windows are different here.
Is this ticket likely to be re-opened?

For reference, under Linux, Firefox takes over 500 web workers before it hits an OOM.

I really don't think 28 web workers crashing a tab should be expected behaviour, unless it is an artificial limit that has been put in place.
Status: Untriaged (was: WontFix)
GC timing issue? What happens if you keep the created workers alive like this? Is Windows still able to create hundreds of workers?

```
var workers = [];
for (var i = 0; i <= worker_count; i++) {
  workers.push(new Worker(empty_worker));
}
```
That change doesn't seem to make a difference. See the screenshots below.
[Attachment: Screenshot from 2017-11-09 13-40-39.png, 228 KB]
[Attachment: Screenshot from 2017-11-09 13-42-37.png, 376 KB]
Just for completeness, here is the same behaviour on Ubuntu.
Notice how it crashes only after hitting 29.
[Attachment: Screenshot from 2017-11-09 14-23-56.png, 327 KB]
[Attachment: Screenshot from 2017-11-09 14-24-24.png, 280 KB]
Cc: keishi@chromium.org bashi@chromium.org
Components: Blink>MemoryAllocator
Thank you for the investigation. Then there must be another factor making the difference. I ran the same script on a powerful Linux machine with a 64-bit arch and 64 GB of RAM. It can create 28 workers, but cannot create 29 workers. Looks like there is some hard limit regardless of available memory.

CC: Memory folks (keishi@ and bashi@) who may know something about the limit.
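For anyone else reproducing this, a small variation of the OP's script (a sketch; the on-page counter is purely illustrative) shows exactly which worker triggers the crash, matching the counters in the screenshots above:

```
<!DOCTYPE html>
<html><body><div id="count">0</div><script>
var blob = new Blob([''], {type: 'text/javascript'});
var empty_worker = window.URL.createObjectURL(blob);
var counter = document.getElementById('count');
var i = 0;
// Spawn one worker per frame so the last number painted before
// the tab dies is the worker that pushed us over the limit.
function spawn() {
  new Worker(empty_worker);
  counter.textContent = ++i;
  requestAnimationFrame(spawn);
}
requestAnimationFrame(spawn);
</script></body></html>
```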
Cc: haraken@chromium.org
Looks like we are running out of virtual memory space. CodeRange::SetUp() returns false.

https://cs.chromium.org/chromium/src/v8/src/heap/spaces.cc?type=cs&q=AlignedAllocVirtualMemory&l=121

haraken@: is this by design or is there anything we can do?
Cc: hpayer@chromium.org verwa...@chromium.org
+Toon +Hannes: Any thoughts? We cannot create more than 28 workers because we run out of virtual memory space at CodeRange::SetUp().


Cc: thomasanderson@chromium.org
Labels: M-64 OS-Linux Pri-2 Type-Bug
Cc: jorgelo@chromium.org
+jorgelo: more strange OOM in V8?
Issue 783260 has been merged into this issue.
Cc: wfh@chromium.org
This is by design. We impose an 8 GB per-process RLIMIT_DATA limit and a 16 GB per-process RLIMIT_AS limit on Linux. If WebAssembly is being used, the RLIMIT_AS limit is raised to 4 TiB. (https://codesearch.chromium.org/chromium/src/services/service_manager/sandbox/linux/sandbox_linux.cc?l=390)

The objective of this is to limit heap spraying a little bit. I am not intimately familiar with the Chrome sandbox on Windows; +wfh for details there.

If the workers are just reserving address space but not actually mapping anything to those regions, we can try increasing the 16 GB limit to 32 GB. Is this a theoretical exercise, or do you have cases where you want 28+ workers in the same process? That will help with prioritization for this issue, and help inform the address-space-size vs. exploitation tradeoff.

Comment 16 by wfh@chromium.org, Nov 9 2017

It's strange that this does not happen on Windows. On Win (64-bit) we hard-cap the renderer memory limit at 4 GB, which seems below the numbers I see in #15; perhaps this is a V8 memory limit being hit?

Bear in mind if you are developing a website which relies on using this much memory, then you're likely to hit issues on 32-bit clients, which are still quite a sizable population on Windows.
Will: is this mapped memory or just reserved address space that's limited at 4GB?

Comment 18 by wfh@chromium.org, Nov 9 2017

On Windows, the 4 GB hard Job-enforced limit is "working set size", which is defined by MS as the "subset of process virtual memory that can be accessed without incurring a page fault."
That sounds like RLIMIT_DATA, not RLIMIT_AS, which explains why you can create a bunch of workers without a problem -- IIUC the V8 isolates that are created when you create a worker are mostly reserving address space but not populating that virtual memory in a way that "can be accessed without incurring a page fault."

That is, of course, until you have the worker actually do something useful.

This can be easily tested by increasing the RLIMIT_AS limit to 4 TB and seeing what happens.
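One renderer-side way to probe the reserved-vs-committed distinction from JS (a sketch under the hypothesis in c#19; the 64 MiB per worker is an arbitrary figure for illustration) is to have each worker actually touch its memory, so it can no longer be "accessed without incurring a page fault":

```
// Each worker commits ~64 MiB instead of just existing.
var workerSrc =
  'var buf = new Uint8Array(64 * 1024 * 1024);' +
  // Touch one byte per 4 KiB page to force the pages to be mapped.
  'for (var j = 0; j < buf.length; j += 4096) { buf[j] = 1; }' +
  'postMessage("committed");';
var blob = new Blob([workerSrc], {type: 'text/javascript'});
var url = URL.createObjectURL(blob);
var workers = [];
for (var i = 0; i < 28; i++) {
  workers.push(new Worker(url));
}
```

If the Windows working-set cap behaves like RLIMIT_DATA, this variant should start failing on Windows well before hundreds of workers.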
Labels: M-62
Owner: jorgelo@chromium.org
Status: Assigned (was: Untriaged)
Here is the bisect result pointing to: https://chromium.googlesource.com/chromium/src/+/27480dd9ef24537450b3776e37cb3fb2d982040d

Good: 62.0.3202.0
Bad: 62.0.3202.62

Change log: https://chromium.googlesource.com/chromium/src/+log/62.0.3202.0..62.0.3202.62?pretty=fuller&n=10000

Thank you!
I'm happy to own this, but I would like to know if folks are using 28+ workers in one process for legitimate reasons.
So I am talking with the team that develops the code that is spawning the web workers to try to understand their use case. (I am not on that team; I just discovered the bug.)

I would also point out that this doesn't affect OS X/macOS either; Chrome is able to reach hundreds of web workers, like on Windows.

Also, would it be possible to open this ticket to the @atlassian email domain? It might speed things up in getting answers. I am also fine with just making the ticket public.
So, in terms of our use case: the web worker was being spawned every time a component was added to the page, with the expectation that Chrome would GC the web worker once the component was removed.
We are now planning to refactor the component to use a global pool of web workers (see the sketch after this comment).
We didn't notice it being an issue before since the hard crash only affected Linux (not Windows/OS X).
Also, we have observed that Chrome generates SIGILL signals during the web worker crash, as well as the expected SIGABRT.
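For reference, the pool we are moving to is conceptually along these lines (a simplified sketch, not our production code; 'worker.js' and POOL_SIZE are placeholders):

```
// Fixed-size, round-robin pool: components share a handful of
// long-lived workers instead of spawning one worker each.
var POOL_SIZE = 4; // placeholder; well under the ~28 limit
var pool = [];
for (var i = 0; i < POOL_SIZE; i++) {
  pool.push(new Worker('worker.js'));
}
var next = 0;
function postToPool(message) {
  pool[next].postMessage(message);
  next = (next + 1) % POOL_SIZE;
}
```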

Comment 25 by wfh@chromium.org, Nov 9 2017

Labels: -Restrict-View-EditIssue
Opening bug up based on #22.
Thanks, that's very valuable context. Re: platforms, that is completely expected; this is a Linux restriction that, as far as I can tell (I'm not an expert on our sandboxing/security policies on other arches), we're not setting anywhere else.

I haven't been able to confirm that my hunch from c#19 is 100% right, but I will try to do that tomorrow.
Issue 782530 has been merged into this issue.
Workers not being terminated/garbage collected when expected is a different (and IMO more serious) kind of bug than opening 28 workers and running out of memory.

Can you explain why you expect the worker to be terminated? What is the "component" in comment #23?

Could you call WorkerGlobalScope#close() or Worker#terminate() to terminate the worker when done?
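For example, from the page side (a sketch; 'worker.js' and the 'done' message are placeholders):

```
var worker = new Worker('worker.js');
worker.onmessage = function (e) {
  if (e.data === 'done') {
    // Explicitly free the worker; letting the reference go
    // out of scope alone does not release it.
    worker.terminate();
  }
};
worker.postMessage('start');
```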
So the component is a React component, and my understanding is that it wasn't explicitly calling close or terminate. I believe the expectation was that once the reference to the worker variable went out of scope, the worker would be garbage collected after it finished doing useful work, since it can't receive any new messages. I can get more clarification if that is unclear.
Re: c#29: I'm starting to remember how the GC works with workers. An unreferenced worker object is never GC'ed until the owner Document is closed or WorkerGlobalScope#close() is called. See my comment at [1] for details. Therefore, my suggestion at c#6 doesn't make sense, sorry!

[1] https://chromium-review.googlesource.com/c/chromium/src/+/584875#message-5dc2f046e540055610222485dc10a9e45bde074d
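In other words, the worker has to end its own lifetime explicitly. A minimal sketch of the worker-side pattern:

```
// Inside the worker script: once the job is finished,
// WorkerGlobalScope#close() shuts the worker down so its
// backing thread/isolate can actually go away.
onmessage = function (e) {
  var result = e.data; // stand-in for real work
  postMessage(result);
  close();
};
```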
On my workstation, I can go up to 54 workers with the repro in the OP. Granted, this is a beefy workstation, but this suggests that we're not hitting an absolute per-process limit (unless some of those 54 workers are being created in a different process).

I'm happy to increase the limit arbitrarily, but that sounds like kicking the can down the road rather than figuring out the right way to fix this issue.

I agree with falken@ that knowing whether properly terminating the workers makes a difference would be useful.
In terms of our actual use case, we have moved from per-component workers to a global pool for the entire page, so that we have a known, fixed number of workers. So for our current issue, I think it is resolved.

However, I (and the dev team) didn't realise that workers were not automatically cleaned up after they went out of scope.

So the main issue for us is still the varying behaviour across different platforms, though I realise that it can't be uniform everywhere.
Would it be possible to just return an error when the worker fails to be created, instead of crashing the tab?
That's a fair point, though at this point this becomes a V8 issue, IIUC.
We also have a legitimate use case for having more than 28 workers open.
Our internal dashboarding system leans heavily on web workers to perform data fetching, data manipulation, etc., on a per-chart basis for our streaming/live charts. This works *really* well in terms of performance and memory usage, and it also gives us a great architecture for building very performant, responsive UI that can deal with large amounts of data and that happens to be fault tolerant (transformations are user-provided, so they could have bugs).

Internally we are 2000+ engineers with 'beefy' machines (16 GB of memory); the only people having issues are Linux users.
It might be worth mentioning that Firefox on Linux does not have this issue.
There are two aspects to this bug:

- Figuring out the right RLIMIT_DATA limits.
- Having V8 return something reasonable when it cannot create a worker (and avoid crashing the process).

I'll keep this bug for the discussion on limits, and have filed issue 793825 to track not crashing the process.
Issue 795871 has been merged into this issue.
Issue 776420 has been merged into this issue.
Cc: hablich@chromium.org neis@chromium.org adamk@chromium.org
Issue 761845 has been merged into this issue.
Status: Available (was: Assigned)
I don't have cycles to work on this until remediation work for Spectre/Meltdown dies down. If somebody wants to take it, they should. I still think the V8 fix is the most important thing here: failing web worker creation because of this would be totally in spec and would prevent the crashes.
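For reference, if creation failed in-spec rather than crashing the process, callers could degrade gracefully along these lines (a hypothetical sketch of what pages might write once such a fix exists; the fallback is a placeholder):

```
function tryCreateWorker(url) {
  var w = new Worker(url);
  // In-spec failure surfaces as an error event on the Worker
  // object instead of taking the whole tab down.
  w.onerror = function (e) {
    console.warn('Worker failed to start; falling back:', e.message);
    // e.g. queue the task for an existing worker, or run it inline.
  };
  return w;
}
```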
Owner: ----
Issue 832013 has been merged into this issue.
In Isolate::Init at line 2944:

"The initialization process does not handle memory exhaustion."

https://cs.chromium.org/chromium/src/v8/src/isolate.cc?q=isolate.cc&dr&l=2944

AlwaysAllocateScope is at 

https://cs.chromium.org/chromium/src/v8/src/heap/heap-inl.h?l=561

Which is presumably getting an error during the heap setup at 

https://cs.chromium.org/chromium/src/v8/src/heap/heap.cc?type=cs&q=SetUp&l=4613

Which is returning false all the way from CodeRange->Setup at

https://cs.chromium.org/chromium/src/v8/src/heap/spaces.cc?type=cs&q=AlignedAllocVirtualMemory&l=121 

Comment 44 by w...@chromium.org, Jun 13 2018

Owner: hpayer@chromium.org
hpayer: Is there someone on V8 who could look into a fix to avoid crashing when we don't have enough memory to create a new Worker? There are a load of sites that OOM-crash on machines with huge numbers of cores, thanks to (ab)using WASM+Workers to do crypto-mining; see issue 851626.
Cc: rfbpb@google.com
Hi Rodrigo, this is the bug we were talking about. Let's figure out how to integrate this into the external memory reporting mechanism.
Mergedinto: 800348
Status: Duplicate (was: Available)
