WebAssembly.instantiate sometimes takes minutes instead of seconds
Reported by
e...@figma.com,
Jun 20 2018
|
|||||||
Issue descriptionUserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 Steps to reproduce the problem: 1. Sign into www.figma.com (signing up is free) 2. Create a new document 3. Click on the hamburger menu in the top left, then click File > New (opens in a new tab) 4. Repeat step #3 eight times Browser: Google Chrome Version 67.0.3396.87 (Official Build) (64-bit) OS: macOS High Sierra 10.13.5 What is the expected behavior? Each WebAssembly.instantiate call should take around the same amount of time (the average is normally around 3.3 seconds for me). What went wrong? The 8th one takes a crazy long amount of time but eventually succeeds. Strangely enough, it appears to be reproducible on the 8th time the three times I've tried it. The average time for WebAssembly.instantiate when this happens is around 3.5 minutes. The CPU is pegged at 300% for the whole time. A capture of chrome://tracing (attached) appears to show a major GC happening every second? Not sure if that's the root issue or not. Did this work before? N/A Chrome version: 67.0.3396.87 Channel: stable OS Version: OS X 10.13.5 Flash Version: I had previously filed this as issue 829463 , which was merged into issue 824681 . But it seems unrelated, especially since issue 824681 was supposed to be fixed in Chrome 67.
,
Jun 20 2018
OK, I can't repro this bug on ToT. I'll do a quick bisect to see where the fix was.
,
Jun 20 2018
,
Jun 20 2018
I started to bisect, but found that I could easily repro if I followed a different set of steps. Not sure if it's the same bug, though: 1. Choose File > New from the menu. 2. While this tab is loading close a previous tab. The current tab doesn't load, but it does show the "Jason from Figma" popup in the bottom right corner.
,
Jun 20 2018
Ah interesting. I can also reproduce this on Version 69.0.3466.0 (Official Build) canary (64-bit) following the steps from comment 4. It may indeed be a different issue though because the CPU appeared to be idle the whole time. I'm not sure if it would have terminated eventually of course but I did wait 15 minutes and WebAssembly.instantiate still hadn't finished.
,
Jun 21 2018
Sorry, I didn't make any more progress on the bisect. I'll try some more tomorrow. Andreas, I wonder if you have an idea of what may have changed between these versions (~M67 and ~M69)? Were there parallel compilation fixes?
,
Jun 21 2018
Another update: I tried "WebAssembly.instantiateStreaming(new Response(new Blob([a], {type: 'application/wasm'})), b)" instead of "WebAssembly.instantiate(a, b)" as a workaround but that also hangs forever in the same way.
,
Jun 25 2018
Sorry, was out Thursday and Friday last week. I tried again with a build w/ the same version as Chrome Beta (68.0.3440.33) and wasn't able to repro (with the original steps). Can you confirm?
,
Jun 26 2018
If the bug as described here is fixed in M68, then there may not be much that can be done here. That said, the similar bug (WebAssembly.instantiate that never terminates) described in comment #4 still occurs on ToT. I believe Andreas is still on vacation, is that correct? Ben, do you know what might be going on here?
,
Jun 27 2018
I just tried to reproduce this on Chrome Beta (68.0.3440.42). I wasn't able to reproduce it with 8 tabs but I was able to reproduce it with 11 tabs. WebAssembly.instantiate never succeeded or failed and the CPU was similarly pegged the entire time. I tried waiting it out to see if it would eventually finish or not but I gave up after 10 minutes. I was also building Chromium with dchecks locally though so it's not a fair comparison :)
,
Jun 28 2018
I looked into the second bug with my own Chromium build:
1. Choose File > New from the menu.
2. While this tab is loading close a previous tab.
I captured this stack trace when the bug happens:
v8::base::debug::StackTrace::StackTrace()
v8::internal::wasm::AsyncCompileJob::~AsyncCompileJob()
v8::internal::wasm::AsyncCompileJob::Abort()
v8::internal::wasm::WasmEngine::AbortAllCompileJobs()
v8::Isolate::ContextDisposedNotification(bool)
blink::V8GCForContextDispose::NotifyContextDisposed(bool, blink::WindowProxy::FrameReuseStatus)
blink::LocalWindowProxy::DisposeContext(blink::WindowProxy::Lifecycle, blink::WindowProxy::FrameReuseStatus)
blink::WindowProxyManager::ClearForClose()
blink::ScriptController::ClearForClose()
blink::LocalFrame::Detach(blink::FrameDetachType)
blink::Page::WillBeDestroyed()
blink::WebViewImpl::Close()
content::RenderWidget::Close()
content::RenderViewImpl::Close()
It looks like maybe closing the first page aborts the compilation on the second page?
,
Jun 28 2018
As for the first bug, it's pretty obvious what's happening when you run Chrome with --js-flags="--trace_gc". All tabs share the same process as can easily be seen in Chrome's Task Manager. It looks like each tab adds 200mb or so to the process memory and once the process memory gets up around 1gb the GC starts thrashing. That explains why it always reproduces after a certain number of tabs are opened. When I tried it, Chrome printed stuff like this to the console for 4 minutes straight before eventually finishing the WebAssembly compilation: ... [91627:0x7f96c901fa00] 199057 ms: Mark-sweep 345.5 (364.9) -> 345.5 (364.9) MB, 244.6 / 35.9 ms (+ 0.7 ms in 2 steps since start of marking, biggest step 0.6 ms, walltime since start of marking 272 ms) (average mu = 0.090, current mu = 0.097) memory pressure GC in old space requested [91627:0x7f96c901fa00] 199318 ms: Mark-sweep 345.5 (364.9) -> 345.5 (364.9) MB, 235.2 / 29.5 ms (+ 4.3 ms in 1 steps since start of marking, biggest step 4.3 ms, walltime since start of marking 261 ms) (average mu = 0.087, current mu = 0.083) memory pressure GC in old space requested [91627:0x7f96c901fa00] 199559 ms: Mark-sweep 345.5 (364.9) -> 345.5 (364.9) MB, 217.2 / 30.9 ms (+ 3.6 ms in 1 steps since start of marking, biggest step 3.6 ms, walltime since start of marking 241 ms) (average mu = 0.085, current mu = 0.083) memory pressure GC in old space requested [91627:0x7f96c901fa00] 199801 ms: Mark-sweep 345.5 (364.9) -> 345.5 (364.9) MB, 217.4 / 29.2 ms (+ 4.5 ms in 1 steps since start of marking, biggest step 4.5 ms, walltime since start of marking 242 ms) (average mu = 0.084, current mu = 0.083) memory pressure GC in old space requested [91627:0x7f96c901fa00] 200060 ms: Mark-sweep 345.5 (364.9) -> 345.5 (364.9) MB, 234.0 / 30.3 ms (+ 0.6 ms in 1 steps since start of marking, biggest step 0.6 ms, walltime since start of marking 259 ms) (average mu = 0.089, current mu = 0.094) memory pressure GC in old space requested [91627:0x7f96c901fa00] 200298 ms: Mark-sweep 345.5 (364.9) -> 345.5 (364.9) MB, 212.9 / 29.4 ms (+ 3.5 ms in 1 steps since start of marking, biggest step 3.5 ms, walltime since start of marking 238 ms) (average mu = 0.089, current mu = 0.090) memory pressure GC in old space requested ... I suppose it would be nice if Chrome didn't have this GC thrashing behavior, but clearly even if it didn't thrash the GC it'll eventually hit the per-tab memory limit and crash. So maybe this thrashing bug isn't urgent? It seems like the real solution here would be to have some way of ensuring that each new tab is opened in a new process. Right now creating a new document in Figma involves calling submit() on a <form> element with a "_blank" target to send a post request to our server in a new tab. I've read that the rel="noreferrer" attribute on <a> elements will cause Chrome to create the new tab in a separate process. But we can't use an <a> element because need to use a post request, not a get request, since creating a new document is not idempotent. What should we do?
,
Jun 28 2018
Thanks for the stack trace in #11, Evan. This is pretty obviously wrong. We shouldn't cancel all compilation just because one tab is being closed. I will come up with a fix for this. Regarding the GC thrashing, there probably is not much we can do about this. Adding Ulan for that question.
,
Jun 28 2018
I was thinking about this more last night and we should be able to solve it on our end by being clever with using get requests instead of post requests. CSRF makes it a little trickier but we'll figure it out. Thanks so much for all of your help with these bugs!
,
Jun 28 2018
It seems like you've done most of the hard work here, Evan! Thanks for tracking down these bugs.
,
Jun 28 2018
The following revision refers to this bug: https://chromium.googlesource.com/v8/v8.git/+/63372e467935d98ef4736a00ee02926cd50ad008 commit 63372e467935d98ef4736a00ee02926cd50ad008 Author: Clemens Hammacher <clemensh@chromium.org> Date: Thu Jun 28 15:34:27 2018 [wasm] Do not cancel all compilation on context disposal We should only cancel asynchronous compilation jobs for the isolate which is being recycled. R=titzer@chromium.org Bug: chromium:854755 Cq-Include-Trybots: luci.chromium.try:linux_chromium_rel_ng Change-Id: I31d6c3ccb648f5465e52f4bc47c4261894458e60 Reviewed-on: https://chromium-review.googlesource.com/1118378 Reviewed-by: Ben Titzer <titzer@chromium.org> Commit-Queue: Clemens Hammacher <clemensh@chromium.org> Cr-Commit-Position: refs/heads/master@{#54092} [modify] https://crrev.com/63372e467935d98ef4736a00ee02926cd50ad008/src/api.cc [modify] https://crrev.com/63372e467935d98ef4736a00ee02926cd50ad008/src/wasm/module-compiler.h [modify] https://crrev.com/63372e467935d98ef4736a00ee02926cd50ad008/src/wasm/wasm-engine.cc [modify] https://crrev.com/63372e467935d98ef4736a00ee02926cd50ad008/src/wasm/wasm-engine.h
,
Jun 28 2018
Note that even though #16 makes the code a bit safer (and future-proof), it probably does not fix any bug, since the WasmEngine is not currently shared between isolates. So closing one tab should not have any effect on other tabs. I will try to reproduce this bug tomorrow.
,
Jul 4
I just tried reproducing either the original bug (#1) or the second (?) bug in #4. Neither reproduce on Linux x64, Version 69.0.3472.3 (Official Build) dev (64-bit). I tried with Liftoff (WebAssembly baseline compiler) explicitly enabled or disabled. Can anyone confirm whether this is fixed?
,
Jul 25
Marking as Fixed according to #18. |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by binji@chromium.org
, Jun 20 2018Labels: OS-Linux
Status: Untriaged (was: Unconfirmed)