New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 869569 link

Starred by 4 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature

Blocked on:
issue 877301
issue 881144



Sign in to add a comment

Prototype a way to run a message (or multiple messages) in a nested message loop from WebAssembly threads

Project Member Reported by kbr@chromium.org, Jul 31

Issue description

Given recent feedback to the synchronous OffscreenCanvas.commit() API proposal on blink-dev [1] and the W3C TAG thread [2], it's necessary to investigate alternatives.

One that has been discussed is a way to pump the message loop from a WebAssembly thread; perhaps running a single posted task, or all currently posted tasks. Such an API could be called from the main loop of a WASM thread in order to allow Promises to resolve, posted messages to be handled, and so forth. It would provide better interoperability with other web APIs.

[1] https://groups.google.com/a/chromium.org/d/topic/blink-dev/hRZ_P2o-aEk/discussion
[2] https://github.com/w3ctag/design-reviews/issues/288

 
Owner: fs...@chromium.org
Status: Started (was: Available)
Let me try something around this.
Okey. Here's the strawmanest of all the CLs....

https://chromium-review.googlesource.com/c/chromium/src/+/1169686

This adds a new "executePendingTasks()" function on DedicatedWorkerGlobalScope that executes all pending tasks up to the point of the call.

Feel free to patch this and try something like the example below, where we manually wait for RAF. :)

<script id="worker" type="text/worker">

let waitRAF = false;

function waitForRAF() {
  waitRAF = true;
  requestAnimationFrame(waitForRAF);
}

onmessage = function(ev) {
  var ofc = ev.data;
  var ctx = ofc.getContext("2d");
  var x = 0;
  var last = 0;
  waitForRAF();

  function busy_loop() {
    const pn = performance.now();
    let delta = pn - last;
    last = pn;
    ctx.clearRect(0, 0, 200, 200);
    ctx.fillStyle = x % 2 == 0? "blue" : "red";
    // ctx.fillRect(x, 90, 10, 10);
    ctx.fillRect(0, 0, 200, 200);
    x = (x + 1) % 200;
    waitRAF = false;
    while (!waitRAF) {
      executePendingTasks();
    }
  }

  for (let i = 0; i < 100; ++i) {
    busy_loop();
  }
}
</script>

PS: the current patch does stack dumps on RunTasksInCurrentSequence() but I can't for the life of me find out why.
Cc: kbr@chromium.org
Super cool, thank you! Let me try wiring this up to a WebAssembly test case.

Blocking: -563816
Blockedon: 877301
Owner: kainino@chromium.org
Thanks fserb@ for the prototype above! kainino will pick this up, figure out the SequenceChecker assertion failures with it, and see whether wiring this up into Emscripten's runtime seems to be the best path forward (per Issue 877301 as well).

Labels: -Pri-2 Pri-1
Increasing to P1 RFE since this is a blocker for moving forward with Unreal Engine's threaded rendering.

Progress so far:

Somehow this patch is causing a task that's supposed to run on the main thread to run on the worker thread instead.

Replacing the whole BarrierClosure thing with run_loop.RunUntilIdle();
seems to prevent the crash (though this isn't actually valid because RunUntilIdle may never return).
Components: Blink>Scheduling
FTR, this is the crash we're investigating:

[1:17:0829/184550.217531:FATAL:simple_watcher.cc(135)] Check failed: task_runner_->RunsTasksInCurrentSequence().
#0 0x7f89a7e67cfc base::debug::StackTrace::StackTrace()
#1 0x7f89a7d931ab logging::LogMessage::~LogMessage()
#2 0x7f89a8062928 mojo::SimpleWatcher::SimpleWatcher()
#3 0x7f89a80a43f0 mojo::Connector::WaitToReadMore()
#4 0x7f89a80a421a mojo::Connector::Connector()
#5 0x7f89a80aeda5 mojo::internal::MultiplexRouter::MultiplexRouter()
#6 0x7f89a80aeaf6 mojo::internal::InterfacePtrStateBase::InitializeEndpointClient()
#7 0x7f89a37c6bf0 mojo::internal::InterfacePtrState<>::ConfigureProxyIfNecessary()
#8 0x7f89a37c5aea ukm::MojoUkmRecorder::AddEntry()
#9 0x7f89a37c6fdb ukm::internal::UkmEntryBuilderBase::Record()
#10 0x7f899ddbf800 blink::scheduler::WorkerThreadScheduler::RecordTaskUkm()
#11 0x7f899ddbf67d blink::scheduler::WorkerThreadScheduler::OnTaskCompleted()
#12 0x7f89a7e024be base::sequence_manager::internal::TaskQueueImpl::OnTaskCompleted()
#13 0x7f89a7df80e1 base::sequence_manager::internal::SequenceManagerImpl::NotifyDidProcessTask()
#14 0x7f89a7df7aa0 base::sequence_manager::internal::SequenceManagerImpl::DidRunTask()
#15 0x7f89a7e05d8a base::sequence_manager::internal::ThreadControllerImpl::DoWork()
#16 0x7f89a7e07be8 _ZN4base8internal7InvokerINS0_9BindStateIMNS_16sequence_manager8internal20ThreadControllerImplEFvNS5_8WorkTypeEEJNS_7WeakPtrIS5_EES6_EEEFvvEE3RunEPNS0_13BindStateBaseE
#17 0x7f89a7d73a9d base::debug::TaskAnnotator::RunTask()
#18 0x7f89a7da128e base::MessageLoop::RunTask()
#19 0x7f89a7da1752 base::MessageLoop::DoWork()
#20 0x7f89a7da5156 base::MessagePumpDefault::Run()
#21 0x7f89a7da0d61 base::MessageLoop::Run()
#22 0x7f89a7dd5496 base::RunLoop::Run()
#23 0x7f89a7e2fe5a base::Thread::Run()
#24 0x7f89a7e30417 base::Thread::ThreadMain()
#25 0x7f89a7e7da1f base::(anonymous namespace)::ThreadFunc()
#26 0x7f899bedd494 start_thread
#27 0x7f8999e30a8f clone

The implementation of RunsTasksInCurrentSequence being hit here is this one:
https://cs.chromium.org/chromium/src/base/task/sequence_manager/task_queue_task_runner.cc?l=35&rcl=a50a419478967dd04eaa190d7cf8219de0e78a02

I tried these 3 implementations:

  run_loop.RunUntilIdle();

Hits this DCHECK, but not reliably.

  base::PostDelayedTask(FROM_HERE, run_loop.QuitClosure(), TimeDelta());
  run_loop.Run();

Usually hits this DCHECK.

  base::RepeatingClosure bc = base::BarrierClosure(4, run_loop.QuitClosure());
  GetThread()->GetTaskRunner(TaskType::kJavascriptTimer)->PostTask(FROM_HERE, bc);
  GetThread()->GetTaskRunner(TaskType::kPostedMessage)->PostTask(FROM_HERE, bc);
  GetThread()->GetTaskRunner(TaskType::kWorkerAnimation)->PostTask(FROM_HERE, bc);
  GetThread()->GetTaskRunner(TaskType::kIdleTask)->PostTask(FROM_HERE, bc);
  run_loop.Run();

Usually hits this DCHECK.

Even more concerning is this: regardless of which implementation I use, the *browser* becomes unresponsive while the worker's main loop is running. It stays unresponsive for about 3x the time it takes for the worker to stop looping. (i.e. it continues for 2x after the worker stops).

This happens even if OffscreenCanvas is not used at all, so it doesn't seem tied to the OffscreenCanvas presentation.

Comment 12 Deleted

It's not currently clear to us whether this DCHECK is even correct in this context.

BTW, just remembered, the blink::TaskType for the crashing task we investigated (which iirc was associated with the main renderer thread but running on the main worker thread (in gdb with --single-process)) was 46, kWorkerThreadTaskQueueDefault. So maybe the associated thread id is just wrong somehow.

Comment 14 Deleted

Comment 15 Deleted

Sorry for the extra emails.

Reuploaded test cases:
https://github.com/kainino0x/crbug869569

http://kai.graphics/crbug869569/raf-render-nopump.html
- postmessage works, browser ok

http://kai.graphics/crbug869569/spin-norender-pump-outside-onmessage.html
- postmessage probably works, browser unresponsive
- proves OffscreenCanvas doesn't cause unresponsiveness

http://kai.graphics/crbug869569/spin-render-nopump-outside-onmessage.html
- postmessage doesn't work (obviously), browser ok
- proves pumping does cause freeze

http://kai.graphics/crbug869569/spin-render-pump-in-onmessage.html
- postmessage doesn't work, browser unresponsive
- shows postmessage won't get called inside postmessage

http://kai.graphics/crbug869569/spin-render-pump-outside-onmessage.html
- postmessage works, browser unresponsive

In all cases, if the browser is unresponsive during the main loop, it continues being unresponsive for an extra few seconds after the main loop finishes.

My forked version of fserb's CL here:
https://chromium-review.googlesource.com/c/chromium/src/+/1169686
Update: turns out all of the browser unresponsiveness was caused by the console.log in the inner loop; removing that fixed it. It also only seems to happen in my local build, not on dev, probably because of DCHECKs slowing things down.
There is another crash, which is reliably reproducible by reloading the page (e.g. spin-render-pump-outside-onmessage.html) while the spin loop is still going:

[1:15:0831/144246.264744:FATAL:gc_task_runner.h(51)] Check failed: !nesting_ || nesting_ == 1.
#0 0x7f1379fa0b4c base::debug::StackTrace::StackTrace()
#1 0x7f1379ecb1ab logging::LogMessage::~LogMessage()
#2 0x7f136faba09e blink::GCTaskObserver::~GCTaskObserver()
#3 0x7f136fcbab5d blink::WebThreadSupportingGC::ShutdownOnThread()
#4 0x7f1371e87bc3 blink::WorkerBackingThread::ShutdownOnBackingThread()
#5 0x7f1371e94c9c blink::WorkerThread::PerformShutdownOnWorkerThread()
#6 0x7f136fcb97dc blink::(anonymous namespace)::RunCrossThreadClosure()
#7 0x7f136fcba203 _ZN4base8internal7InvokerINS0_9BindStateIPFvN3WTF19CrossThreadFunctionIFvvEEEEJS6_EEES5_E7RunOnceEPNS0_13BindStateBaseE
#8 0x7f1379eaba9d base::debug::TaskAnnotator::RunTask()
#9 0x7f1379f3e3d6 base::sequence_manager::internal::ThreadControllerImpl::DoWork()
#10 0x7f1379f402c8 _ZN4base8internal7InvokerINS0_9BindStateIMNS_16sequence_manager8internal20ThreadControllerImplEFvNS5_8WorkTypeEEJNS_7WeakPtrIS5_EE
S6_EEEFvvEE3RunEPNS0_13BindStateBaseE
#11 0x7f1379eaba9d base::debug::TaskAnnotator::RunTask()
#12 0x7f1379ed928e base::MessageLoop::RunTask()
#13 0x7f1379ed9752 base::MessageLoop::DoWork()
#14 0x7f1379edd166 base::MessagePumpDefault::Run()
#15 0x7f1379ed8dde base::MessageLoop::Run()
#16 0x7f1379f0d4a6 base::RunLoop::Run()
#17 0x7f1371e76925 blink::DedicatedWorkerGlobalScope::executePendingTasks()
#18 0x7f137287bec4 v8::internal::FunctionCallbackArguments::Call()
#19 0x7f137287a591 v8::internal::(anonymous namespace)::HandleApiCallHelper<>()
#20 0x7f13728789d8 v8::internal::Builtin_Impl_HandleApiCall()
#21 0x7f1372878471 v8::internal::Builtin_HandleApiCall()
#22 0x7f1373566fd5 <unknown>

This one appears to be because executePendingTasks is executing the cooperative thread shutdown (PerformShutdownOnWorkerThread) *during* another task.

I'm looking at prevent this from happening.
Reuploaded here (old one had a bad change-id), with a fix for the gc_task_runner.h crash:

https://chromium-review.googlesource.com/c/chromium/src/+/1200409
OK, back to "Check failed: task_runner_->RunsTasksInCurrentSequence()". Debugging this again, I have this task:

https://cs.chromium.org/chromium/src/mojo/public/cpp/system/simple_watcher.cc?l=107&rcl=27028c1e27b992351011dcb9bfeb7cd437ffc5ca

being run on thread 216616 "DedicatedWorker". This task has a task type of kWorkerThreadTaskQueueDefault.

The crashing SimpleWatcher's TaskRunner is a base::sequence_manager::internal::TaskQueueTaskRunner (with TaskType kMainThreadTaskQueueDefault).

I don't really understand how these tasks are getting mixed up in this way. Maybe this task is being created on the worker thread, but somehow associated with the main thread? Will probably need input from an expert on this stuff.
Labels: Partner-Games-Epic
Cc: altimin@chromium.org
Thanks to robliao for help finding the root of this error. His analysis:

> Looks like both @Kai Ninomiya  and I reach the same conclusion at nearly the same time. The UKM Recorder appears to be smuggled here: https://chromium-review.googlesource.com/c/chromium/src/+/1151989/7/third_party/blink/renderer/platform/scheduler/worker/worker_scheduler_proxy.cc#26
> This appears to bind the UKM Recorder on the main thread (where it's created) and sent directly to the worker thread.

Binding the ukm recorder grabs the task runner for the current thread. That task runner ends up getting shuffled over to a Web Worker thread along with the ukm recorder, where it ends up getting misused.

altimin has suggested a temporary workaround until there is a real fix for this bug:
> If you're prototyping, I suggest removing lines 24-27 from here: https://cs.chromium.org/chromium/src/third_party/blink/renderer/platform/scheduler/worker/worker_scheduler_proxy.cc?type=cs&g=0&l=24

I'll try this tomorrow.
Blockedon: 881144
Just curious but have you considered having executePendingTasks never return if the window is not the active tab and what implications that would have?

It seems like it would work, the worker would still get events, but at least it would hopefully not spin in the background
We've had some discussions about how executePendingTasks interacts with requestAnimationFrame - in particular, how to make executePendingTasks work without being wrapped in a spin loop. I think we can solve this easily.

executePendingTasks is a means toward interacting with other web APIs, including RAF, so an application's RAF (and probably other work) would be throttled in the same way that it is throttled today.
FYI: I finally have time to look at this again. piman had this idea that if we want to restrict usage of a nested message loop, we could use a "requestMainLoop" api. I decided to prototype this because I do prefer making it as restricted as possible.

Prototype patch:
https://chromium-review.googlesource.com/c/chromium/src/+/1200409
and demo:
https://kai.graphics/crbug869569/spin-render-pump-outside-onmessage.html
are updated.
function mainLoop() {
  while (true) {
    /* rendering, etc. */

    requestAnimationFrame(requestStopExecutingTasks);
    beginExecutingTasks();
  }
}

onmessage = function(ev) {
  if (ev.data == "beginMainLoop") {
    requestMainLoop(mainLoop);
  }
};

Latest minimal demo code FTR
Labels: -Pri-1 Pri-2
[GPU Triage Council]
How is this progressing?
I was in the middle of a refactor but I haven't spent time on it recently. I've been focusing on some WebGPU work.

FTR, I think I promised to write one-pager for this. I had forgotten to put that on my todo list.

Sign in to add a comment