New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 721491 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Mac
Pri: 2
Type: Bug-Regression



Sign in to add a comment

Graylog tabs take up 100% CPU each when not the active tab

Reported by sgla...@evergage.com, May 11 2017

Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.47 Safari/537.36

Steps to reproduce the problem:
1. Open a tab to the Graylog log server UI (any tab will do, search, list, etc)
2. Select a different tab in that window to be the active tab
3. Check the Task Manager in chrome and watch the CPU stick at 99-100% for that tab.
4. GO back and activate the Graylog tab again, and watch the CPU go back down to (or near) 0%

What is the expected behavior?
No CPU spike

What went wrong?
The CPU in the tab hung at 100%.

I confirmed that going back to stable (chrome 58) fixes this issue.  I also confirmed (in my current chrome 59 beta channel) that opening an entirely new profile (google-chrome --user-data-dir=NEW_DIR_HERE) can reproduce the problem.  So that means no extensions or customizations are in the way.  With that profile, I have taken a performance save (from the Performance tab in devtools, attached), a screen capture (attached), and a chrome://tracing trace (also attached).  Hopefully this helps.  Since the Performance tab save showed the majority of the time being idle CPU, I'm assuming that something in chrome itself is eating the CPU, as opposed to some runaway javascript code.

Unfortunately, this only seems to happen with this one site, a Graylog server UI.  I've been trying to find a publicly accessible instance of graylog to point you at, but having no luck.  There are some pre-built instances in various platforms you can try, and it is an open source project so you can download and install it.

Did this work before? Yes 58.0.3029.110 (64-bit)

Chrome version: 59.0.3071.47  Channel: beta
OS Version: 18.1 (Serena)
Flash Version: 

Let me know if you need any more input!
 
graylog_100_percent_cpu_background.ogv
1.2 MB View Download
trace_graylog_background_100percent_cpu.json.gz
3.0 MB Download
graylog_100_percent_cpu_background_performance_tab_save.json.gz
3.2 MB Download
Labels: Needs-Triage-M59 Performance-Memory
I was able to install a new graylog server into a VM using these instructions:
http://docs.graylog.org/en/latest/pages/installation/os/ubuntu.html

Then in that VM, I downloaded chrome stable, beta, and unstable (58, 59, 60).  Only in version 59 (beta) did the problem reproduce.  So if this a regression, it might have been fixed in version 60?  If so, I'd love to get whatever fix that was backported to version 59, but I'm not sure the best way to narrow down what it was.

Let me know if you need anything more from me for machine specs or anything like that.
Some more information was added by another user (STRML) in the github issue I made in graylog.

https://github.com/Graylog2/graylog2-server/issues/3814#issuecomment-301782316

One excerpt from his comments:

"Digging through the trace, it appears to be in a Microtask loop. There may be a bug in 59 that executes microtasks synchronously, so if one is scheduled inside another it will busyloop all day."
Labels: TE-NeedsTriageHelp
Now that chrome 59 is now on stable, my whole team is seeing this issue now.  They are on macs, so it's not Linux specific like the OS list in the bug suggests.  I don't have permissions to edit that portion of the bug so could someone else do so?

Also is there any news on why this is fixed in chrome 60 (or was in my testing).  Is a backport a possibility?
This appears to be related to the use of TLS.

I've set up a test server that demonstrates the issue (log in with `admin:admin`)

http://graylog.strml.net/dashboards , when idle, does not use 100% CPU.

https://graylog.strml.net/dashboards (note TLS), does.
I've reproduced it consistently with http. Don't have a publicly-accessible reproducer, though.

Comment 8 by be...@torch.sh, Jun 7 2017

I can reproduce that with Chrome 59.0.3071.86 on Ubuntu 16.04 LTS using my local Graylog development setup. (I am a developer in the Graylog team)

The Graylog web interface is running several background jobs using the setInterval() JavaScript function. If I disable all of those background jobs, the high CPU issue doesn't show up when I switch to a different browser tab. Once I enable only one of those setInterval() calls, the issue shows up again.

I found a way to make the high CPU usage stop by opening a specific HTML select dropdown on the Graylog search page. When I opened that dropdown, I can switch to a different tab and the CPU usage stays at 0, even with all background setIterval() jobs running. No idea why that fixes it. :)

Please find the attached screen-capture.gif file which shows the issue and how opening the dropdown "fixes" it.

Once I opened the dropdown, the high CPU usage never shows up again for the lifetime of that browser tab. Even if I reload the Graylog web app with ctrl+shift+r I cannot reproduce the issue again. Only when opening a new browser tab it will show up again.

I hope that helps. Please let me know if you need more information.
screen-capture.gif
564 KB View Download

Comment 9 by be...@torch.sh, Jun 7 2017

I forgot a detail in my previous comment.

The background setInterval() jobs are running HTTP requests against the Graylog server API and update the page if some state changes.

I created a simple setInterval() job which only logs to the console and that didn't trigger the high CPU usage when switching to a different browser tab.
Cc: skyos...@chromium.org altimin@chromium.org
Components: Blink>Scheduling Internals>Network
Labels: -Performance-Memory Performance OS-Mac
Owner: cbentzel@chromium.org
Sorry to hear about the issue, and thank you for all the detailed reporting, folks. This makes things a lot easier!

+Chris, it sounds like the net stack may be the culprit here. See some references to HTTP requests and TLS above, and it in the trace it appears that the high CPU is being triggered by an XHR (consistent with the above description). I assume there's some priority inversion going on here when we're switching tabs... so also +Sami and +Alexander in case this is a scheduling issue.
Cc: rsch...@chromium.org
Cc: eroman@chromium.org sleevi@google.com
Components: -Internals>Network Internals>Network>SSL
Cc: -sleevi@google.com rsleevi@chromium.org
Components: -Internals>Network>SSL
That it's attributed to the Tab, rather than the Browser process, implies its renderer side. Removing Internals>Network>SSL because of that.
Components: Internals>Mojo
Based on the trace it looks like this mojo function is continually posting itself:

https://cs.chromium.org/chromium/src/content/common/message_port.cc?rcl=2febb9832367123799f3e553274c4449f91a5634&l=200

Could someone more familiar with that code have a look?
Cc: roc...@chromium.org
AFAICT this is only possible if a MessagePort onmessage handler triggers itself, e.g.:

  let channel = new MessageChannel;
  channel.port2.onmessage = () => channel.port1.postMessage("hai");
  channel.port2.start();
  channel.port1.postMessage("hai");

If this is fixed in M60 I suppose (wild guess) it may be due to task scheduling changes. The MessagePort impl hasn't been touched since M59.

Comment 17 by be...@torch.sh, Jun 16 2017

I just got an update to Chrome 59.0.3071.104 on Ubuntu 16.04 LTS but the problem still persists. I didn't expect it to be fixed, so this is just a FYI. :)
Cc: pbomm...@chromium.org abdulsyed@chromium.org
Labels: M-59
I am suspecting this is something similar issues#733469 and  733469 , currently we are planning to revert the patch in M59.

Comment 19 by be...@torch.sh, Jun 21 2017

I just updated to Chrome 59.0.3071.109 on Ubuntu 16.04 LTS and I don't see the problem anymore! :)

Good work and thank you very much!
Verified fixed here as well. Shutting down the Graylog server. Thanks!

Comment 21 Deleted

Status: Fixed (was: fixedf)
Re #20: I'd like to investigate this problem further to avoid repeating it again. Could you keep the server alive for an extra couple of weeks?

Sign in to add a comment