Discard tab when system is thrashing
Issue description

When CrOS is under memory pressure, there are two types of potential problem. One is a kernel OOM kill, which leads to an "Aw snap" tab, a system hang, or even a crash. The other is that the system enters a heavy swapping state, so the user experiences severe sluggishness.

For the first type of problem, we're trying our best to kill tabs when running out of both free memory and swap space. Solutions to it don't necessarily apply to the latter one, because thrashing can happen when the running programs can fit in memory but the active working set spans memory and swap (zram). It typically results in a lot of swap activity, so the system becomes laggy.

A proposal is to quantify "sluggishness" by measuring FPS/smoothness. Then we collect system metrics (e.g., CPU load, swapping stats) to observe how laggy the system is under a specific set of metrics. If we can derive a consistent relationship between the metrics and sluggishness, we can monitor those metrics at runtime and signal Chrome to discard tabs when we believe the system has entered a sluggish state. The relationship could differ from board to board, so we can make it a per-board parameter (if needed) to keep users from experiencing (the same extent of) sluggishness on any device.

Here are some preliminary findings. First I allocated a lot of non-swappable memory (using mlockall() to lock it), then ran memory-eater (which allocates memory and traverses it sequentially multiple times) to create more memory pressure. The end result is that part of the memory allocated by memory-eater is swapped out to zram. When memory is swapped in and out back and forth, thrashing happens. I opened a WebGL benchmark webpage (https://webglsamples.org/aquarium/aquarium.html) and measured FPS along with the current CPU load, swap in/out counts, and (major) page fault counts every second. Attached are plots for 100 fishes, 500 fishes, and 1000 fishes. As you can see, swap in and swap out have a strong (nearly linear) correlation with FPS. CPU load could be used as a supplementary signal. Page faults, on the other hand, don't seem to have much correlation with FPS.

Next steps are to 1) try it on more machines to see if the behavior is consistent, maybe with an autotest to make this automatic and more deterministic, and 2) derive a set of metrics and thresholds, and fire a "discard tab" signal when the set of metrics meets the thresholds.
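For anyone who wants to reproduce the per-second measurements, here is a minimal sketch of how the swap and major-fault rates could be sampled. It is not the script used for the attached plots. It assumes a Linux kernel that exposes the pswpin, pswpout and pgmajfault counters in /proc/vmstat, and the function names (parse_vmstat, rates, sample_forever) are made up for illustration.

```python
import time

# Counters assumed to be present in /proc/vmstat on the device.
FIELDS = ("pswpin", "pswpout", "pgmajfault")


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def rates(prev, curr, interval_s):
    """Return per-second deltas of the swap and major-fault counters."""
    return {f: (curr.get(f, 0) - prev.get(f, 0)) / interval_s for f in FIELDS}


def sample_forever(interval_s=1.0, path="/proc/vmstat"):
    """Print swap-in, swap-out and major-fault rates once per interval."""
    with open(path) as f:
        prev = parse_vmstat(f.read())
    while True:
        time.sleep(interval_s)
        with open(path) as f:
            curr = parse_vmstat(f.read())
        print(rates(prev, curr, interval_s))
        prev = curr
```

Calling sample_forever() on the device while the aquarium page is running gives one line of rates per second, which can then be lined up against the FPS samples.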
Mar 13 2018
Yes, we could call it the signal graph now. The missing signal on the graphs is the amount of memory allocated by memory_eater. However, the major difference between Windows and CrOS is that CrOS uses zram for swap instead of flash/HD storage. Therefore, when CrOS is under memory pressure, swap in/out rate and CPU load are more highly correlated than on Windows. Our final goal is to tweak the tab discard settings to optimize the trade-off between system sluggishness and memory resource utilization. Once Cheng-yu makes it an autotest benchmark that includes the memory pressure information (memory_eater), we can check how the thrashing signals respond to different CrOS tab discard settings. wez/sebmarchand, would you also share your thrashing work with us, so we can see if there is anything we missed? Thanks!
Mar 13 2018
Re #2: Issue 775644 tracks the new tab-discarding & TabManager refactoring, based on the new "lifecycle" model - you'll want to make sure that any new signals line up with the new system. Regarding zRAM vs disk swap: Windows and Mac both have compressed memory swap in addition to disk swap, these days. FWIW, what you describe sounds like a last-resort signal, which suggests it would only be hooked up to discard tabs via full renderer-process termination.
Mar 13 2018
I agree that a high rate of swap should trigger tab discards, but only in some circumstances. The trade-off with zram is that when we run out of RAM, instead of discarding a tab, we swap it out. Then, when we select that tab again, instead of reloading it, we swap it back in. (Or rather, we swap back in a large part of its pages.) Therefore, when changing context, a high rate of swap is acceptable and should not trigger discards; otherwise we destroy the benefit of swapping. We know that some activities generate large context switches: specifically, switching to a different tab or app. It is likely that other actions also cause heavy swapping (for instance, changing context between Mail and Contacts in the same tab). Thus I think that the kernel (on Chrome OS) should send two signals: one (the current one) when available RAM is near exhaustion, and another when CPU usage from swapping exceeds some threshold (for instance, 50% of one CPU). Chrome should always respect the first signal (low RAM), but should ignore the high-swap signal for a few seconds (say 3 to 5 seconds) after a tab/app switch, and in general should also ignore it if it lasts only a few seconds. This will catch most "beneficial" swap situations. Unfortunately, it will also trigger a number of unwanted discards (for example, when a user switches between Mail and Contacts repeatedly in rapid sequence), but we can hope that this number will be small.
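To make the proposed policy concrete, here is a rough sketch of the filtering logic on the Chrome side. It is only an illustration under assumptions: the class and method names are hypothetical, the 5-second grace period comes from the "3 to 5 seconds" suggestion above, and the 3-second minimum duration is a guess at what "lasts only a few seconds" should mean.

```python
class SwapSignalFilter:
    """Decide whether a memory signal should lead to a tab discard.

    Hypothetical helper: the low-RAM signal is always respected, while the
    high-swap signal is ignored shortly after a tab/app switch and until it
    has been continuously present for a minimum duration.
    """

    def __init__(self, grace_after_switch_s=5.0, min_sustained_s=3.0):
        self.grace_after_switch_s = grace_after_switch_s
        self.min_sustained_s = min_sustained_s
        self._last_switch = None  # time of the most recent tab/app switch
        self._high_since = None   # time the current high-swap run started

    def on_tab_switch(self, now):
        """Record a tab/app switch; swap caused by it is expected."""
        self._last_switch = now
        self._high_since = None

    def should_discard(self, now, low_ram, high_swap):
        """Return True if a discard should be triggered at time `now`."""
        if low_ram:
            return True  # the existing low-RAM signal is always respected
        if not high_swap:
            self._high_since = None
            return False
        in_grace = (self._last_switch is not None and
                    now - self._last_switch < self.grace_after_switch_s)
        if in_grace:
            self._high_since = None
            return False
        if self._high_since is None:
            self._high_since = now
        # Only act on swap activity that has lasted long enough.
        return now - self._high_since >= self.min_sustained_s
```

The caller would feed it timestamps in seconds (for example from a monotonic clock) each time the kernel signals are polled. Both time constants would need tuning against real usage, and could be per-board if needed.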
Mar 14 2018
Agree with Luigi's comments. Let's wait until Cheng-yu makes it an autotest benchmark that reports FPS, CPU load, app/tab discard count, and meminfo together under different memory_eater loads. It will then be clearer how we should react to different CPU loads and memory pressure. In the future, when we tweak the tab/app discard policy, we can easily run the benchmark and see the improvements from the new settings.
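Once the benchmark emits per-second samples, one simple way to compare candidate signals is a correlation coefficient between each metric and FPS. The snippet below is a plain Pearson correlation written out by hand. It is a sketch rather than part of the autotest, and the function name is made up.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value near -1 between swap rate and FPS would support using swap rate as a sluggishness signal on that board, while a value near 0 (as the plots suggest for major page faults) would argue against the metric.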
Apr 9 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/28888c4ab2a5d5e7c9907e0eecb5397140aebdcc

commit 28888c4ab2a5d5e7c9907e0eecb5397140aebdcc
Author: Cheng-Yu Lee <cylee@chromium.org>
Date: Mon Apr 09 19:42:42 2018

Add a util memory-eater-locked to allocate non-swappable memory. It can be useful for memory related test.

BUG=chromium:821046
TEST=manual

Change-Id: Ia52b1c961cfca85172b4a563576d1797bc1f6129
Reviewed-on: https://chromium-review.googlesource.com/988894
Commit-Ready: Cheng-Yu Lee <cylee@google.com>
Tested-by: Cheng-Yu Lee <cylee@google.com>
Reviewed-by: Chung-yih Wang <cywang@chromium.org>

[add] https://crrev.com/28888c4ab2a5d5e7c9907e0eecb5397140aebdcc/dev-util/memory-eater-locked/memory-eater-locked-9999.ebuild
Apr 9 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/experimental/+/f7c40ff90f882bd14101096293c7c21063599dea

commit f7c40ff90f882bd14101096293c7c21063599dea
Author: Cheng-Yu Lee <cylee@chromium.org>
Date: Mon Apr 09 19:42:36 2018

Add a util memory-eater-locked to allocate non-swappable memory. It can be useful for memory related test.

BUG=chromium:821046
TEST=manual

Change-Id: I45f790e3957b8552977608d7a7d3ce53b13431cb
Reviewed-on: https://chromium-review.googlesource.com/988892
Commit-Ready: Cheng-Yu Lee <cylee@google.com>
Tested-by: Cheng-Yu Lee <cylee@google.com>
Reviewed-by: Chung-yih Wang <cywang@chromium.org>

[add] https://crrev.com/f7c40ff90f882bd14101096293c7c21063599dea/memory-eater-locked/memory-eater-locked.c
[add] https://crrev.com/f7c40ff90f882bd14101096293c7c21063599dea/memory-eater-locked/Makefile
Apr 13 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/experimental/+/71e236cd2eb8aa7aef05aa0d63c42c21f490c50d

commit 71e236cd2eb8aa7aef05aa0d63c42c21f490c50d
Author: Cheng-Yu Lee <cylee@chromium.org>
Date: Fri Apr 13 13:47:26 2018

Refactor fps_meter so it can be used both as a binary or a library.

BUG=chromium:821046
TEST=manual

Change-Id: I48ab9ce25d1de1defdbe8beee993fddcbe42c8f2
Reviewed-on: https://chromium-review.googlesource.com/988893
Commit-Ready: Cheng-Yu Lee <cylee@google.com>
Tested-by: Cheng-Yu Lee <cylee@google.com>
Reviewed-by: Chung-yih Wang <cywang@chromium.org>

[modify] https://crrev.com/71e236cd2eb8aa7aef05aa0d63c42c21f490c50d/fps_meter/fps_meter.py
Apr 16 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/8f120cd575c62a642d3a6be172e7dbe50433b174

commit 8f120cd575c62a642d3a6be172e7dbe50433b174
Author: Cheng-Yu Lee <cylee@chromium.org>
Date: Mon Apr 16 16:20:34 2018

Add dependencies to tests_graphics_WebGLAquarium for measuring FPS under memory pressure.

BUG=chromium:821046
TEST=manual
CQ-DEPEND=CL:988892,CL:988894

Change-Id: I844ad69b469050713bfb91df692bd387e08e5c2b
Reviewed-on: https://chromium-review.googlesource.com/988895
Commit-Ready: Cheng-Yu Lee <cylee@google.com>
Tested-by: Cheng-Yu Lee <cylee@google.com>
Reviewed-by: Vovo Yang <vovoy@chromium.org>

[modify] https://crrev.com/8f120cd575c62a642d3a6be172e7dbe50433b174/chromeos-base/autotest-chrome/autotest-chrome-9999.ebuild
Jun 28 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/ef1f011d05db5a81b0aa3355a2216f287994dc3c

commit ef1f011d05db5a81b0aa3355a2216f287994dc3c
Author: Cheng-Yu Lee <cylee@chromium.org>
Date: Thu Jun 28 05:07:24 2018

Add a new mode of graphics_WebGLAquarium to measure FPS under memory pressure.

BUG=chromium:821046
TEST=manual

Change-Id: I9ffdd183a1eaae093baf8417729b3797fb637b87
Reviewed-on: https://chromium-review.googlesource.com/979672
Commit-Ready: Cheng-Yu Lee <cylee@chromium.org>
Tested-by: Cheng-Yu Lee <cylee@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Cheng-Yu Lee <cylee@chromium.org>

[add] https://crrev.com/ef1f011d05db5a81b0aa3355a2216f287994dc3c/client/site_tests/graphics_WebGLAquarium/control.memory_pressure
[add] https://crrev.com/ef1f011d05db5a81b0aa3355a2216f287994dc3c/client/site_tests/graphics_WebGLAquarium/system_sampler.py
[modify] https://crrev.com/ef1f011d05db5a81b0aa3355a2216f287994dc3c/client/site_tests/graphics_WebGLAquarium/graphics_WebGLAquarium.py
[add] https://crrev.com/ef1f011d05db5a81b0aa3355a2216f287994dc3c/client/common_lib/cros/memory_eater.py
Comment 1 by w...@chromium.org, Mar 12 2018
Summary: Discard tab when system is thrashing (was: Discard tab when system is trahshing)