New issue
Advanced search Search tips

Issue 896031 link

Starred by 1 user

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

platform_MemoryPressure tab discard before using swap space on eve

Project Member Reported by asavery@chromium.org, Oct 16

Issue description

Running platform_MemoryPressure on an eve with nvme and an R71 image and increasing the timeout to give it time to fill the 16G RAM, I am getting this output:
MemTotal                                                                              16298600
Phase1DiscardCount                                                                    1
Phase1MaxPageFaultRate                                                                88.5113979446
Phase1MemFree                                                                         319916
Phase1PageFaultRate                                                                   0.99900030992
Phase1SwapFree                                                                        23077716
Phase1TabCount                                                                        102
Phase1Time                                                                            14416.309833
Phase2DiscardCount                                                                    1
Phase2MaxPageFaultRate                                                                1.19888700813
Phase2MemFree                                                                         360016
Phase2PageFaultRate                                                                   0.499499595797
Phase2SwapFree                                                                        23077716
Phase2TabCount                                                                        102
Phase2Time                                                                            70.0698299408
SwapTotal                                                                             23874900

which shows that there is a tab discard even though SwapFree is 23G.
This is the output of swap on crosh:
crosh> swap 
Filename				Type		Size	Used	Priority
/dev/zram0                              partition	23874900	84736	-2
low-memory margin (MiB): 827
min_filelist_kbytes (KiB): 100000
extra_free_kbytes (KiB): 0

top-level entries in /sys/block/zram0:
alignment_offset:0
backing_dev:/dev/loop7
capability:10
comp_algorithm:[lzo] deflate 
debug_stat:version: 1
debug_stat:       0
dev:253:0
discard_alignment:0
disksize:24447901696
ext_range:1
inflight:       0        0
initstate:1
io_stat:       0        0        0 30329449
max_comp_streams:4
mm_stat:83058688  5053829  6397952        0 372879360    12890    32567
range:1
removable:0
ro:0
size:47749808
stat: 1130951        0  9047608    16074 29255700        0 234045600    64368        0    59236    80351
uevent:MAJOR=253
uevent:MINOR=0
uevent:DEVNAME=zram0
uevent:DEVTYPE=disk

 
Cc: sonnyrao@chromium.org
Eve uses a swap_vs_ram_ratio of 4.  Thus 22GB of free swap should push the available RAM up to at least 5.5GB.  I have no idea why there is a tab discard here.  Maybe Chrome is using some other signal in this case?
post the chrome logs?
attaching the chrome logs
chrome_20181016-143254
0 bytes View Download
chrome_20181016-103104
102 KB View Download
chrome_20181016-103103
6.4 KB View Download
chrome.PREVIOUS
102 KB Download
chrome_20181015-214406
549 bytes View Download
It looks like chrome_20181016-143254  is zero bytes?
The other ones don't seem to show any discards, so I think chrome_20181016-143254  is probably the interesting one?
Sorry, I attached the chrome logs from the test_that log_diff/chrome/ where it is zero bytes, I am attaching a copy of the log from the device now.
chrome_20181016-143254
11.5 KB View Download
That's strange I don't see any discards in that one either.  Is is possible there wasn't a discard but an OOM kill or something else like that happened?
It's possible that the tab was killed for other reasons.  The test only counts the number of tabs, and assumes that they go away only when discarded.

Maybe run the test again?

The end of chrome_20181016-103104 seems to align more with the time, since GOOD END GOOD is at 14:32:55, so I am attaching that full log as well.
I am seeing this message at the end of that log:
[20271:20271:1016/143251.963256:ERROR:tab_manager_delegate_chromeos.cc(384)] Failed to set oom_score_adj to 300 for focused tab, pid: 29137
chrome_20181016-103104
102 KB View Download
I have run the test multiple times and this seems to be consistent behavior, but I'm running it now with reduced RAM size to 8G and it seems to be fine, currently it is using almost 10G of swap so it hasn't had the same issue.
Yeah I looked through that one as well.  I think the message you mention there is mostly unrelated -- it happens when tab focus changes, and seems to periodically happen.
so it only happens when you run with 16GB of RAM?
Yes, at least that is what I have seen.
So, it would be nice if we could sample available memory instead of just reporting MemFree which comes straight out of /proc/meminfo to see if there's some of issue with how available is being calculated.

MemFree looks like around 300 MB with 23000 MB of swap available, and the margin is 827MB. With the swap alone we should have 23000/4 or 5750 MB available, so this is way above the margin and we shouldn't be discarding.

Alexis, could you run something like "watch -d cat /sys/kernel/mm/chromeos-low_mem/available"

and check and see if the value is going down correctly over the course of the test?

I don't actually have a 16GB device so I cannot do any testing to confirm.


I also notice this line from /sys/block/zram0 above:
backing_dev:/dev/loop7

So is this is with the new backing dev stuff turned on?
Does it still happen if that's turned off?
Maybe there's some kind of bug there?

Yes, I was attempting to look at the difference between with and without backing_dev, but I was seeing the problem with it turned off as well. I will run it again monitoring the available value with the backing dev stuff turned off to be sure that isn't affecting anything.

Comment 15 Deleted

I checked the test logs in:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/testDetails?testName=platform_MemoryPressure

In most logs, Phase1DiscardCount in platform_MemoryPressure/result/keyval > 0, so this issue is not eve only.
As an update, I flashed the device with eve/latest (R70-11021.50.0) and ran the test again and got this output:
MemTotal                                                                              16304816
Phase1DiscardCount                                                                    5
Phase1MaxPageFaultRate                                                                26261.7905648
Phase1MemFree                                                                         310716
Phase1PageFaultRate                                                                   4657.29764454
Phase1SwapFree                                                                        3189620
Phase1TabCount                                                                        180
Phase1Time                                                                            136227.241787
Phase2DiscardCount                                                                    5
Phase2MaxPageFaultRate                                                                7732.47235179
Phase2MemFree                                                                         276448
Phase2PageFaultRate                                                                   2682.09832483
Phase2SwapFree                                                                        3336852
Phase2TabCount                                                                        180
Phase2Time                                                                            70.0717740059
SwapTotal                                                                             23884004

with /sys/kernel/mm/chromeos-low_mem/available getting as low as 920. It clearly has used more of the swap space, but there is still 3G of swap free.

Running the test with an image I built (R71-11142) with kernel v4.14 (all of the writeback/backing dev stuff is turned off), I get this output:
MemTotal                                                                              16298600
Phase1DiscardCount                                                                    1
Phase1MaxPageFaultRate                                                                99.8971161628
Phase1MemFree                                                                         484040
Phase1PageFaultRate                                                                   0.0998990126101
Phase1SwapFree                                                                        22925140
Phase1TabCount                                                                        107
Phase1Time                                                                            14415.7493062
Phase2DiscardCount                                                                    1
Phase2MaxPageFaultRate                                                                0.999009161446
Phase2MemFree                                                                         474916
Phase2PageFaultRate                                                                   0.0
Phase2SwapFree                                                                        22925140
Phase2TabCount                                                                        107
Phase2Time                                                                            70.0703189373
SwapTotal                                                                             23874900

with /sys/kernel/mm/chromeos-low_mem/available only getting to 3986 before the test ends.
I am running the test again on an image I built with kernel 4.4 to make sure that is not a factor.
I am attaching logs of the /sys/kernel/mm/chromeos-low_mem/available output for reference, although I'm not sure how useful they are without the context of the test. Note that the log from the stable image starts with lower values because I started collecting the output with timestamps later into the test.
available_w_time_stable
2.7 MB View Download
available_w_time_tb_4_14
244 KB View Download
I think the numbers from the R70 look okay - if there's 3GB left then we've used 20GB of swap, so we've used quite a bit of it.  

Is there anything interesting in the Chrome logs on that 4.14 run?

Also, let me ask this -- if you take your R71 image and use kernel 4.4 without zram backing_dev instead of 4.14, do you still see the problem?
So taking my R71 image and using kernel 4.4 showed the same behavior as with 4.14. I didn't see anything obvious in the Chrome logs, they are similar to those already attached.
I am currently running the test with a R72-11174.0.0 canary image, and I haven't seen the problem (currently it is using about 7G swap and available is about 3100) so it seems like I can probably avoid the problem if I repo sync, but I don't know what the actual cause is.
re #19 -- ok thanks for trying that.  It sounds like this is a change in chrome behavior then rather than kernel behavior.

I'm not sure I understand the comment in #16 -- the Phase1DiscardCount should always be higher than 0
re #20, in the bug description "which shows that there is a tab discard even though SwapFree is 23G.", I think the discard refers to Phase1DiscardCount = 1.

It's common to have Phase1DiscardCount > 0 and I think Phase1DiscardCount > 0 is not a regression.
re #21 -- yes we should always have at least 1 discard, the question is though are we seeing discards with a lot of swap space still used on other systems as well?
Cc: vovoy@chromium.org

Sign in to add a comment