platform_MemoryPressure tab discard before using swap space on eve
Issue description:

Running platform_MemoryPressure on an eve with nvme and an R71 image and increasing the timeout to give it time to fill the 16G RAM, I am getting this output:

MemTotal 16298600
Phase1DiscardCount 1
Phase1MaxPageFaultRate 88.5113979446
Phase1MemFree 319916
Phase1PageFaultRate 0.99900030992
Phase1SwapFree 23077716
Phase1TabCount 102
Phase1Time 14416.309833
Phase2DiscardCount 1
Phase2MaxPageFaultRate 1.19888700813
Phase2MemFree 360016
Phase2PageFaultRate 0.499499595797
Phase2SwapFree 23077716
Phase2TabCount 102
Phase2Time 70.0698299408
SwapTotal 23874900

which shows that there is a tab discard even though SwapFree is 23G.

This is the output of swap on crosh:

crosh> swap
Filename    Type       Size      Used   Priority
/dev/zram0  partition  23874900  84736  -2
low-memory margin (MiB): 827
min_filelist_kbytes (KiB): 100000
extra_free_kbytes (KiB): 0

Top-level entries in /sys/block/zram0:

alignment_offset: 0
backing_dev: /dev/loop7
capability: 10
comp_algorithm: [lzo] deflate
debug_stat: version: 1
debug_stat: 0
dev: 253:0
discard_alignment: 0
disksize: 24447901696
ext_range: 1
inflight: 0 0
initstate: 1
io_stat: 0 0 0 30329449
max_comp_streams: 4
mm_stat: 83058688 5053829 6397952 0 372879360 12890 32567
range: 1
removable: 0
ro: 0
size: 47749808
stat: 1130951 0 9047608 16074 29255700 0 234045600 64368 0 59236 80351
uevent: MAJOR=253
uevent: MINOR=0
uevent: DEVNAME=zram0
uevent: DEVTYPE=disk
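For reference, the mm_stat line above can be decoded to get the effective zram compression ratio. This is a sketch: the field order follows the kernel's zram documentation (orig_data_size, compr_data_size, mem_used_total, mem_limit, mem_used_max, same_pages, pages_compacted), and the values are copied from the dump above.

```python
# Decode the zram mm_stat line quoted in the description.
# Field order per the kernel zram docs; values from the dump above.
mm_stat = "83058688 5053829 6397952 0 372879360 12890 32567"
orig_data_size, compr_data_size = (int(v) for v in mm_stat.split()[:2])

# Effective compression ratio of the data currently stored in zram.
ratio = orig_data_size / compr_data_size
print(round(ratio, 1))
```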
Oct 16
Could you post the chrome logs?
Oct 17
Attaching the chrome logs.
Oct 17
It looks like chrome_20181016-143254 is zero bytes? The other ones don't seem to show any discards, so I think chrome_20181016-143254 is probably the interesting one?
Oct 17
Sorry, I attached the chrome logs from the test_that log_diff/chrome/ directory, where the file is zero bytes. I am attaching a copy of the log from the device now.
Oct 17
That's strange; I don't see any discards in that one either. Is it possible there wasn't a discard, but an OOM kill or something else like that happened?
Oct 17
It's possible that the tab was killed for other reasons. The test only counts the number of tabs, and assumes that they go away only when discarded. Maybe run the test again?
Oct 17
The end of chrome_20181016-103104 seems to align better with the timing, since GOOD END GOOD is at 14:32:55, so I am attaching that full log as well. I am seeing this message at the end of that log:

[20271:20271:1016/143251.963256:ERROR:tab_manager_delegate_chromeos.cc(384)] Failed to set oom_score_adj to 300 for focused tab, pid: 29137
Oct 17
I have run the test multiple times and this seems to be consistent behavior. I'm running it now with the RAM reduced to 8G and it seems to be fine; currently it is using almost 10G of swap, so it hasn't hit the same issue.
Oct 17
Yeah, I looked through that one as well. I think the message you mention there is mostly unrelated -- it happens when tab focus changes, and seems to happen periodically.
Oct 17
So it only happens when you run with 16GB of RAM?
Oct 17
Yes, at least that is what I have seen.
Oct 17
So, it would be nice if we could sample available memory instead of just reporting MemFree, which comes straight out of /proc/meminfo, to see if there's some kind of issue with how available is being calculated. MemFree looks like around 300 MB with 23000 MB of swap available, and the margin is 827 MB. With the swap alone we should have 23000/4 or 5750 MB available, so this is way above the margin and we shouldn't be discarding.

Alexis, could you run something like "watch -d cat /sys/kernel/mm/chromeos-low_mem/available" and check whether the value is going down correctly over the course of the test? I don't have a 16GB device, so I cannot do any testing to confirm.

I also notice this line from /sys/block/zram0 above: backing_dev:/dev/loop7. So is this with the new backing dev stuff turned on? Does it still happen if that's turned off? Maybe there's some kind of bug there?
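The back-of-the-envelope calculation above can be written out explicitly. This is a rough sketch, not the kernel's actual formula: the compression ratio of 4 is the same assumption used in the comment, and estimate_available_mib is a hypothetical helper name.

```python
# Rough estimate of "available" memory before tab discard, assuming
# swap contributes swap_free / compression_ratio. Ratio of 4 is the
# assumption from the comment above, not the kernel's exact heuristic.
def estimate_available_mib(mem_free_kib, swap_free_kib, compression_ratio=4):
    swap_contrib_mib = swap_free_kib / compression_ratio / 1024.0
    return mem_free_kib / 1024.0 + swap_contrib_mib

# Numbers from the original report: MemFree ~320 MB, SwapFree ~23 GB.
available = estimate_available_mib(319916, 23077716)
margin_mib = 827  # low-memory margin reported by crosh swap

print(int(available))  # several GB, far above the 827 MiB margin
```

With these inputs the estimate comes out well above the margin, which is why a discard at this point looks wrong.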
Oct 17
Yes, I was attempting to look at the difference between runs with and without backing_dev, but I was seeing the problem with it turned off as well. I will run it again, monitoring the available value with the backing dev stuff turned off, to be sure that isn't affecting anything.
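For logging the available value with timestamps (rather than eyeballing watch -d), something like this minimal Python sketch could be used; the sysfs path is the one from this thread, while the function name, interval, and output format are arbitrary choices.

```python
# Minimal sketch: log /sys/kernel/mm/chromeos-low_mem/available with
# timestamps. Function name, interval, and format are arbitrary.
import time

LOW_MEM_PATH = "/sys/kernel/mm/chromeos-low_mem/available"

def log_available(interval_s=1.0, count=10, path=LOW_MEM_PATH):
    """Print timestamped readings and return the raw values."""
    values = []
    for _ in range(count):
        with open(path) as f:
            value = f.read().strip()
        print("%s %s" % (time.strftime("%H:%M:%S"), value))
        values.append(value)
        time.sleep(interval_s)
    return values
```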
Oct 17
I checked the test logs in https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/testDetails?testName=platform_MemoryPressure. In most logs, Phase1DiscardCount in platform_MemoryPressure/result/keyval is > 0, so this issue is not eve-only.
Oct 20
As an update, I flashed the device with eve/latest (R70-11021.50.0) and ran the test again and got this output:

MemTotal 16304816
Phase1DiscardCount 5
Phase1MaxPageFaultRate 26261.7905648
Phase1MemFree 310716
Phase1PageFaultRate 4657.29764454
Phase1SwapFree 3189620
Phase1TabCount 180
Phase1Time 136227.241787
Phase2DiscardCount 5
Phase2MaxPageFaultRate 7732.47235179
Phase2MemFree 276448
Phase2PageFaultRate 2682.09832483
Phase2SwapFree 3336852
Phase2TabCount 180
Phase2Time 70.0717740059
SwapTotal 23884004

with /sys/kernel/mm/chromeos-low_mem/available getting as low as 920. It has clearly used more of the swap space, but there is still 3G of swap free.

Running the test with an image I built (R71-11142) with kernel v4.14 (all of the writeback/backing dev stuff is turned off), I get this output:

MemTotal 16298600
Phase1DiscardCount 1
Phase1MaxPageFaultRate 99.8971161628
Phase1MemFree 484040
Phase1PageFaultRate 0.0998990126101
Phase1SwapFree 22925140
Phase1TabCount 107
Phase1Time 14415.7493062
Phase2DiscardCount 1
Phase2MaxPageFaultRate 0.999009161446
Phase2MemFree 474916
Phase2PageFaultRate 0.0
Phase2SwapFree 22925140
Phase2TabCount 107
Phase2Time 70.0703189373
SwapTotal 23874900

with /sys/kernel/mm/chromeos-low_mem/available only getting down to 3986 before the test ends.

I am running the test again on an image I built with kernel 4.4 to make sure that is not a factor. I am attaching logs of the /sys/kernel/mm/chromeos-low_mem/available output for reference, although I'm not sure how useful they are without the context of the test. Note that the log from the stable image starts with lower values because I started collecting the output with timestamps later into the test.
Oct 23
I think the numbers from the R70 run look okay -- if there's 3GB left then we've used 20GB of swap, which is quite a bit of it. Is there anything interesting in the Chrome logs from that 4.14 run? Also, let me ask this: if you take your R71 image and use kernel 4.4 without zram backing_dev instead of 4.14, do you still see the problem?
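The swap arithmetic in the previous comment can be double-checked with a small sketch; parse_keyval is a hypothetical helper, and the keyvals are taken from the R70 run quoted above.

```python
# Turn the flat "key value" keyval dump from platform_MemoryPressure
# into a dict, then compute how much swap was actually consumed.
def parse_keyval(text):
    tokens = text.split()
    return {k: float(v) for k, v in zip(tokens[::2], tokens[1::2])}

# Keyvals from the R70 run above.
r70 = parse_keyval("Phase1SwapFree 3189620 SwapTotal 23884004")
swap_used_gib = (r70["SwapTotal"] - r70["Phase1SwapFree"]) / 1024 / 1024
print(round(swap_used_gib, 1))  # close to the ~20GB estimated above
```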
Oct 23
Taking my R71 image and using kernel 4.4 showed the same behavior as 4.14. I didn't see anything obvious in the Chrome logs; they are similar to those already attached. I am currently running the test with an R72-11174.0.0 canary image and haven't seen the problem (currently it is using about 7G of swap and available is about 3100), so it seems I can probably avoid the problem if I repo sync, but I don't know what the actual cause is.
Oct 23
re #19 -- ok, thanks for trying that. It sounds like this is a change in Chrome behavior rather than kernel behavior, then. I'm not sure I understand the comment in #16 -- Phase1DiscardCount should always be higher than 0.
Oct 23
re #20, in the bug description, "which shows that there is a tab discard even though SwapFree is 23G", I think the discard refers to Phase1DiscardCount = 1. It's common to have Phase1DiscardCount > 0, and I think Phase1DiscardCount > 0 is not a regression.
Oct 24
re #21 -- yes, we should always have at least 1 discard. The question, though, is whether we are seeing discards with a lot of swap space still free on other systems as well.
Nov 6
Comment 1 by semenzato@chromium.org, Oct 16