
Issue 901576


Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




platform_MemoryPressure unexpected behavior with kernel 4.19

Project Member Reported by asavery@chromium.org, Nov 3

Issue description

When I run platform_MemoryPressure on a fizz with image R72-11220.0.0, I get this output:
MemTotal                16304304
Phase1DiscardCount      1
Phase1MaxPageFaultRate  6973.84278423
Phase1MemFree           264664
Phase1PageFaultRate     838.782935848
Phase1SwapFree          2871268
Phase1TabCount          71
Phase1Time              2382.15241718
Phase2DiscardCount      1
Phase2MaxPageFaultRate  10765.4426211
Phase2MemFree           285316
Phase2PageFaultRate     2636.84288747
Phase2SwapFree          2934436
Phase2TabCount          71
Phase2Time              70.0634291172
SwapTotal               23883256

but when I upgrade the kernel to 4.19, I get this output:
MemTotal                16298004
Phase1DiscardCount      1
Phase1MaxPageFaultRate  62.2182671714
Phase1MemFree           10348272
Phase1PageFaultRate     0.0
Phase1SwapFree          23874028
Phase1TabCount          20
Phase1Time              280.45522809
Phase2DiscardCount      1
Phase2MaxPageFaultRate  0.0
Phase2MemFree           10012700
Phase2PageFaultRate     0.0
Phase2SwapFree          23874028
Phase2TabCount          20
Phase2Time              70.0700008869
SwapTotal               23874028

This shows a tab discard while MemFree is still about 10 GB and no swap is in use. /sys/kernel/mm/chromeos-low_mem/available seems to decrease correctly and only gets as low as about 12000, so the issue doesn't seem to be on that side. I am attaching some of the test_that_results logs.

memPress_4_19_results_part.tar (5.3 MB)
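The symptom in the 4.19 run (a recorded discard while swap is completely unused) can be flagged mechanically from the keyvals. A minimal Python sketch, assuming keyvals are either whitespace-separated as pasted above or key=value lines with an optional autotest-style {perf} tag after the key; parse_keyval, discard_without_swap_use, and the "no swap used" criterion are hypothetical, not part of platform_MemoryPressure:

```python
def parse_keyval(text):
    """Parse 'key value' or 'key=value' lines into a dict of floats.

    Assumption: an optional '{perf}'-style tag after the key is dropped.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=" in line:
            key, value = line.split("=", 1)
        else:
            key, value = line.split(None, 1)
        key = key.split("{", 1)[0].strip()  # "SwapTotal{perf}" -> "SwapTotal"
        values[key] = float(value)
    return values


def discard_without_swap_use(kv, phase):
    """Hypothetical check: True if the phase recorded a discard while no swap was used."""
    discards = kv[f"{phase}DiscardCount"]
    swap_used = kv["SwapTotal"] - kv[f"{phase}SwapFree"]
    return discards > 0 and swap_used <= 0
```

With the 4.19 numbers above, Phase1SwapFree equals SwapTotal, so the check fires; with the first run it does not. Treat a hit only as a hint, since a DiscardCount of 1 may not correspond to a real discard.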
Labels: Kernel-4.19
Cc: vovoy@chromium.org
We only send a discard request to Chrome when available drops below margin. We may compute available incorrectly, and we've had such bugs, but the condition for the low-memory notifier to fire is a simple one (available < margin).
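To check that condition by hand on a device, one could read both values from sysfs and compare them. A minimal Python sketch: the directory is the one named in this report, but the sibling "margin" file, the possibility that it lists several values (the first taken as the discard threshold), and the helper names are assumptions; both values are assumed to be in the same units.

```python
from pathlib import Path

# Directory named in this report; the "margin" file next to "available"
# is an assumption about the Chrome OS low-mem sysfs interface.
LOW_MEM_DIR = Path("/sys/kernel/mm/chromeos-low_mem")


def read_first_int(path):
    """Return the first integer in a sysfs file (margin may list several values)."""
    return int(path.read_text().split()[0])


def low_mem_notifier_should_fire(low_mem_dir=LOW_MEM_DIR):
    """Mirror the simple condition described above: available < margin."""
    available = read_first_int(low_mem_dir / "available")
    margin = read_first_int(low_mem_dir / "margin")
    return available < margin, available, margin
```

Polling this during a run gives a timeline to line up against discard events: if a discard happens while the first element is still False, the request did not come from the low-memory notifier.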

We've seen tab discards happen without the low-memory threshold being crossed. I thought we had a bug open for this but cannot find it; if there isn't one, let's use this one, since we have a repro case.

One theory is that Chrome proactively discards tabs (possibly based on some ML algorithm) even when memory pressure is low. It's possible that some of the signals used by such an algorithm have changed with kernel 4.19. This of course assumes that the problem is only reproducible on 4.19 and that there are no other differences.

If the theory is correct, we may want to revise that behavior for Chrome OS. On other platforms, other apps can benefit from the tab discard. On Chrome OS this is less clear: even if ARC++ and/or VMs are running, we can coordinate memory usage across all components, which we cannot do on Windows/macOS.

re #3: we should be able to tell whether that happened from the memd logs (like the ones we were looking at on my other system).

We'd see a clip with a discard but no associated low-memory condition if the image has this CL: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1312383

Otherwise, we wouldn't see a discard in any of the clips.

If we think 4.19 is broken with respect to low-memory detection, there are lower-level tests we can run, like kernel_LowMemNotify, to see whether it's working.

If the tab discarder is triggered, the log should contain the string "Target memory to free:". I didn't find that string in memPress_4_19_results_part.tar.
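Checking an extracted results tarball for that string is the equivalent of `grep -rl`. A minimal Python sketch for doing it from a script; files_containing is a hypothetical helper, and only the marker string comes from this thread:

```python
import os

# Marker quoted above; its presence suggests the tab discarder actually ran.
MARKER = "Target memory to free:"


def files_containing(root, marker=MARKER):
    """Walk an extracted results tree and return the sorted paths that contain marker."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or odd-encoded files from aborting the scan.
                with open(path, "r", errors="replace") as f:
                    if any(marker in line for line in f):
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(hits)
```

An empty result for a run that reports Phase1DiscardCount=1 is consistent with the point below that the count alone does not prove a discard happened.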

As discussed in crbug.com/896031, Phase1DiscardCount=1 doesn't mean a tab was actually discarded.

I ran kernel_LowMemNotify and it passes, so it doesn't seem to be a problem with low-memory detection. I also ran the simple version of the test and don't see the same problem. Its output is less detailed, but monitoring memory use shows that it does use swap space. In the logs, I do see "Target memory to free:" as described in #5 when I run the simple test, but not when I run the realistic version.

I think I have narrowed down the problem. With kernel 4.19, the test runs as expected on R70-11018.0.0, but on R70-11019.0.0 it shows the behavior described above. With kernel versions 4.4 and 4.14, R70-11019.0.0 runs as expected, so I only see the behavior with kernel 4.19.

I am attaching the keyval and /var/log/messages files for R70-11018.0.0 with 4.19, and for R70-11019.0.0 with 4.14 and with 4.19, for comparison. The keyval for R70-11019 with 4.19 shows a discard before we get to low memory. The messages for R70-11018 with 4.19 and R70-11019 with 4.14 show "entering low_mem", which I don't see for R70-11019 with 4.19; that log has "Received crash notification for chrome[5492]" instead.
keyval_4_14_R70-11019 (492 bytes)
keyval_4_19_R70-11018 (491 bytes)
keyval_4_19_R70-11019 (465 bytes)
messages_4_14_R70-11019 (380 KB)
messages_4_19_R70-11018 (321 KB)
messages_4_19_R70-11019 (94.8 KB)
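The comparison of the three messages files comes down to which of two marker strings each one contains. A minimal Python sketch of that comparison; the marker strings are quoted from the comment above, while scan_messages, compare_runs, and the run labels are hypothetical:

```python
# Marker strings quoted in this thread; nothing else about the
# /var/log/messages format is assumed.
MARKERS = {
    "low_mem": "entering low_mem",
    "chrome_crash": "Received crash notification for chrome",
}


def scan_messages(lines):
    """Return which markers appear anywhere in an iterable of log lines."""
    found = {name: False for name in MARKERS}
    for line in lines:
        for name, marker in MARKERS.items():
            if marker in line:
                found[name] = True
    return found


def compare_runs(logs):
    """Map each run label (e.g. 'R70-11019 / 4.19') to the scan of its messages file."""
    results = {}
    for label, path in logs.items():
        with open(path, "r", errors="replace") as f:
            results[label] = scan_messages(f)
    return results
```

Per the description above, the good runs should show low_mem=True and the R70-11019/4.19 run chrome_crash=True with low_mem=False.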
Huh, so I looked at the diff between 11018.0.0 and 11019.0.0 and don't see anything obvious, but here it is:

https://crosland.corp.google.com/log/11018.0.0..11019.0.0

Does that crash happen consistently? It looks like signal 5 (SIGTRAP), so Chrome is most likely hitting a DCHECK.

Could it be a Chrome change?

re #9: Chrome didn't change between those two versions.

chromeos-4.19 didn't change either, nor did it change in any of the surrounding versions. Confused.

In memPress_4_19_results_part.tar/test_that_results_rbIqi2/results-1-platform_MemoryPressure/platform_MemoryPressure/debug/platform_MemoryPressure.INFO , the log shows that the test terminated early because of a devtools crash exception.

11/02 17:34:36.324 WARNI|platform_MemoryPre:0210| network wait exception Devtools target crashed
(/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:539 _AddDebuggingInformation) Received a socket error in the browser connection and the tab no longer exists. The tab probably crashed.

asavery@, please check whether DevTools always crashes in this test with kernel 4.19.
In the bug description, the problem with the kernel 4.19 run is that the test terminated at only 20 tabs because of the DevTools crash.

I ran platform_MemoryPressure once on kench (a fizz variant) with the R73-11437.0.0 image, 16 GiB of RAM, and kernel 4.19, and the test passed with more than 160 tabs created. I think the issue in the bug description is flaky.

It takes hours to run platform_MemoryPressure with 16 GiB of RAM. Is it OK to run the test on a 4 GiB machine for zswap testing? Alternatively, we could modify platform_MemoryPressure so that it takes less time on a 16 GiB machine.
I have also proposed modifying the test to allocate and lock a bunch of RAM at startup. There's some concern that the test would then be less realistic. I think that's OK for performance measurements, since the locked memory would behave like a number of unused tabs. Other tests can check functionality without the need for realism.
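The allocate-and-lock idea could look roughly like this in Python. This is a sketch under several assumptions: Linux, libc's mlock/munlock reached through ctypes, and a hypothetical allocate_and_lock/release pair that is not part of platform_MemoryPressure. Locking gigabytes would also need root or a raised RLIMIT_MEMLOCK; without that, mlock fails and the function reports it instead of raising.

```python
import ctypes
import mmap

# Load symbols from the running process, which includes libc on Linux.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.mlock.restype = ctypes.c_int
_libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.munlock.restype = ctypes.c_int


def _address_of(buf):
    # from_buffer exports the mmap only while the temporary c_char object lives.
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def allocate_and_lock(num_bytes):
    """Hypothetical ballast: map anonymous memory, fault in each page, then mlock it.

    Returns (buffer, locked). locked is False if mlock failed (e.g. RLIMIT_MEMLOCK
    too low); the pages are still touched but may later be swapped out.
    """
    buf = mmap.mmap(-1, num_bytes, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    for offset in range(0, num_bytes, mmap.PAGESIZE):
        buf[offset] = 1  # write one byte per page so the kernel backs it with RAM
    locked = _libc.mlock(_address_of(buf), num_bytes) == 0
    return buf, locked


def release(buf, locked):
    """Undo allocate_and_lock: unlock if needed, then unmap."""
    if locked:
        _libc.munlock(_address_of(buf), len(buf))
    buf.close()
```

Locked pages can be neither swapped nor compressed by zswap, so they shrink the memory the test has to fill with tabs; whether that shift affects the measurements is exactly the realism concern raised above.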
