The memory pressure monitor for Chrome OS uses an 1-second interval check. It resends the notification if the system stays under critical memory pressure on next check. If, for any reason, the first handler dispatch runs for more than 1 sec and hasn't relieved the system from memory pressure yet, the memory pressure monitor sends another notification unconditionally. This is bad beacuse the system is so busy that it takes > 1 sec to handle the memory pressure, and we send another notification to make it busier. Take one user feedback for example:
1st notification:
2018-09-19 01:35:31.491 3 2250:2250: device_event_log_impl.cc(159): [01:35:31.491] Memory: tab_manager_delegate_chromeos.cc:571 List of low memory kill candidates (sorted from low priority to high priority):
... (list of tabs and apps)
... (kill until target memory to free is < 0)
2018-09-19 01:35:32.407 3 2250:2250: device_event_log_impl.cc(159): [01:35:32.407] Memory: tab_manager_delegate_chromeos.cc:584 Target memory to free: -66680 KB
2018-09-19 01:35:32.407 3 2250:2250: device_event_log_impl.cc(159): [01:35:32.407] Memory: tab_manager_delegate_chromeos.cc:649 Time to first kill 4.67197 s
Then follows 5 more notifications:
2018-09-19 01:35:32.428 3 2250:2250: device_event_log_impl.cc(159): [01:35:32.428] Memory: tab_manager_delegate_chromeos.cc:649 Time to first kill 4.61998 s
2018-09-19 01:35:32.443 3 2250:2250: device_event_log_impl.cc(159): [01:35:32.443] Memory: tab_manager_delegate_chromeos.cc:649 Time to first kill 3.76112 s
2018-09-19 01:35:32.647 3 2250:2250: device_event_log_impl.cc(159): [01:35:32.647] Memory: tab_manager_delegate_chromeos.cc:649 Time to first kill 2.98576 s
2018-09-19 01:35:32.657 3 2250:2250: device_event_log_impl.cc(159): [01:35:32.657] Memory: tab_manager_delegate_chromeos.cc:649 Time to first kill 2.97839 s
2018-09-19 01:35:32.674 3 2250:2250: device_event_log_impl.cc(159): [01:35:32.674] Memory: tab_manager_delegate_chromeos.cc:649 Time to first kill 1.98833 s
Then follows the result of 6 async dispatches of OOM memory details (tons of /proc/ system IO calls).
2018-09-19 01:35:32.723 4 2250:2250: oom_memory_details.cc(46): Tab Discards Memory details (4040 ms):
...
2018-09-19 01:35:32.724 4 2250:2250: oom_memory_details.cc(46): Tab Discards Memory details (3045 ms):
...
2018-09-19 01:35:32.768 4 2250:2250: oom_memory_details.cc(46): Tab Discards Memory details (3114 ms):
...
2018-09-19 01:35:32.801 4 2250:2250: oom_memory_details.cc(46): Tab Discards Memory details (2116 ms):
...
2018-09-19 01:35:33.422 4 2250:2250: oom_memory_details.cc(46): Tab Discards Memory details (702 ms):
...
2018-09-19 01:35:33.444 4 2250:2250: oom_memory_details.cc(46): Tab Discards Memory details (712 ms):
...
Note that the time span of 6 low memory handling calls is about 1.2 sec, where the 1st notification took the longest time (almost 1 sec). The subsequent 5 notifications probably ran under lighter low-mem conditions because they finished much faster than the 1st one, but each notification actually killed one (app or tab) process.
If the memory pressure is over when the handler runs, the handler call becomes a noop that kills nothing. All the cycles are wasted (on the UI thread).
2018-09-19 01:37:03.141 3 2250:2250: device_event_log_impl.cc(159): [01:37:03.139] Memory: tab_manager_delegate_chromeos.cc:571 List of low memory kill candidates (sorted from low priority to high priority):
... (the list of tabs and apps)
(No tab or app is killed)
2018-09-19 01:37:03.150 3 2250:2250: device_event_log_impl.cc(159): [01:37:03.150] Memory: tab_manager_delegate_chromeos.cc:584 Target memory to free: -3072 KB
If we look at the first-kill latencies, we can see that they overlapped in time. That is, the 2nd notification actually takes running of the 1st notification into account, and this applies to the subsequent notificaitons. This all resulted from the 1st notification creating a "traffic jam" in the thread-hopping of the low-mem handler calls, but the traffic souce, the memory pressure monitor, still adds more traffic to make the situation worse. Having first-kill latencies overlapping in time is also misleading from UMA metrics and makes the number look worse than it actually should be.
I think we should suppress the low-mem handler dispatch from the memory pressure monitor if an existing handler call is still running.
Comment 1 by chinglinyu@chromium.org
, Oct 17