Log swapping activity |
||||||||||
Issue descriptionWhen assessing high memory pressure situations in feedback reports, we often don't have enough information about the history of the memory pressure on the device. The feedback logs include discards (in the chrome log) which can be a side effect of the pressure, but there are cases of high swap without discards, and typically the feedback report can only be sent after the system-slows-to-a-crawl episode has ended, and the snapshot of memory usage at that point usually offers no clue as to what happened. We would like to dump some indication of swap activity in the syslog (/var/log/messages) relatively frequently, say every few seconds. Currently the metrics daemon collects swap activity histogram samples every 30s, so we may want to coordinate the logging with that process. We also need to ensure that the measurement is not too expensive (for instance, reading /proc/vmstat too often might be expensive, and we need to investigate if the vmstat command is lighter-weight than the procfs entry).
,
Apr 7 2017
,
Apr 9 2017
,
Apr 10 2017
+cc derat for logging advice. Every few seconds seems like pretty rapid logging. If we really need it to be that frequent to identify issues, it seems like we should: (1) log only while under enough memory pressure (though this may make it harder to read), or (2) Use a separate circular log so we don't spam /var/log/messages. Maybe I'm overreacting, though.
,
Apr 10 2017
My impression was that this would be like 2 -- it would go into a separate log with just the vmstat data in a circular buffer for say the last few minutes with logging every few seconds. Each sample is fairly small and the buffer itself could be relatively small as well.
,
Apr 10 2017
Just some assorted questions: How are we getting into situations where we swap without doing discards? Is that unavoidable in some situations (e.g. a renderer allocating memory very quickly or there actually not being any tabs open that we're willing to discard)? When the system slows to a crawl due to swapping, are we confident that any userspace processes will still be performant enough to log useful information? Should we be logging this in the kernel instead? It'd be good to talk more about what information we actually want to log. :-)
,
Apr 10 2017
#6 yes, deciding what to log and how often is not obvious. To answer some questions: 1. We swap a a fair amount before we start discarding, and this is intentional. Early on we discovered that the discard rate, particularly on 2GB chromebooks, was too high and made several common workflows not possible. 2. We suspect that the extreme cases of jank and hung tasks etc. are due mostly to instruction page eviction, rather than swapping. But yes, in those cases there is no guarantee that any daemon which logs this information will get enough cycles. However, if the ramp-up to this situation is not extremely fast, we should be able to notice the increase of activity. #5 do you really want to maintain a separate circular buffer? And when would we dump it into the syslog? Or put it in the ramoops on all panics? I myself was thinking that something simpler would work for now.
,
Apr 10 2017
re #7 - yes I really would rather have that vmstat info in a separate location -- IMO that is the simpler option for the actual user of the report -- I find it difficult enough to find things in the reports as it is, without mixing more stuff into the same spot :-)
,
Apr 12 2017
Presumably a good start would be: pgmajfault_a - Major Faults (anonymous) pgmajfault_f - Major Faults (file) pgmajfault_s - Major Faults (shmem) pswpin - number of swapins pswpout - number of swapouts We can use pgmajfault until the new values are available. Luigi can chime in if he's looking for other specific things. re #7, when would we dump it into syslog-- that's the tricky part :) Presumably we could do something like the memory pressure calculations that Chrome does inside the metrics daemon, but that just gets us back to only really logging when we're under pressure. This information is pretty uninteresting unless we're tight on memory, though, right?
,
Apr 13 2017
,
Apr 19 2017
There's a design doc for logging this information from metrics_daemon here: https://docs.google.com/document/d/1ciiJhSoJm1LvPWfFwlA2aQNo4ak2RPqUFqtH_VB7pLQ I hope to implement this after wrapping up crbug.com/709660
,
Apr 25 2017
,
Apr 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/42233fde5e833652c3d635ee9964ba0fd917cd51 commit 42233fde5e833652c3d635ee9964ba0fd917cd51 Author: Justin TerAvest <teravest@chromium.org> Date: Fri Apr 28 21:16:08 2017 metrics: Add vmlog logging to metrics_daemon. Historically, we've depended on having direct interaction with systems to understand swap behavior on systems. This change logs some information from /proc/vmstat at a high rate so that we get a better understanding of swap and fault behavior around the time of a feedback report. I ran PowerIdleServer to check for any significant power change, and it was within the noise. w_energy_rate was 6.846 before this change, and 6.801 after. BUG= chromium:709662 TEST=unit tests, copied binary and inspected log and rotated output CL-DEPEND=486090 Change-Id: Idf8767c3b31d11ad09d1929d60f86066d7f8a0d4 Reviewed-on: https://chromium-review.googlesource.com/487085 Commit-Ready: Justin TerAvest <teravest@chromium.org> Tested-by: Justin TerAvest <teravest@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> [modify] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/metrics_daemon.cc [add] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/vmlog_writer_test.cc [modify] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/metrics_daemon_test.cc [modify] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/metrics_daemon_main.cc [add] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/vmlog_writer.cc [modify] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/metrics.gyp [add] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/vmlog_writer.h [modify] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/metrics_daemon.h
,
May 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/ba503f02353b71fc5f029171f7f8f44319374bbb commit ba503f02353b71fc5f029171f7f8f44319374bbb Author: Justin TerAvest <teravest@chromium.org> Date: Mon May 08 19:40:56 2017 init: Clean up vmlog files. metrics_daemon recently started writing vmlog files, summarizing memory management behavior. This was added in the following commit: https://chromium-review.googlesource.com/c/487085/ This change cleans up those new log files. BUG= chromium:709662 TEST=None Change-Id: I73acde379828b349de500505b5cedcfb1b11360e Reviewed-on: https://chromium-review.googlesource.com/491327 Commit-Ready: Justin TerAvest <teravest@chromium.org> Tested-by: Justin TerAvest <teravest@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> [modify] https://crrev.com/ba503f02353b71fc5f029171f7f8f44319374bbb/init/chromeos-cleanup-logs
,
May 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/bf921a390d726ff671136c0aa0b6f17d0f534203 commit bf921a390d726ff671136c0aa0b6f17d0f534203 Author: Justin TerAvest <teravest@chromium.org> Date: Sat May 20 10:55:15 2017 debugd: Collect vmlog files in log_tool. metrics_daemon now writes details about VM behavior to log files in /var/log/vmlog. This change ensures that they're picked up by debugd for debugging. There's no sensitive information in these files; just summaries of the overall level of swap and page fault activity. BUG= chromium:709662 TEST=Built the debugd package Change-Id: Ib8fd689d346adc36caa579522f86d062868f934a Reviewed-on: https://chromium-review.googlesource.com/503608 Commit-Ready: Justin TerAvest <teravest@chromium.org> Tested-by: Justin TerAvest <teravest@chromium.org> Reviewed-by: Jorge Lucangeli Obes <jorgelo@chromium.org> Reviewed-by: Sonny Rao <sonnyrao@chromium.org> [modify] https://crrev.com/bf921a390d726ff671136c0aa0b6f17d0f534203/debugd/src/log_tool.cc
,
May 23 2017
,
May 23 2017
,
Jan 22 2018
|
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by semenzato@chromium.org
, Apr 7 2017