New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 709662 link

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Log swapping activity

Project Member Reported by semenzato@chromium.org, Apr 7 2017

Issue description

When assessing high memory pressure situations in feedback reports, we often don't have enough information about the history of the memory pressure on the device.  The feedback logs include discards (in the chrome log) which can be a side effect of the pressure, but there are cases of high swap without discards, and typically the feedback report can only be sent after the system-slows-to-a-crawl episode has ended, and the snapshot of memory usage at that point usually offers no clue as to what happened.

We would like to dump some indication of swap activity in the syslog (/var/log/messages) relatively frequently, say every few seconds.  Currently the metrics daemon collects swap activity histogram samples every 30s, so we may want to coordinate the logging with that process.  We also need to ensure that the measurement is not too expensive (for instance, reading /proc/vmstat too often might be expensive, and we need to investigate if the vmstat command is lighter-weight than the procfs entry).

 
As part of this, consider including the separation of instruction vs. data page faults (see https://chromium-review.googlesource.com/c/470508/)
Cc: teravest@chromium.org
Components: Privacy
Cc: derat@chromium.org
+cc derat for logging advice.

Every few seconds seems like pretty rapid logging.
If we really need it to be that frequent to identify issues, it seems like we should:
  (1) log only while under enough memory pressure (though this may make it harder to read), or
  (2) Use a separate circular log so we don't spam /var/log/messages.

Maybe I'm overreacting, though.
My impression was that this would be like 2 -- it would go into a separate log with just the vmstat data in a circular buffer for say the last few minutes with logging every few seconds.  Each sample is fairly small and the buffer itself could be relatively small as well.

Comment 6 by derat@chromium.org, Apr 10 2017

Just some assorted questions:

How are we getting into situations where we swap without doing discards? Is that unavoidable in some situations (e.g. a renderer allocating memory very quickly or there actually not being any tabs open that we're willing to discard)?

When the system slows to a crawl due to swapping, are we confident that any userspace processes will still be performant enough to log useful information? Should we be logging this in the kernel instead?

It'd be good to talk more about what information we actually want to log. :-)
#6 yes, deciding what to log and how often is not obvious.

To answer some questions:

1. We swap a a fair amount before we start discarding, and this is intentional.  Early on we discovered that the discard rate, particularly on 2GB chromebooks, was too high and made several common workflows not possible.

2. We suspect that the extreme cases of jank and hung tasks etc. are due mostly to instruction page eviction, rather than swapping.  But yes, in those cases there is no guarantee that any daemon which logs this information will get enough cycles.  However, if the ramp-up to this situation is not extremely fast, we should be able to notice the increase of activity.

#5 do you really want to maintain a separate circular buffer?  And when would we dump it into the syslog?  Or put it in the ramoops on all panics?

I myself was thinking that something simpler would work for now.

re #7 - yes I really would rather have that vmstat info in a separate location -- IMO that is the simpler option for the actual user of the report -- I find it difficult enough to find things in the reports as it is, without mixing more stuff into the same spot :-)
Presumably a good start would be:
pgmajfault_a - Major Faults (anonymous)
pgmajfault_f - Major Faults (file)
pgmajfault_s - Major Faults (shmem)
pswpin - number of swapins
pswpout - number of swapouts

We can use pgmajfault until the new values are available. Luigi can chime in if he's looking for other specific things.

re #7, when would we dump it into syslog--
that's the tricky part :) Presumably we could do something like the memory pressure calculations that Chrome does inside the metrics daemon, but that just gets us back to only really logging when we're under pressure. This information is pretty uninteresting unless we're tight on memory, though, right?

Comment 10 by igo@chromium.org, Apr 13 2017

Owner: semenzato@chromium.org
Owner: teravest@chromium.org
Status: Assigned (was: Untriaged)
There's a design doc for logging this information from metrics_daemon here:
https://docs.google.com/document/d/1ciiJhSoJm1LvPWfFwlA2aQNo4ak2RPqUFqtH_VB7pLQ

I hope to implement this after wrapping up crbug.com/709660
Status: Started (was: Assigned)
Summary: Log swapping activity (was: Include swapping activity in syslog)
Project Member

Comment 13 by bugdroid1@chromium.org, Apr 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/42233fde5e833652c3d635ee9964ba0fd917cd51

commit 42233fde5e833652c3d635ee9964ba0fd917cd51
Author: Justin TerAvest <teravest@chromium.org>
Date: Fri Apr 28 21:16:08 2017

metrics: Add vmlog logging to metrics_daemon.

Historically, we've depended on having direct interaction with systems
to understand swap behavior on systems. This change logs some
information from /proc/vmstat at a high rate so that we get a better
understanding of swap and fault behavior around the time of a feedback
report.

I ran PowerIdleServer to check for any significant power change,
and it was within the noise. w_energy_rate was 6.846 before this
change, and 6.801 after.

BUG= chromium:709662 
TEST=unit tests, copied binary and inspected log and rotated output
CL-DEPEND=486090

Change-Id: Idf8767c3b31d11ad09d1929d60f86066d7f8a0d4
Reviewed-on: https://chromium-review.googlesource.com/487085
Commit-Ready: Justin TerAvest <teravest@chromium.org>
Tested-by: Justin TerAvest <teravest@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/metrics_daemon.cc
[add] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/vmlog_writer_test.cc
[modify] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/metrics_daemon_test.cc
[modify] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/metrics_daemon_main.cc
[add] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/vmlog_writer.cc
[modify] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/metrics.gyp
[add] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/vmlog_writer.h
[modify] https://crrev.com/42233fde5e833652c3d635ee9964ba0fd917cd51/metrics/metrics_daemon.h

Project Member

Comment 14 by bugdroid1@chromium.org, May 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/ba503f02353b71fc5f029171f7f8f44319374bbb

commit ba503f02353b71fc5f029171f7f8f44319374bbb
Author: Justin TerAvest <teravest@chromium.org>
Date: Mon May 08 19:40:56 2017

init: Clean up vmlog files.

metrics_daemon recently started writing vmlog files, summarizing memory
management behavior. This was added in the following commit:
  https://chromium-review.googlesource.com/c/487085/

This change cleans up those new log files.

BUG= chromium:709662 
TEST=None

Change-Id: I73acde379828b349de500505b5cedcfb1b11360e
Reviewed-on: https://chromium-review.googlesource.com/491327
Commit-Ready: Justin TerAvest <teravest@chromium.org>
Tested-by: Justin TerAvest <teravest@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>

[modify] https://crrev.com/ba503f02353b71fc5f029171f7f8f44319374bbb/init/chromeos-cleanup-logs

Project Member

Comment 15 by bugdroid1@chromium.org, May 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/bf921a390d726ff671136c0aa0b6f17d0f534203

commit bf921a390d726ff671136c0aa0b6f17d0f534203
Author: Justin TerAvest <teravest@chromium.org>
Date: Sat May 20 10:55:15 2017

debugd: Collect vmlog files in log_tool.

metrics_daemon now writes details about VM behavior to log files in
/var/log/vmlog. This change ensures that they're picked up by debugd for
debugging.

There's no sensitive information in these files; just summaries of the
overall level of swap and page fault activity.

BUG= chromium:709662 
TEST=Built the debugd package

Change-Id: Ib8fd689d346adc36caa579522f86d062868f934a
Reviewed-on: https://chromium-review.googlesource.com/503608
Commit-Ready: Justin TerAvest <teravest@chromium.org>
Tested-by: Justin TerAvest <teravest@chromium.org>
Reviewed-by: Jorge Lucangeli Obes <jorgelo@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/bf921a390d726ff671136c0aa0b6f17d0f534203/debugd/src/log_tool.cc

Status: Fixed (was: Started)
Labels: M-60

Comment 18 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Sign in to add a comment