uma-events fills disk
Issue description
I have a whirlwind testbed router (version of whirlwind image used for Wifi testing) whose stateful partition is totally full of UMA events:
# cat /etc/lsb-release
CHROMEOS_RELEASE_APPID={63E6908B-4769-4985-AD03-72E2C9E77D05}
CHROMEOS_BOARD_APPID={63E6908B-4769-4985-AD03-72E2C9E77D05}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
DEVICETYPE=OTHER
HWID_OVERRIDE=WHIRLWIND DOGFOOD
CHROMEOS_RELEASE_BUILDER_PATH=trybot-whirlwind-test-ap-tryjob/R65-10323.33.0-c40061
GOOGLE_RELEASE=10323.33.2018_02_15_1605
CHROMEOS_DEVSERVER=http://build41-m2.golo.chromium.org:8080
CHROMEOS_RELEASE_BOARD=whirlwind
CHROMEOS_RELEASE_BUILD_NUMBER=10323
CHROMEOS_RELEASE_BRANCH_NUMBER=33
CHROMEOS_RELEASE_CHROME_MILESTONE=65
CHROMEOS_RELEASE_PATCH_NUMBER=2018_02_15_1605
CHROMEOS_RELEASE_TRACK=testimage-channel
CHROMEOS_RELEASE_DESCRIPTION=10323.33.2018_02_15_1605 (Continuous Builder - Builder: N/A) whirlwind
CHROMEOS_RELEASE_NAME=Chromium OS
CHROMEOS_RELEASE_BUILD_TYPE=Continuous Builder - Builder: N/A
CHROMEOS_RELEASE_VERSION=10323.33.2018_02_15_1605
CHROMEOS_AUSERVER=http://build41-m2.golo.chromium.org:8080/update
localhost metrics # df -h /mnt/stateful_partition/
Filesystem Size Used Avail Use% Mounted on
/dev/mmcblk0p1 1.9G 1.9G 0 100% /mnt/stateful_partition
localhost metrics # ls -alh /var/lib/metrics/uma-events
-rw-rw-rw- 1 chronos chronos 1.1G May 2 23:10 /var/lib/metrics/uma-events
Are these supposed to get rotated?
May 2 2018
I thought test builds weren't supposed to collect UMA stats. They certainly aren't supposed to report them. Is it possible the whirlwind testbed is misconfigured? Since whirlwind/arkham/gale don't have Chrome anyway, metrics_daemon is expected to upload the "staged" UMA metrics. Luigi, does metrics_daemon poll for the staged files and attempt to clean them up?
May 3 2018
Actually, this looks like a bug. We have a function AreMetricsEnabled() in the metrics library, but it is only exported; the library does not use it internally to block sample production at the source. That's because the uploader is expected to periodically clean up the file. As a consequence, the metrics daemon keeps appending samples to the uma-events file, and nobody picks them out of there.
It should be fairly easy to fix, but I wonder if we'll break anything (for instance, tests that look for certain metrics to be generated). I guess the only way to know is to try. There are two possible fixes:
1. In test mode, prevent the library from generating the samples.
2. Make sure that the uploader runs and cleans up.
I favor solution 2 because that's the current behavior when Chrome runs. I am in a bit of a time crunch with other projects, though. It could be a good starter project for someone who wants to get involved. Otherwise I can do it, but no ETA.
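To make the two options concrete, here is a rough toy model in Python. It is not the metrics library's code: `ToyMetricsWriter`, `drain_events`, the `gate_at_source` flag, and the one-sample-per-line text format are all made up for illustration, and the `metrics_enabled` callable only stands in for AreMetricsEnabled(). The sketch also ignores locking, which a real fix would need in order to coordinate with concurrent writers.

```python
import os
import tempfile


class ToyMetricsWriter:
    """Hypothetical stand-in for the metrics library: one sample per line."""

    def __init__(self, path, metrics_enabled, gate_at_source=False):
        self.path = path
        # Callable that models AreMetricsEnabled(); assumed, not the real API.
        self.metrics_enabled = metrics_enabled
        self.gate_at_source = gate_at_source

    def send_sample(self, name, value):
        # Fix 1: drop the sample before it ever reaches the disk.
        if self.gate_at_source and not self.metrics_enabled():
            return False
        with open(self.path, "a") as f:
            f.write(f"{name} {value}\n")
        return True


def drain_events(path, upload=None):
    """Fix 2: read the staged samples, optionally upload them, then truncate.

    Returns the number of samples removed from the file.
    """
    if not os.path.exists(path):
        return 0
    with open(path, "r+") as f:
        samples = f.read().splitlines()
        if upload is not None:
            upload(samples)
        # Empty the file in place so writers can keep appending to it.
        f.seek(0)
        f.truncate()
    return len(samples)


if __name__ == "__main__":
    events = os.path.join(tempfile.mkdtemp(), "uma-events")
    writer = ToyMetricsWriter(events, metrics_enabled=lambda: False)
    writer.send_sample("Example.Metric", 1)
    print("drained", drain_events(events), "sample(s)")
```

With fix 2 the samples are still produced, so tests that look for them in the file keep working as long as they read before the drain; with fix 1 they never appear at all, which is the breakage risk mentioned above.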
Sep 26
Luigi, is this something you're going to take on? Or should I try to wade in? We're hitting this on some lab APs in bug 889556.
Sep 26
Or, do you know anyone else less busy who might be able to help?
Sep 26
Hi! I am indeed a little busy and this seems urgent. Can I perhaps help you with code understanding, brainstorming, and code review?
Sep 26
It's probably not actually urgent, if we can just manually prune the few dozen routers we have in the test lab. I'm not sure where to start on #2; who normally runs the uploader?
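For the manual pruning, something like the following could be run on each router. This is only a sketch: it assumes Python is available on the test image, the `prune_if_oversized` helper and the 1 MiB threshold are made up for illustration, and it simply discards the staged samples, which seems acceptable only because these devices are not supposed to report them. The path is the one from the report above.

```python
import os

UMA_EVENTS = "/var/lib/metrics/uma-events"  # path taken from the report above


def prune_if_oversized(path=UMA_EVENTS, max_bytes=1024 * 1024):
    """Truncate `path` in place if it exceeds `max_bytes`.

    Returns the number of bytes freed (0 if nothing was done).
    The default threshold is an arbitrary, hypothetical choice.
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return 0
    if size <= max_bytes:
        return 0
    # Truncate rather than delete so ownership and mode stay as they were.
    os.truncate(path, 0)
    return size


if __name__ == "__main__":
    print("freed", prune_if_oversized(), "bytes")
```

Truncating instead of removing the file keeps the existing chronos ownership and permissions intact, so whatever appends to the file afterwards does not need to recreate it.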
Sep 27
The metrics daemon runs the uploader. On devices with Chrome, Chrome takes care of uploading. I can take a look on the bus (leaving now) and/or help more tonight or tomorrow.
Jan 15
Comment 1 by briannorris@chromium.org, May 2 2018