
Issue 839164

Starred by 3 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




uma-events fills disk

Project Member Reported by briannorris@chromium.org, May 2 2018

Issue description

I have a whirlwind testbed router (running the version of the whirlwind image used for Wi-Fi testing) whose stateful partition is completely filled with UMA events:

# cat /etc/lsb-release
CHROMEOS_RELEASE_APPID={63E6908B-4769-4985-AD03-72E2C9E77D05}
CHROMEOS_BOARD_APPID={63E6908B-4769-4985-AD03-72E2C9E77D05}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
DEVICETYPE=OTHER
HWID_OVERRIDE=WHIRLWIND DOGFOOD
CHROMEOS_RELEASE_BUILDER_PATH=trybot-whirlwind-test-ap-tryjob/R65-10323.33.0-c40061
GOOGLE_RELEASE=10323.33.2018_02_15_1605
CHROMEOS_DEVSERVER=http://build41-m2.golo.chromium.org:8080
CHROMEOS_RELEASE_BOARD=whirlwind
CHROMEOS_RELEASE_BUILD_NUMBER=10323
CHROMEOS_RELEASE_BRANCH_NUMBER=33
CHROMEOS_RELEASE_CHROME_MILESTONE=65
CHROMEOS_RELEASE_PATCH_NUMBER=2018_02_15_1605
CHROMEOS_RELEASE_TRACK=testimage-channel
CHROMEOS_RELEASE_DESCRIPTION=10323.33.2018_02_15_1605 (Continuous Builder - Builder: N/A) whirlwind
CHROMEOS_RELEASE_NAME=Chromium OS
CHROMEOS_RELEASE_BUILD_TYPE=Continuous Builder - Builder: N/A
CHROMEOS_RELEASE_VERSION=10323.33.2018_02_15_1605
CHROMEOS_AUSERVER=http://build41-m2.golo.chromium.org:8080/update

localhost metrics # df -h /mnt/stateful_partition/
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk0p1  1.9G  1.9G     0 100% /mnt/stateful_partition

localhost metrics # ls -alh /var/lib/metrics/uma-events 
-rw-rw-rw- 1 chronos chronos 1.1G May  2 23:10 /var/lib/metrics/uma-events


Are these supposed to get rotated?
 
Other news: this device was disconnected from the network for a few days (intentionally), so it's possible it was accumulating a lot of shill events. I'm trying to figure out how to manually parse this file and see what was logged and when.
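For a first look at what is in the file, here is a minimal Python sketch that counts records per metric name. The record layout is an assumption that this thread does not confirm: a 4-byte little-endian length that counts itself, followed by a kind string and a payload string, each NUL-terminated, with the metric name as the first space-separated token of the payload. Check it against the metrics library's serialization code before trusting the output. The function names are made up for illustration.

```python
import struct
from collections import Counter


def iter_records(data: bytes):
    """Yield (kind, payload) pairs from a length-prefixed buffer.

    ASSUMED layout (not confirmed in this thread): 4-byte little-endian
    length that includes the 4 header bytes, then b"kind\0payload\0".
    """
    offset = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, offset)
        if length < 4 or offset + length > len(data):
            break  # truncated or corrupt tail: stop rather than guess
        body = data[offset + 4 : offset + length]
        parts = body.split(b"\0")
        kind = parts[0].decode("utf-8", "replace")
        payload = parts[1].decode("utf-8", "replace") if len(parts) > 1 else ""
        yield kind, payload
        offset += length


def count_by_name(data: bytes) -> Counter:
    """Count records per metric name (first token of the payload)."""
    counts = Counter()
    for _kind, payload in iter_records(data):
        counts[payload.split(" ", 1)[0]] += 1
    return counts
```

Something like `count_by_name(open(path, "rb").read()).most_common(20)` would then show which metrics dominate. For a 1.1G file it is probably better to copy a slice off the device first, or adapt the loop to read in chunks.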
I thought test builds weren't supposed to collect UMA stats. They certainly aren't supposed to report them. Is it possible the whirlwind testbed is misconfigured?

Since whirlwind/arkham/gale don't have Chrome anyway, metrics_daemon is expected to upload the "staged" UMA metrics.

Luigi, does metrics_daemon poll for the staged files and attempt to clean them up?

Actually, this looks like a bug.  The metrics library has a function AreMetricsEnabled(), but it is only exported to callers; the library does not use it internally to block sample production at the source.  That's because the uploader is expected to periodically clean up the file.

As a consequence, the metrics daemon keeps appending samples to the uma-events file, and nobody picks them out of there.

It should be fairly easy to fix, but I wonder if we'll break anything (for instance, tests that look for certain metrics to be generated).  I guess the only way to know is to try.

There are two possible fixes.  1. in test mode, prevent the library from generating the samples; 2. make sure that the uploader runs and cleans up.
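To make option 1 concrete, here is a rough Python stand-in for gating sample production at the source. It is not the metrics library's code (which is C++): `SampleWriter` and its `metrics_enabled` callback are hypothetical names, with the callback playing the role that AreMetricsEnabled() would play inside the library.

```python
from typing import Callable


class SampleWriter:
    """Appends serialized samples to an events file, unless metrics are off.

    Hypothetical illustration of option 1; not the real library API.
    """

    def __init__(self, path: str, metrics_enabled: Callable[[], bool]):
        self._path = path
        self._metrics_enabled = metrics_enabled

    def send(self, record: bytes) -> bool:
        # Check at the source: if metrics are disabled, nothing is written,
        # so the file cannot grow even when no uploader is running.
        if not self._metrics_enabled():
            return False
        with open(self._path, "ab") as f:
            f.write(record)
        return True
```

The trade-off is the one raised above: any test that expects samples to show up in the file would see nothing when the check returns false.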

I favor solution 2 because that's the current behavior when Chrome runs.  I am in a bit of a time crunch with other projects, though.  It could be a good starter project for someone who wants to get involved.  Otherwise I can do it, but no ETA.
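For option 2, the core of the uploader's cleanup can be sketched as "read everything, then truncate" under a lock. This is a Python illustration only, with a hypothetical function name, and it assumes that writers take the same advisory lock before appending; whether the real library does so is not stated in this thread.

```python
import fcntl
import os


def drain_events(path: str) -> bytes:
    """Return all pending bytes from path and leave the file empty.

    Holds an exclusive advisory lock for the read-and-truncate step, so a
    cooperating writer (ASSUMED to use the same lock) cannot append between
    the read and the truncate.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        chunks = []
        while True:
            chunk = os.read(fd, 1 << 20)
            if not chunk:
                break
            chunks.append(chunk)
        os.ftruncate(fd, 0)
        return b"".join(chunks)
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

A periodic caller would either upload the returned bytes or, on devices that must not report, simply discard them. In both cases the file stays bounded by whatever accumulates in one interval.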



Luigi, is this something you're going to take on? Or should I try to wade in? We're hitting this on some lab APs in bug 889556.
Or, do you know anyone else less busy who might be able to help?
Hi!  I am indeed a little busy and this seems urgent.  Can I perhaps help you with code understanding, brainstorming, and code review?
Labels: Hotlist-GoodFirstBug
It's probably not actually urgent, if we can just manually prune the few dozen routers we have in the test lab.
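As a stopgap, the manual pruning could look something like this minimal sketch. The function name and the size threshold are placeholders, and it assumes Python is available on the device or that the same logic is run remotely.

```python
import os


def prune_if_large(path: str, max_bytes: int) -> bool:
    """Truncate path to zero length if it exceeds max_bytes.

    Returns True if the file was truncated, False otherwise
    (including when the file does not exist).
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return False
    if size <= max_bytes:
        return False
    # Truncate in place instead of deleting, so the existing owner and
    # mode of the file are preserved for whoever appends to it next.
    os.truncate(path, 0)
    return True
```

For example, `prune_if_large("/var/lib/metrics/uma-events", 10 * 1024 * 1024)` would empty the file only once it passes 10 MiB; the limit is arbitrary.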

I'm not sure where to start on #2; who normally runs the uploader?
The metrics daemon runs the uploader.  On devices with Chrome, Chrome takes care of uploading.

I can take a look on the bus (leaving now) and/or help more tonight or tomorrow.
Labels: Enterprise-Triaged
