metrics_daemon: failed to open vmlog: file exists |
|||||
Issue descriptionWhile chasing a separate bug, I noticed these errors in /var/log/messages on a gale at 100.107.3.47. (Hope it stays there.) Justin, do you want to take a look? Is this an issue with devices that cannot set their clock correctly before they are on line? Grant would you please copy to some jetstream folks? I don't think they have to worry about it, it's just an extra log for vm metrics. 1970-01-01T00:00:06.276394+00:00 NOTICE boot-metrics[461]: Warm boot detected 1970-01-01T00:00:07.692502+00:00 ERR metrics_daemon[629]: [ERROR:vmlog_writer.cc(97)] Failed to open file: /var/log/vmlog/vmlog.19700101-000007: File exists 1970-01-01T00:00:07.697741+00:00 INFO metrics_daemon[629]: [INFO:metrics_daemon.cc(361)] uploader disabled on non-official build 1970-01-01T00:00:09.696663+00:00 ERR metrics_daemon[629]: [ERROR:vmlog_writer.cc(212)] Writing to vmlog failed.
,
May 17 2017
If you overwrite the existing file, will you potentially lose useful data with every reboot? And if we're having a lot of OOM crashes, that's exactly when the data would be most useful.
,
May 17 2017
Offhand, I don't know who in jetstream team is responsible for their jetstream specific user space (vmlog in this case). I'll add kkunduru and sduvvuri as first guess and they might know? I'm surprised we loose the time on warmboot since the power never goes away (it's via USB-C which AFAIK is always on). I've always assumed my devices were loosing time because of pre-release HW and/or the update sequence I used...but that doesn't explain it at all since this gale is MP build (one of the first MP devices). The clock is NOT battery backed because jetstream assumed NTP would be available to set the time or get the time from google (which manages the device configuration anyway). That all has it's own set of issues and we just need to beware of timestamps changing shortly after boot. Justin - thanks for picking this up. If you need access to the gale or whirlwind, I'll usually have one of each running on my desk. Just warn me when you need it since I routinely build/update the devices several times per week.
,
May 17 2017
I suppose I could have the vmlog writer wait for a while until the time is set correctly, but the jetstream device will still lose logs for some other userspace services. I don't want to get too fancy as I already got some pushback on the log rotation that vmlog does now. Grant, Do you know how time is set on the device on jetstream at boot? I'm not sure what would happen if NTP had a huge offset while a jobs on the machine are using CLOCK_MONOTONIC for scheduling.
,
May 17 2017
My bad: tlsdate, not NTP Date is set from /etc/init/tlsdated.conf: description "Run the tlsdate daemon to set the system time" author "chromium-os-dev@chromium.org" start on started system-services stop on stopping system-services ... I expect this is the same as for any other chromebook....except we loose the date on every reboot. :/
,
May 17 2017
Sorry, I don't understand the exact issues here, but as long as the date is bogus, why not make up a different filename which doesn't collide (and doesn't look like a date, to avoid confusion)? Is the date part of the file name used for anything other than UX? #4: I would not worry about system implications of losing the time---not our problem here.
,
May 17 2017
#6: This is a UX thing. We can manage the LATEST and PREVIOUS symlinks easily, but it's going to get confusing beyond that point to determine the right vmlog file if we come up with some other naming scheme. The timestamps in vmlog are relative to boot time, so that's no help if you're trying to figure out which boot they're for. Whatever naming policy we'd device should apply to other logs with the same date format as well. Waiting a little while until we have a good clock time seems simpler.
,
May 17 2017
#7 yes that will work. Also, we really need those logs when under memory pressure, and it's unlikely that it will happen that soon after boot.
,
May 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/f0c760c3774d9a4d5da5ad53004f9a83e87b9673 commit f0c760c3774d9a4d5da5ad53004f9a83e87b9673 Author: Justin TerAvest <teravest@chromium.org> Date: Fri May 19 20:57:55 2017 metrics: Delay vmlog until clock time is good. Some Chrome OS devices do not have a battery backed clock and have a delay before time is correctly synchronized. Vmlog files are named based on the timestamp that writing starts, which can cause collisions without a good clock time. We generally don't care about behavior near boot time for vmlog, so we defer starting to write log files until the time is past the first day after the epoch. BUG= chromium:723196 TEST=Manually set time, noticed file writing delay. Change-Id: Id0f2392e78f7d79b1fff38273cdf50c692ed9413 Reviewed-on: https://chromium-review.googlesource.com/508815 Commit-Ready: Justin TerAvest <teravest@chromium.org> Tested-by: Justin TerAvest <teravest@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> [modify] https://crrev.com/f0c760c3774d9a4d5da5ad53004f9a83e87b9673/metrics/vmlog_writer.cc [modify] https://crrev.com/f0c760c3774d9a4d5da5ad53004f9a83e87b9673/metrics/vmlog_writer.h
,
May 23 2017
,
Aug 1 2017
,
Jan 22 2018
|
|||||
►
Sign in to add a comment |
|||||
Comment 1 by teravest@chromium.org
, May 17 2017Status: Assigned (was: Untriaged)