crash_reporter should report hardware watchdog resets
Issue description
Many of our ARM-based Chromebooks have a hardware watchdog that reboots hung systems if all else fails. Right now, I believe there is no reporting whatsoever for these reboots. We should probably file a crash report and maybe have a UMA stat or something.
The easiest consistent way to detect a watchdog reset is by parsing the eventlog. At least Tegra, Rockchip and Mediatek based systems should be reporting the ELOG_TYPE_ASYNC_HW_TIMER_EXPIRED event in there ('mosys eventlog list' reports it as "Hardware watchdog reset"). I think we already dump the eventlog into /var/log/eventlog.txt every boot, although I'm not quite sure how and whether that could race with crash_reporter starting up. We should read it from there if possible, or otherwise popen() mosys directly. Since the eventlog is persistent across reboots, the important thing to check for is whether there was a watchdog reset event after the latest ELOG_TYPE_BOOT ("System boot") event.
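The check described above can be sketched roughly as follows. This is only an illustration, not the crash_reporter implementation: the function name is made up, and it assumes each event occupies one line of /var/log/eventlog.txt and contains the description text that 'mosys eventlog list' prints ("System boot", "Hardware watchdog reset"). The sample lines are invented for the example.

```python
# Hypothetical sketch: decide whether the current boot followed a hardware
# watchdog reset, based on the text of the eventlog dump.
# Assumption: one event per line, oldest first, with the human-readable
# description somewhere in the line.

BOOT_MARKER = "System boot"
WATCHDOG_MARKER = "Hardware watchdog reset"


def watchdog_reset_since_last_boot(eventlog_text: str) -> bool:
    """Return True if a watchdog event appears after the latest boot event."""
    seen_watchdog = False
    for line in eventlog_text.splitlines():
        if BOOT_MARKER in line:
            # A newer boot event invalidates any older watchdog event,
            # because the eventlog persists across reboots.
            seen_watchdog = False
        elif WATCHDOG_MARKER in line:
            seen_watchdog = True
    return seen_watchdog


# Invented sample data, only loosely modelled on 'mosys eventlog list' output.
old_reset_only = (
    "1 | 2016-03-24 10:00:00 | System boot | 10\n"
    "2 | 2016-03-24 10:00:00 | Hardware watchdog reset\n"
    "3 | 2016-03-25 09:00:00 | System boot | 11\n"
)
fresh_reset = old_reset_only + "4 | 2016-03-25 09:00:00 | Hardware watchdog reset\n"

print(watchdog_reset_since_last_boot(old_reset_only))
print(watchdog_reset_since_last_boot(fresh_reset))
```

Resetting the flag on every boot event is what keeps a stale watchdog entry from an earlier boot from being reported again.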
,
Mar 25 2016
See also ancient bugs:
* http://crosbug.com/p/24222
* http://crosbug.com/p/24221
,
Mar 25 2016
,
Apr 5 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/19c2e63fa4a9e2d51123497f72f2a90f8c5639c1

commit 19c2e63fa4a9e2d51123497f72f2a90f8c5639c1
Author: Julius Werner <jwerner@chromium.org>
Date: Fri Mar 25 22:51:58 2016

userfeedback: Add dependency on crash-reporter

CL:334910 adds functionality to crash-reporter that uses the /var/log/eventlog.txt file generated by userfeedback. Since it's not desired to add a dependency from crash-reporter on userfeedback, I've instead added a dependency from userfeedback on crash-reporter (by changing the userfeedback init script in a way that will make crash-reporter-late wait on it iff it exists). Update the userfeedback ebuild to reflect this new dependency.

CQ-DEPEND=CL:334910
BUG= chromium:595531
TEST=None

Change-Id: I190985e3b82d378119bcc2fc4707d3e740d7394e
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/335264

[modify] https://crrev.com/19c2e63fa4a9e2d51123497f72f2a90f8c5639c1/chromeos-base/userfeedback/userfeedback-9999.ebuild
[modify] https://crrev.com/19c2e63fa4a9e2d51123497f72f2a90f8c5639c1/chromeos-base/crash-reporter/crash-reporter-9999.ebuild
,
Apr 5 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/336f42c28ccc99081e29ab222ddc8fd9f6f1e0ec

commit 336f42c28ccc99081e29ab222ddc8fd9f6f1e0ec
Author: Julius Werner <jwerner@chromium.org>
Date: Fri Mar 25 22:50:31 2016

crash-reporter: Update init scripts

This patch updates the ebuild to install the new init scripts from CL:335262.

CQ-DEPEND=CL:335262
BUG= chromium:595531
TEST=cros deploy

Change-Id: I62b397049213b288a932283eac634ad5b44953b4
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/335263
Reviewed-by: Andrey Ulanov <andreyu@google.com>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/336f42c28ccc99081e29ab222ddc8fd9f6f1e0ec/chromeos-base/crash-reporter/crash-reporter-9999.ebuild
,
Apr 5 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/43d39977c37f4d2264d596b6d4e0724ca84236d9

commit 43d39977c37f4d2264d596b6d4e0724ca84236d9
Author: Julius Werner <jwerner@chromium.org>
Date: Fri Mar 25 22:56:08 2016

cros: crash_test: Adapt to new crash-reporter interface

After CL:335262 --nounclean_check is no longer used by crash_reporter --init. Delete it.

CQ-DEPEND=CL:335263
BUG= chromium:595531
TEST=logging_UserCrash

Change-Id: Ie7d6be3df7b451370d1b2f68fcec5b90b3b83a95
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/335294

[modify] https://crrev.com/43d39977c37f4d2264d596b6d4e0724ca84236d9/client/cros/crash_test.py
,
Apr 5 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7

commit 309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7
Author: Julius Werner <jwerner@chromium.org>
Date: Fri Mar 25 00:16:11 2016

crash: Collect console-ramoops for hardware watchdog resets

Many of our boards contain a hardware watchdog that reboots the machine if it hasn't been petted for a while. This serves as the last line of defense against the most unrecoverable of hangs. Right now we only detect these events as an unclean shutdown, but we do not differentiate them from other such events (e.g. battery ran dry in S3) and do not collect any debugging information.

This patch adds support to crash-reporter to detect hardware watchdog resets and upload the console-ramoops file in a crash report in that case (subject to the usual consent and sanitization rules, of course). console-ramoops contains a ring buffer with the most recent dmesg lines before the last reboot, similar to the normal pstore crash dumps we collect for real kernel crashes. Since the kernel cannot respond to a hardware watchdog timeout this dump will not contain the usual backtrace and register dump, but the last few log messages before the hang can still often give a useful clue about the problem. We aggregate it through a "signature" computed from the last log line which is the most likely one related to the problem.

Since crash-reporter needs to read /var/log/eventlog.txt for this, we change the userfeedback job that creates this file to a 'task' which starts on 'starting crash-reporter-late'. This makes upstart wait for it to finish before actually starting the crash-reporter-late job.

CQ-DEPEND=CL:335264
BUG= chromium:595531
TEST=Tried both 'cat > /dev/watchdog' and 'echo c > /proc/sysrq-trigger' on Oak, observed how the expected crash dumps got written to /var/spool/crash. Ran unit tests.

Change-Id: I89fa18917926fbb458733b5c5ca5204fe219ad37
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/334910
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7/crash-reporter/kernel_collector.h
[modify] https://crrev.com/309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7/userfeedback/init/firmware-version.conf
[modify] https://crrev.com/309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7/crash-reporter/kernel_collector_test.cc
[modify] https://crrev.com/309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7/crash-reporter/kernel_collector.cc
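To illustrate the idea of aggregating reports by a signature derived from the last console-ramoops line, here is a rough sketch. It is not the code from kernel_collector.cc: the function name, the "watchdog-" prefix, the use of SHA-1 and the timestamp-stripping regex are all assumptions made for the example. The point is only that stripping the per-boot timestamp lets identical final messages land in the same bucket.

```python
import hashlib
import re

# Assumption: dmesg-style lines may start with an optional "<N>" log level
# and a "[  seconds.micros]" timestamp, which vary between boots and should
# not influence the signature.
_TIMESTAMP_PREFIX = re.compile(r"^(<\d+>)?\[\s*\d+\.\d+\]\s*")


def watchdog_signature(console_ramoops: str) -> str:
    """Build a hypothetical signature from the last non-empty log line."""
    lines = [l.strip() for l in console_ramoops.splitlines() if l.strip()]
    if not lines:
        return "watchdog-empty"
    last = _TIMESTAMP_PREFIX.sub("", lines[-1])
    digest = hashlib.sha1(last.encode("utf-8")).hexdigest()[:8]
    return f"watchdog-{digest}"


# Invented log excerpts: same final message, different timestamps.
log_a = "[    1.000000] init\n[   42.123456] mwifiex: firmware hang\n"
log_b = "[    2.000000] other\n[   99.654321] mwifiex: firmware hang\n\n"
print(watchdog_signature(log_a) == watchdog_signature(log_b))
```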
,
Apr 5 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/83c8370fc702d30a7f31df8933b3eefa68759ad8

commit 83c8370fc702d30a7f31df8933b3eefa68759ad8
Author: Julius Werner <jwerner@chromium.org>
Date: Fri Mar 25 22:07:49 2016

crash: Split --init into --init and --boot_collect

The crash-reporter --init job is used for two distinct things: to initialize some sysfs settings that cause the system to collect crashes when they happen, and to run certain one-shot per-boot collection tasks. We want to ensure that the former happens as soon as possible (to be able to record crashes as early as possible), while the latter may include longer-running operations and dependencies on other tasks but is not very time-critical.

Therefore, this patch splits the two into two separate upstart tasks, so that crash-reporter could get moved further up the boot chain while crash-boot-collect can accumulate more dependencies without fear of holding up anything critical. Also remove the --nounclean_check option, because this patch makes it unnecessary for the autotest that seems to have been its only user.

CQ-DEPEND=CL:335294
BUG= chromium:595531
TEST=Booted on Oak with --verbose command line and manually confirmed that both jobs ran in the expected order. Ran 'test_that e:logging_.*'.

Change-Id: I000db5ff41bc170708e3a8699a6d911231b0efe8
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/335262

[modify] https://crrev.com/83c8370fc702d30a7f31df8933b3eefa68759ad8/crash-reporter/crash_reporter.cc
[add] https://crrev.com/83c8370fc702d30a7f31df8933b3eefa68759ad8/crash-reporter/init/crash-boot-collect.service
[add] https://crrev.com/83c8370fc702d30a7f31df8933b3eefa68759ad8/crash-reporter/init/crash-boot-collect.conf
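The shape of that split can be pictured with a small stand-in command line. This is a toy model, not crash_reporter.cc: only the flag names --init and --boot_collect come from the commit message, while the program name, the two helper functions and the choice to make the flags mutually exclusive are assumptions for illustration.

```python
import argparse
from typing import List


def init_crash_handling() -> str:
    # Placeholder for the early, time-critical step: configuring the system
    # so that crashes get collected when they happen.
    return "init"


def boot_collect() -> str:
    # Placeholder for the one-shot per-boot collection work, which may be
    # slower and may depend on other boot tasks having finished.
    return "boot_collect"


def main(argv: List[str]) -> str:
    parser = argparse.ArgumentParser(prog="crash_reporter_sketch")
    # Assumption: each invocation performs exactly one of the two roles.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--init", action="store_true")
    group.add_argument("--boot_collect", action="store_true")
    args = parser.parse_args(argv)
    return init_crash_handling() if args.init else boot_collect()


print(main(["--init"]))
print(main(["--boot_collect"]))
```

Keeping the two roles behind separate flags is what lets an init system schedule them as independent jobs with different dependencies.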
,
Mar 15 2017
,
Mar 29 2017
This feature has landed. Follow-up regarding moving crash_reporter --init earlier is discussed in issue 702794.
,
May 30 2017
,
Aug 1 2017
,
Jan 22 2018
Comment 1 by jwer...@chromium.org, Mar 25 2016
Status: Started (was: Available)