
Issue 595531

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Feature




crash_reporter should report hardware watchdog resets

Project Member Reported by jwer...@chromium.org, Mar 17 2016

Issue description

Many of our ARM-based Chromebooks have a hardware watchdog that reboots hung systems if all else fails. Right now, I believe there is no reporting whatsoever for these reboots. We should probably file a crash report and maybe have a UMA stat or something.

The easiest consistent way to detect a watchdog reset is by parsing the eventlog. At least Tegra, Rockchip and Mediatek based systems should be reporting the ELOG_TYPE_ASYNC_HW_TIMER_EXPIRED event in there ('mosys eventlog list' reports it as "Hardware watchdog reset"). I think we already dump the eventlog into /var/log/eventlog.txt on every boot, although I'm not quite sure how that happens or whether it could race with crash_reporter starting up. We should read it from there if possible, or otherwise popen() mosys directly. Since the eventlog is persistent across reboots, the important thing to check is whether there was a watchdog reset event after the latest ELOG_TYPE_BOOT ("System boot") event.
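
For illustration, the check described above could be sketched roughly as follows. This is a minimal Python sketch, not the code that landed in crash_reporter (which is C++). The function name and the sample line layout (index, timestamp, description separated by '|') are assumptions. Only the two description strings come from the issue text.

```python
# Description strings as reported by 'mosys eventlog list' per the issue text.
WATCHDOG_EVENT = "Hardware watchdog reset"
BOOT_EVENT = "System boot"


def watchdog_reset_on_last_boot(eventlog_lines):
    """Return True if a watchdog reset was logged after the latest boot event.

    eventlog_lines is any iterable of text lines, oldest first. The name and
    interface are hypothetical and exist only for this sketch.
    """
    seen_watchdog = False
    for line in eventlog_lines:
        if BOOT_EVENT in line:
            # A newer boot event: forget watchdog events from earlier boots,
            # since the eventlog persists across reboots.
            seen_watchdog = False
        elif WATCHDOG_EVENT in line:
            seen_watchdog = True
    return seen_watchdog


# Assumed line layout, for demonstration only.
sample = [
    "1 | 2016-03-16 09:00:00 | System boot | 41",
    "2 | 2016-03-16 09:00:00 | Hardware watchdog reset",
    "3 | 2016-03-17 10:00:00 | System boot | 42",
]
print(watchdog_reset_on_last_boot(sample))
```

In this sample the only watchdog event precedes the latest "System boot" line, so the function returns False. Reading the real log would amount to passing the open /var/log/eventlog.txt file object as eventlog_lines.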
 
Owner: jwer...@chromium.org
Status: Started (was: Available)
I'm taking a stab at this. It's actually not that hard...

https://chromium-review.googlesource.com/#/c/334910/
See also ancient bugs:
* http://crosbug.com/p/24222
* http://crosbug.com/p/24221
Cc: groeck@chromium.org
Project Member

Comment 4 by bugdroid1@chromium.org, Apr 5 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/19c2e63fa4a9e2d51123497f72f2a90f8c5639c1

commit 19c2e63fa4a9e2d51123497f72f2a90f8c5639c1
Author: Julius Werner <jwerner@chromium.org>
Date: Fri Mar 25 22:51:58 2016

userfeedback: Add dependency on crash-reporter

CL:334910 adds functionality to crash-reporter that uses the
/var/log/eventlog.txt file generated by userfeedback. Since it's not
desired to add a dependency from crash-reporter on userfeedback,
I've instead added a dependency from userfeedback on crash-reporter
(by changing the userfeedback init script in a way that will make
crash-reporter-late wait on it iff it exists). Update the userfeedback
ebuild to reflect this new dependency.

CQ-DEPEND=CL:334910
BUG= chromium:595531 
TEST=None

Change-Id: I190985e3b82d378119bcc2fc4707d3e740d7394e
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/335264

[modify] https://crrev.com/19c2e63fa4a9e2d51123497f72f2a90f8c5639c1/chromeos-base/userfeedback/userfeedback-9999.ebuild
[modify] https://crrev.com/19c2e63fa4a9e2d51123497f72f2a90f8c5639c1/chromeos-base/crash-reporter/crash-reporter-9999.ebuild

Project Member

Comment 5 by bugdroid1@chromium.org, Apr 5 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/336f42c28ccc99081e29ab222ddc8fd9f6f1e0ec

commit 336f42c28ccc99081e29ab222ddc8fd9f6f1e0ec
Author: Julius Werner <jwerner@chromium.org>
Date: Fri Mar 25 22:50:31 2016

crash-reporter: Update init scripts

This patch updates the ebuild to install the new init scripts from
CL:335262.

CQ-DEPEND=CL:335262
BUG= chromium:595531 
TEST=cros deploy

Change-Id: I62b397049213b288a932283eac634ad5b44953b4
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/335263
Reviewed-by: Andrey Ulanov <andreyu@google.com>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/336f42c28ccc99081e29ab222ddc8fd9f6f1e0ec/chromeos-base/crash-reporter/crash-reporter-9999.ebuild

Project Member

Comment 6 by bugdroid1@chromium.org, Apr 5 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/43d39977c37f4d2264d596b6d4e0724ca84236d9

commit 43d39977c37f4d2264d596b6d4e0724ca84236d9
Author: Julius Werner <jwerner@chromium.org>
Date: Fri Mar 25 22:56:08 2016

cros: crash_test: Adapt to new crash-reporter interface

After CL:335262 --nounclean_check is no longer used by crash_reporter
--init. Delete it.

CQ-DEPEND=CL:335263
BUG= chromium:595531 
TEST=logging_UserCrash

Change-Id: Ie7d6be3df7b451370d1b2f68fcec5b90b3b83a95
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/335294

[modify] https://crrev.com/43d39977c37f4d2264d596b6d4e0724ca84236d9/client/cros/crash_test.py

Project Member

Comment 7 by bugdroid1@chromium.org, Apr 5 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7

commit 309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7
Author: Julius Werner <jwerner@chromium.org>
Date: Fri Mar 25 00:16:11 2016

crash: Collect console-ramoops for hardware watchdog resets

Many of our boards contain a hardware watchdog that reboots the machine
if it hasn't been petted for a while. This serves as the last line of
defense against the most unrecoverable of hangs. Right now we only
detect these events as an unclean shutdown, but we do not differentiate
them from other such events (e.g. battery ran dry in S3) and do not
collect any debugging information.

This patch adds support to crash-reporter to detect hardware watchdog
resets and upload the console-ramoops file in a crash report in that
case (subject to the usual consent and sanitization rules, of course).
console-ramoops contains a ring buffer with the most recent dmesg lines
before the last reboot, similar to the normal pstore crash dumps we
collect for real kernel crashes. Since the kernel cannot respond to a
hardware watchdog timeout this dump will not contain the usual backtrace
and register dump, but the last few log messages before the hang can
still often give a useful clue about the problem. We aggregate it
through a "signature" computed from the last log line which is the most
likely one related to the problem.

Since crash-reporter needs to read /var/log/eventlog.txt for this, we
change the userfeedback job that creates this file to a 'task' which
starts on 'starting crash-reporter-late'. This makes upstart wait for it
to finish before actually starting the crash-reporter-late job.

CQ-DEPEND=CL:335264
BUG= chromium:595531 
TEST=Tried both 'cat > /dev/watchdog' and 'echo c > /proc/sysrq-trigger'
on Oak, observed how the expected crash dumps got written to
/var/spool/crash. Ran unit tests.

Change-Id: I89fa18917926fbb458733b5c5ca5204fe219ad37
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/334910
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7/crash-reporter/kernel_collector.h
[modify] https://crrev.com/309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7/userfeedback/init/firmware-version.conf
[modify] https://crrev.com/309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7/crash-reporter/kernel_collector_test.cc
[modify] https://crrev.com/309d1beb6fd8d7cf43b5b9578a9ce19ba71eebd7/crash-reporter/kernel_collector.cc
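
As a rough illustration of the "signature computed from the last log line" idea in the commit message above, here is a Python sketch. It does not reproduce the format used by kernel_collector.cc. The "watchdog-" prefix, the timestamp pattern, the 40-character limit and the CRC32 suffix are all assumptions made for this example.

```python
import re
import zlib

# Assumed dmesg-style timestamp prefix, e.g. "[   42.123456] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def watchdog_signature(ramoops_text):
    """Build a grouping key from the last non-empty line of a console log.

    Hypothetical helper: the timestamp is dropped so that the same final
    message seen at different uptimes maps to the same signature.
    """
    lines = [line for line in ramoops_text.splitlines() if line.strip()]
    if not lines:
        return "watchdog-empty"
    last = _TIMESTAMP.sub("", lines[-1]).strip()
    # Readable part: runs of non-alphanumerics collapsed to '_', truncated.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", last).strip("_")[:40]
    # Checksum over the full message, so truncated slugs stay distinguishable.
    checksum = zlib.crc32(last.encode("utf-8")) & 0xFFFFFFFF
    return "watchdog-{}-{:08X}".format(slug, checksum)
```

Two logs that end in the same message at different uptimes yield the same signature, which is what lets reports be aggregated even though a watchdog reset leaves no backtrace or register dump.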

Project Member

Comment 8 by bugdroid1@chromium.org, Apr 5 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/83c8370fc702d30a7f31df8933b3eefa68759ad8

commit 83c8370fc702d30a7f31df8933b3eefa68759ad8
Author: Julius Werner <jwerner@chromium.org>
Date: Fri Mar 25 22:07:49 2016

crash: Split --init into --init and --boot_collect

The crash-reporter --init job is used for two distinct things: to
initialize some sysfs settings that cause the system to collect crashes
when they happen, and to run certain one-shot per-boot collection tasks.
We want to ensure that the former happens as soon as possible (to be
able to record crashes as early as possible), while the latter may
include longer-running operations and dependencies on other tasks but is
not very time-critical. Therefore, this patch splits the two into two
separate upstart tasks, so that crash-reporter could get moved further
up the boot chain while crash-boot-collect can accumulate more
dependencies without fear of holding up anything critical.

Also remove the --nounclean_check option, because this patch makes it
unnecessary for the autotest that seems to have been its only user.

CQ-DEPEND=CL:335294
BUG= chromium:595531 
TEST=Booted on Oak with --verbose command line and manually confirmed
that both jobs ran in the expected order. Ran 'test_that e:logging_.*'.

Change-Id: I000db5ff41bc170708e3a8699a6d911231b0efe8
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/335262

[modify] https://crrev.com/83c8370fc702d30a7f31df8933b3eefa68759ad8/crash-reporter/crash_reporter.cc
[add] https://crrev.com/83c8370fc702d30a7f31df8933b3eefa68759ad8/crash-reporter/init/crash-boot-collect.service
[add] https://crrev.com/83c8370fc702d30a7f31df8933b3eefa68759ad8/crash-reporter/init/crash-boot-collect.conf
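
The split described in the commit message above can be pictured as two independent entry points behind separate flags. The following Python sketch only mirrors that shape. The real crash_reporter is a C++ binary, and the parser, function name and step descriptions here are hypothetical. Only the flag names --init and --boot_collect come from the commit.

```python
import argparse


def planned_steps(argv):
    """Return the work a given invocation would perform (illustration only)."""
    parser = argparse.ArgumentParser(prog="crash_reporter_sketch")
    parser.add_argument("--init", action="store_true",
                        help="early, fast setup so crashes get recorded")
    parser.add_argument("--boot_collect", action="store_true",
                        help="one-shot per-boot collection, may run later")
    args = parser.parse_args(argv)

    steps = []
    if args.init:
        # Time-critical: should run as early in boot as possible.
        steps.append("configure crash collection")
    if args.boot_collect:
        # Not time-critical: may wait on other jobs (e.g. the eventlog dump).
        steps.append("run per-boot collectors")
    return steps


print(planned_steps(["--init"]))
print(planned_steps(["--boot_collect"]))
```

Keeping the two paths behind separate flags is what allows each to be started by its own upstart job with its own dependencies.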

Comment 9 by vapier@chromium.org, Mar 15 2017

Components: Internals>CrashReporting
Status: Fixed (was: Started)
This feature has landed. Follow-up work on moving crash_reporter --init earlier is discussed in issue 702794.

Comment 11 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 13 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
