arkham: watchdog reboots are not reported as reboot reason by linux/watchdog.h
Issue description
Version: CHROMEOS_RELEASE_VERSION=8030.0.2016_03_08_0946
OS: ChromeOS
Steps (using thirdparty/daisydog):
- Modify daisydog.c so that it does not shut down cleanly, by commenting out the write of the "V" character: //if (write(fd, "V", 1))
- Stop daisydog and wait 10 seconds for the device to reboot.
- After the reboot, use daisydog -c to check the boot status (daisydog may need to be stopped to read this).
localhost ~ # daisydog -c
HW watchdog interval is 10 seconds
/dev/watchdog reported boot status: normal-boot
What is the expected output? What do you see instead?
Expected "watchdog-timeout" as the reason, which generates the HwWatchdogReboot UMA CrOS event. Instead, the boot status is reported as "normal-boot".
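For reference, the check that `daisydog -c` performs can be sketched roughly as below. This is a minimal sketch, not daisydog's actual source: it assumes the standard `WDIOC_GETBOOTSTATUS` ioctl and `WDIOF_CARDRESET` flag from `<linux/watchdog.h>`, and the helper names `bootstatus_to_reason()` and `read_watchdog_bootstatus()` are hypothetical. Opening `/dev/watchdog` arms the timer, so the sketch writes the magic "V" before closing, which is exactly the step the repro above comments out.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: map WDIOC_GETBOOTSTATUS flags to the strings seen
 * in this bug. WDIOF_CARDRESET (0x20) means the last reset came from the
 * watchdog. */
const char *bootstatus_to_reason(int flags)
{
	if (flags < 0)
		return "unknown";	/* the status could not be read */
	return (flags & WDIOF_CARDRESET) ? "watchdog-timeout" : "normal-boot";
}

/* Hypothetical helper: return the boot status flags reported by the
 * driver, or -1 on error. Opening the device arms the watchdog, so write
 * the magic 'V' before closing, as daisydog normally does. With nowayout
 * set, the timer keeps running regardless. */
int read_watchdog_bootstatus(const char *path)
{
	int flags = 0;
	int fd;

	assert(path != NULL);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	if (ioctl(fd, WDIOC_GETBOOTSTATUS, &flags) < 0)
		flags = -1;
	/* Magic close: tell the driver this stop is intentional. */
	if (write(fd, "V", 1) != 1)
		flags = -1;
	close(fd);
	return flags;
}
```

Usage would be something like `printf("%s\n", bootstatus_to_reason(read_watchdog_bootstatus("/dev/watchdog")));`, run with daisydog stopped, since the watchdog core only lets one process hold /dev/watchdog open at a time.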
Mar 17 2016
Which hardware watchdog is actually used? Not all watchdog drivers (and not all hardware) support reporting a reset reason.
Mar 17 2016
arkham uses the Qualcomm IPQ8064 chipset. The chipset does implement the WDT, and I would usually expect it to also report when the WDT fires (or has fired). If that is not working as intended, my first suspicion is that the BIOS clears that state in the chipset and does not report this particular bit of information to the kernel. I don't know the whole chain of events from the WDT firing to the "reset reason" being queried through the /dev/watchdog ioctl on the next boot, but we certainly have all the source code for it, and someone (most likely me) just needs to spend time tracking this down. If someone else has time right now, go for it.
Mar 21 2016
The qcom_wdt driver (drivers/watchdog/qcom-wdt.c) does not implement returning the boot status. The chip does support reporting the reset reason (watchdog status register at offset 0x44, bit 0). I'll be happy to implement the code, but I would need someone to test it. If that is OK, feel free to assign the bug to me.
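The change described here amounts to checking bit 0 of the status register at probe time and translating it into the watchdog core's `bootstatus` field, which the core hands back for `WDIOC_GETBOOTSTATUS`. The sketch below models that translation in user space so it can be compiled and tested; the offset (0x44) and bit come from the comment above, while `QCOM_WDT_STS_BITE` and `qcom_sts_to_bootstatus()` are hypothetical names. In the driver itself the raw value would come from a `readl()` of the status register inside the probe function.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/watchdog.h>	/* WDIOF_CARDRESET */

/* Offset and bit taken from the discussion above (IPQ8064 watchdog). */
#define QCOM_WDT_STS		0x44u
#define QCOM_WDT_STS_BITE	(1u << 0)	/* hypothetical name for bit 0 */

/* Hypothetical helper: translate the raw status register value into the
 * flags returned by WDIOC_GETBOOTSTATUS. In-kernel this would be roughly:
 *     if (readl(base + QCOM_WDT_STS) & QCOM_WDT_STS_BITE)
 *             wdd->bootstatus = WDIOF_CARDRESET;
 */
unsigned int qcom_sts_to_bootstatus(uint32_t sts)
{
	return (sts & QCOM_WDT_STS_BITE) ? WDIOF_CARDRESET : 0;
}
```

Only bit 0 is examined, so other bits in the register (whatever they mean on this chip) do not affect the reported reason.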
Mar 21 2016
I can test it. In fact, I'd like to modify ./files/server/site_tests/platform_HWwatchdog/platform_HWwatchdog.py to also check the reboot reason when /dev/watchdog is present and the test resets the system via the WDT.
Mar 22 2016
The ability to check the boot status depends on the watchdog timer and on the driver, so I am not sure a generic script would be that useful. For example, the driver for Intel's iTCO doesn't support the boot status either, and I am not sure whether it can be supported (the documentation is a bit vague). The DesignWare watchdog driver doesn't support it either, and I don't have a datasheet for the DW watchdog, so I cannot check whether the hardware can report the boot status. Anyway, I reassigned the bug to myself.
Mar 22 2016
ISTR one of the previous ARM platforms used a global variable to store "bootreason" which was filled in by chipset initialization code, not the watchdog driver. The WDIOC_GETBOOTSTATUS is "merely" the primary API to get access to that "state" info. 14 drivers in the kernel support WDIOC_GETBOOTSTATUS. If there is a different/better API to get (and log) the same state, we can switch to that instead. Regardless of which API is used, I would like to modify the platform_HWwatchdog.py autotest to enforce WDT reset logging (not just in CoreBoot "BIOS event logs" for the ChromeOS devices that have that). To collect broad statistics on forced reboots, we need a consistent method of logging this event.
Mar 22 2016
s/bootreason/bootstatus maybe? I'm not finding the code right now though.
Mar 22 2016
You were probably looking for reset_status (used by drivers/watchdog/sa1100_wdt.c).
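The older ARM pattern splits the work differently: early platform code latches the SoC's reset-cause register into a global (`reset_status` in the sa1100 case), and the watchdog driver only translates that value into `WDIOF_CARDRESET`. Below is a user-space model of that split, not the actual sa1100 code; the bit value `EXAMPLE_RESET_WATCHDOG` and the hook `platform_latch_reset_cause()` are hypothetical.

```c
#include <assert.h>
#include <linux/watchdog.h>	/* WDIOF_CARDRESET */

/* Hypothetical "reset by watchdog" bit in a SoC reset-cause register;
 * the real layout is chip specific. */
#define EXAMPLE_RESET_WATCHDOG	(1u << 2)

/* Latched once by early platform code, before anything (firmware, a
 * suspend cycle) can clear the hardware register. */
static unsigned int reset_status;

/* Hypothetical platform hook: save the raw reset-cause register. */
void platform_latch_reset_cause(unsigned int raw_reset_cause)
{
	reset_status = raw_reset_cause;
}

/* Watchdog driver side: only translate the latched value into the
 * bootstatus flags exposed through WDIOC_GETBOOTSTATUS. */
int watchdog_boot_status(void)
{
	return (reset_status & EXAMPLE_RESET_WATCHDOG) ? WDIOF_CARDRESET : 0;
}
```

The advantage of this split is that the watchdog driver no longer depends on the hardware register still holding its value by the time the driver probes.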
Mar 22 2016
Yup - reset_status is exactly right - thanks. :)
Apr 5 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/0d93f8f36b56b6a922fcd4ae7b586cf09e371323

commit 0d93f8f36b56b6a922fcd4ae7b586cf09e371323
Author: Guenter Roeck <groeck@chromium.org>
Date: Mon Mar 21 22:23:10 2016

CHROMIUM: watchdog: qcom: Report reboot reason

The Qualcom watchdog timer block reports if the system was reset by the watchdog. Pass the information to user space.

BUG= chromium:593028
TEST=build for whirlwind (chromeos-3.14 branch) and pass:
test_that $H platform_HWwatchdog
and manually verify daisydog emits "watchdog-timeout" when it starts.

Change-Id: Ic4aeeae9da3354eef279151746f1b67ccba834fa
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/334460
Reviewed-by: Guenter Roeck <groeck@google.com>

[modify] https://crrev.com/0d93f8f36b56b6a922fcd4ae7b586cf09e371323/drivers/watchdog/qcom-wdt.c
Apr 5 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/eeeba2480cb8eec080ca20df2d1bff2583e4a315

commit eeeba2480cb8eec080ca20df2d1bff2583e4a315
Author: Grant Grundler <grundler@google.com>
Date: Tue Mar 22 23:04:23 2016

BACKPORT: watchdog: qcom: initialize wdd.timeout

chromeos-3.18 qcom wdt driver initializes wdd.timeout field.

BUG= chromium:593028
TEST=build image for whirlwind and pass:
test_that $H platform_HWwatchdog
and manually verify daisydog emits "watchdog-timeout" when it starts.

Change-Id: I8e499f4dfaa7eacf29c8d8887fcf4d2d98c4eb57
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/334239
Tested-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Guenter Roeck <groeck@google.com>

[modify] https://crrev.com/eeeba2480cb8eec080ca20df2d1bff2583e4a315/drivers/watchdog/qcom-wdt.c
Apr 11 2016
With this change, I see a watchdog-triggered reboot reported as such, but I now also see it as the reason after using the "reboot" command:
/dev/watchdog reported boot status: 0x20 watchdog-timeout
Apr 11 2016
That may be technically correct, if the system uses the watchdog driver to reboot itself (via qcom_wdt_restart).
Apr 11 2016
Agreed. But not really what we want to be reporting to UMA...
Apr 11 2016
Sounds like a catch-22. I wonder if there is a way to find out whether the restart function triggered the reboot. I'll check the datasheet to see if I can find something.
Apr 11 2016
Guenter appears to be correct:
static int qcom_watchdog_probe(struct platform_device *pdev)
{
	...
	/*
	 * WDT restart notifier has priority 0 (use as a last resort)
	 */
	wdt->restart_nb.notifier_call = qcom_wdt_restart;
	ret = register_restart_handler(&wdt->restart_nb);
Since I'd like to leave this as-is, what is the "first resort"? In other words, what should the system try first to restart itself?
Apr 11 2016
Restart handlers really depend on the hardware/platform. Most of the time there is an ARM platform restart handler or a GPIO restart handler. Since this one fires, it looks like no other restart handler is enabled for this platform; the devicetree file should confirm that. Some qcom devicetree files (qcom-apq8084, qcom-msm8974, msm8916) specify "qcom,pshold", which would register a restart handler with priority 128. It might be possible to use WD1 instead of WD0 to restart the system from the watchdog driver, assuming that WD1 is always available.
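The priority rule can be illustrated with a small simulation: restart handlers are tried from highest to lowest priority, and the first one that actually resets the machine wins. In the sketch below, `struct restart_handler`, `restart_winner()`, the handler labels and the `resets` field are all hypothetical; only the priorities (128 for a "qcom,pshold" handler, 0 for the watchdog) come from the comments above. It shows why `reboot` ends up as a watchdog reset on arkham when the watchdog handler is the only one registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a registered restart handler. */
struct restart_handler {
	const char *name;
	int priority;	/* higher runs first; 0 = last resort */
	int resets;	/* 1 if calling it actually resets the SoC */
};

/* qsort comparator: sort by descending priority. */
static int by_priority_desc(const void *a, const void *b)
{
	const struct restart_handler *x = a, *y = b;

	return (y->priority > x->priority) - (y->priority < x->priority);
}

/* Return the name of the handler that would reset the machine, or NULL
 * if none does. Sorts the array in place. */
const char *restart_winner(struct restart_handler *h, size_t n)
{
	qsort(h, n, sizeof(*h), by_priority_desc);
	for (size_t i = 0; i < n; i++)
		if (h[i].resets)
			return h[i].name;
	return NULL;
}
```

With only `{ "qcom_wdt", 0, 1 }` registered the watchdog wins; adding `{ "pshold", 128, 1 }` moves the intentional reboot off the watchdog, so WD0's status bit would again reflect only real timeouts.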
Apr 12 2016
Could we write a cookie to memory before performing the intentional reboot, and then only report a 'watchdog' reset if the reboot was not intentional?
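The cookie idea could look roughly like this: the intentional reboot path stores a magic value in memory that survives a warm reset, and the boot-time check only reports "watchdog-timeout" when the hardware says "watchdog" and the cookie is absent. This is a sketch under stated assumptions: `REBOOT_COOKIE`, `persistent_cookie`, `mark_intentional_reboot()` and `classify_boot()` are hypothetical, and on real hardware the cookie would need to live somewhere that neither the bootloader nor the kernel clears, which is the part that would have to be verified on IPQ8064.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/watchdog.h>	/* WDIOF_CARDRESET */

/* Hypothetical magic value marking an intentional reboot (arbitrary). */
#define REBOOT_COOKIE	0x52425421u

/* Stand-in for memory that must survive a warm reset. */
static volatile uint32_t persistent_cookie;

/* Intentional path: call just before the watchdog is used to reset
 * the SoC (e.g. from the restart handler). */
void mark_intentional_reboot(void)
{
	persistent_cookie = REBOOT_COOKIE;
}

/* Boot path: decide what to report, then clear the cookie so a later,
 * genuine watchdog timeout is not masked. */
const char *classify_boot(int hw_bootstatus)
{
	int intentional = (persistent_cookie == REBOOT_COOKIE);

	persistent_cookie = 0;
	if (!(hw_bootstatus & WDIOF_CARDRESET))
		return "normal-boot";
	return intentional ? "normal-boot" : "watchdog-timeout";
}
```

Clearing the cookie on every boot matters: otherwise one intentional reboot would hide every watchdog timeout that follows it.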
Apr 12 2016
Let's try using WD1 first. I'll try to submit a patch later today.
Mar 18 2017
Activating. Please assign this to the right owner and set the appropriate priority.
Apr 16 2018
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Aug 3
This bug has an owner, thus, it's been triaged. Changing status to "assigned".
Aug 3
Adding some of the original OnHub folks in case they want correct UMA stats for watchdog timeouts on whirlwind/arkham.
Comment 1 by grundler@chromium.org, Mar 8 2016