
Issue 593028

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Aug 3
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




arkham: watchdog reboots are not reported as reboot reason by linux/watchdog.h

Project Member Reported by cbook@google.com, Mar 8 2016

Issue description

Version: CHROMEOS_RELEASE_VERSION=8030.0.2016_03_08_0946
OS: ChromeOS

Steps (using thirdparty/daisydog):
- Modified daisydog.c so it does not shut down cleanly, by commenting out the write of the "V" char:
//if (write(fd, "V", 1))

- Stop daisydog and wait 10 seconds for the device to reboot.
- After reboot, use daisydog -c to check the boot status (daisydog may need to be stopped to read this).

localhost ~ # daisydog -c    
HW watchdog interval is 10 seconds
/dev/watchdog reported boot status: normal-boot
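
For context on the "V" write that was commented out in the steps above: the Linux watchdog API has a "magic close" convention, where a driver that supports it only disarms the timer if the character 'V' was written just before the device is closed. A minimal sketch of such a clean release follows. The helper name watchdog_magic_close is made up for illustration and is not taken from daisydog.c.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from daisydog.c): release a watchdog file
 * descriptor using the "magic close" convention, i.e. write 'V' and
 * then close the device.
 * Returns 0 on success, -1 if the write or the close failed.
 */
int watchdog_magic_close(int fd)
{
    assert(fd >= 0);

    /* write() returns the number of bytes written; exactly 1 is success. */
    if (write(fd, "V", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd) == 0 ? 0 : -1;
}
```

Skipping the 'V' write, as in the repro steps, leaves the timer armed after the process goes away, which is why the device resets roughly 10 seconds later.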


What is the expected output? What do you see instead?
Expected "watchdog-timeout" as the reason, which generates the HwWatchdogReboot UMA CrOS event. Instead, "normal-boot" is reported.
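
For reference, the boot status reported above comes from the WDIOC_GETBOOTSTATUS ioctl declared in linux/watchdog.h. The sketch below shows one way to query it from user space. The function names and the two-label mapping are assumptions for illustration (the labels are copied from the daisydog output in this report); daisydog itself may decode more bits than WDIOF_CARDRESET.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Map a WDIOC_GETBOOTSTATUS bitmask to one of the two labels seen in this report. */
const char *boot_status_label(int flags)
{
    return (flags & WDIOF_CARDRESET) ? "watchdog-timeout" : "normal-boot";
}

/*
 * Hypothetical helper: open the watchdog device at `path`, read the boot
 * status bitmask into *flags, then release the device with a magic close.
 * Returns 0 on success, -1 on any failure.
 */
int read_boot_status(const char *path, int *flags)
{
    assert(path != NULL && flags != NULL);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int ret = ioctl(fd, WDIOC_GETBOOTSTATUS, flags);

    /* Opening the device arms the timer; treat a failed disarm as an error. */
    if (write(fd, "V", 1) != 1)
        ret = -1;
    close(fd);
    return ret;
}
```

Opening /dev/watchdog normally starts the timer and only one process can hold it, which is likely why daisydog has to be stopped before the status can be read this way.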
 
Owner: grundler@chromium.org
Once upon a time I wrote an autotest for watchdog to verify this works:
  cd ~/trunk/src/thirdparty/autotest/files/
  git log server/site_tests/platform_HWwatchdog/platform_HWwatchdog.py
...
  commit 9e97cd9600c6c78a98e5e81046fdfb3bc2342033
Author: Grant Grundler <grundler@chromium.org>
Date:   Fri Dec 7 14:10:47 2012 -0800

    CHROMIUMOS: autotest: add /dev/watchdog kernel test
...

Let me try that first and see if the test is actually checking the reboot reason. I suspect the reboot reason is not being verified by this test.

Comment 2 by groeck@chromium.org, Mar 17 2016

What is the actual hardware watchdog used? Not all watchdog drivers (and not all hardware) support reporting a reset reason.

Comment 3 by grundler@google.com, Mar 17 2016

arkham uses the Qualcomm IPQ8064 chipset. The chipset does implement the WDT, and I would usually expect the chipset to also report when it fires (or has fired). If it's not working as intended, my first suspicion (in this case) is that the BIOS is clearing that state from the chipset and not reporting this particular bit of info to the kernel.

I don't know the whole chain of events from WDT firing to "reset reason" getting queried through /dev/watchdog IOCTL on the next boot...but we certainly have all the source code for that and someone (me most likely) just needs to spend time tracking this down.  But if someone else has time right now, go for it.

Comment 4 by groeck@chromium.org, Mar 21 2016

The qcom_wdt driver (drivers/watchdog/qcom-wdt.c) does not implement returning the boot status. The chip does support reporting the reset reason (watchdog status register at 0x44, bit 0).
I'll be happy to implement the code, but I would need someone to test it. If that is OK, feel free to assign it to me.

Comment 5 by grundler@google.com, Mar 21 2016

I can test it. In fact, I'd like to modify ./files/server/site_tests/platform_HWwatchdog/platform_HWwatchdog.py to also look for the reboot reason if it finds /dev/watchdog present and resets the system via WDT.

Comment 6 by groeck@chromium.org, Mar 22 2016

Cc: grundler@chromium.org
Owner: groeck@chromium.org
Status: Assigned (was: Untriaged)
The ability to check the boot status depends on the watchdog timer and on the driver. I am not sure a generic script would be that useful. For example, the driver for Intel's iTCO doesn't support the boot status either, and I am not sure if it can be supported (the documentation is a bit vague). The DesignWare watchdog driver doesn't support it either. I don't have a datasheet for the DW watchdog, so I cannot check whether the hardware supports reporting the boot status.

Anyway, I reassigned the bug to myself.

Comment 7 by grundler@google.com, Mar 22 2016

ISTR one of the previous ARM platforms used a global variable to store "bootreason" which was filled in by chipset initialization code, not the watchdog driver.  The WDIOC_GETBOOTSTATUS is "merely" the primary API to get access to that "state" info.

14 drivers in the kernel support WDIOC_GETBOOTSTATUS. If there is a different/better API to get (and log) the same state, we can switch to that instead.

Regardless of which API is used, I would like to modify the platform_HWwatchdog.py autotest to enforce WDT reset logging (not just in CoreBoot "BIOS event logs" for the ChromeOS devices that have that). To collect broad statistics on forced reboots, we need a consistent method of logging this event.

Comment 8 by grundler@google.com, Mar 22 2016

s/bootreason/bootstatus maybe? I'm not finding the code right now though.

Comment 9 by groeck@chromium.org, Mar 22 2016

You were probably looking for reset_status (used by drivers/watchdog/sa1100_wdt.c).

Comment 10 Deleted

Status: Started (was: Assigned)
Yup - reset_status is exactly right - thanks. :)

Comment 13 by bugdroid1@chromium.org, Apr 5 2016

Labels: merge-merged-chromeos-3.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/0d93f8f36b56b6a922fcd4ae7b586cf09e371323

commit 0d93f8f36b56b6a922fcd4ae7b586cf09e371323
Author: Guenter Roeck <groeck@chromium.org>
Date: Mon Mar 21 22:23:10 2016

CHROMIUM: watchdog: qcom: Report reboot reason

The Qualcom watchdog timer block reports if the system was reset by the
watchdog. Pass the information to user space.

BUG= chromium:593028 
TEST=build for whirlwind (chromeos-3.14 branch) and pass:
   test_that $H platform_HWwatchdog

   and manually verify daisydog emits "watchdog-timeout" when it starts.

Change-Id: Ic4aeeae9da3354eef279151746f1b67ccba834fa
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/334460
Reviewed-by: Guenter Roeck <groeck@google.com>

[modify] https://crrev.com/0d93f8f36b56b6a922fcd4ae7b586cf09e371323/drivers/watchdog/qcom-wdt.c


Comment 14 by bugdroid1@chromium.org, Apr 5 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/eeeba2480cb8eec080ca20df2d1bff2583e4a315

commit eeeba2480cb8eec080ca20df2d1bff2583e4a315
Author: Grant Grundler <grundler@google.com>
Date: Tue Mar 22 23:04:23 2016

BACKPORT: watchdog: qcom: initialize wdd.timeout

chromeos-3.18 qcom wdt driver initializes wdd.timeout field.

BUG= chromium:593028 
TEST=build image for whirlwind and pass:
    test_that $H platform_HWwatchdog

    and manually verify daisydog emits "watchdog-timeout" when it starts.

Change-Id: I8e499f4dfaa7eacf29c8d8887fcf4d2d98c4eb57
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/334239
Tested-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Guenter Roeck <groeck@google.com>

[modify] https://crrev.com/eeeba2480cb8eec080ca20df2d1bff2583e4a315/drivers/watchdog/qcom-wdt.c

Comment 15 by cbook@google.com, Apr 11 2016

With this change I see the reboot reported as a watchdog, but now I also see that as the reason when using the "reboot" command:

/dev/watchdog reported boot status: 0x20 watchdog-timeout

Comment 16 by groeck@google.com, Apr 11 2016

That may be technically correct, if the system uses the watchdog driver to reboot itself (via qcom_wdt_restart).

Comment 17 by cbook@google.com, Apr 11 2016

Agreed. But not really what we want to be reporting to UMA...
Sounds like a catch-22. Wonder if there is a way to find out if the restart function triggered the reboot. I'll check the datasheet if I can find something.
Guenter appears to be correct:

static int qcom_watchdog_probe(struct platform_device *pdev)
{
...
        /*
         * WDT restart notifier has priority 0 (use as a last resort)
         */
        wdt->restart_nb.notifier_call = qcom_wdt_restart;
        ret = register_restart_handler(&wdt->restart_nb);

Since I'd like to leave this as-is, what is the "first resort"?
Or what should the system be trying first to restart the system?
Restart handlers depend on the hardware/platform, really. Most of the time there is an arm platform restart handler, or a gpio restart handler. Since this one fires, it looks like no other restart handler is enabled for this platform. The devicetree file should tell. Some qcom devicetree files (qcom-apq8084, qcom-msm8974, msm8916) specify "qcom,pshold", which would have a priority of 128.

It might be possible to use WD1 instead of WD0 to restart the system from the watchdog driver, assuming that WD1 is always available.
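
As a toy model of the priority discussion above (not kernel code): restart handlers are tried from highest priority to lowest, so a priority-0 watchdog handler only ends up resetting the system when nothing with a higher priority is registered. The struct and function below are made up for illustration. The values 0 and 128 come from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a registered restart handler. */
struct restart_handler {
    const char *name;
    int priority;   /* higher value is tried first */
};

/*
 * Return the handler that would be tried first, i.e. the one with the
 * highest priority, or NULL if no handler is registered.
 */
const struct restart_handler *
first_restart_handler(const struct restart_handler *handlers, size_t count)
{
    assert(handlers != NULL || count == 0);

    const struct restart_handler *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (best == NULL || handlers[i].priority > best->priority)
            best = &handlers[i];
    }
    return best;
}
```

With only the watchdog handler registered, as appears to be the case on this platform, the model picks it by default. Adding a pshold-style handler at priority 128 would take precedence.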

Comment 21 by cbook@google.com, Apr 12 2016

Could we write a cookie to memory before performing the intentional reboot?  Then only report a 'watchdog' if this was not intentional?
Let's try using wd1 first. I'll try to submit a patch later today.



Comment 23 by sheriffbot@chromium.org, Jun 3 2016

Labels: Hotlist-Google
Status: Archived (was: Started)

Comment 25 by ketakid@google.com, Mar 18 2017

Labels: Pri-3
Status: Available (was: Archived)
Activating. Please assign to the right owner and the appropriate priority.

Comment 26 by sheriffbot@chromium.org, Apr 16 2018

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: Assigned (was: Untriaged)
This bug has an owner, thus, it's been triaged. Changing status to "assigned".
Status: WontFix (was: Assigned)
Cc: kevinhayes@google.com kyan@chromium.org kkunduru@chromium.org
Adding some original Onhub folks in case they want correct UMA stats for watchdog timeout on whirlwind/arkham.
