Add a new app API to enable watchdog behavior in kiosk apps (implementation) |
||||||||||
Issue descriptionExtend chrome.runtime with a restartOnWatchdog(int seconds) method. Calling this should restart the system (similar to the restart() method) after <seconds> have elapsed. The minimal implementation here would be for the watchdog expiration to behave just like restart(). It would be great if we could also gather debug info of some sort, but that needs much more definition and I don't want to block this on it. I'll file a separate issue to track that. This is needed for hotrod. Use case: 1: App push contains critical failure. 2: Rather than just hanging indefinitely the kiosk will enter a reboot loop. 3: As soon as a fixed app is available it will get picked up on one of the reboot cycles and repaired. See internal doc at go/chrome-watchdog-api-proposal for more context. This does NOT cover all failure cases. Building a general and robust watchdog mechanism is a non-goal here. The goal is only to give apps a simple watchdog to use if they want.
,
Apr 28 2016
API proposal is this: https://docs.google.com/a/google.com/document/d/12eF4MmdSHdrGxW0a36ytLVzpFfAAZPqIufevAkIRBEA/edit?usp=sharing Albert changed name a bit in the launch bug: chrome.runtime.restartOnWatchdog(int seconds)
,
Apr 29 2016
,
May 2 2016
One concern from the review is that a bad app (bug or bad design) could cause a device into an endless reboot loop, which would prevent the device from receiving any updates. As such, we should add a throttle or min timeout for the reboots from this API.
,
May 3 2016
,
May 10 2016
,
May 10 2016
,
May 12 2016
+rdevlin fyi
,
May 12 2016
+1 to #4, we need to be careful how we do this. Also, I mildly object to "restartOnWatchdog()" - that seems like really odd naming. Is there a reason to not do something like "scheduleRestart()"?
,
May 12 2016
I have finished implementing this API here: https://codereview.chromium.org/1970613003/. Changing the name of the function will be easy if desired. I also wanted to figure out what we should do when we determine that a restart is requested too soon after a previous one. We will ignore it, yes, but will be reschedule it for later after a minimum duration has passed?
,
May 13 2016
@10, maybe just throw an error?
,
May 13 2016
Yes, that's what I'm currently doing.
,
May 23 2016
Per afakhry's preference, bringing discussion about api naming here. I'd rather not call this "restartOnWatchdog()", which sounds very strange. Counter-proposals: scheduleRestart(), or restartAfterDelay().
,
May 24 2016
I don't have a strong preference on the names. mnilsson@, WDYT? Would the names in #13 work for you?
,
May 25 2016
I also don't have a strong preference in regards to names. I currently have it implemented as "restartOnWatchdog()" as requested in comment #0. I can easily switch to whatever name we agree on.
,
May 26 2016
I can live with all proposals. If anything, restartAfterDelay and scheduleRestart don't necessarily hint that the timer is reschedulable, and restartOnWatchdog reads a bit awkwardly to me too. rescheduleRestart? (re-)setRestartDelay? May I ask where the 3h minimum restart interval came from? Is that cleared if the device is manually rebooted? Is it cleared if the app does an explicit chrome.runtime.restart()? I think it should in both cases. And would anything in this implementation change if we wanted to be able to learn if a previous restart was due to the watchdog? The rationale is described in the proposal doc, but tl;dr - it would be useful for the app to have this knowledge to be able to take precautionary measures if it finds that it was restarted due to an issue. Clear state, refresh caches, reset experiments, etc.
,
May 26 2016
3h is the default poll interval of policy. The purpose of this is to avoid getting into a fast reboot loop, which blocks any updates to the device.
,
May 26 2016
Currently, the chrome.runtime.restart() will not affect the 3h minimum restart interval. It won't be cleared if the device is manually rebooted. Rightnow, the app has no way of knowing that a previous restart was due to the watchdog. If it tries to schedule a reboot too soon, the reboot will be successfully scheduled to happen after the 3h minimum time has passed. What's the preferred behavior?
,
May 26 2016
I think it's a reasonable expectation that there is some way to reset the reboot throttling state, and to me a manual reboot carries the right connotation; It logically implies restoring everything to a known state. Besides you are physically present if you press the power button, so chances are you'd like to be able to see if you get a watchdog-induced reboot immediately after your manual reboot, not be surprised 3 hours later when you have moved along. I think of the watchdog as the last resort in a chain of strategies to restore the system to a working state. For instance, the app might try to remedy unresponsive peripherals, by first resetting the USB device, then restarting the app and if that doesn't help, try rebooting. If the app is still in enough control that it can decide to restart or reboot, I think that might be strong enough a signal that the final recourse is not needed and can be reset. But everything is not well-behaved, and I do see the value of a reboot loop prevention, but maybe it needs to be a bit more forgiving. Maybe allow up to XX reboots within YY minutes before applying the 3 hour throttle. Maybe XX=5 and YY=15 is a reasonable level?
,
May 26 2016
five reboots in fifteen minutes seems like a bunch. If the device needs to reboot every three minutes, something's very wrong. Is there a particular scenario we're expecting when devices would need to restart more once?
,
Jun 1 2016
Please, let's finalize the expected behavior of this API, so that my reviewer and I can move forward.
,
Jun 5 2016
Comment 20, fair enough, 5 reboots within 15 minutes may be too generous and I can reconsider this request if there's a way for the app to understand if the previous boot was a watchdog induced reboot or a normal boot. This is also to be able to log impressions to monitor the frequency of watchdog reboots. The reason is to support increasingly more aggressive remedies to clear a bad state as I wrote in comment 19, maybe with a watchdog reboot between each recovery step, so it needs to be more than 1 in 3 hours. At least 2 reboots in fairly quick succession. I also still think that a manual reboot should clear the throttle.
,
Jun 6 2016
Re #22: Why we need to reboot via the watchdog api for remedies? Can't we use runtime.restart?
,
Jun 6 2016
I get the need for clearing the throttle when manual reboot has been done. What I don't see is the need to allow XX reboots within YY minutes before applying the throttle. If the motivation behind that is to give apps a chance to do several trials to fix its state before it gives up, I think it's fair enough to let the app know whether its request "has been successfully scheduled as requested" or it has been "throttled". Throttling in itself is an indication that something went wrong previously, and the app should try to take actions to prevent entering the bad state again.
,
Jun 7 2016
The key for the app to be able to take some post-watchdog reboot action (and to successfully record an impression) is to understand that it was restarted by means of the watchdog, outside its control. I suspect that by some strict definition an app can't reliably keep score of this on its own in all circumstances. In comment 18, it sounded like once the app starts using the watchdog, a chrome.runtime.restart would be treated the same way as a watchdog restart, so that's why I was reluctant to go with throttling after the first reboot. The effect would be that if the app sets the watchdog, detects some error condition, attempts to recover by calling chrome.runtime.restart on its own, the throttle would then be on, and any subsequent watchdog reboot would wait a minimum of 3 hours, even though the initial chrome.runtime.restart was under the app's control. (And I wouldn't want to disable the watchdog right before the reboot just to circumvent this.) For this reason, I mean it would be better that voluntary or manual reboots clears the throttle, and with that I'd agree that a throttle after the first watchdog reboot is reasonable. This doesn't preclude having a separate reboot throttle against undue use of chrome.runtime.reboot (maybe because of app error), but since rebooting may be a legitimate course of action to recover, such a limiter should not kick in after the first reboot. In summary, if watchdog-reboot and chrome.runtime.reboot were treated the same, then I suggested XX/YY where XX>1. If chrome can tell them apart, then I'm fine with XX=1 for watchdog, and a separate XX/YY for chrome.runtime.reboot. Re Comment 23: The purpose of is to protect against unforeseen application errors. If an attempted remedy exposes a new error, the watchdog may kick in again. I agree that this is bad behavior, and the desired choice is to use chrome.runtime.reboot for known issues.
,
Jun 7 2016
Re comment 24: yes, getting a signal back that the watchdog is in a throttled state would do, being equivalent to a signal that it was rebooted due to watchdog. Provided chrome.runtime.restart doesn't count towards this.
,
Jun 7 2016
Just to clarify. Currently, the existing chrome.runtime.restart() doesn't have any throttling applied to it, and it doesn't affect the throttling behavior at all. Each time an app calls chrome.runtime.restart() it will certainly succeed. I think this is desired for this non-watchdog function, which should be used for known restart needs.
In summary, Here's what I will do (Please comment so that we can move forward with the implementation):
chrome.runtime.restartAfterDelay(function() {
// chrome.runtime.lastError = "Restart request too soon, throttled"; will
// be set in the case of throttling. chrome.runtime.lastError will be
// undefined otherwise (i.e. successfully scheduled).
});
Any restarts not as a result of calling chrome.runtime.restartAfterDelay() (whether due to auto-update or manually by user) will clear the throttle. Except calling chrome.runtime.restart() should not clear the throttle.
How does that sound?
,
Jun 7 2016
Thanks for explaining. SGTM.
,
Jun 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6c1c80d89c29b22117b13760643c7fd1734e2466 commit 6c1c80d89c29b22117b13760643c7fd1734e2466 Author: afakhry <afakhry@chromium.org> Date: Wed Jun 15 22:16:38 2016 Add a new app API to enable watchdog behavior restarts in kiosk apps This CL adds chrome.runtime.restartAfterDelay(int seconds, callback) to enable watchdog behavior restarts in kiosk apps. BUG= 604578 TEST=browser_tests --gtest_filter=RestartAfterDelayApiTest.RestartAfterDelayTest Review-Url: https://codereview.chromium.org/1970613003 Cr-Commit-Position: refs/heads/master@{#400027} [modify] https://crrev.com/6c1c80d89c29b22117b13760643c7fd1734e2466/chrome/browser/prefs/browser_prefs.cc [add] https://crrev.com/6c1c80d89c29b22117b13760643c7fd1734e2466/extensions/browser/api/runtime/restart_after_delay_api_unittest.cc [modify] https://crrev.com/6c1c80d89c29b22117b13760643c7fd1734e2466/extensions/browser/api/runtime/runtime_api.cc [modify] https://crrev.com/6c1c80d89c29b22117b13760643c7fd1734e2466/extensions/browser/api/runtime/runtime_api.h [modify] https://crrev.com/6c1c80d89c29b22117b13760643c7fd1734e2466/extensions/browser/extension_function_histogram_value.h [modify] https://crrev.com/6c1c80d89c29b22117b13760643c7fd1734e2466/extensions/browser/test_extensions_browser_client.cc [modify] https://crrev.com/6c1c80d89c29b22117b13760643c7fd1734e2466/extensions/browser/test_extensions_browser_client.h [modify] https://crrev.com/6c1c80d89c29b22117b13760643c7fd1734e2466/extensions/common/api/runtime.json [modify] https://crrev.com/6c1c80d89c29b22117b13760643c7fd1734e2466/extensions/extensions_tests.gypi [modify] https://crrev.com/6c1c80d89c29b22117b13760643c7fd1734e2466/tools/metrics/histograms/histograms.xml
,
Jun 16 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/833b977639b4533097d60323c993d6a298ebe8db commit 833b977639b4533097d60323c993d6a298ebe8db Author: benwells <benwells@chromium.org> Date: Thu Jun 16 12:27:59 2016 Revert "Add a new app API to enable watchdog behavior restarts in kiosk apps" This reverts commit 6c1c80d89c29b22117b13760643c7fd1734e2466. This change was reverted because it fails under DrMemory. First failing build is: https://build.chromium.org/p/chromium.memory.fyi/builders/Windows%20Unit%20%28DrMemory%20full%29%20%285%29/builds/7514 Sample failure output: RestartAfterDelayApiTest.RestartAfterDelayTest: c:\b\build\slave\drm-cr\build\src\extensions\browser\api\runtime\restart_after_delay_api_unittest.cc(230): error: The difference between (desired_restart_time() - last_restart_time).InSecondsF() and base::TimeDelta::FromSeconds(2).InSecondsF() is 0.025495000000000045, which exceeds 0.01, where (desired_restart_time() - last_restart_time).InSecondsF() evaluates to 1.974505, base::TimeDelta::FromSeconds(2).InSecondsF() evaluates to 2, and 0.01 evaluates to 0.01. TBR=afakhry@chromium.org BUG= 604578 Review-Url: https://codereview.chromium.org/2070753003 Cr-Commit-Position: refs/heads/master@{#400136} [modify] https://crrev.com/833b977639b4533097d60323c993d6a298ebe8db/chrome/browser/prefs/browser_prefs.cc [delete] https://crrev.com/7d77d2274c07224580b060a49399215ce0ebce1f/extensions/browser/api/runtime/restart_after_delay_api_unittest.cc [modify] https://crrev.com/833b977639b4533097d60323c993d6a298ebe8db/extensions/browser/api/runtime/runtime_api.cc [modify] https://crrev.com/833b977639b4533097d60323c993d6a298ebe8db/extensions/browser/api/runtime/runtime_api.h [modify] https://crrev.com/833b977639b4533097d60323c993d6a298ebe8db/extensions/browser/extension_function_histogram_value.h [modify] https://crrev.com/833b977639b4533097d60323c993d6a298ebe8db/extensions/browser/test_extensions_browser_client.cc [modify] https://crrev.com/833b977639b4533097d60323c993d6a298ebe8db/extensions/browser/test_extensions_browser_client.h [modify] https://crrev.com/833b977639b4533097d60323c993d6a298ebe8db/extensions/common/api/runtime.json [modify] https://crrev.com/833b977639b4533097d60323c993d6a298ebe8db/extensions/extensions_tests.gypi [modify] https://crrev.com/833b977639b4533097d60323c993d6a298ebe8db/tools/metrics/histograms/histograms.xml
,
Jun 16 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2bd481835dcc7d0af338de28de9b91f136683cbf commit 2bd481835dcc7d0af338de28de9b91f136683cbf Author: afakhry <afakhry@chromium.org> Date: Thu Jun 16 16:17:24 2016 [Reland] Add a new app API to enable watchdog behavior restarts in kiosk apps Original CL: https://codereview.chromium.org/1970613003/ This CL adds chrome.runtime.restartAfterDelay(int seconds, callback) to enable watchdog behavior restarts in kiosk apps. TBR=xiyuan@chromium.org,rdevlin.cronin@chromium.org,benwells@chromium.org,mpearson@chromium.org,pam@chromium.org BUG= 604578 TEST=browser_tests --gtest_filter=RestartAfterDelayApiTest.RestartAfterDelayTest Review-Url: https://codereview.chromium.org/2069923006 Cr-Commit-Position: refs/heads/master@{#400166} [modify] https://crrev.com/2bd481835dcc7d0af338de28de9b91f136683cbf/chrome/browser/prefs/browser_prefs.cc [add] https://crrev.com/2bd481835dcc7d0af338de28de9b91f136683cbf/extensions/browser/api/runtime/restart_after_delay_api_unittest.cc [modify] https://crrev.com/2bd481835dcc7d0af338de28de9b91f136683cbf/extensions/browser/api/runtime/runtime_api.cc [modify] https://crrev.com/2bd481835dcc7d0af338de28de9b91f136683cbf/extensions/browser/api/runtime/runtime_api.h [modify] https://crrev.com/2bd481835dcc7d0af338de28de9b91f136683cbf/extensions/browser/extension_function_histogram_value.h [modify] https://crrev.com/2bd481835dcc7d0af338de28de9b91f136683cbf/extensions/browser/test_extensions_browser_client.cc [modify] https://crrev.com/2bd481835dcc7d0af338de28de9b91f136683cbf/extensions/browser/test_extensions_browser_client.h [modify] https://crrev.com/2bd481835dcc7d0af338de28de9b91f136683cbf/extensions/common/api/runtime.json [modify] https://crrev.com/2bd481835dcc7d0af338de28de9b91f136683cbf/extensions/extensions_tests.gypi [modify] https://crrev.com/2bd481835dcc7d0af338de28de9b91f136683cbf/tools/metrics/histograms/histograms.xml
,
Jun 16 2016
,
Jul 1 2016
bulk verified |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by abodenha@chromium.org
, Apr 19 2016