Get a watchdog daemon onto infra devices
Issue description: This daemon will restart and/or heal the device if it gets into a bad state.
,
Aug 12 2016
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra.git/+/8efe78cbfaa8f69bd0518dd6dd2048379c75bf8e

commit 8efe78cbfaa8f69bd0518dd6dd2048379c75bf8e
Author: bpastene <bpastene@chromium.org>
Date: Fri Aug 12 00:45:30 2016

Add support for building Go for Android.

With GOOS=android and GOARCH=arm set, this will let you build binaries that run on devices.

BUG= 632895
Review-Url: https://codereview.chromium.org/2226873002

[add] https://crrev.com/8efe78cbfaa8f69bd0518dd6dd2048379c75bf8e/go/mobile_env.py
[add] https://crrev.com/8efe78cbfaa8f69bd0518dd6dd2048379c75bf8e/go/src/infra/tools/device_watchdog/device_watchdog.infra_testing
[add] https://crrev.com/8efe78cbfaa8f69bd0518dd6dd2048379c75bf8e/go/src/infra/tools/device_watchdog/main.go
,
Aug 16 2016
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra.git/+/ceb9067257a11233b4b8f83e781281ac58773de2

commit ceb9067257a11233b4b8f83e781281ac58773de2
Author: bpastene <bpastene@chromium.org>
Date: Tue Aug 16 19:58:04 2016

Add support for a whitelist of supported platforms for a CIPD package.

Also create an android-only package for the device watchdog.

BUG= 632895
Review-Url: https://codereview.chromium.org/2247983002

[modify] https://crrev.com/ceb9067257a11233b4b8f83e781281ac58773de2/build/README.md
[modify] https://crrev.com/ceb9067257a11233b4b8f83e781281ac58773de2/build/build.py
[add] https://crrev.com/ceb9067257a11233b4b8f83e781281ac58773de2/build/packages/device_watchdog.yaml
[modify] https://crrev.com/ceb9067257a11233b4b8f83e781281ac58773de2/go/mobile_env.py
,
Aug 16 2016
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra.git/+/603782c6c8f1369ebe08f0905371057d32240c5d

commit 603782c6c8f1369ebe08f0905371057d32240c5d
Author: bpastene <bpastene@chromium.org>
Date: Tue Aug 16 22:36:23 2016

Add GOOS=android cross-compile to infra-continous builder.

BUG= 632895
Review-Url: https://codereview.chromium.org/2249223002

[modify] https://crrev.com/603782c6c8f1369ebe08f0905371057d32240c5d/recipes/recipes/infra_continuous.expected/infra-cross-compile.json
[modify] https://crrev.com/603782c6c8f1369ebe08f0905371057d32240c5d/recipes/recipes/infra_continuous.py
,
Aug 17 2016
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra.git/+/5800afbf62d889d3db5a261e45da7c1c956fba8e

commit 5800afbf62d889d3db5a261e45da7c1c956fba8e
Author: bpastene <bpastene@chromium.org>
Date: Wed Aug 17 02:44:56 2016

Implement device watchdog.

BUG= 632895
Review-Url: https://codereview.chromium.org/2241963002

[modify] https://crrev.com/5800afbf62d889d3db5a261e45da7c1c956fba8e/go/src/infra/tools/device_watchdog/main.go
,
Aug 17 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infra/puppet/+/b69085fbe07ce6911897423190636d33b26bd139

commit b69085fbe07ce6911897423190636d33b26bd139
Author: Benjamin Pastene <bpastene@google.com>
Date: Wed Aug 17 23:03:06 2016
,
Aug 17 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infra/puppet/+/106172640aba87f983b07ea9e346a07f6860a614

commit 106172640aba87f983b07ea9e346a07f6860a614
Author: Benjamin Pastene <bpastene@google.com>
Date: Wed Aug 17 23:14:24 2016
,
Aug 18 2016
,
Aug 25 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infra/puppet/+/8bc9b6e0aad15b725689b5168f3652bb2f8526ec

commit 8bc9b6e0aad15b725689b5168f3652bb2f8526ec
Author: Benjamin Pastene <bpastene@google.com>
Date: Thu Aug 25 22:13:48 2016
,
Aug 29 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infra/infra_internal.git/+/3dccac1684da104d90ddf7254af6cf92a59a7682

commit 3dccac1684da104d90ddf7254af6cf92a59a7682
Author: bpastene <bpastene@google.com>
Date: Mon Aug 29 20:55:54 2016
,
Aug 29 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infra/infra_internal.git/+/035e5e7b4a67209239f96446cea0a4d2a22e220b

commit 035e5e7b4a67209239f96446cea0a4d2a22e220b
Author: bpastene <bpastene@google.com>
Date: Mon Aug 29 21:11:13 2016
,
Aug 31 2016
Update: watchdog deployed everywhere. It's been correctly rebooting phones when needed, but some phones, especially those that really need it, seem to slip through the cracks: http://shortn/_AVVVJrv3ty

Specifically, 00e7a97549912611 should be getting rebooted but it's not. From the device:

root@bullhead:/ # ps | grep watchdog
root 6216 1 827180 3076 0 00aadb6794 R /data/local/tmp/cit_watchdog
root@bullhead:/ #
root@bullhead:/ #
root@bullhead:/ # cat /proc/6216/stack
[<0000000000000000>] __switch_to+0x7c/0x88
[<0000000000000000>] cpu_worker_pools+0x77c/0x780
[<0000000000000000>] 0xffffffffffffffff
root@bullhead:/ #
root@bullhead:/ # cat /proc/6216/status
Name: cit_watchdog
State: R (running)
Tgid: 6216
Pid: 6216
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 64
Groups: 1004 1007 1011 1015 1028 3001 3002 3003 3006
VmPeak: 827180 kB
VmSize: 827180 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 3076 kB
VmRSS: 3076 kB
VmData: 794156 kB
VmStk: 136 kB
VmExe: 1168 kB
VmLib: 29232 kB
VmPTE: 80 kB
VmSwap: 0 kB
Threads: 1
SigQ: 6/5842
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: fffffffe7fc1feff
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: 3f
Cpus_allowed_list: 0-5
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 458720
,
Aug 31 2016
root@bullhead:/ # cat /proc/6216/schedstat
22744458145465 12210252825 475018

Not sure what the fields in schedstat correspond to, but I'm sure the values listed there can't be good. For reference, this is what's listed for the watchdog on a healthier device:

root@bullhead:/ # cat /proc/6979/schedstat
4469792 0 1
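For what it's worth, the kernel's scheduler statistics documentation describes the three fields of /proc/<pid>/schedstat as time spent on the CPU (in nanoseconds), time spent waiting on a run queue (in nanoseconds), and the number of timeslices run. Below is a minimal Go sketch for decoding the two samples above, assuming that layout. The names `schedstat` and `parseSchedstat` are hypothetical and are not part of the watchdog.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// schedstat holds the three fields of /proc/<pid>/schedstat.
// Field meanings are assumed from the kernel's sched-stats documentation.
type schedstat struct {
	RunNs      uint64 // time spent running on a CPU, in nanoseconds
	WaitNs     uint64 // time spent waiting on a run queue, in nanoseconds
	Timeslices uint64 // number of timeslices run
}

// parseSchedstat parses one line of /proc/<pid>/schedstat.
// (Hypothetical helper for illustration only.)
func parseSchedstat(line string) (schedstat, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return schedstat{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	var vals [3]uint64
	for i, f := range fields {
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return schedstat{}, fmt.Errorf("field %d: %v", i, err)
		}
		vals[i] = v
	}
	return schedstat{RunNs: vals[0], WaitNs: vals[1], Timeslices: vals[2]}, nil
}

func main() {
	// The two samples quoted in this thread: the stuck watchdog and a healthy one.
	samples := []string{
		"22744458145465 12210252825 475018",
		"4469792 0 1",
	}
	for _, line := range samples {
		s, err := parseSchedstat(line)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("on-CPU %.3fs, waiting %.3fs, %d timeslices\n",
			float64(s.RunNs)/1e9, float64(s.WaitNs)/1e9, s.Timeslices)
	}
}
```

Read that way, the first sample works out to roughly 22,744 seconds of CPU time. That would be consistent with the process spinning rather than sleeping, though it's only one possible reading.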
,
Aug 31 2016
With stip@'s (and strace's) help, I managed to track down why the process was hanging. Turns out it has trouble reading from /proc/uptime at times. Will need to add some timeouts to the file I/O. It'd also be a good idea to trigger a reboot if it fails too many times in a row.
,
Sep 9 2016
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra.git/+/d026d656c86ae12a6e180630fd3f6185c788ca28

commit d026d656c86ae12a6e180630fd3f6185c788ca28
Author: bpastene <bpastene@chromium.org>
Date: Fri Sep 09 22:12:47 2016

Change daemonize logic in watchdog and add timeout to file system read.

Daemonize via fork made goroutines behave strangely, so this uses exec instead. Reading from /proc/uptime can hang indefinitely on some phones. This adds a timeout.

BUG= 632895
Review-Url: https://codereview.chromium.org/2302193002

[modify] https://crrev.com/d026d656c86ae12a6e180630fd3f6185c788ca28/go/deps.lock
[modify] https://crrev.com/d026d656c86ae12a6e180630fd3f6185c788ca28/go/deps.yaml
[modify] https://crrev.com/d026d656c86ae12a6e180630fd3f6185c788ca28/go/src/infra/tools/device_watchdog/main.go
,
Sep 12 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infra/puppet/+/8f6ed4d91384c236328acc6a3f0c4ea8b73edc19

commit 8f6ed4d91384c236328acc6a3f0c4ea8b73edc19
Author: Benjamin Pastene <bpastene@google.com>
Date: Mon Sep 12 18:40:05 2016
,
Mar 2 2017
Comment 1 by bugdroid1@chromium.org
, Aug 11 2016