At times - SSP just seems to fail

Issue description

I see this issue intermittently, and it seems to have become more frequent recently. Rebooting the moblab usually resolves it.

10/11 15:06:26.151 INFO | autoserv:0687| Results placed in /usr/local/autotest/results/48-moblab/192.168.231.101
10/11 15:06:26.151 DEBUG| autoserv:0695| autoserv is running in drone localhost.
10/11 15:06:26.151 DEBUG| autoserv:0696| autoserv command was: /usr/local/autotest/server/autoserv -p -r /usr/local/autotest/results/48-moblab/192.168.231.101 -m 192.168.231.101 -u moblab -l sand-release/R61-9765.81.0/cts_N/cheets_CTS_N.x86.CtsAccessibilityTestCases -s --lab True -P 48-moblab/192.168.231.101 -n /usr/local/autotest/results/drone_tmp/attach.99 --require-ssp --parent_job_id 12 --verify_job_repo_url --warn-no-ssp
10/11 15:06:26.151 INFO | pidfile:0016| Logged pid 6641 to /usr/local/autotest/results/48-moblab/192.168.231.101/.autoserv_execute
10/11 15:06:26.154 WARNI| autoserv:0324| Autoserv is required to run with server-side packaging. However, no drone is found to support server-side packaging. The test will be executed in a drone without server-side packaging supported.
,
Oct 11 2017
,
Oct 11 2017
For the record: on moblab, the autotest code gets reinstalled on every boot, so local changes get overwritten. There are some other hacks you can use to prevent this, which I will use to try to reproduce the issue.
,
Oct 11 2017
I think it might be a boot order issue, possibly induced by users who get frustrated at the slow boot speed and press the "run" buttons in mobmonitor, causing the boot sequence to run out of order. If monitor_db started before the container was set up, we could get into this situation. Any thoughts on adding a new flag to monitor_db called "require_ssp", so that the process exits if the drone thinks SSP is not available? The default would be false.
,
Oct 11 2017
How about delaying the scheduler start until after the container is set up?
,
Oct 11 2017
That is what should happen, but the user always has the option to use mobmonitor to override the start sequence and start the scheduler early. Container setup can be very slow outside the USA.
,
Oct 11 2017
The container setup only needs to be done once, unless the base image is updated in the shadow config. So it seems that the solution for the SSP issue is just to reboot?
,
Oct 11 2017
Yes, but it would be good to be able to detect that the scheduler is not working. What happened is that 2 devices hit this and the suites were set off but not monitored, which will likely delay 61 being pushed to those devices. It would be nice to be able to either 1) stop the scheduler if it is in a bad state regarding SSP, or 2) at least ask the scheduler whether SSP is set up, so the UI can prevent a suite from being started and tell the user to reboot the device. If you are not keen on 1), any thoughts on how I would do 2) and detect this state without starting a test? Thanks
,
Oct 11 2017
2 is a bit hard to implement because the suite job is created from the RPC handler, which has no access to drone information. 1 is doable: we can have a new setting like [SCHEDULE]fail_without_ssp and only enable it in moblab_config.ini. Another option: instead of failing the scheduler, we can stop tests from running if no drone can do SSP. This requires some work in the scheduler, but I think it's doable. The drone refresh should update the drone.support_ssp status, so after some time the moblab can run tests with SSP and these jobs can start running. The catch is that one might ask why the jobs are not running.
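As a rough illustration of option 1, the check could look something like the sketch below. It assumes the section and setting name proposed in this comment ([SCHEDULE] / fail_without_ssp) and a drone object exposing support_ssp. The Drone tuple and the helpers fail_without_ssp_enabled and check_ssp_or_exit are hypothetical names made up for this example, and it uses Python 3's standard configparser rather than autotest's own config module, so it is not the actual scheduler code.

```python
import configparser
import sys
from collections import namedtuple

# Hypothetical stand-in for a scheduler drone; only the support_ssp
# attribute mentioned in this thread is modeled.
Drone = namedtuple('Drone', ['hostname', 'support_ssp'])


def fail_without_ssp_enabled(config_text):
    """Return True if the proposed [SCHEDULE] fail_without_ssp setting is on."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Defaults to False, so only a config that opts in (for example
    # moblab_config.ini) would change the scheduler's behavior.
    return parser.getboolean('SCHEDULE', 'fail_without_ssp', fallback=False)


def check_ssp_or_exit(drones, fail_without_ssp):
    """Exit with a non-zero code if SSP is required but no drone supports it."""
    if any(drone.support_ssp for drone in drones):
        return
    if fail_without_ssp:
        # A non-zero exit lets whatever supervises the scheduler notice
        # the failure and decide whether to restart it.
        sys.exit(1)
```

With the setting left at its default, the function returns quietly and the scheduler would carry on without SSP, which matches the behavior shown in the log in the issue description.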
,
Oct 11 2017
OK, I am good with 1. If you prefer an ini setting rather than a flag, I am good with that. I will work on it tomorrow.
,
Oct 12 2017
I think an ini setting is more consistent and easier to control. So the user experience will be: the scheduler crashes if moblab takes too long to set up the base container. The user notices that no jobs are running and reboots the moblab (hopefully by then the base container is ready). The scheduler starts to work after the reboot. Is that right?
,
Oct 12 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d8c6389d125b277c640a70a9f2e4e22d0ecfbddc

commit d8c6389d125b277c640a70a9f2e4e22d0ecfbddc
Author: Dan Shi <dshi@google.com>
Date: Thu Oct 12 01:32:47 2017

[autotest] Add logging for drone's ssp check failure.

BUG= chromium:773700
TEST=local run

Change-Id: I8366b9f1746b705087c8a9a3e13a58228a1d328c
Reviewed-on: https://chromium-review.googlesource.com/713455
Commit-Ready: Dan Shi <dshi@google.com>
Tested-by: Dan Shi <dshi@google.com>
Reviewed-by: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/d8c6389d125b277c640a70a9f2e4e22d0ecfbddc/scheduler/drones.py
,
Oct 12 2017
The scheduler will exit with an error code. I will check, but we will likely have the upstart config restart the job until it is successful. I will think about the restart, but getting the scheduler to stop when it is in a mode not compatible with running SSP tests is a good first step.
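The restart-until-successful idea can be sketched in Python as a small supervisor loop. This is only an illustration of the behavior described above, not the upstart configuration itself: run_until_success, max_attempts and delay_secs are hypothetical names, and the sketch simplifies things by treating exit code 0 as "started successfully", whereas a real scheduler would keep running once SSP is available.

```python
import subprocess
import time


def run_until_success(cmd, max_attempts=5, delay_secs=1.0,
                      run=subprocess.call, sleep=time.sleep):
    """Re-run cmd until it exits with code 0.

    Returns the number of attempts used. Raises RuntimeError if the
    command still fails after max_attempts tries. The run and sleep
    parameters exist so the loop can be exercised without spawning
    real processes or waiting.
    """
    for attempt in range(1, max_attempts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            # Give the base container setup more time before retrying.
            sleep(delay_secs)
    raise RuntimeError('command still failing after %d attempts' % max_attempts)
```

Bounding the number of attempts is a design choice for this sketch; a supervisor that retries forever would instead hide a container setup that never completes.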
,
Oct 12 2017
I confirmed with the user that they were pressing the run button on the scheduler before it would normally have been started by upstart, likely because the system is so slow to boot. I have done some work in 61 to improve boot speed, so hopefully this will be less of an issue going forward.
,
Oct 13 2017
,
Oct 17 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b1c3e6e574c1b18e19b9c264d78e233fa3617a12

commit b1c3e6e574c1b18e19b9c264d78e233fa3617a12
Author: Keith Haddow <haddowk@chromium.org>
Date: Tue Oct 17 15:51:32 2017

[autotest] Add option to exit the scheduler if SSP is not available.

Moblab needs SSP, there are some boot up sequences that cause the schedular to start before all the SSP components are available. If this happens we want the option to exit the schedular. Moblab will keep trying to restart the schedular, if SSP becomes avaialbe schedular will be started and run normally.

BUG= chromium:773700
TEST=manual testing on moblab.

Change-Id: Ie5e323c7880f12d31a2b3ff5c81790b6c2ca3ebe
Reviewed-on: https://chromium-review.googlesource.com/719241
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/b1c3e6e574c1b18e19b9c264d78e233fa3617a12/scheduler/drones.py
,
Oct 17 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d9a38812de09832892052d3b992b2d72800af427

commit d9a38812de09832892052d3b992b2d72800af427
Author: Keith Haddow <haddowk@chromium.org>
Date: Tue Oct 17 16:15:46 2017

[autotest] Add option to exit the scheduler if SSP is not available.

Moblab needs SSP, there are some boot up sequences that cause the schedular to start before all the SSP components are available. If this happens we want the option to exit the schedular. Moblab will keep trying to restart the schedular, if SSP becomes avaialbe schedular will be started and run normally.

BUG= chromium:773700
TEST=manual testing on moblab.

Change-Id: Ie5e323c7880f12d31a2b3ff5c81790b6c2ca3ebe
Reviewed-on: https://chromium-review.googlesource.com/719241
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
(cherry picked from commit b1c3e6e574c1b18e19b9c264d78e233fa3617a12)
Reviewed-on: https://chromium-review.googlesource.com/723473
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/d9a38812de09832892052d3b992b2d72800af427/scheduler/drones.py
,
Oct 17 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/caf7674b37db3ed10ab79bf56c57513adb583e4e

commit caf7674b37db3ed10ab79bf56c57513adb583e4e
Author: Keith Haddow <haddowk@chromium.org>
Date: Tue Oct 17 16:15:51 2017

[autotest] Add option to exit the scheduler if SSP is not available.

Moblab needs SSP, there are some boot up sequences that cause the schedular to start before all the SSP components are available. If this happens we want the option to exit the schedular. Moblab will keep trying to restart the schedular, if SSP becomes avaialbe schedular will be started and run normally.

BUG= chromium:773700
TEST=manual testing on moblab.

Change-Id: Ie5e323c7880f12d31a2b3ff5c81790b6c2ca3ebe
Reviewed-on: https://chromium-review.googlesource.com/719241
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
(cherry picked from commit b1c3e6e574c1b18e19b9c264d78e233fa3617a12)
Reviewed-on: https://chromium-review.googlesource.com/723472
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/caf7674b37db3ed10ab79bf56c57513adb583e4e/scheduler/drones.py
,
Oct 17 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/a3895f7b5a8872bbcfba6753aa98ee69182ac05f

commit a3895f7b5a8872bbcfba6753aa98ee69182ac05f
Author: Keith Haddow <haddowk@chromium.org>
Date: Tue Oct 17 16:15:54 2017

[autotest] Add option to exit the scheduler if SSP is not available.

Moblab needs SSP, there are some boot up sequences that cause the schedular to start before all the SSP components are available. If this happens we want the option to exit the schedular. Moblab will keep trying to restart the schedular, if SSP becomes avaialbe schedular will be started and run normally.

BUG= chromium:773700
TEST=manual testing on moblab.

Change-Id: Ie5e323c7880f12d31a2b3ff5c81790b6c2ca3ebe
Reviewed-on: https://chromium-review.googlesource.com/719241
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
(cherry picked from commit b1c3e6e574c1b18e19b9c264d78e233fa3617a12)
Reviewed-on: https://chromium-review.googlesource.com/723471
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/a3895f7b5a8872bbcfba6753aa98ee69182ac05f/scheduler/drones.py
,
Oct 18 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/713b7bdb0b932dc3f16f138dfe8f559db31a1fee

commit 713b7bdb0b932dc3f16f138dfe8f559db31a1fee
Author: Keith Haddow <haddowk@chromium.org>
Date: Wed Oct 18 23:15:22 2017

[moblab] Update the settings in the moblab_config.ini

Uprev the repair image to a recent cut of 61
Add the option where schedular will exit if SSP is not availabe, this allows the system to keep restarting the schedular until SSP is available.

BUG= chromium:773700
TEST=manual moblab tests

Change-Id: Ice0f68e07fc1f1906966dabfeda57b30d330829f
Reviewed-on: https://chromium-review.googlesource.com/719464
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/713b7bdb0b932dc3f16f138dfe8f559db31a1fee/moblab_config.ini
,
Oct 19 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/288f92da0980089806d9d63b09a699d952c17308

commit 288f92da0980089806d9d63b09a699d952c17308
Author: Keith Haddow <haddowk@chromium.org>
Date: Thu Oct 19 14:01:21 2017

[moblab] Update the settings in the moblab_config.ini

Uprev the repair image to a recent cut of 61
Add the option where schedular will exit if SSP is not availabe, this allows the system to keep restarting the schedular until SSP is available.

BUG= chromium:773700
TEST=manual moblab tests

Change-Id: Ice0f68e07fc1f1906966dabfeda57b30d330829f
Reviewed-on: https://chromium-review.googlesource.com/719464
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>
(cherry picked from commit 713b7bdb0b932dc3f16f138dfe8f559db31a1fee)
Reviewed-on: https://chromium-review.googlesource.com/727807
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/288f92da0980089806d9d63b09a699d952c17308/moblab_config.ini
,
Oct 19 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/3eb9710a5fe7782dc0c12b6ec4b50c2b8fc035a4

commit 3eb9710a5fe7782dc0c12b6ec4b50c2b8fc035a4
Author: Keith Haddow <haddowk@chromium.org>
Date: Thu Oct 19 16:45:45 2017

[moblab] Update the settings in the moblab_config.ini

Uprev the repair image to a recent cut of 61
Add the option where schedular will exit if SSP is not availabe, this allows the system to keep restarting the schedular until SSP is available.

BUG= chromium:773700
TEST=manual moblab tests

Change-Id: Ice0f68e07fc1f1906966dabfeda57b30d330829f
Reviewed-on: https://chromium-review.googlesource.com/719464
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>
Reviewed-on: https://chromium-review.googlesource.com/728380
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/3eb9710a5fe7782dc0c12b6ec4b50c2b8fc035a4/moblab_config.ini
,
Oct 19 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d868bf55b3485eae16348d5502173246711f7fc1

commit d868bf55b3485eae16348d5502173246711f7fc1
Author: Keith Haddow <haddowk@chromium.org>
Date: Thu Oct 19 19:45:42 2017

[moblab] Update the settings in the moblab_config.ini

Uprev the repair image to a recent cut of 61
Add the option where schedular will exit if SSP is not availabe, this allows the system to keep restarting the schedular until SSP is available.

BUG= chromium:773700
TEST=manual moblab tests

Change-Id: Ice0f68e07fc1f1906966dabfeda57b30d330829f
Reviewed-on: https://chromium-review.googlesource.com/719464
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>
(cherry picked from commit 713b7bdb0b932dc3f16f138dfe8f559db31a1fee)
Reviewed-on: https://chromium-review.googlesource.com/727808
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/d868bf55b3485eae16348d5502173246711f7fc1/moblab_config.ini
,
Nov 8 2017
,
Jan 22 2018
,
Jan 23 2018
Comment 1 by dshi@chromium.org
, Oct 11 2017