
Issue 773700

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




At times - SSP just seems to fail

Project Member Reported by haddowk@chromium.org, Oct 11 2017

Issue description


I see this issue intermittently, and it seems to have become more frequent recently. Rebooting the moblab usually resolves it.

10/11 15:06:26.151 INFO |          autoserv:0687| Results placed in /usr/local/autotest/results/48-moblab/192.168.231.101
10/11 15:06:26.151 DEBUG|          autoserv:0695| autoserv is running in drone localhost.
10/11 15:06:26.151 DEBUG|          autoserv:0696| autoserv command was: /usr/local/autotest/server/autoserv -p -r /usr/local/autotest/results/48-moblab/192.168.231.101 -m 192.168.231.101 -u moblab -l sand-release/R61-9765.81.0/cts_N/cheets_CTS_N.x86.CtsAccessibilityTestCases -s --lab True -P 48-moblab/192.168.231.101 -n /usr/local/autotest/results/drone_tmp/attach.99 --require-ssp --parent_job_id 12 --verify_job_repo_url --warn-no-ssp
10/11 15:06:26.151 INFO |           pidfile:0016| Logged pid 6641 to /usr/local/autotest/results/48-moblab/192.168.231.101/.autoserv_execute
10/11 15:06:26.154 WARNI|          autoserv:0324| Autoserv is required to run with server-side packaging. However, no drone is found to support server-side packaging. The test will be executed in a drone without server-side packaging supported.


 

Comment 1 by dshi@chromium.org, Oct 11 2017

You can copy the change in
https://chromium-review.googlesource.com/#/c/chromiumos/third_party/autotest/+/713455
to the moblab and restart it a couple of times. After each restart, grep the scheduler log for "not support server-side packaging" to see whether you can reproduce the failure and where the flake comes from.
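
The log check suggested above can be sketched in Python. This is only an illustration: the function name is hypothetical, and the sample lines are made up for the example rather than taken from a real scheduler log. Only the search phrase comes from the comment.

```python
def find_ssp_failures(log_lines, phrase="not support server-side packaging"):
    """Return (line_number, text) pairs for lines that contain the phrase."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if phrase in line
    ]


# Made-up sample lines, for illustration only.
sample_log = [
    "scheduler started\n",
    "drone localhost does not support server-side packaging\n",
    "tick complete\n",
]

for number, text in find_ssp_failures(sample_log):
    print(number, text)
```

To scan a real file, pass an open file object instead of the list, since iterating a file yields its lines.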

Comment 2 by dshi@chromium.org, Oct 11 2017

Cc: kenobi@chromium.org
For the record: on moblab the autotest code is reinstalled on every boot, so local changes get overwritten. There are some other hacks that prevent this, which I will use to try to reproduce the failure.
I think it might be a boot order issue, possibly induced by users who get frustrated at the slow boot speed and press the "run" buttons in mobmonitor, which puts the boot sequence out of order. If monitor_db started before the container was set up, we could get into this situation.

Any thoughts on adding a new flag to monitor_db called "require_ssp", so that the process exits if the drone thinks SSP is not available? The default would be false.

Comment 5 by dshi@chromium.org, Oct 11 2017

How about delaying the scheduler start until after the container is set up?
That is what should happen, but the user can always use mobmonitor to override the start sequence and start the scheduler early. Container setup can be very slow outside the USA.

Comment 7 by dshi@chromium.org, Oct 11 2017

The container setup only needs to be done once, unless the base image is updated in shadow config. 

So it seems that the solution for the SSP issue is just to reboot?
Yes, but it would be good to be able to detect that the scheduler is not working. What happened is that 2 devices hit this and the suites were set off but not monitored, which will likely delay 61 being pushed to those devices.

It would be nice to be able to either

1) Stop the scheduler if it is in a bad state regarding SSP
 or
2) At least ask the scheduler whether SSP is set up, so the UI can prevent a suite from being started and tell the user to reboot the device.

If you are not keen on 1), any thoughts on how I would do 2) and detect this state without starting a test?

Thanks

Comment 9 by dshi@chromium.org, Oct 11 2017

2 is a bit hard to implement because the suite job is created from an RPC handler, which has no access to drone information.

1 is doable: we can add a new setting like [SCHEDULE]fail_without_ssp and enable it only in moblab_config.ini.

Another option: instead of failing the scheduler, we can stop tests from running if no drone can do SSP. This requires some work in the scheduler, but I think it's doable.

Drone refresh should update the drone.support_ssp status, so after some time the moblab can run tests with SSP and these jobs can start running.
The catch is that one might ask why the jobs are not running.
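
The "hold the jobs" alternative could look roughly like the sketch below. It is not the scheduler's real code: the `Drone` class, the `require_ssp` job key, and `runnable_jobs` are hypothetical stand-ins, and only the `support_ssp` attribute name comes from the comment above.

```python
from dataclasses import dataclass


@dataclass
class Drone:
    """Hypothetical stand-in for a scheduler drone."""
    hostname: str
    support_ssp: bool


def runnable_jobs(jobs, drones):
    """Hold back jobs that require SSP until some drone reports SSP support."""
    ssp_available = any(drone.support_ssp for drone in drones)
    return [job for job in jobs if ssp_available or not job["require_ssp"]]


jobs = [
    {"name": "cts", "require_ssp": True},
    {"name": "dummy", "require_ssp": False},
]

# Before the container is ready, only the non-SSP job is released.
print(runnable_jobs(jobs, [Drone("localhost", support_ssp=False)]))
# After a drone refresh reports SSP support, both jobs are released.
print(runnable_jobs(jobs, [Drone("localhost", support_ssp=True)]))
```

Held jobs would simply stay queued, which is why the comment notes that someone might ask why they are not running.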

OK, I am good with 1. If you prefer an ini setting rather than a flag, I am good with that; I will work on it tomorrow.
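
A minimal sketch of the ini-driven check, assuming the setting keeps the name suggested in the thread ([SCHEDULE]fail_without_ssp); the name that was finally merged may differ. The function name and the list of per-drone flags are hypothetical.

```python
import configparser

# Hypothetical config text. The section and option names are copied from the
# suggestion in the thread and may not match what was finally merged.
MOBLAB_CONFIG = """
[SCHEDULE]
fail_without_ssp = True
"""


def should_exit_without_ssp(config_text, drone_ssp_flags):
    """Return True when the setting is enabled and no drone supports SSP."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Default to False so that only configs that opt in change behavior.
    fail_without_ssp = parser.getboolean(
        "SCHEDULE", "fail_without_ssp", fallback=False
    )
    return fail_without_ssp and not any(drone_ssp_flags)


print(should_exit_without_ssp(MOBLAB_CONFIG, [False]))  # setting on, no SSP
print(should_exit_without_ssp("", [False]))             # setting absent
```

Defaulting to False matches the intent stated in the thread: the behavior is enabled only where the config asks for it, such as in moblab_config.ini.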

Comment 11 by dshi@chromium.org, Oct 12 2017

I think an ini setting is more consistent and easier to control.

So the user experience will be that the scheduler crashes if moblab takes too long to set up the base container. The user notices that no jobs are running and reboots the moblab (hopefully by then the base container is ready). The scheduler starts to work after the reboot.

Is that right?

Comment 12 by bugdroid1@chromium.org, Oct 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d8c6389d125b277c640a70a9f2e4e22d0ecfbddc

commit d8c6389d125b277c640a70a9f2e4e22d0ecfbddc
Author: Dan Shi <dshi@google.com>
Date: Thu Oct 12 01:32:47 2017

[autotest] Add logging for drone's ssp check failure.

BUG= chromium:773700 
TEST=local run

Change-Id: I8366b9f1746b705087c8a9a3e13a58228a1d328c
Reviewed-on: https://chromium-review.googlesource.com/713455
Commit-Ready: Dan Shi <dshi@google.com>
Tested-by: Dan Shi <dshi@google.com>
Reviewed-by: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/d8c6389d125b277c640a70a9f2e4e22d0ecfbddc/scheduler/drones.py

The scheduler will exit with an error code. I will check, but we will likely have the upstart config restart the job until it is successful.
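
The restart-until-success behavior can be illustrated with a small Python stand-in. On a real moblab this would be handled by the upstart job configuration rather than by Python code; the function name, the attempt limit, and the delay are all assumptions made for the example.

```python
import time


def restart_until_success(start, max_attempts=5, delay_seconds=0.0,
                          sleep=time.sleep):
    """Call start() until it returns exit code 0.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if start() == 0:
            return attempt
        if attempt < max_attempts:
            sleep(delay_seconds)  # give container setup time to finish
    return None


# Simulated scheduler: fails twice while SSP is unavailable, then succeeds.
exit_codes = iter([1, 1, 0])
print(restart_until_success(lambda: next(exit_codes)))
```

Capping the number of attempts keeps a permanently broken setup from restarting forever, at the cost of needing a manual restart if setup takes longer than expected.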

I will think about the restart, but getting the scheduler to stop when it is in a mode not compatible with running SSP tests is a good first step.
I confirmed with the user that they were pressing the run button on the scheduler before it was normally started by upstart, likely because the system is so slow to boot.

I have done some work in 61 to improve boot speeds so hopefully this will be less of an issue going forward.
Owner: haddowk@chromium.org
Status: Assigned (was: Untriaged)

Comment 16 by bugdroid1@chromium.org, Oct 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b1c3e6e574c1b18e19b9c264d78e233fa3617a12

commit b1c3e6e574c1b18e19b9c264d78e233fa3617a12
Author: Keith Haddow <haddowk@chromium.org>
Date: Tue Oct 17 15:51:32 2017

[autotest] Add option to exit the scheduler if SSP is not available.

Moblab needs SSP, there are some boot up sequences that cause the
scheduler to start before all the SSP components are available.

If this happens we want the option to exit the scheduler.  Moblab
will keep trying to restart the scheduler, if SSP becomes available
scheduler will be started and run normally.

BUG= chromium:773700 
TEST=manual testing on moblab.

Change-Id: Ie5e323c7880f12d31a2b3ff5c81790b6c2ca3ebe
Reviewed-on: https://chromium-review.googlesource.com/719241
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/b1c3e6e574c1b18e19b9c264d78e233fa3617a12/scheduler/drones.py


Comment 17 by bugdroid1@chromium.org, Oct 17 2017

Labels: merge-merged-release-R61-9765.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d9a38812de09832892052d3b992b2d72800af427

commit d9a38812de09832892052d3b992b2d72800af427
Author: Keith Haddow <haddowk@chromium.org>
Date: Tue Oct 17 16:15:46 2017

[autotest] Add option to exit the scheduler if SSP is not available.

Moblab needs SSP, there are some boot up sequences that cause the
scheduler to start before all the SSP components are available.

If this happens we want the option to exit the scheduler.  Moblab
will keep trying to restart the scheduler, if SSP becomes available
scheduler will be started and run normally.

BUG= chromium:773700 
TEST=manual testing on moblab.

Change-Id: Ie5e323c7880f12d31a2b3ff5c81790b6c2ca3ebe
Reviewed-on: https://chromium-review.googlesource.com/719241
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
(cherry picked from commit b1c3e6e574c1b18e19b9c264d78e233fa3617a12)
Reviewed-on: https://chromium-review.googlesource.com/723473
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/d9a38812de09832892052d3b992b2d72800af427/scheduler/drones.py


Comment 18 by bugdroid1@chromium.org, Oct 17 2017

Labels: merge-merged-release-R62-9901.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/caf7674b37db3ed10ab79bf56c57513adb583e4e

commit caf7674b37db3ed10ab79bf56c57513adb583e4e
Author: Keith Haddow <haddowk@chromium.org>
Date: Tue Oct 17 16:15:51 2017

[autotest] Add option to exit the scheduler if SSP is not available.

Moblab needs SSP, there are some boot up sequences that cause the
scheduler to start before all the SSP components are available.

If this happens we want the option to exit the scheduler.  Moblab
will keep trying to restart the scheduler, if SSP becomes available
scheduler will be started and run normally.

BUG= chromium:773700 
TEST=manual testing on moblab.

Change-Id: Ie5e323c7880f12d31a2b3ff5c81790b6c2ca3ebe
Reviewed-on: https://chromium-review.googlesource.com/719241
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
(cherry picked from commit b1c3e6e574c1b18e19b9c264d78e233fa3617a12)
Reviewed-on: https://chromium-review.googlesource.com/723472
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/caf7674b37db3ed10ab79bf56c57513adb583e4e/scheduler/drones.py


Comment 19 by bugdroid1@chromium.org, Oct 17 2017

Labels: merge-merged-release-R63-10032.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/a3895f7b5a8872bbcfba6753aa98ee69182ac05f

commit a3895f7b5a8872bbcfba6753aa98ee69182ac05f
Author: Keith Haddow <haddowk@chromium.org>
Date: Tue Oct 17 16:15:54 2017

[autotest] Add option to exit the scheduler if SSP is not available.

Moblab needs SSP, there are some boot up sequences that cause the
scheduler to start before all the SSP components are available.

If this happens we want the option to exit the scheduler.  Moblab
will keep trying to restart the scheduler, if SSP becomes available
scheduler will be started and run normally.

BUG= chromium:773700 
TEST=manual testing on moblab.

Change-Id: Ie5e323c7880f12d31a2b3ff5c81790b6c2ca3ebe
Reviewed-on: https://chromium-review.googlesource.com/719241
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
(cherry picked from commit b1c3e6e574c1b18e19b9c264d78e233fa3617a12)
Reviewed-on: https://chromium-review.googlesource.com/723471
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/a3895f7b5a8872bbcfba6753aa98ee69182ac05f/scheduler/drones.py


Comment 20 by bugdroid1@chromium.org, Oct 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/713b7bdb0b932dc3f16f138dfe8f559db31a1fee

commit 713b7bdb0b932dc3f16f138dfe8f559db31a1fee
Author: Keith Haddow <haddowk@chromium.org>
Date: Wed Oct 18 23:15:22 2017

[moblab] Update the settings in the moblab_config.ini

Uprev the repair image to a recent cut of 61

Add the option where scheduler will exit if SSP is not available,
this allows the system to keep restarting the scheduler until
SSP is available.

BUG= chromium:773700 
TEST=manual moblab tests

Change-Id: Ice0f68e07fc1f1906966dabfeda57b30d330829f
Reviewed-on: https://chromium-review.googlesource.com/719464
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/713b7bdb0b932dc3f16f138dfe8f559db31a1fee/moblab_config.ini


Comment 21 by bugdroid1@chromium.org, Oct 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/288f92da0980089806d9d63b09a699d952c17308

commit 288f92da0980089806d9d63b09a699d952c17308
Author: Keith Haddow <haddowk@chromium.org>
Date: Thu Oct 19 14:01:21 2017

[moblab] Update the settings in the moblab_config.ini

Uprev the repair image to a recent cut of 61

Add the option where scheduler will exit if SSP is not available,
this allows the system to keep restarting the scheduler until
SSP is available.

BUG= chromium:773700 
TEST=manual moblab tests

Change-Id: Ice0f68e07fc1f1906966dabfeda57b30d330829f
Reviewed-on: https://chromium-review.googlesource.com/719464
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>
(cherry picked from commit 713b7bdb0b932dc3f16f138dfe8f559db31a1fee)
Reviewed-on: https://chromium-review.googlesource.com/727807
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/288f92da0980089806d9d63b09a699d952c17308/moblab_config.ini


Comment 22 by bugdroid1@chromium.org, Oct 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/3eb9710a5fe7782dc0c12b6ec4b50c2b8fc035a4

commit 3eb9710a5fe7782dc0c12b6ec4b50c2b8fc035a4
Author: Keith Haddow <haddowk@chromium.org>
Date: Thu Oct 19 16:45:45 2017

[moblab] Update the settings in the moblab_config.ini

Uprev the repair image to a recent cut of 61

Add the option where scheduler will exit if SSP is not available,
this allows the system to keep restarting the scheduler until
SSP is available.

BUG= chromium:773700 
TEST=manual moblab tests

Change-Id: Ice0f68e07fc1f1906966dabfeda57b30d330829f
Reviewed-on: https://chromium-review.googlesource.com/719464
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>
Reviewed-on: https://chromium-review.googlesource.com/728380
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/3eb9710a5fe7782dc0c12b6ec4b50c2b8fc035a4/moblab_config.ini


Comment 23 by bugdroid1@chromium.org, Oct 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d868bf55b3485eae16348d5502173246711f7fc1

commit d868bf55b3485eae16348d5502173246711f7fc1
Author: Keith Haddow <haddowk@chromium.org>
Date: Thu Oct 19 19:45:42 2017

[moblab] Update the settings in the moblab_config.ini

Uprev the repair image to a recent cut of 61

Add the option where scheduler will exit if SSP is not available,
this allows the system to keep restarting the scheduler until
SSP is available.

BUG= chromium:773700 
TEST=manual moblab tests

Change-Id: Ice0f68e07fc1f1906966dabfeda57b30d330829f
Reviewed-on: https://chromium-review.googlesource.com/719464
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>
(cherry picked from commit 713b7bdb0b932dc3f16f138dfe8f559db31a1fee)
Reviewed-on: https://chromium-review.googlesource.com/727808
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Commit-Queue: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/d868bf55b3485eae16348d5502173246711f7fc1/moblab_config.ini

Status: Fixed (was: Assigned)

Comment 25 by dchan@chromium.org, Jan 22 2018

Status: archived (was: Fixed)

Comment 26 by dchan@chromium.org, Jan 23 2018

Status: Fixed (was: Archived)
