moblab-generic-vm-paladin: Upstart service moblab-scheduler-init not in running state.
Issue description

https://ci.chromium.org/p/chromeos/builders/luci.chromeos.general/CQ/b8924679387556216816
https://logs.chromium.org/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8924679387556216816/+/steps/MoblabVMTest/0/stdout

-----------------------------------------------------------------------------------------------
/tmp/cbuildbotpejVqZ/results/results-1-moblab_DummyServerNoSspSuite [ FAILED ]
/tmp/cbuildbotpejVqZ/results/results-1-moblab_DummyServerNoSspSuite ERROR: Unhandled UpstartServiceNotRunning: Upstart service moblab-scheduler-init not in running state.
/tmp/cbuildbotpejVqZ/results/results-1-moblab_DummyServerNoSspSuite/moblab_RunSuite [ FAILED ]
/tmp/cbuildbotpejVqZ/results/results-1-moblab_DummyServerNoSspSuite/moblab_RunSuite ERROR: Unhandled UpstartServiceNotRunning: Upstart service moblab-scheduler-init not in running state.
-----------------------------------------------------------------------------------------------
,
Jan 11
Results are here: https://pantheon.corp.google.com/storage/browser/chromeos-image-archive/moblab-generic-vm-paladin/R73-11562.0.0-rc1/moblab_vm_test_results

I don't see anything interesting in the logs ,-( I don't even see where that moblab-scheduler-init service should be started from...

There are some other errors in mobmonitor.log, not sure if any of them matter:

2019-01-10 22:34:checkfile.manager:ERROR Failed to execute health check ServoExists: Command '['sudo', 'lsusb']' returned non-zero exit status 1
Traceback (most recent call last):
  File "/etc/moblab/mobmonitor/checkfile/manager.py", line 130, in DetermineHealthcheckStatus
    result = healthcheck.Check()
  File "/etc/moblab/mobmonitor/checkfiles/moblab/servo_check.py", line 26, in Check
    usbs = osutils.sudo_run_command(cmd).strip()
  File "/etc/moblab/mobmonitor/util/osutils.py", line 64, in sudo_run_command
    shell=shell)
  File "/etc/moblab/mobmonitor/util/osutils.py", line 46, in run_command
    raise RunCommandError(e.returncode, e.cmd)
RunCommandError: Command '['sudo', 'lsusb']' returned non-zero exit status 1
2019-01-10 22:34:moblab.heartbeat_check:INFO Start to check heartbeat
2019-01-10 22:34:moblab.heartbeat_check:INFO Try to import autotest.
2019-01-10 22:34:moblab.heartbeat_check:WARNING Autotest is not ready.
2019-01-10 22:34:checkfile.manager:ERROR Failed to execute health check Heartbeat: too many values to unpack
Traceback (most recent call last):
  File "/etc/moblab/mobmonitor/checkfile/manager.py", line 139, in DetermineHealthcheckStatus
    description, actions = healthcheck.Diagnose(result)
ValueError: too many values to unpack
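For what it's worth, the Heartbeat failure in that log is a plain Python unpacking error: the traceback shows manager.py doing `description, actions = healthcheck.Diagnose(result)`, which raises as soon as Diagnose returns more than two values. A minimal sketch of that failure mode follows. The `Heartbeat` class here and its three-element return value are hypothetical stand-ins, since the log does not show what the real check returns.

```python
class Heartbeat(object):
    """Hypothetical stand-in for the real Heartbeat health check."""

    def Diagnose(self, errcode):
        # Assumption: three values are returned where the caller expects two.
        return ('Autotest is not ready.', [], errcode)


def diagnose(healthcheck, result):
    """Mirror the two-name unpacking shown in the traceback."""
    try:
        description, actions = healthcheck.Diagnose(result)
    except ValueError as e:
        # Report the error text instead of letting it propagate.
        return 'Failed to execute health check: %s' % e
    return description


message = diagnose(Heartbeat(), 1)
print(message)
```

If this is what is happening, it would be a bug in the health check itself and probably unrelated to moblab-scheduler-init not running.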
,
Jan 11
,
Jan 11
+haddowk to triage.
,
Jan 12
#6, precq error is different: crbug.com/921324
,
Jan 15
The original issue is caused by lack of access to cloud storage, which moblab needs during boot:
  File "/usr/local/autotest/client/common_lib/utils.py", line 834, in join_bg_jobs
    "Command(s) did not complete within %d seconds" % timeout)
autotest_lib.client.common_lib.error.CmdTimeoutError: Command <sudo curl -s https://storage.googleapis.com/abci-ssp/autotest-containers/moblab_base_07.tar.xz -o /tmp/moblab_base_07.tar.xz_ZVClmb> failed, rc=-9, Command(s) did not complete within 180 seconds
* Command:
sudo curl -s https://storage.googleapis.com/abci-ssp/autotest-containers/moblab_base_07.tar.xz -o /tmp/moblab_base_07.tar.xz_ZVClmb
So this is an infra issue.
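To tell a network problem apart from a slow download, a quick reachability probe with a short timeout can help. This is only a sketch: the `can_reach` helper, its HEAD request, and the 10 second default are my assumptions and not something autotest does. The URL is the one from the failing curl command.

```python
import socket
import urllib.error
import urllib.request

# URL taken from the failing curl command in the log above.
CONTAINER_URL = ('https://storage.googleapis.com/abci-ssp/'
                 'autotest-containers/moblab_base_07.tar.xz')


def can_reach(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if a HEAD request to url succeeds within timeout seconds.

    opener is injectable so the check can be exercised without a network.
    """
    request = urllib.request.Request(url, method='HEAD')
    try:
        response = opener(request, timeout=timeout)
    except (urllib.error.URLError, socket.timeout, OSError):
        # DNS failure, refused connection, HTTP error status or timeout.
        return False
    try:
        return 200 <= response.getcode() < 400
    finally:
        response.close()
```

Running `can_reach(CONTAINER_URL)` from inside the moblab VM would show whether storage.googleapis.com is reachable at all before the 180 second download timeout comes into play.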
,
Jan 15
,
Jan 16
(6 days ago)
I believe this error appeared on the latest CQ run. Build: https://ci.chromium.org/p/chromeos/builders/luci.chromeos.general/Prod/b8924169113349556416 Logs: https://logs.chromium.org/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8924167895517798912/+/steps/MoblabVMTest/0/stdout Can we get this triaged?
,
Jan 16
(6 days ago)
,
Jan 16
(6 days ago)
,
Jan 16
(6 days ago)
,
Jan 16
(6 days ago)
This bug or crbug.com/921324 seems to be blocking my CL from going through: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1410305
,
Jan 17
(5 days ago)
,
Jan 17
(5 days ago)
,
Jan 17
(5 days ago)
,
Jan 18
(5 days ago)
I think we should mark this builder as experimental, permanently disable it, or hand off its ownership. The test infra team has its hands full migrating to the Skylab test infrastructure; feature development on the autotest infrastructure is effectively frozen. So, the value provided to us is minimal.
,
Jan 18
(4 days ago)
In that case, I vote to mark it experimental to prevent further CQ failures.
,
Jan 18
(4 days ago)
Agreed, experimental or remove; either is fine.
,
Jan 18
(4 days ago)
We want this test to be in the CQ; we're debugging the issue right now. I think it's fine to keep it experimental until we resolve this issue.
,
Jan 18
(4 days ago)
,
Jan 18
(4 days ago)
,
Jan 18
(4 days ago)
The USE flag 'dlc' is enabled for amd64-generic and arm-generic to run unit tests in the CQ for the dlcservice package. Let's disable it for the overlays that are impacted by this. May I know which overlays we should target?
,
Jan 18
(4 days ago)
overlay-moblab-generic-vm, overlay-variant-amd64-generic-embedded and overlay-variant-amd64-generic-mobbuild all inherit from amd64-generic.
,
Jan 18
(4 days ago)
thanks!
,
Jan 18
(4 days ago)
For reference, this is following an email to chatty: "Also when the moblab VM fails to provision we see dlcservice and update_engine_client crashes:" Not sure if dlcservice is causing the provision failure itself, but disabling it will at least remove some noise.
,
Jan 18
(4 days ago)
,
Jan 18
(4 days ago)
CL is here: https://chromium-review.googlesource.com/c/chromiumos/overlays/board-overlays/+/1422551 I also triggered a try job against that CL: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8923957630171940736
,
Jan 19
(3 days ago)
Disabling dlcservice doesn't fix the issue. CL https://chromium-review.googlesource.com/c/chromiumos/overlays/board-overlays/+/1422555 that removes it from the build is also failing in moblab-generic-vm-pre-cq because upstart is not running. Log of the pre-cq run: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8923930265940279312 Not sure it's worth chumping the CL since it doesn't appear to help, seems better to wait until the breakage has been resolved. Re-assigning to original owner for further diagnosis.
,
Jan 21
(2 days ago)
Does anybody know why "moblab-generic-vm-paladin" is not marked as experimental? I saw in "http://chromiumos-status.appspot.com/" there is a message "Tree is open (EXPERIMENTAL=moblab-generic-vm-paladin crbug.com/920855 )". But in the latest build ("https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8923760139434038576"), it does not have experimental label on it.
,
Yesterday
(47 hours ago)
,
Yesterday
(44 hours ago)
Since moblab-generic-vm-paladin keeps blocking CQ, we have a CL (https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1424718) to move it to experimental.
,
Yesterday
(43 hours ago)
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/caadb4497559b686a231e2b5029e5abc0ef1090f

commit caadb4497559b686a231e2b5029e5abc0ef1090f
Author: paulhsia <paulhsia@google.com>
Date: Mon Jan 21 10:34:35 2019

moblab-generic-vm: mark as experimental

BUG=chromium:920855
TEST=Run local unit tests by command
$ ./chromeos_config_unittest --update

Change-Id: I41fa7b43f8898222b509dc85915b62a26c4ff314
Reviewed-on: https://chromium-review.googlesource.com/1424718
Commit-Ready: Wei Lee <wtlee@chromium.org>
Tested-by: Wei Lee <wtlee@chromium.org>
Reviewed-by: Wei Lee <wtlee@chromium.org>

[modify] https://crrev.com/caadb4497559b686a231e2b5029e5abc0ef1090f/config/chromeos_config.py
[modify] https://crrev.com/caadb4497559b686a231e2b5029e5abc0ef1090f/config/config_dump.json
,
Yesterday
(39 hours ago)
I'm still seeing failures on moblab-generic-vm-pre-cq [1]. Does the change in #34 take time to propagate or is this a different builder? [1] https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8923724957450825408
,
Yesterday
(39 hours ago)
Re: #31, Legoland only renders the experimental '*' on builders if they are set that way in the configuration. It does not render the status set in Tree Status; however, the master-paladin still considers it when deciding whether to pass or fail a run, as you can see from the run that you linked to: https://ci.chromium.org/p/chromeos/builders/luci.chromeos.general/Prod/b8923760139434038576 . Therefore that run (and all of them after) failed because the hardware lab is having an outage tracked on issue 923737 .

Re: #35: moblab is not configured to block CLs from passing PreCQ, so the failures are irrelevant: http://cs/chromeos_public/chromite/lib/constants.py?l=634&rcl=caadb4497559b686a231e2b5029e5abc0ef1090f

Please read the Sheriff FAQ <https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriff-details-chromium-os> to prepare for your shift; it's been updated with your responsibilities. In particular, you need to be annotating these failed builds: https://chromiumos-build-annotator.googleplex.com/build_annotations/builds_list/master-paladin/ ; that's your #1 priority.
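To illustrate the two sources of "experimental" described here, the sketch below treats a builder as experimental if either its configuration says so or the tree status message names it. It is not chromite's actual code: the function names are hypothetical, and I am assuming each experimental builder appears as its own `EXPERIMENTAL=<name>` token, which matches the one status message quoted in this bug.

```python
import re


def experimental_builders(tree_status):
    """Return builder names listed as EXPERIMENTAL= in a tree status message.

    Assumption: each builder appears as its own 'EXPERIMENTAL=<name>' token.
    """
    return set(re.findall(r'EXPERIMENTAL=(\S+)', tree_status))


def is_experimental(builder, config_flag, tree_status):
    """A builder counts as experimental if either source marks it so."""
    return config_flag or builder in experimental_builders(tree_status)


status = ('Tree is open (EXPERIMENTAL=moblab-generic-vm-paladin '
          'crbug.com/920855 )')
print(is_experimental('moblab-generic-vm-paladin', False, status))
```

A UI that only looks at `config_flag` would show no experimental marker for this builder, while the pass/fail decision sketched above would still treat it as experimental.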
,
Yesterday
(37 hours ago)
@jclinton: Not sure who you are referring to, but please note that neither Wei (#31) nor I (#35) is a build sheriff. If moblab does not block CLs from passing PreCQ, why does my commit queue flag get reset [1]? This has been happening repeatedly since Jan 14 and it seems like I can't submit my CL. [1] https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1404615
,
Yesterday
(26 hours ago)
Wei (wtlee@) is the non-PST build sheriff for this week.

> If moblab does not block CLs from passing PreCQ, why does my commit queue flag get reset [1]? This has been happening repeatedly since Jan 14 and it seems like I can't submit my CL.
>
> [1] https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1404615

Someone explicitly added it as a special requirement for autotest CLs. Maybe you can contact them to see if it can be removed: http://cs/chromeos_public/src/third_party/autotest/files/COMMIT-QUEUE.ini?l=11&rcl=1eb8fe70531942bd81aca9c7f634c3562fe9e617
,
Today
(23 hours ago)
Prathmesh, since you've added it originally and disabled it temporarily once, should moblab-generic-vm-pre-cq be removed from third_party/autotest/files/COMMIT-QUEUE.ini?
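For anyone checking what such a removal would leave behind, here is a rough sketch that reads a COMMIT-QUEUE.ini style file and drops one builder from its pre-CQ list. The file contents are not quoted in this bug, so the `[GENERAL]` section, the `pre-cq-configs` key, and the `some-other-pre-cq` entry in the sample are all assumptions for illustration only.

```python
import configparser

# Assumed layout; the real file contents are not quoted in this bug.
SAMPLE_INI = """\
[GENERAL]
pre-cq-configs: moblab-generic-vm-pre-cq some-other-pre-cq
"""


def drop_pre_cq_config(ini_text, builder):
    """Return the pre-CQ config list with the given builder removed."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    configs = parser.get('GENERAL', 'pre-cq-configs').split()
    return [name for name in configs if name != builder]


remaining = drop_pre_cq_config(SAMPLE_INI, 'moblab-generic-vm-pre-cq')
print(remaining)
```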
,
Today
(21 hours ago)
,
Today
(4 hours ago)
FWIW, I'm seeing the same thing as ljusten@ in #37 (PreCQ failure resetting my commit queue flag, so I can't submit: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1387752/10#message-93ddcb9164f9a84152a3600bdf1cd3516bb728ec)
,
Today
(4 hours ago)
I'm not sure why no one has thrown anything like this up yet: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1429219 It might not be exactly the right thing, but hopefully that will move the conversation... In the meantime, just do what I do and chump ;) Note: I definitely did not recommend chumping. You did *not* hear it here.
,
Today
(4 hours ago)
Also, I noticed there were a couple of passing runs somewhere around Jan 17, but otherwise, this pre-cq has been red for over a week: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=moblab-generic-vm-pre-cq&buildBranch=master Not sure what that's all about.
,
Today
(3 hours ago)
I spent the whole day trying to debug this. It is not really a moblab issue: the moblab VM comes up and does what it is supposed to do. When it runs the provision_AutoUpdate "job" on the sub-DUT (also a VM), it reboots the device and the VM comes back with no networking, so the provision fails and so the test fails (WAI). There have been no recent moblab changes that could cause this breakage. I am no VM expert, so getting the logs of a VM that has no networking is proving challenging. I can get to the VM UI, but not to anything useful in the way of logs. I am also not an AU expert, so I don't know why calling
/usr/bin/update_engine_client --update --omaha_url=http://192.168.231.1:8080/update/moblab-generic-vm-pre-cq/R73-11629.0.0-b3386716
would stop a VM from booting. Again, I cannot get onto the broken VM to see what is going on.
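One way to pin down when the sub-DUT loses networking is to poll a TCP port on it across the reboot and note whether it ever comes back. The sketch below is my own illustration and not part of provision_AutoUpdate: the `wait_for_port` helper is hypothetical, and the choice of port (22 for SSH, say) and the retry counts are assumptions.

```python
import socket
import time


def wait_for_port(host, port, attempts=30, delay=2.0,
                  connect=socket.create_connection, sleep=time.sleep):
    """Poll host:port until a TCP connection succeeds or attempts run out.

    connect and sleep are injectable so the loop can be tested offline.
    """
    for attempt in range(attempts):
        try:
            conn = connect((host, port), timeout=delay)
        except OSError:
            # Not reachable yet; pause before the next attempt.
            if attempt < attempts - 1:
                sleep(delay)
            continue
        conn.close()
        return True
    return False
```

Calling `wait_for_port(dut_ip, 22)` right after the update-triggered reboot, with `dut_ip` being whatever address the sub-DUT had before, would at least confirm whether the VM never regains networking or only loses it for longer than the test is willing to wait.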
Comment 1 by jclinton@chromium.org
, Jan 11