chromite hasn't been uprevved
Issue description

In recent push-to-prod runs, chromite never showed any new commits. However, a lot of commits (138) have queued up in chromite:

d998ce5f1 4 days ago Xixuan Wu (HEAD, m/master, cros/master) cbuildbot: Remove swarming_cli_cmd.
4b9089142 4 days ago Mike Frysinger tree_status: remove unused modules
efcc86a68 4 days ago Mike Frysinger pylintrc: document deprecated-modules field
065a4f374 4 days ago Mike Frysinger pylintrc: ban exit & quit builtins
9b5a3d78d 12 days ago Xixuan Wu cbuildbot: Let SkylabHWTestStage run any suite.
...
8dace2766 5 weeks ago Don Garrett gen_luci_scheduler: Add branched build config support.
68844342e 5 weeks ago Don Garrett chromeos_config: Remove many dead boards.
dad47a953 4 weeks ago Caroline Tice Enable CFI on peach_pit and kevin release builders.
62e2a5e8c 4 weeks ago Mike Frysinger cbuildbot: fuzzer: stop processing android stages
69c79e9bc 6 weeks ago Mike Frysinger lint: fix not-an-iterable warnings
8960f7c7b 6 weeks ago Mike Frysinger lint: fix consider-using-enumerate warnings
9e8edc1dd 6 weeks ago Mike Frysinger lint: fix unsubscriptable-object warnings
b96253cab 5 weeks ago Amin Hassani cros_payload: deprecate it
5389d34c8 4 weeks ago Bob Moragues (cros/prod-next, cros/prod) config: Fix Build Break on firmware tryjobs on master branch

$ git l m/master...cros/prod-next|wc -l
138
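For anyone who wants to repeat the count from a script, here is a minimal sketch in Python. It assumes `git` is on PATH and that `repo_dir` is a chromite checkout with both refs available; `git l` above is a local alias, so the sketch substitutes `git log --oneline`. The function names are made up for illustration.

```python
import subprocess


def count_log_lines(log_output: str) -> int:
    """Count non-empty lines, roughly what piping one-line-per-commit output to `wc -l` gives."""
    return sum(1 for line in log_output.splitlines() if line.strip())


def commits_between(repo_dir: str, ref_a: str, ref_b: str) -> int:
    """Count commits reachable from exactly one of the two refs (the A...B range).

    Assumption: git is installed and repo_dir is a git checkout containing both refs.
    """
    out = subprocess.check_output(
        ["git", "log", "--oneline", f"{ref_a}...{ref_b}"],
        cwd=repo_dir,
        text=True,
    )
    return count_log_lines(out)
```

Calling `commits_between(".", "m/master", "cros/prod-next")` inside the chromite checkout should correspond to the shell pipeline above.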
,
Aug 21
,
Aug 21
I did some investigation. The change responsible is almost certainly in one of these two lists:

https://chrome-internal-review.googlesource.com/q/is:merged+before:2018-07-31++after:2018-07-23
Internal changes that landed between the day of the last chromite push that made changes and the day of the first chromite push which did not.

https://chromium-review.googlesource.com/q/is:merged+before:2018-07-24++after:2018-07-23+projects:autotest
Autotest changes which were in the push itself.

(I am willing to rule out the chromite changes being responsible; there were very few and none are plausible candidates.)
,
Aug 22
push master has local changes. These should have been reset by Puppet. Looking
,
Aug 22
Oh, Puppet doesn't reset the chromiumos repo repo
,
Aug 22
pprabhu, shame (although the fault is in the system, not the developer)
WARN: 2018-08-22 15:00:03 -0700: Ignoring configured merge_behavior
WARN: 2018-08-22 15:00:03 -0700: Must have 'deep_merge' gem installed.
WARN: 2018-08-22 15:00:03 -0700: Cannot load backend eyaml: cannot load such file -- hiera/backend/eyaml_backend
2018-08-22 15:00:03,566 INFO| Running atomic task RunTestPushTask
2018-08-22 15:00:03,566 INFO| Starting test push
2018-08-22 15:00:03,566 INFO| Triggering /usr/local/autotest/site_utils/test_push.py
2018-08-22 15:00:03,570 INFO| Triggering /opt/infra-tools/usr/bin/skylab_test_push
[xkcd] Would have run: /usr/local/autotest/site_utils/test_push.py
[xkcd] Would have run: /opt/infra-tools/usr/bin/skylab_test_push
2018-08-22 15:00:03,575 INFO| [xkcd] Success!!!
2018-08-22 15:00:03,576 INFO| Completed test push
2018-08-22 15:00:03,576 INFO| RunTestPushTask succeed.
2018-08-22 15:00:03,576 INFO| Printing out task report.
{
  "sub_reports": [],
  "exception": null,
  "is_successful": true,
  "description": "RunTestPushTask succeed.",
  "arguments_used": {
    "service_account_json": "/creds/service_accounts/cipd-uprev-service.json"
  },
  "task_name": "RunTestPushTask"
}
,
Aug 22
afe_labels_sync keeps crashing
,
Aug 22
Re #8, pprabhu is dropping that service https://chrome-internal-review.googlesource.com/c/chromeos/chromeos-admin/+/665986
,
Aug 22
Okay, that means it needs to be dropped from test push too
,
Aug 22
Re #10 that happens automatically, so I'll go run test push again
,
Aug 23
Ironically, the CL in #9 broke test push in a different way.
,
Aug 23
There were other problems affecting test push, but chromite still isn't being uprevved.
,
Aug 23
Oh curses it's using site_utils/deploy_server.py to update the repos. I have a bad feeling about this
,
Aug 23
./utils/build_externals.py --use_chromite_master isn't working
,
Aug 23
When I stepped through with a debugger, it started working...
,
Aug 23
> /usr/local/autotest/ExternalSource/utils/build_externals.py(180)build_and_install_packages()
(Pdb) l
[EOF]
DEBUG:root:[stdout] Updating server: chromeos-staging-shard2.cbf.corp.google.com
DEBUG:root:[stdout] Checking tree status:
DEBUG:root:[stdout] Tree status: clean
DEBUG:root:[stdout] Updating Repo.
DEBUG:root:[stdout] Updating push servers, checkout cros/master
DEBUG:root:[stdout] Removing .pyc files
DEBUG:root:[stdout] Updating ~chromeos-test/chromiumos
DEBUG:root:[stdout] Removing .pyc files
DEBUG:root:[stdout] Running update commands: build_externals
DEBUG:root:[stdout] Running: build_externals: ./utils/build_externals.py --use_chromite_master
DEBUG:root:[stdout] Restarting Services: apache2, gs_offloader, gs_offloader_s, host-scheduler, job_aborter, scheduler, shard-client, sysmon
DEBUG:root:[stdout] Restarting: apache2
DEBUG:root:[stdout] Restarting: gs_offloader
DEBUG:root:[stdout] Restarting: gs_offloader_s
DEBUG:root:[stdout] Restarting: host-scheduler
DEBUG:root:[stdout] Restarting: job_aborter
DEBUG:root:[stdout] Restarting: scheduler
DEBUG:root:[stdout] Restarting: shard-client
DEBUG:root:[stdout] Restarting: sysmon
DEBUG:root:[stdout] Changes:
DEBUG:root:[stdout] autotest:
DEBUG:root:[stdout] fc085e574 autotest: Modify provision expiration secs as 95% of the whole timeout.
DEBUG:root:[stdout] e1fd8bf78 autotest: urgent fix: remove unused @property
DEBUG:root:[stdout]
DEBUG:root:[stdout] autotest/site_utils/autotest_private:
DEBUG:root:[stdout] No Change.
DEBUG:root:[stdout]
DEBUG:root:[stdout] autotest/site_utils/devserver:
DEBUG:root:[stdout] No Change.
DEBUG:root:[stdout]
,
Aug 23
Bah, it's probably working now. Maybe stale pyc file or a race condition? All of the code appears correct and it appears to be working now, stupid heisenbug.
,
Aug 27
-> deputy who will be doing a push anyway. Can check on deputy-view to see if chromite was updated.
,
Aug 27
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/fb85484f887525242600c74a7742f9a81fd481a0

commit fb85484f887525242600c74a7742f9a81fd481a0
Author: Allen Li <ayatane@chromium.org>
Date: Mon Aug 27 19:35:13 2018
,
Aug 30
Apparently, our push testing no longer updates the cros/prod-next branch in chromite:

: jrbarnette $REPOS/cros.base/chromite; git rev-parse cros/prod cros/prod-next
5389d34c833dae162248c0c8ddb1b71d31851924
5389d34c833dae162248c0c8ddb1b71d31851924
: jrbarnette $REPOS/cros.base/chromite; git log -1 cros/prod
commit 5389d34c833dae162248c0c8ddb1b71d31851924 (cros/prod-next, cros/prod)
Author: Bob Moragues <moragues@google.com>
Date: Mon Jul 23 10:26:46 2018 -0700

    config: Fix Build Break on firmware tryjobs on master branch

    Set upload_hw_test_artifacts to false
    Generate config_dump.json

    BUG=chromium:828953
    TEST=cros tryjob nocturne-firmware-tryjob

    Change-Id: I2168c742df1c717d934ecdb99e124793bedfb8f6
    Reviewed-on: https://chromium-review.googlesource.com/1147112
    Commit-Ready: Bob Moragues <moragues@chromium.org>
    Tested-by: Bob Moragues <moragues@chromium.org>
    Reviewed-by: Mike Frysinger <vapier@chromium.org>
,
Aug 30
> Apparently, our push testing no longer updates the cros/prod-next
> branch in chromite:

I guess we already knew that. :-(

It's not yet proven whether the fix above worked, since the last successful push test was just before the change landed.
,
Aug 30
Oh dear. Re #23, by all appearances the code should still update chromite. It worked as the code said it should when I stepped through using pdb. Maybe there's a race, since stepping through works.
,
Aug 30
> Re #23 by all appearances the code should still update chromite.
> It worked as the code said it should work when I stepped through
> using pdb. Maybe there's a race, since stepping through works.

We haven't successfully updated either the autotest prod or the chromite prod branch since the fix was committed. So, we don't yet know whether this works or fails.
,
Aug 31
Re #26, are you talking about the "fix" in #22? That does not address the root problem, merely a related problem I ran into while digging into the root problem.

The code path that uprevs chromite during test push hasn't been touched in ages, which makes this problem all the more infuriating. Ultimately, the code winds up calling

build_externals.py --use_chromite_master

I have checked both that the test push code ends up calling build_externals with the correct flag and that build_externals.py, when called with said flag, checks out master in site-packages/chromite.
,
Sep 1
,
Sep 4
,
Sep 5
As expected, only the build_externals deploy version of chromite is stuck on an old ref. The ~/chromiumos/chromite version is latest:

chromeos-test@chromeos-staging-master2:/usr/local/autotest/site-packages/chromite$ git log -1
commit 5389d34c833dae162248c0c8ddb1b71d31851924 (HEAD -> master, origin/prod-next, origin/prod)
Author: Bob Moragues <moragues@google.com>
Date: Mon Jul 23 10:26:46 2018 -0700

    config: Fix Build Break on firmware tryjobs on master branch

    Set upload_hw_test_artifacts to false
    Generate config_dump.json

    BUG=chromium:828953
    TEST=cros tryjob nocturne-firmware-tryjob

    Change-Id: I2168c742df1c717d934ecdb99e124793bedfb8f6
    Reviewed-on: https://chromium-review.googlesource.com/1147112
    Commit-Ready: Bob Moragues <moragues@chromium.org>
    Tested-by: Bob Moragues <moragues@chromium.org>
    Reviewed-by: Mike Frysinger <vapier@chromium.org>

chromeos-test@chromeos-staging-master2:/usr/local/autotest/site-packages/chromite$ cd ~/chromiumos/chromite/
chromeos-test@chromeos-staging-master2:~/chromiumos/chromite$ git log -1
commit d15b2011d9c0ec0dcb683a0456360ae950889e1f (HEAD, m/master, cros/master)
Author: Manoj Gupta <manojgupta@google.com>
Date: Fri Aug 31 14:31:41 2018 -0700

    chromeos_config: Add kevin64-full builder.

    We want to start testing AArch64 userspace in Chrome OS. So start testing kevin64-full builds.

    BUG=chromium:878565
    TEST=chromite unit tests pass

    Change-Id: Id424b7c349bed433e5984b3f73045f2cc6709dab
    Reviewed-on: https://chromium-review.googlesource.com/1200173
    Commit-Ready: Manoj Gupta <manojgupta@chromium.org>
    Tested-by: Manoj Gupta <manojgupta@chromium.org>
    Reviewed-by: Mattias Nissler <mnissler@chromium.org>
    Reviewed-by: Luis Lozano <llozano@chromium.org>
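The comparison above could be scripted along these lines. This is only a sketch: `head_of` and `stale_checkouts` are hypothetical helper names, and `head_of` assumes `git` is on PATH and that the directory is a git checkout.

```python
import subprocess


def head_of(checkout_dir: str) -> str:
    """Return the commit hash at HEAD of a git checkout (assumes git is on PATH)."""
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=checkout_dir, text=True
    ).strip()


def stale_checkouts(heads: dict, expected: str) -> list:
    """Return the checkout paths whose HEAD differs from the expected commit."""
    return sorted(path for path, head in heads.items() if head != expected)


# The two HEADs observed on chromeos-staging-master2 in the session above.
observed = {
    "/usr/local/autotest/site-packages/chromite":
        "5389d34c833dae162248c0c8ddb1b71d31851924",
    "~/chromiumos/chromite":
        "d15b2011d9c0ec0dcb683a0456360ae950889e1f",
}
stale = stale_checkouts(observed, "d15b2011d9c0ec0dcb683a0456360ae950889e1f")
```

With the observed values, only the site-packages checkout is reported as stale, which matches the diagnosis.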
,
Sep 5
Aha! My nasty little frenemy!

chromeos-test@chromeos-staging-master2:/usr/local/autotest/site-packages/chromite$ /usr/local/autotest/utils/build_externals.py --use_chromite_master
...
chromeos-test@chromeos-staging-master2:/usr/local/autotest/site-packages/chromite$ git log -1
commit d15b2011d9c0ec0dcb683a0456360ae950889e1f (HEAD -> master, origin/master, origin/HEAD)
Author: Manoj Gupta <manojgupta@google.com>
Date: Fri Aug 31 14:31:41 2018 -0700

    chromeos_config: Add kevin64-full builder.

    We want to start testing AArch64 userspace in Chrome OS. So start testing kevin64-full builds.

    BUG=chromium:878565
    TEST=chromite unit tests pass

    Change-Id: Id424b7c349bed433e5984b3f73045f2cc6709dab
    Reviewed-on: https://chromium-review.googlesource.com/1200173
    Commit-Ready: Manoj Gupta <manojgupta@chromium.org>
    Tested-by: Manoj Gupta <manojgupta@chromium.org>
    Reviewed-by: Mattias Nissler <mnissler@chromium.org>
    Reviewed-by: Luis Lozano <llozano@chromium.org>

root@chromeos-staging-master2:/usr/local/autotest/site-packages/chromite# /root/chromeos-admin/puppet/run_puppet
...
chromeos-test@chromeos-staging-master2:/usr/local/autotest/site-packages/chromite$ git log -1
commit 5389d34c833dae162248c0c8ddb1b71d31851924 (HEAD -> master, origin/prod-next, origin/prod)
Author: Bob Moragues <moragues@google.com>
Date: Mon Jul 23 10:26:46 2018 -0700

    config: Fix Build Break on firmware tryjobs on master branch

    Set upload_hw_test_artifacts to false
    Generate config_dump.json

    BUG=chromium:828953
    TEST=cros tryjob nocturne-firmware-tryjob

    Change-Id: I2168c742df1c717d934ecdb99e124793bedfb8f6
    Reviewed-on: https://chromium-review.googlesource.com/1147112
    Commit-Ready: Bob Moragues <moragues@chromium.org>
    Tested-by: Bob Moragues <moragues@chromium.org>
    Reviewed-by: Mike Frysinger <vapier@chromium.org>

because, as part of the puppet run,

Notice: /Stage[main]/Lab::Compiled_autotest_repo/Exec[build_externals]/returns: executed successfully

I'm guessing that around July 31 is when I changed the order in which test_push updates the repos and runs puppet. (Puppet uses some tools deployed in autotest, and when these tools break, the only way to get the staging servers to a good state is to first update the tools, then run puppet. Running puppet first fails, and the servers get stuck.)

So, since then, each test_push has promptly downgraded chromite as part of setup and never pushed a new chromite.
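To make the ordering effect concrete, here is a toy model in Python. It is not the test_push code: the function names and the dict standing in for server state are invented for illustration, and the "old order" (puppet first) is my reading of the guess above rather than something confirmed in the code.

```python
# Hypothetical model of one staging server; all names are illustrative only.
MASTER = "d15b2011d"  # chromite master tip seen in the session above
PROD = "5389d34c8"    # commit that origin/prod and origin/prod-next point at


def update_repos(state):
    """Stand-in for test_push running build_externals.py --use_chromite_master."""
    state["site_packages_chromite"] = MASTER


def run_puppet(state):
    """Stand-in for the puppet run, whose Exec[build_externals] puts chromite back on prod."""
    state["site_packages_chromite"] = PROD


def run_steps(steps):
    """Apply the steps in order and report where site-packages/chromite ends up."""
    state = {"site_packages_chromite": PROD}
    for step in steps:
        step(state)
    return state["site_packages_chromite"]


old_order = run_steps([run_puppet, update_repos])  # assumed order before ~July 31
new_order = run_steps([update_repos, run_puppet])  # order after the change
```

In this model `old_order` ends on the master commit and `new_order` ends on the prod commit, i.e. whichever step runs last wins, which is the downgrade described above.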
,
Sep 5
Since build_externals is run as part of prod-push, it should not be run by puppet, except during setup. This is how the ~/chromiumos and /usr/local/autotest checkouts are handled as well.
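As an illustration of the "only during setup" rule, a guard could look roughly like the Python sketch below. This is not what the CL does (the real change lives in the puppet config, which I am not reproducing here); `needs_initial_build` is a hypothetical name, and treating a missing `.git` directory as "not set up yet" is an assumption.

```python
import os
import tempfile


def needs_initial_build(checkout_dir: str) -> bool:
    """Return True only when the checkout has not been created yet.

    Assumption: a missing .git directory means initial setup never ran.
    """
    return not os.path.isdir(os.path.join(checkout_dir, ".git"))


# Usage example with a throwaway directory instead of a real server path.
with tempfile.TemporaryDirectory() as tmp:
    before_setup = needs_initial_build(tmp)
    os.mkdir(os.path.join(tmp, ".git"))
    after_setup = needs_initial_build(tmp)
```

Once the checkout exists, configuration management leaves it alone and only the push updates it.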
,
Sep 5
https://chrome-internal-review.googlesource.com/c/chromeos/chromeos-admin/+/672469 has landed. This bug is in wait_close
,
Sep 5
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/c22ff88235abcdcdeb8ae12bb8058e59826ad32a

commit c22ff88235abcdcdeb8ae12bb8058e59826ad32a
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed Sep 05 22:08:57 2018
,
Sep 6
Verified that the staging lab has now updated the site-packages/chromite path for the next test_push.
Comment 1 by jkop@chromium.org, Aug 21
Labels: -Pri-2 Pri-1
Summary: chromite hasn't been uprevved (was: chromte havn't got revved)