Configure servo overlay from "model" instead of "board"
Reported by
jrbarnette@chromium.org,
Sep 18
|
||||||
Issue description
This is being forked from bug 878435 for the sake of the
octopus deadline.
Currently, when we start servod, we supply a --board option that requests
including an overlay for the specified board. With the advent of octopus,
an overlay based on the board (i.e. software build configuration) is no
longer sufficient. Instead, we need for the overlay to be model-specific.
That entails the following code changes:
* In hdctools, for every model in the lab, there will need to be an
appropriate overlay named for the model.
* In Autotest servo host repair code, all of the code that assigns
the servo overlay from the "board" label must change to use the
"model" label instead.
* The `deploy` script will need to know both board and model. Right
now, I think they'd need to be both passed on the command line.
Ideally, though, the code should calculate the board from the model.
Ideally, we should also rename "board" to model. That shows up in a
few more places:
* A servod --model option should be created as an alias for --board.
* The servod upstart job for both beaglebone_servo and guado_labstation
will need to rename the "BOARD" setting to "MODEL".
* Code in Autotest that knows how to start/restart the upstart jobs
will need to know about the change from BOARD to MODEL.
Additionally, there will need to be some training/PSA work:
* The techs on englab-sys-cros@ will need to know about the changes,
especially changes to the `deploy` script.
* Anyone administering a test lab will want to know about the need
to use model instead of board for the upstart jobs.
It should be noted that because of the multiple moving parts, rollout
for this change will be ... tricky. Most likely, it will be necessary
that for a short time (possibly a long time) "board" and "model" should
be effectively synonymous in the context of servo overlays. Initial
rollout can (should) be simplified by not performing the rename at all.
This work is on deadline, because it must be ready in time for the
arrival of octopus in the lab (or at least, done before arrival of
the second model).
,
Sep 18
,
Sep 18
,
Sep 18
> There's another bug for this already ... I'm working on this right now. Does the work include the specified changes to Autotest, and do you anticipate being done within four weeks?
,
Sep 18
I spoke with bjanakiraman@, and he wants to ask guocb@ to take care of the changes in autoserv, the deployment script, and servod, as well as coordinating rollout and training (if any) with englab-sys-cros@. I'm assuming that bug 878435 is proposing different work, which is why I forked this bug from there in the first place. If there's no other work proposed, we should sort that out, so that we don't duplicate effort.
,
Sep 19
Fine if guocb@ does the work (first couple cls are already in flight), as long as it's done with the appropriate priority for octopus. Also, the solution we've discussed with octopus team is different. We just want to set this value on a per model basis in cros_config and not try to create a new convention with model->servo config mapping, since conventions tend to break down quickly or cause excessive overhead/redundancy. First 2 cls in flight are already indicative of the desired solution.
,
Sep 19
As per the CL, a device runtime query for servod config doesn't make sense, as you only need servo when the device isn't running. Cros config isn't the right solution.
,
Sep 19
,
Sep 19
,
Sep 19
Issue 878435 has been merged into this issue.
,
Sep 19
Moving discussion from https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1232013 to here. Acknowledge the servo dep on the device setup routine. Model was never part of this, but now we're leaking this all over the servo config and applying very verbose convention for the sake of one case. This is a function of the ec and not the model, which is where it goes wrong. I'm not tied to using cros_config to master this data, but I disagree with spreading convention based config further than it's already spread. The long term solution here is to provide the master config in a manner that autotest tooling could just leverage it (previously discussed).
,
Sep 19
It's not tied to ec specifically, it's tied to the pcb layout of that model. Since model is the common key for deployment, goldeneye, and the mapping to the pcb, it makes sense to use it for this config. If there were some trebuchet style offline db, that could map model to servo config in deployment scripts, chroot, autotest, wherever, that would be a good place to put it, and I thing that's a good long term goal. Short term hacking it into runtime config -> sticky settings makes deployment, reliable repair, and config maintenance confusing. It also adds a revision lock where yaml can't change without breaking CI, ATL, and labstation, with the same failure mechanism that affected the sticky settings of scarlet/dru name change.
,
Sep 19
,
Sep 19
There are two distinct problems under discussion here:
1) Cobble together a solution good enough for deploying the
new octopus models before the hardware arrives in a few weeks.
2) Clean up how we calculate the servo overlay during deployment
so that we don't need hundreds of servo overlays.
Item 1) is this bug. Comments #11 and #12 relate to item 2); that
item needs its own bug.
,
Sep 19
,
Sep 19
Here's a set of symlinks we can use once model is used for servo overlay instead of board: https://chromium-review.googlesource.com/c/chromiumos/third_party/hdctools/+/1234457 This list is only going to get longer :(
,
Sep 19
I've been thinking about rollout issues here. I think we need the
following ordering
First, change the deployment script to accept the model at command
invocation time, and apply the model label at the same time as we
apply the board label. We want to start this ASAP, because it's got
a long lead time: We have to make sure that englab-sys-cros@ is
fully aware of the change and ready for it. This is also the item
least likely to be affected by a late-breaking "we can do better"
idea.
Second, once we know that the deployment script is doing its thing,
we can change servod to recognize all the model names. Once that
change is committed, it has to be pushed to the labstations and
beaglebones. That process also has a little lead time, so we should
arrange for it early.
Finally, once the first two items are done, we can switch to using
model in place of board. That's a trivial change, right here:
https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/server/hosts/servo_host.py?l=752
This change will need both of the prior steps fully deployed, and
won't take effect until a push to prod. That push to prod needs to
happen before we start rolling out new octopus models.
,
Sep 19
> Finally, once the first two items are done, we can switch to using > model in place of board. Will this change mean we will no longer be able to mix models of a unibuild on a single labstation any longer? (e.g. Blue and Bruce are both Coral, deployed on chromeos6-row5-rack7-labstation) We only have a few instances of this situation that I know of, but we should account for it if we need to fix them (physically) ahead of a labstation update.
,
Sep 19
> Will this change mean we will no longer be able to mix models > of a unibuild on a single labstation any longer? It shouldn't. The only restriction should be that you can't deploy more than one model with a single deployment command invocatio. With a little bit of work, I think even that could be dropped by allowing the .csv file to name more than one model (that might work now, but I can't say for sure).
,
Sep 19
> > Will this change mean we will no longer be able to mix models > > of a unibuild on a single labstation any longer? > > It shouldn't. The only restriction should be that you can't deploy More broadly, AFAICT, there's no part of the system that cares about the BOARD configuration for any DUTs sharing a labstation. I remember that I believed that at one time, part of Autotest might do the wrong thing if a single labstation had different configs. However, I'm now doubting that the problem is still present. Moreover, I know that some labstations outside of Stierlin Ct already have more than one BOARD configuration for different DUTs, and nothing bad has been reported on those systems because of it.
,
Sep 24
Update on schedule: we expect to receive the first round of ite-based octopus boards (e.g. bip, apel) the week of November 26th. We would ideally like a solution before then. Thank you for working on this!
,
Sep 25
,
Sep 26
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/defcd0a3c1b1566dd31593610749ff181e443799 commit defcd0a3c1b1566dd31593610749ff181e443799 Author: Congbin Guo <guocb@chromium.org> Date: Wed Sep 26 17:33:17 2018 autotest: deploy.py: add option '--model/-m' This change add a new option '--model/-m' to deploy.py. For now, we don't check validity of it. This change set label 'model' in AFE if it doesn't exist. BUG=chromium:885276 TEST=Ran 'deploy.py' locally. Change-Id: I931533d4519ce0762606ea66e30f6a64f46266a5 Reviewed-on: https://chromium-review.googlesource.com/1237328 Commit-Ready: Congbin Guo <guocb@chromium.org> Tested-by: Congbin Guo <guocb@chromium.org> Reviewed-by: Richard Barnette <jrbarnette@google.com> [modify] https://crrev.com/defcd0a3c1b1566dd31593610749ff181e443799/site_utils/deployment/deploy.py [modify] https://crrev.com/defcd0a3c1b1566dd31593610749ff181e443799/site_utils/deployment/cmdparse_unittest.py [modify] https://crrev.com/defcd0a3c1b1566dd31593610749ff181e443799/site_utils/deployment/install.py [modify] https://crrev.com/defcd0a3c1b1566dd31593610749ff181e443799/site_utils/deployment/cmdparse.py [modify] https://crrev.com/defcd0a3c1b1566dd31593610749ff181e443799/site_utils/deployment/install_unittest.py [modify] https://crrev.com/defcd0a3c1b1566dd31593610749ff181e443799/site_utils/deployment/cmdvalidate.py
,
Nov 1
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/2f3c985085a788673f8eb8302d1baa8d8cb4ab36 commit 2f3c985085a788673f8eb8302d1baa8d8cb4ab36 Author: Nick Sanders <nsanders@chromium.org> Date: Thu Nov 01 04:59:35 2018 autotest: allow servo repair with model Add MODEL=model arg to servod restart. This allows octopus devices with ite ECs to be repaired in the lab. BUG=chromium:885276, chromium:886955 TEST=will run in staging lab Change-Id: I190caf73792a8e3409547ec479e579c5a87827ec Reviewed-on: https://chromium-review.googlesource.com/1298355 Commit-Ready: Alex Zamorzaev <zamorzaev@chromium.org> Tested-by: Nick Sanders <nsanders@chromium.org> Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org> [modify] https://crrev.com/2f3c985085a788673f8eb8302d1baa8d8cb4ab36/server/hosts/cros_host.py [modify] https://crrev.com/2f3c985085a788673f8eb8302d1baa8d8cb4ab36/server/hosts/servo_host.py [modify] https://crrev.com/2f3c985085a788673f8eb8302d1baa8d8cb4ab36/server/hosts/servo_repair.py |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by shapiroc@chromium.org
, Sep 18