lucifer: Does not handle synch_count=1 jobs with multiple HQEs
Issue description
lucifer does not handle synch_count=1 jobs with multiple HQEs. This results in scheduler crashes: https://bugs.chromium.org/p/chromium/issues/detail?id=831689
,
Apr 12 2018
It's a feature that used to work. It's a shortcut to avoid creating a job three times to run it on three DUTs; you can just create one job with three DUTs instead. The hard question is whether it is worth restoring this feature, given how close we are to beginning the skylab migration. In the skylab world this feature will not be supported (i.e., just create three jobs instead).
,
Apr 12 2018
,
Apr 12 2018
A bit more detail: the central issue is how execution groups are handled. An execution group is a group of HQEs/hosts that run a single autoserv together. Most HQEs are in an execution group by themselves. synch_count jobs have all of their HQEs in one execution group. Thus, most jobs have one execution group. However, non-synch_count jobs (synch_count=1) with multiple HQEs will have multiple execution groups, one per HQE. (It is theoretically possible to have synch_count jobs with multiple execution groups. I'm pretty sure no such job has ever been created.)
Lucifer considers each job to be one execution group. The difficulty in restoring support for multiple execution groups is that the execution group transaction locks between Autotest and Lucifer are keyed on job id. Keying on HQE as well would require a database migration and a few deployment rounds to roll everything out in a backward-compatible manner.
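To make the grouping concrete, here is a minimal sketch of how HQEs might fall into execution groups and why a lock keyed only on job id cannot tell the groups apart. All names here (`HQE`, `execution_groups`, the two `lock_keys_*` helpers) are hypothetical and are not Autotest or Lucifer APIs. Chunking by synch_count and using the first HQE id as the extra key are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a host queue entry (HQE); not the Autotest model.
HQE = namedtuple('HQE', ['id', 'job_id', 'hostname'])


def execution_groups(hqes, synch_count):
    """Split one job's HQEs into groups of synch_count entries each.

    Assumption: each group is what a single autoserv would run together.
    """
    if synch_count < 1:
        raise ValueError('synch_count must be at least 1')
    return [hqes[i:i + synch_count] for i in range(0, len(hqes), synch_count)]


def lock_keys_by_job(groups):
    """Lock keys when execution groups are keyed on job id only."""
    return {group[0].job_id for group in groups}


def lock_keys_by_job_and_hqe(groups):
    """Lock keys when groups are also keyed on an HQE id (assumed: the first)."""
    return {(group[0].job_id, group[0].id) for group in groups}


# One job (id 42) scheduled on three DUTs.
hqes = [HQE(1, 42, 'dut1'), HQE(2, 42, 'dut2'), HQE(3, 42, 'dut3')]

single = execution_groups(hqes, synch_count=1)  # one group per HQE
synced = execution_groups(hqes, synch_count=3)  # all HQEs in one group
```

With synch_count=1 the job yields three execution groups, yet `lock_keys_by_job(single)` contains a single key, so the three groups would contend for the same lock. Adding the HQE id to the key gives each group a distinct key.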
,
Apr 23 2018
Lowering priority after mitigation: https://bugs.chromium.org/p/chromium/issues/detail?id=832167
,
Jul 24
Separate bug for skylab multi-DUT jobs
Comment 1 by pprabhu@chromium.org
, Apr 12 2018