Master-release/LATEST-master=R69-10837.0.0, which blocks suite-scheduler kicking off ToT suites.
Issue description
I'm watching this dashboard daily: https://stainless.corp.google.com/search?status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_retried=true&exclude_cts=false&exclude_non_production=true&exclude_acts=true&exclude_non_release=true&exclude_au=true&test=cheets_CTS_N.7.1_r19.arm&exclude_not_run=false&row=build&col=test&view=matrix&first_date=2018-07-10&last_date=2018-07-24
I found that all tests from the arc-cts suite stopped running after R69-10895.0.0 (while bvt-arc results are accumulating for M70). To me the situation looks quite similar to Bug 847540, which we encountered at the M69 branch point. In that bug, the solution was to check chromeos-image-archive/master-release/LATEST-master instead of master-paladin, but this time, as far as I can tell, master-release/LATEST-master has fallen far behind:
gs://chromeos-image-archive/master-paladin/LATEST-master == R70-10903.0.0-rc2
gs://chromeos-image-archive/master-release/LATEST-master == R69-10837.0.0
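To make the gap between the two LATEST-master values concrete, here is a minimal Python sketch that parses both strings and reports how far master-release trails master-paladin. The version format (R<milestone>-<build>.<branch>.<patch>, with an optional -rcN suffix) is inferred only from the two values quoted above, and parse_version is a hypothetical helper, not part of chromite or suite-scheduler.

```python
import re

# Format inferred from the two values in the report; this is an assumption,
# not an official specification of Chrome OS version strings.
VERSION_RE = re.compile(r"^R(\d+)-(\d+)\.(\d+)\.(\d+)(?:-rc\d+)?$")


def parse_version(text):
    """Return (milestone, build, branch, patch) as integers."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(group) for group in match.groups())


# Values copied from the issue description.
paladin = parse_version("R70-10903.0.0-rc2")
release = parse_version("R69-10837.0.0")

milestone_lag = paladin[0] - release[0]
build_lag = paladin[1] - release[1]
print(f"master-release trails by {milestone_lag} milestone(s), "
      f"{build_lag} build number(s)")
```

A nonzero milestone lag is the symptom reported here: the release pointer still names the previous milestone.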
,
Jul 24
Issue 867091 has been merged into this issue.
,
Jul 24
The master-release version is treated as ToT, as we discussed in Issue 847540. Given that assumption, I don't find anything wrong in suite-scheduler, unless we switch ToT back to master-paladin/LATEST-master.
arc-cts is not only kicked off as a nightly suite. There are also weekly and new_build events that kick off arc-cts for different branches:
[ArcCtsStablePerBuild]
run_on: new_build
suite: arc-cts
branch_specs: ==tot-2
[ArcCtsBetaPerWeek]
run_on: weekly
suite: arc-cts
branch_specs: ==tot-1
[ArcCtsDevPerWeekSat]
run_on: weekly
suite: arc-cts
branch_specs: ==tot
[ArcCtsDevPerWeekWed]
run_on: weekly
suite: arc-cts
branch_specs: ==tot
[ArcCtsBetaNightly]
run_on: nightly
suite: arc-cts
branch_specs: ==tot-1
[ArcCtsDevPerWeekMon]
run_on: weekly
suite: arc-cts
branch_specs: ==tot
[ArcCtsDevPerWeekThu]
run_on: weekly
suite: arc-cts
branch_specs: ==tot
[ArcCtsDevPerWeekTue]
run_on: weekly
suite: arc-cts
branch_specs: ==tot
[ArcCtsPerBuild]
run_on: new_build
suite: arc-cts
branch_specs: ==tot
Because ToT is currently R69, none of the events on the tot branch can be kicked off for R70. Issue 867091 has the same problem. If master-release moves to R70, this problem will be solved.
Assigning to the deputy to decide whether we want to solve this by pushing master-release to the real ToT (cc sheriffs) or by switching ToT in suite-scheduler's settings.
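One plausible reading of the branch_specs values above, sketched in Python: "==tot" targets the milestone currently considered ToT, and "==tot-N" targets N milestones earlier. The functions resolve_branch_spec and should_run are hypothetical names for illustration and are not taken from the suite-scheduler source.

```python
def resolve_branch_spec(spec, tot_milestone):
    """Map an '==tot' or '==tot-N' spec to the milestone it targets."""
    prefix = "==tot"
    if not spec.startswith(prefix):
        raise ValueError(f"unsupported branch spec: {spec!r}")
    offset = spec[len(prefix):]
    if offset == "":
        return tot_milestone
    if offset.startswith("-") and offset[1:].isdigit():
        return tot_milestone - int(offset[1:])
    raise ValueError(f"unsupported branch spec: {spec!r}")


def should_run(spec, build_milestone, tot_milestone):
    """True if a build on build_milestone matches the spec."""
    return resolve_branch_spec(spec, tot_milestone) == build_milestone


# With the stale pointer (ToT read as 69), an R70 build matches no '==tot' config.
print(should_run("==tot", build_milestone=70, tot_milestone=69))
# Once the pointer advances to 70, the same build matches.
print(should_run("==tot", build_milestone=70, tot_milestone=70))
```

Under this reading, every config with branch_specs "==tot" skips R70 builds for as long as the scheduler believes ToT is R69.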
,
Jul 25
Hokay, so what's the issue here? OOB suites are tracking master-release since they test against release builds. It sounds like master-release should be R70 (which is DEV?) But why are the release builders trailing behind the CQ? Shouldn't the CQ also track DEV?
,
Jul 25
,
Jul 25
master-release/LATEST-master == R69-10837.0.0 because master-release has been red since 2018-07-02: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8942091101558299760. Once we get a green master-release run, ToT will move to a newer version.
,
Jul 27
,
Jul 30
,
Jul 31
So it sounds like per comment 6 this is blocked on https://bugs.chromium.org/p/chromium/issues/detail?id=869430 which appears to be what is causing the master-release builder to fail since early July. Seems this may be related to migrating to Swarming?
,
Jul 31
I'm not sure what's required here, but the LATEST-master files are created by chromite; they're not in the purview of test infrastructure. So, passing to CI for evaluation.
,
Jul 31
... Actually, this is probably modestly urgent, so let's pass to the CI Bobby for prioritized evaluation.
,
Jul 31
Issue 868078 has been merged into this issue.
,
Jul 31
Is there any chance someone could dumb this down for me? This is all new territory. So one of the builders makes an image (either master or release builder) and that image gets uploaded to Cloud Storage. Then, transitively, something called "suite scheduler" kicks off several suite executions on that image, like CTS (which is the Android compatibility test, right?). What machines are running suite scheduler? I also don't follow what 'checking' master-paladin means. Is build status polled?
,
Aug 1
Suite-scheduler is a GAE service that schedules tests based on user needs, e.g. schedule suite dummy on branch ToT for all boards. So suite-scheduler needs a source to decide what the current ToT is, e.g. R69 or R70. Currently, it uses the latest passing master-release build as ToT:
xixuan@xixuan0:~/chromiumos/src/third_party/autotest/files$ gsutil cat gs://chromeos-image-archive/master-release/LATEST-master
R69-10837.0.0
So it's R69. However, R69 is not what is expected; what users want is R70. Because master-release has kept failing since July 2 (https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=master-release&buildBranch=), LATEST-master has not been updated since then.
To solve this problem, we need at least one green run of the master-release builder. Otherwise ToT will stay at R69 forever. Based on #9, the blocker for a green run of master-release may be Issue 869430.
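The behavior described above, where the LATEST pointer only advances on a green run, can be sketched as follows. This is an illustration under stated assumptions, not chromite's implementation: latest_green is a hypothetical function, and the build history below is made up apart from R69-10837.0.0.

```python
def latest_green(builds):
    """Return the version of the most recent passing build, or None.

    builds is an ordered list of (version, passed) pairs, oldest first.
    """
    latest = None
    for version, passed in builds:
        if passed:
            latest = version
    return latest


# Illustrative history: one green R69 build followed by red runs.
# The R70 entries and all pass/fail flags are invented for the example.
history = [
    ("R69-10837.0.0", True),
    ("R70-EXAMPLE-1", False),
    ("R70-EXAMPLE-2", False),
]
print(latest_green(history))  # pointer stays on the old R69 build

# A single green run is enough to move the pointer forward.
history.append(("R70-EXAMPLE-3", True))
print(latest_green(history))
```

This is why one green master-release run is sufficient to unblock the ToT suites, and why a long red streak freezes ToT at the last green milestone.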
,
Aug 1
Can we base it on something other than the LATEST files? Maybe CIDB?
,
Aug 1
Re #15: Technically yes. But most suites scheduled by suite-scheduler are based on a green release build. If master-release or some release builders stay red, we will skip many suites. I wonder if we have any official indication of ToT in GE or somewhere else. But "master-release is red for nearly one month" is not expected anyway...
,
Aug 1
Thank you Xixuan, appreciate the explanation! So if I'm understanding correctly, options (for me) are find a way around using the LATEST file (maybe CIDB as Don suggested), wait for a green run on master-release or switch to the LATEST from master-paladin?
,
Aug 1
1) "Switch to the LATEST from master-paladin": we don't want this in the long run, since suite-scheduler targets release builds. But as a temporary fix, we can do that. It's just a revert of https://chromium-review.googlesource.com/c/chromiumos/infra/suite_scheduler/+/1077329.
2) "Debug Issue 869430 to make master-release green" is the right solution, since a red release builder also blocks suite-scheduler from scheduling suites on it, which is unexpected anyway.
3) "Find a way around using the LATEST file": that needs more work and discussion than option 1).
From the perspective of adjusting ToT, we can simply adopt 1).
,
Aug 1
Yes, we need to look at issue 869430 but also, per jrbarnette on IRC just now, the majority of the slaves are failing due to DUT shortages and have been for a while. That needs to be fixed in parallel. Alec will look at issue 869430 while we use this bug (or another?) to track fixing DUT shortages.
,
Aug 1
> Yes, we need to look at issue 869430 but also, per jrbarnette on
> IRC just now, the majority of the slaves are failing due to DUT
> shortages and have been for awhile. That needs to be fixed in parallel.
AFAICT, this problem isn't about the DUT shortages. Certainly, there's nothing in the bug history explaining a connection. If I've understood the ask here properly, it requires chromite changes, not device repairs or Autotest changes.
,
Aug 1
,
Aug 1
,
Aug 1
Richard, where are we tracking getting the DUT shortages fixed? Because that also blocks this bug.
,
Aug 1
> Richard, where are we tracking getting the DUT shortages fixed? Because that also blocks this bug.
There's no bug filed; there's a daily e-mail with a list of the problems; I'm working my way through that to see what's what. Do we have specific builders failing that we know are blocking this bug? Although I can guess at some of the failures, the mail doesn't identify problems by builder, but only by model.
,
Aug 1
The LATEST files are just files in GS. We could manually create/update one if we want as a temp workaround.
,
Aug 1
Sorry for taking so long to get up to speed. Let me double-check my (new) understanding here: master-release has actual failing builders, but that might be okay because master-release is allowed to pass with failures in specific builders. The problem is that it's failing during the generation of links to builds, because it thinks they are still on Buildbot. So once Don's change lands, both of these problems might go away, as long as the builders that are failing are allowed to fail. In addition to this, some of them are failing because of provisioning issues, which might still block a successful run.
,
Aug 2
We had a successful run on master-release last night (thank you dgarrett for the links fix!) Does that resolve this issue?
,
Aug 2
The fix needs merging back.
,
Aug 2
I've issued a merge request in https://crbug.com/869430 .
,
Aug 3
Still blocked on crbug.com/869430. Don has a fix that needs to be merged in; it's pending review but should go in soon.
,
Aug 3
This should be fixed when the next R69 build starts. Calling it fixed, since this was proven out on R70.
Comment 1 by ayatane@chromium.org, Jul 24
Status: Assigned (was: Untriaged)