
Issue 407401

Starred by 2 users

Issue metadata

Status: Fixed
Closed: Mar 2018
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Bug

issue 407399


Remove cluster_fuzz dependencies on LKGR

Reported by, Aug 25 2014

Issue description

I'm told cluster_fuzz still depends on having LKGR. We should just have it build its own "last known good revision" from recent builds, which would be more targeted to cluster_fuzz's needs.
Blocking: chromium:407399
There is no work to be done on my side other than verifying. ClusterFuzz does not create builds; I think we should just go with
Could you tell me which tests cluster_fuzz cares about passing?  Or does cluster_fuzz only care that the compile passes?  Or that tests pass with < N crashes?
Compile is a must-have, as otherwise we will archive builds without the binaries and mess up regression testing badly.

The other thing we care about is startup crashes. The browser should launch and basic page-load tests should pass.

Everything else (like passing 90% of tests, etc.) is just an add-on. Those checks are not needed and won't be blockers. If this is doable, then we should set a rule like "80% of the tests should pass" or something similar.

Comment 5 by, Aug 26 2014

It almost never happens that the binary compiles and fewer than 90% of the tests pass. It does happen, but it's *really* rare. Maybe you could just use LKCR? If that's not sufficient, I think any script that walks through the last 40 revisions and finds the best one will meet these criteria 100% of the time.
Thinking again, LKCR is more than enough. Compile breakage on the main build will also cause compile breakage on the memory build, and that already prevents the build from getting archived. Verified on
Isn't there a problem with the number of LKCR releases there are, however? Given that you'd be building, instrumenting and archiving each and every LKCR release you'd have a lot more than currently occur with LKGR... is this an issue?
LKCR will be roughly one every 40 revisions, which is good enough for us. LKGR has sometimes been more frequent, and we have a rule in ClusterFuzz to only use a new build if it is more than 50 revisions apart (so that we don't waste too much time pulling new builds).
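The 50-revision rule described above can be sketched as follows. This is only an illustration, not ClusterFuzz's actual code: the function name `should_pull_build` and the constant `MIN_REVISION_GAP` are hypothetical, and revisions are assumed to be plain increasing integers.

```python
# Hypothetical sketch of the "more than 50 revisions apart" rule.
# Names are illustrative and not taken from ClusterFuzz.
MIN_REVISION_GAP = 50  # threshold quoted in the comment above


def should_pull_build(current_revision: int,
                      candidate_revision: int,
                      min_gap: int = MIN_REVISION_GAP) -> bool:
    """Return True only if the candidate build is more than `min_gap`
    revisions newer than the build currently in use."""
    return candidate_revision - current_revision > min_gap
```

With this rule, a candidate at revision 1051 would replace a build at revision 1000, while one at 1050 would be skipped.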
[Speaking of the ASan bots as an example]
I wonder if the ASan builder that packages builds for CF can run a (small?) subset of browser_tests and skip packaging if it fails?  Or maybe one of the ASan tester bots can be used to package builds if all the previous steps are green?

Comment 10 by, Aug 28 2014

If you were going to do that, it'd make more sense to me to invest that effort in writing a script that computed an LKBR or something that is the best revision in the last 40. It could update every 30 minutes or so. That would give you much better coverage than trying to run a relatively random subset of the tests.

Canary already has a script that basically does this. I don't think it'd be hard to extend it to do the same for a more generic thing.

That said, either way is fine with me. Totally up to you.
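A script of the kind suggested above (pick the best revision among the last 40) might look roughly like the sketch below. Everything here is an assumption made for illustration: the `RevisionResult` record, the `pick_best_revision` function, and the choice of "best" as the highest test pass rate among revisions that compiled, with ties going to the newer revision. It is not the Canary script mentioned in the comment.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RevisionResult:
    """Hypothetical per-revision summary of bot results."""
    revision: int
    compiled: bool
    tests_passed: int
    tests_total: int

    @property
    def pass_rate(self) -> float:
        # Treat a revision with no test results as a 0% pass rate.
        return self.tests_passed / self.tests_total if self.tests_total else 0.0


def pick_best_revision(results: Iterable[RevisionResult],
                       window: int = 40,
                       min_pass_rate: float = 0.0) -> Optional[int]:
    """Return the best revision among the newest `window` results.

    A revision qualifies only if it compiled and its pass rate is at
    least `min_pass_rate`. Among qualifying revisions, the highest pass
    rate wins and ties go to the newer revision. Returns None if no
    revision qualifies.
    """
    recent = sorted(results, key=lambda r: r.revision)[-window:]
    candidates = [r for r in recent
                  if r.compiled and r.pass_rate >= min_pass_rate]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.pass_rate, r.revision)).revision
```

Setting `min_pass_rate=0.8` would correspond to the "80% of the tests should pass" rule floated earlier in the thread; leaving it at 0.0 reduces the requirement to "compiles".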

Comment 11 by, Sep 13 2014

Labels: -Pri-2 Pri-1 Infra-CodeYellow CY-TreeAlwaysOpen
Status: Assigned
OK, so it sounds like in the short term we should just use LKCR. inferno, can you make that change? We'd like to kill LKGR in the next week or two.

Comment 12 by, Sep 13 2014

Owner: ----
Status: Untriaged
I don't know the scripts on the Chromium build infrastructure side. Can you please find someone to work on this? I can make changes on the ClusterFuzz side as needed.

Comment 13 by, Sep 13 2014

Abhishek, who is responsible for maintaining clusterfuzz? Where is the logic for deciding to use lkgr? If you didn't set that up, do you know who did?

Comment 14 by, Sep 13 2014

Ojan: Marty (mbarbella@) and I are the developers and maintainers of ClusterFuzz. However, we don't manage the build archiving to the Google Cloud Storage bucket; that is done by the Chromium infrastructure team [we just pick up whatever gets stored there]. As far as I remember, some of the scripts were written by Alex (glider@) some time back. e.g. I don't know where the code hooks would go for changing from lkgr to lkcr. Alex, do you have any idea? Also, cc'ing Chase for suggestions.

Comment 15 by, Sep 13 2014

Status: Available
OK, that python file traces back to cf_archive_build in the master config file:

So, the clusterfuzz uploaders actually run on the lkgr master. Maybe we just need to switch the lkgr master over to using lkcr and then we're done? We need to make sure the other bots on this waterfall are OK with that, e.g. issue 407402.

Comment 16 by, Sep 13 2014

Seems like it, but someone from chrome infra should confirm.

'cf_archive_build': ActiveMaster.is_production_host, // Does ActiveMaster refer to the lkgr master? What is needed to change this to lkcr?
Technically, I think the only thing that is needed, is to change the branch from lkgr->lkcr here:

And as a consequence, all other occurrences of "lkgr" should be changed to "lkcr" as well in, and slaves.cfg.

Last but not least, the whole master should probably be renamed, right? Is that possible, or do we need to recreate it as a new lkcr master? This is only cosmetic, but it might be confusing if a master called lkgr pulls its builds from lkcr.

Comment 18 by, Oct 22 2014

Labels: -Infra-CodeYellow -CY-TreeAlwaysOpen
Status: Assigned (was: Available)
Just FYI how we do this in V8 (maybe too simple for chromium): We have _one_ bot as a gate for a respective clusterfuzz builder, e.g. our own ASAN builder/tester must pass building/testing on a specific revision. The tester then triggers the clusterfuzz builder for uploading the same revision to CF using the trigger recipe_module.

I don't know why the notion of lkgr/lkcr would be useful here, as it means that a bunch of other unrelated bots have to be green. E.g. why would clusterfuzz linux testing care if windows has a compile error in the same revision, as long as the linux build compiles and tests cleanly?

I understand that it might make sense when a set of several different tester bots are the gate for a particular clusterfuzz build type. But since we are on swarming, we mostly have one bot that covers all tests of a particular build configuration.
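The difference between the two gating policies discussed above can be illustrated with a small sketch. The bot names and both helper functions are hypothetical; this is not the V8 recipe or the trigger recipe_module, only a way to show why a Windows compile failure blocks an "all bots green" revision but not a single-gate upload for Linux.

```python
from typing import Mapping


def all_bots_green(bot_status: Mapping[str, bool]) -> bool:
    """LKGR/LKCR-style gate: every tracked bot must be green."""
    return bool(bot_status) and all(bot_status.values())


def gate_bot_green(bot_status: Mapping[str, bool], gate_bot: str) -> bool:
    """Single-gate policy: only the one bot guarding this build type
    has to be green; an unknown bot counts as not green."""
    return bot_status.get(gate_bot, False)


# Illustrative status for one revision (bot names are made up).
status = {"linux_asan_tester": True, "win_builder": False}

upload_under_lkgr_style = all_bots_green(status)
upload_under_single_gate = gate_bot_green(status, "linux_asan_tester")
```

Here the revision would be rejected under the all-green policy but accepted for the Linux ASan upload under the single-gate policy.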
Labels: -Pri-1 Pri-2
Downgrading to P2 to make sure any M53 blockers get precedence.
Components: -Infra Infra>Client>Chrome
Labels: -Pri-2 Pri-3
Labels: OS-All
Labels: -Pri-3 Pri-2
I'd like to come back to this and make some progress. I know inferno@ and I discussed how we thought we'd do this months ago, but I've now forgotten what we discussed :(.

@inferno, let's see if we can find some time this week to discuss a path forward.

Comment 27 by, Jan 15 2017

Sounds good, let's discuss this week.
Labels: -Pri-2 Pri-3
Owner: ----
Status: Available (was: Assigned)
Actually, I don't know when I'll have time to work on this :(.

Comment 29 by, Mar 7 2017

Status: Assigned (was: Available)
Assigning this to myself because I've been looking into turning LKGR. 
Status: Fixed (was: Assigned)
Closing this based on 

Clusterfuzz builders are not using lkgr or lkcr anymore.
