New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 786554 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

linux_site_isolation trybot should be always (non-optionally) included for CQ

Project Member Reported by lukasza@chromium.org, Nov 17 2017

Issue description

We would like the CQ to always include the linux_site_isolation trybot.


So far the bot was optional.  There is a presubmit (*) adding the bot to a subset of CLs that touch //content/browser/frame_host.


The extra coverage is desired because:

- It will avoid a situation when a CL that doesn't require linux_site_isolation lands and breaks the bot, effectively blocking all subsequent CLs that do require the linux_site_isolation bot

- We continue to ship more isolation modes (--isolate-extensions in 
M56, sign-in-isolation in M63, more to come)


The bot seems fast-enough and stable-enough to be added to all CQ runs:

- Sharding is enabled on the bot (including layout tests)

- I am not sure how to check how much time the trybots takes on average.  FWIW, looking at the FYI bot (**) that uses the same config, the runtime is usually around 30 minutes (lloking at builds 22730-22735).

- The FYI bot (**) using the same config stays mostly green (in particular - the most recent redness starting at build #22758 is caused by  issue 786539  - that issue would be prevented by CQ if it included linux_site_isolation)


(*) https://chromium.googlesource.com/chromium/src/+/e95c0692055388f752fd7b8db460024944176534/content/browser/frame_host/PRESUBMIT.py

(**) https://build.chromium.org/p/chromium.fyi/builders/Site%20Isolation%20Linux?numbuilds=200
 
Owner: dpranke@chromium.org
Status: Assigned (was: Untriaged)
dpranke@, could you please advise on how to make linux_site_isolation part of CQ?  What are the rough steps to make it happen?  Can I help by providing any additional data?

Right now both 1) linux_site_isolation trybot and 2) the "Site Isolation Linux" FYI bot are configured via the same file - via "Site Isolation Linux" entry in chromium/src/testing/buildbot/chromium.fyi.json.  I hear (from creis@/sky@) that as a first step we might want to move the FYI bot to the main waterfall (but I am not sure how one would do that).

BTW: Please assume talking to a silly Chromium engineer who knows almost nothing about configuring bots and/or CQ.  I've found https://chromium.googlesource.com/infra/infra/+/9673d2fe9ea3d4206d7ebd12dab5cb5ebf1cbd53/doc/users/services/commit_queue/index.md and it mentioned infra/config/cq.cfg.  Maybe I can just add an entry to the bucket associated with master.tryserver.chromium.linux?

Comment 2 by creis@chromium.org, Nov 17 2017

Labels: -Pri-3 Pri-1
Raising priority since our existing CQ coverage is going away in  issue 786555 .  I agree that having CQ coverage at all should include it being monitored by sheriffs, which presumably means being on the main waterfall?
Cc: jbudorick@chromium.org s...@google.com
Status: Started (was: Assigned)
So, fortunately, "Site Isolation Linux" (and the matching linux_site_isolation trybot) are actually the same build configuration as "Linux Builder" / "Linux Tests" (and linux_chromium_rel_ng).

So, all we need to do is make sure that the tests that are running on "Site Isolation Linux" are moved over to "Linux Tests" (and that we have the capacity for this). To do that, you'd just copy the "Site Isolation Linux" entries in //testing/buildbot/chromium.fyi.json over to the "Linux Tests" entry in //testing/buildbot/chromium.linux.json.

If you do that, you should be able to just make the change in a CL and run a linux_chromium_rel_ng try job and see it work.

However, given that you're running browser_tests and the layout tests, I'd guess we probably don't actually have the capacity at the moment (kinda depends on how often the Cq-Include-Trybots thing was triggering, compared to how often analyze would trigger it).

That said, next week should be light, traffic-wise, so we could probably land things now and see how it goes while we work on getting more capacity.

(jbudorick, smut, fyi ...)
I should have noted that once linux_site_isolation is part of the CQ, then some steps can be removed from linux_chromium_rel_ng (all the steps beginning with site_per_process - for example site_per_process_browser_tests, site_per_process_content_browsertests, site_per_process_content_unittests, etc.).  OTOH, some site-per-process tests are currently unique to linux_site_isolation bot (e.g. layout tests, interactive_ui_tests, extensions_browsertests ,extensions_unittests).  Not sure how much this helps lower the capacity requirements and if/how much this helps with capacity planning...

Comment 5 by jam@chromium.org, Nov 20 2017

Cc: jam@chromium.org
@lukasza: curious if we need layout tests running with site isolation on CQ? The heavy test suites (browser tests based) are mostly run, so apart from layout tests this this bug is just asking for a few fast tests to be added.
IMHO layout tests are right now the most common culprit for turning the Site Isolation Linux FYI bot red - adding CQ coverage for OOPIF support in layout tests would go a long way toward avoiding OOPIF-specific regresions.  Please note that "OOPIF support" is both A) --site-per-process and B) a long list of WPT-specific origins to isolate - see the current config of the Site Isolation Linux FYI bot for details. 

Since dpranke@ asked this offline, let me also clarify that the Site Isolation Linux FYI bot runs *all* layout tests - this gives OOPIF coverage for 1) http/tests, 2) external/wpt, 3) any virtual test suites that include 1 or 2 and 4) any other layout tests that happen to trigger a different behavior in site-per-process mode (not sure how many tests fall into 4th category in practice / I don't remember specific examples.)

creis@/alexmos@ - how much do we care about CQ coverage for other test suites?  I know that some OOPIF-specific functionality (e.g. drag-and-drop) is for example only verified by interactive_ui_tests, but in practice I don't think we've seen failures from this test suite on the FYI bot, right?  Same question for extension_browsertests/unittests, components_browsertests/unittests and other test suites covered by the FYI bot but not CQ.  CQ currently covers site_per_process_browser_tests, site_per_process_unit_tests, site_per_process_content_browsertests, site_per_process_content_unittests (based on https://cs.chromium.org/search/?q=file:json+file:testing+site_per_process&sq=package:chromium&type=cs).
#6: I think we've seen failures in interactive_ui_tests, but maybe not the other examples you brought up.  Theoretically, anything that uses content/ could behave differently with --site-per-process, which is how we've determined the list of test targets originally.  In practice, you're right that pretty much all of the failures we've ever seen were in content_unittests, content_browsertests, browser_tests, unit_tests, interactive_ui_tests, or layout tests.

I agree that getting layout tests with --site-per-process added to the CQ is the highest priority.  I'd also add interactive_ui_tests if possible, and maybe we can keep relying on our FYI bots for the rest.

Comment 8 by jam@chromium.org, Nov 21 2017

note: I think adding the extra browser test based harnesses you listed is pretty cheap compared to the existing ones that are running.

About layout_tests, is it possible to use flagsexpectations for this? that way we avoid having to do the isolate/archive steps twice? It's not clear that we really care to run the other virtual test suites with this in practice. I'm just wondering if there's a lighter weight way of getting most of the benefit of running all of layout tests. Could we try a virtual test suite and see what the new rate of failures is on the FYI bot?
FlagsExpectations is a specific mechanism separate from VirtualTestSuites. We could likely do either, or add an additional expectations file (like we use the filter files for gtests). I agree you'd want something where we could reuse the isolate.
dpranke@: any updates on this?  I think the highest priority for us is to get CQ coverage for layout tests running in --site-per-process (prior to r517627 we at least had it for content/browser/frame_host changes).  A bit lower priority would be to also get coverage for the other test targets that linux_site_isolation runs with --site-per-process (everything other than content_unittests, unit_tests, content_browsertests, browser_tests, which do have site-per-process CQ coverage on the linux bots).
Sorry for the delay; we have more capacity and should be able to add this now. I'll try to get to it tomorrow.
Thanks!  That will be very helpful to keep things working as we run trials of this mode, and help avoid problems like  issue 786539  (which happened multiple times).

Comment 13 by creis@chromium.org, Dec 11 2017

I just noticed that adding "Cq-Include-Trybots: master.tryserver.chromium.linux:linux_site_isolation" to my CL description of https://chromium-review.googlesource.com/c/chromium/src/+/817661 didn't seem to kick off linux_site_isolation try jobs for the CQ.  Could that be related to in-progress work here?  (Or do I have a typo?)  :)

I can file a separate issue about Cq-Include-Trybots if needed, but finishing this one (i.e., getting --site-per-process coverage on the waterfall) would be sufficient for me as well.  Thanks!
CL out for review / testing now in https://crrev.com/c/821554; if it turns up a bunch of bugs I'll likely punt this over to the site isolation folks to debug and land. 

Hopefully this will make the thing in #c13 moot.

Sorry for the delay :(.
Project Member

Comment 15 by bugdroid1@chromium.org, Dec 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7feb880d77166ca4cadb5a7f27acf1f2e54f97ad

commit 7feb880d77166ca4cadb5a7f27acf1f2e54f97ad
Author: Dirk Pranke <dpranke@chromium.org>
Date: Thu Dec 14 19:27:45 2017

Add remaining site_per_process test suites to main Linux bots.

This CL adds
  - site_per_process_components_browsertests
  - site_per_process_components_unittests
  - site_per_process_content_browsertests
  - site_per_process_content_unittests
  - site_per_process_extensions_browsertests
  - site_per_process_extensions_unittests
  - site_per_process_interactive_ui_tests
  - site_per_process_sync_integration_tests

to the Linux Tests and Linux Tests (dbg)(1) builders; this will
also add them to linux_chromium_rel_ng in the CQ. This CL also
removes these tests from the "Linux Site Isolation" builder on
chromium.fyi.

R=jam@chromium.org, creis@chromium.org, jbudorick@chromium.org
BUG= 786554 

Change-Id: I5b26655138a0120e17a88e1f6aee1ae73cbb8571
Reviewed-on: https://chromium-review.googlesource.com/821554
Reviewed-by: John Budorick <jbudorick@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Reviewed-by: Alex Moshchuk <alexmos@chromium.org>
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#524141}
[modify] https://crrev.com/7feb880d77166ca4cadb5a7f27acf1f2e54f97ad/testing/buildbot/chromium.fyi.json
[modify] https://crrev.com/7feb880d77166ca4cadb5a7f27acf1f2e54f97ad/testing/buildbot/chromium.linux.json
[modify] https://crrev.com/7feb880d77166ca4cadb5a7f27acf1f2e54f97ad/testing/buildbot/test_suite_exceptions.pyl
[modify] https://crrev.com/7feb880d77166ca4cadb5a7f27acf1f2e54f97ad/testing/buildbot/test_suites.pyl
[modify] https://crrev.com/7feb880d77166ca4cadb5a7f27acf1f2e54f97ad/testing/buildbot/waterfalls.pyl

Comment 16 by creis@chromium.org, Dec 15 2017

Thanks!  This is fantastic to see.

Do you know when the layout tests will move over as well?  (We're continuing to see changes land that turn those red as in  issue 795333 , so it'd be great to cover them in try jobs if possible.)
I'm going to check how are capacity is doing over the past 24 hours and, if things look okay, add that in today (I expect we should have plenty of capacity given things getting quieter for the holidays).
I've hit a complication, in that I didn't realize that the way the layout test steps are written, we shouldn't actually run two different variants  of the layout tests in a single build on a single builder, because the `archive_webkit_test_results` step for the second build will overwrite the first one's results.

I'm not sure how hard this will be to fix, as the assumption that there's only one `layout-test-results` directory per build may be fairly hard-coded in a few places.
If I can't readily fix this, we can either go back to running this on another bot, or turn this into VirtualTestSuites.

Comment 19 by creis@chromium.org, Dec 17 2017

Cc: lukasza@chromium.org
A virtual test suite might be a good option.  Is there a way to do that on certain bots instead of all of them (to limit the extra workload)?
trybots "mirror" waterfall bots, so, for example, linux_chromium_rel_ng runs the same tests the same way as "Linux Tests".

So, there's no way to run things as a VirtualTestSuite on the trybot but run everything on *that* waterfall bot.

It would be possible to run everything on a different waterfall bot (e.g., if we wanted to keep the FYI bots around), but it's not clear how useful that'd be.

I'll work on this today and see what I can come up with; one way or another I should have something running by the end of the day.
I think the main concern in #c19 was to only run --site-per-process+--isolate-origins=wpt-origins version of layout tests once (e.g. on linux_chromium_rel_ng) and avoid running it on all bots that run layout tests (e.g. Mac and Windows).  I am not sure if this is possible via VirtualTestSuite mechanism (which I assume is platform agnostic).

Additionally, I see that most VirtualTestSuites try to apply only to a subset of layout tests (e.g. via "base": "fast/dom/shadow" declaration).  This might be possibly different from our requirements:
- Today Site Isolation Linux bot runs *all* layout tests with --site-per-process+--isolate-origins=wpt-origins
- alexmos@ and creis@ might know more, but I think running http/tests and external/wpt is P0 (and running other layout tests is either P1 or P2).
> I think the main concern in #c19 was to only run 
> --site-per-process+--isolate-origins=wpt-origins version of layout 
> tests once (e.g. on linux_chromium_rel_ng) and avoid running it on all 
> bots that run layout tests (e.g. Mac and Windows).  I am not sure if this 
> is possible via VirtualTestSuite mechanism (which I assume is platform agnostic).

The VirtualTestSuites themselves are platform agnostic, but you can easily [ Skip ] them on some platforms in TestExpectations.

> Additionally, I see that most VirtualTestSuites try to apply only to a subset of layout tests 
> (e.g. via "base": "fast/dom/shadow" declaration).  This might be possibly different from our 
> requirements:
> - Today Site Isolation Linux bot runs *all* layout tests with
> --site-per-process+--isolate-origins=wpt-origins
> - alexmos@ and creis@ might know more, but I think running http/tests and external/wpt is 
> P0 (and running other layout tests is either P1 or P2).

It is true that VirtualTestSuites cannot (easily) be used to run *everything* again with a different flag, and it would be an abuse of the feature to try and do so. But, running just http/tests again would be fine and is exactly what it's for. I do understand the your requirements (from #c6 and #c20), and I do think that running the test step twice is the better way to do things if you really do want to run everything.

As an aside, external/wpt is basically 50% of the tests at this point, and covers pretty much the whole platform, so saying that you need to run external/wpt is basically saying "I need to test the whole platform". It strikes me that that's probably not really true, but I expect that it's hard to say *which* subsets of external/wpt really matter.


Update: it looks like the archive upload location isn't that hard-coded, and fixing this should be straightforward. I'm testing it now.
Project Member

Comment 24 by bugdroid1@chromium.org, Dec 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/9dbcaaf60aa0960572a31ced8fdc92b0060d9595

commit 9dbcaaf60aa0960572a31ced8fdc92b0060d9595
Author: Dirk Pranke <dpranke@chromium.org>
Date: Tue Dec 19 18:06:24 2017

Add custom step name to layout test archive uploads.

Currently it's only possible to upload one set of layout test
results per build. This is a problem if you want to run the
tests twice, with different configurations. This CL changes things
so that the step name will be included in the path to the upload
if we're using a step name other than 'webkit_tests' or
'webkit_layout_tests' (ignoring the with/without patch part).

R=jbudorick@chromium.org, qyearsley@chromium.org
BUG= 786554 

Change-Id: I198add586d8fedc9cd6f7ecb6572b6e9b3030476
Reviewed-on: https://chromium-review.googlesource.com/833092
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>

[modify] https://crrev.com/9dbcaaf60aa0960572a31ced8fdc92b0060d9595/scripts/slave/README.recipes.md
[add] https://crrev.com/9dbcaaf60aa0960572a31ced8fdc92b0060d9595/scripts/slave/recipe_modules/chromium_tests/tests/steps/generate_isolated_script.expected/custom_webkit_tests_step_name.json
[modify] https://crrev.com/9dbcaaf60aa0960572a31ced8fdc92b0060d9595/scripts/slave/chromium/archive_layout_test_results.py
[modify] https://crrev.com/9dbcaaf60aa0960572a31ced8fdc92b0060d9595/scripts/slave/recipe_modules/chromium_tests/tests/steps/generate_isolated_script.py
[modify] https://crrev.com/9dbcaaf60aa0960572a31ced8fdc92b0060d9595/scripts/slave/recipe_modules/chromium_tests/steps.py

Status: Fixed (was: Started)
I think this is done, and we can remove the FYI bot.
Project Member

Comment 27 by bugdroid1@chromium.org, Dec 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/6ced07da6d703937f3d2418fbcb81cace5abe3fb

commit 6ced07da6d703937f3d2418fbcb81cace5abe3fb
Author: Dirk Pranke <dpranke@chromium.org>
Date: Thu Dec 21 03:03:41 2017

Fix how we determine the custom step name for layout_tests-based-tests.

crrev.com/c/833092 changed how we set the step name for
archive_layout_test_results to better support running webkit_test-based
tests multiple times in a build, for the site_per_process tests.
This, however, affected rebaseline-cl, which is bad.

This CL attempts to fix that by using a custom step name less often.

R=robertma@chromium.org
BUG= 786554 

Change-Id: I9e30f09367b141d9e5b61e9f9dd0fa0f042a040d
Reviewed-on: https://chromium-review.googlesource.com/837645
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Dirk Pranke <dpranke@chromium.org>

[modify] https://crrev.com/6ced07da6d703937f3d2418fbcb81cace5abe3fb/scripts/slave/recipe_modules/chromium_tests/steps.py

Comment 28 by creis@chromium.org, Dec 23 2017

Thank you!!  This is a fantastic help for preventing regressions in Site Isolation modes.  Looks like the Site Isolation Linux bot can be retired whenever you want.  (It appears to still be building but not doing much else.)
Project Member

Comment 29 by bugdroid1@chromium.org, Jan 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d946846ba23387d87af71292b5cf50e100404941

commit d946846ba23387d87af71292b5cf50e100404941
Author: Dirk Pranke <dpranke@chromium.org>
Date: Fri Jan 12 02:37:52 2018

Run the layout tests under site-per-process on Linux debug.

Previously we were only running them on Linux Tests (release),
so this wouldn't catch DCHECKs and other debug-only failures,
and we'd have tests failing on the CQ but not realize it :(.

TBR=jbudorick@chromium.org, lukasza@chromium.org
NOTRY=true
BUG= 786554 

Change-Id: I05259069fd2ab390a5372b6c330b1500231c98a3
Reviewed-on: https://chromium-review.googlesource.com/863163
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#528858}
[modify] https://crrev.com/d946846ba23387d87af71292b5cf50e100404941/testing/buildbot/chromium.linux.json
[modify] https://crrev.com/d946846ba23387d87af71292b5cf50e100404941/testing/buildbot/test_suite_exceptions.pyl

Project Member

Comment 30 by bugdroid1@chromium.org, Jan 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/948f1199acde5bcae335808d7624b143e4c9d4d8

commit 948f1199acde5bcae335808d7624b143e4c9d4d8
Author: Friedrich Horschig <fhorschig@chromium.org>
Date: Fri Jan 12 11:54:05 2018

Revert "Run the layout tests under site-per-process on Linux debug."

This reverts commit d946846ba23387d87af71292b5cf50e100404941.

Reason for revert: tests apparently look for content_shell in out/Release (https://crbug.com/801478) so this would need to be configured as well when enabling this.

Original change's description:
> Run the layout tests under site-per-process on Linux debug.
> 
> Previously we were only running them on Linux Tests (release),
> so this wouldn't catch DCHECKs and other debug-only failures,
> and we'd have tests failing on the CQ but not realize it :(.
> 
> TBR=jbudorick@chromium.org, lukasza@chromium.org
> NOTRY=true
> BUG= 786554 
> 
> Change-Id: I05259069fd2ab390a5372b6c330b1500231c98a3
> Reviewed-on: https://chromium-review.googlesource.com/863163
> Reviewed-by: Dirk Pranke <dpranke@chromium.org>
> Reviewed-by: John Budorick <jbudorick@chromium.org>
> Commit-Queue: Dirk Pranke <dpranke@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#528858}

TBR=dpranke@chromium.org,lukasza@chromium.org,jbudorick@chromium.org

Change-Id: I27fa9abe8faf10390f074cf5fcb976575e2e8e4c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  786554 
Reviewed-on: https://chromium-review.googlesource.com/864022
Reviewed-by: Friedrich Horschig <fhorschig@chromium.org>
Commit-Queue: Friedrich Horschig <fhorschig@chromium.org>
Cr-Commit-Position: refs/heads/master@{#528928}
[modify] https://crrev.com/948f1199acde5bcae335808d7624b143e4c9d4d8/testing/buildbot/chromium.linux.json
[modify] https://crrev.com/948f1199acde5bcae335808d7624b143e4c9d4d8/testing/buildbot/test_suite_exceptions.pyl

Comment 31 by s...@google.com, Feb 15 2018

Cc: smut@chromium.org

Comment 32 by s...@google.com, Feb 15 2018

Cc: -s...@google.com

Sign in to add a comment