Issue 874695

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Layout tests seem more flaky with site-per-process

Project Member Reported by lukasza@chromium.org, Aug 15

Issue description

This is a follow-up to the issues raised in https://groups.google.com/a/chromium.org/d/topic/chromium-dev/cIycVUIowzU/discussion

I looked at the flakiness dashboard for the site_per_process_webkit_layout_tests step:
1) https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=site_per_process_webkit_layout_tests&sortColumn=slowest
and compared it with the flakiness dashboard for webkit_layout_tests:
2) https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=webkit_layout_tests&showFlaky=true&builder=chromium.linux%3ALinux%20Tests

AFAICT, the dashboard reports 3958 flaky tests with site-per-process and only 185 flaky tests without site-per-process - please see the attached results.
 
Attachments:
layout-tests-flaky-with-site-per-process (320 KB)
layout-tests-flaky-without-site-per-process (10.0 KB)

$ cat ~/scratch/layout-tests-flaky-with-site-per-process | cut -f 1 | cut -f 1-2 -d '/' | sort | uniq -c | sort -n | tail -10
     17 virtual/layout_ng
     23 css3/filters
     48 virtual/gpu
     56 virtual/video-surface-layer
    138 virtual/threaded
    158 virtual/outofblink-cors
    167 virtual/outofblink-cors-ns
    307 virtual/layout_ng_experimental
    736 http/tests
   1939 external/wpt
Many tests seem to be slow (and possibly falling off a timeout cliff with site-per-process?):

$ cat ~/scratch/layout-tests-flaky-with-site-per-process | grep '[[:space:]][0-9][0-9]s$' | wc -l
84

$ cat ~/scratch/layout-tests-flaky-with-site-per-process | grep '[[:space:]][5-9]s$' | wc -l
617
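
The two greps above count fixed ranges (10-99s and 5-9s). A minimal sketch of a more general count, assuming each line of the dashboard dump ends in a whitespace-separated "<seconds>s" field (which is what the greps rely on); count_slow is a hypothetical helper name, not an existing tool:

```shell
# Count tests whose slowest run is at least MIN_SECONDS.
# Assumption: the last whitespace-separated field of each line looks like "12s".
# Lines that do not end in such a field are skipped.
count_slow() {
  local min_seconds="$1" file="$2"
  awk -v min="$min_seconds" '
    $NF ~ /^[0-9]+s$/ {
      secs = $NF
      sub(/s$/, "", secs)          # strip the trailing "s"
      if (secs + 0 >= min) n++     # numeric comparison, so 100s is counted too
    }
    END { print n + 0 }            # prints 0 when nothing matched
  ' "$file"
}
```

For example, `count_slow 10 ~/scratch/layout-tests-flaky-with-site-per-process` would also include any tests at 100s or above, which the two-digit grep misses.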
FWIW, I assume that I am comparing apples-to-apples here (e.g. despite the fact that the first URL doesn't explicitly specify builder= or showFlaky=, both URLs open dashboards that seem to be restricted to 1) "chromium.linux:Linux Tests" waterfall bot and 2) flaky tests).
Cc: nednguyen@chromium.org liaoyuke@chromium.org st...@chromium.org
+stgao / liaoyuke - you should be able to confirm this pretty easily with infra data, right?

Back In The Day, I had a rule of thumb that any test that took longer than 1 second to run should probably be marked as Slow, because the variance in test times we'd see on the bots might lead to some timeouts. (And this was definitely true for things slower than 2-3 seconds). 

I expect that we've had a lot of tests either get added or get slower where we haven't done this, and it's possible that site-per-process takes an already oversubscribed bot (due to running too many content_shells in parallel) and *way* oversubscribes it, making things too slow.

We could test this by either increasing the timeout values on the tests, or reducing the amount of parallelism (with --jobs values < 8) and seeing if that helped.
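
A rough sketch of the parallelism experiment, assuming the test runner accepts the --jobs flag mentioned above in the form --jobs=N. The runner command is supplied by the caller, and sweep_jobs is a hypothetical helper, not part of the layout test tooling:

```shell
# Hypothetical helper: run a caller-supplied test command once per --jobs value
# and report the exit status of each run.
# Assumption: the command accepts "--jobs=N" appended to its arguments.
sweep_jobs() {
  local jobs status
  for jobs in 8 4 2 1; do
    if "$@" --jobs="$jobs"; then
      status=0
    else
      status=$?
    fi
    echo "jobs=$jobs exit_status=$status"
  done
}
```

Comparing the timeout counts between runs would then show whether oversubscription is the cause.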
Cc: -nednguyen@chromium.org robertma@chromium.org nedngu...@google.com
Below are data for flaky tests that caused retries of CQ builds/attempts from May 1 to July 31.
I haven't checked hidden flakes yet (cases where the tests passed after 2 retries).
I haven't checked the data for August yet either.

Based on the data, my preliminary conclusion is: site_per_process_webkit_layout_tests has fewer flake occurrences overall, but more distinct tests with low flakiness.

1. No special filtering:
test_target                            total_flake_occurrences   distinct_flaky_tests
webkit_layout_tests                    6242                      1791
site_per_process_webkit_layout_tests   4322                      3034


2. Ignore tests that had only one flake occurrence
test_target                            total_flake_occurrences   distinct_flaky_tests
webkit_layout_tests                    5308                      857
site_per_process_webkit_layout_tests   1619                      331

3. Ignore tests that had more than one flake occurrence
test_target                            total_flake_occurrences   distinct_flaky_tests
site_per_process_webkit_layout_tests   2703                      2703
webkit_layout_tests                    934                       934
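
To make the "more tests of low flakiness" reading concrete, here is a small sketch that derives two ratios from the numbers in tables 1 and 3: average flake occurrences per flaky test, and the share of flaky tests that flaked exactly once. flake_stats is a hypothetical helper used only for this arithmetic:

```shell
# Arguments: name, total flake occurrences (table 1),
# distinct flaky tests (table 1), tests with exactly one flake (table 3).
flake_stats() {
  local name="$1" total="$2" distinct="$3" single="$4"
  awk -v n="$name" -v t="$total" -v d="$distinct" -v s="$single" 'BEGIN {
    printf "%s avg_flakes_per_test=%.2f single_flake_share=%.1f%%\n", n, t / d, 100 * s / d
  }'
}

flake_stats webkit_layout_tests 6242 1791 934
flake_stats site_per_process_webkit_layout_tests 4322 3034 2703
```

This prints roughly 3.49 flakes per flaky test (52.1% single-flake) for webkit_layout_tests versus 1.42 (89.1% single-flake) for site_per_process_webkit_layout_tests.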


For those who would like to play with the data, here is a query to start with. I can sort out any permission issues with it.
https://pantheon.corp.google.com/bigquery?project=findit-for-me&folder&organizationId=433637338589&j=bquxjob_2f70f9c5_16540abfd9b&page=queryresults

A handful of tests with slowest_run >= 10s are already present in third_party/WebKit/LayoutTests/SlowTests, but most aren't - let me put together a CL that fixes this.

# Missing from SlowTests:

$ for i in `cat ~/scratch/layout-tests-flaky-with-site-per-process | grep '[[:space:]][0-9][0-9]s$' | cut -f 1`; do if ! grep -q "$i" third_party/WebKit/LayoutTests/SlowTests; then echo $i; fi; done | wc -l
72

# Already present in SlowTests:

~/src/chromium3/src on spinner-of-dead-navigation
$ for i in `cat ~/scratch/layout-tests-flaky-with-site-per-process | grep '[[:space:]][0-9][0-9]s$' | cut -f 1`; do if grep -q "$i" third_party/WebKit/LayoutTests/SlowTests; then echo $i; fi; done | wc -l
12
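
The two loops above can be folded into one function. This is only a sketch: it assumes the same "<seconds>s" trailing field as before and that test names contain no whitespace, and it uses grep -F so that characters such as "." or "[" in a test path are matched literally rather than as a regex. missing_from_slowtests is a hypothetical name:

```shell
# Print tests from FLAKY_FILE with slowest run >= MIN_SECONDS that are not
# mentioned anywhere in SLOWTESTS_FILE.
missing_from_slowtests() {
  local min_seconds="$1" flaky_file="$2" slowtests_file="$3"
  awk -v min="$min_seconds" '
    $NF ~ /^[0-9]+s$/ {
      secs = $NF
      sub(/s$/, "", secs)
      if (secs + 0 >= min) print $1   # first field is the test name
    }
  ' "$flaky_file" |
  while IFS= read -r test_name; do
    # Literal substring match; "--" guards against names starting with "-".
    if ! grep -qF -- "$test_name" "$slowtests_file"; then
      printf '%s\n' "$test_name"
    fi
  done
}
```

Because this is a substring match, a test whose name is a prefix of another listed test would be treated as already present, so the output is a starting point for a CL rather than a final list.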


WIP CL: https://crrev.com/c/1178226
We should think about automating keeping SlowTests up to date, just like we have w/ flaky tests.
Summary: Layout tests seem more flaky with site-per-process (was: Layout tests are significantly more flaky with site-per-process)
Project Member

Comment 10 by bugdroid1@chromium.org, Aug 17

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/81862268bd1fca446fcad20865eaa9ab64fad14f

commit 81862268bd1fca446fcad20865eaa9ab64fad14f
Author: Lukasz Anforowicz <lukasza@chromium.org>
Date: Fri Aug 17 16:01:51 2018

Use flakiness dashboard snapshot to identify layout tests slower than 3s

This CL has been put together by
1. Going to flakiness dashboard for site_per_process_webkit_layout_tests
   (see the links at the top of https://crbug.com/874695)
2. Grabbing flaky tests with slowest_run >= 3 and adding them
   to SlowTests (unless they've been already present)

Bug: 874695
Change-Id: I96c7befd3a654b1be8291921c51d539b5f6fbfb8
Reviewed-on: https://chromium-review.googlesource.com/1178226
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org>
Cr-Commit-Position: refs/heads/master@{#584088}
[modify] https://crrev.com/81862268bd1fca446fcad20865eaa9ab64fad14f/third_party/WebKit/LayoutTests/SlowTests

Status update:

- I am still struggling with https://crrev.com/c/1178465 - having a test in SlowTests is not sufficient to get rid of flaky timeouts. I've opened issue 875430 to track this aspect of the problem.

- I've opened issue 875419 to follow up on dpranke@'s suggestion from #c8 to automate updating LayoutTests/SlowTests.
Blockedon: 834185
One source of flakiness has been identified in issue 834185. Let's revisit after this issue gets fixed.
Blockedon: -834185
Cc: japhet@chromium.org
Issue 855816 has been merged into this issue.
