webkit_layout_tests failing on chromium.webkit/WebKit Win7 (dbg) |
||||||
Issue descriptionwebkit_layout_tests failing on chromium.webkit/WebKit Win7 (dbg) Builders failed on: - WebKit Win7 (dbg): https://build.chromium.org/p/chromium.webkit/builders/WebKit%20Win7%20%28dbg%29 There have been a series of flakes in the test: virtual/longtask-v2/http/tests/performance-timing/longtask-v2/longtask-v8compile.html Example failing build: https://uberchromegw.corp.google.com/i/chromium.webkit/builders/WebKit%20Win7%20%28dbg%29/builds/11615 Nothing in the logs sticks out to me. maxig@ has done work in this area lately. Could you take a look?
,
Oct 23 2017
Assigned owner, taking out of sheriff queue.
,
Oct 23 2017
Sample from a flake: --- /b/s/w/iobVGBUD/layout-test-results/virtual/longtask-v2/http/tests/performance-timing/longtask-v2/longtask-v8compile-expected.txt +++ /b/s/w/iobVGBUD/layout-test-results/virtual/longtask-v2/http/tests/performance-timing/longtask-v2/longtask-v8compile-actual.txt @@ -1,4 +1,4 @@ This is a testharness.js-based test. -PASS Performance longtask entries with script-compile attribute are observable. +FAIL Performance longtask entries with script-compile attribute are observable. assert_equals: expected "self" but got "unknown" Harness: the test ran to completion.
,
Oct 23 2017
Seems like the flakiness is caused by the wrong assumption of the test that only the artificial v8compile-longtask is to be present, but there could be non-artificial ones.
,
Oct 23 2017
It's also interesting that flakiness only happens on Webkit win7 (dbg).
,
Oct 23 2017
+tdresser@, panicker@
,
Jan 29 2018
,
Jan 29 2018
The failure tells us that when it's run on Win7 dbg, there is no frame attached to the execution context - https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/timing/Performance.cpp?rcl=009da37a65b88b9fb8ca4677bb382622fa2db04f&l=214 I've looked into the long task part of the codes. We doesn't treat different OS differently. I feel like whether the context is 'self' or 'unknown' is a bit out of the scope of this test. Can we just accept both 'self' and 'unknown' in the test case?
,
Jan 29 2018
It's flaky, so my guess is that the Win7 dbg machine is slow and sometimes produces a different kind of longtask (not the compile one), which you need to consider in your test by disregarding longtasks with longtask.name == 'unknown'.
,
Jan 29 2018
,
Jan 29 2018
Yep. Nicolas' explanation is very likely to be true. By the way, I couldn't reproduce it on the windows desktop locally, so it's possible that the problem is not on Windows but on 'dbg' - The windows workstation is not slow enough to reproduce the bug. And also, there are two flaky tests and different failures here: For http/tests/performance-timing/longtask-v2/longtask-v8compile.html 1. -FAIL Performance longtask entries with script-compile attribute are observable. assert_false: script-compile is not captured. expected false got true +FAIL Performance longtask entries with script-compile attribute are observable. assert_equals: expected "self" but got "unknown" Harness: the test ran to completion. 2. -FAIL Performance longtask entries with script-compile attribute are observable. assert_false: script-compile is not captured. expected false got true +FAIL Performance longtask entries with script-compile attribute are observable. assert_equals: expected "self" but got "unknown" Harness: the test ran to completion. 3. -FAIL Performance longtask entries with script-compile attribute are observable. assert_false: script-compile is not captured. expected false got true +FAIL Performance longtask entries with script-compile attribute are observable. assert_greater_than: expected a number greater than 50 but got 49.999999999272404 Harness: the test ran to completion. 4. -FAIL Performance longtask entries with script-compile attribute are observable. assert_false: script-compile is not captured. expected false got true +FAIL Performance longtask entries with script-compile attribute are observable. assert_equals: Exactly one entry is expected. expected 1 but got 2 Harness: the test ran to completion. The proposed fix would be just skipping this test case other than expecting a particular cause of failure. By its nature it could fail for different reasons and they are not interesting to us. For virtual/longtask-v2/http/tests/performance-timing/longtask-v2/longtask-v8compile.html 1. -PASS Performance longtask entries with script-compile attribute are observable. +FAIL Performance longtask entries with script-compile attribute are observable. assert_equals: Exactly one entry is expected. expected 1 but got 2 Harness: the test ran to completion. 2. -PASS Performance longtask entries with script-compile attribute are observable. +FAIL Performance longtask entries with script-compile attribute are observable. assert_false: script-compile is not captured. expected false got true Harness: the test ran to completion. 3. -PASS Performance longtask entries with script-compile attribute are observable. +FAIL Performance longtask entries with script-compile attribute are observable. assert_equals: expected "self" but got "unknown" Harness: the test ran to completion. The proposed fix would be filtering the 'unknown', remove the checking of entrylist's length, and wait for the next entrylist other than declare not found if the expected entry is not found in the current entrylist.
,
Jan 30 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5b205831c0c46f617d23b128e22e1700a9c7d409 commit 5b205831c0c46f617d23b128e22e1700a9c7d409 Author: Liquan (Max) Gu <maxlg@chromium.org> Date: Tue Jan 30 20:48:54 2018 [LongTaskV2] Fix flakiness in longtask-v8compile test The original test is flaky in both virtual test and nonvirtual test. For the non-virtual test, it was expected to fail for one reason but it's reasonable to fail for other reasons as well. We should include those failures. For the virtual test, the test failed since long tasks could be present in a slow environment. We fix it by ignoring those longtasks & attributions that are not generated by us. Also, we allow waiting for multiple entry lists until we find the interesting one. Bug: 775626 Change-Id: Ief2a3bc4016dc3f808cb17be0122e35f75b58f9a Reviewed-on: https://chromium-review.googlesource.com/891571 Commit-Queue: Liquan Gu <maxlg@chromium.org> Reviewed-by: Timothy Dresser <tdresser@chromium.org> Cr-Commit-Position: refs/heads/master@{#533016} [modify] https://crrev.com/5b205831c0c46f617d23b128e22e1700a9c7d409/third_party/WebKit/LayoutTests/TestExpectations [delete] https://crrev.com/ef42564665d915f2d2294d0bfa3a9ab5632c9777/third_party/WebKit/LayoutTests/http/tests/performance-timing/longtask-v2/longtask-v8compile-expected.txt [modify] https://crrev.com/5b205831c0c46f617d23b128e22e1700a9c7d409/third_party/WebKit/LayoutTests/http/tests/performance-timing/longtask-v2/longtask-v8compile.html
,
Mar 2 2018
The test is no longer flaky. Mark as fixed. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by maxlg@chromium.org
, Oct 17 2017