New issue
Advanced search Search tips

Issue 884776 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 21
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 888734
issue 881991
issue 882969



Sign in to add a comment

webkit_layout_test fails with INVALID_TEST_RESULTS

Project Member Reported by erikc...@chromium.org, Sep 17

Issue description

In this tryjob:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/86368

webkit_layout_tests fails with INVALID_TEST_RESULTS, which prevents 'retry with patch' logic in  Issue 883321  from kicking in.

An initial glance suggests that there were no infra issues. The failed shard: https://chromium-swarm.appspot.com/task?id=3ffffc3bf8298610&refresh=10&show_raw=1

hit a DCHECK very early on and never ran any tests.


 
Screen Shot 2018-09-17 at 1.11.37 PM.png
98.9 KB View Download
Cc: robertma@chromium.org
+robertma who knows better of the layout test runner.
This particular issue is being tracked here: https://bugs.chromium.org/p/chromium/issues/detail?id=773617#c11

Although there's no traction there yet. 

Regardless, there's no reason this error should prevent 'retry with patch' from running. I'm going to mull on this some more and figure out the best way to move forward.
Erik is correct. Unfortunately, no one seems to understand the root cause of the occasional crashes on Windows.

We could change the runner to produce a valid but empty result when things like this happen, but that would have its own problems: results would become incomplete and the problem would be even harder to discover.
Re #2: I think there's some logic in the re-running without patch logic which says that we should only re-run the specific tests that failed. If there aren't any test results, then the recipes don't know which tests failed, and so it doesn't know what it should re-run, and it doesn't do anything. Or something like that. This might have changed since you've landed CLs, so I trust you looking at the code.
Cc: kbr@chromium.org
Making progress.

If I had a CHECK(false) to content itself, e.g.  BrowserMainLoop::PreCreateThreads, the crash is handled gracefully by the test runner. This will output [after crashing a bunch and retrying]...

"""
0 tests ran as expected, 7 didn't:
    svg/dom/SVGScriptElement/script-async-attr.svg
    svg/dom/SVGScriptElement/script-clone-rerun-self.svg
    svg/dom/SVGScriptElement/script-clone-rerun.svg
    svg/dom/SVGScriptElement/script-external-no-multiple-load.html
    svg/dom/SVGScriptElement/script-reexecution.svg
    svg/dom/SVGScriptElement/script-set-href.svg
    svg/dom/SVGScriptElement/script-type-attribute.svg
"""

However, if I add a CHECK(false) to AtExitManager::~AtExitManager(), instead I see the same error as above. None of the tests run and the test runner early exits. This is caused by a crash in the system dependency checker. Passing "--nocheck-sys-deps" works around this problem.

So there are actually two problems here:
  1) The system dependency checker is flakily hitting a DCHECK
  2) This causes the test_runner to early exit with no results.
The system dependency checker was added in 2010:
https://trac.webkit.org/changeset/54094/webkit

It's not clear to me what utility it provides now. If there are missing system dependencies, the tests will fail. I don't think we need a separate layer of logic to test system dependencies [which in this case is flaky].
Waiting for dpranke to respond on this crbug since he added the logic in 2010.
The purpose of the check was to make sure that Windows is configured correctly (right system theme and font settings) so that certain tests would pass; if the system wasn't configured correctly, you'd get a clear error at the beginning of the test run telling you what you'd need to fix, rather than random test failures with inscrutable errors.

The layout test harness was supposed to handle this failure cleanly, and there should be a way for that error to propagate up through the recipe cleanly as well (e.g., as an infra failure).

The check shouldn't ever be flaky, unless what's happening is that you're bouncing between machines that are configured correctly and machines that aren't.

It's been so long since I wrote this code (which was done for XP) that I don't have a real sense now of what'd happen if we ran it on a machine that wasn't configured correctly, or if that's even still possible, although I'd expect it to be possible.

I'd be inclined to argue that we should figure out what's going wrong in the test harness and recipe code (if anything) that's leading to INVALID_TEST_RESULTS rather than an infra error.

You could, though, decide that all of the checking I described in the first paragraph wasn't necessary, but I suspect that'd just lead to flakes elsewhere. If you were to remove the checks, I'd try to figure out which tests might fail and get approval for this from, say, eae@, since the tests that are likely to fail are text and layout-related.

Arguably you could also move this check to part of the swarming config and make sure that the bot ran the check on startup and self-quarantined, instead.
Cc: e...@chromium.org
+ eae

To clarify, the failures in the layout test harness are not due to misconfigurations of system settings. They are due to the checker flakily hitting a DCHECK during shutdown [causing a crash]. This then causes run_web_tests.py to barf up the exit_codes.SYS_DEPS_EXIT_STATUS error code.

The checker runs Content Shell with the flag "--check-layout-test-sys-deps". The business logic is actually a relatively short function:
https://cs.chromium.org/chromium/src/content/shell/app/blink_test_platform_support_win.cc?type=cs&q=CheckLayoutSystemDeps&g=0&l=45

It is not clear to me why we need to bring up Content Shell at all to check for system configuration. We happen do so using some custom logic that looks similar to, but different from the usual Content Shell logic. Presumably, these differences cause some subtle threading issues that cause us to hit the DCHECK during shut down.
https://cs.chromium.org/chromium/src/content/shell/browser/layout_test/layout_test_browser_main.cc?type=cs&q=kCheckLayoutTestSysDeps&sq=package:chromium&g=0&l=160

If this piece of code is actually useful [are there examples in the last few years where it provided positive utility?] I can move the business logic into a tiny executable and run that from run_web_tests.py. However, I think there's also a chance that this is just legacy code which isn't providing utility, in which case I will remove the logic altogether.
As noted above, I don't know how useful, if at all, that check is. I don't have a strong opinion on whether or not to keep it around.
The check is there to catch miss-configured bots early and has helped quite a bit with that over the years. It essentially checks some of the key settings that would affect layout results, as dpranke described in comment 8.

If this check is failing or flaky that is quite concerning and it is likely that whatever causes it to fail would also cause actual layout tests to fail. Removing it doesn't seem like a good option to me, figuring out why it started failing now, after working fine for 8 years, would be my suggestion.

eae, dpranke -- thanks for the feedback. I have a solution that is simpler, removes the "--check-layout-test-sys-deps" step, but will still catch misconfigurations. 

Summary: We will run system configuration checks during early in the life cycle of layout tests [this is already mostly happening]. This way we don't need a separate "--check-layout-test-sys-deps" step. This allows us to remove a bunch of rather suspect code that is only used for "--check-layout-test-sys-deps" and likely responsible for DCHECKs being hit during shutdown. This way any errors that do occur will be correctly caught by the test harness.

===========Responses to earlier comments==============
> it is likely that whatever causes it to fail would also cause actual layout tests to fail

I don't think so. The DCHECK is being hit during shutdown, due to tasks being posted to the main thread after it's been shut down. If we take a look at the main functions:
https://cs.chromium.org/chromium/src/content/shell/browser/layout_test/layout_test_browser_main.cc?type=cs&q=kCheckLayoutTestSysDeps&sq=package:chromium&g=0&l=158

We see that "--check-layout-test-sys-deps" uses different logic that layout tests. Suspiciously, it tries to run a single task on the main thread and then exit, but doesn't do anything to prevent further tasks from being posted from other threads. 

> figuring out why it started failing now

I started looking into webkit layout test flakiness recently. It's not clear to me that this is a recent issue though. 

===========More details on how "--check-layout-test-sys-deps" works ==============

"--check-layout-test-sys-deps" is intended to run CheckLayoutSystemDeps() and BlinkTestPlatformInitialize() and to print a nice, human readable error message if either returns false. Let's modify them to return an error and see what happens:

"""
 54 bool CheckLayoutSystemDeps() {                                                  
 55   std::cerr << "Error in CheckLayoutSystemDeps" << "\n";                        
 56   return false;      
...
python third_party/blink/tools/run_web_tests.py -t gn svg/dom/SVGScriptElement 
...
System dependencies check failed.
To override, invoke with --nocheck-sys-deps

Error in CheckLayoutSystemDeps
"""

Okay -- if there's an error in CheckLayoutSystemDeps, "--check-layout-test-sys-deps" works correctly.

"""
 60 bool BlinkTestPlatformInitialize() {                                            
 61   std::cerr << "Error in BlinkTestPlatformInitialize" << "\n";                  
 62   return false; 
...
python third_party/blink/tools/run_web_tests.py -t gn svg/dom/SVGScriptElement 
...
Checking system dependencies ...
...
Running 1 Content Shell.        

The Content Shell process crashed while starting:         
"Error in BlinkTestPlatformInitialize
"
content_shell took too long to startup.
[1/7] svg/dom/SVGScriptElement/script-clone-rerun-self.svg failed unexpectedly (Content Shell crashed [pid=24635])
"""

If BlinkTestPlatformInitialize() returns false, then the system dependency checker actually succeeds! But then when we try to run the layout test, it prints the expected error message. So we're already running the most relevant check from layout tests.

===========Proposed Solution==============

1) Modify ShellMainDelegate::BasicStartupComplete to also call CheckLayoutSystemDeps() for layout tests
2) Remove all logic from blink test runner and Content Shell related to "--check-layout-test-sys-deps".

Benefits:
* Configuration errors are still caught and prevent the test from running. This prints human-readable error messages as before and makes misconfiguration easy to debug.
* Removal of a bunch of code in Content Shell that behaves similarly to, but not quite the same as layout tests. [Likely responsible for flaky crashes]
* Flakiness in Content Shell during "--check-layout-test-sys-deps" phase no longer takes down run_web_tests.py script. Crashes during a normal run of Content Shell will be correclty caught, retried, etc.
Sounds like a good plan, thank you!
Project Member

Comment 14 by bugdroid1@chromium.org, Sep 18

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3e2f865c337b783b83d4d42fc1b09f523e04a1f9

commit 3e2f865c337b783b83d4d42fc1b09f523e04a1f9
Author: Erik Chen <erikchen@chromium.org>
Date: Tue Sep 18 19:15:03 2018

Run system dependency checks early in the lifecycle of layout tests.

This CL removes the flag kCheckLayoutTestSysDeps and modifies Content Shell to
run CheckLayoutSystemDeps for layout tests.

Previously, run_web_tests.py would first run Content Shell with
kCheckLayoutTestSysDeps to check system dependencies, and then subsequently run
Content Shell with kRunWebTests to run the actual layout tests. The goal was for
the former step to discover system misconfigurations and early-exit so that
developers would not lose time debugging system configuration failures.

This CL's changes has several benefits:
  * kCheckLayoutTestSysDeps was using a different mechanism for starting and
    stopping the main thread's run loop. This was likely responsible for flaky
    crashes.
  * run_web_tests.py would exit with SYS_DEPS_EXIT_STATUS upon encountering a
    crash in kCheckLayoutTestSysDeps. This failure was not recoverable. Now,
    flaky crashes are caught using the normal layout test runner mechanism, and
    are appropriately retried.
  * It simplifies the logic for both Content Shell [removing a flag] and
    web_test_runner.py [removing yet another mechanism for launching the driver
    and parsing its results].

Bug:  884776 
Change-Id: I7472f5e3b90e6d2578d2ed0be906e0e57ae9dbaa
Reviewed-on: https://chromium-review.googlesource.com/1229240
Commit-Queue: Erik Chen <erikchen@chromium.org>
Reviewed-by: Emil A Eklund <eae@chromium.org>
Reviewed-by: Camille Lamy <clamy@chromium.org>
Reviewed-by: Robert Ma <robertma@chromium.org>
Cr-Commit-Position: refs/heads/master@{#592137}
[modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/content/shell/app/blink_test_platform_support_win.cc
[modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/content/shell/app/shell_main_delegate.cc
[modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/content/shell/browser/layout_test/layout_test_browser_main.cc
[modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/content/shell/common/layout_test/layout_test_switches.cc
[modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/content/shell/common/layout_test/layout_test_switches.h
[modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/third_party/blink/tools/blinkpy/web_tests/port/base.py
[modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/third_party/blink/tools/blinkpy/web_tests/port/browser_test_unittest.py
[modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/third_party/blink/tools/blinkpy/web_tests/port/port_testcase.py

Status: Fixed (was: Assigned)
Sorry, finally getting caught up on this now.

If I understand what you did in the change in #c14, you changed things so that if the system check truly fails, rather than the entire test suite aborting up front, we'll now get N different tests failing, each with an error message (although not with the text that was previously in the python code).

Is that correct?

If so, this change worries me a bit at least semantically, in the sense that the test is failing due to an infrastructure issue, but we won't actually get any information now that could've told us that to generate an alert; instead, to the infrastructure, these'll look like normal test failures, and we'll ignore them.

Is that intended?
> If I understand what you did in the change in #c14, you changed things so that if the system check truly fails, rather than the entire test suite aborting up front, we'll now get N different tests failing, each with an error message (although not with the text that was previously in the python code).

> Is that correct?

correct. 

> If so, this change worries me a bit at least semantically, in the sense that the test is failing due to an infrastructure issue, but we won't actually get any information now that could've told us that to generate an alert; instead, to the infrastructure, these'll look like normal test failures, and we'll ignore them.

Prior to my CL, the error bubbled up as INVALID_TEST_RESULTS, caused the CQ run to fail, and then the CQ run was automatically retried and then passed. We ignored the INVALID_TEST_RESULTS result anyways.

> Is that intended?

Yes. 

I agree that we could have a better mechanism for reporting configuration errors, but we kind of need that across the board, not just for webkit tests, and the previous implementation was just not doing the right thing [lots of false positives, very few true positives].

One of my earlier proposals was:
"""
It is not clear to me why we need to bring up Content Shell at all to check for system configuration.
...
I can move the business logic into a tiny executable and run that from run_web_tests.py.
"""

That seems like the best way forward if we want this functionality, but I see no reason to implement this until there's something that will actually consume the results [and not just have them be hidden by a retry].

Here's an example of an actual configuration error [from a different issue]:
https://chromium-review.googlesource.com/c/chromium/src/+/1238780/2..3

That was returning as INVALID_TEST_RESULTS [for who knows how long] that was ignored. We need a mechanism to self-quarantine and auto-file bug on config errors. 
Cc: -kbr@chromium.org
I probably wasn't as clear as I could've been earlier: your changes have been good and I'm glad you're working on them. I'm just noting some minor concerns.

I agree that it'd be good to move the code for checking dependencies out of content_shell if we think that we still need them (as I said earlier somewhere, I'm not even sure that we do, at least on Win 10), and what you've done is fine for now.

I also agree that the problem of (not) handling infra failures is endemic system-wide. I expect that at least what you've done makes the failure more like what another test suite would do, and that's probably a good thing.

We do have a mechanism for self-quarantining in swarming, so the hypothetical binary could be compiled and added to the checks that swarming does, which is probably actually the right thing to do.

Blocking: 888734
Cc: nedngu...@google.com erikc...@chromium.org
 Issue 773617  has been merged into this issue.
Project Member

Comment 22 by bugdroid1@chromium.org, Nov 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c91ef9689e59c6c83aa408367a68599556b9086f

commit c91ef9689e59c6c83aa408367a68599556b9086f
Author: Ned Nguyen <nednguyen@google.com>
Date: Tue Nov 13 18:51:20 2018

Remove needs_http parameter in check_sys_deps method

This also remove check_sys_deps from base.Port's subclasses that are
the same as the base class method.

Note: this CL removes an unused parameter and is a refactor with no intended behavior change.

This clean up is a follow-up of https://chromium-review.googlesource.com/c/chromium/src/+/1229240/

BUG:  chromium:773617 , chromium:884776 
Change-Id: I82edf733a696c4e0e6de20a8bac954e244d51365
Reviewed-on: https://chromium-review.googlesource.com/c/1333893
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Robert Ma <robertma@chromium.org>
Reviewed-by: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#607676}
[modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/controllers/manager.py
[modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/android.py
[modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/base.py
[modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/browser_test.py
[modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/fuchsia.py
[modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/mock_drt.py
[modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/test.py

Labels: Infra-Platform-Test

Sign in to add a comment