webkit_layout_test fails with INVALID_TEST_RESULTS |
|||||||
Issue descriptionIn this tryjob: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/86368 webkit_layout_tests fails with INVALID_TEST_RESULTS, which prevents 'retry with patch' logic in Issue 883321 from kicking in. An initial glance suggests that there were no infra issues. The failed shard: https://chromium-swarm.appspot.com/task?id=3ffffc3bf8298610&refresh=10&show_raw=1 hit a DCHECK very early on and never ran any tests.
,
Sep 17
This particular issue is being tracked here: https://bugs.chromium.org/p/chromium/issues/detail?id=773617#c11 Although there's no traction there yet. Regardless, there's no reason this error should prevent 'retry with patch' from running. I'm going to mull on this some more and figure out the best way to move forward.
,
Sep 17
Erik is correct. Unfortunately, no one seems to understand the root cause of the occasional crashes on Windows. We could change the runner to produce a valid but empty result when things like this happen, but that would have its own problems: results would become incomplete and the problem would be even harder to discover.
,
Sep 17
Re #2: I think there's some logic in the re-running without patch logic which says that we should only re-run the specific tests that failed. If there aren't any test results, then the recipes don't know which tests failed, and so it doesn't know what it should re-run, and it doesn't do anything. Or something like that. This might have changed since you've landed CLs, so I trust you looking at the code.
,
Sep 17
Making progress.
If I had a CHECK(false) to content itself, e.g. BrowserMainLoop::PreCreateThreads, the crash is handled gracefully by the test runner. This will output [after crashing a bunch and retrying]...
"""
0 tests ran as expected, 7 didn't:
svg/dom/SVGScriptElement/script-async-attr.svg
svg/dom/SVGScriptElement/script-clone-rerun-self.svg
svg/dom/SVGScriptElement/script-clone-rerun.svg
svg/dom/SVGScriptElement/script-external-no-multiple-load.html
svg/dom/SVGScriptElement/script-reexecution.svg
svg/dom/SVGScriptElement/script-set-href.svg
svg/dom/SVGScriptElement/script-type-attribute.svg
"""
However, if I add a CHECK(false) to AtExitManager::~AtExitManager(), instead I see the same error as above. None of the tests run and the test runner early exits. This is caused by a crash in the system dependency checker. Passing "--nocheck-sys-deps" works around this problem.
So there are actually two problems here:
1) The system dependency checker is flakily hitting a DCHECK
2) This causes the test_runner to early exit with no results.
,
Sep 17
The system dependency checker was added in 2010: https://trac.webkit.org/changeset/54094/webkit It's not clear to me what utility it provides now. If there are missing system dependencies, the tests will fail. I don't think we need a separate layer of logic to test system dependencies [which in this case is flaky].
,
Sep 17
Waiting for dpranke to respond on this crbug since he added the logic in 2010.
,
Sep 17
The purpose of the check was to make sure that Windows is configured correctly (right system theme and font settings) so that certain tests would pass; if the system wasn't configured correctly, you'd get a clear error at the beginning of the test run telling you what you'd need to fix, rather than random test failures with inscrutable errors. The layout test harness was supposed to handle this failure cleanly, and there should be a way for that error to propagate up through the recipe cleanly as well (e.g., as an infra failure). The check shouldn't ever be flaky, unless what's happening is that you're bouncing between machines that are configured correctly and machines that aren't. It's been so long since I wrote this code (which was done for XP) that I don't have a real sense now of what'd happen if we ran it on a machine that wasn't configured correctly, or if that's even still possible, although I'd expect it to be possible. I'd be inclined to argue that we should figure out what's going wrong in the test harness and recipe code (if anything) that's leading to INVALID_TEST_RESULTS rather than an infra error. You could, though, decide that all of the checking I described in the first paragraph wasn't necessary, but I suspect that'd just lead to flakes elsewhere. If you were to remove the checks, I'd try to figure out which tests might fail and get approval for this from, say, eae@, since the tests that are likely to fail are text and layout-related. Arguably you could also move this check to part of the swarming config and make sure that the bot ran the check on startup and self-quarantined, instead.
,
Sep 17
+ eae To clarify, the failures in the layout test harness are not due to misconfigurations of system settings. They are due to the checker flakily hitting a DCHECK during shutdown [causing a crash]. This then causes run_web_tests.py to barf up the exit_codes.SYS_DEPS_EXIT_STATUS error code. The checker runs Content Shell with the flag "--check-layout-test-sys-deps". The business logic is actually a relatively short function: https://cs.chromium.org/chromium/src/content/shell/app/blink_test_platform_support_win.cc?type=cs&q=CheckLayoutSystemDeps&g=0&l=45 It is not clear to me why we need to bring up Content Shell at all to check for system configuration. We happen do so using some custom logic that looks similar to, but different from the usual Content Shell logic. Presumably, these differences cause some subtle threading issues that cause us to hit the DCHECK during shut down. https://cs.chromium.org/chromium/src/content/shell/browser/layout_test/layout_test_browser_main.cc?type=cs&q=kCheckLayoutTestSysDeps&sq=package:chromium&g=0&l=160 If this piece of code is actually useful [are there examples in the last few years where it provided positive utility?] I can move the business logic into a tiny executable and run that from run_web_tests.py. However, I think there's also a chance that this is just legacy code which isn't providing utility, in which case I will remove the logic altogether.
,
Sep 17
As noted above, I don't know how useful, if at all, that check is. I don't have a strong opinion on whether or not to keep it around.
,
Sep 17
The check is there to catch miss-configured bots early and has helped quite a bit with that over the years. It essentially checks some of the key settings that would affect layout results, as dpranke described in comment 8. If this check is failing or flaky that is quite concerning and it is likely that whatever causes it to fail would also cause actual layout tests to fail. Removing it doesn't seem like a good option to me, figuring out why it started failing now, after working fine for 8 years, would be my suggestion.
,
Sep 18
eae, dpranke -- thanks for the feedback. I have a solution that is simpler, removes the "--check-layout-test-sys-deps" step, but will still catch misconfigurations. Summary: We will run system configuration checks during early in the life cycle of layout tests [this is already mostly happening]. This way we don't need a separate "--check-layout-test-sys-deps" step. This allows us to remove a bunch of rather suspect code that is only used for "--check-layout-test-sys-deps" and likely responsible for DCHECKs being hit during shutdown. This way any errors that do occur will be correctly caught by the test harness. ===========Responses to earlier comments============== > it is likely that whatever causes it to fail would also cause actual layout tests to fail I don't think so. The DCHECK is being hit during shutdown, due to tasks being posted to the main thread after it's been shut down. If we take a look at the main functions: https://cs.chromium.org/chromium/src/content/shell/browser/layout_test/layout_test_browser_main.cc?type=cs&q=kCheckLayoutTestSysDeps&sq=package:chromium&g=0&l=158 We see that "--check-layout-test-sys-deps" uses different logic that layout tests. Suspiciously, it tries to run a single task on the main thread and then exit, but doesn't do anything to prevent further tasks from being posted from other threads. > figuring out why it started failing now I started looking into webkit layout test flakiness recently. It's not clear to me that this is a recent issue though. ===========More details on how "--check-layout-test-sys-deps" works ============== "--check-layout-test-sys-deps" is intended to run CheckLayoutSystemDeps() and BlinkTestPlatformInitialize() and to print a nice, human readable error message if either returns false. Let's modify them to return an error and see what happens: """ 54 bool CheckLayoutSystemDeps() { 55 std::cerr << "Error in CheckLayoutSystemDeps" << "\n"; 56 return false; ... python third_party/blink/tools/run_web_tests.py -t gn svg/dom/SVGScriptElement ... System dependencies check failed. To override, invoke with --nocheck-sys-deps Error in CheckLayoutSystemDeps """ Okay -- if there's an error in CheckLayoutSystemDeps, "--check-layout-test-sys-deps" works correctly. """ 60 bool BlinkTestPlatformInitialize() { 61 std::cerr << "Error in BlinkTestPlatformInitialize" << "\n"; 62 return false; ... python third_party/blink/tools/run_web_tests.py -t gn svg/dom/SVGScriptElement ... Checking system dependencies ... ... Running 1 Content Shell. The Content Shell process crashed while starting: "Error in BlinkTestPlatformInitialize " content_shell took too long to startup. [1/7] svg/dom/SVGScriptElement/script-clone-rerun-self.svg failed unexpectedly (Content Shell crashed [pid=24635]) """ If BlinkTestPlatformInitialize() returns false, then the system dependency checker actually succeeds! But then when we try to run the layout test, it prints the expected error message. So we're already running the most relevant check from layout tests. ===========Proposed Solution============== 1) Modify ShellMainDelegate::BasicStartupComplete to also call CheckLayoutSystemDeps() for layout tests 2) Remove all logic from blink test runner and Content Shell related to "--check-layout-test-sys-deps". Benefits: * Configuration errors are still caught and prevent the test from running. This prints human-readable error messages as before and makes misconfiguration easy to debug. * Removal of a bunch of code in Content Shell that behaves similarly to, but not quite the same as layout tests. [Likely responsible for flaky crashes] * Flakiness in Content Shell during "--check-layout-test-sys-deps" phase no longer takes down run_web_tests.py script. Crashes during a normal run of Content Shell will be correclty caught, retried, etc.
,
Sep 18
Sounds like a good plan, thank you!
,
Sep 18
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3e2f865c337b783b83d4d42fc1b09f523e04a1f9 commit 3e2f865c337b783b83d4d42fc1b09f523e04a1f9 Author: Erik Chen <erikchen@chromium.org> Date: Tue Sep 18 19:15:03 2018 Run system dependency checks early in the lifecycle of layout tests. This CL removes the flag kCheckLayoutTestSysDeps and modifies Content Shell to run CheckLayoutSystemDeps for layout tests. Previously, run_web_tests.py would first run Content Shell with kCheckLayoutTestSysDeps to check system dependencies, and then subsequently run Content Shell with kRunWebTests to run the actual layout tests. The goal was for the former step to discover system misconfigurations and early-exit so that developers would not lose time debugging system configuration failures. This CL's changes has several benefits: * kCheckLayoutTestSysDeps was using a different mechanism for starting and stopping the main thread's run loop. This was likely responsible for flaky crashes. * run_web_tests.py would exit with SYS_DEPS_EXIT_STATUS upon encountering a crash in kCheckLayoutTestSysDeps. This failure was not recoverable. Now, flaky crashes are caught using the normal layout test runner mechanism, and are appropriately retried. * It simplifies the logic for both Content Shell [removing a flag] and web_test_runner.py [removing yet another mechanism for launching the driver and parsing its results]. Bug: 884776 Change-Id: I7472f5e3b90e6d2578d2ed0be906e0e57ae9dbaa Reviewed-on: https://chromium-review.googlesource.com/1229240 Commit-Queue: Erik Chen <erikchen@chromium.org> Reviewed-by: Emil A Eklund <eae@chromium.org> Reviewed-by: Camille Lamy <clamy@chromium.org> Reviewed-by: Robert Ma <robertma@chromium.org> Cr-Commit-Position: refs/heads/master@{#592137} [modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/content/shell/app/blink_test_platform_support_win.cc [modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/content/shell/app/shell_main_delegate.cc [modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/content/shell/browser/layout_test/layout_test_browser_main.cc [modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/content/shell/common/layout_test/layout_test_switches.cc [modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/content/shell/common/layout_test/layout_test_switches.h [modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/third_party/blink/tools/blinkpy/web_tests/port/base.py [modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/third_party/blink/tools/blinkpy/web_tests/port/browser_test_unittest.py [modify] https://crrev.com/3e2f865c337b783b83d4d42fc1b09f523e04a1f9/third_party/blink/tools/blinkpy/web_tests/port/port_testcase.py
,
Sep 21
,
Sep 21
Sorry, finally getting caught up on this now. If I understand what you did in the change in #c14, you changed things so that if the system check truly fails, rather than the entire test suite aborting up front, we'll now get N different tests failing, each with an error message (although not with the text that was previously in the python code). Is that correct? If so, this change worries me a bit at least semantically, in the sense that the test is failing due to an infrastructure issue, but we won't actually get any information now that could've told us that to generate an alert; instead, to the infrastructure, these'll look like normal test failures, and we'll ignore them. Is that intended?
,
Sep 21
> If I understand what you did in the change in #c14, you changed things so that if the system check truly fails, rather than the entire test suite aborting up front, we'll now get N different tests failing, each with an error message (although not with the text that was previously in the python code). > Is that correct? correct. > If so, this change worries me a bit at least semantically, in the sense that the test is failing due to an infrastructure issue, but we won't actually get any information now that could've told us that to generate an alert; instead, to the infrastructure, these'll look like normal test failures, and we'll ignore them. Prior to my CL, the error bubbled up as INVALID_TEST_RESULTS, caused the CQ run to fail, and then the CQ run was automatically retried and then passed. We ignored the INVALID_TEST_RESULTS result anyways. > Is that intended? Yes. I agree that we could have a better mechanism for reporting configuration errors, but we kind of need that across the board, not just for webkit tests, and the previous implementation was just not doing the right thing [lots of false positives, very few true positives]. One of my earlier proposals was: """ It is not clear to me why we need to bring up Content Shell at all to check for system configuration. ... I can move the business logic into a tiny executable and run that from run_web_tests.py. """ That seems like the best way forward if we want this functionality, but I see no reason to implement this until there's something that will actually consume the results [and not just have them be hidden by a retry]. Here's an example of an actual configuration error [from a different issue]: https://chromium-review.googlesource.com/c/chromium/src/+/1238780/2..3 That was returning as INVALID_TEST_RESULTS [for who knows how long] that was ignored. We need a mechanism to self-quarantine and auto-file bug on config errors.
,
Sep 21
,
Sep 21
I probably wasn't as clear as I could've been earlier: your changes have been good and I'm glad you're working on them. I'm just noting some minor concerns. I agree that it'd be good to move the code for checking dependencies out of content_shell if we think that we still need them (as I said earlier somewhere, I'm not even sure that we do, at least on Win 10), and what you've done is fine for now. I also agree that the problem of (not) handling infra failures is endemic system-wide. I expect that at least what you've done makes the failure more like what another test suite would do, and that's probably a good thing. We do have a mechanism for self-quarantining in swarming, so the hypothetical binary could be compiled and added to the checks that swarming does, which is probably actually the right thing to do.
,
Sep 24
,
Nov 13
,
Nov 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c91ef9689e59c6c83aa408367a68599556b9086f commit c91ef9689e59c6c83aa408367a68599556b9086f Author: Ned Nguyen <nednguyen@google.com> Date: Tue Nov 13 18:51:20 2018 Remove needs_http parameter in check_sys_deps method This also remove check_sys_deps from base.Port's subclasses that are the same as the base class method. Note: this CL removes an unused parameter and is a refactor with no intended behavior change. This clean up is a follow-up of https://chromium-review.googlesource.com/c/chromium/src/+/1229240/ BUG: chromium:773617 , chromium:884776 Change-Id: I82edf733a696c4e0e6de20a8bac954e244d51365 Reviewed-on: https://chromium-review.googlesource.com/c/1333893 Commit-Queue: Ned Nguyen <nednguyen@google.com> Reviewed-by: Robert Ma <robertma@chromium.org> Reviewed-by: Erik Chen <erikchen@chromium.org> Cr-Commit-Position: refs/heads/master@{#607676} [modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/controllers/manager.py [modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/android.py [modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/base.py [modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/browser_test.py [modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/fuchsia.py [modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/mock_drt.py [modify] https://crrev.com/c91ef9689e59c6c83aa408367a68599556b9086f/third_party/blink/tools/blinkpy/web_tests/port/test.py
,
Dec 4
|
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by st...@chromium.org
, Sep 17