WebKit Linux Trusty (dbg) - content_shell crashing in webkit_layout_tests
Issue description

A CL I landed was immediately reverted by the sheriff because it caused a bot breakage:
https://chromium-review.googlesource.com/c/chromium/src/+/882002
https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Linux%20Trusty/39500

Looking into the breakage, I found that the bot tried to treat various tests as reftests, even though I had removed the match link from the source and deleted the ref file. For example, for position-sticky-root-scroller.html:

11:18:45.457 13059 reference /b/s/w/ir/third_party/WebKit/LayoutTests/external/wpt/css/css-position/position-sticky-root-scroller-ref.html was not found
11:18:45.458 12904 [5061/10633] external/wpt/css/css-position/position-sticky-root-scroller.html failed unexpectedly (reference test didn't generate pixel results)
11:18:45.455 13066 worker/5 external/wpt/css/css-flexbox/flex-shrink-004.html passed
11:18:45.457 13059 worker/0 external/wpt/css/css-position/position-sticky-root-scroller.html failed:

You can see that this file had its match link removed in the CL:
https://chromium-review.googlesource.com/c/chromium/src/+/882002/4/third_party/WebKit/LayoutTests/external/wpt/css/css-position/position-sticky-root-scroller.html

I'm unclear why this happened. Could the manifest not have been regenerated properly? I have seen that a few times on my local machine, but would hope that it couldn't happen on bots...

This is blocking me from landing my CL again, because the sheriffs are (rightly) worried that it'll cause another bot failure.
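One way to test the stale-manifest hypothesis locally is to compare what the test source declares against what the harness believes. The sketch below is a minimal illustration, not part of the layout-test tooling: `find_reference_links` and `looks_stale` are hypothetical helper names, and the `manifest_says_reftest` flag stands in for whatever the real manifest records for the test. It relies only on the WPT convention that a reftest declares its reference with a `<link rel="match">` (or `rel="mismatch"`) element.

```python
from html.parser import HTMLParser


class MatchLinkFinder(HTMLParser):
    """Collects href values of <link rel="match"> / <link rel="mismatch">."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "match" in rel_tokens or "mismatch" in rel_tokens:
            self.references.append(attr_map.get("href"))


def find_reference_links(html_source):
    """Returns the reference hrefs declared in the given test source."""
    finder = MatchLinkFinder()
    finder.feed(html_source)
    finder.close()
    return finder.references


def looks_stale(html_source, manifest_says_reftest):
    """True when the (hypothetical) manifest flag says 'reftest' but the
    source no longer declares any reference."""
    return manifest_says_reftest and not find_reference_links(html_source)
```

If `looks_stale` returns True for a test whose match link was removed, the recorded test type is out of date relative to the source, which would be consistent with the "reference ... was not found" message above.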
,
Jan 30 2018
$ python third_party/WebKit/Tools/Scripts/run-webkit-tests --no-retry-failures -t Default --iterations=1000 external/wpt/css/css-position/position-sticky-root-scroller.html
...
Found 1 test; running 1 (1000 times each: --repeat-each=1 --iterations=1000), skipping 0.
Running 1 content_shell.
All 1000 tests ran as expected.

From your description, the problem isn't that the test is flaky; it is that testharness or the test infrastructure is flaky. If my test were flaky, it would produce some failing test output, but it would still be test output.
,
Jan 30 2018
You're right about the distinction. Your tests LGTM, and it's definitely an issue in our infra (Python runner, fixture scripts, etc.). How urgent is your CL? I did some quick debugging at the similar issue 805463, but no luck so far. I also found it hard to reproduce the problem locally.
,
Jan 30 2018
FYI; the sheriff (bsep@) is telling me that the bot did *not* recover via reverting my CL. You (or a MTVer who knows WPT infra) may wish to take a look at https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Linux%20Trusty%20%28dbg%29/ with some urgency.
,
Jan 30 2018
I saw the initial failures across several other bots but they seem to have all recovered, so that one bot might just be unlucky.
,
Jan 30 2018
The log says content_shell crashed when checking system dependencies (SEGV_MAPERR):

#0 0x7f8981ca979d
[30242:30242:0130/143445.544698:ERROR:sandbox_linux.cc(375)] InitializeSandbox() called with multiple threads in process gpu-process.

It looks like more than just being unlucky. Might be a genuine issue in GPU/sandbox setup (which I'm totally unfamiliar with), unrelated to this bug.
,
Jan 30 2018
,
Jan 30 2018
+kbr@ because GPUs
,
Jan 31 2018
I'm not sure that that error is the cause of the SIGSEGV. It might be, but I'm not sure.

The failures I see in the first failing build:
https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Linux%20Trusty%20%28dbg%29/9251
are caused by the loader, and I think by this CL:
https://chromium.googlesource.com/chromium/src/+/f7246b9e21cd0065f3b5eb4699c90c61b8e8008c

11:12:36.098 30181 worker/5 virtual/outofblink-cors/external/wpt/service-workers/service-worker/fetch-event-network-error.https.html crashed, (stderr lines):
11:12:36.098 30181 [1:1:0130/111234.455411:ERROR:render_process_impl.cc(213)] WebFrame LEAKED 16 TIMES
11:12:36.098 30181 [32606:32705:0130/111235.249279:WARNING:http_cache_transaction.cc(1241)] Unable to create cache entry
11:12:36.098 30181 [1:1:0130/111235.320117:FATAL:DocumentThreadableLoader.cpp(1050)] Check failed: fallback_request_for_service_worker_.IsNull().
11:12:36.098 30181 #0 0x7fc1d35e479d base::debug::StackTrace::StackTrace()
11:12:36.098 30181 #1 0x7fc1d35e2c8c base::debug::StackTrace::StackTrace()
11:12:36.098 30181 #2 0x7fc1d366cf2a logging::LogMessage::~LogMessage()
11:12:36.098 30181 #3 0x7fc1cba9cc9d blink::DocumentThreadableLoader::HandleReceivedData()
11:12:36.098 30181 #4 0x7fc1cba9caf7 blink::DocumentThreadableLoader::DataReceived()
11:12:36.098 30181 #5 0x7fc1c8fa121c blink::Resource::AppendData()
11:12:36.098 30181 #6 0x7fc1c8f97660 blink::RawResource::AppendData()
11:12:36.098 30181 #7 0x7fc1c8fe47df blink::ResourceLoader::DidReceiveData()
11:12:36.098 30181 #8 0x7fc1d72f4d8b content::WebURLLoaderImpl::Context::OnReceivedData()
11:12:36.098 30181 #9 0x7fc1d72f610f content::WebURLLoaderImpl::RequestPeerImpl::OnReceivedData()
11:12:36.098 30181 #10 0x7fc1d72e9076 content::URLResponseBodyConsumer::OnReadable()
11:12:36.099 30181 #11 0x7fc1d72e5db5 content::URLLoaderClientImpl::OnStartLoadingResponseBody()
11:12:36.099 30181 #12 0x7fc1d4e29ebe content::ThrottlingURLLoader::OnStartLoadingResponseBody()

Maybe that CL's been reverted in the meantime.

I do see the following on the bot:

15:38:46.666 13194 "/b/s/w/ir/out/Debug/content_shell --check-layout-test-sys-deps" took 4.31s
15:38:46.666 13194 System dependencies check failed.
15:38:46.666 13194 To override, invoke with --nocheck-sys-deps
15:38:46.666 13194
15:38:46.666 13194 Xlib: extension "RANDR" missing on display ":100".
DevTools listening on ws://127.0.0.1:36032/devtools/browser/90550195-978f-495c-81f7-7b801f80e132
Received signal 11 SEGV_MAPERR 000000000000
#0 0x7fbb64e5679d base::debug::StackTrace::StackTrace()
#1 0x7fbb64e54c8c base::debug::StackTrace::StackTrace()
#2 0x7fbb64e56165 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7fbb6a541330 <unknown>
  r8: 0000000000000000 r9: 0000000000000009 r10: 0000007cf757fd60 r11: 0000000000000000
  r12: 0000007cf786db20 r13: 0000007cf7784144 r14: 0000000000000004 r15: 0000000000000004
  di: 000000000000002e si: 0000000000000000 bp: 00007fbb3d2b14a0 bx: 00007fbb3d2b1878
  dx: 000000000000002d ax: 000000000000002e cx: 000000000000002e sp: 00007fbb3d2b1468
  ip: 0000000000000000 efl: 0000000000010202 cgf: 0000000000000033 erf: 0000000000000014
  trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]

I don't see any CLs in the first failing build that are GPU-related, and it's possible, even likely, that something landed while the bot was red. Can someone please look at the blamelists of the intervening failing builds and see if something suspicious is there?
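The two logs above show different failure modes (a FATAL check in the loader versus a failed system-dependencies check followed by a SEGV), and telling them apart per build helps decide which blamelist to read. A rough sketch of that classification follows; `SIGNATURES` and `classify_log` are hypothetical names, and the patterns are assumptions derived only from the log lines quoted in this thread, not from any official log format.

```python
import re

# Patterns are guesses based on the log excerpts quoted in this thread.
SIGNATURES = {
    "sys_deps_failed": re.compile(r"System dependencies check failed\."),
    "segv": re.compile(r"Received signal 11 (SEGV_\w+)"),
    "check_failed": re.compile(r"FATAL:([\w.]+)\((\d+)\)\] Check failed: (.+)"),
}


def classify_log(text):
    """Returns {signature name: first matching text} for each pattern found."""
    found = {}
    for name, pattern in SIGNATURES.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(0)
    return found
```

Run over the stdio of each failing build, a result containing only `check_failed` points at the loader crash, while `sys_deps_failed` together with `segv` points at the content_shell startup crash.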
,
Jan 31 2018
It was red previously because of failures in webkit_layout_tests and now webkit_layout_tests don't run at all, so it would have to be the first failing build, wouldn't it? It was only red for two builds before this cropped up: https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Linux%20Trusty%20%28dbg%29/9251 https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Linux%20Trusty%20%28dbg%29/9252 Nothing jumps out at me, but I also don't have any specific knowledge here.
,
Jan 31 2018
tl;dr I'm going to revert the angle roll and disable the auto roller

This "System dependencies check failed." thing is still persisting on the debug bot per #c10

https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Linux%20Trusty%20%28dbg%29/9252 - isolate shards mostly ran OK - just some position-sticky-* layout test failures
https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Linux%20Trusty%20%28dbg%29/9253 - suddenly all shards fail.

https://chromium-review.googlesource.com/889761 - -Wimplicit-fallthrough
https://chromium-review.googlesource.com/890518 - webrtc
https://chromium-review.googlesource.com/893641 - angle roll "Vulkan: Add the Vulkan API to gpu_test_expectations." --> https://chromium-review.googlesource.com/891839 -- current suspect
https://chromium-review.googlesource.com/893646 - catapult roll "[Devil] Use ListProcesses to count processes in device_monitor" --> https://chromium-review.googlesource.com/893561
https://chromium-review.googlesource.com/882961 - sqlite
https://chromium-review.googlesource.com/887792 - "Remove Instrumentation Test Runners" (android)
https://chromium-review.googlesource.com/887631 - BrowserLifetimeHandler: Require true/false update arg
https://chromium-review.googlesource.com/893187 - passwords/ android
https://chromium-review.googlesource.com/893647 - xml/android
https://chromium-review.googlesource.com/893536 - the revert discussed here
https://chromium-review.googlesource.com/893270 - Remove broken IsSVG*Element overload
https://chromium-review.googlesource.com/893803 - Fix SitePerProcessMouseWheelHitTestBrowserTest.* tests on Win official.
https://chromium-review.googlesource.com/893184 - [PE] Fix edge-cases in SVGGeometryElement::PathLengthScaleFactor
https://chromium-review.googlesource.com/884368 - autofill
https://chromium-review.googlesource.com/884846 - Settings/OOBE: Network: Handle connect failures in UI
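The triage above amounts to finding the earliest build in which a particular kind of failure shows up and then reading only that build's blamelist. A minimal sketch of that step is below; the `Build` record, its field names, and `first_build_with` are all hypothetical and do not correspond to any buildbot API, and the failure-kind labels are whatever the caller chooses.

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    """Hypothetical record for one bot run."""
    number: int
    failure_kinds: frozenset  # e.g. frozenset({"layout"}) or frozenset({"sys_deps"})
    blamelist: list = field(default_factory=list)  # CL identifiers in this build


def first_build_with(builds, kind):
    """Returns the lowest-numbered build exhibiting `kind`, or None."""
    for build in sorted(builds, key=lambda b: b.number):
        if kind in build.failure_kinds:
            return build
    return None
```

Separating failure kinds matters here because the bot was already red for a different reason; searching for the first red build alone would point at the wrong blamelist.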
,
Jan 31 2018
+jmadill for "If the [angle] roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary."
,
Jan 31 2018
autoroller is stopped. DEPS rollback going through CQ in https://chromium-review.googlesource.com/c/chromium/src/+/895122
,
Jan 31 2018
there's a green run in https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Linux%20Trusty%20(dbg)/9264 which has another angle roll. I've unticked the CQ bit on the CL. autoroller still stopped.
,
Jan 31 2018
... it also has a skia roll which has an angle rollception https://skia.googlesource.com/skia.git/+log/51494f6615b8..ac568a934f8f -> https://chromium.googlesource.com/angle/angle.git/+/bd6ae4aa145daea5869a2c86bb962d37a71bd264 ¯\_(ツ)_/¯
,
Jan 31 2018
another green run. https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Linux%20Trusty%20%28dbg%29/9265 restarted the auto roller at https://angle-chromium-roll.skia.org/ Let's see how we go..
,
Jan 31 2018
Seems this bug was sort of taken over by attempts to get the tree green. Renaming to remove the WPT reference; I'll open another bug for robertma@ and folks to look at that.
,
Jan 31 2018
Closing this.
Comment 1 by robertma@chromium.org
, Jan 30 2018