[WPT import] DCHECK enabled for CQ bots but not for Blink try bots. |
||||||||||||
Issue descriptionIn the W3C test auto-import today (https://codereview.chromium.org/2579783002) the initial try jobs had a Timeout result when running imported/wpt/workers/opaque-origin.html; so, the Timeout expectation was added added to TestExpectations and the regular CQ was tried. On the win_chromium_rel_ng and linux_chromium_rel_ng builders, the result was a crash rather than timeout. For both Win and Linux, the crash log started with something like: STDOUT: CONSOLE ERROR: line 4: Uncaught NotSupportedError: Failed to construct 'BroadcastChannel': Can't create BroadcastChannel in an opaque origin STDOUT: #CRASHED - renderer STDERR: [1:12:1215/090125.390914:2985388043:FATAL:WorkerOrWorkletScriptController.cpp(297)] Check failed: !m_globalScope->shouldSanitizeScriptError(state.m_location->url(), NotSharableCrossOrigin). This is a DCHECK at https://chromium.googlesource.com/chromium/src/+/master/third_party/WebKit/Source/bindings/core/v8/WorkerOrWorkletScriptController.cpp#296. Now, the Blink try bots have DCHECK disabled so that the layout test results are consistent with the continuous release builders (https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl#453) But, the CQ try bots have "release_trybot" in their config names, meaning that DCHECK is enabled. This is good for other tests besides the layout tests, since it may help to catch some problems. But, as a consequence, this means that sometimes the CQ bots ({linux,mac,win}_chromium_rel) will crash in some tests where the Blink try bots and waterfall would not have any crashes -- which means that what happens on the Blink try bots cannot reliably predict what will happen on the CQ. Idea #1: For W3C auto-imports, maybe we don't want to run layout tests using the default try bots, and instead we want try again with the tryserver.blink try bots? Idea #2: Maybe we actually never want to run layout tests with DCHECK turned on on the CQ, because this is inconsistent with the layout tests on the waterfall -- it's possible for someone to check in a test with a Crash expectation which passes the CQ, but doesn't crash (and thus fails) on the waterfall. Maybe we actually do want separate dedicated layout test try bots on tryserver.chromium.{android,linux,mac,win} (which would all have DCHECK disabled, and a subset of them would also be on the CQ)? And no separate tryserver.blink waterfall? Dirk, what do you think? Does #2 sound like a good idea long-term? Does #1 sound like a reasonable solution in the shorter term?
,
Dec 16 2016
So far, it happens once a month or less, and working around it without changing any try bots would be OK. Manually working around it involves patching in the CL made by the autoroller, and adding a Crash expectation to get it passed the CQ. (Actually, does that sound correct? The new crash expectation applies only to builders with DCHECK enabled, not to normal Release builds.) It's also possible to change the w3c-test-autoroller without changing any try bots: We'd want to also look at results from the CQ try bots, and if there are crash results there, then we'd add a Crash expectation and try again. Anyway, hypothetically, do you think it would be a bad idea to do add separate layout test CQ builders that build Release without DCHECK and run just the layout tests? This would imply adding another 3-4 CQ builders which do a build of content_shell etc. and run layout tests for each CQ run. This seems a little more correct for layout tests, but maybe not worth it overall?
,
Dec 16 2016
I think it's not worth it overall to add more builders, but we can maybe revisit once we have swarmed layout tests running well all the times it should be running and we can see the impact on fleet capacity.
,
Dec 16 2016
Does a dry run not use the same configuration as the CQ does? If it does, then maybe a dry run can be used to get the baselines? (And if not, why isn't everything on fire?)
,
Dec 16 2016
A CQ dry run uses the same configuration as the normal CQ as far as I know -- both use Release builds with DCHECK on. The reason why we can't use *only* the CQ bots *_chromium_rel_ng to get the baselines now is that they don't cover all supported platforms -- they only cover one version of mac, windows and linux, rather than all supported versions. But once a dry run is done, we can indeed check the results of those runs and see which tests crashed on those bots; if a test crashed on (some of) the CQ bots but not on the other Blink try bots, then that will usually indicate that it crashed due to tripping a DCHECK.
,
Dec 23 2016
,
Dec 24 2016
,
Dec 27 2016
Next action for this issue: Add functionality to the auto-import code (in webkitpy/w3c/deps_updater.py) to check the results of the CQ, and if there are Crash results, add a Crash expectation for the test, then try the CQ again. As long as there aren't often new crashes in tests triggered by DCHECK, this is probably not very high priority.
,
Jan 3 2017
,
Jan 3 2017
,
Mar 24 2017
A thought: In general, and not just for import, there's no set of expectations that's really specific and correct that lets layout tests with DCHECK failures get through the CQ. The most specific and correct expectation would be: crbug.com/xxx [ Debug ] test [ Crash ] But the configuration for CQ builders is still considered release, so this expectation wouldn't apply then. This means that to get through the CQ, we have to use the expectation: crbug.com/xxx test [ Crash Pass ] But this doesn't explicitly say that the crash is related to DCHECK, this test will apply non-flaky on the continuous builders. In most cases, for non-imported tests, the best way to solve this would be to never commit any tests that trigger DCHECK failures in the first place. But for imported tests we can't control this. Anyway, my best idea now for how to handle this would still be to modify the auto-importer code (now in webkitpy/w3c/test_importer.py) to check the results of the CQ, and if there are Crash results, add a flaky crash expectation and try the CQ again. Another alternative would be to trigger *_chromium_rel_ng after first uploading the import CL, update expectations to include both the results there and on the try bots.
,
May 22 2017
,
Jul 3 2017
,
Jul 3 2017
,
Jul 6 2017
,
Aug 4 2017
,
Oct 4 2017
robertma@, this is a P2 issue that hasn't seen any activity for 60 days. Should we demote it to P3 if it's not causing regular problems, or should it actually be fixed soonish?
,
Oct 4 2017
It only causes problems every once in a while (every 3-4 weeks?), but when this happens the wpt-import bot remains red for quite a while until we either manually skip the offending tests or fix the code that's triggering the DCHECKs, which affects the Q4 OKR to reduce WPT import latency. This last happened earlier this week: builds #2521 up to #2536 (Sept 30th to Oct 2nd) were red until https://chromium-review.googlesource.com/689357 fixed the underlying issue.
,
Oct 4 2017
Sorry this is actually my first time noticing this issue. In the past three months, the importer was tripped by DCHECK twice I think, which is lower than the threshold (1 or 2 times a month, same as suggested by dpranke@) I'd like to prioritize automation. It's therefore safe to downgrade to P3. Regarding the potential pragmatic fix discussed here (esp. comment 11), I think it's blocked on issue 739522 (rebaselining from CQ), which is also a P3.
,
Apr 11 2018
,
Apr 11 2018
It happened in issue 830490 again last week, but wasn't all bad, the human involvement probably increased the speed at which the Blink bugs were fixed. |
||||||||||||
►
Sign in to add a comment |
||||||||||||
Comment 1 by dpranke@chromium.org
, Dec 16 2016