CRWCertVerificationController mixes incorrectly GCD and Chromium threads |
|||||
Issue descriptionI've encountered the following crash today while running tests. [ RUN ] AllForms/FormStructureBrowserTest.DataDrivenHeuristics/115 2018-06-18 17:37:41.586643+0200 ios_chrome_unittests[2749:71202402] Could not signal service com.apple.WebKit.WebContent: 113: Could not find specified service 2018-06-18 17:37:41.588474+0200 ios_chrome_unittests[2749:71202402] Could not signal service com.apple.WebKit.Networking: 113: Could not find specified service [2749:771:0618/173742.593923:2879176673000417:ERROR:crw_web_controller.mm(2539)] JavaScript error: Script error. URL:https://chromium.test/ [2749:771:0618/173742.598875:2879176677921560:ERROR:crw_web_controller.mm(2539)] JavaScript error: Script error. URL:https://chromium.test/ 2018-06-18 17:37:42.940297+0200 ios_chrome_unittests[2749:71202402] Could not signal service com.apple.WebKit.WebContent: 113: Could not find specified service 2018-06-18 17:37:42.941135+0200 ios_chrome_unittests[2749:71202402] Could not signal service com.apple.WebKit.Networking: 113: Could not find specified service [2749:771:0618/173743.818081:2879177897204745:ERROR:crw_web_controller.mm(2539)] JavaScript error: Script error. URL:https://chromium.test/ 2018-06-18 17:37:44.415148+0200 ios_chrome_unittests[2749:71202402] Could not signal service com.apple.WebKit.WebContent: 113: Could not find specified service 2018-06-18 17:37:44.416150+0200 ios_chrome_unittests[2749:71202402] Could not signal service com.apple.WebKit.Networking: 113: Could not find specified service [2749:771:0618/173745.182205:2879179261409212:FATAL:crw_cert_verification_controller.mm(85)] Check failed: ::web::WebThread::CurrentlyOn(web::WebThread::UI). Must be called on Web_UIThread; actually called on Unknown Thread. The crash happens here, in the DCHECK_CURRENTLY_ON: - (void)decideLoadPolicyForTrust:(base::ScopedCFTypeRef<SecTrustRef>)trust host:(NSString*)host completionHandler:(web::PolicyDecisionHandler)completionHandler { DCHECK_CURRENTLY_ON(web::WebThread::UI); DCHECK(completionHandler); [self verifyTrust:trust completionHandler:^(SecTrustResultType trustResult) { DCHECK_CURRENTLY_ON(web::WebThread::UI); if (trustResult == kSecTrustResultProceed || trustResult == kSecTrustResultUnspecified) { completionHandler(web::CERT_ACCEPT_POLICY_ALLOW, net::CertStatus()); return; } [self decideLoadPolicyForRejectedTrustResult:trustResult serverTrust:trust host:host completionHandler:completionHandler]; }]; } When this line is executed, the test fixture has already been destroyed, and the web::WebThread destroyed. This mean that there is no UI thread, and the DCHECK fail. Looking at the implementation of -verifyTrust:completionHandler:, we see the following: - (void)verifyTrust:(base::ScopedCFTypeRef<SecTrustRef>)trust completionHandler:(void (^)(SecTrustResultType))completionHandler { DCHECK_CURRENTLY_ON(web::WebThread::UI); DCHECK(completionHandler); // SecTrustEvaluate performs trust evaluation synchronously, possibly making // network requests. The UI thread should not be blocked by that operation. base::PostTaskWithTraits( FROM_HERE, {base::TaskShutdownBehavior::BLOCK_SHUTDOWN}, base::BindOnce(^{ SecTrustResultType trustResult = kSecTrustResultInvalid; if (SecTrustEvaluate(trust.get(), &trustResult) != errSecSuccess) { trustResult = kSecTrustResultInvalid; } // Use GCD API, which is guaranteed to be called during shutdown. dispatch_async(dispatch_get_main_queue(), ^{ completionHandler(trustResult); }); })); } So the completionHandler passed to -verifyTrust:completionHandler: is executed by GCD which means it is executed even if the web::WebThread is destroyed. This mean that code in the completion handler cannot assume that the web::WebThread still exists when it is executed. The DCHECK can be fixed by first checking whether the thread exists, but I don't know how to fix -decideLoadPolicyForRejectedTrustResult:serverTrust:host:completionHandler that uses web::WebThread to post a thread on IO thread. I think that the dispatch_async should be removed and instead the code fixed to use web::WebThread only.
,
Jun 19 2018
Then you also have to change -decideLoadPolicyForRejectedTrustResult:serverTrust:host:completionHandler: because it also call web::WebThread::PostTask:
- (void)
decideLoadPolicyForRejectedTrustResult:(SecTrustResultType)trustResult
serverTrust:(base::ScopedCFTypeRef<SecTrustRef>)trust
host:(NSString*)host
completionHandler:(web::PolicyDecisionHandler)handler {
DCHECK_CURRENTLY_ON(web::WebThread::UI);
DCHECK(handler);
web::WebThread::PostTask(
web::WebThread::IO, FROM_HERE, base::BindOnce(^{
...
Since this is the method that invoke the real completionHandler, it is also not guaranteed to be run (for the same reason).
,
Jun 19 2018
If the API requires this to use GCD, then it should be used all the way. Not mixed with web::WebThread API IMO. Note that the only time web::WebThread is not running a task is when the app is shutdown while the task is in the TaskRunner queue. In that case, the IO thread won't exists, so -decideLoadPolicyForRejectedTrustResult:serverTrust:host:completionHandler: won't be able to post the task on IO thread and the completionHandler won't be executed. Thus I think using GCD in verifyTrust:completionHandler: does not help as there is still no guarantee that the completionHandler will be executed anyway (at least with current implementation).
,
Jun 19 2018
Comment #3 has a good point. We can try switching everything to web::WebThread behind 1% experiment and compare CPM. Is this bug blocking your work or something?
,
Jun 20 2018
This bug is not blocking anything. It is just making some tests flaky (AllForms/FormStructureBrowserTest.DataDrivenHeuristics/*).
,
Jun 20 2018
eugenebut@ who do you think should work on this? should we mark as available?
,
Jun 20 2018
Switching to GCD is a lot of work. Can we fix FormStructureBrowserTest flakiness in some other way? Perhaps by making sure that TestThreadBundle is destroyed after all callbacks completed.
,
Jun 21 2018
We already do wait for tasks posted to the TestScopedTaskEnvironment to complete. But we do not wait for all tasks posted to GCD to complete. So the task posted in -verifyTrust:completionHandler: (by base::PostTaskWithTraits) is executed. During its execution it post to GCD. Here there is a race condition. The code can either see there is no pending tasks on Chrome's TaskRunners if it is executed before the GCD has time to execute, or it can see the task posted from the block executed by GCD if the GCD happened before.
,
Jun 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/398625c5c2b7a000e2ed711069b25647a9fd89b7 commit 398625c5c2b7a000e2ed711069b25647a9fd89b7 Author: Eugene But <eugenebut@chromium.org> Date: Mon Jun 25 14:10:30 2018 Use WebThread::PostTask in CRWCertVerificationController. GCD API was used to make sure that completion callbacks are called during the app shutdown. Calling completion handler is required by WKWebView, otherwise web view throws an exception. This CL uses WebThread::PostTask instead of GCD behind feature flag. This will allow to run an experiment to see if using WebThread::PostTask leads to crashes during app shutdown. WebThread::PostTask is preferable for testing, because there is no good way to wait until GCD tasks are completed in a unit test. Bug: 853774 Cq-Include-Trybots: luci.chromium.try:ios-simulator-full-configs;master.tryserver.chromium.mac:ios-simulator-cronet Change-Id: Ie9eefa7f7cef0c7432a2f733086fc5d630d8a969 Reviewed-on: https://chromium-review.googlesource.com/1113182 Reviewed-by: Sylvain Defresne <sdefresne@chromium.org> Commit-Queue: Eugene But <eugenebut@chromium.org> Cr-Commit-Position: refs/heads/master@{#570028} [modify] https://crrev.com/398625c5c2b7a000e2ed711069b25647a9fd89b7/ios/web/features.mm [modify] https://crrev.com/398625c5c2b7a000e2ed711069b25647a9fd89b7/ios/web/net/crw_cert_verification_controller.mm [modify] https://crrev.com/398625c5c2b7a000e2ed711069b25647a9fd89b7/ios/web/public/features.h
,
Aug 21
Metric results should be available after launching M69. The earliest we can fix this bug is M71.
,
Sep 7
We have results from UseWebThreadPostTask variations experiment. Switching to WebThread::PostTask introduces new crashes with __exceptionPreprocess signature. These crashes happen because posted task is not executed on shutdown, WKNavigationDelegate completion handler is not called and WKWebView throws an exception. Sylvain brought very interesting point in comment #3. It is possible that using Chromium threading API in other places in CRWCertVerificationController introduces low volume of __exceptionPreprocess crashes. It seems like the right fix here is to use GCD API for all threading code in CRWCertVerificationController. However it's unclear how to make GCD execute task on Chromium's IO thread.
,
Sep 7
Experiment results were posted here: cl/211984761
,
Sep 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c27c9d9df5dc0253e2ed8dc5a1332195f4983594 commit c27c9d9df5dc0253e2ed8dc5a1332195f4983594 Author: Eugene But <eugenebut@google.com> Date: Mon Sep 10 21:16:37 2018 Remove UseWebThreadInCertVerificationController experiment. Experiment results were collected in M69. Bug: 853774 Cq-Include-Trybots: luci.chromium.try:ios-simulator-cronet;luci.chromium.try:ios-simulator-full-configs Change-Id: I44d71ec5a620b8ac5b9673161dbf9172cfc67fdf Reviewed-on: https://chromium-review.googlesource.com/1213268 Reviewed-by: Sylvain Defresne <sdefresne@chromium.org> Commit-Queue: Eugene But <eugenebut@chromium.org> Cr-Commit-Position: refs/heads/master@{#590064} [modify] https://crrev.com/c27c9d9df5dc0253e2ed8dc5a1332195f4983594/ios/web/features.mm [modify] https://crrev.com/c27c9d9df5dc0253e2ed8dc5a1332195f4983594/ios/web/net/crw_cert_verification_controller.mm [modify] https://crrev.com/c27c9d9df5dc0253e2ed8dc5a1332195f4983594/ios/web/public/features.h
,
Sep 12
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f496a4bd2ac12e8f57541b13365f1624f2dc00db commit f496a4bd2ac12e8f57541b13365f1624f2dc00db Author: Eugene But <eugenebut@google.com> Date: Wed Sep 12 15:43:36 2018 Use TaskShutdownBehavior::BLOCK_SHUTDOWN in CRWCertVerificationController BLOCK_SHUTDOWN will ensure that CRWCertVerificationController completion handler are always called on shutdown. WKWebView throws an exception if completionHandler is not called, which leads to background crash. Bug: 853774 Cq-Include-Trybots: luci.chromium.try:ios-simulator-cronet;luci.chromium.try:ios-simulator-full-configs Change-Id: I875319ce84b3f6ed5e1e67059381337b97cc182d Reviewed-on: https://chromium-review.googlesource.com/1213509 Reviewed-by: Sylvain Defresne <sdefresne@chromium.org> Commit-Queue: Eugene But <eugenebut@chromium.org> Cr-Commit-Position: refs/heads/master@{#590694} [modify] https://crrev.com/f496a4bd2ac12e8f57541b13365f1624f2dc00db/ios/web/net/crw_cert_verification_controller.mm
,
Sep 19
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/203f9c7e70c3b5660b43dd30244e91181173222c commit 203f9c7e70c3b5660b43dd30244e91181173222c Author: Eugene But <eugenebut@google.com> Date: Wed Sep 19 19:34:13 2018 Replace dispatch_async with base::PostTaskWithTraits. TaskShutdownBehavior::BLOCK_SHUTDOWN is used to make sure that completion handler is always called on shutdown. Bug: 853774 Cq-Include-Trybots: luci.chromium.try:ios-simulator-cronet;luci.chromium.try:ios-simulator-full-configs Change-Id: Ief6560419f79055ce9a8d0de47806db5f3f9c9d4 Reviewed-on: https://chromium-review.googlesource.com/1226195 Reviewed-by: Sylvain Defresne <sdefresne@chromium.org> Commit-Queue: Eugene But <eugenebut@chromium.org> Cr-Commit-Position: refs/heads/master@{#592517} [modify] https://crrev.com/203f9c7e70c3b5660b43dd30244e91181173222c/ios/web/net/crw_cert_verification_controller.mm [modify] https://crrev.com/203f9c7e70c3b5660b43dd30244e91181173222c/ios/web/public/test/web_test_with_web_state.mm
,
Sep 19
,
Dec 13
We had to go back to GCD API in crrev.com/c/1352347 to ensure that WKWebView callbacks are called on shutdown. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by eugene...@chromium.org
, Jun 18 2018