New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 830917 link

Starred by 6 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Oct 26
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

DNS over HTTP: Potential use-after-free.

Project Member Reported by mmenke@chromium.org, Apr 9 2018

Issue description

When DNS over HTTP is enabled, the following probably happens (I have not yet been able to verify it - wanted a test before filing this bug, to very, but it's tricky to do, in a unit test):

1)  Original HTTP socket request comes in to the socket pool.
2)  We create a new, empty group, if one didn't already exist for the origin+privacy mode pair.
3)  We create a ConnectJob and start it.
4)  The ConnectJob starts a DNS lookup.
5)  When using HTTP over DNS, this re-enters the socket pool with a new socket request.
6)  Before looking for sockets that can be reused, the socket pool looks through empty groups and closes them.  We destroy the aforementioned new group here, as it's still empty (We only add to it once we get back the synchronous net::Error from the ConnectJob).
7)  Then when the DNS lookup returns ERR_IO_PENDING to the outer connect job, the socket pool tries to add the ConnectJob to the group that has already been deleted, resulting in a crash.

The simplest solution here is probably just to start DNS HTTP requests asynchronously.  Testing this is going to be a bit more difficult, I think.

[lassey]:  Do you have time to take this on, or should I do it?
 
I merged the HTTPATTEMPT to chromium_56, then i can reproduce this issue easily, but i haven't tried it with the latest code, and it's seems very hard to verify it in unit test.

Comment 2 Deleted

I think just enable this feature and apply an doh server even though it's an invalid server, it should can be reproduced.

@lassey @mmenke had you ever try this feature with any doh server? i am building the latest chromium now, if you had  tried this with a doh server, maybe my code is too old.
  
BTW, the RequestSocektInternal may remove the group, so why just check it in 

void ClientSocketPoolBaseHelper::RequestSockets(
    const std::string& group_name,
    const Request& request,
    int num_sockets,
    HttpRequestInfo::RequestMotivation motivation);

not checked in 

int ClientSocketPoolBaseHelper::RequestSocket(
    const std::string& group_name,
    std::unique_ptr<Request> request)
I tried with the latest code, crash not happend.
I tried with the content_shell. then reproduce it. bellow is the log.

29470 F/chromium(13198): [13198:13216:0411/104444.178428:FATAL:client_socket_pool_base.cc(954)] Check failed: group_it != group_map_.end().
29471 F/chromium(13198): #00 0xafd5f645 /data/app/org.chromium.content_shell_apk-1/lib/arm/libbase.cr.so+0x000d2645
29472 F/chromium(13198): #01 0xa116c001 /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x00404001
29473 F/chromium(13198): #02 0xa116a04f /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x0040204f
29474 F/chromium(13198): #03 0xa103349d /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x002cb49d
29475 F/chromium(13198): #04 0xa103348f /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x002cb48f
29476 F/chromium(13198): #05 0xa1035d25 /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x002cdd25
29477 F/chromium(13198): #06 0xa1176491 /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x0040e491
29478 F/chromium(13198): #07 0xa103349d /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x002cb49d
29479 F/chromium(13198): #08 0xa103348f /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x002cb48f
29480 F/chromium(13198): #09 0xa117734d /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x0040f34d


if use https server, the log will be

30009 F/chromium(15171): [15171:15202:0411/105642.700755:FATAL:client_socket_pool_base.cc(1353)] Check failed: unassigned_job_count_ <= jobs_.size() (2703275512 vs. 2702634125)
30010 F/chromium(15171): #00 0xafd5f645 /data/app/org.chromium.content_shell_apk-1/lib/arm/libbase.cr.so+0x000d2645
30011 F/chromium(15171): #01 0xa116c8d1 /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x004048d1
30012 F/chromium(15171): #02 0xa116b133 /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x00403133
30013 F/chromium(15171): #03 0xa116aa37 /data/app/org.chromium.content_shell_apk-1/lib/arm/libnet.cr.so+0x00402a37

Seems the group had been deleted.

Comment 6 by lassey@google.com, Apr 12 2018

> The simplest solution here is probably just to start DNS HTTP requests asynchronously.  Testing this is going to be a bit more difficult, I think.
Perhaps we should check that the empty 
I had another thought that we could add an auto reference count to Group that gets checked by that clean up code in addition to emptiness. That would solve a more general problem of reentrancy in this code.

> [lassey]:  Do you have time to take this on, or should I do it?
I could do the change I propose. Changing it to start requests async seems more involved on the surface though.

Comment 7 by lassey@google.com, Apr 12 2018

After looking at it, I don't think we need to get a reference to that group until we're ready to insert the pending request into it, and not at all if we don't insert it. RequestSocketInternal() will also call GetOrCreateGroup(), but it looks like a ton of that logic would be skipped if we had a GetGroup()) call that returned NULL if none existed.

I put up a CL, LMKWYT

Comment 8 by mmenke@chromium.org, Apr 12 2018

I don't think we want to just add reference counting here (I do agree that we should look into a redesign of how group lifetimes work - this is certainly not the first time we're run into group-related UAFs).

Another option would be to add the job to the group before starting the connect job.  We'd need to pull it out instantly, in the rare sync success case, but it avoids having two paths go the case GetGroup finds a group and the path where it doesn't, though the "CloseOneIdleSocketExceptInGroup" still seems not great (Without a redesign - closing a socket in another pool could have unexpected consequences in the current one, which it would be nice not to have to think about).

Anyhow, I'll look at your CL, but I'm likely to be pretty heavily loaded today, so may not get to it until tomorrow.

Comment 9 by lassey@chromium.org, Apr 12 2018

no rush

Comment 10 by lassey@google.com, Apr 17 2018

Hui, how did you reproduce this crash in content? I'm wondering if your reproducing steps are most of the way to a test case.
I tried with content_shell. bellow is my  changes

+++ b/content/shell/browser/shell_url_request_context_getter.cc
@@ -31,6 +31,10 @@
 #include "net/cookies/cookie_store.h"
 #include "net/dns/host_resolver.h"
 #include "net/dns/mapped_host_resolver.h"
+#include "net/dns/host_resolver.h"
+#include "net/dns/dns_client.h"
+#include "net/dns/dns_config_service.h"
+#include "net/dns/host_resolver_impl.h"
 #include "net/http/http_network_session.h"
 #include "net/net_buildflags.h"
 #include "net/proxy_resolution/proxy_config_service.h"
@@ -67,6 +71,8 @@ class IgnoresCTPolicyEnforcer : public net::CTPolicyEnforcer {
   }
 };
 
+const uint8_t kDNSpod[] = {0x77, 0x1D, 0x1D, 0x1D};
+
 }  // namespace
 
 ShellURLRequestContextGetter::ShellURLRequestContextGetter(
@@ -181,14 +187,16 @@ net::URLRequestContext* ShellURLRequestContextGetter::GetURLRequestContext() {
         ignore_certificate_errors_;
     builder.set_http_network_session_params(network_session_params);
 
-    if (command_line.HasSwitch(network::switches::kHostResolverRules)) {
-      std::unique_ptr<net::MappedHostResolver> mapped_host_resolver(
-          new net::MappedHostResolver(
-              net::HostResolver::CreateDefaultResolver(net_log_)));
-      mapped_host_resolver->SetRulesFromString(command_line.GetSwitchValueASCII(
-          network::switches::kHostResolverRules));
-      builder.set_host_resolver(std::move(mapped_host_resolver));
-    }
+    // if (command_line.HasSwitch(network::switches::kHostResolverRules)) {
+    //   std::unique_ptr<net::MappedHostResolver> mapped_host_resolver(
+    //       new net::MappedHostResolver(
+    //           net::HostResolver::CreateDefaultResolver(net_log_)));
+    //   mapped_host_resolver->SetRulesFromString(command_line.GetSwitchValueASCII(
+    //       network::switches::kHostResolverRules));
+    //   builder.set_host_resolver(std::move(mapped_host_resolver));
+    // }
+    builder.set_host_resolver(
+          net::HostResolver::CreateDefaultResolver(nullptr));
 
     // Keep ProtocolHandlers added in sync with
     // ShellContentBrowserClient::IsHandledURL().
@@ -223,6 +231,23 @@ net::URLRequestContext* ShellURLRequestContextGetter::GetURLRequestContext() {
     url_request_context_ = builder.Build();
   }
 
+  net::HostResolverImpl*
+      resolver =  reinterpret_cast<net::HostResolverImpl*>(
+                      url_request_context_->host_resolver());
+  resolver->SetDnsClientEnabled(true);
+  std::unique_ptr<net::DnsClient> custom_client =
+      net::DnsClient::CreateClient(nullptr);
+  net::DnsConfig dns_config;
+  dns_config.nameservers.push_back(net::IPEndPoint(net::IPAddress(kDNSpod), 53));
+  dns_config.use_local_ipv6 = false;
+  dns_config.attempts = 1;
+  dns_config.dns_over_https_servers.push_back(
+      net::DnsConfig::DnsOverHttpsServerConfig(
+          GURL("http://119.29.29.29/d?dn="), false));
+  custom_client->SetConfig(dns_config);
+  resolver->SetDnsClient(std::move(custom_client));
+  resolver->SetRequestContext(url_request_context_.get());
+
   return url_request_context_.get();
 }
 
(END)


I think maybe the test case just like the URLRequest_unitest. 

Apply the DOH to URLRequestContext's resolver then start serval URLRequests, if the DOH server is invalid then should try UDPAttempt after the HTTPAttempt failed. but it crashed due to the group-related UAFs.

Comment 12 by lassey@google.com, Apr 18 2018

Cc: wanghui0...@gmail.com

Comment 13 Deleted

I tried write an net_unitest to reproduce this issue, but found the dns_client will be override by ScopedDefaultHostResolverProc and it's hard to reproduce it
when i create one URLRequest from the URLRequestContext which had been customed. we should create several requests  then it can be reproduced.

Comment 15 by lassey@google.com, Apr 18 2018

If you have a unit test that can reproduce it even sporadically, that would be useful. Can you post a patch for that?
I will try it again today
I tried it with a http server. haven't reproduce with https server now. 
This patch contains my changes to repair this bug.  please have a look too.
if i add the doh server "https://119.29.29.29" then i can reproduce it with content_shell, but cann't reproduce with this test, if set the doh server to "https://dns.google.com" then it cann't be reprodued.
Project Member

Comment 20 by bugdroid1@chromium.org, May 29 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b591d330308729fe790f1893f7896da7c5e59d7b

commit b591d330308729fe790f1893f7896da7c5e59d7b
Author: Brad Lassey <lassey@chromium.org>
Date: Tue May 29 16:08:51 2018

Don't call GetOrCreateGroup until we need a group

Bug:  830917 
Change-Id: I52deae4c68054b0dcef62b18e7b263830b438162
Reviewed-on: https://chromium-review.googlesource.com/1010507
Commit-Queue: Brad Lassey <lassey@chromium.org>
Reviewed-by: Matt Menke <mmenke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#562448}
[modify] https://crrev.com/b591d330308729fe790f1893f7896da7c5e59d7b/net/BUILD.gn
[modify] https://crrev.com/b591d330308729fe790f1893f7896da7c5e59d7b/net/socket/client_socket_pool_base.cc
[add] https://crrev.com/b591d330308729fe790f1893f7896da7c5e59d7b/net/url_request/http_with_dns_over_https_unittest.cc

Project Member

Comment 21 by bugdroid1@chromium.org, May 29 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c4bbe637aeafcef7e5cc44ce671d63898cd3f2b7

commit c4bbe637aeafcef7e5cc44ce671d63898cd3f2b7
Author: Findit <findit-for-me@appspot.gserviceaccount.com>
Date: Tue May 29 20:28:07 2018

Revert "Don't call GetOrCreateGroup until we need a group"

This reverts commit b591d330308729fe790f1893f7896da7c5e59d7b.

Reason for revert:

Findit (https://goo.gl/kROfz5) identified CL at revision 562448 as the
culprit for failures in the build cycles as shown on:
https://findit-for-me.appspot.com/waterfall/culprit?key=ag9zfmZpbmRpdC1mb3ItbWVyRAsSDVdmU3VzcGVjdGVkQ0wiMWNocm9taXVtL2I1OTFkMzMwMzA4NzI5ZmU3OTBmMTg5M2Y3ODk2ZGE3YzVlNTlkN2IM

Sample Failed Build: https://ci.chromium.org/buildbot/chromium.memory/Linux%20MSan%20Tests/10089

Sample Failed Step: net_unittests

Original change's description:
> Don't call GetOrCreateGroup until we need a group
> 
> Bug:  830917 
> Change-Id: I52deae4c68054b0dcef62b18e7b263830b438162
> Reviewed-on: https://chromium-review.googlesource.com/1010507
> Commit-Queue: Brad Lassey <lassey@chromium.org>
> Reviewed-by: Matt Menke <mmenke@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#562448}

Change-Id: I3064c71c087498a908ab9380b560290a3f38b49a
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  830917 
Reviewed-on: https://chromium-review.googlesource.com/1077033
Cr-Commit-Position: refs/heads/master@{#562569}
[modify] https://crrev.com/c4bbe637aeafcef7e5cc44ce671d63898cd3f2b7/net/BUILD.gn
[modify] https://crrev.com/c4bbe637aeafcef7e5cc44ce671d63898cd3f2b7/net/socket/client_socket_pool_base.cc
[delete] https://crrev.com/0d2376d6e43bfbefecdc5028d0a91047a9c4480e/net/url_request/http_with_dns_over_https_unittest.cc

The group was deleted due to it's empty.
Why not add this job into the group just
behind the socket's connect was involved?
Project Member

Comment 23 by bugdroid1@chromium.org, May 31 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d44bd427b3b457dc41152b26b343b1ea9f86aba7

commit d44bd427b3b457dc41152b26b343b1ea9f86aba7
Author: Brad Lassey <lassey@chromium.org>
Date: Thu May 31 22:45:23 2018

Reland "Don't call GetOrCreateGroup until we need a group"

This is a reland of b591d330308729fe790f1893f7896da7c5e59d7b

Original change's description:
> Don't call GetOrCreateGroup until we need a group
> 
> Bug:  830917 
> Change-Id: I52deae4c68054b0dcef62b18e7b263830b438162
> Reviewed-on: https://chromium-review.googlesource.com/1010507
> Commit-Queue: Brad Lassey <lassey@chromium.org>
> Reviewed-by: Matt Menke <mmenke@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#562448}

Bug:  830917 
Change-Id: Ib998555ec3840434b35d7b5c491f514aed966a11
Reviewed-on: https://chromium-review.googlesource.com/1080587
Reviewed-by: Matt Menke <mmenke@chromium.org>
Commit-Queue: Brad Lassey <lassey@chromium.org>
Cr-Commit-Position: refs/heads/master@{#563411}
[modify] https://crrev.com/d44bd427b3b457dc41152b26b343b1ea9f86aba7/net/BUILD.gn
[modify] https://crrev.com/d44bd427b3b457dc41152b26b343b1ea9f86aba7/net/socket/client_socket_pool_base.cc
[add] https://crrev.com/d44bd427b3b457dc41152b26b343b1ea9f86aba7/net/url_request/http_with_dns_over_https_unittest.cc

Status: Fixed (was: Untriaged)

Sign in to add a comment