
Browser memory can grow up to 1GB because of HttpStreamFactoryImpl::JobController leak

Reported by ssid@chromium.org (Project Member), Jan 5 2017

Issue description

The browser process on my Linux (Google on-corp) machine grows to more than 1GB over 5 days. Collecting a heap profile snapshot of browser memory shows that 500MiB of allocations are being made by the net/ codebase.

The main allocation sites are:
net::(anonymous namespace)::DefaultJobFactory::CreateJob()
net::HttpStreamFactoryImpl::Job::Job()
These ClientSocketHandle objects hold read and write socket buffers and certificates.

For more details on the experiment and trace files, refer to this doc:
https://docs.google.com/document/d/1XKx4-AMBxPxfJ0jDlBlqiWGNbl6-AFNvNceLvlx4mvM

An allocation stack trace from a different run is attached here. The allocations grow to 500MiB if the browser is left running for days.
 
Attachment: long_running_net5.png (388 KB)

Comment 1 by ssid@chromium.org, Jan 5 2017

Cc: cbentzel@chromium.org rsleevi@chromium.org mariakho...@chromium.org dskiba@chromium.org xunji...@chromium.org

Comment 2 by ssid@chromium.org, Jan 5 2017

On chat with xunjieli@:
The trace shows only 16 url requests, but has 21000 HttpStreamFactoryImpl Jobs (inferred from the number of new calls made from DefaultJobFactory::CreateJob).
It's possible that preconnects are misbehaving and holding on to those jobs; preconnects create Jobs directly, bypassing the URLRequest path.

Comment 3 by ssid@chromium.org, Jan 5 2017

Cc: primiano@chromium.org mmenke@chromium.org

Comment 4 by ssid@chromium.org, Jan 5 2017

Cc: ricea@chromium.org
Is it 21000 HttpStreamFactoryImpl Jobs, or 21000 allocations from Job's constructor (i.e., potentially 21000 ClientSocketHandles)? It seems like if we have both Jobs and ClientSocketHandles, they'd have two slightly different allocation stacks, no?
(And two different allocation sizes, presumably)

Comment 7 by ssid@chromium.org, Jan 5 2017

Looking at the screenshot above, the allocation count at "ShimCppNew" directly under CreateJob() shows that there are 21000 new/malloc calls made directly from CreateJob(). There are 167000 new/malloc calls made directly or indirectly from CreateJob(); of those, 146000 come directly or indirectly from HttpStreamFactoryImpl::Job::Job().


And per comment on the doc, it's possible that these are mostly/all ClientSocketHandles.

Comment 9 by ssid@chromium.org, Jan 5 2017

Cc: rdsmith@chromium.org

Comment 10 by ssid@chromium.org, Jan 5 2017

I just added a log every time a new Job is created, and the number keeps increasing: the size of HttpStreamFactoryImpl::job_controller_set_ reached 3000 in the past 30 minutes, just from leaving the browser open with 10 tabs.
One thing I don't understand is that net/ MemoryDumpProvider shows there are only 16 active URLRequests. It can't be that we have thousands of undestroyed ClientSocketHandles -- who owns them if not active URLRequests?

In case anyone is wondering where the "16" URL request count comes from: once you click on the "net" category, you can see that the main URLRequestContext has 16 active requests (screenshot attached).


Attachment: Screen Shot 2017-01-05 at 6.22.05 PM.png (127 KB)
Cc: mge...@chromium.org rch@chromium.org
[ricea]:  We think WebSockets are off the hook (Or at least the problem is strictly at the HttpStreamFactoryImpl layer).

[+mgersh, +rch] Randy and I have been eyeballing the traces, and we think that since we're seeing both an allocation from the layer above the Job's constructor, and allocations from the Job's constructor on the stack, we're leaking the Job as well as the ClientSocketHandle.

So the current theory is that what we're seeing are orphaned Jobs owned by their JobControllers (which are owned by the HttpStreamFactoryImpl).  QUIC ("alternate") and non-QUIC ("main") Jobs use slightly different constructors.  Looking at the call stack of the leak, these look like "main" jobs that are leaking.

There is a connection timeout, so unclear what state these are really in (Connected?  Connecting?  Waiting for [proxy] auth?  Waiting for client auth?  Cert errors?  Disconnected?  Florida?)

Comment 13 by ssid@chromium.org, Jan 5 2017

If it's of any use:
I can cook up a memory dump provider for HttpStreamFactoryImpl, add all these details, and run the browser again. This should give us the state of the jobs, how many HttpStreamFactoryImpl instances are around, and how many jobs each of them holds.
I think that would be a useful confirmation of the current set of beliefs.
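
(For anyone following along, here is a minimal sketch of what such instrumentation could look like, assuming the standard base::trace_event::MemoryDumpProvider interface. The dump name "net/http_stream_factory" and the job_controller_count_for_testing() accessor are illustrative assumptions, not the actual net/ code.)
--------------------------------------------
#include "base/trace_event/memory_allocator_dump.h"
#include "base/trace_event/memory_dump_provider.h"
#include "base/trace_event/process_memory_dump.h"
#include "net/http/http_stream_factory_impl.h"

// Sketch of a dump provider reporting how many JobControllers are alive.
// job_controller_count_for_testing() is a hypothetical accessor; the real
// factory would need to expose such a count. The provider would also need
// to be registered with base::trace_event::MemoryDumpManager.
class HttpStreamFactoryDumpProvider
    : public base::trace_event::MemoryDumpProvider {
 public:
  explicit HttpStreamFactoryDumpProvider(net::HttpStreamFactoryImpl* factory)
      : factory_(factory) {}

  bool OnMemoryDump(const base::trace_event::MemoryDumpArgs& args,
                    base::trace_event::ProcessMemoryDump* pmd) override {
    base::trace_event::MemoryAllocatorDump* dump =
        pmd->CreateAllocatorDump("net/http_stream_factory");
    // Report the live JobController count so memory-infra traces show it.
    dump->AddScalar(base::trace_event::MemoryAllocatorDump::kNameObjectCount,
                    base::trace_event::MemoryAllocatorDump::kUnitsObjects,
                    factory_->job_controller_count_for_testing());
    return true;
  }

 private:
  net::HttpStreamFactoryImpl* factory_;
};
--------------------------------------------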
Cc: zhongyi@chromium.org
[+zhongyi]

[mgersh/zhongyi], sorry - I just realized it was zhongyi rather than mgersh who wrote http_stream_factory_impl_job_controller.cc, which is around where this issue seems to be.
Cc: -mge...@chromium.org
I added some logging to net/ MemoryDumpProvider this morning. The leaked jobs are indeed main jobs. These leaked main jobs are all blocked. I will continue digging.

--------------------------------------------
  // True if the main job has to wait for the alternative job: i.e., the main
  // job must not create a connection until it is resumed.
  bool main_job_is_blocked_;
--------------------------------------------
These main jobs are all in next_state_ = STATE_WAIT_COMPLETE.
Components: -Internals>Network Internals>Network>QUIC
That means they're waiting for the alternate job... which presumably already succeeded, and thus orphaned them. Adding the QUIC label (since the orphan job logic exists just for that, I think?).
Components: Internals>Network>HTTP2
The ones that I saw are all non-QUIC, so I think this (also) applies to H2.
Do you mean the main jobs or the alternate jobs?  It's the alts that are QUIC.
The alternate jobs are non-QUIC (presumably H2). I think I might know what is missing. Let me write a regression unit test.
I made an attempt to fix this: https://codereview.chromium.org/2619583002/
Suggestions are welcome.
Willing to summarize the root cause of the bug?  I couldn't quite figure it out from the CL.

Cc: -xunji...@chromium.org
Owner: xunji...@chromium.org
Status: Assigned (was: Untriaged)
Here's a quick description:

Some servers advertise an alternate server to connect to (QUIC and/or H2 at another domain/port).  When this happens and we want to connect to one, we create two Jobs: one for the actual HTTP URL the request is for, and one for the alternative server (more than one alternate can be advertised, I believe, but we only connect to one of them).

When one of the two connects, we no longer need the other. However, we still want to keep the other one around until it connects, so that if HTTP connects first we still establish a QUIC connection we can use in the future (not sure we really care about the case where non-QUIC connects first, but we do keep the jobs around in that case). We also have logic that the main job won't start until we know the alternate job won't complete synchronously. For this bug, the alternate job completes synchronously, so the main job never starts, yet we still keep the main job around waiting for it to complete.

Furthermore, it looks to me like there may be a Job leak even if the main job does complete successfully, if that happens after the alt job has been destroyed.
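
(To make the failure mode concrete, here is a deliberately simplified, standalone model of the two-job flow described above -- not the real Chromium classes -- showing how a synchronously succeeding alt job leaves the blocked main job, and therefore its JobController, alive indefinitely.)
--------------------------------------------
#include <iostream>
#include <memory>

// Simplified model of the Job/JobController relationship; names mirror the
// real classes but the logic is reduced to the scenario discussed above.
struct Job {
  bool blocked = false;  // analogous to main_job_is_blocked_
  bool started = false;
};

struct JobController {
  std::unique_ptr<Job> main_job = std::make_unique<Job>();
  std::unique_ptr<Job> alternative_job = std::make_unique<Job>();

  void Start() {
    // The main job must wait until we know the alt job won't complete
    // synchronously.
    main_job->blocked = true;
    const bool alt_succeeded_synchronously = true;  // the case hit in this bug
    if (alt_succeeded_synchronously)
      OnAltJobSucceeded();
  }

  void OnAltJobSucceeded() {
    // Pre-fix behavior: only the alt job is destroyed. The blocked main job
    // never started and never completes, so the controller is never told that
    // both jobs are done and is never cleaned up.
    alternative_job.reset();
    // Missing step: if (main_job && main_job->blocked) main_job.reset();
  }

  bool CanBeDeleted() const { return !main_job && !alternative_job; }
};

int main() {
  JobController controller;
  controller.Start();
  // Prints 0: the blocked main job keeps the controller alive indefinitely.
  std::cout << "controller deletable: " << controller.CanBeDeleted() << "\n";
}
--------------------------------------------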
Matt is spot on. I ignored the case where we do want to keep alt jobs around even if main jobs have succeeded. 

I talked to Matt offline and will upload a second patch. rch@ and zhongyi@ are out. I will check with them on Monday.
As far as merging is concerned:  We may want to merge the fix to M56, given the potential size of the leak, but I don't think we want to merge it to M55, unless we have reason to think it's high impact.
Summary: Browser memory can grow up to 1GB because of HttpStreamFactoryImpl::JobController leak (was: Browser memory can grow up to 1GB because of network sockets: ClientSocketHandle)
+1 on the merge to M56. Huge kudos for tracking this down.

I'd also be happy merging to M55 (if it's low technical risk) in case we happen to re-spin stable for another reason, but we shouldn't re-push just for this.

How long do we think this issue has existed?
My guess is it's been around at least since HttpStreamFactoryImpl::JobController was added, back in June, which would be M53.  The old code may have had this same issue, though I think it would have been noticed before if it had been around for 6+ months before that.

Comment 32 by ssid@chromium.org, Jan 6 2017

@xunjieli Can you also write a memory dump provider for HttpStreamFactoryImpl so that we keep track of such regressions in future?
Actually, I'm almost certain this broke in 407846 (https://codereview.chromium.org/1952423002), which would be M54, not M53.
Labels: -OS-Linux OS-All
Status: Started (was: Assigned)
Thanks Matt for the ideas and helpful discussion. So far I have been able to identify exactly one scenario for the leak. To confirm that we aren't leaking when an orphaned job completes, I've added a unit test.

Let me know if you spot any other potential leak in https://codereview.chromium.org/2619583002/.

Re #32: SG, I will upload a CL once this bug is fixed. It won't be a separate dump provider, but rather an allocator dump in the existing net/ MDP.

Comment 35 by w...@chromium.org, Jan 7 2017

Cc: brucedaw...@chromium.org
Nice work!

I analyzed my Windows VirtualFree shutdown trace, from a browser process with a 1.1 GB private working set, and it shows similar results. I thought I'd share my results in case any of the memory I'm seeing is not explained by the JobController leak.

My numbers are all approximate because I could only track calls to VirtualFree, which happen when an entire heap segment is freed. This makes attribution imprecise, but when the number of frees and the amount of memory are large enough, the results should be meaningful. I tried to simplify some of the template function names in the call stacks below - apologies if I mangled any of them.

I see ~270 MB of memory freed by the JobController destructor. It looked like the message_center::Notification tree probably had ~1-4 thousand objects total, based on the depth of the free call trees.


It looked like ~27 MB of std::string objects were freed by GURL destructors called from here:
chrome.dll!std::unique_ptr<ExtensionWebRequestTimeTracker>::~unique_ptr<ExtensionWebRequestTimeTracker>
chrome.dll!base::Singleton<extensions::ExtensionWebRequestEventRouter>::OnExit


About 26 MB of mojo data was closed along this call stack:
chrome.dll!mojo::internal::MultiplexRouter::CloseMessagePipe
chrome.dll!mojo::internal::MultiplexedBindingState::Close
chrome.dll!shell::InterfaceRegistry::`vector deleting destructor'
chrome.dll!std::vector<std::unique_ptr<net::QuicChromiumPacketReader>>::~vector<>
chrome.dll!shell::ServiceContext::`vector deleting destructor'
chrome.dll!content::ServiceManagerConnectionImpl::IOThreadContext::ShutDownOnIOThread


and another ~13 MB of mojo data from:
mojo::edk::`anonymous namespace'::ThreadDestructionObserver::WillDestroyCurrentMessageLoop


The tooling for analyzing ETW traces of VirtualFree is horrible so I had to hack together some scripts instead, which may be why I can account for only about a third of the memory. I can drill in deeper, and can probably get heap free information, if it seems likely that more information would be helpful.

Hrm... I can't see how any of those three callstacks could be related to the HttpStreamFactoryImpl::JobController leak - the leak is deep in the bowels of the network stack, and I don't think any of those look like they're either deleting an HttpStreamFactoryImpl or have the ability to stay alive based on the existence of an HttpStreamFactoryImpl::JobController, though I didn't spend much time digging into them.

Comment 38 by ssid@chromium.org, Jan 9 2017

Thanks for investigating the issue. It looks like the CL https://codereview.chromium.org/2619583002 fixes the major leak :)
I have only tried running the browser with the first patch. The browser, running for 2.5 days with the exact same tabs as in comment #0, is just 250MB in size (that's 500MB less!). Good work!
The memory allocated by the functions
net::(anonymous namespace)::DefaultJobFactory::CreateJob()
net::HttpStreamFactoryImpl::Job::Job()
has still increased by 10MB, growing from 12 to 22 in 3 days. The growth is also very steady, suggesting it could be a leak.
Is this still expected, or should I be using the latest patch in the CL?

Trace here: https://drive.google.com/open?id=0B8FmUrPL9B_CaE5hUEJacVJoaW8

Note: There are other similar memory bloats observed in components like cc/. I will file different bugs for these.
Cc: haraken@chromium.org tasak@google.com bashi@chromium.org picksi@chromium.org sque@chromium.org bccheng@chromium.org chrisha@chromium.org hpayer@chromium.org
 Issue 668682  has been merged into this issue.
Thanks everybody for the collaboration on this, you all rock.
I have a question here: if we had slow-reports (or anything else that takes data from memory-infra) enabled on desktop AND the net MDPs that have been (or are being) developed were already there, would we have caught this sooner?

In other words, if this issue happens again 6 months from now, would the memory-infra probes that the net team has added be enough to narrow down the problem?
This is really for my own curiosity, and to learn whether we are shooting in the right direction (and if not, what we need to fix). Essentially, what I am trying to figure out is: was the strategy of adding MDPs to net right but just too late with regard to this issue? If not, what could we have done differently?
Components: -Internals>Network>HTTP2
Re #40: Yes, I believe so. The current net MemoryDumpProvider helped narrow down the issue: it showed 16 active URLRequests. That, combined with ssid@'s manual instrumentation (native heap profiling) showing thousands of unfreed ClientSocketHandle allocations, helped find the cause. Slow-reports would tell us something is wrong, since this leak is pretty big; even though the pseudo stack from slow-reports is coarse, at least we would have something.

-----------------------
This is specifically a QUIC issue (What I said in #22 is incorrect). Removing Http2 label. The corresponding alt jobs of the leaked main jobs are QUIC jobs which reused an existing QUIC session.
(And net MDP shows that the socket pools and SSL session cache are not the ones to blame for ClientSocketHandles. I can work on a postmortem so this can be documented)

Comment 43 by ssid@chromium.org, Jan 9 2017

Sorry, what do you mean by "pseudo stack from Slow reports"? We do not have heap profiling turned on for field trials.
So slow-reports only have MDP data? I thought they had pseudo stacks from trace events.

Comment 45 by ssid@chromium.org, Jan 9 2017

No, it is expensive to enable heap profiling in the field right now due to perf regressions. We have plans for the future.
Cc: shrike@chromium.org
Is this a regression only in ToT or does it occur pre-M57?

Labels: -Type-Bug Type-Bug-Regression
For the record, here is the location where this regressed, in 5489ba4e6d4fa5ef7be7b8bbb8521f75addab7f5. Thanks mmenke@ for pinpointing the exact location.

Whenever we have two jobs (a main job and an alt job), one of them will be orphaned through Job::Orphan(). We used to cancel the main job when it was being orphaned and we found it was still blocked. In the new code, this piece of logic is missing:

 void HttpStreamFactoryImpl::Job::Orphan() {
   net_log_.AddEvent(NetLog::TYPE_HTTP_STREAM_JOB_ORPHANED);
-  if (blocking_job_) {
-    // We've been orphaned, but there's a job we're blocked on. Don't bother
-    // racing, just cancel ourself.
-    DCHECK(blocking_job_->waiting_job_);
-    blocking_job_->waiting_job_ = NULL;
-    blocking_job_ = NULL;
-    if (delegate_->for_websockets() && connection_ && connection_->socket()) {
-      connection_->socket()->Disconnect();
-    }
-    delegate_->OnOrphanedJobComplete(this);
-  } else if (delegate_->for_websockets()) {

This affects 54.0.2809.0 and newer.
Worth noting that the fix for the leak includes the fix for another related leak that I think was introduced in M53.  Not sure how bad the other leak is alone, though (with both leaks present, the M54 leak may partially mask the M53 one; I don't think it's worth digging into more detail, since we have a fix for both).
Re #38: ssid@, thanks for sharing the new traces! It would be great to know what the 22 Jobs are doing. Could you apply this temp CL https://codereview.chromium.org/2619403002 to see if we can get any info on the states of these Jobs?
Though #668682 was merged into this one, I'm wondering if they are, in fact, the same.  My experience in #668682 is a period of very rapid memory growth, leading to swapping (or OOM), followed by a dramatic drop in memory usage.  I'm not sure of the best way to profile/debug that to even compare it to the traces being discussed here.

Comment 52 by ssid@chromium.org, Jan 10 2017

Cc: erikc...@chromium.org
 Issue 674657  has been merged into this issue.

Comment 53 by bugdroid1@chromium.org, Jan 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e368398a3a5357a8742f3829ef08cc45080fb808

commit e368398a3a5357a8742f3829ef08cc45080fb808
Author: xunjieli <xunjieli@chromium.org>
Date: Tue Jan 10 12:14:48 2017

Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete

Background: When a server advertises alternative services support,
JobController creates two Jobs (|main_job_| and |alternative_job_|).
The controller blocks |main_job_| when connecting the
|alternative_job_|. It resumes the |main_job_| when |alternative_job_|
fails or when the timeout happens. When both jobs are destroyed, the
JobController goes away.

Bug: We currently leak JobController in the situation where
|alternative_job_| succeeds and |main_job_| is blocked on alt job.
OnRequestComplete() will be called which then destroys the alt job but
not the main job. This causes MaybeNotifyFactoryOfCompletion() to not
clean up the controller.

Fix: This CL fixes this leak by resetting |main_job_| if it is blocked
(i.e. hasn't started connecting) when |alternative_job_| succeeds. This
CL adds a regression test for this.

Note that there is no leak when orphaned job completes, because
"job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a
|!request_| check so it's obvious that we always clean up controller in
the orphaned case. This CL adds a test to confirm that we did clean up
controller when orphaned job completes (which passes with and without
the change).

BUG= 678768 

Review-Url: https://codereview.chromium.org/2619583002
Cr-Commit-Position: refs/heads/master@{#442556}

[modify] https://crrev.com/e368398a3a5357a8742f3829ef08cc45080fb808/net/http/http_basic_stream.cc
[modify] https://crrev.com/e368398a3a5357a8742f3829ef08cc45080fb808/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/e368398a3a5357a8742f3829ef08cc45080fb808/net/http/http_stream_factory_impl_job_controller.h
[modify] https://crrev.com/e368398a3a5357a8742f3829ef08cc45080fb808/net/http/http_stream_factory_impl_job_controller_unittest.cc
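
(In terms of the simplified standalone model sketched earlier in this thread, the shape of the fix described in this commit amounts to the reset that was previously missing. This is a sketch under that model's assumptions, not the actual CL.)
--------------------------------------------
#include <memory>

// Continuation of the simplified model: when the alternative job succeeds
// while the main job is still blocked (it never started connecting), the main
// job is dropped too, so the controller can be torn down once both jobs are
// gone.
struct Job {
  bool blocked = false;
  bool started = false;
};

struct JobController {
  std::unique_ptr<Job> main_job = std::make_unique<Job>();
  std::unique_ptr<Job> alternative_job = std::make_unique<Job>();

  void OnAltJobSucceeded() {
    alternative_job.reset();
    if (main_job && main_job->blocked && !main_job->started)
      main_job.reset();  // the reset that was previously missing
    MaybeNotifyFactoryOfCompletion();
  }

  void MaybeNotifyFactoryOfCompletion() {
    // With both jobs gone, the factory would now delete this controller.
  }
};

int main() {
  JobController controller;
  controller.main_job->blocked = true;  // blocked while the alt job connects
  controller.OnAltJobSucceeded();       // now also releases the main job
}
--------------------------------------------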

Labels: Merge-Request-56
Status: Fixed (was: Started)
Requesting merge into M56. Thanks!
Issue 677191 has been merged into this issue.

Comment 56 by ssid@chromium.org, Jan 10 2017

Can we keep this bug open until I can verify that the 10MB increase I mentioned in comment #38 is OK? I will run the test again with your CLs and the new MDP.
Status: Started (was: Fixed)

Comment 58 by sheriffbot@chromium.org, Jan 11 2017

Labels: -Merge-Request-56 Hotlist-Merge-Approved Merge-Approved-56
Your change meets the bar and is auto-approved for M56. Please go ahead and merge the CL manually. Please contact milestone owner if you have questions.
Owners: amineer@(clank), cmasso@(bling), gkihumba@(cros), bustamante@(desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 59 by bugdroid1@chromium.org, Jan 11 2017

Labels: -merge-approved-56 merge-merged-2924
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e

commit 1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e
Author: xunjieli <xunjieli@chromium.org>
Date: Wed Jan 11 16:13:23 2017

Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete

Background: When a server advertises alternative services support,
JobController creates two Jobs (|main_job_| and |alternative_job_|).
The controller blocks |main_job_| when connecting the
|alternative_job_|. It resumes the |main_job_| when |alternative_job_|
fails or when the timeout happens. When both jobs are destroyed, the
JobController goes away.

Bug: We currently leak JobController in the situation where
|alternative_job_| succeeds and |main_job_| is blocked on alt job.
OnRequestComplete() will be called which then destroys the alt job but
not the main job. This causes MaybeNotifyFactoryOfCompletion() to not
clean up the controller.

Fix: This CL fixes this leak by resetting |main_job_| if it is blocked
(i.e. hasn't started connecting) when |alternative_job_| succeeds. This
CL adds a regression test for this.

Note that there is no leak when orphaned job completes, because
"job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a
|!request_| check so it's obvious that we always clean up controller in
the orphaned case. This CL adds a test to confirm that we did clean up
controller when orphaned job completes (which passes with and without
the change).

TBR=zhongyi@chromium.org
TBR=mmenke@chromium.org

NOTRY=true
NOPRESUBMIT=true

BUG= 678768 

Review-Url: https://codereview.chromium.org/2619583002
Cr-Commit-Position: refs/heads/master@{#442556}
(cherry picked from commit e368398a3a5357a8742f3829ef08cc45080fb808)

Review-Url: https://codereview.chromium.org/2622193003
Cr-Commit-Position: refs/branch-heads/2924@{#733}
Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059}

[modify] https://crrev.com/1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e/net/http/http_basic_stream.cc
[modify] https://crrev.com/1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e/net/http/http_stream_factory_impl_job_controller.h
[modify] https://crrev.com/1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e/net/http/http_stream_factory_impl_job_controller_unittest.cc

Cc: tdres...@chromium.org vmi...@chromium.org csharrison@chromium.org pauljensen@chromium.org xunji...@chromium.org
Issue 678327 has been merged into this issue.
Pity, we missed the beta push by one day.
The last beta was cut yesterday, a few revisions behind the cherry-pick.

On the plus side, the next dev and beta pushes are in one week. I checked UMA, and even just the dev channel population seems enough to show the bug with 1-day aggregation: go/ptken

So, ideally, one week from now we should be able to open that same UMA link and see the two histograms on par.
I'm wondering why ChromeOS's leak detector didn't catch this. Simon, any clues?
I don't know anything about ChromeOS's leak detector, but note that the memory isn't technically leaked - it's still owned; it's just that the objects aren't doing anything and aren't destroyed until the profile is torn down.
As I understand it, the leak detector should've caught this - it monitors allocation counts per allocation size, and a significant increase in allocations of the same size is deemed a leak and triggers stack frame collection, which is then sent home and investigated.
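
(Purely as an illustration of that heuristic -- not the actual ChromeOS implementation -- here is a minimal sketch that buckets live allocations by size and flags a bucket whose live count keeps rising across consecutive samples.)
--------------------------------------------
#include <cstddef>
#include <map>

// Toy version of the per-size heuristic described above: if the number of
// live allocations of a given size keeps rising across consecutive samples,
// treat that size as a suspected leak (the real detector would then start
// collecting call stacks for allocations of that size).
class SizeBucketLeakDetector {
 public:
  void OnAlloc(std::size_t size) { ++live_[size]; }
  void OnFree(std::size_t size) { --live_[size]; }

  // Called periodically per size bucket; returns true once the live count for
  // |size| has risen for |threshold| consecutive samples.
  bool SampleLooksLeaky(std::size_t size, int threshold = 5) {
    const int count = live_[size];
    int& streak = rising_streak_[size];
    streak = (count > last_count_[size]) ? streak + 1 : 0;
    last_count_[size] = count;
    return streak >= threshold;
  }

 private:
  std::map<std::size_t, int> live_;
  std::map<std::size_t, int> last_count_;
  std::map<std::size_t, int> rising_streak_;
};
--------------------------------------------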

Comment 65 by w...@chromium.org, Jan 11 2017

Re #64: ChromeOS stack symbolization in crash reports was broken for quite some time last year - might the leak detector collection have been similarly affected?

Comment 66 by sque@chromium.org, Jan 11 2017

Re #38: What bloat? I found a leak using leak detector and it was fixed on Dec 3: https://codereview.chromium.org/2544203003/

Re #62: I did actually come across it in November (see attached screenshot), but I didn't know whether it was a leak or working as intended. There were a lot of leak reports from various call sites, and I thought an HTTP-related call site would require more extensive knowledge of the networking stack, so I prioritized looking at other leaks. We will talk about it at today's memory leak detection meeting. If you're interested in joining, message me for an invite.
Attachment: net::HttpStreamFactoryImpl external.png (244 KB)

Comment 67 by ssid@chromium.org, Jan 12 2017

Re #66: This doc talks about other memory bloats, which I am yet to investigate.
https://docs.google.com/document/d/1XKx4-AMBxPxfJ0jDlBlqiWGNbl6-AFNvNceLvlx4mvM

Comment 68 by bugdroid1@chromium.org, Jan 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/910b9b9050f70a0a8602a8cfcb0ac45b9bcdd723

commit 910b9b9050f70a0a8602a8cfcb0ac45b9bcdd723
Author: zhongyi <zhongyi@chromium.org>
Date: Tue Jan 17 17:55:52 2017

Move the logic to cancel main job in OrphanUnboundJob when the alt job succeed while main job is blocked.

When the alt job succeeds while the main job hasn't, the alt job will be bound the request by HttpStreamFactoryImpl::JobController::BindJob().
And JobController will orphan the main job. If the main job
- is still blocked(i.e, not started yet), it should be cancelled immediately.
- is no longer blocked
  - but is for websocket, it should be cancelled immediately.
  - and is not for websocket, no-op. The job will continue connecting until it finishes. When it's finishing, it will call one of the callbacks and
notifying the JobController of the Job's completion. The JobController will then check whether this job has been orphaned, and calls
OnOrphanedJobComplete to delete the main job.

BUG= 678768 

Review-Url: https://codereview.chromium.org/2625133006
Cr-Commit-Position: refs/heads/master@{#444090}

[modify] https://crrev.com/910b9b9050f70a0a8602a8cfcb0ac45b9bcdd723/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/910b9b9050f70a0a8602a8cfcb0ac45b9bcdd723/net/http/http_stream_factory_impl_job_controller_unittest.cc

The commit in #68 should fix the code completely.

It looks like bugdroid commented the same message multiple times; I will keep watching to make sure the CQ looks good and is not committing duplicates (crbug/432905).
Owner: ssid@chromium.org
Handing this off to ssid@ per comment #56. 
zhongyi@'s change makes the new code match the old behavior more closely. There's no functional change, so it doesn't need to be merged.
The data for dev channel looks quite promising:

TL;DR of QUIC vs Control Group
 - Windows is on par
 - Linux is now even better (I guess a side effect of having fewer connections around, without leaking the HTTP ones?)

UMA links:
Linux Dev channel *BEFORE* the fix #53: go/wispm
Linux Dev channel *AFTER* the fix #53: go/hfewr

Win Dev before: go/zjsef
Win Dev after: go/vaeis

Well rescued ;-)

Comment 74 by w...@chromium.org, Jan 20 2017

Cc: junov@chromium.org danakj@chromium.org afakhry@chromium.org markdavidscott@google.com vmp...@chromium.org amineer@chromium.org ericrk@chromium.org
 Issue 641518  has been merged into this issue.

Comment 75 by ssid@chromium.org, Jan 23 2017

Status: Fixed (was: Started)
I think we can close this bug as per comment #73
Labels: sr-pm-2
Not being a Chrome dev, I don't know enough about the release cycles. How can I figure out which version I need to have to see the improvement?
The temp fix is available in 56.0.2924.60 as xunjieli@ has merged that to M56, which will be rolled out to Stable in the near future. 
