Browser memory can grow up to 1GB because of an HttpStreamFactoryImpl::JobController leak
Issue description: The browser process on my Linux (Google on-corp) machine grows by more than 1GB over 5 days. A heap profile snapshot of browser memory shows that 500MiB of the allocations are made by the net/ codebase. The main allocation sites are:
net::(anonymous namespace)::DefaultJobFactory::CreateJob()
net::HttpStreamFactoryImpl::Job::Job()
These ClientSocketHandle objects hold read and write socket buffers and certificates. For more details on the experiment and trace files, refer to this doc: https://docs.google.com/document/d/1XKx4-AMBxPxfJ0jDlBlqiWGNbl6-AFNvNceLvlx4mvM
An allocation stack trace from a different run is attached here; the allocations grow up to 500MiB if the browser is left running for days.
,
Jan 5 2017
From a chat with xunjieli@: The trace shows only 16 URL requests, but 21000 HttpStreamFactoryImpl Jobs (inferred from the number of new calls made from DefaultJobFactory::CreateJob). It's possible that preconnects are misbehaving -- holding on to those jobs. Preconnects create Jobs directly, bypassing URLRequest.
,
Jan 5 2017
Is it 21000 HttpStreamFactoryImpl Jobs, or 21000 allocations from Job's constructor (i.e., potentially 21000 ClientSocketHandles?). It seems like if we have both Jobs and ClientSocketHandles, they'd have two slightly different stacks for allocation, no?
,
Jan 5 2017
(And two different allocation sizes, presumably)
,
Jan 5 2017
Looking at the screenshot above: there are 21000 new/malloc calls made directly from CreateJob(), based on the allocation count at "ShimCppNew" directly under CreateJob(). There are 167000 new/malloc calls made directly or indirectly from CreateJob(); of those, 146000 come directly or indirectly from HttpStreamFactoryImpl::Job::Job.
,
Jan 5 2017
And per comment on the doc, it's possible that these are mostly/all ClientSocketHandles.
,
Jan 5 2017
I added a log every time a new Job is created, and the number keeps increasing: the size of HttpStreamFactoryImpl::job_controller_set_ reached 3000 in the past 30 minutes, just from leaving the browser open with 10 tabs.
,
Jan 5 2017
One thing I don't understand is that the net/ MemoryDumpProvider shows only 16 active URLRequests. It can't be that we have thousands of undestroyed ClientSocketHandles -- who owns them if not active URLRequests? In case anyone is wondering where the "16" number of URL requests comes from: once you click on the "net" category, you can see that the main URLRequestContext has 16 active requests (screenshot attached).
,
Jan 5 2017
[ricea]: We think WebSockets are off the hook (Or at least the problem is strictly at the HttpStreamFactoryImpl layer).
[+mgersh, +rch] Randy and I have been eyeballing the traces, and we think that since we're seeing both an allocation from the layer above the Job's constructor, and allocations from the Job's constructor on the stack, we're leaking the Job as well as the ClientSocketHandle.
So the current theory is that these are orphaned Jobs owned by their JobControllers (owned by the HttpStreamFactoryImpl). QUIC ("alternate") and non-QUIC ("main") Jobs use slightly different constructors. Looking at the call stack of the leak, these look like "main" jobs that are leaking.
There is a connection timeout, so unclear what state these are really in (Connected? Connecting? Waiting for [proxy] auth? Waiting for client auth? Cert errors? Disconnected? Florida?)
,
Jan 5 2017
If it's of any use: I can cook up a memory dump provider for HttpStreamFactoryImpl, add all these details, and run the browser again. This should give us the state of the jobs, how many HttpStreamFactoryImpls are around, and how many jobs each of them holds.
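For reference, a dump provider along those lines might look roughly like the sketch below. This is only an illustration of the base::trace_event::MemoryDumpProvider interface, not the instrumentation that actually landed; the dump name "net/http_stream_factory", the counter names, and the set_counts() helper are made up for this example.
--------------------------------------------
#include <cstdint>

#include "base/trace_event/memory_allocator_dump.h"
#include "base/trace_event/memory_dump_provider.h"
#include "base/trace_event/process_memory_dump.h"

// Rough sketch only; dump name and counters are hypothetical.
class HttpStreamFactoryDumpProvider
    : public base::trace_event::MemoryDumpProvider {
 public:
  // The factory would update these counters as JobControllers/Jobs come and go.
  void set_counts(uint64_t controllers, uint64_t jobs) {
    controller_count_ = controllers;
    job_count_ = jobs;
  }

  bool OnMemoryDump(const base::trace_event::MemoryDumpArgs& args,
                    base::trace_event::ProcessMemoryDump* pmd) override {
    base::trace_event::MemoryAllocatorDump* dump =
        pmd->CreateAllocatorDump("net/http_stream_factory");
    dump->AddScalar("job_controller_count", "objects", controller_count_);
    dump->AddScalar("job_count", "objects", job_count_);
    return true;
  }

 private:
  uint64_t controller_count_ = 0;
  uint64_t job_count_ = 0;
};
--------------------------------------------
The provider would still need to be registered with MemoryDumpManager and wired up to the real per-controller job state, so treat this only as a starting point.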
,
Jan 6 2017
I think that would be a useful confirmation of the current set of beliefs.
,
Jan 6 2017
[+zhongyi] [mgersh/zhongyi], sorry - I just realized it was zhongyi rather than mgersh who wrote http_stream_factory_impl_job_controller.cc, which is around where this issue seems to be.
,
Jan 6 2017
I added some logging to the net/ MemoryDumpProvider this morning. The leaked jobs are indeed main jobs. These leaked main jobs are all blocked. I will continue digging.
--------------------------------------------
// True if the main job has to wait for the alternative job: i.e., the main
// job must not create a connection until it is resumed.
bool main_job_is_blocked_;
--------------------------------------------
,
Jan 6 2017
These main jobs are all in next_state_ = STATE_WAIT_COMPLETE.
,
Jan 6 2017
That means they're waiting for the alternate job....Which presumably already succeeded, and thus orphaned them. Adding the QUIC label (Since the orphan job logic exists just for that, I think?)
,
Jan 6 2017
The ones that I saw are all non-QUIC, so I think this (also) applies to H2.
,
Jan 6 2017
Do you mean the main jobs or the alternate jobs? It's the alts that are QUIC.
,
Jan 6 2017
The alternate jobs are non-QUIC (presumably H2). I think I might know what is missing. Let me write a regression unit test.
,
Jan 6 2017
I made an attempt to fix this: https://codereview.chromium.org/2619583002/ Suggestions are welcome.
,
Jan 6 2017
Would you be willing to summarize the root cause of the bug? I couldn't quite figure it out from the CL.
,
Jan 6 2017
Here's a quick description: Some servers advertise an alternate server to connect to (QUIC and/or H2 at another domain/port). When this happens and we want to connect, we create two Jobs: one for the actual HTTP URL the request is for, and one for the alternative server (more than one alternate can be advertised, I believe, but we only connect to one of them). When one of the two connects, we no longer need the other. However, we still want to keep the other one around until it connects, so that if HTTP connects first, we'll still establish a QUIC connection and can use it in the future (not sure we really care about the case where non-QUIC connects first, but we keep the jobs around in that case too). However, we also have logic that the main job won't start until we know the alternate job won't complete synchronously. For this bug, the alternate job is completing synchronously, so the main job never starts, yet we still keep the main job around waiting for it to complete. Furthermore, it looks to me like there may be a Job leak even if the main job does complete successfully, if that happens after the alt job has been destroyed.
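To make the failure mode concrete, here is a deliberately simplified sketch of the controller logic; the class and member names are only illustrative stand-ins for the real Chromium code. The point is that when the alternative job succeeds while the main job is still blocked, nothing resets the main job, so the completion check never sees an empty controller and never hands it back to the factory for deletion.
--------------------------------------------
#include <memory>

struct Job {};  // Stand-in for HttpStreamFactoryImpl::Job.

// Simplified illustration of the leak; not the actual Chromium code.
struct JobControllerSketch {
  std::unique_ptr<Job> main_job_;         // "Main" job for the origin server.
  std::unique_ptr<Job> alternative_job_;  // QUIC/H2 alternative-service job.
  bool main_job_is_blocked_ = true;       // Main job waits on the alt job.

  void OnAlternativeJobSucceeded() {
    // The alt job gets bound to the request and is destroyed when done.
    alternative_job_.reset();
    // Pre-fix behavior: a blocked main job was left alive here. The fix
    // resets it when it hasn't started connecting yet, e.g.:
    //   if (main_job_is_blocked_) main_job_.reset();
    MaybeNotifyFactoryOfCompletion();
  }

  void MaybeNotifyFactoryOfCompletion() {
    // The controller is only handed back to the factory (and deleted) once
    // both jobs are gone, so a lingering blocked main job keeps the whole
    // controller -- and the ClientSocketHandle its Job owns -- alive forever.
    if (!main_job_ && !alternative_job_) {
      // factory->OnJobControllerComplete(this);  // would delete |this|.
    }
  }
};
--------------------------------------------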
,
Jan 6 2017
Matt is spot on. I ignored the case where we do want to keep alt jobs around even if main jobs have succeeded. I talked to Matt offline and will upload a second patch. rch@ and zhongyi@ are out. I will check with them on Monday.
,
Jan 6 2017
As far as merging is concerned: We may want to merge the fix to M56, given the potential size of the leak, but I don't think we want to merge it to M55, unless we have reason to think it's high impact.
,
Jan 6 2017
+1 on the merge to M56. Huge kudos for tracking this down. I'd also be happy to merge to M55 (if it's low technical risk) in case we happen to re-spin stable for another reason, but we shouldn't re-push just for this. How long do we think this issue has existed?
,
Jan 6 2017
My guess is it's been around at least since HttpStreamFactoryImpl::JobController was added, back in June, which would be M53. The old code may have had this same issue, though I think it would have been noticed before if it had been around for 6+ months before that.
,
Jan 6 2017
@xunjieli Can you also write a memory dump provider for HttpStreamFactoryImpl so that we keep track of such regressions in future?
,
Jan 6 2017
Actually, I'm almost certain this broke in 407846 (https://codereview.chromium.org/1952423002), which would be M54, not M53.
,
Jan 6 2017
Thanks, Matt, for the ideas and helpful discussion. So far I've been able to identify exactly one scenario for the leak. To confirm that we aren't leaking when an orphaned job completes, I've added a unit test. Let me know if you spot any other potential leak in https://codereview.chromium.org/2619583002/. Re #32: Sounds good, I will upload a CL once this bug is fixed. It won't be a separate dump provider, but rather an allocator dump in the existing net/ MDP.
,
Jan 7 2017
Nice work! I analyzed my Windows VirtualFree shutdown trace, from a browser process with a 1.1 GB private working set, and it shows similar results. I thought I'd share them in case any of the memory I'm seeing is not explained by the JobController leak. My numbers are all approximate because I could only track calls to VirtualFree, which happen when an entire heap segment is freed. This makes attribution imprecise, but when the number of frees and the amount of memory are large enough, the results should be meaningful. I tried to simplify some of the template function names in the call stacks below - apologies if I mangled any of them.
I see ~270 MB of memory freed by the JobController destructor. It looked like the message_center::Notification tree probably had ~1-4 thousand objects total, based on the depth of the free call trees.
It looked like ~27 MB of std::string objects were freed by GURL destructors called from here:
chrome.dll!std::unique_ptr<ExtensionWebRequestTimeTracker>::~unique_ptr<ExtensionWebRequestTimeTracker>
chrome.dll!base::Singleton<extensions::ExtensionWebRequestEventRouter>::OnExit
About 26 MB of mojo data was closed along this call stack:
chrome.dll!mojo::internal::MultiplexRouter::CloseMessagePipe
chrome.dll!mojo::internal::MultiplexedBindingState::Close
chrome.dll!shell::InterfaceRegistry::`vector deleting destructor'
chrome.dll!std::vector<std::unique_ptr<net::QuicChromiumPacketReader>>::~vector<>
chrome.dll!shell::ServiceContext::`vector deleting destructor'
chrome.dll!content::ServiceManagerConnectionImpl::IOThreadContext::ShutDownOnIOThread
and another ~13 MB of mojo data from:
mojo::edk::`anonymous namespace'::ThreadDestructionObserver::WillDestroyCurrentMessageLoop
The tooling for analyzing ETW traces of VirtualFree is horrible, so I had to hack together some scripts instead, which may be why I can account for only about a third of the memory. I can drill in deeper, and can probably get heap free information, if it seems likely that more information would be helpful.
,
Jan 9 2017
Hrm...I can't see how any of those three callstacks could be related to the HttpStreamFactoryImpl::JobController leak - the leak is deep in the bowels of the network stack, and I don't think any of those look to either be deleting a HttpStreamFactoryImpl, or have the ability to stay alive based on the existence of an HttpStreamFactoryImpl::JobController, though I didn't spend much time digging into them.
,
Jan 9 2017
Thanks for investigating the issue. It looks like the CL https://codereview.chromium.org/2619583002 fixes the major leak :) I have only tried running the browser with the first patch. After running for 2.5 days, the browser is just 250MB in size (that's 500MB less!), with the exact same tabs as in comment #0. Good work!
The memory allocated by the functions
net::(anonymous namespace)::DefaultJobFactory::CreateJob()
net::HttpStreamFactoryImpl::Job::Job()
has increased by 10MB: it grew from 12 to 22 in 3 days. This growth is also fairly constant, suggesting that it could be a leak. Is this still expected, or should I be using the latest patch in the CL? Trace here: https://drive.google.com/open?id=0B8FmUrPL9B_CaE5hUEJacVJoaW8
Note: There are other similar memory bloats observed in components like cc/. I will file separate bugs for those.
,
Jan 9 2017
Issue 668682 has been merged into this issue.
,
Jan 9 2017
Thanks everybody for the collaboration on this, y'all rock. I have a question here: if we had slow-reports (or anything else that takes data from memory-infra) enabled on desktop AND the net MDPs that have been (or are being) developed were already there, would we have caught this sooner? In other words, if this issue happens again 6 months from now, would the memory-infra probes that the net team has added be enough to narrow down the problem? This is really for my own curiosity and to learn whether we are shooting in the right direction (and if not, what we need to fix). Essentially, what I am trying to figure out is: was the strategy (adding MDPs to net) right but just too late with regard to this issue? If not, what could we have done differently?
,
Jan 9 2017
Re #40: Yes, I believe so. The current net MemoryDumpProvider helped narrow down the issue: it showed 16 active URLRequests. That, combined with the fact that ssid@'s manual instrumentation (native heap profiling) showed thousands of unfreed ClientSocketHandle allocations, helped find the cause. Slow-reports would tell us something is wrong, since in this case the leak is pretty big. Even though the pseudo stack from slow-reports is coarse, at least we would have something.
-----------------------
This is specifically a QUIC issue (what I said in #22 is incorrect). Removing the Http2 label. The corresponding alt jobs of the leaked main jobs are QUIC jobs which reused an existing QUIC session.
,
Jan 9 2017
(And net MDP shows that the socket pools and SSL session cache are not the ones to blame for ClientSocketHandles. I can work on a postmortem so this can be documented)
,
Jan 9 2017
Sorry, what do you mean by "pseudo stack from Slow reports"? We do not have heap profiling turned on for field trials.
,
Jan 9 2017
So Slow-Reports only have MDP data? I thought it has pseudo stacks from trace events.
,
Jan 9 2017
No, it is expensive to enable heap profiling in the field right now due to perf regressions. We have plans for the future.
,
Jan 9 2017
Is this a regression only in ToT or does it occur pre-M57?
,
Jan 9 2017
#46: please see comment 33. https://bugs.chromium.org/p/chromium/issues/detail?id=678768#c33
,
Jan 9 2017
For the record, here is the location where this regressed, in 5489ba4e6d4fa5ef7be7b8bbb8521f75addab7f5. Thanks mmenke@ for pinpointing the exact location. Whenever we have two jobs (main job and alt job), one of them will be orphaned through Job::Orphan(). We used to cancel the main job if it was being orphaned and we found out that it was blocked. However, in the new code, this piece of logic is missing:

void HttpStreamFactoryImpl::Job::Orphan() {
  net_log_.AddEvent(NetLog::TYPE_HTTP_STREAM_JOB_ORPHANED);
-  if (blocking_job_) {
-    // We've been orphaned, but there's a job we're blocked on. Don't bother
-    // racing, just cancel ourself.
-    DCHECK(blocking_job_->waiting_job_);
-    blocking_job_->waiting_job_ = NULL;
-    blocking_job_ = NULL;
-    if (delegate_->for_websockets() && connection_ && connection_->socket()) {
-      connection_->socket()->Disconnect();
-    }
-    delegate_->OnOrphanedJobComplete(this);
-  } else if (delegate_->for_websockets()) {

This affects 54.0.2809.0 and newer.
,
Jan 9 2017
Worth noting that the fix for the leak includes the fix for another related leak that I think was introduced in M53. Not sure how bad the other leak is alone, though (with both leaks present, the M54 leak may partially mask the M53 leak; I don't think it's worth digging into more detail, since we have a fix for both).
,
Jan 9 2017
Re #38: ssid@, thanks for sharing the new traces! It will be great if we can know what the 22 Jobs are doing. Could you apply this temp CL https://codereview.chromium.org/2619403002 to see if we can get any info on the states of these Jobs?
,
Jan 10 2017
Though #668682 was merged into this one, I'm wondering if they are, in fact, the same. My experience in #668682 is a period of very rapid memory growth, leading to swapping (or OOM), followed by a dramatic drop in memory usage. I'm not sure the best way to profile/debug that to even compare to the traces being discussed here.
,
Jan 10 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/e368398a3a5357a8742f3829ef08cc45080fb808

commit e368398a3a5357a8742f3829ef08cc45080fb808
Author: xunjieli <xunjieli@chromium.org>
Date: Tue Jan 10 12:14:48 2017

Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete

Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away.

Bug: We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on the alt job. OnRequestComplete() will be called, which then destroys the alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller.

Fix: This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when an orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up the controller in the orphaned case. This CL adds a test to confirm that we did clean up the controller when the orphaned job completes (which passes with and without the change).

BUG= 678768
Review-Url: https://codereview.chromium.org/2619583002
Cr-Commit-Position: refs/heads/master@{#442556}

[modify] https://crrev.com/e368398a3a5357a8742f3829ef08cc45080fb808/net/http/http_basic_stream.cc
[modify] https://crrev.com/e368398a3a5357a8742f3829ef08cc45080fb808/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/e368398a3a5357a8742f3829ef08cc45080fb808/net/http/http_stream_factory_impl_job_controller.h
[modify] https://crrev.com/e368398a3a5357a8742f3829ef08cc45080fb808/net/http/http_stream_factory_impl_job_controller_unittest.cc
,
Jan 10 2017
Requesting merge into M56. Thanks!
,
Jan 10 2017
Issue 677191 has been merged into this issue.
,
Jan 10 2017
Can we keep this bug open until I can verify that the 10MB increase I mentioned in comment #38 is OK? I will run the test again with your CLs and the new MDP.
,
Jan 10 2017
Your change meets the bar and is auto-approved for M56. Please go ahead and merge the CL manually. Please contact milestone owner if you have questions. Owners: amineer@(clank), cmasso@(bling), gkihumba@(cros), bustamante@(desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jan 11 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e

commit 1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e
Author: xunjieli <xunjieli@chromium.org>
Date: Wed Jan 11 16:13:23 2017

Clean up HttpStreamFactoryImpl::JobController when Impl::Jobs complete

Background: When a server advertises alternative services support, JobController creates two Jobs (|main_job_| and |alternative_job_|). The controller blocks |main_job_| when connecting the |alternative_job_|. It resumes the |main_job_| when |alternative_job_| fails or when the timeout happens. When both jobs are destroyed, the JobController goes away.

Bug: We currently leak JobController in the situation where |alternative_job_| succeeds and |main_job_| is blocked on the alt job. OnRequestComplete() will be called, which then destroys the alt job but not the main job. This causes MaybeNotifyFactoryOfCompletion() to not clean up the controller.

Fix: This CL fixes this leak by resetting |main_job_| if it is blocked (i.e. hasn't started connecting) when |alternative_job_| succeeds. This CL adds a regression test for this. Note that there is no leak when an orphaned job completes, because "job_bound_ && bound_job_ != job" will evaluate to true. This CL adds a |!request_| check so it's obvious that we always clean up the controller in the orphaned case. This CL adds a test to confirm that we did clean up the controller when the orphaned job completes (which passes with and without the change).

TBR=zhongyi@chromium.org
TBR=mmenke@chromium.org
NOTRY=true
NOPRESUBMIT=true
BUG= 678768
Review-Url: https://codereview.chromium.org/2619583002
Cr-Commit-Position: refs/heads/master@{#442556}
(cherry picked from commit e368398a3a5357a8742f3829ef08cc45080fb808)
Review-Url: https://codereview.chromium.org/2622193003
Cr-Commit-Position: refs/branch-heads/2924@{#733}
Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059}

[modify] https://crrev.com/1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e/net/http/http_basic_stream.cc
[modify] https://crrev.com/1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e/net/http/http_stream_factory_impl_job_controller.h
[modify] https://crrev.com/1c7375269bc6e2ef192b74dd1b17f7e4e8b5f63e/net/http/http_stream_factory_impl_job_controller_unittest.cc
,
Jan 11 2017
Issue 678327 has been merged into this issue.
,
Jan 11 2017
Pity, we missed the beta push by one day. The last beta was cut yesterday, a few revisions behind the cherry-pick. On the good side, the next dev and beta pushes are in one week. I checked UMA, and even just the dev channel population seems enough to show the bug in UMA with 1-day aggregation: go/ptken. So, ideally, one week from now, we should be able to open that same UMA link and see the two histograms be on par.
,
Jan 11 2017
I'm wondering why ChromeOS's leak detector didn't catch this. Simon, any clues?
,
Jan 11 2017
I don't know anything about ChromeOS's leak detector, but note that the memory isn't technically leaked - it's still owned; it's just that the objects aren't doing anything and aren't destroyed until the profile is torn down.
,
Jan 11 2017
As I understand it, the leak detector should've caught this: it monitors increases in allocation sizes, and a significant increase in allocations of the same size is deemed a leak and triggers stack frame collection, which is then sent home and investigated.
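In case it helps frame the discussion, here is a minimal sketch of that kind of size-bucketed heuristic. This is a toy illustration under my own assumptions (class name, threshold parameter, and scan scheme are all made up), not the actual ChromeOS leak detector.
--------------------------------------------
#include <cstddef>
#include <cstdint>
#include <map>

// Toy size-bucketed leak heuristic (illustrative only): track net allocation
// counts per allocation size, and flag a size whose live count keeps growing
// between periodic scans.
class SizeBucketLeakHeuristic {
 public:
  void OnAlloc(std::size_t size) { ++live_counts_[size]; }
  void OnFree(std::size_t size) { --live_counts_[size]; }

  // Returns true if |size| looks suspicious: its live count grew by more than
  // |min_growth| since the previous scan (which, in a real detector, would
  // trigger collecting call stacks for allocations of that size).
  bool ScanBucket(std::size_t size, std::int64_t min_growth) {
    std::int64_t current = live_counts_[size];
    std::int64_t previous = previous_scan_[size];
    previous_scan_[size] = current;
    return current - previous > min_growth;
  }

 private:
  std::map<std::size_t, std::int64_t> live_counts_;    // Live allocations per size.
  std::map<std::size_t, std::int64_t> previous_scan_;  // Counts at the last scan.
};
--------------------------------------------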
,
Jan 11 2017
Re #64: ChromeOS stack symbolization was broken in crashes, for quite some time, last year - might the leak detector collection have been similarly affected?
,
Jan 11 2017
Re #38: What bloat? I found a leak using the leak detector, and it was fixed on Dec 3: https://codereview.chromium.org/2544203003/
Re #62: I did actually come across it in November (see attached screenshot), but I didn't know if it was a leak or working as intended. There were a lot of leak reports from various call sites, and I thought an HTTP-related call site would require more extensive knowledge of the networking stack, so I prioritized looking at other leaks. We will talk about it at today's memory leak detection meeting. If you're interested in joining, message me for an invite.
,
Jan 12 2017
Re #66: This doc discusses other memory bloats, which I have yet to investigate: https://docs.google.com/document/d/1XKx4-AMBxPxfJ0jDlBlqiWGNbl6-AFNvNceLvlx4mvM
,
Jan 17 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/910b9b9050f70a0a8602a8cfcb0ac45b9bcdd723

commit 910b9b9050f70a0a8602a8cfcb0ac45b9bcdd723
Author: zhongyi <zhongyi@chromium.org>
Date: Tue Jan 17 17:55:52 2017

Move the logic to cancel the main job into OrphanUnboundJob when the alt job succeeds while the main job is blocked.

When the alt job succeeds while the main job hasn't, the alt job will be bound to the request by HttpStreamFactoryImpl::JobController::BindJob(), and JobController will orphan the main job. If the main job
- is still blocked (i.e., not started yet), it should be cancelled immediately.
- is no longer blocked
  - but is for websocket, it should be cancelled immediately.
  - and is not for websocket, no-op. The job will continue connecting until it finishes. When it finishes, it will call one of the callbacks, notifying the JobController of the Job's completion. The JobController will then check whether this job has been orphaned, and calls OnOrphanedJobComplete to delete the main job.

BUG= 678768
Review-Url: https://codereview.chromium.org/2625133006
Cr-Commit-Position: refs/heads/master@{#444090}

[modify] https://crrev.com/910b9b9050f70a0a8602a8cfcb0ac45b9bcdd723/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/910b9b9050f70a0a8602a8cfcb0ac45b9bcdd723/net/http/http_stream_factory_impl_job_controller_unittest.cc
,
Jan 17 2017
The commit in #68 should fix the code completely. It looks like bugdroid posted the same message multiple times; I will keep watching to make sure the CQ looks fine and is not committing duplicates (crbug/432905).
,
Jan 17 2017
Handing this off to ssid@ per comment #56. zhongyi@'s change makes the new code match the old behavior more closely. There's no functional change, so it doesn't need to be merged.
,
Jan 18 2017
The data for the dev channel looks quite promising. TL;DR of QUIC vs. Control Group:
- Windows is on par
- Linux is now even better (I guess a side effect of having fewer connections around without leaking the HTTP ones?)
UMA links:
- Linux Dev channel *BEFORE* the fix (#53): go/wispm
- Linux Dev channel *AFTER* the fix (#53): go/hfewr
- Win Dev before: go/zjsef
- Win Dev after: go/vaeis
Well rescued ;-)
,
Jan 20 2017
Issue 641518 has been merged into this issue.
,
Jan 23 2017
I think we can close this bug as per comment #73
,
Jan 24 2017
Not being a chrome dev, I don't know enough about the release cycles. How can I figure out what version I need to have to see the improvement?
,
Jan 25 2017
The temp fix is available in 56.0.2924.60, as xunjieli@ has merged it to M56, which will be rolled out to Stable in the near future.