New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 14 users
Status: Fixed
Owner:
Last visit > 30 days ago
Closed: May 2010
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug
M-6

Restricted
  • Only users with EditIssue permission may comment.



Sign in to add a comment
Page hoses browser
Project Member Reported by pkasting@chromium.org, Apr 21 2010 Back to list
Visit http://www.teamliquid.net/forum/viewmessage.php?topic_id=120488

Now try various things while it's loading:
* Hitting stop changes the stop button to a go button but the throbber keeps 
spinning
* Clicking one of the "Spoiler" links doesn't do anything
* Trying to end the process of the associated tab doesn't work (at least 
sometimes)
* Opening a new tab, closing the old tab, and trying to visit something (e.g. 
google.com) in the new tab hangs up and glitches

WTF?  I don't even know who should look at this.
 
Comment 1 by eroman@chromium.org, Apr 22 2010
Labels: -Pri-2 -Area-UI Pri-1 Area-Internals Internals-Network
Status: Assigned
Will: Looks like this is from your socket limit change.

It is getting confused when it reaches the max sockets, and ends up with |total > max_sockets_| so 
ReachedMaxSocketsLimit() actually is returning false, since they don't match exactly.

Basically the IO thread is hosed, so the entire browser stops working right.
I suspect this is because of mbelshe's backup ConnectJobs which don't respect socket limits.  My change is 
probably not the root cause, it just exacerbates the problem.  I will look into it tomorrow.
Comment 3 by eroman@chromium.org, Apr 22 2010
Thanks for investigating!
(sorry i didn't mean to assign blame, just figured you would be a good person to debug this)
Hah, don't worry, I'm not a 12 yr old.  Not being defensive, just providing the 
probable diagnosis.
I'm working on fixing this right now.  Just for clarification, I don't think it had anything to do with my 
change (I was confusing it with a different change, which would have exacerbated the issue).  Actually, I'm 
sure of it, because I just tested against dev channel which didn't have my change.  Would people consider this 
a release blocker for Mstone-5?

I'm working on a fix now, although as eroman will attest to, this code is pretty sensitive.  It might be worth 
considering a revert, but I don't know.  Adding laforge and mbelshe.  A simple fix might be to change the != to 
a greater than or less than condition.  I'm not convinced it won't just hide some other issue, but I need to 
investigate further.
I would consider this an M5 blocker, yes.  "Just visiting a particular page hoses the 
IO thread and destroys the whole browser" is a bad bug.
Labels: Mstone-5 ReleaseBlock-Stable
I spent a reasonable chunk of today looking at how to fix this cleanly, without adding 
more one off hacks.  I think we may need to revert the backup socket jobs in branch 
375.  Mike, is that ok with you?  I know it might screw up some SPDY benchmarking / 
stats gathering.
Comment 9 Deleted
Comment 10 Deleted
Comment 11 Deleted
Will - I am happy to take this if you think it is related to the backup jobs.  I think
we can fix this.  Personally, I can't reproduce any problems yet though.  I tried
following the instructions and it just works?
Comment 13 by wtc@chromium.org, Apr 26 2010
mbelshe, willchan: for M5 we need to check in a fix (or revert the backup socket jobs CL)
on the 375 branch today.  Can you do that today?
I did a messy revert of r41543 (original CL), r41547 (crash fix), and r42179 
(vandebo's fix so we don't do backup jobs for non-TCP sockets).  It's not a clean 
revert on ToT, partly due to eroman's LoadLog=>NetLog refactors.  I did verify that a 
revert fixes the problem.

It might be cleaner to revert on 375 directly.  If not, then perhaps we should just 
do a patch similar to r42179 that disables the backup connect job code for all socket 
pools, not just non-TCP pools.  It'd require updating unit tests too though.  I don't 
think any revert will be clean since the code's been in for awhile.

I'm assigning to mbelshe.
Status: Started
Er, Mike started looking into it and the backup connect job doesn't look to be the 
cause.  It does fix the socket limits issue that we were seeing with the page, but 
apparently the ReachedMaxSocketLimits() thing might be a red herring.

Peter, can you double check which version of Chrome causes this problem for you?  If 
it's not reproducible in dev channel, then the backup connect job stuff clearly is not 
at fault.
It was the Dev channel release at the time of filing, which I believe was 5.0.375.9.
Oops, looks like my followup post didn't go through.  Yes, we also verified that dev 
channel is broken.  There is a while(true) loop that breaks out when 
ReachedMaxSocketLimits() returns true.  Mike's change broke that.  Mike says that he 
still sees hangs even after turning off the backup connect job.  I'm working on trying 
to verify this.  I don't see it though.  So, there's at least one bug, perhaps more.  
Fixing the backup socket connect job will address the majority case at least, if not 
fix the entire issue.  mbelshe's got a change for that which I'm going to look at 
shortly.
Labels: -Pri-1 -Mstone-5 -ReleaseBlock-Stable Pri-0 Mstone-5-b3 ReleaseBlock-Dev OS-All
This should block next dev channel/beta, not just stable. Twiddling priority and milestones.
The biggest part of this bug was  bug 42597 .  I believe I can still reproduce this bug 
when we hit limits, but willchan says he cannot.  I think there is a lingering bug, so 
leaving this open for a bit longer.
Comment 21 by wtc@chromium.org, Apr 29 2010
Labels: -Pri-0 -Mstone-5-b3 -ReleaseBlock-Dev Pri-2 Mstone-6
mbelshe fixed the biggest part of this bug in  bug 42597  for M5.

Leave this bug open for the remaining issue.  Moved to M6 and
lowered priority to P2.
 Issue 43208  has been merged into this issue.
This still seems to be happening on ToT for me.  I looked into it, I think there's a 
bug with the releasing sockets bugfix stuff that I wrote back in February.  I'm not 
sure why it's taken this long to show up.  I think it's because on this website, we 
open so many connections.  My worker pool had like 100+ threads in it.  I think the 
fix will be touchy.  A quick guaranteed fix is to turn off the 256 socket limit in the 
375 branch.  I don't think it's gaining us much, so I think we should just do it and 
get a real fix in for Mstone-6.
Labels: -Pri-2 Pri-1
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=46641 

------------------------------------------------------------------------
r46641 | laforge@chromium.org | 2010-05-06 17:19:07 -0700 (Thu, 06 May 2010) | 4 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/branches/375/src/net/http/http_network_session.cc?r1=46641&r2=46640

Change socket pool max from 100 to 1000000
This will effectively disable the socket limit, which is known to be buggy and can cause the IO thread to enter into an infinite loop.
BUG= 42267 
Review URL: http://codereview.chromium.org/2038005
------------------------------------------------------------------------

The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=46757 

------------------------------------------------------------------------
r46757 | willchan@chromium.org | 2010-05-07 16:45:30 -0700 (Fri, 07 May 2010) | 7 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/client_socket_pool_base.cc?r1=46757&r2=46756
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/client_socket_pool_base.h?r1=46757&r2=46756
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/client_socket_pool_base_unittest.cc?r1=46757&r2=46756
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/tcp_client_socket_pool_unittest.cc?r1=46757&r2=46756

Fix IO thread hang on releasing a socket.

The problem was we assumed ProcessPendingRequest() would always make progress once the group's releasing socket went down to zero.  However, in OnAvailableSocketSlot(), since the top stalled group might still have a releasing socket, it won't necessarily make progress.  The algorithmic solution is to simply never do any socket slot processing in DoReleaseSocket() if there are any releasing sockets.  This requires us to search for any stalled groups after releasing sockets gets back down to 0, so it requires the full group scan each time num_releasing_sockets goes back to 0.  There is a performance hit here, but I think a linear search through 256~ groups in the worst case is ok.
TODO(willchan): evaluate the perf hit and consider adding a secondary data structure to improve the stalled group search.
BUG= 42267 

Review URL: http://codereview.chromium.org/2013009
------------------------------------------------------------------------

Status: Fixed
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=46792 

------------------------------------------------------------------------
r46792 | sky@chromium.org | 2010-05-08 13:18:14 -0700 (Sat, 08 May 2010) | 10 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/client_socket_pool_base.cc?r1=46792&r2=46791
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/client_socket_pool_base.h?r1=46792&r2=46791
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/client_socket_pool_base_unittest.cc?r1=46792&r2=46791
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/tcp_client_socket_pool_unittest.cc?r1=46792&r2=46791

Revert 46757 - Fix IO thread hang on releasing a socket.

The problem was we assumed ProcessPendingRequest() would always make progress once the group's releasing socket went down to zero.  However, in OnAvailableSocketSlot(), since the top stalled group might still have a releasing socket, it won't necessarily make progress.  The algorithmic solution is to simply never do any socket slot processing in DoReleaseSocket() if there are any releasing sockets.  This requires us to search for any stalled groups after releasing sockets gets back down to 0, so it requires the full group scan each time num_releasing_sockets goes back to 0.  There is a performance hit here, but I think a linear search through 256~ groups in the worst case is ok.
TODO(willchan): evaluate the perf hit and consider adding a secondary data structure to improve the stalled group search.
BUG= 42267 

Review URL: http://codereview.chromium.org/2013009

TBR=willchan@chromium.org
Review URL: http://codereview.chromium.org/1992010
------------------------------------------------------------------------

The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=46837 

------------------------------------------------------------------------
r46837 | willchan@chromium.org | 2010-05-10 11:58:54 -0700 (Mon, 10 May 2010) | 5 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/client_socket_pool_base.cc?r1=46837&r2=46836
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/client_socket_pool_base.h?r1=46837&r2=46836
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/client_socket_pool_base_unittest.cc?r1=46837&r2=46836
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/tcp_client_socket_pool_unittest.cc?r1=46837&r2=46836

Reland 46757.
OnAvailableSocketSlot() was dereferencing a NULL pointer.  Add a NULL check.  Add a test.
BUG= 42267 

Review URL: http://codereview.chromium.org/2050005
------------------------------------------------------------------------

 Issue 43741  has been merged into this issue.
This is not fixed yet

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7fffec3d7910 (LWP 2298)]
0x00007ffff3a4296b in write () from /lib/libpthread.so.0
(gdb) q

  Installed: 6.0.401.0~svn20100512r46998-0ubuntu1~ucd1~karmic
6.0.401.0 (46998) 
The bug was an infinite loop.  That shouldn't have anything to do with a SIGPIPE.  
Perhaps you want to file a new bug.
Will i did...  issue 43741 
it got dupped of this one :S
 Bug 43741  doesn't say anything about a SIGPIPE.
ok, i'll file a new one tomorrow
 Issue 44049  has been merged into this issue.
so is  issue 43741  a dupe or not? if it isnt, please remove its status as dupe
The original issue you posted there is a dupe.  The SIGPIPE sounds like a separate issue.  You might be 
encountering multiple different issues, in which you should file separate bugs for each.  Again, please post a 
stacktrace for all thread stacks when you file a new bug.  "thread apply all bt" in gdb.
 Issue 43079  has been merged into this issue.
Will as requested, new issue open

http://code.google.com/p/chromium/issues/detail?id=44358

let me know if i can provide any more data, or if it is a dupe of one of the other i
already have open
 Issue 43769  has been merged into this issue.
Project Member Comment 43 by bugdroid1@chromium.org, Oct 12 2012
Labels: Restrict-AddIssueComment-Commit
This issue has been closed for some time. No one will pay attention to new comments.
If you are seeing this bug or have new data, please click New Issue to start a new bug.
Project Member Comment 44 by bugdroid1@chromium.org, Mar 10 2013
Labels: -Area-Internals -Internals-Network -Mstone-6 M-6 Cr-Internals Cr-Internals-Network
Project Member Comment 45 by bugdroid1@chromium.org, Mar 13 2013
Labels: -Restrict-AddIssueComment-Commit Restrict-AddIssueComment-EditIssue
Sign in to add a comment