New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 51286 link

Starred by 19 users

Issue metadata

Status: Archived
Owner: ----
Closed: Aug 2015
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Chrome
Pri: 2
Type: Bug

Blocked on:
issue 73751



Sign in to add a comment

Don't crash on g_try_realloc g_try_malloc

Reported by evan@chromium.org, Aug 5 2010

Issue description

Our tcmalloc override crashes whenever malloc/realloc fail.
But glib has two flavors of its malloc functions: the default where it expects the crash, and the other where the caller is promising they'll check the result for NULL.  We shouldn't crash in the latter case.

Example bad stack trace:

#0  DebugUtil::BreakDebugger () at base/debug_util_posix.cc:259
#1  0x0943ef85 in ~LogMessage (this=0xbf9e0758, __in_chrg=<value optimised out>) at base/logging.cc:589
#2  0x09453d18 in OnNoMemorySize (size=0) at base/process_util_linux.cc:515
#3  0x09453d2c in OnNoMemory () at base/process_util_linux.cc:519
#4  0x0940f961 in cpp_alloc (size=1543511790, nothrow=true) at third_party/tcmalloc/chromium/src/tcmalloc.cc:1273
#5  0x0940ec2f in do_malloc_or_cpp_alloc (size=1543511790) at third_party/tcmalloc/chromium/src/tcmalloc.cc:923
#6  0x0940f3e4 in do_realloc_with_callback (old_ptr=0xf641000, new_size=1543511790, invalid_free_fn=0x940d8d0 <InvalidFree>, invalid_get_size_fn=0x940d915 <InvalidGetSizeForRealloc>)
    at third_party/tcmalloc/chromium/src/tcmalloc.cc:1112
#7  0x0940f498 in do_realloc (old_ptr=0xf641000, new_size=1543511790) at third_party/tcmalloc/chromium/src/tcmalloc.cc:1135
#8  0x0ad020ab in tc_realloc ()
#9  0x4147ddf0 in IA__g_try_realloc (mem=<value optimised out>, n_bytes=<value optimised out>) at /build/buildd/glib2.0-2.24.1/glib/gmem.c:225
#10 0xacccb32a in DecodeHeader (data=0xf2bb8a0, buf=0xbf9e0f5c  <incomplete sequence \314>, size=53945, error=0xe562d04) at /build/buildd/gtk+2.0-2.20.1/gdk-pixbuf/io-ico.c:364
#11 gdk_pixbuf__ico_image_load_increment (data=0xf2bb8a0, buf=0xbf9e0f5c  <incomplete sequence \314>, size=53945, error=0xe562d04) at /build/buildd/gtk+2.0-2.20.1/gdk-pixbuf/io-ico.c:881
#12 0x477a9d1e in IA__gdk_pixbuf_loader_write (loader=<value optimised out>, buf=<value optimised out>, count=<value optimised out>, error=<value optimised out>)


It looks like glib lets us override its vtable for these functions separately.
static GMemVTable glib_mem_vtable = {
  standard_malloc,
  standard_realloc,
  standard_free,
  standard_calloc,
  standard_try_malloc,
  standard_try_realloc,
};
 
usb.ico
60.0 KB Download
Labels: -Area-Undefined Area-Internals Crash Internals-Core Mstone-7
Heh, I couldn't trigger this crash at first. My VM has 2 GB of memory for running gdb.

Comment 2 by evan@chromium.org, Aug 5 2010

Haha, nice. :)

Comment 3 by evan@chromium.org, Aug 5 2010

Oh, and for reference (e.g. why I included the icon attachment)
http://groups.google.com/a/chromium.org/group/chromium-dev/browse_thread/thread/03c819fa91fc8342
Status: Available
Summary: Don't crash on g_try_realloc g_try_malloc
FYI, this is not a tcmalloc only problem. We've set up malloc to crash on no memory as well, so as is, turning off tcmalloc doesn't help.

Other notes:

To reproduce the crash, press ctrl+o, and try to open usb.ico.

For some reason it wasn't obvious to me at first, but g_mem_set_vtable() is the function to override the vtable.
Is there a suggested fix here?  Use a thread-local static in OnNoMemorySize() that is set by standard_try_malloc()/standard_try_realloc()?

Comment 6 by evan@chromium.org, Aug 17 2010

I don't know the tcmalloc API, but in my imagination there's something like:

void* malloc() {  void *p = tcmalloc(...); if (!p) crash(); return p; }

We "just" want to route these try_malloc calls into the internal malloc, such that they don't call the wrapper.
tcmalloc/malloc does not work the way you think it does.  The "internal" malloc is inlined code in tcmalloc (and I believe in glibc's ptmalloc as well, although my memory is fuzzy) for performance (although there's been some debate as to the actual gains of doing so).  It's not exposed for external users to call.
Status: Assigned
Hey Jim,

Mind lending your expertise about tcmalloc?

Comment 9 by jar@chromium.org, Aug 27 2010

Labels: -Mstone-7 Mstone-X
It would appear that the library is ettempting to call try_realloc, but we have no such function, and someone is defaulting to call realloc.  My guess is that defaulting is automatic, as there isn't (usually) a try_realloc call.  

[It is rather funny that the realloc call is supposed to support the NULL return interface, but lazy non-checking programs have forced folks to crash on out-of-memory in the official realloc, despite its API definition.  Now folks are advocating for this "I really really really promise to check and abide by the interface" flavor called try_realloc :-) ].

So I'm guessing that the "easy" fix (if we want to do this) is to define try_realloc, and do the requisite call into the library (re: "glib lets us override its vtable") to be sure it uses our "real" try_realloc (and not just the commonly available realloc).

Implementing try_realloc is a bit of a bother, because we've used a hack in our allocator implementation.  We've "cleverly implemented" the callback that is supposed to help when we have memory pressure (out-of-memory).  The callback is supposed to "try to free up memory that really isn't desperately needed."  Our clever (hack?) replacement of that method causes us to crash cleanly on ALL out of memory calls, with only a minimal one-line (hack) change.

I suspect the fix is to stop hacking via that callback, and instead do the error checking for a NULL return at the top level (in all supported allocator calls, just before returning to the calling application).  This would allow us to implement a try_realloc that doesn't crash on OOM (at a lower level).

In addition to catching all the return points individually (watching that malloc never returns a NULL), care has to be taken with some functions, like realloc, which do return a NULL if the re-size is zero.  As I recall, there may also be some other subtleties that have to be gracefully handled in/around these implementations (for example: malloc(0) is "allowed" to return NULL I think, but I'm not sure what we do).

That approach of removing the hack, and checking at just before a return, seems much less fragile than messing with thread-local-store, as it guarantees that the change in functionality happens in exactly the right calling context.

That said, this bug doesn't at first blush appear to be a super high priority.  The whole concept of try_realloc seems a BIT fraught with peril, as application code was always supposed to check for NULL returns, and didn't (which is how we got here)... and now there are renewed promises that the caller will gracefully check and handle NULL returns when calling this new function :-/.  My guess is that ALL callers of try_realloc need to be carefully code reviewed as part of a security audit... and I'm not sure how this fleshes out when the caller is inside a third party library. :-(

Could I hear some arguments about why try_realloc and try_malloc is a good thing to do, and why this won't create more security problems (kindred to historical problems with OOM).  How do we assure that we're not creating security bugs as overconfident libraries merely switch to calling try_realloc rather than realloc, to get what they always wanted in the first place?? <sigh>

If *I* (paranoid security guy of the past) were implementing a new/safer API (still not using exception handling), I would have advocated for a new function call to "reserve a giant chunk if we can," to allow a caller to test for availability of memory.  After such a test/reservation, then a try_alloc could officially get the reserved memory (or crash if it asked for too much).  This current interface (re: try_realloc and try_malloc, assuming callers check for NULL) is going IMO to get us back to where we started, with security vulnerable interface corrupted by lazy programmers.

Since I don't really think this should be done, I'll take a small passive-aggressive action and remove the Mstone-7 label.

Comment 10 by evan@chromium.org, Aug 27 2010

Labels: -Mstone-X Mstone-8
I agree with Jim's assessment, except for the conclusion: since we have a real system library relying on this, we can't just hope that the problem goes away.  Since it's easier to type "malloc" than "try_malloc" I would imagine any programmer going to the effort to use the longer form is aware of the difference, as this particular library is.

Comment 11 by jar@chromium.org, Oct 12 2010

Labels: -Mstone-8 Mstone-9 OS-Chrome
Labels: -Mstone-9 Mstone-10
Moving P2s with an owner to Mstone 10, bring back to M9 if you think it's important and you don't have higher priority work.
 Issue 57170  has been merged into this issue.
 Issue 65754  has been merged into this issue.

Comment 15 by kerz@chromium.org, Dec 9 2010

Labels: -Mstone-10 MovedFrom-10 Mstone-11
P2 bugs with an owner that are not marked as started are being automatically moved to mstone:11.

Comment 16 by jar@chromium.org, Feb 16 2011

Labels: -Mstone-11 Mstone-12
For now, I'm going to push this to M12.

Comment 17 by evan@chromium.org, Feb 22 2011

Blockedon: 73751
Filed issue 73751 about the global issue of needing to support tryMalloc.
Labels: -Crash bulkmove Stability-Crash
Our tcmalloc override crashes whenever malloc/realloc fail.
But glib has two flavors of its malloc functions: the default where it expects the crash, and the other where the caller is promising they'll check the result for NULL.  We shouldn't crash in the latter case.

Example bad stack trace:

#0  DebugUtil::BreakDebugger () at base/debug_util_posix.cc:259
#1  0x0943ef85 in ~LogMessage (this=0xbf9e0758, __in_chrg=&lt;value optimised out&gt;) at base/logging.cc:589
#2  0x09453d18 in OnNoMemorySize (size=0) at base/process_util_linux.cc:515
#3  0x09453d2c in OnNoMemory () at base/process_util_linux.cc:519
#4  0x0940f961 in cpp_alloc (size=1543511790, nothrow=true) at third_party/tcmalloc/chromium/src/tcmalloc.cc:1273
#5  0x0940ec2f in do_malloc_or_cpp_alloc (size=1543511790) at third_party/tcmalloc/chromium/src/tcmalloc.cc:923
#6  0x0940f3e4 in do_realloc_with_callback (old_ptr=0xf641000, new_size=1543511790, invalid_free_fn=0x940d8d0 &lt;InvalidFree&gt;, invalid_get_size_fn=0x940d915 &lt;InvalidGetSizeForRealloc&gt;)
    at third_party/tcmalloc/chromium/src/tcmalloc.cc:1112
#7  0x0940f498 in do_realloc (old_ptr=0xf641000, new_size=1543511790) at third_party/tcmalloc/chromium/src/tcmalloc.cc:1135
#8  0x0ad020ab in tc_realloc ()
#9  0x4147ddf0 in IA__g_try_realloc (mem=&lt;value optimised out&gt;, n_bytes=&lt;value optimised out&gt;) at /build/buildd/glib2.0-2.24.1/glib/gmem.c:225
#10 0xacccb32a in DecodeHeader (data=0xf2bb8a0, buf=0xbf9e0f5c  &lt;incomplete sequence \314&gt;, size=53945, error=0xe562d04) at /build/buildd/gtk+2.0-2.20.1/gdk-pixbuf/io-ico.c:364
#11 gdk_pixbuf__ico_image_load_increment (data=0xf2bb8a0, buf=0xbf9e0f5c  &lt;incomplete sequence \314&gt;, size=53945, error=0xe562d04) at /build/buildd/gtk+2.0-2.20.1/gdk-pixbuf/io-ico.c:881
#12 0x477a9d1e in IA__gdk_pixbuf_loader_write (loader=&lt;value optimised out&gt;, buf=&lt;value optimised out&gt;, count=&lt;value optimised out&gt;, error=&lt;value optimised out&gt;)


It looks like glib lets us override its vtable for these functions separately.
static GMemVTable glib_mem_vtable = {
  standard_malloc,
  standard_realloc,
  standard_free,
  standard_calloc,
  standard_try_malloc,
  standard_try_realloc,
};

Comment 19 by k...@google.com, Apr 25 2011

Labels: -Mstone-12 Mstone-13 MovedFrom12
Moving out of M12.
Labels: -Mstone-13 Mstone-14 MovedFrom13
Moving !type=meta|regression and !releaseblocker to next mstone
Labels: -MovedFrom12 MovedFrom-12

Comment 22 by jar@chromium.org, Jul 25 2011

Labels: -Mstone-14 Mstone-15

Comment 23 by kareng@google.com, Sep 9 2011

Labels: -Mstone-15 Mstone-16 MovedFrom-15

Comment 24 by laforge@google.com, Oct 24 2011

Labels: -Mstone-16 MovedFrom-16 Mstone-17

Comment 25 by k...@google.com, Dec 19 2011

Labels: -Mstone-17 Mstone-18 MovedFrom-17
Moving bugs marked as Assigned but not blockers from M17 to M18.  Please move back if you think this is a blocker, and add the ReleaseBlock-Stable label.  If you're able.
Cc: venkataramana@chromium.org anan...@chromium.org
 Issue 92120  has been merged into this issue.

Comment 27 by jar@google.com, Feb 3 2012

I raised a second concern about implementing a resolution to this bug in a chrome-dev thread.  Below is a slighly compacted version of that email.  This argument is probably even stronger than the perilous nature of the requested interface (which I commented on in comment 9 above).


BOTTOM LINE: It is problematic to support libraries that try to test the limit of memory allocations in chrome.

BACKGROUND: Throughout all of chrome, we have not built infrastructure or checking if arbitrary calls for object construction fails.  Similarly, we have not sanitized code that uses malloc to be sure it properly handles NULL return.  We've instead depended on the "crash on over-allocation request" as OOM.

As a result, if a library could attempt to allocate until it can't allocate any more (and gracefully(?) continue), then it would routinely(?) leave the memory arena in a near OOM situation.  A subsequent call to allocate memory will then induce the default OOM crash, and tend to "blame" an innocent bystander, rather the the glutenous library :-/.  Even if a library attempted to "test the water" and free some if its allocated region when it neared an OOM situtation, it would probably fragment memory, resulting in very negative side effects.

...

Given our desire to simplify most code (not check for malloc failures, and simply crash), we should not have libraries routinely working to get close to that border... or we should understand that when we get close, we *will* crash, and avoiding a crash in one place will only put pressure on another place.  If anything, it is better to be an equal opportunity crasher, so that the largest allocating libraries are visible more often in crash dumps, as *they* are the programs that are driving us to the limit.  It is almost a form of memory profiling  as it currently stands (i.e., it provides increased "visibility" for modules that "overly" allocate).

PLAUSIBLE ALTERNATE SOLUTION:

It might be the case that we should support a request for info about how much memory is available, so that libraries can attempt to tune their usages up and down. I'm still even skeptical about such a dynamic approach, but perhaps we can talk further about it.  Such a system is used for disk size, but memory size is generally limited by virtual space, and that is known before the program starts (typically at compile time).  Perhaps we can have dynamic registration of components, each supplying predictions of what they will probably(?) use, and the dynamic resulting tally might be exposed to all libraries... but I've never seen libraries setup to accept such an API.


SUMMARY:  try_alloc would, if implemented, allow glutinous libraries to (at best) fragment memory, and more likely, cause conservative (restricted allocation libraries) to be blamed for OOM, when they are not the root cause at all.  try_alloc is IMO, a bad API, which is more harmful in the context of Chromium's allocation monitoring policy.

Comment 28 by kareng@google.com, Feb 7 2012

Labels: MovedFrom18 Mstone-19

Comment 29 by jar@chromium.org, Feb 15 2012

One compromise that might be helpful.... should we implement a try_malloc() that restricts the argument size to something that is at least plausible?  For instance, it is clear that on Windows it is a waste to attempt malloc something larger that 2GB, as that is the full users allocatable space (despite the fact that 4GB is the virtual space).  This sort of change might be helpful in some of the more strange calls, such as attempts to render giant JPEG images.  Perhaps we should (as an example) limit success try_malloc attempts to no more than a few hundred MB (at least on 32 bit builds). Any larger attempts would get a return of NULL... and presumably the underlying code would "gracefully handle" this denial.

This would at least preclude a lot of silly deliberate crashes... but would probably still crash when memory was mostly exhausted, and a try_malloc (or try_realloc) of significant size was attempted.

Comments??

What size would be "reasonable?"

Comment 30 by jar@chromium.org, Mar 26 2012

Labels: -Mstone-19 Mstone-20
Moved to M20

Comment 31 by k...@google.com, Apr 27 2012

Labels: -Mstone-20 MovedFrom-20
These bugs have hit their move limit.  Please re-target as appropriate.

Comment 32 by evan@chromium.org, Jun 11 2012

Blockedon: -73751 chromium:73751
Cc: -evan@chromium.org
(Un-ccing myself from bugs.)
Project Member

Comment 33 by bugdroid1@chromium.org, Mar 10 2013

Blockedon: -chromium:73751 chromium:73751
Labels: -Area-Internals -Internals-Core Cr-Internals Cr-Internals-Core

Comment 34 by jar@chromium.org, Jun 17 2013

Owner: ----
Status: Available
Marking as available... to allow an advocate to try to resolve this differently.
Status: Archived
(Automated-archive) Crash report hasn't been modified in more than one year, please re-open if this is still an issue.

Sign in to add a comment