What's the graceful way of handling out of memory situations in C/C++?

I'm writing a caching app that consumes large amounts of memory.

Hopefully, I'll manage my memory well enough, but I'm just thinking about what to do if I do run out of memory.

If a call to allocate even a simple object fails, is it likely that even a syslog call will also fail?

EDIT: Ok perhaps I should clarify the question. If malloc or new returns a NULL or 0L value then it essentially means the call failed and it can't give you the memory for some reason. So, what would be the sensible thing to do in that case?

EDIT2: I've just realised that a call to "new" can throw an exception. This could be caught at a higher level so I can perhaps gracefully exit further up. At that point, it may even be possible to recover depending on how much memory is freed. At the least I should by that point hopefully be able to log something. So while I have seen code that checks the value of a pointer after new, it is unnecessary. In C, however, you should check the return value of malloc.

Well, if you are in a case where there is a failure to allocate memory, you're going to get a std::bad_alloc exception. The exception causes the stack of your program to be unwound. In all likelihood, the inner loops of your application logic are not going to be handling out of memory conditions; only higher levels of your application should be doing that. Because the stack is getting unwound, a significant chunk of memory is going to be freed, which in fact should be almost all the memory used by your program.

The one exception to this is when you ask for a very large (several hundred MB, for example) chunk of memory which cannot be satisfied. When this happens, though, there are usually enough smaller chunks of memory remaining to let you handle the failure gracefully.

Stack unwinding is your friend ;)

EDIT: Just realized that the question was also tagged with C -- if that is the case, then you should be having your functions free their internal structures manually when out of memory conditions are found; not to do so is a memory leak.
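For the C-style case, a minimal sketch of "free what you already hold before reporting the failure" might look like the following. It is written so that it also compiles as C++. The `Record` type, the function names and the `AllocFn` hook (there only so that a failure can be simulated) are all made up for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical two-buffer structure, used only for illustration.
struct Record {
    char* name;
    int*  values;
};

// Allocator hook so a failure can be simulated (an assumption, not a real API).
using AllocFn = void* (*)(std::size_t);

inline void* default_alloc(std::size_t n) { return std::malloc(n); }

// Returns nullptr on failure and releases anything already allocated.
// Sizes are assumed to be non-zero and small enough not to overflow.
Record* record_create(std::size_t name_len, std::size_t value_count,
                      AllocFn alloc = default_alloc)
{
    Record* r = static_cast<Record*>(alloc(sizeof(Record)));
    if (r == nullptr)
        return nullptr;

    r->name   = static_cast<char*>(alloc(name_len + 1));
    r->values = static_cast<int*>(alloc(value_count * sizeof(int)));
    if (r->name == nullptr || r->values == nullptr) {
        // Partial failure: give back every piece before reporting the error.
        std::free(r->name);     // free(nullptr) is a no-op
        std::free(r->values);
        std::free(r);
        return nullptr;
    }
    r->name[0] = '\0';
    return r;
}

void record_destroy(Record* r)
{
    if (r == nullptr)
        return;
    std::free(r->name);
    std::free(r->values);
    std::free(r);
}
```

The caller only ever sees a fully built `Record` or a null pointer, so there is no half-constructed state to leak.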

EDIT2: Example:

#include <iostream>
#include <new>      // std::bad_alloc
#include <vector>

void DoStuff()
{
    std::vector<int> data;
    // Insert a whole crapload of stuff into data here.
    // Assume the container operation does the actual throwing,
    // e.g. data.resize(SOME_LARGE_VALUE_HERE);
}

int main()
{
    try
    {
        DoStuff();
        return 0;
    }
    catch (const std::bad_alloc& ex)
    {   //Observe that the local variable `data` no longer exists here.
        std::cerr << "Oops. Looks like you need to use a 64 bit system (or "
                     "get a bigger hard disk) for that calculation!\n";
        return -1;
    }
}

EDIT3: Okay, according to commenters there are systems out there which do not follow the standard in this regard. On the other hand, on such systems, you're going to be SOL in any case, so I don't see why they merit discussion. But if you are on such a platform, it is something to keep in mind.

Doesn't this question make assumptions regarding overcommitted memory?

I.e., an out of memory situation might not be recoverable! Even if you have no memory left, calls to malloc and other allocators may still succeed until the program attempts to use the memory. Then, BAM!, some process gets killed by the kernel in order to satisfy memory load.

I don't have any specific experience on Linux, but I spent a lot of time working in video games on games consoles, where running out of memory is verboten, and on Windows-based tools.

On a modern OS, you're most likely to run out of address space. Running out of memory, as such, is basically impossible. So just allocate a large buffer, or buffers, on startup, in order to hold all the data you'll ever need, whilst leaving a small amount for the OS. Writing random junk to these regions would probably be a good idea in order to force the OS to actually assign the memory to your process. If your process survives this attempt to use every byte it's asked for, there's some kind of backing now reserved for all of this stuff, so now you're golden.

Write/steal your own memory manager, and direct it to allocate from these buffers. Then use it, consistently, in your app, or take advantage of the GNU linker's --wrap option (passed through gcc as -Wl,--wrap=malloc) to forward calls from malloc and friends appropriately. If you use any libraries that can't be directed to call into your memory manager, junk them, because they'll just get in your way. Lack of overridable memory management calls is evidence of deeper-seated issues; you're better off without this particular component. (Note: even if you're using --wrap, trust me, this is still evidence of a problem! Life is too short to use those libraries that don't let you overload their memory management!)
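As a rough illustration of the "grab it all up front, then hand it out yourself" idea, here is a minimal bump-pointer arena. The `Arena` class and its interface are invented for this sketch. A real memory manager would also need a way to free and reuse blocks.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical fixed-size arena: one up-front allocation, bump-pointer
// sub-allocation, no per-object free.
class Arena {
public:
    explicit Arena(std::size_t bytes)
        : base_(static_cast<unsigned char*>(std::malloc(bytes))),
          size_(base_ ? bytes : 0),
          used_(0)
    {
        // Write junk to every byte so the OS has to back the pages now.
        if (base_)
            std::memset(base_, 0xA5, size_);
    }
    ~Arena() { std::free(base_); }
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Returns nullptr when the arena is exhausted; never calls malloc.
    // `align` is assumed to be no larger than alignof(std::max_align_t).
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t))
    {
        std::size_t start = (used_ + align - 1) / align * align;
        if (start > size_ || bytes > size_ - start)
            return nullptr;
        used_ = start + bytes;
        return base_ + start;
    }

    std::size_t remaining() const { return size_ - used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};
```

Running out of space here is an ordinary null return from `allocate`, which the application sees at a point of its own choosing rather than in the middle of some library call.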

Once you run out of memory, OK, you're screwed, but you've still got that space you left free before, so if freeing up some of the memory you've asked for is too difficult you can (with care) call system calls to write a message to the system log and then terminate, or whatever. Just make sure to avoid calls to the C library, because they'll probably try to allocate some memory when you least expect it -- programmers who work with systems that have virtualised address spaces are notorious for this kind of thing -- and that's the very thing that has caused the problem in the first place.
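A sketch of what "call system calls directly" could look like on a POSIX system: the message is a static string and it goes out through write(2), with no stdio and no formatting. Writing to stderr instead of the real system log is a simplification, and the function names are made up.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>   // write(2); POSIX only

// Writes a fixed message straight to a file descriptor.
// No heap allocation, no stdio buffering, no formatting.
bool emergency_write(int fd, const char* msg, std::size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, msg, len);
        if (n < 0 && errno == EINTR)
            continue;           // interrupted by a signal: try again
        if (n <= 0)
            return false;       // give up; there is nothing safer to fall back on
        msg += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// The message lives in static storage, so it exists before memory runs out.
static const char kOomMessage[] = "fatal: out of memory, shutting down\n";

bool report_out_of_memory()
{
    return emergency_write(STDERR_FILENO, kOomMessage, sizeof(kOomMessage) - 1);
}
```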

This approach might sound like a pain in the arse. Well... it is. But it's straightforward, and it's worth putting in a bit of effort for that. I think there's a Kernighan-and/or-Ritchie quote about this.

If your application is likely to allocate large blocks of memory and risks hitting the per-process or VM limits, waiting until an allocation actually fails puts you in a difficult situation from which to recover. By the time malloc returns NULL or new throws std::bad_alloc, things may be too far gone to reliably recover. Depending on your recovery strategy, many operations may still require heap allocations themselves, so you have to be extremely careful about which routines you rely on.

Another strategy you may wish to consider is to query the OS and monitor the available memory, proactively managing your allocations. This way you can avoid allocating a large block if you know it is likely to fail, and will thus have a better chance of recovery.
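One possible shape for such a check on Linux/glibc is sketched below. `_SC_AVPHYS_PAGES` is a glibc extension, not standard POSIX. The figure it reports excludes reclaimable page cache, so it tends to underestimate. The reserve-based policy function is a made-up example, not an established rule.

```cpp
#include <cstddef>
#include <unistd.h>   // sysconf; _SC_AVPHYS_PAGES is a glibc extension

// Best-effort estimate of currently free physical memory in bytes.
// Returns 0 if the platform does not report it.
std::size_t available_physical_bytes()
{
    long pages = sysconf(_SC_AVPHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0)
        return 0;
    return static_cast<std::size_t>(pages) * static_cast<std::size_t>(page_size);
}

// Hypothetical policy: refuse a request that would eat into a safety margin.
bool should_attempt_allocation(std::size_t request,
                               std::size_t available,
                               std::size_t reserve)
{
    return available > reserve && request <= available - reserve;
}
```

A cache could call `should_attempt_allocation(size, available_physical_bytes(), margin)` before growing, and evict entries instead when it returns false.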

Also, depending on your memory usage patterns, using a custom allocator may give you better results than the standard built-in malloc. For example, certain allocation patterns can actually lead to memory fragmentation over time, so even though you have free memory, the heap arena may not have an available block of the right size. A good example of this is Firefox, which switched to jemalloc and saw a great increase in memory efficiency.

I don't think that capturing the failure of malloc or new will gain you much in your situation. Linux allocates large chunks of virtual pages in malloc by means of mmap. As a result, you may find yourself in a situation where you have allocated much more virtual memory than you have (real + swap).

The program then will only fail much later with a segfault (SIGSEGV) when you write to the first page for which there isn't any place in swap. In theory you could test for such situations by writing a signal handler and then dirtying all pages that you allocate.

But usually this will not help much either, since your application will be in a very bad state long before that: constantly swapping, computing mechanically with your hard disk...

It's possible for writes to the syslog to fail in low memory conditions: there's no way to know that for every platform without looking at the source for the relevant functions. They could need dynamic memory to format strings that are passed in, for instance.

Long before you run out of memory, however, you'll start paging stuff to disk. And when that happens, you can forget any performance advantages from caching.

Personally, I'm convinced by the design behind Varnish: the operating system offers services to solve a lot of the relevant problems, and it makes sense to use those services (minor editing):

So what happens with Squid's elaborate memory management is that it gets into fights with the kernel's elaborate memory management ...

Squid creates a HTTP object in RAM and it gets used some times rapidly after creation. Then after some time it get no more hits and the kernel notices this. Then somebody tries to get memory from the kernel for something and the kernel decides to push those unused pages of memory out to swap space and use the (cache-RAM) more sensibly for some data which is actually used by a program. This however, is done without Squid knowing about it. Squid still thinks that these http objects are in RAM, and they will be, the very second it tries to access them, but until then, the RAM is used for something productive. ...

After some time, Squid will also notice that these objects are unused, and it decides to move them to disk so the RAM can be used for more busy data. So Squid goes out, creates a file and then it writes the http objects to the file.

Here we switch to the high-speed camera: Squid calls write(2), the address it gives is a "virtual address" and the kernel has it marked as "not at home". ...

The kernel tries to find a free page, if there are none, it will take a little used page from somewhere, likely another little used Squid object, write it to the paging ... space on the disk (the "swap area") when that write completes, it will read from another place in the paging pool the data it "paged out" into the now unused RAM page, fix up the paging tables, and retry the instruction which failed. ...

So now Squid has the object in a page in RAM and written to the disk two places: one copy in the operating system's paging space and one copy in the filesystem. ...

Here is how Varnish does it:

Varnish allocate some virtual memory, it tells the operating system to back this memory with space from a disk file. When it needs to send the object to a client, it simply refers to that piece of virtual memory and leaves the rest to the kernel.

If/when the kernel decides it needs to use RAM for something else, the page will get written to the backing file and the RAM page reused elsewhere.

When Varnish next time refers to the virtual memory, the operating system will find a RAM page, possibly freeing one, and read the contents in from the backing file.

And that's it. Varnish doesn't really try to control what is cached in RAM and what is not, the kernel has code and hardware support to do a good job at that, and it does a good job.

You may not need to write caching code at all.
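To make the quoted approach concrete, here is a minimal POSIX sketch of memory backed by a disk file via mmap. It is not Varnish's actual code. The helper names and the idea of sizing the file with ftruncate are choices made for this example.

```cpp
#include <cstddef>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <unistd.h>     // ftruncate, close

// Hypothetical helper: map `bytes` of the file at `path` as shared memory.
// Returns nullptr on failure.
void* map_backing_file(const char* path, std::size_t bytes)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return nullptr;
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : p;
}

void unmap_backing_file(void* p, std::size_t bytes)
{
    if (p != nullptr)
        munmap(p, bytes);
}
```

Objects placed in the returned region are read and written like ordinary memory; which pages stay in RAM and which go back to the file is left to the kernel.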

As has been stated, exhausting memory means that all bets are off. IMHO the best method of handling this situation is to fail gracefully (as opposed to simply crashing!). Your cache could allocate a reasonable amount of memory on instantiation. The size of this memory would equate to an amount that, when freed, will allow the program to terminate reasonably. When your cache detects that memory is becoming low then it should release this memory and instigate a graceful shutdown.
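One way to wire up such a reserve in C++ is through std::set_new_handler, sketched below. Using the new-handler as the "memory is low" trigger is a choice made for this example rather than something prescribed above, and the `reserve` namespace and its names are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical "rainy day" reserve held for the whole run.
namespace reserve {

void* block = nullptr;            // memory kept back for shutdown work
bool  shutdown_requested = false; // polled by the main loop

bool acquire(std::size_t bytes)
{
    block = std::malloc(bytes);
    return block != nullptr;
}

// Meant to be installed with std::set_new_handler; it runs when
// operator new cannot satisfy a request.
void on_out_of_memory()
{
    if (block != nullptr) {
        std::free(block);         // hand the reserve back to the heap
        block = nullptr;
        shutdown_requested = true;
        return;                   // operator new retries the allocation
    }
    throw std::bad_alloc();       // reserve already spent: let the failure propagate
}

} // namespace reserve
```

At startup the program would call `reserve::acquire(...)` and then `std::set_new_handler(reserve::on_out_of_memory)`; the main loop checks `reserve::shutdown_requested` and begins an orderly shutdown once it is set. On systems that overcommit, the reserve may need to be written to before it is really backed by memory.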

I'm writing a caching app that consumes large amounts of memory. Hopefully, I'll manage my memory well enough, but I'm just thinking about what to do if I do run out of memory.

If you are writing a daemon which should run 24/7/365, then you should not use dynamic memory management: preallocate all the memory in advance and manage it using some slab allocator/memory pool mechanism. That will also protect you against heap fragmentation.

If a call to allocate even a simple object fails, is it likely that even a syslog call will also fail?

It should not. This is partly the reason why syslog exists as a syscall: so that an application can report an error independent of its internal state.

If malloc or new returns a NULL or 0L value then it essentially means the call failed and it can't give you the memory for some reason. So, what would be the sensible thing to do in that case?

In these situations I generally try to handle the error condition properly, applying the general error handling rules. If the error happens during initialization, terminate with an error; it is probably a configuration error. If the error happens during request processing, fail the request with an out-of-memory error.
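In C++ the "fail the request, not the process" rule can be sketched as a try/catch around each request. The `Status` enum and `process_request` are hypothetical names for this example.

```cpp
#include <functional>
#include <new>

// Hypothetical status codes for a request-processing loop.
enum class Status { Ok, OutOfMemory };

// Runs one request; an allocation failure fails that request only.
Status process_request(const std::function<void()>& handler)
{
    try {
        handler();
        return Status::Ok;
    } catch (const std::bad_alloc&) {
        // Stack unwinding has already released the request's own objects.
        return Status::OutOfMemory;
    }
}
```

The caller can turn `Status::OutOfMemory` into an error reply and keep serving other requests.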

For plain heap memory, malloc() returning 0 generally means one of two things:

- You have exhausted the heap, and unless your application frees some memory, further malloc()s won't succeed.
- The allocation size is wrong: it is a quite common coding error to mix signed and unsigned types when calculating a block size. If the size mistakenly ends up negative and is passed to malloc(), where size_t is expected, it becomes a very large number.

So in some sense it is also not wrong to abort() to produce a core file which can be analyzed later to see why malloc() returned 0. I prefer, though, to (1) include the attempted allocation size in the error message and (2) try to proceed further. If the application crashes due to another memory problem down the road (*), it will produce a core file anyway.
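Both points can be sketched in a few lines: a size calculation that rejects negative counts and overflow before they reach malloc(), and a wrapper that reports the attempted size on failure. The function names are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Computes count * elem_size, rejecting negative counts and overflow.
bool checked_bytes(long long count, std::size_t elem_size, std::size_t* out)
{
    if (count < 0)
        return false;                       // would wrap to a huge size_t
    unsigned long long n = static_cast<unsigned long long>(count);
    if (elem_size != 0 && n > SIZE_MAX / elem_size)
        return false;                       // multiplication would overflow
    *out = static_cast<std::size_t>(n) * elem_size;
    return true;
}

// Hypothetical wrapper: reports the attempted size when malloc fails.
void* logged_malloc(std::size_t bytes)
{
    void* p = std::malloc(bytes);
    if (p == nullptr)
        std::fprintf(stderr, "malloc(%zu) failed\n", bytes);
    return p;
}
```

A log line showing an absurdly large size points at a calculation bug rather than real memory exhaustion.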

(*) From my experience of making software with dynamic memory management resilient to malloc() errors, malloc() often does not return 0 reliably. The first attempts return 0, but a later malloc() succeeds and returns a valid pointer, and the first access to that memory then crashes the application. This is my experience on both Linux and HP-UX, and I have seen a similar pattern on Solaris 10 too, so the behavior isn't unique to Linux. To my knowledge the only way to make an application 100% resilient to memory problems is to preallocate all memory in advance. That is mandatory for mission-critical, safety, life-support and carrier-grade applications: they are not allowed dynamic memory management past the initialization phase.

I don't know why many of the sensible answers are voted down. In most server environments, running out of memory means that you have a leak somewhere, and that it makes little sense to 'free some memory and try to go on'. The nature of C++ and especially the standard library is that it requires allocations all the time. If you are lucky, you might be able to free some memory and execute a clean shutdown, or at least emit a warning.

It is however far more likely that you won't be able to do a thing, unless the allocation that failed was a huge one, and there is still memory available for 'normal' things.

Dan Bernstein is one of the very few guys I know that can implement server software that operates in memory constrained situations.

For most of the rest of us, we should probably design our software so that it leaves things in a useful state when it bails out because of an out of memory error.

Unless you are some kind of brain surgeon, there isn't a lot else to do.

Also, very often you won't even get a std::bad_alloc or something like that; you'll just get a pointer back from your malloc/new, and only die when you actually try to touch all of that memory. This can be prevented by turning off overcommit in the operating system, but still.

Don't count on being able to deal with the SIGSEGV when you touch memory that the kernel hoped you wouldn't. I'm not quite sure how this works on the Windows side of things, but I bet they do overcommit too.

All in all, this is not one of C++'s strong spots.