What is the best way to prevent out of memory (OOM) freezes on Linux?


Is there a way to make the OOM killer work and prevent Linux from freezing? I've been running Java and C# applications, where any memory allocated is usually used, and (if I'm understanding them right) overcommits are causing the machine to freeze. Right now, as a temporary solution, I added,

vm.overcommit_memory = 2
vm.overcommit_ratio = 10

to /etc/sysctl.conf.
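
In case it helps anyone trying the same workaround: the settings can also be applied at runtime with sysctl (standard commands, shown here as a sketch) and verified through /proc:

sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=10

# Or reload everything from /etc/sysctl.conf:
sysctl -p

# Verify the live values:
cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio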

Kudos to anyone who can explain why the existing OOM killer can't function correctly in a guaranteed manner, killing processes whenever the kernel runs out of "real" memory.

EDIT -- many responses are along the lines of Michael's "if you are experiencing OOM killer related problems, then you probably need to fix whatever is causing you to run out of memory". I don't think this is the correct solution. There will always be apps with bugs, and I'd like to adjust the kernel so my entire system doesn't freeze. Given my current technical understanding, this doesn't seem like it should be impossible.


Below is a really basic Perl script I wrote. With a bit of tweaking it could be useful. You just need to change the paths I have to the paths of whatever Java or C# processes you run, and you could turn the kill commands I've used into restart commands. To avoid typing perl memusage.pl manually, you could put it into your crontab so it runs automatically (there's an example entry after the script), and you could use perl memusage.pl > log.txt to save its output to a log file. Sorry if it doesn't really help, but I was bored while drinking a cup of coffee. :-D Cheers

#!/usr/bin/perl -w
# Checks available memory usage and calculates size in MB
# If free memory is below your minimum level specified, then
# the script will attempt to close the troublesome processes down
# that you specify. If it can't, it will issue a -9 KILL signal.
#
# Uses external commands (cat and pidof)
#
# Cheers, insertable

our $memmin = 50;
our @procs = qw(/usr/bin/firefox /usr/local/sbin/apache2);

sub killProcs
{
    use vars qw(@procs);
    my @pids = ();
    foreach $proc (@procs)
    {
        my $filename=substr($proc, rindex($proc,"/")+1,length($proc)-rindex($proc,"/")-1);
        my $pid = `pidof $filename`;
        chop($pid);
        my @pid = split(/ /,$pid);
        push @pids, $pid[0];
    }
    foreach $pid (@pids)
    {
        # Try to kill the process normally (SIGTERM) first
        system("kill -15 " . $pid); 
        print "Killing " . $pid . "\n";
        sleep 1;
        if (-e "/proc/$pid")
        {
            print $pid . " is still alive! Issuing a -9 KILL...\n";
            system("kill -9 " + $pid);
            print "Done.\n";
        } else {
            print "Looks like " . $pid . " is dead\n";
        }
    }
    print "Successfully finished destroying memory-hogging processes!\n";
    exit(0);
}

sub checkMem
{
    use vars qw($memmin);
    my ($free) = $_[0];
    if ($free > $memmin)
    {
        print "Memory usage is OK\n";
        exit(0);
    } else {
        killProcs();
    }
}

sub main
{
    my $meminfo = `cat /proc/meminfo`;
    chop($meminfo);
    my @meminfo = split(/\n/,$meminfo);
    foreach my $line (@meminfo)
    {
        if ($line =~ /^MemFree:\s+(.+)\skB$/)
        {
            my $free = ($1 / 1024);
            &checkMem($free);
        }
    }
}

main();
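
If you do cron it, a line like the following would run the check every five minutes and append the output to a log (the install path and the interval are just illustrative choices, not anything the script requires):

# m h dom mon dow  command
*/5 * * * * /usr/bin/perl /usr/local/bin/memusage.pl >> /var/log/memusage.log 2>&1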


If your process's oom_adj is set to -17 it won't be considered for killing, although I doubt that's the issue here.

cat /proc/<pid>/oom_adj

will tell you the value of your process's oom_adj.
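
For completeness, here's how you'd set it; 1234 stands in for a real PID, and note that newer kernels prefer oom_score_adj (where -1000 is the equivalent exemption):

# Exempt a running process from the OOM killer
# (needs root; 1234 is a placeholder PID)
echo -17 > /proc/1234/oom_adj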


I put together a simple script that'll set the OOM score on launch. All sub-processes will inherit this score.

#!/usr/bin/env sh

if [ -z "$1" ] || [ -z "$2" ]; then
  echo "Usage: $(basename "$0") oom_score_adj command [args]..."
  echo "  oom_score_adj  A score between -1000 and 1000, bigger gets killed first"
  echo "  command        The command to run"
  echo "  [args]         Optional args for the command to run"
  exit 1
fi

set -eux

echo "$1" > /proc/self/oom_score_adj
shift
exec "$@"

The script sets the score for the current process to the first argument provided. This can be anything between -1000 and 1000, where 1000 makes the process the most likely to be killed first. The remaining arguments are then executed as a command, replacing the current process.
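
Usage looks something like this (the script name oomrun.sh and the jar are placeholders for this example):

# Mark a memory-hungry batch job as the preferred OOM victim
./oomrun.sh 1000 java -jar big-batch-job.jar

Raising a process's score like this needs no special privileges; lowering it below its current value (protecting it) requires root.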


I'd have to say the best way of preventing OOM freezes is to not run out of virtual memory. If you are regularly running out of virtual memory, or getting close, then you have bigger problems.

Most tasks don't handle failed memory allocations very well, so they tend to crash or lose data. Running out of virtual memory (with or without overcommit) will cause some allocations to fail. This is usually bad.

Moreover, before your OS runs out of virtual memory it will start doing bad things, like discarding pages from commonly used shared libraries. Performance suffers because those pages have to be pulled back in over and over, which is very bad for throughput.

My suggestions:

  • Get more ram
  • Run fewer processes
  • Make the processes you do run use less memory (This may include fixing memory leaks in them)

And possibly also

  • Set up more swap space, if that is helpful in your use-case (a quick sketch follows)
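
Adding swap is mechanical on most distributions; a minimal sketch, where the 4G size is an arbitrary choice:

# Create and enable a 4 GiB swap file (run as root)
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Make it persistent across reboots
echo '/swapfile none swap sw 0 0' >> /etc/fstab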

Most multi-process servers run a configurable (maximum) number of processes, so you can typically tune that downwards. Multithreaded servers typically let you configure how much memory to use internally for buffers and the like.


First off, how can you be sure the freezes are OOM killer related? I've got a network of systems in the field and I get not infrequent freezes, which don't seem to be OOM related (our app is pretty stable in memory usage). Could it be something else? Is there any interesting hardware involved? Any unstable drivers? High performance video?

Even if the OOM killer is involved, and worked, you'd still have problems, because stuff you thought was running is now dead, and who knows what sort of mess it's left behind.

Really, if you are experiencing OOM killer related problems, then you probably need to fix whatever is causing you to run out of memory.


I've found that fixing stability issues mostly relies on accurately identifying the root cause. Unfortunately, this requires being able to see what's happening when the issue happens, which is a really bad time to be trying to start various monitoring programs.

One thing I sometimes found helpful was to start a little monitoring script at boot time which would log various interesting numbers and snapshot the running processes. Then, in the event of a crash, I could look at the situation just before the crash. I sometimes found that intuition was quite wrong about the root cause. Unfortunately, that script is long out-of-date, or I'd give a link.
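
The idea is easy to reconstruct, though; something like the loop below, where the log path and the 30-second interval are arbitrary choices of mine, not the original script:

#!/bin/sh
# Append a timestamped memory/process snapshot to a log,
# so the state just before a freeze survives on disk.
LOG=/var/log/mem-snapshot.log
while true; do
    {
        date
        grep -E 'MemFree|SwapFree' /proc/meminfo
        # Top 5 processes by resident memory (GNU ps)
        ps axo pid,rss,comm --sort=-rss | head -n 6
    } >> "$LOG"
    sleep 30
done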
