Surviving the Linux OOM Killer
Raunak Ramakrishnan

Raunak Ramakrishnan @rrampage

About: Passionate about databases, distributed systems and functional programming.

Location:
India
Joined:
Sep 30, 2017

Surviving the Linux OOM Killer

Publish Date: Oct 4 '18
55 9

When your Linux machine runs out of memory, Out of Memory (OOM) killer is called by kernel to free some memory. It is often encountered on servers which have a number of memory intensive processes running. In this post, we dig a little deeper into when does OOM killer get called, how it decides which process to kill and if we can prevent it from killing important processes like databases.

How does OOM Killer choose which process to kill?

The Linux kernel gives a score to each running process called oom_score which shows how likely it is to be terminated in case of low available memory. The score is proportional to the amount of memory used by the process. The score is 10 x percent of memory used by process. So the maximum score is 100% x 10 = 1000. In addition, if a process is running as a privileged user, it gets a slightly lower oom_score as compared to same memory usage by a normal user process. In earlier versions of Linux ( v2.6.32 kernel), there was a more elaborate heuristic which calculated this score.

The oom_score of a process can be found in the /proc directory. Let's say that the process id (pid) of your process is 42, cat /proc/42/oom_score will give you the process' score.

Can I ensure some important processes do not get killed by OOM Killer?

Yes! The OOM killer checks oom_score_adj to adjust its final calculated score. This file is present in /proc/$pid/oom_score_adj. You can add a large negative score to this file to ensure that your process gets a lower chance of being picked and terminated by OOM killer. The oom_score_adj can vary from -1000 to 1000. If you assign -1000 to it, it can use 100% memory and still avoid getting terminated by OOM killer. On the other hand, if you assign 1000 to it, the Linux kernel will keep killing the process even when it uses minimal memory.

Let's go back to our process with pid 42. Here is how you can change its oom_score_adj:

echo -200 | sudo tee - /proc/42/oom_score_adj
Enter fullscreen mode Exit fullscreen mode

We need to do this as root user or sudo because Linux does not allow normal users to reduce the OOM score. You can increase the OOM score as a normal user without any special permissions. e.g echo 100 > /proc/42/oom_score_adj

There is also another, less fine-grained score called oom_adj which varies from -16 to 15. It is similar to oom_score_adj. In fact, when you set oom_score_adj, the kernel automatically scales it down and calculates oom_adj. oom_adj has a magic value of -17 which indicates that the given process should never be killed by OOM killer.

Display OOM scores of all running processes

This script displays the OOM score and OOM adjusted score of all running processes, in descending order of OOM score


#!/bin/bash
# Displays running processes in descending order of OOM score
printf 'PID\tOOM Score\tOOM Adj\tCommand\n'
while read -r pid comm; do [ -f /proc/$pid/oom_score ] && [ $(cat /proc/$pid/oom_score) != 0 ] && printf '%d\t%d\t\t%d\t%s\n' "$pid" "$(cat /proc/$pid/oom_score)" "$(cat /proc/$pid/oom_score_adj)" "$comm"; done < <(ps -e -o pid= -o comm=) | sort -k 2nr
Enter fullscreen mode Exit fullscreen mode

Check if any of your processes have been OOM-killed

The easiest way is to grep your system logs. In Ubuntu: grep -i kill /var/log/syslog. If a process has been killed, you may get results like my_process invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

Caveats of adjusting OOM scores

Remember that OOM is a symptom of a bigger problem - low available memory. The best way to solve it is by either increasing the available memory (e.g better hardware) or moving some programs to other machines or by reducing memory consumption of programs (e.g allocate less memory where possible).

Too much tweaking of the OOM adjusted score will result in random processes getting killed and not being able to free enough memory.

References

  1. proc man page
  2. https://askubuntu.com/questions/60672/how-do-i-use-oom-score-adj/
  3. Walkthrough on which part of Linux code is called
  4. Classic LWN article (a bit dated)
  5. Invoking the OOM killer manually

Comments 9 total

  • Raunak Ramakrishnan
    Raunak RamakrishnanOct 5, 2018

    Agreed! There is no substitute for good monitoring. It catches many issues before they become bigger problems. Ultimately, we must be fixing the root cause for high memory which is generally poor design/architecture.

    What you said about the tragedy of commons is exactly what happened to nice scores for process priority.

  • Dustin King
    Dustin KingOct 5, 2018

    Very interesting.

    This must be a linux-specific thing, not *nix in general. My MacOS laptop doesn't seem to have a /proc.

    Edit to add: This article says Mac uses the sysctl function for some things that would otherwise use /proc for.

  • Alex North-Keys
    Alex North-KeysAug 15, 2019

    I toyed with that command a bit - I wanted to get the RSS and username in there, keep the sort, and include how many procs were included and skipped. Something like (fewer than 30 procs are shown due to trimming):

    #                              OOM   OOM   OOM  
    #     User     PID       RSS Score ScAdj   Adj  Command (shown 30, omits 945)
      someuser   17098  13056696   198     0     0  /usr/lib/firefox/firefox -cont
      someuser    5972   3645740    55     0     0  /usr/lib/firefox/firefox -cont
      someuser   17040   2668760    40     0     0  /usr/lib/firefox/firefox -no-r
      someuser    5898   2342168    35     0     0  /usr/lib/firefox/firefox -no-r
          root    4974   1531488    24     0     0  /usr/lib/xorg/Xorg -nolisten t
      someuser   15283    433544     9     0     0  /usr/bin/java -Dosgi.requiredJ
      someuser    6014    133240     2     0     0  /usr/lib/firefox/firefox -cont
      someuser    6094    171836     2     0     0  /usr/lib/firefox/firefox -cont
      someuser   26043    101524     2     0     0  emacs somefile.py 
      someuser    1889     70088     1     0     0  xterm -name XTerm8 -tn xterm-2
      someuser    3607     64096     1     0     0  xterm -name XTerm8 -tn xterm-2
      someuser   11903     99368     1     0     0  python 
      someuser   17166    111764     1     0     0  /usr/lib/firefox/firefox -cont
      someuser   32529     44736     1     0     0  xterm -name XTerm8 -tn xterm-2
      postgres   30473     66132     1     0     0  postgres: checkpointer process
          root    2388     41156     0  -500    -8  /usr/bin/dockerd -H unix:// 
          root   20036     24344     0  -900   -15  /usr/lib/snapd/snapd 
      message+    1605      4100     0  -900   -15  /usr/bin/dbus-daemon --system 
      postgres   30448      7220     0  -900   -15  /usr/lib/postgresql/11/bin/pos
          root   28818      3032     0 -1000   -17  /lib/systemd/systemd-udevd 
          root   30988      2832     0 -1000   -17  /usr/sbin/sshd -D 
    

    Here's the result. Whether this is an argument for or against bash syntax is an exercise for the reader. The cat/tr calls can probably be obviated :-)

    #!/bin/bash
    #    Displays running processes in descending order of OOM score
    #      (skipping those with both score and adjust of zero).
    #    https://dev.to/rrampage/surviving-the-linux-oom-killer-2ki9
    
    contents-or-0 () { if [ -r "$1" ] ; then cat "$1" ; else echo 0 ; fi ; }
    
    {
        header='# %8s %7s %9s %5s %5s %5s  %s\n'
        format="$(echo "$header" | sed 's/^./ /')"
        declare -a lines output
        IFS=$'\r\n' command eval 'lines=($(ps -e -o user,pid,rss))'
        shown=0 ; omits=0
        for n in $(eval echo "{1..$(expr ${#lines[@]} - 1)}") ; do # 1..skip header
            line="${lines[$n]}"
            case "$line" in *[0-9]*)
                set $line ; user=$1 ; pid=$2 ; rss=$3 ; shift 3
                oom_score=$(    contents-or-0  /proc/$pid/oom_score)
                oom_adj=$(      contents-or-0  /proc/$pid/oom_adj)
                oom_score_adj=$(contents-or-0  /proc/$pid/oom_score_adj)            
                if [ -f /proc/$pid/oom_score ] && \
                   [ 0 -ne $oom_score -o 0 -ne $oom_score_adj -o 0 -ne $oom_adj ]
                then
                    output[${#output[@]}]="$( \
                       printf "$format" \
                              "$user" \
                              "$pid" \
                              "$rss" \
                              "$oom_score" \
                              "$oom_score_adj" \
                              "$oom_adj" \
                              "$(cat /proc/$pid/cmdline | tr '\0' ' ' )" \
                    )"
                    (( ++shown ))
                else
                    (( ++omits ))
                fi
                ;;
            esac
        done
        printf "$header"   ''   '' '' OOM   OOM   OOM ''
        printf "$header" User PID RSS Score ScAdj Adj \
            "Command (shown $shown, omits $omits)"
        for n in $(eval echo "{0..$(expr ${#output[@]} - 1)}") ; do
            echo "${output[$n]}"
        done | sort -k 4nr -k 5rn
    }
    
    #----eof
    
  • Andrey Loskutov
    Andrey LoskutovOct 14, 2020

    We've found an interesting issue: specific oom_score_adj values in the range [942,999] seem to produce "unexpected" oom_adj values of 16, which seem to be out of range [-17, 15].

    That is at least unexpected, any idea where it is coming from and if that could affect the oom_killer behavior (e.g. task with oom_score_adj=940 will be killed before the task with oom_score_adj=999)? At least /proc/<pid>/oom_score seem to be "OK" and is higher for oom_score_adj=1000...

    sudo echo "999" > /proc/<pid>/oom_score_adj
    cat /<pid>/oom_score_adj
    999
    cat /<pid>/oom_adj
    16
    cat /proc/<pid>/oom_score
    1008
    
    sudo echo "1000" > /proc/<pid>/oom_score_adj
    cat /<pid>/oom_score_adj
    1000
    cat /<pid>/oom_adj
    15
    cat /proc/<pid>/oom_score
    1009
    
  • Gnoale
    GnoaleFeb 9, 2021

    little typo I spotted : instead of sudo echo -200 > /proc/42/oom_score_adj do echo -200 | sudo tee - /proc/42/oom_score_adj

  • beelong_yank
    beelong_yankMar 4, 2025

    for more information

    #!/bin/bash
    
    LOG_FILE="/var/log/oom_monitor.log"
    declare -A prev_processes  
    
    while true; do
        clear
        printf 'PID\tPPID\tOOM Score\tOOM Adj\tCommand\n'
    
        declare -A current_processes  
    
        while read -r pid ppid comm; do
            if [ -f /proc/$pid/oom_score ] && [ "$(cat /proc/$pid/oom_score)" -ne 0 ]; then
                oom_score=$(cat /proc/$pid/oom_score)
                oom_adj=$(cat /proc/$pid/oom_score_adj)
                printf '%d\t%d\t%d\t\t%d\t%s\n' "$pid" "$ppid" "$oom_score" "$oom_adj" "$comm"
    
                current_processes["$pid"]="$comm"
            fi
        done < <(ps -e -o pid= -o ppid= -o comm=) | sort -k 3nr
    
        for pid in "${!prev_processes[@]}"; do
            if [[ ! -v current_processes["$pid"] ]]; then
                timestamp=$(date "+%Y-%m-%d %H:%M:%S")
                msg="[$timestamp] ALERT: Proces '${prev_processes[$pid]}' (PID $pid) is dead by OOM Killer!"
                echo -e "\n\033[1;31m$msg\033[0m" 
                echo "$msg" >> "$LOG_FILE"
            fi
        done
    
        prev_processes=("${current_processes[@]}")
    
        sleep 2  
    done
    
    
    Enter fullscreen mode Exit fullscreen mode
Add comment