oom: don't count on mm-less current process
authorTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Wed, 11 Feb 2015 23:24:54 +0000 (15:24 -0800)
committerLinus Torvalds <torvalds@linux-foundation.org>
Thu, 12 Feb 2015 01:06:00 +0000 (17:06 -0800)
out_of_memory() doesn't trigger the OOM killer if the current task is
already exiting or it has fatal signals pending, and gives the task
access to memory reserves instead.  However, doing so is wrong if
out_of_memory() is called by an allocation (e.g. from exit_task_work())
after the current task has already released its memory and cleared
TIF_MEMDIE at exit_mm().  If we again set TIF_MEMDIE to post-exit_mm()
current task, the OOM killer will be blocked by the task sitting in the
final schedule() waiting for its parent to reap it.  It will trigger an
OOM livelock if its parent is unable to reap it due to doing an
allocation and waiting for the OOM killer to kill it.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/oom_kill.c

index d503e9c..f82dd13 100644 (file)
@@ -643,8 +643,12 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
         * If current has a pending SIGKILL or is exiting, then automatically
         * select it.  The goal is to allow it to allocate so that it may
         * quickly exit and free its memory.
+        *
+        * But don't select if current has already released its mm and cleared
+        * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
         */
-       if (fatal_signal_pending(current) || task_will_free_mem(current)) {
+       if (current->mm &&
+           (fatal_signal_pending(current) || task_will_free_mem(current))) {
                set_thread_flag(TIF_MEMDIE);
                return;
        }