sched/numa: Ensure task_numa_migrate() checks the preferred node
authorRik van Riel <riel@redhat.com>
Wed, 4 Jun 2014 20:09:42 +0000 (16:09 -0400)
committerIngo Molnar <mingo@kernel.org>
Wed, 18 Jun 2014 16:29:57 +0000 (18:29 +0200)
commita43455a1d572daf7b730fe12eb747d1e17411365
tree9522d6d36e5e6bbc92ab32c5aafc179ceaf668ee
parentebe06187bf2aec10d537ce4595e416035367d703
sched/numa: Ensure task_numa_migrate() checks the preferred node

The first thing task_numa_migrate() does is check to see if there is
CPU capacity available on the preferred node, in order to move the
task there.

However, if the preferred node is all busy, we would skip considering
that node for tasks swaps in the subsequent loop. This prevents NUMA
convergence of tasks on busy systems.

However, swapping locations with a task on our preferred nid, when
the preferred nid is busy, is perfectly fine.

The fix is to also look for a CPU on our preferred nid when it is
totally busy.

This changes "perf bench numa mem -p 4 -t 20 -m -0 -P 1000" from
not converging in 15 minutes on my 4 node system, to converging in
10-20 seconds.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140604160942.6969b101@cuia.bos.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
kernel/sched/fair.c