tracepoints: Do not trace when cpu is offline
authorSteven Rostedt (Red Hat) <rostedt@goodmis.org>
Mon, 15 Feb 2016 17:36:14 +0000 (12:36 -0500)
committerSteven Rostedt <rostedt@goodmis.org>
Mon, 15 Feb 2016 18:04:46 +0000 (13:04 -0500)
The tracepoint infrastructure uses RCU sched protection to enable and
disable tracepoints safely. There are some instances where tracepoints are
used in infrastructure code (like kfree()) that get called after a CPU is
going offline, and perhaps when it is coming back online but hasn't been
registered yet.

This can probuce the following warning:

 [ INFO: suspicious RCU usage. ]
 4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
 -------------------------------
 include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!

 other info that might help us debug this:

 RCU used illegally from offline CPU!  rcu_scheduler_active = 1, debug_locks = 1
 no locks held by swapper/8/0.

 stack backtrace:
  CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S              4.4.0-00006-g0fe53e8-dirty #34
  Call Trace:
  [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
  [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
  [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
  [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
  [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
  [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
  [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
  [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
  [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
  [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
  [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
  [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14

This warning is not a false positive either. RCU is not protecting code that
is being executed while the CPU is offline.

Instead of playing "whack-a-mole(TM)" and adding conditional statements to
the tracepoints we find that are used in this instance, simply add a
cpu_online() test to the tracepoint code where the tracepoint will be
ignored if the CPU is offline.

Use of raw_smp_processor_id() is fine, as there should never be a case where
the tracepoint code goes from running on a CPU that is online and suddenly
gets migrated to a CPU that is offline.

Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
Cc: stable@vger.kernel.org # v2.6.28+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
include/linux/tracepoint.h

index acd522a..acfdbf3 100644 (file)
  * See the file COPYING for more details.
  */
 
+#include <linux/smp.h>
 #include <linux/errno.h>
 #include <linux/types.h>
+#include <linux/cpumask.h>
 #include <linux/rcupdate.h>
 #include <linux/tracepoint-defs.h>
 
@@ -132,6 +134,9 @@ extern void syscall_unregfunc(void);
                void *it_func;                                          \
                void *__data;                                           \
                                                                        \
+               if (!cpu_online(raw_smp_processor_id()))                \
+                       return;                                         \
+                                                                       \
                if (!(cond))                                            \
                        return;                                         \
                prercu;                                                 \