cascardo/linux.git
8 years agoperf stat record: Add pipe support for record command
Jiri Olsa [Thu, 5 Nov 2015 14:40:50 +0000 (15:40 +0100)]
perf stat record: Add pipe support for record command

Allowing storing stat record data into pipe, so report tools
(report/script) could read data directly from record.

Committer note:

Before this patch:

  $ perf stat record -o - usleep 1 | perf report -i -
  incompatible file format (rerun with -v to learn more)
  $ perf stat record -o - usleep 1 | perf script -i -
  incompatible file format (rerun with -v to learn more)
  $ ls -la perf.data
  ls: cannot access perf.data: No such file or directory
  $

After:

  $ perf stat record -o - usleep 1 | perf report -i -
  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  Error:
  The - file has no samples!
  $ perf stat record -o - usleep 1 | perf script -i -
  Display of symbols requested but neither sample IP nor sample address
  is selected. Hence, no addresses to convert to symbols.
  0 [0x80]: failed to process type: 64
  $ ls -la perf.data
  ls: cannot access perf.data: No such file or directory
  $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf stat record: Store events IDs in perf data file
Jiri Olsa [Thu, 5 Nov 2015 14:40:49 +0000 (15:40 +0100)]
perf stat record: Store events IDs in perf data file

Store event IDs in evlist object so it get stored into perf.data file.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf evlist: Export id_add_fd()
Jiri Olsa [Thu, 5 Nov 2015 14:40:49 +0000 (15:40 +0100)]
perf evlist: Export id_add_fd()

Will be used to storing the event IDs in evlist object so it get stored
into perf.data file.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-6-git-send-email-jolsa@kernel.org
[ Split from the patch storing the ids in the perf.data file ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf stat record: Synthesize stat record data
Jiri Olsa [Thu, 5 Nov 2015 14:40:48 +0000 (15:40 +0100)]
perf stat record: Synthesize stat record data

Synthesizing needed stat record data for report/script:
  - cpu/thread maps
  - stat config

Committer note:

New records generated on a perf.data file with this patch:

  $ perf report -D | grep PERF_RECORD_
  0x568 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 29097
  0x590 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
  0x5a2 [0x40]: PERF_RECORD_STAT_CONFIG
  $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-5-git-send-email-jolsa@kernel.org
[ Adjusted wrt kernel PERF_RECORD_MMAP added when introducing 'perf stat record' ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf stat record: Initialize record features
Jiri Olsa [Thu, 5 Nov 2015 14:40:47 +0000 (15:40 +0100)]
perf stat record: Initialize record features

Disabling all non stat related features.

Also as we now enable STAT feature in the data file, adding code to
instruct session open to skip sample type checking for stat data files.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf stat record: Add record command
Jiri Olsa [Thu, 5 Nov 2015 14:40:46 +0000 (15:40 +0100)]
perf stat record: Add record command

Add 'perf stat record' command support. It creates simple (header only)
perf.data file ATM.

The record command could be specified anywhere among stat options. All
stat command options are valid for stat record command with '-o' option
exception. If specified for record command it denotes the perf data file
name.

Committer note:

Set sample_type to PERF_SAMPLE_IDENTIFIER, which should be harmless
while avoiding that older tools show confusing messages, for instance,
with sample_type = 0, we get:

  $ perf stat record usleep 1

   Performance counter stats for 'usleep 1':

          0.630237      task-clock (msec)         #    0.528 CPUs utilized
                 1      context-switches          #    0.002 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                52      page-faults               #    0.083 M/sec
           978,312      cycles                    #    1.552 GHz
           671,931      stalled-cycles-frontend   #   68.68% frontend cycles idle
   <not supported>      stalled-cycles-backend
           646,379      instructions              #    0.66  insns per cycle
                                                  #    1.04  stalled cycles per insn
           131,046      branches                  #  207.931 M/sec
             7,073      branch-misses             #    5.40% of all branches

       0.001193240 seconds time elapsed

  $ oldperf evlist
  WARNING: The perf.data file's data size field is 0 which is unexpected.
  Was the 'perf record' command properly terminated?
  non matching sample_type
  $

While with sample_type set to PERF_SAMPLE_IDENTIFIER, after we re-run 'perf
stat record usleep' we get:

  $ oldperf evlist
  WARNING: The perf.data file's data size field is 0 which is unexpected.
  Was the 'perf record' command properly terminated?
  task-clock
  context-switches
  cpu-migrations
  page-faults
  cycles
  stalled-cycles-frontend
  stalled-cycles-backend
  instructions
  branches
  branch-misses
  $

Which at least shows the names of the events in the perf.data file.

Additionally, such files, when passed to 'perf report' will produce:

  $ oldperf report --stdio
  WARNING: The perf.data file's data size field is 0 which is unexpected.
  Was the 'perf record' command properly terminated?
  Warning:
  Kernel address maps (/proc/{kallsyms,modules}) were restricted.

  Check /proc/sys/kernel/kptr_restrict before running 'perf record'.

  As no suitable kallsyms nor vmlinux was found, kernel samples
  can't be resolved.

  Samples in kernel modules can't be resolved as well.

  Error:
  The perf.data file has no samples!
  # To display the perf.data header info, please use --header/--header-only options.
  #
  $

Which is confusing and can be solved by just adding the kernel mmap record,
which will also remove that warning about the data size field being equal to
zero, after generating the mmap record:

  $ perf stat record usleep 1

   Performance counter stats for 'usleep 1':

          0.600796      task-clock (msec)         #    0.478 CPUs utilized
                 1      context-switches          #    0.002 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                54      page-faults               #    0.090 M/sec
           886,844      cycles                    #    1.476 GHz
           582,169      stalled-cycles-frontend   #   65.65% frontend cycles idle
   <not supported>      stalled-cycles-backend
           638,344      instructions              #    0.72  insns per cycle
                                                  #    0.91  stalled cycles per insn
           130,204      branches                  #  216.719 M/sec
             7,500      branch-misses             #    5.76% of all branches

       0.001255897 seconds time elapsed

  $ oldperf evlist
  task-clock
  context-switches
  cpu-migrations
  page-faults
  cycles
  stalled-cycles-frontend
  stalled-cycles-backend
  instructions
  branches
  branch-misses
  $ oldperf report --stdio
  Error:
  The perf.data file has no samples!
  # To display the perf.data header info, please use --header/--header-only options.
  #
  [acme@zoo linux]$

No warnings, sensible output about what are the events in the perf.data file and also
a "file has no samples" message, which indeed it doesn't.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: htp://lkml.kernel.org/r/1446734469-11352-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Introduce stat perf.data header feature
Jiri Olsa [Sun, 25 Oct 2015 14:51:43 +0000 (15:51 +0100)]
perf tools: Introduce stat perf.data header feature

Introducing the 'stat' feature to mark a perf.data as created by  the
'perf stat record' command. It contains no data.

It's needed so that the report tools (report/script) can differentiate
sampling data from counting data, because they need to be treated in a
different way.

In the future it might be used to store the version of the stat storage
system used.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-28-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf report: Display newly added events in raw dump
Jiri Olsa [Sun, 25 Oct 2015 14:51:42 +0000 (15:51 +0100)]
perf report: Display newly added events in raw dump

The 'perf report -D' command will now display detailed output for these
newly added events:

  event_update
  thread_map
  cpu_map
  stat
  stat_config
  stat_round

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-27-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add perf_event__fprintf_event_update function
Jiri Olsa [Sun, 25 Oct 2015 14:51:41 +0000 (15:51 +0100)]
perf tools: Add perf_event__fprintf_event_update function

To display a 'event update' event for raw dump.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-26-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add event_update event cpus type
Jiri Olsa [Sun, 25 Oct 2015 14:51:40 +0000 (15:51 +0100)]
perf tools: Add event_update event cpus type

Adding the cpumask 'event update' event, that stores/transfer the
cpumask for a event.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-25-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add event_update event name type
Jiri Olsa [Sun, 25 Oct 2015 14:51:39 +0000 (15:51 +0100)]
perf tools: Add event_update event name type

Adding name type 'event update' event, that stores/transfer events name.
Event's name is stored within perf.data's EVENT_DESC feature, but we
don't have it if we get the report data from pipe.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-24-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add event_update event scale type
Jiri Olsa [Sun, 25 Oct 2015 14:51:38 +0000 (15:51 +0100)]
perf tools: Add event_update event scale type

A__allocdding scale type 'event update' event, that stores/transfer
events scale value. The PMU events can define the scale
value which is used to multiply events data.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-23-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add event_update event unit type
Jiri Olsa [Sun, 25 Oct 2015 14:51:37 +0000 (15:51 +0100)]
perf tools: Add event_update event unit type

Adding unit type 'event update' event, that stores/transfer events unit
name. The unit name is part of the perf stat output data.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-22-git-send-email-jolsa@kernel.org
[ Rename __alloc() to __new() for consistency ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add event_update user level event
Jiri Olsa [Sun, 25 Oct 2015 14:51:36 +0000 (15:51 +0100)]
perf tools: Add event_update user level event

It'll serve as a base event for additional event attributes details,
that are not part of the attr event.

At the moment this event is just a dummy one without any specific
functionality. The type value will distinguish the update event details.
It'll come in the following patches.

The idea for this event is to be extensible for any update that the
event might need in the future.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-21-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add stat events fprintf functions
Jiri Olsa [Sun, 25 Oct 2015 14:51:35 +0000 (15:51 +0100)]
perf tools: Add stat events fprintf functions

Introducing the following functions to display the stat events for raw
dump.

  perf_event__fprintf_stat
  perf_event__fprintf_stat_round
  perf_event__fprintf_stat_config

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-20-git-send-email-jolsa@kernel.org
[ s/stat/st/g and s/round/rd/g parameters to fix 'already defined' build error with older distros (e.g. RHEL6.7) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add stat round event synthesize function
Jiri Olsa [Sun, 25 Oct 2015 14:51:34 +0000 (15:51 +0100)]
perf tools: Add stat round event synthesize function

Introduce the perf_event__synthesize_stat_round function to
synthesize a 'struct stat_round_event'.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-19-git-send-email-jolsa@kernel.org
[ Renamed 'time' parameter to 'evtime' to fix build on older systems ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add stat round user level event
Jiri Olsa [Sun, 25 Oct 2015 14:51:33 +0000 (15:51 +0100)]
perf tools: Add stat round user level event

Adding the stat round event to be stored after each stat interval round,
so that report tools (report/script) gets notified and process interval
data.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-18-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add stat event read function
Jiri Olsa [Sun, 25 Oct 2015 14:51:32 +0000 (15:51 +0100)]
perf tools: Add stat event read function

Introducing the perf_event__process_stat_event function to process a
'struct perf_stat' data from a stat event.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-17-git-send-email-jolsa@kernel.org
[ Renamed 'stat' parameter to 'st' to fix 'already defined' build error with older distros (e.g. RHEL6.7) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add stat event synthesize function
Jiri Olsa [Sun, 25 Oct 2015 14:51:31 +0000 (15:51 +0100)]
perf tools: Add stat event synthesize function

Introduce the perf_event__synthesize_stat function to synthesize a
'struct stat_event'.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-16-git-send-email-jolsa@kernel.org
[ Renamed 'stat' parameter to 'st' to fix 'already defined' build error with older distros (e.g. RHEL6.7) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add stat user level event
Jiri Olsa [Sun, 25 Oct 2015 14:51:30 +0000 (15:51 +0100)]
perf tools: Add stat user level event

Adding a stat event to store a 'struct perf_counter_values' for a given
event/cpu/thread.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-15-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add stat config event read function
Jiri Olsa [Sun, 25 Oct 2015 14:51:29 +0000 (15:51 +0100)]
perf tools: Add stat config event read function

Introducing the perf_event__read_stat_config function to read a struct
perf_stat_config object data from a stat config event.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-14-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add stat config event synthesize function
Jiri Olsa [Sun, 25 Oct 2015 14:51:28 +0000 (15:51 +0100)]
perf tools: Add stat config event synthesize function

Introduce the perf_event__synthesize_stat_config to synthesize a 'struct
perf_stat_config'.

Storing the stat config in the form of tag-value pairs will, I believe,
sort out future version extensibility issues.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-13-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add stat config user level event
Jiri Olsa [Sun, 25 Oct 2015 14:51:27 +0000 (15:51 +0100)]
perf tools: Add stat config user level event

Adding the stat config event to pass/store stat config data, so report
tools (report/script) know how to interpret stat data.

The config data is stored in a 'tag|value' way to allow for easy
extension and backwards compatibility.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-12-git-send-email-jolsa@kernel.org
[ stat_config_term_event -> stat_config_event_entry ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf cpu_map: Add perf_event__fprintf_cpu_map function
Jiri Olsa [Sun, 25 Oct 2015 14:51:26 +0000 (15:51 +0100)]
perf cpu_map: Add perf_event__fprintf_cpu_map function

To display a cpu_map event for raw dump.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-11-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf cpu_map: Add cpu_map__new_event function
Jiri Olsa [Sun, 25 Oct 2015 14:51:25 +0000 (15:51 +0100)]
perf cpu_map: Add cpu_map__new_event function

Introducing the cpu_map__new_event function to create a struct cpu_map
object from a cpu_map event.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-10-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf cpu_map: Add cpu_map event synthesize function
Jiri Olsa [Sun, 25 Oct 2015 14:51:24 +0000 (15:51 +0100)]
perf cpu_map: Add cpu_map event synthesize function

Introduce the perf_event__synthesize_cpu_map function to synthesize a
struct cpu_map.

Added generic interface:
  cpu_map_data__alloc
  cpu_map_data__synthesize

to make the cpu_map synthesizing usable for other events.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-9-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf cpu_map: Add cpu_map user level event
Jiri Olsa [Sun, 25 Oct 2015 14:51:23 +0000 (15:51 +0100)]
perf cpu_map: Add cpu_map user level event

Adding the cpu_map event to pass/store cpu maps as data in
a pipe/perf.data.

We store maps in 2 formats:
  - list of cpus
  - mask of cpus

The format that takes less space is selected transparently in the
following patch.

The interface is made generic, so we could add the cpumap event data
into another event in the following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-8-git-send-email-jolsa@kernel.org
[ cpu_map_data_cpus -> cpu_map_entries, cpu_map_data_mask -> cpu_map_mask ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf thread_map: Add perf_event__fprintf_thread_map function
Jiri Olsa [Sun, 25 Oct 2015 14:51:22 +0000 (15:51 +0100)]
perf thread_map: Add perf_event__fprintf_thread_map function

To display a thread_map event for a raw dump.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf thread_map: Add thread_map__new_event function
Jiri Olsa [Sun, 25 Oct 2015 14:51:21 +0000 (15:51 +0100)]
perf thread_map: Add thread_map__new_event function

Introducing the thread_map__new_event function to create a struct
thread_map object from a thread_map event.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf thread_map: Add thread_map event sythesize function
Jiri Olsa [Sun, 25 Oct 2015 14:51:20 +0000 (15:51 +0100)]
perf thread_map: Add thread_map event sythesize function

Introduce the perf_event__synthesize_thread_map2 function to synthesize
struct thread_map.

The perf_event__synthesize_thread_map name is already taken for
synthesizing the complete threads data (comm/mmap/fork).

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-5-git-send-email-jolsa@kernel.org
[ Rename thread_map_data_event to thread_map_event_entry ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf thread_map: Add thread_map user level event
Jiri Olsa [Sun, 25 Oct 2015 14:51:19 +0000 (15:51 +0100)]
perf thread_map: Add thread_map user level event

Adding the thread_map event to pass/store thread maps as data in
the pipe/perf.data.

Storing the thread ID along with the standard comm[16] thread name string.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-4-git-send-email-jolsa@kernel.org
[ Renamed thread_map_data_event to thread_map_event_entry ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agotools subcmd: Rename subcmd header include guards
Josh Poimboeuf [Tue, 15 Dec 2015 15:39:40 +0000 (09:39 -0600)]
tools subcmd: Rename subcmd header include guards

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/d8081e7528b25ad91f4154b6a3fd063e93c108ec.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf subcmd: Create subcmd library
Josh Poimboeuf [Tue, 15 Dec 2015 15:39:39 +0000 (09:39 -0600)]
perf subcmd: Create subcmd library

Move the subcommand-related files from perf to a new library named
libsubcmd.a.

Since we're moving files anyway, go ahead and rename 'exec_cmd.*' to
'exec-cmd.*' to be consistent with the naming of all the other files.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/c0a838d4c878ab17fee50998811612b2281355c1.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Finalize subcmd independence
Josh Poimboeuf [Tue, 15 Dec 2015 15:39:38 +0000 (09:39 -0600)]
perf tools: Finalize subcmd independence

For the files that will be moved to the subcmd library, remove all their
perf-specific includes and duplicate any needed functionality.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/6e12946f0f26ce4d543d34db68d9dae3c8551cb9.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Remove 'perf' from subcmd function and variable names
Josh Poimboeuf [Tue, 15 Dec 2015 15:39:37 +0000 (09:39 -0600)]
perf tools: Remove 'perf' from subcmd function and variable names

In preparation for moving exec_cmd.c and run-command.c out of perf and
into a library, remove 'perf' from all the symbol names.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/bc3ee82b40b8f396b644fa49e0f7260ce442635b.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Remove subcmd dependencies on strbuf
Josh Poimboeuf [Tue, 15 Dec 2015 15:39:36 +0000 (09:39 -0600)]
perf tools: Remove subcmd dependencies on strbuf

Introduce and use new astrcat() and astrcatf() functions which replace
the strbuf functionality for subcmd.

For now they duplicate strbuf's die-on-allocation-error policy.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/957d207e1254406fa11fc2e405e75a7e405aad8f.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Provide subcmd configuration at runtime
Josh Poimboeuf [Tue, 15 Dec 2015 15:39:35 +0000 (09:39 -0600)]
perf tools: Provide subcmd configuration at runtime

Create init functions for exec_cmd.c and pager.c.  This allows their
configuration to be specified at runtime so they can be split out into a
separate library which can be used by other programs.  Their
configuration is stored in a shared subcmd_config struct.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/21f5f6b38da72c985a8dcfa185700d03e7eecd1d.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Document the fact that parse_options*() may exit
Josh Poimboeuf [Tue, 15 Dec 2015 15:39:34 +0000 (09:39 -0600)]
perf tools: Document the fact that parse_options*() may exit

Generally, calling exit() from a library is bad practice.  Eventually
these functions might be redesigned so that they don't exit.  For now,
just document the fact that they do.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/97b1af06cc3b18dd0f49e655d6d659eaa64ecde5.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Move strlcpy() from perf to tools/lib/string.c
Josh Poimboeuf [Tue, 15 Dec 2015 15:39:33 +0000 (09:39 -0600)]
perf tools: Move strlcpy() from perf to tools/lib/string.c

strlcpy() will be needed by the subcmd library.  Move it to the shared
tools/lib/string.c file which can be used by other tools.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/71e2804b973bf39ad3d3b9be10f99f2ea630be46.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agotools build: Fix feature Makefile issues with 'O='
Josh Poimboeuf [Tue, 15 Dec 2015 15:39:32 +0000 (09:39 -0600)]
tools build: Fix feature Makefile issues with 'O='

When building perf binaries outside the source tree with 'make O=<dir>',
the auto-detected features get re-tested for every build, which is
unnecessary and inconsistent with the behavior seen when building
directly in the source tree.

Another issue is that 'make O=<dir> clean' doesn't remove the feature
files from the object tree.

Fix these problems by looking for the binaries in the $(OUTPUT)
directory.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/113bd01530e9761778c60a75a96c65fc59860f68.1450193761.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf record: Add record.build-id config option
Namhyung Kim [Tue, 15 Dec 2015 01:49:56 +0000 (10:49 +0900)]
perf record: Add record.build-id config option

Post processing at 'perf record' takes a long time on big machines.

What it does is to find the build-id of binaries found in the event
stream, so that it can make sure, at 'report' time, that the symtabs (be
it ELF, kallsyms, etc) being used to resolve symbols are the ones
matching the binaries found at 'record' time.

Sometimes we just want to skip this processing of events at the end of
the session to get quicker results, making sure the binaries haven't
changed from 'record' to 'report' time.

Add a new config option to control this behavior.

The record.build-id config variable can have one of the following
values:

 - cache: post-process data and save/update the binaries into the
          build-id cache (in ~/.debug).  This is the default.
 - no-cache: post-process the data but not update the build-id cache.
             Same effect as using the -N option.
 - skip: skip post-processing and do not update the cache.
         Same effect as using the -B option.

Reported-and-Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1450144196-22957-1-git-send-email-namhyung@kernel.org
[ Added some more text to the documentation ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf record: Support custom vmlinux path
He Kuang [Mon, 14 Dec 2015 10:39:23 +0000 (10:39 +0000)]
perf record: Support custom vmlinux path

Make perf-record command support --vmlinux option if BPF_PROLOGUE is on.

'perf record' needs vmlinux as the source of DWARF info to generate
prologue for BPF programs, so path of vmlinux should be specified.

Short name 'k' has been taken by 'clockid'. This patch skips the short
option name and uses '--vmlinux' for vmlinux path.

Documentation is also updated.

Test result:

In a production (or broken) environment:
 (by:
  # rm -rf ~/.debug/
  # mv /lib/modules/`uname -r`/build/vmlinux /tmp/
 )

 # ./perf record -e ./test_bpf_base.c ls
 Failed to find the path for kernel: No such file or directory
 event syntax error: './test_bpf_base.c'
                      \___ You need to check probing points in BPF file
 ...

 # ./perf record --vmlinux /tmp/vmlinux -e ./test_bpf_base.c ls
 ...
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.011 MB perf.data ]

Help messages when build with NO_LIBBPF:

 # ./perf record -h
        --transaction     sample transaction flags (special events only)
        --vmlinux <file>  vmlinux pathname
                          (not built-in because NO_LIBBPF=1)
 # ./perf record --vmlinux /tmp/vmlinux ls /
  Warning: option `vmlinux' is being ignored because NO_LIBBPF=1
 ...
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.011 MB perf.data (11 samples) ]

Help messages when build with NO_DWARF:

 # ./perf record -h
        --transaction     sample transaction flags (special events only)
        --vmlinux <file>  vmlinux pathname
                          (not built-in because NO_DWARF=1)

Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1450089563-122430-15-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Make options always available, even if required libs not linked
Wang Nan [Mon, 14 Dec 2015 10:39:22 +0000 (10:39 +0000)]
perf tools: Make options always available, even if required libs not linked

This patch keeps options of perf builtins same in all conditions. If
one option is disabled because of compiling options, users should be
notified.

Masami suggested another implementation in [1] that, by adding a
OPTION_NEXT_DEPENDS option before those options in the 'struct option'
array, options parser knows an option is disabled. However, in some
cases this array is reordered (options__order()). In addition, in
parse-option.c that array is const, so we can't simply merge
information in decorator option into the affacted option.

This patch chooses a simpler implementation that, introducing a
set_option_nobuild() function and two option parsing flags. Builtins
with such options should call set_option_nobuild() before option
parsing. The complexity of this patch is because we want some of options
can be skipped safely. In this case their arguments should also be
consumed.

Options in 'perf record' and 'perf probe' are fixed in this patch.

[1] http://lkml.kernel.org/g/50399556C9727B4D88A595C8584AAB3752627CD4@GSjpTKYDCembx32.service.hitachi.net

Test result:

Normal case:

  # ./perf probe --vmlinux /tmp/vmlinux sys_write
  Added new event:
    probe:sys_write      (on sys_write)

  You can now use it in all perf tools, such as:

perf record -e probe:sys_write -aR sleep 1

Build with NO_DWARF=1:

  # ./perf probe -L sys_write
    Error: switch `L' is not available because NO_DWARF=1

   Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
      or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
      or: perf probe [<options>] --del '[GROUP:]EVENT' ...
      or: perf probe --list [GROUP:]EVENT ...
      or: perf probe [<options>] --funcs

    -L, --line <FUNC[:RLN[+NUM|-RLN2]]|SRC:ALN[+NUM|-ALN2]>
                          Show source code lines.
                          (not built-in because NO_DWARF=1)

  # ./perf probe -k /tmp/vmlinux sys_write
    Warning: switch `k' is being ignored because NO_DWARF=1
  Added new event:
    probe:sys_write      (on sys_write)

  You can now use it in all perf tools, such as:

perf record -e probe:sys_write -aR sleep 1

  # ./perf probe --vmlinux /tmp/vmlinux sys_write
    Warning: option `vmlinux' is being ignored because NO_DWARF=1
  Added new event:
  [SNIP]

  # ./perf probe -l
   Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
      or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
...
    -k, --vmlinux <file>  vmlinux pathname
                          (not built-in because NO_DWARF=1)
    -L, --line <FUNC[:RLN[+NUM|-RLN2]]|SRC:ALN[+NUM|-ALN2]>
                          Show source code lines.
                          (not built-in because NO_DWARF=1)
...
    -V, --vars <FUNC[@SRC][+OFF|%return|:RL|;PT]|SRC:AL|SRC;PT>
                          Show accessible variables on PROBEDEF
                          (not built-in because NO_DWARF=1)
        --externs         Show external variables too (with --vars only)
                          (not built-in because NO_DWARF=1)
        --no-inlines      Don't search inlined functions
                          (not built-in because NO_DWARF=1)
        --range           Show variables location range in scope (with --vars only)
                          (not built-in because NO_DWARF=1)

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1450089563-122430-14-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Convert parse-options.c internal functions to static
Josh Poimboeuf [Mon, 14 Dec 2015 04:18:12 +0000 (22:18 -0600)]
perf tools: Convert parse-options.c internal functions to static

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/c027b5f47ec1055077f5650edb1c7ad37c191e6c.1449965119.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Move help_unknown_cmd() to its own file
Josh Poimboeuf [Mon, 14 Dec 2015 04:18:09 +0000 (22:18 -0600)]
perf tools: Move help_unknown_cmd() to its own file

help_unknown_cmd() is quite perf-specific because it relies on some
perf_config*() functions.  Move it and its supporting functions out into
a separate file so that help.c can be moved to a library.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/562d918bcaaf340c1ae3e47586b3f0ae33b9918b.1449965119.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Remove check for unused PERF_PAGER_IN_USE
Josh Poimboeuf [Mon, 14 Dec 2015 04:18:08 +0000 (22:18 -0600)]
perf tools: Remove check for unused PERF_PAGER_IN_USE

PERF_PAGER_IN_USE doesn't seem to be used anywhere, so let's remove it.

This will also make it easier to move pager.c into a separate library.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/ed9e8370db9811746dc590544cf48c36dcfb1731.1449965119.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Create pager.h
Josh Poimboeuf [Mon, 14 Dec 2015 04:18:07 +0000 (22:18 -0600)]
perf tools: Create pager.h

Move the 'pager' function prototypes into a new pager.h so that the
pager code can be moved out to a library.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/ba7c316474dd6bfc047e5c6dc4dcab39a982caf5.1449965119.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf build: Rename LIB_PATH -> API_PATH
Josh Poimboeuf [Mon, 14 Dec 2015 04:18:06 +0000 (22:18 -0600)]
perf build: Rename LIB_PATH -> API_PATH

'LIB_PATH' is a misnomer because there are multiple library paths.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/c10df0b749a27f05cc531fe06b8dd71a329341fa.1449965119.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf build: Fix 'make clean'
Josh Poimboeuf [Mon, 14 Dec 2015 04:18:05 +0000 (22:18 -0600)]
perf build: Fix 'make clean'

Add some missing files to the 'make clean' target.

Reported-and-Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/8b1f5a5bd66a652be071d423e64aaa994254be31.1449965119.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test: Remove tarpkg at end of test
Josh Poimboeuf [Mon, 14 Dec 2015 04:18:04 +0000 (22:18 -0600)]
perf test: Remove tarpkg at end of test

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5e7e97a23e3ce11b59d1009b39ebb6d2813a0560.1449965119.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test: Add Build file to dependencies for llvm-src-*.c
Josh Poimboeuf [Mon, 14 Dec 2015 04:18:03 +0000 (22:18 -0600)]
perf test: Add Build file to dependencies for llvm-src-*.c

Because the Build file writes source code to the generated llvm-src-*.c
files, it should be listed as one of the dependencies, so that any
future changes to the code being echoed won't require a 'make clean'.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/9b9886c295750dc83cbbb29a665d280f9c5e8b3e.1449965119.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf build: Remove unnecessary line in Makefile.feature
Josh Poimboeuf [Mon, 14 Dec 2015 04:18:01 +0000 (22:18 -0600)]
perf build: Remove unnecessary line in Makefile.feature

This line always silently fails because it doesn't add the 'test-'
prefix to the .bin file.

And it seems to be unnecessary anyway: the line immediately after it
does all the individual feature checks.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/554a05c18af564ba015c9e68f25730126e0f4acb.1449965119.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test: Fix hist testcases when kptr_restrict is on
Namhyung Kim [Mon, 14 Dec 2015 03:11:13 +0000 (12:11 +0900)]
perf test: Fix hist testcases when kptr_restrict is on

Currently if kptr_restrict is enabled, all hist tests failed with
segfaults.  This is because machine__create_kernel_maps() in
setup_fake_machine() failed in that situation, and it called
machine__delete() on the error path.  But outer callers again called
machines__exit() causing double free for the host machine.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1450062673-22312-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf evsel: Disable branch flags/cycles for --callgraph lbr
Andi Kleen [Sat, 12 Dec 2015 00:12:24 +0000 (16:12 -0800)]
perf evsel: Disable branch flags/cycles for --callgraph lbr

[The kernel patch needed for this is in tip now (b16a5b52eb9 perf/x86:
Add option to disable ...) So this user tools patch to make use of it
should be merged now]

Automatically disable collecting branch flags and cycles with
--call-graph lbr. This allows avoiding a bunch of extra MSR
reads in the PMI on Skylake.

When the kernel doesn't support the new flags they are automatically
cleared in the fallback code.

v2: Switch to use branch_sample_type instead of sample_type.
Adjust description.
Fix the fallback logic.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1449879144-29074-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf thread: Fix reference count initial state
Arnaldo Carvalho de Melo [Fri, 11 Dec 2015 22:11:23 +0000 (19:11 -0300)]
perf thread: Fix reference count initial state

We should always return from thread__new(), the constructor, with the
object with a reference count of one, so that:

     struct thread *thread = thread__new();
     thread__put(thread);

Will call thread__delete().

If any reference is made to that 'thread' variable, it better use
thread__get(thread) to hold a reference.

We were returning with thread->refcnt set to zero, fix it and some cases
where thread__delete() was being called, which were not a problem
because just one reference was being used, now that we set it to 1, use
thread__put() instead.

Reported-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-4b9mkuk66to4ecckpmpvqx6s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test: Dump the stack when test segfaults when in verbose mode
Arnaldo Carvalho de Melo [Fri, 11 Dec 2015 22:06:53 +0000 (19:06 -0300)]
perf test: Dump the stack when test segfaults when in verbose mode

E.g.:

  # perf test 26
  26: Test mmap thread lookup                                  : FAILED!
  # perf test -v 26
  26: Test mmap thread lookup                                  :
  --- start ---
  test child forked, pid 9269
  tid = 9269, map = 0x7ff99ff0c000
  tid = 9270, map = 0x7ff99ff0b000
  tid = 9271, map = 0x7ff99ff0a000
  tid = 9272, map = 0x7ff99ff09000
  perf: Segmentation fault
  Obtained 13 stack frames.
  perf(sighandler_dump_stack+0x41) [0x4e3541]
  /lib64/libc.so.6(+0x34960) [0x7ff99d5f6960]
  perf(thread__put+0x5b) [0x4c6f6b]
  perf(machine__process_event+0x14e) [0x4bd37e]
  perf(perf_event__synthesize_threads+0x3aa) [0x48678a]
  perf(test__mmap_thread_lookup+0x20a) [0x474e0a]
  perf() [0x460d56]
  perf(cmd_test+0x589) [0x461319]
  perf() [0x47c641]
  perf(main+0x617) [0x422317]
  /lib64/libc.so.6(__libc_start_main+0xf0) [0x7ff99d5e1fe0]
  perf() [0x422429]
  [(nil)]
  test child interrupted
  ---- end ----
  Test mmap thread lookup: FAILED!
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-sypazzsl4ptctrmlyi2zcmaj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Use same signal handling strategy as 'record'
Arnaldo Carvalho de Melo [Fri, 11 Dec 2015 19:43:57 +0000 (16:43 -0300)]
perf tools: Use same signal handling strategy as 'record'

I.e. don't exit with the signal number, instead set the signal handler
to the default one and then raise it again.

Noticed while trying to dump the stack at segfaults in the 'perf test'
forked process used to run each test, that inspects signal info at
each test.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-5x5r176wnoqxi5p6id05wv9w@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoMerge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Mon, 14 Dec 2015 08:31:39 +0000 (09:31 +0100)]
Merge tag 'perf-core-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

  - Fix 'perf top' annotation in --stdio (Namhyung Kim)

  - Support hw breakpoint events (mem:0xAddress) in the default output mode in
    'perf script' (Wang Nan)

Infrastructure changes:

  - Do not hold the hists lock while emitting one specific warning (Namhyung Kim)

  - Fetch map names from correct strtab, worked so far because llvm/clang
    uses just one string table (Wang Nan)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoMerge tag 'v4.4-rc5' into perf/core, to pick up fixes
Ingo Molnar [Mon, 14 Dec 2015 08:31:23 +0000 (09:31 +0100)]
Merge tag 'v4.4-rc5' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoLinux 4.4-rc5 v4.4-rc5
Linus Torvalds [Mon, 14 Dec 2015 01:42:58 +0000 (17:42 -0800)]
Linux 4.4-rc5

8 years agosched/wait: Fix the signal handling fix
Peter Zijlstra [Sun, 13 Dec 2015 21:11:16 +0000 (22:11 +0100)]
sched/wait: Fix the signal handling fix

Jan Stancek reported that I wrecked things for him by fixing things for
Vladimir :/

His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
should not be possible, however my previous patch made this possible by
unconditionally checking signal_pending().

We cannot use current->state as was done previously, because the
instruction after the store to that variable it can be changed.  We must
instead pass the initial state along and use that.

Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers")
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Chris Mason <clm@fb.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: tglx@linutronix.de
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Sun, 13 Dec 2015 20:46:04 +0000 (12:46 -0800)]
Merge tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfix from Trond Myklebust:
 "SUNRPC: Fix a NFSv4.1 callback channel regression"

* tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Fix callback channel

8 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 13 Dec 2015 20:41:10 +0000 (12:41 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fixlets from Thomas Gleixner:
 "Two trivial fixes which add missing header fileas and forward
  declarations so the code will compile even when the magic include
  chains are different"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Add missing include for barrier.h
  irqchip/gic-v3: Add missing struct device_node declaration

8 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 13 Dec 2015 20:36:23 +0000 (12:36 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix to unbreak a clocksource driver which has more than 32bit
  counter width"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Mmio: remove artificial 32bit limitation

8 years agoMerge tag 'char-misc-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 13 Dec 2015 20:29:22 +0000 (12:29 -0800)]
Merge tag 'char-misc-4.4-rc5' of git://git./linux/kernel/git/gregkh/char-misc

Pull fpga driver fixes from Greg KH:
 "Only two small fpga driver fixes here, both have been in linux-next
  for a while, and resolve some reported issues"

* tag 'char-misc-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  fpga manager: Fix firmware resource leak on error
  fpga manager: remove label

8 years agoMerge tag 'staging-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 13 Dec 2015 20:24:39 +0000 (12:24 -0800)]
Merge tag 'staging-4.4-rc5' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are a few staging and IIO driver fixes for 4.4-rc5.

  All of them resolve reported problems and have been in linux-next for
  a while.  Nothing major here, just small fixes where needed"

* tag 'staging-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: lustre: echo_copy.._lsm() dereferences userland pointers directly
  iio: adc: spmi-vadc: add missing of_node_put
  iio: fix some warning messages
  iio: light: apds9960: correct ->last_busy count
  iio: lidar: return -EINVAL on invalid signal
  staging: iio: dummy: complete IIO events delivery to userspace

8 years agoMerge tag 'usb-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 13 Dec 2015 19:58:18 +0000 (11:58 -0800)]
Merge tag 'usb-4.4-rc5' of git://git./linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
 "Here are a number of small USB fixes for 4.4-rc5.  All of them have
  been in linux-next.  The majority are gadget and phy issues, with a
  few new quirks and device ids added as well"

* tag 'usb-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (32 commits)
  USB: add quirk for devices with broken LPM
  xhci: fix usb2 resume timing and races.
  usb: musb: fail with error when no DMA controller set
  usb: gadget: uvc: fix permissions of configfs attributes
  usb: musb: core: Fix pm runtime for deferred probe
  usb: phy: msm: fix a possible NULL dereference
  USB: host: ohci-at91: fix a crash in ohci_hcd_at91_overcurrent_irq
  usb: Quiet down false peer failure messages
  usb: xhci: fix config fail of FS hub behind a HS hub with MTT
  xhci: Fix memory leak in xhci_pme_acpi_rtd3_enable()
  usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message
  USB: whci-hcd: add check for dma mapping error
  usb: core : hub: Fix BOS 'NULL pointer' kernel panic
  USB: quirks: Apply ALWAYS_POLL to all ELAN devices
  usb-storage: Fix scsi-sd failure "Invalid field in cdb" for USB adapter JMicron
  USB: quirks: Fix another ELAN touchscreen
  usb: dwc3: gadget: don't prestart interrupt endpoints
  USB: serial: Another Infineon flash loader USB ID
  USB: cdc_acm: Ignore Infineon Flash Loader utility
  USB: cp210x: Remove CP2110 ID from compatibility list
  ...

8 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Sun, 13 Dec 2015 00:43:44 +0000 (16:43 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Here are a bunch of small bug fixes for various ARM platforms, nothing
  really sticks out this week, most of either fixes bugs in code that
  was just added in 4.4, or that has been broken for many years without
  anyone noticing.

  at91/sama5d2:
   - fix sama5de hardware setup of sd/mmc interface
   - proper selection of pinctrl drivers.  PIO4 is necessary for sama5d2

  berlin:
   - fix incorrect clock input for SDIO

  exynos:
   - Fix potential NULL pointer dereference in Exynos PMU driver.

  imx:
   - Fix vf610 SAI clock configuration bug which is discovered by the
     newly added master mode support in SAI audio driver.
   - Fix buggy L2 cache latency values in vf610 device trees, which may
     cause system hang when cpu runs at a higher frequency.

  ixp4xx:
   - fix prototypes for readl/writel functions

  ls2080a:
   - use little-endian register access for GPIO and SDHCI

  omap:
   - Fix clock source for ARM TWD and global timers on am437x
   - Always select REGULATOR_FIXED_VOLTAGE for omap2+ instead of when
     MACH_OMAP3_PANDORA is selected
   - Fix SPI DMA handles for dm816x as only some were mapped
   - Fix up mbox cells for dm816x to make mailbox usable

  pxa:
   - use PWM lookup table for all ezx machines

  s3c24xx:
   - Remove incorrect __init annotation from s3c24xx cpufreq driver
     structures.

  versatile:
   - fix PCI IRQ mapping on Versatile PB"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ls2080a/dts: Add little endian property for GPIO IP block
  dt-bindings: define little-endian property for QorIQ GPIO
  ARM64: dts: ls2080a: fix eSDHC endianness
  ARM: dts: vf610: use reset values for L2 cache latencies
  ARM: pxa: use PWM lookup table for all machines
  ARM: dts: berlin: add 2nd clock for BG2Q sdhci0 and sdhci1
  ARM: dts: berlin: correct BG2Q's sdhci2 2nd clock
  ARM: dts: am4372: fix clock source for arm twd and global timers
  ARM: at91: fix pinctrl driver selection
  ARM: at91/dt: add always-on to 1.8V regulator
  ARM: dts: vf610: fix clock definition for SAI2
  ARM: imx: clk-vf610: fix SAI clock tree
  ARM: ixp4xx: fix read{b,w,l} return types
  irqchip/versatile-fpga: Fix PCI IRQ mapping on Versatile PB
  ARM: OMAP2+: enable REGULATOR_FIXED_VOLTAGE
  ARM: dts: add dm816x missing spi DT dma handles
  ARM: dts: add dm816x missing #mbox-cells
  cpufreq: s3c24xx: Do not mark s3c2410_plls_add as __init
  ARM: EXYNOS: Fix potential NULL pointer access in exynos_sys_powerdown_conf

8 years agoMerge tag 'powerpc-4.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sat, 12 Dec 2015 21:39:59 +0000 (13:39 -0800)]
Merge tag 'powerpc-4.4-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 - opal-irqchip: Fix double endian conversion from Alistair Popple
 - cxl: Set endianess of kernel contexts from Frederic Barrat
 - sbc8641: drop bogus PHY IRQ entries from DTS file from Paul Gortmaker
 - Revert "powerpc/eeh: Don't unfreeze PHB PE after reset" from Andrew
   Donnellan

* tag 'powerpc-4.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  Revert "powerpc/eeh: Don't unfreeze PHB PE after reset"
  powerpc/sbc8641: drop bogus PHY IRQ entries from DTS file
  cxl: Set endianess of kernel contexts
  powerpc/opal-irqchip: Fix double endian conversion

8 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 12 Dec 2015 18:44:49 +0000 (10:44 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "17 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  MIPS: fix DMA contiguous allocation
  sh64: fix __NR_fgetxattr
  ocfs2: fix SGID not inherited issue
  mm/oom_kill.c: avoid attempting to kill init sharing same memory
  drivers/base/memory.c: prohibit offlining of memory blocks with missing sections
  tmpfs: fix shmem_evict_inode() warnings on i_blocks
  mm/hugetlb.c: fix resv map memory leak for placeholder entries
  mm: hugetlb: call huge_pte_alloc() only if ptep is null
  kernel: remove stop_machine() Kconfig dependency
  mm: kmemleak: mark kmemleak_init prototype as __init
  mm: fix kerneldoc on mem_cgroup_replace_page
  osd fs: __r4w_get_page rely on PageUptodate for uptodate
  MAINTAINERS: make Vladimir co-maintainer of the memory controller
  mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
  mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo
  memcg: fix memory.high target
  mm: hugetlb: fix hugepage memory leak caused by wrong reserve count

8 years agoMerge branch 'parisc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Sat, 12 Dec 2015 18:34:20 +0000 (10:34 -0800)]
Merge branch 'parisc-4.4-3' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "Fix the boot crash on Mako machines with Huge Pages, prevent a panic
  with SATA controllers (and others) by correctly calculating the IOMMU
  space, hook up the mlock2 syscall and drop unneeded code in the parisc
  pci code"

* 'parisc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Disable huge pages on Mako machines
  parisc: Wire up mlock2 syscall
  parisc: Remove unused pcibios_init_bus()
  parisc iommu: fix panic due to trying to allocate too large region

8 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 12 Dec 2015 18:24:00 +0000 (10:24 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block layer fixes from Jens Axboe:
 "A set of fixes for the current series.  This contains:

   - A bunch of fixes for lightnvm, should be the last round for this
     series.  From Matias and Wenwei.

   - A writeback detach inode fix from Ilya, also marked for stable.

   - A block (though it says SCSI) fix for an OOPS in SCSI runtime power
     management.

   - Module init error path fixes for null_blk from Minfei"

* 'for-linus' of git://git.kernel.dk/linux-block:
  null_blk: Fix error path in module initialization
  lightnvm: do not compile in debugging by default
  lightnvm: prevent gennvm module unload on use
  lightnvm: fix media mgr registration
  lightnvm: replace req queue with nvmdev for lld
  lightnvm: comments on constants
  lightnvm: check mm before use
  lightnvm: refactor spin_unlock in gennvm_get_blk
  lightnvm: put blks when luns configure failed
  lightnvm: use flags in rrpc_get_blk
  block: detach bdev inode from its wb in __blkdev_put()
  SCSI: Fix NULL pointer dereference in runtime PM

8 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Sat, 12 Dec 2015 18:16:26 +0000 (10:16 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Update the linker script to use L1_CACHE_BYTES instead of hard-coded
   64.  We recently changed L1_CACHE_BYTES to 128

 - Improve race condition reporting on set_pte_at() and change the BUG
   to WARN_ONCE.  With hardware update of the accessed/dirty state, we
   need to ensure that set_pte_at() does not inadvertently override
   hardware updated state.  The patch also makes the checks ignore
   !pte_valid() new entries

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Improve error reporting on set_pte_at() checks
  arm64: update linker script to increased L1_CACHE_BYTES value

8 years agoMIPS: fix DMA contiguous allocation
Qais Yousef [Fri, 11 Dec 2015 21:41:09 +0000 (13:41 -0800)]
MIPS: fix DMA contiguous allocation

Recent changes to how GFP_ATOMIC is defined seems to have broken the
condition to use mips_alloc_from_contiguous() in
mips_dma_alloc_coherent().

I couldn't bottom out the exact change but I think it's this commit
d0164adc89f6 ("mm, page_alloc: distinguish between being unable to
sleep, unwilling to sleep and avoiding waking kswapd").

GFP_ATOMIC has multiple bits set and the check for !(gfp & GFP_ATOMIC)
isn't enough.

The reason behind this condition is to check whether we can potentially
do a sleeping memory allocation.  Use gfpflags_allow_blocking() instead
which should be more robust.

Signed-off-by: Qais Yousef <qais.yousef@imgtec.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agosh64: fix __NR_fgetxattr
Dmitry V. Levin [Fri, 11 Dec 2015 21:41:06 +0000 (13:41 -0800)]
sh64: fix __NR_fgetxattr

According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
has to be defined to 259, but it doesn't.  Instead, it's defined to 269,
which is of course used by another syscall, __NR_sched_setaffinity in this
case.

This bug was found by strace test suite.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoocfs2: fix SGID not inherited issue
Junxiao Bi [Fri, 11 Dec 2015 21:41:03 +0000 (13:41 -0800)]
ocfs2: fix SGID not inherited issue

Commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an
issue, SGID of sub dir was not inherited from its parents dir.  It is
because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but
is overwritten by "mode" which don't have SGID set later.

Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm/oom_kill.c: avoid attempting to kill init sharing same memory
Chen Jie [Fri, 11 Dec 2015 21:41:00 +0000 (13:41 -0800)]
mm/oom_kill.c: avoid attempting to kill init sharing same memory

It's possible that an oom killed victim shares an ->mm with the init
process and thus oom_kill_process() would end up trying to kill init as
well.

This has been shown in practice:

Out of memory: Kill process 9134 (init) score 3 or sacrifice child
Killed process 9134 (init) total-vm:1868kB, anon-rss:84kB, file-rss:572kB
Kill process 1 (init) sharing same memory
...
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

And this will result in a kernel panic.

If a process is forked by init and selected for oom kill while still
sharing init_mm, then it's likely this system is in a recoverable state.
However, it's better not to try to kill init and allow the machine to
panic due to unkillable processes.

[rientjes@google.com: rewrote changelog]
[akpm@linux-foundation.org: fix inverted test, per Ben]
Signed-off-by: Chen Jie <chenjie6@huawei.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agodrivers/base/memory.c: prohibit offlining of memory blocks with missing sections
Seth Jennings [Fri, 11 Dec 2015 21:40:57 +0000 (13:40 -0800)]
drivers/base/memory.c: prohibit offlining of memory blocks with missing sections

Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory
x86-64 systems") and 982792c782ef ("x86, mm: probe memory block size for
generic x86 64bit") introduced large block sizes for x86.  This made it
possible to have multiple sections per memory block where previously,
there was a only every one section per block.

Since blocks consist of contiguous ranges of section, there can be holes
in the blocks where sections are not present.  If one attempts to
offline such a block, a crash occurs since the code is not designed to
deal with this.

This patch is a quick fix to gaurd against the crash by not allowing
blocks with non-present sections to be offlined.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781

Signed-off-by: Seth Jennings <sjennings@variantweb.net>
Reported-by: Andrew Banman <abanman@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agotmpfs: fix shmem_evict_inode() warnings on i_blocks
Hugh Dickins [Fri, 11 Dec 2015 21:40:55 +0000 (13:40 -0800)]
tmpfs: fix shmem_evict_inode() warnings on i_blocks

Dmitry Vyukov provides a little program, autogenerated by syzkaller,
which races a fault on a mapping of a sparse memfd object, against
truncation of that object below the fault address: run repeatedly for a
few minutes, it reliably generates shmem_evict_inode()'s
WARN_ON(inode->i_blocks).

(But there's nothing specific to memfd here, nor to the fstat which it
happened to use to generate the fault: though that looked suspicious,
since a shmem_recalc_inode() had been added there recently.  The same
problem can be reproduced with open+unlink in place of memfd_create, and
with fstatfs in place of fstat.)

v3.7 commit 0f3c42f522dc ("tmpfs: change final i_blocks BUG to WARNING")
explains one cause of such a warning (a race with shmem_writepage to
swap), and possible solutions; but we never took it further, and this
syzkaller incident turns out to have a different cause.

shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
then found to be beyond eof, looks plausible - decrementing the alloced
count that was just before incremented - but in fact can go wrong, if a
racing thread (the truncator, for example) gets its shmem_recalc_inode()
in just after our delete_from_page_cache().  delete_from_page_cache()
decrements nrpages, that shmem_recalc_inode() will balance the books by
decrementing alloced itself, then our decrement of alloced take it one
too low: leading to the WARNING when the object is finally evicted.

Once the new page has been exposed in the page cache,
shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get
the accounting right in all cases (and not fall through from "trunc:" to
"decused:").  Adjust that error recovery block; and the reinitialization
of info and sbinfo can be removed too.

While we're here, fix shmem_writepage() to avoid the original issue: it
will be safe against a racing shmem_recalc_inode(), if it merely
increments swapped before the shmem_delete_from_page_cache() which
decrements nrpages (but it must then do its own shmem_recalc_inode()
before that, while still in balance, instead of after).  (Aside: why do
we shmem_recalc_inode() here in the swap path? Because its raison d'etre
is to cope with clean sparse shmem pages being reclaimed behind our
back: so here when swapping is a good place to look for that case.) But
I've not now managed to reproduce this bug, even without the patch.

I don't see why I didn't do that earlier: perhaps inhibited by the
preference to eliminate shmem_recalc_inode() altogether.  Driven by this
incident, I do now have a patch to do so at last; but still want to sit
on it for a bit, there's a couple of questions yet to be resolved.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm/hugetlb.c: fix resv map memory leak for placeholder entries
Mike Kravetz [Fri, 11 Dec 2015 21:40:52 +0000 (13:40 -0800)]
mm/hugetlb.c: fix resv map memory leak for placeholder entries

Dmitry Vyukov reported the following memory leak

unreferenced object 0xffff88002eaafd88 (size 32):
  comm "a.out", pid 5063, jiffies 4295774645 (age 15.810s)
  hex dump (first 32 bytes):
    28 e9 4e 63 00 88 ff ff 28 e9 4e 63 00 88 ff ff  (.Nc....(.Nc....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
     kmalloc include/linux/slab.h:458
     region_chg+0x2d4/0x6b0 mm/hugetlb.c:398
     __vma_reservation_common+0x2c3/0x390 mm/hugetlb.c:1791
     vma_needs_reservation mm/hugetlb.c:1813
     alloc_huge_page+0x19e/0xc70 mm/hugetlb.c:1845
     hugetlb_no_page mm/hugetlb.c:3543
     hugetlb_fault+0x7a1/0x1250 mm/hugetlb.c:3717
     follow_hugetlb_page+0x339/0xc70 mm/hugetlb.c:3880
     __get_user_pages+0x542/0xf30 mm/gup.c:497
     populate_vma_page_range+0xde/0x110 mm/gup.c:919
     __mm_populate+0x1c7/0x310 mm/gup.c:969
     do_mlock+0x291/0x360 mm/mlock.c:637
     SYSC_mlock2 mm/mlock.c:658
     SyS_mlock2+0x4b/0x70 mm/mlock.c:648

Dmitry identified a potential memory leak in the routine region_chg,
where a region descriptor is not free'ed on an error path.

However, the root cause for the above memory leak resides in region_del.
In this specific case, a "placeholder" entry is created in region_chg.
The associated page allocation fails, and the placeholder entry is left
in the reserve map.  This is "by design" as the entry should be deleted
when the map is released.  The bug is in the region_del routine which is
used to delete entries within a specific range (and when the map is
released).  region_del did not handle the case where a placeholder entry
exactly matched the start of the range range to be deleted.  In this
case, the entry would not be deleted and leaked.  The fix is to take
these special placeholder entries into account in region_del.

The region_chg error path leak is also fixed.

Fixes: feba16e25a57 ("mm/hugetlb: add region_del() to delete a specific range of entries")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: hugetlb: call huge_pte_alloc() only if ptep is null
Naoya Horiguchi [Fri, 11 Dec 2015 21:40:49 +0000 (13:40 -0800)]
mm: hugetlb: call huge_pte_alloc() only if ptep is null

Currently at the beginning of hugetlb_fault(), we call huge_pte_offset()
and check whether the obtained *ptep is a migration/hwpoison entry or
not.  And if not, then we get to call huge_pte_alloc().  This is racy
because the *ptep could turn into migration/hwpoison entry after the
huge_pte_offset() check.  This race results in BUG_ON in
huge_pte_alloc().

We don't have to call huge_pte_alloc() when the huge_pte_offset()
returns non-NULL, so let's fix this bug with moving the code into else
block.

Note that the *ptep could turn into a migration/hwpoison entry after
this block, but that's not a problem because we have another
!pte_present check later (we never go into hugetlb_no_page() in that
case.)

Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org> [2.6.36+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agokernel: remove stop_machine() Kconfig dependency
Chris Wilson [Fri, 11 Dec 2015 21:40:46 +0000 (13:40 -0800)]
kernel: remove stop_machine() Kconfig dependency

Currently the full stop_machine() routine is only enabled on SMP if
module unloading is enabled, or if the CPUs are hotpluggable.  This
leads to configurations where stop_machine() is broken as it will then
only run the callback on the local CPU with irqs disabled, and not stop
the other CPUs or run the callback on them.

For example, this breaks MTRR setup on x86 in certain configs since
ea8596bb2d8d379 ("kprobes/x86: Remove unused text_poke_smp() and
text_poke_smp_batch() functions") as the MTRR is only established on the
boot CPU.

This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
and HOTPLUG_CPU config options to compile the correct stop_machine() for
the architecture, removing the false dependency on MODULE_UNLOAD in the
process.

Link: https://lkml.org/lkml/2014/10/8/124
References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: kmemleak: mark kmemleak_init prototype as __init
Nicolas Iooss [Fri, 11 Dec 2015 21:40:43 +0000 (13:40 -0800)]
mm: kmemleak: mark kmemleak_init prototype as __init

The kmemleak_init() definition in mm/kmemleak.c is marked __init but its
prototype in include/linux/kmemleak.h is marked __ref since commit
a6186d89c913 ("kmemleak: Mark the early log buffer as __initdata").

This causes a section mismatch which is reported as a warning when
building with clang -Wsection, because kmemleak_init() is declared in
section .ref.text but defined in .init.text.

Fix this by marking kmemleak_init() prototype __init.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: fix kerneldoc on mem_cgroup_replace_page
Hugh Dickins [Fri, 11 Dec 2015 21:40:40 +0000 (13:40 -0800)]
mm: fix kerneldoc on mem_cgroup_replace_page

Whoops, I missed removing the kerneldoc comment of the lrucare arg
removed from mem_cgroup_replace_page; but it's a good comment, keep it.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoosd fs: __r4w_get_page rely on PageUptodate for uptodate
Hugh Dickins [Fri, 11 Dec 2015 21:40:38 +0000 (13:40 -0800)]
osd fs: __r4w_get_page rely on PageUptodate for uptodate

Commit 42cb14b110a5 ("mm: migrate dirty page without
clear_page_dirty_for_io etc") simplified the migration of a PageDirty
pagecache page: one stat needs moving from zone to zone and that's about
all.

It's convenient and safest for it to shift the PageDirty bit from old
page to new, just before updating the zone stats: before copying data
and marking the new PageUptodate.  This is all done while both pages are
isolated and locked, just as before; and just as before, there's a
moment when the new page is visible in the radix_tree, but not yet
PageUptodate.  What's new is that it may now be briefly visible as
PageDirty before it is PageUptodate.

When I scoured the tree to see if this could cause a problem anywhere,
the only places I found were in two similar functions __r4w_get_page():
which look up a page with find_get_page() (not using page lock), then
claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.

I'm not sure whether that was right before, but now it might be wrong
(on rare occasions): only claim the page is uptodate if PageUptodate.
Or perhaps the page in question could never be migratable anyway?

Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMAINTAINERS: make Vladimir co-maintainer of the memory controller
Johannes Weiner [Fri, 11 Dec 2015 21:40:35 +0000 (13:40 -0800)]
MAINTAINERS: make Vladimir co-maintainer of the memory controller

Vladimir architected and authored much of the current state of the
memcg's slab memory accounting and tracking.  Make sure he gets CC'd on
bug reports ;-)

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
Michal Hocko [Fri, 11 Dec 2015 21:40:32 +0000 (13:40 -0800)]
mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress

Tetsuo Handa has reported that the system might basically livelock in
OOM condition without triggering the OOM killer.

The issue is caused by internal dependency of the direct reclaim on
vmstat counter updates (via zone_reclaimable) which are performed from
the workqueue context.  If all the current workers get assigned to an
allocation request, though, they will be looping inside the allocator
trying to reclaim memory but zone_reclaimable can see stalled numbers so
it will consider a zone reclaimable even though it has been scanned way
too much.  WQ concurrency logic will not consider this situation as a
congested workqueue because it relies that worker would have to sleep in
such a situation.  This also means that it doesn't try to spawn new
workers or invoke the rescuer thread if the one is assigned to the
queue.

In order to fix this issue we need to do two things.  First we have to
let wq concurrency code know that we are in trouble so we have to do a
short sleep.  In order to prevent from issues handled by 0e093d99763e
("writeback: do not sleep on the congestion queue if there are no
congested BDIs or if significant congestion is not being encountered in
the current zone") we limit the sleep only to worker threads which are
the ones of the interest anyway.

The second thing to do is to create a dedicated workqueue for vmstat and
mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
have a spare worker thread for it.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo
Vlastimil Babka [Fri, 11 Dec 2015 21:40:29 +0000 (13:40 -0800)]
mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo

Commit 016c13daa5c9 ("mm, page_alloc: use masks and shifts when
converting GFP flags to migrate types") has swapped MIGRATE_MOVABLE and
MIGRATE_RECLAIMABLE in the enum definition.  However, migratetype_names
wasn't updated to reflect that.

As a result, the file /proc/pagetypeinfo shows the counts for Movable as
Reclaimable and vice versa.

Additionally, commit 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks
for high-order atomic allocations on demand") introduced
MIGRATE_HIGHATOMIC, but did not add a letter to distinguish it into
show_migration_types(), so it doesn't appear in the listing of free
areas during page alloc failures or oom kills.

This patch fixes both problems.  The atomic reserves will show with a
letter 'H' in the free areas listings.

Fixes: 016c13daa5c9 ("mm, page_alloc: use masks and shifts when converting GFP flags to migrate types")
Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomemcg: fix memory.high target
Vladimir Davydov [Fri, 11 Dec 2015 21:40:24 +0000 (13:40 -0800)]
memcg: fix memory.high target

When the memory.high threshold is exceeded, try_charge() schedules a
task_work to reclaim the excess.  The reclaim target is set to the
number of pages requested by try_charge().

This is wrong, because try_charge() usually charges more pages than
requested (batch > nr_pages) in order to refill per cpu stocks.  As a
result, a process in a cgroup can easily exceed memory.high
significantly when doing a lot of charges w/o returning to userspace
(e.g.  reading a file in big chunks).

Fix this issue by assuring that when exceeding memory.high a process
reclaims as many pages as were actually charged (i.e.  batch).

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: hugetlb: fix hugepage memory leak caused by wrong reserve count
Naoya Horiguchi [Fri, 11 Dec 2015 21:40:24 +0000 (13:40 -0800)]
mm: hugetlb: fix hugepage memory leak caused by wrong reserve count

When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on
alloc_buddy_huge_page() to directly create a hugepage from the buddy
allocator.

In that case, however, if alloc_buddy_huge_page() succeeds we don't
decrement h->resv_huge_pages, which means that successful
hugetlb_fault() returns without releasing the reserve count.  As a
result, subsequent hugetlb_fault() might fail despite that there are
still free hugepages.

This patch simply adds decrementing code on that code path.

I reproduced this problem when testing v4.3 kernel in the following situation:
 - the test machine/VM is a NUMA system,
 - hugepage overcommiting is enabled,
 - most of hugepages are allocated and there's only one free hugepage
   which is on node 0 (for example),
 - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
   node 1, tries to allocate a hugepage,
 - the allocation should fail but the reserve count is still hold.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org> [3.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoparisc: Disable huge pages on Mako machines
Helge Deller [Sun, 6 Dec 2015 20:25:20 +0000 (21:25 +0100)]
parisc: Disable huge pages on Mako machines

Mako-based machines (PA8800 and PA8900 CPUs) don't allow aliasing on
non-equaivalent addresses.

Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoparisc: Wire up mlock2 syscall
Helge Deller [Sun, 6 Dec 2015 20:56:26 +0000 (21:56 +0100)]
parisc: Wire up mlock2 syscall

Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoparisc: Remove unused pcibios_init_bus()
Bjorn Helgaas [Tue, 1 Dec 2015 16:41:47 +0000 (10:41 -0600)]
parisc: Remove unused pcibios_init_bus()

There are no callers of pcibios_init_bus(), so remove it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoparisc iommu: fix panic due to trying to allocate too large region
Mikulas Patocka [Mon, 30 Nov 2015 19:47:46 +0000 (14:47 -0500)]
parisc iommu: fix panic due to trying to allocate too large region

When using the Promise TX2+ SATA controller on PA-RISC, the system often
crashes with kernel panic, for example just writing data with the dd
utility will make it crash.

Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources

CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
Backtrace:
 [<000000004021497c>] show_stack+0x14/0x20
 [<0000000040410bf0>] dump_stack+0x88/0x100
 [<000000004023978c>] panic+0x124/0x360
 [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
 [<0000000040453150>] sba_map_sg+0x260/0x5b8
 [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
 [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
 [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
 [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
 [<000000004049da34>] scsi_request_fn+0x6e4/0x970
 [<00000000403e95a8>] __blk_run_queue+0x40/0x60
 [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
 [<000000004049a534>] scsi_run_queue+0x2a4/0x360
 [<000000004049be68>] scsi_end_request+0x1a8/0x238
 [<000000004049de84>] scsi_io_completion+0xfc/0x688
 [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0

The cause of the crash is not exhaustion of the IOMMU space, there is
plenty of free pages. The function sba_alloc_range is called with size
0x11000, thus the pages_needed variable is 0x11. The function
sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
0x10 (because dma_get_seg_boundary(dev) returns 0xffff).

The function sba_search_bitmap attempts to allocate 17 pages that must not
cross 16-page boundary - it can't satisfy this requirement
(iommu_is_span_boundary always returns true) and fails even if there are
many free entries in the IOMMU space.

How did it happen that we try to allocate 17 pages that don't cross
16-page boundary? The cause is in the function iommu_coalesce_chunks. This
function tries to coalesce adjacent entries in the scatterlist. The
function does several checks if it may coalesce one entry with the next,
one of those checks is this:

if (startsg->length + dma_len > max_seg_size)
break;

When it finishes coalescing adjacent entries, it allocates the mapping:

sg_dma_len(contig_sg) = dma_len;
dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
sg_dma_address(contig_sg) =
PIDE_FLAG
| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
| dma_offset;

It is possible that (startsg->length + dma_len > max_seg_size) is false
(we are just near the 0x10000 max_seg_size boundary), so the funcion
decides to coalesce this entry with the next entry. When the coalescing
succeeds, the function performs
dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
to allocate 17 pages for a device that must not cross 16-page boundary.

To fix the bug, we must make sure that dma_len after addition of
dma_offset and alignment doesn't cross the segment boundary. I.e. change
if (startsg->length + dma_len > max_seg_size)
break;
to
if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
break;

This patch makes this change (it precalculates max_seg_boundary at the
beginning of the function iommu_coalesce_chunks). I also added a check
that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
not needed for Promise TX2+ SATA, but it may be needed for other devices
that have dma_get_seg_boundary lower than dma_get_max_seg_size).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoMerge tag 'imx-fixes-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo...
Kevin Hilman [Sat, 12 Dec 2015 00:14:34 +0000 (16:14 -0800)]
Merge tag 'imx-fixes-4.4-2' of git://git./linux/kernel/git/shawnguo/linux into fixes

Merge "ARM: imx: fixes for 4.4, 2nd round" from Shawn Guo:

The i.MX fixes for 4.4, 2nd round:
- Fix vf610 SAI clock configuration bug which is discovered by the newly
  added master mode support in SAI audio driver.
- Fix buggy L2 cache latency values in vf610 device trees, which may
  cause system hang when cpu runs at a higher frequency.

* tag 'imx-fixes-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: dts: vf610: use reset values for L2 cache latencies
  ARM: dts: vf610: fix clock definition for SAI2
  ARM: imx: clk-vf610: fix SAI clock tree

8 years agols2080a/dts: Add little endian property for GPIO IP block
Liu Gang [Fri, 4 Dec 2015 22:55:05 +0000 (16:55 -0600)]
ls2080a/dts: Add little endian property for GPIO IP block

The GPIO block for ls2080a platform has little endian registers,
the GPIO driver needs this property to read/write registers by
right interface.

Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
8 years agodt-bindings: define little-endian property for QorIQ GPIO
Li Yang [Fri, 4 Dec 2015 22:55:04 +0000 (16:55 -0600)]
dt-bindings: define little-endian property for QorIQ GPIO

The GPIO block on different QorIQ chips could have registers in different
endianess.  Define the property to specify which endian is used by the
hardware.

Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
8 years agoARM64: dts: ls2080a: fix eSDHC endianness
yangbo lu [Fri, 4 Dec 2015 22:55:03 +0000 (16:55 -0600)]
ARM64: dts: ls2080a: fix eSDHC endianness

Add the "little-endian" property to fix the issue that eSDHC
is not working and dumping out "mmc0: Controller never released
inhibit bit(s)." error messages constantly.

Fixes: 5461597f6ce0 ("dts/ls2080a: Update DTSI to add support of various peripherals")
Signed-off-by: Yangbo Lu <yangbo.lu@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
8 years agoUSB: add quirk for devices with broken LPM
Alan Stern [Thu, 10 Dec 2015 20:27:21 +0000 (15:27 -0500)]
USB: add quirk for devices with broken LPM

Some USB device / host controller combinations seem to have problems
with Link Power Management.  For example, Steinar found that his xHCI
controller wouldn't handle bandwidth calculations correctly for two
video cards simultaneously when LPM was enabled, even though the bus
had plenty of bandwidth available.

This patch introduces a new quirk flag for devices that should remain
disabled for LPM, and creates quirk entries for Steinar's devices.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 years agoxhci: fix usb2 resume timing and races.
Mathias Nyman [Fri, 11 Dec 2015 12:38:06 +0000 (14:38 +0200)]
xhci: fix usb2 resume timing and races.

According to USB 2 specs ports need to signal resume for at least 20ms,
in practice even longer, before moving to U0 state.
Both host and devices can initiate resume.

On device initiated resume, a port status interrupt with the port in resume
state in issued. The interrupt handler tags a resume_done[port]
timestamp with current time + USB_RESUME_TIMEOUT, and kick roothub timer.
Root hub timer requests for port status, finds the port in resume state,
checks if resume_done[port] timestamp passed, and set port to U0 state.

On host initiated resume, current code sets the port to resume state,
sleep 20ms, and finally sets the port to U0 state. This should also
be changed to work in a similar way as the device initiated resume, with
timestamp tagging, but that is not yet tested and will be a separate
fix later.

There are a few issues with this approach

1. A host initiated resume will also generate a resume event. The event
   handler will find the port in resume state, believe it's a device
   initiated resume, and act accordingly.

2. A port status request might cut the resume signalling short if a
   get_port_status request is handled during the host resume signalling.
   The port will be found in resume state. The timestamp is not set leading
   to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0.
   get_port_status will proceed with moving the port to U0.

3. If an error, or anything else happens to the port during device
   initiated resume signalling it will leave all the device resume
   parameters hanging uncleared, preventing further suspend, returning
   -EBUSY, and cause the pm thread to busyloop trying to enter suspend.

Fix this by using the existing resuming_ports bitfield to indicate that
resume signalling timing is taken care of.
Check if the resume_done[port] is set before using it for timestamp
comparison, and also clear out any resume signalling related variables
if port is not in U0 or Resume state

This issue was discovered when a PM thread busylooped, trying to runtime
suspend the xhci USB 2 roothub on a Dell XPS

Cc: stable <stable@vger.kernel.org>
Reported-by: Daniel J Blueman <daniel@quora.org>
Tested-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>