cascardo/linux.git
8 years agoipv6: Skip XFRM lookup if dst_entry in socket cache is valid
Jakub Sitnicki [Wed, 8 Jun 2016 13:13:34 +0000 (15:13 +0200)]
ipv6: Skip XFRM lookup if dst_entry in socket cache is valid

At present we perform an xfrm_lookup() for each UDPv6 message we
send. The lookup involves querying the flow cache (flow_cache_lookup)
and, in case of a cache miss, creating an XFRM bundle.

If we miss the flow cache, we can end up creating a new bundle and
deriving the path MTU (xfrm_init_pmtu) from on an already transformed
dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
to xfrm_lookup(). This can happen only if we're caching the dst_entry
in the socket, that is when we're using a connected UDP socket.

To put it another way, the path MTU shrinks each time we miss the flow
cache, which later on leads to incorrectly fragmented payload. It can
be observed with ESPv6 in transport mode:

  1) Set up a transformation and lower the MTU to trigger fragmentation
    # ip xfrm policy add dir out src ::1 dst ::1 \
      tmpl src ::1 dst ::1 proto esp spi 1
    # ip xfrm state add src ::1 dst ::1 \
      proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
    # ip link set dev lo mtu 1500

  2) Monitor the packet flow and set up an UDP sink
    # tcpdump -ni lo -ttt &
    # socat udp6-listen:12345,fork /dev/null &

  3) Send a datagram that needs fragmentation with a connected socket
    # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345
    2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
    00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
    00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
    00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
    (^ ICMPv6 Parameter Problem)
    00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136

  4) Compare it to a non-connected socket
    # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
    00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
    00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)

What happens in step (3) is:

  1) when connecting the socket in __ip6_datagram_connect(), we
     perform an XFRM lookup, miss the flow cache, create an XFRM
     bundle, and cache the destination,

  2) afterwards, when sending the datagram, we perform an XFRM lookup,
     again, miss the flow cache (due to mismatch of flowi6_iif and
     flowi6_oif, which is an issue of its own), and recreate an XFRM
     bundle based on the cached (and already transformed) destination.

To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
altogether whenever we already have a destination entry cached in the
socket. This prevents the path MTU shrinkage and brings us on par with
UDPv4.

The fix also benefits connected PINGv6 sockets, another user of
ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
twice.

Joint work with Hannes Frederic Sowa.

Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agol2tp: fix configuration passed to setup_udp_tunnel_sock()
Guillaume Nault [Wed, 8 Jun 2016 10:59:17 +0000 (12:59 +0200)]
l2tp: fix configuration passed to setup_udp_tunnel_sock()

Unused fields of udp_cfg must be all zeros. Otherwise
setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
callbacks with garbage, eventually resulting in panic when used by
udp_gro_receive().

[   72.694123] BUG: unable to handle kernel paging request at ffff880033f87d78
[   72.695518] IP: [<ffff880033f87d78>] 0xffff880033f87d78
[   72.696530] PGD 26e2067 PUD 26e3067 PMD 342ed063 PTE 8000000033f87163
[   72.696530] Oops: 0011 [#1] SMP KASAN
[   72.696530] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pptp gre pppox ppp_generic slhc crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel evdev aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper serio_raw acpi_cpufreq button proc\
essor ext4 crc16 jbd2 mbcache virtio_blk virtio_net virtio_pci virtio_ring virtio
[   72.696530] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.7.0-rc1 #1
[   72.696530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[   72.696530] task: ffff880035b59700 ti: ffff880035b70000 task.ti: ffff880035b70000
[   72.696530] RIP: 0010:[<ffff880033f87d78>]  [<ffff880033f87d78>] 0xffff880033f87d78
[   72.696530] RSP: 0018:ffff880035f87bc0  EFLAGS: 00010246
[   72.696530] RAX: ffffed000698f996 RBX: ffff88003326b840 RCX: ffffffff814cc823
[   72.696530] RDX: ffff88003326b840 RSI: ffff880033e48038 RDI: ffff880034c7c780
[   72.696530] RBP: ffff880035f87c18 R08: 000000000000a506 R09: 0000000000000000
[   72.696530] R10: ffff880035f87b38 R11: ffff880034b9344d R12: 00000000ebfea715
[   72.696530] R13: 0000000000000000 R14: ffff880034c7c780 R15: 0000000000000000
[   72.696530] FS:  0000000000000000(0000) GS:ffff880035f80000(0000) knlGS:0000000000000000
[   72.696530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   72.696530] CR2: ffff880033f87d78 CR3: 0000000033c98000 CR4: 00000000000406a0
[   72.696530] Stack:
[   72.696530]  ffffffff814cc834 ffff880034b93468 0000001481416818 ffff88003326b874
[   72.696530]  ffff880034c7ccb0 ffff880033e48038 ffff88003326b840 ffff880034b93462
[   72.696530]  ffff88003326b88a ffff88003326b88c ffff880034b93468 ffff880035f87c70
[   72.696530] Call Trace:
[   72.696530]  <IRQ>
[   72.696530]  [<ffffffff814cc834>] ? udp_gro_receive+0x1c6/0x1f9
[   72.696530]  [<ffffffff814ccb1c>] udp4_gro_receive+0x2b5/0x310
[   72.696530]  [<ffffffff814d989b>] inet_gro_receive+0x4a3/0x4cd
[   72.696530]  [<ffffffff81431b32>] dev_gro_receive+0x584/0x7a3
[   72.696530]  [<ffffffff810adf7a>] ? __lock_is_held+0x29/0x64
[   72.696530]  [<ffffffff814321f7>] napi_gro_receive+0x124/0x21d
[   72.696530]  [<ffffffffa000b145>] virtnet_receive+0x8df/0x8f6 [virtio_net]
[   72.696530]  [<ffffffffa000b27e>] virtnet_poll+0x1d/0x8d [virtio_net]
[   72.696530]  [<ffffffff81431350>] net_rx_action+0x15b/0x3b9
[   72.696530]  [<ffffffff815893d6>] __do_softirq+0x216/0x546
[   72.696530]  [<ffffffff81062392>] irq_exit+0x49/0xb6
[   72.696530]  [<ffffffff81588e9a>] do_IRQ+0xe2/0xfa
[   72.696530]  [<ffffffff81587a49>] common_interrupt+0x89/0x89
[   72.696530]  <EOI>
[   72.696530]  [<ffffffff810b05df>] ? trace_hardirqs_on_caller+0x229/0x270
[   72.696530]  [<ffffffff8102b3c7>] ? default_idle+0x1c/0x2d
[   72.696530]  [<ffffffff8102b3c5>] ? default_idle+0x1a/0x2d
[   72.696530]  [<ffffffff8102bb8c>] arch_cpu_idle+0xa/0xc
[   72.696530]  [<ffffffff810a6c39>] default_idle_call+0x1a/0x1c
[   72.696530]  [<ffffffff810a6d96>] cpu_startup_entry+0x15b/0x20f
[   72.696530]  [<ffffffff81039a81>] start_secondary+0x12c/0x133
[   72.696530] Code: ff ff ff ff ff ff ff ff ff ff 7f ff ff ff ff ff ff ff 7f 00 7e f8 33 00 88 ff ff 6d 61 58 81 ff ff ff ff 5e de 0a 81 ff ff ff ff <00> 5c e2 34 00 88 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00
[   72.696530] RIP  [<ffff880033f87d78>] 0xffff880033f87d78
[   72.696530]  RSP <ffff880035f87bc0>
[   72.696530] CR2: ffff880033f87d78
[   72.696530] ---[ end trace ad7758b9a1dccf99 ]---
[   72.696530] Kernel panic - not syncing: Fatal exception in interrupt
[   72.696530] Kernel Offset: disabled
[   72.696530] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

v2: use empty initialiser instead of "{ NULL }" to avoid relying on
    first field's type.

Fixes: 38fd2af24fcf ("udp: Add socket based GRO and config")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofutex: Calculate the futex key based on a tail page for file-based futexes
Mel Gorman [Wed, 8 Jun 2016 13:25:22 +0000 (14:25 +0100)]
futex: Calculate the futex key based on a tail page for file-based futexes

Mike Galbraith reported that the LTP test case futex_wake04 was broken
by commit 65d8fc777f6d ("futex: Remove requirement for lock_page()
in get_futex_key()").

This test case uses futexes backed by hugetlbfs pages and so there is an
associated inode with a futex stored on such pages. The problem is that
the key is being calculated based on the head page index of the hugetlbfs
page and not the tail page.

Prior to the optimisation, the page lock was used to stabilise mappings and
pin the inode is file-backed which is overkill. If the page was a compound
page, the head page was automatically looked up as part of the page lock
operation but the tail page index was used to calculate the futex key.

After the optimisation, the compound head is looked up early and the page
lock is only relied upon to identify truncated pages, special pages or a
shmem page moving to swapcache. The head page is looked up because without
the page lock, special care has to be taken to pin the inode correctly.
However, the tail page is still required to calculate the futex key so
this patch records the tail page.

On vanilla 4.6, the output of the test case is;

futex_wake04    0  TINFO  :  Hugepagesize 2097152
futex_wake04    1  TFAIL  :  futex_wake04.c:126: Bug: wait_thread2 did not wake after 30 secs.

With the patch applied

futex_wake04    0  TINFO  :  Hugepagesize 2097152
futex_wake04    1  TPASS  :  Hi hydra, thread2 awake!

Fixes: 65d8fc777f6d "futex: Remove requirement for lock_page() in get_futex_key()"
Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160608132522.GM2469@suse.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agocxgb4: Add device id of T540-BT adapter
Hariprasad Shenai [Wed, 8 Jun 2016 09:27:28 +0000 (14:57 +0530)]
cxgb4: Add device id of T540-BT adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoobjtool, drm/vmwgfx: Fix "duplicate frame pointer save" warning
Josh Poimboeuf [Thu, 26 May 2016 18:43:43 +0000 (13:43 -0500)]
objtool, drm/vmwgfx: Fix "duplicate frame pointer save" warning

objtool reports the following warnings:

  drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: warning: objtool: vmw_send_msg()+0x107: duplicate frame pointer save
  drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: warning: objtool: vmw_host_get_guestinfo()+0x252: duplicate frame pointer save

To quote Linus:

 "The reason is that VMW_PORT_HB_OUT() uses a magic instruction sequence
  (a "rep outsb") to communicate with the hypervisor (it's a virtual GPU
  driver for vmware), and %rbp is part of the communication. So the
  inline asm does a save-and-restore of the frame pointer around the
  instruction sequence.

  I actually find the objtool warning to be quite reasonable, so it's
  not exactly a false positive, since in this case it actually does
  point out that the frame pointer won't be reliable over that
  instruction sequence.

  But in this particular case it just ends up being the wrong thing -
  the code is what it is, and %rbp just can't have the frame information
  due to annoying magic calling conventions."

Silence the warnings by telling objtool to ignore the two functions
which use the VMW_PORT_HB_{IN,OUT} macros.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: DRI <dri-devel@lists.freedesktop.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160526184343.fdtjjjg67smmeekt@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agosched/debug: Fix 'schedstats=enable' cmdline option
Josh Poimboeuf [Tue, 7 Jun 2016 19:43:16 +0000 (14:43 -0500)]
sched/debug: Fix 'schedstats=enable' cmdline option

The 'schedstats=enable' option doesn't work, and also produces the
following warning during boot:

  WARNING: CPU: 0 PID: 0 at /home/jpoimboe/git/linux/kernel/jump_label.c:61 static_key_slow_inc+0x8c/0xa0
  static_key_slow_inc used before call to jump_label_init
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc1+ #25
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   0000000000000086 3ae3475a4bea95d4 ffffffff81e03da8 ffffffff8143fc83
   ffffffff81e03df8 0000000000000000 ffffffff81e03de8 ffffffff810b1ffb
   0000003d00000096 ffffffff823514d0 ffff88007ff197c8 0000000000000000
  Call Trace:
   [<ffffffff8143fc83>] dump_stack+0x85/0xc2
   [<ffffffff810b1ffb>] __warn+0xcb/0xf0
   [<ffffffff810b207f>] warn_slowpath_fmt+0x5f/0x80
   [<ffffffff811e9c0c>] static_key_slow_inc+0x8c/0xa0
   [<ffffffff810e07c6>] static_key_enable+0x16/0x40
   [<ffffffff8216d633>] setup_schedstats+0x29/0x94
   [<ffffffff82148a05>] unknown_bootoption+0x89/0x191
   [<ffffffff810d8617>] parse_args+0x297/0x4b0
   [<ffffffff82148d61>] start_kernel+0x1d8/0x4a9
   [<ffffffff8214897c>] ? set_init_arg+0x55/0x55
   [<ffffffff82148120>] ? early_idt_handler_array+0x120/0x120
   [<ffffffff821482db>] x86_64_start_reservations+0x2f/0x31
   [<ffffffff82148427>] x86_64_start_kernel+0x14a/0x16d

The problem is that it tries to update the 'sched_schedstats' static key
before jump labels have been initialized.

Changing jump_label_init() to be called earlier before
parse_early_param() wouldn't fix it: it would still fail trying to
poke_text() because mm isn't yet initialized.

Instead, just create a temporary '__sched_schedstats' variable which can
be copied to the static key later during sched_init() after jump labels
have been initialized.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
Link: http://lkml.kernel.org/r/453775fe3433bed65731a583e228ccea806d18cd.1465322027.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agosched/debug: Fix /proc/sched_debug regression
Josh Poimboeuf [Fri, 3 Jun 2016 22:58:40 +0000 (17:58 -0500)]
sched/debug: Fix /proc/sched_debug regression

Commit:

  cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")

... introduced a bug when CONFIG_SCHEDSTATS is enabled and the
runtime tunable is disabled (which is the default).

The wait-time, sum-exec, and sum-sleep fields are missing from the
/proc/sched_debug file in the runnable_tasks section.

Fix it with a new schedstat_val() macro which returns the field value
when schedstats is enabled and zero otherwise.  The macro works with
both SCHEDSTATS and !SCHEDSTATS.  I put the macro in stats.h since it
might end up being useful in other places.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
Link: http://lkml.kernel.org/r/bcda7c2790cf2ccbe586a28c02dd7b6fe7749a2b.1464994423.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf/core: Remove a redundant check
Alexander Shishkin [Tue, 7 Jun 2016 12:44:15 +0000 (15:44 +0300)]
perf/core: Remove a redundant check

There is no way to end up in _free_event() with event::pmu being NULL.
The latter is initialized in event allocation path and remains set
forever. In case of allocation failure, the error path doesn't use
_free_event().

Having the check, however, suggests that it is possible to have a
event::pmu==NULL situation in _free_event() and confuses the robots.

This patch gets rid of the check.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1465303455-26032-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agolocking/qspinlock: Fix spin_unlock_wait() some more
Peter Zijlstra [Wed, 8 Jun 2016 08:19:51 +0000 (10:19 +0200)]
locking/qspinlock: Fix spin_unlock_wait() some more

While this prior commit:

  54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")

... fixes spin_is_locked() and spin_unlock_wait() for the usage
in ipc/sem and netfilter, it does not in fact work right for the
usage in task_work and futex.

So while the 2 locks crossed problem:

spin_lock(A) spin_lock(B)
if (!spin_is_locked(B)) spin_unlock_wait(A)
  foo() foo();

... works with the smp_mb() injected by both spin_is_locked() and
spin_unlock_wait(), this is not sufficient for:

flag = 1;
smp_mb(); spin_lock()
spin_unlock_wait() if (!flag)
  // add to lockless list
// iterate lockless list

... because in this scenario, the store from spin_lock() can be delayed
past the load of flag, uncrossing the variables and loosing the
guarantee.

This patch reworks spin_is_locked() and spin_unlock_wait() to work in
both cases by exploiting the observation that while the lock byte
store can be delayed, the contender must have registered itself
visibly in other state contained in the word.

It also allows for architectures to override both functions, as PPC
and ARM64 have an additional issue for which we currently have no
generic solution.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Giovanni Gherdovich <ggherdovich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org # v4.2 and later
Fixes: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agox86/cpu/AMD: Extend X86_FEATURE_TOPOEXT workaround to newer models
Borislav Petkov [Wed, 1 Jun 2016 10:04:28 +0000 (12:04 +0200)]
x86/cpu/AMD: Extend X86_FEATURE_TOPOEXT workaround to newer models

We need to reenable the topology extensions CPUID leafs on newer models
too, if BIOS has disabled them, as we rely on them to get proper compute
unit topology.

Make the printk a once thing, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rui Huang <ray.huang@amd.com>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-hwmon@vger.kernel.org
Link: http://lkml.kernel.org/r/1464775468-23355-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agox86/cpu/intel: Introduce macros for Intel family numbers
Dave Hansen [Fri, 3 Jun 2016 00:19:27 +0000 (17:19 -0700)]
x86/cpu/intel: Introduce macros for Intel family numbers

Problem:

We have a boatload of open-coded family-6 model numbers.  Half of
them have these model numbers in hex and the other half in
decimal.  This makes grepping for them tons of fun, if you were
to try.

Solution:

Consolidate all the magic numbers.  Put all the definitions in
one header.

The names here are closely derived from the comments describing
the models from arch/x86/events/intel/core.c.  We could easily
make them shorter by doing things like s/SANDYBRIDGE/SNB/, but
they seemed fine even with the longer versions to me.

Do not take any of these names too literally, like "DESKTOP"
or "MOBILE".  These are all colloquial names and not precise
descriptions of everywhere a given model will show up.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com>
Cc: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vishwanath Somayaji <vishwanath.somayaji@intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: jacob.jun.pan@intel.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: platform-driver-x86@vger.kernel.org
Link: http://lkml.kernel.org/r/20160603001927.F2A7D828@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoarm64: mm: always take dirty state from new pte in ptep_set_access_flags
Will Deacon [Tue, 7 Jun 2016 16:55:15 +0000 (17:55 +0100)]
arm64: mm: always take dirty state from new pte in ptep_set_access_flags

Commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for
hardware AF/DBM") ensured that pte flags are updated atomically in the
face of potential concurrent, hardware-assisted updates. However, Alex
reports that:

 | This patch breaks swapping for me.
 | In the broken case, you'll see either systemd cpu time spike (because
 | it's stuck in a page fault loop) or the system hang (because the
 | application owning the screen is stuck in a page fault loop).

It turns out that this is because the 'dirty' argument to
ptep_set_access_flags is always 0 for read faults, and so we can't use
it to set PTE_RDONLY. The failing sequence is:

  1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
  2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
  3. A read faults due to the missing access flag
  4. ptep_set_access_flags is called with dirty = 0, due to the read fault
  5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
  6. A write faults, but pte_write is true so we get stuck

The solution is to check the new page table entry (as would be done by
the generic, non-atomic definition of ptep_set_access_flags that just
calls set_pte_at) to establish the dirty state.

Cc: <stable@vger.kernel.org> # 4.3+
Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Alexander Graf <agraf@suse.de>
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
8 years agonet-sysfs: fix missing <linux/of_net.h>
Ben Dooks [Tue, 7 Jun 2016 18:27:51 +0000 (19:27 +0100)]
net-sysfs: fix missing <linux/of_net.h>

The of_find_net_device_by_node() function is defined in
<linux/of_net.h> but not included in the .c file that
implements it. Fix the following warning by including the
header:

net/core/net-sysfs.c:1494:19: warning: symbol 'of_find_net_device_by_node' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobridge: Don't insert unnecessary local fdb entry on changing mac address
Toshiaki Makita [Tue, 7 Jun 2016 10:14:17 +0000 (19:14 +0900)]
bridge: Don't insert unnecessary local fdb entry on changing mac address

The missing br_vlan_should_use() test caused creation of an unneeded
local fdb entry on changing mac address of a bridge device when there is
a vlan which is configured on a bridge port but not on the bridge
device.

Fixes: 2594e9064a57 ("bridge: vlan: add per-vlan struct and move to rhashtables")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopowerpc/mm/hash: Compute the segment size correctly for ISA 3.0
Aneesh Kumar K.V [Thu, 19 May 2016 07:54:30 +0000 (13:24 +0530)]
powerpc/mm/hash: Compute the segment size correctly for ISA 3.0

PowerISA 3.0 encodes the segment size in the second half of hash page
table entry. Update hpte_decode() accordingly.

Fixes: 50de596de8be ("powerpc/mm/hash: Add support for Power9 Hash")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT
Aneesh Kumar K.V [Thu, 2 Jun 2016 09:44:48 +0000 (15:14 +0530)]
powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT

In some of the radix TLB flush routines, we use a local to store the
mm->context.id, AKA the PID.

Currently we use an int, but the PID is unsigned long, so large values
of PID will be truncated. In particular MMU_NO_CONTEXT is -1, which
means all our comparisons against that value can never be true.

This means we'll issue TLB flushes when we shouldn't on radix enabled
machines.

Fix it by using an unsigned long for the local. Discovered by Coverity.

Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Write change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Wed, 8 Jun 2016 03:41:36 +0000 (20:41 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "Fixes for crap of assorted ages: EOPENSTALE one is 4.2+, autofs one is
  4.6, d_walk - 3.2+.

  The atomic_open() and coredump ones are regressions from this window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coredump: fix dumping through pipes
  fix a regression in atomic_open()
  fix d_walk()/non-delayed __d_free() race
  autofs braino fix for do_last()
  fix EOPENSTALE bug in do_last()

8 years agohwmon: (lm90) use proper type for update_interval
Wolfram Sang [Sun, 5 Jun 2016 07:35:43 +0000 (09:35 +0200)]
hwmon: (lm90) use proper type for update_interval

The code handles this variable always as unsigned, so adapt the type.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
8 years agohwmon: (ina2xx) Document compatible for INA231
Krzysztof Kozlowski [Wed, 1 Jun 2016 09:43:12 +0000 (11:43 +0200)]
hwmon: (ina2xx) Document compatible for INA231

Document the compatible for INA231 sensor.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
8 years agohwmon: (fam15h_power) Disable preemption when reading registers
Borislav Petkov [Wed, 1 Jun 2016 09:36:13 +0000 (11:36 +0200)]
hwmon: (fam15h_power) Disable preemption when reading registers

We need to read a bunch of registers on each compute unit and possibly
on the current CPU too. Disable preemption around it. Otherwise, you
get:

  BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/327
  caller is read_registers+0x6a/0x110 [fam15h_power]
  CPU: 3 PID: 327 Comm: systemd-udevd Not tainted 4.7.0-rc1+ #4
  Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.08 01/28/2016
  ...

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Rui Huang <ray.huang@amd.com>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Acked-by: Huang Rui <ray.huang@amd.com>
Tested-by: Huang Rui <ray.huang@amd.com>
Fixes: fa7943449943 ("hwmon: (fam15h_power) Add compute unit accumulated power")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
8 years agocoredump: fix dumping through pipes
Mateusz Guzik [Sun, 5 Jun 2016 21:14:14 +0000 (23:14 +0200)]
coredump: fix dumping through pipes

The offset in the core file used to be tracked with ->written field of
the coredump_params structure. The field was retired in favour of
file->f_pos.

However, ->f_pos is not maintained for pipes which leads to breakage.

Restore explicit tracking of the offset in coredump_params. Introduce
->pos field for this purpose since ->written was already reused.

Fixes: a00839395103 ("get rid of coredump_params->written").

Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agofix a regression in atomic_open()
Al Viro [Wed, 8 Jun 2016 01:53:51 +0000 (21:53 -0400)]
fix a regression in atomic_open()

open("/foo/no_such_file", O_RDONLY | O_CREAT) on should fail with
EACCES when /foo is not writable; failing with ENOENT is obviously
wrong.  That got broken by a braino introduced when moving the
creat_error logics from atomic_open() to lookup_open().  Easy to
fix, fortunately.

Spotted-by: "Yan, Zheng" <ukernel@gmail.com>
Tested-by: "Yan, Zheng" <ukernel@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agofix d_walk()/non-delayed __d_free() race
Al Viro [Wed, 8 Jun 2016 01:26:55 +0000 (21:26 -0400)]
fix d_walk()/non-delayed __d_free() race

Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay.  Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.

Fortunately, the fix is simiple; right now the rule is "if it ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover.  Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.

Cc: stable@vger.kernel.org # v3.2+ (and watch out for __d_materialise_dentry())
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agocpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
Srinivas Pandruvada [Wed, 8 Jun 2016 00:38:53 +0000 (17:38 -0700)]
cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo

When turbo is disabled, the ->set_policy() interface is broken.

For example, when turbo is disabled and cpuinfo.max = 2900000 (full
max turbo frequency), setting the limits results in frequency less
than the requested one:
Set 1000000 KHz results in 0700000 KHz
Set 1500000 KHz results in 1100000 KHz
Set 2000000 KHz results in  1500000 KHz

This is because the limits->max_perf fraction is calculated using
the max turbo frequency as the reference, but when the max P-State is
capped in intel_pstate_get_min_max(), the reference is not the max
turbo P-State. This results in reducing max P-State.

One option is to always use max turbo as reference for calculating
limits. But this will not be correct. By definition the intel_pstate
sysfs limits, shows percentage of available performance. So when
BIOS has disabled turbo, the available performance is max non turbo.
So the max_perf_pct should still show 100%.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject & changelog, rewrite in fewer lines of code ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
8 years agocpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
Srinivas Pandruvada [Wed, 8 Jun 2016 00:38:52 +0000 (17:38 -0700)]
cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()

The limits->max_perf is rounded_up but immediately overwritten by
another assignment to limits->max_perf.

Move that operation to the correct location.

While here also added a pr_debug() call in ->set_policy to aid in
debugging.

Fixes: 785ee2788141 (cpufreq: intel_pstate: Fix limits->max_perf rounding error)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject & changelog ]
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
8 years agoof: fix autoloading due to broken modalias with no 'compatible'
Wolfram Sang [Mon, 6 Jun 2016 16:48:38 +0000 (18:48 +0200)]
of: fix autoloading due to broken modalias with no 'compatible'

Because of an improper dereference, a stray 'C' character was output to
the modalias when no 'compatible' was specified. This is the case for
some old PowerMac drivers which only set the 'name' property. Fix it to
let them match again.

Reported-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Tested-by: Mathieu Malaterre <malat@debian.org>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Fixes: 6543becf26fff6 ("mod/file2alias: make modalias generation safe for cross compiling")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
Michael Ellerman [Wed, 8 Jun 2016 00:01:23 +0000 (10:01 +1000)]
powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added

The recent commit 7cc851039d64 ("powerpc/pseries: Add POWER8NVL support
to ibm,client-architecture-support call") added a new PVR mask & value
to the start of the ibm_architecture_vec[] array.

However it missed the fact that further down in the array, we hard code
the offset of one of the fields, and then at boot use that value to
patch the value in the array. This means every update to the array must
also update the #define, ugh.

This means that on pseries machines we will misreport to firmware the
number of cores we support, by a factor of threads_per_core.

Fix it for now by updating the #define.

Fixes: 7cc851039d64 ("powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Wed, 8 Jun 2016 00:14:10 +0000 (17:14 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains two Netfilter/IPVS fixes for your net
tree, they are:

1) Fix missing alignment in next offset calculation for standard
   targets, introduced in the previous merge window, patch from
   Florian Westphal.

2) Fix to correct the handling of outgoing connections which use the
   SIP-pe such that the binding of a real-server is updated when needed.
   This was an omission from changes introduced by Marco Angaroni in
   the previous merge window too, to allow handling of outgoing
   connections by the SIP-pe. Patch and report came via Simon Horman.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: record TLP and ER timer stats in v6 stats
Yuchung Cheng [Mon, 6 Jun 2016 22:07:18 +0000 (15:07 -0700)]
tcp: record TLP and ER timer stats in v6 stats

The v6 tcp stats scan do not provide TLP and ER timer information
correctly like the v4 version . This patch fixes that.

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Fixes: eed530b6c676 ("tcp: early retransmit")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: sched: fix tc_should_offload for specific clsact classes
Daniel Borkmann [Mon, 6 Jun 2016 20:50:39 +0000 (22:50 +0200)]
net: sched: fix tc_should_offload for specific clsact classes

When offloading classifiers such as u32 or flower to hardware, and the
qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes,
since not all of them handle ingress, therefore we must leave those in
software path. Add a .tcf_cl_offload() callback, so we can generically
handle them, tested on ixgbe.

Fixes: 10cbc6843446 ("net/sched: cls_flower: Hardware offloaded filters statistics support")
Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoact_police: fix a crash during removal
WANG Cong [Mon, 6 Jun 2016 16:54:30 +0000 (09:54 -0700)]
act_police: fix a crash during removal

The police action is using its own code to initialize tcf hash
info, which makes us to forgot to initialize a->hinfo correctly.
Fix this by calling the helper function tcf_hash_create() directly.

This patch fixed the following crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
 IP: [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
 PGD d3c34067 PUD d3e18067 PMD 0
 Oops: 0000 [#1] SMP
 CPU: 2 PID: 853 Comm: tc Not tainted 4.6.0+ #87
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 task: ffff8800d3e28040 ti: ffff8800d3f6c000 task.ti: ffff8800d3f6c000
 RIP: 0010:[<ffffffff810c099f>]  [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
 RSP: 0000:ffff88011b203c80  EFLAGS: 00010002
 RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028
 RBP: ffff88011b203d40 R08: 0000000000000001 R09: 0000000000000000
 R10: ffff88011b203d58 R11: ffff88011b208000 R12: 0000000000000001
 R13: ffff8800d3e28040 R14: 0000000000000028 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000028 CR3: 00000000d4be1000 CR4: 00000000000006e0
 Stack:
  ffff8800d3e289c0 0000000000000046 000000001b203d60 ffffffff00000000
  0000000000000000 ffff880000000000 0000000000000000 ffffffff00000000
  ffffffff8187142c ffff88011b203ce8 ffff88011b203ce8 ffffffff8101dbfc
 Call Trace:
  <IRQ>
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
  [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
  [<ffffffff810a9604>] ? sched_clock_local+0x11/0x78
  [<ffffffff810bf6a1>] ? mark_lock+0x24/0x201
  [<ffffffff810c1dbd>] lock_acquire+0x120/0x1b4
  [<ffffffff810c1dbd>] ? lock_acquire+0x120/0x1b4
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff81aad89f>] _raw_spin_lock_bh+0x3c/0x72
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff8187142c>] __tcf_hash_release+0x77/0xd1
  [<ffffffff81871a27>] tcf_action_destroy+0x49/0x7c
  [<ffffffff81870b1c>] tcf_exts_destroy+0x20/0x2d
  [<ffffffff8189273b>] u32_destroy_key+0x1b/0x4d
  [<ffffffff81892788>] u32_delete_key_freepf_rcu+0x1b/0x1d
  [<ffffffff810de3b8>] rcu_process_callbacks+0x610/0x82e
  [<ffffffff8189276d>] ? u32_destroy_key+0x4d/0x4d
  [<ffffffff81ab0bc1>] __do_softirq+0x191/0x3f4

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofq_codel: return non zero qlen in class dumps
Eric Dumazet [Mon, 6 Jun 2016 16:12:39 +0000 (09:12 -0700)]
fq_codel: return non zero qlen in class dumps

We properly scan the flow list to count number of packets,
but John passed 0 to gnet_stats_copy_queue() so we report
a zero value to user space instead of the result.

Fixes: 640158536632 ("net: sched: restrict use of qstats qlen")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'u32-hwoffload-fixes'
David S. Miller [Tue, 7 Jun 2016 23:27:15 +0000 (16:27 -0700)]
Merge branch 'u32-hwoffload-fixes'

Jakub Kicinski says:

====================
cls_u32 hardware offload fixes

This set fixes two small issues with error codes I noticed
in cls_u32.  Second patch could be viewed as user space API
change but that portion of API is not part of any release,
yet.

Compile tested only.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: cls_u32: be more strict about skip-sw flag
Jakub Kicinski [Mon, 6 Jun 2016 15:16:48 +0000 (16:16 +0100)]
net: cls_u32: be more strict about skip-sw flag

Return an error if user requested skip-sw and the underlaying
hardware cannot handle tc offloads (or offloads are disabled).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: cls_u32: fix error code for invalid flags
Jakub Kicinski [Mon, 6 Jun 2016 15:16:47 +0000 (16:16 +0100)]
net: cls_u32: fix error code for invalid flags

'err' variable is not set in this test, we would return whatever
previous test set 'err' to.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agogtp: #define _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__
Colin Ian King [Mon, 6 Jun 2016 15:08:41 +0000 (16:08 +0100)]
gtp: #define _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__

Fix clang build warning:

./include/uapi/linux/gtp.h:1:9: warning: '_UAPI_LINUX_GTP_H_' is
used as a header guard here, followed by #define of a different
macro [-Wheader-guard]

fix by defining  _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 7 Jun 2016 23:24:44 +0000 (16:24 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "This finally removes the CLK_IS_ROOT flag by picking up the last few
  stragglers that didn't get merged by anyone this time around.

  Better to do it now than wait for another one to pop up.  There's also
  a minor maintainers update and a Kconfig fix"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: nxp: Select MFD_SYSCON for creg driver
  MAINTAINERS: Add file patterns for clock device tree bindings
  clk: Remove CLK_IS_ROOT flag
  clk: microchip: Remove CLK_IS_ROOT
  powerpc/512x: clk: Remove CLK_IS_ROOT
  vexpress/spc: Remove CLK_IS_ROOT

8 years agonet: fec: fix spelling mistakes and add missing newline
Colin Ian King [Mon, 6 Jun 2016 08:21:44 +0000 (09:21 +0100)]
net: fec: fix spelling mistakes and add missing newline

trivial fix to spelling mistakes and add missing newline in pr_err
messages

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bnxt_en-fixes'
David S. Miller [Tue, 7 Jun 2016 23:02:04 +0000 (16:02 -0700)]
Merge branch 'bnxt_en-fixes'

Michael Chan says:

====================
bnxt_en: Bug fixes.

Fix a race condition and VLAN rx acceleration logic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Simplify VLAN receive logic.
Michael Chan [Mon, 6 Jun 2016 06:37:16 +0000 (02:37 -0400)]
bnxt_en: Simplify VLAN receive logic.

Since both CTAG and STAG rx acceleration must be enabled together, we
only need to check one feature flag (NETIF_F_HW_VLAN_CTAG_RX) before
calling __vlan_hwaccel_put_tag().

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.
Michael Chan [Mon, 6 Jun 2016 06:37:15 +0000 (02:37 -0400)]
bnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.

The hardware can only be set to strip or not strip both the VLAN CTAG and
STAG.  It cannot strip one and not strip the other.  Add logic to
bnxt_fix_features() to toggle both feature flags when the user is toggling
one of them.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Fix tx push race condition.
Michael Chan [Mon, 6 Jun 2016 06:37:14 +0000 (02:37 -0400)]
bnxt_en: Fix tx push race condition.

Set the is_push flag in the software BD before the tx data is pushed to
the chip.  It is possible to get the tx interrupt as soon as the tx data
is pushed.  The tx handler will not handle the event properly if the
is_push flag is not set and it will crash.

Signed-off-by: Michael Chan <michael.chan@broadocm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agox86, build: copy ldlinux.c32 to image.iso
H. Peter Anvin [Wed, 6 Apr 2016 00:01:33 +0000 (17:01 -0700)]
x86, build: copy ldlinux.c32 to image.iso

For newer versions of Syslinux, we need ldlinux.c32 in addition to
isolinux.bin to reside on the boot disk, so if the latter is found,
copy it, too, to the isoimage tree.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linux Stable Tree <stable@vger.kernel.org>
8 years agorxrpc: fix ptr_ret.cocci warnings
Wu Fengguang [Sat, 4 Jun 2016 23:17:19 +0000 (07:17 +0800)]
rxrpc: fix ptr_ret.cocci warnings

net/rxrpc/rxkad.c:1165:1-3: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

CC: David Howells <dhowells@redhat.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'rds-packet-assembly-fixes'
David S. Miller [Tue, 7 Jun 2016 22:10:16 +0000 (15:10 -0700)]
Merge branch 'rds-packet-assembly-fixes'

Sowmini Varadhan says:

====================
RDS: TCP: socket locking RDS packet assembly fixes

This three part patchset fixes bugs in synchronization between
rds_tcp_accept_one() and the rds-tcp send/recv path.

Patch 1 ensures that the lock_sock() is taken appropriately
and the RDS datagram reassembly state is reset  to synchronize
with the receive path.

Patch 2 ensures that partially sent RDS datagrams will get
retransmitted after rds_tcp_accept_one() switches sockets.

Patch 3 fixes a race window which would prematurely re-enable
rds_send_xmit() before the rds_tcp_connection setup has been
completed in rds_tcp_accept_one().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoRDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()
Sowmini Varadhan [Sat, 4 Jun 2016 21:00:00 +0000 (14:00 -0700)]
RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()

The send path needs to be quiesced before resetting callbacks from
rds_tcp_accept_one(), and commit eb192840266f ("RDS:TCP: Synchronize
rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
this using the c_state and RDS_IN_XMIT bit following the pattern
used by rds_conn_shutdown(). However this leaves the possibility
of a race window as shown in the sequence below
    take t_conn_lock in rds_tcp_conn_connect
    send outgoing syn to peer
    drop t_conn_lock in rds_tcp_conn_connect
    incoming from peer triggers rds_tcp_accept_one, conn is
marked CONNECTING
    wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
    call rds_tcp_reset_callbacks
    [.. race-window where incoming syn-ack can cause the conn
to be marked UP from rds_tcp_state_change ..]
    lock_sock called from rds_tcp_reset_callbacks, and we set
t_sock to null
As soon as the conn is marked UP in the race-window above, rds_send_xmit()
threads will proceed to rds_tcp_xmit and may encounter a null-pointer
deref on the t_sock.

Given that rds_tcp_state_change() is invoked in softirq context, whereas
rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
after lock_sock could result in a deadlock with tcp_sendmsg, this
commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which
will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoRDS: TCP: Retransmit half-sent datagrams when switching sockets in rds_tcp_reset_call...
Sowmini Varadhan [Sat, 4 Jun 2016 20:59:59 +0000 (13:59 -0700)]
RDS: TCP: Retransmit half-sent datagrams when switching sockets in rds_tcp_reset_callbacks

When we switch a connection's sockets in rds_tcp_rest_callbacks,
any partially sent datagram must be retransmitted on the new
socket so that the receiver can correctly reassmble the RDS
datagram. Use rds_send_reset() which is designed for this purpose.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoRDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safely
Sowmini Varadhan [Sat, 4 Jun 2016 20:59:58 +0000 (13:59 -0700)]
RDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safely

When rds_tcp_accept_one() has to replace the existing tcp socket
with a newer tcp socket (duelling-syn resolution), it must lock_sock()
to suppress the rds_tcp_data_recv() path while callbacks are being
changed.  Also, existing RDS datagram reassembly state must be reset,
so that the next datagram on the new socket  does not have corrupted
state. Similarly when resetting the newly accepted socket, appropriate
locks and synchronization is needed.

This commit ensures correct synchronization by invoking
kernel_sock_shutdown to reset a newly accepted sock, and by taking
appropriate lock_sock()s (for old and new sockets) when resetting
existing callbacks.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofq_codel: fix NET_XMIT_CN behavior
Eric Dumazet [Sat, 4 Jun 2016 19:55:13 +0000 (12:55 -0700)]
fq_codel: fix NET_XMIT_CN behavior

My prior attempt to fix the backlogs of parents failed.

If we return NET_XMIT_CN, our parents wont increase their backlog,
so our qdisc_tree_reduce_backlog() should take this into account.

v2: Florian Westphal pointed out that we could drop the packet,
so we need to save qdisc_pkt_len(skb) in a temp variable before
calling fq_codel_drop()

Fixes: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf, trace: use READ_ONCE for retrieving file ptr
Daniel Borkmann [Sat, 4 Jun 2016 18:50:59 +0000 (20:50 +0200)]
bpf, trace: use READ_ONCE for retrieving file ptr

In bpf_perf_event_read() and bpf_perf_event_output(), we must use
READ_ONCE() for fetching the struct file pointer, which could get
updated concurrently, so we must prevent the compiler from potential
refetching.

We already do this with tail calls for fetching the related bpf_prog,
but not so on stored perf events. Semantics for both are the same
with regards to updates.

Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm...
Linus Torvalds [Tue, 7 Jun 2016 17:04:35 +0000 (10:04 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace

Pull userns fixes from Eric Biederman:
 "This contains two small but significant fixes to fs/namespace.c.

  The first adds a filesystem refcount drop on error.  The second
  corrects a test in fs_fully_visible which could be abused to allow
  mounting of proc or sysfs, when that should not be allowed.

  To keep myself honest I have tested to ensure the incorrect test in
  fs_fully_visible actually allows improper mounting of proc before the
  fix and that when fixed the improper mounting is not allowed"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  mnt: fs_fully_visible test the proper mount for MNT_LOCKED
  mnt: If fs_fully_visible fails call put_filesystem.

8 years agoIB/IPoIB: Don't update neigh validity for unresolved entries
Erez Shitrit [Sat, 4 Jun 2016 12:15:19 +0000 (15:15 +0300)]
IB/IPoIB: Don't update neigh validity for unresolved entries

ipoib_neigh_get unconditionally updates the "alive" variable member on
any packet send.  This prevents the neighbor garbage collection from
cleaning out a dead neighbor entry if we are still queueing packets
for it.  If the queue for this neighbor is full, then don't update the
alive timestamp.  That way the neighbor can time out even if packets
are still being queued as long as none of them are being sent.

Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix alternate path code
Achiad Shochat [Sat, 4 Jun 2016 12:15:37 +0000 (15:15 +0300)]
IB/mlx5: Fix alternate path code

Userspace flag IBV_QP_ALT_PATH is supposed to set the alternate path
including fields alt_pkey_index and alt_timeout.
Added IB_QP_PKEY_INDEX and IB_QP_TIMEOUT to the attribute mask when
calling mlx5_set_path for the alternate path to force setting the
alt_pkey_index and alt_timeout values.

Fixes: bf24481a3a7c4 ('IB/mlx5: Consider alternate path in pkey ...')
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix pkey_index length in the QP path record
Noa Osherovich [Sat, 4 Jun 2016 12:15:36 +0000 (15:15 +0300)]
IB/mlx5: Fix pkey_index length in the QP path record

Pkey index fields in the QP context path record are extended to 16
bits, as required by IB spec (version 1.3).
This change affects all QP commands which include path records.

To enable this change, moved the free adaptive routing flag bit
(free_ar) to the most significant byte of the QP path record.

Fixes: e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB ...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix entries check in mlx5_ib_resize_cq
Noa Osherovich [Sat, 4 Jun 2016 12:15:35 +0000 (15:15 +0300)]
IB/mlx5: Fix entries check in mlx5_ib_resize_cq

Verify that number of entries is less than device capability.
Add an appropriate warning message for error flow.

Fixes: bde51583f49b ('IB/mlx5: Add support for resize CQ')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix entries checks in mlx5_ib_create_cq
Noa Osherovich [Sat, 4 Jun 2016 12:15:34 +0000 (15:15 +0300)]
IB/mlx5: Fix entries checks in mlx5_ib_create_cq

Number of entries shouldn't be greater than the device's max
capability. This should be checked before rounding the entries number
to power of two.

Fixes: 51ee86a4af639 ('IB/mlx5: Fix check of number of entries...')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Check BlueFlame HCA support
Noa Osherovich [Sat, 4 Jun 2016 12:15:33 +0000 (15:15 +0300)]
IB/mlx5: Check BlueFlame HCA support

BlueFlame support is reported only for PFs when the HCA capability is
on.

Fixes: 938fe83c8dcbb ('net/mlx5_core: New device capabilities...')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix returned values of query QP
Noa Osherovich [Sat, 4 Jun 2016 12:15:32 +0000 (15:15 +0300)]
IB/mlx5: Fix returned values of query QP

Some variables were not initialized properly: max_recv_wr,
max_recv_sge, max_send_wr, qp_context and max_inline_data.

Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Limit query HCA clock
Noa Osherovich [Sat, 4 Jun 2016 12:15:31 +0000 (15:15 +0300)]
IB/mlx5: Limit query HCA clock

When PAGE_SIZE is larger than 4K, the user shouldn't be able to query
the HCA core clock. This counter is within 4KB boundary and the
user-space shall not read information that's after this boundary.

Fixes: b368d7cb8ceb7 ('IB/mlx5: Add hca_core_clock_offset to...')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix FW version diaplay in sysfs
Eran Ben Elisha [Sat, 4 Jun 2016 12:15:30 +0000 (15:15 +0300)]
IB/mlx5: Fix FW version diaplay in sysfs

Add a 4-digit padding to show FW version in proper format.

Fixes: 9603b61de1eee ('mlx5: Move pci device handling from...')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Return PORT_ERR in Active to Initializing tranisition
Noa Osherovich [Sat, 4 Jun 2016 12:15:29 +0000 (15:15 +0300)]
IB/mlx5: Return PORT_ERR in Active to Initializing tranisition

FW port-change events are fired on Active <-> non Active port state
transitions only.
When the port state changes from Active to Initializing (Active ->
Down -> Initializing), a single event is fired.
The HCA transitions from Down to Initializing unless prevented from
doing so, hence the driver should also propagate events when the port
state is Initializing to consumers so they'll be aware that the port
is no longer Active and act accordingly.

Fixes: e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Set flow steering capability bit
Maor Gottlieb [Sat, 4 Jun 2016 12:15:28 +0000 (15:15 +0300)]
IB/mlx5: Set flow steering capability bit

Flow steering is supported by mlx5 device when the following
features are supported by firmware:

1. NIC RX flow table.
2. Device has enough flow steering levels.
3. Atomic modification of flow table entry.
4. Flow tables chaining.

To check if flow steering is supported it's enough to check
if the driver opened the mlx5 bypass namespace.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Make all casts in ib_device_cap_flags enum consistent
Max Gurtovoy [Mon, 6 Jun 2016 16:34:40 +0000 (19:34 +0300)]
IB/core: Make all casts in ib_device_cap_flags enum consistent

Replace the few u64 casts with ULL to match the rest of the casts.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Fix bit curruption in ib_device_cap_flags structure
Max Gurtovoy [Mon, 6 Jun 2016 16:34:39 +0000 (19:34 +0300)]
IB/core: Fix bit curruption in ib_device_cap_flags structure

ib_device_cap_flags 64-bit expansion caused caps overlapping
and made consumers read wrong device capabilities. For example
IB_DEVICE_SG_GAPS_REG was falsely read by the iser driver causing
it to use a non-existing capability. This happened because signed
int becomes sign extended when converted it to u64. Fix this by
casting IB_DEVICE_ON_DEMAND_PAGING enumeration to ULL.

Fixes: f5aa9159a418 ('IB/core: Add arbitrary sg_list support')
Reported-by: Robert LeBlanc <robert@leblancnet.us>
Cc: Stable <stable@vger.kernel.org> #[v4.6+]
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Initialize sysfs attributes before sysfs create group
Mark Bloch [Sat, 4 Jun 2016 12:15:24 +0000 (15:15 +0300)]
IB/core: Initialize sysfs attributes before sysfs create group

For dynamically allocated sysfs attributes there is a need to call
sysfs_attr_init in order to comply with lockdep, not calling it
will result in error complaining key is not in .data section.

Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/IPoIB: Disable bottom half when dealing with device address
Mark Bloch [Sat, 4 Jun 2016 12:15:22 +0000 (15:15 +0300)]
IB/IPoIB: Disable bottom half when dealing with device address

Align locking usage when touching device address with rest
of the kernel. Lock the bottom half when doing so using
netif_addr_lock_bh.

This also solves the following case as reported by lockdep:
CPU0                    CPU1
----                    ----
lock(_xmit_INFINIBAND);
local_irq_disable();
lock(&(&mc->mca_lock)->rlock);
lock(_xmit_INFINIBAND);
<Interrupt>
lock(&(&mc->mca_lock)->rlock);

*** DEADLOCK ***

Fixes: 492a7e67ff83 ("IB/IPoIB: Allow setting the device address")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Fix removal of default GID cache entry
Aviv Heller [Sat, 4 Jun 2016 12:15:21 +0000 (15:15 +0300)]
IB/core: Fix removal of default GID cache entry

When deleting a default GID from the cache, its gid_type field is set
to 0.

This could set the gid_type to RoCE v1 for a RoCE v2 default GID,
essentially making it inaccessible to future modifications, since it
is no longer found by find_gid().

This fix preserves the gid_type value for default gids during cache
operations.

Fixes: b39ffa1df505 ('IB/core: Add gid_type to gid attribute')
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/IPoIB: Fix race between ipoib_remove_one to sysfs functions
Erez Shitrit [Sat, 4 Jun 2016 12:15:20 +0000 (15:15 +0300)]
IB/IPoIB: Fix race between ipoib_remove_one to sysfs functions

In ipoib_remove_one the driver holds the rtnl_lock and tries to do some
operation like dev_change_flags or unregister_netdev, while sysfs
callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the
rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not
free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to
free its sysfs directory, and we will get  a->b, b->a deadlock.

    Trace like the following:

        schedule+0x37/0x80
        schedule_preempt_disabled+0xe/0x10
        __mutex_lock_slowpath+0xb5/0x120
        mutex_lock+0x23/0x40
        rtnl_lock+0x15/0x20
        netdev_run_todo+0x17c/0x320
        rtnl_unlock+0xe/0x10
        ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib]
        delete_child+0x54/0x80 [ib_ipoib]
        dev_attr_store+0x18/0x30
        sysfs_kf_write+0x37/0x40
        mutex_lock+0x16/0x40
        SyS_write+0x55/0xc0
        entry_SYSCALL_64_fastpath+0x16/0x75
    And
        schedule+0x37/0x80
        __kernfs_remove+0x1a8/0x260
        ? wake_atomic_t_function+0x60/0x60
        kernfs_remove+0x25/0x40
        sysfs_remove_dir+0x50/0x80
        kobject_del+0x18/0x50
        device_del+0x19f/0x260
        netdev_unregister_kobject+0x6a/0x80
        rollback_registered_many+0x1fd/0x340
        rollback_registered+0x3c/0x70
        unregister_netdevice_queue+0x55/0xc0
        unregister_netdev+0x20/0x30
        ipoib_remove_one+0x114/0x1b0 [ib_ipoib]
        ib_unregister_client+0x4a/0x170 [ib_core]
        ? find_module_all+0x71/0xa0
        ipoib_cleanup_module+0x10/0x94 [ib_ipoib]
        SyS_delete_module+0x1b5/0x210
        entry_SYSCALL_64_fastpath+0x16/0x75

The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to
get out from the sysfs function.

Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Fix query port failure in RoCE
Eli Cohen [Sat, 4 Jun 2016 12:15:18 +0000 (15:15 +0300)]
IB/core: Fix query port failure in RoCE

Currently ib_query_port always attempts to to read the subnet prefix by
calling ib_query_gid(). For RoCE/iWARP there is no subnet manager and no
subnet prefix. Fix this by querying GID[0] only for IB networks.

Fixes: fad61ad4e755 ('IB/core: Add subnet prefix to port info')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: fix error unwind in sysfs hw counters code
Doug Ledford [Tue, 7 Jun 2016 11:43:46 +0000 (07:43 -0400)]
IB/core: fix error unwind in sysfs hw counters code

Between the initial and final versions of the function setup_hw_stats,
the order of variable initialization was changed.  However, the unwind
flow on error did not properly keep up with the flow changes.  Make
the unwind flow match a proper unwind of the allocation flow, then
remove no longer needed variable initializations.

Fixes: b40f4757daa1 (IB/core: Make device counter infrastructure
dynamic)
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Fix array length allocation
Doug Ledford [Mon, 6 Jun 2016 23:52:55 +0000 (19:52 -0400)]
IB/core: Fix array length allocation

The new sysfs hw_counters code had an off by one in its array allocation
length.  Fix that and the comment along with it.

Reported-by: Mark Bloch <markb@mellanox.com>
Fixes: b40f4757daa1 (IB/core: Make device counter infrastructure
dynamic)
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoALSA: hda/realtek: Add T560 docking unit fixup
Torsten Hilbrich [Tue, 7 Jun 2016 11:14:21 +0000 (13:14 +0200)]
ALSA: hda/realtek: Add T560 docking unit fixup

Tested with Lenovo Ultradock. Fixes the non-working headphone jack on
the docking unit.

Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
8 years agomnt: fs_fully_visible test the proper mount for MNT_LOCKED
Eric W. Biederman [Fri, 27 May 2016 19:50:05 +0000 (14:50 -0500)]
mnt: fs_fully_visible test the proper mount for MNT_LOCKED

MNT_LOCKED implies on a child mount implies the child is locked to the
parent.  So while looping through the children the children should be
tested (not their parent).

Typically an unshare of a mount namespace locks all mounts together
making both the parent and the slave as locked but there are a few
corner cases where other things work.

Cc: stable@vger.kernel.org
Fixes: ceeb0e5d39fc ("vfs: Ignore unlocked mounts in fs_fully_visible")
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
8 years agomnt: If fs_fully_visible fails call put_filesystem.
Eric W. Biederman [Mon, 6 Jun 2016 20:36:07 +0000 (15:36 -0500)]
mnt: If fs_fully_visible fails call put_filesystem.

Add this trivial missing error handling.

Cc: stable@vger.kernel.org
Fixes: 1b852bceb0d1 ("mnt: Refactor the logic for mounting sysfs and proc in a user namespace")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
8 years agonet_sched: keep backlog updated with qlen
WANG Cong [Fri, 3 Jun 2016 22:05:57 +0000 (15:05 -0700)]
net_sched: keep backlog updated with qlen

For gso_skb we only update qlen, backlog should be updated too.

Note, it is correct to just update these stats at one layer,
because the gso_skb is cached there.

Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoACPI / EC: Fix a boot EC regresion by restoring boot EC support for the DSDT EC
Lv Zheng [Fri, 3 Jun 2016 02:26:12 +0000 (10:26 +0800)]
ACPI / EC: Fix a boot EC regresion by restoring boot EC support for the DSDT EC

According to the Windows probing result, during the table loading, the EC
device described in the ECDT should be used. And the ECDT EC is also
effective during the period the namespace objects are initialized (we can
see a separate process executing _STA/_INI on Windows before executing
other device specific control methods, for example, EC._REG). During the
device enumration, the EC device described in the DSDT should be used. But
there are differences between Linux and Windows around the device probing
order. Thus in Linux, we should enable the DSDT EC as early as possible
before enumerating devices in order not to trigger issues related to the
device enumeration order differences.

This patch thus converts acpi_boot_ec_enable() into acpi_ec_dsdt_probe() to
fix the gap. This also fixes a user reported regression triggered after we
switched the "table loading"/"ECDT support" to be ACPI spec 2.0 compliant.

Fixes: 59f0aa9480cf (ACPI 2.0 / ECDT: Remove early namespace reference from EC)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=119261
Reported-and-tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
8 years agoIB/hfi1: Suppress sparse warnings
Bart Van Assche [Fri, 3 Jun 2016 19:11:16 +0000 (12:11 -0700)]
IB/hfi1: Suppress sparse warnings

Avoid that sparse reports the following warnings for the hfi1 driver:

trace.c:217:13: warning: no previous prototype for ‘print_u64_array’ [-Wmissing-prototypes]
user_sdma.c:1361:17: warning: dubious: !x & y

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Use bit 0 instead of bit 1
Bart Van Assche [Fri, 3 Jun 2016 19:10:37 +0000 (12:10 -0700)]
IB/hfi1: Use bit 0 instead of bit 1

The first argument of test_bit() and clear_bit() is a bit number and
not a bitmask. Hence change that first argument from (1 << 0) into 0.
This patch avoids that smatch reports the following warnings:

user_sdma.c:1059: sdma_cache_evict() warn: test_bit() takes a bit number
user_sdma.c:1590: sdma_rb_remove() warn: test_bit() takes a bit number

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Fix indentation
Bart Van Assche [Fri, 3 Jun 2016 19:09:56 +0000 (12:09 -0700)]
IB/hfi1: Fix indentation

Make the indentation of the source code consistent. Detected by
smatch.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rdmavt: Annotate rvt_reset_qp()
Bart Van Assche [Fri, 3 Jun 2016 19:09:16 +0000 (12:09 -0700)]
IB/rdmavt: Annotate rvt_reset_qp()

This patch avoids that sparse reports the following warning:

rdmavt/qp.c:507:17: warning: context imbalance in 'rvt_reset_qp' - unexpected unlock

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mad: Fix indentation
Bart Van Assche [Fri, 3 Jun 2016 19:08:44 +0000 (12:08 -0700)]
IB/mad: Fix indentation

Make indentation consistent. Detected by smatch.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hal Rosenstock <hal@mellanox.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-By: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/core: Fix indentation
Bart Van Assche [Fri, 3 Jun 2016 19:08:00 +0000 (12:08 -0700)]
RDMA/core: Fix indentation

Make indentation consistent. Detected by smatch.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagi@gimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Fix srp_map_sg_dma()
Bart Van Assche [Fri, 3 Jun 2016 18:40:24 +0000 (11:40 -0700)]
IB/srp: Fix srp_map_sg_dma()

Because patch "IB/srp: Move common code into the caller" was applied
partially srp_map_sg_dma() doesn't work properly. Fix this by
applying the remainder of that patch. See also
http://thread.gmane.org/gmane.linux.drivers.rdma/35803/focus=35811.

Fixes: 3849e44d1c4b ("IB/srp: Move common code into the caller")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Sagi Grimberg <sai@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Always initialize use_fast_reg and use_fmr
Bart Van Assche [Fri, 3 Jun 2016 18:39:35 +0000 (11:39 -0700)]
IB/srp: Always initialize use_fast_reg and use_fmr

Avoid that mapping fails due to use_fast_reg != 0 or use_fmr != 0
if both member variables should be zero (if never_register == 1 or
if neither FMR nor FR is supported). Remove an initialization that
became superfluous due to changing a kmalloc() into a kzalloc()
call.

Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
Cc: Sagi Grimberg <sai@grimberg.m>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Fix device managed flow steering support test
Bart Van Assche [Fri, 3 Jun 2016 14:58:32 +0000 (07:58 -0700)]
IB/mlx4: Fix device managed flow steering support test

Perform the test for device managed flow steering support even if
memory windows are not supported. I noticed this because smatch
reported inconsistent indentation for the device managed flow
steering support test.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/usnic: Remove unused DMA attributes
Krzysztof Kozlowski [Thu, 2 Jun 2016 09:45:01 +0000 (11:45 +0200)]
IB/usnic: Remove unused DMA attributes

The DMA attributes are set but never used.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: fix null pointer deref and mem leak in error handling
Colin Ian King [Wed, 1 Jun 2016 18:06:36 +0000 (19:06 +0100)]
IB/core: fix null pointer deref and mem leak in error handling

The current error handling in setup_hw_stats has a couple of issues.
It is possible to generate a null pointer deference on the
kfree of hsag->attrs[i] because two of the early error exit paths
jump to the kfree when hsags NULL and not allocated. Fix this by
moving the kfree on stats and jumping to that, avoiding the hsag
freeing.

Secondly, there is a memory leak of stats if the hsag allocation
fails; instead of returning, jump to the kfree on stats.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: fix an error code in ib_core_init()
Dan Carpenter [Tue, 31 May 2016 16:05:56 +0000 (19:05 +0300)]
IB/core: fix an error code in ib_core_init()

We should return the error code if ib_add_ibnl_clients() fails.  The
current code returns success.

Fixes: 735c631ae99d ('IB/core: Register SA ibnl client during ib_core initialization')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Avoid large frame size warning
Leon Romanovsky [Tue, 31 May 2016 07:54:36 +0000 (10:54 +0300)]
IB/hfi1: Avoid large frame size warning

When CONFIG_FRAME_WARN is set to 1024 bytes, which is useful to find
stack consumers, we get a warning in hfi1 driver.

drivers/infiniband/hw/hfi1/affinity.c: In function
‘hfi1_get_proc_affinity’:
drivers/infiniband/hw/hfi1/affinity.c:415:1: warning: the frame size of
1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]

This change removes unneeded buf[1024] declaration and usage.

Fixes: f48ad614c100 ("IB/hfi1: Move driver out of staging")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: fix some indenting
Dan Carpenter [Sat, 28 May 2016 05:01:20 +0000 (08:01 +0300)]
IB/hfi1: fix some indenting

That extra tabs are misleading.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cm: Fix a recently introduced locking bug
Bart Van Assche [Fri, 25 Mar 2016 15:33:16 +0000 (08:33 -0700)]
IB/cm: Fix a recently introduced locking bug

ib_cm_notify() can be called from interrupt context. Hence do not
reenable interrupts unconditionally in cm_establish().

This patch avoids that lockdep reports the following warning:

WARNING: CPU: 0 PID: 23317 at kernel/locking/lockdep.c:2624 trace _hardirqs_on_caller+0x112/0x1b0
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
Call Trace:
 <IRQ>  [<ffffffff812bd0e5>] dump_stack+0x67/0x92
 [<ffffffff81056f21>] __warn+0xc1/0xe0
 [<ffffffff81056f8a>] warn_slowpath_fmt+0x4a/0x50
 [<ffffffff810a5932>] trace_hardirqs_on_caller+0x112/0x1b0
 [<ffffffff810a59dd>] trace_hardirqs_on+0xd/0x10
 [<ffffffff815992c7>] _raw_spin_unlock_irq+0x27/0x40
 [<ffffffffa0382e9c>] ib_cm_notify+0x25c/0x290 [ib_cm]
 [<ffffffffa068fbc1>] srpt_qp_event+0xa1/0xf0 [ib_srpt]
 [<ffffffffa04efb97>] mlx4_ib_qp_event+0x67/0xd0 [mlx4_ib]
 [<ffffffffa034ec0a>] mlx4_qp_event+0x5a/0xc0 [mlx4_core]
 [<ffffffffa03365f8>] mlx4_eq_int+0x3d8/0xcf0 [mlx4_core]
 [<ffffffffa0336f9c>] mlx4_msi_x_interrupt+0xc/0x20 [mlx4_core]
 [<ffffffff810b0914>] handle_irq_event_percpu+0x64/0x100
 [<ffffffff810b09e4>] handle_irq_event+0x34/0x60
 [<ffffffff810b3a6a>] handle_edge_irq+0x6a/0x150
 [<ffffffff8101ad05>] handle_irq+0x15/0x20
 [<ffffffff8101a66c>] do_IRQ+0x5c/0x110
 [<ffffffff8159a2c9>] common_interrupt+0x89/0x89
 [<ffffffff81297a17>] blk_run_queue_async+0x37/0x40
 [<ffffffffa0163e53>] rq_completed+0x43/0x70 [dm_mod]
 [<ffffffffa0164896>] dm_softirq_done+0x176/0x280 [dm_mod]
 [<ffffffff812a26c2>] blk_done_softirq+0x52/0x90
 [<ffffffff8105bc1f>] __do_softirq+0x10f/0x230
 [<ffffffff8105bec8>] irq_exit+0xa8/0xb0
 [<ffffffff8103653e>] smp_trace_call_function_single_interrupt+0x2e/0x30
 [<ffffffff81036549>] smp_call_function_single_interrupt+0x9/0x10
 [<ffffffff8159a959>] call_function_single_interrupt+0x89/0x90
 <EOI>

Fixes: commit be4b499323bf (IB/cm: Do not queue work to a device that's going away)
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Nikolay Borisov <kernel@kyup.com>
Cc: stable <stable@vger.kernel.org> # v4.2+
Acked-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosoreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF
Helge Deller [Fri, 3 Jun 2016 21:49:17 +0000 (23:49 +0200)]
soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF

Commit 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
missed to add the compat case for the SO_ATTACH_REUSEPORT_CBPF option.

Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosoreuseport: Fix reuseport_bpf testcase on 32bit architectures
Helge Deller [Fri, 3 Jun 2016 17:19:20 +0000 (19:19 +0200)]
soreuseport: Fix reuseport_bpf testcase on 32bit architectures

This fixes the following compiler warnings when compiling the
reuseport_bpf testcase on a 32 bit platform:

reuseport_bpf.c: In function ‘attach_ebpf’:
reuseport_bpf.c:114:15: warning: cast from pointer to integer of ifferent size [-Wpointer-to-int-cast]

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodrm/nouveau/disp/sor/gm107: training pattern registers are like gm200
Ben Skeggs [Fri, 3 Jun 2016 05:05:52 +0000 (15:05 +1000)]
drm/nouveau/disp/sor/gm107: training pattern registers are like gm200

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
8 years agodrm/nouveau/disp/sor/gf119: both links use the same training register
Ben Skeggs [Fri, 3 Jun 2016 04:37:40 +0000 (14:37 +1000)]
drm/nouveau/disp/sor/gf119: both links use the same training register

It appears that, for whatever reason, both link A and B use the same
register to control the training pattern.  It's a little odd, as the
GPUs before this (Tesla/Fermi1) have per-link registers, as do newer
GPUs (Maxwell).

Fixes the third DP output on NVS 510 (GK107).

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
8 years agodrm/vc4: Make pageflip completion handling more robust.
Mario Kleiner [Wed, 18 May 2016 12:02:46 +0000 (14:02 +0200)]
drm/vc4: Make pageflip completion handling more robust.

Protect both the setup of the pageflip event and the
latching of the new requested displaylist head pointer
by the event lock, so we can't get into a situation
where vc4_atomic_flush latches the new display list via
HVS_WRITE, then immediately gets preempted before queueing
the pageflip event, then the page-flip completes in hw and
the vc4_crtc_handle_page_flip() runs and no-ops due to
lack of a pending pageflip event, then vc4_atomic_flush
continues and only then queues the pageflip event - after
the page flip handling already no-oped. This would cause
flip completion handling only at the next vblank - one
frame too late.

In vc4_crtc_handle_page_flip() check the actual DL head
pointer in SCALER_DISPLACTX against the requested pointer
for page flip to make sure that the flip actually really
completed in the current vblank and doesn't get deferred
to the next one because the DL head pointer was written
a bit too late into SCALER_DISPLISTX, after start of
vblank, and missed the boat. This avoids handling a
pageflip completion too early - one frame too early.

According to Eric, DL head pointer updates which were
written into the HVS DISPLISTX reg get committed to hardware
at the last pixel of active scanout. Our vblank interrupt
handler, as triggered by PV_INT_VFP_START irq, gets to run
earliest at the first pixel of HBLANK at the end of the
last scanline of active scanout, ie. vblank irq handling
runs at least 1 pixel duration after a potential pageflip
completion happened in hardware.

This ordering of events in the hardware, together with the
lock protection and SCALER_DISPLACTX sampling of this patch,
guarantees that pageflip completion handling only runs at
exactly the vblank irq of actual pageflip completion in all
cases.

Background info from Eric about the relative timing of
HVS, PV's and trigger points for interrupts, DL updates:

https://lists.freedesktop.org/archives/dri-devel/2016-May/107510.html

Tested on RPi 2B with hardware timing measurement equipment
and shown to no longer complete flips too early or too late.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
8 years agodrm/vc4: Fix ioctl permissions for render nodes.
Herve Jourdain [Tue, 31 May 2016 18:24:46 +0000 (02:24 +0800)]
drm/vc4: Fix ioctl permissions for render nodes.

Contrary to other flags to DRM_IOCTL_DEF_DRV(), which restrict usage,
the flag for render node is an enabler (the IOCTL can't be used from
render node if it's not present).  So DRM_RENDER_ALLOW needs to be
added to all the flags that were previously 0.

Signed-off-by: Herve Jourdain <herve.jourdain@neuf.fr>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes: 0cd3e2747662 ("drm/vc4: Add missing render node support")

8 years agoMerge tag 'edac_fixes_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Linus Torvalds [Mon, 6 Jun 2016 16:55:31 +0000 (09:55 -0700)]
Merge tag 'edac_fixes_for_4.7' of git://git./linux/kernel/git/bp/bp

Pull EDAC fixes from Borislav Petkov:
 "EDAC fixes to recent fallout from workqueue cleanup and Broadwell
  enablement:

   - sb_edac fallout fixes from recent Broadwell enablement (Tony Luck)

   - EDAC workqueue poll period resetting fix (Nicholas Krause)"

* tag 'edac_fixes_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, sb_edac: Readd accidentally dropped Broadwell-D support
  EDAC: Fix workqueues poll period resetting
  EDAC, sb_edac: Fix rank lookup on Broadwell

8 years agox86/msr: Use the proper trace point conditional for writes
Dr. David Alan Gilbert [Fri, 3 Jun 2016 18:00:59 +0000 (19:00 +0100)]
x86/msr: Use the proper trace point conditional for writes

The msr tracing for writes is incorrectly conditional on the read trace.

Fixes: 7f47d8cc039f "x86, tracing, perf: Add trace point for MSR accesses"
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: stable@vger.kernel.org
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1464976859-21850-1-git-send-email-dgilbert@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agobnx2x: allow adding VLANs while interface is down
Michal Schmidt [Fri, 3 Jun 2016 13:32:18 +0000 (15:32 +0200)]
bnx2x: allow adding VLANs while interface is down

Since implementing VLAN filtering in commit 05cc5a39ddb74
("bnx2x: add vlan filtering offload") bnx2x refuses to add a VLAN while
the interface is down:

  # ip link add link enp3s0f0 enp3s0f0_10 type vlan id 10
  RTNETLINK answers: Bad address

and in dmesg (with bnx2x.debug=0x20):
  bnx2x: [bnx2x_vlan_rx_add_vid:12941(enp3s0f0)]Ignoring VLAN
  configuration the interface is down

Other drivers have no problem with this.
Fix this peculiar behavior in the following way:
 - Accept requests to add/kill VID regardless of the device state.
   Maintain the requested list of VIDs in the bp->vlan_reg list.
 - If the device is up, try to configure the VID list into the hardware.
   If we run out of VLAN credits or encounter a failure configuring an
   entry, fall back to accepting all VLANs.
   If we successfully configure all entries from the list, turn the
   fallback off.
 - Use the same code for reconfiguring VLANs during NIC load.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>