cascardo/linux.git
7 years agoblk-mq: fix deadlock in blk_mq_register_disk() error path
Jens Axboe [Tue, 2 Aug 2016 14:45:44 +0000 (08:45 -0600)]
blk-mq: fix deadlock in blk_mq_register_disk() error path

If we fail registering any of the hardware queues, we call
into blk_mq_unregister_disk() with the hotplug mutex already
held. Since blk_mq_unregister_disk() attempts to acquire the
same mutex, we end up in a less than happy place.

Reported-by: Jinpu Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoInclude: blkdev: Removed duplicate 'struct request;' declaration.
John Pittman [Mon, 1 Aug 2016 20:35:53 +0000 (16:35 -0400)]
Include: blkdev: Removed duplicate 'struct request;' declaration.

In include/linux/blkdev.h duplicate declarations of the request
struct exist.  Cleaned up by removing the second, unneeded
declaration.

Signed-off-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoFixup direct bi_rw modifiers
Shaun Tancheff [Sat, 30 Jul 2016 21:45:48 +0000 (16:45 -0500)]
Fixup direct bi_rw modifiers

bi_rw should be using bio_set_op_attrs to set bi_rw.

Signed-off-by: Shaun Tancheff <shaun@tancheff.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblock: fix bdi vs gendisk lifetime mismatch
Dan Williams [Sun, 31 Jul 2016 18:15:13 +0000 (11:15 -0700)]
block: fix bdi vs gendisk lifetime mismatch

The name for a bdi of a gendisk is derived from the gendisk's devt.
However, since the gendisk is destroyed before the bdi it leaves a
window where a new gendisk could dynamically reuse the same devt while a
bdi with the same name is still live.  Arrange for the bdi to hold a
reference against its "owner" disk device while it is registered.
Otherwise we can hit sysfs duplicate name collisions like the following:

 WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
 sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'

 Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
  0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
  ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
  0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
 Call Trace:
  [<ffffffff8134caec>] dump_stack+0x63/0x87
  [<ffffffff8108c351>] __warn+0xd1/0xf0
  [<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80
  [<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90
  [<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320
  [<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0
  [<ffffffff8134ff55>] kobject_add+0x75/0xd0
  [<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f
  [<ffffffff8148b0a5>] device_add+0x125/0x610
  [<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100
  [<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20
  [<ffffffff811b775c>] bdi_register+0x8c/0x180
  [<ffffffff811b7877>] bdi_register_dev+0x27/0x30
  [<ffffffff813317f5>] add_disk+0x175/0x4a0

Cc: <stable@vger.kernel.org>
Reported-by: Yi Zhang <yizhan@redhat.com>
Tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fixed up missing 0 return in bdi_register_owner().

Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblk-mq: Allow timeouts to run while queue is freezing
Gabriel Krisman Bertazi [Mon, 1 Aug 2016 14:23:39 +0000 (08:23 -0600)]
blk-mq: Allow timeouts to run while queue is freezing

In case a submitted request gets stuck for some reason, the block layer
can prevent the request starvation by starting the scheduled timeout work.
If this stuck request occurs at the same time another thread has started
a queue freeze, the blk_mq_timeout_work will not be able to acquire the
queue reference and will return silently, thus not issuing the timeout.
But since the request is already holding a q_usage_counter reference and
is unable to complete, it will never release its reference, preventing
the queue from completing the freeze started by first thread.  This puts
the request_queue in a hung state, forever waiting for the freeze
completion.

This was observed while running IO to a NVMe device at the same time we
toggled the CPU hotplug code. Eventually, once a request got stuck
requiring a timeout during a queue freeze, we saw the CPU Hotplug
notification code get stuck inside blk_mq_freeze_queue_wait, as shown in
the trace below.

[c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable)
[c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350
[c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990
[c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0
[c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110
[c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0
[c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100
[c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0
[c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400
[c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0
[c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50
[c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140
[c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0
[c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0
[c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0
[c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200
[c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0
[c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230
[c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110
[c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4

The fix is to allow the timeout work to execute in the window between
dropping the initial refcount reference and the release of the last
reference, which actually marks the freeze completion.  This can be
achieved with percpu_refcount_tryget, which does not require the counter
to be alive.  This way the timeout work can do it's job and terminate a
stuck request even during a freeze, returning its reference and avoiding
the deadlock.

Allowing the timeout to run is just a part of the fix, since for some
devices, we might get stuck again inside the device driver's timeout
handler, should it attempt to allocate a new request in that path -
which is a quite common action for Abort commands, which need to be sent
after a timeout.  In NVMe, for instance, we call blk_mq_alloc_request
from inside the timeout handler, which will fail during a freeze, since
it also tries to acquire a queue reference.

I considered a similar change to blk_mq_alloc_request as a generic
solution for further device driver hangs, but we can't do that, since it
would allow new requests to disturb the freeze process.  I thought about
creating a new function in the block layer to support unfreezable
requests for these occasions, but after working on it for a while, I
feel like this should be handled in a per-driver basis.  I'm now
experimenting with changes to the NVMe timeout path, but I'm open to
suggestions of ways to make this generic.

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: linux-nvme@lists.infradead.org
Cc: linux-block@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agonbd: fix race in ioctl
Vegard Nossum [Fri, 27 May 2016 10:59:35 +0000 (12:59 +0200)]
nbd: fix race in ioctl

Quentin ran into this bug:

WARNING: CPU: 64 PID: 10085 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x65/0x80
sysfs: cannot create duplicate filename '/devices/virtual/block/nbd3/pid'
Modules linked in: nbd
CPU: 64 PID: 10085 Comm: qemu-nbd Tainted: G      D         4.6.0+ #7
 0000000000000000 ffff8820330bba68 ffffffff814b8791 ffff8820330bbac8
 0000000000000000 ffff8820330bbab8 ffffffff810d04ab ffff8820330bbaa8
 0000001f00000296 0000000000017681 ffff8810380bf000 ffffffffa0001790
Call Trace:
 [<ffffffff814b8791>] dump_stack+0x4d/0x6c
 [<ffffffff810d04ab>] __warn+0xdb/0x100
 [<ffffffff810d0574>] warn_slowpath_fmt+0x44/0x50
 [<ffffffff81218c65>] sysfs_warn_dup+0x65/0x80
 [<ffffffff81218a02>] sysfs_add_file_mode_ns+0x172/0x180
 [<ffffffff81218a35>] sysfs_create_file_ns+0x25/0x30
 [<ffffffff81594a76>] device_create_file+0x36/0x90
 [<ffffffffa0000e8d>] __nbd_ioctl+0x32d/0x9b0 [nbd]
 [<ffffffff814cc8e8>] ? find_next_bit+0x18/0x20
 [<ffffffff810f7c29>] ? select_idle_sibling+0xe9/0x120
 [<ffffffff810f6cd7>] ? __enqueue_entity+0x67/0x70
 [<ffffffff810f9bf0>] ? enqueue_task_fair+0x630/0xe20
 [<ffffffff810efa76>] ? resched_curr+0x36/0x70
 [<ffffffff810f0078>] ? check_preempt_curr+0x78/0x90
 [<ffffffff810f00a2>] ? ttwu_do_wakeup+0x12/0x80
 [<ffffffff810f01b1>] ? ttwu_do_activate.constprop.86+0x61/0x70
 [<ffffffff810f0c15>] ? try_to_wake_up+0x185/0x2d0
 [<ffffffff810f0d6d>] ? default_wake_function+0xd/0x10
 [<ffffffff81105471>] ? autoremove_wake_function+0x11/0x40
 [<ffffffffa0001577>] nbd_ioctl+0x67/0x94 [nbd]
 [<ffffffff814ac0fd>] blkdev_ioctl+0x14d/0x940
 [<ffffffff811b0da2>] ? put_pipe_info+0x22/0x60
 [<ffffffff811d96cc>] block_ioctl+0x3c/0x40
 [<ffffffff811ba08d>] do_vfs_ioctl+0x8d/0x5e0
 [<ffffffff811aa329>] ? ____fput+0x9/0x10
 [<ffffffff810e9092>] ? task_work_run+0x72/0x90
 [<ffffffff811ba627>] SyS_ioctl+0x47/0x80
 [<ffffffff8185f5df>] entry_SYSCALL_64_fastpath+0x17/0x93
---[ end trace 7899b295e4f850c8 ]---

It seems fairly obvious that device_create_file() is not being protected
from being run concurrently on the same nbd.

Quentin found the following relevant commits:

1a2ad21 nbd: add locking to nbd_ioctl
90b8f28 [PATCH] end of methods switch: remove the old ones
d4430d6 [PATCH] beginning of methods conversion
08f8585 [PATCH] move block_device_operations to blkdev.h

It would seem that the race was introduced in the process of moving nbd
from BKL to unlocked ioctls.

By setting nbd->task_recv while the mutex is held, we can prevent other
processes from running concurrently (since nbd->task_recv is also checked
while the mutex is held).

Reported-and-tested-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Pavel Machek <pavel@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblock: fix use-after-free in seq file
Vegard Nossum [Fri, 29 Jul 2016 08:40:31 +0000 (10:40 +0200)]
block: fix use-after-free in seq file

I got a KASAN report of use-after-free:

    ==================================================================
    BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
    Read of size 8 by task trinity-c1/315
    =============================================================================
    BUG kmalloc-32 (Not tainted): kasan: bad access detected
    -----------------------------------------------------------------------------

    Disabling lock debugging due to kernel taint
    INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
            ___slab_alloc+0x4f1/0x520
            __slab_alloc.isra.58+0x56/0x80
            kmem_cache_alloc_trace+0x260/0x2a0
            disk_seqf_start+0x66/0x110
            traverse+0x176/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a
    INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
            __slab_free+0x17a/0x2c0
            kfree+0x20a/0x220
            disk_seqf_stop+0x42/0x50
            traverse+0x3b5/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a

    CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G    B           4.7.0+ #62
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
     ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
     ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
     ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
    Call Trace:
     [<ffffffff81d6ce81>] dump_stack+0x65/0x84
     [<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
     [<ffffffff814704ff>] object_err+0x2f/0x40
     [<ffffffff814754d1>] kasan_report_error+0x221/0x520
     [<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
     [<ffffffff83888161>] klist_iter_exit+0x61/0x70
     [<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
     [<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
     [<ffffffff8151f812>] seq_read+0x4b2/0x11a0
     [<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
     [<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
     [<ffffffff814b4c45>] do_readv_writev+0x565/0x660
     [<ffffffff814b8a17>] vfs_readv+0x67/0xa0
     [<ffffffff814b8de6>] do_preadv+0x126/0x170
     [<ffffffff814b92ec>] SyS_preadv+0xc/0x10

This problem can occur in the following situation:

open()
 - pread()
    - .seq_start()
       - iter = kmalloc() // succeeds
       - seqf->private = iter
    - .seq_stop()
       - kfree(seqf->private)
 - pread()
    - .seq_start()
       - iter = kmalloc() // fails
    - .seq_stop()
       - class_dev_iter_exit(seqf->private) // boom! old pointer

As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.

An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.

Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agof2fs: drop bio->bi_rw manual assignment
Jens Axboe [Wed, 27 Jul 2016 18:52:21 +0000 (12:52 -0600)]
f2fs: drop bio->bi_rw manual assignment

Merge 4fc29c1aa375 included this extra line, but it's not needed (or
useful) since we'll bio_set_op_attrs() right after to properly set
the op and flags for the bio.

Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblock: add missing group association in bio-cloning functions
Paolo Valente [Wed, 27 Jul 2016 05:22:05 +0000 (07:22 +0200)]
block: add missing group association in bio-cloning functions

When a bio is cloned, the newly created bio must be associated with
the same blkcg as the original bio (if BLK_CGROUP is enabled). If
this operation is not performed, then the new bio is not associated
with any group, and the group of the current task is returned when
the group of the bio is requested.

Depending on the cloning frequency, this may cause a large
percentage of the bios belonging to a given group to be treated
as if belonging to other groups (in most cases as if belonging to
the root group). The expected group isolation may thereby be broken.

This commit adds the missing association in bio-cloning functions.

Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Nikolay Borisov <kernel@kyup.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblkcg: kill unused field nr_undestroyed_grps
Hou Tao [Tue, 26 Jul 2016 09:13:42 +0000 (17:13 +0800)]
blkcg: kill unused field nr_undestroyed_grps

'nr_undestroyed_grps' in struct throtl_data was used to count
the number of throtl_grp related with throtl_data, but now
throtl_grp is tracked by blkcg_gq, so it is useless anymore.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agowriteback: Write dirty times for WB_SYNC_ALL writeback
Jan Kara [Tue, 26 Jul 2016 09:38:20 +0000 (11:38 +0200)]
writeback: Write dirty times for WB_SYNC_ALL writeback

Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
queue_io() so that inodes which have only dirty timestamps are properly
written on fsync(2) and sync(2). However there are other call sites -
most notably going through write_inode_now() - which expect inode to be
clean after WB_SYNC_ALL writeback. This is not currently true as we do
not clear I_DIRTY_TIME in __writeback_single_inode() even for
WB_SYNC_ALL writeback in all the cases. This then resulted in the
following oops because bdev_write_inode() did not clean the inode and
writeback code later stumbled over a dirty inode with detached wb.

  general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
  Modules linked in:
  CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: writeback wb_workfn (flush-11:0)
  task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
  RIP: 0010:[<ffffffff818884d2>]  [<ffffffff818884d2>]
  locked_inode_to_wb_and_lock_list+0xa2/0x750
  RSP: 0018:ffff88006cdaf7d0  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
  RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
  RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
  R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
  R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
  FS:  0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
  DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
  Stack:
   ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
   ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
   ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
  Call Trace:
   [<     inline     >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
   [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
   [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
   [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
   [<     inline     >] wb_do_writeback fs/fs-writeback.c:1844
   [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
   [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
   [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
   [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
   [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
  Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
  00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
  80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
  RIP  [<     inline     >] wb_get include/linux/backing-dev-defs.h:212
  RIP  [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
  fs/fs-writeback.c:281
   RSP <ffff88006cdaf7d0>
  ---[ end trace 986a4d314dcb2694 ]---

Fix the problem by making sure __writeback_single_inode() writes inode
only with dirty times in WB_SYNC_ALL mode.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agofloppy: fix open(O_ACCMODE) for ioctl-only open
Jiri Kosina [Thu, 16 Jun 2016 07:53:58 +0000 (09:53 +0200)]
floppy: fix open(O_ACCMODE) for ioctl-only open

Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().

Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Cc: stable@vger.kernel.org # v4.5+
Reported-by: Wim Osterholt <wim@djo.tudelft.nl>
Tested-by: Wim Osterholt <wim@djo.tudelft.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agometag: Fix __cmpxchg_u32 asm constraint for CMP
James Hogan [Thu, 4 Aug 2016 16:36:08 +0000 (17:36 +0100)]
metag: Fix __cmpxchg_u32 asm constraint for CMP

The LNKGET based atomic sequence in __cmpxchg_u32 has slightly incorrect
constraints for the return value which under certain circumstances can
allow an address unit register to be used as the first operand of a CMP
instruction. This isn't a valid instruction however as the encodings
only allow a data unit to be specified. This would result in an
assembler error like the following:

  Error: failed to assemble instruction: "CMP A0.2,D0Ar6"

Fix by changing the constraint from "=&da" (assigned, early clobbered,
data or address unit register) to "=&d" (data unit register only).

The constraint for the second operand, "bd" (an op2 register where op1
is a data unit register and the instruction supports O2R) is already
correct assuming the first operand is a data unit register.

Other cases of CMP in inline asm have had their constraints checked, and
appear to all be fine.

Fixes: 6006c0d8ce94 ("metag: Atomics, locks and bitops")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.9.x-
7 years agoInput: silead - remove some dead code
Dan Carpenter [Thu, 4 Aug 2016 15:24:46 +0000 (08:24 -0700)]
Input: silead - remove some dead code

buf[0] is an unsigned char.  touch_nr is an int.  The test for negative
here doesn't make sense so I have removed it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
7 years agoInput: sis-i2c - select CONFIG_CRC_ITU_T
Arnd Bergmann [Thu, 4 Aug 2016 15:19:44 +0000 (08:19 -0700)]
Input: sis-i2c - select CONFIG_CRC_ITU_T

The newly added sis_i2c driver fails to link without the CRC_ITU_T
driver enabled:

drivers/input/touchscreen/sis_i2c.o: In function `sis_ts_irq_handler':
sis_i2c.c:(.text+0xc0): undefined reference to `crc_itu_t'

This adds a Kconfig select statement.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: a485cb037fe6 ("Input: add driver for SiS 9200 family I2C touchscreen controllers")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
7 years agoMerge branches 'misc' and 'rxe' into k.o/for-4.8-1
Doug Ledford [Thu, 4 Aug 2016 15:13:47 +0000 (11:13 -0400)]
Merge branches 'misc' and 'rxe' into k.o/for-4.8-1

7 years agoSoft RoCE driver
Moni Shoua [Thu, 16 Jun 2016 13:45:23 +0000 (16:45 +0300)]
Soft RoCE driver

Soft RoCE (RXE) - The software RoCE driver

ib_rxe implements the RDMA transport and registers to the RDMA core
device as a kernel verbs provider. It also implements the packet IO
layer. On the other hand ib_rxe registers to the Linux netdev stack
as a udp encapsulating protocol, in that case RDMA, for sending and
receiving packets over any Ethernet device.  This yields a RDMA
transport over the UDP/Ethernet network layer forming a RoCEv2
compatible device.

The configuration procedure of the Soft RoCE drivers requires
binding to any existing Ethernet network device. This is done with
/sys interface.

A userspace Soft RoCE library (librxe) provides user applications
the ability to run with Soft RoCE devices.  The use of rxe verbs ins
user space requires the inclusion of librxe as a device specifics
plug-in to libibverbs. librxe is packaged separately.

Architecture:

     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
+---------------------------------------------------------------+
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Soft RoCE resources:

[1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in
Github
[2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
Wiki page
[3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agodm raid: fix use of wrong status char during resynchronization
Heinz Mauelshagen [Wed, 3 Aug 2016 20:57:49 +0000 (22:57 +0200)]
dm raid: fix use of wrong status char during resynchronization

During a resynchronization, device status char 'a' is output on the raid
status line for every device of a RAID set.  It changes from 'a' to 'A'
(unless device failure) when the resynchronization completes.

Interrupting and restarting a resynchronization, by reloading the DM
table, erroneously lead to status char 'A'.

Fix this by avoiding setting the MD_RECOVERY_REQUESTED flag in
raid_preresume().

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
7 years agoMerge tag 'media/v4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Thu, 4 Aug 2016 13:59:37 +0000 (09:59 -0400)]
Merge tag 'media/v4.8-5' of git://git./linux/kernel/git/mchehab/linux-media

Pull media DocBook removal and some fixups from Mauro Carvalho Chehab:

  - removal of the media DocBook (since it's all in Sphinx now)

  - videobuf2: Fix an allocation regression

  - a few fixes related to the CEC drivers

* tag 'media/v4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] cec: fix off-by-one memset
  [media] staging: add MEDIA_SUPPORT dependency
  [media] vivid: don't handle CEC_MSG_SET_STREAM_PATH
  [media] media: adv7180: Fix broken interrupt register access
  [media] vb2: Fix allocation size of dma_parms
  [media] vim2m: copy the other colorspace-related fields as well
  [media] adv7511: fix VIC autodetect
  doc-rst: Remove the media docbook

7 years agoMerge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 4 Aug 2016 13:14:38 +0000 (09:14 -0400)]
Merge tag 'modules-next-for-linus' of git://git./linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "The only interesting thing here is Jessica's patch to add
  ro_after_init support to modules.  The rest are all trivia"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  extable.h: add stddef.h so "NULL" definition is not implicit
  modules: add ro_after_init support
  jump_label: disable preemption around __module_text_address().
  exceptions: fork exception table content from module.h into extable.h
  modules: Add kernel parameter to blacklist modules
  module: Do a WARN_ON_ONCE() for assert module mutex not held
  Documentation/module-signing.txt: Note need for version info if reusing a key
  module: Invalidate signatures on force-loaded modules
  module: Issue warnings when tainting kernel
  module: fix redundant test.
  module: fix noreturn attribute for __module_put_and_exit()

7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Thu, 4 Aug 2016 12:51:12 +0000 (08:51 -0400)]
Merge branch 'akpm' (patches from Andrew)

Merge even more updates from Andrew Morton:

 - dma-mapping API cleanup

 - a few cleanups and misc things

 - use jump labels in dynamic-debug

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dynamic_debug: add jump label support
  jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL
  arm: jump label may reference text in __exit
  tile: support static_key usage in non-module __exit sections
  sparc: support static_key usage in non-module __exit sections
  powerpc: add explicit #include <asm/asm-compat.h> for jump label
  drivers/media/dvb-frontends/cxd2841er.c: avoid misleading gcc warning
  MAINTAINERS: update email and list of Samsung HW driver maintainers
  block: remove BLK_DEV_DAX config option
  samples/kretprobe: fix the wrong type
  samples/kretprobe: convert the printk to pr_info/pr_err
  samples/jprobe: convert the printk to pr_info/pr_err
  samples/kprobe: convert the printk to pr_info/pr_err
  dma-mapping: use unsigned long for dma_attrs
  media: mtk-vcodec: remove unused dma_attrs
  include/linux/bitmap.h: cleanup
  tree-wide: replace config_enabled() with IS_ENABLED()
  drivers/fpga/Kconfig: fix build failure

7 years agodynamic_debug: add jump label support
Jason Baron [Wed, 3 Aug 2016 20:46:39 +0000 (13:46 -0700)]
dynamic_debug: add jump label support

Although dynamic debug is often only used for debug builds, sometimes
its enabled for production builds as well.  Minimize its impact by using
jump labels.  This reduces the text section by 7000+ bytes in the kernel
image below.  It does increase data, but this should only be referenced
when changing the direction of the branches, and hence usually not in
cache.

     text     data     bss       dec     hex  filename
  8194852  4879776  925696  14000324  d5a0c4  vmlinux.pre
  8187337  4960224  925696  14073257  d6bda9  vmlinux.post

Link: http://lkml.kernel.org/r/d165b465e8c89bc582d973758d40be44c33f018b.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agojump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL
Jason Baron [Wed, 3 Aug 2016 20:46:36 +0000 (13:46 -0700)]
jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL

The current jump_label.h includes bug.h for things such as WARN_ON().
This makes the header problematic for inclusion by kernel.h or any
headers that kernel.h includes, since bug.h includes kernel.h (circular
dependency).  The inclusion of atomic.h is similarly problematic.  Thus,
this should make jump_label.h 'includable' from most places.

Link: http://lkml.kernel.org/r/7060ce35ddd0d20b33bf170685e6b0fab816bdf2.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoarm: jump label may reference text in __exit
Jason Baron [Wed, 3 Aug 2016 20:46:33 +0000 (13:46 -0700)]
arm: jump label may reference text in __exit

The jump table can reference text found in an __exit section.  Thus,
instead of discarding it at build time, include EXIT_TEXT as part of
__init and it will be released when the system boots.

Link: http://lkml.kernel.org/r/60284113bb759121e8ae3e99af1535647e52123f.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotile: support static_key usage in non-module __exit sections
Chris Metcalf [Wed, 3 Aug 2016 20:46:30 +0000 (13:46 -0700)]
tile: support static_key usage in non-module __exit sections

Previously, all the __exit sections were just dropped by the link phase.
However, if there are static_key (jump label) constructs in __exit
sections that are not modules, the link fails with the message:

   `.exit.text' referenced in section `__jump_table' of xxx.o:
   defined in discarded section `.exit.text' of xxx.o

Support this usage by keeping the .exit.text sections in the final image
if JUMP_LABEL is defined, then discarding them once initialization is
complete.

Link: http://lkml.kernel.org/r/bfd7c107c610c30e992868ebfe2a5d796a097464.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosparc: support static_key usage in non-module __exit sections
Jason Baron [Wed, 3 Aug 2016 20:46:27 +0000 (13:46 -0700)]
sparc: support static_key usage in non-module __exit sections

The jump table can reference text found in an __exit section.  Thus,
instead of discarding it at build/link time, include EXIT_TEXT as part
of __init and release it at system boot time.

Without this patch the link fails with:

    `.exit.text' referenced in section `__jump_table' of xxx.o:
    defined in discarded section `.exit.text' of xxx.o

Link: http://lkml.kernel.org/r/d822da427ab07a02a394602eca687104ff682f83.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agopowerpc: add explicit #include <asm/asm-compat.h> for jump label
Jason Baron [Wed, 3 Aug 2016 20:46:24 +0000 (13:46 -0700)]
powerpc: add explicit #include <asm/asm-compat.h> for jump label

The stringify_in_c() macro may not be included. Make the dependency
explicit.

Link: http://lkml.kernel.org/r/564720c5328edd53c9d56db325be7215440eec3e.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Joe Perches <joe@perches.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodrivers/media/dvb-frontends/cxd2841er.c: avoid misleading gcc warning
Arnd Bergmann [Wed, 3 Aug 2016 20:46:21 +0000 (13:46 -0700)]
drivers/media/dvb-frontends/cxd2841er.c: avoid misleading gcc warning

The addition of jump label support in dynamic_debug caused an unexpected
warning in exactly one file in the kernel:

  drivers/media/dvb-frontends/cxd2841er.c: In function 'cxd2841er_tune_tc':
  include/linux/dynamic_debug.h:134:3: error: 'carrier_offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     __dynamic_dev_dbg(&descriptor, dev, fmt, \
     ^~~~~~~~~~~~~~~~~
  drivers/media/dvb-frontends/cxd2841er.c:3177:11: note: 'carrier_offset' was declared here
    int ret, carrier_offset;
             ^~~~~~~~~~~~~~

The problem seems to be that the compiler gets confused by the extra
conditionals in static_branch_unlikely, to the point where it can no
longer keep track of which branches have already been taken, and it
doesn't realize that this variable is now always initialized when it
gets used.

I have done lots of randconfig kernel builds and could not find any
other file with this behavior, so I assume it's a rare enough glitch
that we don't need to change the jump label support but instead just
work around the warning in the driver.

To achieve that, I'm moving the check for the return value into the
switch() statement, which is an obvious transformation, but is enough to
un-confuse the compiler here.  The resulting code is not as nice to
read, but at least we retain the behavior of warning if it gets changed
to actually access an uninitialized carrier offset value in the future.

Link: http://lkml.kernel.org/r/20160713204342.1221511-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Abylay Ospan <aospan@netup.ru>
Cc: Sergey Kozlov <serjk@netup.ru>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMAINTAINERS: update email and list of Samsung HW driver maintainers
Kamil Debski [Wed, 3 Aug 2016 20:46:18 +0000 (13:46 -0700)]
MAINTAINERS: update email and list of Samsung HW driver maintainers

Change my email address in the MAINTAINERS file.
Add new maintainers of selected Samsung HW drivers.

Link: http://lkml.kernel.org/r/1470060703-20423-1-git-send-email-k.debski@samsung.com
Signed-off-by: Kamil Debski <k.debski@samsung.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Kishon Vijay Abraham I <kishon@ti.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Lukasz Majewski <l.majewski@samsung.com>
Cc: Sylwester Nawrocki <s.nawrocki@samsung.com>
Cc: Kamil Debski <kamil@wypas.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoblock: remove BLK_DEV_DAX config option
Ross Zwisler [Wed, 3 Aug 2016 20:46:15 +0000 (13:46 -0700)]
block: remove BLK_DEV_DAX config option

The functionality for block device DAX was already removed with commit
acc93d30d7d4 ("Revert "block: enable dax for raw block devices"")

However, we still had a config option hanging around that was always
disabled because it depended on CONFIG_BROKEN.  This config option was
introduced in commit 03cdadb04077 ("block: disable block device DAX by
default")

This change reverts that commit, removing the dead config option.

Link: http://lkml.kernel.org/r/20160729182314.6368-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosamples/kretprobe: fix the wrong type
Huang Shijie [Wed, 3 Aug 2016 20:46:12 +0000 (13:46 -0700)]
samples/kretprobe: fix the wrong type

The regs_return_value() returns "unsigned long" or "long" value.  But the
retval is int type now, it may cause overflow, the log may becomes:

    [ 2911.078869] do_brk returned -2003877888 and took 4620 ns to execute

This patch converts the retval to "unsigned long" type, and fixes the
overflow issue.

Link: http://lkml.kernel.org/r/1464143083-3877-4-git-send-email-shijie.huang@arm.com
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosamples/kretprobe: convert the printk to pr_info/pr_err
Huang Shijie [Wed, 3 Aug 2016 20:46:09 +0000 (13:46 -0700)]
samples/kretprobe: convert the printk to pr_info/pr_err

We prefer to use the pr_* to print out the log now, this patch converts
the printk to pr_info.  In the error path, use the pr_err to replace the
printk.

Link: http://lkml.kernel.org/r/1464143083-3877-3-git-send-email-shijie.huang@arm.com
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosamples/jprobe: convert the printk to pr_info/pr_err
Huang Shijie [Wed, 3 Aug 2016 20:46:06 +0000 (13:46 -0700)]
samples/jprobe: convert the printk to pr_info/pr_err

We prefer to use the pr_* to print out the log now, this patch converts
the printk to pr_info.  In the error path, use the pr_err to replace the
printk.

Link: http://lkml.kernel.org/r/1464143083-3877-2-git-send-email-shijie.huang@arm.com
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosamples/kprobe: convert the printk to pr_info/pr_err
Huang Shijie [Wed, 3 Aug 2016 20:46:03 +0000 (13:46 -0700)]
samples/kprobe: convert the printk to pr_info/pr_err

We prefer to use the pr_* to print out the log now, this patch converts
the printk to pr_info.  In the error path, use the pr_err to replace the
printk.

Link: http://lkml.kernel.org/r/1464143083-3877-1-git-send-email-shijie.huang@arm.com
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodma-mapping: use unsigned long for dma_attrs
Krzysztof Kozlowski [Wed, 3 Aug 2016 20:46:00 +0000 (13:46 -0700)]
dma-mapping: use unsigned long for dma_attrs

The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomedia: mtk-vcodec: remove unused dma_attrs
Krzysztof Kozlowski [Wed, 3 Aug 2016 20:45:57 +0000 (13:45 -0700)]
media: mtk-vcodec: remove unused dma_attrs

The local variable dma_attrs is set but never read.

Link: http://lkml.kernel.org/r/1468399300-5399-1-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoinclude/linux/bitmap.h: cleanup
Andrew Morton [Wed, 3 Aug 2016 20:45:54 +0000 (13:45 -0700)]
include/linux/bitmap.h: cleanup

Remove two unneeded `else's.

Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotree-wide: replace config_enabled() with IS_ENABLED()
Masahiro Yamada [Wed, 3 Aug 2016 20:45:50 +0000 (13:45 -0700)]
tree-wide: replace config_enabled() with IS_ENABLED()

The use of config_enabled() against config options is ambiguous.  In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED().  Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
clearer.

This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.

I noticed two cases where config_enabled() is used against a tristate
option:

 - config_enabled(CONFIG_HWMON)
  [ drivers/net/wireless/ath/ath10k/thermal.c ]

 - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
  [ drivers/gpu/drm/gma500/opregion.c ]

I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.

Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Joshua Kinard <kumba@gentoo.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will Drewry <wad@chromium.org>
Cc: Nikolay Martynov <mar.kolya@gmail.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: Rafal Milecki <zajec5@gmail.com>
Cc: James Cowgill <James.Cowgill@imgtec.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Alex Smith <alex.smith@imgtec.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Tony Wu <tung7970@gmail.com>
Cc: Huaitong Han <huaitong.han@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Rabin Vincent <rabin@rab.in>
Cc: "Maciej W. Rozycki" <macro@imgtec.com>
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodrivers/fpga/Kconfig: fix build failure
Sudip Mukherjee [Wed, 3 Aug 2016 20:45:46 +0000 (13:45 -0700)]
drivers/fpga/Kconfig: fix build failure

While building m32r allmodconfig the build is failing with the error:

  ERROR: "bad_dma_ops" [drivers/fpga/zynq-fpga.ko] undefined!

Xilinx Zynq FPGA is using DMA but there was no dependency while
building.

Link: http://lkml.kernel.org/r/1464346526-13913-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Acked-by: Moritz Fischer <moritz.fischer@ettus.com>
Cc: Alan Tull <atull@opensource.altera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoarm64: Fix copy-on-write referencing in HugeTLB
Steve Capper [Wed, 3 Aug 2016 14:15:55 +0000 (15:15 +0100)]
arm64: Fix copy-on-write referencing in HugeTLB

set_pte_at(.) will set or unset the PTE_RDONLY hardware bit before
writing the entry to the table.

This can cause problems with the copy-on-write logic in hugetlb_cow:
 *) hugetlb_cow(.) called to handle a write fault on read only pte,
 *) Before the copy-on-write updates the new page table a call is
    made to pte_same(huge_ptep_get(ptep), pte)), to check for a race,
 *) Because set_pte_at(.) changed the pte, *ptep != pte, and the
    hugetlb_cow(.) code erroneously assumes that it lost the race,
 *) The new page is subsequently freed without being used.

On arm64 this problem only becomes apparent when we apply:
67961f9 mm/hugetlb: fix huge page reserve accounting for private
mappings

When one runs the libhugetlbfs test suite, there are allocation errors
and hugetlbfs pages become erroneously locked in memory as reserved.
(There is a high HugePages_Rsvd: count).

In this patch we introduce pte_same which ignores the PTE_RDONLY bit,
allowing for the libhugetlbfs test suite to pass as expected and
without leaking any reserved HugeTLB pages.

Reported-by: Huang Shijie <shijie.huang@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
7 years agonvmx: mark ept single context invalidation as supported
Bandan Das [Tue, 2 Aug 2016 20:32:36 +0000 (16:32 -0400)]
nvmx: mark ept single context invalidation as supported

Commit 4b855078601f ("KVM: nVMX: Don't advertise single
context invalidation for invept") removed advertising
single context invalidation since the spec does not mandate it.
However, some hypervisors (such as ESX) require it to be present
before willing to use ept in a nested environment. Advertise it
and fallback to the global case.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonvmx: remove comment about missing nested vpid support
Bandan Das [Tue, 2 Aug 2016 20:32:35 +0000 (16:32 -0400)]
nvmx: remove comment about missing nested vpid support

Nested vpid is already supported and both single/global
modes are advertised to the guest

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoKVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off
Wanpeng Li [Wed, 3 Aug 2016 04:04:12 +0000 (12:04 +0800)]
KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off

BUG: unable to handle kernel NULL pointer dereference at 000000000000008c
IP: [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
PGD 0
Oops: 0000 [#1] SMP
Call Trace:
 kvm_arch_vcpu_load+0x86/0x260 [kvm]
 vcpu_load+0x46/0x60 [kvm]
 kvm_vcpu_ioctl+0x79/0x7c0 [kvm]
 ? __lock_is_held+0x54/0x70
 do_vfs_ioctl+0x96/0x6a0
 ? __fget_light+0x2a/0x90
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x7c/0x1e0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP  [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
 RSP <ffff8800db1f3d70>
CR2: 000000000000008c
---[ end trace a55fb79d2b3b4ee8 ]---

This can be reproduced steadily by kernel_irqchip=off.

We should not access preemption timer stuff if lapic is emulated in userspace.
This patch fix it by avoiding access preemption timer stuff when kernel_irqchip=off.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoKVM: documentation: fix KVM_CAP_X2APIC_API information
Paolo Bonzini [Thu, 4 Aug 2016 12:01:05 +0000 (14:01 +0200)]
KVM: documentation: fix KVM_CAP_X2APIC_API information

The KVM_X2APIC_API_USE_32BIT_IDS feature applies to both
KVM_SET_GSI_ROUTING and KVM_SIGNAL_MSI, but was not mentioned in the
documentation for the latter ioctl.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoMerge tag 'kvm-arm-for-4.8-take2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Thu, 4 Aug 2016 11:59:56 +0000 (13:59 +0200)]
Merge tag 'kvm-arm-for-4.8-take2' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.8 - Take 2

Includes GSI routing support to go along with the new VGIC and a small fix that
has been cooking in -next for a while.

7 years agox86: vdso: use __pvclock_read_cycles
Paolo Bonzini [Thu, 9 Jun 2016 18:12:34 +0000 (20:12 +0200)]
x86: vdso: use __pvclock_read_cycles

The new simplified __pvclock_read_cycles does the same computation
as vread_pvclock, except that (because it takes the pvclock_vcpu_time_info
pointer) it has to be moved inside the loop.  Since the loop is expected to
never roll, this makes no difference.

Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agopvclock: introduce seqcount-like API
Paolo Bonzini [Thu, 9 Jun 2016 11:06:08 +0000 (13:06 +0200)]
pvclock: introduce seqcount-like API

The version field in struct pvclock_vcpu_time_info basically implements
a seqcount.  Wrap it with the usual read_begin and read_retry functions,
and use these APIs instead of peppering the code with smp_rmb()s.
While at it, change it to the more pedantically correct virt_rmb().

With this change, __pvclock_read_cycles can be simplified noticeably.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agopowerpc/mm: Move register_process_table() out of ppc_md
Michael Ellerman [Thu, 4 Aug 2016 05:32:06 +0000 (15:32 +1000)]
powerpc/mm: Move register_process_table() out of ppc_md

We want to initialise register_process_table() before ppc_md is setup,
so that it can be called as part of MMU init (at least on Radix ATM).

That no longer works because probe_machine() requires that ppc_md be
empty before it's called, and we now do probe_machine() much later.

So make register_process_table a global for now. It will probably move
into a mmu_radix_ops struct at some point in the future.

This was broken by me when applying commit 7025776ed1eb "powerpc/mm:
Move hash table ops to a separate structure" due to conflicts with other
patches.

Fixes: 7025776ed1eb ("powerpc/mm: Move hash table ops to a separate structure")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
7 years agopowerpc/perf: Fix incorrect event codes in power9-event-list
Madhavan Srinivasan [Sun, 31 Jul 2016 19:28:39 +0000 (00:58 +0530)]
powerpc/perf: Fix incorrect event codes in power9-event-list

These have been changed in the hardware, update Linux's version.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
7 years agoALSA: hda - Fix headset mic detection problem for two dell machines
Hui Wang [Thu, 4 Aug 2016 07:28:04 +0000 (15:28 +0800)]
ALSA: hda - Fix headset mic detection problem for two dell machines

One of the machines has ALC255 on it, another one has ALC298 on it.

On the machine with the codec ALC298, it also has the speaker volume
problem, so we add the fixup chained to ALC298_FIXUP_SPK_VOLUME rather
than adding a group of pin definition in the pin quirk table, since
the speak volume problem does not happen on other machines yet.

Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 years agoMerge tag 'perf-core-for-mingo-20160803' of git://git.kernel.org/pub/scm/linux/kernel...
Ingo Molnar [Thu, 4 Aug 2016 09:02:38 +0000 (11:02 +0200)]
Merge tag 'perf-core-for-mingo-20160803' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New features:

- Add --sample-cpu to 'perf record', to explicitely ask for sampling
  the CPU (Jiri Olsa)

Fixes:

- Fix processing of multi byte chunks in objdump output, fixing
  disassemble processing for annotation on at least ARM64 (Jan Stancek)

- Use SyS_epoll_wait in a BPF 'perf test' entry instead of sys_epoll_wait, that
  is not present in the DWARF info in vmlinux files (Arnaldo Carvalho de Melo)

- Add -wno-shadow when processing files using perl headers, fixing
  the build on Fedora Rawhide and Arch Linux (Namhyung Kim)

Infrastructure changes:

- Annotate prep work to better catch and report errors related to
  using objdump to disassemble DSOs (Arnaldo Carvalho de Melo)

- Add 'alloc', 'scnprintf' and 'and' methods for bitmap processing (Jiri Olsa)

- Add nested output resorting callback in hists processing (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoRevert "ACPI / hotplug / PCI: Runtime resume bridge before rescan"
Linus Torvalds [Thu, 4 Aug 2016 02:20:22 +0000 (22:20 -0400)]
Revert "ACPI / hotplug / PCI: Runtime resume bridge before rescan"

This reverts commit 16468c783cb4cf72475dcda23fabecb4a4bb0e17.

Bisection showed that it was the root cause for a resume hang on a
bog-standard all-Intel laptop (Sony Vaio Pro 11), and reverting fixes
the hang.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge branches 'hfi1' and 'sge-limit' into k.o/for-4.8-2
Doug Ledford [Thu, 4 Aug 2016 01:51:20 +0000 (21:51 -0400)]
Merge branches 'hfi1' and 'sge-limit' into k.o/for-4.8-2

7 years agoMerge branch 'next' into for-linus
Dmitry Torokhov [Thu, 4 Aug 2016 01:16:10 +0000 (18:16 -0700)]
Merge branch 'next' into for-linus

Prepare second round of input updates for 4.8 merge window.

7 years agoIB/core: Support for CMA multicast join flags
Alex Vesker [Wed, 6 Jul 2016 13:36:35 +0000 (16:36 +0300)]
IB/core: Support for CMA multicast join flags

Added UCMA and CMA support for multicast join flags. Flags are
passed using UCMA CM join command previously reserved fields.
Currently supporting two join flags indicating two different
multicast JoinStates:

1. Full Member:
   The initiator creates the Multicast group(MCG) if it wasn't
   previously created, can send Multicast messages to the group
   and receive messages from the MCG.

2. Send Only Full Member:
   The initiator creates the Multicast group(MCG) if it wasn't
   previously created, can send Multicast messages to the group
   but doesn't receive any messages from the MCG.

   IB: Send Only Full Member requires a query of ClassPortInfo
       to determine if SM/SA supports this option. If SM/SA
       doesn't support Send-Only there will be no join request
       sent and an error will be returned.

   ETH: When Send Only Full Member is requested no IGMP join
will be sent.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/sa: Add cached attribute containing SM information to SA port
Alex Vesker [Wed, 6 Jul 2016 13:36:34 +0000 (16:36 +0300)]
IB/sa: Add cached attribute containing SM information to SA port

Added a new SA port attribute containing SM ClassPortInfo fields,
(ClassPortInfo fields: Table 126 IB Spec 1.3.). This is useful for
checking SM support for specific features. The attribute is cached
to avoid resending queries, caching is done when a successful
ClassPortInfo reply is received on the port. Invalidation of the
attribute is done on SM change events, SM re-registration events,
and SM LID change events. The fields in ClassPortInfo should not
change during SM runtime without an event.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/uverbs: Fix race between uverbs_close and remove_one
Jason Gunthorpe [Sun, 3 Jul 2016 12:28:18 +0000 (15:28 +0300)]
IB/uverbs: Fix race between uverbs_close and remove_one

Fixes an oops that might happen if uverbs_close races with
remove_one.

Both contexts may run ib_uverbs_cleanup_ucontext, it depends
on the flow.

Currently, there is no protection for a case that remove_one
didn't make the cleanup it runs to its end, the underlying
ib_device was freed then uverbs_close will call
ib_uverbs_cleanup_ucontext and OOPs.

Above might happen if uverbs_close deleted the file from the list
then remove_one didn't find it and runs to its end.

Fixes to protect against that case by a new cleanup lock so that
ib_uverbs_cleanup_ucontext will be called always before that
remove_one is ended.

Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")
Reported-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mthca: Clean up error unwind flow in mthca_reset()
Markus Elfring [Sat, 23 Jul 2016 09:28:30 +0000 (11:28 +0200)]
IB/mthca: Clean up error unwind flow in mthca_reset()

The kfree() function was called in a few cases by the mthca_reset()
function during error handling even if the passed variables "bridge_header"
and "hca_header" contained a null pointer.

Adjust jump targets according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mthca: NULL arg to pci_dev_put is OK
Markus Elfring [Sat, 23 Jul 2016 08:05:12 +0000 (10:05 +0200)]
IB/mthca: NULL arg to pci_dev_put is OK

The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: NULL arg to sc_return_credits is OK
Markus Elfring [Sat, 23 Jul 2016 06:30:52 +0000 (08:30 +0200)]
IB/hfi1: NULL arg to sc_return_credits is OK

The sc_return_credits() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Add diagnostic hardware counters
Mark Bloch [Tue, 19 Jul 2016 17:54:58 +0000 (20:54 +0300)]
IB/mlx4: Add diagnostic hardware counters

Expose IB diagnostic hardware counters.
The counters count IB events and are applicable for IB and RoCE.

The counters can be divided into two groups, per device and per port.
Device counters are always exposed.
Port counters are exposed only if the firmware supports per port counters.

rq_num_dup and sq_num_to are only exposed if we have firmware support
for them, if we do, we expose them per device and per port.
rq_num_udsdprd and num_cqovf are device only counters.

rq - denotes responder.
sq - denotes requester.

|-----------------------|---------------------------------------|
| Name | Description |
|-----------------------|---------------------------------------|
|rq_num_lle | Number of local length errors |
|-----------------------|---------------------------------------|
|sq_num_lle | number of local length errors |
|-----------------------|---------------------------------------|
|rq_num_lqpoe | Number of local QP operation errors |
|-----------------------|---------------------------------------|
|sq_num_lqpoe | Number of local QP operation errors |
|-----------------------|---------------------------------------|
|rq_num_lpe | Number of local protection errors |
|-----------------------|---------------------------------------|
|sq_num_lpe | Number of local protection errors |
|-----------------------|---------------------------------------|
|rq_num_wrfe | Number of CQEs with error |
|-----------------------|---------------------------------------|
|sq_num_wrfe | Number of CQEs with error |
|-----------------------|---------------------------------------|
|sq_num_mwbe | Number of Memory Window bind errors |
|-----------------------|---------------------------------------|
|sq_num_bre | Number of bad response errors |
|-----------------------|---------------------------------------|
|sq_num_rire | Number of Remote Invalid request |
| | errors |
|-----------------------|---------------------------------------|
|rq_num_rire | Number of Remote Invalid request |
| | errors |
|-----------------------|---------------------------------------|
|sq_num_rae | Number of remote access errors |
|-----------------------|---------------------------------------|
|rq_num_rae | Number of remote access errors |
|-----------------------|---------------------------------------|
|sq_num_roe | Number of remote operation errors |
|-----------------------|---------------------------------------|
|sq_num_tree | Number of transport retries exceeded |
| | errors |
|-----------------------|---------------------------------------|
|sq_num_rree | Number of RNR NAK retries exceeded |
| | errors |
|-----------------------|---------------------------------------|
|rq_num_rnr | Number of RNR NAKs sent |
|-----------------------|---------------------------------------|
|sq_num_rnr | Number of RNR NAKs received |
|-----------------------|---------------------------------------|
|rq_num_oos | Number of Out of Sequence requests |
| | received |
|-----------------------|---------------------------------------|
|sq_num_oos | Number of Out of Sequence NAKs |
| | received |
|-----------------------|---------------------------------------|
|rq_num_udsdprd | Number of UD packets silently |
| | discarded on the Receive Queue due to |
| | lack of receive descriptor |
|-----------------------|---------------------------------------|
|rq_num_dup | Number of duplicate requests received |
|-----------------------|---------------------------------------|
|sq_num_to | Number of time out received |
|-----------------------|---------------------------------------|
|num_cqovf | Number of CQ overflows |
|-----------------------|---------------------------------------|

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonet/mlx4: Query performance and diagnostics counters
Mark Bloch [Tue, 19 Jul 2016 17:54:57 +0000 (20:54 +0300)]
net/mlx4: Query performance and diagnostics counters

Add a function to query diagnostics counters from the firmware.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonet/mlx4: Add diagnostic counters capability bit
Mark Bloch [Tue, 19 Jul 2016 17:54:56 +0000 (20:54 +0300)]
net/mlx4: Add diagnostic counters capability bit

Add a bit that indicates if the firmware supports per port
diagnostic counters.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoUse smaller 512 byte messages for portmapper messages
Mustafa Ismail [Mon, 18 Jul 2016 19:21:49 +0000 (14:21 -0500)]
Use smaller 512 byte messages for portmapper messages

Portmapper messages are short and do not occupy more than 512 bytes.
Lower portmapper message size to 512 bytes. This change significantly
reduces the amount of memory needed when trying to establish a large
number of connections simultaneously. The old value is based on page
size.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/ipoib: Report SG feature regardless of HW UD CSUM capability
Yuval Shaia [Wed, 20 Jul 2016 08:30:06 +0000 (01:30 -0700)]
IB/ipoib: Report SG feature regardless of HW UD CSUM capability

Decouple SG support from HW ability to do UD checksum.
This coupling is for historical reasons and removed with 'commit
ec5f06156423 ("net: Kill link between CSUM and SG features.")'

During driver load it is assumed that device does not supports SG. The
final decision is taken after creating UD QP based on device capability.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
Roland Dreier [Fri, 29 Jul 2016 04:58:43 +0000 (21:58 -0700)]
IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct

We allocate a small tracking structure as part of mlx4_ib_resize_cq().
However, we don't need to use GFP_ATOMIC -- immediately after the
allocation, we call mlx4_cq_resize(), which allocates a command
mailbox with GFP_KERNEL and then sleeps on a firmware command, so we
better not be in an atomic context.

This actually has a real impact, because when this GFP_ATOMIC
allocation fails (and GFP_ATOMIC does fail in practice) then a
userspace consumer resizing a CQ will get a spurious failure that we
can easily avoid.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Disable by default
Bart Van Assche [Tue, 19 Jul 2016 17:03:44 +0000 (10:03 -0700)]
IB/hfi1: Disable by default

There is a strict policy in the Linux kernel that new drivers must be
disabled by default. Hence leave out the "default m" line from Kconfig.

Fixes: f48ad614c100 ("IB/hfi1: Move driver out of staging")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jubin John <jubin.john@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <stable@vger.kernel.org> # v4.7+
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Disable by default
Bart Van Assche [Tue, 19 Jul 2016 17:03:17 +0000 (10:03 -0700)]
IB/rdmavt: Disable by default

There is a strict policy in the Linux kernel that new drivers must be
disabled by default. Hence leave out the "default m" line from Kconfig.

Fixes: 0194621b2253 ("IB/rdmavt: Create module framework and handle driver registration")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jubin John <jubin.john@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoMerge branch 'i40iw' into k.o/for-4.8
Doug Ledford [Thu, 4 Aug 2016 01:00:16 +0000 (21:00 -0400)]
Merge branch 'i40iw' into k.o/for-4.8

7 years agoMerge branches 'cxgb4' and 'mlx5' into k.o/for-4.8
Doug Ledford [Thu, 4 Aug 2016 00:58:45 +0000 (20:58 -0400)]
Merge branches 'cxgb4' and 'mlx5' into k.o/for-4.8

7 years agoextable.h: add stddef.h so "NULL" definition is not implicit
Paul Gortmaker [Thu, 28 Jul 2016 03:11:47 +0000 (23:11 -0400)]
extable.h: add stddef.h so "NULL" definition is not implicit

While not an issue now, eventually we will have independent users of
the extable.h file and we will stop sourcing it via module.h header.

In testing that pending work, with very sparse builds, characteristic
of an "allnoconfig" on various architectures, we can sometimes hit an
instance where the very basic standard definitions aren't present,
resulting in:

 include/linux/extable.h:26:9: error: 'NULL' undeclared (first use in this function)

To be clear, this isn't a regression, since currently extable.h is
only used by module.h -- however, we will need this addition present
before we start migrating exception table users off module.h and onto
extable.h during the next release cycle.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
7 years agomodules: add ro_after_init support
Jessica Yu [Wed, 27 Jul 2016 02:36:21 +0000 (12:06 +0930)]
modules: add ro_after_init support

Add ro_after_init support for modules by adding a new page-aligned section
in the module layout (after rodata) for ro_after_init data and enabling RO
protection for that section after module init runs.

Signed-off-by: Jessica Yu <jeyu@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
7 years agojump_label: disable preemption around __module_text_address().
Rusty Russell [Wed, 27 Jul 2016 02:47:35 +0000 (12:17 +0930)]
jump_label: disable preemption around __module_text_address().

Steven reported a warning caused by not holding module_mutex or
rcu_read_lock_sched: his backtrace was corrupted but a quick audit
found this possible cause.  It's wrong anyway...

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
7 years agoexceptions: fork exception table content from module.h into extable.h
Paul Gortmaker [Wed, 27 Jul 2016 02:36:34 +0000 (12:06 +0930)]
exceptions: fork exception table content from module.h into extable.h

For historical reasons (i.e. pre-git) the exception table stuff was
buried in the middle of the module.h file.  I noticed this while
doing an audit for needless includes of module.h and found core
kernel files (both arch specific and arch independent) were just
including module.h for this.

The converse is also true, in that conventional drivers, be they
for filesystems or actual hardware peripherals or similar, do not
normally care about the exception tables.

Here we fork the exception table content out of module.h into a
new file called extable.h -- and temporarily include it into the
module.h itself.

Then we will work our way across the arch independent and arch
specific files needing just exception table content, and move
them off module.h and onto extable.h

Once that is done, we can remove the extable.h from module.h
and in doing it like this, we avoid introducing build failures
into the git history.

The gain here is that module.h gets a bit smaller, across all
modular drivers that we build for allmodconfig.  Also the core
files that only need exception table stuff don't have an include
of module.h that brings in lots of extra stuff and just looks
generally out of place.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
7 years agomodules: Add kernel parameter to blacklist modules
Prarit Bhargava [Thu, 21 Jul 2016 06:07:56 +0000 (15:37 +0930)]
modules: Add kernel parameter to blacklist modules

Blacklisting a module in linux has long been a problem.  The current
procedure is to use rd.blacklist=module_name, however, that doesn't
cover the case after the initramfs and before a boot prompt (where one
is supposed to use /etc/modprobe.d/blacklist.conf to blacklist
runtime loading). Using rd.shell to get an early prompt is hit-or-miss,
and doesn't cover all situations AFAICT.

This patch adds this functionality of permanently blacklisting a module
by its name via the kernel parameter module_blacklist=module_name.

[v2]: Rusty, use core_param() instead of __setup() which simplifies
things.

[v3]: Rusty, undo wreckage from strsep()

[v4]: Rusty, simpler version of blacklisted()

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
7 years agomodule: Do a WARN_ON_ONCE() for assert module mutex not held
Steven Rostedt [Mon, 18 Jul 2016 20:29:24 +0000 (05:59 +0930)]
module: Do a WARN_ON_ONCE() for assert module mutex not held

When running with lockdep enabled, I triggered the WARN_ON() in the
module code that asserts when module_mutex or rcu_read_lock_sched are
not held. The issue I have is that this can also be called from the
dump_stack() code, causing us to enter an infinite loop...

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
 Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
  ffff880215e8fa70 ffff880215e8fa70 ffffffff812fc8e3 0000000000000000
  ffffffff81d3e55b ffff880215e8fac0 ffffffff8104fc88 ffffffff8104fcab
  0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
 Call Trace:
  [<ffffffff812fc8e3>] dump_stack+0x67/0x90
  [<ffffffff8104fc88>] __warn+0xcb/0xe9
  [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f
 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
 Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
  ffff880215e8f7a0 ffff880215e8f7a0 ffffffff812fc8e3 0000000000000000
  ffffffff81d3e55b ffff880215e8f7f0 ffffffff8104fc88 ffffffff8104fcab
  0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
 Call Trace:
  [<ffffffff812fc8e3>] dump_stack+0x67/0x90
  [<ffffffff8104fc88>] __warn+0xcb/0xe9
  [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f
 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
 Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
  ffff880215e8f4d0 ffff880215e8f4d0 ffffffff812fc8e3 0000000000000000
  ffffffff81d3e55b ffff880215e8f520 ffffffff8104fc88 ffffffff8104fcab
  0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
 Call Trace:
  [<ffffffff812fc8e3>] dump_stack+0x67/0x90
  [<ffffffff8104fc88>] __warn+0xcb/0xe9
  [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f
 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
[...]

Which gives us rather useless information. Worse yet, there's some race
that causes this, and I seldom trigger it, so I have no idea what
happened.

This would not be an issue if that warning was a WARN_ON_ONCE().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
7 years agoACPI / EC: Work around method reentrancy limit in ACPICA for _Qxx
Lv Zheng [Wed, 3 Aug 2016 01:00:14 +0000 (09:00 +0800)]
ACPI / EC: Work around method reentrancy limit in ACPICA for _Qxx

A regression is caused by the following commit:

  Commit: 02b771b64b73226052d6e731a0987db3b47281e9
  Subject: ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations

In this commit, using system workqueue causes that the maximum parallel
executions of _Qxx can exceed 255. This violates the method reentrancy
limit in ACPICA and generates the following error log:

  ACPI Error: Method reached maximum reentrancy limit (255) (20150818/dsmethod-341)

This patch creates a seperate workqueue and limits the number of parallel
_Qxx evaluations down to a configurable value (can be tuned against number
of online CPUs).

Since EC events are handled after driver probe, we can create the workqueue
in acpi_ec_init().

Fixes: 02b771b64b73 (ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=135691
Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
Reported-and-tested-by: Helen Buus <ubuntu@hbuus.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
7 years agoperf tests bpf: Use SyS_epoll_wait alias
Arnaldo Carvalho de Melo [Tue, 2 Aug 2016 19:48:19 +0000 (16:48 -0300)]
perf tests bpf: Use SyS_epoll_wait alias

Something made the sys_epoll_wait() function alias not to be found in
the vmlinux DWARF info, being found only in /proc/kallsyms, which made
the BPF perf tests to fail:

  [root@jouet ~]# perf test BPF
  37: Test BPF filter                                          :
  37.1: Test basic BPF filtering                               : FAILED!
  37.2: Test BPF prologue generation                           : Skip
  37.3: Test BPF relocation checker                            : Skip
  [root@jouet ~]#

Using -v we can see it is failing to find DWARF info for the probed function,
sys_epoll_wait, which we can find in /proc/kallsyms but not in vmlinux with
CONFIG_DEBUG_INFO:

  [root@jouet ~]# grep -w sys_epoll_wait /proc/kallsyms
  ffffffffbd295b50 T sys_epoll_wait
  [root@jouet ~]#

  [root@jouet ~]# readelf -wi /lib/modules/4.7.0+/build/vmlinux | grep -w sys_epoll_wait
  [root@jouet ~]#

If we try to use perf probe:

[root@jouet ~]# perf probe sys_epoll_wait
Failed to find debug information for address ffffffffbd295b50
Probe point 'sys_epoll_wait' not found.
  Error: Failed to add events.
[root@jouet ~]#

It all works if we use SyS_epoll_wait, that is just an alias to the probed
function:

  [root@jouet ~]# grep -i sys_epoll_wait /proc/kallsyms
  ffffffffbd295b50 T SyS_epoll_wait
  ffffffffbd295b50 T sys_epoll_wait
  [root@jouet ~]#

So use it:

  [root@jouet ~]# perf test BPF
  37: Test BPF filter                                          :
  37.1: Test basic BPF filtering                               : Ok
  37.2: Test BPF prologue generation                           : Ok
  37.3: Test BPF relocation checker                            : Ok
  [root@jouet ~]#

Further info:

  [root@jouet ~]# gcc --version
  gcc (GCC) 6.1.1 20160621 (Red Hat 6.1.1-3)
  [acme@jouet linux]$ cat /etc/fedora-release
  Fedora release 24 (Twenty Four)

Investigation as to why it fails is still underway, but it was always
going from sys_epoll_wait to SyS_epoll_wait when looking up the DWARF
info in vmlinux, and this is what is breaking now.

Switching to use SyS_epoll_wait allows this test to proceed and test the
BPF code it was designed for, so lets have this in to allow passing this
test while we fix the root cause.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-7hekjp0bodwjbb419sl2b55h@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoshmem: Fix link error if huge pages support is disabled
Geert Uytterhoeven [Wed, 3 Aug 2016 17:58:19 +0000 (19:58 +0200)]
shmem: Fix link error if huge pages support is disabled

If CONFIG_TRANSPARENT_HUGE_PAGECACHE=n, HPAGE_PMD_NR evaluates to
BUILD_BUG_ON(), and may cause (e.g. with gcc 4.12):

    mm/built-in.o: In function `shmem_alloc_hugepage':
    shmem.c:(.text+0x17570): undefined reference to `__compiletime_assert_1365'

To fix this, move the assignment to hindex after the check for huge
pages support.

Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agohostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()
Dan Carpenter [Wed, 13 Jul 2016 10:12:34 +0000 (13:12 +0300)]
hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()

We can't pass error pointers to kfree() or it causes an oops.

Fixes: 52b209f7b848 ('get rid of hostfs_read_inode()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
7 years agoum: Support kcov
Vegard Nossum [Sat, 21 May 2016 15:46:10 +0000 (17:46 +0200)]
um: Support kcov

This adds support for kcov to UML.

There is a small problem where UML will randomly segfault during boot;
this is because current_thread_info() occasionally returns an invalid
(non-NULL) pointer and we try to dereference it in
__sanitizer_cov_trace_pc(). I consider this a bug in UML itself and this
patch merely exposes it.

[v2: disable instrumentation in UML-specific code]

Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: user-mode-linux-devel <user-mode-linux-devel@lists.sourceforge.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
7 years agoum: Enable TRACE_IRQFLAGS_SUPPORT
Richard Weinberger [Sun, 12 Jun 2016 20:04:54 +0000 (22:04 +0200)]
um: Enable TRACE_IRQFLAGS_SUPPORT

Now we have everything we need, so enable
TRACE_IRQFLAGS_SUPPORT.

Signed-off-by: Richard Weinberger <richard@nod.at>
7 years agoum: Use asm-generic/irqflags.h
Daniel Wagner [Sun, 12 Jun 2016 20:04:53 +0000 (22:04 +0200)]
um: Use asm-generic/irqflags.h

Instead proving its own arch_local_irq_save() and arch_irqs_disabled()
version use the generic version from asm-generic/irqflags.h.

A nice side effect is that um gets a few additional arch_ functions
as well.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
[rw: Massaged commit message]
Signed-off-by: Richard Weinberger <richard@nod.at>
7 years agoum: Fix possible deadlock in sig_handler_common()
Richard Weinberger [Sun, 12 Jun 2016 20:03:16 +0000 (22:03 +0200)]
um: Fix possible deadlock in sig_handler_common()

We are in atomic context and must not sleep.
Sleeping here is possible since malloc() maps
to kmalloc() with GFP_KERNEL.

Cc: stable@vger.kernel.org
Fixes: b6024b21 ("um: extend fpstate to _xstate to support YMM registers")
Signed-off-by: Richard Weinberger <richard@nod.at>
7 years agodrm: i915: fix build when DEBUG_FS is disabled
Linus Torvalds [Wed, 3 Aug 2016 22:11:24 +0000 (18:11 -0400)]
drm: i915: fix build when DEBUG_FS is disabled

This clearly had never gotten tested, probably because you need a fairly
minimal configuration in order to disable DEBUG_FS (several other
options select it).

The dummy inline functions that were used for the no-DEBUG_FS case were
missing the argument names in the declarations.

Fixes: 1dac891c1c95 ("drm/i915: Register debugfs interface last")
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoum: Select HAVE_DEBUG_KMEMLEAK
Richard Weinberger [Sun, 12 Jun 2016 19:56:43 +0000 (21:56 +0200)]
um: Select HAVE_DEBUG_KMEMLEAK

Now we have the infrastructure to support kmemleak.
Enable the HAVE flag.

Signed-off-by: Richard Weinberger <richard@nod.at>
7 years agoum: Setup physical memory in setup_arch()
Richard Weinberger [Sun, 12 Jun 2016 19:56:42 +0000 (21:56 +0200)]
um: Setup physical memory in setup_arch()

Currently UML sets up physical memory very early,
long before setup_arch() was called by the kernel main
function.
This can cause problems when code paths in UML's memory setup
code assume that the kernel is already running.
i.e. when kmemleak is enabled it will evaluate current()
in free_bootmem(). That early current() is undefined and
UML explodes.

Solve the problem by setting up physical memory in setup_arch(),
at this stage the kernel has materialized and basic infrastructure
such as current() works.

Signed-off-by: Richard Weinberger <richard@nod.at>
7 years agoum: Eliminate null test after alloc_bootmem
Amitoj Kaur Chawla [Sat, 28 May 2016 13:09:03 +0000 (18:39 +0530)]
um: Eliminate null test after alloc_bootmem

alloc_bootmem function never returns NULL. Thus a NULL test after a
call to this function is unnecessary.

The Coccinelle semantic patch used to make this change is follows:
@@
expression E;
statement S;
@@

E =
alloc_bootmem(...)
... when != E
- if (E == NULL) S

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
7 years agoDocumenation: update cgroup's document path
seokhoon.yoon [Tue, 2 Aug 2016 14:23:57 +0000 (23:23 +0900)]
Documenation: update cgroup's document path

cgroup's document path is changed to "cgroup-v1". update it.

Signed-off-by: seokhoon.yoon <iamyooon@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
7 years agoDocumentation/sphinx: do not warn about missing tools in 'make help'
Jani Nikula [Mon, 1 Aug 2016 09:37:05 +0000 (12:37 +0300)]
Documentation/sphinx: do not warn about missing tools in 'make help'

Simply move the dochelp rule outside of the HAVE_SPHINX check,
overriding the .DEFAULT rule for HAVE_SPHINX=0.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
7 years agoBtrfs: fix __MAX_CSUM_ITEMS
Chris Mason [Wed, 3 Aug 2016 21:05:46 +0000 (14:05 -0700)]
Btrfs: fix __MAX_CSUM_ITEMS

Jeff Mahoney's cleanup commit (14a1e067b4) wasn't correct for csums on
machines where the pagesize >= metadata blocksize.

This just reverts the relevant hunks to bring the old math back.

Signed-off-by: Chris Mason <clm@fb.com>
7 years agocachefiles: Fix race between inactivating and culling a cache object
David Howells [Wed, 3 Aug 2016 16:57:39 +0000 (17:57 +0100)]
cachefiles: Fix race between inactivating and culling a cache object

There's a race between cachefiles_mark_object_inactive() and
cachefiles_cull():

 (1) cachefiles_cull() can't delete a backing file until the cache object
     is marked inactive, but as soon as that's the case it's fair game.

 (2) cachefiles_mark_object_inactive() marks the object as being inactive
     and *only then* reads the i_blocks on the backing inode - but
     cachefiles_cull() might've managed to delete it by this point.

Fix this by making sure cachefiles_mark_object_inactive() gets any data it
needs from the backing inode before deactivating the object.

Without this, the following oops may occur:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
...
CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G          I    ------------   3.10.0-470.el7.x86_64 #1
Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
RSP: 0018:ffff8800b77c3d70  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
FS:  0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
 ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
 ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
Call Trace:
 [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
 [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
 [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
 [<ffffffff810a605b>] process_one_work+0x17b/0x470
 [<ffffffff810a6e96>] worker_thread+0x126/0x410
 [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
 [<ffffffff810ae64f>] kthread+0xcf/0xe0
 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff81695418>] ret_from_fork+0x58/0x90
 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140

The oopsing code shows:

callq  0xffffffff810af6a0 <wake_up_bit>
mov    0xf8(%r12),%rax
mov    0x30(%rax),%rax
mov    0x98(%rax),%rax   <---- oops here
lock add %rax,0x130(%rbx)

where this is:

d_backing_inode(object->dentry)->i_blocks

Fixes: a5b3a80b899bda0f456f1246c4c5a1191ea01519 (CacheFiles: Provide read-and-reset release counters for cachefilesd)
Reported-by: Jianhong Yin <jiyin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
7 years agoMerge branch 'for-viro' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi...
Al Viro [Wed, 3 Aug 2016 17:31:51 +0000 (13:31 -0400)]
Merge branch 'for-viro' of git://git./linux/kernel/git/mszeredi/vfs into for-linus

7 years agoMerge tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 3 Aug 2016 16:50:06 +0000 (12:50 -0400)]
Merge tag 'trace-v4.8-1' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "A few updates and fixes:

   - move the suppressing of the __builtin_return_address >0 warning to
     the tracing directory only.

   - metag recordmcount fix for newer glibc's

   - two tracing histogram fixes that were reported by KASAN"

* tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix use-after-free in hist_register_trigger()
  tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
  Makefile: Mute warning for __builtin_return_address(>0) for tracing only
  ftrace/recordmcount: Work around for addition of metag magic but not relocations

7 years agofs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2
Geert Uytterhoeven [Wed, 3 Aug 2016 15:33:32 +0000 (17:33 +0200)]
fs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2

With gcc < 4.2 (e.g. 4.1.2):

      CC      fs/proc/task_mmu.o
    cc1: error: unrecognized command line option "-Wno-override-init"

To fix this, only enable the compiler option when it is actually
supported by the compiler.

Fixes: ca52953f5f24 ("fs/proc/task_mmu.c: suppress compilation warnings with W=1")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodm raid: constructor fails on non-zero incompat_features
Heinz Mauelshagen [Wed, 3 Aug 2016 15:47:04 +0000 (17:47 +0200)]
dm raid: constructor fails on non-zero incompat_features

When lvm2 userspace requests a RaidLV repair, it sets the rebuild
constructor flag on the new replacement DataLVs but does not clear the
respective MetaLVs.  Hence the superblock that is loaded from such new
MetaLVs may have a non-zero incompat_features member and the constructor
will fail with false-positive on incompat_features.

Solve by initializing the incompat_features member properly.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
7 years ago9p: use clone_fid()
Al Viro [Wed, 3 Aug 2016 15:12:12 +0000 (11:12 -0400)]
9p: use clone_fid()

in a bunch of places it cleans the things up

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
7 years ago9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
Al Viro [Wed, 3 Aug 2016 15:02:48 +0000 (11:02 -0400)]
9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"

In v9fs_vfs_rename() we need to clone the parents' fids, not just
find them.

Spotted-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
7 years agodm raid: fix processing of max_recovery_rate constructor flag
Heinz Mauelshagen [Wed, 27 Jul 2016 21:34:01 +0000 (23:34 +0200)]
dm raid: fix processing of max_recovery_rate constructor flag

__CTR_FLAG_MIN_RECOVERY_RATE was used instead of __CTR_FLAG_MAX_RECOVERY_RATE
thus causing max_recovery_rate to be rejected in case min_recovery_rate
was already set.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
7 years agoALSA: hda: Fix krealloc() with __GFP_ZERO usage
Takashi Iwai [Wed, 3 Aug 2016 13:13:00 +0000 (15:13 +0200)]
ALSA: hda: Fix krealloc() with __GFP_ZERO usage

krealloc() doesn't work always properly with __GFP_ZERO flag as
expected.  For clearing the reallocated area, we need to clear
explicitly instead.

Reported-by: Joe Perches <joe@perches.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>