cascardo/linux.git
7 years agoMerge tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 3 Aug 2016 16:50:06 +0000 (12:50 -0400)]
Merge tag 'trace-v4.8-1' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "A few updates and fixes:

   - move the suppressing of the __builtin_return_address >0 warning to
     the tracing directory only.

   - metag recordmcount fix for newer glibc's

   - two tracing histogram fixes that were reported by KASAN"

* tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix use-after-free in hist_register_trigger()
  tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
  Makefile: Mute warning for __builtin_return_address(>0) for tracing only
  ftrace/recordmcount: Work around for addition of metag magic but not relocations

7 years agofs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2
Geert Uytterhoeven [Wed, 3 Aug 2016 15:33:32 +0000 (17:33 +0200)]
fs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2

With gcc < 4.2 (e.g. 4.1.2):

      CC      fs/proc/task_mmu.o
    cc1: error: unrecognized command line option "-Wno-override-init"

To fix this, only enable the compiler option when it is actually
supported by the compiler.

Fixes: ca52953f5f24 ("fs/proc/task_mmu.c: suppress compilation warnings with W=1")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodm raid: constructor fails on non-zero incompat_features
Heinz Mauelshagen [Wed, 3 Aug 2016 15:47:04 +0000 (17:47 +0200)]
dm raid: constructor fails on non-zero incompat_features

When lvm2 userspace requests a RaidLV repair, it sets the rebuild
constructor flag on the new replacement DataLVs but does not clear the
respective MetaLVs.  Hence the superblock that is loaded from such new
MetaLVs may have a non-zero incompat_features member and the constructor
will fail with false-positive on incompat_features.

Solve by initializing the incompat_features member properly.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
7 years ago9p: use clone_fid()
Al Viro [Wed, 3 Aug 2016 15:12:12 +0000 (11:12 -0400)]
9p: use clone_fid()

in a bunch of places it cleans the things up

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
7 years ago9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
Al Viro [Wed, 3 Aug 2016 15:02:48 +0000 (11:02 -0400)]
9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"

In v9fs_vfs_rename() we need to clone the parents' fids, not just
find them.

Spotted-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
7 years agodm raid: fix processing of max_recovery_rate constructor flag
Heinz Mauelshagen [Wed, 27 Jul 2016 21:34:01 +0000 (23:34 +0200)]
dm raid: fix processing of max_recovery_rate constructor flag

__CTR_FLAG_MIN_RECOVERY_RATE was used instead of __CTR_FLAG_MAX_RECOVERY_RATE
thus causing max_recovery_rate to be rejected in case min_recovery_rate
was already set.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
7 years agoALSA: hda: Fix krealloc() with __GFP_ZERO usage
Takashi Iwai [Wed, 3 Aug 2016 13:13:00 +0000 (15:13 +0200)]
ALSA: hda: Fix krealloc() with __GFP_ZERO usage

krealloc() doesn't work always properly with __GFP_ZERO flag as
expected.  For clearing the reallocated area, we need to clear
explicitly instead.

Reported-by: Joe Perches <joe@perches.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 years agovfs: make dentry_needs_remove_privs() internal
Miklos Szeredi [Wed, 3 Aug 2016 11:44:27 +0000 (13:44 +0200)]
vfs: make dentry_needs_remove_privs() internal

Only used by the vfs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
7 years agovfs: remove file_needs_remove_privs()
Miklos Szeredi [Wed, 3 Aug 2016 11:44:27 +0000 (13:44 +0200)]
vfs: remove file_needs_remove_privs()

This function is now unused.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
7 years agovfs: fix deadlock in file_remove_privs() on overlayfs
Miklos Szeredi [Wed, 3 Aug 2016 11:44:27 +0000 (13:44 +0200)]
vfs: fix deadlock in file_remove_privs() on overlayfs

file_remove_privs() is called with inode lock on file_inode(), which
proceeds to calling notify_change() on file->f_path.dentry.  Which triggers
the WARN_ON_ONCE(!inode_is_locked(inode)) in addition to deadlocking later
when ovl_setattr tries to lock the underlying inode again.

Fix this mess by not mixing the layers, but doing everything on underlying
dentry/inode.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 07a2daab49c5 ("ovl: Copy up underlying inode's ->i_mode to overlay inode")
Cc: <stable@vger.kernel.org>
7 years agoALSA: hda: add AMD Bonaire AZ PCI ID with proper driver caps
Maruthi Srinivas Bayyavarapu [Wed, 3 Aug 2016 11:16:39 +0000 (16:46 +0530)]
ALSA: hda: add AMD Bonaire AZ PCI ID with proper driver caps

This commit fixes garbled audio on Bonaire HDMI

Signed-off-by: Maruthi Bayyavarapu <maruthi.bayyavarapu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 3 Aug 2016 11:26:11 +0000 (07:26 -0400)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix several cases of missing of_node_put() calls in various
    networking drivers.  From Peter Chen.

 2) Don't try to remove unconfigured VLANs in qed driver, from Yuval
    Mintz.

 3) Unbalanced locking in TIPC error handling, from Wei Yongjun.

 4) Fix lockups in CPDMA driver, from Grygorii Strashko.

 5) More MACSEC refcount et al fixes, from Sabrina Dubroca.

 6) Fix MAC address setting in r8169 during runtime suspend, from
    Chun-Hao Lin.

 7) Various printf format specifier fixes, from Heinrich Schuchardt.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
  qed: Fail driver load in 100g MSI mode.
  ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle
  ethernet: stmicro: stmmac: add missing of_node_put after calling of_parse_phandle
  ethernet: stmicro: stmmac: dwmac-socfpga: add missing of_node_put after calling of_parse_phandle
  ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle
  ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle
  ethernet: marvell: pxa168_eth: add missing of_node_put after calling of_parse_phandle
  ethernet: marvell: mvpp2: add missing of_node_put after calling of_parse_phandle
  ethernet: marvell: mvneta: add missing of_node_put after calling of_parse_phandle
  ethernet: hisilicon: hns: hns_dsaf_main: add missing of_node_put after calling of_parse_phandle
  ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle
  ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle
  ethernet: aurora: nb8800: add missing of_node_put after calling of_parse_phandle
  ethernet: arc: emac_main: add missing of_node_put after calling of_parse_phandle
  ethernet: apm: xgene: add missing of_node_put after calling of_parse_phandle
  ethernet: altera: add missing of_node_put
  8139too: fix system hang when there is a tx timeout event.
  qed: Fix error return code in qed_resc_alloc()
  net: qlcnic: avoid superfluous assignement
  dsa: b53: remove redundant if
  ...

7 years agoMerge branch '4.7-fixes' into mips-for-linux-next
Ralf Baechle [Wed, 3 Aug 2016 10:55:49 +0000 (12:55 +0200)]
Merge branch '4.7-fixes' into mips-for-linux-next

7 years agoInput: add driver for SiS 9200 family I2C touchscreen controllers
Mika Penttilä [Wed, 3 Aug 2016 06:52:32 +0000 (23:52 -0700)]
Input: add driver for SiS 9200 family I2C touchscreen controllers

This is a driver for SiS 9200 family touchscreen controllers using I2C bus.

Signed-off-by: Mika Penttilä <mika.penttila@nextfour.com>
Acked-by: Tammy Tseng <tammy_tseng@sis.com>
Acked-by: Yuger Yu <yuger_yu@sis.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
7 years agoMIPS: mm: Fix definition of R6 cache instruction
Matt Redfearn [Tue, 14 Jun 2016 13:59:38 +0000 (14:59 +0100)]
MIPS: mm: Fix definition of R6 cache instruction

Commit a168b8f1cde6 ("MIPS: mm: Add MIPS R6 instruction encodings") added
an incorrect definition of the redefined MIPSr6 cache instruction.

Executing any kernel code including this instuction results in a
reserved instruction exception and kernel panic.

Fix the instruction definition.

Fixes: a168b8f1cde6588ff7a67699fa11e01bc77a5ddd
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: <stable@vger.kernel.org> # 4.x-
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13663/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
7 years agoMIPS: tools: Fix relocs tool compiler warnings
Harvey Hunt [Thu, 16 Jun 2016 15:35:39 +0000 (16:35 +0100)]
MIPS: tools: Fix relocs tool compiler warnings

When using clang as HOSTCC, the following warnings appear:

In file included from arch/mips/boot/tools/relocs_64.c:27:0:
arch/mips/boot/tools/relocs.c: In function ‘read_relocs’:
arch/mips/boot/tools/relocs.c:397:4: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    ELF_R_SYM(rel->r_info) = elf32_to_cpu(ELF_R_SYM(rel->r_info));
    ^~~~~~~~~
arch/mips/boot/tools/relocs.c:397:4: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
arch/mips/boot/tools/relocs.c: In function ‘walk_relocs’:
arch/mips/boot/tools/relocs.c:491:4: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    Elf_Sym *sym = &sh_symtab[ELF_R_SYM(rel->r_info)];
    ^~~~~~~
arch/mips/boot/tools/relocs.c: In function ‘do_reloc’:
arch/mips/boot/tools/relocs.c:502:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
  unsigned r_type = ELF_R_TYPE(rel->r_info);
  ^~~~~~~~
arch/mips/boot/tools/relocs.c: In function ‘do_reloc_info’:
arch/mips/boot/tools/relocs.c:641:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   rel_type(ELF_R_TYPE(rel->r_info)),
   ^~~~~~~~

Fix them by making Elf64_Mips_Rela a union

Signed-off-by: Harvey Hunt <harvey.hunt@imgtec.com>
Acked-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13683/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
7 years agoInput: ili210x - fix permissions on "calibrate" attribute
Dmitry Torokhov [Tue, 2 Aug 2016 17:31:43 +0000 (10:31 -0700)]
Input: ili210x - fix permissions on "calibrate" attribute

"calibrate" attribute does not provide "show" methods and thus we should
not mark it as readable.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
7 years agoInput: elan_i2c - properly wake up touchpad on ASUS laptops
KT Liao [Wed, 13 Jul 2016 18:12:12 +0000 (11:12 -0700)]
Input: elan_i2c - properly wake up touchpad on ASUS laptops

Some ASUS laptops were shipped with touchpads that require to be woken up
first, before trying to switch them into absolute reporting mode, otherwise
touchpad would fail to work while flooding the logs with:

elan_i2c i2c-ELAN1000:00: invalid report id data (1)

Among affected devices are Asus E202SA, N552VW, X456UF, UX305CA, and
others. We detect such devices by checking the IC type and product ID
numbers and adjusting order of operations accordingly.

Signed-off-by: KT Liao <kt.liao@emc.com.tw>
Reported-by: Chris Chiu <chiu@endlessm.com>
Reported-by: Vlad Glagolev <stealth@vaygr.net>
Tested-by: Vlad Glagolev <stealth@vaygr.net>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
7 years agoInput: add driver for Silead touchscreens
Robert Dolca [Thu, 28 Jul 2016 21:28:46 +0000 (14:28 -0700)]
Input: add driver for Silead touchscreens

This driver adds support for Silead touchscreens. It has been tested
with GSL1680 and GSL3680 touch panels.

It supports ACPI and device tree enumeration. Screen resolution,
the maximum number of fingers supported and firmware name are
configurable.

Signed-off-by: Robert Dolca <robert.dolca@intel.com>
Signed-off-by: Daniel Jansen <djaniboe@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
7 years agoInput: elantech - fix debug dump of the current packet
Benjamin Tissoires [Thu, 28 Jul 2016 18:22:13 +0000 (11:22 -0700)]
Input: elantech - fix debug dump of the current packet

The use of mixed psmouse_printk() and printk creates 2 lines in the log,
while the use of %*ph solves everything.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
7 years agoMIPS: Cobalt: Fix typo
Andrea Gelmini [Sat, 21 May 2016 11:59:39 +0000 (13:59 +0200)]
MIPS: Cobalt: Fix typo

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Cc: trivial@kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13316/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
7 years agoMIPS: Octeon: Fix typo
Andrea Gelmini [Sat, 21 May 2016 11:59:32 +0000 (13:59 +0200)]
MIPS: Octeon: Fix typo

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Cc: trivial@kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13315/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
7 years agoMIPS: Lantiq: Fix build failure
Sudip Mukherjee [Thu, 16 Jun 2016 20:46:08 +0000 (21:46 +0100)]
MIPS: Lantiq: Fix build failure

Some configs of mips like xway_defconffig are failing with the error:
arch/mips/lantiq/irq.c:209:2: error: initialization from incompatible
pointer type [-Werror]
  "icu",
  ^
arch/mips/lantiq/irq.c:209:2: error: (near initialization for
'ltq_irq_type.parent_device') [-Werror]
arch/mips/lantiq/irq.c:219:2: error: initialization from incompatible
pointer type [-Werror]
  "eiu",
  ^
arch/mips/lantiq/irq.c:219:2: error: (near initialization for
'ltq_eiu_type.parent_device') [-Werror]

The first member of the "struct irq" is no longer a pointer for the
name.

Fixes: be45beb2df69 ("genirq: Add runtime power management support for IRQ chips")
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Acked-by: John Crispin <john@phrozen.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13684/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
7 years agopowerpc/32: Fix early access to cpu_spec relocation
Benjamin Herrenschmidt [Tue, 2 Aug 2016 05:53:01 +0000 (15:53 +1000)]
powerpc/32: Fix early access to cpu_spec relocation

Commit 9402c6846131 ("powerpc: Factor do_feature_fixup calls")
introduced a subtle bug on 32-bit. When reading the cpu spec from the
global, we not only need to do a pointer relocation on the global
address but also on the pointer we read from it.

This fixes crashes reported on MPC5200 based machines.

Fixes: 9402c6846131 ("powerpc: Factor do_feature_fixup calls")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
7 years agoIB/hfi1: Add cache evict LRU list
Dean Luick [Thu, 28 Jul 2016 19:21:27 +0000 (15:21 -0400)]
IB/hfi1: Add cache evict LRU list

The original code used a LRU list to evict nodes which were least
recently used.  For correctness the evict code was moved under the
handler->lock, now add back the LRU list.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix memory leak during unexpected shutdown
Ira Weiny [Thu, 28 Jul 2016 19:21:26 +0000 (15:21 -0400)]
IB/hfi1: Fix memory leak during unexpected shutdown

During an unexpected shutdown, references to tid_rb_node were NULL'ed out
without properly being released.

Fix this by calling clear_tid_node in the mmu notifier remove callback
rather than after these callbacks are called.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove unneeded mm argument in remove function
Dean Luick [Thu, 28 Jul 2016 19:21:25 +0000 (15:21 -0400)]
IB/hfi1: Remove unneeded mm argument in remove function

The reworked mmu_rb interface allows the unused mm argument to be removed.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Consistently call ops->remove outside spinlock
Dean Luick [Thu, 28 Jul 2016 19:21:24 +0000 (15:21 -0400)]
IB/hfi1: Consistently call ops->remove outside spinlock

The ops->remove() callback was called by hfi1_mmu_unregister() with a
NULL mm argument while holding a spinlock.  In the case of sdma_rb_remove()
this caused it to pass current->mm to hfi1_release_user_pages()

This had 2 problems.  First this would attempt to acquire the mmap_sem
under a spin lock.  Second the use of current->mm is not always guaranteed
to be the proper mm when the fd is being closed.

Rather than depend on this implicit behavior we move all calls to
ops->remove outside of the spinlock.  This also allows the correct
mm to be used in the remove callback without fear of deadlock.

Because the MMU notifier is not guaranteed to hold mm->mmap_sem, but
usually does, we must delay all remove callbacks until out of the notifier,
when the callbacks can take the mmap_sem if they need to.

Code comments were added to clarify what the expectations are for the
users of the mmu rb tree.

Suggested-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Use evict mmu rb operation
Dean Luick [Thu, 28 Jul 2016 19:21:23 +0000 (15:21 -0400)]
IB/hfi1: Use evict mmu rb operation

Use the new cache evict operation in the SDMA code.  This allows the cache
to properly coordinate evicts and removes, preventing any race.  With this
change, the separate list, lock, and race flag are not needed.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Add evict operation to the mmu rb handler
Dean Luick [Thu, 28 Jul 2016 19:21:22 +0000 (15:21 -0400)]
IB/hfi1: Add evict operation to the mmu rb handler

Allow users to clear nodes from the rb tree based on their evict callback.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix TID caching actions
Dean Luick [Thu, 28 Jul 2016 19:21:21 +0000 (15:21 -0400)]
IB/hfi1: Fix TID caching actions

Per file descriptor TID caching actions depend on a global that can
change midway through the lifetime of that file descriptor.

Make the use of caching consistent for the life of the file descriptor
by using the presence of the cache handler to decide when to use the cache
functions.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Make the cache handler own its rb tree root
Dean Luick [Thu, 28 Jul 2016 19:21:20 +0000 (15:21 -0400)]
IB/hfi1: Make the cache handler own its rb tree root

The objects which use cache handling should reference their own handler
object not the internal data structure it uses to track the nodes.

Have the "users" of the mmu notifier code pass opaque objects which can
then be properly used in the mmu callbacks depending on the owners needs.

This patch has the additional benefit that operations no longer require a
look up in a list to find the handlers.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Make use of mm consistent
Ira Weiny [Thu, 28 Jul 2016 19:21:19 +0000 (15:21 -0400)]
IB/hfi1: Make use of mm consistent

The hfi1 driver registers a mmu_notifier callback when /dev/hfi1_* is
opened, and unregisters it when the device is closed.  The driver
incorrectly assumes that the close will always happen from the same
context as the open.  In particular, closes due to SIGKILL or OOM killer
activity may happen from a different context.  In these cases, the wrong
mm is passed to mmu_notifier_unregister(), which causes improper reference
counting for the victim mm, and eventual memory corruption.

Preserve the mm for all open file descriptors and use this mm rather than
current->mm for memory operations for the lifetime of that fd.  Note: this
patch leaves 1 use of current->mm in place.  This use is removed in a
follow on patch because other functional changes were required prior to
that use being removed.

If registration fails, there is no reason to keep the handler object
around.  Free the handler object rather than add it to the list to
prevent any mmu_notifier operations, including unregister, when
registration fails.

Suggested-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix user SDMA racy user request claim
Dean Luick [Thu, 28 Jul 2016 19:21:18 +0000 (15:21 -0400)]
IB/hfi1: Fix user SDMA racy user request claim

The user SDMA in-use claim bit is in the structure that gets zeroed out
once the claim is made.  Move the request in-use flag into its own bit
array and use that for atomic claims.  This cleans up the claim code and
removes any race possibility.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix error condition that needs to clean up
Dean Luick [Thu, 28 Jul 2016 19:21:17 +0000 (15:21 -0400)]
IB/hfi1: Fix error condition that needs to clean up

If input validation fails, properly free the request before returning.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Release node on insert failure
Dean Luick [Thu, 28 Jul 2016 19:21:16 +0000 (15:21 -0400)]
IB/hfi1: Release node on insert failure

If unable to insert node into the RB tree cache, node will be freed
before returning from the function.  Null out iovec's pointer to node
so iovec does not try to free it later.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Validate SDMA user iovector count
Dean Luick [Thu, 28 Jul 2016 19:21:15 +0000 (15:21 -0400)]
IB/hfi1: Validate SDMA user iovector count

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Validate SDMA user request index
Dean Luick [Thu, 28 Jul 2016 19:21:14 +0000 (15:21 -0400)]
IB/hfi1: Validate SDMA user request index

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Use the same capability state for all shared contexts
Dean Luick [Thu, 28 Jul 2016 19:21:13 +0000 (15:21 -0400)]
IB/hfi1: Use the same capability state for all shared contexts

Save the current capability state at user context creation
time.  Report this saved value for all shared contexts.

Also get rid of unnecessary hfi1_get_base_kinfo function.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Prevent null pointer dereference
Ira Weiny [Thu, 28 Jul 2016 19:21:12 +0000 (15:21 -0400)]
IB/hfi1: Prevent null pointer dereference

If a context has not been assigned or assignment failed, pq may be NULL.
Move the unregister within the protection of the null check.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Rename TID mmu_rb_* functions
Dean Luick [Thu, 28 Jul 2016 16:27:37 +0000 (12:27 -0400)]
IB/hfi1: Rename TID mmu_rb_* functions

Clarify the names of the TID mmu functions.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
Dean Luick [Thu, 28 Jul 2016 16:27:36 +0000 (12:27 -0400)]
IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()

Checking if the rb tree is empty is redundant with the while loop which is
emptying the rb tree.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Restructure hfi1_file_open
Ira Weiny [Thu, 28 Jul 2016 16:27:35 +0000 (12:27 -0400)]
IB/hfi1: Restructure hfi1_file_open

Rearrange the file open call in prep for new changes.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Make iovec loop index easy to understand
Dean Luick [Thu, 28 Jul 2016 16:27:34 +0000 (12:27 -0400)]
IB/hfi1: Make iovec loop index easy to understand

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Use "false" not 0
Ira Weiny [Thu, 28 Jul 2016 16:27:33 +0000 (12:27 -0400)]
IB/hfi1: Use "false" not 0

For bool parameters "false" should be used

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove unused sub-context parameter
Ira Weiny [Thu, 28 Jul 2016 16:27:32 +0000 (12:27 -0400)]
IB/hfi1: Remove unused sub-context parameter

subctxt is not used, just remove it.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Consolidate __mmu_rb_remove and hfi1_mmu_rb_remove
Ira Weiny [Thu, 28 Jul 2016 16:27:31 +0000 (12:27 -0400)]
IB/hfi1: Consolidate __mmu_rb_remove and hfi1_mmu_rb_remove

__mmu_rb_remove was called in only 1 place which was a very simple
call site.  Combine this function into its caller.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Always expect ops functions
Dean Luick [Thu, 28 Jul 2016 16:27:30 +0000 (12:27 -0400)]
IB/hfi1: Always expect ops functions

Remove, insert, and invalidate are always provided.  No
need to test.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Add parameter names to callback declarations
Ira Weiny [Thu, 28 Jul 2016 16:27:29 +0000 (12:27 -0400)]
IB/hfi1: Add parameter names to callback declarations

This makes it more clear what these functions are
operating on.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Add parameter names to function declarations
Ira Weiny [Thu, 28 Jul 2016 16:27:28 +0000 (12:27 -0400)]
IB/hfi1: Add parameter names to function declarations

Parameter names to function declarations make it more clear
what those parameters do.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove unused function hfi1_mmu_rb_search
Dean Luick [Thu, 28 Jul 2016 16:27:27 +0000 (12:27 -0400)]
IB/hfi1: Remove unused function hfi1_mmu_rb_search

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove unused uctxt->subpid and uctxt->pid
Dean Luick [Thu, 28 Jul 2016 16:27:26 +0000 (12:27 -0400)]
IB/hfi1: Remove unused uctxt->subpid and uctxt->pid

These are no longer needed.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix minor format error
Ira Weiny [Thu, 28 Jul 2016 16:27:25 +0000 (12:27 -0400)]
IB/hfi1: Fix minor format error

Brackets should be on the next line of a function

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Expand reported serial number
Ira Weiny [Thu, 28 Jul 2016 01:09:40 +0000 (21:09 -0400)]
IB/hfi1: Expand reported serial number

Expand the serial number space by using more bits
from the GUID.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Allow for non-double word multiple message sizes for user SDMA
Ira Weiny [Thu, 28 Jul 2016 01:08:42 +0000 (21:08 -0400)]
IB/hfi1: Allow for non-double word multiple message sizes for user SDMA

The driver pads non-double word multiple message sizes but it doesn't
account for this padding when the packet length is calculated. Also, the
data length is miscalculated for message sizes less than 4 bytes due to
the bit representation in LRH. And there's a check for non-double word
multiple message sizes that prevents these messages from being sent.
This patch fixes length miscalculations and enables the functionality to
send non-double word multiple message sizes.

Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Eliminate redundant opcode test in mr ref clear
Ira Weiny [Thu, 28 Jul 2016 01:07:36 +0000 (21:07 -0400)]
IB/rdmavt: Eliminate redundant opcode test in mr ref clear

The use of the specific opcode test is redundant since
all ack entry users correctly manipulate the mr pointer
to selectively trigger the reference clearing.

The overly specific test hinders the use of implementation
specific operations.

The change needs to get rid of the union to insure that
an atomic value is not seen as an MR pointer.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Handle kzalloc failure in init_pervl_scs
Ira Weiny [Thu, 28 Jul 2016 01:06:15 +0000 (21:06 -0400)]
IB/hfi1: Handle kzalloc failure in init_pervl_scs

Checking the return value of the memory allocation call in
init_pervl_scs() was missed.  Recently the kmalloc() was changed to
kzalloc() which identified the problem.

While fixing this issue 2 other bugs were noticed.  First, the array
being allocated is accessed in the nomem path which can be reached before
it is allocated.  Second, kernel_send_context was not released on error.
Fix both of these by creating a more common memory unwind label structure.

Fixes: 35f6befc8441 ("staging/rdma/hfi1: Add qp to send context mapping for PIO")
Reported-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoxfs: move (and rename) the deferred bmap-free tracepoints
Darrick J. Wong [Wed, 3 Aug 2016 02:31:07 +0000 (12:31 +1000)]
xfs: move (and rename) the deferred bmap-free tracepoints

Rename the deferred bmap-free to extent_free and make them only
trigger when we're really running deferred ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: collapse single use static functions
Darrick J. Wong [Wed, 3 Aug 2016 02:30:31 +0000 (12:30 +1000)]
xfs: collapse single use static functions

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: remove unnecessary parentheses from log redo item recovery functions
Darrick J. Wong [Wed, 3 Aug 2016 02:29:32 +0000 (12:29 +1000)]
xfs: remove unnecessary parentheses from log redo item recovery functions

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: remove the extents array from the rmap update done log item
Darrick J. Wong [Wed, 3 Aug 2016 02:28:43 +0000 (12:28 +1000)]
xfs: remove the extents array from the rmap update done log item

Nothing ever uses the extent array in the rmap update done redo
item, so remove it before it is fixed in the on-disk log format.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: in btree_lshift, only allocate temporary cursor when needed
Darrick J. Wong [Wed, 3 Aug 2016 02:26:22 +0000 (12:26 +1000)]
xfs: in btree_lshift, only allocate temporary cursor when needed

We only need the temporary cursor in _btree_lshift if we're shifting
in an overlapped btree.  Therefore, factor that into a single block
of code so we avoid unnecessary cursor duplication.

Also fix use of the wrong cursor when checking for corruption in
xfs_btree_rshift().

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: remove unnecesary lshift/rshift key initialization
Darrick J. Wong [Wed, 3 Aug 2016 02:22:45 +0000 (12:22 +1000)]
xfs: remove unnecesary lshift/rshift key initialization

In the lshift/rshift functions we don't use the key variable for
anything now, so remove the variable and its initializer.  The
update_keys functions figure out the key for a block on their own.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: remove the get*keys and update_keys btree ops pointers
Darrick J. Wong [Wed, 3 Aug 2016 02:22:12 +0000 (12:22 +1000)]
xfs: remove the get*keys and update_keys btree ops pointers

These are internal btree functions; we don't need them to be
dispatched via function pointers.  Make them static again and
just check the overlapped flag to figure out what we need to
do.  The strategy behind this patch was suggested by Christoph.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: enable the rmap btree functionality
Darrick J. Wong [Wed, 3 Aug 2016 02:20:57 +0000 (12:20 +1000)]
xfs: enable the rmap btree functionality

Originally-From: Dave Chinner <dchinner@redhat.com>

Add the feature flag to the supported matrix so that the kernel can
mount and use rmap btree enabled filesystems

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: move the experimental tag]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: don't update rmapbt when fixing agfl
Darrick J. Wong [Wed, 3 Aug 2016 02:19:53 +0000 (12:19 +1000)]
xfs: don't update rmapbt when fixing agfl

Allow a caller of xfs_alloc_fix_freelist to disable rmapbt updates
when fixing the AG freelist.  xfs_repair needs this during phase 5
to be able to adjust the freelist while it's reconstructing the rmap
btree; the missing entries will be added back at the very end of
phase 5 once the AGFL contents settle down.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
Darrick J. Wong [Wed, 3 Aug 2016 02:18:07 +0000 (12:18 +1000)]
xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled

Swapping extents between two inodes requires the owner to be updated
in the rmap tree for all the extents that are swapped. This code
does not yet exist, so switch off the XFS_IOC_SWAPEXT ioctl until
support has been implemented. This will need to be done before the
rmap btree code can have the experimental tag removed.

This functionality will be provided in a (much) later patch, using
some of the reflink deferred block remapping functionality to
accomlish extent swapping with rmap updates.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add rmap btree block detection to log recovery
Darrick J. Wong [Wed, 3 Aug 2016 02:17:11 +0000 (12:17 +1000)]
xfs: add rmap btree block detection to log recovery

Originally-From: Dave Chinner <dchinner@redhat.com>

So such blocks can be correctly identified and have their operations
structures attached to validate recovery has not resulted in a
correct block.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add rmap btree geometry feature flag
Darrick J. Wong [Wed, 3 Aug 2016 02:16:44 +0000 (12:16 +1000)]
xfs: add rmap btree geometry feature flag

Originally-From: Dave Chinner <dchinner@redhat.com>

So xfs_info and other userspace utilities know the filesystem is
using this feature.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: propagate bmap updates to rmapbt
Darrick J. Wong [Wed, 3 Aug 2016 02:16:05 +0000 (12:16 +1000)]
xfs: propagate bmap updates to rmapbt

When we map, unmap, or convert an extent in a file's data or attr
fork, schedule a respective update in the rmapbt.  Previous versions
of this patch required a 1:1 correspondence between bmap and rmap,
but this is no longer true as we now have ability to make interval
queries against the rmapbt.

We use the deferred operations code to handle redo operations
atomically and deadlock free.  This plumbs in all five rmap actions
(map, unmap, convert extent, alloc, free); we'll use the first three
now for file data, and reflink will want the last two.  We also add
an error injection site to test log recovery.

Finally, we need to fix the bmap shift extent code to adjust the
rmaps correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: enable the xfs_defer mechanism to process rmaps to update
Darrick J. Wong [Wed, 3 Aug 2016 02:11:01 +0000 (12:11 +1000)]
xfs: enable the xfs_defer mechanism to process rmaps to update

Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred rmap updates.  We'll wire up the existing code to
our new deferred mechanism later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: log rmap intent items
Darrick J. Wong [Wed, 3 Aug 2016 02:09:48 +0000 (12:09 +1000)]
xfs: log rmap intent items

Provide a mechanism for higher levels to create RUI/RUD items, submit
them to the log, and a stub function to deal with recovered RUI items.
These parts will be connected to the rmapbt in a later patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: create rmap update intent log items
Darrick J. Wong [Wed, 3 Aug 2016 02:04:45 +0000 (12:04 +1000)]
xfs: create rmap update intent log items

Create rmap update intent/done log items to record redo information in
the log.  Because we need to roll transactions between updating the
bmbt mapping and updating the reverse mapping, we also have to track
the status of the metadata updates that will be recorded in the
post-roll transactions, just in case we crash before committing the
final transaction.  This mechanism enables log recovery to finish what
was already started.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add rmap btree insert and delete helpers
Darrick J. Wong [Wed, 3 Aug 2016 02:03:58 +0000 (12:03 +1000)]
xfs: add rmap btree insert and delete helpers

Add a couple of helper functions to encapsulate rmap btree insert and
delete operations.  Add tracepoints to the update function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: convert unwritten status of reverse mappings
Darrick J. Wong [Wed, 3 Aug 2016 02:03:19 +0000 (12:03 +1000)]
xfs: convert unwritten status of reverse mappings

Provide a function to convert an unwritten rmap extent to a real one
and vice versa.

[ dchinner: Note that this algorithm and code was derived from the
  existing bmapbt unwritten extent conversion code in
  xfs_bmap_add_extent_unwritten_real(). ]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: remove an extent from the rmap btree
Darrick J. Wong [Wed, 3 Aug 2016 01:45:12 +0000 (11:45 +1000)]
xfs: remove an extent from the rmap btree

Originally-From: Dave Chinner <dchinner@redhat.com>

Now that we have records in the rmap btree, we need to remove them
when extents are freed. This needs to find the relevant record in
the btree and remove/trim/split it accordingly.

[darrick.wong@oracle.com: make rmap routines handle the enlarged keyspace]
[dchinner: remove remaining unused debug printks]
[darrick: fix a bug when growfs in an AG with an rmap ending at EOFS]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add an extent to the rmap btree
Darrick J. Wong [Wed, 3 Aug 2016 01:44:21 +0000 (11:44 +1000)]
xfs: add an extent to the rmap btree

Originally-From: Dave Chinner <dchinner@redhat.com>

Now all the btree, free space and transaction infrastructure is in
place, we can finally add the code to insert reverse mappings to the
rmap btree. Freeing will be done in a separate patch, so just the
addition operation can be focussed on here.

[darrick: handle owner offsets when adding rmaps]
[dchinner: remove remaining debug printk statements]
[darrick: move unwritten bit to rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add tracepoints for the rmap functions
Darrick J. Wong [Wed, 3 Aug 2016 01:43:24 +0000 (11:43 +1000)]
xfs: add tracepoints for the rmap functions

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: teach rmapbt to support interval queries
Darrick J. Wong [Wed, 3 Aug 2016 01:42:39 +0000 (11:42 +1000)]
xfs: teach rmapbt to support interval queries

Now that the generic btree code supports querying all records within a
range of keys, use that functionality to allow us to ask for all the
extents mapped to a range of physical blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: support overlapping intervals in the rmap btree
Darrick J. Wong [Wed, 3 Aug 2016 01:40:56 +0000 (11:40 +1000)]
xfs: support overlapping intervals in the rmap btree

Now that the generic btree code supports overlapping intervals, plug
in the rmap btree to this functionality.  We will need it to find
potential left neighbors in xfs_rmap_{alloc,free} later in the patch
set.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add rmap btree operations
Darrick J. Wong [Wed, 3 Aug 2016 01:39:05 +0000 (11:39 +1000)]
xfs: add rmap btree operations

Originally-From: Dave Chinner <dchinner@redhat.com>

Implement the generic btree operations needed to manipulate rmap
btree blocks. This is very similar to the per-ag freespace btree
implementation, and uses the AGFL for allocation and freeing of
blocks.

Adapt the rmap btree to store owner offsets within each rmap record,
and to handle the primary key being redefined as the tuple
[agblk, owner, offset].  The expansion of the primary key is crucial
to allowing multiple owners per extent.

[darrick: adapt the btree ops to deal with offsets]
[darrick: remove init_rec_from_key]
[darrick: move unwritten bit to rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: rmap btree requires more reserved free space
Darrick J. Wong [Wed, 3 Aug 2016 01:38:24 +0000 (11:38 +1000)]
xfs: rmap btree requires more reserved free space

Originally-From: Dave Chinner <dchinner@redhat.com>

The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
Update the various space calculation functions to handle this.

Also, because the macros are now executing conditional code and are
called quite frequently, convert them to functions that initialise
variables in the struct xfs_mount, use the new variables everywhere
and document the calculations better.

[darrick.wong@oracle.com: don't reserve blocks if !rmap]
[dchinner@redhat.com: update m_ag_max_usable after growfs]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: rmap btree transaction reservations
Darrick J. Wong [Wed, 3 Aug 2016 01:37:10 +0000 (11:37 +1000)]
xfs: rmap btree transaction reservations

The rmap btrees will use the AGFL as the block allocation source, so
we need to ensure that the transaction reservations reflect the fact
this tree is modified by allocation and freeing. Hence we need to
extend all the extent allocation/free reservations used in
transactions to handle this.

Note that this also gets rid of the unused XFS_ALLOCFREE_LOG_RES
macro, as we now do buffer reservations based on the number of
buffers logged via xfs_calc_buf_res(). Hence we only need the buffer
count calculation now.

[darrick: use rmap_maxlevels when calculating log block resv]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add rmap btree growfs support
Darrick J. Wong [Wed, 3 Aug 2016 01:36:08 +0000 (11:36 +1000)]
xfs: add rmap btree growfs support

Originally-From: Dave Chinner <dchinner@redhat.com>

Now we can read and write rmap btree blocks, we can add support to
the growfs code to initialise new rmap btree blocks.

[darrick.wong@oracle.com: fill out the rmap offset fields]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: define the on-disk rmap btree format
Darrick J. Wong [Wed, 3 Aug 2016 01:36:07 +0000 (11:36 +1000)]
xfs: define the on-disk rmap btree format

Originally-From: Dave Chinner <dchinner@redhat.com>

Now we have all the surrounding call infrastructure in place, we can
start filling out the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.

[darrick: record owner and offset info in rmap btree]
[darrick: fork, bmbt and unwritten state in rmap btree]
[darrick: flags are a separate field in xfs_rmap_irec]
[darrick: calculate maxlevels separately]
[darrick: move the 'unwritten' bit into unused parts of rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: introduce rmap extent operation stubs
Darrick J. Wong [Wed, 3 Aug 2016 01:33:43 +0000 (11:33 +1000)]
xfs: introduce rmap extent operation stubs

Originally-From: Dave Chinner <dchinner@redhat.com>

Add the stubs into the extent allocation and freeing paths that the
rmap btree implementation will hook into. While doing this, add the
trace points that will be used to track rmap btree extent
manipulations.

[darrick.wong@oracle.com: Extend the stubs to take full owner info.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add owner field to extent allocation and freeing
Darrick J. Wong [Wed, 3 Aug 2016 01:33:42 +0000 (11:33 +1000)]
xfs: add owner field to extent allocation and freeing

For the rmap btree to work, we have to feed the extent owner
information to the the allocation and freeing functions. This
information is what will end up in the rmap btree that tracks
allocated extents. While we technically don't need the owner
information when freeing extents, passing it allows us to validate
that the extent we are removing from the rmap btree actually
belonged to the owner we expected it to belong to.

We also define a special set of owner values for internal metadata
that would otherwise have no owner. This allows us to tell the
difference between metadata owned by different per-ag btrees, as
well as static fs metadata (e.g. AG headers) and internal journal
blocks.

There are also a couple of special cases we need to take care of -
during EFI recovery, we don't actually know who the original owner
was, so we need to pass a wildcard to indicate that we aren't
checking the owner for validity. We also need special handling in
growfs, as we "free" the space in the last AG when extending it, but
because it's new space it has no actual owner...

While touching the xfs_bmap_add_free() function, re-order the
parameters to put the struct xfs_mount first.

Extend the owner field to include both the owner type and some sort
of index within the owner.  The index field will be used to support
reverse mappings when reflink is enabled.

When we're freeing extents from an EFI, we don't have the owner
information available (rmap updates have their own redo items).
xfs_free_extent therefore doesn't need to do an rmap update. Make
sure that the log replay code signals this correctly.

This is based upon a patch originally from Dave Chinner. It has been
extended to add more owner information with the intent of helping
recovery operations when things go wrong (e.g. offset of user data
block in a file).

[dchinner: de-shout the xfs_rmap_*_owner helpers]
[darrick: minor style fixes suggested by Christoph Hellwig]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: rmap btree add more reserved blocks
Darrick J. Wong [Wed, 3 Aug 2016 01:31:47 +0000 (11:31 +1000)]
xfs: rmap btree add more reserved blocks

Originally-From: Dave Chinner <dchinner@redhat.com>

XFS reserves a small amount of space in each AG for the minimum
number of free blocks needed for operation. Adding the rmap btree
increases the number of reserved blocks, but it also increases the
complexity of the calculation as the free inode btree is optional
(like the rmbt).

Rather than calculate the prealloc blocks every time we need to
check it, add a function to calculate it at mount time and store it
in the struct xfs_mount, and convert the XFS_PREALLOC_BLOCKS macro
just to use the xfs-mount variable directly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add rmap btree stats infrastructure
Darrick J. Wong [Wed, 3 Aug 2016 01:31:11 +0000 (11:31 +1000)]
xfs: add rmap btree stats infrastructure

Originally-From: Dave Chinner <dchinner@redhat.com>

The rmap btree will require the same stats as all the other generic
btrees, so add all the code for that now.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: introduce rmap btree definitions
Darrick J. Wong [Wed, 3 Aug 2016 01:30:32 +0000 (11:30 +1000)]
xfs: introduce rmap btree definitions

Originally-From: Dave Chinner <dchinner@redhat.com>

Add new per-ag rmap btree definitions to the per-ag structures. The
rmap btree will sit in the empty slots on disk after the free space
btrees, and hence form a part of the array of space management
btrees. This requires the definition of the btree to be contiguous
with the free space btrees.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: increase XFS_BTREE_MAXLEVELS to fit the rmapbt
Darrick J. Wong [Wed, 3 Aug 2016 01:29:42 +0000 (11:29 +1000)]
xfs: increase XFS_BTREE_MAXLEVELS to fit the rmapbt

By my calculations, a 1,073,741,824 block AG with a 1k block size
can attain a maximum height of 9.  Assuming a record size of 24
bytes, a key/ptr size of 44 bytes, and half-full btree nodes, we'd
need 53,687,092 blocks for the records and ~6 million blocks for the
keys.  That requires a btree of height 9 based on the following
derivation:

Block size = 1024b
sblock CRC header = 56b
== 1024-56 = 968 bytes for tree data

rmapbt record = 24b
== 40 records per leaf block

rmapbt ptr/key = 44b
== 22 ptr/keys per block

Worst case, each block is half full, so 20 records and 11 ptrs per block.

1073741824 rmap records / 20 records per block
== 53687092 leaf blocks

53687092 leaves / 11 ptrs per block
== 4880645 level 1 blocks
== 443695 level 2 blocks
== 40336 level 3 blocks
== 3667 level 4 blocks
== 334 level 5 blocks
== 31 level 6 blocks
== 3 level 7 blocks
== 1 level 8 block

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add tracepoints and error injection for deferred extent freeing
Darrick J. Wong [Wed, 3 Aug 2016 01:26:33 +0000 (11:26 +1000)]
xfs: add tracepoints and error injection for deferred extent freeing

Add a couple of tracepoints for the deferred extent free operation and
a site for injecting errors while finishing the operation.  This makes
it easier to debug deferred ops and test log redo.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: refactor redo intent item processing
Darrick J. Wong [Wed, 3 Aug 2016 01:23:49 +0000 (11:23 +1000)]
xfs: refactor redo intent item processing

Refactor the EFI intent item recovery (and cancellation) functions
into a general function that scans the AIL and an intent item type
specific handler.  Move the function that recovers a single EFI item
into the extent free item code.  We'll want the generalized function
when we start wiring up more redo item types.

Furthermore, ensure that log recovery only replays the redo items
that were in the AIL prior to recovery by checking the item LSN
against the largest LSN seen during log scanning.  As written this
should never happen, but we can be defensive anyway.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: rename flist/free_list to dfops
Darrick J. Wong [Wed, 3 Aug 2016 01:19:29 +0000 (11:19 +1000)]
xfs: rename flist/free_list to dfops

Mechanical change of flist/free_list to dfops, since they're now
deferred ops, not just a freeing list.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_*
Darrick J. Wong [Wed, 3 Aug 2016 01:18:10 +0000 (11:18 +1000)]
xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_*

Drop the compatibility shims that we were using to integrate the new
deferred operation mechanism into the existing code.  No new code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: rework xfs_bmap_free callers to use xfs_defer_ops
Darrick J. Wong [Wed, 3 Aug 2016 01:15:38 +0000 (11:15 +1000)]
xfs: rework xfs_bmap_free callers to use xfs_defer_ops

Restructure everything that used xfs_bmap_free to use xfs_defer_ops
instead.  For now we'll just remove the old symbols and play some
cpp magic to make it work; in the next patch we'll actually rename
everything.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: enable the xfs_defer mechanism to process extents to free
Darrick J. Wong [Wed, 3 Aug 2016 01:14:35 +0000 (11:14 +1000)]
xfs: enable the xfs_defer mechanism to process extents to free

Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred extent freeing.  We'll wire up the existing code to
our new deferred mechanism later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: clean up typedef usage in the EFI/EFD handling code
Darrick J. Wong [Wed, 3 Aug 2016 01:13:47 +0000 (11:13 +1000)]
xfs: clean up typedef usage in the EFI/EFD handling code

Replace structure typedefs with struct xfs_foo_* in the EFI/EFD
handling code in preparation to move it over to deferred ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: add tracepoints for the deferred ops mechanism
Darrick J. Wong [Wed, 3 Aug 2016 01:13:02 +0000 (11:13 +1000)]
xfs: add tracepoints for the deferred ops mechanism

Add tracepoints for the internals of the deferred ops mechanism
and tracepoint classes for clients of the dops, to make debugging
easier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
7 years agoxfs: move deferred operations into a separate file
Darrick J. Wong [Wed, 3 Aug 2016 01:12:25 +0000 (11:12 +1000)]
xfs: move deferred operations into a separate file

All the code around struct xfs_bmap_free basically implements a
deferred operation framework through which we can roll transactions
(to unlock buffers and avoid violating lock order rules) while
managing all the necessary log redo items.  Previously we only used
this code to free extents after some sort of mapping operation, but
with the advent of rmap and reflink, we suddenly need to do more than
that.

With that in mind, xfs_bmap_free really becomes a deferred ops control
structure.  Rename the structure and move the deferred ops into their
own file to avoid further bloating of the bmap code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>