cascardo/linux.git
Jason Gunthorpe [Fri, 21 Aug 2015 23:34:13 +0000 (17:34 -0600)]
IB/ipoib: Suppress warning for send only join failures

We expect send-only joins to fail; it just means there are no listeners
for the group. The correct thing to do is to silently drop the packet
at the source.

For example, if there is no IP router listening to IGMP, avahi will
full-join 224.0.0.251, which causes a send-only IGMP packet to
224.0.0.22 and then a warning-level kernel message like this:

 ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22
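
A minimal user-space sketch of the policy described above, assuming a simplified model: `struct mcast_group_model`, `handle_join_failure()` and the log levels are hypothetical stand-ins, not the ipoib driver's code. A failed send-only join drops its queued packets and logs at debug level only, while a failed full join still warns.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a multicast group entry (not the ipoib struct). */
struct mcast_group_model {
    bool send_only;      /* joined only to transmit; no local listeners */
    int  queued_packets; /* packets waiting for the join to complete */
};

enum join_log { JOIN_LOG_DEBUG, JOIN_LOG_WARN };

/*
 * Handle a failed join and return the log level that was used.
 * Send-only failures are expected (nobody listens to the group), so the
 * queued packets are dropped and the event is logged at debug level only.
 * Full-join failures still warn; their retry handling is out of scope here.
 */
enum join_log handle_join_failure(struct mcast_group_model *g, int status)
{
    if (g->send_only) {
        fprintf(stderr, "debug: sendonly join failed, status %d, dropping %d\n",
                status, g->queued_packets);
        g->queued_packets = 0; /* silently drop at the source */
        return JOIN_LOG_DEBUG;
    }

    fprintf(stderr, "warning: multicast join failed, status %d\n", status);
    return JOIN_LOG_WARN;
}
```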

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Doug Ledford [Thu, 3 Sep 2015 21:05:58 +0000 (17:05 -0400)]
IB/ipoib: Clean up send-only multicast joins

Even though we don't expect the group to be created by the SM, we
still need to provide all the parameters to force the SM to validate
that they are correct.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Mon, 24 Aug 2015 16:04:51 +0000 (19:04 +0300)]
IB/srp: Fix possible protection fault

srp_destroy_qp is designed to indicate that it is safe to continue with
freeing the channel resources by moving the QP to the error state,
posting a dummy WR on the queue pair and waiting for it to flush.
This also holds for the channel registration pool, since we unmap
the memory region when handling a SCSI response. Destroying the
channel registration pool before we make sure that all in-flight IO
has been processed might introduce a use-after-free of the registration pool.

This use-after-free is demonstrated in the stack trace below where
srp is trying to unmap a used FMR after the fmr_pool was already destroyed.

general protection fault: 0000 [#1] SMP
RIP: 0010:[<ffffffff8151121b>]  [<ffffffff8151121b>] _raw_spin_lock_irqsave+0x1b/0x50
Call Trace:
 [<ffffffffa055d88a>] ib_fmr_pool_unmap+0x1a/0xb0 [ib_core]
 [<ffffffffa06c00ed>] srp_unmap_data.isra.28+0x17d/0x250 [ib_srp]
 [<ffffffffa06c01eb>] srp_free_req+0x2b/0x60 [ib_srp]
 [<ffffffffa06c0c94>] srp_recv_completion+0x174/0x580 [ib_srp]
 [<ffffffffa04580fe>] mlx4_eq_int+0x4de/0xe50 [mlx4_core]
 [<ffffffffa0458b00>] mlx4_msi_x_interrupt+0x10/0x20 [mlx4_core]
 [<ffffffff810abc45>] handle_irq_event_percpu+0x35/0x1b0
 [<ffffffff810abdf2>] handle_irq_event+0x32/0x50
 [<ffffffff810ae5cf>] handle_edge_irq+0x6f/0x120
 [<ffffffff8100455a>] handle_irq+0x1a/0x30
 [<ffffffff8151b475>] do_IRQ+0x45/0xb0
 [<ffffffff8151162d>] common_interrupt+0x6d/0x6d
 [<ffffffff813e4d2f>] cpuidle_enter_state+0x4f/0xc0
 [<ffffffff813e4e6c>] cpuidle_idle_call+0xcc/0x210
 [<ffffffff8100b9ea>] arch_cpu_idle+0xa/0x30
 [<ffffffff810ab1e1>] cpu_startup_entry+0xe1/0x270
 [<ffffffff81030b3a>] start_secondary+0x21a/0x2c0

Reported-by: Eliott Kespi <eliottk@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Thu, 3 Sep 2015 18:16:30 +0000 (14:16 -0400)]
IB/core: Move SM class defines from ib_mad.h to ib_smi.h

When the hfi1 driver was added these definitions were moved from the qib driver
to ib_mad.h to be used by both qib and hfi1.  They should have been moved to
ib_smi.h instead.

Fixes: d4ab347005fb ("IB/core: Add core header changes needed for OPA")
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Wed, 2 Sep 2015 22:45:54 +0000 (18:45 -0400)]
IB/core: Remove unnecessary defines from ib_mad.h

Remove the unused IB_NOTICE_REPRESS_* defines.

When the hfi1 driver was added these definitions were moved from the qib driver
to ib_mad.h.  They should have been removed instead.

Fixes: d4ab347005fb ("IB/core: Add core header changes needed for OPA")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Wed, 2 Sep 2015 22:46:21 +0000 (18:46 -0400)]
IB/hfi1: Add PSM2 user space header to header_install

When the hfi1 driver was added a user space header file (hfi1_user.h) was added
to be shared between PSM2 and the driver.  However, the file was not added to
the header install.  Add it now.

Fixes: d4ab347005fb ("IB/core: Add core header changes needed for OPA")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jubin John [Wed, 2 Sep 2015 14:43:24 +0000 (10:43 -0400)]
IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY

Three CSRs needed by the CONFIG_SDMA_VERBOSITY code were removed during
the CSR cleanup. Add these CSRs back to resolve the 0-day build failure:
https://lists.01.org/pipermail/kbuild-all/2015-August/011919.html

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 2 Sep 2015 19:23:04 +0000 (22:23 +0300)]
mlx5: Fix incorrect wc pkey_index assignment for GSI messages

Since the patch series "Demux IB CM requests in the rdma_cm module", the
P_Key index is taken from the work completion rather than from the
message itself.

The HCA provides us with the message P_Key. In order to provide the
P_Key index, we need to look it up. Given that this is relevant only
for GSI messages (session establishment), which are less performance
critical, micro-optimize against the GSI (is_qp1) branch.
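
Because the HCA reports the P_Key value rather than its index, the index has to be recovered by searching the port's P_Key table, and only on the GSI (QP1) path. A minimal sketch under stated assumptions: the table layout, `lookup_pkey_index()` and `fill_wc_pkey_index()` are hypothetical, and the exact 16-bit comparison is a simplification (in the kernel, a cached lookup helper such as `ib_find_cached_pkey()` plays this role and also handles the membership bit).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port P_Key table; real tables come from the device cache. */
struct pkey_table {
    const uint16_t *entries;
    size_t          len;
};

/*
 * Linear search for the P_Key reported in the completion. Returns true and
 * stores the index on success; *index is left untouched on failure.
 * Simplification: exact match on all 16 bits.
 */
bool lookup_pkey_index(const struct pkey_table *t, uint16_t pkey,
                       uint16_t *index)
{
    for (size_t i = 0; i < t->len; i++) {
        if (t->entries[i] == pkey) {
            *index = (uint16_t)i;
            return true;
        }
    }
    return false;
}

/* Only GSI (QP1) completions pay for the lookup; other QPs skip it. */
void fill_wc_pkey_index(bool is_qp1, const struct pkey_table *t,
                        uint16_t cqe_pkey, uint16_t *wc_pkey_index)
{
    if (is_qp1)
        lookup_pkey_index(t, cqe_pkey, wc_pkey_index);
}
```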

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Tue, 1 Sep 2015 06:56:56 +0000 (09:56 +0300)]
IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow

The mlx5_ib_reg_user_mr() function will attempt to call clean_mr() in
its error flow even though there is never a case where the error flow
occurs with a valid MR pointer to destroy.

Remove the clean_mr() call and the incorrect comment above it.

Fixes: b4cfe447d47b ("IB/mlx5: Implement on demand paging by adding
support for MMU notifiers")
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 26 Aug 2015 09:00:37 +0000 (11:00 +0200)]
IB/uverbs: reject invalid or unknown opcodes

We have many WR opcodes that are only supported in kernel space
and/or require optional information to be copied into the WR
structure.  Reject all those not explicitly handled so that we
can't pass invalid information to drivers.

Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nicholas Krause [Thu, 27 Aug 2015 03:00:59 +0000 (23:00 -0400)]
IB/cxgb4: Fix if statement in pick_local_ip6adddrs

This fixes the if statement in pick_local_ip6addrs that checks the
return value of get_lladdr for success. get_lladdr returns zero on
success, so testing its return value directly meant that the if block
did not execute on success; check the negated return value instead.
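
The bug is the classic mix-up between "non-zero means true" and "zero means success". A minimal sketch, assuming hypothetical stand-ins: `fake_get_lladdr()` models get_lladdr() returning 0 on success and a negative errno on failure, and `pick_addr()` models the corrected check; neither is the cxgb4 code.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for get_lladdr(): 0 on success, -errno on failure. */
int fake_get_lladdr(bool have_addr, unsigned char out[16])
{
    if (!have_addr)
        return -EADDRNOTAVAIL;
    memset(out, 0, 16);
    out[0] = 0xfe; /* fe80::/10 link-local prefix */
    out[1] = 0x80;
    return 0;
}

/* Returns true if a link-local address was picked. */
bool pick_addr(bool have_addr, unsigned char out[16])
{
    /* Buggy form: `if (fake_get_lladdr(...))` enters the block only on failure. */
    if (!fake_get_lladdr(have_addr, out))
        return true; /* zero return value means success */
    return false;
}
```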

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Thu, 20 Aug 2015 18:20:42 +0000 (14:20 -0400)]
IB/sa: Fix rdma netlink message flags

The flags to ibnl_put_msg should be NLM_F_REQUEST instead of GFP_KERNEL.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yishai Hadas [Thu, 13 Aug 2015 15:32:07 +0000 (18:32 +0300)]
IB/ucma: HW Device hot-removal support

Currently, the IB/cma remove_one flow blocks until all user descriptors managed
by IB/ucma are released. This prevents hot-removal of IB devices. This patch
allows IB/cma to remove devices regardless of user space activity. Upon getting
the RDMA_CM_EVENT_DEVICE_REMOVAL event we close all the underlying HW resources
for the given ucontext. The ucontext itself remains alive until it is
explicitly destroyed by its creator.

Applications running at that time will be left with a zombie device, and
further operations may fail.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yishai Hadas [Thu, 13 Aug 2015 15:32:06 +0000 (18:32 +0300)]
IB/mlx4_ib: Disassociate support

Implement the IB core disassociate_ucontext API. The driver detaches the HW
resources for a given user context to prevent a dependency between application
termination and device disconnection. This is done by managing the VMAs that
were mapped to the HW BARs, such as the doorbell and BlueFlame pages. When they
need to be detached, they are remapped to an arbitrary kernel page returned by
the zap API.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yishai Hadas [Thu, 13 Aug 2015 15:32:05 +0000 (18:32 +0300)]
IB/uverbs: Enable device removal when there are active user space applications

Enable uverbs_remove_one to succeed even though there are running IB
applications working with the given IB device. This allows a HW device
to be unbound/reset while user space applications are still using it.

It exposes a new IB kernel API named 'disassociate_ucontext', which lets
a driver detach its HW resources from a given user context without
crashing/terminating the application. If a driver implements the above
API and registers with ib_uverbs, there is no dependency between its
device and its uverbs_device. When remove_one of ib_uverbs is called, it
returns after disassociating the open HW resources, without waiting for
clients to disconnect. If the driver doesn't implement this API, there is
no change to the current behaviour: uverbs_remove_one returns only when
the last client has disconnected and the reference count on the uverbs
device has dropped to 0.

If the lower driver device was removed, any application will continue
working over a zombie HCA, and further calls will end with an immediate
error.
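
The two remove_one behaviours described above can be modelled as follows. This is a user-space sketch under stated assumptions: the structs, the `disassociate` function pointer and `remove_one_model()` are hypothetical simplifications, not the ib_uverbs implementation, which also deals with locking, RCU and waiting for the last client.

```c
#include <stdbool.h>
#include <stddef.h>

struct ucontext_model {
    bool hw_resources_open;
};

/* Hypothetical driver hook mirroring the disassociate_ucontext idea. */
typedef void (*disassociate_fn)(struct ucontext_model *ctx);

struct uverbs_dev_model {
    disassociate_fn         disassociate; /* NULL if the driver lacks it */
    struct ucontext_model **contexts;
    size_t                  ncontexts;
    int                     refcount;     /* open client file handles */
};

/* Example driver implementation: detach HW, keep the ucontext alive. */
void drv_disassociate(struct ucontext_model *ctx)
{
    ctx->hw_resources_open = false;
}

/*
 * Returns true if removal may complete now. Without the hook, removal has
 * to keep waiting until every client has closed (refcount reaches 0).
 */
bool remove_one_model(struct uverbs_dev_model *dev)
{
    if (dev->disassociate) {
        for (size_t i = 0; i < dev->ncontexts; i++)
            dev->disassociate(dev->contexts[i]);
        return true; /* do not wait for clients to disconnect */
    }
    return dev->refcount == 0; /* unchanged, old behaviour */
}
```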

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yishai Hadas [Thu, 13 Aug 2015 15:32:04 +0000 (18:32 +0300)]
IB/uverbs: Explicitly pass ib_dev to uverbs commands

Done in preparation for deploying RCU in the device removal
flow. This allows isolating the RCU handling to the uverbs_main layer
while keeping the uverbs_cmd code as is.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yishai Hadas [Thu, 13 Aug 2015 15:32:03 +0000 (18:32 +0300)]
IB/uverbs: Fix race between ib_uverbs_open and remove_one

Fixes: 2a72f212263701b927559f6850446421d5906c41 ("IB/uverbs: Remove dev_table")

Before this commit there was a device look-up table that was protected
by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
it was dropped and container_of was used instead, it enabled the race
with remove_one as dev might be freed just after:
dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
before the kref_get.

In addition, this buggy patch added some dead code as
container_of(x,y,z) can never be NULL and so dev can never be NULL.
As a result the comment above ib_uverbs_open saying "the open method
will either immediately run -ENXIO" is wrong as it can never happen.

The solution follows Jason Gunthorpe's suggestion at the URL below:
https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html

cdev will hold a kref on the parent (the containing structure,
ib_uverbs_device), and only when that kref is released is it
guaranteed that open will never be called again.

In addition, fix the active count scheme to use an atomic rather
than a kref, to prevent the WARN_ON pointed out in Jason's comment
above.
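
The two counters above can be sketched with C11 atomics: a kref-like lifetime count that frees the object at zero, and an "increment only if not zero" active count, so that open() fails cleanly (in the kernel, with -ENXIO) once removal has dropped the last active reference, instead of resurrecting a zero count and triggering a warning. A minimal sketch; `struct dev_obj` and all function names are hypothetical, not the kernel kref API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct dev_obj {
    atomic_int lifetime_refs; /* kref-like: the object is freed at zero */
    atomic_int active;        /* device users; 0 once removal has started */
};

struct dev_obj *dev_create(void)
{
    struct dev_obj *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    atomic_init(&d->lifetime_refs, 1); /* held on behalf of the cdev */
    atomic_init(&d->active, 1);        /* held until remove_one */
    return d;
}

void dev_put(struct dev_obj *d)
{
    if (atomic_fetch_sub(&d->lifetime_refs, 1) == 1)
        free(d);
}

/* inc-not-zero: take an active reference only if the device is still live. */
bool active_get(struct dev_obj *d)
{
    int v = atomic_load(&d->active);

    while (v != 0) {
        /* On failure, v is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&d->active, &v, v + 1))
            return true;
    }
    return false; /* removal already started: open() must fail */
}

void active_put(struct dev_obj *d)
{
    atomic_fetch_sub(&d->active, 1);
}
```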

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yishai Hadas [Thu, 13 Aug 2015 15:32:02 +0000 (18:32 +0300)]
IB/uverbs: Fix reference counting usage of event files

Fix the reference counting so that it is handled in the event file
creation/destruction functions instead of by the caller. This is done
for both async and non-async event files.

Based on Jason Gunthorpe's report at
https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg24680.html:
"The existing code for this is broken, in ib_uverbs_get_context all
the error paths between ib_uverbs_alloc_event_file and the
kref_get(file->ref) are wrong - this will result in fput() which will
call ib_uverbs_event_close, which will try to do kref_put and
ib_unregister_event_handler - which are no longer paired."

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Wed, 5 Aug 2015 20:34:31 +0000 (14:34 -0600)]
IB/core: Make ib_dealloc_pd return void

The majority of callers never check the return value, and even if they
did, they can't do anything about a failure.

All possible failure cases represent a bug in the caller, so just
WARN_ON inside the function instead.

This fixes a few random errors:
 net/rds/iw.c infinite loops while it fails. (racing with EBUSY?)

This also lays the groundwork to get rid of error returns from the
drivers. Most drivers never return an error; the few that do are broken,
since the error cannot be handled.

Since uverbs can legitimately make use of EBUSY, open code the
check.
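
The void-plus-warning pattern can be modelled in user space as below. A minimal sketch under stated assumptions: `struct pd_model`, `WARN_ON_MODEL()` (a stand-in for the kernel's WARN_ON) and both functions are hypothetical. A busy PD passed to the core helper is treated as a caller bug and only reported, while the uverbs-style wrapper open-codes the check and returns -EBUSY.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical PD model; usecnt counts objects (QPs, MRs, ...) still using it. */
struct pd_model {
    int usecnt;
    int warnings; /* how many WARN_ON-style reports fired */
};

/* User-space stand-in for WARN_ON(): report and keep going. */
#define WARN_ON_MODEL(pd, cond)                                        \
    do {                                                               \
        if (cond) {                                                    \
            fprintf(stderr, "WARNING: %s:%d: %s\n", __FILE__,          \
                    __LINE__, #cond);                                  \
            (pd)->warnings++;                                          \
        }                                                              \
    } while (0)

/* void return: a still-busy PD is a caller bug, so warn instead of failing. */
void dealloc_pd_model(struct pd_model *pd)
{
    WARN_ON_MODEL(pd, pd->usecnt != 0);
    /* ... release driver resources here ... */
}

/* uverbs can legitimately see a busy PD, so it checks before deallocating. */
int uverbs_dealloc_pd_model(struct pd_model *pd)
{
    if (pd->usecnt != 0)
        return -EBUSY;
    dealloc_pd_model(pd);
    return 0;
}
```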

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:09:36 +0000 (17:09 -0700)]
IB/srp: Create an insecure all physical rkey only if needed

The SRP initiator only needs this if the insecure register_always=N
performance optimization is enabled, or if FRWR/FMR is not supported
in the driver.

Do not create an all-physical MR unless it is needed to support
either of those modes. Default register_always to true so that the
out-of-the-box configuration does not create an insecure all-physical MR.
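
The decision rule above reduces to a single boolean expression. A minimal sketch; `struct srp_caps_model` and `need_global_mr()` are hypothetical names, not the ib_srp code.

```c
#include <stdbool.h>

/* Hypothetical summary of the device capabilities seen by the initiator. */
struct srp_caps_model {
    bool has_fr;  /* fast registration (FRWR) supported */
    bool has_fmr; /* FMR supported */
};

/*
 * An all-physical MR (global rkey) is needed only if registration is
 * optional (register_always=N) or no registration method is available.
 */
bool need_global_mr(const struct srp_caps_model *caps, bool register_always)
{
    bool can_register = caps->has_fr || caps->has_fmr;

    return !register_always || !can_register;
}
```

With the new default (register_always=Y) on an FRWR- or FMR-capable HCA, the function returns false, so no insecure MR is created.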

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[bvanassche: reworked and rebased this patch]
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:09:05 +0000 (17:09 -0700)]
IB/srp: Register the indirect data buffer descriptor

Instead of always using the global rkey for the indirect data
buffer descriptor, register that descriptor with the HCA if
the kernel module parameter register_always has been set to Y.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:08:44 +0000 (17:08 -0700)]
IB/srp: Introduce srp_device.use_fmr

Introduce the variable srp_device.use_fmr. Leave out the dev->has_fr /
dev->has_fmr and ch->fr_pool / ch->fmr_pool checks since these are
redundant. This patch does not change any functionality but makes the
source code easier to read.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:08:18 +0000 (17:08 -0700)]
IB/srp: Remove use_mr argument from srp_map_sg_entry()

Move the srp_map_desc() call from inside srp_map_sg_entry() to
srp_map_sg() such that the use_mr argument can be removed from
srp_map_sg_entry().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:07:46 +0000 (17:07 -0700)]
IB/srp: Remove the memory registration backtracking code

Mapping a discontiguous sg-list requires multiple memory regions
and hence can exhaust the memory region pool. The SRP initiator
already handles this by temporarily reducing the queue depth. This
means that it is safe to remove the memory registration backtracking
code. This patch has been tested with direct I/O sizes up to 256 MB.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:07:27 +0000 (17:07 -0700)]
IB/srp: Add memory descriptor array pointer range checking

Although most paths through which a request is submitted check
block layer parameters like the max_segments limit, these are
not checked when an SG_IO or direct I/O request is submitted.
Hence add a range check for the memory descriptor array pointer.
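
The range check amounts to comparing the write cursor against the end of the descriptor array before every append. A minimal sketch; `struct mem_desc`, `struct map_state_model`, `map_desc_model()` and the -ENOMEM return are hypothetical choices for illustration, not the ib_srp code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical direct descriptor, loosely shaped like an SRP buffer descriptor. */
struct mem_desc {
    uint64_t va;
    uint32_t key;
    uint32_t len;
};

struct map_state_model {
    struct mem_desc *desc; /* next free slot */
    struct mem_desc *end;  /* one past the last slot */
    size_t           ndesc;
};

/* Append a descriptor, refusing to write past the end of the array. */
int map_desc_model(struct map_state_model *st, uint64_t va, uint32_t len,
                   uint32_t key)
{
    if (st->desc >= st->end)
        return -ENOMEM; /* too many segments, e.g. from an SG_IO request */
    st->desc->va = va;
    st->desc->len = len;
    st->desc->key = key;
    st->desc++;
    st->ndesc++;
    return 0;
}
```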

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:06:57 +0000 (17:06 -0700)]
IB/srp: Use multiple registrations for large memory regions

Instead of using the global rkey for large memory regions, use
multiple registrations. See also the while (dma_len) loop further
down in srp_map_sg_entry().
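
The shape of such a `while (dma_len)` loop can be sketched as follows, assuming for simplicity that each registration covers at most a fixed number of bytes (`max_mr_len`, hypothetical). The real driver's limit depends on page-list sizes and alignment, which this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split one DMA segment of dma_len bytes into registrations of at most
 * max_mr_len bytes each (max_mr_len must be non-zero). Returns the number
 * of registrations needed and, if out is non-NULL, records up to max_out
 * of their lengths.
 */
size_t split_into_mrs(uint64_t dma_len, uint64_t max_mr_len,
                      uint64_t *out, size_t max_out)
{
    size_t n = 0;

    while (dma_len) {
        uint64_t len = dma_len < max_mr_len ? dma_len : max_mr_len;

        if (out && n < max_out)
            out[n] = len;
        n++;
        dma_len -= len;
    }
    return n;
}
```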

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:06:29 +0000 (17:06 -0700)]
IB/srp: Re-enable FMR for non-page aligned buffers

During a discussion in 2011 nobody recalled why FMR was not used for
non-page aligned buffers (see also
http://thread.gmane.org/gmane.linux.drivers.rdma/7149). Re-enable FMR
for such buffers. For the reason why the srp_map_fmr() function needs
to be modified, see also patch "IB/srp: rework mapping engine to use
multiple FMR entries" (commit ID 8f26c9ff9cd0; January 2011).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:26 +0000 (17:22 -0600)]
rds/ib: Remove ib_get_dma_mr calls

The pd now has a local_dma_lkey member, which completely replaces
ib_get_dma_mr; use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:25 +0000 (17:22 -0600)]
net/9p: Remove ib_get_dma_mr calls

The pd now has a local_dma_lkey member, which completely replaces
ib_get_dma_mr; use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:24 +0000 (17:22 -0600)]
ib_srpt: Remove ib_get_dma_mr calls

The pd now has a local_dma_lkey member, which completely replaces
ib_get_dma_mr; use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:22 +0000 (17:22 -0600)]
IB/srp: Use pd->local_dma_lkey

Replace all lkeys with pd->local_dma_lkey. This driver does not support
iWARP, so this is safe.

The insecure use of ib_get_dma_mr is thus isolated to an rkey, and will
have to be fixed separately.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:21 +0000 (17:22 -0600)]
iser-target: Remove ib_get_dma_mr calls

The pd now has a local_dma_lkey member, which completely replaces
ib_get_dma_mr; use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:20 +0000 (17:22 -0600)]
IB/iser: Use pd->local_dma_lkey

Replace all lkeys with pd->local_dma_lkey. This driver does not support
iWARP, so this is safe.

The insecure use of ib_get_dma_mr is thus isolated to an rkey, and this
looks trivially fixed by forcing the use of registration in a future
patch.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:19 +0000 (17:22 -0600)]
IB/mlx5: Remove ib_get_dma_mr calls

The pd now has a local_dma_lkey member, which completely replaces
ib_get_dma_mr; use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:18 +0000 (17:22 -0600)]
IB/mlx4: Remove ib_get_dma_mr calls

The pd now has a local_dma_lkey member, which completely replaces
ib_get_dma_mr; use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:17 +0000 (17:22 -0600)]
IB/ipoib: Remove ib_get_dma_mr calls

The pd now has a local_dma_lkey member, which completely replaces
ib_get_dma_mr; use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:16 +0000 (17:22 -0600)]
IB/mad: Remove ib_get_dma_mr calls

The pd now has a local_dma_lkey member, which completely replaces
ib_get_dma_mr; use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Wed, 5 Aug 2015 20:14:45 +0000 (14:14 -0600)]
IB/core: Guarantee that a local_dma_lkey is available

Every single ULP requires a local_dma_lkey to do anything with
a QP, so let us ensure one exists for every PD created.

If the driver can supply a global local_dma_lkey, then use that; otherwise
ask the driver to create a local-use, all-physical-memory MR associated
with the new PD.
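
The allocation rule above can be modelled as a two-way branch: reuse a device-wide lkey when one exists, otherwise register a per-PD MR and record that it must be released later. A minimal sketch under stated assumptions: `struct dev_model`, `struct pd_lkey_model` and `alloc_pd_model()` are hypothetical, and the driver's MR registration is reduced to handing out the next lkey value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model. */
struct dev_model {
    bool     has_global_lkey; /* device offers a device-wide local DMA lkey */
    uint32_t global_lkey;
    uint32_t next_mr_lkey;    /* lkey the "driver" hands out for a new MR */
};

struct pd_lkey_model {
    uint32_t local_dma_lkey;
    bool     has_local_mr;    /* true if a per-PD MR had to be created */
};

/* Every PD leaves this function with a usable local_dma_lkey. */
void alloc_pd_model(struct dev_model *dev, struct pd_lkey_model *pd)
{
    if (dev->has_global_lkey) {
        pd->local_dma_lkey = dev->global_lkey;
        pd->has_local_mr = false;
    } else {
        /* Stand-in for registering a local-use, all-physical-memory MR. */
        pd->local_dma_lkey = dev->next_mr_lkey++;
        pd->has_local_mr = true;
    }
}
```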

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:06 +0000 (18:33 +0300)]
IB/iser: Chain all iser transaction send work requests

Chaining send work requests benefits performance by
reducing the send queue lock contention (acquired in
ib_post_send) and saves HW doorbells, since the doorbell is
rung only once.

Currently, in normal IO flows iser does not chain the CDB send
work request with the registration work request. Likewise, in PI
flows, signature work requests are not chained either.

Let's chain those and post only once.
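
Verbs work requests form a singly linked list through their `next` pointer, and ib_post_send() walks that list in one call. The sketch below models the saving, assuming a hypothetical queue that counts one lock/doorbell per post call; `struct wr_model`, `post_send_model()` and `chain_wrs()` are illustrative names, not iser or verbs code.

```c
#include <stddef.h>

/* Hypothetical work request; real WRs also carry opcode, SGEs, flags, etc. */
struct wr_model {
    const char      *name;
    struct wr_model *next;
};

struct sq_model {
    int doorbells; /* one per post call, however long the chain */
    int posted;
};

void post_send_model(struct sq_model *sq, struct wr_model *chain)
{
    for (struct wr_model *wr = chain; wr; wr = wr->next)
        sq->posted++;
    sq->doorbells++; /* modelled: one lock acquisition and one doorbell */
}

/* Link wrs[0] -> wrs[1] -> ... -> wrs[n-1] and return the head. */
struct wr_model *chain_wrs(struct wr_model *wrs, size_t n)
{
    if (n == 0)
        return NULL;
    for (size_t i = 0; i + 1 < n; i++)
        wrs[i].next = &wrs[i + 1];
    wrs[n - 1].next = NULL;
    return &wrs[0];
}
```

Chaining a registration WR, a signature WR and the CDB send this way posts all three with a single doorbell instead of three.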

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:05 +0000 (18:33 +0300)]
IB/iser: Add debug prints to the various memory registration methods

Easier to debug when we have the registration details.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:04 +0000 (18:33 +0300)]
IB/iser: Support up to 8MB data transfer in a single command

iser supports up to 512KB data transfer in a single SCSI command.
This means that larger IOs are split into multiple requests. While
iser can easily saturate FDR/EDR wires, some arrays are fine-tuned
for 1MB (or larger) IO sizes, hence add an option to support larger
transfers (up to 8MB) if the device allows it.

Given that a few target implementations don't support data transfers
of more than 512KB by default, and that larger IO sizes require
more resources, we introduce a module parameter to determine the
maximum number of 512B sectors in a single SCSI command.
Users interested in larger transfers can change this value, provided
that the target supports larger transfers.

At the moment, iser works at 4K page granularity. At a later stage
we will get it to work with the system page size instead.

IO operations that consist of N pages will need a page vector
of size N+1 if the first SG element contains an offset. Given
that some devices allocate memory regions in powers of 2, this
means that allocating a region with N+1 pages will result in
allocating region resources for the next power of 2. Since we don't
want that to happen, when we are at the limit of the supported IO size
and the first SG element has an offset, we align the SG list using a
bounce buffer (which is OK given that this is not likely to happen often).
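
The arithmetic behind these limits can be checked with a few helpers. A minimal sketch; the function names are hypothetical, and the 4K page size and 512B sector size are taken from the text above.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u  /* SCSI sector size used by the module parameter */
#define ISER_PAGE   4096u /* iser's 4K registration granularity */

/* Number of 512B sectors in a transfer of `bytes`. */
uint32_t sectors_for_bytes(uint32_t bytes)
{
    return bytes / SECTOR_SIZE;
}

/* Pages needed for an IO of `bytes`, plus one if the first SG has an offset. */
uint32_t pages_needed(uint32_t bytes, int first_sg_has_offset)
{
    uint32_t n = (bytes + ISER_PAGE - 1) / ISER_PAGE;

    return first_sg_has_offset ? n + 1 : n;
}

/* Size a power-of-2-allocating device would reserve (valid for x <= 2^31). */
uint32_t next_pow2(uint32_t x)
{
    uint32_t p = 1;

    while (p < x)
        p <<= 1;
    return p;
}
```

So 512KB corresponds to 1024 sectors and 8MB to 16384 sectors; an 8MB IO needs 2048 pages, and a single extra offset page (2049) would round up to 4096, doubling the region resources, which is why the bounce buffer is used in that corner case.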

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:03 +0000 (18:33 +0300)]
IB/iser: Pass registration pool a size parameter

Hard coded for now. This will allow allocating differently
sized MRs depending on the IO size needed (and device
capabilities).

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:02 +0000 (18:33 +0300)]
IB/iser: Unify fast memory registration flows

iser_reg_rdma_mem_[fastreg|fmr] share a lot of code, and
logically do the same thing other than the buffer registration
method itself (iser_fast_reg_mr vs. iser_fast_reg_fmr).
The DIF logic is not implemented in the FMR flow, as there is no
existing device that supports both FMRs and the signature feature.

This patch unifies the flow into a single routine, iser_reg_rdma_mem,
which splits into fmr/frwr only for the buffer registration itself.

Also, for symmetry reasons, unify iser_unreg_rdma_mem (which will
call the relevant device specific unreg routine).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:01 +0000 (18:33 +0300)]
IB/iser: Make reg_desc_get a per device routine

For FMRs we hold a single registration descriptor, as there is
no need for multiple descriptors as in FRWR mode (one descriptor
per task). This change helps unify the duplicated registration
code paths.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:00 +0000 (18:33 +0300)]
IB/iser: Rename iser_reg_page_vec to iser_fast_reg_fmr

Also, rename a local variable.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Adir Lev [Thu, 6 Aug 2015 15:32:59 +0000 (18:32 +0300)]
IB/iser: Maintain connection fmr_pool under a single registration descriptor

This will allow us to unify the memory registration code path between
the various methods which vary by the device capabilities. This change
will make it easier and less intrusive to remove fmr_pools from the
code when we'd want to.

The reason we use a single descriptor is to avoid taking a
redundant spinlock when working with FMRs.

We also change the signature of iser_reg_page_vec to make it match
iser_fast_reg_mr (and the future indirect registration method).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:58 +0000 (18:32 +0300)]
IB/iser: Introduce iser registration pool struct

Instead of having it be part of the connection structure,
place it in a dedicated (embedded) structure within the
connection. This logically separates the registration pool
from the connection structure.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:57 +0000 (18:32 +0300)]
IB/iser: Move fastreg descriptor allocation to iser_create_fastreg_desc

Don't have the caller allocate the structure and worry about
freeing it in case the routine failed.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Introduce iser_reg_ops
Sagi Grimberg [Thu, 6 Aug 2015 15:32:56 +0000 (18:32 +0300)]
IB/iser: Introduce iser_reg_ops

Move all the per-device function pointers to an easily
extensible iser_reg_ops structure that contains all
the iser registration operations.
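A minimal sketch of the ops-table pattern described above, assuming a trimmed-down structure: the names `reg_ops`, `fmr_ops`, `fastreg_ops`, `select_reg_ops` and `reg_rdma_mem` are hypothetical, the capability flags are plain booleans, and the preference order (FMR before fast registration) is an assumption for illustration only. The point is that the method is chosen once per device and callers then go through the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down ops table; not the real iser_reg_ops layout. */
struct reg_ops {
	const char *name;
	int (*reg_mem)(int nents);  /* register nents scatterlist entries */
	void (*unreg_mem)(void);
};

/* Stand-in implementations for two registration methods. */
static int fmr_reg(int nents) { return nents > 0 ? 0 : -1; }
static void fmr_unreg(void) { }
static int fastreg_reg(int nents) { return nents > 0 ? 0 : -1; }
static void fastreg_unreg(void) { }

static const struct reg_ops fmr_ops = { "fmr", fmr_reg, fmr_unreg };
static const struct reg_ops fastreg_ops = { "fastreg", fastreg_reg, fastreg_unreg };

/* Choose one ops table per device from its (assumed) capability flags. */
static const struct reg_ops *select_reg_ops(bool has_fmr, bool has_fastreg)
{
	if (has_fmr)
		return &fmr_ops;
	if (has_fastreg)
		return &fastreg_ops;
	return NULL;  /* no supported registration method */
}

/* Callers go through the table and never test the device type again. */
static int reg_rdma_mem(const struct reg_ops *ops, int nents)
{
	assert(ops != NULL);
	return ops->reg_mem(nents);
}
```

Adding a new method (for example the indirect registration mentioned in a later commit) then means adding one more table rather than new branches at every call site.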

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Remove dead code in fmr_pool alloc/free
Sagi Grimberg [Thu, 6 Aug 2015 15:32:55 +0000 (18:32 +0300)]
IB/iser: Remove dead code in fmr_pool alloc/free

In the past we always tried to allocate an fmr_pool
and if it failed with ENOSYS (not supported) then we continued
with dma mr. This is not the case anymore: if we try to
allocate an fmr_pool then it is supported and we expect to succeed.

Also, checking whether fmr_pool is allocated when free is called is
redundant, as we are guaranteed it exists.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Rename struct fast_reg_descriptor -> iser_fr_desc
Sagi Grimberg [Thu, 6 Aug 2015 15:32:54 +0000 (18:32 +0300)]
IB/iser: Rename struct fast_reg_descriptor -> iser_fr_desc

Avoid struct names without iser_ prefix.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Introduce struct iser_reg_resources
Sagi Grimberg [Thu, 6 Aug 2015 15:32:53 +0000 (18:32 +0300)]
IB/iser: Introduce struct iser_reg_resources

Have fast_reg_descriptor hold struct iser_reg_resources
(mr, frpl, valid flag). This will be useful when the
actual buffer registration routines are passed the
needed registration resources (i.e. iser_reg_resources)
without being aware of their nature (i.e. data or protection).

In order to achieve this, we remove reg_indicators flags container
and place specific flags (mr_valid) within iser_reg_resources struct.
We also place the sig_mr_valid and sig_protected flags in iser_pi_context.

This patch also modifies iser_fast_reg_mr to receive the
reg_resources instead of the fast_reg_descriptor and a data/protection
indicator.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Remove an unneeded print for unaligned memory
Sagi Grimberg [Thu, 6 Aug 2015 15:32:52 +0000 (18:32 +0300)]
IB/iser: Remove an unneeded print for unaligned memory

We can do it in iser_aligned_data_len instead and
it will save us an argument that is passed to
fall_to_bounce_buf just for the print.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Remove a redundant always-false condition
Sagi Grimberg [Thu, 6 Aug 2015 15:32:51 +0000 (18:32 +0300)]
IB/iser: Remove a redundant always-false condition

We always call iser_initialize_task_headers() and set
the header tx_sg.lkey to the device mr lkey, so there is no
point in checking it in iser_create_send_desc().

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Fix possible bogus DMA unmapping
Sagi Grimberg [Thu, 6 Aug 2015 15:32:50 +0000 (18:32 +0300)]
IB/iser: Fix possible bogus DMA unmapping

If the iser_initialize_task_headers() routine failed before
dma mapping, we should not attempt to unmap in cleanup_task().

Fixes: 7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Get rid of un-maintained counters
Sagi Grimberg [Thu, 6 Aug 2015 15:32:49 +0000 (18:32 +0300)]
IB/iser: Get rid of un-maintained counters

We don't update those anywhere in the code and they
seem pretty useless (no one seems to care about them).

qp_tx_queue_full: We should never get this
fmr_map_not_avail: We can never get to this
eh_abort_cnt: We don't monitor aborts

Go ahead and remove them.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Fix missing return status check in iser_send_data_out
Sagi Grimberg [Thu, 6 Aug 2015 15:32:48 +0000 (18:32 +0300)]
IB/iser: Fix missing return status check in iser_send_data_out

Since commit "IB/iser: Fix race between iser connection teardown..."
iser_initialize_task_headers() might fail, so we need to check that.

Fixes: 7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Remove '.' from log message
Sagi Grimberg [Thu, 6 Aug 2015 15:32:47 +0000 (18:32 +0300)]
IB/iser: Remove '.' from log message

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Change minor assignments and logging prints
Sagi Grimberg [Thu, 6 Aug 2015 15:32:46 +0000 (18:32 +0300)]
IB/iser: Change minor assignments and logging prints

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Change some module parameters to be RO
Jenny Falkovich [Thu, 6 Aug 2015 15:32:45 +0000 (18:32 +0300)]
IB/iser: Change some module parameters to be RO

While we're at it, use permission defines instead
of octal values and rearrange a little bit.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/sa: Route SA pathrecord query through netlink
Kaike Wan [Fri, 14 Aug 2015 12:52:09 +0000 (08:52 -0400)]
IB/sa: Route SA pathrecord query through netlink

This patch routes a SA pathrecord query to netlink first and processes the
response appropriately. If a failure is returned, the request will be sent
through IB. The decision whether to route the request to netlink first is
determined by the presence of a listener for the local service netlink
multicast group. If the user-space local service netlink multicast group
listener is not present, the request will be sent through IB, just like
what is currently being done.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/sa: Allocate SA query with kzalloc
Kaike Wan [Fri, 14 Aug 2015 12:52:08 +0000 (08:52 -0400)]
IB/sa: Allocate SA query with kzalloc

Replace kmalloc with kzalloc so that all uninitialized fields in the SA
query are zeroed out to avoid unintended consequences. This prepares the
SA query structure to accept new fields in the future.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add rdma netlink helper functions
Kaike Wan [Fri, 14 Aug 2015 12:52:07 +0000 (08:52 -0400)]
IB/core: Add rdma netlink helper functions

This patch adds a function to check if listeners for a netlink multicast
group are present. It also adds a function to receive netlink response
messages.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/netlink: Add defines for local service requests through netlink
Kaike Wan [Fri, 14 Aug 2015 12:52:06 +0000 (08:52 -0400)]
IB/netlink: Add defines for local service requests through netlink

This patch adds netlink defines for local service client, local service
group, local service operations, and related attributes.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Stop the scsi_eh_<n> and scsi_tmf_<n> threads if login fails
Bart Van Assche [Fri, 14 Aug 2015 18:01:09 +0000 (11:01 -0700)]
IB/srp: Stop the scsi_eh_<n> and scsi_tmf_<n> threads if login fails

scsi_host_alloc() not only allocates memory for a SCSI host but also
creates the scsi_eh_<n> kernel thread and the scsi_tmf_<n> workqueue.
Stop these threads if login fails by calling scsi_host_put().

Reported-by: Konstantin Krotov <kkv@clodo.ru>
Fixes: fb49c8bbaae7 ("Remove an extraneous scsi_host_put() from an error path")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.19
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Bump driver version and release date
Bart Van Assche [Fri, 31 Jul 2015 21:13:52 +0000 (14:13 -0700)]
IB/srp: Bump driver version and release date

Since version 1.0, support for e.g. scsi-mq has been added. Since this
is a significant change, bump the driver version and release date.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Handle partial connection success correctly
Bart Van Assche [Fri, 31 Jul 2015 21:13:22 +0000 (14:13 -0700)]
IB/srp: Handle partial connection success correctly

Avoid the following kernel warning being reported when the SRP
target system accepts fewer channels per connection than the
initiator system requested:

WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xb1/0x120 [ib_srp]()
Call Trace:
[<ffffffff8105d67f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8105d6da>] warn_slowpath_null+0x1a/0x20
[<ffffffffa05419e1>] srp_destroy_qp+0xb1/0x120 [ib_srp]
[<ffffffffa05445fb>] srp_create_ch_ib+0x19b/0x420 [ib_srp]
[<ffffffffa0545257>] srp_create_target+0x7d7/0xa94 [ib_srp]
[<ffffffff8138dac0>] dev_attr_store+0x20/0x30
[<ffffffff812079ef>] sysfs_write_file+0xef/0x170
[<ffffffff81191fc4>] vfs_write+0xb4/0x130
[<ffffffff8119276f>] sys_write+0x5f/0xa0
[<ffffffff815a0a59>] system_call_fastpath+0x16/0x1b

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Constify a function argument
Bart Van Assche [Fri, 31 Jul 2015 21:12:48 +0000 (14:12 -0700)]
IB/srp: Constify a function argument

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Fix incorrect cq flushing in error state
Ariel Nahum [Sun, 9 Aug 2015 08:16:27 +0000 (11:16 +0300)]
IB/mlx4: Fix incorrect cq flushing in error state

When handling a device internal error, the driver is responsible for
draining the completion queue with flush errors.

In case a completion queue was assigned to multiple send queues, the
driver iterates over the send queues and generates flush errors for
inflight WQEs. The driver must pass the wc array at an offset that
accounts for the completions generated for the previous send queues.
Not doing so overwrites previously set completions and returns a wrong
number of polled completions, which includes ones that were not set.
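A minimal sketch of the corrected iteration, assuming simplified stand-ins (`struct wc`, `flush_sq`, `flush_cq` are hypothetical, not the mlx4 functions). The essential detail is passing `wc + npolled` and the remaining capacity to each per-queue flush, so each send queue appends after the previous one instead of overwriting it.

```c
#include <assert.h>

struct wc { int qp_num; int status; };

/* Hypothetical per-SQ flush: writes up to 'max' flush completions for the
 * 'inflight' WQEs of one send queue into 'wc', returns how many it wrote. */
static int flush_sq(int qp_num, int inflight, struct wc *wc, int max)
{
	assert(max >= 0);
	int n = inflight < max ? inflight : max;
	for (int i = 0; i < n; i++) {
		wc[i].qp_num = qp_num;
		wc[i].status = -1;  /* stand-in for a flush error status */
	}
	return n;
}

/* Drain a CQ shared by several send queues. */
static int flush_cq(const int *qp_nums, const int *inflight, int nqps,
		    struct wc *wc, int num_entries)
{
	int npolled = 0;
	for (int i = 0; i < nqps && npolled < num_entries; i++)
		npolled += flush_sq(qp_nums[i], inflight[i],
				    wc + npolled, num_entries - npolled);
	return npolled;
}
```

Passing plain `wc` instead of `wc + npolled` would make the second queue overwrite the first queue's entries while `npolled` still counted both, which is the failure mode the commit describes.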

Fixes: 35f05dabf95a (IB/mlx4: Reset flow support for IB kernel ULPs)
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Use correct SL on AH query under RoCE
Noa Osherovich [Thu, 30 Jul 2015 14:34:24 +0000 (17:34 +0300)]
IB/mlx4: Use correct SL on AH query under RoCE

The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.
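To illustrate why the offset matters, here is a minimal sketch working on a host-order copy of the sl_tclass_flowlabel word (the driver stores it big-endian). The shift values 28 and 29 come from the commit above; the helpers `pack_sl` and `query_sl` and the assumption that RoCE uses a 3-bit field in bits 31..29 while IB uses a 4-bit SL in bits 31..28 are this sketch's own modeling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Place the SL in the top bits of a host-order sl_tclass_flowlabel word. */
static uint32_t pack_sl(uint8_t sl, bool is_roce)
{
	assert(sl < (is_roce ? 8 : 16));
	return (uint32_t)sl << (is_roce ? 29 : 28);
}

/* Read it back with the shift that matches the link type. */
static uint8_t query_sl(uint32_t sl_tclass_flowlabel, bool is_roce)
{
	return (uint8_t)(sl_tclass_flowlabel >> (is_roce ? 29 : 28));
}
```

For example, a RoCE SL of 5 is stored as 0xA0000000; shifting right by 29 gives 5, while the old shift of 28 gives 10.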

Fixes: fa417f7b520e ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Forbid using sysfs to change RoCE pkeys
Jack Morgenstein [Thu, 30 Jul 2015 14:34:23 +0000 (17:34 +0300)]
IB/mlx4: Forbid using sysfs to change RoCE pkeys

The pkey mapping for RoCE must remain the default mapping:
VFs:
  virtual index 0 = mapped to real index 0 (0xFFFF)
  All other indices: mapped to a real pkey index containing an
                      invalid pkey.
PF:
  virtual index i = real index i.

Don't allow users to change these mappings using files found in
sysfs.
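The default mapping listed above, plus the rule that RoCE ports refuse remapping, can be sketched as below. `INVALID_PKEY_REAL_INDEX`, `roce_default_pkey_map` and `set_pkey_mapping` are hypothetical, and returning `-EINVAL` for a rejected write is an assumption of this sketch rather than a statement of what the driver returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical real index whose table entry holds an invalid pkey. */
#define INVALID_PKEY_REAL_INDEX 127

/* Default RoCE virtual -> real pkey index mapping. */
static int roce_default_pkey_map(bool is_vf, int virt_index)
{
	assert(virt_index >= 0);
	if (!is_vf)
		return virt_index;             /* PF: identity mapping */
	if (virt_index == 0)
		return 0;                      /* VF index 0 -> real index 0 (0xFFFF) */
	return INVALID_PKEY_REAL_INDEX;    /* all other VF indices */
}

/* A sysfs-style write that is rejected on RoCE ports. */
static int set_pkey_mapping(bool port_is_roce, int *map_entry, int new_real)
{
	if (port_is_roce)
		return -EINVAL;  /* assumed error code */
	*map_entry = new_real;
	return 0;
}
```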

Fixes: c1e7e466120b ('IB/mlx4: Add iov directory in sysfs under the ib device')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Demote mcg message from warning to debug
Jack Morgenstein [Thu, 30 Jul 2015 14:34:22 +0000 (17:34 +0300)]
IB/mlx4: Demote mcg message from warning to debug

The mcg "too many pending requests" warning message fills the log
when OpenSM is down. Demote the message from warning level to
debug level.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Fix potential deadlock when sending mad to wire
Jack Morgenstein [Thu, 30 Jul 2015 14:34:21 +0000 (17:34 +0300)]
IB/mlx4: Fix potential deadlock when sending mad to wire

send_mad_to_wire takes the same spinlock that is taken in
the interrupt context.  Therefore, it needs irqsave/restore.

Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Remove needless bracketization
Doug Ledford [Sat, 15 Aug 2015 14:16:14 +0000 (10:16 -0400)]
IB/core: Remove needless bracketization

Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/ocrdma: Incorporate the moving of GID Table mgmt to IB/Core
Somnath Kotur [Thu, 30 Jul 2015 15:33:31 +0000 (18:33 +0300)]
RDMA/ocrdma: Incorporate the moving of GID Table mgmt to IB/Core

1. Change the query_gid hook to return the value from the IB/Core GID
   management APIs.
2. Get rid of all the netdev notifier chain subscription code as well
   as the maintenance of the SGID table in memory.
3. Implement the get_netdev hook in the driver.

Signed-off-by: Somnath Kotur <somnath.kotur@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Replace mechanism for RoCE GID management
Moni Shoua [Thu, 30 Jul 2015 15:33:30 +0000 (18:33 +0300)]
IB/mlx4: Replace mechanism for RoCE GID management

Manage RoCE gid table with logic in IB/core, which is common to all
vendors, and remove the mechanism from the mlx4 IB driver.
Since management of the GID cache may lead to index mismatch with the
hardware GID table, a translation between indexes is required when
modifying a QP or creating an address handle.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Implement ib_device callbacks
Moni Shoua [Thu, 30 Jul 2015 15:33:29 +0000 (18:33 +0300)]
IB/mlx4: Implement ib_device callbacks

get_netdev: get the net_device on the physical port of the IB transport port. In
port aggregation mode it is required to return the netdev of the active port.

modify_gid: notification of a change in the RoCE gid cache. Handle this by
writing to the hardware GID table. It is possible that indexes in the cache and
hardware tables won't match, so a translation is required when modifying a QP
or creating an address handle.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx4: Postpone the registration of net_device
Moni Shoua [Thu, 30 Jul 2015 15:33:28 +0000 (18:33 +0300)]
net/mlx4: Postpone the registration of net_device

The mlx4 network driver was registered in the context of the 'add'
function of the core driver (called when HW should be registered).
This causes the NETDEV_REGISTER netdev event to be sent in a context
where the get_protocol_dev() callback returns NULL, which may
confuse listeners of netdev events.
This patch is a preparation for the patch that implements the
get_netdev() callback in the IB/mlx4 driver.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add RoCE table bonding support
Matan Barak [Thu, 30 Jul 2015 15:33:27 +0000 (18:33 +0300)]
IB/core: Add RoCE table bonding support

Handling bonding and other devices requires us to add all GIDs of the
net-devices which are upper devices of the RoCE port's related
net-device.

Active-backup configurations impose even more challenges, as the
default GID should only be set on the active devices (this is
necessary as otherwise the same MAC could be used for several
slaves and thus several slaves would have identical GIDs).

Managing these configurations is done by listening to:
(a) NETDEV_CHANGEUPPER:
(1) if a related net-device is linked, delete all inactive
    slaves default GIDs and add the upper device GIDs.
(2) if a related net-device is unlinked, delete all upper GIDs
    and add the default GIDs.
(b) NETDEV_BONDING_FAILOVER:
(1) delete the bond GIDs from inactive slaves
(2) delete the inactive slave's default GIDs
(3) add the bond GIDs to the active slave.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: missing curly braces in ib_find_gid()
Dan Carpenter [Tue, 18 Aug 2015 09:22:10 +0000 (12:22 +0300)]
IB/core: missing curly braces in ib_find_gid()

Smatch says that, based on the indenting, we should probably add curly
braces here.

Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add RoCE GID table management
Matan Barak [Thu, 30 Jul 2015 15:33:26 +0000 (18:33 +0300)]
IB/core: Add RoCE GID table management

RoCE GIDs are based on IP addresses configured on Ethernet net-devices
which relate to the RDMA (RoCE) device port.

Currently, each of the low-level drivers that support RoCE (ocrdma,
mlx4) manages its own RoCE port GID table. As there's nothing which is
essentially vendor specific, we generalize that, and enhance the RDMA
core GID cache to do this job.

In order to populate the GID table, we listen for events:

(a) netdev up/down/change_addr events - if a netdev is built onto
    our RoCE device, we need to add/delete its IPs. This involves
    adding all GIDs related to this ndev, add default GIDs, etc.

(b) inet events - add new GIDs (according to the IP addresses)
    to the table.

For programming the port RoCE GID table, providers must implement
the add_gid and del_gid callbacks.

RoCE GID management requires us to state the associated net_device
alongside the GID. This information is necessary in order to manage
the GID table. For example, when a net_device is removed, its
associated GIDs need to be removed as well.

RoCE mandates generating a default GID for each port, based on the
related net-device's IPv6 link local. In contrast to the GID based on
the regular IPv6 link-local (as we generate GID per IP address),
the default GID is also available when the net device is down (in
order to support loopback).

Locking is done as follows:
The patch modifies the GID table code both for new RoCE drivers
implementing the add_gid/del_gid callbacks and for current RoCE and
IB drivers that do not. The flows for updating the table are
different, and so are the locking requirements.

While updating RoCE GID table, protection against multiple writers is
achieved via mutex_lock(&table->lock). Since writing to a table
requires us to find an entry (possibly a free entry) in the table and
then modify it, this mutex protects both the find_gid and write_gid
calls, ensuring the atomicity of the action.
Each entry in the GID cache is protected by an rwlock. In RoCE, writing
(usually resulting from a netdev notifier) involves invoking the vendor's
add_gid and del_gid callbacks, which could sleep.
Therefore, an invalid flag is added for each entry. Updates for RoCE are
done via a workqueue, thus sleeping is permitted.
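The writer-side rule (find a slot and write it under one mutex) can be sketched in userspace with a pthread mutex. This is a simplified model: `gid_table`, `find_gid` and `add_gid` here are hypothetical, and the per-entry rwlock, the invalid flag and the vendor callbacks are omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define GID_LEN 16
#define TABLE_SIZE 4

struct gid_entry { unsigned char gid[GID_LEN]; int used; };

struct gid_table {
	pthread_mutex_t lock;   /* serializes writers: find + write is atomic */
	struct gid_entry data[TABLE_SIZE];
};

/* Caller holds t->lock. Returns the index of 'gid' (want_free == 0) or of
 * the first free slot (want_free != 0), or -1 if none matches. */
static int find_gid(struct gid_table *t, const unsigned char *gid, int want_free)
{
	for (int i = 0; i < TABLE_SIZE; i++) {
		if (want_free ? !t->data[i].used
			      : (t->data[i].used &&
				 memcmp(t->data[i].gid, gid, GID_LEN) == 0))
			return i;
	}
	return -1;
}

/* Add a GID unless already present. Because lookup and write happen under
 * the same mutex, two writers cannot claim the same free slot. */
static int add_gid(struct gid_table *t, const unsigned char *gid)
{
	int ix;

	assert(gid != NULL);
	pthread_mutex_lock(&t->lock);
	ix = find_gid(t, gid, 0);
	if (ix < 0) {
		ix = find_gid(t, gid, 1);
		if (ix >= 0) {
			memcpy(t->data[ix].gid, gid, GID_LEN);
			t->data[ix].used = 1;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return ix;  /* -1 if the table is full */
}
```

If the lookup and the write took the lock separately, two concurrent writers could both see the same slot as free and one GID would silently overwrite the other.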

In IB, updates are done in write_lock_irq(&device->cache.lock), thus
write_gid isn't allowed to sleep and add_gid/del_gid are not called.

When passing net-device into/out-of the GID cache, the device
is always passed held (dev_hold).

The code uses a single work item for updating all RDMA devices,
following a netdev or inet notifier.

The patch moves the cache from being a client (which was incorrect,
as the cache is part of the IB infrastructure) to being explicitly
initialized/freed when a device is registered/removed.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Make ib_alloc_device init the kobject
Jason Gunthorpe [Tue, 4 Aug 2015 21:23:34 +0000 (15:23 -0600)]
IB/core: Make ib_alloc_device init the kobject

This gets rid of the weird in-between state where struct ib_device
was allocated but the kobject didn't work.

Consequently ib_device_release is now guaranteed to be called in
all situations and we needn't duplicate its kfrees on error paths.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/bonding: Export bond_option_active_slave_get_rcu
Matan Barak [Thu, 30 Jul 2015 15:33:24 +0000 (18:33 +0300)]
net/bonding: Export bond_option_active_slave_get_rcu

Some consumers of the netdev events API would like to know which slave is
active when NETDEV_CHANGEUPPER or NETDEV_BONDING_FAILOVER
events occur. For example, when managing RoCE GIDs, GIDs based on the
bond's IPs should only be set on the port which corresponds to the active
slave netdevice.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet: Add info for NETDEV_CHANGEUPPER event
Matan Barak [Thu, 30 Jul 2015 15:33:23 +0000 (18:33 +0300)]
net: Add info for NETDEV_CHANGEUPPER event

Some consumers of NETDEV_CHANGEUPPER event would like to know which
upper device was linked/unlinked and what operation was carried out.

Add information in the notifier info block for that purpose.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/ipv6: Export addrconf_ifid_eui48
Matan Barak [Thu, 30 Jul 2015 15:33:22 +0000 (18:33 +0300)]
net/ipv6: Export addrconf_ifid_eui48

For loopback purposes, RoCE devices should have a default GID in the
port GID table, even when the interface is down. In order to do so,
we use as a default GID the IPv6 link-local address which would have
been generated for the related Ethernet netdevice when it goes up.

addrconf_ifid_eui48 is used to generate this address, so export it.
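A minimal sketch of how such a default GID is formed: the fe80::/64 link-local prefix followed by the modified EUI-64 interface identifier derived from the 48-bit MAC (RFC 4291, Appendix A). `mac_to_default_gid` is a hypothetical helper covering only the common case; the kernel's addrconf_ifid_eui48 has additional handling (e.g. for a non-zero dev_id) that this sketch omits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build fe80::/64 + modified EUI-64 from a 48-bit MAC address. */
static void mac_to_default_gid(const uint8_t mac[6], uint8_t gid[16])
{
	assert(mac != NULL && gid != NULL);
	memset(gid, 0, 16);
	gid[0] = 0xfe;            /* link-local prefix fe80::/64 */
	gid[1] = 0x80;
	gid[8] = mac[0] ^ 0x02;   /* flip the universal/local bit */
	gid[9] = mac[1];
	gid[10] = mac[2];
	gid[11] = 0xff;           /* insert ff:fe in the middle */
	gid[12] = 0xfe;
	gid[13] = mac[3];
	gid[14] = mac[4];
	gid[15] = mac[5];
}
```

For example, the MAC 00:02:c9:01:02:03 yields the GID fe80::202:c9ff:fe01:203.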

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Drop ib_alloc_fast_reg_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:48 +0000 (10:32 +0300)]
IB/core: Drop ib_alloc_fast_reg_mr

Fully replaced by a more generic and suitable
ib_alloc_mr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Support ib_alloc_mr verb
Mike Marciniszyn [Fri, 7 Aug 2015 14:51:25 +0000 (10:51 -0400)]
IB/hfi1: Support ib_alloc_mr verb

Ported from upstream qib commit
68c02e232b8a ("qib: Support ib_alloc_mr verb")

Tested-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoqib: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:47 +0000 (10:32 +0300)]
qib: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agones: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:46 +0000 (10:32 +0300)]
nes: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agocxgb3: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:45 +0000 (10:32 +0300)]
cxgb3: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiw_cxgb4: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:44 +0000 (10:32 +0300)]
iw_cxgb4: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoocrdma: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:43 +0000 (10:32 +0300)]
ocrdma: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agomlx4: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:42 +0000 (10:32 +0300)]
mlx4: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agomlx5: Drop mlx5_ib_alloc_fast_reg_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:41 +0000 (10:32 +0300)]
mlx5: Drop mlx5_ib_alloc_fast_reg_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDS: Convert to ib_alloc_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:40 +0000 (10:32 +0300)]
RDS: Convert to ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: limit FRMR page list lengths to device max
Steve Wise [Fri, 7 Aug 2015 16:11:20 +0000 (11:11 -0500)]
svcrdma: limit FRMR page list lengths to device max

Svcrdma was incorrectly allocating fastreg MRs and page lists using
RPCSVC_MAXPAGES, which can exceed the device capabilities.  So limit
the depth to the minimum of RPCSVC_MAXPAGES and xprt->sc_frmr_pg_list_len.
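The fix amounts to clamping the page-list depth to the smaller of the two limits. A minimal sketch, assuming a placeholder constant `RPCSVC_MAXPAGES_EXAMPLE` (hypothetical, chosen for illustration; the real RPCSVC_MAXPAGES depends on the maximum RPC payload and the page size) and a hypothetical `frmr_depth` helper taking the device limit that sc_frmr_pg_list_len would hold:

```c
#include <assert.h>

/* Placeholder for illustration only, not the real RPCSVC_MAXPAGES. */
#define RPCSVC_MAXPAGES_EXAMPLE 259u

/* Depth used for fastreg MRs and page lists: never exceed the device max. */
static unsigned int frmr_depth(unsigned int device_max_pages)
{
	assert(device_max_pages > 0);
	return device_max_pages < RPCSVC_MAXPAGES_EXAMPLE ? device_max_pages
							   : RPCSVC_MAXPAGES_EXAMPLE;
}
```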

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoxprtrdma, svcrdma: Convert to ib_alloc_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:39 +0000 (10:32 +0300)]
xprtrdma, svcrdma: Convert to ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Convert to ib_alloc_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:38 +0000 (10:32 +0300)]
IB/srp: Convert to ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiser-target: Convert to ib_alloc_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:37 +0000 (10:32 +0300)]
iser-target: Convert to ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Convert to ib_alloc_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:36 +0000 (10:32 +0300)]
IB/iser: Convert to ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>