cascardo/linux.git
8 years agoIB/core: Guarantee that a local_dma_lkey is available
Jason Gunthorpe [Wed, 5 Aug 2015 20:14:45 +0000 (14:14 -0600)]
IB/core: Guarantee that a local_dma_lkey is available

Every single ULP requires a local_dma_lkey to do anything with
a QP, so let us ensure one exists for every PD created.

If the driver can supply a global local_dma_lkey then use that, otherwise
ask the driver to create a local use all physical memory MR associated
with the new PD.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Chain all iser transaction send work requests
Sagi Grimberg [Thu, 6 Aug 2015 15:33:06 +0000 (18:33 +0300)]
IB/iser: Chain all iser transaction send work requests

Chaning of send work requests benefits performance by
reducing the send queue lock contention (acquired in
ib_post_send) and saves us HW doorbells which is posted
only once.

Currently, in normal IO flows iser does not chain the CDB send
work request with the registration work request. Also in PI
flows, signature work requests are not chained as well.

Lets chain those and post only once.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Add debug prints to the various memory registration methods
Sagi Grimberg [Thu, 6 Aug 2015 15:33:05 +0000 (18:33 +0300)]
IB/iser: Add debug prints to the various memory registration methods

Easier to debug when we have the registration details.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Support up to 8MB data transfer in a single command
Sagi Grimberg [Thu, 6 Aug 2015 15:33:04 +0000 (18:33 +0300)]
IB/iser: Support up to 8MB data transfer in a single command

iser support up to 512KB data transfer in a single scsi command.
This means that larger IOs will split to different request. While
iser can easily saturate FDR/EDR wires, some arrays are fine tuned
for 1MB (or larger) IO sizes, hence add an option to support larger
transfers (up to 8MB) if the device allows it.

Given that a few target implementations don't support data transfers
of more than 512KB by default and the fact that larger IO sizes require
more resources, we introduce a module parameter to determine the
maximum number of 512B sectors in a single scsi command.
Users that are interested in larger transfers can change this value given
that the target supports larger transfers.

At the moment, iser works in 4K pages granularity, In a later stage
we will get it to work with system page size instead.

IO operations that consists of N pages will need a page vector
of size N+1 in case the first SG element contains an offset. Given
that some devices allocates memory regions in powers of 2, this
means that allocating a region with N+1 pages, will result in
region resources allocation of the next power of 2. Since we don't
want that to happen, in case we are in the limit of IO size supported
and the first SG element has an offset, we align the SG list using a
bounce buffer (which is OK given that this is not likely to happen a lot).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Pass registration pool a size parameter
Sagi Grimberg [Thu, 6 Aug 2015 15:33:03 +0000 (18:33 +0300)]
IB/iser: Pass registration pool a size parameter

Hard coded for now. This will allow to allocate different
sized MRs depending on the IO size needed (and device
capabilities).

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Unify fast memory registration flows
Sagi Grimberg [Thu, 6 Aug 2015 15:33:02 +0000 (18:33 +0300)]
IB/iser: Unify fast memory registration flows

iser_reg_rdma_mem_[fastreg|fmr] share a lot of code, and
logically do the same thing other than the buffer registration
method itself (iser_fast_reg_mr vs. iser_fast_reg_fmr).
The DIF logic is not implemented in the FMR flow as there is no
existing device that supports FMRs and Signature feature.

This patch unifies the flow in a single routine iser_reg_rdma_mem
and just split to fmr/frwr for the buffer registration itself.

Also, for symmetry reasons, unify iser_unreg_rdma_mem (which will
call the relevant device specific unreg routine).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Make reg_desc_get a per device routine
Sagi Grimberg [Thu, 6 Aug 2015 15:33:01 +0000 (18:33 +0300)]
IB/iser: Make reg_desc_get a per device routine

As for fmrs we will hold a single registration descriptor
as no need for multiple like in the frwr mode (descriptor
for each task). This change helps unifying the duplicate
registration code paths.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Rename iser_reg_page_vec to iser_fast_reg_fmr
Sagi Grimberg [Thu, 6 Aug 2015 15:33:00 +0000 (18:33 +0300)]
IB/iser: Rename iser_reg_page_vec to iser_fast_reg_fmr

Also, change a name of a local variable.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Maintain connection fmr_pool under a single registration descriptor
Adir Lev [Thu, 6 Aug 2015 15:32:59 +0000 (18:32 +0300)]
IB/iser: Maintain connection fmr_pool under a single registration descriptor

This will allow us to unify the memory registration code path between
the various methods which vary by the device capabilities. This change
will make it easier and less intrusive to remove fmr_pools from the
code when we'd want to.

The reason we use a single descriptor is to avoid taking a
redundant spinlock when working with FMRs.

We also change the signature of iser_reg_page_vec to make it match
iser_fast_reg_mr (and the future indirect registration method).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Introduce iser registration pool struct
Sagi Grimberg [Thu, 6 Aug 2015 15:32:58 +0000 (18:32 +0300)]
IB/iser: Introduce iser registration pool struct

Instead of having it a part of the connection structure,
have it be under a dedicated (embedded) structure in the
connection. A logical separation of the registration pool
and the connection structure.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Move fastreg descriptor allocation to iser_create_fastreg_desc
Sagi Grimberg [Thu, 6 Aug 2015 15:32:57 +0000 (18:32 +0300)]
IB/iser: Move fastreg descriptor allocation to iser_create_fastreg_desc

Don't have the caller allocate the structure and worry about
freeing it in case the routine failed.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Introduce iser_reg_ops
Sagi Grimberg [Thu, 6 Aug 2015 15:32:56 +0000 (18:32 +0300)]
IB/iser: Introduce iser_reg_ops

Move all the per-device function pointers to an easy
extensible iser_reg_ops structure that contains all
the iser registration operations.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Remove dead code in fmr_pool alloc/free
Sagi Grimberg [Thu, 6 Aug 2015 15:32:55 +0000 (18:32 +0300)]
IB/iser: Remove dead code in fmr_pool alloc/free

In the past the we always tried to allocate an fmr_pool
and if it failed on ENOSYS (not supported) then we continued
with dma mr. This is not the case anymore and if we tried to
allocate an fmr_pool then it is supported and we expect to succeed.

Also, the check if fmr_pool is allocated when free is called is
redundant as well as we are guaranteed it exists.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Rename struct fast_reg_descriptor -> iser_fr_desc
Sagi Grimberg [Thu, 6 Aug 2015 15:32:54 +0000 (18:32 +0300)]
IB/iser: Rename struct fast_reg_descriptor -> iser_fr_desc

Avoid struct names without iser_ prefix.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Introduce struct iser_reg_resources
Sagi Grimberg [Thu, 6 Aug 2015 15:32:53 +0000 (18:32 +0300)]
IB/iser: Introduce struct iser_reg_resources

Have fast_reg_descriptor hold struct iser_reg_resources
(mr, frpl, valid flag). This will be useful when the
actual buffer registration routines will be passed with
the needed registration resources (i.e. iser_reg_resources)
without being aware of their nature (i.e. data or protection).

In order to achieve this, we remove reg_indicators flags container
and place specific flags (mr_valid) within iser_reg_resources struct.
We also place the sig_mr_valid and sig_protcted flags in iser_pi_context.

This patch also modifies iser_fast_reg_mr to receive the
reg_resources instead of the fast_reg_descriptor and a data/protection
indicator.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Remove an unneeded print for unaligned memory
Sagi Grimberg [Thu, 6 Aug 2015 15:32:52 +0000 (18:32 +0300)]
IB/iser: Remove an unneeded print for unaligned memory

We can do it in iser_aligned_data_len instead and
it will save us an argument that is passed to
fall_to_counce_buf just for the print.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Remove a redundant always-false condition
Sagi Grimberg [Thu, 6 Aug 2015 15:32:51 +0000 (18:32 +0300)]
IB/iser: Remove a redundant always-false condition

We always call iser_initialize_task_headers() and set
the header tx_sg.lkey to the device mr lkey, so no
point in checking it in iser_create_send_desc().

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Fix possible bogus DMA unmapping
Sagi Grimberg [Thu, 6 Aug 2015 15:32:50 +0000 (18:32 +0300)]
IB/iser: Fix possible bogus DMA unmapping

If iser_initialize_task_headers() routine failed before
dma mapping, we should not attempt to unmap in cleanup_task().

Fixes: 7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Get rid of un-maintained counters
Sagi Grimberg [Thu, 6 Aug 2015 15:32:49 +0000 (18:32 +0300)]
IB/iser: Get rid of un-maintained counters

We don't update those anywhere in the code and they
seem pretty useless (no one seem to care about those).

qp_tx_queue_full: We never should get this
fmr_map_not_avail: We can never get to this
eh_abort_cnt: We don't monitor aborts

Go ahead and remove them.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Fix missing return status check in iser_send_data_out
Sagi Grimberg [Thu, 6 Aug 2015 15:32:48 +0000 (18:32 +0300)]
IB/iser: Fix missing return status check in iser_send_data_out

Since commit "IB/iser: Fix race between iser connection teardown..."
iser_initialize_task_headers() might fail, so we need to check that.

Fixes: 7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Remove '.' from log message
Sagi Grimberg [Thu, 6 Aug 2015 15:32:47 +0000 (18:32 +0300)]
IB/iser: Remove '.' from log message

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Change minor assignments and logging prints
Sagi Grimberg [Thu, 6 Aug 2015 15:32:46 +0000 (18:32 +0300)]
IB/iser: Change minor assignments and logging prints

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Change some module parameters to be RO
Jenny Falkovich [Thu, 6 Aug 2015 15:32:45 +0000 (18:32 +0300)]
IB/iser: Change some module parameters to be RO

While we're at it, use permission defines instead
of octal values and rearrange a little bit.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/sa: Route SA pathrecord query through netlink
Kaike Wan [Fri, 14 Aug 2015 12:52:09 +0000 (08:52 -0400)]
IB/sa: Route SA pathrecord query through netlink

This patch routes a SA pathrecord query to netlink first and processes the
response appropriately. If a failure is returned, the request will be sent
through IB. The decision whether to route the request to netlink first is
determined by the presence of a listener for the local service netlink
multicast group. If the user-space local service netlink multicast group
listener is not present, the request will be sent through IB, just like
what is currently being done.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/sa: Allocate SA query with kzalloc
Kaike Wan [Fri, 14 Aug 2015 12:52:08 +0000 (08:52 -0400)]
IB/sa: Allocate SA query with kzalloc

Replace kmalloc with kzalloc so that all uninitialized fields in SA query
will be zero-ed out to avoid unintentional consequence. This prepares the
SA query structure to accept new fields in the future.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add rdma netlink helper functions
Kaike Wan [Fri, 14 Aug 2015 12:52:07 +0000 (08:52 -0400)]
IB/core: Add rdma netlink helper functions

This patch adds a function to check if listeners for a netlink multicast
group are present. It also adds a function to receive netlink response
messages.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/netlink: Add defines for local service requests through netlink
Kaike Wan [Fri, 14 Aug 2015 12:52:06 +0000 (08:52 -0400)]
IB/netlink: Add defines for local service requests through netlink

This patch adds netlink defines for local service client, local service
group, local service operations, and related attributes.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Stop the scsi_eh_<n> and scsi_tmf_<n> threads if login fails
Bart Van Assche [Fri, 14 Aug 2015 18:01:09 +0000 (11:01 -0700)]
IB/srp: Stop the scsi_eh_<n> and scsi_tmf_<n> threads if login fails

scsi_host_alloc() not only allocates memory for a SCSI host but also
creates the scsi_eh_<n> kernel thread and the scsi_tmf_<n> workqueue.
Stop these threads if login fails by calling scsi_host_put().

Reported-by: Konstantin Krotov <kkv@clodo.ru>
Fixes: fb49c8bbaae7 ("Remove an extraneous scsi_host_put() from an error path")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.19
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Bump driver version and release date
Bart Van Assche [Fri, 31 Jul 2015 21:13:52 +0000 (14:13 -0700)]
IB/srp: Bump driver version and release date

Since version 1.0 e.g. scsi-mq has been added. Since this is
a significant change, bump the driver version and release date.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Handle partial connection success correctly
Bart Van Assche [Fri, 31 Jul 2015 21:13:22 +0000 (14:13 -0700)]
IB/srp: Handle partial connection success correctly

Avoid that the following kernel warning is reported if the SRP
target system accepts fewer channels per connection than what
was requested by the initiator system:

WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xb1/0x120 [ib_srp]()
Call Trace:
[<ffffffff8105d67f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8105d6da>] warn_slowpath_null+0x1a/0x20
[<ffffffffa05419e1>] srp_destroy_qp+0xb1/0x120 [ib_srp]
[<ffffffffa05445fb>] srp_create_ch_ib+0x19b/0x420 [ib_srp]
[<ffffffffa0545257>] srp_create_target+0x7d7/0xa94 [ib_srp]
[<ffffffff8138dac0>] dev_attr_store+0x20/0x30
[<ffffffff812079ef>] sysfs_write_file+0xef/0x170
[<ffffffff81191fc4>] vfs_write+0xb4/0x130
[<ffffffff8119276f>] sys_write+0x5f/0xa0
[<ffffffff815a0a59>] system_call_fastpath+0x16/0x1b

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Constify a function argument
Bart Van Assche [Fri, 31 Jul 2015 21:12:48 +0000 (14:12 -0700)]
IB/srp: Constify a function argument

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Fix incorrect cq flushing in error state
Ariel Nahum [Sun, 9 Aug 2015 08:16:27 +0000 (11:16 +0300)]
IB/mlx4: Fix incorrect cq flushing in error state

When handling a device internal error, the driver is responsible to
drain the completion queue with flush errors.

In case a completion queue was assigned to multiple send queues, the
driver iterates over the send queues and generates flush errors of
inflight wqes. The driver must correctly pass the wc array with an
offset as a result of the previous send queue iteration. Not doing so
will overwrite previously set completions and return a wrong number
of polled completions which includes ones which were not correctly set.

Fixes: 35f05dabf95a (IB/mlx4: Reset flow support for IB kernel ULPs)
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Use correct SL on AH query under RoCE
Noa Osherovich [Thu, 30 Jul 2015 14:34:24 +0000 (17:34 +0300)]
IB/mlx4: Use correct SL on AH query under RoCE

The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.

Fixes: fa417f7b520e ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Forbid using sysfs to change RoCE pkeys
Jack Morgenstein [Thu, 30 Jul 2015 14:34:23 +0000 (17:34 +0300)]
IB/mlx4: Forbid using sysfs to change RoCE pkeys

The pkey mapping for RoCE must remain the default mapping:
VFs:
  virtual index 0 = mapped to real index 0 (0xFFFF)
  All others indices: mapped to a real pkey index containing an
                      invalid pkey.
PF:
  virtual index i = real index i.

Don't allow users to change these mappings using files found in
sysfs.

Fixes: c1e7e466120b ('IB/mlx4: Add iov directory in sysfs under the ib device')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Demote mcg message from warning to debug
Jack Morgenstein [Thu, 30 Jul 2015 14:34:22 +0000 (17:34 +0300)]
IB/mlx4: Demote mcg message from warning to debug

The mcg "too many pending requests" warning message fills the log
when OpenSM is downed. Demote the message from  warning level to
debug level.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Fix potential deadlock when sending mad to wire
Jack Morgenstein [Thu, 30 Jul 2015 14:34:21 +0000 (17:34 +0300)]
IB/mlx4: Fix potential deadlock when sending mad to wire

send_mad_to_wire takes the same spinlock that is taken in
the interrupt context.  Therefore, it needs irqsave/restore.

Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Remove needless bracketization
Doug Ledford [Sat, 15 Aug 2015 14:16:14 +0000 (10:16 -0400)]
IB/core: Remove needless bracketization

Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/ocrdma: Incorporate the moving of GID Table mgmt to IB/Core
Somnath Kotur [Thu, 30 Jul 2015 15:33:31 +0000 (18:33 +0300)]
RDMA/ocrdma: Incorporate the moving of GID Table mgmt to IB/Core

1.Change query_gid hook to return value from IB/Core GID
  management APIs.
2.Get rid of all the netdev notifier chain subscription code as well
  as maintenance of SGID Table in memory.
3.Implement get_netdev hook in driver.

Signed-off-by: Somnath Kotur <somnath.kotur@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Replace mechanism for RoCE GID management
Moni Shoua [Thu, 30 Jul 2015 15:33:30 +0000 (18:33 +0300)]
IB/mlx4: Replace mechanism for RoCE GID management

Manage RoCE gid table with logic in IB/core, which is common to all
vendors, and remove the mechanism from the mlx4 IB driver.
Since management of the GID cache may lead to index mismatch with the
hardware GID table, a translation between indexes is required when
modifying a QP or creating an address handle.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Implement ib_device callbacks
Moni Shoua [Thu, 30 Jul 2015 15:33:29 +0000 (18:33 +0300)]
IB/mlx4: Implement ib_device callbacks

get_netdev: get the net_device on the physical port of the IB transport port. In
port aggregation mode it is required to return the netdev of the active port.

modify_gid: note for a change in the RoCE gid cache. Handle this by writing to
the harsware GID table. It is possible that indexes in cahce and hardware tables
won't match so a translation is required when modifying a QP or creating an
address handle.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx4: Postpone the registration of net_device
Moni Shoua [Thu, 30 Jul 2015 15:33:28 +0000 (18:33 +0300)]
net/mlx4: Postpone the registration of net_device

The mlx4 network driver was registered in the context of the 'add'
function of the core driver (called when HW should be registered).
This makes the netdev event NETDEV_REGISTER to be sent in a context
where the answer to get_protocol_dev() callback returns NULL. This may
be confusing to listeners of netdev events.
This patch is a preparation to the patch that implements the
get_netdev() callback in the IB/mlx4 driver.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add RoCE table bonding support
Matan Barak [Thu, 30 Jul 2015 15:33:27 +0000 (18:33 +0300)]
IB/core: Add RoCE table bonding support

Handling bonding and other devices require us to all all GIDs of the
net-devices which are upper-devices of the RoCE port related
net-device.

Active-backup configurations imposes even more challenges as the
default GID should only be set on the active devices (this is
necessary as otherwise the same MAC could be used for several
slaves and thus several slaves will have identical GIDs).

Managing these configurations are done by listening to:
(a) NETDEV_CHANGEUPPER event
(1) if a related net-device is linked, delete all inactive
    slaves default GIDs and add the upper device GIDs.
(2) if a related net-device is unlinked, delete all upper GIDs
    and add the default GIDs.
(b) NETDEV_BONDING_FAILOVER:
(1) delete the bond GIDs from inactive slaves
(2) delete the inactive slave's default GIDs
(3) Add the bond GIDs to the active slave.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: missing curly braces in ib_find_gid()
Dan Carpenter [Tue, 18 Aug 2015 09:22:10 +0000 (12:22 +0300)]
IB/core: missing curly braces in ib_find_gid()

Smatch says that, based on the indenting, we should probably add curly
braces here.

Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add RoCE GID table management
Matan Barak [Thu, 30 Jul 2015 15:33:26 +0000 (18:33 +0300)]
IB/core: Add RoCE GID table management

RoCE GIDs are based on IP addresses configured on Ethernet net-devices
which relate to the RDMA (RoCE) device port.

Currently, each of the low-level drivers that support RoCE (ocrdma,
mlx4) manages its own RoCE port GID table. As there's nothing which is
essentially vendor specific, we generalize that, and enhance the RDMA
core GID cache to do this job.

In order to populate the GID table, we listen for events:

(a) netdev up/down/change_addr events - if a netdev is built onto
    our RoCE device, we need to add/delete its IPs. This involves
    adding all GIDs related to this ndev, add default GIDs, etc.

(b) inet events - add new GIDs (according to the IP addresses)
    to the table.

For programming the port RoCE GID table, providers must implement
the add_gid and del_gid callbacks.

RoCE GID management requires us to state the associated net_device
alongside the GID. This information is necessary in order to manage
the GID table. For example, when a net_device is removed, its
associated GIDs need to be removed as well.

RoCE mandates generating a default GID for each port, based on the
related net-device's IPv6 link local. In contrast to the GID based on
the regular IPv6 link-local (as we generate GID per IP address),
the default GID is also available when the net device is down (in
order to support loopback).

Locking is done as follows:
The patch modify the GID table code both for new RoCE drivers
implementing the add_gid/del_gid callbacks and for current RoCE and
IB drivers that do not. The flows for updating the table are
different, so the locking requirements are too.

While updating RoCE GID table, protection against multiple writers is
achieved via mutex_lock(&table->lock). Since writing to a table
requires us to find an entry (possible a free entry) in the table and
then modify it, this mutex protects both the find_gid and write_gid
ensuring the atomicity of the action.
Each entry in the GID cache is protected by rwlock. In RoCE, writing
(usually results from netdev notifier) involves invoking the vendor's
add_gid and del_gid callbacks, which could sleep.
Therefore, an invalid flag is added for each entry. Updates for RoCE are
done via a workqueue, thus sleeping is permitted.

In IB, updates are done in write_lock_irq(&device->cache.lock), thus
write_gid isn't allowed to sleep and add_gid/del_gid are not called.

When passing net-device into/out-of the GID cache, the device
is always passed held (dev_hold).

The code uses a single work item for updating all RDMA devices,
following a netdev or inet notifier.

The patch moves the cache from being a client (which was incorrect,
as the cache is part of the IB infrastructure) to being explicitly
initialized/freed when a device is registered/removed.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Make ib_alloc_device init the kobject
Jason Gunthorpe [Tue, 4 Aug 2015 21:23:34 +0000 (15:23 -0600)]
IB/core: Make ib_alloc_device init the kobject

This gets rid of the weird in-between state where struct ib_device
was allocated but the kobject didn't work.

Consequently ib_device_release is now guaranteed to be called in
all situations and we needn't duplicate its kfrees on error paths.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/bonding: Export bond_option_active_slave_get_rcu
Matan Barak [Thu, 30 Jul 2015 15:33:24 +0000 (18:33 +0300)]
net/bonding: Export bond_option_active_slave_get_rcu

Some consumers of the netdev events API would like to know who is the
active slave when a NETDEV_CHANGEUPPER or NETDEV_BONDING_FAILOVER
events occur. For example, when managing RoCE GIDs, GIDs based on the
bond's ips should only be set on the port which corresponds to active
slave netdevice.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet: Add info for NETDEV_CHANGEUPPER event
Matan Barak [Thu, 30 Jul 2015 15:33:23 +0000 (18:33 +0300)]
net: Add info for NETDEV_CHANGEUPPER event

Some consumers of NETDEV_CHANGEUPPER event would like to know which
upper device was linked/unlinked and what operation was carried.

Add information in the notifier info block for that purpose.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/ipv6: Export addrconf_ifid_eui48
Matan Barak [Thu, 30 Jul 2015 15:33:22 +0000 (18:33 +0300)]
net/ipv6: Export addrconf_ifid_eui48

For loopback purposes, RoCE devices should have a default GID in the
port GID table, even when the interface is down. In order to do so,
we use the IPv6 link local address which would have been genenrated
for the related Ethernet netdevice when it goes up as a default GID.

addrconf_ifid_eui48 is used to gernerate this address, export it.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Drop ib_alloc_fast_reg_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:48 +0000 (10:32 +0300)]
IB/core: Drop ib_alloc_fast_reg_mr

Fully replaced by a more generic and suitable
ib_alloc_mr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: Support ib_alloc_mr verb
Mike Marciniszyn [Fri, 7 Aug 2015 14:51:25 +0000 (10:51 -0400)]
IB/hfi1: Support ib_alloc_mr verb

Ported from upstream qib commit
68c02e232b8a ("qib: Support ib_alloc_mr verb")

Tested-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoqib: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:47 +0000 (10:32 +0300)]
qib: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agones: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:46 +0000 (10:32 +0300)]
nes: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agocxgb3: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:45 +0000 (10:32 +0300)]
cxgb3: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiw_cxgb4: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:44 +0000 (10:32 +0300)]
iw_cxgb4: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoocrdma: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:43 +0000 (10:32 +0300)]
ocrdma: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agomlx4: Support ib_alloc_mr verb
Sagi Grimberg [Thu, 30 Jul 2015 07:32:42 +0000 (10:32 +0300)]
mlx4: Support ib_alloc_mr verb

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agomlx5: Drop mlx5_ib_alloc_fast_reg_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:41 +0000 (10:32 +0300)]
mlx5: Drop mlx5_ib_alloc_fast_reg_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDS: Convert to ib_alloc_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:40 +0000 (10:32 +0300)]
RDS: Convert to ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: limit FRMR page list lengths to device max
Steve Wise [Fri, 7 Aug 2015 16:11:20 +0000 (11:11 -0500)]
svcrdma: limit FRMR page list lengths to device max

Svcrdma was incorrectly allocating fastreg MRs and page lists using
RPCSVC_MAXPAGES, which can exceed the device capabilities.  So limit
the depth to the minimum of RPCSVC_MAXPAGES and xprt->sc_frmr_pg_list_len.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoxprtrdma, svcrdma: Convert to ib_alloc_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:39 +0000 (10:32 +0300)]
xprtrdma, svcrdma: Convert to ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srp: Convert to ib_alloc_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:38 +0000 (10:32 +0300)]
IB/srp: Convert to ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiser-target: Convert to ib_alloc_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:37 +0000 (10:32 +0300)]
iser-target: Convert to ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Convert to ib_alloc_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:36 +0000 (10:32 +0300)]
IB/iser: Convert to ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB: Modify ib_create_mr API
Sagi Grimberg [Thu, 30 Jul 2015 07:32:35 +0000 (10:32 +0300)]
IB: Modify ib_create_mr API

Use ib_alloc_mr with specific parameters.
Change the existing callers.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Get rid of redundant verb ib_destroy_mr
Sagi Grimberg [Thu, 30 Jul 2015 07:32:34 +0000 (10:32 +0300)]
IB/core: Get rid of redundant verb ib_destroy_mr

This was added in a thought of uniting all mr allocation
and deallocation routines but the fact is we have a single
deallocation routine already, ib_dereg_mr.

And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr
(includes only signature stuff for now).

And, fixup the only callers (iser/isert) accordingly.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cma: Fix net_dev reference leak with failed requests
Haggai Eran [Thu, 27 Aug 2015 12:55:15 +0000 (15:55 +0300)]
IB/cma: Fix net_dev reference leak with failed requests

When no matching listening ID is found for a given request, the net_dev
that was used to find the request isn't released.

Fixes: 0b3ca768fcb0 ("IB/cma: Use found net_dev for passive connections")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cm: Remove compare_data checks
Haggai Eran [Thu, 30 Jul 2015 14:50:26 +0000 (17:50 +0300)]
IB/cm: Remove compare_data checks

Now that there are no ib_cm clients using the compare_data feature for
matching IB CM requests' private data, remove the compare_data parameter of
ib_cm_listen and remove the code implementing the feature.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cma: Share ib_cm_ids between rdma_cm_ids
Haggai Eran [Thu, 30 Jul 2015 14:50:25 +0000 (17:50 +0300)]
IB/cma: Share ib_cm_ids between rdma_cm_ids

Use ib_cm_insert_listen to create listening IB CM IDs or share existing
ones if needed. When given a request on a specific CM ID, the code now
matches the request to the RDMA CM ID based on the request parameters, so
it no longer needs to rely on the ib_cm's private data matching
capabilities.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cma: Use found net_dev for passive connections
Haggai Eran [Thu, 30 Jul 2015 14:50:24 +0000 (17:50 +0300)]
IB/cma: Use found net_dev for passive connections

When receiving a new connection in cma_req_handler, we actually already
know the net_dev that is used for the connection's creation. Instead of
calling cma_translate_addr to resolve the new connection id's source
address, just use the net_dev that was found.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cma: Validate routing of incoming requests
Haggai Eran [Thu, 30 Jul 2015 14:50:23 +0000 (17:50 +0300)]
IB/cma: Validate routing of incoming requests

Pass incoming request parameters through the relevant IPv4/IPv6 routing
tables and make sure the network stack is configured to handle such
requests.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cma: Add net_dev and private data checks to RDMA CM
Haggai Eran [Thu, 30 Jul 2015 14:50:22 +0000 (17:50 +0300)]
IB/cma: Add net_dev and private data checks to RDMA CM

Instead of relying on a the ib_cm module to check an incoming CM request's
private data header, add these checks to the RDMA CM module. This allows a
following patch to to clean up the ib_cm interface and remove the code that
looks into the private headers. It will also allow supporting namespaces in
RDMA CM by making these checks namespace aware later on.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cm: Expose BTH P_Key in CM and SIDR request events
Haggai Eran [Thu, 30 Jul 2015 14:50:21 +0000 (17:50 +0300)]
IB/cm: Expose BTH P_Key in CM and SIDR request events

The rdma_cm module will later use the P_Key from the BTH to de-mux
requests.

See discussion at:
  http://www.spinics.net/lists/netdev/msg336067.html

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Liran Liss <liranl@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cma: Helper functions to access port space IDRs
Haggai Eran [Thu, 30 Jul 2015 14:50:20 +0000 (17:50 +0300)]
IB/cma: Helper functions to access port space IDRs

Add helper functions to access the IDRs by port-space and port number.

Pass around the port-space enum in cma.c instead of using pointers to
port-space IDRs.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cma: Refactor RDMA IP CM private-data parsing code
Haggai Eran [Thu, 30 Jul 2015 14:50:19 +0000 (17:50 +0300)]
IB/cma: Refactor RDMA IP CM private-data parsing code

When receiving a connection request, rdma_cm needs to associate the request
with a network device, in order to disambiguate requests. To do this, it
needs to know the request's destination IP. For this the module needs to
allow getting this information from the private data in the request packet,
instead of relying on the information already being in the listening RDMA
CM ID.

When creating a new incoming connection ID, the code in
cma_save_ip{4,6}_info can no longer rely on the listener's private data to
find the port number, so it reads it from the requested service ID.

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cm: Share listening CM IDs
Haggai Eran [Thu, 30 Jul 2015 14:50:18 +0000 (17:50 +0300)]
IB/cm: Share listening CM IDs

Enabling network namespaces for RDMA CM will allow processes on different
namespaces to listen on the same port. In order to leave namespace support
out of the CM layer, this requires that multiple RDMA CM IDs will be able
to share a single CM ID.

This patch adds infrastructure to retrieve an existing listening ib_cm_id,
based on its device and service ID, or create a new one if one does not
already exist. It also adds a reference count for such instances
(cm_id_private.listen_sharecount), and prevents cm_destroy_id from
destroying a CM if it is still shared. See the relevant discussion [1].

[1] Re: [PATCH v3 for-next 05/13] IB/cm: Reference count ib_cm_ids
    http://www.spinics.net/lists/netdev/msg328860.html

Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cm: Expose service ID in request events
Haggai Eran [Thu, 30 Jul 2015 14:50:17 +0000 (17:50 +0300)]
IB/cm: Expose service ID in request events

Expose the service ID on an incoming CM or SIDR request to the event
handler. This will allow the RDMA CM module to de-multiplex connection
requests based on the information encoded in the service ID.

Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/ipoib: Return IPoIB devices matching connection parameters
Guy Shapiro [Thu, 30 Jul 2015 14:50:16 +0000 (17:50 +0300)]
IB/ipoib: Return IPoIB devices matching connection parameters

Implement the get_net_device_by_port_pkey_ip callback that returns network
device to ib_core according to connection parameters. Check the ipoib
device and iterate over all child devices to look for a match.

For each IPoIB device we iterate through all upper devices when searching
for a matching IP, in order to support bonding.

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Find the network device matching connection parameters
Yotam Kenneth [Thu, 30 Jul 2015 14:50:15 +0000 (17:50 +0300)]
IB/core: Find the network device matching connection parameters

In the case of IPoIB, and maybe in other cases, the network device is
managed by an upper-layer protocol (ULP). In order to expose this
network device to other users of the IB device, let ULPs implement
a callback that returns network device according to connection parameters.

The IB device and port, together with the P_Key and the GID should
be enough to uniquely identify the ULP net device. However, in current
kernels there can be multiple IPoIB interfaces created with the same GID.
Furthermore, such configuration may be desireable to support ipvlan-like
configurations for RDMA CM with IPoIB.  To resolve the device in these
cases the code will also take the IP address as an additional input.

Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: lock client data with lists_rwsem
Haggai Eran [Thu, 30 Jul 2015 14:50:14 +0000 (17:50 +0300)]
IB/core: lock client data with lists_rwsem

An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.

Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.

Since ib_unregister_client() now marks the client data context, no need for
remove() to search the context again, so pass the client data directly to
remove() callbacks.

Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add rwsem to allow reading device list or client list
Haggai Eran [Thu, 30 Jul 2015 14:50:13 +0000 (17:50 +0300)]
IB/core: Add rwsem to allow reading device list or client list

Currently the RDMA subsystem's device list and client list are protected by
a single mutex. This prevents adding user-facing APIs that iterate these
lists, since using them may cause a deadlock. The patch attempts to solve
this problem by adding a read-write semaphore to protect the lists. Readers
now don't need the mutex, and are safe just by read-locking the semaphore.

The ib_register_device, ib_register_client, ib_unregister_device, and
ib_unregister_client functions are modified to lock the semaphore for write
during their respective list modification. Also, in order to make sure
client callbacks are called only between add() and remove() calls, the code
is changed to only add items to the lists after the add() calls and remove
from the lists before the remove() calls.

This patch attempts to solve a similar need [1] that was seen in the RoCE
v2 patch series.

[1] http://www.spinics.net/lists/linux-rdma/msg24733.html

Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/Core: remove rdma_cap_read_multi_sge() helper
Steve Wise [Mon, 27 Jul 2015 23:10:18 +0000 (18:10 -0500)]
RDMA/Core: remove rdma_cap_read_multi_sge() helper

This functionality already exists via the max_sge_rd
device capability.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Use max_sge_rd for destination read depths
Steve Wise [Mon, 27 Jul 2015 23:10:12 +0000 (18:10 -0500)]
svcrdma: Use max_sge_rd for destination read depths

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoipath,qib: Expose max_sge_rd correctly
Steve Wise [Mon, 27 Jul 2015 23:10:07 +0000 (18:10 -0500)]
ipath,qib: Expose max_sge_rd correctly

Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agomlx4, mlx5, mthca: Expose max_sge_rd correctly
Sagi Grimberg [Mon, 27 Jul 2015 23:10:01 +0000 (18:10 -0500)]
mlx4, mlx5, mthca: Expose max_sge_rd correctly

Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.

Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/hfi1: replace indent spaces with tabs
Jeff Becker [Fri, 21 Aug 2015 19:26:22 +0000 (12:26 -0700)]
staging/hfi1: replace indent spaces with tabs

Running checkpatch.pl on mad.c produces several
"ERROR: code indent should use tabs where possible" messages.
This patch fixes these.

Signed-off-by: Jeff Becker <Jeffrey.C.Becker@nasa.gov>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/hfi1: add driver files
Mike Marciniszyn [Thu, 30 Jul 2015 19:17:43 +0000 (15:17 -0400)]
IB/hfi1: add driver files

Signed-off-by: Andrew Friedley <andrew.friedley@intel.com>
Signed-off-by: Arthur Kepner <arthur.kepner@intel.com>
Signed-off-by: Brendan Cunningham <brendan.cunningham@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Signed-off-by: John Gregor <john.a.gregor@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Kevin Pine <kevin.pine@intel.com>
Signed-off-by: Kyle Liddell <kyle.liddell@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ravi Krishnaswamy <ravi.krishnaswamy@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Sanath Kumar <sanath.s.kumar@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Vlad Danushevsky <vladimir.danusevsky@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add core header changes needed for OPA
Dennis Dalessandro [Thu, 30 Jul 2015 19:17:32 +0000 (15:17 -0400)]
IB/core: Add core header changes needed for OPA

This patch adds the value of the CNP opcode to the existing list of enumerated
opcodes in ib_pack.h

Add common OPA header definitions for driver
build:
- opa_port_info.h
- opa_smi.h
- hfi1_user.h

Additionally, ib_mad.h, has additional definitions
that are common to ib_drivers including:
- trap support
- cca support

The qib driver has the duplication removed in favor
those in ib_mad.h

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: John, Jubin <jubin.john@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/amso1100: Deprecate the amso1100 driver and move to staging
Steve Wise [Wed, 29 Jul 2015 14:44:14 +0000 (09:44 -0500)]
RDMA/amso1100: Deprecate the amso1100 driver and move to staging

The HW hasn't been sold since 2005, and the SW has definite bit rot.
Its time to remove it.  So move it to staging for a few releases and
then remove it after that.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/ipath: Deprecate ipath driver and move to staging.
Dennis Dalessandro [Thu, 30 Jul 2015 13:25:42 +0000 (09:25 -0400)]
IB/ipath: Deprecate ipath driver and move to staging.

It is now time for the ipath driver to begin to be phased out of the kernel.
This patch moves the ipath driver from the Infiniband sub tree to the staging
area where it will remain until the code is removed from the kernel in a few
releases.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoStaging: Add staging/rdma directory and update MAINTAINERS
Doug Ledford [Thu, 27 Aug 2015 18:18:57 +0000 (14:18 -0400)]
Staging: Add staging/rdma directory and update MAINTAINERS

Create the rdma directory in the staging area for use as we deprecate
some older drivers and as we bring in some new drivers that are in
need of work.  Update the MAINTAINERS file so that updates to these
files go to linux-rdma@vger.kernel.org.  Expected lifespan of this
directory is three releases for any deprecated drivers moved here
and an unknown, but theoretically bounded amount of time for the new
drivers as a new core RDMA transfer library needs to be written and
the drivers modified to use it in order for them to move out of this
directory.

Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiw_cxgb4: Add support for clip
Hariprasad S [Tue, 25 Aug 2015 08:38:23 +0000 (14:08 +0530)]
iw_cxgb4: Add support for clip

Add support for ipv6 address handling clip api provided by lld

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/cma: fix IPv6 address resolution
Spencer Baugh [Thu, 13 Aug 2015 19:19:10 +0000 (12:19 -0700)]
RDMA/cma: fix IPv6 address resolution

Resolving a link-local IPv6 address with an unspecified source address
was broken by commit 5462eddd7a, which prevented the IPv6 stack from
learning the scope id of the link-local IPv6 address, causing random
failures as the IP stack chose a random link to resolve the address on.

This commit 5462eddd7a made us bail out of cma_check_linklocal early if
the address passed in was not an IPv6 link-local address. On the address
resolution path, the address passed in is the source address; if the
source address is the unspecified address, which is not link-local, we
will bail out early.

This is mostly correct, but if the destination address is a link-local
address, then we will be following a link-local route, and we'll need to
tell the IPv6 stack what the scope id of the destination address is.
This used to be done by last line of cma_check_linklocal, which is
skipped when bailing out early:

dev_addr->bound_dev_if = sin6->sin6_scope_id;

(In cma_bind_addr, the sin6_scope_id of the source address is set to the
sin6_scope_id of the destination address, so this is correct)
This line is required in turn for the following line, L279 of
addr6_resolve, to actually inform the IPv6 stack of the scope id:

      fl6.flowi6_oif = addr->bound_dev_if;

Since we can only know we are in this failure case when we have access
to both the source IPv6 address and destination IPv6 address, we have to
deal with this further up the stack. So detect this failure case in
cma_bind_addr, and set bound_dev_if to the destination address scope id
to correct it.

Signed-off-by: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/ucma: Fix theoretical user triggered use-after-free
Jason Gunthorpe [Tue, 4 Aug 2015 23:13:32 +0000 (17:13 -0600)]
IB/ucma: Fix theoretical user triggered use-after-free

Something like this:

CPU A                         CPU B
Acked-by: Sean Hefty <sean.hefty@intel.com>
========================      ================================
ucma_destroy_id()
 wait_for_completion()
                              .. anything
                                ucma_put_ctx()
                                  complete()
 .. continues ...
                              ucma_leave_multicast()
                               mutex_lock(mut)
                                 atomic_inc(ctx->ref)
                               mutex_unlock(mut)
 ucma_free_ctx()
  ucma_cleanup_multicast()
   mutex_lock(mut)
     kfree(mc)
                               rdma_leave_multicast(mc->ctx->cm_id,..

Fix it by latching the ref at 0. Once it goes to 0 mc and ctx cannot
leave the mutex(mut) protection.

The other atomic_inc in ucma_get_ctx is OK because mutex(mut) protects
it from racing with ucma_destroy_id.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiw_cxgb4: set the default MPA version to 2
Hariprasad S [Mon, 27 Jul 2015 08:38:52 +0000 (14:08 +0530)]
iw_cxgb4: set the default MPA version to 2

This enables ORD/IRD negotiation and its about time to enable it by
default

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/iser: Limit sgs to the device fastreg depth
Steve Wise [Tue, 28 Jul 2015 14:13:52 +0000 (09:13 -0500)]
RDMA/iser: Limit sgs to the device fastreg depth

Currently the sg tablesize, which dictates fast register page list
depth to use, does not take into account the limits of the rdma device.
So adjust it once we discover the device fastreg max depth limit.  Also
adjust the max_sectors based on the resulting sg tablesize.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Remove dead code from alloc_cached_mr()
Roland Dreier [Mon, 27 Jul 2015 21:43:23 +0000 (14:43 -0700)]
IB/mlx5: Remove dead code from alloc_cached_mr()

The only place that assigns mr inside the loop already does a break.
So "if (mr)" will never be true here since the function initializes mr
to NULL at the top.  We can just drop the extra if and break here.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/qib: Change lkey table allocation to support more MRs
Mike Marciniszyn [Tue, 21 Jul 2015 12:36:07 +0000 (08:36 -0400)]
IB/qib: Change lkey table allocation to support more MRs

The lkey table is allocated with with a get_user_pages() with an
order based on a number of index bits from a module parameter.

The underlying kernel code cannot allocate that many contiguous pages.

There is no reason the underlying memory needs to be physically
contiguous.

This patch:
- switches the allocation/deallocation to vmalloc/vfree
- caps the number of bits to 23 to insure at least 1 generation bit
  o this matches the module parameter description

Cc: stable@vger.kernel.org
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agomlx5: Expose correct page_size_cap in device attributes
Sagi Grimberg [Tue, 21 Jul 2015 11:40:12 +0000 (14:40 +0300)]
mlx5: Expose correct page_size_cap in device attributes

Should be all the page sizes that are supported by the
device.

Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agomlx5: Fix missing device local_dma_lkey
Sagi Grimberg [Mon, 20 Jul 2015 16:54:36 +0000 (19:54 +0300)]
mlx5: Fix missing device local_dma_lkey

The mlx5 driver exposes device capability IB_DEVICE_LOCAL_DMA_LKEY
but does not set the the device local_dma_lkey. This breaks
rpcrdma drivers.

Query and set this lkey when creating the device resources.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoLinux 4.2-rc8 v4.2-rc8
Linus Torvalds [Mon, 24 Aug 2015 03:52:59 +0000 (20:52 -0700)]
Linux 4.2-rc8