cascardo/linux.git
10 years agoMerge branches 'mlx5', 'qib' and 'srp' into for-next
Roland Dreier [Thu, 11 Jul 2013 23:49:30 +0000 (16:49 -0700)]
Merge branches 'mlx5', 'qib' and 'srp' into for-next

10 years agomlx5: Return -EFAULT instead of -EPERM
Dan Carpenter [Wed, 10 Jul 2013 10:58:59 +0000 (13:58 +0300)]
mlx5: Return -EFAULT instead of -EPERM

For copy_to/from_user() failure, the correct error code is -EFAULT not
-EPERM.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
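The -EFAULT convention from the mlx5 fix above can be sketched in user space. This is only an illustration under stated assumptions: mock_copy_from_user() and demo_read_request() are hypothetical names, and the mock merely imitates the kernel helper's "number of bytes not copied" return convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for copy_from_user(). Like the kernel
 * helper, it returns the number of bytes that could NOT be copied, so 0
 * means success. A NULL source simulates a faulting user pointer.
 */
unsigned long mock_copy_from_user(void *dst, const void *src, unsigned long n)
{
	if (src == NULL)
		return n;	/* nothing was copied */
	memcpy(dst, src, n);
	return 0;
}

/* Hypothetical handler: a failed copy means a bad address, so -EFAULT. */
int demo_read_request(int *out, const void *user_buf)
{
	if (mock_copy_from_user(out, user_buf, sizeof(*out)))
		return -EFAULT;	/* not -EPERM: nothing was forbidden */
	return 0;
}

/* Usage: a valid buffer succeeds, a faulting one yields -EFAULT. */
void demo_read_request_usage(void)
{
	int value = 0;
	int source = 7;

	assert(demo_read_request(&value, &source) == 0);
	assert(value == 7);
	assert(demo_read_request(&value, NULL) == -EFAULT);
}
```

The point of the distinction is that -EPERM tells the caller an operation was not permitted, while -EFAULT tells it that the supplied address was bad.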
10 years agoIB/qib: Log all SDMA errors unconditionally
Dean Luick [Thu, 11 Jul 2013 19:32:14 +0000 (15:32 -0400)]
IB/qib: Log all SDMA errors unconditionally

This patch adds code to log SDMA errors for supportability purposes.

Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/qib: Fix module-level leak
Mike Marciniszyn [Wed, 3 Jul 2013 17:50:28 +0000 (13:50 -0400)]
IB/qib: Fix module-level leak

The vzalloc()'ed field physshadow is leaked on module unload.

This patch adds vfree after the sibling page shadow is freed.

Reported-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agomlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
Moshe Lazer [Wed, 10 Jul 2013 11:31:03 +0000 (14:31 +0300)]
mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec

Sparse reported an endianness bug in the assignment to hca_cap.uar_page_sz.

Fix the declaration of this field to be __be16 (which is what is in
the firmware spec), renaming the field to log_uar_pg_size to conform
to the spec, which fixes the endianness bug reported by sparse.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
Bart Van Assche [Wed, 10 Jul 2013 15:36:35 +0000 (17:36 +0200)]
IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline

If the transport layer is offline it is more appropriate to let
srp_abort() return FAST_IO_FAIL instead of SUCCESS.

Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
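A rough sketch of the return-value choice described in the srp_abort() change above. The enum values and struct fields are hypothetical stand-ins, not the real SCSI or ib_srp definitions, and the actual driver logic may be ordered differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SCSI error-handler return codes. */
enum demo_eh_result { DEMO_SUCCESS, DEMO_FAILED, DEMO_FAST_IO_FAIL };

/* Hypothetical, simplified view of the state an abort handler inspects. */
struct demo_target {
	bool transport_offline;	/* the transport layer is known to be down */
	bool abort_sent_ok;	/* the ABORT TASK request was sent successfully */
};

enum demo_eh_result demo_abort(const struct demo_target *t)
{
	if (t->transport_offline)
		return DEMO_FAST_IO_FAIL;	/* fail I/O fast instead of claiming success */
	if (t->abort_sent_ok)
		return DEMO_SUCCESS;		/* transport is operational */
	return DEMO_FAILED;			/* let the error handler escalate */
}

/* Usage: each state maps to exactly one result. */
void demo_abort_usage(void)
{
	struct demo_target offline = { .transport_offline = true,  .abort_sent_ok = false };
	struct demo_target healthy = { .transport_offline = false, .abort_sent_ok = true };
	struct demo_target broken  = { .transport_offline = false, .abort_sent_ok = false };

	assert(demo_abort(&offline) == DEMO_FAST_IO_FAIL);
	assert(demo_abort(&healthy) == DEMO_SUCCESS);
	assert(demo_abort(&broken) == DEMO_FAILED);
}
```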
10 years agoMerge branches 'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' into for...
Roland Dreier [Mon, 8 Jul 2013 18:22:11 +0000 (11:22 -0700)]
Merge branches 'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' into for-next

10 years agoIB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
Roland Dreier [Mon, 8 Jul 2013 18:15:45 +0000 (11:15 -0700)]
IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()

The macro get_unused_fd() is used to allocate a file descriptor with
default flags.  Those default flags (0) can be "unsafe": O_CLOEXEC must
be used by default to avoid leaking file descriptors across exec().

Replace calls to get_unused_fd() in uverbs with calls to
get_unused_fd_flags(O_CLOEXEC).  Inheriting uverbs fds across exec()
cannot be used to do anything useful.

Based on a patch/suggestion from Yann Droneaud <ydroneaud@opteya.com>.

Signed-off-by: Roland Dreier <roland@purestorage.com>
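The effect of O_CLOEXEC described in the uverbs change above can be observed from user space. This sketch uses the ordinary open() and fcntl() calls rather than the in-kernel get_unused_fd_flags(); the helper names and the use of /dev/null are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd is closed on exec(), 0 if it is inherited, -1 on error. */
int demo_fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) ? 1 : 0;
}

/*
 * Opens one descriptor with default flags and one with O_CLOEXEC, then
 * checks that only the second carries FD_CLOEXEC. Returns 0 when the
 * flags are as expected, -1 otherwise.
 */
int demo_check_cloexec(void)
{
	int plain = open("/dev/null", O_RDONLY);
	int safe = open("/dev/null", O_RDONLY | O_CLOEXEC);
	int ok = 0;

	if (plain >= 0 && safe >= 0)
		ok = demo_fd_is_cloexec(plain) == 0 &&
		     demo_fd_is_cloexec(safe) == 1;
	if (plain >= 0)
		close(plain);
	if (safe >= 0)
		close(safe);
	return ok ? 0 : -1;
}
```

A descriptor opened with default flags survives exec() and is therefore visible to whatever program is executed next, which is the leak the commit avoids.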
10 years agomlx5_core: Fixes for sparse warnings
Roland Dreier [Mon, 8 Jul 2013 17:52:28 +0000 (10:52 -0700)]
mlx5_core: Fixes for sparse warnings

 - use be32_to_cpu() instead of cpu_to_be32() where appropriate.
 - use proper accessors for pointers marked __iomem.

Signed-off-by: Roland Dreier <roland@purestorage.com>
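The direction of an endianness conversion, as in the first sparse fix above, can be illustrated in user space with ntohl() and htonl(), which play roughly the roles of be32_to_cpu() and cpu_to_be32(). The function names below are hypothetical. Both conversions perform the same byte swap on a little-endian host, which is why swapping them is invisible at run time and is typically caught only by type annotations such as those sparse checks.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a big-endian 32-bit field from a raw buffer into host order. */
uint32_t demo_read_be32(const unsigned char *buf)
{
	uint32_t raw;

	memcpy(&raw, buf, sizeof(raw));
	return ntohl(raw);	/* big-endian to CPU order */
}

/* Encode a host-order value as a big-endian 32-bit field. */
void demo_write_be32(unsigned char *buf, uint32_t value)
{
	uint32_t raw = htonl(value);	/* CPU order to big-endian */

	memcpy(buf, &raw, sizeof(raw));
}

/* Usage: reading and writing agree on the byte layout. */
void demo_be32_usage(void)
{
	unsigned char wire[4] = { 0x12, 0x34, 0x56, 0x78 };

	assert(demo_read_be32(wire) == 0x12345678u);
	demo_write_be32(wire, 0xdeadbeefu);
	assert(wire[0] == 0xde && wire[1] == 0xad);
	assert(wire[2] == 0xbe && wire[3] == 0xef);
}
```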
10 years agoIB/mlx5: Make profile[] static in main.c
Roland Dreier [Mon, 8 Jul 2013 07:13:35 +0000 (00:13 -0700)]
IB/mlx5: Make profile[] static in main.c

Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agomlx5: Fix parameter type of health_handler_t
Roland Dreier [Mon, 1 Jul 2013 21:15:17 +0000 (14:15 -0700)]
mlx5: Fix parameter type of health_handler_t

This deals with the sparse warning:

    drivers/net/ethernet/mellanox/mlx5/core/health.c:94:54: warning: incorrect type in argument 2 (different address spaces)
    drivers/net/ethernet/mellanox/mlx5/core/health.c:94:54:    expected void *buf
    drivers/net/ethernet/mellanox/mlx5/core/health.c:94:54:    got struct health_buffer [noderef] <asn:2>*health

Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agomlx5: Add driver for Mellanox Connect-IB adapters
Eli Cohen [Sun, 7 Jul 2013 14:25:49 +0000 (17:25 +0300)]
mlx5: Add driver for Mellanox Connect-IB adapters

The driver consists of two kernel modules: mlx5_ib and mlx5_core.
This partitioning resembles what we have for mlx4, except that mlx5_ib
is the pci device driver and not mlx5_core.

mlx5_core is essentially a library that provides general functionality
that is intended to be used by other Mellanox devices that will be
introduced in the future.  mlx5_ib has a similar role as any hardware
device under drivers/infiniband/hw.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
[ Merge in coccinelle fixes from Fengguang Wu <fengguang.wu@intel.com>.
  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/core: Add reserved values to enums for low-level driver use
Jack Morgenstein [Sun, 7 Jul 2013 14:25:52 +0000 (17:25 +0300)]
IB/core: Add reserved values to enums for low-level driver use

Continue the approach taken by commit d2b57063e4a ("IB/core: Reserve
bits in enum ib_qp_create_flags for low-level driver use") and add
reserved entries to the ib_qp_type and ib_wr_opcode enums.  Low-level
drivers can then define macros to use these reserved values, giving
proper names to the macros for readability.  Also add a range of
reserved flags to enum ib_send_flags.

The mlx5 IB driver uses the new additions.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
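The reserved-value approach described in the IB/core change above might look like the following sketch. All names and numeric values here are hypothetical and chosen for illustration; they are not the actual ib_qp_type definitions.

```c
#include <assert.h>

/* Hypothetical core enum with a range reserved for low-level drivers. */
enum demo_qp_type {
	DEMO_QPT_RC = 2,
	DEMO_QPT_UD = 4,
	/* Entries below are reserved; the core attaches no meaning to them. */
	DEMO_QPT_RESERVED1 = 0x1000,
	DEMO_QPT_RESERVED2,
};

/* A low-level driver gives a reserved entry a readable, driver-local name. */
#define DEMO_DRV_QPT_PRIVATE DEMO_QPT_RESERVED1

/* Returns nonzero when the type falls in the driver-reserved range. */
int demo_is_driver_private(enum demo_qp_type type)
{
	return type >= DEMO_QPT_RESERVED1;
}

/* Usage: core types are not private, the driver's macro is. */
void demo_reserved_usage(void)
{
	assert(!demo_is_driver_private(DEMO_QPT_RC));
	assert(demo_is_driver_private(DEMO_DRV_QPT_PRIVATE));
}
```

Reserving the values in the core enum keeps driver-private values from colliding with types the core may add later.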
10 years agoIB/srp: Bump driver version and release date
Vu Pham [Fri, 28 Jun 2013 12:59:08 +0000 (14:59 +0200)]
IB/srp: Bump driver version and release date

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/srp: Make HCA completion vector configurable
Bart Van Assche [Fri, 28 Jun 2013 12:57:42 +0000 (14:57 +0200)]
IB/srp: Make HCA completion vector configurable

Several InfiniBand HCAs allow configuring the completion vector per
CQ.  This allows spreading the workload created by IB completion
interrupts over multiple MSI-X vectors and hence over multiple CPU
cores.  In other words, configuring the completion vector properly not
only allows reducing latency on an initiator connected to multiple
SRP targets but also allows improving throughput.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/srp: Maintain a single connection per I_T nexus
Bart Van Assche [Fri, 28 Jun 2013 12:51:26 +0000 (14:51 +0200)]
IB/srp: Maintain a single connection per I_T nexus

An SRP target is required to maintain a single connection between
initiator and target.  This means that if the 'add_target' attribute
is used to create a second connection to a target, the first
connection will be logged out and the SCSI error handler will kick
in.  The SCSI error handler will cause the SRP initiator to reconnect,
which will cause I/O over the second connection to fail.  Avoid such
ping-pong behavior by disabling relogins.

If reconnecting manually is necessary, that is possible by deleting
and recreating an rport via sysfs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/srp: Fail I/O fast if target offline
Bart Van Assche [Fri, 28 Jun 2013 12:49:58 +0000 (14:49 +0200)]
IB/srp: Fail I/O fast if target offline

If reconnecting failed we know that no command completion will
be received anymore.  Hence let the SCSI error handler fail such
commands immediately.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/srp: Skip host settle delay
Bart Van Assche [Wed, 12 Jun 2013 13:24:25 +0000 (15:24 +0200)]
IB/srp: Skip host settle delay

The SRP initiator implements host reset by reconnecting to the SRP
target.  That means that communication with the target is possible as
soon as the host reset has finished.  Hence skip the host settle delay.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/srp: Avoid skipping srp_reset_host() after a transport error
Bart Van Assche [Wed, 12 Jun 2013 13:23:04 +0000 (15:23 +0200)]
IB/srp: Avoid skipping srp_reset_host() after a transport error

The SCSI error handler assumes that the transport layer is operational
if an eh_abort_handler() returns SUCCESS.  Hence srp_abort() should
return SUCCESS only if sending the ABORT TASK task management
function succeeded.  This patch avoids the SCSI error handler skipping
the srp_reset_host() call after a transport layer error.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/srp: Fix remove_one crash due to resource exhaustion
Dotan Barak [Wed, 12 Jun 2013 13:20:36 +0000 (15:20 +0200)]
IB/srp: Fix remove_one crash due to resource exhaustion

If the add_one callback fails during driver load, no resources are
allocated, so there is nothing to release.  Trying to clean up the
missing resources may lead to the following kernel panic:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
    RIP: 0010:[<ffffffffa0132331>]  [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
    Process rmmod (pid: 4562, threadinfo ffff8800dd738000, task ffff8801167e60c0)
    Call Trace:
     [<ffffffffa024500e>] ib_unregister_client+0x4e/0x120 [ib_core]
     [<ffffffffa01361bd>] srp_cleanup_module+0x15/0x71 [ib_srp]
     [<ffffffff810ac6a4>] sys_delete_module+0x194/0x260
     [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/qib: New transmitter tunning settings for Dell 1.1 backplane
Mitko Haralanov [Wed, 26 Jun 2013 14:46:22 +0000 (10:46 -0400)]
IB/qib: New transmitter tunning settings for Dell 1.1 backplane

The Dell blade chassis got an updated backplane which requires new
transmitter tuning settings.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/core: Fix error return code in add_port()
Wei Yongjun [Fri, 21 Jun 2013 03:24:27 +0000 (11:24 +0800)]
IB/core: Fix error return code in add_port()

Fix to return -ENOMEM in the add_port() error handling case instead of
0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Roland Dreier <roland@purestorage.com>
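The bug class fixed in add_port() above (returning a stale 0 from an error path) can be sketched as follows. The function and struct names are hypothetical, and calloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_port {
	int *stats;
};

/* Hypothetical validation step; returns 0 on success. */
static int demo_validate(size_t nstats)
{
	return nstats ? 0 : -EINVAL;
}

int demo_add_port(struct demo_port *p, size_t nstats, int simulate_alloc_failure)
{
	int ret;

	ret = demo_validate(nstats);	/* leaves ret == 0 on success */
	if (ret)
		goto err;

	p->stats = simulate_alloc_failure ? NULL
					  : calloc(nstats, sizeof(*p->stats));
	if (!p->stats) {
		ret = -ENOMEM;	/* without this, the stale 0 would be returned */
		goto err;
	}
	return 0;

err:
	p->stats = NULL;
	return ret;
}

/* Usage: every failure path reports a negative errno value. */
void demo_add_port_usage(void)
{
	struct demo_port port;

	assert(demo_add_port(&port, 4, 1) == -ENOMEM);
	assert(demo_add_port(&port, 0, 0) == -EINVAL);
	assert(demo_add_port(&port, 4, 0) == 0);
	free(port.stats);
}
```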
10 years agoRDMA/ocrdma: Fix error return code in ocrdma_set_create_qp_rq_cmd()
Wei Yongjun [Sun, 23 Jun 2013 01:07:19 +0000 (09:07 +0800)]
RDMA/ocrdma: Fix error return code in ocrdma_set_create_qp_rq_cmd()

Fix to return -ENOMEM in the alloc dma coherent error case instead of
0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/qib: Add qp_stats debug file
Mike Marciniszyn [Sat, 15 Jun 2013 21:07:14 +0000 (17:07 -0400)]
IB/qib: Add qp_stats debug file

This adds a seq_file iterator for reporting the QP hash table when the
qp_stats file is read.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/qib: Add per-context stats interface
Mike Marciniszyn [Sat, 15 Jun 2013 21:07:09 +0000 (17:07 -0400)]
IB/qib: Add per-context stats interface

This patch adds a debugfs stats interface for per-kernel-context
packet counts.

The code uses the opcode stats count and eliminates the counter in the
context.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/qib: Convert opcode counters to per-context
Mike Marciniszyn [Sat, 15 Jun 2013 21:07:03 +0000 (17:07 -0400)]
IB/qib: Convert opcode counters to per-context

This fix changes the opcode-relative counters for receive to be per
context.

Profiling has shown that when multiple contexts are being used there
is a lot of cache activity associated with these counters.

The code formerly kept these counters per port, but only provided the
interface to read per HCA.  This patch converts the read of counters
to per HCA and adds the debugfs hooks to be able to read the file as a
sequence of opcodes.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/qib: Optimize CQ callbacks
Mike Marciniszyn [Tue, 4 Jun 2013 19:05:37 +0000 (15:05 -0400)]
IB/qib: Optimize CQ callbacks

The current workqueue implementation has the following performance
deficiencies on QDR HCAs:

- The CQ callbacks tend to run on the CPUs processing the
  receive queues
- The single thread queue isn't optimal for multiple HCAs

This patch adds a dedicated per-HCA bound thread to process CQ callbacks.

Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/qib: Add dual-rail NUMA awareness for PSM processes
Ramkrishna Vepa [Sun, 2 Jun 2013 19:16:11 +0000 (15:16 -0400)]
IB/qib: Add dual-rail NUMA awareness for PSM processes

The driver currently selects an HCA based on the algorithm that PSM
chooses: contexts within an HCA or across HCAs.  The HCA can also be
chosen by the user.  Either way, this patch assigns a CPU on the NUMA
node local to the selected HCA.  This patch also tries to select the
HCA closest to the NUMA node of the CPU assigned via taskset to the
PSM process.  If this HCA is unusable, then another unit is selected
based on the algorithm that is currently enforced or selected by PSM:
round-robin context selection 'within' or 'across' HCAs.

Also fix a bug wherein contexts are set up on the NUMA node on which
the processes are opened (setup_ctxt()) and not on the NUMA node on
which the driver recommends the CPU.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/qib: Add optional NUMA affinity
Ramkrishna Vepa [Tue, 28 May 2013 16:57:33 +0000 (12:57 -0400)]
IB/qib: Add optional NUMA affinity

This patch adds context-relative NUMA affinity conditioned on the
module parameter numa_aware.  The qib_ctxtdata has an additional
node_id member and qib_create_ctxtdata() has an additional node_id
parameter.

The allocations within the hdr queue and eager queue setup routines
now take this additional member and adjust allocations as necessary.
PSM will pass either the current NUMA node or the node closest to the
HCA depending on numa_aware.  Verbs will always use the node closest to
the HCA.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/qib: Update minor version number
Vinit Agnihotri [Sat, 15 Jun 2013 21:11:38 +0000 (17:11 -0400)]
IB/qib: Update minor version number

External PSM repositories have advanced the minor number for a variety
of reasons.  The driver needs to increase its own minor number to avoid
warnings.

Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/qib: Remove atomic_inc_not_zero() from QP RCU
Mike Marciniszyn [Sat, 15 Jun 2013 21:06:58 +0000 (17:06 -0400)]
IB/qib: Remove atomic_inc_not_zero() from QP RCU

Follow Documentation/RCU/rcuref.txt guidance in removing
atomic_inc_not_zero() from QP RCU implementation.

This patch also removes an unneeded synchronize_rcu() in the add path.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
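For readers unfamiliar with the primitive named in the QP RCU change above, the following C11 sketch contrasts an "increment unless zero" helper with a plain increment. The names are hypothetical and this is not the qib code. As a rough reading of the rcuref guidance, the plain increment is sufficient on the reader side when the removal path waits for an RCU grace period before dropping its reference, because a reader can then never observe a count that has already reached zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * C11 stand-in for an "increment unless zero" primitive: take a reference
 * only if the object still has at least one, using a compare-and-swap loop.
 */
bool demo_inc_not_zero(atomic_int *refcount)
{
	int old = atomic_load(refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Plain increment: cheaper, valid only if the count cannot be zero here. */
void demo_inc(atomic_int *refcount)
{
	atomic_fetch_add(refcount, 1);
}

/* Usage: a live object gains references, a dead one does not. */
void demo_refcount_usage(void)
{
	atomic_int live;
	atomic_int dead;

	atomic_init(&live, 1);
	atomic_init(&dead, 0);

	assert(demo_inc_not_zero(&live));
	assert(atomic_load(&live) == 2);
	assert(!demo_inc_not_zero(&dead));
	assert(atomic_load(&dead) == 0);

	demo_inc(&live);
	assert(atomic_load(&live) == 3);
}
```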
10 years agoIB/qib: Add DCA support
Mike Marciniszyn [Thu, 30 May 2013 22:25:25 +0000 (18:25 -0400)]
IB/qib: Add DCA support

This patch adds DCA cache warming for systems that support DCA.

The code uses cpu affinity notification to react to an affinity change
from a user mode program like irqbalance and (re-)program the chip
accordingly. This notification avoids reading the current cpu on every
interrupt.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
[ Add Kconfig dependency on SMP && GENERIC_HARDIRQS to avoid failure to
  build due to undefined struct irq_affinity_notify.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Export AF_IB statistics
Sean Hefty [Wed, 29 May 2013 17:09:34 +0000 (10:09 -0700)]
RDMA/cma: Export AF_IB statistics

Report AF_IB source and destination addresses through netlink
interface.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ucma: Allow user space to specify AF_IB when joining multicast
Sean Hefty [Wed, 29 May 2013 17:09:33 +0000 (10:09 -0700)]
RDMA/ucma: Allow user space to specify AF_IB when joining multicast

Allow user space applications to join multicast groups using MGIDs
directly.  MGIDs may be passed using AF_IB addresses.  Since the
current multicast join command only supports addresses as large as
sockaddr_in6, define a new structure for joining addresses specified
using sockaddr_ib.

Since AF_IB allows the user to specify the qkey when resolving a
remote UD QP address, when joining the multicast group use the qkey
value, if one has been assigned.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ucma: Allow user space to pass AF_IB into resolve
Sean Hefty [Wed, 29 May 2013 17:09:32 +0000 (10:09 -0700)]
RDMA/ucma: Allow user space to pass AF_IB into resolve

Allow user space applications to call resolve_addr using AF_IB.  To
support sockaddr_ib, we need to define a new structure capable of
handling the larger address size.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ucma: Allow user space to bind to AF_IB
Sean Hefty [Wed, 29 May 2013 17:09:31 +0000 (10:09 -0700)]
RDMA/ucma: Allow user space to bind to AF_IB

Support user space binding to addresses using AF_IB.  Since
sockaddr_ib is larger than sockaddr_in6, we need to define a larger
structure when binding using AF_IB.  This time we use sockaddr_storage
to cover future cases.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
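The size argument in the bind change above can be checked with a small user-space sketch. struct demo_sockaddr_ib is a hypothetical structure whose field layout is an assumption modeled loosely on an IB address (a 128-bit address plus a 64-bit service ID and mask); consult the kernel headers for the real sockaddr_ib.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Illustrative only: assumed layout, not the kernel's definition. */
struct demo_sockaddr_ib {
	uint16_t family;
	uint16_t pkey;
	uint32_t flowinfo;
	uint8_t  addr[16];	/* 128-bit GID */
	uint64_t sid;		/* service ID */
	uint64_t sid_mask;
	uint64_t scope_id;
};

/* Nonzero if the IB-style address would fit in a sockaddr_in6 buffer. */
int demo_fits_in_sockaddr_in6(void)
{
	return sizeof(struct demo_sockaddr_ib) <= sizeof(struct sockaddr_in6);
}

/* Nonzero if the IB-style address would fit in sockaddr_storage. */
int demo_fits_in_sockaddr_storage(void)
{
	return sizeof(struct demo_sockaddr_ib) <= sizeof(struct sockaddr_storage);
}

/* Usage: too large for sockaddr_in6, but sockaddr_storage has room. */
void demo_sockaddr_usage(void)
{
	assert(!demo_fits_in_sockaddr_in6());
	assert(demo_fits_in_sockaddr_storage());
}
```

Using sockaddr_storage in the new command structure leaves headroom for address families that are larger still.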
10 years agoRDMA/ucma: Name changes to indicate only IP addresses supported
Sean Hefty [Wed, 29 May 2013 17:09:30 +0000 (10:09 -0700)]
RDMA/ucma: Name changes to indicate only IP addresses supported

Several commands into the RDMA CM from user space are restricted to
supporting addresses which fit into a sockaddr_in6 structure: bind
address, resolve address, and join multicast.

With the addition of AF_IB, we need to support addresses which are
larger than sockaddr_in6.  This will be done by adding new commands
that exchange address information using sockaddr_storage.  However, to
support existing applications, we maintain the current commands and
structures, but rename them to indicate that they only support IPv4
and v6 addresses.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ucma: Add ability to query GID addresses
Sean Hefty [Wed, 29 May 2013 17:09:29 +0000 (10:09 -0700)]
RDMA/ucma: Add ability to query GID addresses

Part of address resolution is mapping IP addresses to IB GIDs.  With
the changes to support querying larger addresses and more path records,
also provide a way to query IB GIDs after resolution completes.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Export cma_get_service_id()
Sean Hefty [Wed, 29 May 2013 17:09:28 +0000 (10:09 -0700)]
RDMA/cma: Export cma_get_service_id()

Allow the rdma_ucm to query the IB service ID formed or allocated by
the rdma_cm by exporting the cma_get_service_id() functionality.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ucma: Support querying when IB paths are not reversible
Sean Hefty [Wed, 29 May 2013 17:09:27 +0000 (10:09 -0700)]
RDMA/ucma: Support querying when IB paths are not reversible

The current query_route call can return up to two path records.  The
assumption is that one is the primary path, with optional support
for an alternate path.  In both cases, the paths are assumed to be
reversible and are used to send CM MADs.

With the ability to manually set IB path data, the rdma cm can
eventually be capable of using up to 6 paths per connection:

forward primary, reverse primary,
forward alternate, reverse alternate,
reversible primary path for CM MADs,
reversible alternate path for CM MADs.

(It is unclear at this time if IB routing will complicate this.)  In
order to handle more flexible routing topologies, add a new command to
report any number of paths.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/sa: Export function to pack a path record into wire format
Sean Hefty [Wed, 29 May 2013 17:09:26 +0000 (10:09 -0700)]
IB/sa: Export function to pack a path record into wire format

Allow converting from struct ib_sa_path_rec to the IB defined SA path
record wire format.  This will be used to report path data from the
rdma cm into user space.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ucma: Support querying for AF_IB addresses
Sean Hefty [Wed, 29 May 2013 17:09:25 +0000 (10:09 -0700)]
RDMA/ucma: Support querying for AF_IB addresses

The sockaddr structure for AF_IB is larger than sockaddr_in6.  The
rdma cm user space ABI uses the latter to exchange address information
between user space and the kernel.

To support querying for larger addresses, define a new query command
that exchanges data using sockaddr_storage, rather than sockaddr_in6.
Unlike the existing query_route command, the new command only returns
address information.  Route (i.e. path record) data is separated.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Only listen on IB devices when using AF_IB
Sean Hefty [Wed, 29 May 2013 17:09:24 +0000 (10:09 -0700)]
RDMA/cma: Only listen on IB devices when using AF_IB

If an rdma_cm_id is bound to AF_IB, with a wild card address, only
listen on IB devices.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Set qkey for AF_IB
Sean Hefty [Wed, 29 May 2013 17:09:23 +0000 (10:09 -0700)]
RDMA/cma: Set qkey for AF_IB

Allow the user to specify the qkey when using AF_IB.  The qkey is
added to struct rdma_ucm_conn_param in place of a reserved field, but
for backwards compatibility, is only accessed if the associated
rdma_cm_id is using AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Expose private data when using AF_IB
Sean Hefty [Wed, 29 May 2013 17:09:22 +0000 (10:09 -0700)]
RDMA/cma: Expose private data when using AF_IB

If the source or destination address is AF_IB, then do not reserve a
portion of the private data in the IB CM REQ or SIDR REQ messages for
the cma header.  Instead, all private data should be exported to the
user.  When AF_IB is used, the rdma cm does not have sufficient
information to fill in the cma header.  Additionally, this will be
necessary to support any IB connection through the rdma cm interface.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Merge cma_get/save_net_info
Sean Hefty [Wed, 29 May 2013 17:09:21 +0000 (10:09 -0700)]
RDMA/cma: Merge cma_get/save_net_info

With the removal of SDP related code, we can merge cma_get_net_info()
with cma_save_net_info(), since we're only ever dealing with a single
header format.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Remove unused SDP related code
Sean Hefty [Wed, 29 May 2013 17:09:20 +0000 (10:09 -0700)]
RDMA/cma: Remove unused SDP related code

The SDP protocol was never merged upstream.  Remove unused SDP related
code from the RDMA CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Add support for AF_IB to cma_get_service_id()
Sean Hefty [Wed, 29 May 2013 17:09:19 +0000 (10:09 -0700)]
RDMA/cma: Add support for AF_IB to cma_get_service_id()

cma_get_service_id() forms the service ID based on the port space and
port number of the rdma_cm_id.  Extend the call to support AF_IB,
which contains the service ID directly.  This will be needed to
support any arbitrary SID.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Add support for AF_IB to rdma_resolve_route()
Sean Hefty [Wed, 29 May 2013 17:09:18 +0000 (10:09 -0700)]
RDMA/cma: Add support for AF_IB to rdma_resolve_route()

Allow rdma_resolve_route() to handle the case where the user specified
the source and destination addresses using AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Add support for AF_IB to rdma_resolve_addr()
Sean Hefty [Wed, 29 May 2013 17:09:17 +0000 (10:09 -0700)]
RDMA/cma: Add support for AF_IB to rdma_resolve_addr()

Allow the user to specify the remote address using AF_IB format.  When
AF_IB is used, the remote address simply needs to be recorded, and no
resolution using ARP is done.  The local address may still need to be
matched with a local IB device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Verify that source and dest sa_family are the same
Sean Hefty [Wed, 29 May 2013 17:09:16 +0000 (10:09 -0700)]
RDMA/cma: Verify that source and dest sa_family are the same

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Restrict AF_IB loopback to binding to IB devices only
Sean Hefty [Wed, 29 May 2013 17:09:15 +0000 (10:09 -0700)]
RDMA/cma: Restrict AF_IB loopback to binding to IB devices only

If a user specifies AF_IB as the source address for a loopback
connection, limit the resolution to IB devices only.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Add helper functions to return id address information
Sean Hefty [Wed, 29 May 2013 17:09:14 +0000 (10:09 -0700)]
RDMA/cma: Add helper functions to return id address information

Provide inline helpers to extract source and destination address data
from the rdma_cm_id.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Do not modify sa_family when setting loopback address
Sean Hefty [Wed, 29 May 2013 17:09:13 +0000 (10:09 -0700)]
RDMA/cma: Do not modify sa_family when setting loopback address

cma_resolve_loopback is called after an rdma_cm_id has been
bound to a specific sa_family and port.  Once the
source sa_family for the id has been set, do not modify it.
Only the actual IP address portion of the source address
needs to be set.

As part of this fix, we can simplify setting the source address
by moving the loopback address assignment from cma_resolve_loopback
to cma_bind_loopback.  cma_bind_loopback is only invoked when
the source address is the loopback address.

Finally, add loopback support for AF_IB as part of the change.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Allow user to specify AF_IB when binding
Sean Hefty [Wed, 29 May 2013 17:09:12 +0000 (10:09 -0700)]
RDMA/cma: Allow user to specify AF_IB when binding

Modify rdma_bind_addr to allow the user to specify AF_IB when binding
to a device.  AF_IB indicates that the user is not mapping an IP
address to the native IB addressing.  (The mapping may have already
been done, or is not needed.)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Update port reservation to support AF_IB
Sean Hefty [Wed, 29 May 2013 17:09:11 +0000 (10:09 -0700)]
RDMA/cma: Update port reservation to support AF_IB

AF_IB uses a 64-bit service ID (SID), which the user can control
through the use of a mask.  The rdma_cm will assign values to the
unmasked portions of the SID based on the selected port space and port
number.

Because the IB spec divides the SID range into several regions, a
SID/mask combination may fall into one of the existing port space
ranges as defined by the RDMA CM IP Annex.  Map the AF_IB SID to the
correct RDMA port space.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
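The SID/mask combination described in the port reservation change above can be sketched as a bit merge: bits covered by the user's mask come from the user, and the remaining bits are assigned by the connection manager. demo_build_sid() and the sample values are hypothetical and do not reflect the actual service ID ranges defined by the IB specification.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine a user-controlled SID with bits chosen by the connection manager.
 * Bits set in user_mask are taken from user_sid; all other bits are taken
 * from cm_sid (for example, values derived from port space and port number).
 */
uint64_t demo_build_sid(uint64_t user_sid, uint64_t user_mask, uint64_t cm_sid)
{
	return (user_sid & user_mask) | (cm_sid & ~user_mask);
}

/* Usage: the user owns the top 16 bits, the CM fills in the rest. */
void demo_build_sid_usage(void)
{
	uint64_t sid = demo_build_sid(0xAABB000000000000ULL,
				      0xFFFF000000000000ULL,
				      0x0000000000001234ULL);

	assert(sid == 0xAABB000000001234ULL);
	/* An empty mask leaves the whole SID to the CM. */
	assert(demo_build_sid(0x1ULL, 0x0ULL, 0x1234ULL) == 0x1234ULL);
}
```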
10 years agoIB/addr: Add AF_IB support to ip_addr_size
Sean Hefty [Wed, 29 May 2013 17:09:10 +0000 (10:09 -0700)]
IB/addr: Add AF_IB support to ip_addr_size

Add support for AF_IB to ip_addr_size, and rename the function to
account for the change.  Give the compiler more control over whether
the call should be inline or not by moving the definition into the .c
file, removing the static inline, and exporting it.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Include AF_IB in loopback and any address checks
Sean Hefty [Wed, 29 May 2013 17:09:09 +0000 (10:09 -0700)]
RDMA/cma: Include AF_IB in loopback and any address checks

Enhance checks for loopback and any address to support AF_IB in
addition to AF_INET and AF_INET6.  This will allow future patches to
use AF_IB when binding and resolving addresses.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Allow enabling reuseaddr in any state
Sean Hefty [Wed, 29 May 2013 17:09:08 +0000 (10:09 -0700)]
RDMA/cma: Allow enabling reuseaddr in any state

The rdma_cm only allows setting reuseaddr if the corresponding
rdma_cm_id is in the idle state.  Allow setting this value in other
states.  This brings the behavior more in line with sockets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cma: Define native IB address
Sean Hefty [Wed, 29 May 2013 17:09:07 +0000 (10:09 -0700)]
RDMA/cma: Define native IB address

Define AF_IB and sockaddr_ib to allow the rdma_cm to use native IB
addressing.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ocrdma: Reorg structures to avoid padding
Naresh Gottumukkala [Mon, 10 Jun 2013 04:42:42 +0000 (04:42 +0000)]
RDMA/ocrdma: Reorg structures to avoid padding

Reorganize structures for better packing, to avoid cacheline padding.

Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
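
As a generic illustration of the reorganization described above (not the ocrdma structures themselves), the following sketch shows how field order affects padding. Both struct names and all fields are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout with small fields separated by a wide one:
 * on typical ABIs the compiler pads after `a` and again after `c`. */
struct padded {
	uint8_t  a;
	uint64_t b;
	uint8_t  c;
};

/* Same fields, widest first: the two small fields now sit next to
 * each other and share a single padded tail. */
struct reordered {
	uint64_t b;
	uint8_t  a;
	uint8_t  c;
};
```

On a typical x86-64 ABI this should give 24 bytes for the first layout and 16 for the second; the exact sizes depend on the target's alignment rules.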
10 years agoRDMA/ocrdma: Change macros to inline funtions
Naresh Gottumukkala [Mon, 10 Jun 2013 04:42:41 +0000 (04:42 +0000)]
RDMA/ocrdma: Change macros to inline funtions

Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ocrdma: Set bad_wr in error case
Naresh Gottumukkala [Mon, 10 Jun 2013 04:42:40 +0000 (04:42 +0000)]
RDMA/ocrdma: Set bad_wr in error case

Fix post_send to set the bad_wr in error case.

Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ocrdma: Replace ocrdma_err with pr_err
Naresh Gottumukkala [Mon, 10 Jun 2013 04:42:39 +0000 (04:42 +0000)]
RDMA/ocrdma: Replace ocrdma_err with pr_err

Remove private macro ocrdma_err and replace with standard pr_err.

Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ocrdma: Use MCC_CREATE_EXT_V1 for MCC create
Naresh Gottumukkala [Mon, 10 Jun 2013 04:42:38 +0000 (04:42 +0000)]
RDMA/ocrdma: Use MCC_CREATE_EXT_V1 for MCC create

Use MCC_CREATE_EXT_V1 to create MCC_queue to receive RoCE events.

Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ocrdma: Remove use_cnt for queues
Gottumukkala, Naresh [Wed, 5 Jun 2013 08:50:46 +0000 (08:50 +0000)]
RDMA/ocrdma: Remove use_cnt for queues

Remove use_cnt.  Rely on IB midlayer to keep track of the use count.

Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/ehca: Fix error return code in ehca_create_slab_caches()
Wei Yongjun [Wed, 19 Jun 2013 02:40:09 +0000 (10:40 +0800)]
IB/ehca: Fix error return code in ehca_create_slab_caches()

Fix to return -ENOMEM in the kmem_cache_create() error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Roland Dreier <roland@purestorage.com>
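
The class of bug fixed in the ehca commit above can be sketched in plain C. This is not the ehca code: `make_cache` is a hypothetical stand-in for an allocation such as kmem_cache_create(), and malloc()/free() replace the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cache-creation step that can fail. */
static void *make_cache(int should_fail)
{
	return should_fail ? NULL : malloc(16);
}

/* Returns 0 on success or -ENOMEM if either allocation fails. */
static int create_caches(int fail_first, int fail_second)
{
	void *first, *second;
	int ret = 0;	/* as if an earlier step had already succeeded */

	first = make_cache(fail_first);
	if (!first)
		return -ENOMEM;

	second = make_cache(fail_second);
	if (!second) {
		/* ret still holds 0 here; set it before unwinding,
		 * otherwise the caller would see success. */
		ret = -ENOMEM;
		goto free_first;
	}

	/* Demo only: release everything instead of keeping the caches. */
	free(second);
	free(first);
	return 0;

free_first:
	free(first);
	return ret;
}
```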
10 years agoRDMA/cxgb3: Timeout condition is never true
Dan Carpenter [Tue, 18 Jun 2013 07:27:38 +0000 (10:27 +0300)]
RDMA/cxgb3: Timeout condition is never true

This is a static checker fix.  "count" is unsigned, so it is never -1.
Since "count" is 16 bits and the addition operation is implicitly
cast to int, there is no wrapping here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
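
The integer-promotion behaviour behind the cxgb3 fix above can be shown with a small standalone sketch. The function names are hypothetical and the code is not taken from the driver; it assumes int is wider than 16 bits, as on all mainstream targets.

```c
#include <stdint.h>

/* A 16-bit unsigned counter is promoted to int before the comparison,
 * so its value is always in 0..65535 and never equals -1. */
static int equals_minus_one(uint16_t count)
{
	return count == -1;
}

/* count + 1 is computed as int, so the sum does not wrap at 16 bits. */
static int next_value(uint16_t count)
{
	return count + 1;
}
```

Some compilers warn that the comparison in `equals_minus_one` is always false, which is the kind of diagnostic a static checker reports.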
10 years agondisc: Convert use of typedef ctl_table to struct ctl_table
Joe Perches [Fri, 14 Jun 2013 02:37:54 +0000 (19:37 -0700)]
ndisc: Convert use of typedef ctl_table to struct ctl_table

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: Convert use of typedef ctl_table to struct ctl_table
Joe Perches [Fri, 14 Jun 2013 02:37:53 +0000 (19:37 -0700)]
ipv6: Convert use of typedef ctl_table to struct ctl_table

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoinet: frag , remove an empty ifdef.
Rami Rosen [Sat, 15 Jun 2013 20:04:56 +0000 (23:04 +0300)]
inet: frag , remove an empty ifdef.

This patch removes an empty ifdef from inet_frag_intern()
in net/ipv4/inet_fragment.c.

commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a
(hlist: drop the node parameter from iterators) removed hlist from
net/ipv4/inet_fragment.c, but did not remove the enclosing ifdef command,
which is now empty.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agohtb: refactor struct htb_sched fields for performance
Eric Dumazet [Sat, 15 Jun 2013 10:30:10 +0000 (03:30 -0700)]
htb: refactor struct htb_sched fields for performance

htb_sched structures are big, and source of false sharing on SMP.

Every time a packet is queued or dequeued, many cache lines must be
touched because the structures are not laid out properly.

By carefully splitting htb_sched into two parts, and defining substructures
to increase data locality, we can improve performance dramatically on
SMP.

New htb_prio structure can also be used in htb_class to increase data
locality.

I got a 26% performance increase on a 24-thread machine, with 200
concurrent netperf in TCP_RR mode, using an HTB hierarchy of 4 classes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
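
The general technique described in the htb commit above, grouping per-packet fields and moving rarely touched fields onto their own cache line, can be sketched as below. This is not the layout from the patch: the struct, its fields, and the 64-byte cache line size are assumptions for illustration.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed cache line size */

/* Hypothetical scheduler state. */
struct sched_state {
	/* hot: read or written on every enqueue/dequeue */
	unsigned long now;
	unsigned int  row_mask;
	unsigned int  qlen;

	/* cold: only touched on configuration changes; alignas starts
	 * this group on a fresh cache line, away from the hot fields */
	alignas(CACHE_LINE) unsigned int rate2quantum;
	unsigned int defcls;
	int          debug;
};
```

Keeping the hot fields within one line means a queue operation touches fewer lines, and writes to them do not invalidate the line holding the configuration data on other CPUs.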
10 years agotcp: introduce a per-route knob for quick ack
Cong Wang [Sat, 15 Jun 2013 01:39:18 +0000 (09:39 +0800)]
tcp: introduce a per-route knob for quick ack

In previous discussions, I tried to find some reasonable heuristics
for delayed ACK, however this seems not possible, according to Eric:

"ACKS might also be delayed because of bidirectional
traffic, and is more controlled by the application
response time. TCP stack can not easily estimate it."

"ACK can be incredibly useful to recover from losses in
a short time.

The vast majority of TCP sessions are small lived, and we
send one ACK per received segment anyway at beginning or
retransmits to let the sender smoothly increase its cwnd,
so an auto-tuning facility wont help them that much."

and according to David:

"ACKs are the only information we have to detect loss.

And, for the same reasons that TCP VEGAS is fundamentally
broken, we cannot measure the pipe or some other
receiver-side-visible piece of information to determine
when it's "safe" to stretch ACK.

And even if it's "safe", we should not do it so that losses are
accurately detected and we don't spuriously retransmit.

The only way to know when the bandwidth increases is to
"test" it, by sending more and more packets until drops happen.
That's why all successful congestion control algorithms must
operate on explicited tested pieces of information.

Similarly, it's not really possible to universally know if
it's safe to stretch ACK or not."

It still makes sense to enable or disable quick ack mode, as the
TCP_QUICKACK socket option does.

This knob is similar to the TCP_QUICKACK option, but is meant for people
who can't modify the source code and still want to control
TCP delayed ACK behavior. As David suggested, this should belong
to per-path scope, since different paths may want different
behaviors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
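
For comparison with the per-route knob introduced above, this is roughly how an application that can be modified requests quick ACK mode on a single socket on Linux. The wrapper name `set_quickack` is hypothetical; setsockopt() and TCP_QUICKACK are the standard Linux interfaces.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable (1) or disable (0) quick ACK mode on one TCP socket.
 * Returns 0 on success, -1 on error with errno set by setsockopt(). */
static int set_quickack(int fd, int enable)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK,
			  &enable, sizeof(enable));
}
```

This requires a source change in every application, which is the limitation a per-route setting is meant to avoid.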
10 years agosctp: Convert __list_for_each use to list_for_each
Dave Jones [Tue, 18 Jun 2013 02:26:52 +0000 (22:26 -0400)]
sctp: Convert __list_for_each use to list_for_each

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobnx2: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)
Yijing Wang [Tue, 18 Jun 2013 08:12:37 +0000 (16:12 +0800)]
bnx2: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)

The PCI core already saves the PM capability register offset in pdev->pm_cap in
pci_pm_init() during the init path. So we can use pdev->pm_cap instead of calling
pci_find_capability(pdev, PCI_CAP_ID_PM), for better performance and simpler code.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoamd8111e: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)
Yijing Wang [Tue, 18 Jun 2013 08:06:37 +0000 (16:06 +0800)]
amd8111e: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)

The PCI core already saves the PM capability register offset in pdev->pm_cap in
pci_pm_init() during the init path. So we can use pdev->pm_cap instead of calling
pci_find_capability(pdev, PCI_CAP_ID_PM), for better performance and simpler code.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoBnx2x: remove redundant D0 power state set
Yijing Wang [Tue, 18 Jun 2013 08:05:39 +0000 (16:05 +0800)]
Bnx2x: remove redundant D0 power state set

pci_enable_device() sets the device power state to D0,
so there is no need to do it again in bnx2x_init_dev().
Also remove the redundant PM capability lookup, because the PCI core
has already saved the device's PM capability offset.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: Add missing dependencies on NETDEVICES
Ben Hutchings [Tue, 18 Jun 2013 02:37:05 +0000 (03:37 +0100)]
net: Add missing dependencies on NETDEVICES

ETRAX_ETHERNET selects ETHERNET and MII, which depend on NETDEVICES.
I don't think anything should select NETDEVICES, so make it a
dependency.  It also doesn't need to select or depend on ETHERNET,
which has nothing to do with the Ethernet library functions.

BPCTL selects MII, which depends on NETDEVICES.  But everything in the
drivers/staging/silicom directory is related to net devices, so make
NET_VENDOR_SILICOM depend on NETDEVICES and remove the now-redundant
dependencies on NET.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoat91_ether: Do not select NET_CORE
Ben Hutchings [Tue, 18 Jun 2013 02:27:29 +0000 (03:27 +0100)]
at91_ether: Do not select NET_CORE

This has no dependency on any of the drivers under NET_CORE.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: Move MII out from under NET_CORE and hide it
Ben Hutchings [Tue, 18 Jun 2013 02:24:51 +0000 (03:24 +0100)]
net: Move MII out from under NET_CORE and hide it

All drivers that select MII also need to select NET_CORE because MII
depends on it.  This is a bit ridiculous because NET_CORE is just a
menu option that doesn't enable any code by itself.

There is also no need for it to be a visible option, since its users
all select it.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotcp:typo unset should be unsent
Weiping Pan [Tue, 18 Jun 2013 13:00:31 +0000 (21:00 +0800)]
tcp:typo unset should be unsent

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: trivial: make alb use bond_slave_has_mac()
Veaceslav Falico [Tue, 18 Jun 2013 11:44:52 +0000 (13:44 +0200)]
bonding: trivial: make alb use bond_slave_has_mac()

Also, clean up two identical ifs in bond_alb_handle_active_change().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobe2net: use pci_vfs_assigned()/pci_num_vf() instead of be_find_vfs()
Sathya Perla [Fri, 14 Jun 2013 10:24:51 +0000 (15:54 +0530)]
be2net: use pci_vfs_assigned()/pci_num_vf() instead of be_find_vfs()

be_find_vfs() is no longer needed as the common PCI calls provide the same
functionality.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosit: fix an oops when IFLA_IPTUN_PROTO is not set
Nicolas Dichtel [Wed, 19 Jun 2013 10:03:13 +0000 (12:03 +0200)]
sit: fix an oops when IFLA_IPTUN_PROTO is not set

The use of this attribute has been added in 32b8a8e59c9c (sit: add IPv4 over
IPv4 support). It is optional, by default proto is IPPROTO_IPV6.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
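
The pattern behind the sit fix above, falling back to a default when an optional attribute is absent instead of dereferencing it unconditionally, can be sketched as follows. The function `tunnel_proto` and the use of a plain pointer in place of a netlink attribute are hypothetical simplifications.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: a NULL pointer means the optional attribute
 * was not supplied by the caller. */
static uint8_t tunnel_proto(const uint8_t *proto_attr)
{
	/* Dereference only when present; otherwise use the default. */
	return proto_attr ? *proto_attr : IPPROTO_IPV6;
}
```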
10 years agonet: sock: adapt SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF
Daniel Borkmann [Wed, 19 Jun 2013 10:51:20 +0000 (12:51 +0200)]
net: sock: adapt SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF

The current situation is that SOCK_MIN_RCVBUF is (2048 + sizeof(struct sk_buff))
while SOCK_MIN_SNDBUF is 2048. Since in both cases, skb->truesize is used for
sk_{r,w}mem_alloc accounting, we should have both sizes adjusted via defining a
TCP_SKB_MIN_TRUESIZE.

Further, as Eric Dumazet points out, the minimal skb truesize in transmit path is
SKB_TRUESIZE(2048) after commit f07d960df33c5 ("tcp: avoid frag allocation for
small frames"), and tcp_sendmsg() tries to limit skb size to half the congestion
window, meaning we try to build two skbs at minimum. Thus, having SOCK_MIN_SNDBUF
as 2048 can cause a small regression for some applications that set SO_SNDBUF /
SO_RCVBUF too low. Note that we define a TCP_SKB_MIN_TRUESIZE, because
SKB_TRUESIZE(2048) adds SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), but in
case of TCP skbs, the skb_shared_info is part of the 2048 bytes allocation for
skb->head.

The minor adaptation in sk_stream_moderate_sndbuf() silences a warning that
would otherwise appear, by using a typed max macro, as is similarly done where
SOCK_MIN_RCVBUF occurs.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
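
The sizing argument in the commit above can be made concrete with a userspace sketch. None of this is the kernel definition: `struct fake_skb` is a stand-in with an arbitrary size, the 64-byte alignment unit is assumed, and the factor of two on the send side is an assumption drawn from the "two skbs at minimum" remark.

```c
#include <stddef.h>

#define CACHE_BYTES 64	/* assumed alignment unit */
/* Round x up to the next multiple of CACHE_BYTES. */
#define DATA_ALIGN(x) \
	(((x) + (CACHE_BYTES - 1)) & ~(size_t)(CACHE_BYTES - 1))

/* Stand-in only; not the real struct sk_buff. */
struct fake_skb { char opaque[200]; };

/* One minimal TCP skb: 2048 bytes of head plus the aligned skb itself. */
#define MIN_TRUESIZE (2048 + DATA_ALIGN(sizeof(struct fake_skb)))
/* Assumption: the send side needs room for two skbs, the receive side one. */
#define MIN_SNDBUF (2 * MIN_TRUESIZE)
#define MIN_RCVBUF MIN_TRUESIZE
```

Deriving both minimums from one truesize constant keeps them consistent with how skb->truesize is charged against the socket.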
10 years agoneigh: disallow un-init_net to change thresh of neigh
Gao feng [Thu, 20 Jun 2013 02:01:34 +0000 (10:01 +0800)]
neigh: disallow un-init_net to change thresh of neigh

thresh and interval are global resources;
only init_net can change them.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoneigh: only allow init_net to change the default neigh_parms
Gao feng [Thu, 20 Jun 2013 02:01:33 +0000 (10:01 +0800)]
neigh: only allow init_net to change the default neigh_parms

Although we don't export the /proc/sys/net/ipv[4,6]/neigh/default/
directory to the un-init_net, a command such as
"ip ntable change name arp_cache locktime 129" can still change the locktime
of the default neigh_parms.

This patch prevents the un-init_net from finding neigh_table.parms,
so the un-init_net can no longer influence the init_net.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoneigh: no need to call lookup_neigh_parms in neigh_parms_alloc
Gao feng [Thu, 20 Jun 2013 02:01:32 +0000 (10:01 +0800)]
neigh: no need to call lookup_neigh_parms in neigh_parms_alloc

neigh_table.parms always exists and is initialized, so kmemdup
can use it to create the new neigh_parms. In fact, lookup_neigh_parms
would return neigh_table.parms here anyway.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobnx2x: replace mechanism to check for next available packet
Dmitry Kravkov [Tue, 18 Jun 2013 22:36:05 +0000 (01:36 +0300)]
bnx2x: replace mechanism to check for next available packet

Check next packet availability by validating that HW has finished CQE
placement. This saves latency of another dma transaction performed to update
SB indexes.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobnx2x: add support for ndo_ll_poll
Dmitry Kravkov [Tue, 18 Jun 2013 22:36:04 +0000 (01:36 +0300)]
bnx2x: add support for ndo_ll_poll

Adds ndo_ll_poll method and locking for FPs between LL and the napi.

When receiving a packet we use skb_mark_ll to record the napi it came from.
Add each napi to the napi_hash right after netif_napi_add().

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: Low Latency recv statistics
Amir Vadai [Tue, 18 Jun 2013 13:18:28 +0000 (16:18 +0300)]
net/mlx4_en: Low Latency recv statistics

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: Add Low Latency Socket (LLS) support
Amir Vadai [Tue, 18 Jun 2013 13:18:27 +0000 (16:18 +0300)]
net/mlx4_en: Add Low Latency Socket (LLS) support

Add basic support for LLS.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoopenvswitch: gre tunneling support.
David S. Miller [Thu, 20 Jun 2013 01:07:49 +0000 (18:07 -0700)]
openvswitch: gre tunneling support.

Pravin B Shelar says:

====================
Following patch series adds support for gre tunneling.
First six patches extend kernel gre and ip_tunnel modules
api so that there is more code sharing between gre modules
and ovs. The rest of the patches add ovs tunneling infrastructure
and the gre protocol vport.

V2 fixes two patches according to comments from Jesse.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoopenvswitch: Add gre tunnel support.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:33 +0000 (17:50 -0700)]
openvswitch: Add gre tunnel support.

Add gre vport implementation.  Most of the gre protocol processing
is pushed to the gre module. It makes use of the gre demultiplexer,
so it can co-exist with linux device based gre tunnels.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoopenvswitch: Optimize flow key match for non tunnel flows.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:28 +0000 (17:50 -0700)]
openvswitch: Optimize flow key match for non tunnel flows.

The following patch adds a start offset for sw_flow_key, so that we can
skip tunneling information in the key for non-tunnel flows.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoopenvswitch: Expand action buffer size.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:23 +0000 (17:50 -0700)]
openvswitch: Expand action buffer size.

MAX_ACTIONS_BUFSIZE limits the action list size. The set tunnel action
needs extra space on the action list, so for now increase the max action list limit.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoopenvswitch: Add tunneling interface.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:18 +0000 (17:50 -0700)]
openvswitch: Add tunneling interface.

Add ovs tunnel interface for set tunnel action for userspace.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoopenvswitch: Copy individual actions.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:12 +0000 (17:50 -0700)]
openvswitch: Copy individual actions.

Rather than validating actions and then copying all actions
in one block, the following patch does the same operation in a single pass.
It validates and copies the actions one by one. This is required for
the ovs tunneling patch.

This patch does not change any functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
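
The single-pass approach described in the commit above can be sketched with a simplified model. This is not the openvswitch code: real actions are netlink attributes, while the fixed-size `struct action`, the action types, and both function names here are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size action record. */
struct action {
	int type;
	int arg;
};

enum { ACT_OUTPUT = 1, ACT_DROP = 2, ACT_MAX = 2 };

/* Hypothetical per-action check. */
static int action_valid(const struct action *a)
{
	return a->type >= 1 && a->type <= ACT_MAX && a->arg >= 0;
}

/* Validate each action and copy it immediately, in one pass.
 * Returns the number of actions copied, or -1 on the first invalid
 * action or when dst has no room left. */
static int copy_actions(struct action *dst, size_t cap,
			const struct action *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (i >= cap || !action_valid(&src[i]))
			return -1;
		memcpy(&dst[i], &src[i], sizeof(dst[i]));
	}
	return (int)n;
}
```

Copying per action leaves room for an individual action to be rewritten or expanded while it is copied, which a single block copy after validation would not allow.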
10 years agoip_tunnel: Add dont fragment flag.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:07 +0000 (17:50 -0700)]
ip_tunnel: Add dont fragment flag.

This flag will be used by ovs tunneling.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoip_tunnel: push generic protocol handling to ip_tunnel module.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:02 +0000 (17:50 -0700)]
ip_tunnel: push generic protocol handling to ip_tunnel module.

Process the skb tunnel header before sending the packet to the protocol handler.
This allows code sharing between the gre and ovs gre modules.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>