cascardo/linux.git
8 years ago  be2net: cleanup FW flash image related macro defines
Suresh Reddy [Wed, 30 Dec 2015 06:28:58 +0000 (01:28 -0500)]
be2net: cleanup FW flash image related macro defines

Many constant definitions relating to the FW-image layout
(such as section offset values) were defined in decimal format rather than
hexadecimal. This makes this part of the code unreadable. Also, some
defines related to BE2 are labeled "g2" and defines related to BE3 are
labeled "g3".  This patch cleans up all of this to make this code more
readable.

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  be2net: avoid configuring VEPA mode on BE3
Suresh Reddy [Wed, 30 Dec 2015 06:28:57 +0000 (01:28 -0500)]
be2net: avoid configuring VEPA mode on BE3

BE3 chip doesn't support VEPA mode.

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  be2net: fix VF link state transition from disabled to auto
Suresh Reddy [Wed, 30 Dec 2015 06:28:56 +0000 (01:28 -0500)]
be2net: fix VF link state transition from disabled to auto

The VF link state setting transition from "disable" to "auto" does not work
due to a bug in SET_LOGICAL_LINK_CONFIG_V1 cmd in FW. This issue could not
be fixed in FW due to some backward compatibility issues it causes with
some released drivers. The issue has been fixed by introducing a new
version (v2) of the cmd from 10.6 FW onwards. In v2, to set the VF link
state to auto, both PLINK_ENABLE and PLINK_TRACK bits have to be set to 1.

The VF link state setting feature now works on Lancer chips too from
FW ver 10.6.315.0 onwards.
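
A minimal sketch of the v2 rule described above. The bit positions, the enum, and the function name are assumptions made for illustration; only the requirement that "auto" sets both PLINK_ENABLE and PLINK_TRACK comes from this commit message. The encodings chosen for "enable" and "disable" are likewise assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define PLINK_ENABLE (1u << 0)
#define PLINK_TRACK  (1u << 1)

/* Hypothetical link states a VF can be asked to use. */
enum vf_link_state { VF_LINK_AUTO, VF_LINK_ENABLE, VF_LINK_DISABLE };

/* Build the link-config bits for a v2 request. */
uint32_t vf_link_config_v2(enum vf_link_state state)
{
	switch (state) {
	case VF_LINK_AUTO:
		/* v2: "auto" needs both bits set to 1. */
		return PLINK_ENABLE | PLINK_TRACK;
	case VF_LINK_ENABLE:
		/* Assumed encoding: force the link up without tracking. */
		return PLINK_ENABLE;
	case VF_LINK_DISABLE:
		/* Assumed encoding: neither bit set. */
		return 0;
	}
	assert(!"unknown link state");
	return 0;
}
```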

Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  net: hns: use to_platform_device()
Geliang Tang [Sun, 27 Dec 2015 13:15:44 +0000 (21:15 +0800)]
net: hns: use to_platform_device()

Use to_platform_device() instead of open-coding it.
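
For readers unfamiliar with the idiom, here is a user-space sketch of what "open-coding" means in this context. The struct definitions and the simplified container_of() are stand-ins written for this example (the kernel's versions carry extra type checking), and the two function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types, reduced to what the example needs. */
struct device { int id; };
struct platform_device { const char *name; struct device dev; };

/* Simplified container_of(): step back from a member to its parent struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The helper: one obvious name instead of a repeated expression. */
#define to_platform_device(d) container_of(d, struct platform_device, dev)

/* Open-coded form, as a driver might have written it. */
struct platform_device *pdev_open_coded(struct device *d)
{
	assert(d != NULL);
	return container_of(d, struct platform_device, dev);
}

/* Same result through the helper. */
struct platform_device *pdev_from_dev(struct device *d)
{
	assert(d != NULL);
	return to_platform_device(d);
}
```

Both functions return the same pointer; the helper only removes the need to spell out the struct and member name at every call site.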

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  atm: solos-pci: use to_pci_dev()
Geliang Tang [Sun, 27 Dec 2015 10:45:57 +0000 (18:45 +0800)]
atm: solos-pci: use to_pci_dev()

Use to_pci_dev() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  Merge branch 'bpf_hash-locking'
David S. Miller [Tue, 29 Dec 2015 20:13:45 +0000 (15:13 -0500)]
Merge branch 'bpf_hash-locking'

Ming Lei says:

====================
bpf: hash: use per-bucket spinlock

This patchset tries to optimize the eBPF hash map, and follows
the idea:

        Both htab_map_update_elem() and htab_map_delete_elem()
        can be called from an eBPF program, and they may be in a kernel
        hot path, so it isn't efficient to use a per-hashtable lock
        in these two helpers. This patch converts the lock into
        a per-bucket spinlock.

With this patchset, the performance penalty from eBPF appears to
decrease a lot; see the following test:

        1) run 'tools/biolatency' of bcc before running block test;

        2) run fio to test block throughput over /dev/nullb0
        (randread, 16 jobs, libaio, 4k bs); the test box
        is a 24-core (dual-socket) VM server:
        - without patchset:  607K IOPS
        - with this patchset: 1184K IOPS
        - without running eBPF prog: 1492K IOPS

TODO:
        - remove the per-hashtable atomic counter

V2:
        - fix checking on buckets size
V1:
        - fix the wrong 3/3 patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bpf: hash: use per-bucket spinlock
tom.leiming@gmail.com [Tue, 29 Dec 2015 14:40:27 +0000 (22:40 +0800)]
bpf: hash: use per-bucket spinlock

Both htab_map_update_elem() and htab_map_delete_elem() can be
called from an eBPF program, and they may be in a kernel hot path,
so it isn't efficient to use a per-hashtable lock in these two
helpers.

The per-hashtable spinlock is only used to protect each bucket's
hlist, so a per-bucket lock is sufficient. This patch converts
the per-hashtable lock into a per-bucket spinlock, so that
contention can be decreased a lot.
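
The idea can be sketched in user space as follows. This is not the kernel code: it uses a C11 atomic_flag as a stand-in spinlock, a fixed bucket count, the key itself as the hash, and takes the bucket lock on lookup for simplicity. All names other than select_bucket() are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 8u /* must be a power of two */

struct elem {
	struct elem *next;
	uint32_t key;
	uint32_t value;
};

struct bucket {
	atomic_flag lock;  /* one lock per bucket, not one per table */
	struct elem *head;
};

struct htab {
	struct bucket buckets[N_BUCKETS];
};

void htab_init(struct htab *t)
{
	for (size_t i = 0; i < N_BUCKETS; i++) {
		atomic_flag_clear(&t->buckets[i].lock);
		t->buckets[i].head = NULL;
	}
}

/* Needs no lock: it only computes an address. */
struct bucket *select_bucket(struct htab *t, uint32_t key)
{
	return &t->buckets[key & (N_BUCKETS - 1)];
}

void bucket_lock(struct bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		; /* spin until the holder releases the flag */
}

void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Insert or overwrite; the caller owns the storage for a new element. */
void htab_update(struct htab *t, struct elem *e)
{
	assert(e != NULL);
	struct bucket *b = select_bucket(t, e->key);

	bucket_lock(b);
	for (struct elem *cur = b->head; cur; cur = cur->next) {
		if (cur->key == e->key) {
			cur->value = e->value;
			bucket_unlock(b);
			return;
		}
	}
	e->next = b->head;
	b->head = e;
	bucket_unlock(b);
}

/* Returns 1 and stores the value in *out when the key is present. */
int htab_lookup(struct htab *t, uint32_t key, uint32_t *out)
{
	struct bucket *b = select_bucket(t, key);
	int found = 0;

	bucket_lock(b);
	for (struct elem *cur = b->head; cur; cur = cur->next) {
		if (cur->key == key) {
			*out = cur->value;
			found = 1;
			break;
		}
	}
	bucket_unlock(b);
	return found;
}
```

Two updates that land in different buckets never touch the same lock, which is where the reduced contention comes from.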

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bpf: hash: move select_bucket() out of htab's spinlock
tom.leiming@gmail.com [Tue, 29 Dec 2015 14:40:26 +0000 (22:40 +0800)]
bpf: hash: move select_bucket() out of htab's spinlock

The spinlock is just used for protecting the per-bucket
hlist, so it isn't needed for selecting bucket.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bpf: hash: use atomic count
tom.leiming@gmail.com [Tue, 29 Dec 2015 14:40:25 +0000 (22:40 +0800)]
bpf: hash: use atomic count

In preparation for removing the global per-hashtable lock,
the counter needs to be defined as atomic_t first.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  Merge branch 'bnxt_en-next'
David S. Miller [Mon, 28 Dec 2015 05:57:29 +0000 (00:57 -0500)]
Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: Patches for net-next.

Mainly clean-ups, optimizations, and updating to the latest firmware
interface spec.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Add BCM57301 & BCM57402 devices.
David Christensen [Sun, 27 Dec 2015 23:19:29 +0000 (18:19 -0500)]
bnxt_en: Add BCM57301 & BCM57402 devices.

Added the PCI IDs for the BCM57301 and BCM57402 controllers.

Signed-off-by: David Christensen <davidch@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Update to Firmware interface spec 1.0.0.
Michael Chan [Sun, 27 Dec 2015 23:19:28 +0000 (18:19 -0500)]
bnxt_en: Update to Firmware interface spec 1.0.0.

This interface will be forward compatible with future changes.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Keep track of the ring group resource.
Michael Chan [Sun, 27 Dec 2015 23:19:27 +0000 (18:19 -0500)]
bnxt_en: Keep track of the ring group resource.

Newer firmware will return the ring group resource when we call
hwrm_func_qcaps().  To be compatible with older firmware, use the
number of tx rings as the number of ring groups if the older firmware
returns 0.  When determining how many rx rings we can support, take
the ring group resource into account as well in _bnxt_get_max_rings().
Divide and assign the ring groups to VFs.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Improve VF resource accounting.
Michael Chan [Sun, 27 Dec 2015 23:19:26 +0000 (18:19 -0500)]
bnxt_en: Improve VF resource accounting.

We need to keep track of all resources, such as rx rings, tx rings,
cmpl rings, rss contexts, stats contexts, vnics, after we have
divided them for the VFs.  Otherwise, subsequent ring changes on
the PF may not work correctly.

We adjust all max resources in struct bnxt_pf_info after they have been
assigned to the VFs.  There is no need to keep the separate
max_pf_tx_rings and max_pf_rx_rings.

When SR-IOV is disabled, we call bnxt_hwrm_func_qcaps() to restore the
max resources for the PF.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Cleanup bnxt_hwrm_func_cfg().
Michael Chan [Sun, 27 Dec 2015 23:19:25 +0000 (18:19 -0500)]
bnxt_en: Cleanup bnxt_hwrm_func_cfg().

1. Use local variable pf for repeated access to this pointer.

2.  The 2nd argument num_vfs was unnecessarily declared as pointer to int.
This function doesn't change num_vfs so change the argument to int.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Check hardware resources before enabling NTUPLE.
Michael Chan [Sun, 27 Dec 2015 23:19:24 +0000 (18:19 -0500)]
bnxt_en: Check hardware resources before enabling NTUPLE.

The hardware resources required to enable NTUPLE vary depending on
how many rx channels are configured.  We need to make sure we have the
resources before we enable NTUPLE.  Add bnxt_rfs_capable() to do the
checking.

In addition, we need to do the same checking in ndo_fix_features().  As
the rx channels are changed using ethtool -L, we call
netdev_update_features() to make the necessary adjustment for NTUPLE.

Calling netdev_update_features() in netif_running() state but before
calling bnxt_open_nic() would be a problem.  To make this work,
bnxt_set_features() has to be modified to test for BNXT_STATE_OPEN for
the true hardware state instead of checking netif_running().

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Don't treat single segment rx frames as GRO frames.
Michael Chan [Sun, 27 Dec 2015 23:19:23 +0000 (18:19 -0500)]
bnxt_en: Don't treat single segment rx frames as GRO frames.

If hardware completes single segment rx frames, don't bother setting
up all the GRO related fields.  Pass the SKB up as a normal frame.

Reviewed-by: vasundhara volam <vvolam@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Allocate rx_cpu_rmap only if Accelerated RFS is enabled.
Michael Chan [Sun, 27 Dec 2015 23:19:22 +0000 (18:19 -0500)]
bnxt_en: Allocate rx_cpu_rmap only if Accelerated RFS is enabled.

Also, no need to check for bp->rx_nr_rings as it is always >= 1.  If the
allocation fails, it is not a fatal error and we can still proceed.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Increment checksum error counter only if NETIF_F_RXCSUM is set.
Satish Baddipadige [Sun, 27 Dec 2015 23:19:21 +0000 (18:19 -0500)]
bnxt_en: Increment checksum error counter only if NETIF_F_RXCSUM is set.

rx_l4_csum_error is now incremented only when offload is enabled

Signed-off-by: Satish Baddipadige <sbaddipa@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Add support for upgrading APE/NC-SI firmware via Ethtool FLASHDEV
Rob Swindell [Sun, 27 Dec 2015 23:19:20 +0000 (18:19 -0500)]
bnxt_en: Add support for upgrading APE/NC-SI firmware via Ethtool FLASHDEV

NC-SI firmware of type apeFW (10) is now supported.

Signed-off-by: Rob Swindell <swindell@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: Optimize ring alloc and ring free functions.
Michael Chan [Sun, 27 Dec 2015 23:19:19 +0000 (18:19 -0500)]
bnxt_en: Optimize ring alloc and ring free functions.

Remove the unnecessary "if" statement before the "for" statement:

if (x) {
        for (i = 0; i < x; i++)
...
}

Also, change the ring free function to return void as it only returns 0.
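
A small stand-alone sketch of why the guard is redundant. The function names and the counter are made up for illustration and are not from the driver: when the count is 0 the loop condition is false on entry, so the body never runs with or without the surrounding "if".

```c
#include <assert.h>

/* Hypothetical stand-in for freeing one ring; it only counts calls. */
static int rings_freed;

void free_one_ring(int i)
{
	assert(i >= 0);
	rings_freed++;
}

/* Before: the loop is wrapped in a redundant check. */
void free_rings_guarded(int nr_rings)
{
	if (nr_rings) {
		for (int i = 0; i < nr_rings; i++)
			free_one_ring(i);
	}
}

/* After: the loop condition already handles nr_rings == 0.
 * The function returns void because it has nothing to report. */
void free_rings(int nr_rings)
{
	for (int i = 0; i < nr_rings; i++)
		free_one_ring(i);
}
```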

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bnxt_en: support hwrm_func_drv_unrgtr command
Jeffrey Huang [Sun, 27 Dec 2015 23:19:18 +0000 (18:19 -0500)]
bnxt_en: support hwrm_func_drv_unrgtr command

During remove_one, the driver should issue hwrm_func_drv_unrgtr
command to inform firmware that this function has been unloaded.
This is to let firmware keep track of driver present/absent state
when driver is gracefully unloaded. A keep alive timer is needed
later to keep track of driver state during abnormal shutdown.

Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  qlcnic: constify qlcnic_dcb_ops structures
Julia Lawall [Sun, 27 Dec 2015 21:01:29 +0000 (22:01 +0100)]
qlcnic: constify qlcnic_dcb_ops structures

The qlcnic_dcb_ops structures are never modified, so declare them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  Merge branch 'r8169-RTL8168H-PHY-fixes'
David S. Miller [Mon, 28 Dec 2015 05:19:38 +0000 (00:19 -0500)]
Merge branch 'r8169-RTL8168H-PHY-fixes'

Chunhao Lin says:

====================
r8169: Update RTL8168H PHY parameters

Fix typo in setting PHY parameter and update the way of reading PHY register
"rg_saw_cnt".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  r8169:Update the way of reading RTL8168H PHY register "rg_saw_cnt"
Chun-Hao Lin [Thu, 24 Dec 2015 13:15:27 +0000 (21:15 +0800)]
r8169:Update the way of reading RTL8168H PHY register "rg_saw_cnt"

The value of the RTL8168H PHY register "rg_saw_cnt" is only valid from bit 0 to bit 13.
When reading this register, bitwise-AND its value with 0x3fff.
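
The masking amounts to the following. The macro and function names are assumptions made for this sketch; only the mask value 0x3fff and the valid bit range come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 0..13 are valid; bits 14 and 15 must be discarded. */
#define RG_SAW_CNT_MASK 0x3fffu

static_assert(RG_SAW_CNT_MASK == (1u << 14) - 1, "mask covers bits 0..13");

/* Keep only the valid bits of a raw 16-bit register read. */
uint16_t rg_saw_cnt_value(uint16_t raw)
{
	return (uint16_t)(raw & RG_SAW_CNT_MASK);
}
```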

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  r8169:Fix typo in setting RTL8168H PHY parameter
Chun-Hao Lin [Thu, 24 Dec 2015 13:15:26 +0000 (21:15 +0800)]
r8169:Fix typo in setting RTL8168H PHY parameter

In function "rtl8168h_2_hw_phy_config", there is a typo in setting
RTL8168H PHY parameter.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  Driver for IBM System i/p VNIC protocol
Thomas Falcon [Mon, 21 Dec 2015 17:26:06 +0000 (11:26 -0600)]
Driver for IBM System i/p VNIC protocol

This is a new device driver for a high performance SR-IOV assisted virtual
network for IBM System p and IBM System i systems.  The SR-IOV VF will be
attached to the VIOS partition and mapped to the Linux client via the
hypervisor's VNIC protocol that this driver implements.

This driver is able to perform basic tx and rx; new features
and improvements will be added as they are developed and tested.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  Merge branch 'fsl-fmain'
David S. Miller [Mon, 28 Dec 2015 01:51:40 +0000 (20:51 -0500)]
Merge branch 'fsl-fmain'

Igal Liberman says:

====================
Freescale DPAA FMan

The Freescale Data Path Acceleration Architecture (DPAA) is a set
of hardware components on specific QorIQ multicore processors.
This architecture provides the infrastructure to support
simplified sharing of networking interfaces and accelerators
by multiple CPU cores and the accelerators.

One of the DPAA accelerators is the Frame Manager (FMan)
which contains a series of hardware blocks: ports, Ethernet MACs,
a multi user RAM (MURAM) and Storage Profile (SP).

This patch set introduces the FMan drivers.
Each driver configures and initializes the corresponding
FMan hardware module (described above).
The MAC driver offers support for three different
types of MACs (dTSEC, tGEC, mEMAC).

v9 --> v10:
- Addressed feedback from David Miller:
- Remove private CRC implementation
- Addressed feedback from Kenneth Klette Jonassen:
- Use Kernel PHY API to configure dTSEC TBI
- Use Kernel PHY API to configure mEMAC PCS
  This patchset requires device tree update:
  https://patchwork.ozlabs.org/patch/559501/
- Addressed feedback from Andy Fleming

v8 --> v9:
No changes

v7 --> v8:
- Addressed feedback from David Miller
- Support for ARM:
- Device tree parsing
- IO Accessors
- Addressed compilation issue on non-PPC targets

v6 --> v7:
- Addressed compilation issue on non-PPC targets
- Removed B4860 rev 1 support

v5 --> v6:
- Addressed feedback from Scott:
- Moved kernel doc to source files
- Removed a series of configurable settings
- Miscellaneous code updates

v4 --> v5:
- Addressed feedback from David Miller:
- Removed driver layering
- Reduce namespace pollution
- Reduce code complexity and size

v3 --> v4:
- Remove device_initcall call in driver registration (redundant)
- Remove hot/cold labels
- Minor update in FMan Clock read from device-tree
- Update fixed-link support
- Addressed feedback from Stephen Hemminger
- Remove bogus blank line

v2 --> v3:
- Addressed feedback from Scott:
- Remove typedefs
- Remove unnecessary memory barriers
- Remove unnecessary casting
- Remove KConfig options
- Remove early_params
- Remove Hungarian notation
- Remove __packed__  attribute and padding from structures
- Remove unlikely attribute (where it's not needed)
- Use proper error codes and remove unnecessary prints
- Use proper values for sleep routines
- Replace complex Macros with functions
- Improve device tree processing code
- Use symbolic defines
- Add time-out in busy-wait loops
- Removed exit code (loadable module support will be added later)
- Fixed "fixed-link" issue raised by Joakim Tjernlund

v1 --> v2:
- Addressed feedback from Paul Bolle:
- General feedback of FMan Driver layer
- Remove Errata defines
- Aligned comments to Kernel Doc
- Remove Loadable Module support (not yet supported)
- Removed not needed KConfig dependencies
- Addressed feedback from Scott Wood
- Use Kernel ioread/iowrite services
- Squash FLIB source and header patches together

This submission is based on the prior Freescale DPAA FMan V3,RFC submission.
Several issues are addressed in this submission:
- Reduced MAC layering and complexity
- Reduced code base
- T1024/T2080 10G best effort support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  fsl/fman: Add FMan MAC driver
Igal Liberman [Mon, 21 Dec 2015 00:21:30 +0000 (02:21 +0200)]
fsl/fman: Add FMan MAC driver

This patch adds the Ethernet MAC driver supporting the three
different types of MACs: dTSEC, tGEC and mEMAC.

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  fsl/fman: Add FMan Port Support
Igal Liberman [Mon, 21 Dec 2015 00:21:29 +0000 (02:21 +0200)]
fsl/fman: Add FMan Port Support

Add the Data Path Acceleration Architecture Frame Manager Port Driver.
The FMan driver uses a module called "Port" to represent the physical
TX and RX ports.
Each FMan version has a different number of physical ports.
This patch adds the FMan Port configuration, initialization and
runtime control routines for both TX and RX.

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  fsl/fman: Add FMan SP support
Igal Liberman [Mon, 21 Dec 2015 00:21:28 +0000 (02:21 +0200)]
fsl/fman: Add FMan SP support

The Storage Profiles contain parameters that are used
by the FMan for frame reception and transmission.

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  fsl/fman: Add FMan MAC support
Igal Liberman [Mon, 21 Dec 2015 00:21:27 +0000 (02:21 +0200)]
fsl/fman: Add FMan MAC support

Add the Data Path Acceleration Architecture Frame Manager MAC support.
This patch adds the FMan MAC configuration, initialization and
runtime control routines.
This patch contains support for these types of MACs:
- dTSEC: Three speed Ethernet controller (10/100/1000 Mbps)
- tGEC: 10G Ethernet controller (10 Gbps)
- mEMAC: Multi-rate Ethernet MAC (10/100/1000/10000 Mbps)
Different FMan revisions have different types and numbers of MACs.

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  fsl/fman: Add FMan support
Igal Liberman [Mon, 21 Dec 2015 00:21:26 +0000 (02:21 +0200)]
fsl/fman: Add FMan support

Add the Data Path Acceleration Architecture Frame Manager Driver.
The FMan embeds a series of hardware blocks that implement a group
of Ethernet interfaces. This patch adds the FMan configuration,
initialization and runtime control routines.

The FMan driver supports several hardware versions
differentiated by things like:
- Different type of MACs
- Number of MAC and ports
- Available resources
- Different hardware errata

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  fsl/fman: Add FMan MURAM support
Igal Liberman [Mon, 21 Dec 2015 00:21:25 +0000 (02:21 +0200)]
fsl/fman: Add FMan MURAM support

Add Frame Manager Multi-User RAM support.
This internal FMan memory block is used by the
FMan hardware modules, the management being made
through the generic allocator.

The FMan Internal memory, for example, is used for
allocating transmit and receive FIFOs.

Signed-off-by: Igal Liberman <Igal.Liberman@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  ip_tunnel: Move stats update to iptunnel_xmit()
Pravin B Shelar [Thu, 24 Dec 2015 22:34:54 +0000 (14:34 -0800)]
ip_tunnel: Move stats update to iptunnel_xmit()

By moving stats update into iptunnel_xmit(), we can simplify
iptunnel_xmit() usage. With this change there is no need to
call another function (iptunnel_xmit_stats()) to update stats
in tunnel xmit code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4: Remove deprecated module parameters
Hariprasad Shenai [Thu, 24 Dec 2015 10:54:53 +0000 (16:24 +0530)]
cxgb4: Remove deprecated module parameters

Remove deprecated module parameters, and mark one parameter as
deprecated.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4: Get TID calculation right for IPv6 mode
Hariprasad Shenai [Thu, 24 Dec 2015 10:45:17 +0000 (16:15 +0530)]
cxgb4: Get TID calculation right for IPv6 mode

CLIP is always enabled and hardware uses 2 TID entries instead of 4 for
IPv6 in CLIP mode.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  Merge branch 'cxgb4-T6-update'
David S. Miller [Thu, 24 Dec 2015 03:34:45 +0000 (22:34 -0500)]
Merge branch 'cxgb4-T6-update'

Hariprasad Shenai says:

====================
Update support for T6 adapters

This patch series updates the register, stats and hardware related
code for the T6 family of adapters.

This patch series has been created against net-next tree and includes
patches on cxgb4 and cxgb4vf driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4vf: Update to 128 byte mailbox size for T6 adapter
Hariprasad Shenai [Wed, 23 Dec 2015 17:17:19 +0000 (22:47 +0530)]
cxgb4vf: Update to 128 byte mailbox size for T6 adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4: Update SGE context congestion map change for T6 adapter
Hariprasad Shenai [Wed, 23 Dec 2015 17:17:18 +0000 (22:47 +0530)]
cxgb4: Update SGE context congestion map change for T6 adapter

SGE context congestion map changed from 4 to 8 priorities per port
in T6 as there are only 2 channels.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4: Update mps_tcam output to include T6 fields
Hariprasad Shenai [Wed, 23 Dec 2015 17:17:17 +0000 (22:47 +0530)]
cxgb4: Update mps_tcam output to include T6 fields

In T6, MPS classification has a 512 deep TCAM to do the match lookup.
Each entry has 80x2b sets containing 48 bit MAC address, port number,
VLAN Valid/ID, VNI, lookup type (outer or inner packet header).
[71:48] bit locations are overloaded for outer vs. inner lookup types.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4: Update correct encoding of SGE Ingress DMA States for T6 adapter
Hariprasad Shenai [Wed, 23 Dec 2015 17:17:16 +0000 (22:47 +0530)]
cxgb4: Update correct encoding of SGE Ingress DMA States for T6 adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4: Update Congestion Channel map for T6 adapter
Hariprasad Shenai [Wed, 23 Dec 2015 17:17:15 +0000 (22:47 +0530)]
cxgb4: Update Congestion Channel map for T6 adapter

Updating Congestion Channel/Priority Map in Congestion Manager Context
for T6. In T6 port 0 is mapped to channel 0 and port 1 is mapped to
channel 1. For 2 port T4/T5 adapter, port 0 is mapped to channel 0,1 and
port 1 is mapped to channel 2,3
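
The port-to-channel mapping can be written down as a bitmap, as in this sketch. Representing the map as a bitmap, the is_t6 flag, and the function name are assumptions for illustration; the mapping itself follows the description above for 2-port adapters.

```c
#include <assert.h>

/* Returns a bitmap of channels for a port: bit n set means channel n. */
unsigned int port_channel_map(int is_t6, unsigned int port)
{
	assert(port < 2);
	if (is_t6)
		return 1u << port;       /* port 0 -> ch 0, port 1 -> ch 1 */
	return 0x3u << (2 * port);       /* port 0 -> ch 0,1; port 1 -> ch 2,3 */
}
```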

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4: Update register range and SGE registers for T6 adapter
Hariprasad Shenai [Wed, 23 Dec 2015 17:17:14 +0000 (22:47 +0530)]
cxgb4: Update register range and SGE registers for T6 adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4/cxgb4vf: Update Ingress padding boundary values for T6 adapter
Hariprasad Shenai [Wed, 23 Dec 2015 17:17:13 +0000 (22:47 +0530)]
cxgb4/cxgb4vf: Update Ingress padding boundary values for T6 adapter

Ingress padding boundary values got changed for T6.
    T5: 0=32B 1=64B 2=128B 3=256B 4=512B 5=1024B 6=2048B 7=4096B
    T6: 0=8B  1=16B 2=32B  3=64B  4=128B 5=128B  6=256B  7=512B

Update the driver to program the correct encoding in SGE_CONTROL so that
the boundary is 32B.
This fl alignment change must also be taken into account when calculating
the next packet offset.
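
A sketch of how the two encodings differ. The lookup tables copy the T5 and T6 values listed above verbatim; the enum, function names, and the round-up helper are assumptions made for this example and are not the driver's code.

```c
#include <assert.h>

enum chip { CHIP_T5, CHIP_T6 };

/* Encoded field value -> boundary in bytes, copied from the table above. */
static const unsigned int t5_pad_bytes[8] = { 32, 64, 128, 256, 512, 1024, 2048, 4096 };
static const unsigned int t6_pad_bytes[8] = { 8, 16, 32, 64, 128, 128, 256, 512 };

unsigned int ingress_pad_bytes(enum chip chip, unsigned int encoded)
{
	assert(encoded < 8);
	return chip == CHIP_T6 ? t6_pad_bytes[encoded] : t5_pad_bytes[encoded];
}

/* Smallest encoding whose boundary equals the requested size, or -1. */
int ingress_pad_encode(enum chip chip, unsigned int bytes)
{
	for (unsigned int i = 0; i < 8; i++)
		if (ingress_pad_bytes(chip, i) == bytes)
			return (int)i;
	return -1;
}

/* Round a packet length up to the boundary to find the next packet offset. */
unsigned int next_packet_offset(unsigned int len, unsigned int boundary)
{
	assert(boundary > 0);
	return (len + boundary - 1) / boundary * boundary;
}
```

The same 32B boundary is encoded as 0 on T5 but as 2 on T6, which is why a value written for one chip cannot be reused on the other.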

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4: Update pm_stats for T6 adapter family
Hariprasad Shenai [Wed, 23 Dec 2015 17:17:12 +0000 (22:47 +0530)]
cxgb4: Update pm_stats for T6 adapter family

Updated pm_stats code to display input FIFO wait (index 5) and read
latency (index 7) counters for T6 adapters

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  cxgb4: Pass correct argument to t4_link_l1cfg()
Hariprasad Shenai [Wed, 23 Dec 2015 17:17:11 +0000 (22:47 +0530)]
cxgb4: Pass correct argument to t4_link_l1cfg()

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bridge: use kobj_to_dev instead of to_dev
Geliang Tang [Wed, 23 Dec 2015 12:42:21 +0000 (20:42 +0800)]
bridge: use kobj_to_dev instead of to_dev

kobj_to_dev has been defined in linux/device.h, so I replace to_dev
with it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  bonding: drop unused to_dev macro in bond_sysfs.c
Geliang Tang [Wed, 23 Dec 2015 12:42:20 +0000 (20:42 +0800)]
bonding: drop unused to_dev macro in bond_sysfs.c

to_dev is not used anymore so drop it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  dsa: mv88e6xxx: Add Second back of statistics
Andrew Lunn [Wed, 23 Dec 2015 12:23:17 +0000 (13:23 +0100)]
dsa: mv88e6xxx: Add Second back of statistics

The 6320 family of switch chips has a second bank for statistics, but
is missing three statistics in the port registers. Generalise and
extend the code:

* add a field to the statistics table indicating the bank/register
  set where each statistic is.
* add a function indicating if an individual statistic
  is available on this device
* calculate the sset_count at run time.
* return strings based on the available statistics of the device
* return statistics based on the available statistics of the device
* add support for reading from the second bank.
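
The list above can be sketched as a table-driven design. Everything here is illustrative: the statistic names, the capability flags, and the function names are made up for the example and do not reproduce the driver's table.

```c
#include <assert.h>
#include <stddef.h>

enum stat_type { STAT_BANK0, STAT_BANK1, STAT_PORT };

struct hw_stat {
	const char *name;
	enum stat_type type; /* bank/register set holding this statistic */
};

/* Illustrative entries only; not the driver's real table. */
static const struct hw_stat hw_stats[] = {
	{ "example_bank0_a", STAT_BANK0 },
	{ "example_bank0_b", STAT_BANK0 },
	{ "example_port",    STAT_PORT  },
	{ "example_bank1",   STAT_BANK1 },
};
#define N_HW_STATS (sizeof(hw_stats) / sizeof(hw_stats[0]))

/* Hypothetical per-chip capabilities. */
struct chip_caps {
	int has_bank1;      /* second statistics bank present */
	int has_port_stats; /* statistics in the port registers present */
};

/* Is this individual statistic available on this device? */
int stat_available(const struct chip_caps *c, const struct hw_stat *s)
{
	switch (s->type) {
	case STAT_BANK0: return 1;
	case STAT_BANK1: return c->has_bank1;
	case STAT_PORT:  return c->has_port_stats;
	}
	return 0;
}

/* Number of statistics this chip exposes, computed at run time. */
int sset_count(const struct chip_caps *c)
{
	int n = 0;

	for (size_t i = 0; i < N_HW_STATS; i++)
		n += stat_available(c, &hw_stats[i]) ? 1 : 0;
	return n;
}

/* Fill out[] with the names of the available statistics, in table order. */
int get_strings(const struct chip_caps *c, const char **out, int max)
{
	int n = 0;

	for (size_t i = 0; i < N_HW_STATS && n < max; i++)
		if (stat_available(c, &hw_stats[i]))
			out[n++] = hw_stats[i].name;
	assert(n <= max);
	return n;
}
```

Because the count and the strings are both derived from the same availability check, they cannot drift apart when a new chip family is added.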

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  Merge branch 'sfc-vf'
David S. Miller [Thu, 24 Dec 2015 03:06:39 +0000 (22:06 -0500)]
Merge branch 'sfc-vf'

Bert Kenward says:

====================
sfc: additional virtual function support

This introduces the client side of a mechanism to defer authorisation of
operations, for example multicast subscription. Although primarily aimed at
SRIOV VFs this can also apply to unprivileged PFs.

Also handle reboot ordering corner cases better and reduce the level of some
logging.

v2: remove #ifdef DEBUG around new WARN_ON in mcdi.c.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  sfc: Downgrade or remove some error messages
Bert Kenward [Wed, 23 Dec 2015 08:58:15 +0000 (08:58 +0000)]
sfc: Downgrade or remove some error messages

Depending on configuration the NIC may return errors for unprivileged
functions and/or VFs. Where these are expected and handled, reduce the
level of any output.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  sfc: Downgrade EPERM messages from MCDI to debug
Tomáš Pilař [Wed, 23 Dec 2015 08:57:51 +0000 (08:57 +0000)]
sfc: Downgrade EPERM messages from MCDI to debug

When running in an unprivileged function we expect some MC commands
to fail with permission errors. To avoid log spew downgrade these to
debug only.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  sfc: Make failed filter removal less noisy
Bert Kenward [Wed, 23 Dec 2015 08:57:36 +0000 (08:57 +0000)]
sfc: Make failed filter removal less noisy

There are situations - mostly reset related - where our view of the
filter table differs from the hardware. In this case we may try and
remove filters that aren't actually installed. This isn't that
interesting in most situations, so downgrade the logging.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  sfc: Handle MCDI proxy authorisation
Bert Kenward [Wed, 23 Dec 2015 08:57:18 +0000 (08:57 +0000)]
sfc: Handle MCDI proxy authorisation

For unprivileged functions operations can be authorised by an admin
function. Extra steps are introduced to the MCDI protocol in this
situation - the initial response from the MCDI tells us that the
operation has been deferred, and we must retry when told. We then
receive an event telling us to retry.

Note that this provides only the functionality for the unprivileged
functions, not the handling of the administrative side.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  sfc: Retry MCDI after NO_EVB_PORT error on a VF
Bert Kenward [Wed, 23 Dec 2015 08:56:40 +0000 (08:56 +0000)]
sfc: Retry MCDI after NO_EVB_PORT error on a VF

After reboot the vswitch configuration from the PF may not be
complete before the VF attempts to restore filters. In that
case we see NO_EVB_PORT errors from the MC. Retry up to a time
limit or until a different result is seen.
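
A sketch of the retry policy described above: retry while the result is NO_EVB_PORT, and stop on any other result or once the time budget is used up. The return codes, the callback type, and the simulated clock are assumptions made for the example; a real driver would sleep between attempts.

```c
#include <assert.h>

/* Hypothetical result codes for illustration. */
enum mc_rc { MC_OK = 0, MC_NO_EVB_PORT = 1, MC_EPERM = 2 };

typedef enum mc_rc (*mc_cmd_fn)(void *ctx);

/* Retry while the MC reports NO_EVB_PORT, for at most max_ms of waiting. */
enum mc_rc mc_cmd_retry(mc_cmd_fn cmd, void *ctx,
			unsigned int step_ms, unsigned int max_ms)
{
	unsigned int waited = 0;
	enum mc_rc rc;

	assert(step_ms > 0);
	for (;;) {
		rc = cmd(ctx);
		/* Any different result, or an exhausted budget, ends the loop. */
		if (rc != MC_NO_EVB_PORT || waited >= max_ms)
			return rc;
		/* A real driver would sleep here; the sketch only counts time. */
		waited += step_ms;
	}
}

/* Test double: reports NO_EVB_PORT until the counter in ctx reaches zero. */
enum mc_rc fake_cmd(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return MC_NO_EVB_PORT;
	}
	return MC_OK;
}
```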

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago  Merge branch 'cxgb4-next'
David S. Miller [Wed, 23 Dec 2015 17:05:53 +0000 (12:05 -0500)]
Merge branch 'cxgb4-next'

Hariprasad Shenai says:

====================
Trivial enhancements for cxgb4

This series adds a debug message if the adapter isn't inserted in the right PCI
slot, changes the naming conventions for iSCSI rx queues, uses node info while
allocating rx queues and uses the napi_complete_done() API in the napi handler.

This patch series has been created against net-next tree and includes
patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.

Thanks

V2: Dropped 'dcb_info' debug entry patch, since the same can be achieved
    using lldp tool.
    Based on review comments by Or Gerlitz <gerlitz.or@gmail.com> and
    David Miller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agocxgb4: Use napi_complete_done() api in napi handler
Hariprasad Shenai [Wed, 23 Dec 2015 05:59:56 +0000 (11:29 +0530)]
cxgb4: Use napi_complete_done() api in napi handler

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agocxgb4: Use the node info to alloc_ring() for RX queues
Hariprasad Shenai [Wed, 23 Dec 2015 05:59:55 +0000 (11:29 +0530)]
cxgb4: Use the node info to alloc_ring() for RX queues

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agocxgb4: get naming correct for iscsi queues
Hariprasad Shenai [Wed, 23 Dec 2015 05:59:54 +0000 (11:29 +0530)]
cxgb4: get naming correct for iscsi queues

All the upper level protocols, such as rdma and iscsi, have their own
offload rx queues, so name them specifically instead of using the
generic naming convention. This improves code readability.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agocxgb4: Warn if device doesn't have enough PCI bandwidth
Hariprasad Shenai [Wed, 23 Dec 2015 05:59:53 +0000 (11:29 +0530)]
cxgb4: Warn if device doesn't have enough PCI bandwidth

Check if the device gets enough bandwidth from the entire PCI chain to
satisfy its capabilities. This patch determines the PCIe device's
bandwidth capabilities by reading its PCIe Link Capabilities registers
and then calls the pcie_get_minimum_link function to ensure that the
adapter is hooked into a slot which is capable of providing the
necessary bandwidth.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bindtodevice_tw_rst'
David S. Miller [Tue, 22 Dec 2015 22:03:06 +0000 (17:03 -0500)]
Merge branch 'bindtodevice_tw_rst'

Florian Westphal says:

====================
tcp: honour SO_BINDTODEVICE for TW_RST case too

This is V2, this time as a small series, since I followed Eric's advice
to split this into smaller chunks. I hope this makes it easier to
review.

First patch adds inet_sk_transparent helper.
Second patch contains an if/else swap that I split from the
original TW_RST v1 one.
Third patch is the actual change without the superfluous sock_net change.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: honour SO_BINDTODEVICE for TW_RST case too
Florian Westphal [Mon, 21 Dec 2015 20:29:26 +0000 (21:29 +0100)]
tcp: honour SO_BINDTODEVICE for TW_RST case too

Hannes points out that when we generate tcp reset for timewait sockets we
pretend we found no socket and pass NULL sk to tcp_vX_send_reset().

Make it cope with inet tw sockets and then provide tw sk.

This makes RSTs appear on correct interface when SO_BINDTODEVICE is used.

Packetdrill test case:
// want default route to be used, we rely on BINDTODEVICE
`ip route del 192.0.2.0/24 via 192.168.0.2 dev tun0`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
// test case still works due to BINDTODEVICE
0.001 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "tun0", 4) = 0
0.100...0.200 connect(3, ..., ...) = 0

0.100 > S 0:0(0) <mss 1460,sackOK,nop,nop>
0.200 < S. 0:0(0) ack 1 win 32792 <mss 1460,sackOK,nop,nop>
0.200 > . 1:1(0) ack 1

0.210 close(3) = 0

0.210 > F. 1:1(0) ack 1 win 29200
0.300 < . 1:1(0) ack 2 win 46

// more data while in FIN_WAIT2, expect RST
1.300 < P. 1:1001(1000) ack 1 win 46

// fails without this change -- default route is used
1.301 > R 1:1(0) win 0

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: send_reset: test for non-NULL sk first
Florian Westphal [Mon, 21 Dec 2015 20:29:25 +0000 (21:29 +0100)]
tcp: send_reset: test for non-NULL sk first

tcp_md5_do_lookup requires a full socket, so once we extend
_send_reset() to also accept timewait socket we would have to change

if (!sk && hash_location)

to something like

if ((!sk || !sk_fullsock(sk)) && hash_location) {
  ...
} else {
  if (sk && sk_fullsock(sk)) tcp_md5_do_lookup()
}

Switch the two branches: check if we have a socket first, then
fall back to a listener lookup if we saw a md5 option (hash_location).
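
The reordered check can be sketched in plain C as below. This is only a
model of the control flow described above, not the kernel code: struct
sock_model, enum md5_lookup and pick_md5_lookup() are hypothetical names,
and the "full" flag stands in for what sk_fullsock() reports.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; only the property needed here. */
struct sock_model {
	bool full;	/* true for a full socket, false for e.g. timewait */
};

enum md5_lookup {
	MD5_LOOKUP_NONE,	/* no key lookup is done */
	MD5_LOOKUP_SOCKET,	/* key is looked up on the socket we have */
	MD5_LOOKUP_LISTENER,	/* listener lookup because an MD5 option was seen */
};

/* Socket test first, then the fallback keyed on the MD5 option. */
enum md5_lookup pick_md5_lookup(const struct sock_model *sk, bool has_md5_option)
{
	if (sk && sk->full)
		return MD5_LOOKUP_SOCKET;
	if (has_md5_option)
		return MD5_LOOKUP_LISTENER;
	return MD5_LOOKUP_NONE;
}
```

With this ordering a non-full socket never reaches the per-socket lookup,
so the condition does not need to be repeated in the else branch.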

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: add inet_sk_transparent() helper
Florian Westphal [Mon, 21 Dec 2015 20:29:24 +0000 (21:29 +0100)]
net: add inet_sk_transparent() helper

Avoids cluttering tcp_v4_send_reset when followup patch extends
it to deal with timewait sockets.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: core: Use devm_kzalloc to allocate mlxsw_hwmon structure
Jiri Pirko [Tue, 22 Dec 2015 08:43:07 +0000 (09:43 +0100)]
mlxsw: core: Use devm_kzalloc to allocate mlxsw_hwmon structure

KASan reported a use-after-free for the hwmon structure. Fix this by
using devm_kzalloc and letting the core take care of freeing the memory
during device detach.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 89309da39 ("mlxsw: core: Implement temperature hwmon interface")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: tcp: deal with listen sockets properly in tcp_abort.
Lorenzo Colitti [Mon, 21 Dec 2015 15:03:44 +0000 (00:03 +0900)]
net: tcp: deal with listen sockets properly in tcp_abort.

When closing a listen socket, tcp_abort currently calls
tcp_done without clearing the request queue. If the socket has a
child socket that is established but not yet accepted, the child
socket is then left without a parent, causing a leak.

Fix this by setting the socket state to TCP_CLOSE and calling
inet_csk_listen_stop with the socket lock held, like tcp_close
does.

Tested using net_test. With this patch, calling SOCK_DESTROY on a
listen socket that has an established but not yet accepted child
socket results in the parent and the child being closed, such
that they no longer appear in sock_diag dumps.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: core: Allow to reset temperature history via hwmon interface
Jiri Pirko [Mon, 21 Dec 2015 10:14:21 +0000 (11:14 +0100)]
mlxsw: core: Allow to reset temperature history via hwmon interface

Add another sysfs hwmon attribute that allows resetting the temperature
sensor history.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoRDS: don't pretend to use cpu notifiers
Sebastian Andrzej Siewior [Sat, 19 Dec 2015 20:55:43 +0000 (12:55 -0800)]
RDS: don't pretend to use cpu notifiers

It looks like an attempt to use CPU notifier here which was never
completed. Nobody tried to wire it up completely since 2k9. So I unwind
this code and get rid of everything not required. Oh look! 19 lines were
removed while code still does the same thing.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet-sysfs: use to_net_dev in net_namespace()
Geliang Tang [Tue, 22 Dec 2015 15:11:49 +0000 (23:11 +0800)]
net-sysfs: use to_net_dev in net_namespace()

Use to_net_dev() instead of open-coding it.
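
For readers unfamiliar with the pattern, here is a minimal userspace
sketch of what such a helper wraps: recovering the enclosing structure
from a pointer to an embedded member. All names ending in _model are
hypothetical stand-ins, not the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct device and struct net_device. */
struct device_model {
	int id;
};

struct net_device_model {
	char name[16];
	struct device_model dev;	/* embedded member */
};

/* Simplified container_of: step back from the member to its container. */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The named helper replaces the open-coded container_of at call sites. */
struct net_device_model *to_net_dev_model(struct device_model *d)
{
	return container_of_model(d, struct net_device_model, dev);
}
```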

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 22 Dec 2015 19:49:03 +0000 (14:49 -0500)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2015-12-22

This series contains updates to fm10k only.

Bruce cleans up the initialization of fm10k_workqueue at the global level,
which fixes a checkpatch.pl error.  He also makes several other cleanups
of the driver, such as making structures that do not change constant,
removing unused code, cleaning up code comments and using the boolean
values true/false instead of an integer where a bool is all that is needed.

Jacob fixes the packing of the little endian TLV structures. The TLV
format copies these structures with 4 byte alignment, so he adds
__aligned(4) alongside __packed to ensure that they are actually 4 byte
aligned and packed correctly. He also updates the driver to use
ether_addr_equal() instead of memcmp() to compare MAC addresses.

Alex Duyck cleans up the exception handling so all of the paths result in
a similar state if we fail.  Specifically the driver will now unload the
mailbox interrupt, free the queue vectors and MSI-X, and then detach the
interface.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofm10k: IS_ENABLED() is not appropriate for boolean kconfig option
Bruce Allan [Wed, 9 Dec 2015 01:20:49 +0000 (17:20 -0800)]
fm10k: IS_ENABLED() is not appropriate for boolean kconfig option

Tri-states need 'if IS_ENABLED()', booleans should use 'ifdef'.
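
A small preprocessor sketch of the difference, assuming the usual Kconfig
convention that =y defines CONFIG_<X> and =m defines CONFIG_<X>_MODULE.
CONFIG_DEMO_BOOL and CONFIG_DEMO_TRI are made-up option names, and the
defined() test below only approximates what IS_ENABLED() checks.

```c
/* Made-up options: a boolean set to y and a tristate set to m. */
#define CONFIG_DEMO_BOOL 1
#define CONFIG_DEMO_TRI_MODULE 1

/* Boolean option: it is either y or unset, so a plain #ifdef is enough. */
#ifdef CONFIG_DEMO_BOOL
#define DEMO_BOOL_ON 1
#else
#define DEMO_BOOL_ON 0
#endif

/*
 * Tristate option: y and m define different symbols, so both must be
 * tested. In kernel code this is what "#if IS_ENABLED(CONFIG_DEMO_TRI)"
 * is for.
 */
#if defined(CONFIG_DEMO_TRI) || defined(CONFIG_DEMO_TRI_MODULE)
#define DEMO_TRI_ON 1
#else
#define DEMO_TRI_ON 0
#endif

/* A plain #ifdef on a tristate misses the =m case. */
#ifdef CONFIG_DEMO_TRI
#define DEMO_TRI_BUILTIN 1
#else
#define DEMO_TRI_BUILTIN 0
#endif

int demo_bool_on(void) { return DEMO_BOOL_ON; }
int demo_tri_on(void) { return DEMO_TRI_ON; }
int demo_tri_builtin(void) { return DEMO_TRI_BUILTIN; }
```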

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: cleanup mailbox code comments etc
Bruce Allan [Wed, 9 Dec 2015 01:20:44 +0000 (17:20 -0800)]
fm10k: cleanup mailbox code comments etc

Clean up a number of issues with function header comments, lower-case
acronyms (e.g. FIFO, TLV), duplicate comments and a stubbed-out header
comment for fm10k_sm_mbx_init.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: use true/false for boolean get_host_state
Bruce Allan [Tue, 8 Dec 2015 23:51:11 +0000 (15:51 -0800)]
fm10k: use true/false for boolean get_host_state

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: remove unused struct element
Bruce Allan [Tue, 8 Dec 2015 23:51:04 +0000 (15:51 -0800)]
fm10k: remove unused struct element

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: constify fm10k_mac_ops, fm10k_iov_ops and fm10k_info structures
Bruce Allan [Tue, 8 Dec 2015 23:50:39 +0000 (15:50 -0800)]
fm10k: constify fm10k_mac_ops, fm10k_iov_ops and fm10k_info structures

These structures never change so declare them as const.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: address operator not needed when declaring function pointers
Bruce Allan [Tue, 8 Dec 2015 23:50:34 +0000 (15:50 -0800)]
fm10k: address operator not needed when declaring function pointers

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: use ether_addr_equal instead of memcmp
Jacob Keller [Mon, 16 Nov 2015 23:33:34 +0000 (15:33 -0800)]
fm10k: use ether_addr_equal instead of memcmp

When comparing MAC addresses, use ether_addr_equal instead of a memcmp
over ETH_ALEN bytes. Found and replaced using the following sed:

 sed -e 's/memcmp\x28\(.*\), ETH_ALEN\x29/!ether_addr_equal\x28\1\x29/'
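
Note the negation in the replacement: memcmp() returns 0 when the
addresses match, while the helper returns true. A minimal userspace
sketch of the equivalence follows; ether_addr_equal_model() is a
hypothetical stand-in, not the kernel implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6	/* length of an Ethernet MAC address in bytes */

/* Userspace stand-in for the kernel helper: true when both MACs match. */
bool ether_addr_equal_model(const uint8_t *a, const uint8_t *b)
{
	return memcmp(a, b, ETH_ALEN) == 0;
}

/* Old style: non-zero memcmp() result means "addresses differ". */
bool macs_differ_old(const uint8_t *a, const uint8_t *b)
{
	return memcmp(a, b, ETH_ALEN) != 0;
}

/* New style: the same test, written with the negated helper. */
bool macs_differ_new(const uint8_t *a, const uint8_t *b)
{
	return !ether_addr_equal_model(a, b);
}
```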

Reported-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: Cleanup exception handling for changing queues
Alexander Duyck [Tue, 10 Nov 2015 17:40:30 +0000 (09:40 -0800)]
fm10k: Cleanup exception handling for changing queues

This patch is meant to clean up the exception handling for the paths where
we reset the interrupts and then reconfigure them.  In all of these paths
we had very different levels of exception handling.  I have updated the
driver so that all of the paths should result in a similar state if we
fail.

Specifically the driver will now unload the mailbox interrupt, free the
queue vectors and MSI-X, and then detach the interface.

In addition for any of the PCIe related resets I have added a check with
the hw_ready function to just make sure the registers are in a readable
state prior to reopening the interface.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: correctly pack TLV structures and explain reasoning
Jacob Keller [Mon, 9 Nov 2015 22:04:08 +0000 (14:04 -0800)]
fm10k: correctly pack TLV structures and explain reasoning

The TLV format for little endian structures is actually a 4 byte
aligned copy. To this end, we need to add an additional __aligned(4)
marker along with __packed to ensure that these structures are actually
4 byte aligned and packed correctly. Use of just __packed will not work,
as this will result in 1 byte alignment, which is incorrect. Add a
comment explaining why these structures need the special treatment.
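
The effect of the two attributes can be checked with a small sketch,
assuming GCC or Clang attribute syntax and C11. The structures below are
hypothetical examples, not the fm10k TLV structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Packed only: no padding between members, but alignment drops to 1. */
struct demo_packed {
	uint8_t  tag;
	uint16_t id;
} __attribute__((packed));

/* Packed and aligned(4): members stay packed, the struct is 4 byte aligned. */
struct demo_packed_aligned {
	uint8_t  tag;
	uint16_t id;
} __attribute__((packed, aligned(4)));

_Static_assert(_Alignof(struct demo_packed) == 1, "packed alone gives 1 byte alignment");
_Static_assert(_Alignof(struct demo_packed_aligned) == 4, "aligned(4) restores 4 byte alignment");

/* Size is rounded up to a multiple of the alignment: 3 vs 4 bytes. */
size_t demo_packed_size(void) { return sizeof(struct demo_packed); }
size_t demo_packed_aligned_size(void) { return sizeof(struct demo_packed_aligned); }
```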

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: don't initialize fm10k_workqueue at global level
Bruce Allan [Tue, 3 Nov 2015 19:35:02 +0000 (11:35 -0800)]
fm10k: don't initialize fm10k_workqueue at global level

Cleans up checkpatch GLOBAL_INITIALIZERS error

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoibmveth: consolidate kmalloc of array, memset 0 to kcalloc
Nicholas Mc Guire [Sun, 20 Dec 2015 14:06:18 +0000 (15:06 +0100)]
ibmveth: consolidate kmalloc of array, memset 0 to kcalloc

This is an API consolidation only. The use of kmalloc + memset to 0
is equivalent to kcalloc in this case as it is allocating an array
of elements.
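
A userspace analogy of the consolidation, using malloc/memset versus
calloc in place of the kernel allocators. The function names are
hypothetical; the point is only that both paths yield a zeroed array.

```c
#include <stdlib.h>
#include <string.h>

/* Two-step form: allocate, then zero. The size product is open-coded. */
int *alloc_zeroed_two_step(size_t n)
{
	int *p = malloc(n * sizeof(*p));

	if (p)
		memset(p, 0, n * sizeof(*p));
	return p;
}

/* One-step form: the allocator takes count and element size and zeroes. */
int *alloc_zeroed_one_step(size_t n)
{
	return calloc(n, sizeof(int));
}
```

The array-style call also leaves the count-times-size multiplication to
the allocator instead of repeating it at each call site.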

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonetcp: fix regression in receive processing
Arnd Bergmann [Fri, 18 Dec 2015 14:18:08 +0000 (15:18 +0100)]
netcp: fix regression in receive processing

A cleanup patch I did was unfortunately wrong and introduced
multiple serious bugs in the netcp rx processing, as indicated
by these correct gcc warnings:

drivers/net/ethernet/ti/netcp_core.c:776:14: warning: 'buf_ptr' may be used uninitialized in this function [-Wuninitialized]
drivers/net/ethernet/ti/netcp_core.c:687:14: warning: 'ptr' may be used uninitialized in this function [-Wuninitialized]

I have checked the patch once more and found that a call to
get_pkt_info() accidentally got removed in netcp_free_rx_desc_chain,
and netcp_process_one_rx_packet no longer retrieved the correct
buffer length. This patch should fix all the known problems,
but I did not test on real hardware.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 899077791403 ("netcp: try to reduce type confusion in descriptors")
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoasix: silence log message from oversize packet
stephen hemminger [Fri, 18 Dec 2015 01:51:16 +0000 (17:51 -0800)]
asix: silence log message from oversize packet

Since it is possible for an external system to send oversize packets
at any time, it is best for the driver not to print a message and spam
the log (potential external DoS).

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=109471

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: diag: add support for request sockets to tcp_abort()
Eric Dumazet [Fri, 18 Dec 2015 00:14:11 +0000 (16:14 -0800)]
tcp: diag: add support for request sockets to tcp_abort()

Adding support for SYN_RECV request sockets to tcp_abort()
is quite easy after our tcp listener rewrite.

Note that we also need to better handle listeners, or we might
leak not yet accepted children, because of a missing
inet_csk_listen_stop() call.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bpf-misc-updates'
David S. Miller [Fri, 18 Dec 2015 21:04:52 +0000 (16:04 -0500)]
Merge branch 'bpf-misc-updates'

Daniel Borkmann says:

====================
Misc BPF updates

This series contains a couple of misc updates to the BPF code, among
others a new helper bpf_skb_load_bytes(), moving clearing of A/X to the
classic converter, etc. Please see individual patches for details.

Thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf, test: add couple of test cases
Daniel Borkmann [Thu, 17 Dec 2015 22:51:57 +0000 (23:51 +0100)]
bpf, test: add couple of test cases

Add a couple of test cases for the interpreter and also the JITs, e.g. to
test that when imm32 moves are being done, the upper 32 bits of the regs
are being zero extended.
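
The property being tested can be modelled in plain C. This is a sketch
of the register semantics only, with hypothetical function names, and is
not taken from the test suite.

```c
#include <stdint.h>

/* 32-bit move of an immediate: the upper 32 bits of the result are zero. */
uint64_t mov32_imm_model(int32_t imm)
{
	return (uint32_t)imm;
}

/* 64-bit move of a 32-bit immediate: the value is sign extended. */
uint64_t mov64_imm_model(int32_t imm)
{
	return (uint64_t)(int64_t)imm;
}
```

For an immediate of -1 the first yields 0x00000000ffffffff and the
second 0xffffffffffffffff, which is the difference a JIT must preserve.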

Without JIT:

  [...]
  [ 1114.129301] test_bpf: #43 MOV REG64 jited:0 128 PASS
  [ 1114.130626] test_bpf: #44 MOV REG32 jited:0 139 PASS
  [ 1114.132055] test_bpf: #45 LD IMM64 jited:0 124 PASS
  [...]

With JIT (generated code can as usual be nicely verified with the help of
bpf_jit_disasm tool):

  [...]
  [ 1062.726782] test_bpf: #43 MOV REG64 jited:1 6 PASS
  [ 1062.726890] test_bpf: #44 MOV REG32 jited:1 6 PASS
  [ 1062.726993] test_bpf: #45 LD IMM64 jited:1 6 PASS
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf, x86: detect/optimize loading 0 immediates
Daniel Borkmann [Thu, 17 Dec 2015 22:51:56 +0000 (23:51 +0100)]
bpf, x86: detect/optimize loading 0 immediates

When structs or variables need to be initialized/'memset' to 0 in an
eBPF C program, the x86 BPF JIT converts this to use immediates. We can
however save a couple of bytes (e.g. up to 7 bytes on a single emission
of BPF_LD | BPF_IMM | BPF_DW) in the image by detecting such a case and
using xor on the dst register instead.
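
To illustrate where the saving comes from, here is a simplified
encoder sketch. It is not the JIT code: emit_load_imm64_model() is a
hypothetical helper that handles only the low eight x86-64 registers
(encodings 0..7), so no REX.B/REX.R bits are needed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Emit "load 64-bit immediate into register reg" and return the number
 * of bytes written. buf must have room for at least 10 bytes.
 */
size_t emit_load_imm64_model(uint8_t *buf, unsigned int reg, uint64_t imm)
{
	if (imm == 0) {
		/* xor r64, r64: REX.W, opcode 0x31, ModRM with mod=11 */
		buf[0] = 0x48;
		buf[1] = 0x31;
		buf[2] = (uint8_t)(0xC0 | (reg << 3) | reg);
		return 3;
	}

	/* movabs r64, imm64: REX.W, opcode 0xB8 + reg, 8 immediate bytes */
	buf[0] = 0x48;
	buf[1] = (uint8_t)(0xB8 + reg);
	for (int i = 0; i < 8; i++)
		buf[2 + i] = (uint8_t)(imm >> (8 * i));	/* little endian */
	return 10;
}
```

In this model the zero case takes 3 bytes instead of 10, which matches
the 7 byte saving mentioned above. Unlike mov, xor also modifies the
flags register, which an emitter has to be allowed to clobber.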

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: fix misleading comment in bpf_convert_filter
Daniel Borkmann [Thu, 17 Dec 2015 22:51:55 +0000 (23:51 +0100)]
bpf: fix misleading comment in bpf_convert_filter

Comment says "User BPF's register A is mapped to our BPF register 6",
which is actually wrong as the mapping is on register 0. This can
already be inferred from the code itself. So just remove it before
someone makes assumptions based on that. Only code tells truth. ;)

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: move clearing of A/X into classic to eBPF migration prologue
Daniel Borkmann [Thu, 17 Dec 2015 22:51:54 +0000 (23:51 +0100)]
bpf: move clearing of A/X into classic to eBPF migration prologue

Back in the days where eBPF (or back then "internal BPF" ;->) was not
exposed to user space, and only the classic BPF programs internally
translated into eBPF programs, we missed the fact that for classic BPF
A and X needed to be cleared. It was fixed back then via 83d5b7ef99c9
("net: filter: initialize A and X registers"), and thus classic BPF
specifics were added to the eBPF interpreter core to work around it.

This added some confusion for JIT developers later on that take the
eBPF interpreter code as an example for deriving their JIT. F.e. in
f75298f5c3fe ("s390/bpf: clear correct BPF accumulator register"), at
least X could leak stack memory. Furthermore, since this is only needed
for classic BPF translations and not for eBPF (verifier takes care
that read access to regs cannot be done uninitialized), more complexity
is added to JITs as they need to determine whether they deal with
migrations or native eBPF where they can just omit clearing A/X in
their prologue and thus reduce image size a bit, see f.e. cde66c2d88da
("s390/bpf: Only clear A and X for converted BPF programs"). In other
cases (x86, arm64), A and X are being cleared in the prologue also for
the eBPF case, which is unnecessary.

Let's move this into the BPF migration in bpf_convert_filter() where it
actually belongs as long as the number of eBPF JITs is still small. It
can thus be done generically; allowing us to remove the quirk from
__bpf_prog_run() and to slightly reduce JIT image size in case of eBPF,
while reducing code duplication on this matter in current(/future) eBPF
JITs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Yang Shi <yang.shi@linaro.org>
Acked-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: add bpf_skb_load_bytes helper
Daniel Borkmann [Thu, 17 Dec 2015 22:51:53 +0000 (23:51 +0100)]
bpf: add bpf_skb_load_bytes helper

When hacking tc programs with eBPF, one of the issues that comes up
from time to time is loading addresses from headers. In eBPF as in
classic BPF, we have BPF_LD | BPF_ABS | BPF_{B,H,W} instructions that
extract a byte, half-word or word out of the skb data through helpers
such as bpf_load_pointer() (interpreter case).

F.e. extracting a whole IPv6 address could possibly look like ...

  union v6addr {
    struct {
      __u32 p1;
      __u32 p2;
      __u32 p3;
      __u32 p4;
    };
    __u8 addr[16];
  };

  [...]

  a.p1 = htonl(load_word(skb, off));
  a.p2 = htonl(load_word(skb, off +  4));
  a.p3 = htonl(load_word(skb, off +  8));
  a.p4 = htonl(load_word(skb, off + 12));

  [...]

  /* access to a.addr[...] */

This work adds a complementary helper bpf_skb_load_bytes() (we also
have bpf_skb_store_bytes()) as an alternative. From an eBPF program,
the same load would look like:

  ret = bpf_skb_load_bytes(skb, off, addr, sizeof(addr));

Same verifier restrictions apply as in ffeedafbf023 ("bpf: introduce
current->pid, tgid, uid, gid, comm accessors") case, where stack memory
access needs to be statically verified and thus guaranteed to be
initialized in first use (otherwise verifier cannot tell whether a
subsequent access to it is valid or not as it's runtime dependent).
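
As a rough userspace model of the helper's contract (copy n bytes from
an offset into a caller buffer, or fail without a partial copy), one
might write the following. struct pkt_model and pkt_load_bytes_model()
are hypothetical, and the error value is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for linear packet data; not struct sk_buff. */
struct pkt_model {
	const uint8_t *data;
	size_t len;
};

/* Copy n bytes starting at off into to; reject out-of-bounds ranges. */
int pkt_load_bytes_model(const struct pkt_model *pkt, size_t off,
			 void *to, size_t n)
{
	if (off > pkt->len || n > pkt->len - off)
		return -EFAULT;	/* assumed error code for this sketch */
	memcpy(to, pkt->data + off, n);
	return 0;
}
```

A 16 byte IPv6 address is then fetched with one bounds-checked call
rather than four separate word loads.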

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Fri, 18 Dec 2015 20:37:42 +0000 (15:37 -0500)]
Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains the first batch of Netfilter updates for
the upcoming 4.5 kernel. This batch contains userspace netfilter header
compilation fixes, support for packet mangling in nf_tables, the new
tracing infrastructure for nf_tables and cgroup2 support for iptables.
More specifically, they are:

1) Two patches to include dependencies in our netfilter userspace
   headers to resolve compilation problems, from Mikko Rapeli.

2) Four cosmetic cleanup patches for the ebtables codebase, from Ian Morris.

3) Remove duplicate include in the netfilter reject infrastructure,
   from Stephen Hemminger.

4) Two patches to simplify the netfilter defragmentation code for IPv6,
   from Florian Westphal.

5) Fix root ownership of /proc/net netfilter for unprivileged net
   namespaces, from Philip Whineray.

6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal.

7) Add mangling support to our nf_tables payload expression, from
   Patrick McHardy.

8) Introduce a new netlink-based tracing infrastructure for nf_tables,
   from Florian Westphal.

9) Change setter functions in nfnetlink_log to be void, from
    Rami Rosen.

10) Add netns support to the cttimeout infrastructure.

11) Add cgroup2 support to iptables, from Tejun Heo.

12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian.

13) Add support for mangling pkttype in the nf_tables meta expression,
    also from Florian.

BTW, I need you to pull net into net-next, I have another batch that
requires changes that I don't yet see in net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonfp: call netif_carrier_off() during init
Jakub Kicinski [Thu, 17 Dec 2015 14:18:44 +0000 (14:18 +0000)]
nfp: call netif_carrier_off() during init

Netdevs default to carrier on, we should call netif_carrier_off()
during initialization since we handle carrier state changes in the
driver.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Rolf Neugebauer <rolf.neugebauer@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'l3mdev-accept'
David S. Miller [Fri, 18 Dec 2015 19:43:39 +0000 (14:43 -0500)]
Merge branch 'l3mdev-accept'

David Ahern says:

====================
net: Allow accepted sockets to be bound to l3mdev domain

Allow accepted sockets to derive their sk_bound_dev_if setting from the
l3mdev domain in which the packets originated. This version adds a sysctl
to control whether the setting is inherited, making the functionality
similar to sk_mark and its sysctl_tcp_fwmark_accept setting.

This effectively allows a process to have a "VRF-global" listen socket,
with child sockets bound to the VRF device in which the packet originated.
A similar behavior can be achieved using sk_mark, but a solution using marks
is incomplete as it does not handle duplicate addresses in different L3
domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
domain provides a complete solution.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: Allow accepted sockets to be bound to l3mdev domain
David Ahern [Wed, 16 Dec 2015 21:20:44 +0000 (13:20 -0800)]
net: Allow accepted sockets to be bound to l3mdev domain

Allow accepted sockets to derive their sk_bound_dev_if setting from the
l3mdev domain in which the packets originated. A sysctl setting is added
to control the behavior which is similar to sk_mark and
sysctl_tcp_fwmark_accept.

This effectively allows a process to have a "VRF-global" listen socket,
with child sockets bound to the VRF device in which the packet originated.
A similar behavior can be achieved using sk_mark, but a solution using marks
is incomplete as it does not handle duplicate addresses in different L3
domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
domain provides a complete solution.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: l3mdev: Add master device lookup by index
David Ahern [Wed, 16 Dec 2015 21:20:43 +0000 (13:20 -0800)]
net: l3mdev: Add master device lookup by index

Add a helper to look up the l3mdev master index given a device index.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv6: addrconf: use stable address generator for ARPHRD_NONE
Bjørn Mork [Wed, 16 Dec 2015 15:44:38 +0000 (16:44 +0100)]
ipv6: addrconf: use stable address generator for ARPHRD_NONE

Add a new address generator mode, using the stable address generator
with an automatically generated secret. This is intended as a default
address generator mode for device types with no EUI64 implementation.
The new generator is used for ARPHRD_NONE interfaces initially, adding
default IPv6 autoconf support to e.g. tun interfaces.

If the addrgenmode is set to 'random', either by default or manually,
and no stable secret is available, then a random secret is used as
input for the stable-privacy address generator.  The secret can be
read and modified like manually configured secrets, using the proc
interface.  Modifying the secret will change the addrgen mode to
'stable-privacy' to indicate that it operates on a known secret.

Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
a known secret is available when the device is created, then the mode
will default to 'stable-privacy' as before.  The mode can be manually
set to 'random' but it will behave exactly like 'stable-privacy' in
this case. The secret will not change.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: 吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoila: add NETFILTER dependency
Arnd Bergmann [Fri, 18 Dec 2015 14:37:37 +0000 (15:37 +0100)]
ila: add NETFILTER dependency

The recently added generic ILA translation facility fails to
build when CONFIG_NETFILTER is disabled:

net/ipv6/ila/ila_xlat.c:229:20: warning: 'struct nf_hook_state' declared inside parameter list
net/ipv6/ila/ila_xlat.c:235:27: error: array type has incomplete element type 'struct nf_hook_ops'
 static struct nf_hook_ops ila_nf_hook_ops[] __read_mostly = {

This adds an explicit Kconfig dependency to avoid that case.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility")
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonetfilter: meta: add support for setting skb->pkttype
Florian Westphal [Thu, 10 Dec 2015 17:04:07 +0000 (18:04 +0100)]
netfilter: meta: add support for setting skb->pkttype

This allows redirecting bridged packets to the local machine:

ether type ip ether daddr set aa:53:08:12:34:56 meta pkttype set unicast
Without 'set unicast', ip stack discards PACKET_OTHERHOST skbs.

It is also useful to add support for a '-m cluster like' nft rule
(where switch floods packets to several nodes, and each cluster node
 processes a subset of packets for load distribution).

Mangling is restricted to HOST/OTHER/BROAD/MULTICAST, i.e. you cannot set
skb->pkt_type to PACKET_KERNEL or change PACKET_LOOPBACK to PACKET_HOST.
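
The restriction can be sketched as a small check. The DEMO_ constants
mirror the PACKET_* values from <linux/if_packet.h>; the function names
are hypothetical, and this is a model of the rule stated above rather
than the nft_meta code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror PACKET_* in <linux/if_packet.h>; names are local. */
enum {
	DEMO_PACKET_HOST	= 0,
	DEMO_PACKET_BROADCAST	= 1,
	DEMO_PACKET_MULTICAST	= 2,
	DEMO_PACKET_OTHERHOST	= 3,
	DEMO_PACKET_LOOPBACK	= 5,
	DEMO_PACKET_KERNEL	= 7,
};

/* Only HOST/BROADCAST/MULTICAST/OTHERHOST may take part in mangling. */
bool demo_pkt_type_ok(uint8_t type)
{
	return type <= DEMO_PACKET_OTHERHOST;
}

/* Return the new pkt_type, or the old one if the change is not allowed. */
uint8_t demo_set_pkt_type(uint8_t cur, uint8_t wanted)
{
	if (demo_pkt_type_ok(cur) && demo_pkt_type_ok(wanted))
		return wanted;
	return cur;
}
```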

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Fri, 18 Dec 2015 03:08:28 +0000 (22:08 -0500)]
Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller <davem@davemloft.net>