cascardo/linux.git
8 years agosock: convert sk_peek_offset functions to WRITE_ONCE
Willem de Bruijn [Tue, 5 Apr 2016 16:41:14 +0000 (12:41 -0400)]
sock: convert sk_peek_offset functions to WRITE_ONCE

Make the peek offset interface safe to use in lockless environments.
Use READ_ONCE and WRITE_ONCE to avoid race conditions between testing
and updating the peek offset.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 5 Apr 2016 20:26:31 +0000 (16:26 -0400)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-04-05

This series contains updates to i40e and i40evf only.

Stefan converts dev_close() to ndo_stop() for ethtool offline self test,
since dev_close() causes IFF_UP to be cleared which will remove the
interface routes and addresses.

Alex bumps up the size of the transmit data buffer to 12K rather than 8K,
which provides a gain in throughput and a reduction in overhead for
putting together the frame.  Fixed an issue in the polling routines where
we were using bitwise operators to avoid the side effects of the
logical operators.  Then added support for bulk transmit clean for skbs.

Jesse fixed a sparse issue in the type casting in the transmit code and
fixed i40e_aq_set_phy_debug() to use i40e_status as a return code.

Catherine cleans up duplicated code.

Shannon fixed the cleaning up of the interrupt handling to clean up the
IRQs only if we actually got them set up.  Also fixed up the error
scenarios where we were trying to remove a non-existent timer or
worktask, which causes the kernel heartburn.

Mitch changes the notification of resets to the reset interrupt handler,
instead of the actual reset initiation code.  This allows the VFs to get
properly notified for all resets, including resets initiated by different
PFs on the same physical device.  Also moved the clearing of VFLR bit
after reset processing, instead of before which could lead to double
resets on VF init.  Fixed code comment to match the actual function name.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bnxt_en-next'
David S. Miller [Tue, 5 Apr 2016 20:20:49 +0000 (16:20 -0400)]
Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: Updates for net-next.

Update to latest firmware interface, add EEE feature, unsupported SFP+
module warning, and ethtool -s improvements.

v2: Removed the GEEPROM patch and added more comments to the get_eee patch.
====================

8 years agobnxt_en: Improve ethtool .get_settings().
Michael Chan [Tue, 5 Apr 2016 18:09:03 +0000 (14:09 -0400)]
bnxt_en: Improve ethtool .get_settings().

If autoneg is off, we should always report the speed and duplex settings
even if it is link down so the user knows the current settings.  The
unknown speed and duplex should only be used for autoneg when link is
down.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Check for valid forced speed during ethtool -s.
Michael Chan [Tue, 5 Apr 2016 18:09:02 +0000 (14:09 -0400)]
bnxt_en: Check for valid forced speed during ethtool -s.

Check that the forced speed is a valid speed supported by firmware.
If not supported, return -EINVAL.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Add unsupported SFP+ module warnings.
Michael Chan [Tue, 5 Apr 2016 18:09:01 +0000 (14:09 -0400)]
bnxt_en: Add unsupported SFP+ module warnings.

Add the PORT_CONN_NOT_ALLOWED async event handling logic.  The driver
will print an appropriate warning to reflect the SFP+ module enforcement
policy done in the firmware.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Set async event bits when registering with the firmware.
Michael Chan [Tue, 5 Apr 2016 18:09:00 +0000 (14:09 -0400)]
bnxt_en: Set async event bits when registering with the firmware.

Currently, the driver only sets bit 0 of the async_event_fwd fields.
To be compatible with the latest spec, we need to set the
appropriate event bits handled by the driver.  We should be handling
link change and PF driver unload events, so these 2 bits should be
set.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Add get_eee() and set_eee() ethtool support.
Michael Chan [Tue, 5 Apr 2016 18:08:59 +0000 (14:08 -0400)]
bnxt_en: Add get_eee() and set_eee() ethtool support.

Allow users to get|set EEE parameters.

v2: Added comment for preserving the tx_lpi_timer value in get_eee.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Add EEE setup code.
Michael Chan [Tue, 5 Apr 2016 18:08:58 +0000 (14:08 -0400)]
bnxt_en: Add EEE setup code.

1. Add bnxt_hwrm_set_eee() function to setup EEE firmware parameters based
on the bp->eee settings.
2. The new function bnxt_eee_config_ok() will check if EEE parameters need
to be modified due to autoneg changes.
3. bnxt_hwrm_set_link() has added a new parameter to update EEE.  If the
parameter is set, it will call bnxt_hwrm_set_eee().

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Add basic EEE support.
Michael Chan [Tue, 5 Apr 2016 18:08:57 +0000 (14:08 -0400)]
bnxt_en: Add basic EEE support.

Get EEE capability and the initial EEE settings from firmware.
Add "EEE is active | not active" to link up dmesg.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Improve flow control autoneg with Firmware 1.2.1 interface.
Michael Chan [Tue, 5 Apr 2016 18:08:56 +0000 (14:08 -0400)]
bnxt_en: Improve flow control autoneg with Firmware 1.2.1 interface.

Make use of the new AUTONEG_PAUSE bit in the new interface to better
control autoneg flow control settings, independent of RX and TX
advertisement settings.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnxt_en: Update to Firmware 1.2.2 spec.
Michael Chan [Tue, 5 Apr 2016 18:08:55 +0000 (14:08 -0400)]
bnxt_en: Update to Firmware 1.2.2 spec.

Use new field names in API structs and stop using deprecated fields
auto_link_speed and auto_duplex in phy_cfg/phy_qcfg structs.

Update copyright year to 2016.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 5 Apr 2016 20:08:02 +0000 (16:08 -0400)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2016-04-05

This series contains updates to fm10k only.

Bruce provides nearly half of the patches in the series, most of which do
general cleanup of the driver.  These include semantic cleanups,
checkpatch.pl fixes, update driver to use BIT() kernel macro, use
BUILD_BUG_ON() where appropriate and use ether_addr_copy() instead of
memcpy().

Jake provides the remaining patches in the series, starting with a fix
for a possible NULL pointer deference.  Next delays initialization of the
service timer and service task until late in probe().  If we do not wait,
failures in probe do not properly cleanup the service timer or service
task items which result in a kernel panic.  Added better reporting during
error conditions.  Fixed another possible kernel panic where we were
clearing the interrupt scheme before we freed the mailbox IRQ.  Added
helper functions for setting strings and data for ethtool stats.  Fixed
comment mis-spelled words.

v2: Dropped patch 3 from the original submission, until a better solution
    can be worked up based on feedback from Joe Perches and David Miller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofm10k: use ethtool_rxfh_indir_default for default redirection table
Jacob Keller [Wed, 17 Feb 2016 00:19:24 +0000 (16:19 -0800)]
fm10k: use ethtool_rxfh_indir_default for default redirection table

The fm10k driver used its own code for generating a default indirection
table on device load, which was not the same as the default generated by
ethtool when indir_size of 0 is passed to SRXFH. Take advantage of
ethtool_rxfh_indir_default() and simplify code to write the redirection
table to reduce some code duplication.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: fix a minor typo in some comments
Jacob Keller [Wed, 10 Feb 2016 22:45:51 +0000 (14:45 -0800)]
fm10k: fix a minor typo in some comments

s/funciton/function to resolve a typo, and cleanup grammar on a few
comments regarding processing the VF mailboxes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: correctly clean up when init_queueing_scheme fails
Jacob Keller [Wed, 10 Feb 2016 22:45:50 +0000 (14:45 -0800)]
fm10k: correctly clean up when init_queueing_scheme fails

Fix a kernel panic that occurs during surprise removal. Clear the
interface queue counts upon fm10k_init_msix_capability failure. This
prevents further code (fm10k_update_stats etc.) from attempting to
access unallocated queue vector or ring memory.

[  628.692648] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
[  628.692805] IP: [<ffffffffa0475caf>] fm10k_update_stats+0x7f/0x2c0 [fm10k]
[  628.693173] PGD 0
[  628.693759] Oops: 0000 [#1] SMP
[  628.699321] CPU: 10 PID: 8164 Comm: kworker/10:0 Tainted: G           OE  ------------   3.10.0-327.el7.x86_64 #1
[  628.700096] Hardware name: Supermicro X9DAi/X9DAi, BIOS 3.2 05/09/2015
[  628.700894] Workqueue: pciehp-1 pciehp_power_thread
[  628.701686] task: ffff88086559c500 ti: ffff8808593c0000 task.ti: ffff8808593c0000
[  628.702493] RIP: 0010:[<ffffffffa0475caf>]  [<ffffffffa0475caf>] fm10k_update_stats+0x7f/0x2c0 [fm10k]
[  628.703310] RSP: 0018:ffff8808593c3b00  EFLAGS: 00010282
[  628.704132] RAX: 0000000000000000 RBX: ffff880860760000 RCX: 0000000000000000
[  628.704963] RDX: ffff880860760b08 RSI: 0000000000000000 RDI: 0000000000000000
[  628.705794] RBP: ffff8808593c3b40 R08: 0000000000000000 R09: 0000000000000000
[  628.706604] R10: 0000000000000000 R11: ffff880860760c40 R12: 0000000000000080
[  628.707420] R13: ffff8808607608c0 R14: ffff880860779ec0 R15: ffff880860779f40
[  628.708238] FS:  0000000000000000(0000) GS:ffff88086f000000(0000) knlGS:0000000000000000
[  628.709071] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  628.709923] CR2: 0000000000000068 CR3: 000000000194a000 CR4: 00000000001407e0
[  628.710752] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  628.711596] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  628.712438] Stack:
[  628.713255]  ffff880860764458 ffff8808607608c0 ffff880860760000 ffff880860760000
[  628.714088]  0000000000000080 ffff8808607608c0 ffff880860779ec0 ffff880860779f40
[  628.714925]  ffff8808593c3b88 ffffffffa04780c5 ffff880860764458 0000000a8163cb5b
[  628.715752] Call Trace:
[  628.716560]  [<ffffffffa04780c5>] fm10k_down+0x155/0x1f0 [fm10k]
[  628.717367]  [<ffffffffa0479958>] fm10k_close+0x28/0xd0 [fm10k]
[  628.718184]  [<ffffffff81526365>] __dev_close_many+0x85/0xd0
[  628.718986]  [<ffffffff815264d8>] dev_close_many+0x98/0x120
[  628.719764]  [<ffffffff81527ab8>] rollback_registered_many+0xa8/0x230
[  628.720527]  [<ffffffff81527c80>] rollback_registered+0x40/0x70
[  628.721294]  [<ffffffff81529198>] unregister_netdevice_queue+0x48/0x80
[  628.722052]  [<ffffffff815291ec>] unregister_netdev+0x1c/0x30
[  628.722816]  [<ffffffffa04762b8>] fm10k_remove+0xd8/0xe0 [fm10k]
[  628.723581]  [<ffffffff81328c7b>] pci_device_remove+0x3b/0xb0
[  628.724340]  [<ffffffff813f5fbf>] __device_release_driver+0x7f/0xf0
[  628.725088]  [<ffffffff813f6053>] device_release_driver+0x23/0x30
[  628.725814]  [<ffffffff81321fe4>] pci_stop_bus_device+0x94/0xa0
[  628.726535]  [<ffffffff813220d2>] pci_stop_and_remove_bus_device+0x12/0x20
[  628.727249]  [<ffffffff8133de40>] pciehp_unconfigure_device+0xb0/0x1b0
[  628.727964]  [<ffffffff8133d822>] pciehp_disable_slot+0x52/0xd0
[  628.728664]  [<ffffffff8133d98a>] pciehp_power_thread+0xea/0x150
[  628.729358]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[  628.730036]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[  628.730730]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[  628.731385]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
[  628.732036]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[  628.732674]  [<ffffffff81645858>] ret_from_fork+0x58/0x90
[  628.733289]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[  628.733883] Code: 83 e8 01 48 8d 97 40 02 00 00 45 31 c0 4c 8d 9c c7 48 02 0
[  628.735202] RIP  [<ffffffffa0475caf>] fm10k_update_stats+0x7f/0x2c0 [fm10k]
[  628.735732]  RSP <ffff8808593c3b00>
[  628.736285] CR2: 0000000000000068
[  628.736846] ---[ end trace 9156088b311aff42 ]---

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: prevent possibly uninitialized variable
Bruce Allan [Wed, 10 Feb 2016 22:45:47 +0000 (14:45 -0800)]
fm10k: prevent possibly uninitialized variable

If 'attr_flag < (1 << (2 * FM10K_TEST_MSG_NESTED))' is ever false, err
will be used uninitialized.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: add helper functions to set strings and data for ethtool stats
Jacob Keller [Fri, 5 Feb 2016 18:43:08 +0000 (10:43 -0800)]
fm10k: add helper functions to set strings and data for ethtool stats

Reduce duplicate code and the amount of indentation by adding
fm10k_add_stat_strings and fm10k_add_ethtool_stats functions which help
add fm10k_stat structures to the ethtool stats callbacks. This helps
increase ease of use for future stat additions, and increases code
readability. Skip handling of the per-queue stats as these will be
reworked in a following patch.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: free MBX IRQ before clearing interrupt scheme
Jacob Keller [Thu, 4 Feb 2016 18:47:58 +0000 (10:47 -0800)]
fm10k: free MBX IRQ before clearing interrupt scheme

During fm10k_io_error_detected we were clearing the interrupt scheme
before we freed the MBX IRQ. This causes a kernel panic because the MBX
IRQ are assigned after MSI-X initialization. Clearing the interrupt
scheme results in removing the MSI-X entry table. Fix this by freeing
the MBX IRQ before we clear the interrupt scheme, as we do elsewhere in
the driver.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: print error message when stop_hw fails
Jacob Keller [Thu, 4 Feb 2016 18:47:57 +0000 (10:47 -0800)]
fm10k: print error message when stop_hw fails

fm10k_stop_hw_generic calls fm10k_disable_queues_generic, which may
return an error code indicating that the queues were not stopped within
the time limit. Notify the user by displaying a message in the kernel
message ring, in a similar way to how we notify the user when reset_hw
fails. There isn't much we can do to recover from this error, so
currently nothing else is done.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: base queue scheme covered by RSS
Jacob Keller [Thu, 4 Feb 2016 18:47:56 +0000 (10:47 -0800)]
fm10k: base queue scheme covered by RSS

In fm10k_set_num_queues, we previously assigned the base template. This
would always be overwritten by either fm10k_set_qos_queues or
fm10k_set_rss_queues. In either case, we don't need the base values, so
we can just remove them.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: don't initialize service task until later in probe
Jacob Keller [Thu, 4 Feb 2016 18:47:55 +0000 (10:47 -0800)]
fm10k: don't initialize service task until later in probe

Delay initialization of the service timer and service task until late
probe. If we don't wait, failures in probe do not properly cleanup the
service timer or service task items, which results in the kernel panic
below, potentially freezing the whole system. In addition, ensure that
the SERVICE_DISABLE bit is set before we request the MBX IRQ since the
MBX interrupt attempts to schedule the service task otherwise. This
prevents a similar trace from occurring after this change.

We didn't notice this issue before because probe almost always completes
successfully. I discovered it due to a mis-ordered mailbox handler
array, which resulted in the following failure when requesting mailbox
interrupt.

[  555.325619] ------------[ cut here ]------------
[  555.325628] WARNING: CPU: 0 PID: 4941 at lib/list_debug.c:33 __list_add+0xa0/0xd0()
[  555.325631] list_add corruption. prev->next should be next (ffffffff81f46648), but was           (null). (prev=ffff8807fad5d0e8).
<snip>
[  555.325722] CPU: 0 PID: 4941 Comm: insmod Tainted: G           OE   4.0.4-303.fc22.x86_64 #1
[  555.325725] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.8x23.060520140825 06/05/2014
[  555.325727]  0000000000000000 00000000b4f161b3 ffff88081a21f8e8 ffffffff81783124
[  555.325734]  0000000000000000 ffff88081a21f940 ffff88081a21f928 ffffffff8109c66a
[  555.325740]  0000000064000000 ffff8807fad5d0e8 ffff8807fad5d0e8 ffffffff81f46648
[  555.325746] Call Trace:
[  555.325752]  [<ffffffff81783124>] dump_stack+0x45/0x57
[  555.325757]  [<ffffffff8109c66a>] warn_slowpath_common+0x8a/0xc0
[  555.325759]  [<ffffffff8109c6f5>] warn_slowpath_fmt+0x55/0x70
[  555.325763]  [<ffffffff813ba270>] __list_add+0xa0/0xd0
[  555.325768]  [<ffffffff81102d1d>] __internal_add_timer+0x9d/0x110
[  555.325771]  [<ffffffff81102dbf>] internal_add_timer+0x2f/0xc0
[  555.325774]  [<ffffffff81104e5a>] mod_timer+0x12a/0x230
[  555.325782]  [<ffffffffa03d54ca>] fm10k_probe+0x69a/0xc80 [fm10k]
[  555.325787]  [<ffffffff813e8355>] local_pci_probe+0x45/0xa0
[  555.325791]  [<ffffffff8129cf42>] ? sysfs_do_create_link_sd.isra.2+0x72/0xc0
[  555.325794]  [<ffffffff813e96b9>] pci_device_probe+0xf9/0x150
[  555.325799]  [<ffffffff814d7e73>] driver_probe_device+0xa3/0x400
[  555.325802]  [<ffffffff814d82ab>] __driver_attach+0x9b/0xa0
[  555.325805]  [<ffffffff814d8210>] ? __device_attach+0x40/0x40
[  555.325808]  [<ffffffff814d5bd3>] bus_for_each_dev+0x73/0xc0
[  555.325811]  [<ffffffff814d78ce>] driver_attach+0x1e/0x20
[  555.325815]  [<ffffffff814d7480>] bus_add_driver+0x180/0x250
[  555.325819]  [<ffffffffa03b2000>] ? 0xffffffffa03b2000
[  555.325823]  [<ffffffff814d8aa4>] driver_register+0x64/0xf0
[  555.325826]  [<ffffffff813e7bec>] __pci_register_driver+0x4c/0x50
[  555.325832]  [<ffffffffa03d6ca3>] fm10k_register_pci_driver+0x23/0x30 [fm10k]
[  555.325838]  [<ffffffffa03b2080>] fm10k_init_module+0x80/0x1000 [fm10k]
[  555.325843]  [<ffffffff81002128>] do_one_initcall+0xb8/0x200
[  555.325848]  [<ffffffff811e10d2>] ? __vunmap+0xa2/0x100
[  555.325852]  [<ffffffff811fe239>] ? kmem_cache_alloc_trace+0x1b9/0x240
[  555.325855]  [<ffffffff8178230e>] ? do_init_module+0x28/0x1cb
[  555.325858]  [<ffffffff81782346>] do_init_module+0x60/0x1cb
[  555.325862]  [<ffffffff8112168e>] load_module+0x205e/0x26b0
[  555.325866]  [<ffffffff8111d110>] ? store_uevent+0x70/0x70
[  555.325870]  [<ffffffff812234b0>] ? kernel_read+0x50/0x80
[  555.325873]  [<ffffffff81121f3e>] SyS_finit_module+0xbe/0xf0
[  555.325878]  [<ffffffff81789749>] system_call_fastpath+0x12/0x17
[  555.325880] ---[ end trace 9e0f58d071eafd2a ]---

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: prevent null pointer dereference of msix_entries table
Jacob Keller [Thu, 4 Feb 2016 18:47:54 +0000 (10:47 -0800)]
fm10k: prevent null pointer dereference of msix_entries table

According to the C standard dereferencing a variable before it is
checked invokes undefined behavior, and thus compilers are free to
assume the check for NULL isn't necessary. Prevent this by re-ordering
the NULL check of msix_entries in fm10k_free_mbx_irq.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: use ether_addr_copy to copy MAC address
Bruce Allan [Tue, 29 Dec 2015 02:00:30 +0000 (18:00 -0800)]
fm10k: use ether_addr_copy to copy MAC address

Cleanup the remaining instances of using memcpy() instead of the preferred
ether_addr_copy().

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: cleanup SPACE_BEFORE_TAB checkpatch warning
Bruce Allan [Tue, 22 Dec 2015 22:55:26 +0000 (14:55 -0800)]
fm10k: cleanup SPACE_BEFORE_TAB checkpatch warning

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: demote BUG_ON() to WARN_ON() where appropriate
Bruce Allan [Tue, 22 Dec 2015 22:55:20 +0000 (14:55 -0800)]
fm10k: demote BUG_ON() to WARN_ON() where appropriate

We don't need to crash the kernel in this instance so just warn about the
condition and play on.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: cleanup remaining right-bit-shifted 1
Bruce Allan [Tue, 22 Dec 2015 21:43:49 +0000 (13:43 -0800)]
fm10k: cleanup remaining right-bit-shifted 1

Use BIT() macro instead.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: Move constants to the right of binary operators
Bruce Allan [Tue, 22 Dec 2015 21:43:44 +0000 (13:43 -0800)]
fm10k: Move constants to the right of binary operators

The semantic patch that makes this change is available
in scripts/coccinelle/misc/compare_const_fl.cocci.

More information about semantic patching is available at
http://coccinelle.lip6.fr/

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e/i40evf: Bump patch from 1.4.25 to 1.5.1
Catherine Sullivan [Thu, 10 Mar 2016 22:59:51 +0000 (14:59 -0800)]
i40e/i40evf: Bump patch from 1.4.25 to 1.5.1

Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: Change comment to reflect correct function name
Mitch Williams [Thu, 10 Mar 2016 22:59:50 +0000 (14:59 -0800)]
i40e: Change comment to reflect correct function name

Minor correction in the comment to reflect the correct function name

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40evf: Add additional check for reset
Mitch Williams [Thu, 10 Mar 2016 22:59:49 +0000 (14:59 -0800)]
i40evf: Add additional check for reset

If the driver happens to read a register during the time in which the
device is undergoing reset, it will receive a value of 0xdeadbeef
instead of a valid value. Unfortunately, the driver may misinterpret
this as a valid value, especially if it's just looking for individual
bits.

Add an explicit check for this value when we are looking for admin queue
errors, and trigger reset recovery if we find it.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: Change unknown event error msg to ignore message
Shannon Nelson [Thu, 10 Mar 2016 22:59:48 +0000 (14:59 -0800)]
i40e: Change unknown event error msg to ignore message

There's no real error in an unknown event from the Firmware, we're just
posting a useful FYI notice, so this patch simply removes the "Error" word.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: Added code to prevent double resets
Mitch Williams [Thu, 10 Mar 2016 22:59:47 +0000 (14:59 -0800)]
i40e: Added code to prevent double resets

Clear the VFLR bit after reset processing, instead of before. This
prevents double resets on VF init.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: Notify VFs of all resets
Mitch Williams [Thu, 10 Mar 2016 22:59:46 +0000 (14:59 -0800)]
i40e: Notify VFs of all resets

Notify VFs in the reset interrupt handler, instead of the actual
reset initiation code. This allows the VFs to get properly notified for
all resets, including resets initiated by different PFs on the same
physical device.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: Remove timer and task only if created
Shannon Nelson [Thu, 10 Mar 2016 22:59:45 +0000 (14:59 -0800)]
i40e: Remove timer and task only if created

In some error scenarios, we may find ourselves trying to remove a
non-existent timer or worktask.  This causes the kernel some bit
of consternation, so don't do it.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoMerge branch 'mlxsw-next'
David S. Miller [Tue, 5 Apr 2016 19:07:55 +0000 (15:07 -0400)]
Merge branch 'mlxsw-next'

Jiri Pirko says:

====================
mlxsw: small driver update, including switchdev doc update

Ido Schimmel (3):
  mlxsw: spectrum: Reduce number of supported 802.1D bridges
  switchdev: Use switch ID in suggested udev rule
  mlxsw: spectrum: Add support for physical port names
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum: Add support for physical port names
Ido Schimmel [Tue, 5 Apr 2016 08:20:04 +0000 (10:20 +0200)]
mlxsw: spectrum: Add support for physical port names

Export to userspace the front panel name of the port, so that udev can
rename the ports accordingly. The convention suggested by switchdev
documentation is used:

1) Non-split: pX
2) Split: pXsY

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoswitchdev: Use switch ID in suggested udev rule
Ido Schimmel [Tue, 5 Apr 2016 08:20:03 +0000 (10:20 +0200)]
switchdev: Use switch ID in suggested udev rule

Since there can be multiple switch ASICs on the same system we should
use the switch ID in order to differentiate between them and set the
switch name (e.g. swX) accordingly.

Also, replace the order of the "Switch ID" and "Port Netdev Naming"
sections following the above change.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum: Reduce number of supported 802.1D bridges
Ido Schimmel [Tue, 5 Apr 2016 08:20:02 +0000 (10:20 +0200)]
mlxsw: spectrum: Reduce number of supported 802.1D bridges

Resources allocated for these bridges at init time cannot be later used
for other purposes. While current number is supported by the device,
it's mostly theoretical with regards to any real use case, which leads
to poor utilization of device's resources. Solve that by reducing the
number.

The long term plan is to make this value (along with others) user
configurable via devlink and write it to NVRAM, so that it can be used
during the next init. Until then we must hardcode such values.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoi40e: Assure that adminq is alive in debug mode
Shannon Nelson [Thu, 10 Mar 2016 22:59:44 +0000 (14:59 -0800)]
i40e: Assure that adminq is alive in debug mode

When dropping into debug mode in a failed probe, make sure that
the AdminQ is left alive for possible hand debug of driver and
firmware states.

Move the mutex_init calls earlier in probe so that if init fails,
the admin queue interface is still available for debugging purposes.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: Remove MSIx only if created
Shannon Nelson [Thu, 10 Mar 2016 22:59:43 +0000 (14:59 -0800)]
i40e: Remove MSIx only if created

When cleaning up the interrupt handling, clean up the IRQs only if
we actually got them set up.  There are a couple of error recovery
paths that were violating this and causing the kernel a bit of
indigestion.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Reviewed-by: Williams, Mitch A <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: Fix up return code
Jesse Brandeburg [Thu, 10 Mar 2016 22:59:42 +0000 (14:59 -0800)]
i40e: Fix up return code

The i40e_common.c typically uses i40e_status as a return code,
but got missed this one case.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: Save off VSI resource count when updating VSI
Kevin Scott [Thu, 10 Mar 2016 22:59:41 +0000 (14:59 -0800)]
i40e: Save off VSI resource count when updating VSI

When updating a VSI, save off the number of allocated and unallocated
VSIs as we do when adding a VSI.

Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e/i40evf: Remove I40E_MAX_USER_PRIORITY define
Catherine Sullivan [Thu, 10 Mar 2016 22:59:40 +0000 (14:59 -0800)]
i40e/i40evf: Remove I40E_MAX_USER_PRIORITY define

This patch removes the duplicate definition of I40E_MAX_USER_PRIORITY
in i40e.h that is not needed.

Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e/i40evf: Fix casting in transmit code
Jesse Brandeburg [Thu, 10 Mar 2016 22:59:39 +0000 (14:59 -0800)]
i40e/i40evf: Fix casting in transmit code

Simple cast to fix a sparse warning.

Fixes: commit 5453205cd097 ("i40e/i40evf: Enable support for
SKB_GSO_UDP_TUNNEL_CSUM")

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e/i40evf: Add support for bulk free in Tx cleanup
Alexander Duyck [Mon, 7 Mar 2016 17:30:03 +0000 (09:30 -0800)]
i40e/i40evf: Add support for bulk free in Tx cleanup

This patch enables bulk Tx clean for skbs.  In order to enable it we need
to pass the napi_budget value as that is used to determine if we are truly
running in NAPI mode or if we are simply calling the routine from netpoll
with a budget of 0.  In order to avoid adding too many more variables I
thought it best to pass the VSI directly in a fashion similar to what we do
on igb and ixgbe with the q_vector.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e/i40evf: Fix handling of boolean logic in polling routines
Alexander Duyck [Mon, 7 Mar 2016 17:29:57 +0000 (09:29 -0800)]
i40e/i40evf: Fix handling of boolean logic in polling routines

In the polling routines for i40e and i40evf we were using bitwise operators
to avoid the side effects of the logical operators, specifically the fact
that if the first case is true with "||" we skip the second case, or if it
is false with "&&" we skip the second case.  This fixes an earlier patch
that converted the bitwise operators over to the logical operators and
instead replaces the entire thing with just an if statement since it should
be more readable what we are trying to do this way.

Fixes: 1a36d7fadd14 ("i40e/i40evf: use logical operators, not bitwise")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40evf: remove dead code
Alan Cox [Wed, 2 Mar 2016 00:02:15 +0000 (16:02 -0800)]
i40evf: remove dead code

The only error case is when the malloc fails, in which case the clean up
loop does nothing at all, so remove it

Signed-off-by: Alan Cox <alan@linux.intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K
Alexander Duyck [Fri, 19 Feb 2016 20:17:08 +0000 (12:17 -0800)]
i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K

From what I can tell the practical limitation on the size of the Tx data
buffer is the fact that the Tx descriptor is limited to 14 bits.  As such
we cannot use 16K as is typically used on the other Intel drivers.  However
artificially limiting ourselves to 8K can be expensive as this means that
we will consume up to 10 descriptors (1 context, 1 for header, and 9 for
payload, non-8K aligned) in a single send.

I propose that we can reduce this by increasing the maximum data for a 4K
aligned block to 12K.  We can reduce the descriptors used for a 32K aligned
block by 1 by increasing the size like this.  In addition we still have the
4K - 1 of space that is still unused.  We can use this as a bit of extra
padding when dealing with data that is not aligned to 4K.

By aligning the descriptors after the first to 4K we can improve the
efficiency of PCIe accesses as we can avoid using byte enables and can fetch
full TLP transactions after the first fetch of the buffer.  This helps to
improve PCIe efficiency.  Below is the results of testing before and after
with this patch:

Recv   Send   Send                         Utilization      Service Demand
Socket Socket Message  Elapsed             Send     Recv    Send    Recv
Size   Size   Size     Time    Throughput  local    remote  local   remote
bytes  bytes  bytes    secs.   10^6bits/s  % S      % U     us/KB   us/KB
Before:
87380  16384  16384    10.00     33682.24  20.27    -1.00   0.592   -1.00
After:
87380  16384  16384    10.00     34204.08  20.54    -1.00   0.590   -1.00

So the net result of this patch is that we have a small gain in throughput
due to a reduction in overhead for putting together the frame.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoi40e: call ndo_stop() instead of dev_close() when running offline selftest
Stefan Assmann [Wed, 3 Feb 2016 08:20:47 +0000 (09:20 +0100)]
i40e: call ndo_stop() instead of dev_close() when running offline selftest

Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoMerge branch 'tcp-udp-misc'
David S. Miller [Tue, 5 Apr 2016 02:11:21 +0000 (22:11 -0400)]
Merge branch 'tcp-udp-misc'

Eric Dumazet says:

====================
net: various udp/tcp changes

First round of patches for linux-4.7

Add a generic facility for sockets to be freed after an RCU grace
period, if they need to.

Then UDP stack is changed to no longer use SLAB_DESTROY_BY_RCU,
in order to speedup rx processing for traffic encapsulated in UDP.
It gives a 17 % speedup for normal UDP reception in stress conditions.

Then TCP listeners are changed to use SOCK_RCU_FREE as well
to avoid touching sk_refcnt in synflood case :
I got up to 30 % performance increase for a mono listener.

Then three patches add SK_MEMINFO_DROPS to sock_diag
and add per socket rx drops accounting to TCP.

Last patch adds rate limiting on ACK sent on behalf of SYN_RECV
to better resist to SYNFLOOD targeting one or few flows.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: rate limit ACK sent by SYN_RECV request sockets
Eric Dumazet [Fri, 1 Apr 2016 15:52:22 +0000 (08:52 -0700)]
tcp: rate limit ACK sent by SYN_RECV request sockets

Attackers like to use SYNFLOOD targeting one 5-tuple, as they
hit a single RX queue (and cpu) on the victim.

If they use random sequence numbers in their SYN, we detect
they do not match the expected window and send back an ACK.

This patch adds a rate limitation, so that the effect of such
attacks is limited to ingress only.

We roughly double our ability to absorb such attacks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv4: tcp: set SOCK_USE_WRITE_QUEUE for ip_send_unicast_reply()
Eric Dumazet [Fri, 1 Apr 2016 15:52:21 +0000 (08:52 -0700)]
ipv4: tcp: set SOCK_USE_WRITE_QUEUE for ip_send_unicast_reply()

TCP uses per cpu 'sockets' to send some packets :
- RST packets ( tcp_v4_send_reset()) )
- ACK packets for SYN_RECV and TIMEWAIT sockets

By setting SOCK_USE_WRITE_QUEUE flag, we tell sock_wfree()
to not call sk_write_space() since these internal sockets
do not care.

This gives a small performance improvement, merely by allowing
cpu to properly predict the sock_wfree() conditional branch,
and avoiding one atomic operation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: increment sk_drops for listeners
Eric Dumazet [Fri, 1 Apr 2016 15:52:20 +0000 (08:52 -0700)]
tcp: increment sk_drops for listeners

Goal: packets dropped by a listener are accounted for.

This adds tcp_listendrop() helper, and clears sk_drops in sk_clone_lock()
so that children do not inherit their parent drop count.

Note that we no longer increment LINUX_MIB_LISTENDROPS counter when
sending a SYNCOOKIE, since the SYN packet generated a SYNACK.
We already have a separate LINUX_MIB_SYNCOOKIESSENT

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: increment sk_drops for dropped rx packets
Eric Dumazet [Fri, 1 Apr 2016 15:52:19 +0000 (08:52 -0700)]
tcp: increment sk_drops for dropped rx packets

Now ss can report sk_drops, we can instruct TCP to increment
this per socket counter when it drops an incoming frame, to refine
monitoring and debugging.

Following patch takes care of listeners drops.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosock_diag: add SK_MEMINFO_DROPS
Eric Dumazet [Fri, 1 Apr 2016 15:52:18 +0000 (08:52 -0700)]
sock_diag: add SK_MEMINFO_DROPS

Reporting sk_drops to user space was available for UDP
sockets using /proc interface.

Add this to sock_diag, so that we can have the same information
available to ss users, and we'll be able to add sk_drops
indications for TCP sockets as well.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp/dccp: do not touch listener sk_refcnt under synflood
Eric Dumazet [Fri, 1 Apr 2016 15:52:17 +0000 (08:52 -0700)]
tcp/dccp: do not touch listener sk_refcnt under synflood

When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple
cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes.

By letting listeners use SOCK_RCU_FREE infrastructure,
we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt

Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets,
only listeners are impacted by this change.

Peak performance under SYNFLOOD is increased by ~33% :

On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps

Most consuming functions are now skb_set_owner_w() and sock_wfree()
contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoinet: reqsk_alloc() needs to take care of dead listeners
Eric Dumazet [Fri, 1 Apr 2016 15:52:16 +0000 (08:52 -0700)]
inet: reqsk_alloc() needs to take care of dead listeners

We'll soon no longer take a refcount on listeners,
so reqsk_alloc() can not assume a listener refcount is not
zero. We need to use atomic_inc_not_zero()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp/dccp: use rcu locking in inet_diag_find_one_icsk()
Eric Dumazet [Fri, 1 Apr 2016 15:52:15 +0000 (08:52 -0700)]
tcp/dccp: use rcu locking in inet_diag_find_one_icsk()

RX packet processing holds rcu_read_lock(), so we can remove
pairs of rcu_read_lock()/rcu_read_unlock() in lookup functions
if inet_diag also holds rcu before calling them.

This is needed anyway as __inet_lookup_listener() and
inet6_lookup_listener() will soon no longer increment
refcount on the found listener.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp/dccp: remove BH disable/enable in lookup
Eric Dumazet [Fri, 1 Apr 2016 15:52:14 +0000 (08:52 -0700)]
tcp/dccp: remove BH disable/enable in lookup

Since linux 2.6.29, lookups only use rcu locking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoudp: no longer use SLAB_DESTROY_BY_RCU
Eric Dumazet [Fri, 1 Apr 2016 15:52:13 +0000 (08:52 -0700)]
udp: no longer use SLAB_DESTROY_BY_RCU

Tom Herbert would like not touching UDP socket refcnt for encapsulated
traffic. For this to happen, we need to use normal RCU rules, with a grace
period before freeing a socket. UDP sockets are not short lived in the
high usage case, so the added cost of call_rcu() should not be a concern.

This actually removes a lot of complexity in UDP stack.

Multicast receives no longer need to hold a bucket spinlock.

Note that ip early demux still needs to take a reference on the socket.

Same remark for functions used by xt_socket and xt_PROXY netfilter modules,
but this might be changed later.

Performance for a single UDP socket receiving flood traffic from
many RX queues/cpus.

Simple udp_rx using simple recvfrom() loop :
438 kpps instead of 374 kpps : 17 % increase of the peak rate.

v2: Addressed Willem de Bruijn feedback in multicast handling
 - keep early demux break in __udp4_lib_demux_lookup()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Willem de Bruijn <willemb@google.com>
Tested-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: add SOCK_RCU_FREE socket flag
Eric Dumazet [Fri, 1 Apr 2016 15:52:12 +0000 (08:52 -0700)]
net: add SOCK_RCU_FREE socket flag

We want a generic way to insert an RCU grace period before socket
freeing for cases where RCU_SLAB_DESTROY_BY_RCU is adding too
much overhead.

SLAB_DESTROY_BY_RCU strict rules force us to take a reference
on the socket sk_refcnt, and it is a performance problem for UDP
encapsulation, or TCP synflood behavior, as many CPUs might
attempt the atomic operations on a shared sk_refcnt

UDP sockets and TCP listeners can set SOCK_RCU_FREE so that their
lookup can use traditional RCU rules, without refcount changes.
They can set the flag only once hashed and visible by other cpus.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Tested-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 5 Apr 2016 02:01:44 +0000 (22:01 -0400)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2016-04-04

This series contains updates to ixgbe and ixgbevf.

Pavel Tikhomirov fixes a typo where we were incrementing transmit stats
instead of receive stats on the receive side.

Emil updates the ixgbevf driver to use bit operations for setting and
checking the adapter state.

Chas Williams adds the new NDO trust feature check so that the VF guest
has the ability to set the unicast address of the interface, if it is a
trusted VF.

Alex cleans up the driver to that the only time we add a PF entry to the
VLVF is either for VLAN 0 or if the PF has requested a VLAN that a VF
is already using.  Also adds support for generic transmit checksums,
giving the added advantage is that we can support inner checksum offloads
for tunnels and MPLS while still being able to transparently insert
VLAN tags.  Lastly, changed ixgbe so that we can use the ethtool
rx-vlan-filter flag to toggle receive VLAN filtering on and off.

Mark cleans up the ixgbe driver by making all op structures that do not
change constants.  Also fixed flow control for Xeon D KR backplanes, since
we cannot use auto-negotiation to determine the mode, we have to use
whatever the user configured.

Sowmini Varadhan updates ixgbe to use eth_platform_get_mac_address()
instead of the arch specific solution that was added by a previous
commit.

Don fixed an issue where it was possible that a system reset could occur
when we were holding the SWFW semaphore lock, which the next time the
driver loaded would see it incorrectly as locked.

v2: updated patch 8 of the series to include a minor flags issue where
    we had lost NETIF_F_HW_TC and we were setting NETIF_F_SCTP_CRC in
    two different areas, when we only needed/wanted it in one spot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'mv88e6131-hw-bridging-6185'
David S. Miller [Tue, 5 Apr 2016 01:31:35 +0000 (21:31 -0400)]
Merge branch 'mv88e6131-hw-bridging-6185'

Vivien Didelot says:

====================
net: dsa: mv88e6131: HW bridging support for 6185

All packets passing through a switch of the 6185 family are currently all
directed to the CPU port. This means that port bridging is software driven.

To enable hardware bridging for this switch family, we need to implement the
port mapping operations, the FDB operations, and optionally the VLAN operations
(for 802.1Q and VLAN filtering aware systems).

However this family only has 256 FDBs indexed by 8-bit identifiers, opposed to
4096 FDBs with 12-bit identifiers for other families such as 6352. It also
doesn't have dedicated FID registers for ATU and VTU operations.

This patchset fixes these differences, and enable hardware bridging for 6185.

Changes v1 -> v2:
 - Describe the different numbers of databases and prefer a feature-based logic
   over the current ID/family-based logic.
====================

Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6131: enable hardware bridging
Vivien Didelot [Thu, 31 Mar 2016 20:53:46 +0000 (16:53 -0400)]
net: dsa: mv88e6131: enable hardware bridging

By adding support for bridge operations, FDB operations, and optionally
VLAN operations (for 802.1Q and VLAN filtering aware systems), the
switch bridges ports correctly, the CPU is able to populate the hardware
address databases, and thus hardware bridging becomes functional within
the 88E6185 family of switches.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: map destination addresses for 6185
Vivien Didelot [Thu, 31 Mar 2016 20:53:45 +0000 (16:53 -0400)]
net: dsa: mv88e6xxx: map destination addresses for 6185

The 88E6185 switch also has a MapDA bit in its Port Control 2 register.
When this bit is cleared, all frames are sent out to the CPU port.

Set this bit to rely on address databases (ATU) hits and direct frames
out of the correct ports, and thus allow hardware bridging.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: support 256 databases
Vivien Didelot [Thu, 31 Mar 2016 20:53:44 +0000 (16:53 -0400)]
net: dsa: mv88e6xxx: support 256 databases

The 6185 family of devices has only 256 address databases. Their 8-bit
FID for ATU and VTU operations are split into ATU Control and ATU/VTU
Operation registers.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: variable number of databases
Vivien Didelot [Thu, 31 Mar 2016 20:53:43 +0000 (16:53 -0400)]
net: dsa: mv88e6xxx: variable number of databases

Marvell switch chips have different number of address databases.

The code currently only supports models with 4096 databases. Such switch
has dedicated FID registers for ATU and VTU operations. Models with
fewer databases have their FID split in several registers.

List them all but only support models with 4096 databases at the moment.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: protect FID registers access
Vivien Didelot [Thu, 31 Mar 2016 20:53:42 +0000 (16:53 -0400)]
net: dsa: mv88e6xxx: protect FID registers access

Only switch families with 4096 address databases have dedicated FID
registers for ATU and VTU operations.

Factorize the access to the GLOBAL_ATU_FID register and introduce a
mv88e6xxx_has_fid_reg() helper function to protect the access to
GLOBAL_ATU_FID and GLOBAL_VTU_FID.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: protect SID register access
Vivien Didelot [Thu, 31 Mar 2016 20:53:41 +0000 (16:53 -0400)]
net: dsa: mv88e6xxx: protect SID register access

Introduce a mv88e6xxx_has_stu() helper to protect the access to the
GLOBAL_VTU_SID register, instead of checking switch families.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoixgbe: Add support for toggling VLAN filtering flag via ethtool
Alexander Duyck [Thu, 10 Mar 2016 18:01:10 +0000 (10:01 -0800)]
ixgbe: Add support for toggling VLAN filtering flag via ethtool

This change makes it so that we can use the ethtool rx-vlan-filter flag to
toggle Rx VLAN filtering on and off.  This is basically just an extension
of the existing VLAN promisc work in that it just adds support for the
additional ethtool flag.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Extend cls_u32 offload to support UDP headers
Amritha Nambiar [Wed, 9 Mar 2016 23:32:16 +0000 (18:32 -0500)]
ixgbe: Extend cls_u32 offload to support UDP headers

Added support to match on UDP fields in the transport layer.
Extended core logic to support multiple headers.

Verified with the following filters :

handle 1: u32 divisor 1
u32 ht 800: order 1 link 1: \
offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
u32 ht 1: order 2 \
match tcp src 1024 ffff match tcp dst 23 ffff action drop
handle 2: u32 divisor 1
u32 ht 800: order 3 link 2: \
offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 17 ff
u32 ht 2: order 4 \
match udp src 1025 ffff match udp dst 24 ffff action drop

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Place SWFW semaphore in known valid state at probe
Don Skidmore [Wed, 9 Mar 2016 21:45:00 +0000 (16:45 -0500)]
ixgbe: Place SWFW semaphore in known valid state at probe

It is possible on some HW that a system reset could occur when we are
holding the SWFW semaphore lock.  So next time the driver was loaded we
would see it incorrectly as locked. This patch will recover from that state
by: Attempting to acquire the semaphore and then regardless of whether or
not it was acquire we immediately release it. This will force us into
a known good state.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: add a callback to set the maximum transmit bitrate
Rostislav Pehlivanov [Wed, 27 Jan 2016 18:33:30 +0000 (18:33 +0000)]
ixgbe: add a callback to set the maximum transmit bitrate

This commit adds a callback which allows to adjust the maximum transmit
bitrate the card can output. This makes it possible to get a smooth
traffic instead of the default burst-y behaviour when trying to output
e.g. a video stream.

Much of the logic needed to get a correct bcnrc_val was taken from the
ixgbe_set_vf_rate_limit() function.

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Fix flow control for Xeon D KR backplane
Mark Rustad [Tue, 26 Jan 2016 00:32:10 +0000 (16:32 -0800)]
ixgbe: Fix flow control for Xeon D KR backplane

Xeon D KR backplane is different from other backplanes,
in that we can't use auto-negotiation to determine the
mode. Instead, use whatever the user configured.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbevf: Add support for generic Tx checksums
Alexander Duyck [Wed, 13 Jan 2016 15:31:17 +0000 (07:31 -0800)]
ixgbevf: Add support for generic Tx checksums

This patch adds support for generic Tx checksums to the ixgbevf driver.  It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.

In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
  MACLEN (maximum of 127), retrieved from:
skb_network_offset()
  IPLEN  (maximum of 511), retrieved from:
skb_checksum_start_offset() - skb_network_offset()
  TUCMD.L4T indicates offset and if checksum or crc32c, based on:
skb->csum_offset

The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.

I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers.  In
the case of the VF drivers this meant adding support for SCTP CRCs, and
inner checksum offloads for MPLS and various tunnel types.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Add support for generic Tx checksums
Alexander Duyck [Wed, 13 Jan 2016 15:31:11 +0000 (07:31 -0800)]
ixgbe: Add support for generic Tx checksums

This patch adds support for generic Tx checksums to the ixgbe driver.  It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.

In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
  MACLEN (maximum of 127), retrieved from:
skb_network_offset()
  IPLEN  (maximum of 511), retrieved from:
skb_checksum_start_offset() - skb_network_offset()
  TUCMD.L4T indicates offset and if checksum or crc32c, based on:
skb->csum_offset

The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.

I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: use eth_platform_get_mac_address()
Sowmini Varadhan [Wed, 13 Jan 2016 03:32:30 +0000 (19:32 -0800)]
ixgbe: use eth_platform_get_mac_address()

This commit converts commit c762dff24c06 ("ixgbe: Look up MAC address in
Open Firmware or IDPROM") to use eth_platform_get_mac_address()
added by commit c7f5d105495a ("net: Add eth_platform_get_mac_address()
helper.")

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Make all unchanging ops structures const
Mark Rustad [Thu, 7 Jan 2016 18:13:03 +0000 (10:13 -0800)]
ixgbe: Make all unchanging ops structures const

The source for the ops structure contents are const, so make them
so. Copy them in place with structure assignments instead of memcpys.
Make the mbx_ops accessed by reference instead of making a copy of
the source structure. Update copyright date on the touched files.

Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Avoid adding VLAN 0 twice to VLVF and VFTA
Alexander Duyck [Thu, 7 Jan 2016 06:48:50 +0000 (22:48 -0800)]
ixgbe: Avoid adding VLAN 0 twice to VLVF and VFTA

We were adding VLAN 0 twice each time we restored the VLAN configuration.
Instead of doing it twice we can just start working through the active
VLANs from ID 1 on and skip the double write.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoirda: sh_irda: remove driver
Simon Horman [Mon, 4 Apr 2016 01:44:39 +0000 (10:44 +0900)]
irda: sh_irda: remove driver

Remove the sh-irda driver as it appears to be unused since
c0bb9b302769 ("ARCH: ARM: shmobile: Remove ag5evm board support").

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'macb-coding-style'
David S. Miller [Mon, 4 Apr 2016 20:16:36 +0000 (16:16 -0400)]
Merge branch 'macb-coding-style'

Moritz Fischer says:

====================
macb: Codingstyle cleanups

resending almost unchanged v2 here:

Changes from v2:
* Rebased onto net-next
* Changed 5th patches commit message
* Added Nicholas' and Michal's Acked-Bys

Changes from v1:
* Backed out variable scope changes
* Separated out ether_addr_copy into it's own commit
* Fixed typo in comments as suggested by Joe
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: macb: Fix simple typo
Moritz Fischer [Wed, 30 Mar 2016 02:11:15 +0000 (19:11 -0700)]
net: macb: Fix simple typo

Acked-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: macb: Use ether_addr_copy over memcpy
Moritz Fischer [Wed, 30 Mar 2016 02:11:14 +0000 (19:11 -0700)]
net: macb: Use ether_addr_copy over memcpy

Checkpatch suggests using ether_addr_copy over memcpy
to copy the mac address.

Acked-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: macb: Fix coding style suggestions
Moritz Fischer [Wed, 30 Mar 2016 02:11:13 +0000 (19:11 -0700)]
net: macb: Fix coding style suggestions

This commit deals with a bunch of checkpatch suggestions
that without changing behavior make checkpatch happier.

Acked-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: macb: Fix coding style warnings
Moritz Fischer [Wed, 30 Mar 2016 02:11:12 +0000 (19:11 -0700)]
net: macb: Fix coding style warnings

This commit takes care of the coding style warnings
that are mostly due to a different comment style and
lines over 80 chars, as well as a dangling else.

Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: macb: Fix coding style error message
Moritz Fischer [Wed, 30 Mar 2016 02:11:11 +0000 (19:11 -0700)]
net: macb: Fix coding style error message

checkpatch.pl gave the following error:

ERROR: space required before the open parenthesis '('
+ for(; p < end; p++, offset += 4)

Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoravb: Add dma queue interrupt support
Kazuya Mizuguchi [Sun, 3 Apr 2016 14:54:38 +0000 (23:54 +0900)]
ravb: Add dma queue interrupt support

This patch supports the following interrupts.

- One interrupt for multiple (timestamp, error, gPTP)
- One interrupt for emac
- Four interrupts for dma queue (best effort rx/tx, network control rx/tx)

This patch improve efficiency of the interrupt handler by adding the
interrupt handler corresponding to each interrupt source described
above. Additionally, it reduces the number of times of the access to
EthernetAVB IF.
Also this patch prevent this driver depends on the whim of a boot loader.

[ykaneko0929@gmail.com: define bit names of registers]
[ykaneko0929@gmail.com: add comment for gen3 only registers]
[ykaneko0929@gmail.com: fix coding style]
[ykaneko0929@gmail.com: update changelog]
[ykaneko0929@gmail.com: gen3: fix initialization of interrupts]
[ykaneko0929@gmail.com: gen3: fix clearing interrupts]
[ykaneko0929@gmail.com: gen3: add helper function for request_irq()]
[ykaneko0929@gmail.com: gen3: remove IRQF_SHARED flag for request_irq()]
[ykaneko0929@gmail.com: revert ravb_close() and ravb_ptp_stop()]
[ykaneko0929@gmail.com: avoid calling free_irq() to non-hooked interrupts]
[ykaneko0929@gmail.com: make NC/BE interrupt handler a function]
[ykaneko0929@gmail.com: make timestamp interrupt handler a function]
[ykaneko0929@gmail.com: timestamp interrupt is handled in multiple
 interrupt handler instead of dma queue interrupt handler]
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoixgbe: Do not allow PF to add VLVF entry unless it actually needs it
Alexander Duyck [Thu, 7 Jan 2016 06:48:44 +0000 (22:48 -0800)]
ixgbe: Do not allow PF to add VLVF entry unless it actually needs it

While doing the work on igb I realized there were a few cases where we were
still adding VLANs to the VLVF entries for the PF when they were not
needed.  This patch cleans that up so that the only time we add a PF entry
to the VLVF is either for VLAN 0 or if the PF has requested a VLAN that a VF
is already using.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbe: Extend trust to allow guest to set unicast address
chas williams [Tue, 5 Jan 2016 22:30:39 +0000 (17:30 -0500)]
ixgbe: Extend trust to allow guest to set unicast address

When running certain routing protocols like VRRP, VF guests need the
ability to set the unicast address of the interface.  Extend the new ndo
trust feature to let the hypervisor trust a guest to set/update its own
unicast address.

Signed-off-by: Chas Williams <3chas3@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoixgbevf: use bit operations for setting and checking resets
Emil Tantilov [Fri, 18 Dec 2015 01:32:55 +0000 (17:32 -0800)]
ixgbevf: use bit operations for setting and checking resets

Move the reset flags to adapter->state in order to make use of bit
operations.

This is an alternative patch to the one previously submitted by
John Greene.

Suggested-by: Alexander Duyck <aduyck@mirantis.com>
Reported-by: Scott Otto <otts62@yahoo.com>
Reported-by: John Greene <jogreene@redhat.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agoMerge branch 'cmsg_timestamp'
David S. Miller [Mon, 4 Apr 2016 19:50:31 +0000 (15:50 -0400)]
Merge branch 'cmsg_timestamp'

Soheil Hassas Yeganeh says:

====================
add TX timestamping via cmsg

This patch series aim at enabling TX timestamping via cmsg.

Currently, to occasionally sample TX timestamping on a socket,
applications need to call setsockopt twice: first for enabling
timestamps and then for disabling them. This is an unnecessary
overhead. With cmsg, in contrast, applications can sample TX
timestamps per sendmsg().

This patch series adds the code for processing SO_TIMESTAMPING
for cmsg's of the SOL_SOCKET level, and adds the glue code for
TCP, UDP, and RAW for both IPv4 and IPv6. This implementation
supports overriding timestamp generation flags (i.e.,
SOF_TIMESTAMPING_TX_*) but not timestamp reporting flags.
Applications must still enable timestamp reporting via
setsockopt to receive timestamps.

This series does not change existing timestamping behavior for
applications that are using socket options.

I will follow up with another patch to enable timestamping for
active TFO (client-side TCP Fast Open) and also setting packet
mark via cmsgs.

Thanks!

Changes in v2:
        - Replace u32 with __u32 in the documentation.

Changes in v3:
- Fix the broken build for L2TP (due to changes
  in IPv6).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosock: document timestamping via cmsg in Documentation
Soheil Hassas Yeganeh [Sun, 3 Apr 2016 03:08:13 +0000 (23:08 -0400)]
sock: document timestamping via cmsg in Documentation

Update docs and add code snippet for using cmsg for timestamping.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosock: enable timestamping using control messages
Soheil Hassas Yeganeh [Sun, 3 Apr 2016 03:08:12 +0000 (23:08 -0400)]
sock: enable timestamping using control messages

Currently, SOL_TIMESTAMPING can only be enabled using setsockopt.
This is very costly when users want to sample writes to gather
tx timestamps.

Add support for enabling SO_TIMESTAMPING via control messages by
using tsflags added in `struct sockcm_cookie` (added in the previous
patches in this series) to set the tx_flags of the last skb created in
a sendmsg. With this patch, the timestamp recording bits in tx_flags
of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg.

Please note that this is only effective for overriding the recording
timestamps flags. Users should enable timestamp reporting (e.g.,
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
socket options and then should ask for SOF_TIMESTAMPING_TX_*
using control messages per sendmsg to sample timestamps for each
write.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv6: process socket-level control messages in IPv6
Soheil Hassas Yeganeh [Sun, 3 Apr 2016 03:08:11 +0000 (23:08 -0400)]
ipv6: process socket-level control messages in IPv6

Process socket-level control messages by invoking
__sock_cmsg_send in ip6_datagram_send_ctl for control messages on
the SOL_SOCKET layer.

This makes sure whenever ip6_datagram_send_ctl is called for
udp and raw, we also process socket-level control messages.

This is a bit uglier than IPv4, since IPv6 does not have
something like ipcm_cookie. Perhaps we can later create
a control message cookie for IPv6?

Note that this commit interprets new control messages that
were ignored before. As such, this commit does not change
the behavior of IPv6 control messages.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv4: process socket-level control messages in IPv4
Soheil Hassas Yeganeh [Sun, 3 Apr 2016 03:08:10 +0000 (23:08 -0400)]
ipv4: process socket-level control messages in IPv4

Process socket-level control messages by invoking
__sock_cmsg_send in ip_cmsg_send for control messages on
the SOL_SOCKET layer.

This makes sure whenever ip_cmsg_send is called in udp, icmp,
and raw, we also process socket-level control messages.

Note that this commit interprets new control messages that
were ignored before. As such, this commit does not change
the behavior of IPv4 control messages.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosock: accept SO_TIMESTAMPING flags in socket cmsg
Soheil Hassas Yeganeh [Sun, 3 Apr 2016 03:08:09 +0000 (23:08 -0400)]
sock: accept SO_TIMESTAMPING flags in socket cmsg

Accept SO_TIMESTAMPING in control messages of the SOL_SOCKET level
as a basis to accept timestamping requests per write.

This implementation only accepts TX recording flags (i.e.,
SOF_TIMESTAMPING_TX_HARDWARE, SOF_TIMESTAMPING_TX_SOFTWARE,
SOF_TIMESTAMPING_TX_SCHED, and SOF_TIMESTAMPING_TX_ACK) in
control messages. Users need to set reporting flags (e.g.,
SOF_TIMESTAMPING_OPT_ID) per socket via socket options.

This commit adds a tsflags field in sockcm_cookie which is
set in __sock_cmsg_send. It only override the SOF_TIMESTAMPING_TX_*
bits in sockcm_cookie.tsflags allowing the control message
to override the recording behavior per write, yet maintaining
the value of other flags.

This patch implements validating the control message and setting
tsflags in struct sockcm_cookie. Next commits in this series will
actually implement timestamping per write for different protocols.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: use one bit in TCP_SKB_CB to mark ACK timestamps
Soheil Hassas Yeganeh [Sun, 3 Apr 2016 03:08:08 +0000 (23:08 -0400)]
tcp: use one bit in TCP_SKB_CB to mark ACK timestamps

Currently, to avoid a cache line miss for accessing skb_shinfo,
tcp_ack_tstamp skips socket that do not have
SOF_TIMESTAMPING_TX_ACK bit set in sk_tsflags. This is
implemented based on an implicit assumption that the
SOF_TIMESTAMPING_TX_ACK is set via socket options for the
duration that ACK timestamps are needed.

To implement per-write timestamps, this check should be
removed and replaced with a per-packet alternative that
quickly skips packets missing ACK timestamps marks without
a cache-line miss.

To enable per-packet marking without a cache line miss, use
one bit in TCP_SKB_CB to mark a whether a SKB might need a
ack tx timestamp or not. Further checks in tcp_ack_tstamp are not
modified and work as before.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: accept SOF_TIMESTAMPING_OPT_ID for passive TFO
Soheil Hassas Yeganeh [Sun, 3 Apr 2016 03:08:07 +0000 (23:08 -0400)]
tcp: accept SOF_TIMESTAMPING_OPT_ID for passive TFO

SOF_TIMESTAMPING_OPT_ID is set to get data-independent IDs
to associate timestamps with send calls. For TCP connections,
tp->snd_una is used as the starting point to calculate
relative IDs.

This socket option will fail if set before the handshake on a
passive TCP fast open connection with data in SYN or SYN/ACK,
since setsockopt requires the connection to be in the
ESTABLISHED state.

To address these, instead of limiting the option to the
ESTABLISHED state, accept the SOF_TIMESTAMPING_OPT_ID option as
long as the connection is not in LISTEN or CLOSE states.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosock: break up sock_cmsg_snd into __sock_cmsg_snd and loop
Willem de Bruijn [Sun, 3 Apr 2016 03:08:06 +0000 (23:08 -0400)]
sock: break up sock_cmsg_snd into __sock_cmsg_snd and loop

To process cmsg's of the SOL_SOCKET level in addition to
cmsgs of another level, protocols can call sock_cmsg_send().
This causes a double walk on the cmsghdr list, one for SOL_SOCKET
and one for the other level.

Extract the inner demultiplex logic from the loop that walks the list,
to allow having this called directly from a walker in the protocol
specific code.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>