git.cascardo.info Git - cascardo/linux.git/log

netlink: autosize skb lengthes

One known problem with netlink is the fact that NLMSG_GOODSIZE is
really small on PAGE_SIZE==4096 architectures, and it is difficult
to know in advance what buffer size is used by the application.

This patch adds an automatic learning of the size.

First netlink message will still be limited to ~4K, but if user used
bigger buffers, then following messages will be able to use up to 16KB.

This speedups dump() operations by a large factor and should be safe
for legacy applications.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Use ether_addr_copy and eth_broadcast_addr

Faster than memcpy/memset on some architectures.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gianfar-next'

Claudiu Manoil says:

====================
gianfar: Tx timeout issue

There's an older Tx timeout issue showing up on etsec2 devices
with 2 CPUs.  I pinned this issue down to processing overhead
incurred by supporting multiple Tx/Rx rings, as explained in
the 2nd patch below.  But before this, there's also a concurency
issue leading to Rx/Tx spurrious interrupts, addressed by the
'Tx NAPI' patch below.
The Tx timeout can be triggered with multiple Tx flows,
'iperf -c -N 8' commands, on a 2 CPUs etsec2 based (P1020) board.

Before the patches:
"""
root@p1020rdb-pc:~# iperf -c 172.16.1.3 -n 1000M -P 8 &
[...]
root@p1020rdb-pc:~# NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue 1 timed out
WARNING: at net/sched/sch_generic.c:279
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.0-rc3-03386-g89ea59c #23
task: ed84ef40 ti: ed868000 task.ti: ed868000
NIP: c04627a8 LR: c04627a8 CTR: c02fb270
REGS: ed869d00 TRAP: 0700   Not tainted  (3.13.0-rc3-03386-g89ea59c)
MSR: 00029000 <CE,EE,ME>  CR: 44000022  XER: 20000000
[...]

root@p1020rdb-pc:~# [ ID] Interval       Transfer     Bandwidth
[  5]  0.0-19.3 sec  1000 MBytes    434 Mbits/sec
[  8]  0.0-39.7 sec  1000 MBytes    211 Mbits/sec
[  9]  0.0-40.1 sec  1000 MBytes    209 Mbits/sec
[  3]  0.0-40.2 sec  1000 MBytes    209 Mbits/sec
[ 10]  0.0-59.0 sec  1000 MBytes    142 Mbits/sec
[  7]  0.0-74.6 sec  1000 MBytes    112 Mbits/sec
[  6]  0.0-74.7 sec  1000 MBytes    112 Mbits/sec
[  4]  0.0-74.7 sec  1000 MBytes    112 Mbits/sec
[SUM]  0.0-74.7 sec  7.81 GBytes    898 Mbits/sec

root@p1020rdb-pc:~# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:04:9f:00:13:01
          inet addr:172.16.1.1  Bcast:172.16.255.255  Mask:255.255.0.0
          inet6 addr: fe80::204:9fff:fe00:1301/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:708722 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8717849 errors:6 dropped:0 overruns:1470 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:58118018 (55.4 MiB)  TX bytes:274069482 (261.3 MiB)
          Base address:0xa000

"""

After applying the patches:
"""
root@p1020rdb-pc:~# iperf -c 172.16.1.3 -n 1000M -P 8 &
[...]
root@p1020rdb-pc:~# [ ID] Interval       Transfer     Bandwidth
[  9]  0.0-70.5 sec  1000 MBytes    119 Mbits/sec
[  5]  0.0-70.5 sec  1000 MBytes    119 Mbits/sec
[  6]  0.0-70.7 sec  1000 MBytes    119 Mbits/sec
[  4]  0.0-71.0 sec  1000 MBytes    118 Mbits/sec
[  8]  0.0-71.1 sec  1000 MBytes    118 Mbits/sec
[  3]  0.0-71.2 sec  1000 MBytes    118 Mbits/sec
[ 10]  0.0-71.3 sec  1000 MBytes    118 Mbits/sec
[  7]  0.0-71.3 sec  1000 MBytes    118 Mbits/sec
[SUM]  0.0-71.3 sec  7.81 GBytes    942 Mbits/sec

root@p1020rdb-pc:~# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:04:9f:00:13:01
          inet addr:172.16.1.1  Bcast:172.16.255.255  Mask:255.255.0.0
          inet6 addr: fe80::204:9fff:fe00:1301/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:728446 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8690057 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:59732650 (56.9 MiB)  TX bytes:271554306 (258.9 MiB)
          Base address:0xa000
"""
v2: PATCH 2:
    Replaced CPP check with run-time condition to
    limit the number of queues. Updated comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Use Single-Queue polling for "fsl,etsec2"

For the "fsl,etsec2" compatible models the driver currently
supports 8 Tx and Rx DMA rings (aka HW queues).  However, there
are only 2 pairs of Rx/Tx interrupt lines, as these controllers
are integrated in low power SoCs with 2 CPUs at most.  As a result,
there are at most 2 NAPI instances that have to service multiple
Tx and Rx queues for these devices.  This complicates the NAPI
polling routine having to iterate over the mutiple Rx/Tx queues
hooked to the same interrupt lines.  And there's also an overhead
at HW level, as the controller needs to service all the 8 Tx rings
in a round robin manner.  The combined overhead shows up for multi
parallel Tx flows transmitted by the kernel stack, when the driver
usually starts returning NETDEV_TX_BUSY leading to NETDEV WATCHDOG
Tx timeout triggering if the Tx path is congested for too long.

As an alternative, this patch makes the driver support only one
Tx/Rx DMA ring per NAPI instance (per interrupt group or pair
of Tx/Rx interrupt lines) by default.  The simplified single queue
polling routine (gfar_poll_sq) will be the default napi poll routine
for the etsec2 devices too.  Some adjustments needed to be made to
link the Tx/Rx HW queues with each NAPI instance (2 in this case).
The gfar_poll_sq() is already successfully used by older SQ_SG_MODE
(single interrupt group) controllers.
This patch fixes Tx timeout triggering under heavy Tx traffic load
(i.e. iperf -c -P 8) for the "fsl,etsec2" (currently the only
MQ_MG_MODE devices).  There's also a significant memory footprint
reduction by supporting 2 Rx/Tx DMA rings (at most), instead of 8,
for these devices.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Separate out the Tx interrupt handling (Tx NAPI)

There are some concurrency issues on devices w/ 2 CPUs related
to the handling of Rx and Tx interrupts.  eTSEC has separate
interrupt lines for Rx and Tx but a single imask register
to mask these interrupts and a single NAPI instance to handle
both Rx and Tx work.  As a result, the Rx and Tx ISRs are
identical, both are invoking gfar_schedule_cleanup(), however
both handlers can be entered at the same time when the Rx and
Tx interrupts are taken by different CPUs.  In this case
spurrious interrupts (SPU) show up (in /proc/interrupts)
indicating a concurrency issue.  Also, Tx overruns followed
by Tx timeout have been observed under heavy Tx traffic load.

To address these issues, the schedule cleanup ISR part has
been changed to handle the Rx and Tx interrupts independently.
The patch adds a separate NAPI poll routine for Tx cleanup to
be triggerred independently by the Tx confirmation interrupts
only.  Existing poll functions are modified to handle only
the Rx path processing.  The Tx poll routine does not need a
budget, since Tx processing doesn't consume NAPI budget, and
hence it is registered with minimum NAPI weight.
NAPI scheduling does not require locking since there are
different NAPI instances between the Rx and Tx confirmation
paths now.
So, the patch fixes the occurence of spurrious Rx/Tx interrupts.
Tx overruns also occur less frequently now.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: use use ether_addr_equal_64bits to instead of ether_addr_equal

Ether_addr_equal_64bits is more efficient than ether_addr_equal, and
can be used when each argument is an array within a structure that
contains at least two bytes of data beyond the array, so it is safe
to use it for vlan, and make sense for fast path.

Cc: Joe Perches <joe@perches.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: slight optimization for vlan_do_receive()

According Joe's suggestion, maybe it'd be faster to add an unlikely to
the test for PCKET_OTHERHOST, so I add it and see whether the performance
could be better, although the differences is so small and negligible, but
it is hard to catch that any lower device would set the skb type to
PACKET_OTHERHOST, so most of time, I think it make sense to add unlikely
for the test.

Cc: Joe Perches <joe@perches.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: fq: do not hold qdisc lock while allocating memory

Resizing fq hash table allocates memory while holding qdisc spinlock,
with BH disabled.

This is definitely not good, as allocation might sleep.

We can drop the lock and get it when needed, we hold RTNL so no other
changes can happen at the same time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to e1000e, ixgbevf and igb.

Majority of this series contains fixes and cleanups to e1000e,
most notably are:

Todd provides a fix to PTP in e1000e which adds a lock in e1000e_phc_adjfreq
to prevent concurrent changes to TIMINCA and SYSTIMH/L.  Then provides an
igb fix to use ARRAY_SIZE for array size calculation.

David provides the remaining e1000e which contain:
- cleanup of pointer references that are no longer used
- fix an issue on systems with Management Engine enabled with the
   ethernet cable unplugged
- fix an issue on 82579 where enabling EEE LPI sooner than one second
   after link up causes link issues on some switches
- refactor the power management flows to prevent the suspend path from
   being executed twice when hibernating
- refactor the runtime power management to fix interfering with the
   functionality of Energy Efficient Ethernet when enabled and to fix
   the device from repeatedly flip between suspend and resume with the
   interface administratively downed
- enable the feature PHY Ultra Low Power Mode which is a power saving
   feature that reduces the power consumption of the PHY when a cable is
   not connected
- fix the ethtool offline tests for 82579 parts
- fix SHRA register access for 82579 parts which was introduced by
   previous commit c3a0dce35af0 "e1000e: fix overrun of PHY RAR array"

Florian provides a fix for ixgbevf where skb->pkt_type was being checked
like a bitmask, but it is not a bitmask.

Fix an issue reported by Stephen Hemminger where there was a warning
about code defined but never used if IGB_HWMON is not defined.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

igb: fix warning if !CONFIG_IGB_HWMON

Fix warning about code defined but never used if IGB_HWMON not defined.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: fix array size calculation

Use ARRAY_SIZE for array size calculation.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: fix skb->pkt_type checks

skb->pkt_type is not a bitmask, but contains only value at a time from
the range defined in include/uapi/linux/if_packet.h.

Checking it like if it was a bitmask of values would also cause
PACKET_OTHERHOST, PACKET_LOOPBACK and PACKET_FASTROUTE to be matched by
this check since their lower 2 bits are also set, although that does not
fix a real bug, it is still potentially confusing.

This bogus check was introduced in commit 815cccbf ("ixgbe: add setlink,
getlink support to ixgbe and ixgbevf").

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Fix SHRA register access for 82579

Previous commit c3a0dce35af0 fixed an overrun for the RAR on i218 devices.
This commit also attempted to homogenize the RAR/SHRA access for all parts
accessed by the e1000e driver. This change introduced an error for
assigning MAC addresses to guest OS's for 82579 devices.

Only RAR[0] is accessible to the driver for 82579 parts, and additional
addresses must be placed into the SHRA[L|H] registers. The rar_entry_count
was changed in the previous commit to an inaccurate value that accounted
for all RAR and SHRA registers, not just the ones usable by the driver.

This patch fixes the count to the correct value and adjusts the
e1000_rar_set_pch2lan() function to user the correct index.

Cc: John Greene <jogreene@redhat.com>
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Fix ethtool offline tests for 82579 parts

Changes to the rar_entry_count value require a change to the indexing
used to access the SHRA[H|L] registers when testing them with
'ethtool -t <iface> offline'

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Fix not generating an error on invalid load parameter

Valid values for InterruptThrottleRate are 10-100000, or one of
0, 1, 3, 4. '2' is not valid. This is a legacy from the branching
from the e1000 driver code that e1000e was based from.

Prior to this patch, if the e1000e driver was loaded with a forced
invalid InterruptThrottleRate of '2', then no throttle rate would be
set and no error message generated.

Now, a message will be generated that an invalid value was used and the
value for InterruptThrottleRate will be set to the default value.

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Feature Enable PHY Ultra Low Power Mode (ULP)

ULP is a power saving feature that reduces the power consumption of the
PHY when a cable is not connected.

ULP is gated on the following conditions:
1) The hardware must support ULP.  Currently this is only I218
   devices from Intel
2) ULP is initiated by the driver, so, no driver results in no ULP.
3) ULP's implementation utilizes Runtime Power Management to toggle its
   execution.  ULP is enabled/disabled based on the state of Runtime PM.
4) ULP is not active when wake-on-unicast, multicast or broadcast is active
   as these features are mutually-exclusive.

Since the PHY is in an unavailable state while ULP is active, any access
of the PHY registers will fail.  This is resolved by utilizing kernel
calls that cause the device to exit Runtime PM (e.g. pm_runtime_get_sync)
and then, after PHY access is complete,  allow the device to resume
Runtime PM (e.g. pm_runtime_put_sync).

Under certain conditions, toggling the LANPHYPC is necessary to disable
ULP mode.  Break out existing code to toggle LANPHYPC to a new function
to avoid code duplication.

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e Refactor of Runtime Power Management

Fix issues with:
RuntimePM causing the device to repeatedly flip between suspend and resume
with the interface administratively downed.
Having RuntimePM enabled interfering with the functionality of Energy
Efficient Ethernet.

Added checks to disallow functions that should not be executed if the
device is currently runtime suspended

Make runtime_idle callback to use same deterministic behavior as the igb
driver.

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Refactor PM flows

Refactor the system power management flows to prevent the suspend path from
being executed twice when hibernating since both the freeze and
poweroff callbacks were set to e1000_suspend() via SET_SYSTEM_SLEEP_PM_OPS.
There are HW workarounds that are performed during this flow and calling
them twice was causing erroneous behavior.

Re-arrange the code to take advantage of common code paths and explicitly
set the individual dev_pm_ops callbacks for suspend, resume, freeze,
thaw, poweroff and restore.

Add a boolean parameter (reset) to the e1000e_down function to allow
for cases when the HW should not be reset when downed during a PM event.

Now that all suspend/shutdown paths result in a call to __e1000_shutdown()
that checks Wake on Lan status, removing redundant check for WoL in
e1000_power_down_phy().

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Add missing branding strings in ich8lan.c

Branding strings from recently released and soon to be released
hardware configurations that are supported by e1000e.

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Cleanup - Update GPL header and Copyright

This patch is to update the GPL header by removing the portion that
refers to the Free Software Foundation address.

Change the copyright date for 2014.

Reformat the header comments to conform to kernel networking coding norms

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Fix 82579 sets LPI too early.

Enabling EEE LPI sooner than one second after link up on 82579 causes link
issues with some switches.

Remove EEE enablement for 82579 parts from the link initialization flow to
avoid initializing too early. EEE initialization for 82579 will be done
in e1000e_update_phy_task.

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce W Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Resolve issues with Management Engine (ME) briefly blocking PHY resets

On a ME enabled system with the cable out, the driver init flow would
generate an erroneous message indicating that resets were being blocked
by an active ME session.  Cause was ME clearing the semaphore bit to
block further PHY resets for up to 50 msec during power-on/cycle.  After
this interval, ME would re-set the bit and allow PHY resets.

To resolve this, change the flow of e1000e_phy_hw_reset_generic() to
utilize a delay and retry method.  Poll the FWSM register to minimize
any extra time added to the flow.  If the delay times out at 100ms
(checked in 10msec increments), then return the value E1000_BLK_PHY_RESET,
as this is the accurate state of the PHY.  Attempting to alter just the
call to e1000e_phy_hw_reset_generic() in e1000_init_phy_workarounds_pchlan()
just caused the problem to move further down the flow.

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce W. Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Cleanup unecessary references

Cleaning up some pointer references that are no longer necessary

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: PTP lock in e1000e_phc_adjustfreq

Add lock in e1000e_phc_adjfreq to prevent concurrent changes to TIMINCA
and SYSTIMH/L.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

6lowpan: reassembly: fix return of init function

This patch adds a missing return after fragmentation init. Otherwise we
register a sysctl interface and deregister it afterwards which makes no
sense.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-3.15-20140307' of git://gitorious.org/linux-can/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2014-02-12

this is a pull request of twelve patches for net-next/master.

Alexander Shiyan contributes two patches for the mcp251x, one making
the driver more quiet and the other one improves the compile time
coverage by removing the #ifdef CONFIG_PM_SLEEP. Then two patches for
the flexcan driver by me, one removing the #ifdef CONFIG_PM_SLEEP, too,
the other one making use of platform_get_device_id(). Another patch by
me which converts the janz-ican3 driver to use netdev_<level>(). The
remaining 7 patches are by Oliver Hartkopp, they add CAN FD support to
the netlink configuration interface.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8152'

Hayes Wang says:

====================
r8152: tx/rx improvement

- Select the suitable spin lock for each function.
- Add additional check to reduce the spin lock.
- Up the priority of the tx to avoid interrupted by rx.
- Support rx checksum, large send, and IPv6 hw checksum.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: support IPv6

Support hw IPv6 checksum for TCP and UDP packets.

Note that the hw has the limitation of the range of the transport
offset. Besides, the TCP Pseudo Header of the IPv6 TSO of the hw
bases on the Microsoft document which excludes the packet length.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: support TSO

Support scatter gather and TSO.

Adjust the tx checksum function and set the max gso size to fix the
size of the tx aggregation buffer.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: support rx checksum

Support hw rx checksum for TCP and UDP packets.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: calculate the dropped packets for rx

Continue dealing with the remain rx packets, even though the allocation
of the skb fail. This could calculate the correct dropped packets.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: up the priority of the transmission

move the tx_bottom() from delayed_work to tasklet. It makes the rx
and tx balanced. If the device is in runtime suspend when getting
the tx packet, wakeup the device before trasmitting.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: check tx agg list before spin lock

Check tx agg list before spin lock to avoid doing spin lock every
times.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: replace spin_lock_irqsave and spin_unlock_irqrestore

Use spin_lock and spin_unlock in interrupt context.

The ndo_start_xmit would not be called in interrupt context, so
replace the relative spin_lock_irqsave and spin_unlock_irqrestore
with spin_lock_bh and spin_unlock_bh.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e and i40evf.

Most notable are:
Joseph completes the implementation of the ethtool ntuple rule
management interface by adding the get, update and delete interface
reset.

Akeem provides a fix to prevent a possible overflow due to multiplication
of number and size by using kzalloc, so use kcalloc.

Jesse provides an implementation for skb_set_hash() and adds the L4 type
return when we know it is an L4 hash.  He also adds a counter to
statistics for Tx timeouts to help users.  Lastly he provides a change
to stay away from the cache line where the done bit may be getting
written back for the transmit ring since the hardware may be writing the
whole cache line for a partial update.

Shannon cleans up code comments.

Anjali removes a firmware workaround for newer firmware since the number
of MSIx vectors are being reported correctly.

v2:
-  dropped patch 01 of the series based on feedback from the author
    Joe Perches and Shannon Nelson.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'rxrpc-devel-20140304' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
net-next: AF_RXRPC fixes and development

Here are some AF_RXRPC fixes:

(1) Fix to remove incorrect checksum calculation made during recvmsg().  It's
     unnecessary to try to do this there since we check the checksum before
     reading the RxRPC header from the packet.

(2) Fix to prevent the sending of an ABORT packet in response to another
     ABORT packet and inducing a storm.

(3) Fix UDP MTU calculation from parsing ICMP_FRAG_NEEDED packets where we
     don't handle the ICMP packet not specifying an MTU size.

And development patches:

(4) Add sysctls for configuring RxRPC parameters, specifically various delays
     pertaining to ACK generation, the time before we resend a packet for
     which we don't receive an ACK, the maximum time a call is permitted to
     live and the amount of time transport, connection and dead call
     information is cached.

(5) Improve ACK packet production by adjusting the handling of ACK_REQUESTED
     packets, ignoring the MORE_PACKETS flag, delaying the production of
     otherwise immediate ACK_IDLE packets and delaying all ACK_IDLE production
     (barring the call termination) to half a second.

(6) Add more sysctl parameters to expose the Rx window size, the maximum
     packet size that we're willing to receive and the number of jumbo rxrpc
     packets we're willing to handle in a single UDP packet.

(7) Request ACKs on alternate DATA packets so that the other side doesn't
     wait till we fill up the Tx window.

(8) Use a RCU hash table to look up the rxrpc_call for an incoming packet
     rather than stepping through a hierarchy involving several spinlocks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'xen-netback-next'

Zoltan Kiss says:

====================
xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy

A long known problem of the upstream netback implementation that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge perfomance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, however it seems to be very invasive on the network stack's code,
and therefore haven't progressed very well.
This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower AMD
Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
switch)
Based on my investigations the packet get only copied if it is delivered to
Dom0 IP stack through deliver_skb, which is due to this [2] patch. This affects
DomU->Dom0 IP traffic and when Dom0 does routing/NAT for the guest. That's a bit
unfortunate, but luckily it doesn't cause a major regression for this usecase.
In the future we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
us avoiding TLB flushing
- use something else than ballooned pages
- fix grant map to use page->index properly
I've tried to broke it down to smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1: Use skb->cb to store pending_idx
2: Some refactoring
3: Change RX path for mapped SKB fragments (moved here to keep bisectability,
review it after #4)
4: Introduce TX grant mapping
5: Remove old TX grant copy definitons and fix indentations
6: Add stat counters for zerocopy
7: Handle guests with too many frags
8: Timeout packets in RX path
9: Aggregate TX unmap operations

v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters, and handling the important use case when an older guest
sends lots of slots. Instead of delayed copy now we timeout packets on the RX
path, based on the assumption that otherwise packets should get stucked
anywhere else. Finally some unmap batching to avoid too much TLB flush

v3: Apart from fixing a few things mentioned in responses the important change
is the use the hypercall directly for grant [un]mapping, therefore we can
avoid m2p override.

v4: Now we are using a new grant mapping API to avoid m2p_override. The RX queue
timeout logic changed also.

v5: Only minor fixes based on Wei's comments

v6: Important bugfixes for xenvif_poll exit path and zerocopy callback, see
first 2 patches. Also rework of handling packets with too many slots, and
reorder the series a bit.

v7: Small fixes in comments/log messages/error paths, and merging the frag
overflow stats patch into its parent.

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363
====================

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: Aggregate TX unmap operations

Unmapping causes TLB flushing, therefore we should make it in the largest
possible batches. However we shouldn't starve the guest for too long. So if
the guest has space for at least two big packets and we don't have at least a
quarter ring to unmap, delay it for at most 1 milisec.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: Timeout packets in RX path

A malicious or buggy guest can leave its queue filled indefinitely, in which
case qdisc start to queue packets for that VIF. If those packets came from an
another guest, it can block its slots and prevent shutdown. To avoid that, we
make sure the queue is drained in every 10 seconds.
The QDisc queue in worst case takes 3 round to flush usually.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: Handle guests with too many frags

Xen network protocol had implicit dependency on MAX_SKB_FRAGS. Netback has to
handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To achieve that:
- create a new skb
- map the leftover slots to its frags (no linear buffer here!)
- chain it to the previous through skb_shinfo(skb)->frag_list
- map them
- copy and coalesce the frags into a brand new one and send it to the stack
- unmap the 2 old skb's pages

It's also introduces new stat counters, which help determine how often the guest
sends a packet with more than MAX_SKB_FRAGS frags.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise malicious guests can block
other guests by not releasing their sent packets.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: Add stat counters for zerocopy

These counters help determine how often the buffers had to be copied. Also
they help find out if packets are leaked, as if "sent != success + fail",
there are probably packets never freed up properly.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise Windows guests can't work
properly and malicious guests can block other guests by not releasing their sent
packets.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: Remove old TX grant copy definitons and fix indentations

These became obsolete with grant mapping. I've left intentionally the
indentations in this way, to improve readability of previous patches.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise Windows guests can't work
properly and malicious guests can block other guests by not releasing their sent
packets.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: Introduce TX grant mapping

This patch introduces grant mapping on netback TX path. It replaces grant copy
operations, ditching grant copy coalescing along the way. Another solution for
copy coalescing is introduced in "xen-netback: Handle guests with too many
frags", older guests and Windows can broke before that patch applies.
There is a callback (xenvif_zerocopy_callback) from core stack to release the
slots back to the guests when kfree_skb or skb_orphan_frags called. It feeds a
separate dealloc thread, as scheduling NAPI instance from there is inefficient,
therefore we can't do dealloc from the instance.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: Handle foreign mapped pages on the guest RX path

RX path need to know if the SKB fragments are stored on pages from another
domain.
Logically this patch should be after introducing the grant mapping itself, as
it makes sense only after that. But to keep bisectability, I moved it here. It
shouldn't change any functionality here. xenvif_zerocopy_callback and
ubuf_to_vif are just stubs here, they will be introduced properly later on.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: Minor refactoring of netback code

This patch contains a few bits of refactoring before introducing the grant
mapping changes:
- introducing xenvif_tx_pending_slots_available(), as this is used several
times, and will be used more often
- rename the thread to vifX.Y-guest-rx, to signify it does RX work from the
guest point of view

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: Use skb->cb for pending_idx

Storing the pending_idx at the first byte of the linear buffer never looked
good, skb->cb is a more proper place for this. It also prevents the header to
be directly grant copied there, and we don't have the pending_idx after we
copied the header here, so it's time to change it.
It also introduces helpers for the RX side

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: keep original skb ownership

There is no reason to orphan skb in l2tp.

This breaks things like per socket memory limits, TCP Small queues...

Fix this before more people copy/paste it.

This is very similar to commit 8f646c922d550
("vxlan: keep original skb ownership")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not leak non zero tstamp in output packets

Usage of skb->tstamp should remain private to TCP stack
(only set on packets on write queue, not on cloned ones)

Otherwise, packets given to loopback interface with a non null tstamp
can confuse netif_rx() / net_timestamp_check()

Other possibility would be to clear tstamp in loopback_xmit(),
as done in skb_scrub_packet()

Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: add bittiming check at interface open for CAN FD

Additionally to have the second (data) bitrate available the data bitrate
has to be greater or equal to the arbitration bitrate in CAN FD.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: allow to change the device mtu for CAN FD capable devices

The configuration for CAN FD depends on CAN_CTRLMODE_FD enabled in the driver
specific ctrlmode_supported capabilities.

The configuration can be done either with the 'fd { on | off }' option in the
'ip' tool from iproute2 or by setting the CAN netdevice MTU to CAN_MTU (16) or
to CANFD_MTU (72).

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: introduce the data bitrate configuration for CAN FD

As CAN FD offers a second bitrate for the data section of the CAN frame the
infrastructure for storing and configuring this second bitrate is introduced.
Improved the readability of the if-statement by inserting some newlines.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: provide a separate bittiming_const parameter to bittiming functions

As the bittiming calculation functions are to be used with different
bittiming_const structures for CAN and CAN FD the direct reference to
priv->bittiming_const inside these functions has to be removed.

Also moved the check for existing bittiming const to one place.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: move sanity check for bitrate and tq into can_get_bittiming

This patch moves a sanity check in order to have a second user for CAN FD.
Also simplify the return value generation in can_get_bittiming() as only
correct return values of can_[calc|fixup]_bittiming() lead to a return value of
zero.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: only send bitrate data via netlink when available

When setting the bitrate both can_calc_bittiming() and can_fixup_bittiming()
lead to the bitrate variable to be set, when a proper bit timing is available.
Only then the bitrate configuration is stored for the device, so checking for
priv->bittiming.bitrate is always sufficient.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: preserve skbuff protocol in can_put_echo_skb

The skbuff protocol value was formerly fixed/sanitized to ETH_P_CAN in
can_put_echo_skb(). With CAN FD this value has to be preserved.
This patch changes the hard assignment of the protocol value to a check of
valid protocol values for CAN and CAN FD.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: janz-ican3: convert dev_<level> printing to netdev_<level>

This patch converts the dev_<level> printing to netdev_<level>, this makes it
possible to remove the "struct device *dev" pointer from the "struct
ican3_dev".

Cc: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

i40e/i40evf: Bump pf&vf build versions

Bump i40e to 0.3.34 and i40evf to 0.9.14.

Change-ID: I6b3fb8ccf55b128d2baa4bdc20d3911ec81d4a5b
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: carefully fill tx ring

We need to make sure that we stay away from the cache line
where the DD bit (done) may be getting written back for
the transmit ring since the hardware may be writing the
whole cache line for a partial update.

Change-ID: Id0b6dfc01f654def6a2a021af185803be1915d7e
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix nvm version and remove firmware report

The driver needs to use the format that the current NVM
uses when printing the version of the NVM. It should remain
this way from now on forward.

The driver was reporting when firmware was less than
an expected version number, but this is not a requirement
for the product and we print the firmware number at
init and in ethtool -i output. Just remove the print.

Change-ID: Ide0b856cd454ebf867610ef9a0d639bb358a4a60
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix static checker warning

This patch fixes the following static checker warning:

drivers/net/ethernet/intel/i40e/i40e_dcb.c:342
i40e_lldp_to_dcb_config() warn: 'tlv' can't be NULL.

Exit criteria from the while loop is encountering LLDP END
LV or if the TLV length goes beyond the buffer length.

Change-ID: I7548b16db90230ec2ba0fa791b0343ca8b7dd5bb
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Tested-By: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Remove a redundant filter addition

Remove a redundant filter addition to stop FW complaints about a redundant
filter removal.

Change-ID: I22bef6b682bd8d43432557e6e2b3e73ffb27b985
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: count timeout events

The ethtool -S statistics should have a counter for
tx timeouts in order to better help inform the masses.

Change-ID: Ice4b20ed4a151509f366719ab105be49c9e7b2b4
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Remove a FW workaround for Number of MSIX vectors

The Number of MSIX vectors being reported is correct and hence
we need a check to do the right thing for FWs before and after.

Change-ID: I50902d1c848adcb960ea49ac73f7865ca871a1c3
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: clean up comment style

Lots of trivial changes to remove double spaces in function headers,
unnecessary periods in short comments, and adjust the English usage here
and there.

No actual code was harmed in the making of this patch.

Change-ID: I6e756c500756945e81a61ffb10221753eb7923ea
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: i40e implementation for skb_set_hash

Original comment from Tom Herbert <therbert@google.com>

Drivers should call skb_set_hash to set the hash and its type
in an skbuff.

This patch builds upon Tom's original implementation and adds
the L4 type return when we know it is an L4 hash.
This requires use of the ptype decoder ring, so enable it.

Change-ID: I2f9fa86d1a6add58cff13386f7f4238b1abcc468
CC: Tom Herbert <therbert@google.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Prevent overflow due to kzalloc

To prevent the possibility of overflow due multiplication of number and size
use kcalloc instead of kzalloc.

Change-ID: Ibe4d81ed7d9738d3bbe66ee4844ff9be817e8080
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Flow Director sideband accounting

This patch completes implementation of the ethtool ntuple
rule management interface. It adds the get, update and delete
interface reset.

Change-ID: Ida7f481d9ee4e405ed91340b858eabb18a52fdb5
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: Enable the ndo_set_features netdev op

Set netdev->hw_features to enable the ndo_set_features netdev op.

Change-Id: I5a086fbfa5a089de5adba2800c4d0b3a73747b11
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

bonding: fix const in options processing

This is a fixup patch to resolve issues with const from my earlier patch.
Make all the setter functions use const on input parameter.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: htb: do not acquire qdisc lock in dump operations

htb_dump() and htb_dump_class() do not strictly need to acquire
qdisc lock to fetch qdisc and/or class parameters.

We hold RTNL and no changes can occur.

This reduces by 50% qdisc lock pressure while doing tc qdisc|class dump
operations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '6lowpan'

Alexander Aring says:

====================
6lowpan: header cleanup

this patch series fix a missing include of 6LoWPAN header and move it
into the include/net directory. Since we did some code sharing with
bluetooth 6LoWPAN the header turns into a generic header for 6LoWPAN.
Instead to use a relative path in bluetooth 6LoWPAN we can now use
include <net/6lowpan.h>.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: move 6lowpan header to include/net

This header is used by bluetooth and ieee802154 branch. This patch
move this header to the include/net directory to avoid a use of a
relative path in include.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: add missing include of net/ipv6.h

The 6lowpan.h file contains some static inline function which use
internal ipv6 api structs. Add a include of ipv6.h to be sure that it's
known before.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: do external loopback test only when it is requested

v2: remove unnecessary braces from all 'loopback' if-blocks (thx Sergei)

Cc: sathya.perla@emulex.com
Cc: subbu.seetharaman@emulex.com
Cc: ajit.khaparde@emulex.com
Cc: sergei.shtylyov@cogentembedded.com
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: mlx4_en_verify_params() can be static

Fix static error introduced by commit:
b97b33a3df0439401f80f041eda507d4fffa0dbf [645/653] net/mlx4_en: Verify
mlx4_en module parameters

sparse warnings:
drivers/net/ethernet/mellanox/mlx4/en_main.c:335:6: sparse: symbol
'mlx4_en_verify_params' was not declared. Should it be static?

CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: flexcan: make use of platform_get_device_id()

This patch replaces an open coded pdev->id_entry by platform_get_device_id().

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: flexcan: Remove #ifdef CONFIG_PM_SLEEP

This patch removes #ifdef CONFIG_PM_SLEEP to improve compile coverage.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251x: Remove #ifdef CONFIG_PM_SLEEP

This patch removes #ifdef CONFIG_PM_SLEEP to improve compile coverage.

Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251x: Make driver more quiet

This patch moves one diagnostic message used for debugging purposes
to dev_dbg() and removes one useless message.

Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

bonding: options handling cleanup

Make local functions static (ie. only used in bond_options.c)
Make bond options parsing tables constant.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remove dead code

These functions are defined but no longer used.
Compile tested only.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Use NET_ADD_STATS instead of NET_ADD_STATS_BH in tcp_event_new_data_sent()

Can be invoked from non-BH context.

Based upon a patch by Eric Dumazet.

Fixes: f19c29e3e391 ("tcp: snmp stats for Fast Open, SYN rtx, and data pkts")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: make slave status notifications GFP_ATOMIC

Currently we're using GFP_KERNEL, however there are some path(s) where we
can hold some spinlocks, specifically bond->curr_slave_lock:

[    4.722916] BUG: sleeping function called from invalid context at mm/slub.c:965
[    4.724438] in_atomic(): 1, irqs_disabled(): 0, pid: 940, name: ifup-eth
[    4.726034] 5 locks held by ifup-eth/940:
...snip...
[    4.734646]  #4:  (&bond->curr_slave_lock){+...+.}, at: [<ffffffffa00badc6>] bond_enslave+0xda6/0xdd0 [bonding]
...snip...
[    4.759081]  [<ffffffffa00b6f11>] bond_change_active_slave+0x191/0x3b0 [bonding]
[    4.760917]  [<ffffffffa00b7227>] bond_select_active_slave+0xf7/0x1d0 [bonding]
[    4.762751]  [<ffffffffa00badce>] bond_enslave+0xdae/0xdd0 [bonding]
...snip...

As it's out of hot path and is a really rare event - change the gfp_t flags
to GFP_ATOMIC to avoid sleeping under spinlock.

v2: convert new notify calls to GFP_ATOMIC.

CC: Thomas Glanzmann <thomas@glanzmann.de>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: remove now unused flag DST_NOPEER

Commit e688a604807647 ("net: introduce DST_NOPEER dst flag") introduced
DST_NOPEER because because of crashes in ipv6_select_ident called from
udp6_ufo_fragment.

Since commit 916e4cf46d0204 ("ipv6: reuse ip6_frag_id from
ip6_ufo_append_data") we don't call ipv6_select_ident any more from
ip6_ufo_append_data, thus this flag lost its purpose and can be removed.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8152'

Hayes Wang says:

====================
r8152: cleanups

Deal with some empty lines and spaces, replace some tp->netdev with netdev,
and remove the unnecessary function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: remove rtl8152_get_stats

The rtl8152_get_stats() returns the point address of the struct
net_device_stats. This could be got from struct net_device directly.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: replace tp->netdev with netdev

Replace some tp->netdev with netdev.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: deal with the empty line and space

Add or remove some empty lines. Replace the spaces with the tabs.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/wireless/ath/ath9k/recv.c
drivers/net/wireless/mwifiex/pcie.c
net/ipv6/sit.c

The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.

The two wireless conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: fix whitespace issues in Kconfig

This patch fixes some whitespace issues in Kconfig files of IEEE
802.15.4 subsytem.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: add help and 212 to Kconfig menu entry

Since commit 8fad346f366a72978ea942abd06bd501ebd39c22
(ieee802154: add basic support for RF212 to at86rf230 driver)

we support at86rf212 as well.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: dma_sync each RX frag before passing it to the stack

The driver currently maps a page for DMA, divides the page into multiple
frags and posts them to the HW. It un-maps the page after data is received
on all the frags of the page. This scheme doesn't work when bounce buffers
are used for DMA (swiotlb=force kernel param).

This patch fixes this problem by calling dma_sync_single_for_cpu() for each
frag (excepting the last one) so that the data is copied from the bounce
buffers. The page is un-mapped only when DMA finishes on the last frag of
the page.
(Thanks Ben H. for suggesting the dma_sync API!)

This patch also renames the "last_page_user" field of be_rx_page_info{}
struct to "last_frag" to improve readability of the fixed code.

Reported-by: Li Fengmao <li.fengmao@zte.com.cn>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mpls_tc'

Simon Wunderlich says:

====================
this series contains a header file proposal for MPLS labels. These
labels do not seem to be properly defined in the kernel so far. We are
developing a wired/wireless 802.21/MPLS switch and need to check the
MPLS labels to use the traffic control info for transmissions over
802.11 networks.

Changes to third version:

* rename mpls_label_stack to mpls_label (thanks Neil)
* fix over-indendented closing brac (thanks Sergei)
* add Johannes' Ack
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cfg80211: add MPLS and 802.21 classification

MPLS labels may contain traffic control information, which should be
evaluated and used by the wireless subsystem if present.

Also check for IEEE 802.21 which is always network control traffic.

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

UAPI: add MPLS label stack definition

Labels for the Multiprotocol Label Switching are defined in RFC 3032
which was superseded by RFC 5462. Add the definition to UAPI and a stub
header for include/linux.

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

if_ether.h: add IEEE 802.21 Ethertype

Add the Ethertype for IEEE Std 802.21 - Media Independent Handover
Protocol. This Ethertype is used for network control messages.

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix memory leak in ieee80211_prep_connection(), sta_info leaked on
    error.  From Eytan Lifshitz.

2) Unintentional switch case fallthrough in nft_reject_inet_eval(),
    from Patrick McHardy.

3) Must check if payload lenth is a power of 2 in
    nft_payload_select_ops(), from Nikolay Aleksandrov.

4) Fix mis-checksumming in xen-netfront driver, ip_hdr() is not in the
    correct place when we invoke skb_checksum_setup().  From Wei Liu.

5) TUN driver should not advertise HW vlan offload features in
    vlan_features.  Fix from Fernando Luis Vazquez Cao.

6) IPV6_VTI needs to select NET_IPV_TUNNEL to avoid build errors, fix
    from Steffen Klassert.

7) Add missing locking in xfrm_migrade_state_find(), we must hold the
    per-namespace xfrm_state_lock while traversing the lists.  Fix from
    Steffen Klassert.

8) Missing locking in ath9k driver, access to tid->sched must be done
    under ath_txq_lock().  Fix from Stanislaw Gruszka.

9) Fix two bugs in TCP fastopen.  First respect the size argument given
    to tcp_sendmsg() in the fastopen path, and secondly prevent
    tcp_send_syn_data() from potentially using order-5 allocations.
    From Eric Dumazet.

10) Fix handling of default neigh garbage collection params, from Jiri
    Pirko.

11) Fix cwnd bloat and over-inflation of RTT when transmit segmentation
    is in use.  From Eric Dumazet.

12) Missing initialization of Realtek r8169 driver's statistics
    seqlocks.  Fix from Kyle McMartin.

13) Fix RTNL assertion failures in 802.3ad and AB ARP monitor of bonding
    driver, from Ding Tianhong.

14) Bonding slave release race can cause divide by zero, fix from
    Nikolay Aleksandrov.

15) Overzealous return from neigh_periodic_work() causes reachability
    time to not be computed.  Fix from Duain Jiong.

16) Fix regression in ipv6_find_hdr(), it should not return -ENOENT when
    a specific target is specified and found.  From Hans Schillstrom.

17) Fix VLAN tag stripping regression in BNA driver, from Ivan Vecera.

18) Tail loss probe can calculate bogus RTTs due to missing packet
    marking on retransmit.  Fix from Yuchung Cheng.

19) We cannot do skb_dst_drop() in iptunnel_pull_header() because
    multicast loopback detection in later code paths need access to
    skb_rtable().  Fix from Xin Long.

20) The macvlan driver regresses in that it propagates lower device
    offload support disables into itself, causing severe slowdowns when
    running over a bridge.  Provide the software offloads always on
    macvlan devices to deal with this and the regression is gone.  From
    Vlad Yasevich.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
  macvlan: Add support for 'always_on' offload features
  net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable
  ip_tunnel:multicast process cause panic due to skb->_skb_refdst NULL pointer
  net: cpsw: fix cpdma rx descriptor leak on down interface
  be2net: isolate TX workarounds not applicable to Skyhawk-R
  be2net: Fix skb double free in be_xmit_wrokarounds() failure path
  be2net: clear promiscuous bits in adapter->flags while disabling promiscuous mode
  be2net: Fix to reset transparent vlan tagging
  qlcnic: dcb: a couple off by one bugs
  tcp: fix bogus RTT on special retransmission
  hsr: off by one sanity check in hsr_register_frame_in()
  can: remove CAN FD compatibility for CAN 2.0 sockets
  can: flexcan: factor out soft reset into seperate funtion
  can: flexcan: flexcan_remove(): add missing netif_napi_del()
  can: flexcan: fix transition from and to freeze mode in chip_{,un}freeze
  can: flexcan: factor out transceiver {en,dis}able into seperate functions
  can: flexcan: fix transition from and to low power mode in chip_{en,dis}able
  can: flexcan: flexcan_open(): fix error path if flexcan_chip_start() fails
  can: flexcan: fix shutdown: first disable chip, then all interrupts
  USB AX88179/178A: Support D-Link DUB-1312
  ...

Merge tag 'regulator-v3.14-rc5' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A couple of fixes here which ensure that regulators using the core
  support for GPIO enables work in all cases by ensuring that helpers
  are used consistently rather than open coding in places and hence not
  having GPIO support in some of them"

* tag 'regulator-v3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: core: Replace direct ops->disable usage
  regulator: core: Replace direct ops->enable usage

Merge branch 'akpm' (patches from Andrew Morton)

Merge misc fixes from Andrew Morton.

* emailed patches from Andrew Morton akpm@linux-foundation.org>:
  mm: page_alloc: exempt GFP_THISNODE allocations from zone fairness
  mm: numa: bugfix for LAST_CPUPID_NOT_IN_PAGE_FLAGS
  MAINTAINERS: add and correct types of some "T:" entries
  MAINTAINERS: use tab for separator
  rapidio/tsi721: fix tasklet termination in dma channel release
  hfsplus: fix remount issue
  zram: avoid null access when fail to alloc meta
  sh: prefix sh-specific "CCR" and "CCR2" by "SH_"
  ocfs2: fix quota file corruption
  drivers/rtc/rtc-s3c.c: fix incorrect way of save/restore of S3C2410_TICNT for TYPE_S3C64XX
  kallsyms: fix absolute addresses for kASLR
  scripts/gen_initramfs_list.sh: fix flags for initramfs LZ4 compression
  mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking
  memcg: reparent charges of children before processing parent
  memcg: fix endless loop in __mem_cgroup_iter_next()
  lib/radix-tree.c: swapoff tmpfs radix_tree: remember to rcu_read_unlock
  dma debug: account for cachelines and read-only mappings in overlap tracking
  mm: close PageTail race
  MAINTAINERS: EDAC: add Mauro and Borislav as interim patch collectors

mm: page_alloc: exempt GFP_THISNODE allocations from zone fairness

Jan Stancek reports manual page migration encountering allocation
failures after some pages when there is still plenty of memory free, and
bisected the problem down to commit 81c0a2bb515f ("mm: page_alloc: fair
zone allocator policy").

The problem is that GFP_THISNODE obeys the zone fairness allocation
batches on one hand, but doesn't reset them and wake kswapd on the other
hand. After a few of those allocations, the batches are exhausted and
the allocations fail.

Fixing this means either having GFP_THISNODE wake up kswapd, or
GFP_THISNODE not participating in zone fairness at all. The latter
seems safer as an acute bugfix, we can clean up later.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>