git.cascardo.info Git - cascardo/linux.git/log

selftests/net: test classic bpf fanout mode

Test PACKET_FANOUT_CBPF by inserting a cBPF program that selects a
socket by payload. Requires modifying the test program to send
packets with multiple payloads.

Also fix a bug in testing the return value of mmap()

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: add extended BPF fanout mode

Add fanout mode PACKET_FANOUT_EBPF that accepts an en extended BPF
program to select a socket.

Update the internal eBPF program by passing to socket option
SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf().

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: add classic BPF fanout mode

Add fanout mode PACKET_FANOUT_CBPF that accepts a classic BPF program
to select a socket.

This avoids having to keep adding special case fanout modes. One
example use case is application layer load balancing. The QUIC
protocol, for instance, encodes a connection ID in UDP payload.

Also add socket option SOL_PACKET/PACKET_FANOUT_DATA that updates data
associated with the socket group. Fanout mode PACKET_FANOUT_CBPF is the
only user so far.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lwtunnel: rename ip lwtunnel attributes

We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look
very similar, yet they serve very different purpose. This is confusing for
anyone trying to implement a user space tool supporting lwt.

As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix
them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more
logical to have them in lwtunnel.h together with the encap enum.

Fixes: 3093fbe7ff4b ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

smsc911x: Fix crash seen if neither ACPI nor OF is configured or used

Commit 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT") makes
the call to smsc911x_probe_config() unconditional, and no longer fails if
there is no device node. device_get_phy_mode() is called unconditionally,
and if there is no phy node configured returns an error code. This error
code is assigned to phy_interface, and interpreted elsewhere in the code
as valid phy mode. This in turn causes qemu to crash when running a
variant of realview_pb_defconfig.

qemu: hardware error: lan9118_read: Bad reg 0x86

Fixes: 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT")
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-08-17

1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels.
   From Thomas Egerer.

2) Use kmemdup instead of duplicating it in xfrm_dump_sa().
   From Andrzej Hajda.

3) Pass oif to the xfrm lookups so that it gets set on the flow
   and the resolver routines can match based on oif.
   From David Ahern.

4) Add documentation for the new xfrm garbage collector threshold.
   From Alexander Duyck.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix endian check warning in etherdevice.h

Sparse builds have been warning for a really long time now
that etherdevice.h has a conversion that is unsafe.

include/linux/etherdevice.h:79:32: warning: restricted __be16 degrades to integer

This code change fixes the issue and generates the exact
same assembly before/after (checked on x86_64)

Fixes: 2c722fe1c821 (etherdevice: Optimize a few is_<foo>_ether_addr functions)
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'iff_no_queue'

Phil Sutter says:

====================
net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len

This series adds a new private net_device flag indicating that a device may
(and probably should) be used without a queueing discipline attached to it.
This is already common practice for many virtual device types like e.g.
loopback, VLAN (802.1Q) or bridges (802.1D). The reason for this is that these
devices lack an underlying layer which could impose back pressure and therefore
making a TX queue necessary to not slow down senders.

Up to now, drivers being aware of the above applying to them set
dev->tx_queue_len to zero to indicate no qdisc should be attached to the
interface they drive and the kernel reacts upon this by assigning the noop
qdisc instead of the default pfifo_fast. This implicit agreement though leads
to an inconvenient situation once a user tries to attach a real qdisc to these
devices, as the formerly special tx_queue_len value becomes a regular one,
limiting the queue to zero packets and thus prevents any TX from happening. To
overcome this, practically all qdisc implementations intercept and sanitize the
malicious value.

With this series applied, drivers may signal the lack of need for a qdisc
without having to tamper with tx_queue_len, making fallbacks in qdiscs and
caveats in userspace unnecessary.

Upon upstream acceptance, this series will be followed up by a set of patches
converting device drivers, adding a warning so out-of-tree driver authors get
aware of this change and dropping all special handling of tx_queue_len in
net/sched/.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sch_generic: react upon IFF_NO_QUEUE flag

Handle IFF_NO_QUEUE as alternative to tx_queue_len being zero.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: declare new net_device priv_flag IFF_NO_QUEUE

This private net_device flag can be set by drivers to inform that a
device runs fine without a qdisc attached. This was formerly done by
setting tx_queue_len to zero.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: don't sanity check non-existing TLV (NL compat)

A zero length payload means that no TLV (Type Length Value) data has
been passed. Prior to this patch a non-existing TLV could be sanity
checked with TLV_OK() resulting in random behavior where a user
sending an empty message occasionally got a incorrect "operation not
supported" message back.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Fix bandwidth allocation for some MF modes

Management firmware tells driver in case bandwidth configuration for
a specific function exists, but [regretably] the same field has different
meanings depending on the multi-function mode - it can either be
a percentile value or an actual speed.

For newer multi-function modes current logic is incorrect -
driver understands values as actual speeds instead of percentages,
causing the resulting chip configuration to be incorrect.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fix refcount leak in fib_check_nh()

fib_lookup() forces FIB_LOOKUP_NOREF flag, while fib_table_lookup()
does not.

This patch solves the typical message at reboot time or device
dismantle :

unregister_netdevice: waiting for eth0 to become free. Usage count = 4

Fixes: 3bfd847203c6 ("net: Use passed in table for nexthop lookups")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vrf-lite'

David Ahern says:

====================
VRF-lite - v6

In the context of internet scale routing a requirement that always comes
up is the need to partition the available routing tables into disjoint
routing planes. A specific use case is the multi-tenancy problem where
each tenant has their own unique routing tables and in the very least
need different default gateways.

This patch allows the ability to create virtual router domains (aka VRFs
(VRF-lite to be specific) in the linux packet forwarding stack. The main
observation is that through the use of rules and socket binding to interfaces,
all the facilities that we need are already present in the infrastructure. What
is missing is a handle that identifies a routing domain and can be used to
gather applicable rules/tables and uniqify neighbor selection. The scheme used
needs to preserves the notions of ECMP, and general routing principles.

This driver is a cross between functionality that the IPVLAN driver
and the Team drivers provide where a device is created and packets
into/out of the routing domain are shuttled through this device. The
device is then used as a handle to identify the applicable rules. The
VRF device is thus the layer3 equivalent of a vlan device.

The very important point to note is that this is only a Layer3 concept
so L2 tools (e.g., LLDP) do not need to be run in each VRF, processes can
run in unaware mode or select a VRF to be talking through. Also the
behavioral model is a generalized application of the familiar VRF-Lite
model with some performance paths that need optimization. (Specifically
the output route selector that Roopa, Robert, Thomas and EricB are
currently discussing on the MPLS thread)

High Level points
=================
1. Simple overlay driver (minimal changes to current stack)
   * uses the existing fib tables and fib rules infrastructure
2. Modelled closely after the ipvlan driver
3. Uses current API and infrastructure.
   * Applications can use SO_BINDTODEVICE or cmsg device indentifiers
     to pick VRF (ping, traceroute just work)
   * Standard IP Rules work, and since they are aggregated against the
     device, scale is manageable
4. Completely orthogonal to Namespaces and only provides separation in
   the routing plane (and ARP)

                                                 N2
           N1 (all configs here)          +---------------+
    +--------------+                      |               |
    |swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 |
    |              |                      |               |
    |swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 |
    |              |                      +---------------+
    | VRF 1        |
    | table 5      |
    |              |
    +---------------+
    |              |
    | VRF 2        |                             N3
    | table 6      |                      +---------------+
    |              |                      |               |
    |swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 |
    |              |                      |               |
    |swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 |
    +--------------+                      +---------------+

Given the topology above, the setup needed to get the basic VRF
functions working would be

Create the VRF devices and associate with a table
    ip link add vrf1 type vrf table 5
    ip link add vrf2 type vrf table 6

Install the lookup rules that map table to VRF domain
    ip rule add pref 200 oif vrf1 lookup 5
    ip rule add pref 200 iif vrf1 lookup 5
    ip rule add pref 200 oif vrf2 lookup 6
    ip rule add pref 200 iif vrf2 lookup 6

    ip link set vrf1 up
    ip link set vrf2 up

Enslave the routing member interfaces
    ip link set swp1 master vrf1
    ip link set swp2 master vrf1
    ip link set swp3 master vrf2
    ip link set swp4 master vrf2

Connected and local routes are automatically moved from main and local
tables to the VRF table.

ping using VRF0 is simply
    ping -I vrf0 10.0.1.2

Design Highlights
=================
If a device is enslaved to a VRF device (ie., associated with a VRF)
then:
1. Rx path
   The master device index is used as the iif for all lookups.

2. Tx path
   Similarly, for Tx the VRF device oif is used in the flow to direct
   lookups to the table associated with the VRF via its rule. From there
   the FLOWI_FLAG_VRFSRC flag is used to indicate that the oif should
   not be used for FIB table lookups.

3. Connected and local routes
   On link up for a device, connected and local routes are added to the
   table associated with the VRF device, rather than the local and main
   tables.

4. Socket lookups
   Sockets operating in the VRF must be bound to the VRF device. As such
   socket lookups compare the VRF device index to sk_bound_dev_if.

5. Neighbor entries
   Neighbor entries are not impacted by the VRF device. Entries are
   associated with a particular interface; the VRF association is indirect
   via the interface-to-VRF device enslavement.

Version 6
- addressed comments from DaveM

- added patch to properly set oif in ip_send_unicast_reply. Needs to be
  set to VRF device for proper FIB lookup

- added patch to handle IP fragments

Version 5
- dropped patch regarding socket lookups; no longer needed
  + removed vrf helpers no longer needed after this patch is dropped
- removed dev_open and close operations
  + no need to reset vrf data on an ifdown and creates problems if a
    slave is deleted while the vrf interface is down (Thanks, Nikolay)
- cleanups for sparse warnings
  + make C=2 is now clean for vrf driver

Version 4
- builds are clean with and without VRF device enabled (no, yes and module)
- tightened the driver implementation
  + device add/delete, slave add/remove, and module unload are all clean
- fixed RCU references
  + with RCU and lock debugging enabled changes are clean through the
    suite of tests
- TX path uses custom dst, so patch refactoring rtable allocation is
  dropped along with the patch adding rt_nexthop helper
- dropped the task patch that adds default bind to interface for sockets
  and the associated chvrf example command
  + the patches are a convenience for running unmodified code. They
    are not needed for the core functionality. Any application with
    support for SO_BINDTODEVICE works properly with this patch set.

Version 3
- addressed comments from first 2 RFCs with the exception of the name
  Nicolas: We will do the name conversion once we agree on what the
           correct name should be (vrf, mrf or something else)

-  packets flow through the VRF device in both directions allowing the
   following:
   - tcpdump -i vrf<n>
   - tc rules on vrf device
   - netfilter rules on vrf device

TO-DO
=====
1. IPv6

2. ipsec, xfrms
   - dst patch accepted into ipsec-next; will post VRF patch once merge happens

3. listen filter to allow 1 socket to work with multiple VRF devices
   - i.e., bind to VRF's a, b, c only or NOT VRFs e, f, g

Eric B:
  I have ipsec working with VRFs implemented using the VRF driver,
  including the worst case scenario of complete duplication in the
  networking config.

Thanks to Nikolay for his many, many code reviews whipping the device
driver into shape, and bug-Fixes and ideas from Hannes, Roopa Prabhu,
Jon Toppins, Jamal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Introduce VRF device driver

This driver borrows heavily from IPvlan and teaming drivers.

Routing domains (VRF-lite) are created by instantiating a VRF master
device with an associated table and enslaving all routed interfaces that
participate in the domain. As part of the enslavement, all connected
routes for the enslaved devices are moved to the table associated with
the VRF device. Outgoing sockets must bind to the VRF device to function.

Standard FIB rules bind the VRF device to tables and regular fib rule
processing is followed. Routed traffic through the box, is forwarded by
using the VRF device as the IIF and following the IIF rule to a table
that is mated with the VRF.

Example:

   Create vrf 1:
     ip link add vrf1 type vrf table 5
     ip rule add iif vrf1 table 5
     ip rule add oif vrf1 table 5
     ip route add table 5 prohibit default
     ip link set vrf1 up

   Add interface to vrf 1:
     ip link set eth1 master vrf1

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: frags: Add VRF device index to cache and lookup

Fragmentation cache uses information from the IP header to reassemble
packets. That information can be duplicated across VRFs -- same source
and destination addresses, protocol and id. Handle fragmentation with
VRFs by adding the VRF device index to entries in the cache and the
lookup arg.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Use VRF index for oif in ip_send_unicast_reply

If output device is not specified use VRF device if input device is
enslaved. This is needed to ensure tcp acks and resets go out VRF device.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Use passed in table for nexthop lookups

If a user passes in a table for new routes use that table for nexthop
lookups. Specifically, this solves the case where a connected route does
not exist in the main table, but only another table and then a subsequent
route is added with a next hop using the connected route. ie.,

$ ip route ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0  proto kernel  scope link  src 10.0.2.15
169.254.0.0/16 dev eth0  scope link  metric 1003
192.168.56.0/24 dev eth1  proto kernel  scope link  src 192.168.56.51

$ ip route ls table 10
1.1.1.0/24 dev eth2  scope link

Without this patch adding a nexthop route fails:

$ ip route add table 10 2.2.2.0/24 via 1.1.1.10
RTNETLINK answers: Network is unreachable

With this patch the route is added successfully.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add routes to the table associated with the device

When a device associated with a VRF is brought up or down routes
should be added to/removed from the table associated with the VRF.
fib_magic defaults to using the main or local tables. Have it use
the table with the device if there is one.

A part of this is directing prefsrc validations to the correct
table as well.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix up inet_addr_type checks

Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.

inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
if the passed in device is enslaved to a VRF then the table for that VRF
is used for the lookup.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add inet_addr lookup by table

Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Handle VRF device in sendmsg

For unconnected UDP sockets using a VRF device lookup source address
based on VRF table. This allows the UDP header to be properly setup
before showing up at the VRF device via the dst.

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Use VRF device index for lookups on TX

As with ingress use the index of VRF master device for route lookups on
egress. However, the oif should only be used to direct the lookups to a
specific table. Routes in the table are not based on the VRF device but
rather interfaces that are part of the VRF so do not consider the oif for
lookups within the table. The FLOWI_FLAG_VRFSRC is used to control this
latter part.

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Use VRF device index for lookups on RX

On ingress use index of VRF master device for route lookups if real device
is enslaved. Rules are expected to be installed for the VRF device to
direct lookups to a specific table.

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Introduce VRF related flags and helpers

Add a VRF_MASTER flag for interfaces and helper functions for determining
if a device is a VRF_MASTER.

Add link attribute for passing VRF_TABLE id.

Add vrf_ptr to netdevice.

Add various macros for determining if a device is a VRF device, the index
of the master VRF device and table associated with VRF device.

Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: addr IFLA_OPERSTATE to netlink message for ipv6 ifinfo

This is useful information to include in ipv6 netlink messages that
report interface information. IFLA_OPERSTATE is already included in
ipv4 messages, but missing for ipv6. This closes that gap.

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow sleeping when modifying store_rps_map

Commit 10e4ea751 ("net: Fix race condition in store_rps_map") has moved the
manipulation of the rps_needed jump label under a spinlock. Since changing
the state of a jump label may sleep this is incorrect and causes warnings
during runtime.

Make rps_map_lock a mutex to allow sleeping under it.

Fixes: 10e4ea751 ("net: Fix race condition in store_rps_map")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mv88e6xxx-hw-vlan'

Vivien Didelot says:

====================
net: dsa: mv88e6xxx: add hardware VLAN support

This patchset brings support to access hardware VLAN entries in DSA and
mv88e6xxx, through switchdev VLAN objects.

In the following example, ports swp[0-2] belong to bridge br0, and ports
swp[3-4] belong to bridge br1. Here's an example of what can be achieved
after this patchset:

    # bridge vlan add dev swp1 vid 100 master
    # bridge vlan add dev swp2 vid 100 master
    # bridge vlan add dev swp3 vid 100 master
    # bridge vlan add dev swp4 vid 100 master
    # bridge vlan del dev swp1 vid 100 master

The above commands correctly programmed hardware VLAN 100 for port swp2,
while ports swp3 and swp4 use software VLAN 100, as shown with:

    # bridge vlan
    port vlan ids
    swp0 None
    swp0
    swp1 None
    swp1
    swp2 100

    swp2 100

    swp3 100

    swp3
    swp4 100

    swp4
    br0 None
    br1 None

Assuming that port 5 is the CPU port, the hardware VLAN table would
contain the following data:

    VID  FID  SID  0  1  2  3  4  5  6
    100    8    0  x  x  t  x  x  t  x

Where 'x' means excluded, and 't' means tagged.

Also, adding an FDB entry to VLAN 100 for port swp2 like this:

    # bridge fdb add 3c:97:0e:11:6e:30 dev swp2 vlan 100

Would result in the following example output:

    # bridge fdb
    # 01:00:5e:00:00:01 dev eth0 self permanent
    # 01:00:5e:00:00:01 dev eth1 self permanent
    # 00:50:d2:10:78:15 dev swp0 master br0 permanent
    # 00:50:d2:10:78:15 dev swp2 vlan 100 master br0 permanent
    # 3c:97:0e:11:6e:30 dev swp2 vlan 100 self static
    # 00:50:d2:10:78:15 dev swp3 master br1 permanent
    # 00:50:d2:10:78:15 dev swp3 vlan 100 master br1 permanent

And the Address Translation Unit would contain:

    DB   T/P  Vec State Addr
    008  Port 004   e   3c:97:0e:11:6e:30
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: use port 802.1Q mode Secure

This commit changes the 802.1Q mode of each port from Disabled to
Secure. This enables the VLAN support, by checking the VTU entries on
ingress.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: add VLAN Load support

Implement port_pvid_set and port_vlan_add to add new entries in the VLAN
hardware table, and join ports to them.

The patch also implement the STU Get Next and Load Purge operations,
since it is required to have a valid STU entry for at least all VLANs.

Each VLAN has its own forwarding database, with FID num_ports+1 to 4095.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: add VLAN Purge support

Add support for the VTU Load Purge operation and implement the
port_vlan_del driver function to remove a port from a VLAN entry, and
delete the VLAN if the given port was its last member.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: add VLAN support to FDB dump

Add an helper function to read the next valid VLAN entry for a given
port. It is used in the VID to FID conversion function to retrieve the
forwarding database assigned to a given VLAN port.

Finally update the FDB getnext operation to iterate on the next valid
port VLAN when the end of the current database is reached.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: add VLAN Get Next support

Implement the port_pvid_get and vlan_getnext driver functions required
to dump VLAN entries from the hardware, with the VTU Get Next operation.

Some functions and structure will be shared with STU operations, since
their table format are similar (e.g. STU data entries are accessible
with the same registers as VTU entries, except with an offset of 2).

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: flush VTU and STU entries

Implement the VTU Flush operation (which also flushes the STU), so that
warm boots won't preserved old entries.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: add support for switchdev VLAN objects

Add new functions in DSA drivers to access hardware VLAN entries through
SWITCHDEV_OBJ_PORT_VLAN objects:

- port_pvid_get() and vlan_getnext() to dump a VLAN
- port_vlan_del() to exclude a port from a VLAN
- port_pvid_set() and port_vlan_add() to join a port to a VLAN

The DSA infrastructure will ensure that each VLAN of the given range
does not already belong to another bridge. If it does, it will fallback
to software VLAN and won't program the hardware.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6 sysctl option to ignore routes when nexthop link is down

Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link.  The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:

net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, not only will link status be reported to
userspace, but an indication that a nexthop is dead and will not be used
is also reported.

1000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
1000::/8 via 8000::2 dev p8p1  metric 1024  pref medium
7000::/8 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
8000::/8 dev p8p1  proto kernel  metric 256  pref medium
9000::/8 via 8000::2 dev p8p1  metric 2048  pref medium
9000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
fe80::/64 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
fe80::/64 dev p8p1  proto kernel  metric 256  pref medium

This also adds devconf support and notification when sysctl values
change.

v2: drop use of rt6i_nhflags since it is not needed right now

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: track link status of ipv6 nexthops

Add support to track current link status of ipv6 nexthops to match
recent changes that added support for ipv4 nexthops. This takes a
simple approach to track linkdown status for next-hops and simply
checks the dev for the dst entry and sets proper flags that to be used
in the netlink message.

v2: drop use of rt6i_nhflags since it is not needed right now

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add MPS tracing support

Handle TRACE_PKT, stack can sniff them on the first port
Add debubfs enrty to configure tracing for offload traffic like iWARP
& iSCSI for debugging purpose.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fddi: remove HWM_REVERSE() macro

HWM_REVERSE() macro is unused, remove it.

Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: hook ndo_neigh_destroy to cleanup neigh refs in driver

Rocker driver tracks arp_tbl neighs to resolve IPv4 route nexthops.  The
driver uses NETEVENT_NEIGH_UPDATE for neigh adds and updates, but there is
no event when the neigh is removed from the device (such as when the device
goes admin down).  This patches hooks ndo_neigh_destroy so the driver can
know when a neigh is removed from the device.  In response, the driver will
purge the neigh entry from its internal tbl.

I didn't find an in-tree users of ndo_neigh_destroy, so I'm not sure if
this ndo is vestigial or if there are out-of-tree users.  In any case, it
does what I need here.  An alternative design would be to generate
NETEVENT_NEIGH_UPDATE event when neigh is being destroyed, setting state to
NUD_NONE so driver knows neigh entry is dead.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: print switch ID consistent with phys_switch_id sysfs node

On sucessful probe, driver prints the switch ID. This patch changes the
format of the printed ID to match what's used in sysfs phys_switch_id node.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'smsc911x-acpi'

Jeremy Linton says:

====================
Enable smsc911x for use with ACPI

This set of patches enables the front Ethernet port on the
ARM Juno development platform when used with an ACPI enabled kernel.

These patches covert the of_property* calls in the driver to the
DT/ACPI agnostic device_property* calls, and add the arm hardware
id to the acpi_match_table.

To support the above changes I copied a couple routines from
of_net into the properties.c file, and modified them to
be ACPI/DT agnostic. I'm not 100% sure this is the correct location
for these functions. But I think they are required to avoid having
a dozen different implementations scattered across assorted Ethernet
adapters that are being enabled to use ACPI properties.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Convert smsc911x to use ACPI as well as DT

Add ACPI bindings for the smsc911x driver. Convert the DT specific calls
to nonspecific device* calls, This allows the driver to work
with both ACPI and DT configurations. Ethernet should now work when using
ACPI on ARM Juno.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Add a matching set of device_ functions for determining mac/phy

OF has some helper functions for parsing MAC and PHY settings.
In cases where the platform is providing this information rather
than the device itself, there needs to be similar functions for ACPI.

These functions are slightly modified versions of the ones in
of_net which can use information provided via DT or ACPI.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-loss-probe'

Yuchung Cheng says:

====================
minor tail loss probe improvements

This patch series enhance the tail loss probe (TLP) on some error
conditions. When TLP fails to send a probe, it will no longer
extend the RTO. When it fails to send a new packet because of
receiver window limit, it'll try to retransmit the last packet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: TLP retransmits last if failed to send new packet

When TLP fails to send new packet because of receive window
limit, it should fall back to retransmit the last packet instead.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: don't extend RTO on failed loss probe attempts

If TLP was unable to send a probe, it extended the RTO to
now + icsk_rto. But extending the RTO makes little sense
if no TLP probe went out. With this commit, instead of
extending the RTO we re-arm it relative to the transmit time
of the write queue head.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cpsw-errata-workaround'

Mugunthan V N says:

====================
Add AM335x PG1.0 CPSW errata workaround

With commit 870915feabdc ("drivers: net: cpsw: remove
disable_irq/enable_irq as irq can be masked from cpsw itself"),
CPSW on AM335x beagle bone white is broken as there is a errata
for AM335x PG1.0. This patch series implements the workaround by
disabling the interrupts from ARM IRQ controller for AM335x SoC
in addition to the masking of interrupts in CPSW.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: am33xx: update cpsw compatible

CPSW driver has been updated with compatibles for enabling errata
workarounds. So updating cpsw compatibles.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: dra7: update cpsw compatible

CPSW driver has been updated with compatibles for enabling errata
workarounds. So updating cpsw compatibles.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: cpsw: add am335x errata workarround for interrutps

As per Am335x Errata [1] Advisory 1.0.9, The CPSW C0_TX_PEND and
C0_RX_PEND interrupt outputs provide a single transmit interrupt
that combines transmit channel interrupts TXPEND[7:0] and a
single receive interrupt that combines receive channel interrupts
RXPEND[7:0]. The TXPEND[0] and RXPEND[0] interrupt outputs are
connected to the ARM Cortex-A8 interrupt controller (INTC) rather
than the C0_TX_PEND and C0_RX_PEND interrupt outputs. So even
though CPSW interrupt is cleared by writing appropriate values to
EOI register the interrupt is not cleared in IRQ controller. So
interrupt is still pending and CPU is struck in ISR, the
workaround is to disable the interrupts in ARM irq controller.

[1] http://www.ti.com/lit/er/sprz360f/sprz360f.pdf

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/cavium/Kconfig

The cavium conflict was overlapping dependency
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'dm-4.2-fixes-5' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- two stable fixes for corruption seen in a snapshot of thinp metadata;
   metadata snapshots aren't widely used but help provide a consistent
   view of the metadata associated with an active thin-pool.

- a dm-cache fix for the 4.2 "default" policy switch from "mq" to "smq"

* tag 'dm-4.2-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache policy smq: move 'dm-cache-default' module alias to SMQ
  dm btree: add ref counting ops for the leaves of top level btrees
  dm thin metadata: delete btrees when releasing metadata snapshot

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull xen block driver fixes from Jens Axboe:
"A few small bug fixes for xen-blk{front,back} that have been sitting
  over my vacation"

* 'for-linus' of git://git.kernel.dk/linux-block:
  xen-blkback: replace work_pending with work_busy in purge_persistent_gnt()
  xen-blkfront: don't add indirect pages to list when !feature_persistent
  xen-blkfront: introduce blkfront_gather_backend_features()

Merge tag 'for-linus-4.2-rc6-tag' of git://git./linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

- revert a fix from 4.2-rc5 that was causing lots of WARNING spam.

- fix a memory leak affecting backends in HVM guests.

- fix PV domU hang with certain configurations.

* tag 'for-linus-4.2-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
  Revert "xen/events/fifo: Handle linked events when closing a port"
  x86/xen: build "Xen PV" APIC driver for domU as well

Revert x86 sigcontext cleanups

This reverts commits 9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs'
from sigcontext") and c6f2062935c8 ("x86/signal/64: Fix SS handling for
signals delivered to 64-bit programs").

They were cleanups, but they break dosemu by changing the signal return
behavior (and removing 'fs' and 'gs' from the sigcontext struct - while
not actually changing any behavior - causes build problems).

Reported-and-tested-by: Stas Sergeev <stsp@list.ru>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Workaround hw bug when acquiring PCI bos ownership of iwlwifi
    devices, from Emmanuel Grumbach.

2) Falling back to vmalloc in conntrack should not emit a warning, from
    Pablo Neira Ayuso.

3) Fix NULL deref when rtlwifi driver is used as an AP, from Luis
    Felipe Dominguez Vega.

4) Rocker doesn't free netdev on device removal, from Ido Schimmel.

5) UDP multicast early sock demux has route handling races, from Eric
    Dumazet.

6) Fix L4 checksum handling in openvswitch, from Glenn Griffin.

7) Fix use-after-free in skb_set_peeked, from Herbert Xu.

8) Don't advertize NETIF_F_FRAGLIST in virtio_net driver, this can lead
    to fraglists longer than the driver can support.  From Jason Wang.

9) Fix mlx5 on non-4k-pagesize systems, from Carol L Soto.

10) Fix interrupt storm in bna driver, from Ivan Vecera.

11) Don't propagate -EBUSY from netlink_insert(), from Daniel Borkmann.

12) Fix inet request sock leak, from Eric Dumazet.

13) Fix TX interrupt masking and marking in TX descriptors of fs_enet
    driver, from LEROY Christophe.

14) Get rid of rule optimizer in gianfar driver, it's buggy and unlikely
    to get fixed any time soon.  From Jakub Kicinski

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  cosa: missing error code on failure in probe()
  gianfar: remove faulty filer optimizer
  gianfar: correct list membership accounting
  gianfar: correct filer table writing
  bonding: Gratuitous ARP gets dropped when first slave added
  net: dsa: Do not override PHY interface if already configured
  net: fs_enet: mask interrupts for TX partial frames.
  net: fs_enet: explicitly remove I flag on TX partial frames
  inet: fix possible request socket leak
  inet: fix races with reqsk timers
  mkiss: Fix error handling in mkiss_open()
  bnx2x: Free NVRAM lock at end of each page
  bnx2x: Prevent null pointer dereference on SKB release
  cxgb4: missing curly braces in t4_setup_debugfs()
  net-timestamp: Update skb_complete_tx_timestamp comment
  ipv6: don't reject link-local nexthop on other interface
  netlink: make sure -EBUSY won't escape from netlink_insert
  bna: fix interrupts storm caused by erroneous packets
  net: mvpp2: replace TX coalescing interrupts with hrtimer
  net: mvpp2: enable proper per-CPU TX buffers unmapping
  ...

Merge tag 'edac_fix_for_4.2' of git://git./linux/kernel/git/bp/bp

Pull EDAC fix from Borislav Petkov:
"A ppc4xx_edac fix for accessing ->csrows properly.  This driver was
  missed during the conversion a couple of years ago"

* tag 'edac_fix_for_4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, ppc4xx: Access mci->csrows array elements properly

EDAC, ppc4xx: Access mci->csrows array elements properly

The commit

de3910eb79ac ("edac: change the mem allocation scheme to
make Documentation/kobject.txt happy")

changed the memory allocation for the csrows member. But ppc4xx_edac was
forgotten in the patch. Fix it.

Signed-off-by: Michael Walle <michael@walle.cc>
Cc: <stable@vger.kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
Signed-off-by: Borislav Petkov <bp@suse.de>

cosa: missing error code on failure in probe()

If register_hdlc_device() fails, the current code returns 0 but we
should return an error code instead.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

documentation: bring vxlan documentation more up-to-date

A few things have changed since the previous version of the vxlan
documentation was written, so update it and correct some grammar and
such while we are at it.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: Remove unneeded use of IS_ERR_VALUE() macro

There is no need to use the IS_ERR_VALUE() macro for checking
the return value from pm_runtime_* functions.

Just do a simple negative test instead.

The semantic patch that makes this change is available
in scripts/coccinelle/api/pm_runtime.cocci.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix bpf_perf_event_read() loop upper bound

Verifier rejects programs incorrectly.

Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read()")
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-more-debug-info'

Hariprasad Shenai says:

====================
Add some more debug info

This patch series adds the following.
Add more info for sge_qinfo dump
Differentiate tid and stids between different regions, and add a debugfs
entry to dump all the tid info

This patch series has been created against net-next tree and includes
patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add debugfs support to dump tid info

Add debugfs support to dump tid info like stid, sftid, tids, atid and
hwtids

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Differentiate between stids between server and filter region

For T4 adapter, offloaded servers tid for IPv4 connections are
allocated from filter region. So add a new field for server filter tid if
server tid is allocated from filter region.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Differentiates between TIDs being used in TCAM and HASH

For the tid info, differentiate from which region the TID is allocated
from. It can be from TCAM region or HASH region.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add some more details to sge qinfo

Adding more details to sge qinfo for debugging purpose.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: increase dhcp inter device timeout

When a system has multiple ethernet devices and during DHCP
request (for using NFS), the system waits only for HZ/2 which is
500mS before switching to another interface for DHCP.

There are some routers (Ex: Trendnet routers) which responds to
DHCP request at about 560mS. When the system has only one
ethernet interface there is no issue as the timeout is 2S and the
dev xid doesn't changes and only retries.

But when the system has multiple Ethernet like DRA74x with CPSW
in dual EMAC mode, the DHCP response is dropped as the dev xid
changes while shifting to the next device. So changing inter
device timeout to HZ (which is 1S).

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix build warnings and add function read_trace_pipe()

There are two improvements in this patch:
1. Fix the build warnings;
2. Add function read_trace_pipe() to print the result on
    the screen;

Before this patch, we can get the result through /sys/kernel/de
bug/tracing/trace_pipe and get nothing on the screen.
By applying this patch, the result can be printed on the screen.
  $ ./tracex6
...
         tracex6-705   [003] d..1   131.428593: : CPU-3   19981414
            sshd-683   [000] d..1   131.428727: : CPU-0   221682321
            sshd-683   [000] d..1   131.428821: : CPU-0   221808766
            sshd-683   [000] d..1   131.428950: : CPU-0   221982984
            sshd-683   [000] d..1   131.429045: : CPU-0   222111851
         tracex6-705   [003] d..1   131.429168: : CPU-3   20757551
            sshd-683   [000] d..1   131.429170: : CPU-0   222281240
            sshd-683   [000] d..1   131.429261: : CPU-0   222403340
            sshd-683   [000] d..1   131.429378: : CPU-0   222561024
...

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atl1c: add BQL support

This BQL implementation is mostly derived from its related driver, alx.
Tested on AR8131 (rev c0) [1969:1063]. Saturated a 100mbps link with 5
concurrent runs of netperf. Ping latency dropped from 14ms to 3ms.

Signed-off-by: Ron Angeles <ronangeles@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gianfar-fixes'

Jakub Kicinski says:

====================
gianfar: filer changes

respinning with examples as requested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: remove faulty filer optimizer

Current filer rule optimization is broken in several ways:
(1) Can perform reads/writes beyond end of allocated tables.
     (gianfar_ethtool.c:1326).

(2) It breaks badly for rules with more than 2 specifiers
     (e.g. matching ip, port, tos).

Example:
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.1 dst-port 1 tos 1 action 1
Added rule with ID 254
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.2 dst-port 2 tos 2 action 9
Added rule with ID 253
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.3 dst-port 3 tos 3 action 17
Added rule with ID 252
# ./filer_decode /sys/kernel/debug/gfar1/filer_raw
00: MASK == 00000210 AND         Q:00           ctrl:00000080 prop:00000210
01: FPR  == 00000210 AND CLE     Q:00           ctrl:00000281 prop:00000210
02: MASK == ffffffff AND         Q:00           ctrl:00000080 prop:ffffffff
03: DPT  == 00000003 AND         Q:00           ctrl:0000008e prop:00000003
04: TOS  == 00000003 AND         Q:00           ctrl:0000008a prop:00000003
05: DIA  == 0a000003 AND         Q:11           ctrl:0000448c prop:0a000003
06: DPT  == 00000002 AND         Q:00           ctrl:0000008e prop:00000002
07: TOS  == 00000002 AND         Q:00           ctrl:0000008a prop:00000002
08: DIA  == 0a000002 AND         Q:09           ctrl:0000248c prop:0a000002
09: DIA  == 0a000001 AND         Q:00           ctrl:0000008c prop:0a000001
0a: DPT  == 00000001 AND         Q:00           ctrl:0000008e prop:00000001
0b: TOS  == 00000001     CLE     Q:01           ctrl:0000060a prop:00000001
ff: MASK >= 00000000             Q:00           ctrl:00000020 prop:00000000

(Entire cluster gets AND-ed together).

(3) We observed that the masking rules it generates do not
     play well with clustering on P2020.  Only first rule
     of the cluster would ever fire.  Given that optimizer
     relies heavily on masking this is very hard to fix.

Example:
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.1 dst-port 1  action 1
Added rule with ID 254
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.2 dst-port 2  action 9
Added rule with ID 253
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.3 dst-port 3  action 17
Added rule with ID 252
# ./filer_decode /sys/kernel/debug/gfar1/filer_raw
00: MASK == 00000210 AND         Q:00           ctrl:00000080 prop:00000210
01: FPR  == 00000210 AND CLE     Q:00           ctrl:00000281 prop:00000210
02: MASK == ffffffff AND         Q:00           ctrl:00000080 prop:ffffffff
03: DPT  == 00000003 AND         Q:00           ctrl:0000008e prop:00000003
04: DIA  == 0a000003             Q:11           ctrl:0000440c prop:0a000003
05: DPT  == 00000002 AND         Q:00           ctrl:0000008e prop:00000002
06: DIA  == 0a000002             Q:09           ctrl:0000240c prop:0a000002
07: DIA  == 0a000001 AND         Q:00           ctrl:0000008c prop:0a000001
08: DPT  == 00000001     CLE     Q:01           ctrl:0000060e prop:00000001
ff: MASK >= 00000000             Q:00           ctrl:00000020 prop:00000000

Which looks correct according to the spec but only the first
(eth id 252)/last added rule for 10.0.0.3 will ever trigger.
As if filer did not treat the AND CLE as cluster start but
also kept AND-ing the rules.  We found no errata covering this.

The fact that nobody noticed (2) or (3) makes me think
that this feature is not very widely used and we should just
remove it.

Reported-by: Aleksander Dutkowski <adutkowski@gmail.com>
Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Acked-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: correct list membership accounting

At a cost of one line let's make sure .count is correct
when calling gfar_process_filer_changes().

Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: correct filer table writing

MAX_FILER_IDX is the last usable index. Using less-than
will already guarantee that one entry for catch-all rule
will be left, no need to subtract 1 here.

Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Implement set_channels ethtool op

This enables the use of ethtool --set-channels devname combined N to
change the number of vRSS queues. Separate rx, tx, and other parameters
are not supported. The maximum is rsscap.num_recv_que. It passes the
given value to rndis_filter_device_add through the device_info->num_chn
field.

If the procedure fails, it attempts to recover to the prior state. If
the recovery fails, it logs an error and aborts.

Current num_chn is saved and restored when changing the MTU.

Signed-off-by: Andrew Schwartzmeyer <andschwa@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Set vRSS with num_chn in RNDIS filter

Uses device_info->num_chn to pass user provided number of vRSS
queues (from ethtool --set-channels) to rndis_filter_device_add. If
nonzero and less than the maximum, set net_device->num_chn to the given
value; else default to prior algorithm.

Always initialize struct device_info to 0, otherwise not all its fields
are guaranteed to be 0, which is necessary when checking if num_chn has
been purposefully set.

Signed-off-by: Andrew Schwartzmeyer <andschwa@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lan78xx: Remove BUG_ON()

Removing BUG_ON()

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lan78xx: Fix Smatch Warnings

lan78xx.c:2282 tx_complete() warn: variable dereferenced before check 'skb' (see line 2249)
lan78xx.c:2885 lan78xx_bh() info: ignoring unreachable code.
lan78xx.c:3159 lan78xx_probe() info: ignoring unreachable code.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Gratuitous ARP gets dropped when first slave added

When the first slave is added (such as during bootup) the first
gratuitous ARP gets dropped. We don't see this drop during a failover.
The packet gets dropped in qdisc (noop_enqueue).

The fix is to delay the sending of gratuitous ARPs till the bond dev's
carrier is present.

It can also be worked around by setting num_grat_arp to more than 1.

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: eth: altera: Remove sgdmadesclen member from altera_tse_private

altera_tse_private->sgdmadesclen is always assigned assigned the same
value and never changes during runtime. Remove the struct member and
use a new define for sizeof(struct sgdma_descrip) instead.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Do not override PHY interface if already configured

In case we need to divert reads/writes using the slave MII bus, we may have
already fetched a valid PHY interface property from Device Tree, and that
mode is used by the PHY driver to make configuration decisions.

If we could not fetch the "phy-mode" property, we will assign p->phy_interface
to PHY_INTERFACE_MODE_NA, such that we can actually check for that condition as
to whether or not we should override the interface value.

Fixes: 19334920eaf7 ("net: dsa: Set valid phy interface type")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
"Fix coarse clock monotonicity (VDSO timestamp off by one jiffy
compared to the syscall one)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: VDSO: fix coarse clock monotonicity regression

Merge branch 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux

Pull amd drm fixes from Alex Deucher:
"Dave is on vacation at the moment, so please pull these radeon and
  amdgpu fixes directly.

  Just a few minor things for 4.2:

   - add a new radeon pci id
   - fix a power management regression in amdgpu
   - fix HEVC command buffer validation in amdgpu"

* 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: add new OLAND pci id
  Revert "drm/amdgpu: Configure doorbell to maximum slots"
  drm/amdgpu: add context buffer size check for HEVC

drm/radeon: add new OLAND pci id

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

Revert "drm/amdgpu: Configure doorbell to maximum slots"

This reverts commit 78ad5cdd21f0d614983fc397338944e797ec70b9.
This commit breaks dpm and suspend/resume on CZ.

drm/amdgpu: add context buffer size check for HEVC

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

Merge tag 'regmap-fix-v4.2-rc6' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
"regmap: Fix handling of present bits on rbtree cache block resize

  When expanding a cache block we use krealloc() to resize the register
  present bitmap without initialising the newly allocated data (the
  original code was written for kzalloc()).  Add an appropraite memset()
  to fix that"

* tag 'regmap-fix-v4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: regcache-rbtree: Clean new present bits on present bitmap resize

dm cache policy smq: move 'dm-cache-default' module alias to SMQ

When creating dm-cache with the default policy, it will call
request_module("dm-cache-default") to register the default policy.
But the "dm-cache-default" alias was left referring to the MQ policy.
Fix this by moving the module alias to SMQ.

Fixes: bccab6a0 (dm cache: switch the "default" cache replacement policy from mq to smq)
Signed-off-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm btree: add ref counting ops for the leaves of top level btrees

When using nested btrees, the top leaves of the top levels contain
block addresses for the root of the next tree down. If we shadow a
shared leaf node the leaf values (sub tree roots) should be incremented
accordingly.

This is only an issue if there is metadata sharing in the top levels.
Which only occurs if metadata snapshots are being used (as is possible
with dm-thinp). And could result in a block from the thinp metadata
snap being reused early, thus corrupting the thinp metadata snap.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

dm thin metadata: delete btrees when releasing metadata snapshot

The device details and mapping trees were just being decremented
before. Now btree_del() is called to do a deep delete.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

net: Document xfrm4_gc_thresh and xfrm6_gc_thresh

This change adds documentation for xfrm4_gc_thresh and xfrm6_gc_thresh
based on the comments in commit eeb1b73378b56 ("xfrm: Increase the garbage
collector threshold").

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Merge tag 'localmodconfig-v4.2-rc6' of git://git./linux/kernel/git/rostedt/linux-kconfig

Pull localmodconfig fix from Steven Rostedt:
"Leonidas Spyropoulos found that modules like nouveau were being
  unselected by make localmodconfig even though their configs were set
  and the module was loaded and visible by lsmod.

  The reason for this was because streamline-config.pl only looks at
  Makefiles, and not Kbuild files.  As these modules use Kbuild for
  their names, they too need to be checked by localmodconfig.  This was
  fixed by Richard Weinberger"

* tag 'localmodconfig-v4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-kconfig:
  localmodconfig: Use Kbuild files too

localmodconfig: Use Kbuild files too

In kbuild it is allowed to define objects in files named "Makefile"
and "Kbuild".
Currently localmodconfig reads objects only from "Makefile"s and misses
modules like nouveau.

Link: http://lkml.kernel.org/r/1437948415-16290-1-git-send-email-richard@nod.at
Cc: stable@vger.kernel.org
Reported-and-tested-by: Leonidas Spyropoulos <artafinde@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth

Johan Hedberg says:

====================
pull request: bluetooth 2015-08-11

Here's an important regression fix for the 4.2-rc series that ensures
user space isn't given invalid LTK values. The bug essentially prevents
the encryption of subsequent LE connections, i.e. makes it impossible to
pair devices over LE.

Let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: fs_enet: mask interrupts for TX partial frames.

We are not interested in interrupts for partially transmitted frames.
Unlike SCC and FCC, the FEC doesn't handle the I bit in buffer
descriptors, instead it defines two interrupt bits, TXB and TXF.

We have to mask TXB in order to only get interrupts once the
frame is fully transmitted.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fs_enet: explicitly remove I flag on TX partial frames

We are not interested in interrupts for partially transmitted frames,
we have to clear BD_ENET_TX_INTR explicitly otherwise it may remain
from a previously used descriptor.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mv88e6xxx-switchdev-fdb'

Vivien Didelot says:

====================
net: dsa: mv88e6xxx: support switchdev FDB objects

This patchset refactors the FDB management in the mv88e6xxx code and adds the
glue in DSA to use the switchdev FDB objects.

Below is an usage example (ports 0-2 belongs to br0, ports 3-4 belongs to br1):

    # bridge fdb add 3c:97:0e:11:30:6e dev swp2
    # bridge fdb add 3c:97:0e:11:40:78 dev swp3
    # bridge fdb add 3c:97:0e:11:50:86 dev swp4
    # bridge fdb del 3c:97:0e:11:40:78 dev swp3
    # bridge fdb
    01:00:5e:00:00:01 dev eth0 self permanent
    01:00:5e:00:00:01 dev eth1 self permanent
    00:50:d2:10:78:15 dev swp0 master br0 permanent
    3c:97:0e:11:30:6e dev swp2 self static
    00:50:d2:10:78:15 dev swp3 master br1 permanent
    3c:97:0e:11:50:86 dev swp4 self static
    # cat /sys/kernel/debug/dsa0/atu
    # DB   T/P  Vec State Addr
    # 001  Port 004   e   3c:97:0e:11:30:6e
    # 004  Port 010   e   3c:97:0e:11:50:86

For the 88E6xxx switches, FIDs 1 to num_ports will be reserved for non-bridged
ports and bridge groups, and the remaining will be later used by VLANs.

This change is necessary to welcome the support for hardware VLANs (which will
follow soon).

Changes in v3:

- reorder commits to improve bisectability and minimize diffs

- add an ndm_state member in switchdev_fdb_obj instead of an is_static boolean

- drop the need to convert unsigned char *addr to u8 addr[ETH_ALEN]
   (it is casted to char pointer anyway)

Changes in v2:

- remove ndo_bridge_{get,set,del}link from switchdev/DSA glue code

- use ether_addr_copy instead of memcpy for MAC addresses

- constify MAC address in port_fdb_{add,del}

- split the mv88e6xxx code refactoring into several patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: add support for switchdev FDB objects

Implement the switchdev_port_obj_{add,del,dump} functions in DSA to
support the SWITCHDEV_OBJ_PORT_FDB objects.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: switchdev: support static FDB addresses

This patch adds an ndm_state member to the switchdev_obj_fdb structure,
in order to support static FDB addresses.

Set Rocker ndm_state to NUD_REACHABLE.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>