git.cascardo.info Git - cascardo/linux.git/log

Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/phy/bcm7xxx.c
drivers/net/phy/marvell.c
drivers/net/vxlan.c

All three conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"Looks like a lot, but mostly driver fixes scattered all over as usual.

  Of note:

   1) Add conditional sched in nf conntrack in cleanup to avoid NMI
      watchdogs.  From Florian Westphal.

   2) Fix deadlock in nfnetlink cttimeout, also from Floarian.

   3) Fix handling of slaves in bonding ARP monitor validation, from Jay
      Vosburgh.

   4) Callers of ip_cmsg_send() are responsible for freeing IP options,
      some were not doing so.  Fix from Eric Dumazet.

   5) Fix per-cpu bugs in mvneta driver, from Gregory CLEMENT.

   6) Fix vlan handling in mv88e6xxx DSA driver, from Vivien Didelot.

   7) bcm7xxx PHY driver bug fixes from Florian Fainelli.

   8) Avoid unaligned accesses to protocol headers wrt.  GRE, from
      Alexander Duyck.

   9) SKB leaks and other problems in arc_emac driver, from Alexander
      Kochetkov.

  10) tcp_v4_inbound_md5_hash() releases listener socket instead of
      request socket on error path, oops.  Fix from Eric Dumazet.

  11) Missing socket release in pppoe_rcv_core() that seems to have
      existed basically forever.  From Guillaume Nault.

  12) Missing slave_dev unregister in dsa_slave_create() error path,
      from Florian Fainelli.

  13) crypto_alloc_hash() never returns NULL, fix return value check in
      __tcp_alloc_md5sig_pool.  From Insu Yun.

  14) Properly expire exception route entries in ipv4, from Xin Long.

  15) Fix races in tcp/dccp listener socket dismantle, from Eric
      Dumazet.

  16) Don't set IFF_TX_SKB_SHARING in vxlan, geneve, or GRE, it's not
      legal.  These drivers modify the SKB on transmit.  From Jiri Benc.

  17) Fix regression in the initialziation of netdev->tx_queue_len.
      From Phil Sutter.

  18) Missing unlock in tipc_nl_add_bc_link() error path, from Insu Yun.

  19) SCTP port hash sizing does not properly ensure that table is a
      power of two in size.  From Neil Horman.

  20) Fix initializing of software copy of MAC address in fmvj18x_cs
      driver, from Ken Kawasaki"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (129 commits)
  bnx2x: Fix 84833 phy command handler
  bnx2x: Fix led setting for 84858 phy.
  bnx2x: Correct 84858 PHY fw version
  bnx2x: Fix 84833 RX CRC
  bnx2x: Fix link-forcing for KR2
  net: ethernet: davicom: fix devicetree irq resource
  fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address
  Driver: Vmxnet3: Update Rx ring 2 max size
  net: netcp: rework the code for get/set sw_data in dma desc
  soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data
  net: ti: netcp: restore get/set_pad_info() functionality
  MAINTAINERS: Drop myself as xen netback maintainer
  sctp: Fix port hash table size computation
  can: ems_usb: Fix possible tx overflow
  Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb
  net: bcmgenet: Fix internal PHY link state
  af_unix: Don't use continue to re-execute unix_stream_read_generic loop
  unix_diag: fix incorrect sign extension in unix_lookup_by_ino
  bnxt_en: Failure to update PHY is not fatal condition.
  bnxt_en: Remove unnecessary call to update PHY settings.
  ...

Merge tag 'hwmon-for-linus-v4.5-rc6' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
"Two fixes headed for stable:

   - Remove an unnecessary speed_index lookup for thermal hook in the
     gpio-fan driver.  The unnecessary speed lookup can hog the system.

   - Handle negative conversion values correctly in the ads1015 driver"

* tag 'hwmon-for-linus-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (gpio-fan) Remove un-necessary speed_index lookup for thermal hook
  hwmon: (ads1015) Handle negative conversion values correctly

Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
"One ocrdma fix:

   - The new CQ API support was added to ocrdma, but they got the arming
     logic wrong, so without this, transfers eventually fail when they
     fail to arm the interrupt properly under load

  Two related fixes for mlx4:

   - When we added the 64bit extended counters support to the core IB
     code, they forgot to update the RoCE side of the mlx4 driver (the
     IB side they properly updated).

     I debated whether or not to include these patches as they could be
     considered feature enablement patches, but the existing code will
     blindy copy the 32bit counters, whether any counters were requested
     at all (a bug).

     These two patches make it (a) check to see that counters were
     requested and (b) copy the right counters (the 64bit support is
     new, the 32bit is not).  For that reason I went ahead and took
     them"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx4: Add support for the port info class for RoCE ports
  IB/mlx4: Add support for extended counters over RoCE ports
  RDMA/ocrdma: Fix arm logic to align with new cq API

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Some bugfixes from I2C for you:

  A fix for a RuntimePM regression with OMAP, a fix to enable TCO for
  Lewisburg platforms, and a typo fix while we are here"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: i801: Adding Intel Lewisburg support for iTCO
  i2c: uniphier: fix typos in error messages
  i2c: omap: Fix PM regression with deferred probe for pm_runtime_reinit

tools, bpf_asm: simplify parser rule for BPF extensions

We can already use yylval in the lexer for encoding the BPF extension
number, so that the parser rules can be further reduced to a single one
for each B/H/W case.

Signed-off-by: Ray Bellis <ray@isc.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netcp: use pointer to fix build fail

While building keystone_defconfig of arm we are getting build failure
with the error:

drivers/net/ethernet/ti/netcp_core.c:1846:31: error: invalid type argument of '->' (have 'struct tc_to_netdev')
  if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
                               ^
drivers/net/ethernet/ti/netcp_core.c:1851:35: error: invalid type argument of '->' (have 'struct tc_to_netdev')
      (dev->real_num_tx_queues < tc->tc))
                                   ^
drivers/net/ethernet/ti/netcp_core.c:1855:8: error: invalid type argument of '->' (have 'struct tc_to_netdev')
  if (tc->tc) {
        ^
drivers/net/ethernet/ti/netcp_core.c:1856:28: error: invalid type argument of '->' (have 'struct tc_to_netdev')
   netdev_set_num_tc(dev, tc->tc);
                            ^
drivers/net/ethernet/ti/netcp_core.c:1857:21: error: invalid type argument of '->' (have 'struct tc_to_netdev')
   for (i = 0; i < tc->tc; i++)
                     ^
drivers/net/ethernet/ti/netcp_core.c: At top level:
drivers/net/ethernet/ti/netcp_core.c:1879:2: warning: initialization from incompatible pointer type
  .ndo_setup_tc  = netcp_setup_tc,
  ^

The callback of ndo_setup_tc should be:
int (*ndo_setup_tc)(struct net_device *dev, u32 handle, __be16 protocol,
                    struct tc_to_netdev *tc);

But we missed marking the last argument as a pointer.

Fixes: 16e5cc647173 ("net: rework setup_tc ndo op to consume general tc operand")
CC: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-fixes-for-4.5-20160221' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-02-21

this is a pull reqeust of one patch for net/master.

The patch is by Gerhard Uttenthaler and fixes a potential tx overflow in the
ems_usb driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnx2x-848xx-phy-fixes'

Yuval Mintz says:

====================
bnx2x: Fix 848xx phys

This series contains link-related fixes, mostly for the 848xx phys
[2 patches are for 84833, and 2 patches are for 84858].
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix 84833 phy command handler

Current initialization sequence is lacking, causing some configurations
to fail.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix led setting for 84858 phy.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Correct 84858 PHY fw version

The phy's firmware version isn't being parsed properly as it's
currently parsed like the rest of the 848xx phys.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix 84833 RX CRC

There's a problem in current 84833 phy configuration -
in case 1Gb link is configured and jumbo-sized packets are being
used, device will experience RX crc errors.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix link-forcing for KR2

Currently, when link is using KR2 it cannot be forced to any speed other
than 20g.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.om>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qed-next'

Yuval Mintz says:

====================
qed*: Driver updates

This contains various minor changes to driver - changing memory allocation,
fixing a small theoretical bug, as well as some mostly-semantic changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qed,qede: Bump driver versions to 8.7.0.0

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Introduce DMA_REGPAIR_LE

FW hsi contains regpairs, mostly for 64-bit address representations.
Since same paradigm is applied each time a regpair is filled, this
introduces a new utility macro for setting such regpairs.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Change metadata needed for SPQ entries

Each configuration element send via ramrod requires a Slow Path Queue
entry. This slightly changes the way such an entry is configured, but
contains mostly semantic changes [where more parameters are gathered
in a sub-struct instead of being directly passed].

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Handle possible race in SB config

Due to HW design, some of the memories are wide-bus and access to those
needs to be sequentialized on a per-HW-block level; Read/write to a
given HW-block might break other read/write to wide-bus memory done at
~same time.

Status blocks initialization in CAU is done into such a wide-bus memory.
This moves the initialization into using DMAE which is guaranteed to be
safe to use on such memories.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Turn most GFP_ATOMIC into GFP_KERNEL

Initial driver submission used GFP_ATOMIC almost inclusively when
allocating memory. We now remedy this point, using GFP_KERNEL where
it's possible.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth

Johan Hedberg says:

====================
pull request: bluetooth 2016-02-20

Here's an important patch for 4.5 which fixes potential invalid pointer
access when processing completed Bluetooth HCI commands.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipvlan-misc'

Mahesh Bandewar says:

====================
IPvlan misc patches

This is a collection of unrelated patches for IPvlan driver.
a. crub_skb() changes are added to ensure that the packets hit the
NF_HOOKS in masters' ns in L3 mode.
b. u16 change is bug fix while
c. the third patch is to group tx/rx variables in single cacheline
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: misc changes

1. scope correction for few functions that are used in single file.
2. Adjust variables that are used in fast-path to fit into single cacheline
3. Update rcv_frame() to skip shared check for frames coming over wire

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: mode is u16

The mode argument was erronusly defined as u32 but it has always
been u16. Also use ipvlan_set_mode() helper to set the mode instead
of assigning directly. This should avoid future erronus assignments /
updates.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: scrub skb before routing in L3 mode.

Scrub skb before hitting the iptable hooks to ensure packets hit
these hooks. Set the xnet param only when the packet is crossing the
ns boundry so if the IPvlan slave and master belong to the same ns,
the param will be set to false.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: davicom: fix devicetree irq resource

The dm9000 driver doesn't work in at least one device-tree
configuration, spitting an error message on irq resource :
[    1.062495] dm9000 8000000.ethernet: insufficient resources
[    1.068439] dm9000 8000000.ethernet: not found (-2).
[    1.073451] dm9000: probe of 8000000.ethernet failed with error -2

The reason behind is that the interrupt might be provided by a gpio
controller, not probed when dm9000 is probed, and needing the probe
deferral mechanism to apply.

Currently, the interrupt is directly taken from resources. This patch
changes this to use the more generic platform_get_irq(), which handles
the deferral.

Moreover, since commit Fixes: 7085a7401ba5 ("drivers: platform: parse
IRQ flags from resources"), the interrupt trigger flags are honored in
platform_get_irq(), so remove the needless code in dm9000.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Marcel Ziswiler <marcel@ziswiler.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Sergei Ianovich <ynvich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-4.6-20160220' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2016-02-20

this is a pull request of 9 patch for net-next/master.

The first 3 patches are from Damien Riegel, they add support for
Technologic Systems IP core to tje sja100 driver. The next patches 6 by
Marek Vasut (including one my me) first clean sort the CAN driver's
Kconfig and Makefiles and then add support for the IFI CANFD IP core.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address

fix incorrect indexing of dev->dev_addr[] when copying the MAC address
of FMV-J182 at buf[5].

Signed-off-by: Ken Kawasaki <ken_kawasaki@nifty.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-helper-improvements'

Daniel Borkmann says:

====================
BPF updates

This set contains various updates for eBPF, i.e. the addition of a
generic csum helper function and other misc bits that mostly improve
existing helpers and ease programming with eBPF on cls_bpf. For more
details, please see individual patches.

Set is rebased on top of http://patchwork.ozlabs.org/patch/584465/.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: don't emit mov A,A on return

While debugging with bpf_jit_disasm I noticed emissions of 'mov %eax,%eax',
and found that this comes from BPF_RET | BPF_A translations from classic
BPF. Emitting this is unnecessary as BPF_REG_A is mapped into BPF_REG_0
already, therefore only emit a mov when immediates are used as return value.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix csum update in bpf_l4_csum_replace helper for udp

When using this helper for updating UDP checksums, we need to extend
this in order to write CSUM_MANGLED_0 for csum computations that result
into 0 as sum. Reason we need this is because packets with a checksum
could otherwise become incorrectly marked as a packet without a checksum.
Likewise, if the user indicates BPF_F_MARK_MANGLED_0, then we should
not turn packets without a checksum into ones with a checksum.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: try harder on clones when writing into skb

When we're dealing with clones and the area is not writeable, try
harder and get a copy via pskb_expand_head(). Replace also other
occurences in tc actions with the new skb_try_make_writable().

Reported-by: Ashhad Sheikh <ashhadsheikh394@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: remove artificial bpf_skb_{load, store}_bytes buffer limitation

We currently limit bpf_skb_store_bytes() and bpf_skb_load_bytes()
helpers to only store or load a maximum buffer of 16 bytes. Thus,
loading, rewriting and storing headers require several bpf_skb_load_bytes()
and bpf_skb_store_bytes() calls.

Also here we can use a per-cpu scratch buffer instead in order to not
pressure stack space any further. I do suspect that this limit was mainly
set in place for this particular reason. So, ease program development
by removing this limitation and make the scratchpad generic, so it can
be reused.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add generic bpf_csum_diff helper

For L4 checksums, we currently have bpf_l4_csum_replace() helper. It's
currently limited to handle 2 and 4 byte changes in a header and feeds the
from/to into inet_proto_csum_replace{2,4}() helpers of the kernel. When
working with IPv6, for example, this makes it rather cumbersome to deal
with, similarly when editing larger parts of a header.

Instead, extend the API in a more generic way: For bpf_l4_csum_replace(),
add a case for header field mask of 0 to change the checksum at a given
offset through inet_proto_csum_replace_by_diff(), and provide a helper
bpf_csum_diff() that can generically calculate a from/to diff for arbitrary
amounts of data.

This can be used in multiple ways: for the bpf_l4_csum_replace() only
part, this even provides us with the option to insert precalculated diffs
from user space f.e. from a map, or from bpf_csum_diff() during runtime.

bpf_csum_diff() has a optional from/to stack buffer input, so we can
calculate a diff by using a scratchbuffer for scenarios where we're
inserting (from is NULL), removing (to is NULL) or diffing (from/to buffers
don't need to be of equal size) data. Also, bpf_csum_diff() allows to
feed a previous csum into csum_partial(), so the function can also be
cascaded.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add new arg_type that allows for 0 sized stack buffer

Currently, when we pass a buffer from the eBPF stack into a helper
function, the function proto indicates argument types as ARG_PTR_TO_STACK
and ARG_CONST_STACK_SIZE pair. If R<X> contains the former, then R<X+1>
must be of the latter type. Then, verifier checks whether the buffer
points into eBPF stack, is initialized, etc. The verifier also guarantees
that the constant value passed in R<X+1> is greater than 0, so helper
functions don't need to test for it and can always assume a non-NULL
initialized buffer as well as non-0 buffer size.

This patch adds a new argument types ARG_CONST_STACK_SIZE_OR_ZERO that
allows to also pass NULL as R<X> and 0 as R<X+1> into the helper function.
Such helper functions, of course, need to be able to handle these cases
internally then. Verifier guarantees that either R<X> == NULL && R<X+1> == 0
or R<X> != NULL && R<X+1> != 0 (like the case of ARG_CONST_STACK_SIZE), any
other combinations are not possible to load.

I went through various options of extending the verifier, and introducing
the type ARG_CONST_STACK_SIZE_OR_ZERO seems to have most minimal changes
needed to the verifier.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'geneve-vxlan-outer-checksum'

Alexander Duyck says:

====================
GENEVE/VXLAN: Enable outer Tx checksum by default

This patch series makes it so that we enable the outer Tx checksum for IPv4
tunnels by default.  This makes the behavior consistent with how we were
handling this for IPv6.  In addition I have updated the internal flags for
these tunnels so that we use a ZERO_CSUM_TX flag for IPv4 which should
match up will with the ZERO_CSUM6_TX flag which was already in use for
IPv6.

For most network devices this should be a net gain in terms of performance
as having the outer header checksum present allows for devices to report
CHECKSUM_UNNECESSARY which we can then convert to CHECKSUM_COMPLETE in order
to determine if the inner header checksum is valid.

Below is some data I collected with ixgbe with an X540 that demonstrates
this.  I located two PFs connected back to back in two different name
spaces and then setup a pair of tunnels on each, one with checksum enabled
and one without.

Recv   Send    Send                          Utilization
Socket Socket  Message  Elapsed              Send
Size   Size    Size     Time     Throughput  local
bytes  bytes   bytes    secs.    10^6bits/s  % S

noudpcsum:
87380  16384  16384    30.00      8898.67   12.80
udpcsum:
87380  16384  16384    30.00      9088.47   5.69

The one spot where this may cause a performance regression is if the
environment contains devices that can parse the inner headers and a device
supports NETIF_F_GSO_UDP_TUNNEL but not NETIF_F_GSO_UDP_TUNNEL_CSUM.  In
the case of such a device we have to fall back to using GSO to segment the
tunnel instead of TSO and as a result we may take a performance hit as seen
below with i40e.

Recv   Send    Send                          Utilization
Socket Socket  Message  Elapsed              Send
Size   Size    Size     Time     Throughput  local
bytes  bytes   bytes    secs.    10^6bits/s  % S

noudpcsum:
87380  16384  16384    30.00      9085.21   3.32
udpcsum:
87380  16384  16384    30.00      9089.23   5.54

In addition it will be necessary to update iproute2 so that we don't
provide the checksum attribute unless specified.  This way on older kernels
which don't have local checksum offload we will default to disabling the
outer checksum, and on newer kernels that have LCO we can default to
enabling it.

I also haven't investigated the effect this will have on OVS.  However I
suspect the impact should be minimal as the worst case scenario should be
that Tx checksumming will become enabled by default which should be
consistent with the existing behavior for IPv6.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

VXLAN: Support outer IPv4 Tx checksums by default

This change makes it so that if UDP CSUM is not specified we will default
to enabling it. The main motivation behind this is the fact that with the
use of outer checksum we can greatly improve the performance for VXLAN
tunnels on devices that don't know how to parse tunnel headers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

GENEVE: Support outer IPv4 Tx checksums by default

This change makes it so that if UDP CSUM is not specified we will default
to enabling it. The main motivation behind this is the fact that with the
use of outer checksum we can greatly improve the performance for GENEVE
tunnels on hardware that doesn't know how to parse them.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Driver: Vmxnet3: Update Rx ring 2 max size

Device emulation supports max size of 4096.

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netcp-fixes'

Murali Karicheri says:

====================
net: ti: netcp: restore get/set_pad_info() functionality

This series fixes a regression and add some improvements for the ease
of maintainance. Incorporated comments against v1.

Changelogs:

v2 : combined 2-3 into one patch as this involves a header change
      fixed a parse warning in 3/4 per comment from Arnd.
      Removed Sign-off from Arnd against 1/4
      added comments in 3/3 to alert on the usage of sw data per review
      comments
v1 : added 2-4 to accomodate feedback received from review
v0 : initial version to fix the regression (From Grygorii)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: rework the code for get/set sw_data in dma desc

SW data field in descriptor can be used by software to hold private
data for the driver. As there are 4 words available for this purpose,
use separate macros to place it or retrieve the same to/from
descriptors. Also do type cast of data types accordingly.

Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Grygorii Strashko <grygorii.strashko@ti.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data

Rename the pad to sw_data as per description of this field in the hardware
spec(refer sprugr9 from www.ti.com). Latest version of the document is
at http://www.ti.com/lit/ug/sprugr9h/sprugr9h.pdf and section 3.1
Host Packet Descriptor describes this field.

Define and use a constant for the size of sw_data field similar to
other fields in the struct for desc and document the sw_data field
in the header. As the sw_data is not touched by hw, it's type can be
changed to u32.

Rename the helpers to match with the updated dma desc field sw_data.

Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Grygorii Strashko <grygorii.strashko@ti.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ti: netcp: restore get/set_pad_info() functionality

The commit 899077791403 ("netcp: try to reduce type confusion in
descriptors") introduces a regression in Kernel 4.5-rc1 and it breaks
get/set_pad_info() functionality.

The TI NETCP driver uses pad0 and pad1 fields of knav_dma_desc to
store DMA/MEM buffer pointer and buffer size respectively. And in both
cases for Keystone 2 the pointer type size is 32 bit regardless of
LAPE enabled or not, because CONFIG_ARCH_DMA_ADDR_T_64BIT originally
is not expected to be defined.

Unfortunately, above commit changed buffer's pointers save/restore
code (get/set_pad_info()) and added intermediate conversation to u64
which works incorrectly on 32bit Keystone 2 and causes TI NETCP driver
crash in RX/TX path due to "Unable to handle kernel NULL pointer"
exception. This issue was reported and discussed in [1].

Hence, fix it by partially reverting above commit and restoring
get/set_pad_info() functionality as it was before.

[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg95361.html
Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
CC: David Laight <David.Laight@ACULAB.COM>
CC: Arnd Bergmann <arnd@arndb.de>
Reported-by: Franklin S Cooper Jr <fcooper@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'lwt-autoload'

Robert Shearman says:

====================
lwtunnel: autoload of lwt modules

Changes since v1:
- remove "LWTUNNEL_ENCAP_" prefix for the string form of the encaps
   used when requesting the module to reduce duplication, and don't
   bother returning strings for lwt modules using netdevices, both
   suggested by Jiri.
- update commit message of first patch to clarify security
   implications, in response to Eric's comments.

The lwt implementations using net devices can autoload using the
existing mechanism using IFLA_INFO_KIND. However, there's no mechanism
that lwt modules not using net devices can use.

Therefore, these patches add the ability to autoload modules
registering lwt operations for lwt implementations not using a net
device so that users don't have to manually load the modules.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ila: autoload module

Avoid users having to manually load the module by adding a module
alias allowing it to be autoloaded by the lwt infra.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: autoload lwt module

Avoid users having to manually load the module by adding a module
alias allowing it to be autoloaded by the lwt infra.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lwtunnel: autoload of lwt modules

The lwt implementations using net devices can autoload using the
existing mechanism using IFLA_INFO_KIND. However, there's no mechanism
that lwt modules not using net devices can use.

Therefore, add the ability to autoload modules registering lwt
operations for lwt implementations not using a net device so that
users don't have to manually load the modules.

Only users with the CAP_NET_ADMIN capability can cause modules to be
loaded, which is ensured by rtnetlink_rcv_msg rejecting non-RTM_GETxxx
messages for users without this capability, and by
lwtunnel_build_state not being called in response to RTM_GETxxx
messages.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: Drop myself as xen netback maintainer

Wei has been picking this up for quite a while now.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: turn on unicast filtering on vlan device

Currently vlan device inherits unicast filtering flag from underlying
device. If underlying device doesn't support unicast filter, this will
put vlan device into promiscuous mode when it's stacked.

Tun on IFF_UNICAST_FLT on the vlan device in any case so that it does
not go into promiscuous mode needlessly. If underlying device does not
support unicast filtering, that device will enter promiscuous mode.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Fix port hash table size computation

Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
its size computation, observing that the current method never guaranteed
that the hashsize (measured in number of entries) would be a power of two,
which the input hash function for that table requires.  The root cause of
the problem is that two values need to be computed (one, the allocation
order of the storage requries, as passed to __get_free_pages, and two the
number of entries for the hash table).  Both need to be ^2, but for
different reasons, and the existing code is simply computing one order
value, and using it as the basis for both, which is wrong (i.e. it assumes
that ((1<<order)*PAGE_SIZE)/sizeof(bucket) is still ^2 when its not).

To fix this, we change the logic slightly.  We start by computing a goal
allocation order (which is limited by the maximum size hash table we want
to support.  Then we attempt to allocate that size table, decreasing the
order until a successful allocation is made.  Then, with the resultant
successful order we compute the number of buckets that hash table supports,
which we then round down to the nearest power of two, giving us the number
of entries the table actually supports.

I've tested this locally here, using non-debug and spinlock-debug kernels,
and the number of entries in the hashtable consistently work out to be
powers of two in all cases.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: Dmitry Vyukov <dvyukov@google.com>
CC: Vladislav Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: ems_usb: Fix possible tx overflow

This patch fixes the problem that more CAN messages could be sent to the
interface as could be send on the CAN bus. This was more likely for slow baud
rates. The sleeping _start_xmit was woken up in the _write_bulk_callback. Under
heavy TX load this produced another bulk transfer without checking the
free_slots variable and hence caused the overflow in the interface.

Signed-off-by: Gerhard Uttenthaler <uttenthaler@ems-wuensche.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Linux 4.5-rc5

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"This is unusually large, partly due to the EFI fixes that prevent
  accidental deletion of EFI variables through efivarfs that may brick
  machines.  These fixes are somewhat involved to maintain compatibility
  with existing install methods and other usage modes, while trying to
  turn off the 'rm -rf' bricking vector.

  Other fixes are for large page ioremap()s and for non-temporal
  user-memcpy()s"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix vmalloc_fault() to handle large pages properly
  hpet: Drop stale URLs
  x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
  x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
  lib/ucs2_string: Correct ucs2 -> utf8 conversion
  efi: Add pstore variables to the deletion whitelist
  efi: Make efivarfs entries immutable by default
  efi: Make our variable validation list include the guid
  efi: Do variable name validation tests in utf8
  efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
  lib/ucs2_string: Add ucs2 -> utf8 helper functions

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"A handful of CPU hotplug related fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Plug potential memory leak in CPU_UP_PREPARE
  perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state
  perf/core: Remove bogus UP_CANCELED hotplug state
  perf/x86/amd/uncore: Plug reference leak

Merge tag 'powerpc-4.5-3' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
- Fix build error on 32-bit with checkpoint restart from Aneesh Kumar
- Fix dedotify for binutils >= 2.26 from Andreas Schwab
- Don't trace hcalls on offline CPUs from Denis Kirjanov
- eeh: Fix stale cached primary bus from Gavin Shan
- eeh: Fix stale PE primary bus from Gavin Shan
- mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V
- ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy

* tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/ioda: Set "read" permission when "write" is set
  powerpc/mm: Fix Multi hit ERAT cause by recent THP update
  powerpc/powernv: Fix stale PE primary bus
  powerpc/eeh: Fix stale cached primary bus
  powerpc/pseries: Don't trace hcalls on offline CPUs
  powerpc: Fix dedotify for binutils >= 2.26
  powerpc/book3s_32: Fix build error with checkpoint restart

Merge tag 'dmaengine-fix-4.5-rc5' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
"A few fixes for drivers, nothing major here.

  Fixes are: iotdma fix to restart channels, new ID for wildcat PCH,
  residue fix for edma, disable irq for non-cyclic in dw"

* tag 'dmaengine-fix-4.5-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: dw: disable BLOCK IRQs for non-cyclic xfer
  dmaengine: edma: fix residue race for cyclic
  dmaengine: dw: pci: add ID for WildcatPoint PCH
  dmaengine: IOATDMA: fix timer code that continues to restart channels during idle

Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk driver fixes from Stephen Boyd:
"An assortment of vendor specific clk drivers fixes, most notably
  fallout from adding Tegra210 and rockchip rk3036/rk3368 drivers this
  cycle.

  There's also the random smattering of sparse/checker fixes, a build
  "fix" to get the Tango clk driver to compile because the Kconfig
  symbol was renamed after the fact, and a clk gpio fix for a patch
  mismerge"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (28 commits)
  clk: gpio: Really allow an optional clock= DT property
  Revert "clk: qcom: Specify LE device endianness"
  clk: versatile: mask VCO bits before writing
  clk: tegra: super: Fix sparse warnings for functions not declared as static
  clk: tegra: Fix sparse warnings for functions not declared as static
  clk: tegra: Fix sparse warning for pll_m
  clk: tegra: Use definition for pll_u override bit
  clk: tegra: Fix warning caused by pll_u failing to lock
  clk: tegra: Fix clock sources for Tegra210 EMC
  clk: tegra: Add the APB2APE audio clock on Tegra210
  clk: tegra: Add missing of_node_put()
  clk: tegra: Fix PLLE SS coefficients
  clk: tegra: Fix typos around clearing PLLE bits during enable
  clk: tegra: Do not disable PLLE when under hardware control
  clk: tegra: Fix pllx dyn step calculation
  clk: tegra: pll: Fix potential sleeping-while-atomic
  clk: tegra: Fix the misnaming of nvenc from msenc
  clk: tegra: Fix naming of MISC registers
  clk: tango4: rename ARCH_TANGOX to ARCH_TANGO
  clk: scpi: Fix checking return value of platform_device_register_simple()
  ...

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull more drm fixes from Dave Airlie:
"Some more fixes trickled in:

  A bunch of VC4 ones since it's a pretty new driver not much chance of
  regressions, and it fixes GPU resets.

  Also one atomic fix, one set of fixes for a common bug in TTM cleanup,
  and one i915 hotplug fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/nouveau: use post-decrement in error handling
  drm/atomic: Allow for holes in connector state, v2.
  drm/i915: Fix hpd live status bits for g4x
  drm/vc4: Use runtime PM to power cycle the device when the GPU hangs.
  drm/vc4: Enable runtime PM.
  drm/vc4: Fix spurious GPU resets due to BO reuse.
  drm/vc4: Drop error message on seqno wait timeouts.
  drm/vc4: Fix -ERESTARTSYS error return from BO waits.
  drm/vc4: Return an ERR_PTR from BO creation instead of NULL.
  drm/vc4: Fix the clear color for the first tile rendered.
  drm/vc4: Validate that WAIT_BO padding is cleared.
  drm/radeon: use post-decrement in error handling
  drm/amdgpu: use post-decrement in error handling

kernel/resource.c: fix muxed resource handling in __request_region()

In __request_region, if a conflict with a BUSY and MUXED resource is
detected, then the caller goes to sleep and waits for the resource to be
released.  A pointer on the conflicting resource is kept.  At wake-up
this pointer is used as a parent to retry to request the region.

A first problem is that this pointer might well be invalid (if for
example the conflicting resource have already been freed).  Another
problem is that the next call to __request_region() fails to detect a
remaining conflict.  The previously conflicting resource is passed as a
parameter and __request_region() will look for a conflict among the
children of this resource and not at the resource itself.  It is likely
to succeed anyway, even if there is still a conflict.

Instead, the parent of the conflicting resource should be passed to
__request_region().

As a fix, this patch doesn't update the parent resource pointer in the
case we have to wait for a muxed region right after.

Reported-and-tested-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Tested-by: Vincent Donnefort <vdonnefort@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

can: ifi: Add IFI CANFD IP support

The patch adds support for IFI CAN/FD controller [1]. This driver
currently supports sending and receiving both standard CAN and new
CAN/FD frames. Both ISO and BOSCH variant of CAN/FD is supported.

[1] http://www.ifi-pld.de/IP/CANFD/canfd.html

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: ifi: Add DT bindings for ifi,canfd

Add device tree bindings for the I/F/I CANFD controller IP core.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

of: Add vendor prefix for I/F/I

Add vendor prefix for I/F/I, Ingenieurbüro Für IC-Technologie
http://www.ifi-pld.de/

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: Makefile: Sort the Makefile

Just sort the drivers in the Makefile, no functional change.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: Kconfig: sort drivers alphabetically

Sort the drivers that are directly listed in the Kconfig alphabetically, no
functional change.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: Kconfig: Sort the Kconfig includes

Sort the Kconfig includes, no functional change.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: of: add compatibility with Technologic Systems version

Technologic Systems provides an IP compatible with the SJA1000,
instantiated in an FPGA. Because of some bus widths issue, access to
registers is made through a "window" that works like this:

base + 0x0: address to read/write
base + 0x2: 8-bit register value

This commit adds a new compatible device, "technologic,sja1000", with
read and write functions using the window mechanism.

Signed-off-by: Damien Riegel <damien.riegel@savoirfairelinux.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: add documentation for Technologic Systems version

This commit adds documentation for the Technologic Systems version of
SJA1000. The difference with the NXP version is in the way the registers
are accessed.

Signed-off-by: Damien Riegel <damien.riegel@savoirfairelinux.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: of: add per-compatible init hook

This commit adds the capability to allocate and init private data
embedded in the sja1000_priv structure on a per-compatible basis. The
device node is passed as a parameter of the init callback to allow
parsing of custom device tree properties.

Signed-off-by: Damien Riegel <damien.riegel@savoirfairelinux.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb

In commit 44d271377479 ("Bluetooth: Compress the size of struct
hci_ctrl") we squashed down the size of the structure by using a union
with the assumption that all users would use the flag to determine
whether we had a req_complete or a req_complete_skb.

Unfortunately we had a case in hci_req_cmd_complete() where we weren't
looking at the flag. This can result in a situation where we might be
storing a hci_req_complete_skb_t in a hci_req_complete_t variable, or
vice versa.

During some testing I found at least one case where the function
hci_req_sync_complete() was called improperly because the kernel thought
that it didn't require an SKB. Looking through the stack in kgdb I
found that it was called by hci_event_packet() and that
hci_event_packet() had both of its locals "req_complete" and
"req_complete_skb" pointing to the same place: both to
hci_req_sync_complete().

Let's make sure we always check the flag.

For more details on debugging done, see <http://crbug.com/588288>.

Fixes: 44d271377479 ("Bluetooth: Compress the size of struct hci_ctrl")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Merge branch 'bpf-get-stackid'

Alexei Starovoitov says:

====================
bpf_get_stackid() and stack_trace map

This patch set introduces new map type to store stack traces and
corresponding bpf_get_stackid() helper.
BPF programs already can walk the stack via unrolled loop
of bpf_probe_read()s which is ok for simple analysis, but it's
not efficient and limited to <30 frames after that the programs
don't fit into MAX_BPF_STACK. With bpf_get_stackid() helper
the programs can collect up to PERF_MAX_STACK_DEPTH both
user and kernel frames.
Using stack traces as a key in a map turned out to be very useful
for generating flame graphs, off-cpu graphs, waker and chain graphs.
Patch 3 is a simplified version of 'offwaketime' tool which is
described in detail here:
http://brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html

Earlier version of this patch were using save_stack_trace() helper,
but 'unreliable' frames add to much noise and two equiavlent
stack traces produce different 'stackid's.
Using lockdep style of storing frames with MAX_STACK_TRACE_ENTRIES is
great for lockdep, but not acceptable for bpf, since the stack_trace
map needs to be freed when user Ctrl-C the tool.
The ftrace style with per_cpu(struct ftrace_stack) is great, but it's
tightly coupled with ftrace ring buffer and has the same 'unreliable'
noise. perf_event's perf_callchain() mechanism is also very efficient
and it only needed minor generalization which is done in patch 1
to be used by bpf stack_trace maps.
Peter, please take a look at patch 1.
If you're ok with it, I'd like to take the whole set via net-next.

Patch 1 - generalization of perf_callchain()
Patch 2 - stack_trace map done as lock-less hashtable without link list
  to avoid spinlock on insertion which is critical path when
  bpf_get_stackid() helper is called for every task switch event
Patch 3 - offwaketime example

After the patch the 'perf report' for artificial 'sched_bench'
benchmark that doing pthread_cond_wait/signal and 'offwaketime'
example is running in the background:
16.35%  swapper      [kernel.vmlinux]    [k] intel_idle
  2.18%  sched_bench  [kernel.vmlinux]    [k] __switch_to
  2.18%  sched_bench  libpthread-2.12.so  [.] pthread_cond_signal@@GLIBC_2.3.2
  1.72%  sched_bench  libpthread-2.12.so  [.] pthread_mutex_unlock
  1.53%  sched_bench  [kernel.vmlinux]    [k] bpf_get_stackid
  1.44%  sched_bench  [kernel.vmlinux]    [k] entry_SYSCALL_64
  1.39%  sched_bench  [kernel.vmlinux]    [k] __call_rcu.constprop.73
  1.13%  sched_bench  libpthread-2.12.so  [.] pthread_mutex_lock
  1.07%  sched_bench  libpthread-2.12.so  [.] pthread_cond_wait@@GLIBC_2.3.2
  1.07%  sched_bench  [kernel.vmlinux]    [k] hash_futex
  1.05%  sched_bench  [kernel.vmlinux]    [k] do_futex
  1.05%  sched_bench  [kernel.vmlinux]    [k] get_futex_key_refs.isra.13

The hotest part of bpf_get_stackid() is inlined jhash2, so we may consider
using some faster hash in the future, but it's good enough for now.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: offwaketime example

This is simplified version of Brendan Gregg's offwaketime:
This program shows kernel stack traces and task names that were blocked and
"off-CPU", along with the stack traces and task names for the threads that woke
them, and the total elapsed time from when they blocked to when they were woken
up. The combined stacks, task names, and total time is summarized in kernel
context for efficiency.

Example:
$ sudo ./offwaketime | flamegraph.pl > demo.svg
Open demo.svg in the browser as FlameGraph visualization.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: introduce BPF_MAP_TYPE_STACK_TRACE

add new map type to store stack traces and corresponding helper
bpf_get_stackid(ctx, map, flags) - walk user or kernel stack and return id
@ctx: struct pt_regs*
@map: pointer to stack_trace map
@flags: bits 0-7 - numer of stack frames to skip
        bit 8 - collect user stack instead of kernel
        bit 9 - compare stacks by hash only
        bit 10 - if two different stacks hash into the same stackid
                 discard old
        other bits - reserved
Return: >= 0 stackid on success or negative error

stackid is a 32-bit integer handle that can be further combined with
other data (including other stackid) and used as a key into maps.

Userspace will access stackmap using standard lookup/delete syscall commands to
retrieve full stack trace for given stackid.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

perf: generalize perf_callchain

. avoid walking the stack when there is no room left in the buffer
. generalize get_perf_callchain() to be called from bpf helper

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Fix internal PHY link state

The PHY link state is not chaged in GENETv2 caused by the previous
commit 49f7a471e4d1 ("net: bcmgenet: Properly configure PHY to ignore
interrupt") was set to PHY_IGNORE_INTERRUPT in bcmgenet_mii_probe().

The internal PHY should use phy_mac_interrupt() when not in use
PHY_POLL. The statement for phy_mac_interrupt() has two conditions. The
first condition to check GENET_HAS_MDIO_INTR is not related PHY link
state, so this patch removes it.

Fixes: 49f7a471e4d1 ("net: bcmgenet: Properly configure PHY to ignore interrupt")
Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_unix: Don't use continue to re-execute unix_stream_read_generic loop

The unix_stream_read_generic function tries to use a continue statement
to restart the receive loop after waiting for a message. This may not
work as intended as the caller might use a recvmsg call to peek at
control messages without specifying a message buffer. If this was the
case, the continue will cause the function to return without an error
and without the credential information if the function had to wait for a
message while it had returned with the credentials otherwise. Change to
using goto to restart the loop without checking the condition first in
this case so that credentials are returned either way.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix_diag: fix incorrect sign extension in unix_lookup_by_ino

The value passed by unix_diag_get_exact to unix_lookup_by_ino has type
__u32, but unix_lookup_by_ino's argument ino has type int, which is not
a problem yet.
However, when ino is compared with sock_i_ino return value of type
unsigned long, ino is sign extended to signed long, and this results
to incorrect comparison on 64-bit architectures for inode numbers
greater than INT_MAX.

This bug was found by strace test suite.

Fixes: 5d3cae8bc39d ("unix_diag: Dumping exact socket core")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use skb_postpush_rcsum instead of own implementations

Replace individual implementations with the recently introduced
skb_postpush_rcsum() helper.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tom Herbert <tom@herbertland.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: marvell/micrel: Fix Unpossible condition

commit 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
from Dec 30, 2015, leads to the following static checker
warning:

        drivers/net/phy/micrel.c:609 kszphy_get_stat()
        warn: unsigned 'val' is never less than zero.

drivers/net/phy/micrel.c
   602  static u64 kszphy_get_stat(struct phy_device *phydev, int i)
   603  {
   604          struct kszphy_hw_stat stat = kszphy_hw_stats[i];
   605          struct kszphy_priv *priv = phydev->priv;
   606          u64 val;
   607
   608          val = phy_read(phydev, stat.reg);
   609          if (val < 0) {
                    ^^^^^^^
Unpossible!

   610                  val = UINT64_MAX;
   611          } else {
   612                  val = val & ((1 << stat.bits) - 1);
   613                  priv->stats[i] += val;
   614                  val = priv->stats[i];
   615          }
   616
   617          return val;
   618  }

The same problem exists in the Marvell driver. Fix both.

Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Julia.Lawall <julia.lawall@lip6.fr>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnxt_en-fixes'

Michael Chan says:

====================
bnxt_en: Phy related fixes.

3 small patches to fix PHY related code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Failure to update PHY is not fatal condition.

If we fail to update the PHY, we should print a warning and continue.
The current code to exit is buggy as it has not freed up the NIC
resources yet.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Remove unnecessary call to update PHY settings.

Fix bnxt_update_phy_setting() to check the correct parameters when
determining whether to update the PHY. Requested line speed/duplex should
only be checked for forced speed mode. This avoids unnecessary link
interruptions when loading the driver.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Poll link at the end of __bnxt_open_nic().

When shutting down the NIC, we shutdown async event processing before
freeing all the rings. If there is a link change event during reset, the
driver may miss it and the link state may be incorrect after the NIC is
re-opened. Poll the link at the end of __bnxt_open_nic() to get the
correct link status.

Signed-off-by Michael Chan <michael.chan@broadcom.com>

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethtool-perqueue-params'

Kan Liang says:

====================
ethtool per queue parameters support

Modern network interface controllers usually support multiple receive
and transmit queues. Each queue may have its own parameters. For
example, Intel XL710/X710 hardware supports per queue interrupt
moderation. However, current ethtool does not support per queue
parameters option. User has to set parameters for the whole NIC.
This series extends ethtool to support per queue parameters option.

Since the support of per queue parameters vary with different cards,
it is impossible to address all cards in one patch. This series only
supports per queue coalesce options on i40e driver. The framework used
in the patch can be easily extended to other cards and parameters.

The lib bitmap needs to be extended to facilitate exchanging queue bitmaps
between user space and kernel space. Two patches from David's latest V8
patch series are also cited in this series. You may refer to
https://lkml.org/lkml/2016/2/9/919 for more details.

Changes since V6:
- Rebase on commit 76d13b568776. Did minor change in patch 6.

Changes since V5:
- Add test_bitmap.c and bitmap.sh in the series. They are forgot
   to be added previously.
- Update the first two patches to David's latest V8 version. The changes
   include
      - bitmap u32 API returns number of bits copied, unit tests updated
      - module_exit in test_bitmap
- Also change the mode of bitmap.sh to 755 according to Ben's suggestion

Changes since V4:
- Modify set/get_per_queue_coalesce function description
- Change the queue number to be u32
- Correct an error of calculating coalesce backup buffer address
- Rename queue_num to n_queues
- Don't log error message in __i40e_get_coalesce

Changes since V3:
- Based on David's lib bitmap.
- ETHTOOL_PERQUEUE should be handled before the containing switch
- Make the rollback code unconditional
- some minor changes according to Ben's feedback

Changes since V2:
- Add queue-specific settings for interrupt moderation in i40e

Changes since V1:
- Checking the sub-command number to determine whether the command
   requires CAP_NET_ADMIN
- Refine the struct ethtool_per_queue_op and improve the comments
- Use bitmap functions to parse queue mask
- Improve comments
- Use bitmap functions to parse queue mask
- Improve comments
- Add rollback support
- Correct the way to find the vector for specific queue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

i40e/ethtool: support coalesce setting by queue

This patch implements set_per_queue_coalesce for i40e driver.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e/ethtool: support coalesce getting by queue

This patch implements get_per_queue_coalesce for i40e driver.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: queue-specific settings for interrupt moderation

For i40e driver, each vector has its own ITR register. However, there
are no concept of queue-specific settings in the driver proper. Only
global variable is used to store ITR values. That will cause problems
especially when resetting the vector. The specific ITR values could be
lost.
This patch move rx_itr_setting and tx_itr_setting to i40e_ring to store
specific ITR register for each queue.
i40e_get_coalesce and i40e_set_coalesce are also modified accordingly to
support queue-specific settings. To make it compatible with old ethtool,
if user doesn't specify the queue number, i40e_get_coalesce will return
queue 0's value. While i40e_set_coalesce will apply value to all queues.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ethtool: support set coalesce per queue

This patch implements sub command ETHTOOL_SCOALESCE for ioctl
ETHTOOL_PERQUEUE. It introduces an interface set_per_queue_coalesce to
set coalesce of each masked queue to device driver. The wanted coalesce
information are stored in "data" for each masked queue, which can copy
from userspace.
If it fails to set coalesce to device driver, the value which already
set to specific queue will be tried to rollback.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ethtool: support get coalesce per queue

This patch implements sub command ETHTOOL_GCOALESCE for ioctl
ETHTOOL_PERQUEUE. It introduces an interface get_per_queue_coalesce to
get coalesce of each masked queue from device driver. Then the interrupt
coalescing parameters will be copied back to user space one by one.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ethtool: introduce a new ioctl for per queue setting

Introduce a new ioctl ETHTOOL_PERQUEUE for per queue parameters setting.
The following patches will enable some SUB_COMMANDs for per queue
setting.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

test_bitmap: unit tests for lib/bitmap.c

This is mainly testing bitmap construction and conversion to/from u32[]
for now.

Tested:
qemu i386, x86_64, ppc, ppc64 BE and LE, ARM.

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lib/bitmap.c: conversion routines to/from u32 array

Aimed at transferring bitmaps to/from user-space in a 32/64-bit agnostic
way.

Tested:
unit tests (next patch) on qemu i386, x86_64, ppc, ppc64 BE and LE,
ARM.

Signed-off-by: David Decotigny <decot@googlers.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

hwmon: (gpio-fan) Remove un-necessary speed_index lookup for thermal hook

Thermal hook gpio_fan_get_cur_state is only interested in knowing
the current speed index that was setup in the system, this is
already available as part of fan_data->speed_index which is always
set by set_fan_speed. Using get_fan_speed_index is useful when we
have no idea about the fan speed configuration (for example during
fan_ctrl_init).

When thermal framework invokes
gpio_fan_get_cur_state=>get_fan_speed_index via gpio_fan_get_cur_state
especially in a polled configuration for thermal governor, we
basically hog the i2c interface to the extent that other functions
fail to get any traffic out :(.

Instead, just provide the last state set in the driver - since the gpio
fan driver is responsible for the fan state immaterial of override, the
fan_data->speed_index should accurately reflect the state.

Fixes: b5cf88e46bad ("(gpio-fan): Add thermal control hooks")
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Cc: stable@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for v4.5"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix crashes in dioread_nolock mode
  ext4: fix bh->b_state corruption
  ext4: fix memleak in ext4_readdir()
  ext4: remove unused parameter "newblock" in convert_initialized_extent()
  ext4: don't read blocks from disk after extents being swapped
  ext4: fix potential integer overflow
  ext4: add a line break for proc mb_groups display
  ext4: ioctl: fix erroneous return value
  ext4: fix scheduling in atomic on group checksum failure
  ext4 crypto: move context consistency check to ext4_file_open()
  ext4 crypto: revalidate dentry after adding or removing the key

Merge branch 'for-linus-4.5' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fix from Chris Mason:
"My for-linus-4.5 branch has a btrfs DIO error passing fix.

  I know how much you love DIO, so I'm going to suggest against reading
  it.  We'll follow up with a patch to drop the error arg from
  dio_end_io in the next merge window."

* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix direct IO requests not reporting IO error to user space

Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
"10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: slab: free kmem_cache_node after destroy sysfs file
  ipc/shm: handle removed segments gracefully in shm_mmap()
  MAINTAINERS: update Kselftest Framework mailing list
  devm_memremap_release(): fix memremap'd addr handling
  mm/hugetlb.c: fix incorrect proc nr_hugepages value
  mm, x86: fix pte_page() crash in gup_pte_range()
  fsnotify: turn fsnotify reaper thread into a workqueue job
  Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
  mm: fix regression in remap_file_pages() emulation
  thp, dax: do not try to withdraw pgtable from non-anon VMA

ser_gigaset: use container_of() instead of detour

The purpose of gigaset_device_release() is to kfree() the struct
ser_cardstate that contains our struct device. This is done via a bit of
a detour. First we make our struct device's driver_data point to the
container of our struct ser_cardstate (which is a struct cardstate). In
gigaset_device_release() we then retrieve that driver_data again. And
after that we finally kfree() the struct ser_cardstate that was saved in
the struct cardstate.

All of this can be achieved much easier by using container_of() to get
from our struct device to its container, struct ser_cardstate. Do so.

Note that at the time the detour was implemented commit b8b2c7d845d5
("base/platform: assert that dev_pm_domain callbacks are called
unconditionally") had just entered the tree. That commit disconnected
our platform_device and our platform_driver. These were reconnected
again in v4.5-rc2 through commit 25cad69f21f5 ("base/platform: Fix
platform drivers with no probe callback"). And one of the consequences
of that fix was that it broke the detour via driver_data. That's because
it made __device_release_driver() stop being a NOP for our struct device
and actually do stuff again. One of the things it now does, is setting
our driver_data to NULL. That, in turn, makes it impossible for
gigaset_device_release() to get to our struct cardstate. Which has the
net effect of leaking a struct ser_cardstate at every call of this
driver's tty close() operation. So using container_of() has the
additional benefit of actually working.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-for-davem-2016-02-18' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
rtlwifi

* fix broken VHT (802.11ac) support, reported by Linus

wlcore

* fix firmware initialisation regression on wl1271

iwlwifi

* fix a race that users reported when we try to load the firmware
  and the hardware rfkill interrupt triggers at the same time
* fix a very visible bug in scheduled scan: the firmware
  doesn't support scheduled scan with no profile configured and
  the supplicant sometimes requests such scheduled scans
* build system fix to be able to link iwlwifi statically into kernel
* firmware name update for 8265
* typo fix in return value
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb3: fix up vpd strings for kstrto*()

The vpd strings are left justified, in a fixed length array, with possible
trailing white space and no NUL. So fix them up before calling kstrto*().

This is a recent regression which causes cxgb3 to fail to load.

Fixes: e72c932 ("cxgb3: Convert simple_strtoul to kstrtox")

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: add software transmit timestamp support

Enable skb_tx_timestamp in hyperv netvsc.

Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: pass up EMSGSIZE msg for UDP socket in Ipv6

In ipv4, when the machine receives a ICMP_FRAG_NEEDED message, the
connected UDP socket will get EMSGSIZE message on its next read from the
socket.
However, this is not the case for ipv6.
This fix modifies the udp err handler in Ipv6 for ICMP6_PKT_TOOBIG to
make it similar to ipv4 behavior. That is when the machine gets an
ICMP6_PKT_TOOBIG message, the connected UDP socket will get EMSGSIZE
message on its next read from the socket.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>