cascardo/linux.git
7 years agoxen-netback: add control ring boilerplate
Paul Durrant [Fri, 13 May 2016 08:37:26 +0000 (09:37 +0100)]
xen-netback: add control ring boilerplate

My recent patch to include/xen/interface/io/netif.h defines a new shared
ring (in addition to the rx and tx rings) for passing control messages
from a VM frontend driver to a backend driver.

This patch adds the necessary code to xen-netback to map this new shared
ring, should it be created by a frontend, but does not add implementations
for any of the defined protocol messages. These are added in a subsequent
patch for clarity.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'cls_u32_hw_sw'
David S. Miller [Mon, 16 May 2016 17:30:57 +0000 (13:30 -0400)]
Merge branch 'cls_u32_hw_sw'

Sridhar Samudrala says:

====================
Enable SW only or HW only offloads with u32 classifier

This set of patches export TCA_CLS_FLAGS_SKIP_HW to userspace and also
introduces another flag TCA_CLS_FLAGS_SKIP_SW. These flags enable offloading
u32 filters to either SW or HW only.

The default semantics with no flags is to add the filter to HW if possible and
also into SW.
With SKIP_HW flag, the filter is only added to SW.
With SKIP_SW flag, the filter is added to HW and an error is returned
to user on failure.
These flags are mutually exclusive.
There was an earlier discussion on these semantics in the following email
thread.
http://thread.gmane.org/gmane.linux.network/401733
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: cls_u32: Add support for skip-sw flag to tc u32 classifier.
Samudrala, Sridhar [Fri, 13 May 2016 00:08:23 +0000 (17:08 -0700)]
net: cls_u32: Add support for skip-sw flag to tc u32 classifier.

On devices that support TC U32 offloads, this flag enables a filter to be
added only to HW. skip-sw and skip-hw are mutually exclusive flags. By
default without any flags, the filter is added to both HW and SW, but no
error checks are done in case of failure to add to HW. With skip-sw,
failure to add to HW is treated as an error.

Here is a sample script that adds 2 filters, one with skip-sw and the other
with skip-hw flag.

   # add ingress qdisc
   tc qdisc add dev p4p1 ingress

   # enable hw tc offload.
   ethtool -K p4p1 hw-tc-offload on

   # add u32 filter with skip-sw flag.
   tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
      handle 800:0:1 u32 ht 800: flowid 800:1 \
      skip-sw \
      match ip src 192.168.1.0/24 \
      action drop

   # add u32 filter with skip-hw flag.
   tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
      handle 800:0:2 u32 ht 800: flowid 800:2 \
      skip-hw \
      match ip src 192.168.2.0/24 \
      action drop

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sched: Move TCA_CLS_FLAGS_SKIP_HW to uapi header file.
Samudrala, Sridhar [Fri, 13 May 2016 00:08:22 +0000 (17:08 -0700)]
net: sched: Move TCA_CLS_FLAGS_SKIP_HW to uapi header file.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'hv_netvsc-races'
David S. Miller [Mon, 16 May 2016 17:26:01 +0000 (13:26 -0400)]
Merge branch 'hv_netvsc-races'

Vitaly Kuznetsov says:

====================
hv_netvsc: avoid races on mtu change/set channels

Changes since v1:
- Rebased to net-next [Haiyang Zhang]

Original description:

MTU change and set channels operations are implemented as netvsc device
re-creation destroying internal structures (struct net_device stays). This
is really unfortunate but there is no support from Hyper-V host to do it
in a different way. Such re-creation is unsurprisingly racy, Haiyang
reported a crash when netvsc_change_mtu() is racing with
netvsc_link_change() but I was able to identify additional races upon
investigation. Both netvsc_set_channels() and netvsc_change_mtu() race
against:
1) netvsc_link_change()
2) netvsc_remove()
3) netvsc_send()

To solve these issues without introducing new locks some refactoring is
required. We need to get rid of very complex link graph in all the
internal structures and avoid traveling through structures which are being
removed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohv_netvsc: set nvdev link after populating chn_table
Vitaly Kuznetsov [Fri, 13 May 2016 11:55:25 +0000 (13:55 +0200)]
hv_netvsc: set nvdev link after populating chn_table

Crash in netvsc_send() is observed when netvsc device is re-created on
mtu change/set channels. The crash is caused by dereferencing of NULL
channel pointer which comes from chn_table. The root cause is a mixture
of two facts:
- we set nvdev pointer in net_device_context in alloc_net_device()
  before we populate chn_table.
- we populate chn_table[0] only.

The issue could be papered over by checking channel != NULL in
netvsc_send() but populating the whole chn_table and writing the
nvdev pointer afterwards seems more appropriate.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohv_netvsc: synchronize netvsc_change_mtu()/netvsc_set_channels() with netvsc_remove()
Vitaly Kuznetsov [Fri, 13 May 2016 11:55:24 +0000 (13:55 +0200)]
hv_netvsc: synchronize netvsc_change_mtu()/netvsc_set_channels() with netvsc_remove()

When netvsc device is removed during mtu change or channels setup we get
into troubles as both paths are trying to remove the device. Synchronize
them with start_remove flag and rtnl lock.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohv_netvsc: get rid of struct net_device pointer in struct netvsc_device
Vitaly Kuznetsov [Fri, 13 May 2016 11:55:23 +0000 (13:55 +0200)]
hv_netvsc: get rid of struct net_device pointer in struct netvsc_device

Simplify netvsvc pointer graph by getting rid of the redundant ndev
pointer. We can always get a pointer to struct net_device from somewhere
else.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohv_netvsc: untangle the pointer mess
Vitaly Kuznetsov [Fri, 13 May 2016 11:55:22 +0000 (13:55 +0200)]
hv_netvsc: untangle the pointer mess

We have the following structures keeping netvsc adapter state:
- struct net_device
- struct net_device_context
- struct netvsc_device
- struct rndis_device
- struct hv_device
and there are pointers/dependencies between them:
- struct net_device_context is contained in struct net_device
- struct hv_device has driver_data pointer which points to
  'struct net_device' OR 'struct netvsc_device' depending on driver's
  state (!).
- struct net_device_context has a pointer to 'struct hv_device'.
- struct netvsc_device has pointers to 'struct hv_device' and
  'struct net_device_context'.
- struct rndis_device has a pointer to 'struct netvsc_device'.

Different functions get different structures as parameters and use these
pointers for traveling. The problem is (in addition to keeping in mind
this complex graph) that some of these structures (struct netvsc_device
and struct rndis_device) are being removed and re-created on mtu change
(as we implement it as re-creation of hyper-v device) so our travel using
these pointers is dangerous.

Simplify this to a the following:
- add struct netvsc_device pointer to struct net_device_context (which is
  a part of struct net_device and thus never disappears)
- remove struct hv_device and struct net_device_context pointers from
  struct netvsc_device
- replace pointer to 'struct netvsc_device' with pointer to
  'struct net_device'.
- always keep 'struct net_device' in hv_device driver_data.

We'll end up with the following 'circular' structure:

net_device:
 [net_device_context] -> netvsc_device -> rndis_device -> net_device
                      -> hv_device -> net_device

On MTU change we'll be removing the 'netvsc_device -> rndis_device'
branch and re-creating it making the synchronization easier.

There is one additional redundant pointer left, it is struct net_device
link in struct netvsc_device, it is going to be removed in a separate
commit.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohv_netvsc: use start_remove flag to protect netvsc_link_change()
Vitaly Kuznetsov [Fri, 13 May 2016 11:55:21 +0000 (13:55 +0200)]
hv_netvsc: use start_remove flag to protect netvsc_link_change()

netvsc_link_change() can race with netvsc_change_mtu() or
netvsc_set_channels() as these functions destroy struct netvsc_device and
rndis filter. Use start_remove flag for syncronization. As
netvsc_change_mtu()/netvsc_set_channels() are called with rtnl lock held
we need to take it before checking start_remove value in
netvsc_link_change().

Reported-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohv_netvsc: move start_remove flag to net_device_context
Vitaly Kuznetsov [Fri, 13 May 2016 11:55:20 +0000 (13:55 +0200)]
hv_netvsc: move start_remove flag to net_device_context

struct netvsc_device is destroyed on mtu change so keeping the
protection flag there is not a good idea. Move it to struct
net_device_context which is preserved.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agophy: add support for a reset-gpio specification
Uwe Kleine-König [Thu, 12 May 2016 10:00:33 +0000 (12:00 +0200)]
phy: add support for a reset-gpio specification

The framework only asserts (for now) that the reset gpio is not active.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Sun, 15 May 2016 17:47:27 +0000 (13:47 -0400)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-05-14

This series contains updates to i40e and i40evf.

Kevin adds support to disable link on all ports and changes bits set
for telling firmware the PHY needs to be modified by the driver.

Anjali adds a feature to enable/disable all multicast for a trusted
VF.  Added priv-flag knob to configure global true promiscuous
support.

Shannon adds the support code for calling the admin queue API call
aq_set_switch_config().

Mitch modifies the VF, to log a message if an untrusted VF attempts to
configure promiscuous mode, but lies to it and returns everything is ok
instead of returning an error.  Corrects the logic for reporting the
receive packet hash.  Fixed the adding of a broadcast filter for VFs,
since that all VSIs are configured to receive broadcasts as default,
so do not need to add a filter.

Catherine refactors the ethtool get_settings to report the possible
supported link modes from what we know about the current PHY type and
that with the firmware supported PHY types.

Jacob changes the driver to use WARN_ONCE in order to highlight the
issue, but do not display a warning every time when receive hang
message is received.

Akeem corrects receive ptype payload layer for non_tunneled IPv6, when
it should be layer 4 for UDP, instead of layer 3.

Dan Carpenter fixes an uninitialized variable bug.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'bnxt_en-next'
David S. Miller [Sun, 15 May 2016 17:35:49 +0000 (13:35 -0400)]
Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: updates for net-next.

Non-critical bug fixes, improvements, a new ethtool feature, and a new
device ID.

v2: Fixed a bug in bnxt_get_module_eeprom() found by Ben Hutchings.

Ajit Khaparde (2):
  bnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO
  bnxt_en: Report PCIe link speed and width during driver load

Michael Chan (6):
  bnxt_en: Reduce maximum ring pages if page size is 64K.
  bnxt_en: Improve the delay logic for firmware response.
  bnxt_en: Fix length value in dmesg log firmware error message.
  bnxt_en: Simplify and improve unsupported SFP+ module reporting.
  bnxt_en: Add BCM57314 device ID.
  bnxt_en: Use dma_rmb() instead of rmb().

Satish Baddipadige (1):
  bnxt_en: Fix invalid max channel parameter in ethtool -l.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: Use dma_rmb() instead of rmb().
Michael Chan [Sun, 15 May 2016 07:04:51 +0000 (03:04 -0400)]
bnxt_en: Use dma_rmb() instead of rmb().

Use the weaker but more appropriate dma_rmb() to order the reading of
the completion ring.

Suggested-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: Add BCM57314 device ID.
Michael Chan [Sun, 15 May 2016 07:04:50 +0000 (03:04 -0400)]
bnxt_en: Add BCM57314 device ID.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: Simplify and improve unsupported SFP+ module reporting.
Michael Chan [Sun, 15 May 2016 07:04:49 +0000 (03:04 -0400)]
bnxt_en: Simplify and improve unsupported SFP+ module reporting.

The current code is more complicated than necessary and can only report
unsupported SFP+ module if it is plugged in after the device is up.

Rename bnxt_port_module_event() to bnxt_get_port_module_status().  We
already have the current module_status in the link_info structure, so
just check that and report any unsupported SFP+ module status.  Delete
the unnecessary last_port_module_event.  Call this function at the
end of bnxt_open to report unsupported module already plugged in.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: Fix length value in dmesg log firmware error message.
Michael Chan [Sun, 15 May 2016 07:04:48 +0000 (03:04 -0400)]
bnxt_en: Fix length value in dmesg log firmware error message.

The len value in the hwrm error message is wrong.  Use the properly adjusted
value in the variable len.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: Improve the delay logic for firmware response.
Michael Chan [Sun, 15 May 2016 07:04:47 +0000 (03:04 -0400)]
bnxt_en: Improve the delay logic for firmware response.

The current code has 2 problems:

1. The maximum wait time is not long enough.  It is about 60% of the
duration specified by the firmware.  It is calling usleep_range(600, 800)
for every 1 msec we are supposed to wait.

2. The granularity of the delay is too coarse.  Many simple firmware
commands finish in 25 usec or less.

We fix these 2 issues by multiplying the original 1 msec loop counter by
40 and calling usleep_range(25, 40) for each iteration.

There is also a second delay loop to wait for the last DMA word to
complete.  This delay loop should be a very short 5 usec wait.

This change results in much faster bring-up/down time:

Before the patch:

time ip link set p4p1 up

real    0m0.120s
user    0m0.001s
sys     0m0.009s

After the patch:

time ip link set p4p1 up

real    0m0.030s
user    0m0.000s
sys     0m0.010s

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: Reduce maximum ring pages if page size is 64K.
Michael Chan [Sun, 15 May 2016 07:04:46 +0000 (03:04 -0400)]
bnxt_en: Reduce maximum ring pages if page size is 64K.

The chip supports 4K/8K/64K page sizes for the rings and we try to
match it to the CPU PAGE_SIZE.  The current page size limits for the rings
are based on 4K/8K page size. If the page size is 64K, these limits are
too large.  Reduce them appropriately.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: Report PCIe link speed and width during driver load
Ajit Khaparde [Sun, 15 May 2016 07:04:45 +0000 (03:04 -0400)]
bnxt_en: Report PCIe link speed and width during driver load

Add code to log a message during driver load indicating PCIe link
speed and width.

The log message will look like this:
bnxt_en 0000:86:00.0 eth0: PCIe: Speed 8.0GT/s Width x8

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO
Ajit Khaparde [Sun, 15 May 2016 07:04:44 +0000 (03:04 -0400)]
bnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO

Add support to fetch the SFP EEPROM settings from the firmware
and display it via the ethtool -m command.  We support SFP+ and QSFP
modules.

v2: Fixed a bug in bnxt_get_module_eeprom() found by Ben Hutchings.

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: Fix invalid max channel parameter in ethtool -l.
Satish Baddipadige [Sun, 15 May 2016 07:04:43 +0000 (03:04 -0400)]
bnxt_en: Fix invalid max channel parameter in ethtool -l.

When there is only 1 MSI-X vector or in INTA mode, tx and rx pre-set
max channel parameters are shown incorrectly in ethtool -l.  With only 1
vector, bnxt_get_max_rings() will return -ENOMEM.  bnxt_get_channels
should check this return value, and set max_rx/max_tx to 0 if it is
non-zero.

Signed-off-by: Satish Baddipadige <sbaddipa@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sun, 15 May 2016 17:32:12 +0000 (13:32 -0400)]
Merge git://git./linux/kernel/git/davem/net

The nf_conntrack_core.c fix in 'net' is not relevant in 'net-next'
because we no longer have a per-netns conntrack hash.

The ip_gre.c conflict as well as the iwlwifi ones were cases of
overlapping changes.

Conflicts:
drivers/net/wireless/intel/iwlwifi/mvm/tx.c
net/ipv4/ip_gre.c
net/netfilter/nf_conntrack_core.c

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 14 May 2016 21:15:06 +0000 (14:15 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix mvneta/bm dependencies, from Arnd Bergmann.

 2) RX completion hw bug workaround in bnxt_en, from Michael Chan.

 3) Kernel pointer leak in nf_conntrack, from Linus.

 4) Hoplimit route attribute limits not enforced properly, from Paolo
    Abeni.

 5) qlcnic driver NULL deref fix from Dan Carpenter.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  arm64: bpf: jit JMP_JSET_{X,K}
  net/route: enforce hoplimit max value
  nf_conntrack: avoid kernel pointer value leak in slab name
  drivers: net: xgene: fix register offset
  drivers: net: xgene: fix statistics counters race condition
  drivers: net: xgene: fix ununiform latency across queues
  drivers: net: xgene: fix sharing of irqs
  drivers: net: xgene: fix IPv4 forward crash
  xen-netback: fix extra_info handling in xenvif_tx_err()
  net: mvneta: bm: fix dependencies again
  bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)
  bnxt_en: Add workaround to detect bad opaque in rx completion (part 1)
  qlcnic: potential NULL dereference in qlcnic_83xx_get_minidump_template()

7 years agonet: switchdev: Drop EXPERIMENTAL from description
Florian Fainelli [Sat, 14 May 2016 19:49:54 +0000 (12:49 -0700)]
net: switchdev: Drop EXPERIMENTAL from description

Switchdev has been around for quite a while now, putting "EXPERIMENTAL"
in the description is no longer accurate, drop it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoarm64: bpf: jit JMP_JSET_{X,K}
Zi Shen Lim [Fri, 13 May 2016 06:37:58 +0000 (23:37 -0700)]
arm64: bpf: jit JMP_JSET_{X,K}

Original implementation commit e54bcde3d69d ("arm64: eBPF JIT compiler")
had the relevant code paths, but due to an oversight always fail jiting.

As a result, we had been falling back to BPF interpreter whenever a BPF
program has JMP_JSET_{X,K} instructions.

With this fix, we confirm that the corresponding tests in lib/test_bpf
continue to pass, and also jited.

...
[    2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS
[    2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS
[    2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS
...
[    3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS
[    3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0xffffffff) return 1 jited:1 98 PASS
[    3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS
[    3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:1 89 PASS
...

Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/route: enforce hoplimit max value
Paolo Abeni [Fri, 13 May 2016 16:33:41 +0000 (18:33 +0200)]
net/route: enforce hoplimit max value

Currently, when creating or updating a route, no check is performed
in both ipv4 and ipv6 code to the hoplimit value.

The caller can i.e. set hoplimit to 256, and when such route will
 be used, packets will be sent with hoplimit/ttl equal to 0.

This commit adds checks for the RTAX_HOPLIMIT value, in both ipv4
ipv6 route code, substituting any value greater than 255 with 255.

This is consistent with what is currently done for ADVMSS and MTU
in the ipv4 code.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonf_conntrack: avoid kernel pointer value leak in slab name
Linus Torvalds [Sat, 14 May 2016 18:11:44 +0000 (11:11 -0700)]
nf_conntrack: avoid kernel pointer value leak in slab name

The slab name ends up being visible in the directory structure under
/sys, and even if you don't have access rights to the file you can see
the filenames.

Just use a 64-bit counter instead of the pointer to the 'net' structure
to generate a unique name.

This code will go away in 4.7 when the conntrack code moves to a single
kmemcache, but this is the backportable simple solution to avoiding
leaking kernel pointers to user space.

Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 14 May 2016 18:59:43 +0000 (11:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "Overlayfs fixes from Miklos, assorted fixes from me.

  Stable fodder of varying severity, all sat in -next for a while"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ovl: ignore permissions on underlying lookup
  vfs: add lookup_hash() helper
  vfs: rename: check backing inode being equal
  vfs: add vfs_select_inode() helper
  get_rock_ridge_filename(): handle malformed NM entries
  ecryptfs: fix handling of directory opening
  atomic_open(): fix the handling of create_error
  fix the copy vs. map logics in blk_rq_map_user_iov()
  do_splice_to(): cap the size before passing to ->splice_read()

7 years agoi40e: fix an uninitialized variable bug
Dan Carpenter [Thu, 5 May 2016 13:18:02 +0000 (16:18 +0300)]
i40e: fix an uninitialized variable bug

We removed this initialization but it is required.  Let's put it back.

Fixes: 895106a577c4 ('i40e: trivial fixes')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: Bump version from 1.5.10 to 1.5.16
Bimmy Pujari [Tue, 3 May 2016 22:13:20 +0000 (15:13 -0700)]
i40e: Bump version from 1.5.10 to 1.5.16

Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: don't add broadcast filter for VFs
Mitch Williams [Tue, 3 May 2016 22:13:19 +0000 (15:13 -0700)]
i40e: don't add broadcast filter for VFs

Now that all VSIs are configured to receive broadcasts as default, we
don't need to add a filter. This eliminates an annoying but harmless
error message each time VFs are created or reset.

Change-ID: I4cd6339684df45b0d2722133eeb84c14fa93ea19
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e/i40evf: properly report Rx packet hash
Mitch Williams [Tue, 3 May 2016 22:13:18 +0000 (15:13 -0700)]
i40e/i40evf: properly report Rx packet hash

This logic is inverted. If the RXHASH flag is set, then we should go
ahead and call skb_set_hash.

Change-ID: Ib2e30356dced1d3e939c8061ab6ad5bd94197e7c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: set context to use VSI RSS LUT for SR-IOV
Ashish Shah [Tue, 3 May 2016 22:13:17 +0000 (15:13 -0700)]
i40e: set context to use VSI RSS LUT for SR-IOV

For the SR-IOV VSIs, when the queue filtering section is valid,
the RSS LUT needs to be set to use the VSI specific lookup table
(otherwise it will use the PF RSS LUT table).

Change-ID: Ia9377cc818078238a75c3bdeade1b593a91b3480
Signed-off-by: Ashish Shah <ashish.n.shah@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: Correct UDP packet header for non_tunnel-ipv6
Akeem G Abodunrin [Tue, 3 May 2016 22:13:16 +0000 (15:13 -0700)]
i40e: Correct UDP packet header for non_tunnel-ipv6

This patch corrects Rx ptype payload layer for non_tunneled ipv6. It
should be layer 4 for UDP, instead of layer 3.

Change-ID: I9382e4458ab3c4e58f6d2e9f195d5d4ee513805e
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: change Rx hang message into a WARN_ONCE
Jacob Keller [Tue, 3 May 2016 22:13:15 +0000 (15:13 -0700)]
i40e: change Rx hang message into a WARN_ONCE

Use WARN_ONCE in order to highlight the issue, but don't display
a warning every time. The user should be able to see the ethtool counter
we created if necessary to see how often it is occurring.

Change-ID: I40c4ea159819b64a7d33b7f5716749089791533a
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: Refactor ethtool get_settings
Catherine Sullivan [Tue, 3 May 2016 22:13:14 +0000 (15:13 -0700)]
i40e: Refactor ethtool get_settings

Previously we were only looking at the FW supported PHY types if link
was down, because we want to be more specific when link is up. This
refactor changes this. When link is down, we still rely on the FW
supported PHY types, but when link is up, we select the possible
supported link modes from what we know about the current PHY type, and
AND that with the FW supported PHY types.

Change-ID: Ice5dad83f2a17932b0b8b59f07439696ad6aa013
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: lie to the VF
Mitch Williams [Tue, 3 May 2016 22:13:13 +0000 (15:13 -0700)]
i40e: lie to the VF

If an untrusted VF attempts to configure promiscuous mode, log a message
pointing out its naughty behavior. But then, instead of returning an
error to the offender, just lie to it and say everything's OK. It will
continue on its way, thinking it's in promiscuous mode, but receiving no
packets except its own.

Change-ID: I63369215b1720f3c531eedfc06af86ff8c0e3dc8
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: Add vf-true-promisc-support priv flag
Anjali Singhai Jain [Tue, 3 May 2016 22:13:12 +0000 (15:13 -0700)]
i40e: Add vf-true-promisc-support priv flag

This patch adds priv-flag knob to configure global true promisc
support. With this patch the user can decide the flavor of
promiscuous that the VFs will see when promiscuous mode is enabled
on the interface. Since this a global setting for the whole device,
the priv-flag is exposed only on the first PF of the device.

The default is true promisc support is off, which means the promisc
mode for the VF will be limited/defport mode.

For the PF, we still will be in limited promisc unless in MFP mode
irrespective of the flavor picked through this knob.

Usage:
On PF0
ethtool --show-priv-flags p261p1
Private flags for p261p1:
MFP                    : off
LinkPolling            : off
flow-director-atr      : on
veb-stats              : off
hw-atr-eviction        : off
vf-true-promisc-support: off

to enable setting true promisc
ethtool --set-priv-flags p261p1 vf-true-promisc-support on

At this point if the VF is set to trust and promisc is enabled
on the VF through
ip link set ... promisc on
The VF/VFs will be able to see ALL ingress traffic

Change-Id: I8fac4b6eb1af9ca77b5376b79c50bdce5055bd94
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: Implement the API function for aq_set_switch_config
Shannon Nelson [Tue, 3 May 2016 22:13:11 +0000 (15:13 -0700)]
i40e: Implement the API function for aq_set_switch_config

Add the support code for calling the AdminQ API call aq_set_switch_config

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: Add allmulti support for the VF
Anjali Singhai Jain [Tue, 3 May 2016 22:13:10 +0000 (15:13 -0700)]
i40e: Add allmulti support for the VF

This patch enables a feature to enable/disable all multicast
for a trusted VF.

Change-Id: I926eba7f8850c8d40f8ad7e08bbe4056bbd3985f
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: Add support for disabling all link and change bits needed for PHY interactions
Kevin Scott [Tue, 3 May 2016 22:13:09 +0000 (15:13 -0700)]
i40e: Add support for disabling all link and change bits needed for PHY interactions

Add flag to tell firmware to disable link on all ports.

This patch changes the bits set for telling firmware the PHY needs
to be modified by driver.  Without this patch, the setting will only
set that mode for the current port on the device.  Because the
MDIO interface is common for the copper device. The command needs to
set the mode for all ports.

Change-ID: I8baa7da91d384291ac95b41ae1a516604f8eb67f
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoMerge branch 'xgene-fixes'
David S. Miller [Sat, 14 May 2016 01:12:07 +0000 (21:12 -0400)]
Merge branch 'xgene-fixes'

Iyappan Subramanian says:

====================
drivers: net: xgene: Bug fixes

This patch set addresses the following bug fixes that were found during testing.

  1. IPv4 forward test crash
    - drivers: net: xgene: fix IPv4 forward crash

  2. Sharing of irqs
    - drivers: net: xgene: fix sharing of irqs

  3. Ununiform latency across queues
    - drivers: net: xgene: fix ununiform latency across queues

  4. Fix statistics counters race condition
    - drivers: net: xgene: fix statistics counters race condition

  5. Correcting register offset and field lengths
    - drivers: net: xgene: fix register offset

v2: Address review comments from v1
- Defer TSO fix, and reposting all other patches from v1

v1:
- Initial version
====================

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: xgene: fix register offset
Iyappan Subramanian [Fri, 13 May 2016 23:53:01 +0000 (16:53 -0700)]
drivers: net: xgene: fix register offset

This patch fixes SG_RX_DV_GATE_REG_0_ADDR register offset
and ring state field lengths.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: xgene: fix statistics counters race condition
Iyappan Subramanian [Fri, 13 May 2016 23:53:00 +0000 (16:53 -0700)]
drivers: net: xgene: fix statistics counters race condition

This patch fixes the race condition on updating the statistics
counters by moving the counters to the ring structure.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: xgene: fix ununiform latency across queues
Iyappan Subramanian [Fri, 13 May 2016 23:52:59 +0000 (16:52 -0700)]
drivers: net: xgene: fix ununiform latency across queues

This patch addresses ununiform latency across queues by adding
more queues to match with, upto number of CPU cores.

Also, number of interrupts are increased and the channel numbers
are reordered.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: xgene: fix sharing of irqs
Iyappan Subramanian [Fri, 13 May 2016 23:52:58 +0000 (16:52 -0700)]
drivers: net: xgene: fix sharing of irqs

Since hardware doesn't allow sharing of interrupts,
this patch fixes the same by removing IRQF_SHARED flag.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: xgene: fix IPv4 forward crash
Iyappan Subramanian [Fri, 13 May 2016 23:52:57 +0000 (16:52 -0700)]
drivers: net: xgene: fix IPv4 forward crash

This patch fixes the crash observed during IPv4 forward test by
setting the drop field in the dbptr.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Sat, 14 May 2016 01:08:24 +0000 (21:08 -0400)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2016-05-13

This series contains updates to e1000e, igb and igbvf.

Steve Shih fixes an issue for disabling auto-negotiation and forcing
speed and duplex settings for non-copper media.

Brian Walsh cleanups some inconsistency in the use of return variables
names to avoid confusion.

Jake cleans up the drivers to use the BIT() macro when it can, which will
future proof the drivers for GCC 6 when it gets released.  Cleaned up
dead code which was never being used.  Also fixed e1000e, where it was
incorrectly restting the SYSTIM registers every time the ioctl was being
run.

Denys Vlasenko fixes an oversight where incvalue variable holds a 32
bit value so we should declare it as such, instead of 64 bits.  Also
fixed an overflow check, where two reads are the same, then it is not
an overflow.

Nathan Sullivan fixes the PTP timestamps for transmit and receive
latency based on the current link speed.

Alexander Duyck adds support for partial GSO segmentation in the case
of tunnels for igb and igbvf.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Linus Torvalds [Fri, 13 May 2016 23:26:46 +0000 (16:26 -0700)]
Merge branch 'for-4.6-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
 "During v4.6-rc1 cgroup namespace support was merged.  There is an
  issue where it's impossible to tell whether a given cgroup mount point
  is bind mounted or namespaced.  Serge has been working on the issue
  but it took longer than expected to resolve, so the late pull request.

  Given that it's a completely new feature and the patches don't touch
  anything else, the risk seems acceptable.  However, if this is too
  late, an alternative is plugging new cgroup ns creation for v4.6 and
  retrying for v4.7"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix compile warning
  kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call
  cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces
  kernfs_path_from_node_locked: don't overwrite nlen

7 years agoMerge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Linus Torvalds [Fri, 13 May 2016 23:16:51 +0000 (16:16 -0700)]
Merge branch 'for-4.6-fixes' of git://git./linux/kernel/git/tj/wq

Pull workqueue fix from Tejun Heo:
 "CPU hotplug callbacks can invoke DOWN_FAILED w/o preceding
  DOWN_PREPARE which can trigger a WARN_ON() in workqueue.

  The bug has been there for a very long time.  It only triggers if CPU
  down fails at a specific point and I don't think it has adverse
  effects other than the warning messages.  The fix is very low impact"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix rebind bound workers warning

7 years agoe1000e: don't modify SYSTIM registers during SIOCSHWTSTAMP ioctl
Jacob Keller [Wed, 20 Apr 2016 18:36:42 +0000 (11:36 -0700)]
e1000e: don't modify SYSTIM registers during SIOCSHWTSTAMP ioctl

The e1000e_config_hwtstamp function was incorrectly resetting the SYSTIM
registers every time the ioctl was being run. If you happened to be
running ptp4l and lost the PTP connect (removing cable, or blocking the
UDP traffic for example), then ptp4l will eventually perform a restart
which involves re-requesting timestamp settings. In e1000e this has the
unfortunate and incorrect result of resetting SYSTIME to the kernel
time. Since kernel time is usually in UTC, and PTP time is in TAI, this
results in the leap second being re-applied.

Fix this by extracting the SYSTIME reset out into its own function,
e1000e_ptp_reset, which we call during reset to restore the hardware
registers. This function will (a) restart the timecounter based on the
new system time, (b) restore the previous PPB setting, and (c) restore
the previous hwtstamp settings.

In order to perform (b), I had to modify the adjfreq ptp function
pointer to store the old delta each time it is called. This also has the
side effect of restoring the correct base timinca register correctly.
The driver does not need to explicitly zero the ptp_delta variable since
the entire adapter structure comes zero-initialized.

Reported-by: Brian Walsh <brian@walsh.ws>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Brian Walsh <brian@walsh.ws>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoigb/igbvf: Add support for GSO partial
Alexander Duyck [Thu, 14 Apr 2016 21:19:38 +0000 (17:19 -0400)]
igb/igbvf: Add support for GSO partial

This patch adds support for partial GSO segmentation in the case of
tunnels.  Specifically with this change the driver an perform segmentation
as long as the frame either has IPv6 inner headers, or we are allowed to
mangle the IP IDs on the inner header.  This is needed because we will not
be modifying any fields from the start of the start of the outer transport
header to the start of the inner transport header as we are treating them
like they are just a block of IP options.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: mark shifted values as unsigned
Jacob Keller [Wed, 13 Apr 2016 23:08:33 +0000 (16:08 -0700)]
e1000e: mark shifted values as unsigned

The E1000_ICH_NVM_SIG_MASK value is shifted, out to the 31st bit, which
is the signed bit for signed constants. Mark these values as unsigned to
prevent compiler warnings and issues on platforms which a different
signed bit implementation.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: use BIT() macro for bit defines
Jacob Keller [Wed, 13 Apr 2016 23:08:32 +0000 (16:08 -0700)]
e1000e: use BIT() macro for bit defines

This prevents signed bitshift issues when the shift would overwrite the
signed bit, and prevents making this mistake in the future when copying
and modifying code.

Use GENMASK or the unsigned postfix for cases which aren't suitable for
BIT() macro.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoigbvf: use BIT() macro instead of shifts
Jacob Keller [Wed, 13 Apr 2016 23:08:31 +0000 (16:08 -0700)]
igbvf: use BIT() macro instead of shifts

To prevent signed bitshift issues, and improve code readability, use the
BIT() macro. Also make use of GENMASK or the unsigned postfix where this
is more appropriate than BIT()

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoigbvf: remove unused variable and dead code
Jacob Keller [Wed, 13 Apr 2016 23:08:30 +0000 (16:08 -0700)]
igbvf: remove unused variable and dead code

The variable rdlen is set but never used, and thus setting it is dead
code. Remove it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoigb: adjust PTP timestamps for Tx/Rx latency
Nathan Sullivan [Tue, 3 May 2016 23:10:56 +0000 (18:10 -0500)]
igb: adjust PTP timestamps for Tx/Rx latency

Table 7-62 on page 338 of the i210 datasheet lists TX and RX latencies
for the various speeds the chip supports.  To give better PTP timestamp
accuracy, adjust the timestamps by the amounts Intel gives based on
current link speed.

Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: e1000e_cyclecounter_read(): do overflow check only if needed
Denys Vlasenko [Wed, 20 Apr 2016 15:45:56 +0000 (17:45 +0200)]
e1000e: e1000e_cyclecounter_read(): do overflow check only if needed

SYSTIMH:SYSTIML registers are incremented by 24-bit value TIMINCA[23..0]

er32(SYSTIML) are probably moderately expensive (they are pci bus reads).
Can we avoid one of them? Yes, we can.

If the SYSTIML value we see is smaller than 0xff000000, the overflow
into SYSTIMH would require at least two increments.

We do two reads, er32(SYSTIML) and er32(SYSTIMH), in this order.

Even if one increment happens between them, the overflow into SYSTIMH
is impossible, and we can avoid doing another er32(SYSTIML) read
and overflow check.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: e1000e_cyclecounter_read(): fix er32(SYSTIML) overflow check
Denys Vlasenko [Wed, 20 Apr 2016 15:45:55 +0000 (17:45 +0200)]
e1000e: e1000e_cyclecounter_read(): fix er32(SYSTIML) overflow check

If two consecutive reads of the counter are the same, it is also
not an overflow.  "systimel_1 < systimel_2" should be
"systimel_1 <= systimel_2".

Before the patch, we could perform an *erroneous* correction:

Let's say that systimel_1 == systimel_2 == 0xffffffff.
"systimel_1 < systimel_2" is false, we think it's an overflow,
we read "systimeh = er32(SYSTIMH)" which meanwhile had incremented,
and use "(systimeh << 32) + systimel_2" value which is 2^32 too large.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: intel-wired-lan@lists.osuosl.org
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64
Denys Vlasenko [Wed, 20 Apr 2016 15:45:54 +0000 (17:45 +0200)]
e1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64

"incvalue" variable holds a result of "er32(TIMINCA) &
E1000_TIMINCA_INCVALUE_MASK" and used in "do_div(temp, incvalue)"
as a divisor.

Thus, "u64 incvalue" declaration is probably a mistake.
Even though it seems to be a harmless one, let's fix it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoigb: make igb_update_pf_vlvf static
Jacob Keller [Wed, 13 Apr 2016 23:08:29 +0000 (16:08 -0700)]
igb: make igb_update_pf_vlvf static

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoigb: use BIT() macro or unsigned prefix
Jacob Keller [Wed, 13 Apr 2016 23:08:28 +0000 (16:08 -0700)]
igb: use BIT() macro or unsigned prefix

For bitshifts, we should make use of the BIT macro when possible, and
ensure that other bitshifts are marked as unsigned. This helps prevent
signed bitshift errors, and ensures similar style.

Make use of GENMASK and the unsigned postfix where BIT() isn't
appropriate.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: Cleanup consistency in ret_val variable usage
Brian Walsh [Wed, 13 Apr 2016 03:22:30 +0000 (23:22 -0400)]
e1000e: Cleanup consistency in ret_val variable usage

Fixed the file to use a consistent ret_val for return value checking.

Signed-off-by: Brian Walsh <brian@walsh.ws>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: fix ethtool autoneg off for non-copper
Steve Shih [Tue, 5 Apr 2016 18:30:03 +0000 (11:30 -0700)]
e1000e: fix ethtool autoneg off for non-copper

This patch fixes the issues for disabling auto-negotiation and forcing
speed and duplex settings for the non-copper media.

For non-copper media, e1000_get_settings should return ETH_TP_MDI_INVALID for
eth_tp_mdix_ctrl instead of ETH_TP_MDI_AUTO so subsequent e1000_set_settings
call would not fail with -EOPNOTSUPP.

e1000_set_spd_dplx should not automatically turn autoneg back on for forced
1000 Mbps full duplex settings for non-copper media.

Cc: xe-kernel@external.cisco.com
Cc: Daniel Walker <dwalker@fifo99.com>
Signed-off-by: Steve Shih <sshih@cisco.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 13 May 2016 19:21:17 +0000 (12:21 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
 "This is a revert to fix an interactivity problem.

  The proper fixes for the problems that the reverted commit exposed are
  now in sched/core (consisting of 3 patches), but were too risky for
  v4.6 and will arrive in the v4.7 merge window"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "sched/fair: Fix fairness issue on migration"

7 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 13 May 2016 18:54:02 +0000 (11:54 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "An uncharacteristically large number of bugs popped up in the last
  week:

   - various tooling fixes, two crashes and build problems
   - two Intel PT fixes
   - an KNL uncore driver fix
   - an Intel PMU driver fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf stat: Fallback to user only counters when perf_event_paranoid > 1
  perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()
  perf evsel: Improve EPERM error handling in open_strerror()
  tools lib traceevent: Do not reassign parg after collapse_tree()
  perf probe: Check if dwarf_getlocations() is available
  perf dwarf: Guard !x86_64 definitions under #ifdef else clause
  perf tools: Use readdir() instead of deprecated readdir_r()
  perf thread_map: Use readdir() instead of deprecated readdir_r()
  perf script: Use readdir() instead of deprecated readdir_r()
  perf tools: Use readdir() instead of deprecated readdir_r()
  perf/core: Disable the event on a truncated AUX record
  perf/x86/intel/pt: Generate PMI in the STOP region as well
  perf/x86: Fix undefined shift on 32-bit kernels
  perf/x86/msr: Fix SMI overflow
  perf/x86/intel/uncore: Fix CHA registers configuration procedure for Knights Landing platform
  perf diff: Fix duplicated output column

7 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Fri, 13 May 2016 16:52:00 +0000 (09:52 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Three more bug fixes for ARM SoCs this week:

   - The Atmel sama5d2 was registering the wrong NFC device type

   - On Atmel sam9x5, the power management controller had an incorrect
     register area size

   - On ARM64 Allwinner machine was not secting the generic irqchip
     code, causing build errors in some configurations"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
  arm64/sunxi: 4.6-rc1: Add dependency on generic irq chip
  ARM: dts: at91: sama5d2: use "atmel,sama5d3-nfc" compatible for nfc

7 years agoMerge tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 13 May 2016 16:46:00 +0000 (09:46 -0700)]
Merge tag 'regulator-fix-v4.6-rc7' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A small collection of driver specific fixes for the regulator
  subsysetem:

   - Fix handling of probe deferral for GPIO regulators

   - Fix a typo in the module alias for DA9053

   - Fix the definition of BUCK9 in the S2MPS11 driver.  This change
     looks larger than it is because an irregularity in the hardware
     means that the macro used to define bucks 6-10 needs duplicating
     and tweaking to have a separate macro for 9

   - Fix a series of errors in the definitions of the LDOs the AXP20x
     regulators, some of which had always been present and some of which
     were introduced in the merge window"

* tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: da9063: Correct module alias prefix to fix module autoloading
  regulator: axp20x: Fix axp22x ldo_io registration error on cold boot
  regulator: axp20x: Fix axp22x ldo_io voltage ranges
  regulator: axp20x: Fix LDO4 linear voltage range
  regulator: s2mps11: Fix invalid selector mask and voltages for buck9
  regulator: gpio: check return value of of_get_named_gpio

7 years agoMerge tag 'regmap-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 13 May 2016 16:40:32 +0000 (09:40 -0700)]
Merge tag 'regmap-fix-v4.6-rc7' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
 "This is rather too late so it'd be completely understandable if you
  don't want to pull it at this point, I had thought I'd sent this
  earlier but it seems I didn't.  Everything has been in -next for some
  time now.

  The main set of fixes here are mopping up some more issues with MMIO,
  fixing handling of endianness configuration in DT (which just wasn't
  working at all) and cases where the register and value endianness are
  different.

  There is also a fix for bulk register reads on SPMI"

* tag 'regmap-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case
  regmap: mmio: Explicitly say little endian is the defualt in the bus config
  regmap: mmio: Parse endianness definitions from DT
  regmap: Fix implicit inclusion of device.h
  regmap: mmio: Fix value endianness selection
  regmap: fix documentation to match code

7 years agoMerge tag 'media/v4.6-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Fri, 13 May 2016 16:34:59 +0000 (09:34 -0700)]
Merge tag 'media/v4.6-6' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fix from Mauro Carvalho Chehab:
 "A revert fixing a breakage that caused an OOPS on all VB2-based DVB
  drivers.

  We already have a proper fix, but it sounds safer to keep it being
  tested for a while and not hurry, to avoid the risk of another
  regression, specially since this is meant to be c/c to stable.  So,
  for now, let's just revert the broken patch"

* tag 'media/v4.6-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"

7 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 13 May 2016 16:27:05 +0000 (09:27 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "A bunch of radeon displayport mode setting fixes, and some misc i915
  fixes.

  There is one revert, the MST audio code in i915 was causing some
  oopses, so we've decided just to drop it until next kernel when we can
  fix it properly"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/amdgpu: fix DP mode validation
  drm/radeon: fix DP mode validation
  drm/i915: Bail out of pipe config compute loop on LPT
  drm/radeon: fix PLL sharing on DCE6.1 (v2)
  drm/radeon: fix DP link training issue with second 4K monitor
  Revert "drm/i915: start adding dp mst audio"
  drm/i915/bdw: Add missing delay during L3 SQC credit programming
  drm/i915/lvds: separate border enable readout from panel fitter
  drm/i915: Update CDCLK_FREQ register on BDW after changing cdclk frequency

7 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Fri, 13 May 2016 16:21:31 +0000 (09:21 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This fixes a bug in the RSA self-test that may cause crashes on some
  architectures such as SPARC"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: testmgr - Use kmalloc memory for RSA input

7 years agoMerge remote-tracking branches 'regulator/fix/axp20x', 'regulator/fix/da9063', 'regul...
Mark Brown [Fri, 13 May 2016 10:11:08 +0000 (11:11 +0100)]
Merge remote-tracking branches 'regulator/fix/axp20x', 'regulator/fix/da9063', 'regulator/fix/gpio' and 'regulator/fix/s2mps11' into regulator-linus

7 years agoMerge remote-tracking branches 'regmap/fix/be', 'regmap/fix/doc' and 'regmap/fix...
Mark Brown [Fri, 13 May 2016 09:36:10 +0000 (10:36 +0100)]
Merge remote-tracking branches 'regmap/fix/be', 'regmap/fix/doc' and 'regmap/fix/spmi' into regmap-linus

7 years agoMerge remote-tracking branch 'regmap/fix/mmio' into regmap-linus
Mark Brown [Fri, 13 May 2016 09:36:09 +0000 (10:36 +0100)]
Merge remote-tracking branch 'regmap/fix/mmio' into regmap-linus

7 years agoMerge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Fri, 13 May 2016 06:03:39 +0000 (16:03 +1000)]
Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

DP mode validation regression fix.
* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: fix DP mode validation
  drm/radeon: fix DP mode validation

7 years agoxen-netback: fix extra_info handling in xenvif_tx_err()
Paul Durrant [Thu, 12 May 2016 13:43:03 +0000 (14:43 +0100)]
xen-netback: fix extra_info handling in xenvif_tx_err()

Patch 562abd39 "xen-netback: support multiple extra info fragments
passed from frontend" contained a mistake which can result in an in-
correct number of responses being generated when handling errors
encountered when processing packets containing extra info fragments.
This patch fixes the problem.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reported-by: Jan Beulich <JBeulich@suse.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoudp: Resolve NULL pointer dereference over flow-based vxlan device
Alexander Duyck [Thu, 12 May 2016 23:23:44 +0000 (16:23 -0700)]
udp: Resolve NULL pointer dereference over flow-based vxlan device

While testing an OpenStack configuration using VXLANs I saw the following
call trace:

 RIP: 0010:[<ffffffff815fad49>] udp4_lib_lookup_skb+0x49/0x80
 RSP: 0018:ffff88103867bc50  EFLAGS: 00010286
 RAX: ffff88103269bf00 RBX: ffff88103269bf00 RCX: 00000000ffffffff
 RDX: 0000000000004300 RSI: 0000000000000000 RDI: ffff880f2932e780
 RBP: ffff88103867bc60 R08: 0000000000000000 R09: 000000009001a8c0
 R10: 0000000000004400 R11: ffffffff81333a58 R12: ffff880f2932e794
 R13: 0000000000000014 R14: 0000000000000014 R15: ffffe8efbfd89ca0
 FS:  0000000000000000(0000) GS:ffff88103fd80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000488 CR3: 0000000001c06000 CR4: 00000000001426e0
 Stack:
  ffffffff81576515 ffffffff815733c0 ffff88103867bc98 ffffffff815fcc17
  ffff88103269bf00 ffffe8efbfd89ca0 0000000000000014 0000000000000080
  ffffe8efbfd89ca0 ffff88103867bcc8 ffffffff815fcf8b ffff880f2932e794
 Call Trace:
  [<ffffffff81576515>] ? skb_checksum+0x35/0x50
  [<ffffffff815733c0>] ? skb_push+0x40/0x40
  [<ffffffff815fcc17>] udp_gro_receive+0x57/0x130
  [<ffffffff815fcf8b>] udp4_gro_receive+0x10b/0x2c0
  [<ffffffff81605863>] inet_gro_receive+0x1d3/0x270
  [<ffffffff81589e59>] dev_gro_receive+0x269/0x3b0
  [<ffffffff8158a1b8>] napi_gro_receive+0x38/0x120
  [<ffffffffa0871297>] gro_cell_poll+0x57/0x80 [vxlan]
  [<ffffffff815899d0>] net_rx_action+0x160/0x380
  [<ffffffff816965c7>] __do_softirq+0xd7/0x2c5
  [<ffffffff8107d969>] run_ksoftirqd+0x29/0x50
  [<ffffffff8109a50f>] smpboot_thread_fn+0x10f/0x160
  [<ffffffff8109a400>] ? sort_range+0x30/0x30
  [<ffffffff81096da8>] kthread+0xd8/0xf0
  [<ffffffff81693c82>] ret_from_fork+0x22/0x40
  [<ffffffff81096cd0>] ? kthread_park+0x60/0x60

The following trace is seen when receiving a DHCP request over a flow-based
VXLAN tunnel.  I believe this is caused by the metadata dst having a NULL
dev value and as a result dev_net(dev) is causing a NULL pointer dereference.

To resolve this I am replacing the check for skb_dst(skb)->dev with just
skb->dev.  This makes sense as the callers of this function are usually in
the receive path and as such skb->dev should always be populated.  In
addition other functions in the area where these are called are already
using dev_net(skb->dev) to determine the namespace the UDP packet belongs
in.

Fixes: 63058308cd55 ("udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skb")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosunrpc: set SOCK_FASYNC
Eric Dumazet [Fri, 13 May 2016 04:41:39 +0000 (21:41 -0700)]
sunrpc: set SOCK_FASYNC

sunrpc is using SOCKWQ_ASYNC_NOSPACE without setting SOCK_FASYNC,
so the recent optimizations done in sk_set_bit() and sk_clear_bit()
broke it.

There is still the risk that a subsequent sock_fasync() call
would clear SOCK_FASYNC, but sunrpc does not use this yet.

Fixes: 9317bb69824e ("net: SOCKWQ_ASYNC_NOSPACE optimizations")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jiri Pirko <jiri@resnulli.us>
Reported-by: Huang, Ying <ying.huang@intel.com>
Tested-by: Jiri Pirko <jiri@resnulli.us>
Tested-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'perf-urgent-for-mingo-20160512' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Fri, 13 May 2016 05:35:12 +0000 (07:35 +0200)]
Merge tag 'perf-urgent-for-mingo-20160512' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Fallback to usermode-only counters when perf_event_paranoid > 1, which
  is the case now (Arnaldo Carvalho de Melo)

- Do not reassign parg after collapse_tree() in libtraceevent, which
  may cause tool crashes (Steven Rostedt)

- Fix the build on Fedora Rawhide, where readdir_r() is deprecated and
  also wrt -Werror=unused-const-variable= + x86_32_regoffset_table on
  !x86_64 (Arnaldo Carvalho de Melo)

- Fix the build on Ubuntu 12.04.5, where dwarf_getlocations() isn't
  available, i.e. libdw-dev < 0.157 (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Fri, 13 May 2016 01:44:24 +0000 (18:44 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
 "4 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: thp: calculate the mapcount correctly for THP pages during WP faults
  ksm: fix conflict between mmput and scan_get_next_rmap_item
  ocfs2: fix posix_acl_create deadlock
  ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang

7 years agomm: thp: calculate the mapcount correctly for THP pages during WP faults
Andrea Arcangeli [Thu, 12 May 2016 22:42:25 +0000 (15:42 -0700)]
mm: thp: calculate the mapcount correctly for THP pages during WP faults

This will provide fully accuracy to the mapcount calculation in the
write protect faults, so page pinning will not get broken by false
positive copy-on-writes.

total_mapcount() isn't the right calculation needed in
reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
that is effectively the full accurate return value for page_mapcount()
if dealing with Transparent Hugepages, however we only use the
page_trans_huge_mapcount() during COW faults where it strictly needed,
due to its higher runtime cost.

This also provide at practical zero cost the total_mapcount
information which is needed to know if we can still relocate the page
anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
can reuse the page no matter if it's a pte or a pmd_trans_huge
triggering the fault, but we can only relocate the page anon_vma to
the local vma->anon_vma if we're sure it's only this "vma" mapping the
whole THP physical range.

Kirill A. Shutemov discovered the problem with moving the page
anon_vma to the local vma->anon_vma in a previous version of this
patch and another problem in the way page_move_anon_rmap() was called.

Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
previous version, because reuse_swap_page must be a macro to call
page_trans_huge_mapcount from swap.h, so this uses a macro again
instead of an inline function. With this change at least it's a less
dangerous usage than it was before, because "page" is used only once
now, while with the previous code reuse_swap_page(page++) would have
called page_mapcount on page+1 and it would have increased page twice
instead of just once.

Dean Luick noticed an uninitialized variable that could result in a
rmap inefficiency for the non-THP case in a previous version.

Mike Marciniszyn said:

: Our RDMA tests are seeing an issue with memory locking that bisects to
: commit 61f5d698cc97 ("mm: re-enable THP")
:
: The test program registers two rather large MRs (512M) and RDMA
: writes data to a passive peer using the first and RDMA reads it back
: into the second MR and compares that data.  The sizes are chosen randomly
: between 0 and 1024 bytes.
:
: The test will get through a few (<= 4 iterations) and then gets a
: compare error.
:
: Tracing indicates the kernel logical addresses associated with the individual
: pages at registration ARE correct , the data in the "RDMA read response only"
: packets ARE correct.
:
: The "corruption" occurs when the packet crosse two pages that are not physically
: contiguous.   The second page reads back as zero in the program.
:
: It looks like the user VA at the point of the compare error no longer points to
: the same physical address as was registered.
:
: This patch totally resolves the issue!

Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Josh Collier <josh.d.collier@intel.com>
Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
Cc: <stable@vger.kernel.org> [4.5]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoksm: fix conflict between mmput and scan_get_next_rmap_item
Zhou Chengming [Thu, 12 May 2016 22:42:21 +0000 (15:42 -0700)]
ksm: fix conflict between mmput and scan_get_next_rmap_item

A concurrency issue about KSM in the function scan_get_next_rmap_item.

task A (ksmd): |task B (the mm's task):
|
mm = slot->mm; |
down_read(&mm->mmap_sem); |
|
... |
|
spin_lock(&ksm_mmlist_lock); |
|
ksm_scan.mm_slot go to the next slot; |
|
spin_unlock(&ksm_mmlist_lock); |
|mmput() ->
| ksm_exit():
|
|spin_lock(&ksm_mmlist_lock);
|if (mm_slot && ksm_scan.mm_slot != mm_slot) {
| if (!mm_slot->rmap_list) {
| easy_to_free = 1;
| ...
|
|if (easy_to_free) {
| mmdrop(mm);
| ...
|
|So this mm_struct may be freed in the mmput().
|
up_read(&mm->mmap_sem); |

As we can see above, the ksmd thread may access a mm_struct that already
been freed to the kmem_cache.  Suppose a fork will get this mm_struct from
the kmem_cache, the ksmd thread then call up_read(&mm->mmap_sem), will
cause mmap_sem.count to become -1.

As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has
the same SMP race condition, so fix it too.  My prev fix in function
scan_get_next_rmap_item will introduce a different SMP race condition, so
just invert the up_read/spin_unlock order as Andrea Arcangeli said.

Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.com
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoocfs2: fix posix_acl_create deadlock
Junxiao Bi [Thu, 12 May 2016 22:42:18 +0000 (15:42 -0700)]
ocfs2: fix posix_acl_create deadlock

Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure")
refactored code to use posix_acl_create.  The problem with this function
is that it is not mindful of the cluster wide inode lock making it
unsuitable for use with ocfs2 inode creation with ACLs.  For example,
when used in ocfs2_mknod, this function can cause deadlock as follows.
The parent dir inode lock is taken when calling posix_acl_create ->
get_acl -> ocfs2_iop_get_acl which takes the inode lock again.  This can
cause deadlock if there is a blocked remote lock request waiting for the
lock to be downconverted.  And same deadlock happened in ocfs2_reflink.
This fix is to revert back using ocfs2_init_acl.

Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
Junxiao Bi [Thu, 12 May 2016 22:42:15 +0000 (15:42 -0700)]
ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang

Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
introduced this issue.  ocfs2_setattr called by chmod command holds
cluster wide inode lock when calling posix_acl_chmod.  This latter
function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl.  These
two are also called directly from vfs layer for getfacl/setfacl commands
and therefore acquire the cluster wide inode lock.  If a remote
conversion request comes after the first inode lock in ocfs2_setattr,
OCFS2_LOCK_BLOCKED will be set.  And this will cause the second call to
inode lock from the ocfs2_iop_get_acl() to block indefinetly.

The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod() which
does not call back into the filesystem.  Therefore, we restore
ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that
instead.

Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotipc: eliminate risk of double link_up events
Jon Paul Maloy [Wed, 11 May 2016 23:15:45 +0000 (19:15 -0400)]
tipc: eliminate risk of double link_up events

When an ACTIVATE or data packet is received in a link in state
ESTABLISHING, the link does not immediately change state to
ESTABLISHED, but does instead return a LINK_UP event to the caller,
which will execute the state change in a different lock context.

This non-atomic approach incurs a low risk that we may have two
LINK_UP events pending simultaneously for the same link, resulting
in the final part of the setup procedure being executed twice. The
only potential harm caused by this it that we may see two LINK_UP
events issued to subsribers of the topology server, something that
may cause confusion.

This commit eliminates this risk by checking if the link is already
up before proceeding with the second half of the setup.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvneta: bm: fix dependencies again
Arnd Bergmann [Wed, 11 May 2016 20:13:23 +0000 (22:13 +0200)]
net: mvneta: bm: fix dependencies again

I tried to fix this before, but my previous fix was incomplete
and we can still get the same link error in randconfig builds
because of the way that Kconfig treats the

default y if MVNETA=y && MVNETA_BM_ENABLE

line that does not actually trigger when MVNETA_BM_ENABLE=m,
unlike I intended.
Changing the line to use MVNETA_BM_ENABLE!=n however has
the desired effect and hopefully makes all configurations
work as expected.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 019ded3aa7c9 ("net: mvneta: bm: clarify dependencies")
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agophy: micrel: Use MICREL_PHY_ID_MASK definition
Fabio Estevam [Wed, 11 May 2016 20:02:05 +0000 (17:02 -0300)]
phy: micrel: Use MICREL_PHY_ID_MASK definition

Replace the hardcoded mask 0x00fffff0 with MICREL_PHY_ID_MASK for
better readability.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agogre: Fix wrong tpi->proto in WCCP
Haishuang Yan [Wed, 11 May 2016 10:48:32 +0000 (18:48 +0800)]
gre: Fix wrong tpi->proto in WCCP

When dealing with WCCP in gre6 tunnel, it sets the wrong tpi->protocol,
that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoip6_gre: Fix get_size calculation for gre6 tunnel
Haishuang Yan [Wed, 11 May 2016 10:48:31 +0000 (18:48 +0800)]
ip6_gre: Fix get_size calculation for gre6 tunnel

Do not include attribute IFLA_GRE_TOS.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'keys-fixes-20160512' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 12 May 2016 20:00:33 +0000 (13:00 -0700)]
Merge tag 'keys-fixes-20160512' of git://git./linux/kernel/git/dhowells/linux-fs

Pull keyring fix from David Howells:
 "Fix ASN.1 indefinite length object parsing"

* tag 'keys-fixes-20160512' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  KEYS: Fix ASN.1 indefinite length object parsing

7 years agoMerge tag 'sound-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Linus Torvalds [Thu, 12 May 2016 19:55:42 +0000 (12:55 -0700)]
Merge tag 'sound-4.6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "This is a pretty boring pull request as you wish: including a few
  small and trivial HD-audio and USB-audio quirks and a couple of small
  regression fixes in HD-audio"

* tag 'sound-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Yet another Phoneix Audio device quirk
  ALSA: hda - Fix regression on ATI HDMI audio
  ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
  ALSA: hda - Fix broken reconfig
  ALSA: hda - Fix white noise on Asus UX501VW headset
  ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Thu, 12 May 2016 19:47:49 +0000 (12:47 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input subsystem fixes from Dmitry Torokhov.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: twl6040-vibra - fix DT node memory management
  Input: max8997-haptic - fix NULL pointer dereference
  Input: byd - update copyright header

7 years agoperf stat: Fallback to user only counters when perf_event_paranoid > 1
Arnaldo Carvalho de Melo [Thu, 12 May 2016 19:25:18 +0000 (16:25 -0300)]
perf stat: Fallback to user only counters when perf_event_paranoid > 1

After 0161028b7c8a ("perf/core: Change the default paranoia level to 2")
'perf stat' fails for users without CAP_SYS_ADMIN, so just use
'perf_evsel__fallback()' to have the same behaviour as 'perf record',
i.e. set perf_event_attr.exclude_kernel to 1.

Now:

  [acme@jouet linux]$ perf stat usleep 1

   Performance counter stats for 'usleep 1':

          0.352536      task-clock:u (msec)  #   0.423 CPUs utilized
                 0      context-switches:u   #   0.000 K/sec
                 0      cpu-migrations:u     #   0.000 K/sec
                49      page-faults:u        #   0.139 M/sec
           309,407      cycles:u             #   0.878 GHz
           243,791      instructions:u       #   0.79  insn per cycle
            49,622      branches:u           # 140.757 M/sec
             3,884      branch-misses:u      #   7.83% of all branches

       0.000834174 seconds time elapsed

  [acme@jouet linux]$

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()
Arnaldo Carvalho de Melo [Thu, 12 May 2016 19:07:47 +0000 (16:07 -0300)]
perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()

Now with the default for the kernel.perf_event_paranoid sysctl being 2 [1]
we need to fall back to :u, i.e. to set perf_event_attr.exclude_kernel
to 1.

Before:

  [acme@jouet linux]$ perf record usleep 1
  Error:
  You may not have permission to collect stats.

  Consider tweaking /proc/sys/kernel/perf_event_paranoid,
  which controls use of the performance events system by
  unprivileged users (without CAP_SYS_ADMIN).

  The current value is 2:

    -1: Allow use of (almost) all events by all users
  >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
  >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
  >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
  [acme@jouet linux]$

After:

  [acme@jouet linux]$ perf record usleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ]
  [acme@jouet linux]$ perf evlist
  cycles:u
  [acme@jouet linux]$ perf evlist -v
  cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
  [acme@jouet linux]$

And if the user turns on verbose mode, an explanation will appear:

  [acme@jouet linux]$ perf record -v usleep 1
  Warning:
  kernel.perf_event_paranoid=2, trying to fall back to excluding kernel samples
  mmap size 528384B
  [ perf record: Woken up 1 times to write data ]
  Looking at the vmlinux_path (8 entries long)
  Using /lib/modules/4.6.0-rc7+/build/vmlinux for symbols
  [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ]
  [acme@jouet linux]$

[1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2")

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agodrm/amdgpu: fix DP mode validation
Alex Deucher [Wed, 11 May 2016 20:21:03 +0000 (16:21 -0400)]
drm/amdgpu: fix DP mode validation

Switch the order of the loops to walk the rates on the top
so we exhaust all DP 1.1 rate/lane combinations before trying
DP 1.2 rate/lane combos.

This avoids selecting rates that are supported by the monitor,
but not the connector leading to valid modes getting rejected.

bug:
https://bugs.freedesktop.org/show_bug.cgi?id=95206

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
7 years agodrm/radeon: fix DP mode validation
Alex Deucher [Wed, 11 May 2016 20:16:53 +0000 (16:16 -0400)]
drm/radeon: fix DP mode validation

Switch the order of the loops to walk the rates on the top
so we exhaust all DP 1.1 rate/lane combinations before trying
DP 1.2 rate/lane combos.

This avoids selecting rates that are supported by the monitor,
but not the connector leading to valid modes getting rejected.

bug:
https://bugs.freedesktop.org/show_bug.cgi?id=95206

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
7 years agoperf evsel: Improve EPERM error handling in open_strerror()
Arnaldo Carvalho de Melo [Thu, 12 May 2016 18:44:55 +0000 (15:44 -0300)]
perf evsel: Improve EPERM error handling in open_strerror()

We were showing a hardcoded default value for the kernel.perf_event_paranoid
sysctl, now that it became more paranoid (1 -> 2 [1]), this would need to be
updated, instead show the current value:

  [acme@jouet linux]$ perf record ls
  Error:
  You may not have permission to collect stats.

  Consider tweaking /proc/sys/kernel/perf_event_paranoid,
  which controls use of the performance events system by
  unprivileged users (without CAP_SYS_ADMIN).

  The current value is 2:

    -1: Allow use of (almost) all events by all users
  >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
  >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
  >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
  [acme@jouet linux]$

[1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2")

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-0gc4rdpg8d025r5not8s8028@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>