cascardo/ovs.git
9 years agoSet release dates for 2.3.1. v2.3.1
Justin Pettit [Thu, 4 Dec 2014 05:26:32 +0000 (21:26 -0800)]
Set release dates for 2.3.1.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
9 years agoBFD: Decreasing minimal transmit and receive interval
Niels van Adrichem [Tue, 7 Oct 2014 15:04:13 +0000 (15:04 +0000)]
BFD: Decreasing minimal transmit and receive interval

I found the BFD transmit interval was lowerbounded by the default value
without warning, although documentation does not consider a lowerbound.

Testing has been performed with transmit and receive intervals as low as 1
ms, and although CPU overhead was effected (especially with multiple BFD
sessions such as 6 and higher), it worked well.

Signed-off-by: Niels van Adrichem <n.l.m.vanadrichem@tudelft.nl>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: compat: Fix build on RHEL 6.6
Pravin B Shelar [Wed, 19 Nov 2014 15:47:04 +0000 (07:47 -0800)]
datapath: compat: Fix build on RHEL 6.6

Reported-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoutil: Add be32_prefix_mask().
Jarno Rajahalme [Tue, 11 Nov 2014 23:50:51 +0000 (15:50 -0800)]
util: Add be32_prefix_mask().

Shifting a 32-bit entity by 32 bits is undefined behavior.  As we have 2
cases where we may hit this, it is a time to introduce a helper for
this.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agotests: Fix expected output for group features test.
Ben Pfaff [Tue, 11 Nov 2014 20:44:39 +0000 (12:44 -0800)]
tests: Fix expected output for group features test.

Commit f6d23392d19db (ofproto: Fix supported group types.) fixed a bug in
the group features code but added the wrong expected output to a test case.
This fixes the expected output for the test case.

Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif-rid: Fix memory leak.
Ben Pfaff [Tue, 11 Nov 2014 20:40:52 +0000 (12:40 -0800)]
ofproto-dpif-rid: Fix memory leak.

Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto: Fix supported group types.
Shu Shen [Tue, 11 Nov 2014 00:35:08 +0000 (16:35 -0800)]
ofproto: Fix supported group types.

Previously the types field is uninited and leds to no supported type
being reported in OFPMP_GROUP_FEATURES message.

Signed-off-by: Shu Shen <shu.shen@radisys.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agorhel: Fix tunnel port ifup/ifdown failure.
Flavio Leitner [Thu, 6 Nov 2014 18:50:06 +0000 (16:50 -0200)]
rhel: Fix tunnel port ifup/ifdown failure.

The tunnel port is invisible to the OS, so there is
no reason to call OTHERSCRIPT in the ifup/ifdown scripts.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Fix compat checks for ipv6_skip_exthdr()
Pravin B Shelar [Mon, 3 Nov 2014 16:55:12 +0000 (08:55 -0800)]
datapath: Fix compat checks for ipv6_skip_exthdr()

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: Convert dp rcu read operation to locked operations
Pravin B Shelar [Wed, 29 Oct 2014 09:45:48 +0000 (02:45 -0700)]
datapath: Convert dp rcu read operation to locked operations

dp read operations depends on ovs_dp_cmd_fill_info(). This API
needs to looup vport to find dp name, but vport lookup can
fail. Therefore to keep vport reference alive we need to
take ovs lock.

Found by code inspection.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agonetflow: Fix interpretation of flow_seq.
Motonori Shindo [Tue, 4 Nov 2014 16:12:18 +0000 (01:12 +0900)]
netflow: Fix interpretation of flow_seq.

'flow_seq" field in NetFlow v5 header should represent a number of NetFlow
flow records exported while it is representing the number of NetFlow
packets exported in the current code. This patch fixes this problem.

Signed-off-by: Motonori Shindo <motonori@shin.do>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoodp-util: Fix segfault in MPLS attribute parsing.
Joe Stringer [Fri, 31 Oct 2014 21:05:46 +0000 (14:05 -0700)]
odp-util: Fix segfault in MPLS attribute parsing.

Just because the ethertype is MPLS, this doesn't mean that the datapath
understands and provides OVS_KEY_ATTR_MPLS attributes for the flow.
Previously we would check the size of the OVS_KEY_ATTR_MPLS attribute
before checking whether the attribute is present. This would cause a
segfault in nl_attr_get_size(), usually triggered from a handler thread.

This patch brings the MPLS parsing code more in line with the rest of
the parse_l2_5_onward() function, by only processing MPLS if the
attribute is present.

Reported-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto: Take locks before calling classifier_count().
Joe Stringer [Thu, 30 Oct 2014 20:33:02 +0000 (13:33 -0700)]
ofproto: Take locks before calling classifier_count().

Commit 635e5bf55b (ofproto: Warn about excessive rule counts in OpenFlow
tables.) introduced an access of classifier_count() without taking the
corresponding locks first.

Found by clang.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofp-print-ofctl: Free group buckets.
Simon Horman [Fri, 31 Oct 2014 08:14:47 +0000 (17:14 +0900)]
ofp-print-ofctl: Free group buckets.

Found by inspection using make check-valgrind.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Use upstream ipv6_find_hdr().
Pravin B Shelar [Mon, 20 Oct 2014 23:13:04 +0000 (16:13 -0700)]
datapath: Use upstream ipv6_find_hdr().

ipv6_find_hdr() already fixed in newer upstram kernel by Ansis, we
can start using this API safely.
This patch also backports fix (ipv6: ipv6_find_hdr restore prev
functionality) to compat ipv6_find_hdr().

CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: net: make skb_gso_segment error handling more robust
Pravin B Shelar [Tue, 21 Oct 2014 21:20:42 +0000 (14:20 -0700)]
datapath: net: make skb_gso_segment error handling more robust

skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL.  This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.

It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: fix a use after free
Pravin B Shelar [Fri, 17 Oct 2014 22:02:27 +0000 (15:02 -0700)]
datapath: fix a use after free

pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp
setting after arphdr_ok to avoid the use the freed memory

Fixes: 0714812134d7d ("openvswitch: Eliminate memset() from flow_extract.")
Cc: Jesse Gross <jesse@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
arphdr_ok() was not done on branch-2.3, so backported whole fix
as part of this patch.

Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: compat: Fix compilation 3.11
Pravin B Shelar [Mon, 13 Oct 2014 09:02:44 +0000 (02:02 -0700)]
datapath: compat: Fix compilation 3.11

Kernel 3.11 is only kernel where GRE APIs are available but
not vxlan. Add check for vxlan xmit to detect this case.

Reported-by: Dave Benson <dbenson@verdantnetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agoovsdb: bond_active_salve column's value should be durable when database restarts
Andy Zhou [Fri, 10 Oct 2014 21:20:36 +0000 (14:20 -0700)]
ovsdb: bond_active_salve column's value should be durable when database restarts

According to RFC 7047, 'ephemeral' annotation does not guarantee
the values to be durable. This fix Removes this annotation.

VMware-BZ:  1332235

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agoovs-vswitchd: Fix high cpu utilization when acquire idl lock fails.
Alex Wang [Thu, 9 Oct 2014 08:23:37 +0000 (08:23 +0000)]
ovs-vswitchd: Fix high cpu utilization when acquire idl lock fails.

When ovs-vswitchd fails to acquire the ovsdb idl lock (either due to
contention or due to invalid database path), ovs-vswitchd will spin
on the global connectivity sequence number and consume 100% cpu.
This is in that the local copy is different to the global sequence
number and never updated when ovsdb idl is not locked.

To fix this issue, this commit makes ovs-vswitchd not checking the
global connectivity sequence number in that situation.

Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-vswitchd.at: Port tests to run successfully on Windows.
Gurucharan Shetty [Mon, 6 Oct 2014 17:04:26 +0000 (10:04 -0700)]
ovs-vswitchd.at: Port tests to run successfully on Windows.

It is a little tricky to implement "$!" with unit tests on Windows.
This commit changes the tests so that it works both on Windows
and Linux.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agoovs-vswitchd.at: Add test for switch over to another ovs-vswitchd.
Alex Wang [Thu, 2 Oct 2014 02:44:47 +0000 (02:44 +0000)]
ovs-vswitchd.at: Add test for switch over to another ovs-vswitchd.

Test the switch over to inactive ovs-vswitchd process when user
kill the currently active ovs-vswitchd.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agoovs-vswitchd.at: Fix "start additional ovs-vswitchd process" test
YAMAMOTO Takashi [Wed, 1 Oct 2014 05:00:36 +0000 (14:00 +0900)]
ovs-vswitchd.at: Fix "start additional ovs-vswitchd process" test

NetBSD implementation of wc command outputs extra whitespaces
like the following.  Tweak the test to success on such environments.

    % echo hoge|wc -l|hexdump -C
    00000000  20 20 20 20 20 20 20 31  0a                       |       1.|
    00000009
    %

The failing test was introduced by
commit 6bef3c7ca859f208239ca61ec3b25c09a3571553
("bridge: Fix high cpu utilization.")

Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agobridge: Fix high cpu utilization.
Alex Wang [Tue, 30 Sep 2014 20:46:22 +0000 (13:46 -0700)]
bridge: Fix high cpu utilization.

When there are more than one ovs-vswitchd processes started,
only one process is enabled.  The disabled processes should
just sleep.  However, a bug in ovs makes the disabled processes
keep waking up on global connectivity sequence number which is
never sync'ed.  Consequently, those processes use 100% cpu.

This commit fixes the bug by always sync up the connectivity
sequence number for disabled processes.

Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agorhel7: Fix rpm install failure.
Alex Wang [Thu, 9 Oct 2014 07:02:07 +0000 (00:02 -0700)]
rhel7: Fix rpm install failure.

When trying to install the kernel module rpm built for RHEL7,
the install failed with following conflicts:

  # rpm -i kmod-openvswitch-2.3.1-1.el7.x86_64.rpm
    file /lib/modules/3.10.0-123.8.1.el7.x86_64/modules.devname
    from install of kmod-openvswitch-2.3.1-1.el7.x86_64 conflicts
    with file from package kernel-3.10.0-123.8.1.el7.x86_64

    file /lib/modules/3.10.0-123.8.1.el7.x86_64/modules.softdep
    from install of kmod-openvswitch-2.3.1-1.el7.x86_64 conflicts
    with file from package kernel-3.10.0-123.8.1.el7.x86_64

Similar issue has already been reported and solved here:

    https://bugzilla.redhat.com/show_bug.cgi?id=1003267

This commit applies the solution in the link to ovs.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agodatapath: Add backport for iptunnel_xmit.
Pravin B Shelar [Tue, 7 Oct 2014 14:33:52 +0000 (07:33 -0700)]
datapath: Add backport for iptunnel_xmit.

Missed during backport of "datapath: Add support for RHEL-7 /
CentOS-7 kernel."

Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
9 years agolib/meta-flow: Index correct MPLS lse in mf_is_all_wild().
Jarno Rajahalme [Mon, 6 Oct 2014 21:12:57 +0000 (14:12 -0700)]
lib/meta-flow: Index correct MPLS lse in mf_is_all_wild().

Should index the first lse for all parts of the lse (label, TC, BOS).

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agobridge: Keep bond active slave selection across OVS restart
Andy Zhou [Sun, 5 Oct 2014 06:35:30 +0000 (23:35 -0700)]
bridge: Keep bond active slave selection across OVS restart

Whenever OVS restarts, it pseudo-randomly picks an interface
of a bond port to be the active slave. This can cause traffic
disruption in case the upstream switch does not support LACP, or
in case of multi-chassis switches that do not support mLACP.

This patch helps the situation by always record the last active
slave into ovsdb. When OVS restarts, the stored last active slave
has the highest priority to be selected again. In case this interface
is available, due to configuration changes or being offline, OVS then
consider other interfaces with the bond as it does today.

In a nutshell, this patch makes the active slave selection stickier
across OVS restart.

VMware-BZ:  1332235

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agodatapath: Add support for RHEL-7 / CentOS-7 kernel.
Pravin B Shelar [Fri, 3 Oct 2014 12:26:42 +0000 (05:26 -0700)]
datapath: Add support for RHEL-7 / CentOS-7 kernel.

This patch mostly is related to tunnel API where RHEL 7
kernel API are not in-sync with newer linux kernel API. So
extra checks are required to check for parameters of API.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
9 years agodatapath: Enable tunnel GSO features.
Pravin B Shelar [Mon, 29 Sep 2014 11:36:48 +0000 (04:36 -0700)]
datapath: Enable tunnel GSO features.

Following patch enables all available tunnel GSO features for OVS
bridge device so that ovs can use hardware offloads available to
underling device.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agobridge: Skip statistics update if cannot get ovsdb config.
Alex Wang [Mon, 29 Sep 2014 00:54:17 +0000 (17:54 -0700)]
bridge: Skip statistics update if cannot get ovsdb config.

Commit 62c5c3e (bridge: Rate limit the statistics update.) makes
ovs try committing the statistics even if ovs failed to get ovsdb
configuration.  This causes the ovs committing the NULL transaction
pointer.

To fix this issue, this commit makes ovs skip statistics update
if it cannot get ovsdb config.

VMware-BZ: #1331308

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif-rid: correct logic error in rid_pool_alloc_id()
Simon Horman [Wed, 24 Sep 2014 12:41:02 +0000 (12:41 +0000)]
ofproto-dpif-rid: correct logic error in rid_pool_alloc_id()

When searching through the valid ids an id should
be used if is not found rather than if it is found.

It appears to me that without this change duplicate recirculation
ids may used in cases where the last recirculation id has
been allocated; selection loops back to the beginning of the pool and;
reaches a recirculation id that is still in use.

As the number of recirculation ids is currently RECIRC_ID_N_IDS = 1024 this
does not seem beyond the bounds of possibility.

I have not verified that such a scenario can actually occur.  But it seems
that a likely consequence would be that some packets may be forwarded
incorrectly.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
9 years agoofp-actions: Fix error code for invalid table id.
Selvamuthukumar [Wed, 24 Sep 2014 16:53:13 +0000 (09:53 -0700)]
ofp-actions: Fix error code for invalid table id.

Send OFPET_BAD_INSTRUCTION/OFPBIC_BAD_TABLE_ID if table is invalid
in goto table instruction.

Signed-off-by: Selvamuthukumar <smkumar@merunetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agojsonrpc: Notify excessive sending backlog.
Alex Wang [Thu, 18 Sep 2014 17:49:47 +0000 (10:49 -0700)]
jsonrpc: Notify excessive sending backlog.

This commit adds a log message to notify the excessive backlog
for jsonrpc.  Expectedly, this message should never be printed.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agobridge: Rate limit the statistics update.
Alex Wang [Thu, 18 Sep 2014 21:10:24 +0000 (14:10 -0700)]
bridge: Rate limit the statistics update.

When ovs is running with large topology (e.g. large number of
interfaces), the stats update to ovsdb becomes huge and normally
requires multiple run of ovsdb jsonrpc message processing loop to
consume.

To prevent the periodic stats update from backlogging in the
jsonrpc sending queue, this commit adds rate limiting logic
which only allows new update if the previous one is done.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
9 years agodatapath: Restore OVS_CB after skb_segment.
Pravin B Shelar [Tue, 23 Sep 2014 22:15:37 +0000 (15:15 -0700)]
datapath: Restore OVS_CB after skb_segment.

OVS needs to segments large skb before sending it for miss
packet handling to userspace. but skb_gso_segment uses
skb->cb. This corrupted OVS_CB which result in following panic.

[  735.419921] BUG: unable to handle kernel paging request at 00000014000001b2
[  735.423168] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[  735.445097] RIP: 0010:[<ffffffffa05df0d7>]  [<ffffffffa05df0d7>] ovs_nla_put_flow+0x37/0x7c0 [openvswitch]
[  735.468858] Call Trace:
[  735.470384]  [<ffffffffa05d7ec2>] queue_userspace_packet+0x332/0x4d0 [openvswitch]
[  735.471741]  [<ffffffffa05d8155>] queue_gso_packets+0xf5/0x180 [openvswitch]
[  735.481862]  [<ffffffffa05da9f5>] ovs_dp_upcall+0x65/0x70 [openvswitch]
[  735.483031]  [<ffffffffa05dab81>] ovs_dp_process_packet+0x181/0x1b0 [openvswitch]
[  735.484391]  [<ffffffffa05e2f55>] ovs_vport_receive+0x65/0x90 [openvswitch]
[  735.492638]  [<ffffffffa05e5738>] internal_dev_xmit+0x68/0x110 [openvswitch]
[  735.495334]  [<ffffffff81588eb6>] dev_hard_start_xmit+0x2e6/0x8b0
[  735.496503]  [<ffffffff81589847>] __dev_queue_xmit+0x3c7/0x920
[  735.499827]  [<ffffffff81589db0>] dev_queue_xmit+0x10/0x20
[  735.500798]  [<ffffffff815d3b60>] ip_finish_output+0x6a0/0x950
[  735.502818]  [<ffffffff815d55f8>] ip_output+0x68/0x110
[  735.503835]  [<ffffffff815d4979>] ip_local_out+0x29/0x90
[  735.504801]  [<ffffffff815d4e46>] ip_queue_xmit+0x1d6/0x640
[  735.507015]  [<ffffffff815ee0d7>] tcp_transmit_skb+0x477/0xac0
[  735.508260]  [<ffffffff815ee856>] tcp_write_xmit+0x136/0xba0
[  735.510829]  [<ffffffff815ef56e>] __tcp_push_pending_frames+0x2e/0xc0
[  735.512296]  [<ffffffff815e0593>] tcp_sendmsg+0xa63/0xd50
[  735.513526]  [<ffffffff81612c2c>] inet_sendmsg+0x10c/0x220
[  735.516025]  [<ffffffff81566b8c>] sock_sendmsg+0x9c/0xe0
[  735.518066]  [<ffffffff81566d41>] SYSC_sendto+0x121/0x1c0
[  735.521398]  [<ffffffff8156801e>] SyS_sendto+0xe/0x10
[  735.522473]  [<ffffffff816df5e9>] system_call_fastpath+0x16/0x1b

Reported-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agoovs-pki: Use SHA-1 instead of SHA-512 as message digest.
Alex Wang [Mon, 22 Sep 2014 22:34:12 +0000 (15:34 -0700)]
ovs-pki: Use SHA-1 instead of SHA-512 as message digest.

Commit 9ff33ca7 (ovs-pki: Use SHA-512 instead of MD5 as message
digest.) changes the message digest algorithm to SHA-512.  This
seems to break the unit tests on some xenserver 5.6/6.0 builds
causing the error: "SSL_connect: error:0D0C50A1:asn1 encoding
routines:ASN1_item_verify:unknown message digest algorithm".

As a solution, this commit changes the message digest algorithm
to SHA-1 which works for both the above xenserver builds and
centos 7.

VMware-BZ: #1304530

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto: Warn about excessive rule counts in OpenFlow tables.
Ethan Jackson [Wed, 17 Sep 2014 20:22:14 +0000 (13:22 -0700)]
ofproto: Warn about excessive rule counts in OpenFlow tables.

Frequently we've run into controller bugs which result in hundreds of
thousands, or even millions of rules being installed in an OpenFlow
table.  This isn't something trouble-shooters naturally think of to
check for, so it's nice to have a low rate warning message to hint at
the potential problem.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodebian: Don't depened on $RUNLEVEL at startup to create bridges.
Gurucharan Shetty [Thu, 11 Sep 2014 16:35:10 +0000 (09:35 -0700)]
debian: Don't depened on $RUNLEVEL at startup to create bridges.

Commit b2a0daa5bd (debian: Don't recreate bridges during manual restart.)
added a check on $RUNLEVEL to only create bridges and ports when the
system starts up. This fix does not work with systemd.

This commit uses a different approach to solve the same problem.

Reported-at: https://bugs.debian.org/686518
Reported-by: Philipp S. Schmidt <phils@in-panik.de>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Tested-by: Philipp S. Schmidt <phils@in-panik.de>
9 years agodebian: Automatically start openvswitch before first invocation of ovs-vsctl.
Gurucharan Shetty [Fri, 11 Jul 2014 21:50:50 +0000 (14:50 -0700)]
debian: Automatically start openvswitch before first invocation of ovs-vsctl.

In the 'interfaces' file, if an admin adds the openvswitch interface
in 'auto', ifupdown will try to create OVS interfaces even before
openvswitch has started. In a case like that, assume that the admin
knows what he is doing and try to start openvswitch.

The negatives I see are
1. /usr is NFS mounted, in which case expect the admin
to mount it through initramfs.
2. syslog through network may not be accessible.

This is similar to what rhel does with commit 602453000e28ec10
(rhel: Automatically start openvswitch service before bringing an ovs
interface up or down)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agotests: Remove extraneous parenthesis from test name.
Joe Stringer [Thu, 11 Sep 2014 14:04:52 +0000 (11:04 -0300)]
tests: Remove extraneous parenthesis from test name.

This could cause configuration failure on earlier versions of autoconf.

Reported-by: Lin Shaopeng <slin0209@gmail.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agobuild: Allow building with autoconf 2.63
Thomas Graf [Thu, 11 Sep 2014 14:04:51 +0000 (11:04 -0300)]
build: Allow building with autoconf 2.63

Reduces the dependency on autoconf from 2.64 to 2.63 to ease building
on older platforms. There is only a few macros missing and they can
be provided easily.

A handful of tests needed modification. The difference in quoting
behaviour between 2.63 and later require the m4_define() to be
manually unfolded.

The Debian control file is left untouched on purpose. The decision
whether to adjust the dependency is left to the respective maintainers.

Tested with autoconf 2.63 and 2.69.

Cc: Scott Mann <smann@noironetworks.com>
Cc: Don Kehn <dkehn@noironetworks.com>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Backport __ip_select_ident() function
Pravin B Shelar [Wed, 25 Sep 2013 01:42:43 +0000 (18:42 -0700)]
datapath: Backport __ip_select_ident() function

definition of __ip_select_ident() changed in newer kernel and
it is backported to stable kernel, Therefore adding configure
check to detect the new function.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agoofproto-dpif-xlate: Work around Linux netdev_max_backlog limit.
Ben Pfaff [Tue, 9 Sep 2014 22:06:52 +0000 (15:06 -0700)]
ofproto-dpif-xlate: Work around Linux netdev_max_backlog limit.

Linux has an internal queue that temporarily holds packets transmitted to
certain network devices.  If too many packets are transmitted to such
network devices within a single list of actions, then packets tend to get
dropped.  Broadcast or flooded or multicast packets on bridges with
thousands of ports are examples of how this can occur.

This commit avoids the problem by implementing a flow in userspace when it
outputs its packet more times than the maximum length of the queue.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Tested-by: Flavio Leitner <fbl@redhat.com>
9 years agopacket: Fix sparse warnings ICMPv6.
Jesse Gross [Wed, 3 Sep 2014 00:57:21 +0000 (17:57 -0700)]
packet: Fix sparse warnings ICMPv6.

The system defined ICMPv6 header doesn't have sparse annotation,
so this adds a definition so that endianness can be checked.

Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
9 years agoovs-vsctl: Correctly exit on errors for non-map types in "remove" command.
Ben Pfaff [Tue, 2 Sep 2014 15:35:02 +0000 (08:35 -0700)]
ovs-vsctl: Correctly exit on errors for non-map types in "remove" command.

Reported-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
9 years agodatapath: Fix checksum calculation when modifying ICMPv6 packets.
Jesse Gross [Fri, 15 Aug 2014 18:01:54 +0000 (11:01 -0700)]
datapath: Fix checksum calculation when modifying ICMPv6 packets.

The checksum of ICMPv6 packets uses the IP pseudoheader as part of
the calculation, unlike ICMP in IPv4. This was not implemented,
which means that modifying the IP addresses of an ICMPv6 packet
would cause the checksum to no longer be correct as the psuedoheader
did not match.

Reported-by: Neal Shrader <icosahedral@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agothread: Use explicit wide type when shifting > 32 bits
Thomas Graf [Fri, 29 Aug 2014 10:21:49 +0000 (12:21 +0200)]
thread: Use explicit wide type when shifting > 32 bits

Without the explicit wide type, the shift operation may be performed
on a int which will result in implementation defined behaviour on a
system with more than 32 CPUs.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agonetdev-linux: Cast policer rate to uint64_t to avoid overflow
Thomas Graf [Fri, 29 Aug 2014 10:20:21 +0000 (12:20 +0200)]
netdev-linux: Cast policer rate to uint64_t to avoid overflow

tc_fill_rate() takes a 64bit int, casting kbits_rate from int
to uint64_t avoids a possible overflow when translating from
kbits to bytes.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Always initialize fix_segment for GSO packet.
Pravin B Shelar [Wed, 27 Aug 2014 14:24:44 +0000 (07:24 -0700)]
datapath: Always initialize fix_segment for GSO packet.

OVS tunnel compat code depends on this function pointer to
handle GSO packet. Currently we do not initialize for all
GRE GSO packets. Following patch fixes that.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agoovsdb: Fix error leak for negative timeout and invalid until case
Thomas Graf [Thu, 28 Aug 2014 12:40:50 +0000 (14:40 +0200)]
ovsdb: Fix error leak for negative timeout and invalid until case

Although the check for negative timeout is present, the error string
is overwritten if an invalid "until" is found right after. This leaks
an error string and results in not reporting the negative timeout back
to the user even though it is encountered first.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agovtep-ctl: Free error string before return from cmd_remove().
Madhu Challa [Wed, 27 Aug 2014 01:16:12 +0000 (18:16 -0700)]
vtep-ctl: Free error string before return from cmd_remove().

Error string should be freed in all cases.

Found by Coverity.

Signed-off-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoFix memory leaks in error paths.
yinpeijun [Wed, 27 Aug 2014 01:52:54 +0000 (09:52 +0800)]
Fix memory leaks in error paths.

Found by Fortify.

Signed-off-by: yinpeijun <yinpeijun@huawei.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodebian: Fix cross build.
Ed Swierk [Sun, 24 Aug 2014 17:37:29 +0000 (10:37 -0700)]
debian: Fix cross build.

Cross-building openvswitch with debuild -aARCH (or equivalent) fails
because the target architecture is not getting passed to configure.
Thus binaries like ovs-appctl get built using the build host
architecture.

Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoodp-util: Only add recirc_id mask to Netlink message if mask is provided
Thomas Graf [Tue, 26 Aug 2014 16:34:52 +0000 (18:34 +0200)]
odp-util: Only add recirc_id mask to Netlink message if mask is provided

Current unconditional call may result in NULL being passed to
nl_msg_put_u32().

Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agovswitch.xml: Fix a typo.
Alex Wang [Tue, 26 Aug 2014 05:42:04 +0000 (22:42 -0700)]
vswitch.xml: Fix a typo.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agojson: Fix leaked nodes in json_hash_object()
Thomas Graf [Tue, 26 Aug 2014 10:23:03 +0000 (12:23 +0200)]
json: Fix leaked nodes in json_hash_object()

nodes is allocated through shash_sort() but never freed.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agobfd: Clarify bfd:enable.
Alex Wang [Wed, 20 Aug 2014 02:06:04 +0000 (02:06 +0000)]
bfd: Clarify bfd:enable.

This commit clarifies the documentation for 'bfd:enable'
when it is not specified.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agobfd: Clarify the BFD diagnostic.
Alex Wang [Fri, 15 Aug 2014 02:02:06 +0000 (02:02 +0000)]
bfd: Clarify the BFD diagnostic.

This commit adds more explanation for the bfd diagnostic and
bfd remote diagnostic in vswitch and vtep documentation.

Requested-by:Bruce Davie <bdavie@vmware.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
9 years agodatapath: Simplify flow mask cache delete.
Pravin B Shelar [Thu, 14 Aug 2014 18:23:32 +0000 (11:23 -0700)]
datapath: Simplify flow mask cache delete.

Currently on mask delete OVS moves last mask to the deleted
place to keep flow cache consistent and compact for per cpu
cache. But that generate duplicate entries in mask cache array
which results in multiple flow lookups in case we miss flow
cache.
Following patch simply sets NULL for deleted entry.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: Optimize Flow mask cache hash collision case.
Pravin B Shelar [Wed, 13 Aug 2014 22:53:01 +0000 (15:53 -0700)]
datapath: Optimize Flow mask cache hash collision case.

In case hash collision on mask cache, OVS does extra flow lookup.
Following patch avoid it.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodatapath: Avoid using wrong metadata for recic action.
Pravin B Shelar [Wed, 13 Aug 2014 23:08:02 +0000 (16:08 -0700)]
datapath: Avoid using wrong metadata for recic action.

Recirc action needs to extract flow key from packet, it uses tun_info
from OVS_CB for setting tunnel meta data in flow key. But tun_info
can be overwritten by tunnel send action. This would result in wrong
flow key for the recirculation.
Following patch copies flow-key meta data from OVS_CB packet key
itself thus avoids this bug.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: refactor ovs flow extract API.
Pravin B Shelar [Wed, 13 Aug 2014 23:06:47 +0000 (16:06 -0700)]
datapath: refactor ovs flow extract API.

OVS flow extract is called on packet receive or packet
execute code path.  Following patch defines separate API
for extracting flow-key in packet execute code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agoPrepare for 2.3.1.
Justin Pettit [Thu, 14 Aug 2014 18:05:34 +0000 (11:05 -0700)]
Prepare for 2.3.1.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoSet release dates for 2.3.0. v2.3
Justin Pettit [Thu, 14 Aug 2014 18:03:39 +0000 (11:03 -0700)]
Set release dates for 2.3.0.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoIndicate that DPDK support in 2.3 is experimantal.
Justin Pettit [Thu, 14 Aug 2014 18:00:17 +0000 (11:00 -0700)]
Indicate that DPDK support in 2.3 is experimantal.

Users should consider a later version of OVS for DPDK support.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agobfd: Add configuration for setting and matching mac address.
Alex Wang [Fri, 8 Aug 2014 23:41:27 +0000 (16:41 -0700)]
bfd: Add configuration for setting and matching mac address.

This commit adds options for configuring the MAC addresses
in BFD state machine.  Therein, the "bfd_local_src_mac" and
"bfd_local_dst_mac" configure the MAC address for transmitted
BFD packets.  The "bfd_remote_dst_mac" configure the matching
of MAC address on received BFD packets.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
9 years agobfd: Flip the default value of bfd ip source and destination.
Alex Wang [Fri, 8 Aug 2014 22:50:14 +0000 (15:50 -0700)]
bfd: Flip the default value of bfd ip source and destination.

This commit flips the default value of bfd ip source and destination,
so that they match the default value of ip destination and source
of vtep schema.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
9 years agodpif-netdev: Avoid divide by zero.
Ben Pfaff [Wed, 6 Aug 2014 16:43:31 +0000 (09:43 -0700)]
dpif-netdev: Avoid divide by zero.

Otherwise creating the first dpif-netdev bridge fails because there are
no handlers:

  Program terminated with signal 8, Arithmetic exception.
  #0  0x080971e9 in dp_execute_cb (aux_=aux_@entry=0xffcfaa54,
      packet=packet@entry=0xffcfac54, md=md@entry=0xffcfac84,
      a=a@entry=0x8f58930, may_steal=false) at ../lib/dpif-netdev.c:2154
  #1  0x080b5adb in odp_execute_actions__ (dp=dp@entry=0xffcfaa54,
      packet=packet@entry=0xffcfac54, steal=steal@entry=false,
      md=md@entry=0xffcfac84, actions=actions@entry=0x8f58930,
      actions_len=actions_len@entry=20,
      dp_execute_action=dp_execute_action@entry=0x8097040 <dp_execute_cb>,
      more_actions=more_actions@entry=false) at ../lib/odp-execute.c:218
  #2  0x080b5def in odp_execute_actions (dp=dp@entry=0xffcfaa54,
      packet=packet@entry=0xffcfac54, steal=steal@entry=false,
      md=md@entry=0xffcfac84, actions=0x8f58930, actions_len=20,
      dp_execute_action=dp_execute_action@entry=0x8097040 <dp_execute_cb>)
      at ../lib/odp-execute.c:285
  #3  0x08095098 in dp_netdev_execute_actions (actions_len=<optimized out>,
      actions=<optimized out>, md=0xffcfac84, may_steal=false,
      packet=0xffcfac54, key=0xffcfaa5c, dp=<optimized out>)
      at ../lib/dpif-netdev.c:2227
  #4  dpif_netdev_execute (dpif=0x8f59598, execute=0xffcfac78)
      at ../lib/dpif-netdev.c:1551
  #5  0x0809a56c in dpif_execute (dpif=0x8f59598,
      execute=execute@entry=0xffcfac78) at ../lib/dpif.c:1227
  #6  0x08071071 in check_variable_length_userdata (backer=<optimized out>)
      at ../ofproto/ofproto-dpif.c:1040
  #7  open_dpif_backer (backerp=0x8f5834c, type=<optimized out>)
      at ../ofproto/ofproto-dpif.c:921
  #8  construct (ofproto_=0x8f581c0) at ../ofproto/ofproto-dpif.c:1120
  #9  0x080675e0 in ofproto_create (datapath_name=0x8f57310 "br0",
      datapath_type=<optimized out>, ofprotop=ofprotop@entry=0x8f576c8)
      at ../ofproto/ofproto.c:564
  #10 0x080529aa in bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x8f596d8)
      at ../vswitchd/bridge.c:572
  #11 0x08055e33 in bridge_run () at ../vswitchd/bridge.c:2339
  #12 0x0804cdbd in main (argc=9, argv=0xffcfb554)
      at ../vswitchd/ovs-vswitchd.c:116

This bug was introduced by commit 9bcefb956d8f (ofproto-dpif: fix an ovs
crash when dpif_recv_set returns error).

CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoofproto-dpif: Use DPIF_FP_CREATE but not DPIF_FP_MODIFY.
Ben Pfaff [Sat, 2 Aug 2014 00:22:20 +0000 (17:22 -0700)]
ofproto-dpif: Use DPIF_FP_CREATE but not DPIF_FP_MODIFY.

A dpif reports EEXIST if a flow put operation that should create a new flow
instead attempts to modify an existing flow, or ENOENT if a flow put would
create a flow that overlaps some existing flow.  The latter does not always
indicate a bug in OVS userspace, because it can also mean that two
userspace OVS packet handler threads received packets that generated
the same megaflow for different microflows.  Until now, userspace has
logged this, which confuses users by making them suspect a bug.  We could
simply not log ENOENT in userspace, but that would suppress logging for
genuine bugs too.  Instead, this commit drops DPIF_FP_MODIFY from flow
put operations in ofproto-dpif, which transforms this particular
problem into EEXIST, which userspace already logs at "debug" level (see
flow_message_log_level()), effectively suppressing the logging in normal
circumstances.

It appears that in practice ofproto-dpif doesn't actually ever need to
modify flows in the datapath, only create and delete them, so this
shouldn't cause problems.

This backport squashes together commit a7d1bbdcfe4 (ofproto-dpif: Use
DPIF_FP_CREATE but not DPIF_FP_MODIFY.) and commit dc71c5dd0455
(ofproto-dpif: Fix tests broken by previous commit.) from master, with
resolved conflicts.

Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agodpif-netdev: Initialize upcall->packet when queuing to userspace.
Ben Pfaff [Mon, 4 Aug 2014 19:16:25 +0000 (12:16 -0700)]
dpif-netdev: Initialize upcall->packet when queuing to userspace.

Only the data and size members were being initialized, but all of them
should be.

This is inspired by commit b6f4590fa036 (dpif-netdev: Initialize
upcall->packet when queuing to userspace.) from master.  The background is
not exactly the same as on master (the commit that it references is not
on branch-2.3).

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoofproto-dpif: fix an ovs crash when dpif_recv_set returns error
Andy Zhou [Sun, 3 Aug 2014 23:16:40 +0000 (16:16 -0700)]
ofproto-dpif: fix an ovs crash when dpif_recv_set returns error

When dpif_recv_set returns an error, close_dpif_backer gets called,
which in term calls recirc_id_pool_destroy, which can lead to a crash
since recirc_id_pool_create() was not called before this patch.

Reported-by: Mukesh Hira <mhira@vmware.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Drop packets when interdev is not up
Chunhe Li [Wed, 30 Jul 2014 01:49:01 +0000 (09:49 +0800)]
datapath: Drop packets when interdev is not up

If the internal device is not up, it should drop received
packets. Sometimes it receive the broadcast or multicast
packets, and the ip protocol stack will casue more cpu
usage wasted.

Signed-off-by: Chunhe Li <lichunhe@huawei.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoconnmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.
Alex Wang [Tue, 29 Jul 2014 17:50:07 +0000 (10:50 -0700)]
connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.

connmgr_wants_packet_in_on_miss() is called by multiple threads
and thusly should be protected by the mutex.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
9 years agoconnmgr: Fix a typo.
Alex Wang [Tue, 29 Jul 2014 18:10:21 +0000 (11:10 -0700)]
connmgr: Fix a typo.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agoconnmgr: Only send role status messages to OpenFlow 1.4+ controllers.
Ben Pfaff [Tue, 22 Jul 2014 22:54:55 +0000 (15:54 -0700)]
connmgr: Only send role status messages to OpenFlow 1.4+ controllers.

Only OpenFlow 1.4 and later support role status messages, but this code
tried to send them to all controllers, which caused an assertion failure.

Also, add tests to check that role status messages work, and that they
don't cause trouble with OF1.2.

Reported-by: Anup Khadka <khadka.py@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoconnmgr: Demote service controllers too in ofconn_set_role().
Ben Pfaff [Mon, 28 Jul 2014 17:17:56 +0000 (10:17 -0700)]
connmgr: Demote service controllers too in ofconn_set_role().

Service controllers can set their roles, so it's necessary to demote them
to slaves if another controller becomes master.  Unfortunately the
'controllers' hmap only contains primary controllers, so this was omitted.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoovs-ofctl: Add --unixctl command line option.
Ben Pfaff [Mon, 28 Jul 2014 17:31:25 +0000 (10:31 -0700)]
ovs-ofctl: Add --unixctl command line option.

This matches the option offered by some other Open vSwitch daemons.  I
intend to use it in tests in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoFix two memory leaks.
yinpeijun [Mon, 28 Jul 2014 07:21:17 +0000 (15:21 +0800)]
Fix two memory leaks.

Found by coverity.

Signed-off-by: yinpeijun <yinpeijun@huawei.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Fix buffer overrun in mask array realloc.
Pravin B Shelar [Fri, 25 Jul 2014 23:22:46 +0000 (16:22 -0700)]
datapath: Fix buffer overrun in mask array realloc.

mask realloc copies elements from old array to new array. When
shrinking array it can go beyond allocated memory.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: Add NULL check for mask pointer.
Pravin B Shelar [Thu, 24 Jul 2014 20:32:35 +0000 (13:32 -0700)]
datapath: Add NULL check for mask pointer.

There is race in datapath when last mask in mask array deleted can
result in NULL pointer dereference in datapath.
datapath lookup does not check mask pointer if its index is less than
mask-array count. That is safe because delete operation moves last valid
pointer to the deleted element. But this does not work if we are
deleting last element in array. Following patch adds NULL check for the
mask pointer.
This patch also avoids accessing ma->count without any locks.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: reorder action netlink attribute definition for upstreaming
Andy Zhou [Thu, 24 Jul 2014 09:17:48 +0000 (02:17 -0700)]
datapath: reorder action netlink attribute definition for upstreaming

Keeping the order of netlink attribute definition in the order of
upstreaming is the best way to keep all released user space program
forward compatible with upstreamed kernel modules.

Adjust action netlink attribute order to match with the current
upstreaming plan.

Recirc and hash actions are introduced in branch 2.3, which will be
fixed by the patch. The MPLS actions have been released since
branch-2.1 but there is no kernel implementation of them prior to
branch 2.3. Thus the ordering change should not affect them.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoroute-table: Make route-table module thread-safe.
Ryan Wilson [Thu, 29 May 2014 21:56:09 +0000 (14:56 -0700)]
route-table: Make route-table module thread-safe.

Since the use of xcache, the netdev struct can be freed by the
revalidator threads.  This fact also makes the following race possible:

1. Consider there is a gre tunnel, and datapath flows that go through
   the tunnel.  Now, assume user deletes the tunnel port.
2. The main thread closes all of its references to the corresponding
   netdev struct.
3. If the ukey for the tunnel datapath flows hold the last reference
   to the old port's netdev, the revalidator will then close it
   and remove the netlink notifier struct (if the netdev is the last
   vport netdev).
4. However, if the main thread executes the netdev_vport_run(), and
   sees the existence of netlink notifier struct (before revalidator
   frees it), it will try polling updates from the notifier socket.
   And when it polls the socket fd, the fd has already been freed,
   and poll will keep failing with "Bad file descriptor".

The following script could be used to reproduce the race:
- assume on a VM-VM setup, with the setup below:

ovs-vsctl add-br br-int -- add-port br-int p3
ovs-vsctl add-port br-int vif1 -- set int vif1 type=internal
ifconfig vif1 11.0.0.1 up; ifconfig eth3 3.3.3.1 up

- while keeping a ping from 11.0.0.1 to 11.0.0.2, run this loop:

for i in `seq 1000000`; do
    sleep 5; ovs-vsctl del-port p3;
    ovs-vsctl add-port br-int p3 -- set int p3  type=gre \
        options:remote_ip=3.3.3.2 options:key=1;
done

- after a while, the race should be triggered.  and the main thread
  should run at 100% cpu.

This race has already been fixed on master by commit 3c27dbe6 (route
-table: Make route-table module thread-safe.).  This commit backports
it to branch-2.3.

VMware-BZ: #1287360

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
9 years agoovsdb: Don't add ovsdb-server.c to libovsdb.
Gurucharan Shetty [Fri, 18 Jul 2014 01:15:17 +0000 (18:15 -0700)]
ovsdb: Don't add ovsdb-server.c to libovsdb.

Without this change, with shared libraries, VLOG
constructor for ovsdb-server would get called twice corrupting
the 'vlog_modules' list causing an infinite loop.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Reported-by: Gur Stavi <gstavi@mrv.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agostp: Make stp-disabled port forward stp bpdu packets.
Alex Wang [Wed, 16 Jul 2014 01:52:19 +0000 (18:52 -0700)]
stp: Make stp-disabled port forward stp bpdu packets.

Commit 0d1cee123a84 (stp: Fix bpdu tx problem in listening state)
makes ovs drop the stp bpdu packets if stp is not enabled on the
input port.  However, when pif bridge is used and stp is enabled
on the integration bridge.  The flow translation of stp bpdu
packets will go through a level of resubmission which changes
the input port to the corresponding peer port.  Since, the
patch port on the pif bridge does not have stp enabled, the
flow translation will drop the bpdu packets.

This commit fixes the issue by making ovs forward stp bpdu packets
on stp-disabled port.

VMware-BZ: #1284695

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agodatapath: Check for NULL upcall_portids.
Pravin B Shelar [Tue, 15 Jul 2014 18:12:11 +0000 (11:12 -0700)]
datapath: Check for NULL upcall_portids.

Following patch adds NULL check for memory allocated
by kmalloc.

Reported-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetlink-socket: Fix handling socket allocation failure in nl_dump_start().
Ben Pfaff [Mon, 14 Jul 2014 21:06:03 +0000 (14:06 -0700)]
netlink-socket: Fix handling socket allocation failure in nl_dump_start().

If nl_pool_alloc() failed, then 'dump' was not initialized at all and
further use of the dump would access uninitialized data, probably causing
a crash.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agodpif-linux: Avoid null dereference if all ports disappear.
Ben Pfaff [Mon, 14 Jul 2014 20:17:05 +0000 (13:17 -0700)]
dpif-linux: Avoid null dereference if all ports disappear.

When dpif_linux_refresh_channels() refreshes the set of channels when
the number of handlers changes, it destroys all the dpif's channels and
sets dpif->uc_array_size to 0.  If the port dump later in the function
turns up no ports (which generally indicates a bug), then no channels will
be allocated and thus dpif->uc_array_size will remain 0 and 'channels' will
be null in each handler.  This is self-consistent, at least, but
dpif_linux_port_get_pid__() was still willing in this situation to
try to access element 0 of the set of channels, dereferencing a null
pointer.

This fixes the problem.

I encountered this while looking at a bug that I had introduced during
development that caused the port dump to always be empty.  It would be
difficult to encounter in normal use.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agoofp-msgs: Correct code for queue configuration messages in OpenFlow 1.0.
Ben Pfaff [Mon, 14 Jul 2014 17:37:38 +0000 (10:37 -0700)]
ofp-msgs: Correct code for queue configuration messages in OpenFlow 1.0.

Reported-by: Simon Jouet <simon.jouet@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoofp-util: Fix null pointer dereference in ofputil_pull_buckets().
Ben Pfaff [Mon, 14 Jul 2014 21:33:01 +0000 (14:33 -0700)]
ofp-util: Fix null pointer dereference in ofputil_pull_buckets().

Found by clang-analyzer.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agotests: Disable glibc memory checking under glibc <= 2.11.
Ben Pfaff [Fri, 11 Jul 2014 18:03:08 +0000 (11:03 -0700)]
tests: Disable glibc memory checking under glibc <= 2.11.

We noticed that the unit tests sometimes fail on XenServer inside glibc's
memory checker, in the free_check() function.  It turns out that the
glibc memory checker in glibc 2.11 and earlier had an internal race that
caused false positives in multithreaded programs.

This commit avoids the problem by disabling the glibc memory checker in
glibc 2.11 and earlier.

VMware-BZ: #1267127
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agonetlink-socket: Work around kernel Netlink dump thread races.
Ben Pfaff [Thu, 10 Jul 2014 23:48:16 +0000 (16:48 -0700)]
netlink-socket: Work around kernel Netlink dump thread races.

The Linux kernel Netlink implementation has two races that cause problems
for processes that attempt to dump a table in a multithreaded manner.

The first race is in the structure of the kernel netlink_recv() function.
This function pulls a message from the socket queue and, if there is none,
reports EAGAIN:
skb = skb_recv_datagram(sk, flags, noblock, &err);
if (skb == NULL)
goto out;
Only if a message is successfully read from the socket queue does the
function, toward the end, try to queue up a new message to be dumped:
if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
ret = netlink_dump(sk);
if (ret) {
sk->sk_err = ret;
sk->sk_error_report(sk);
}
}
This means that if thread A reads a message from a dump, then thread B
attempts to read one before A queues up the next, B will get EAGAIN.  This
means that, following EAGAIN, B needs to wait until A returns to userspace
before it tries to read the socket again.  nl_dump_next() already does
this, using 'dump->status_seq' (although the need for it has never been
explained clearly, to my knowledge).

The second race is more serious.  Suppose thread X and thread Y both
simultaneously attempt to queue up a new message to be dumped, using the
call to netlink_dump() quoted above.  netlink_dump() begins with:
mutex_lock(nlk->cb_mutex);

cb = nlk->cb;
if (cb == NULL) {
err = -EINVAL;
goto errout_skb;
}
Suppose that X gets cb_mutex first and finds that the dump is complete.  It
will therefore, toward the end of netlink_dump(), clear nlk->cb to NULL to
indicate that no dump is in progress and release the mutex:
nlk->cb = NULL;
mutex_unlock(nlk->cb_mutex);
When Y grabs cb_mutex afterward, it will see that nlk->cb is NULL and
return -EINVAL as quoted above.  netlink_recv() stuffs -EINVAL in sk_err,
but that error is not reported immediately; instead, it is saved for the
next read from the socket.  Since Open vSwitch maintains a pool of Netlink
sockets, that next failure can crop up pretty much anywhere.  One of the
worst places for it to crop up is in the execution of a later transaction
(e.g. in nl_sock_transact_multiple__()), because userspace treats Netlink
transactions as idempotent and will re-execute them when socket errors
occur.  For a transaction that sends a packet, this causes packet
duplication, which we actually observed in practice.  (ENOBUFS should
actually cause transactions to be re-executed in many cases, but EINVAL
should not; this is a separate bug in the userspace netlink code.)

VMware-BZ: #1283188
Reported-and-tested-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agonetlink-socket: Fix sign of error code.
Ben Pfaff [Thu, 10 Jul 2014 21:32:10 +0000 (14:32 -0700)]
netlink-socket: Fix sign of error code.

Commit 8f20fd98db (netlink-socket: Work around upstream kernel Netlink
bug.) got the sign of the error code wrong, so that it reported e.g. -22
for EINVAL to nl_sock_recv__()'s caller, instead of 22.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agodatapath: add skb_clone NULL check in the recirc action.
Andy Zhou [Thu, 10 Jul 2014 07:30:27 +0000 (00:30 -0700)]
datapath: add skb_clone NULL check in the recirc action.

Refactoring recirc action implementation.

The main change is to fix a bug where the NULL check after skb clone()
call is missing.  The fix is to return -ENOMEM whenever skb_clone()
failed to create a clone.

Also rearranged adjacent code to improve readability.

Reported-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath/flow_netlink: Fix NDP flow mask validation
Daniele Di Proietto [Wed, 9 Jul 2014 22:22:14 +0000 (15:22 -0700)]
datapath/flow_netlink: Fix NDP flow mask validation

match_validate() enforce that a mask matching on NDP attributes has also an
exact match on ICMPv6 type.
The ICMPv6 type, which is 8-bit wide, is stored in the 'tp.src' field of
'struct sw_flow_key', which is 16-bit wide.
Therefore, an exact match on ICMPv6 type should only check the first 8 bits.

This commit fixes a bug that prevented flows with an exact match on NDP field
from being installed

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodpif-linux: Recheck the socket pointer existence before getting its pid.
Alex Wang [Tue, 8 Jul 2014 04:58:33 +0000 (21:58 -0700)]
dpif-linux: Recheck the socket pointer existence before getting its pid.

This commit fixes a race between port deletion and flow miss handling.
More specifically, a port could be removed by main thread while
the handler thread is handling the flow miss from it.  If the flow
requires slow path action, the handler thread will try querying a pid
from port's socket.  Since the port has been deleted, the query will
cause a dereference of NULL socket pointer.

This commit makes the handler thread recheck the socket pointer before
dereferencing it.

VMware-BZ: 1251981

Reported-by: Pratap Reddy <preddy@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agotests: Fix "megaflow disabled" test.
Joe Stringer [Tue, 8 Jul 2014 09:01:20 +0000 (09:01 +0000)]
tests: Fix "megaflow disabled" test.

The previous 'grep' logic would occasionally catch unintended flows.
There's no reason to check for these flows separately, so combine the
two checks.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
9 years agorevalidator: Revalidate missed flows.
Joe Stringer [Thu, 3 Jul 2014 08:29:26 +0000 (08:29 +0000)]
revalidator: Revalidate missed flows.

If the datapath doesn't dump a flow for some reason, and the current
dump is expected to revalidate all flows in the datapath, then perform
revalidation for those flows by fetching them during the sweep phase.
If revalidation is not required, then leave the flow in the datapath and
don't revalidate it.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodpif: Support fetching flow mask via dpif_flow_get().
Joe Stringer [Fri, 4 Jul 2014 08:37:12 +0000 (08:37 +0000)]
dpif: Support fetching flow mask via dpif_flow_get().

Change the interface to allow implementations to pass back a buffer, and
allow callers to specify which of actions, mask, and stats they wish to
receive. This will be used in the next commit.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoutil: fix compile warnings
Ansis Atteka [Tue, 8 Jul 2014 04:11:53 +0000 (04:11 +0000)]
util: fix compile warnings

This patch fixes two compile warnings introduced by commit
64b73291 ("util: create a copy of program_name"):
1. ../lib/util.c:457:5: error: passing argument 1 of 'free'
   discards 'const' qualifier from pointer target type; And
2. ../lib/util.c:463:5: error: ISO C90 forbids mixed declarations
   and code [-Werror=declaration-after-statement] (affected only
   branch-2.3 that is C90 compliant and not the master)

Reported-By: Joe Stringer <jstringer@nicira.com>
Reported-By: Lorand Jakab <lojakab@cisco.com>
Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>