cascardo/ovs.git
9 years agoDo not seemingly #include Linux-specific headers on other platforms.
Ben Pfaff [Mon, 4 Aug 2014 18:11:40 +0000 (11:11 -0700)]
Do not seemingly #include Linux-specific headers on other platforms.

Until now, the OVS source tree has had a whole maze of header files that
make "#include <linux/openvswitch.h>" work OK regardless of platform, but
this confuses everyone new to the tree, at first glance, and is difficult
to understand at second glance too.

This commit renames include/linux/openvswitch.h to
datapath/linux/compat/include/linux/openvswitch.h without other change,
then modifies the userspace build to generate a header that makes sense in
portable Open vSwitch userspace from that header.

It then removes all the remaining include/linux/* files since they are now
unused.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agodatapath: do not use vport type to determine presence of Geneve attributes
Ansis Atteka [Tue, 29 Jul 2014 22:39:35 +0000 (15:39 -0700)]
datapath: do not use vport type to determine presence of Geneve attributes

This patch fixes following kernel crash that could happen, if geneve
vport was not added yet, but revalidator thread attempted to dump flows.
To reproduce:
1. switch tunnel type between geneve and gre in a loop; and
2. run ping.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: [<ffffffffa0385470>] ovs_nla_put_flow+0x3d0/0x7c0 [openvswitch]
PGD 3b32b067 PUD 3b2ef067 PMD 0
Oops: 0000 [#2] SMP
...
CPU: 0 PID: 6450 Comm: revalidator2 Tainted: GF     D    O
3.13.0-24-generic #46-Ubuntu
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
Platform, BIOS 6.00 07/02/2012
task: ffff88003b4aafe0 ti: ffff88003d314000 task.ti: ffff88003d314000
RIP: 0010:[<ffffffffa0385470>]  [<ffffffffa0385470>]
ovs_nla_put_flow+0x3d0/0x7c0 [openvswitch]
RSP: 0018:ffff88003d315a10  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88003a9a9960 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffffffffffc8 RDI: ffff88003babcb80
RBP: ffff88003d315a68 R08: 0000000000000000 R09: 0000000000000004
R10: ffff880039c23034 R11: 0000000000000008 R12: ffff88003a861600
R13: ffff88003a9a9960 R14: ffff88003babcb80 R15: qffff88003a861600
FS:  00007ff0f5d94700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000048 CR3: 000000003b55b000 CR4: 00000000000007f0
Stack:
ffffffff81385093 0000000000000000 0000000000000000 0000000000000000
ffff880000000000 ffff88003d315a58 ffff880039c23014 ffff88003a9a97a0
ffff88003babcb80 ffff880039c23018 ffff88003a861600 ffff88003d315ad0
Call Trace:
[<ffffffff81385093>] ? __nla_reserve+0x43/0x50
[<ffffffffa037e683>] ovs_flow_cmd_fill_info+0x93/0x2b0 [openvswitch]
[<ffffffffa0387159>] ? ovs_flow_tbl_dump_next+0x49/0xc0 [openvswitch]
[<ffffffffa037e920>] ovs_flow_cmd_dump+0x80/0xd0 [openvswitch]
[<ffffffff81645004>] netlink_dump+0x84/0x240
[<ffffffff816458eb>] __netlink_dump_start+0x1ab/0x220
[<ffffffff816498d7>] genl_family_rcv_msg+0x337/0x370
[<ffffffffa037e8a0>] ? ovs_flow_cmd_fill_info+0x2b0/0x2b0 [openvswitch]
[<ffffffff811a2778>] ? __kmalloc_node_track_caller+0x58/0x1e0
[<ffffffff81649910>] ? genl_family_rcv_msg+0x370/0x370
[<ffffffff816499a1>] genl_rcv_msg+0x91/0xd0
[<ffffffff81647a29>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff81647f28>] genl_rcv+0x28/0x40
[<ffffffff81647055>] netlink_unicast+0xd5/0x1b0
[<ffffffff8164742f>] netlink_sendmsg+0x2ff/0x740
[<ffffffff816024eb>] sock_sendmsg+0x8b/0xc0
[<ffffffff811bbaa1>] ? __sb_end_write+0x31/0x60
[<ffffffff811d42bf>] ? touch_atime+0x10f/0x140
[<ffffffff811c2471>] ? pipe_read+0x371/0x400
[<ffffffff81602691>] SYSC_sendto+0x121/0x1c0
[<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[<ffffffff81020d35>] ? syscall_trace_enter+0x145/0x250
[<ffffffff8160319e>] SyS_sendto+0xe/0x10
[<ffffffff8172663f>] tracesys+0xe1/0xe6

Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agodatapath-windows: Fix crash when internal port is removed.
Eitan Eliahu [Sat, 2 Aug 2014 00:11:51 +0000 (17:11 -0700)]
datapath-windows: Fix crash when internal port is removed.

Avoids a BSOD when AllowManagementOS is set to false.

Reported-at: https://github.com/openvswitch/ovs/issues/13
Reported-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Eitan Eliahu <eliahue@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Nithin Raju <nithin@vmware.com>
9 years agobitmap: Add test for bitmap_equal and bitmap_scan
Kmindg [Thu, 31 Jul 2014 05:16:04 +0000 (13:16 +0800)]
bitmap: Add test for bitmap_equal and bitmap_scan

Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Kmindg <kmindg@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath-windows: Add temporary files to gitignore.
Saurabh Shah [Wed, 30 Jul 2014 20:29:54 +0000 (20:29 +0000)]
datapath-windows: Add temporary files to gitignore.

Github Issue: #12

Reported-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
9 years agodatapath: Remove unlikely() for WARN_ON() conditions
Thomas Graf [Thu, 31 Jul 2014 01:05:01 +0000 (18:05 -0700)]
datapath: Remove unlikely() for WARN_ON() conditions

No need for the unlikely(), WARN_ON() and BUG_ON() internally use
unlikely() on the condition.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath-windows: Rename the extension & vendor in the INF file.
Saurabh Shah [Wed, 30 Jul 2014 16:03:20 +0000 (09:03 -0700)]
datapath-windows: Rename the extension & vendor in the INF file.

Github issue: #14

Signed-off-by: Saurabh Shah <ssaurabh@vmware.com>
Reported-by: Alessandro Pilotti <apilotti@cloudbasesolutions.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
9 years agodatapath: Remove redundant key ref from upcall_info.
Pravin B Shelar [Tue, 29 Jul 2014 17:04:52 +0000 (10:04 -0700)]
datapath: Remove redundant key ref from upcall_info.

struct dp_upcall_info has pointer to pkt_key which is already
available in OVS_CB.  This also simplifies upcall handling
for gso packet.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: Drop packets when interdev is not up
Chunhe Li [Wed, 30 Jul 2014 01:49:01 +0000 (09:49 +0800)]
datapath: Drop packets when interdev is not up

If the internal device is not up, it should drop received
packets. Sometimes it receive the broadcast or multicast
packets, and the ip protocol stack will casue more cpu
usage wasted.

Signed-off-by: Chunhe Li <lichunhe@huawei.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoconnmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.
Alex Wang [Tue, 29 Jul 2014 17:50:07 +0000 (10:50 -0700)]
connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.

connmgr_wants_packet_in_on_miss() is called by multiple threads
and thusly should be protected by the mutex.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
9 years agoconnmgr: Fix a typo.
Alex Wang [Tue, 29 Jul 2014 18:10:21 +0000 (11:10 -0700)]
connmgr: Fix a typo.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agotests: Add another test for extended registers.
Ben Pfaff [Tue, 29 Jul 2014 16:27:11 +0000 (09:27 -0700)]
tests: Add another test for extended registers.

ONF-JIRA: EXT-244
Suggested-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jean Tourrilhes <jt@hpl.hp.com>
9 years agoAdd more files to the openvswitch library on MSVC
Alin Serdean [Tue, 29 Jul 2014 15:24:08 +0000 (15:24 +0000)]
Add more files to the openvswitch library on MSVC

Add netlink related files to the windows build.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agonetlink-socket: Adapt to Windows and MSVC.
Alin Serdean [Tue, 29 Jul 2014 15:23:28 +0000 (15:23 +0000)]
netlink-socket: Adapt to Windows and MSVC.

Add two functions set_sock_pid_in_kernel and portid_next. This will allow
the channel identification for the kernel extension to send back messages.

Replace send with WriteFile equivalent and ignore nl_sock_drain for the moment
under MSVC.

Replace sendmsg and recvmsg with ReadFile and WriteFile equivalents.

On MSVC put in handle instead of fd(sock->fd becomes sock->handle).

Creation of the netlink socket will be replaced by CreateFile equivalent.

Add MAX_STACK_LENGTH for MSVC.  This will be our maximum size for on-stack
copy buffer.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agonetlink-socket: Allow compiling on MSVC even without HAVE_NETLINK.
Alin Serdean [Tue, 29 Jul 2014 15:22:37 +0000 (15:22 +0000)]
netlink-socket: Allow compiling on MSVC even without HAVE_NETLINK.

Bypass the error compilation when compiling under MSVC.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agonetlink-protocol: Add more definitions.
Alin Serdean [Tue, 29 Jul 2014 15:22:03 +0000 (15:22 +0000)]
netlink-protocol: Add more definitions.

Add MAX_LINKS define needed for nl_pool.

Add NLM_F_CREATE and NLM_F_EXCL defines also.

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoAdd defines, enums and headers for MSVC
Alin Serdean [Tue, 29 Jul 2014 15:21:16 +0000 (15:21 +0000)]
Add defines, enums and headers for MSVC

Add defines needed to compile netlink-socket.c and netlink.c.

Add a wrapper and the functionality behind it for syconf.

Add the newly created files to the noinst_HEADERS in windows/automake.mk

Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agocmap: Merge CMAP_FOR_EACH_SAFE into CMAP_FOR_EACH.
Ben Pfaff [Tue, 29 Jul 2014 16:02:23 +0000 (09:02 -0700)]
cmap: Merge CMAP_FOR_EACH_SAFE into CMAP_FOR_EACH.

There isn't any significant downside to making cmap iteration "safe" all
the time, so this drops the _SAFE variant.

Similar changes to CMAP_CURSOR_FOR_EACH and CMAP_CURSOR_FOR_EACH_CONTINUE.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agonetlink-socket: Add conceptual documentation.
Ben Pfaff [Tue, 29 Jul 2014 15:59:40 +0000 (08:59 -0700)]
netlink-socket: Add conceptual documentation.

Based on a conversation with the VMware Hyper-V team earlier today.

This commit also changes a couple of functions that were only used with
netlink-socket.c into static functions.  I couldn't think of a reason for
code outside that file to use them.

Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Use currect rcu API in exact match flow lookup function.
Pravin B Shelar [Sat, 26 Jul 2014 03:34:48 +0000 (20:34 -0700)]
datapath: Use currect rcu API in exact match flow lookup function.

exact match cache lookup is always done under ovs lock. So
use ovsl_dereference() API for rcu access.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agoconnmgr: Only send role status messages to OpenFlow 1.4+ controllers.
Ben Pfaff [Tue, 22 Jul 2014 22:54:55 +0000 (15:54 -0700)]
connmgr: Only send role status messages to OpenFlow 1.4+ controllers.

Only OpenFlow 1.4 and later support role status messages, but this code
tried to send them to all controllers, which caused an assertion failure.

Also, add tests to check that role status messages work, and that they
don't cause trouble with OF1.2.

Reported-by: Anup Khadka <khadka.py@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoconnmgr: Demote service controllers too in ofconn_set_role().
Ben Pfaff [Mon, 28 Jul 2014 17:17:56 +0000 (10:17 -0700)]
connmgr: Demote service controllers too in ofconn_set_role().

Service controllers can set their roles, so it's necessary to demote them
to slaves if another controller becomes master.  Unfortunately the
'controllers' hmap only contains primary controllers, so this was omitted.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoovs-ofctl: Add --unixctl command line option.
Ben Pfaff [Mon, 28 Jul 2014 17:31:25 +0000 (10:31 -0700)]
ovs-ofctl: Add --unixctl command line option.

This matches the option offered by some other Open vSwitch daemons.  I
intend to use it in tests in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agodatapath-windows: Kernel module for HyperV.
Saurabh Shah [Mon, 28 Jul 2014 00:26:58 +0000 (17:26 -0700)]
datapath-windows: Kernel module for HyperV.

The kernel switch extension has support for bridged back forwarding & tunneling
over VXLAN. There is no Netlink integration as it is still being worked out.

Co-Authored-By: Ankur Sharma <ankursharma@vmware.com>
Signed-off-by: Ankur Sharma <ankursharma@vmware.com>
Co-Authored-By: Eitan Eliahu <eliahue@vmware.com>
Signed-off-by: Eitan Eliahu <eliahue@vmware.com>
Co-Authored-By: Guolin Yang <gyang@vmware.com>
Signed-off-by: Guolin Yang <gyang@vmware.com>
Co-Authored-By: Linda Sun <lsun@vmware.com>
Signed-off-by: Linda Sun <lsun@vmware.com>
Co-Authored-By: Nithin Raju <nithin@vmware.com>
Signed-off-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agometa-flow: Add 64-bit registers.
Ben Pfaff [Mon, 28 Jul 2014 16:50:37 +0000 (09:50 -0700)]
meta-flow: Add 64-bit registers.

These 64-bit registers are intended to conform with the OpenFlow 1.5
draft specification.

EXT-244.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agoofproto-dpif-upcall: Fix sparse warnings.
Ben Pfaff [Sat, 26 Jul 2014 19:19:03 +0000 (12:19 -0700)]
ofproto-dpif-upcall: Fix sparse warnings.

Fixes these warnings from "sparse":

../ofproto/ofproto-dpif-upcall.c:761:1: warning: symbol 'free_upcall' was
    not declared. Should it be static?
../ofproto/ofproto-dpif-upcall.c:849:1: warning: symbol 'convert_upcall'
    was not declared. Should it be static?

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoFix two memory leaks.
yinpeijun [Mon, 28 Jul 2014 07:21:17 +0000 (15:21 +0800)]
Fix two memory leaks.

Found by coverity.

Signed-off-by: yinpeijun <yinpeijun@huawei.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoRemove assumption that there are 64 or fewer fields.
Ben Pfaff [Sat, 26 Jul 2014 19:15:26 +0000 (12:15 -0700)]
Remove assumption that there are 64 or fewer fields.

An upcoming commit will increase the number of fields beyond 64.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agobitmap: Add new functions.
Ben Pfaff [Sat, 26 Jul 2014 05:23:44 +0000 (22:23 -0700)]
bitmap: Add new functions.

These will be used in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agometa-flow: Simplify handling of a variable number of registers.
Ben Pfaff [Thu, 15 May 2014 15:19:11 +0000 (08:19 -0700)]
meta-flow: Simplify handling of a variable number of registers.

At the time that Open vSwitch implemented registers, there was a high cost
to adding additional fields, so I wrote the code so that the number of
registers could be reduced at compile time.  Now, fields are cheaper
(though not free) and in the meantime I have never heard of anyone reducing
the number of registers.  Since I intend to add more code that would
require awkward "#if"s like this, I think that this is a good time to
simplify it by requiring FLOW_N_REGS to be fixed.  This commit does that.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodatapath: Fix buffer overrun in mask array realloc.
Pravin B Shelar [Fri, 25 Jul 2014 23:22:46 +0000 (16:22 -0700)]
datapath: Fix buffer overrun in mask array realloc.

mask realloc copies elements from old array to new array. When
shrinking array it can go beyond allocated memory.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodpif-netdev: Polling threads directly call ofproto upcall functions.
Ryan Wilson [Sat, 26 Jul 2014 06:51:55 +0000 (06:51 +0000)]
dpif-netdev: Polling threads directly call ofproto upcall functions.

Typically, kernel datapath threads send upcalls to userspace where
handler threads process the upcalls. For TAP and DPDK devices, the
datapath threads operate in userspace, so there is no need for
separate handler threads.

This patch allows userspace datapath threads to directly call the
ofproto upcall functions, eliminating the need for handler threads
for datapaths of type 'netdev'.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
9 years agodatapath: Add NULL check for mask pointer.
Pravin B Shelar [Thu, 24 Jul 2014 20:32:35 +0000 (13:32 -0700)]
datapath: Add NULL check for mask pointer.

There is race in datapath when last mask in mask array deleted can
result in NULL pointer dereference in datapath.
datapath lookup does not check mask pointer if its index is less than
mask-array count. That is safe because delete operation moves last valid
pointer to the deleted element. But this does not work if we are
deleting last element in array. Following patch adds NULL check for the
mask pointer.
This patch also avoids accessing ma->count without any locks.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agocfm: Reduce "long delay" message from WARN to INFO, to match BFD behavior.
Ben Pfaff [Fri, 25 Jul 2014 16:19:17 +0000 (09:19 -0700)]
cfm: Reduce "long delay" message from WARN to INFO, to match BFD behavior.

These messages can cause the testsuite to fail on a busy build machine
since the testsuite treats WARN or ERR log messages as failures.  BFD
uses an INFO message instead of WARN, so this just changes CFM to match.

Alternatively, the testsuite could ignore "long delay" messages (it ignores
some other categories of messages).  In that case I'd expect that we'd
want to change BFD to match CFM since I don't know of a reason why they
should log differently.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agodatapath: reorder action netlink attribute definition for upstreaming
Andy Zhou [Thu, 24 Jul 2014 09:17:48 +0000 (02:17 -0700)]
datapath: reorder action netlink attribute definition for upstreaming

Keeping the order of netlink attribute definition in the order of
upstreaming is the best way to keep all released user space program
forward compatible with upstreamed kernel modules.

Adjust action netlink attribute order to match with the current
upstreaming plan.

Recirc and hash actions are introduced in branch 2.3, which will be
fixed by the patch. The MPLS actions have been released since
branch-2.1 but there is no kernel implementation of them prior to
branch 2.3. Thus the ordering change should not affect them.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-provider: Clarify comment where 'get_next_hop' function looks.
Ben Pfaff [Wed, 23 Jul 2014 19:48:42 +0000 (12:48 -0700)]
netdev-provider: Clarify comment where 'get_next_hop' function looks.

Some readers thought it was looking in an ofproto- or netdev-specific
table.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
9 years agodatapath/flow_netlink: Avoid wildcarding tunnel key with disabled megaflows
Daniele Di Proietto [Thu, 17 Jul 2014 06:36:05 +0000 (06:36 +0000)]
datapath/flow_netlink: Avoid wildcarding tunnel key with disabled megaflows

If the userspace wants to match on a flow with some tunnel attributesset to 0,
it simply omits them in the netlink attributes stream.
Since our wildcarding logic (when megaflows are disabled) is based on the
attributes in the netlink stream, we set our mask incorrectly.

This commit adds a check to detect if the userspace wants to match on a tunnel,
in which case we simply unwildcard the whole tun_key

Reported-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
9 years agodatapath: flow_netlink: Fix a bug.
Alex Wang [Wed, 23 Jul 2014 06:14:50 +0000 (23:14 -0700)]
datapath: flow_netlink: Fix a bug.

Commit 62974663fe (datapath/flow_netlink: Create right mask with
disabled megaflows) introduced the bug which caused
ovs_nla_get_match() returns immediately after parsing the flow
mask for OVS_KEY_ATTR_ENCAP.  Consequently, when vlan encapsulated
packets are present, the corresponding datapath flows will have
incorrect mask like below.  And the incorrect flows could affect
other non-vlan packets.

~/ovs# ovs-dpctl dump-flows
in_port(3/0xffff0000),eth_type(0x8100),encap(), packets:0,
bytes:0, used:never, actions:2

This commit fixes the bug by checking and handling the return
value of the parsing function correctly.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodpif-netdev: Initialize upcall->packet when queuing to userspace.
Ben Pfaff [Wed, 23 Jul 2014 04:06:44 +0000 (21:06 -0700)]
dpif-netdev: Initialize upcall->packet when queuing to userspace.

Commit db73f7166a6 (netdev-dpdk: Fix race condition with DPDK mempools in
non pmd threads) switched to a new way of setting up 'upcall->packet', but
only initialized two of the fields in the packet.  This could cause
core dumps and other strange behavior.  In particular it caused failures in
several unit tests on XenServer.

This commit fixes the problem by initializing the entire ofpbuf.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agodpif-netdev: Reduce netdev_flow_key size
Daniele Di Proietto [Wed, 23 Jul 2014 00:06:23 +0000 (17:06 -0700)]
dpif-netdev: Reduce netdev_flow_key size

struct 'miniflow' already contains MINI_N_INLINE values, therefore
we can save few bytes in netdev_flow_key.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-dpdk: Increase tx queue size and rx batch size
Daniele Di Proietto [Wed, 23 Jul 2014 00:09:10 +0000 (17:09 -0700)]
netdev-dpdk: Increase tx queue size and rx batch size

These values has been found to give the best throughput
in simple cases (1 flow 64 bytes UDP packets).

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agostp: Add more logging points for debug.
Alex Wang [Sat, 19 Jul 2014 04:49:52 +0000 (04:49 +0000)]
stp: Add more logging points for debug.

This commit adds more logging points in stp module for debugging.
Also, it makes the log print out the port name.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agonetlink-notifier: Exit loop if nl_sock_recv() returns an error
Daniele Di Proietto [Tue, 22 Jul 2014 06:38:57 +0000 (06:38 +0000)]
netlink-notifier: Exit loop if nl_sock_recv() returns an error

An error from nl_sock_recv() could mean that there are some issues with the
netlink socket (EBADF, ENOTSOCK, ...). Calling nl_sock_recv() in this case is
harmful: nln_run() will never return and since we are calling it from the main
thread, vswitchd becomes unresponsive.
Also, with this commit we avoid calling the notifier callback in case of error
(except for ENOBUFS, which means that there could be too many notifications)

Suggested-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agoconfigure: Don't check for malloc hooks that we no longer use.
Ben Pfaff [Tue, 22 Jul 2014 23:18:35 +0000 (16:18 -0700)]
configure: Don't check for malloc hooks that we no longer use.

Commit 825da1c6d1c7 (leak-checker: Remove because it cannot be made
thread-safe.) removed the only uses of these hooks but neglected to remove
the test for them.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoovsdb-server: Document RFC 7047 extensions to ovsdb <error>s.
Ben Pfaff [Tue, 22 Jul 2014 00:08:54 +0000 (17:08 -0700)]
ovsdb-server: Document RFC 7047 extensions to ovsdb <error>s.

Reported-by: Madhu Venugopal <vmadhu@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoovs-ctl: Add comment to explain why we only save ofports for pre-1.10.
Ben Pfaff [Tue, 22 Jul 2014 23:09:26 +0000 (16:09 -0700)]
ovs-ctl: Add comment to explain why we only save ofports for pre-1.10.

We've had a couple of questions about this lately, including one suggestion
that we should always save the OpenFlow port numbers.  This explains why
the behavior is as it is.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agodebian: Avoid -Wformat-zero-length warnings.
Ben Pfaff [Tue, 22 Jul 2014 18:50:37 +0000 (11:50 -0700)]
debian: Avoid -Wformat-zero-length warnings.

Debian puts an extra "-Wformat" in the CFLAGS following OVS's own
"-Wformat -Wno-format-zero-length", which therefore overrides
-Wno-format-zero-length, so this commit adds an extra
-Wno-format-zero-length to avoid those false positives.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agoUse xstrdup() instead of strdup(), xmalloc() instead of malloc().
Ben Pfaff [Tue, 22 Jul 2014 22:46:25 +0000 (15:46 -0700)]
Use xstrdup() instead of strdup(), xmalloc() instead of malloc().

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agodatapath: Refactor get_dp() function into multiple access APIs.
Andy Zhou [Tue, 15 Jul 2014 21:38:29 +0000 (14:38 -0700)]
datapath: Refactor get_dp() function into multiple access APIs.

Avoid recursive read_rcu_lock() by using the lighter weight get_dp_rcu()
API. Add proper locking assertions to get_dp().

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agotests: Wait for ofctl_monitor.log in ofproto-dpif controller.
Ben Pfaff [Tue, 22 Jul 2014 04:42:49 +0000 (21:42 -0700)]
tests: Wait for ofctl_monitor.log in ofproto-dpif controller.

Fixes a race in the test case.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoclassifier: Refactor cls_cursor_advance() to make it easier to follow.
Ben Pfaff [Tue, 22 Jul 2014 04:00:34 +0000 (21:00 -0700)]
classifier: Refactor cls_cursor_advance() to make it easier to follow.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agocmap, classifier: Avoid unsafe aliasing in iterators.
Ben Pfaff [Tue, 22 Jul 2014 04:00:04 +0000 (21:00 -0700)]
cmap, classifier: Avoid unsafe aliasing in iterators.

CMAP_FOR_EACH and CLS_FOR_EACH and their variants tried to use void ** as
a "pointer to any kind of pointer".  That is a violation of the aliasing
rules in ISO C which technically yields undefined behavior.  With GCC 4.1,
it causes both warnings and actual misbehavior.  One option would to add
-fno-strict-aliasing to the compiler flags, but that would only help with
GCC; who knows whether this can be worked around with other compilers.

Instead, this commit rewrites the iterators to avoid disallowed pointer
aliasing.

VMware-BZ: #1287651
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agovswitchd/bridge: Fix setting default prefix fields.
Jarno Rajahalme [Mon, 21 Jul 2014 21:34:02 +0000 (14:34 -0700)]
vswitchd/bridge: Fix setting default prefix fields.

The check for the need of default values was in the wrong place,
causing no prefix tracking to be used when database had no
configuration for a flow table.  Missing configuration means that
defaults should be used.

To limit clutter on the log, we now log the prefix tracking
configuration when it is explicitly set in the database.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-atomic: Avoid evaluating arguments multiple times.
Jarno Rajahalme [Mon, 21 Jul 2014 21:19:06 +0000 (14:19 -0700)]
ovs-atomic: Avoid evaluating arguments multiple times.

ovs-atomic-gcc4+ did expand arguments again, if locked set/get was
called.

Also fix atomic_is_lock_free().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-rcu: evaluate argument of ovsrcu_get only once.
Jarno Rajahalme [Mon, 21 Jul 2014 21:19:06 +0000 (14:19 -0700)]
lib/ovs-rcu: evaluate argument of ovsrcu_get only once.

As ovsrcu_get() looks like a function call, it is reasonable for the
callers to expect that the arguments are evaluated only once.
CONST_CAST expands its 'POINTER' argument multiple times, and the
exact effect of this turned out to be compiler dependent.  Fix this by
expanding the macro argument before CONST_CAST, and removing
unnecessary CONST_CASTs.

VMware-BZ: #1287651

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agonetdev-dpdk: Fix race condition with DPDK mempools in non pmd threads
Daniele Di Proietto [Thu, 17 Jul 2014 21:29:36 +0000 (14:29 -0700)]
netdev-dpdk: Fix race condition with DPDK mempools in non pmd threads

DPDK mempools rely on rte_lcore_id() to implement a thread-local cache.
Our non pmd threads had rte_lcore_id() == 0. This allowed concurrent access to
the "thread-local" cache, causing crashes.

This commit resolves the issue with the following changes:

- Every non pmd thread has the same lcore_id (0, for management reasons), which
  is not shared with any pmd thread (lcore_id for pmd threads now start from 1)
- DPDK mbufs must be allocated/freed in pmd threads. When there is the need to
  use mempools in non pmd threads, like in dpdk_do_tx_copy(), a mutex must be
  held.
- The previous change does not allow us anymore to pass DPDK mbufs to handler
  threads: therefore this commit partially revert 143859ec63d45e. Now packets
  are copied for upcall processing. We can remove the extra memcpy by
  processing upcalls in the pmd thread itself.

With the introduction of the extra locking, the packet throughput will be lower
in the following cases:

- When using internal (tap) devices with DPDK devices on the same datapath.
  Anyway, to support internal devices efficiently, we needed DPDK KNI devices,
  which will be proper pmd devices and will not need this locking.
- When packets are processed in the slow path by non pmd threads. This overhead
  can be avoided by handling the upcalls directly in pmd threads (a change that
  has already been proposed by Ryan Wilson)

Also, the following two fixes have been introduced:
- In dpdk_free_buf() use rte_pktmbuf_free_seg() instead of rte_mempool_put().
  This allows OVS to run properly with CONFIG_RTE_LIBRTE_MBUF_DEBUG DPDK option
- Do not bulk free mbufs in a transmission queue. They may belong to different
  mempools

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetlink-socket: Do not make flow_dump block on netlink socket.
Alex Wang [Fri, 18 Jul 2014 21:27:36 +0000 (14:27 -0700)]
netlink-socket: Do not make flow_dump block on netlink socket.

Commit 93295354 (netlink-socket: Simplify multithreaded dumping
to match Linux reality.) makes the call to recvmsg() block if no
messages are available.  This can cause revalidator threads hanging
for long time or even deadlock when main thread tries to stop the
revalidator threads.

This commit fixes the issue by enabling the MSG_DONTWAIT flag in
the call to recvmsg().

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb: Don't add ovsdb-server.c to libovsdb.
Gurucharan Shetty [Fri, 18 Jul 2014 01:15:17 +0000 (18:15 -0700)]
ovsdb: Don't add ovsdb-server.c to libovsdb.

Without this change, with shared libraries, VLOG
constructor for ovsdb-server would get called twice corrupting
the 'vlog_modules' list causing an infinite loop.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Reported-by: Gur Stavi <gstavi@mrv.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Clarify subtable skipping.
Jarno Rajahalme [Fri, 18 Jul 2014 09:24:27 +0000 (02:24 -0700)]
lib/classifier: Clarify subtable skipping.

Clarify comments for trie-based subtable skipping.

Perform the cheaper check first.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Return all matching prefix lengths from trie lookup.
Jarno Rajahalme [Fri, 18 Jul 2014 09:24:26 +0000 (02:24 -0700)]
lib/classifier: Return all matching prefix lengths from trie lookup.

Previously we only returned the last matching prefix length
encountered during a trie lookup, and skipped subtables that had
prefixes longer than that.  This patch changes the trie lookup
functions to return all matching prefix lengths seen, so that all
non-matching prefix lengths can be skipped.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Change local variable names.
Jarno Rajahalme [Fri, 18 Jul 2014 09:24:26 +0000 (02:24 -0700)]
lib/classifier: Change local variable names.

These stylistic changes makes the following patch a bit simpler.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Unify struct classifier and cls_classifier.
Jarno Rajahalme [Fri, 18 Jul 2014 09:24:26 +0000 (02:24 -0700)]
lib/classifier: Unify struct classifier and cls_classifier.

Now that it is clear that struct cls_classifier itself does not
need RCU indirection and pvector is defined in its own header, it
is possible get rid of the indirection from struct classifier to
struct cls_classifier.

Suggested-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agonetdev-dpdk: Refactor dpdk_class_init()
Daniele Di Proietto [Thu, 17 Jul 2014 00:10:59 +0000 (17:10 -0700)]
netdev-dpdk: Refactor dpdk_class_init()

The following changes were made:

- Since we have two dpdk classes, we should split the initial operations needed
  by both classes from the initialization needed by each class.
- The dpdk_ring class does not need an initialization function: it has been
  removed. This also prevents many testcase from failing, because
  dpdk_ring_class_init() was printing an unexpected log message
  (OVS_VSWITCHD_START at tests/ofproto-macros.at:54 check for a specific set of
  startup log messages)
- If the user doesn't pass the --dpdk option we do not register the dpdk*
  classes
- Do not call VLOG_ERR if there are 0 dpdk ethernet device. OVS can now be used
  with dpdk_ring devices.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoofproto: Report controller rate limiting statistics in database.
Ben Pfaff [Thu, 17 Jul 2014 17:31:03 +0000 (10:31 -0700)]
ofproto: Report controller rate limiting statistics in database.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agosmap: New function smap_add_nocopy().
Ben Pfaff [Wed, 16 Jul 2014 20:34:03 +0000 (13:34 -0700)]
smap: New function smap_add_nocopy().

An upcoming commit will add a caller that needs to format both the key and
the value.  That isn't cleanly possible with the current interface.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agoSimplify ofproto_controller_info by using a struct smap in place of array.
Ben Pfaff [Thu, 17 Jul 2014 17:27:00 +0000 (10:27 -0700)]
Simplify ofproto_controller_info by using a struct smap in place of array.

The only client for ofproto_controller_info was transforming the array of
pairs into an smap anyway.  It's easy for the code that fills in the array
to generate it as an smap directly, and it's also easier to extend later.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agopinsched: Report queued packet count correctly.
Ben Pfaff [Thu, 17 Jul 2014 17:23:52 +0000 (10:23 -0700)]
pinsched: Report queued packet count correctly.

'n_txq' was initialized to 0 and never modified, so pinsched_count_txqlen()
always returned 0.  Instead, return the correct number.

This only affected the results of "ovs-appctl memory/show", and only if
controller rate limiting was turned on, so it is not a serious bug.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agovswitch.xml: Fix typo in documentation.
Ben Pfaff [Wed, 16 Jul 2014 20:34:22 +0000 (13:34 -0700)]
vswitch.xml: Fix typo in documentation.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agostp: Make stp-disabled port forward stp bpdu packets.
Alex Wang [Wed, 16 Jul 2014 01:52:19 +0000 (18:52 -0700)]
stp: Make stp-disabled port forward stp bpdu packets.

Commit 0d1cee123a84 (stp: Fix bpdu tx problem in listening state)
makes ovs drop the stp bpdu packets if stp is not enabled on the
input port.  However, when pif bridge is used and stp is enabled
on the integration bridge.  The flow translation of stp bpdu
packets will go through a level of resubmission which changes
the input port to the corresponding peer port.  Since, the
patch port on the pif bridge does not have stp enabled, the
flow translation will drop the bpdu packets.

This commit fixes the issue by making ovs forward stp bpdu packets
on stp-disabled port.

VMware-BZ: #1284695

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agocmap: Fix cmap_next_position()
Daniele Di Proietto [Wed, 16 Jul 2014 17:46:20 +0000 (10:46 -0700)]
cmap: Fix cmap_next_position()

cmap_next_position() didn't update the node pointer while iterating through a
list of nodes with the same hash.
This commit fixes the bug and improve test-cmap to detect it.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agonetdev-dpdk: add dpdk rings to netdev-dpdk
maryam.tahhan [Fri, 11 Jul 2014 12:37:11 +0000 (13:37 +0100)]
netdev-dpdk: add dpdk rings to netdev-dpdk

Shared memory ring patch

This patch enables the client dpdk rings within the netdev-dpdk.  It adds
a new dpdk device called dpdkr (other naming suggestions?).  This allows
for the use of shared memory to communicate with other dpdk applications,
on the host or within a virtual machine.  Instructions for use are in
INSTALL.DPDK.

This has been tested on Intel multi-core platforms and with the client
application within the host.

Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetlink-socket: Simplify multithreaded dumping to match Linux reality.
Ben Pfaff [Wed, 16 Jul 2014 16:39:49 +0000 (09:39 -0700)]
netlink-socket: Simplify multithreaded dumping to match Linux reality.

Commit 0791315e4d (netlink-socket: Work around kernel Netlink dump thread
races.) introduced a simple workaround for Linux kernel races in Netlink
dumps.  However, the code remained more complicated than needed.  This
commit simplifies it.

The main reason for complication in the code was 'status_seq' in nl_dump.
This member was there to allow a thread to wait for some other thread to
refill the socket buffer with another dump message (although we did not
understand the reason at the time it was introduced).  Now that we know
that Netlink dumps properly need to be serialized to work in existing
Linux kernels, there's no additional value in having 'status_seq',
because serialized recvmsg() calls always refill the socket buffer
properly.

This commit updates nl_msg_next() to clear its buffer argument on error.
This is a more convenient interface for the new version of the Netlink
dump code.  nl_msg_next() doesn't have any other callers.

Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoFactor the ovsdb-server main loop into a new function
Eric Sesterhenn [Fri, 11 Jul 2014 11:24:06 +0000 (13:24 +0200)]
Factor the ovsdb-server main loop into a new function

This refactors the ovsdb-server main loop into a new function, which allows
to call it from multiple places.

Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoCodingStyle: Add suggested GNU indent options.
Ben Pfaff [Tue, 24 Jun 2014 18:44:16 +0000 (11:44 -0700)]
CodingStyle: Add suggested GNU indent options.

Suggested-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agodpif: Update documentation for RCU-protected actions.
Joe Stringer [Tue, 15 Jul 2014 12:04:52 +0000 (12:04 +0000)]
dpif: Update documentation for RCU-protected actions.

The userspace datapath returns RCU-protected actions from flow_get() and
flow_dump_next(). This doesn't cause any trouble for current users of
these functions, but it imposes additional constraints on their use.
This patch makes the dpif documentation more explicit about how the
results of these functions can be used.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agotests: Use the 'LARGE_MSECS' variation of time/warp at more places.
Gurucharan Shetty [Tue, 15 Jul 2014 15:16:52 +0000 (08:16 -0700)]
tests: Use the 'LARGE_MSECS' variation of time/warp at more places.

commit 8661af798(timeval: Provide a variation for time/warp command)
provided a variation of time/warp command to prevent multiple
invocations of time/warp. That commit changed only the users in bfd.at
and cfm.at as they used it the most. Since we haven't had any negative
confequences because of the change, add the remaining users.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodebian: Automatically start openvswitch before first invocation of ovs-vsctl.
Gurucharan Shetty [Fri, 11 Jul 2014 21:50:50 +0000 (14:50 -0700)]
debian: Automatically start openvswitch before first invocation of ovs-vsctl.

In the 'interfaces' file, if an admin adds the openvswitch interface
in 'auto', ifupdown will try to create OVS interfaces even before
openvswitch has started. In a case like that, assume that the admin
knows what he is doing and try to start openvswitch.

The negatives I see are
1. /usr is NFS mounted, in which case expect the admin
to mount it through initramfs.
2. syslog through network may not be accessible.

This is similar to what rhel does with commit 602453000e28ec10
(rhel: Automatically start openvswitch service before bringing an ovs
interface up or down)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodebian: Option to create patch ports through 'interfaces'
Gurucharan Shetty [Fri, 11 Jul 2014 21:32:48 +0000 (14:32 -0700)]
debian: Option to create patch ports through 'interfaces'

This is a port of commit d7aab661 ( rhel: Add Patch Port support to
initscripts.) from rhel to debian's ifupdown script.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/coverage: Removed set but not used variables
Daniele Di Proietto [Tue, 15 Jul 2014 18:32:36 +0000 (11:32 -0700)]
lib/coverage: Removed set but not used variables

This removes a GCC 4.9 warning (unused-but-set-variable)

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Check for NULL upcall_portids.
Pravin B Shelar [Tue, 15 Jul 2014 18:12:11 +0000 (11:12 -0700)]
datapath: Check for NULL upcall_portids.

Following patch adds NULL check for memory allocated
by kmalloc.

Reported-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetlink-socket: Fix handling socket allocation failure in nl_dump_start().
Ben Pfaff [Mon, 14 Jul 2014 21:06:03 +0000 (14:06 -0700)]
netlink-socket: Fix handling socket allocation failure in nl_dump_start().

If nl_pool_alloc() failed, then 'dump' was not initialized at all and
further use of the dump would access uninitialized data, probably causing
a crash.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agonetlink-socket: Refill comment to fit within 79 columns.
Ben Pfaff [Mon, 14 Jul 2014 20:40:18 +0000 (13:40 -0700)]
netlink-socket: Refill comment to fit within 79 columns.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agodpif-linux: Avoid null dereference if all ports disappear.
Ben Pfaff [Mon, 14 Jul 2014 20:17:05 +0000 (13:17 -0700)]
dpif-linux: Avoid null dereference if all ports disappear.

When dpif_linux_refresh_channels() refreshes the set of channels when
the number of handlers changes, it destroys all the dpif's channels and
sets dpif->uc_array_size to 0.  If the port dump later in the function
turns up no ports (which generally indicates a bug), then no channels will
be allocated and thus dpif->uc_array_size will remain 0 and 'channels' will
be null in each handler.  This is self-consistent, at least, but
dpif_linux_port_get_pid__() was still willing in this situation to
try to access element 0 of the set of channels, dereferencing a null
pointer.

This fixes the problem.

I encountered this while looking at a bug that I had introduced during
development that caused the port dump to always be empty.  It would be
difficult to encounter in normal use.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agoofp-msgs: Correct code for queue configuration messages in OpenFlow 1.0.
Ben Pfaff [Mon, 14 Jul 2014 17:37:38 +0000 (10:37 -0700)]
ofp-msgs: Correct code for queue configuration messages in OpenFlow 1.0.

Reported-by: Simon Jouet <simon.jouet@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoFix documentation error that led user to wrong file to install dependency packages.
Kirkland Spector [Tue, 15 Jul 2014 16:48:20 +0000 (09:48 -0700)]
Fix documentation error that led user to wrong file to install dependency packages.

Signed-off-by: Kirkland Spector <kspector@salesforce.com>
Acked-by: Andrey Falko <afalko@salesforce.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoDrop assignments whose values are never used.
Ben Pfaff [Mon, 14 Jul 2014 21:29:20 +0000 (14:29 -0700)]
Drop assignments whose values are never used.

Found by clang-analyzer.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agocoverage: Move m_idx, h_idx to an inner scope in coverage_run().
Ben Pfaff [Mon, 14 Jul 2014 21:31:36 +0000 (14:31 -0700)]
coverage: Move m_idx, h_idx to an inner scope in coverage_run().

These variables were initialized in an outer scope and then immediately
changed in an inner one, so they might as well be farther in.

Found by clang-analyzer.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agoofproto: Avoid theoretical double free of large rule collections.
Ben Pfaff [Mon, 14 Jul 2014 21:47:12 +0000 (14:47 -0700)]
ofproto: Avoid theoretical double free of large rule collections.

collect_rules_strict() and collect_rules_loose() destroy the rule
collections that they create if they return an error, and some of their
callers then go on to destroy them again.  This could cause a double-free
in the case where rule_collection_destroy() actually calls free().  That
never happens in the current tree, because free() is only necessary if
malloc() was called and there's a 64-entry stub that none of the current
code in collect_rules_*() can fill up in their error cases.  Still, it
seems better to fix the problem.

Found by clang-analyzer.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agoofp-util: Fix null pointer dereference in ofputil_pull_buckets().
Ben Pfaff [Mon, 14 Jul 2014 21:33:01 +0000 (14:33 -0700)]
ofp-util: Fix null pointer dereference in ofputil_pull_buckets().

Found by clang-analyzer.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agorevalidator: Revalidate missed flows.
Joe Stringer [Tue, 8 Jul 2014 07:04:05 +0000 (07:04 +0000)]
revalidator: Revalidate missed flows.

If the datapath doesn't dump a flow for some reason, and the current
dump is expected to revalidate all flows in the datapath, then perform
revalidation for those flows by fetching them during the sweep phase.
If revalidation is not required, then leave the flow in the datapath and
don't revalidate it.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodpif: Support fetching flow mask via dpif_flow_get().
Joe Stringer [Tue, 8 Jul 2014 07:04:04 +0000 (07:04 +0000)]
dpif: Support fetching flow mask via dpif_flow_get().

Change the interface to allow implementations to pass back a buffer, and
allow callers to specify which of actions, mask, and stats they wish to
receive. This will be used in the next commit.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb-server.at: Skip tests that use ovsdb-server's "--run" on Windows.
Gurucharan Shetty [Mon, 16 Jun 2014 20:49:18 +0000 (13:49 -0700)]
ovsdb-server.at: Skip tests that use ovsdb-server's "--run" on Windows.

ovsdb-server's port on Windows does not support the "--run" option.
The two tests skipped in this commit make use of "--run" option to
test ovsdb-server's truncating of corrupt log or bad transaction.
It looks a little tricky to get this test running on Windows without
the "--run" option implemented.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Refactor ovs_flow_cmd_fill_info().
Joe Stringer [Thu, 26 Jun 2014 00:10:37 +0000 (12:10 +1200)]
datapath: Refactor ovs_flow_cmd_fill_info().

Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a
dump reply. This will be used to streamline flow_dump in a future patch.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agotests: Disable glibc memory checking under glibc <= 2.11.
Ben Pfaff [Fri, 11 Jul 2014 18:03:08 +0000 (11:03 -0700)]
tests: Disable glibc memory checking under glibc <= 2.11.

We noticed that the unit tests sometimes fail on XenServer inside glibc's
memory checker, in the free_check() function.  It turns out that the
glibc memory checker in glibc 2.11 and earlier had an internal race that
caused false positives in multithreaded programs.

This commit avoids the problem by disabling the glibc memory checker in
glibc 2.11 and earlier.

VMware-BZ: #1267127
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agodatapath/flow_netlink: Create right mask with disabled megaflows
Daniele Di Proietto [Fri, 11 Jul 2014 17:01:17 +0000 (10:01 -0700)]
datapath/flow_netlink: Create right mask with disabled megaflows

If megaflows are disabled, the userspace does not send the netlink attribute
OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask.

sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the
bytes that represent padding for struct sw_flow, or the bytes that represent
fields that may not be set during ovs_flow_extract().
This is a problem, because when we extract a flow from a packet,
we do not memset() anymore the struct sw_flow to 0 (since commit 9cef26ac6a71).

This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(),
which operates on the netlink attributes rather than on the mask key. Using
this approach we are sure that only the bytes that the user provided in the
flow are matched.

Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we
now return with an error.

Reported-by: Alex Wang <alexw@nicira.com>
Suggested-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: Enable tunnel GSO features.
Pravin B Shelar [Thu, 3 Jul 2014 18:38:29 +0000 (11:38 -0700)]
datapath: Enable tunnel GSO features.

Following patch enables all available tunnel GSO features for OVS
bridge device so that ovs can use hardware offloads available to
underling device.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
9 years agolib/hash: Use CRC32 for hashing.
Jarno Rajahalme [Fri, 11 Jul 2014 12:57:11 +0000 (05:57 -0700)]
lib/hash: Use CRC32 for hashing.

Use CRC32 intrinsics for hash computations when building for
X86_64 with SSE4_2.

Add a new hash_words64() and change hash_words() to be inlined when
'n_words' is a compile-time constant.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Lockless lookups.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:08 +0000 (02:29 -0700)]
lib/classifier: Lockless lookups.

Now that all the relevant classifier structures use RCU and internal
mutual exclusion for modifications, we can remove the fat-rwlock and
thus make the classifier lookups lockless.

As the readers are operating concurrently with the writers, a
concurrent reader may or may not see a new rule being added by a
writer, depending on how the concurrent events overlap with each
other.  Overall, this is no different from the former locked behavior,
but there the visibility of the new rule only depended on the timing
of the locking functions.

A new rule is first added to the segment indices, so the readers may
find the rule in the indices before the rule is visible in the
subtables 'rules' map.  This may result in us losing the opportunity
to quit lookups earlier, resulting in sub-optimal wildcarding.  This
will be fixed by forthcoming revalidation always scheduled after flow
table changes.

Similar behavior may happen due to us removing the overlapping rule
(if any) from the indices only after the corresponding new rule has
been added.

The subtable's max priority is updated only after a rule is inserted
to the maps, so the concurrent readers may not see the rule, as the
updated priority ordered subtable list will only be visible after the
subtable's max priority is updated.

Similarly, the classifier's partitions are updated by the caller after
the rule is inserted to the maps, so the readers may keep skipping the
subtable until they see the updated partitions.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: RCUify prefix trie code.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:08 +0000 (02:29 -0700)]
lib/classifier: RCUify prefix trie code.

cls_set_prefix_fields() now synchronizes explicitly with the readers,
waiting them to finish using the old configuration before changing to
the new configuration.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Use internal mutex.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:08 +0000 (02:29 -0700)]
lib/classifier: Use internal mutex.

Add an internal mutex to struct cls_classifier, and reorganize
classifier internal structures according to the user of each field,
marking the fields that need to be protected by the mutex.  This makes
locking requirements easier to track, and may make lookup more memory
efficient.

After this patch there is some double locking, as callers are taking
the fat-rwlock, and we take the mutex internally.  A following patch
will remove the classifier fat-rwlock, removing the (double) locking
overhead.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>