cascardo/ovs.git
9 years agolib/classifier: Stylistic change.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:08 +0000 (02:29 -0700)]
lib/classifier: Stylistic change.

Rename 'nbits' as 'n_bits' to be more consistent with other count-like
fields.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/ovs-rcu: Export ovsrcu_synchronize().
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:07 +0000 (02:29 -0700)]
lib/ovs-rcu: Export ovsrcu_synchronize().

A following patch will add the first external user.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Simplify iteration with C99 declaration.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:07 +0000 (02:29 -0700)]
lib/classifier: Simplify iteration with C99 declaration.

Hide the cursor from the classifier iteration users and move locking to
the iterators.  This will make following RCU changes simpler, as the call
sites of the iterators need not be changed at that point.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agolib/classifier: Use cmap.
Jarno Rajahalme [Fri, 11 Jul 2014 09:29:07 +0000 (02:29 -0700)]
lib/classifier: Use cmap.

Use cmap instead of hmap & hindex in classifier.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agonetlink-socket: Work around kernel Netlink dump thread races.
Ben Pfaff [Thu, 10 Jul 2014 23:48:16 +0000 (16:48 -0700)]
netlink-socket: Work around kernel Netlink dump thread races.

The Linux kernel Netlink implementation has two races that cause problems
for processes that attempt to dump a table in a multithreaded manner.

The first race is in the structure of the kernel netlink_recv() function.
This function pulls a message from the socket queue and, if there is none,
reports EAGAIN:
skb = skb_recv_datagram(sk, flags, noblock, &err);
if (skb == NULL)
goto out;
Only if a message is successfully read from the socket queue does the
function, toward the end, try to queue up a new message to be dumped:
if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
ret = netlink_dump(sk);
if (ret) {
sk->sk_err = ret;
sk->sk_error_report(sk);
}
}
This means that if thread A reads a message from a dump, then thread B
attempts to read one before A queues up the next, B will get EAGAIN.  This
means that, following EAGAIN, B needs to wait until A returns to userspace
before it tries to read the socket again.  nl_dump_next() already does
this, using 'dump->status_seq' (although the need for it has never been
explained clearly, to my knowledge).

The second race is more serious.  Suppose thread X and thread Y both
simultaneously attempt to queue up a new message to be dumped, using the
call to netlink_dump() quoted above.  netlink_dump() begins with:
mutex_lock(nlk->cb_mutex);

cb = nlk->cb;
if (cb == NULL) {
err = -EINVAL;
goto errout_skb;
}
Suppose that X gets cb_mutex first and finds that the dump is complete.  It
will therefore, toward the end of netlink_dump(), clear nlk->cb to NULL to
indicate that no dump is in progress and release the mutex:
nlk->cb = NULL;
mutex_unlock(nlk->cb_mutex);
When Y grabs cb_mutex afterward, it will see that nlk->cb is NULL and
return -EINVAL as quoted above.  netlink_recv() stuffs -EINVAL in sk_err,
but that error is not reported immediately; instead, it is saved for the
next read from the socket.  Since Open vSwitch maintains a pool of Netlink
sockets, that next failure can crop up pretty much anywhere.  One of the
worst places for it to crop up is in the execution of a later transaction
(e.g. in nl_sock_transact_multiple__()), because userspace treats Netlink
transactions as idempotent and will re-execute them when socket errors
occur.  For a transaction that sends a packet, this causes packet
duplication, which we actually observed in practice.  (ENOBUFS should
actually cause transactions to be re-executed in many cases, but EINVAL
should not; this is a separate bug in the userspace netlink code.)

VMware-BZ: #1283188
Reported-and-tested-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agonetlink-socket: Fix sign of error code.
Ben Pfaff [Thu, 10 Jul 2014 21:32:10 +0000 (14:32 -0700)]
netlink-socket: Fix sign of error code.

Commit 8f20fd98db (netlink-socket: Work around upstream kernel Netlink
bug.) got the sign of the error code wrong, so that it reported e.g. -22
for EINVAL to nl_sock_recv__()'s caller, instead of 22.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agodatapath: refactor do_output() to move skb_clone NULL check out of fast path
Andy Zhou [Thu, 10 Jul 2014 07:50:26 +0000 (00:50 -0700)]
datapath: refactor do_output() to move skb_clone NULL check out of fast path

skb_clone() NULL check is implemented in do_output(), as past of the
common (fast) path. Refactoring so that NULL check is done in the
slow path, immediately after skb_clone() is called.

Besides optimization, this change also improves code readability by
making the skb_clone() NULL check consistent within OVS datapath
module.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: add skb_clone NULL check for the sampling action.
Andy Zhou [Thu, 10 Jul 2014 07:49:06 +0000 (00:49 -0700)]
datapath: add skb_clone NULL check for the sampling action.

Fix a bug where skb_clone() NULL check is missing in sample action
implementation.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: add skb_clone NULL check in the recirc action.
Andy Zhou [Thu, 10 Jul 2014 07:30:27 +0000 (00:30 -0700)]
datapath: add skb_clone NULL check in the recirc action.

Refactoring recirc action implementation.

The main change is to fix a bug where the NULL check after skb clone()
call is missing.  The fix is to return -ENOMEM whenever skb_clone()
failed to create a clone.

Also rearranged adjacent code to improve readability.

Reported-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetlink-protocol: Remove definition of struct sockaddr_nl.
Ben Pfaff [Thu, 10 Jul 2014 20:21:35 +0000 (13:21 -0700)]
netlink-protocol: Remove definition of struct sockaddr_nl.

This struct is used only in netlink-socket.c, which is only used on Linux,
which in turn gets the definition from <linux/netlink.h>.  On Windows the
definition actually causes a small amount of trouble because Windows does
not define sa_family_t (despite POSIX), so it's better to remove it.

Even if other platforms adopt Netlink, I have no reason to believe that
they will use the same sockaddr format as Linux.

CC: Saurabh Shah <ssaurabh@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agodatapath/flow_netlink: Fix NDP flow mask validation
Daniele Di Proietto [Wed, 9 Jul 2014 22:22:14 +0000 (15:22 -0700)]
datapath/flow_netlink: Fix NDP flow mask validation

match_validate() enforce that a mask matching on NDP attributes has also an
exact match on ICMPv6 type.
The ICMPv6 type, which is 8-bit wide, is stored in the 'tp.src' field of
'struct sw_flow_key', which is 16-bit wide.
Therefore, an exact match on ICMPv6 type should only check the first 8 bits.

This commit fixes a bug that prevented flows with an exact match on NDP field
from being installed

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodaemon: restart child process if it died before signaling its readiness
Ansis Atteka [Tue, 8 Jul 2014 00:11:39 +0000 (17:11 -0700)]
daemon: restart child process if it died before signaling its readiness

The child process (the one being monitored) could die before it was able
to call fork_notify_startup() function.  If such situation arises, then
parent process (the one monitoring child process) would also terminate
with a fatal log message:

...|EMER|fork child died before signaling startup (killed (...))

This patch changes that behavior by always restarting child process
if it was able to start up at least once in the past.  However, if
child was not able to start up even once, then the monitor process
would still terminate, because that would most likely indicate a
persistent programming or system error.

To reproduce use following script:

while : ; do kill -SIGSEGV `cat /var/run/openvswitch/ovs-vswitchd.pid`; done

Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
VMware-BZ: 1273550

9 years agotimeval: Initialize 'unix_epoch' for Windows correctly.
Gurucharan Shetty [Thu, 26 Jun 2014 04:27:36 +0000 (21:27 -0700)]
timeval: Initialize 'unix_epoch' for Windows correctly.

Till now, we were initializing 'unix_epoch' through time_init().
But if there was a call directly to xgettimeofday(), we would
miss the initialization causing overflows. This commit fixes it
by pre-calculating the value and assigning it globally.

Also add-in a missing return statement.

Reported-by: Alessandro Pilotti <apilotti@cloudbasesolutions.com>
Reported-by: Linda Sun <lsun@vmware.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodpif-linux: Recheck the socket pointer existence before getting its pid.
Alex Wang [Tue, 8 Jul 2014 04:58:33 +0000 (21:58 -0700)]
dpif-linux: Recheck the socket pointer existence before getting its pid.

This commit fixes a race between port deletion and flow miss handling.
More specifically, a port could be removed by main thread while
the handler thread is handling the flow miss from it.  If the flow
requires slow path action, the handler thread will try querying a pid
from port's socket.  Since the port has been deleted, the query will
cause a dereference of NULL socket pointer.

This commit makes the handler thread recheck the socket pointer before
dereferencing it.

VMware-BZ: 1251981

Reported-by: Pratap Reddy <preddy@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agotests: Fix "megaflow disabled" test.
Joe Stringer [Tue, 8 Jul 2014 09:01:20 +0000 (09:01 +0000)]
tests: Fix "megaflow disabled" test.

The previous 'grep' logic would occasionally catch unintended flows.
There's no reason to check for these flows separately, so combine the
two checks.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ansis Atteka <aatteka@nicira.com>
9 years agotests: Fix race in 'balance-tcp bonding' test.
Joe Stringer [Fri, 4 Jul 2014 06:58:28 +0000 (06:58 +0000)]
tests: Fix race in 'balance-tcp bonding' test.

Running the test in a tight loop could cause this test to fail after
about 5 runs, with some of the ports reporting "may_enable: false" in
the "ovs-appctl bond/show" output. This commit fixes the race condition
by waiting for may_enable to be true for all bond ports.

I suspect that LACP negotiation finishes, but the main thread doesn't
have a chance to enable the ports before we send the test packets.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoutil: fix compile warnings
Ansis Atteka [Tue, 8 Jul 2014 04:11:53 +0000 (04:11 +0000)]
util: fix compile warnings

This patch fixes two compile warnings introduced by commit
64b73291 ("util: create a copy of program_name"):
1. ../lib/util.c:457:5: error: passing argument 1 of 'free'
   discards 'const' qualifier from pointer target type; And
2. ../lib/util.c:463:5: error: ISO C90 forbids mixed declarations
   and code [-Werror=declaration-after-statement] (affected only
   branch-2.3 that is C90 compliant and not the master)

Reported-By: Joe Stringer <jstringer@nicira.com>
Reported-By: Lorand Jakab <lojakab@cisco.com>
Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
9 years agoovsdb: Frees database memory on ovsdb process cleanup.
Ryan Wilson [Wed, 2 Jul 2014 22:00:16 +0000 (15:00 -0700)]
ovsdb: Frees database memory on ovsdb process cleanup.

This fixes valgrind errors.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Remove unused typedef cls_cb_func.
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:47 +0000 (13:18 -0700)]
lib/classifier: Remove unused typedef cls_cb_func.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif: Use ovs_refcount_try_ref_rcu().
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:46 +0000 (13:18 -0700)]
ofproto-dpif: Use ovs_refcount_try_ref_rcu().

This is a prerequisite step in making the classifier lookups lockless.
If taking a reference fails, we do the lookup again, as a new (lower
priority) rule may now match instead.

Also remove unwildcarding dl_type and nw_frag, as these are already
taken care of by xlate_actions().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoUse ovs_refcount_unref_relaxed.
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:46 +0000 (13:18 -0700)]
Use ovs_refcount_unref_relaxed.

After a quick analysis, in most cases the access to refcounted objects
is clearly protected either with an explicit lock/mutex, or RCU. there
are only a few places where I left a call to ovs_refcount_unref().
Upon closer analysis it may well be that those could also use the
relaxed form.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-atomic: Add ovs_refcount_unref_relaxed(), ovs_refcount_try_ref_rcu().
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:46 +0000 (13:18 -0700)]
lib/ovs-atomic: Add ovs_refcount_unref_relaxed(), ovs_refcount_try_ref_rcu().

When a reference counted object is also RCU protected the deletion of
the object's memory is always postponed.  This allows
memory_order_relaxed to be used also for unreferencing, as RCU
quiescing provides a full memory barrier (it has to, or otherwise
there could be lingering accesses to objects after they are recycled).

Also, when access to the reference counted object is protected via a
mutex or a lock, the locking primitives provide the required memory
barrier functionality.

Also, add ovs_refcount_try_ref_rcu(), which takes a reference only if
the refcount is non-zero and returns true if a reference was taken,
false otherwise.  This can be used in combined RCU/refcount scenarios
where we have an RCU protected reference to an refcounted object, but
which may be unref'ed at any time.  If ovs_refcount_try_ref_rcu()
fails, the object may still be safely used until the current thread
quiesces.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/ovs-atomic: Add atomic compare_exchange.
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:46 +0000 (13:18 -0700)]
lib/ovs-atomic: Add atomic compare_exchange.

Add support for atomic compare_exchange operations.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-atomic: Use explicit memory order for ovs_refcount.
Jarno Rajahalme [Mon, 7 Jul 2014 20:18:46 +0000 (13:18 -0700)]
ovs-atomic: Use explicit memory order for ovs_refcount.

Use explicit variants of atomic operations for the ovs_refcount to
avoid the overhead of the default memory_order_seq_cst.

Adding a reference requires no memory ordering, as the calling thread
is already assumed to have protected access to the object being
reference counted.  Hence, memory_order_relaxed is used for
ovs_refcount_ref().  ovs_refcount_read() does not change the reference
count, so it can also use memory_order_relaxed.

Unreferencing an object needs a release barrier, so that none of the
accesses to the protected object are reordered after the atomic
decrement operation.  Additionally, an explicit acquire barrier is
needed before the object is recycled, to keep the subsequent accesses
to the object's memory from being reordered before the atomic
decrement operation.

This patch follows the memory ordering and argumentation discussed
here:

http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/hash: Abstract hash interface.
Jarno Rajahalme [Fri, 4 Jul 2014 14:57:18 +0000 (07:57 -0700)]
lib/hash: Abstract hash interface.

Use generic names hash_add() and hash_finish() instead of mhash_*
equivalents.  This makes future changes to hash implentations more
localized.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodpif-netdev: Use cmap instead of hmap.
Jarno Rajahalme [Fri, 4 Jul 2014 13:38:47 +0000 (06:38 -0700)]
dpif-netdev: Use cmap instead of hmap.

This requires less locking and makes introducing lockless classifier
lookups possible.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoutil: Suppress a warning by adding CONST_CAST
YAMAMOTO Takashi [Fri, 4 Jul 2014 14:14:57 +0000 (14:14 +0000)]
util: Suppress a warning by adding CONST_CAST

Suppress a gcc warning which was introduced by
commit e0b48482c16b6eaa7f14d8c7e7c6275528881b9e.
("util: create a copy of program_name")
I guess MSVC doesn't have a corresponding warning.

Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Lorand Jakab <lojakab@cisco.com>
9 years agoutil: create a copy of program_name
Ansis Atteka [Thu, 3 Jul 2014 20:41:02 +0000 (13:41 -0700)]
util: create a copy of program_name

Commit 8a9562 ("dpif-netdev: Add DPDK netdev.") reversed sequence
in which set_program_name() and proctitle_init() functions are
called.  This introduced a regression where program_name and argv_start
would point to exactly the same memory (previously both of these
pointers were pointing to different memory locations because
proctitle_init() would have beforehand created a copy of argv[0]
for the succeeding set_program_name() call).

This regression on my system caused ovs-vswitchd monitoring process to
show up without process name:

...  00:00:00 : monitoring pid 26308 (healthy)

Ps output was lacking process name because following code was
using overlapping memory for source and target buffer:.

proctitle_set(const char *format, ...)
{
    ...
    n = snprintf(argv_start, argv_size, "%s: ", program_name);

Overall C99 and POSIX standards state that behavior is undefined
if source and target buffers overlap.

Signed-Off-By: Ansis Atteka <aatteka@nicira.com>
Acked-By: Ben Pfaff <blp@nicira.com>
9 years agorevalidator: Simplify push_dump_ops__().
Joe Stringer [Tue, 1 Jul 2014 09:54:18 +0000 (09:54 +0000)]
revalidator: Simplify push_dump_ops__().

Commit acaa8dac49 (revalidator: Eliminate duplicate flow handling.)
ensured that a ukey will always exist for a given flow, even if it is
about to be deleted. This means that push_dump_ops__() no longer needs
to handle the case where there is no ukey. This commit removes the
redundant code.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Additional logging for -EINVAL on flow setups.
Jesse Gross [Wed, 2 Jul 2014 00:41:51 +0000 (17:41 -0700)]
datapath: Additional logging for -EINVAL on flow setups.

There are many possible ways that a flow can be invalid so we've
added logging for most of them. This adds logs for the remaining
possible cases so there isn't any ambiguity while debugging.

CC: Federico Iezzi <fiezzi@enter.it>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
9 years agonetlink-socket: Work around upstream kernel Netlink bug.
Ben Pfaff [Mon, 30 Jun 2014 21:57:42 +0000 (14:57 -0700)]
netlink-socket: Work around upstream kernel Netlink bug.

The upstream kernel net/netlink/af_netlink.c netlink_recvmsg() contains the
following code to refill the Netlink socket buffer with more dump skbs
while a dump is in progress:

if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
ret = netlink_dump(sk);
if (ret) {
sk->sk_err = ret;
sk->sk_error_report(sk);
}
}

The netlink_dump() function that this calls returns a negative number on
error, the convention used throughout the kernel, and thus sk->sk_err
receives a negative value on error.

However, sk->sk_err is supposed to contain either 0 or a positive errno
value, as one can see from a quick "grep" through net for 'sk_err =', e.g.:

    ipv4/tcp.c:2067: sk->sk_err = ECONNRESET;
    ipv4/tcp.c:2069: sk->sk_err = ECONNRESET;
    ipv4/tcp_input.c:4106: sk->sk_err = ECONNREFUSED;
    ipv4/tcp_input.c:4109: sk->sk_err = EPIPE;
    ipv4/tcp_input.c:4114: sk->sk_err = ECONNRESET;
    netlink/af_netlink.c:741: sk->sk_err = ENOBUFS;
    netlink/af_netlink.c:1796: sk->sk_err = ENOBUFS;
    packet/af_packet.c:2476: sk->sk_err = ENETDOWN;
    unix/af_unix.c:341: other->sk_err = ECONNRESET;
    unix/af_unix.c:407: skpair->sk_err = ECONNRESET;

The result is that the next attempt to receive from the socket will return
the error to userspace with the wrong sign.

(The root of the error in this case is that multiple threads are attempting
to read a single flow dump from a shared fd.  That should work, but the
kernel has an internal race that can result in one or more of those threads
hitting the EINVAL case at the start of netlink_dump().  The EINVAL is
harmless in this case and userspace should be able to ignore it, but
reporting the EINVAL as if it were a 22-byte message received in userspace
throws a real wrench in the works.)

This bug makes me think that there are probably not many programs doing
multithreaded Netlink dumps.  Maybe it is good that we are considering
other approaches.

VMware-BZ: #1255704
Reported-by: Mihir Gangar <gangarm@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agoINSTALL: Mention conflict with NET_IPGRE setting before Linux 3.11.
Ben Pfaff [Wed, 2 Jul 2014 18:52:41 +0000 (11:52 -0700)]
INSTALL: Mention conflict with NET_IPGRE setting before Linux 3.11.

I found when reconfiguring my kernel that if I turned on NET_IPGRE, GRE
tunnels no longer worked.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agoINSTALL.DPDK: fix a typo in remote line
Flavio Leitner [Wed, 2 Jul 2014 18:47:41 +0000 (15:47 -0300)]
INSTALL.DPDK: fix a typo in remote line

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
9 years agoINSTALL.Fedora, INSTALL.RHEL: Make instructions more explicit.
Ben Pfaff [Wed, 2 Jul 2014 19:50:08 +0000 (12:50 -0700)]
INSTALL.Fedora, INSTALL.RHEL: Make instructions more explicit.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
9 years agorevalidator: Improve optimization to skip revalidation.
Joe Stringer [Wed, 2 Jul 2014 07:41:33 +0000 (07:41 +0000)]
revalidator: Improve optimization to skip revalidation.

The should_revalidate() optimisation introduced with commit 698ffe3623
(revalidator: Only revalidate high-throughput flows.) was a little
aggressive, occasionally deleting flows even when OVS is quite capable
of performing full revalidation.

This commit modifies the logic to:
* Firstly, check if we are likely to handle full revalidation, and
  attempt that instead.
* Secondly, fall back to the existing flow throughput estimations to
  determine whether to revalidate the flow or just delete it.

VMware-BZ: #1271926

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agodatapath: Allow pop and push MPLS actions after pop VLAN
Simon Horman [Mon, 30 Jun 2014 04:20:14 +0000 (13:20 +0900)]
datapath: Allow pop and push MPLS actions after pop VLAN

This patch loosens the restrictions surrounding push and pop MPLS actions
such that they will be allowed after a pop VLAN action if the inner
ethernet type is acceptable for pop and push MPLS actions. This implies
that there is only one VLAN tag present.

Some analysis of logic of this change is as follows:

    The purpose of tracking vlan_tci is to allow prohibition of push
    and pop MPLS actions in the presence of a VLAN. In this scenario
    the VLAN_TAG_PRESENT bit of vlan_tci is set and eth_type is that of
    the packet with the outermost VLAN tag removed.

    A pop VLAN action may clear vlan_tci as it removes the outermost
    VLAN tag and the push and pop MPLS logic may rely on eth_type for
    their prohibition logic.

    This will not allow push and pop MPLS on packets with multiple VLAN
    tags, regardless of if they are all remove using POP VLAN, as there
    is no mechanism to expose the inner ethernet type beyond that of
    the outermost VLAN tag.

Suggested-by: Jesse Gross <jgross@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
9 years agodatapath: Use exact lookup for flow_get and flow_del.
Alex Wang [Mon, 30 Jun 2014 21:51:02 +0000 (14:51 -0700)]
datapath: Use exact lookup for flow_get and flow_del.

Due to the race condition in userspace, there is chance that two
overlapping megaflows could be installed in datapath.  And this
causes userspace unable to delete the less inclusive megaflow flow
even after it timeout, since the flow_del logic will stop at the
first match of masked flow.

This commit fixes the bug by making the kernel flow_del and flow_get
logic check all masks in that case.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: Change u64_stats_* to use _irq instead of _bh().
Jesse Gross [Mon, 30 Jun 2014 20:43:25 +0000 (13:43 -0700)]
datapath: Change u64_stats_* to use _irq instead of _bh().

The upstream u64_stats API has been changed to remove the _bh()
versions and switch all consumers to use IRQ safe variants instead.
This was done to be safe for netpoll generated packets, which can
occur in hard IRQ context. From a safety perspective, this doesn't
directly affect OVS since it doesn't support netpoll. However, this
change has been backported to older kernels so OVS needs to use the
new API to compile.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agotest-vconn: Change the expected error for Windows.
Gurucharan Shetty [Wed, 25 Jun 2014 02:52:54 +0000 (02:52 +0000)]
test-vconn: Change the expected error for Windows.

On Windows ECONNRESET is WSAECONNRESET.

Also, "unix" connections are done through TCP sockets.
For the 'refuse-connection' test, the error message for Windows
is WSAECONNRESET instead of EPIPE.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agorhel, xenserver: Run unit tests while creating rpms.
Gurucharan Shetty [Mon, 23 Jun 2014 21:05:00 +0000 (14:05 -0700)]
rhel, xenserver: Run unit tests while creating rpms.

For RHEL, Fedora and Xenserver, run unit tests while
building rpms.  This may catch some cross-platform bugs.

The commit also allows the users to optionally skip unit tests.
(On debian, the default is to run unit tests. For consistency,
do the same for rpms.)

VMware-BZ: 1267127

CC: Flavio Leitner <fbl@redhat.com>
CC: Ben Pfaff <blp@nicira.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Tested-by: Flavio Leitner <fbl@redhat.com>
9 years agodpif-netdev: Batch megaflow lookup.
Ethan Jackson [Tue, 24 Jun 2014 01:28:43 +0000 (18:28 -0700)]
dpif-netdev: Batch megaflow lookup.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agoclassifier: Add a batched miniflow lookup function.
Ethan Jackson [Tue, 24 Jun 2014 01:40:47 +0000 (18:40 -0700)]
classifier: Add a batched miniflow lookup function.

Used in a future patch.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodpif-netdev: Take the classifier rdlock once per batch.
Ethan Jackson [Thu, 5 Jun 2014 18:23:06 +0000 (11:23 -0700)]
dpif-netdev: Take the classifier rdlock once per batch.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodpif-netdev: Rename batch_pkt_execute.
Ethan Jackson [Tue, 24 Jun 2014 01:22:03 +0000 (18:22 -0700)]
dpif-netdev: Rename batch_pkt_execute.

The new name "packet_batch" is a bit more straight forward.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
9 years agodatapath: Fix error handling for Geneve options in ipv4_tun_to_nlattr().
Ben Pfaff [Mon, 30 Jun 2014 20:48:30 +0000 (13:48 -0700)]
datapath: Fix error handling for Geneve options in ipv4_tun_to_nlattr().

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agonetdev-dpdk: Add OVS_UNLIKELY annotations in dpdk_do_tx_copy().
Ryan Wilson [Fri, 27 Jun 2014 01:16:41 +0000 (18:16 -0700)]
netdev-dpdk: Add OVS_UNLIKELY annotations in dpdk_do_tx_copy().

Since dropped packets due to large packet size or lack of memory
are unlikely, it is best to add OVS_UNLIKELY annotations to these
conditions.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoovs-benchmark: Compile for windows.
Gurucharan Shetty [Wed, 2 Apr 2014 15:31:50 +0000 (08:31 -0700)]
ovs-benchmark: Compile for windows.

This just makes ovs-benchmark compile on windows.
This lets us go ahead with just a 'make' instead of
picking and choosing executables that are tested to work on
windows as arguments for make.

This commit does not make ovs-benchmark a supported utility
on windows.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agotests: Port test-sflow and test-netflow to Windows.
Gurucharan Shetty [Fri, 23 May 2014 20:40:07 +0000 (13:40 -0700)]
tests: Port test-sflow and test-netflow to Windows.

After the change, both of them compile. test-netflow related
unit tests pass.

test-sflow related tests do not pass because
of LOOPBACK_INTERFACE constraints for 'agent'.
(It should be revisited later.)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agonetdev-dpdk: Fix memory leak in dpdk_do_tx_copy().
Ryan Wilson [Fri, 27 Jun 2014 01:16:39 +0000 (18:16 -0700)]
netdev-dpdk: Fix memory leak in dpdk_do_tx_copy().

This patch fixes a bug where rte_pktmbuf_alloc() would fail and
packets which succeeded to allocate memory with rte_pktmbuf_alloc()
would not be sent and leak memory.

Also, as a byproduct of using a local variable to record dropped
packets, this reduces the locking of the netdev's mutex when
multiple packets are dropped in dpdk_do_tx_copy().

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-dpdk: Set current timestamp when flushing TX queue.
Ryan Wilson [Fri, 27 Jun 2014 00:41:46 +0000 (17:41 -0700)]
netdev-dpdk: Set current timestamp when flushing TX queue.

The current timestamp should be set every time the queue is flushed.
Thus, if DRAIN_TSC timer cycles have passed since the last timestamp,
the send queue should be flushed again.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-dpdk: Refactor dpdk_queue_flush().
Ryan Wilson [Fri, 27 Jun 2014 00:41:45 +0000 (17:41 -0700)]
netdev-dpdk: Refactor dpdk_queue_flush().

This patch refactors dpdk_queue_flush() to reuse code in
dpdk_queue_pkts().

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoINSTALL.RHEL: Fix a typo.
Gurucharan Shetty [Mon, 30 Jun 2014 15:01:51 +0000 (08:01 -0700)]
INSTALL.RHEL: Fix a typo.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
9 years agopoll-loop: Create Windows event handles for sockets automatically.
Gurucharan Shetty [Fri, 27 Jun 2014 20:30:49 +0000 (13:30 -0700)]
poll-loop: Create Windows event handles for sockets automatically.

We currently have a poll_fd_wait_event(fd, wevent, events) function that
is used at places common to Windows and Linux where we have to wait on
sockets.  On Linux, 'wevent' is always set as zero. On Windows, for sockets,
when we send both 'fd' and 'wevent', we associate them with each other for
'events' and then wait on 'wevent'. Also on Windows, when we only send 'wevent'
to this function, we would simply wait for all events for that 'wevent'.

There is a disadvantage with this approach.
* Windows clients need to create a 'wevent' and then pass it along. This
means that at a lot of places where we create sockets, we also are forced
to create a 'wevent'.

With this commit, we pass the responsibility of creating a 'wevent' to
poll_fd_wait() in case of sockets. That way, a client using poll_fd_wait()
is only concerned about sockets and not about 'wevents'. There is a potential
disadvantage with this change in that we create events more often and that
may have a performance penalty. If that turns out to be the case, we will
eventually need to create a pool of wevents that can be re-used.

In Windows, there are cases where we want to wait on a event (not
associated with any sockets) and then control it using functions
like SetEvent() etc. For that purpose, introduce a new function
poll_wevent_wait(). For this function, the client needs to create a event
and then pass it along as an argument.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-By: Ben Pfaff <blp@nicira.com>
9 years agorpm: improved RPM sources dir explanation
Flavio Leitner [Mon, 30 Jun 2014 00:21:05 +0000 (21:21 -0300)]
rpm: improved RPM sources dir explanation

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoudpif: Add command to wait for revalidation.
Joe Stringer [Wed, 25 Jun 2014 14:02:45 +0000 (14:02 +0000)]
udpif: Add command to wait for revalidation.

This allows us to remove some of the sleeps from the testsuite.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodatapath: Initialize OVS_CB in ovs_vport_receive()
Pravin B Shelar [Thu, 26 Jun 2014 22:15:00 +0000 (15:15 -0700)]
datapath: Initialize OVS_CB in ovs_vport_receive()

On packet recv OVS CB is initialized in multiple function.
Following patch moves all these initialization to
ovs_vport_receive().
This patch also save a check in execute actions.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agodpdk: High speed PMD physical NIC queue size
Polehn, Mike A [Thu, 19 Jun 2014 22:58:26 +0000 (22:58 +0000)]
dpdk: High speed PMD physical NIC queue size

Large TX and RX queues are needed for high speed 10 GbE physical NICS.
Observed a 250% zero loss improvement over small NIC queue test for
port to port flow test.

Signed-off-by: Mike A. Polehn <mike.a.polehn@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoFAQ: Add an entry for multicast snooping with VLAN
Flavio Leitner [Thu, 26 Jun 2014 12:00:37 +0000 (09:00 -0300)]
FAQ: Add an entry for multicast snooping with VLAN

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agobuild: Allow building with autoconf 2.63
Thomas Graf [Fri, 27 Jun 2014 07:31:57 +0000 (09:31 +0200)]
build: Allow building with autoconf 2.63

Reduces the dependency on autoconf from 2.64 to 2.63 to ease building
on older platforms. There is only a few macros missing and they can
be provided easily.

A handful of tests needed modification. The difference in quoting
behaviour between 2.63 and later require the m4_define() to be
manually unfolded.

The Debian control file is left untouched on purpose. The decision
whether to adjust the dependency is left to the respective maintainers.

Tested with autoconf 2.63 and 2.69.

Cc: Scott Mann <smann@noironetworks.com>
Cc: Don Kehn <dkehn@noironetworks.com>
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agotest-ovsdb: Workaround unicode bug in Python 2.4.x.
Gurucharan Shetty [Fri, 20 Jun 2014 20:13:51 +0000 (13:13 -0700)]
test-ovsdb: Workaround unicode bug in Python 2.4.x.

Run the following command on Xenserver:
PYTHONPATH=`pwd`/python/compat::`pwd`/python python ./tests/test-ovsdb.py \
parse-atoms '{"type": "string", "minLength": 2}'    \
'[""]'     '["a"]'     '["ab"]'     '["abc"]'     '["\ud834\udd1e"]'

And we get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001d11e'
in position 23: ordinal not in range(128).

It looks like we are hitting the following bug:
http://bugs.python.org/issue2517

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-By: Ben Pfaff <blp@nicira.com>
9 years agovlan-splinters.at: Skip the test for Windows.
Gurucharan Shetty [Tue, 17 Jun 2014 20:24:37 +0000 (13:24 -0700)]
vlan-splinters.at: Skip the test for Windows.

vlan splinters is to workaround buggy network drivers of Linux.
Skip the test for Windows.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-ofctl: Ability to read a hex string from file.
Gurucharan Shetty [Wed, 18 Jun 2014 18:12:36 +0000 (11:12 -0700)]
ovs-ofctl: Ability to read a hex string from file.

The unit test, "OFPST_TABLE reply - OF1.2" in ofp-print.at
sends a very large hex string as an argument to 'ovs-ofctl ofp-print'.
The length of the hex string exceeds the maximum command line length
in Windows. With this commit, we can pass the same hex string by
placing it inside a file.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-vsctl.at: Workaround lack of 'kill -l' on Windows.
Gurucharan Shetty [Tue, 17 Jun 2014 19:35:32 +0000 (12:35 -0700)]
ovs-vsctl.at: Workaround lack of 'kill -l' on Windows.

Also, fflush(stderr) when we raise a signal. The test
this commit is changing would fail otherwise.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoutil: Set two digit exponents for scientific notation.
Gurucharan Shetty [Tue, 17 Jun 2014 18:16:06 +0000 (11:16 -0700)]
util: Set two digit exponents for scientific notation.

By default, three digit exponents are printed on Windows.
Many unit tests in Open vSwitch expect two digit exponents.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb-tool: Workaround inability to replace existing file on Windows.
Gurucharan Shetty [Mon, 16 Jun 2014 22:49:06 +0000 (15:49 -0700)]
ovsdb-tool: Workaround inability to replace existing file on Windows.

rename() on an existing destination file fails on Windows. This commit
worksaround that problem.

There are two tests that test it. But both of them use the ovsdb-server's
--run option for the test and it does not exist in Windows. So change
the test to workaround the lack of that feature.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb-server.at: Handle different error message for already opened database.
Gurucharan Shetty [Mon, 2 Jun 2014 17:49:18 +0000 (10:49 -0700)]
ovsdb-server.at: Handle different error message for already opened database.

Commit ebed9f78(ovsdb-server: Improve message for "add-db" of
database already open.) improved the error message seen when
opening an already opened database on Linux. For Windows,
we still need to look for the lockfile error message.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agointerface-reconfigure.at: Skip POSIX specfic tests for Windows.
Gurucharan Shetty [Mon, 16 Jun 2014 20:01:36 +0000 (13:01 -0700)]
interface-reconfigure.at: Skip POSIX specfic tests for Windows.

interface-reconfigure.at mostly tests functionality for Xenserver
and the tests use some POSIX only features. Skip them for Windows.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif.at: Skip sflow tests in Windows.
Gurucharan Shetty [Mon, 16 Jun 2014 18:18:33 +0000 (11:18 -0700)]
ofproto-dpif.at: Skip sflow tests in Windows.

sflow feature needs to be investigated for Windows. Right now
test-sflow related tests do not pass because of LOOPBACK_INTERFACE
constraints for 'agent'. Add a TODO item and skip the tests.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agofile_name.at: Skip a symlink related test for Windows.
Gurucharan Shetty [Mon, 16 Jun 2014 18:08:14 +0000 (11:08 -0700)]
file_name.at: Skip a symlink related test for Windows.

There is no one-one mapping of symlinks between Linux and
Windows. This test currently fails on Windows and we do not
really need this functionality on Windows. So skip it.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-ofctl.at: Prevent msys from getting confused with ipv6 address.
Gurucharan Shetty [Mon, 9 Jun 2014 17:04:58 +0000 (10:04 -0700)]
ovs-ofctl.at: Prevent msys from getting confused with ipv6 address.

msys has a set of rules which triggers an automatic conversion of
arguments into something else to suit Windows requirements. Sometimes
this also causes unwanted conversions. Details of the rules is here:
http://www.mingw.org/wiki/Posix_path_conversion

msys converts ::1/::1 into ;1\;1. To prevent this, use fullform
ipv6 address of the form 0:0:0:0:0:0:0:1 instead.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agorconn: Don't warn when peer abruptly closes connection.
Gurucharan Shetty [Mon, 9 Jun 2014 15:47:12 +0000 (08:47 -0700)]
rconn: Don't warn when peer abruptly closes connection.

On Windows, when a peer terminates without calling a close
on socket fd, the server ends up printing "connection dropped"
warning messages. We probably don't want those warning messages
when the error is WSAECONNRESET.

(In OVS unit tests on Windows, anytime a client like ovs-ofctl
calls a ovs_fatal without clean close(fd) on the socket, the
server like ovs-vswitchd prints warnings that cause unit tests
to fail.)

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-vsctl.at: Adjust test output for Windows.
Gurucharan Shetty [Mon, 2 Jun 2014 21:23:44 +0000 (14:23 -0700)]
ovs-vsctl.at: Adjust test output for Windows.

"xargs echo" introduces "^M" character inbetween the
records on Windows. Workaround it.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb-server.at: Adjust PKIDIR for ssl test.
Gurucharan Shetty [Mon, 2 Jun 2014 20:04:35 +0000 (13:04 -0700)]
ovsdb-server.at: Adjust PKIDIR for ssl test.

For this particular test, we pass the PKIDIR through a
ovsdb-tool transact and msys does not convert the path style.
(On Windows, we have to pass the directory in the form C:/foo/bar.pem.)

So get the Windows style path through 'pwd -W'(which is called through
the function pwd ())

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb-server.at: Use different test for socket files.
Gurucharan Shetty [Mon, 2 Jun 2014 17:46:23 +0000 (10:46 -0700)]
ovsdb-server.at: Use different test for socket files.

On Windows, we really do not have any "socket" files created.
We instead create a regular file with a TCP port number written
inside it.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoovsdb-server.at: Skip a few tests with '--monitor' option.
Gurucharan Shetty [Mon, 16 Jun 2014 17:09:14 +0000 (10:09 -0700)]
ovsdb-server.at: Skip a few tests with '--monitor' option.

We do not support that option in Windows.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agostream-tcp: Cleanup files created for Windows "unix" sockets.
Gurucharan Shetty [Mon, 2 Jun 2014 17:20:34 +0000 (10:20 -0700)]
stream-tcp: Cleanup files created for Windows "unix" sockets.

On Windows, we create "unix sockets" by creating TCP sockets
and hiding the TCP port number in files. When we close the
pstream session, we need to delete the file.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agodpif: Explain a bug, why it isn't important, and why it's hard to fix.
Ben Pfaff [Wed, 25 Jun 2014 18:02:10 +0000 (11:02 -0700)]
dpif: Explain a bug, why it isn't important, and why it's hard to fix.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agodpif: When executing an operation with help, log the sub-executes.
Ben Pfaff [Wed, 25 Jun 2014 17:44:25 +0000 (10:44 -0700)]
dpif: When executing an operation with help, log the sub-executes.

This should make it easier to identify problems that result from bugs in
the execution helper.

Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agodpif: Consistently help datapath to execute oversized actions.
Ben Pfaff [Wed, 25 Jun 2014 17:08:04 +0000 (10:08 -0700)]
dpif: Consistently help datapath to execute oversized actions.

Only dpif_execute() was checking for oversized actions but dpif_operate()
should do so also.  This fixes the problem.

Found by inspection.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
9 years agodatapath: Remove redundant tcp_flags code.
Joe Stringer [Thu, 26 Jun 2014 11:24:24 +0000 (11:24 +0000)]
datapath: Remove redundant tcp_flags code.

These two cases used to be treated differently for IPv4/IPv6, but they
are now identical.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agolib: Fix FreeBSD build.
Joe Stringer [Wed, 25 Jun 2014 07:41:41 +0000 (07:41 +0000)]
lib: Fix FreeBSD build.

Various recent commits have introduced build failures on FreeBSD. This
patch fixes them.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Clarify find_match_wc().
Jarno Rajahalme [Fri, 13 Jun 2014 17:38:05 +0000 (10:38 -0700)]
lib/classifier: Clarify find_match_wc().

Reduce the number of goto statements by returning via a new helper
fill_range_wc() when no match is found.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Add miniflow_and_mask_matches_flow_wc.
Jarno Rajahalme [Fri, 13 Jun 2014 17:38:05 +0000 (10:38 -0700)]
lib/classifier: Add miniflow_and_mask_matches_flow_wc.

miniflow_and_mask_matches_flow_wc() fills in the masks in flow
wildcards, so a separate step to that effect is no longer needed.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/classifier: Optimize megaflows for single rule case.
Jarno Rajahalme [Fri, 13 Jun 2014 17:38:05 +0000 (10:38 -0700)]
lib/classifier: Optimize megaflows for single rule case.

When, during a classifier lookup, we narrow down to a single potential
rule, it is enough to match on ("unwildcard") one bit that differs
between the packet and the rule.

This is a special case of the more general algorithm, where it is
sufficient to match on enough bits that separates the packet from all
higher priority rules than the matched rule.  For a miss that would be
all the rules.  Implementing this is expensive for a more than a few
rules.  This patch starts by doing this for a single rule when we
already have it, also reducing the lookup cost by finishing the lookup
earlier than before.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agolib/pvector: Non-intrusive RCU priority vector.
Jarno Rajahalme [Thu, 26 Jun 2014 14:41:25 +0000 (07:41 -0700)]
lib/pvector: Non-intrusive RCU priority vector.

Factor out the priority vector code from the classifier.

Making the classifier use RCU instead of locking requires parallel
access to the priority vector, pointing to subtables in descending
priority order.  When a new subtable is added, a new copy of the
priority vector is allocated, while the current readers can keep on
using the old copy they started with.  Adding and removing subtables
is usually less frequent than adding and removing rules, so this
should not have a visible performance implication.  As an optimization
for the userspace datapath use, where all the subtables have the same
priority, new subtables can be added to the end of the vector without
reallocation and without disturbing readers.

cls_subtables_reset() is now removed, as it served its purpose in bug
hunting.  Checks on the new pvector are now incorporated into
tests/test-classifier.c.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
9 years agoFAQ: add an entry for MAC learning + VLAN
YAMAMOTO Takashi [Wed, 25 Jun 2014 02:06:18 +0000 (11:06 +0900)]
FAQ: add an entry for MAC learning + VLAN

Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
9 years agodpif-netdev: delete lost packets in dp_execute_cb()
Daniele Di Proietto [Wed, 25 Jun 2014 18:39:34 +0000 (11:39 -0700)]
dpif-netdev: delete lost packets in dp_execute_cb()

This commit fixes memory leaks in dp_execute_cb() in two cases:
    - when the output port cannot be found
    - when the recirculation depth is exceeded

Reported-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agodatapath: fix sparse warning in function tbl_mask_array_delete_mask()
Andy Zhou [Tue, 24 Jun 2014 06:00:55 +0000 (23:00 -0700)]
datapath: fix sparse warning in function tbl_mask_array_delete_mask()

Sparse gives "incompatible types in comparison expression (different
address spaces)" warning messages. Fix this by add rcu_dereference()
wrappers.

Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agoofproto-dpif: Add MPLS tests
Simon Horman [Mon, 23 Jun 2014 23:46:33 +0000 (08:46 +0900)]
ofproto-dpif: Add MPLS tests

Co-authored-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoofp-actions: Allow pop_mpls on MPLS packets
Simon Horman [Mon, 23 Jun 2014 23:46:32 +0000 (08:46 +0900)]
ofp-actions: Allow pop_mpls on MPLS packets

With recirculation in place this should be safe.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoofproto-dpif: MPLS recirculation
Simon Horman [Mon, 23 Jun 2014 23:46:31 +0000 (08:46 +0900)]
ofproto-dpif: MPLS recirculation

In some cases an pop MPLS action changes a packet to be a non-mpls packet.
In this case subsequent any L3+ actions require access to portions
of the packet which were not decoded as they were opaque when the
packet was MPLS. Allow such actions to be translated by
first recirculating the packet.

Co-authored-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoCodingStyle: Functions in .c files should usually not be marked "inline".
Ben Pfaff [Wed, 25 Jun 2014 20:58:41 +0000 (13:58 -0700)]
CodingStyle: Functions in .c files should usually not be marked "inline".

I thought this was in here already, but I guess not.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agodpif-netdev: Fix memory leak in dpif_netdev_flow_put()
Ryan Wilson [Wed, 25 Jun 2014 20:05:17 +0000 (13:05 -0700)]
dpif-netdev: Fix memory leak in dpif_netdev_flow_put()

miniflow_destroy() needs to be called after using miniflow_init().
Otherwise, if the miniflow mallocs data, then a memory leak may
occur.

Found by inspection.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agodpif-netdev: Implement batched flow dumping.
Ryan Wilson [Mon, 23 Jun 2014 19:36:11 +0000 (12:36 -0700)]
dpif-netdev: Implement batched flow dumping.

Previously, flows were retrieved one by one when dumping flows for
datapaths of type 'netdev'. This increased contention for the dump's
mutex, negatively affecting revalidator performance.

This patch retrieves batches of flows when dumping flows for datapaths
of type 'netdev'.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
[blp@nicira.com relaxed max_flows restriction]
Signed-off-by: Ben Pfaff <blp@nicira.com>
9 years agoovs-ofctl: Document form of packet parameters to "packet-out".
Ben Pfaff [Wed, 25 Jun 2014 19:27:25 +0000 (12:27 -0700)]
ovs-ofctl: Document form of packet parameters to "packet-out".

The syntax is inconvenient but at least it's documented now.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Justin Pettit <jpettit@nicira.com>
9 years agojson: Fix parsing of strings that end with a backslash.
Ben Pfaff [Wed, 25 Jun 2014 18:39:25 +0000 (11:39 -0700)]
json: Fix parsing of strings that end with a backslash.

json_string_unescape() flagged a backslash at the end of a string as an
error, but of course "\\" is a valid string.  This fixes the problem.

VMware-BZ: #1275208
Reported-by: Michael Hu <mhu@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
9 years agodpif-linux: Minor style fixes.
Ben Pfaff [Wed, 25 Jun 2014 18:34:44 +0000 (11:34 -0700)]
dpif-linux: Minor style fixes.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>
9 years agodpif-netdev: Delete packet if not able to do upcall
Daniele Di Proietto [Tue, 24 Jun 2014 23:08:34 +0000 (16:08 -0700)]
dpif-netdev: Delete packet if not able to do upcall

In dp_netdev_input() we nevered fully covered the case where handler queues are
not there.
With this change we increment the stat counter and free the packet.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-dpdk: Disable NIC offloading and multiseg mbufs
Daniele Di Proietto [Tue, 24 Jun 2014 23:05:02 +0000 (16:05 -0700)]
netdev-dpdk: Disable NIC offloading and multiseg mbufs

We do not use any offloading (now) or multiple segments per packet, so
we might as well disable those features while configuring the NIC.

This could give performance improvements. For ixgbe, for example, this change
allows the driver to use a simpler tx routine, resulting in throuput
improvements (~7.5%)

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
9 years agonetdev-dpdk: Fix coding style in TX/RX conf structs
Daniele Di Proietto [Tue, 24 Jun 2014 23:05:01 +0000 (16:05 -0700)]
netdev-dpdk: Fix coding style in TX/RX conf structs

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>