From: Thomas Graf Date: Tue, 28 Oct 2014 10:19:52 +0000 (+0100) Subject: doc: Convert docs to Markdown language X-Git-Tag: v2.4.0~1106 X-Git-Url: http://git.cascardo.info/?p=cascardo%2Fovs.git;a=commitdiff_plain;h=542cc9bb8b8817866afcd692a78fa591db5839dc doc: Convert docs to Markdown language Converts the majority of docs over to use the Markdown language for pretty printing on GitHub. It's a rough first convertion without exploiting the full potential of Markdown at this point. Section titles and indentation are fixed as needed. Minimal docs interlinking is added. Signed-off-by: Thomas Graf Signed-off-by: Ben Pfaff --- diff --git a/CONTRIBUTING b/CONTRIBUTING deleted file mode 100644 index dfbb17155..000000000 --- a/CONTRIBUTING +++ /dev/null @@ -1,316 +0,0 @@ -How to Submit Patches for Open vSwitch -====================================== - -Send changes to Open vSwitch as patches to dev@openvswitch.org. -One patch per email, please. More details are included below. - -If you are using Git, then "git format-patch" takes care of most of -the mechanics described below for you. - -Before You Start ----------------- - -Before you send patches at all, make sure that each patch makes sense. -In particular: - - - A given patch should not break anything, even if later - patches fix the problems that it causes. The source tree - should still build and work after each patch is applied. - (This enables "git bisect" to work best.) - - - A patch should make one logical change. Don't make - multiple, logically unconnected changes to disparate - subsystems in a single patch. - - - A patch that adds or removes user-visible features should - also update the appropriate user documentation or manpages. - -Testing is also important: - - - A patch that adds or deletes files should be tested with - "make distcheck" before submission. - - - A patch that modifies Linux kernel code should be at least - build-tested on various Linux kernel versions before - submission. 
I suggest versions 2.6.32 and whatever - the current latest release version is at the time. - - - A patch that modifies the ofproto or vswitchd code should be - tested in at least simple cases before submission. - - - A patch that modifies xenserver code should be tested on - XenServer before submission. - -If you are using GitHub, then you may utilize the travis-ci.org CI build -system by linking your GitHub repository to it. This will run some of -the above tests automatically when you push changes to your repository. -See the "Continuous Integration with Travis-CI" in the INSTALL file for -details on how to set it up. - -Email Subject -------------- - -The subject line of your email should be in the following format: -[PATCH /] : - - - [PATCH /] indicates that this is the nth of a series - of m patches. It helps reviewers to read patches in the - correct order. You may omit this prefix if you are sending - only one patch. - - - : indicates the area of the Open vSwitch to which the - change applies (often the name of a source file or a - directory). You may omit it if the change crosses multiple - distinct pieces of code. - - - briefly describes the change. - -The subject, minus the [PATCH /] prefix, becomes the first line -of the commit's change log message. - -Description ------------ - -The body of the email should start with a more thorough description of -the change. This becomes the body of the commit message, following -the subject. There is no need to duplicate the summary given in the -subject. - -Please limit lines in the description to 79 characters in width. - -The description should include: - - - The rationale for the change. - - - Design description and rationale (but this might be better - added as code comments). - - - Testing that you performed (or testing that should be done - but you could not for whatever reason). - - - Tags (see below). - -There is no need to describe what the patch actually changed, if the -reader can see it for himself. 
- -If the patch refers to a commit already in the Open vSwitch -repository, please include both the commit number and the subject of -the patch, e.g. 'commit 632d136c (vswitch: Remove restriction on -datapath names.)'. - -If you, the person sending the patch, did not write the patch -yourself, then the very first line of the body should take the form -"From: ", followed by a blank line. This -will automatically cause the named author to be credited with -authorship in the repository. - -Tags ----- - -The description ends with a series of tags, written one to a line as -the last paragraph of the email. Each tag indicates some property of -the patch in an easily machine-parseable manner. - -Examples of common tags follow. - - Signed-off-by: Author Name - - Informally, this indicates that Author Name is the author or - submitter of a patch and has the authority to submit it under - the terms of the license. The formal meaning is to agree to - the Developer's Certificate of Origin (see below). - - If the author and submitter are different, each must sign off. - If the patch has more than one author, all must sign off. - - Signed-off-by: Author Name - Signed-off-by: Submitter Name - - Co-authored-by: Author Name - - Git can only record a single person as the author of a given - patch. In the rare event that a patch has multiple authors, - one must be given the credit in Git and the others must be - credited via Co-authored-by: tags. (All co-authors must also - sign off.) - - Acked-by: Reviewer Name - - Reviewers will often give an Acked-by: tag to code of which - they approve. It is polite for the submitter to add the tag - before posting the next version of the patch or applying the - patch to the repository. Quality reviewing is hard work, so - this gives a small amount of credit to the reviewer. - - Not all reviewers give Acked-by: tags when they provide - positive reviews. It's customary only to add tags from - reviewers who actually provide them explicitly. 
- - Tested-by: Tester Name - - When someone tests a patch, it is customary to add a - Tested-by: tag indicating that. It's rare for a tester to - actually provide the tag; usually the patch submitter makes - the tag himself in response to an email indicating successful - testing results. - - Reported-by: Reporter Name - - When a patch fixes a bug reported by some person, please - credit the reporter in the commit log in this fashion. Please - also add the reporter's name and email address to the list of - people who provided helpful bug reports in the AUTHORS file at - the top of the source tree. - - Fairly often, the reporter of a bug also tests the fix. - Occasionally one sees a combined "Reported-and-tested-by:" tag - used to indicate this. It is also acceptable, and more - common, to include both tags separately. - - (If a bug report is received privately, it might not always be - appropriate to publicly credit the reporter. If in doubt, - please ask the reporter.) - - Requested-by: Requester Name - Suggested-by: Suggester Name - - When a patch implements a request or a suggestion made by some - person, please credit that person in the commit log in this - fashion. For a helpful suggestion, please also add the - person's name and email address to the list of people who - provided suggestions in the AUTHORS file at the top of the - source tree. - - (If a suggestion or a request is received privately, it might - not always be appropriate to publicly give credit. If in - doubt, please ask.) 
- - Reported-at: - - If a patch fixes or is otherwise related to a bug reported in - a public bug tracker, please include a reference to the bug in - the form of a URL to the specific bug, e.g.: - - Reported-at: https://bugs.debian.org/743635 - - This is also an appropriate way to refer to bug report emails - in public email archives, e.g.: - - Reported-at: http://openvswitch.org/pipermail/dev/2014-June/040952.html - - VMware-BZ: #1234567 - ONF-JIRA: EXT-12345 - - If a patch fixes or is otherwise related to a bug reported in - a private bug tracker, you may include some tracking ID for - the bug for your own reference. Please include some - identifier to make the origin clear, e.g. "VMware-BZ" refers - to VMware's internal Bugzilla instance and "ONF-JIRA" refers - to the Open Networking Foundation's JIRA bug tracker. - - Bug #1234567. - Issue: 1234567 - - These are obsolete forms of VMware-BZ: that can still be seen - in old change log entries. (They are obsolete because they do - not tell the reader what bug tracker is referred to.) - -Developer's Certificate of Origin ---------------------------------- - -To help track the author of a patch as well as the submission chain, -and be clear that the developer has authority to submit a patch for -inclusion in openvswitch please sign off your work. 
The sign off -certifies the following: - - Developer's Certificate of Origin 1.1 - - By making a contribution to this project, I certify that: - - (a) The contribution was created in whole or in part by me and I - have the right to submit it under the open source license - indicated in the file; or - - (b) The contribution is based upon previous work that, to the best - of my knowledge, is covered under an appropriate open source - license and I have the right under that license to submit that - work with modifications, whether created in whole or in part - by me, under the same open source license (unless I am - permitted to submit under a different license), as indicated - in the file; or - - (c) The contribution was provided directly to me by some other - person who certified (a), (b) or (c) and I have not modified - it. - - (d) I understand and agree that this project and the contribution - are public and that a record of the contribution (including all - personal information I submit with it, including my sign-off) is - maintained indefinitely and may be redistributed consistent with - this project or the open source license(s) involved. - -Comments --------- - -If you want to include any comments in your email that should not be -part of the commit's change log message, put them after the -description, separated by a line that contains just "---". It may be -helpful to include a diffstat here for changes that touch multiple -files. - -Patch ------ - -The patch should be in the body of the email following the description, -separated by a blank line. - -Patches should be in "diff -up" format. We recommend that you use Git -to produce your patches, in which case you should use the -M -C -options to "git diff" (or other Git tools) if your patch renames or -copies files. Quilt (http://savannah.nongnu.org/projects/quilt) might -be useful if you do not want to use Git. - -Patches should be inline in the email message. 
Some email clients -corrupt white space or wrap lines in patches. There are hints on how -to configure many email clients to avoid this problem at: - http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob_plain;f=Documentation/email-clients.txt -If you cannot convince your email client not to mangle patches, then -sending the patch as an attachment is a second choice. - -Please follow the style used in the code that you are modifying. The -CodingStyle file describes the coding style used in most of Open -vSwitch. Use Linux kernel coding style for Linux kernel code. - -Example -------- - -From fa29a1c2c17682879e79a21bb0cdd5bbe67fa7c0 Mon Sep 17 00:00:00 2001 -From: Jesse Gross -Date: Thu, 8 Dec 2011 13:17:24 -0800 -Subject: [PATCH] datapath: Alphabetize include/net/ipv6.h compat header. - -Signed-off-by: Jesse Gross ---- - datapath/linux/Modules.mk | 2 +- - 1 files changed, 1 insertions(+), 1 deletions(-) - -diff --git a/datapath/linux/Modules.mk b/datapath/linux/Modules.mk -index fdd952e..f6cb88e 100644 ---- a/datapath/linux/Modules.mk -+++ b/datapath/linux/Modules.mk -@@ -56,11 +56,11 @@ openvswitch_headers += \ - linux/compat/include/net/dst.h \ - linux/compat/include/net/genetlink.h \ - linux/compat/include/net/ip.h \ -+ linux/compat/include/net/ipv6.h \ - linux/compat/include/net/net_namespace.h \ - linux/compat/include/net/netlink.h \ - linux/compat/include/net/protocol.h \ - linux/compat/include/net/route.h \ -- linux/compat/include/net/ipv6.h \ - linux/compat/genetlink.inc - - both_modules += brcompat --- -1.7.7.3 - diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 000000000..b434e78e3 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,318 @@ +How to Submit Patches for Open vSwitch +====================================== + +Send changes to Open vSwitch as patches to dev@openvswitch.org. +One patch per email, please. More details are included below. 
+ +If you are using Git, then `git format-patch` takes care of most of +the mechanics described below for you. + +Before You Start +---------------- + +Before you send patches at all, make sure that each patch makes sense. +In particular: + + - A given patch should not break anything, even if later + patches fix the problems that it causes. The source tree + should still build and work after each patch is applied. + (This enables `git bisect` to work best.) + + - A patch should make one logical change. Don't make + multiple, logically unconnected changes to disparate + subsystems in a single patch. + + - A patch that adds or removes user-visible features should + also update the appropriate user documentation or manpages. + +Testing is also important: + + - A patch that adds or deletes files should be tested with + `make distcheck` before submission. + + - A patch that modifies Linux kernel code should be at least + build-tested on various Linux kernel versions before + submission. I suggest versions 2.6.32 and whatever + the current latest release version is at the time. + + - A patch that modifies the ofproto or vswitchd code should be + tested in at least simple cases before submission. + + - A patch that modifies xenserver code should be tested on + XenServer before submission. + +If you are using GitHub, then you may utilize the travis-ci.org CI build +system by linking your GitHub repository to it. This will run some of +the above tests automatically when you push changes to your repository. +See the "Continuous Integration with Travis-CI" in the [INSTALL](INSTALL.md) +file for details on how to set it up. + +Email Subject +------------- + +The subject line of your email should be in the following format: +`[PATCH /] : ` + + - `[PATCH /]` indicates that this is the nth of a series + of m patches. It helps reviewers to read patches in the + correct order. You may omit this prefix if you are sending + only one patch. 
+ + - `:` indicates the area of the Open vSwitch to which the + change applies (often the name of a source file or a + directory). You may omit it if the change crosses multiple + distinct pieces of code. + + - `` briefly describes the change. + +The subject, minus the `[PATCH /]` prefix, becomes the first line +of the commit's change log message. + +Description +----------- + +The body of the email should start with a more thorough description of +the change. This becomes the body of the commit message, following +the subject. There is no need to duplicate the summary given in the +subject. + +Please limit lines in the description to 79 characters in width. + +The description should include: + + - The rationale for the change. + + - Design description and rationale (but this might be better + added as code comments). + + - Testing that you performed (or testing that should be done + but you could not for whatever reason). + + - Tags (see below). + +There is no need to describe what the patch actually changed, if the +reader can see it for himself. + +If the patch refers to a commit already in the Open vSwitch +repository, please include both the commit number and the subject of +the patch, e.g. 'commit 632d136c (vswitch: Remove restriction on +datapath names.)'. + +If you, the person sending the patch, did not write the patch +yourself, then the very first line of the body should take the form +`From: `, followed by a blank line. This +will automatically cause the named author to be credited with +authorship in the repository. + +Tags +---- + +The description ends with a series of tags, written one to a line as +the last paragraph of the email. Each tag indicates some property of +the patch in an easily machine-parseable manner. + +Examples of common tags follow. + + Signed-off-by: Author Name + + Informally, this indicates that Author Name is the author or + submitter of a patch and has the authority to submit it under + the terms of the license. 
The formal meaning is to agree to + the Developer's Certificate of Origin (see below). + + If the author and submitter are different, each must sign off. + If the patch has more than one author, all must sign off. + + Signed-off-by: Author Name + Signed-off-by: Submitter Name + + Co-authored-by: Author Name + + Git can only record a single person as the author of a given + patch. In the rare event that a patch has multiple authors, + one must be given the credit in Git and the others must be + credited via Co-authored-by: tags. (All co-authors must also + sign off.) + + Acked-by: Reviewer Name + + Reviewers will often give an Acked-by: tag to code of which + they approve. It is polite for the submitter to add the tag + before posting the next version of the patch or applying the + patch to the repository. Quality reviewing is hard work, so + this gives a small amount of credit to the reviewer. + + Not all reviewers give Acked-by: tags when they provide + positive reviews. It's customary only to add tags from + reviewers who actually provide them explicitly. + + Tested-by: Tester Name + + When someone tests a patch, it is customary to add a + Tested-by: tag indicating that. It's rare for a tester to + actually provide the tag; usually the patch submitter makes + the tag himself in response to an email indicating successful + testing results. + + Reported-by: Reporter Name + + When a patch fixes a bug reported by some person, please + credit the reporter in the commit log in this fashion. Please + also add the reporter's name and email address to the list of + people who provided helpful bug reports in the AUTHORS file at + the top of the source tree. + + Fairly often, the reporter of a bug also tests the fix. + Occasionally one sees a combined "Reported-and-tested-by:" tag + used to indicate this. It is also acceptable, and more + common, to include both tags separately. 
+ + (If a bug report is received privately, it might not always be + appropriate to publicly credit the reporter. If in doubt, + please ask the reporter.) + + Requested-by: Requester Name + Suggested-by: Suggester Name + + When a patch implements a request or a suggestion made by some + person, please credit that person in the commit log in this + fashion. For a helpful suggestion, please also add the + person's name and email address to the list of people who + provided suggestions in the AUTHORS file at the top of the + source tree. + + (If a suggestion or a request is received privately, it might + not always be appropriate to publicly give credit. If in + doubt, please ask.) + + Reported-at: + + If a patch fixes or is otherwise related to a bug reported in + a public bug tracker, please include a reference to the bug in + the form of a URL to the specific bug, e.g.: + + Reported-at: https://bugs.debian.org/743635 + + This is also an appropriate way to refer to bug report emails + in public email archives, e.g.: + + Reported-at: http://openvswitch.org/pipermail/dev/2014-June/040952.html + + VMware-BZ: #1234567 + ONF-JIRA: EXT-12345 + + If a patch fixes or is otherwise related to a bug reported in + a private bug tracker, you may include some tracking ID for + the bug for your own reference. Please include some + identifier to make the origin clear, e.g. "VMware-BZ" refers + to VMware's internal Bugzilla instance and "ONF-JIRA" refers + to the Open Networking Foundation's JIRA bug tracker. + + Bug #1234567. + Issue: 1234567 + + These are obsolete forms of VMware-BZ: that can still be seen + in old change log entries. (They are obsolete because they do + not tell the reader what bug tracker is referred to.) 
+ +Developer's Certificate of Origin +--------------------------------- + +To help track the author of a patch as well as the submission chain, +and be clear that the developer has authority to submit a patch for +inclusion in openvswitch please sign off your work. The sign off +certifies the following: + + Developer's Certificate of Origin 1.1 + + By making a contribution to this project, I certify that: + + (a) The contribution was created in whole or in part by me and I + have the right to submit it under the open source license + indicated in the file; or + + (b) The contribution is based upon previous work that, to the best + of my knowledge, is covered under an appropriate open source + license and I have the right under that license to submit that + work with modifications, whether created in whole or in part + by me, under the same open source license (unless I am + permitted to submit under a different license), as indicated + in the file; or + + (c) The contribution was provided directly to me by some other + person who certified (a), (b) or (c) and I have not modified + it. + + (d) I understand and agree that this project and the contribution + are public and that a record of the contribution (including all + personal information I submit with it, including my sign-off) is + maintained indefinitely and may be redistributed consistent with + this project or the open source license(s) involved. + +Comments +-------- + +If you want to include any comments in your email that should not be +part of the commit's change log message, put them after the +description, separated by a line that contains just `---`. It may be +helpful to include a diffstat here for changes that touch multiple +files. + +Patch +----- + +The patch should be in the body of the email following the description, +separated by a blank line. + +Patches should be in `diff -up` format. 
We recommend that you use Git +to produce your patches, in which case you should use the `-M -C` +options to `git diff` (or other Git tools) if your patch renames or +copies files. Quilt (http://savannah.nongnu.org/projects/quilt) might +be useful if you do not want to use Git. + +Patches should be inline in the email message. Some email clients +corrupt white space or wrap lines in patches. There are hints on how +to configure many email clients to avoid this problem at: + http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob_plain;f=Documentation/email-clients.txt +If you cannot convince your email client not to mangle patches, then +sending the patch as an attachment is a second choice. + +Please follow the style used in the code that you are modifying. The +[CodingStyle](CodingStyle.md) file describes the coding style used in +most of Open vSwitch. Use Linux kernel coding style for Linux kernel code. + +Example +------- + +``` +From fa29a1c2c17682879e79a21bb0cdd5bbe67fa7c0 Mon Sep 17 00:00:00 2001 +From: Jesse Gross +Date: Thu, 8 Dec 2011 13:17:24 -0800 +Subject: [PATCH] datapath: Alphabetize include/net/ipv6.h compat header. 
+ +Signed-off-by: Jesse Gross +--- + datapath/linux/Modules.mk | 2 +- + 1 files changed, 1 insertions(+), 1 deletions(-) + +diff --git a/datapath/linux/Modules.mk b/datapath/linux/Modules.mk +index fdd952e..f6cb88e 100644 +--- a/datapath/linux/Modules.mk ++++ b/datapath/linux/Modules.mk +@@ -56,11 +56,11 @@ openvswitch_headers += \ + linux/compat/include/net/dst.h \ + linux/compat/include/net/genetlink.h \ + linux/compat/include/net/ip.h \ ++ linux/compat/include/net/ipv6.h \ + linux/compat/include/net/net_namespace.h \ + linux/compat/include/net/netlink.h \ + linux/compat/include/net/protocol.h \ + linux/compat/include/net/route.h \ +- linux/compat/include/net/ipv6.h \ + linux/compat/genetlink.inc + + both_modules += brcompat +-- +1.7.7.3 +``` + diff --git a/CodingStyle b/CodingStyle deleted file mode 100644 index d1ef65b5a..000000000 --- a/CodingStyle +++ /dev/null @@ -1,569 +0,0 @@ - Open vSwitch Coding Style - ========================= - -This file describes the coding style used in most C files in the Open -vSwitch distribution. However, Linux kernel code datapath directory -follows the Linux kernel's established coding conventions. For the -Windows kernel datapath code, use the coding style described in -datapath-windows/CodingStyle. - -The following GNU indent options approximate this style: - - -npro -bad -bap -bbb -br -blf -brs -cdw -ce -fca -cli0 -npcs -i4 -l79 \ - -lc79 -nbfda -nut -saf -sai -saw -sbi4 -sc -sob -st -ncdb -pi4 -cs -bs \ - -di1 -lp -il0 -hnl - - -BASICS - - Limit lines to 79 characters. - - Use form feeds (control+L) to divide long source files into logical -pieces. A form feed should appear as the only character on a line. - - Do not use tabs for indentation. - - Avoid trailing spaces on lines. - - -NAMING - - Use names that explain the purpose of a function or object. - - Use underscores to separate words in an identifier: multi_word_name. - - Use lowercase for most names. 
Use uppercase for macros, macro -parameters, and members of enumerations. - - Give arrays names that are plural. - - Pick a unique name prefix (ending with an underscore) for each -module, and apply that prefix to all of that module's externally -visible names. Names of macro parameters, struct and union members, -and parameters in function prototypes are not considered externally -visible for this purpose. - - Do not use names that begin with _. If you need a name for -"internal use only", use __ as a suffix instead of a prefix. - - Avoid negative names: "found" is a better name than "not_found". - - In names, a "size" is a count of bytes, a "length" is a count of -characters. A buffer has size, but a string has length. The length -of a string does not include the null terminator, but the size of the -buffer that contains the string does. - - -COMMENTS - - Comments should be written as full sentences that start with a -capital letter and end with a period. Put two spaces between -sentences. - - Write block comments as shown below. You may put the /* and */ on -the same line as comment text if you prefer. - - /* - * We redirect stderr to /dev/null because we often want to remove all - * traffic control configuration on a port so its in a known state. If - * this done when there is no such configuration, tc complains, so we just - * always ignore it. - */ - - Each function and each variable declared outside a function, and -each struct, union, and typedef declaration should be preceded by a -comment. See FUNCTION DEFINITIONS below for function comment -guidelines. - - Each struct and union member should each have an inline comment that -explains its meaning. structs and unions with many members should be -additionally divided into logical groups of members by block comments, -e.g.: - - /* An event that will wake the following call to poll_block(). */ - struct poll_waiter { - /* Set when the waiter is created. */ - struct list node; /* Element in global waiters list. 
*/ - int fd; /* File descriptor. */ - short int events; /* Events to wait for (POLLIN, POLLOUT). */ - poll_fd_func *function; /* Callback function, if any, or null. */ - void *aux; /* Argument to callback function. */ - struct backtrace *backtrace; /* Event that created waiter, or null. */ - - /* Set only when poll_block() is called. */ - struct pollfd *pollfd; /* Pointer to element of the pollfds array - (null if added from a callback). */ - }; - - Use XXX or FIXME comments to mark code that needs work. - - Don't use // comments. - - Don't comment out or #if 0 out code. Just remove it. The code that -was there will still be in version control history. - - -FUNCTIONS - - Put the return type, function name, and the braces that surround the -function's code on separate lines, all starting in column 0. - - Before each function definition, write a comment that describes the -function's purpose, including each parameter, the return value, and -side effects. References to argument names should be given in -single-quotes, e.g. 'arg'. The comment should not include the -function name, nor need it follow any formal structure. The comment -does not need to describe how a function does its work, unless this -information is needed to use the function correctly (this is often -better done with comments *inside* the function). - - Simple static functions do not need a comment. - - Within a file, non-static functions should come first, in the order -that they are declared in the header file, followed by static -functions. Static functions should be in one or more separate pages -(separated by form feed characters) in logical groups. A commonly -useful way to divide groups is by "level", with high-level functions -first, followed by groups of progressively lower-level functions. -This makes it easy for the program's reader to see the top-down -structure by reading from top to bottom. - - All function declarations and definitions should include a -prototype. Empty parentheses, e.g. 
"int foo();", do not include a -prototype (they state that the function's parameters are unknown); -write "void" in parentheses instead, e.g. "int foo(void);". - - Prototypes for static functions should either all go at the top of -the file, separated into groups by blank lines, or they should appear -at the top of each page of functions. Don't comment individual -prototypes, but a comment on each group of prototypes is often -appropriate. - - In the absence of good reasons for another order, the following -parameter order is preferred. One notable exception is that data -parameters and their corresponding size parameters should be paired. - - 1. The primary object being manipulated, if any (equivalent to the - "this" pointer in C++). - 2. Input-only parameters. - 3. Input/output parameters. - 4. Output-only parameters. - 5. Status parameter. - - Example: - - /* Stores the features supported by 'netdev' into each of '*current', - * '*advertised', '*supported', and '*peer' that are non-null. Each value - * is a bitmap of "enum ofp_port_features" bits, in host byte order. - * Returns 0 if successful, otherwise a positive errno value. On failure, - * all of the passed-in values are set to 0. */ - int - netdev_get_features(struct netdev *netdev, - uint32_t *current, uint32_t *advertised, - uint32_t *supported, uint32_t *peer) - { - ... - } - -Functions that destroy an instance of a dynamically-allocated type -should accept and ignore a null pointer argument. Code that calls -such a function (including the C standard library function free()) -should omit a null-pointer check. We find that this usually makes -code easier to read. - -Functions in .c files should not normally be marked "inline", because -it does not usually help code generation and it does suppress -compilers warnings about unused functions. (Functions defined in .h -usually should be marked inline.) 
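The convention above, that a function destroying a dynamically allocated object should accept and ignore a null pointer, can be sketched as follows. This is only an illustration written in the style this file describes: `struct name_list` and the `name_list_*()` functions are hypothetical names invented for the example and are not part of Open vSwitch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* A hypothetical dynamically allocated type, used only for illustration. */
struct name_list {
    char **names;               /* Array of 'n' heap-allocated strings. */
    size_t n;                   /* Number of elements in 'names'. */
};

/* Creates and returns a new, empty name list.  Returns null if memory
 * allocation fails. */
struct name_list *
name_list_create(void)
{
    return calloc(1, sizeof(struct name_list));
}

/* Appends a copy of 'name' to 'list'.  Returns 0 if successful, otherwise a
 * positive errno value. */
int
name_list_add(struct name_list *list, const char *name)
{
    size_t size = strlen(name) + 1;
    char *copy = malloc(size);
    char **names;

    if (!copy) {
        return ENOMEM;
    }
    memcpy(copy, name, size);

    names = realloc(list->names, (list->n + 1) * sizeof *names);
    if (!names) {
        free(copy);
        return ENOMEM;
    }
    names[list->n++] = copy;
    list->names = names;
    return 0;
}

/* Destroys 'list' and everything it owns.  Does nothing if 'list' is null,
 * so callers do not need their own null-pointer check. */
void
name_list_destroy(struct name_list *list)
{
    if (list) {
        size_t i;

        for (i = 0; i < list->n; i++) {
            free(list->names[i]);
        }
        free(list->names);
        free(list);
    }
}
```

A caller can then write `name_list_destroy(list);` unconditionally on its cleanup path, in the same way that `free(NULL)` is permitted by the C standard.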
- - -FUNCTION PROTOTYPES - - Put the return type and function name on the same line in a function -prototype: - - static const struct option_class *get_option_class(int code); - - - Omit parameter names from function prototypes when the names do not -give useful information, e.g.: - - int netdev_get_mtu(const struct netdev *, int *mtup); - - -STATEMENTS - - Indent each level of code with 4 spaces. Use BSD-style brace -placement: - - if (a()) { - b(); - d(); - } - - Put a space between "if", "while", "for", etc. and the expressions -that follow them. - - Enclose single statements in braces: - - if (a > b) { - return a; - } else { - return b; - } - - Use comments and blank lines to divide long functions into logical -groups of statements. - - Avoid assignments inside "if" and "while" conditions. - - Do not put gratuitous parentheses around the expression in a return -statement, that is, write "return 0;" and not "return(0);" - - Write only one statement per line. - - Indent "switch" statements like this: - - switch (conn->state) { - case S_RECV: - error = run_connection_input(conn); - break; - - case S_PROCESS: - error = 0; - break; - - case S_SEND: - error = run_connection_output(conn); - break; - - default: - OVS_NOT_REACHED(); - } - - "switch" statements with very short, uniform cases may use an -abbreviated style: - - switch (code) { - case 200: return "OK"; - case 201: return "Created"; - case 202: return "Accepted"; - case 204: return "No Content"; - default: return "Unknown"; - } - - Use "for (;;)" to write an infinite loop. - - In an if/else construct where one branch is the "normal" or "common" -case and the other branch is the "uncommon" or "error" case, put the -common case after the "if", not the "else". This is a form of -documentation. It also places the most important code in sequential -order without forcing the reader to visually skip past less important -details. 
(Some compilers also assume that the "if" branch is the more -common case, so this can be a real form of optimization as well.) - - -RETURN VALUES - - For functions that return a success or failure indication, prefer -one of the following return value conventions: - - * An "int" where 0 indicates success and a positive errno value - indicates a reason for failure. - - * A "bool" where true indicates success and false indicates - failure. - - -MACROS - - Don't define an object-like macro if an enum can be used instead. - - Don't define a function-like macro if a "static inline" function -can be used instead. - - If a macro's definition contains multiple statements, enclose them -with "do { ... } while (0)" to allow them to work properly in all -syntactic circumstances. - - Do use macros to eliminate the need to update different parts of a -single file in parallel, e.g. a list of enums and an array that gives -the name of each enum. For example: - - /* Logging importance levels. */ - #define VLOG_LEVELS \ - VLOG_LEVEL(EMER, LOG_ALERT) \ - VLOG_LEVEL(ERR, LOG_ERR) \ - VLOG_LEVEL(WARN, LOG_WARNING) \ - VLOG_LEVEL(INFO, LOG_NOTICE) \ - VLOG_LEVEL(DBG, LOG_DEBUG) - enum vlog_level { - #define VLOG_LEVEL(NAME, SYSLOG_LEVEL) VLL_##NAME, - VLOG_LEVELS - #undef VLOG_LEVEL - VLL_N_LEVELS - }; - - /* Name for each logging level. */ - static const char *level_names[VLL_N_LEVELS] = { - #define VLOG_LEVEL(NAME, SYSLOG_LEVEL) #NAME, - VLOG_LEVELS - #undef VLOG_LEVEL - }; - - -THREAD SAFETY ANNOTATIONS - - Use the macros in lib/compiler.h to annotate locking requirements. -For example: - - static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER; - static struct ovs_rwlock rwlock = OVS_RWLOCK_INITIALIZER; - - void function_require_plain_mutex(void) OVS_REQUIRES(mutex); - void function_require_rwlock(void) OVS_REQ_RDLOCK(rwlock); - - Pass lock objects, not their addresses, to the annotation macros. -(Thus we have OVS_REQUIRES(mutex) above, not OVS_REQUIRES(&mutex).) 
- - -SOURCE FILES - - Each source file should state its license in a comment at the very -top, followed by a comment explaining the purpose of the code that is -in that file. The comment should explain how the code in the file -relates to code in other files. The goal is to allow a programmer to -quickly figure out where a given module fits into the larger system. - - The first non-comment line in a .c source file should be: - - #include <config.h> - -#include directives should appear in the following order: - - 1. #include <config.h> - - 2. The module's own headers, if any. Including this before any - other header (besides <config.h>) ensures that the module's - header file is self-contained (see HEADER FILES) below. - - 3. Standard C library headers and other system headers, preferably - in alphabetical order. (Occasionally one encounters a set of - system headers that must be included in a particular order, in - which case that order must take precedence.) - - 4. Open vSwitch headers, in alphabetical order. Use "", not <>, - to specify Open vSwitch header names. - - -HEADER FILES - - Each header file should start with its license, as described under -SOURCE FILES above, followed by a "header guard" to make the header -file idempotent, like so: - - #ifndef NETDEV_H - #define NETDEV_H 1 - - ... - - #endif /* netdev.h */ - - Header files should be self-contained; that is, they should #include -whatever additional headers are required, without requiring the client -to #include them for it. - - Don't define the members of a struct or union in a header file, -unless client code is actually intended to access them directly or if -the definition is otherwise actually needed (e.g. inline functions -defined in the header need them). - - Similarly, don't #include a header file just for the declaration of -a struct or union tag (e.g. just for "struct <name>;"). Just declare -the tag yourself. This reduces the number of header file -dependencies. - - -TYPES - - Use typedefs sparingly.
Code is clearer if the actual type is -visible at the point of declaration. Do not, in general, declare a -typedef for a struct, union, or enum. Do not declare a typedef for a -pointer type, because this can be very confusing to the reader. - - A function type is a good use for a typedef because it can clarify -code. The type should be a function type, not a pointer-to-function -type. That way, the typedef name can be used to declare function -prototypes. (It cannot be used for function definitions, because that -is explicitly prohibited by C89 and C99.) - - You may assume that "char" is exactly 8 bits and that "int" and -"long" are at least 32 bits. - - Don't assume that "long" is big enough to hold a pointer. If you -need to cast a pointer to an integer, use "intptr_t" or "uintptr_t" -from <stdint.h>. - - Use the int<N>_t and uint<N>_t types from <stdint.h> for exact-width -integer types. Use the PRId<N>, PRIu<N>, and PRIx<N> macros from -<inttypes.h> for formatting them with printf() and related functions. - - For compatibility with antique printf() implementations: - - - Instead of "%zu", use "%"PRIuSIZE. - - - Instead of "%td", use "%"PRIdPTR. - - - Instead of "%ju", use "%"PRIuMAX. - -Other variants exist for different radixes. For example, use -"%"PRIxSIZE instead of "%zx" or "%x" instead of "%hhx". - - Also, instead of "%hhd", use "%d". Be cautious substituting "%u", -"%x", and "%o" for the corresponding versions with "hh": cast the -argument to unsigned char if necessary, because printf("%hhu", -1) -prints 255 but printf("%u", -1) prints 4294967295. - - Use bit-fields sparingly. Do not use bit-fields for layout of -network protocol fields or in other circumstances where the exact -format is important. - - Declare bit-fields to be signed or unsigned integer types or _Bool -(aka bool). Do *not* declare bit-fields of type "int": C99 allows -these to be either signed or unsigned according to the compiler's -whim. (A 1-bit bit-field of type "int" may have a range of -1...0!)
- - Try to order structure members such that they pack well on a system -with 2-byte "short", 4-byte "int", and 4- or 8-byte "long" and pointer -types. Prefer clear organization over size optimization unless you -are convinced there is a size or speed benefit. - - Pointer declarators bind to the variable name, not the type name. -Write "int *x", not "int* x" and definitely not "int * x". - - -EXPRESSIONS - - Put one space on each side of infix binary and ternary operators: - - * / % - + - - << >> - < <= > >= - == != - & - ^ - | - && - || - ?: - = += -= *= /= %= &= ^= |= <<= >>= - - Avoid comma operators. - - Do not put any white space around postfix, prefix, or grouping -operators: - - () [] -> . - ! ~ ++ -- + - * & - -Exception 1: Put a space after (but not before) the "sizeof" keyword. -Exception 2: Put a space between the () used in a cast and the -expression whose type is cast: (void *) 0. - - Break long lines before the ternary operators ? and :, rather than -after them, e.g. - - return (out_port != VIGP_CONTROL_PATH - ? alpheus_output_port(dp, skb, out_port) - : alpheus_output_control(dp, skb, fwd_save_skb(skb), - VIGR_ACTION)); - - - Do not parenthesize the operands of && and || unless operator -precedence makes it necessary, or unless the operands are themselves -expressions that use && and ||. Thus: - - if (!isdigit((unsigned char)s[0]) - || !isdigit((unsigned char)s[1]) - || !isdigit((unsigned char)s[2])) { - printf("string %s does not start with 3-digit code\n", s); - } - -but - - if (rule && (!best || rule->priority > best->priority)) { - best = rule; - } - - Do parenthesize a subexpression that must be split across more than -one line, e.g.: - - *idxp = ((l1_idx << PORT_ARRAY_L1_SHIFT) - | (l2_idx << PORT_ARRAY_L2_SHIFT) - | (l3_idx << PORT_ARRAY_L3_SHIFT)); - - Try to avoid casts. Don't cast the return value of malloc(). 
- - The "sizeof" operator is unique among C operators in that it accepts -two very different kinds of operands: an expression or a type. In -general, prefer to specify an expression, e.g. "int *x = -xmalloc(sizeof *x);". When the operand of sizeof is an expression, -there is no need to parenthesize that operand, and please don't. - - Use the ARRAY_SIZE macro from lib/util.h to calculate the number of -elements in an array. - - When using a relational operator like "<" or "==", put an expression -or variable argument on the left and a constant argument on the -right, e.g. "x == 0", *not* "0 == x". - - -BLANK LINES - - Put one blank line between top-level definitions of functions and -global variables. - - -C DIALECT - - Most C99 features are OK because they are widely implemented: - - * Flexible array members (e.g. struct { int foo[]; }). - - * "static inline" functions (but no other forms of "inline", for - which GCC and C99 have differing interpretations). - - * "long long" - - * <stdint.h> and <inttypes.h>. - - * bool and <stdbool.h>, but don't assume that bool or _Bool can - only take on the values 0 or 1, because this behavior can't be - simulated on C89 compilers. - Also, don't assume that a conversion to bool or _Bool follows - C99 semantics. I.e. use "(bool)(some_value != 0)" rather than - "(bool)some_value". The latter might produce unexpected results - on non-C99 environments. For example, if bool is implemented as - a typedef of char and some_value = 0x10000000. - - * Designated initializers (e.g. "struct foo foo = {.a = 1};" and - "int a[] = {[2] = 5};"). - - * Mixing of declarations and code within a block. Please use this - judiciously; keep declarations nicely grouped together in the - beginning of a block if possible. - - * Use of declarations in iteration statements (e.g. - "for (int i = 0; i < 10; i++)"). - - * Use of a trailing comma in an enum declaration (e.g. - "enum { x = 1, };"). - - As a matter of style, avoid // comments.
- - Avoid using GCC or Clang extensions unless you also add a fallback -for other compilers. You can, however, use C99 features or GCC -extensions also supported by Clang in code that compiles only on -GNU/Linux (such as lib/netdev-linux.c), because GCC is the system -compiler there. diff --git a/CodingStyle.md b/CodingStyle.md new file mode 100644 index 000000000..b69b5d61e --- /dev/null +++ b/CodingStyle.md @@ -0,0 +1,571 @@ +Open vSwitch Coding Style +========================= + +This file describes the coding style used in most C files in the Open +vSwitch distribution. However, Linux kernel code in the datapath directory +follows the Linux kernel's established coding conventions. For the +Windows kernel datapath code, use the coding style described in +datapath-windows/CodingStyle. + +The following GNU indent options approximate this style: + + -npro -bad -bap -bbb -br -blf -brs -cdw -ce -fca -cli0 -npcs -i4 -l79 \ + -lc79 -nbfda -nut -saf -sai -saw -sbi4 -sc -sob -st -ncdb -pi4 -cs -bs \ + -di1 -lp -il0 -hnl + + +## BASICS + + Limit lines to 79 characters. + + Use form feeds (control+L) to divide long source files into logical +pieces. A form feed should appear as the only character on a line. + + Do not use tabs for indentation. + + Avoid trailing spaces on lines. + + +## NAMING + + - Use names that explain the purpose of a function or object. + + - Use underscores to separate words in an identifier: multi_word_name. + + - Use lowercase for most names. Use uppercase for macros, macro + parameters, and members of enumerations. + + - Give arrays names that are plural. + + - Pick a unique name prefix (ending with an underscore) for each + module, and apply that prefix to all of that module's externally + visible names. Names of macro parameters, struct and union members, + and parameters in function prototypes are not considered externally + visible for this purpose. + + - Do not use names that begin with _.
If you need a name for + "internal use only", use __ as a suffix instead of a prefix. + + - Avoid negative names: "found" is a better name than "not_found". + + - In names, a "size" is a count of bytes, a "length" is a count of + characters. A buffer has size, but a string has length. The length + of a string does not include the null terminator, but the size of the + buffer that contains the string does. + + +## COMMENTS + + Comments should be written as full sentences that start with a +capital letter and end with a period. Put two spaces between +sentences. + + Write block comments as shown below. You may put the /* and */ on +the same line as comment text if you prefer. + + /* + * We redirect stderr to /dev/null because we often want to remove all + * traffic control configuration on a port so it's in a known state. If + * this is done when there is no such configuration, tc complains, so we just + * always ignore it. + */ + + Each function and each variable declared outside a function, and +each struct, union, and typedef declaration should be preceded by a +comment. See FUNCTIONS below for function comment +guidelines. + + Each struct and union member should have an inline comment that +explains its meaning. structs and unions with many members should be +additionally divided into logical groups of members by block comments, +e.g.: + + /* An event that will wake the following call to poll_block(). */ + struct poll_waiter { + /* Set when the waiter is created. */ + struct list node; /* Element in global waiters list. */ + int fd; /* File descriptor. */ + short int events; /* Events to wait for (POLLIN, POLLOUT). */ + poll_fd_func *function; /* Callback function, if any, or null. */ + void *aux; /* Argument to callback function. */ + struct backtrace *backtrace; /* Event that created waiter, or null. */ + + /* Set only when poll_block() is called.
*/ + struct pollfd *pollfd; /* Pointer to element of the pollfds array + (null if added from a callback). */ + }; + + Use XXX or FIXME comments to mark code that needs work. + + Don't use `//` comments. + + Don't comment out or #if 0 out code. Just remove it. The code that +was there will still be in version control history. + + +## FUNCTIONS + + Put the return type, function name, and the braces that surround the +function's code on separate lines, all starting in column 0. + + Before each function definition, write a comment that describes the +function's purpose, including each parameter, the return value, and +side effects. References to argument names should be given in +single-quotes, e.g. 'arg'. The comment should not include the +function name, nor need it follow any formal structure. The comment +does not need to describe how a function does its work, unless this +information is needed to use the function correctly (this is often +better done with comments *inside* the function). + + Simple static functions do not need a comment. + + Within a file, non-static functions should come first, in the order +that they are declared in the header file, followed by static +functions. Static functions should be in one or more separate pages +(separated by form feed characters) in logical groups. A commonly +useful way to divide groups is by "level", with high-level functions +first, followed by groups of progressively lower-level functions. +This makes it easy for the program's reader to see the top-down +structure by reading from top to bottom. + + All function declarations and definitions should include a +prototype. Empty parentheses, e.g. "int foo();", do not include a +prototype (they state that the function's parameters are unknown); +write "void" in parentheses instead, e.g. "int foo(void);". 
+ + Prototypes for static functions should either all go at the top of +the file, separated into groups by blank lines, or they should appear +at the top of each page of functions. Don't comment individual +prototypes, but a comment on each group of prototypes is often +appropriate. + + In the absence of good reasons for another order, the following +parameter order is preferred. One notable exception is that data +parameters and their corresponding size parameters should be paired. + + 1. The primary object being manipulated, if any (equivalent to the + "this" pointer in C++). + 2. Input-only parameters. + 3. Input/output parameters. + 4. Output-only parameters. + 5. Status parameter. + + Example: + + ``` + /* Stores the features supported by 'netdev' into each of '*current', + * '*advertised', '*supported', and '*peer' that are non-null. Each value + * is a bitmap of "enum ofp_port_features" bits, in host byte order. + * Returns 0 if successful, otherwise a positive errno value. On failure, + * all of the passed-in values are set to 0. */ + int + netdev_get_features(struct netdev *netdev, + uint32_t *current, uint32_t *advertised, + uint32_t *supported, uint32_t *peer) + { + ... + } + ``` + +Functions that destroy an instance of a dynamically-allocated type +should accept and ignore a null pointer argument. Code that calls +such a function (including the C standard library function free()) +should omit a null-pointer check. We find that this usually makes +code easier to read. + +Functions in .c files should not normally be marked "inline", because +it does not usually help code generation and it does suppress +compiler warnings about unused functions. (Functions defined in .h +usually should be marked inline.)
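As a hedged illustration of the destroy-function rule above, the sketch below uses a hypothetical `struct buffer` with `buffer_create()` and `buffer_destroy()`. None of these names come from Open vSwitch; they are assumptions made only to show a destructor that accepts and ignores a null pointer, so that callers need no check of their own.

```c
#include <stdlib.h>

/* Hypothetical type for illustration only; not part of Open vSwitch. */
struct buffer {
    char *data;                 /* Heap-allocated storage. */
    size_t size;                /* Number of bytes requested for 'data'. */
};

/* Returns a new buffer with room for 'size' bytes, or NULL if allocation
 * fails.  The caller must eventually pass the result to buffer_destroy(). */
static struct buffer *
buffer_create(size_t size)
{
    struct buffer *b = malloc(sizeof *b);

    if (!b) {
        return NULL;
    }

    /* Allocate at least one byte so that 'data' is never null on success. */
    b->data = malloc(size ? size : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->size = size;
    return b;
}

/* Frees 'b' and its storage.  Does nothing if 'b' is null. */
static void
buffer_destroy(struct buffer *b)
{
    if (b) {
        free(b->data);
        free(b);
    }
}
```

A caller can then write `buffer_destroy(b);` on every cleanup path without first testing `b`, which is the readability benefit the guideline describes.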
+ + +## FUNCTION PROTOTYPES + + Put the return type and function name on the same line in a function +prototype: + + static const struct option_class *get_option_class(int code); + + + Omit parameter names from function prototypes when the names do not +give useful information, e.g.: + + int netdev_get_mtu(const struct netdev *, int *mtup); + + +## STATEMENTS + + Indent each level of code with 4 spaces. Use BSD-style brace +placement: + + if (a()) { + b(); + d(); + } + + Put a space between "if", "while", "for", etc. and the expressions +that follow them. + + Enclose single statements in braces: + + if (a > b) { + return a; + } else { + return b; + } + + Use comments and blank lines to divide long functions into logical +groups of statements. + + Avoid assignments inside "if" and "while" conditions. + + Do not put gratuitous parentheses around the expression in a return +statement, that is, write "return 0;" and not "return(0);" + + Write only one statement per line. + + Indent "switch" statements like this: + + switch (conn->state) { + case S_RECV: + error = run_connection_input(conn); + break; + + case S_PROCESS: + error = 0; + break; + + case S_SEND: + error = run_connection_output(conn); + break; + + default: + OVS_NOT_REACHED(); + } + + "switch" statements with very short, uniform cases may use an +abbreviated style: + + switch (code) { + case 200: return "OK"; + case 201: return "Created"; + case 202: return "Accepted"; + case 204: return "No Content"; + default: return "Unknown"; + } + + Use "for (;;)" to write an infinite loop. + + In an if/else construct where one branch is the "normal" or "common" +case and the other branch is the "uncommon" or "error" case, put the +common case after the "if", not the "else". This is a form of +documentation. It also places the most important code in sequential +order without forcing the reader to visually skip past less important +details. 
(Some compilers also assume that the "if" branch is the more +common case, so this can be a real form of optimization as well.) + + +## RETURN VALUES + + For functions that return a success or failure indication, prefer +one of the following return value conventions: + +* An "int" where 0 indicates success and a positive errno value + indicates a reason for failure. + +* A "bool" where true indicates success and false indicates + failure. + + +## MACROS + + Don't define an object-like macro if an enum can be used instead. + + Don't define a function-like macro if a "static inline" function +can be used instead. + + If a macro's definition contains multiple statements, enclose them +with "do { ... } while (0)" to allow them to work properly in all +syntactic circumstances. + + Do use macros to eliminate the need to update different parts of a +single file in parallel, e.g. a list of enums and an array that gives +the name of each enum. For example: + + /* Logging importance levels. */ + #define VLOG_LEVELS \ + VLOG_LEVEL(EMER, LOG_ALERT) \ + VLOG_LEVEL(ERR, LOG_ERR) \ + VLOG_LEVEL(WARN, LOG_WARNING) \ + VLOG_LEVEL(INFO, LOG_NOTICE) \ + VLOG_LEVEL(DBG, LOG_DEBUG) + enum vlog_level { + #define VLOG_LEVEL(NAME, SYSLOG_LEVEL) VLL_##NAME, + VLOG_LEVELS + #undef VLOG_LEVEL + VLL_N_LEVELS + }; + + /* Name for each logging level. */ + static const char *level_names[VLL_N_LEVELS] = { + #define VLOG_LEVEL(NAME, SYSLOG_LEVEL) #NAME, + VLOG_LEVELS + #undef VLOG_LEVEL + }; + + +## THREAD SAFETY ANNOTATIONS + + Use the macros in lib/compiler.h to annotate locking requirements. +For example: + + static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER; + static struct ovs_rwlock rwlock = OVS_RWLOCK_INITIALIZER; + + void function_require_plain_mutex(void) OVS_REQUIRES(mutex); + void function_require_rwlock(void) OVS_REQ_RDLOCK(rwlock); + + Pass lock objects, not their addresses, to the annotation macros. +(Thus we have OVS_REQUIRES(mutex) above, not OVS_REQUIRES(&mutex).) 
+ + +## SOURCE FILES + + Each source file should state its license in a comment at the very +top, followed by a comment explaining the purpose of the code that is +in that file. The comment should explain how the code in the file +relates to code in other files. The goal is to allow a programmer to +quickly figure out where a given module fits into the larger system. + + The first non-comment line in a .c source file should be: + + #include <config.h> + +`#include` directives should appear in the following order: + +1. `#include <config.h>` + +2. The module's own headers, if any. Including this before any + other header (besides <config.h>) ensures that the module's + header file is self-contained (see HEADER FILES) below. + +3. Standard C library headers and other system headers, preferably + in alphabetical order. (Occasionally one encounters a set of + system headers that must be included in a particular order, in + which case that order must take precedence.) + +4. Open vSwitch headers, in alphabetical order. Use "", not <>, + to specify Open vSwitch header names. + + +## HEADER FILES + + Each header file should start with its license, as described under +SOURCE FILES above, followed by a "header guard" to make the header +file idempotent, like so: + + #ifndef NETDEV_H + #define NETDEV_H 1 + + ... + + #endif /* netdev.h */ + + Header files should be self-contained; that is, they should #include +whatever additional headers are required, without requiring the client +to #include them for it. + + Don't define the members of a struct or union in a header file, +unless client code is actually intended to access them directly or if +the definition is otherwise actually needed (e.g. inline functions +defined in the header need them). + + Similarly, don't #include a header file just for the declaration of +a struct or union tag (e.g. just for "struct <name>;"). Just declare +the tag yourself. This reduces the number of header file +dependencies. + + +## TYPES + + Use typedefs sparingly.
Code is clearer if the actual type is +visible at the point of declaration. Do not, in general, declare a +typedef for a struct, union, or enum. Do not declare a typedef for a +pointer type, because this can be very confusing to the reader. + + A function type is a good use for a typedef because it can clarify +code. The type should be a function type, not a pointer-to-function +type. That way, the typedef name can be used to declare function +prototypes. (It cannot be used for function definitions, because that +is explicitly prohibited by C89 and C99.) + + You may assume that "char" is exactly 8 bits and that "int" and +"long" are at least 32 bits. + + Don't assume that "long" is big enough to hold a pointer. If you +need to cast a pointer to an integer, use "intptr_t" or "uintptr_t" +from <stdint.h>. + + Use the int<N>_t and uint<N>_t types from <stdint.h> for exact-width +integer types. Use the PRId<N>, PRIu<N>, and PRIx<N> macros from +<inttypes.h> for formatting them with printf() and related functions. + + For compatibility with antique printf() implementations: + + - Instead of "%zu", use "%"PRIuSIZE. + + - Instead of "%td", use "%"PRIdPTR. + + - Instead of "%ju", use "%"PRIuMAX. + +Other variants exist for different radixes. For example, use +"%"PRIxSIZE instead of "%zx" or "%x" instead of "%hhx". + + Also, instead of "%hhd", use "%d". Be cautious substituting "%u", +"%x", and "%o" for the corresponding versions with "hh": cast the +argument to unsigned char if necessary, because printf("%hhu", -1) +prints 255 but printf("%u", -1) prints 4294967295. + + Use bit-fields sparingly. Do not use bit-fields for layout of +network protocol fields or in other circumstances where the exact +format is important. + + Declare bit-fields to be signed or unsigned integer types or _Bool +(aka bool). Do *not* declare bit-fields of type "int": C99 allows +these to be either signed or unsigned according to the compiler's +whim. (A 1-bit bit-field of type "int" may have a range of -1...0!)
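The "hh" substitution caveat and the bit-field rule above can be sketched as follows. This is a minimal illustration, not Open vSwitch code: `format_low_byte()` and `struct port_flags` are hypothetical names. The helper casts its argument to unsigned char before printing with "%u", and the struct declares its 1-bit fields as unsigned so that each can only hold 0 or 1.

```c
#include <stdio.h>

/* Hypothetical helper for illustration only.  Formats the low byte of
 * 'value' into 'buf' (of 'size' bytes) as an unsigned decimal number, giving
 * the result "%hhu" would give without relying on "hh" support. */
static void
format_low_byte(char *buf, size_t size, int value)
{
    snprintf(buf, size, "%u", (unsigned int) (unsigned char) value);
}

/* Hypothetical flags struct.  The bit-fields are explicitly unsigned, so a
 * 1-bit field has the range 0...1 on every compiler. */
struct port_flags {
    unsigned int up : 1;        /* Nonzero if the port is up. */
    unsigned int learning : 1;  /* Nonzero if MAC learning is enabled. */
};
```

With 8-bit "char", which the guidelines allow code to assume, passing -1 to the helper yields the string "255" rather than "4294967295".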
+ + Try to order structure members such that they pack well on a system +with 2-byte "short", 4-byte "int", and 4- or 8-byte "long" and pointer +types. Prefer clear organization over size optimization unless you +are convinced there is a size or speed benefit. + + Pointer declarators bind to the variable name, not the type name. +Write "int *x", not "int* x" and definitely not "int * x". + + +## EXPRESSIONS + + Put one space on each side of infix binary and ternary operators: + + * / % + + - + << >> + < <= > >= + == != + & + ^ + | + && + || + ?: + = += -= *= /= %= &= ^= |= <<= >>= + + Avoid comma operators. + + Do not put any white space around postfix, prefix, or grouping +operators: + + () [] -> . + ! ~ ++ -- + - * & + +Exception 1: Put a space after (but not before) the "sizeof" keyword. +Exception 2: Put a space between the () used in a cast and the +expression whose type is cast: (void *) 0. + + Break long lines before the ternary operators ? and :, rather than +after them, e.g. + + return (out_port != VIGP_CONTROL_PATH + ? alpheus_output_port(dp, skb, out_port) + : alpheus_output_control(dp, skb, fwd_save_skb(skb), + VIGR_ACTION)); + + + Do not parenthesize the operands of && and || unless operator +precedence makes it necessary, or unless the operands are themselves +expressions that use && and ||. Thus: + + if (!isdigit((unsigned char)s[0]) + || !isdigit((unsigned char)s[1]) + || !isdigit((unsigned char)s[2])) { + printf("string %s does not start with 3-digit code\n", s); + } + +but + + if (rule && (!best || rule->priority > best->priority)) { + best = rule; + } + + Do parenthesize a subexpression that must be split across more than +one line, e.g.: + + *idxp = ((l1_idx << PORT_ARRAY_L1_SHIFT) + | (l2_idx << PORT_ARRAY_L2_SHIFT) + | (l3_idx << PORT_ARRAY_L3_SHIFT)); + + Try to avoid casts. Don't cast the return value of malloc(). 
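A short sketch of the no-cast rule for malloc() follows. The function `int_array_filled()` is a hypothetical name introduced only for this example. In C a `void *` converts implicitly to any object pointer type, so the cast adds nothing, and omitting it keeps the call site free of a type name that could drift out of date.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper for illustration only.  Returns a heap-allocated array
 * of 'n' ints, each set to 'value', or NULL if 'n' is 0, the size would
 * overflow, or allocation fails.  The caller releases it with free(). */
static int *
int_array_filled(size_t n, int value)
{
    int *xs;
    size_t i;

    if (!n || n > SIZE_MAX / sizeof *xs) {
        return NULL;
    }

    /* No cast: malloc() returns "void *", which converts implicitly. */
    xs = malloc(n * sizeof *xs);
    if (!xs) {
        return NULL;
    }

    for (i = 0; i < n; i++) {
        xs[i] = value;
    }
    return xs;
}
```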
+ + The "sizeof" operator is unique among C operators in that it accepts +two very different kinds of operands: an expression or a type. In +general, prefer to specify an expression, e.g. "int *x = +xmalloc(sizeof *x);". When the operand of sizeof is an expression, +there is no need to parenthesize that operand, and please don't. + + Use the ARRAY_SIZE macro from lib/util.h to calculate the number of +elements in an array. + + When using a relational operator like "<" or "==", put an expression +or variable argument on the left and a constant argument on the +right, e.g. "x == 0", *not* "0 == x". + + +## BLANK LINES + + Put one blank line between top-level definitions of functions and +global variables. + + +## C DIALECT + + Most C99 features are OK because they are widely implemented: + + * Flexible array members (e.g. struct { int foo[]; }). + + * "static inline" functions (but no other forms of "inline", for + which GCC and C99 have differing interpretations). + + * "long long" + + * <stdint.h> and <inttypes.h>. + + * bool and <stdbool.h>, but don't assume that bool or _Bool can + only take on the values 0 or 1, because this behavior can't be + simulated on C89 compilers. + Also, don't assume that a conversion to bool or _Bool follows + C99 semantics. I.e. use "(bool)(some_value != 0)" rather than + "(bool)some_value". The latter might produce unexpected results + on non-C99 environments. For example, if bool is implemented as + a typedef of char and some_value = 0x10000000. + + * Designated initializers (e.g. "struct foo foo = {.a = 1};" and + "int a[] = {[2] = 5};"). + + * Mixing of declarations and code within a block. Please use this + judiciously; keep declarations nicely grouped together in the + beginning of a block if possible. + + * Use of declarations in iteration statements (e.g. + "for (int i = 0; i < 10; i++)"). + + * Use of a trailing comma in an enum declaration (e.g. + "enum { x = 1, };"). + + As a matter of style, avoid // comments.
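The bool-conversion advice in the list above can be sketched as below. `flags_test()` is a hypothetical helper, not an Open vSwitch function. It compares against zero explicitly, so the result is correct even where bool is emulated by a narrow integer type and a plain conversion of 0x10000000 would truncate to 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for illustration only.  Returns true if any bit in
 * 'mask' is set in 'flags'.  The explicit "!= 0" produces 0 or 1 before any
 * conversion to bool, so the result does not depend on C99 _Bool semantics. */
static bool
flags_test(uint32_t flags, uint32_t mask)
{
    return (flags & mask) != 0;
}
```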
+ + Avoid using GCC or Clang extensions unless you also add a fallback +for other compilers. You can, however, use C99 features or GCC +extensions also supported by Clang in code that compiles only on +GNU/Linux (such as lib/netdev-linux.c), because GCC is the system +compiler there. diff --git a/DESIGN b/DESIGN deleted file mode 100644 index f864135a8..000000000 --- a/DESIGN +++ /dev/null @@ -1,933 +0,0 @@ - Design Decisions In Open vSwitch - ================================ - -This document describes design decisions that went into implementing -Open vSwitch. While we believe these to be reasonable decisions, it is -impossible to predict how Open vSwitch will be used in all environments. -Understanding assumptions made by Open vSwitch is critical to a -successful deployment. The end of this document contains contact -information that can be used to let us know how we can make Open vSwitch -more generally useful. - -Asynchronous Messages -===================== - -Over time, Open vSwitch has added many knobs that control whether a -given controller receives OpenFlow asynchronous messages. This -section describes how all of these features interact. - -First, a service controller never receives any asynchronous messages -unless it changes its miss_send_len from the service controller -default of zero in one of the following ways: - - - Sending an OFPT_SET_CONFIG message with nonzero miss_send_len. - - - Sending any NXT_SET_ASYNC_CONFIG message: as a side effect, this - message changes the miss_send_len to - OFP_DEFAULT_MISS_SEND_LEN (128) for service controllers. - -Second, OFPT_FLOW_REMOVED and NXT_FLOW_REMOVED messages are generated -only if the flow that was removed had the OFPFF_SEND_FLOW_REM flag -set. 
- -Third, OFPT_PACKET_IN and NXT_PACKET_IN messages are sent only to -OpenFlow controller connections that have the correct connection ID -(see "struct nx_controller_id" and "struct nx_action_controller"): - - - For packet-in messages generated by a NXAST_CONTROLLER action, - the controller ID specified in the action. - - - For other packet-in messages, controller ID zero. (This is the - default ID when an OpenFlow controller does not configure one.) - -Finally, Open vSwitch consults a per-connection table indexed by the -message type, reason code, and current role. The following table -shows how this table is initialized by default when an OpenFlow -connection is made. An entry labeled "yes" means that the message is -sent, an entry labeled "---" means that the message is suppressed. - - master/ - message and reason code other slave - ---------------------------------------- ------- ----- - OFPT_PACKET_IN / NXT_PACKET_IN - OFPR_NO_MATCH yes --- - OFPR_ACTION yes --- - OFPR_INVALID_TTL --- --- - - OFPT_FLOW_REMOVED / NXT_FLOW_REMOVED - OFPRR_IDLE_TIMEOUT yes --- - OFPRR_HARD_TIMEOUT yes --- - OFPRR_DELETE yes --- - - OFPT_PORT_STATUS - OFPPR_ADD yes yes - OFPPR_DELETE yes yes - OFPPR_MODIFY yes yes - -The NXT_SET_ASYNC_CONFIG message directly sets all of the values in -this table for the current connection. The -OFPC_INVALID_TTL_TO_CONTROLLER bit in the OFPT_SET_CONFIG message -controls the setting for OFPR_INVALID_TTL for the "master" role. - - -OFPAT_ENQUEUE -============= - -The OpenFlow 1.0 specification requires the output port of the OFPAT_ENQUEUE -action to "refer to a valid physical port (i.e. < OFPP_MAX) or OFPP_IN_PORT". -Although OFPP_LOCAL is not less than OFPP_MAX, it is an 'internal' port which -can have QoS applied to it in Linux. Since we allow the OFPAT_ENQUEUE to apply -to 'internal' ports whose port numbers are less than OFPP_MAX, we interpret -OFPP_LOCAL as a physical port and support OFPAT_ENQUEUE on it as well. 
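The OFPAT_ENQUEUE decision described above can be sketched as a single predicate. This is an illustrative assumption about how the rule could be expressed, not the actual Open vSwitch implementation: the function name `enqueue_port_is_acceptable()` is hypothetical, and the port constants are the OpenFlow 1.0 values, restated locally so the example is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reserved port numbers as defined by OpenFlow 1.0, restated here for the
 * sake of a self-contained example. */
enum {
    OFPP_MAX = 0xff00,          /* Upper bound on physical port numbers. */
    OFPP_IN_PORT = 0xfff8,      /* Send back out the input port. */
    OFPP_LOCAL = 0xfffe         /* Local "internal" port. */
};

/* Hypothetical predicate for illustration only.  Returns true if 'port' may
 * be the output port of an OFPAT_ENQUEUE action under the interpretation
 * described in the text: physical ports, OFPP_IN_PORT, and also OFPP_LOCAL,
 * which is treated as a physical port because QoS can apply to it. */
static bool
enqueue_port_is_acceptable(uint16_t port)
{
    return port < OFPP_MAX || port == OFPP_IN_PORT || port == OFPP_LOCAL;
}
```

Other reserved ports, such as the controller port (0xfffd in OpenFlow 1.0), fall outside all three cases and are rejected by this sketch.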
- - -OFPT_FLOW_MOD -============= - -The OpenFlow specification for the behavior of OFPT_FLOW_MOD is -confusing. The following tables summarize the Open vSwitch -implementation of its behavior in the following categories: - - - "match on priority": Whether the flow_mod acts only on flows - whose priority matches that included in the flow_mod message. - - - "match on out_port": Whether the flow_mod acts only on flows - that output to the out_port included in the flow_mod message (if - out_port is not OFPP_NONE). OpenFlow 1.1 and later have a - similar feature (not listed separately here) for out_group. - - - "match on flow_cookie": Whether the flow_mod acts only on flows - whose flow_cookie matches an optional controller-specified value - and mask. - - - "updates flow_cookie": Whether the flow_mod changes the - flow_cookie of the flow or flows that it matches to the - flow_cookie included in the flow_mod message. - - - "updates OFPFF_ flags": Whether the flow_mod changes the - OFPFF_SEND_FLOW_REM flag of the flow or flows that it matches to - the setting included in the flags of the flow_mod message. - - - "honors OFPFF_CHECK_OVERLAP": Whether the OFPFF_CHECK_OVERLAP - flag in the flow_mod is significant. - - - "updates idle_timeout" and "updates hard_timeout": Whether the - idle_timeout and hard_timeout in the flow_mod, respectively, - have an effect on the flow or flows matched by the flow_mod. - - - "updates idle timer": Whether the flow_mod resets the per-flow - timer that measures how long a flow has been idle. - - - "updates hard timer": Whether the flow_mod resets the per-flow - timer that measures how long it has been since a flow was - modified. - - - "zeros counters": Whether the flow_mod resets per-flow packet - and byte counters to zero. - - - "may add a new flow": Whether the flow_mod may add a new flow to - the flow table. 
(Obviously this is always true for "add" - commands but in some OpenFlow versions "modify" and - "modify-strict" can also add new flows.) - - - "sends flow_removed message": Whether the flow_mod generates a - flow_removed message for the flow or flows that it affects. - -An entry labeled "yes" means that the flow mod type does have the -indicated behavior, "---" means that it does not, an empty cell means -that the property is not applicable, and other values are explained -below the table. - -OpenFlow 1.0 ------------- - - MODIFY DELETE - ADD MODIFY STRICT DELETE STRICT - === ====== ====== ====== ====== -match on priority yes --- yes --- yes -match on out_port --- --- --- yes yes -match on flow_cookie --- --- --- --- --- -match on table_id --- --- --- --- --- -controller chooses table_id --- --- --- -updates flow_cookie yes yes yes -updates OFPFF_SEND_FLOW_REM yes + + -honors OFPFF_CHECK_OVERLAP yes + + -updates idle_timeout yes + + -updates hard_timeout yes + + -resets idle timer yes + + -resets hard timer yes yes yes -zeros counters yes + + -may add a new flow yes yes yes -sends flow_removed message --- --- --- % % - -(+) "modify" and "modify-strict" only take these actions when they - create a new flow, not when they update an existing flow. - -(%) "delete" and "delete-strict" generate a flow_removed message if - the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set. - (Each controller can separately control whether it wants to - receive the generated messages.) - -OpenFlow 1.1 ------------- - -OpenFlow 1.1 makes these changes: - - - The controller now must specify the table_id of the flow match - searched and into which a flow may be inserted. Behavior for a - table_id of 255 is undefined. - - - A flow_mod, except an "add", can now match on the flow_cookie. - - - When a flow_mod matches on the flow_cookie, "modify" and - "modify-strict" never insert a new flow.
- - MODIFY DELETE - ADD MODIFY STRICT DELETE STRICT - === ====== ====== ====== ====== -match on priority yes --- yes --- yes -match on out_port --- --- --- yes yes -match on flow_cookie --- yes yes yes yes -match on table_id yes yes yes yes yes -controller chooses table_id yes yes yes -updates flow_cookie yes --- --- -updates OFPFF_SEND_FLOW_REM yes + + -honors OFPFF_CHECK_OVERLAP yes + + -updates idle_timeout yes + + -updates hard_timeout yes + + -resets idle timer yes + + -resets hard timer yes yes yes -zeros counters yes + + -may add a new flow yes # # -sends flow_removed message --- --- --- % % - -(+) "modify" and "modify-strict" only take these actions when they - create a new flow, not when they update an existing flow. - -(%) "delete" and "delete-strict" generate a flow_removed message if - the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set. - (Each controller can separately control whether it wants to - receive the generated messages.) - -(#) "modify" and "modify-strict" only add a new flow if the flow_mod - does not match on any bits of the flow cookie. - -OpenFlow 1.2 ------------- - -OpenFlow 1.2 makes these changes: - - - Only "add" commands ever add flows; "modify" and "modify-strict" - never do. - - - A new flag OFPFF_RESET_COUNTS now controls whether "modify" and - "modify-strict" reset counters, whereas previously they never - reset counters (except when they inserted a new flow).
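The "may add a new flow" row for "modify" and "modify-strict" across OpenFlow 1.0, 1.1, and 1.2 can be summarized in a few lines of C. This is only a sketch of the rule as stated in this document: the enum and the function name `modify_may_add_flow` are hypothetical and do not correspond to OVS identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version tags for this sketch; not OVS identifiers. */
enum of_version { OF10, OF11, OF12_OR_LATER };

/* Returns true if a "modify" or "modify-strict" flow_mod that matches no
 * existing flow may insert a new one.  'cookie_mask' is the mask used to
 * match on the flow cookie; zero means the cookie is not matched at all. */
static bool
modify_may_add_flow(enum of_version version, uint64_t cookie_mask)
{
    switch (version) {
    case OF10:
        /* OF1.0 cannot match on the cookie; modify always may add. */
        return true;
    case OF11:
        /* OF1.1 adds only if no cookie bits are matched. */
        return cookie_mask == 0;
    case OF12_OR_LATER:
    default:
        /* OF1.2 and later: only "add" commands ever add flows. */
        return false;
    }
}
```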
- - MODIFY DELETE - ADD MODIFY STRICT DELETE STRICT - === ====== ====== ====== ====== -match on priority yes --- yes --- yes -match on out_port --- --- --- yes yes -match on flow_cookie --- yes yes yes yes -match on table_id yes yes yes yes yes -controller chooses table_id yes yes yes -updates flow_cookie yes --- --- -updates OFPFF_SEND_FLOW_REM yes --- --- -honors OFPFF_CHECK_OVERLAP yes --- --- -updates idle_timeout yes --- --- -updates hard_timeout yes --- --- -resets idle timer yes --- --- -resets hard timer yes yes yes -zeros counters yes & & -may add a new flow yes --- --- -sends flow_removed message --- --- --- % % - -(%) "delete" and "delete-strict" generate a flow_removed message if - the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set. - (Each controller can separately control whether it wants to - receive the generated messages.) - -(&) "modify" and "modify-strict" reset counters if the - OFPFF_RESET_COUNTS flag is specified. - -OpenFlow 1.3 ------------- - -OpenFlow 1.3 makes these changes: - - - Behavior for a table_id of 255 is now defined, for "delete" and - "delete-strict" commands, as meaning to delete from all tables. - A table_id of 255 is now explicitly invalid for other commands. - - - New flags OFPFF_NO_PKT_COUNTS and OFPFF_NO_BYT_COUNTS for "add" - operations. - -The table for 1.3 is the same as the one shown above for 1.2. - - -OpenFlow 1.4 ------------- - -OpenFlow 1.4 does not change flow_mod semantics. - - -OFPT_PACKET_IN -============== - -The OpenFlow 1.1 specification for OFPT_PACKET_IN is confusing. The -definition in OF1.1 openflow.h is[*]: - - /* Packet received on port (datapath -> controller). */ - struct ofp_packet_in { - struct ofp_header header; - uint32_t buffer_id; /* ID assigned by datapath. */ - uint32_t in_port; /* Port on which frame was received. */ - uint32_t in_phy_port; /* Physical Port on which frame was received. */ - uint16_t total_len; /* Full length of frame.
*/ - uint8_t reason; /* Reason packet is being sent (one of OFPR_*) */ - uint8_t table_id; /* ID of the table that was looked up */ - uint8_t data[0]; /* Ethernet frame, halfway through 32-bit word, - so the IP header is 32-bit aligned. The - amount of data is inferred from the length - field in the header. Because of padding, - offsetof(struct ofp_packet_in, data) == - sizeof(struct ofp_packet_in) - 2. */ - }; - OFP_ASSERT(sizeof(struct ofp_packet_in) == 24); - -The confusing part is the comment on the data[] member. This comment -is a leftover from OF1.0 openflow.h, in which the comment was correct: -sizeof(struct ofp_packet_in) is 20 in OF1.0 and offsetof(struct -ofp_packet_in, data) is 18. When OF1.1 was written, the structure -members were changed but the comment was carelessly not updated, and -the comment became wrong: sizeof(struct ofp_packet_in) and -offsetof(struct ofp_packet_in, data) are both 24 in OF1.1. - -That leaves the question of how to implement ofp_packet_in in OF1.1. -The OpenFlow reference implementation for OF1.1 does not include any -padding, that is, the first byte of the encapsulated frame immediately -follows the 'table_id' member without a gap. Open vSwitch therefore -implements it the same way for compatibility. - -For an earlier discussion, please see the thread archived at: -https://mailman.stanford.edu/pipermail/openflow-discuss/2011-August/002604.html - -[*] The quoted definition is directly from OF1.1. Definitions used - inside OVS omit the 8-byte ofp_header members, so the sizes in - this discussion are 8 bytes larger than those declared in OVS - header files. - - -VLAN Matching -============= - -The 802.1Q VLAN header causes more trouble than any other 4 bytes in -networking. More specifically, three versions of OpenFlow and Open -vSwitch have among them four different ways to match the contents and -presence of the VLAN header. The following table describes how each -version works. 
- - Match NXM OF1.0 OF1.1 OF1.2 - ----- --------- ----------- ----------- ------------ - [1] 0000/0000 ????/1,??/? ????/1,??/? 0000/0000,-- - [2] 0000/ffff ffff/0,??/? ffff/0,??/? 0000/ffff,-- - [3] 1xxx/1fff 0xxx/0,??/1 0xxx/0,??/1 1xxx/ffff,-- - [4] z000/f000 ????/1,0y/0 fffe/0,0y/0 1000/1000,0y - [5] zxxx/ffff 0xxx/0,0y/0 0xxx/0,0y/0 1xxx/ffff,0y - [6] 0000/0fff <none> <none> <none> - [7] 0000/f000 <none> <none> <none> - [8] 0000/efff <none> <none> <none> - [9] 1001/1001 <none> <none> 1001/1001,-- - [10] 3000/3000 <none> <none> <none> - -Each column is interpreted as follows. - - - Match: See the list below. - - - NXM: xxxx/yyyy means NXM_OF_VLAN_TCI_W with value xxxx and mask - yyyy. A mask of 0000 is equivalent to omitting - NXM_OF_VLAN_TCI(_W), a mask of ffff is equivalent to - NXM_OF_VLAN_TCI. - - - OF1.0 and OF1.1: wwww/x,yy/z means dl_vlan wwww, OFPFW_DL_VLAN - x, dl_vlan_pcp yy, and OFPFW_DL_VLAN_PCP z. ? means that the - given nibble is ignored (and conventionally 0 for wwww or yy, - conventionally 1 for x or z). <none> means that the given match - is not supported. - - - OF1.2: xxxx/yyyy,zz means OXM_OF_VLAN_VID_W with value xxxx and - mask yyyy, and OXM_OF_VLAN_PCP (which is not maskable) with - value zz. A mask of 0000 is equivalent to omitting - OXM_OF_VLAN_VID(_W), a mask of ffff is equivalent to - OXM_OF_VLAN_VID. -- means that OXM_OF_VLAN_PCP is omitted. - <none> means that the given match is not supported. - -The matches are: - - [1] Matches any packet, that is, one without an 802.1Q header or with - an 802.1Q header with any TCI value. - - [2] Matches only packets without an 802.1Q header. - - NXM: Any match with (vlan_tci == 0) and (vlan_tci_mask & 0x1000) - != 0 is equivalent to the one listed in the table. - - OF1.0: The spec doesn't define behavior if dl_vlan is set to - 0xffff and OFPFW_DL_VLAN_PCP is not set. - - OF1.1: The spec says explicitly to ignore dl_vlan_pcp when - dl_vlan is set to 0xffff.
- - OF1.2: The spec doesn't say what should happen if (vlan_vid == 0) - and (vlan_vid_mask & 0x1000) != 0 but (vlan_vid_mask != 0x1000), - but it would be straightforward to also interpret as [2]. - - [3] Matches only packets that have an 802.1Q header with VID xxx (and - any PCP). - - [4] Matches only packets that have an 802.1Q header with PCP y (and - any VID). - - NXM: z is ((y << 1) | 1). - - OF1.0: The spec isn't very clear, but OVS implements it this way. - - OF1.2: Presumably other masks such that (vlan_vid_mask & 0x1fff) - == 0x1000 would also work, but the spec doesn't define their - behavior. - - [5] Matches only packets that have an 802.1Q header with VID xxx and - PCP y. - - NXM: z is ((y << 1) | 1). - - OF1.2: Presumably other masks such that (vlan_vid_mask & 0x1fff) - == 0x1fff would also work. - - [6] Matches packets with no 802.1Q header or with an 802.1Q header - with a VID of 0. Only possible with NXM. - - [7] Matches packets with no 802.1Q header or with an 802.1Q header - with a PCP of 0. Only possible with NXM. - - [8] Matches packets with no 802.1Q header or with an 802.1Q header - with both VID and PCP of 0. Only possible with NXM. - - [9] Matches only packets that have an 802.1Q header with an - odd-numbered VID (and any PCP). Only possible with NXM and - OF1.2. (This is just an example; one can match on any desired - VID bit pattern.) - -[10] Matches only packets that have an 802.1Q header with an - odd-numbered PCP (and any VID). Only possible with NXM. (This - is just an example; one can match on any desired PCP bit - pattern.) - -Additional notes: - - - OF1.2: The top three bits of OXM_OF_VLAN_VID are fixed to zero, - so bits 13, 14, and 15 in the masks listed in the table may be - set to arbitrary values, as long as the corresponding value bits - are also zero. The suggested ffff mask for [2], [3], and [5] - allows a shorter OXM representation (the mask is omitted) than - the minimal 1fff mask.
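The NXM column of the VLAN matching table can be read as a plain masked comparison on a 16-bit TCI value. The sketch below assumes, consistent with the notes for matches [2], [4], and [5], that the NXM vlan_tci is zero when no 802.1Q header is present and has bit 12 (0x1000) set when one is. The helper names `nxm_tci`, `nxm_tci_matches`, and `pcp_nibble` are hypothetical, chosen for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 of the TCI: assumed set whenever an 802.1Q header is present. */
#define VLAN_CFI 0x1000

/* Hypothetical helper: builds the NXM-style vlan_tci for a packet.
 * Layout: PCP in bits 15-13, presence bit in bit 12, VID in bits 11-0. */
static uint16_t
nxm_tci(bool has_vlan, uint16_t vid, uint8_t pcp)
{
    if (!has_vlan) {
        return 0;
    }
    return (uint16_t) (((pcp & 0x7) << 13) | VLAN_CFI | (vid & 0x0fff));
}

/* A value/mask match in the table's "xxxx/yyyy" notation. */
static bool
nxm_tci_matches(uint16_t tci, uint16_t value, uint16_t mask)
{
    return (tci & mask) == (value & mask);
}

/* The top nibble "z" used by matches [4] and [5]: ((y << 1) | 1). */
static uint16_t
pcp_nibble(uint8_t pcp)
{
    return (uint16_t) (((pcp & 0x7) << 1) | 1);
}
```

For example, match [4] for PCP 5 is value `pcp_nibble(5) << 12` (that is, b000) with mask f000, and match [6] (0000/0fff) accepts both an untagged packet and a tagged packet whose VID is 0.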
- - -Flow Cookies -============ - -OpenFlow 1.0 and later versions have the concept of a "flow cookie", -which is a 64-bit integer value attached to each flow. The treatment -of the flow cookie has varied greatly across OpenFlow versions, -however. - -In OpenFlow 1.0: - - - OFPFC_ADD set the cookie in the flow that it added. - - - OFPFC_MODIFY and OFPFC_MODIFY_STRICT updated the cookie for - the flow or flows that it modified. - - - OFPST_FLOW messages included the flow cookie. - - - OFPT_FLOW_REMOVED messages reported the cookie of the flow - that was removed. - -OpenFlow 1.1 made the following changes: - - - Flow mod operations OFPFC_MODIFY, OFPFC_MODIFY_STRICT, - OFPFC_DELETE, and OFPFC_DELETE_STRICT, plus flow stats - requests and aggregate stats requests, gained the ability to - match on flow cookies with an arbitrary mask. - - - OFPFC_MODIFY and OFPFC_MODIFY_STRICT were changed to add a - new flow, in the case of no match, only if the flow table - modification operation did not match on the cookie field. - (In OpenFlow 1.0, modify operations always added a new flow - when there was no match.) - - - OFPFC_MODIFY and OFPFC_MODIFY_STRICT no longer updated flow - cookies. - -OpenFlow 1.2 made the following changes: - - - OFPFC_MODIFY and OFPFC_MODIFY_STRICT were changed to never - add a new flow, regardless of whether the flow cookie was - used for matching. - -Open vSwitch support for OpenFlow 1.0 implements the OpenFlow 1.0 -behavior with the following extensions: - - - An NXM extension field NXM_NX_COOKIE(_W) allows the NXM - versions of OFPFC_MODIFY, OFPFC_MODIFY_STRICT, OFPFC_DELETE, - and OFPFC_DELETE_STRICT flow_mods, plus flow stats requests - and aggregate stats requests, to match on flow cookies with - arbitrary masks. This is much like the equivalent OpenFlow - 1.1 feature. - - - Like OpenFlow 1.1, OFPFC_MODIFY and OFPFC_MODIFY_STRICT add a - new flow if there is no match and the mask is zero (or not - given).
- - - The "cookie" field in OFPT_FLOW_MOD and NXT_FLOW_MOD messages - is used as the cookie value for OFPFC_ADD commands, as - described in OpenFlow 1.0. For OFPFC_MODIFY and - OFPFC_MODIFY_STRICT commands, the "cookie" field is used as a - new cookie for flows that match unless it is UINT64_MAX, in - which case the flow's cookie is not updated. - - - NXT_PACKET_IN (the Nicira extended version of - OFPT_PACKET_IN) reports the cookie of the rule that - generated the packet, or all-1-bits if no rule generated the - packet. (Older versions of OVS used all-0-bits instead of - all-1-bits.) - -The following table shows the handling of different protocols when -receiving OFPFC_MODIFY and OFPFC_MODIFY_STRICT messages. A mask of 0 -indicates either an explicit mask of zero or an implicit one by not -specifying the NXM_NX_COOKIE(_W) field. - - Match Update Add on miss Add on miss - cookie cookie mask!=0 mask==0 - ====== ====== =========== =========== -OpenFlow 1.0 no yes -OpenFlow 1.1 yes no no yes -OpenFlow 1.2 yes no no no -NXM yes yes* no yes - -* Updates the flow's cookie unless the "cookie" field is UINT64_MAX. - - -Multiple Table Support -====================== - -OpenFlow 1.0 has only rudimentary support for multiple flow tables. -Notably, OpenFlow 1.0 does not allow the controller to specify the -flow table to which a flow is to be added. Open vSwitch adds an -extension for this purpose, which is enabled on a per-OpenFlow -connection basis using the NXT_FLOW_MOD_TABLE_ID message. When the -extension is enabled, the upper 8 bits of the 'command' member in an -OFPT_FLOW_MOD or NXT_FLOW_MOD message designates the table to which a -flow is to be added. - -The Open vSwitch software switch implementation offers 255 flow -tables. On packet ingress, only the first flow table (table 0) is -searched, and the contents of the remaining tables are not considered -in any way. 
Tables other than table 0 only come into play when an -NXAST_RESUBMIT_TABLE action specifies another table to search. - -Tables 128 and above are reserved for use by the switch itself. -Controllers should use only tables 0 through 127. - - -IPv6 -==== - -Open vSwitch supports stateless handling of IPv6 packets. Flows can be -written to support matching TCP, UDP, and ICMPv6 headers within an IPv6 -packet. Deeper matching of some Neighbor Discovery messages is also -supported. - -IPv6 was not designed to interact well with middle-boxes. This, -combined with Open vSwitch's stateless nature, has affected the -processing of IPv6 traffic, which is detailed below. - -Extension Headers ------------------ - -The base IPv6 header is incredibly simple with the intention of only -containing information relevant for routing packets between two -endpoints. IPv6 relies heavily on the use of extension headers to -provide any other functionality. Unfortunately, the extension headers -were designed in such a way that it is impossible to move to the next -header (including the layer-4 payload) unless the current header is -understood. - -Open vSwitch will process the following extension headers and continue -to the next header: - - * Fragment (see the next section) - * AH (Authentication Header) - * Hop-by-Hop Options - * Routing - * Destination Options - -When a header is encountered that is not in that list, it is considered -"terminal". A terminal header's IPv6 protocol value is stored in -"nw_proto" for matching purposes. If a terminal header is TCP, UDP, or -ICMPv6, the packet will be further processed in an attempt to extract -layer-4 information. - -Fragments ---------- - -IPv6 requires that every link in the internet have an MTU of 1280 octets -or greater (RFC 2460). As such, a terminal header (as described above in -"Extension Headers") in the first fragment should generally be -reachable.
In this case, the terminal header's IPv6 protocol type is -stored in the "nw_proto" field for matching purposes. If a terminal -header cannot be found in the first fragment (one with a fragment offset -of zero), the "nw_proto" field is set to 0. Subsequent fragments (those -with a non-zero fragment offset) have the "nw_proto" field set to the -IPv6 protocol type for fragments (44). - -Jumbograms ----------- - -An IPv6 jumbogram (RFC 2675) is a packet containing a payload longer -than 65,535 octets. A jumbogram is only relevant in subnets with a link -MTU greater than 65,575 octets, and is not required to be supported on -nodes that do not connect to links with such large MTUs. Currently, Open -vSwitch doesn't process jumbograms. - - -In-Band Control -=============== - -Motivation ----------- - -An OpenFlow switch must establish and maintain a TCP network -connection to its controller. There are two basic ways to categorize -the network that this connection traverses: either it is completely -separate from the one that the switch is otherwise controlling, or its -path may overlap the network that the switch controls. We call the -former case "out-of-band control", the latter case "in-band control". - -Out-of-band control has the following benefits: - - - Simplicity: Out-of-band control slightly simplifies the switch - implementation. - - - Reliability: Excessive switch traffic volume cannot interfere - with control traffic. - - - Integrity: Machines not on the control network cannot - impersonate a switch or a controller. - - - Confidentiality: Machines not on the control network cannot - snoop on control traffic. - -In-band control, on the other hand, has the following advantages: - - - No dedicated port: There is no need to dedicate a physical - switch port to control, which is important on switches that have - few ports (e.g. wireless routers, low-end embedded platforms).
- - - No dedicated network: There is no need to build and maintain a - separate control network. This is important in many - environments because it reduces proliferation of switches and - wiring. - -Open vSwitch supports both out-of-band and in-band control. This -section describes the principles behind in-band control. See the -description of the Controller table in ovs-vswitchd.conf.db(5) to -configure OVS for in-band control. - -Principles ----------- - -The fundamental principle of in-band control is that an OpenFlow -switch must recognize and switch control traffic without involving the -OpenFlow controller. All the details of implementing in-band control -are special cases of this principle. - -The rationale for this principle is simple. If the switch does not -handle in-band control traffic itself, then it will be caught in a -contradiction: it must contact the controller, but it cannot, because -only the controller can set up the flows that are needed to contact -the controller. - -The following points describe important special cases of this -principle. - - - In-band control must be implemented regardless of whether the - switch is connected. - - It is tempting to implement the in-band control rules only when - the switch is not connected to the controller, using the - reasoning that the controller should have complete control once - it has established a connection with the switch. - - This does not work in practice. Consider the case where the - switch is connected to the controller. Occasionally it can - happen that the controller forgets or otherwise needs to obtain - the MAC address of the switch. To do so, the controller sends a - broadcast ARP request. A switch that implements the in-band - control rules only when it is disconnected will then send an - OFPT_PACKET_IN message up to the controller. The controller will - be unable to respond, because it does not know the MAC address of - the switch. 
This is a deadlock situation that can only be - resolved by the switch noticing that its connection to the - controller has hung and reconnecting. - - - In-band control must override flows set up by the controller. - - It is reasonable to assume that flows set up by the OpenFlow - controller should take precedence over in-band control, on the - basis that the controller should be in charge of the switch. - - Again, this does not work in practice. Reasonable controller - implementations may set up a "last resort" fallback rule that - wildcards every field and, e.g., sends it up to the controller or - discards it. If a controller does that, then it will isolate - itself from the switch. - - - The switch must recognize all control traffic. - - The fundamental principle of in-band control states, in part, - that a switch must recognize control traffic without involving - the OpenFlow controller. More specifically, the switch must - recognize *all* control traffic. "False negatives", that is, - packets that constitute control traffic but that the switch does - not recognize as control traffic, lead to control traffic storms. - - Consider an OpenFlow switch that only recognizes control packets - sent to or from that switch. Now suppose that two switches of - this type, named A and B, are connected to ports on an Ethernet - hub (not a switch) and that an OpenFlow controller is connected - to a third hub port. In this setup, control traffic sent by - switch A will be seen by switch B, which will send it to the - controller as part of an OFPT_PACKET_IN message. Switch A will - then see the OFPT_PACKET_IN message's packet, re-encapsulate it - in another OFPT_PACKET_IN, and send it to the controller. Switch - B will then see that OFPT_PACKET_IN, and so on in an infinite - loop. - - Incidentally, the consequences of "false positives", where - packets that are not control traffic are nevertheless recognized - as control traffic, are much less severe. 
The controller will - not be able to control their behavior, but the network will - remain in working order. False positives do constitute a - security problem. - - - The switch should use echo-requests to detect disconnection. - - TCP will notice that a connection has hung, but this can take a - considerable amount of time. For example, with default settings - the Linux kernel TCP implementation will retransmit for between - 13 and 30 minutes, depending on the connection's retransmission - timeout, according to kernel documentation. This is far too long - for a switch to be disconnected, so an OpenFlow switch should - implement its own connection timeout. OpenFlow OFPT_ECHO_REQUEST - messages are the best way to do this, since they test the - OpenFlow connection itself. - -Implementation --------------- - -This section describes how Open vSwitch implements in-band control. -Correctly implementing in-band control has proven difficult due to its -many subtleties, and has thus gone through many iterations. Please -read through and understand the reasoning behind the chosen rules -before making modifications. - -Open vSwitch implements in-band control as "hidden" flows, that is, -flows that are not visible through OpenFlow, and at a higher priority -than wildcarded flows can be set up through OpenFlow. This is done so -that the OpenFlow controller cannot interfere with them and possibly -break connectivity with its switches. It is possible to see all -flows, including in-band ones, with the ovs-appctl "bridge/dump-flows" -command. - -The Open vSwitch implementation of in-band control can hide traffic to -arbitrary "remotes", where each remote is one TCP port on one IP address. -Currently the remotes are automatically configured as the in-band OpenFlow -controllers plus the OVSDB managers, if any. 
(The latter is a requirement -because OVSDB managers are responsible for configuring OpenFlow controllers, -so if the manager cannot be reached then OpenFlow cannot be reconfigured.) - -The following rules (with the OFPP_NORMAL action) are set up on any bridge -that has any remotes: - - (a) DHCP requests sent from the local port. - (b) ARP replies to the local port's MAC address. - (c) ARP requests from the local port's MAC address. - -In-band also sets up the following rules for each unique next-hop MAC -address for the remotes' IPs (the "next hop" is either the remote -itself, if it is on a local subnet, or the gateway to reach the remote): - - (d) ARP replies to the next hop's MAC address. - (e) ARP requests from the next hop's MAC address. - -In-band also sets up the following rules for each unique remote IP address: - - (f) ARP replies containing the remote's IP address as a target. - (g) ARP requests containing the remote's IP address as a source. - -In-band also sets up the following rules for each unique remote (IP,port) -pair: - - (h) TCP traffic to the remote's IP and port. - (i) TCP traffic from the remote's IP and port. - -The goal of these rules is to be as narrow as possible to allow a -switch to join a network and be able to communicate with the -remotes. As mentioned earlier, these rules have higher priority -than the controller's rules, so if they are too broad, they may -prevent the controller from implementing its policy. As such, -in-band actively monitors some aspects of flow and packet processing -so that the rules can be made more precise. - -In-band control monitors attempts to add flows into the datapath that -could interfere with its duties. The datapath only allows exact -match entries, so in-band control is able to be very precise about -the flows it prevents. Flows that miss in the datapath are sent to -userspace to be processed, so preventing these flows from being -cached in the "fast path" does not affect correctness. 
The only type -of flow that is currently prevented is one that would prevent DHCP -replies from being seen by the local port. For example, a rule that -forwarded all DHCP traffic to the controller would not be allowed, -but one that forwarded to all ports (including the local port) would. - -As mentioned earlier, packets that miss in the datapath are sent to -the userspace for processing. The userspace has its own flow table, -the "classifier", so in-band checks whether any special processing -is needed before the classifier is consulted. If a packet is a DHCP -response to a request from the local port, the packet is forwarded to -the local port, regardless of the flow table. Note that this requires -L7 processing of DHCP replies to determine whether the 'chaddr' field -matches the MAC address of the local port. - -It is interesting to note that for an L3-based in-band control -mechanism, the majority of rules are devoted to ARP traffic. At first -glance, some of these rules appear redundant. However, each serves an -important role. First, in order to determine the MAC address of the -remote side (controller or gateway) for other ARP rules, we must allow -ARP traffic for our local port with rules (b) and (c). If we are -between a switch and its connection to the remote, we have to -allow the other switch's ARP traffic through. This is done with -rules (d) and (e), since we do not know the addresses of the other -switches a priori, but do know the remote's or gateway's. Finally, -if the remote is running in a local guest VM that is not reached -through the local port, the switch that is connected to the VM must -allow ARP traffic based on the remote's IP address, since it will -not know the MAC address of the local port that is sending the traffic -or the MAC address of the remote in the guest VM. - -With a few notable exceptions below, in-band should work in most -network setups.
The following are considered "supported" in the -current implementation: - - - Locally Connected. The switch and remote are on the same - subnet. This uses rules (a), (b), (c), (h), and (i). - - - Reached through Gateway. The switch and remote are on - different subnets and must go through a gateway. This uses - rules (a), (b), (c), (h), and (i). - - - Between Switch and Remote. This switch is between another - switch and the remote, and we want to allow the other - switch's traffic through. This uses rules (d), (e), (h), and - (i). It uses (b) and (c) indirectly in order to know the MAC - address for rules (d) and (e). Note that DHCP for the other - switch will not work unless an OpenFlow controller explicitly lets this - switch pass the traffic. - - - Between Switch and Gateway. This switch is between another - switch and the gateway, and we want to allow the other switch's - traffic through. This uses the same rules and logic as the - "Between Switch and Remote" configuration described earlier. - - - Remote on Local VM. The remote is a guest VM on the - system running in-band control. This uses rules (a), (b), (c), - (h), and (i). - - - Remote on Local VM with Different Networks. The remote - is a guest VM on the system running in-band control, but the - local port is not used to connect to the remote. For - example, an IP address is configured on eth0 of the switch. The - remote's VM is connected through eth1 of the switch, but an - IP address has not been configured for that port on the switch. - As such, the switch will use eth0 to connect to the remote, - and eth1's rules about the local port will not work. In the - example, the switch attached to eth0 would use rules (a), (b), - (c), (h), and (i) on eth0. The switch attached to eth1 would use - rules (f), (g), (h), and (i). - -The following are explicitly *not* supported by in-band control: - - - Specify Remote by Name. Currently, the remote must be - identified by IP address.
A naive approach would be to permit - all DNS traffic. Unfortunately, this would prevent the - controller from defining any policy over DNS. Since switches - that are located behind us need to connect to the remote, - in-band cannot simply add a rule that allows DNS traffic from - the local port. The "correct" way to support this is to parse - DNS requests to allow all traffic related to a request for the - remote's name through. Due to the potential security - problems and amount of processing, we decided to hold off for - the time-being. - - - Differing Remotes for Switches. All switches must know - the L3 addresses for all the remotes that other switches - may use, since rules need to be set up to allow traffic related - to those remotes through. See rules (f), (g), (h), and (i). - - - Differing Routes for Switches. In order for the switch to - allow other switches to connect to a remote through a - gateway, it allows the gateway's traffic through with rules (d) - and (e). If the routes to the remote differ for the two - switches, we will not know the MAC address of the alternate - gateway. - - -Action Reproduction -=================== - -It seems likely that many controllers, at least at startup, use the -OpenFlow "flow statistics" request to obtain existing flows, then -compare the flows' actions against the actions that they expect to -find. Before version 1.8.0, Open vSwitch always returned exact, -byte-for-byte copies of the actions that had been added to the flow -table. The current version of Open vSwitch does not always do this in -some exceptional cases. This section lists the exceptions that -controller authors must keep in mind if they compare actual actions -against desired actions in a bytewise fashion: - - - Open vSwitch zeros padding bytes in action structures, - regardless of their values when the flows were added. 
- - - Open vSwitch "normalizes" the instructions in OpenFlow 1.1 - (and later) in the following way: - - * OVS sorts the instructions into the following order: - Apply-Actions, Clear-Actions, Write-Actions, - Write-Metadata, Goto-Table. - - * OVS drops Apply-Actions instructions that have empty - action lists. - - * OVS drops Write-Actions instructions that have empty - action sets. - -Please report other discrepancies, if you notice any, so that we can -fix or document them. - - -Suggestions -=========== - -Suggestions to improve Open vSwitch are welcome at discuss@openvswitch.org. diff --git a/DESIGN.md b/DESIGN.md new file mode 100644 index 000000000..6f8d09072 --- /dev/null +++ b/DESIGN.md @@ -0,0 +1,944 @@ +Design Decisions In Open vSwitch +================================ + +This document describes design decisions that went into implementing +Open vSwitch. While we believe these to be reasonable decisions, it is +impossible to predict how Open vSwitch will be used in all environments. +Understanding assumptions made by Open vSwitch is critical to a +successful deployment. The end of this document contains contact +information that can be used to let us know how we can make Open vSwitch +more generally useful. + +Asynchronous Messages +===================== + +Over time, Open vSwitch has added many knobs that control whether a +given controller receives OpenFlow asynchronous messages. This +section describes how all of these features interact. + +First, a service controller never receives any asynchronous messages +unless it changes its miss_send_len from the service controller +default of zero in one of the following ways: + + - Sending an OFPT_SET_CONFIG message with nonzero miss_send_len. + + - Sending any NXT_SET_ASYNC_CONFIG message: as a side effect, this + message changes the miss_send_len to + OFP_DEFAULT_MISS_SEND_LEN (128) for service controllers. 
+
+Second, OFPT_FLOW_REMOVED and NXT_FLOW_REMOVED messages are generated
+only if the flow that was removed had the OFPFF_SEND_FLOW_REM flag
+set.
+
+Third, OFPT_PACKET_IN and NXT_PACKET_IN messages are sent only to
+OpenFlow controller connections that have the correct connection ID
+(see "struct nx_controller_id" and "struct nx_action_controller"):
+
+  - For packet-in messages generated by a NXAST_CONTROLLER action,
+    the controller ID specified in the action.
+
+  - For other packet-in messages, controller ID zero. (This is the
+    default ID when an OpenFlow controller does not configure one.)
+
+Finally, Open vSwitch consults a per-connection table indexed by the
+message type, reason code, and current role. The following table
+shows how this table is initialized by default when an OpenFlow
+connection is made. An entry labeled "yes" means that the message is
+sent, an entry labeled "---" means that the message is suppressed.
+
+```
+                                             master/
+    message and reason code                  other     slave
+    ---------------------------------------- -------   -----
+    OFPT_PACKET_IN / NXT_PACKET_IN
+      OFPR_NO_MATCH                            yes      ---
+      OFPR_ACTION                              yes      ---
+      OFPR_INVALID_TTL                         ---      ---
+
+    OFPT_FLOW_REMOVED / NXT_FLOW_REMOVED
+      OFPRR_IDLE_TIMEOUT                       yes      ---
+      OFPRR_HARD_TIMEOUT                       yes      ---
+      OFPRR_DELETE                             yes      ---
+
+    OFPT_PORT_STATUS
+      OFPPR_ADD                                yes      yes
+      OFPPR_DELETE                             yes      yes
+      OFPPR_MODIFY                             yes      yes
+```
+
+The NXT_SET_ASYNC_CONFIG message directly sets all of the values in
+this table for the current connection. The
+OFPC_INVALID_TTL_TO_CONTROLLER bit in the OFPT_SET_CONFIG message
+controls the setting for OFPR_INVALID_TTL for the "master" role.
+
+
+OFPAT_ENQUEUE
+=============
+
+The OpenFlow 1.0 specification requires the output port of the OFPAT_ENQUEUE
+action to "refer to a valid physical port (i.e. < OFPP_MAX) or OFPP_IN_PORT".
+Although OFPP_LOCAL is not less than OFPP_MAX, it is an 'internal' port which
+can have QoS applied to it in Linux.
Since we allow the OFPAT_ENQUEUE to apply +to 'internal' ports whose port numbers are less than OFPP_MAX, we interpret +OFPP_LOCAL as a physical port and support OFPAT_ENQUEUE on it as well. + + +OFPT_FLOW_MOD +============= + +The OpenFlow specification for the behavior of OFPT_FLOW_MOD is +confusing. The following tables summarize the Open vSwitch +implementation of its behavior in the following categories: + + - "match on priority": Whether the flow_mod acts only on flows + whose priority matches that included in the flow_mod message. + + - "match on out_port": Whether the flow_mod acts only on flows + that output to the out_port included in the flow_mod message (if + out_port is not OFPP_NONE). OpenFlow 1.1 and later have a + similar feature (not listed separately here) for out_group. + + - "match on flow_cookie": Whether the flow_mod acts only on flows + whose flow_cookie matches an optional controller-specified value + and mask. + + - "updates flow_cookie": Whether the flow_mod changes the + flow_cookie of the flow or flows that it matches to the + flow_cookie included in the flow_mod message. + + - "updates OFPFF_ flags": Whether the flow_mod changes the + OFPFF_SEND_FLOW_REM flag of the flow or flows that it matches to + the setting included in the flags of the flow_mod message. + + - "honors OFPFF_CHECK_OVERLAP": Whether the OFPFF_CHECK_OVERLAP + flag in the flow_mod is significant. + + - "updates idle_timeout" and "updates hard_timeout": Whether the + idle_timeout and hard_timeout in the flow_mod, respectively, + have an effect on the flow or flows matched by the flow_mod. + + - "updates idle timer": Whether the flow_mod resets the per-flow + timer that measures how long a flow has been idle. + + - "updates hard timer": Whether the flow_mod resets the per-flow + timer that measures how long it has been since a flow was + modified. + + - "zeros counters": Whether the flow_mod resets per-flow packet + and byte counters to zero. 
+
+  - "may add a new flow": Whether the flow_mod may add a new flow to
+    the flow table. (Obviously this is always true for "add"
+    commands but in some OpenFlow versions "modify" and
+    "modify-strict" can also add new flows.)
+
+  - "sends flow_removed message": Whether the flow_mod generates a
+    flow_removed message for the flow or flows that it affects.
+
+An entry labeled "yes" means that the flow mod type does have the
+indicated behavior, "---" means that it does not, an empty cell means
+that the property is not applicable, and other values are explained
+below the table.
+
+OpenFlow 1.0
+------------
+
+```
+                                          MODIFY          DELETE
+                             ADD  MODIFY  STRICT  DELETE  STRICT
+                             ===  ======  ======  ======  ======
+match on priority            yes    ---     yes     ---     yes
+match on out_port            ---    ---     ---     yes     yes
+match on flow_cookie         ---    ---     ---     ---     ---
+match on table_id            ---    ---     ---     ---     ---
+controller chooses table_id  ---    ---     ---
+updates flow_cookie          yes    yes     yes
+updates OFPFF_SEND_FLOW_REM  yes     +       +
+honors OFPFF_CHECK_OVERLAP   yes     +       +
+updates idle_timeout         yes     +       +
+updates hard_timeout         yes     +       +
+resets idle timer            yes     +       +
+resets hard timer            yes    yes     yes
+zeros counters               yes     +       +
+may add a new flow           yes    yes     yes
+sends flow_removed message   ---    ---     ---      %       %
+
+(+) "modify" and "modify-strict" only take these actions when they
+    create a new flow, not when they update an existing flow.
+
+(%) "delete" and "delete_strict" generate a flow_removed message if
+    the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set.
+    (Each controller can separately control whether it wants to
+    receive the generated messages.)
+```
+
+OpenFlow 1.1
+------------
+
+OpenFlow 1.1 makes these changes:
+
+  - The controller now must specify the table_id of the flow match
+    searched and into which a flow may be inserted. Behavior for a
+    table_id of 255 is undefined.
+
+  - A flow_mod, except an "add", can now match on the flow_cookie.
+
+  - When a flow_mod matches on the flow_cookie, "modify" and
+    "modify-strict" never insert a new flow.
+
+```
+                                          MODIFY          DELETE
+                             ADD  MODIFY  STRICT  DELETE  STRICT
+                             ===  ======  ======  ======  ======
+match on priority            yes    ---     yes     ---     yes
+match on out_port            ---    ---     ---     yes     yes
+match on flow_cookie         ---    yes     yes     yes     yes
+match on table_id            yes    yes     yes     yes     yes
+controller chooses table_id  yes    yes     yes
+updates flow_cookie          yes    ---     ---
+updates OFPFF_SEND_FLOW_REM  yes     +       +
+honors OFPFF_CHECK_OVERLAP   yes     +       +
+updates idle_timeout         yes     +       +
+updates hard_timeout         yes     +       +
+resets idle timer            yes     +       +
+resets hard timer            yes    yes     yes
+zeros counters               yes     +       +
+may add a new flow           yes     #       #
+sends flow_removed message   ---    ---     ---      %       %
+
+(+) "modify" and "modify-strict" only take these actions when they
+    create a new flow, not when they update an existing flow.
+
+(%) "delete" and "delete_strict" generate a flow_removed message if
+    the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set.
+    (Each controller can separately control whether it wants to
+    receive the generated messages.)
+
+(#) "modify" and "modify-strict" only add a new flow if the flow_mod
+    does not match on any bits of the flow cookie.
+```
+
+OpenFlow 1.2
+------------
+
+OpenFlow 1.2 makes these changes:
+
+  - Only "add" commands ever add flows, "modify" and "modify-strict"
+    never do.
+
+  - A new flag OFPFF_RESET_COUNTS now controls whether "modify" and
+    "modify-strict" reset counters, whereas previously they never
+    reset counters (except when they inserted a new flow).
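
The version-by-version rules described so far for "modify" and "modify-strict" can be condensed into two small decision functions. The C sketch below is illustrative only: the `of_version` enum and both function names are hypothetical and do not come from the Open vSwitch source. It assumes the caller already knows whether the command created a flow and whether OFPFF_RESET_COUNTS was set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version tags, for illustration only. */
enum of_version { OF10, OF11, OF12, OF13, OF14 };

/* Returns true if a "modify" or "modify-strict" that matches no existing
 * flow may insert a new one.  'cookie_mask' is the flow_cookie mask in the
 * flow_mod; OpenFlow 1.0 has no such mask, so it is ignored there. */
static bool
modify_may_add_flow(enum of_version version, uint64_t cookie_mask)
{
    assert((int) version >= OF10 && (int) version <= OF14);
    switch (version) {
    case OF10:
        return true;                /* Always adds on a miss. */
    case OF11:
        return cookie_mask == 0;    /* Only if it did not match on cookie. */
    default:
        return false;               /* 1.2 and later: only "add" adds. */
    }
}

/* Returns true if a "modify" or "modify-strict" zeros the packet and byte
 * counters.  'created_new_flow' says whether the command inserted a flow;
 * 'reset_counts' says whether OFPFF_RESET_COUNTS was set in the flow_mod. */
static bool
modify_zeros_counters(enum of_version version, bool created_new_flow,
                      bool reset_counts)
{
    assert((int) version >= OF10 && (int) version <= OF14);
    if (version <= OF11) {
        return created_new_flow;    /* The "+" footnote: new flows only. */
    }
    return reset_counts;            /* The "&" footnote: flag-controlled. */
}
```

For instance, an OpenFlow 1.1 "modify" with a nonzero cookie mask never adds a flow, while the same command with a zero mask may.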
+
+```
+                                          MODIFY          DELETE
+                             ADD  MODIFY  STRICT  DELETE  STRICT
+                             ===  ======  ======  ======  ======
+match on priority            yes    ---     yes     ---     yes
+match on out_port            ---    ---     ---     yes     yes
+match on flow_cookie         ---    yes     yes     yes     yes
+match on table_id            yes    yes     yes     yes     yes
+controller chooses table_id  yes    yes     yes
+updates flow_cookie          yes    ---     ---
+updates OFPFF_SEND_FLOW_REM  yes    ---     ---
+honors OFPFF_CHECK_OVERLAP   yes    ---     ---
+updates idle_timeout         yes    ---     ---
+updates hard_timeout         yes    ---     ---
+resets idle timer            yes    ---     ---
+resets hard timer            yes    yes     yes
+zeros counters               yes     &       &
+may add a new flow           yes    ---     ---
+sends flow_removed message   ---    ---     ---      %       %
+
+(%) "delete" and "delete_strict" generate a flow_removed message if
+    the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set.
+    (Each controller can separately control whether it wants to
+    receive the generated messages.)
+
+(&) "modify" and "modify-strict" reset counters if the
+    OFPFF_RESET_COUNTS flag is specified.
+```
+
+OpenFlow 1.3
+------------
+
+OpenFlow 1.3 makes these changes:
+
+  - Behavior for a table_id of 255 is now defined, for "delete" and
+    "delete-strict" commands, as meaning to delete from all tables.
+    A table_id of 255 is now explicitly invalid for other commands.
+
+  - New flags OFPFF_NO_PKT_COUNTS and OFPFF_NO_BYT_COUNTS for "add"
+    operations.
+
+The table for 1.3 is the same as the one shown above for 1.2.
+
+
+OpenFlow 1.4
+------------
+
+OpenFlow 1.4 does not change flow_mod semantics.
+
+
+OFPT_PACKET_IN
+==============
+
+The OpenFlow 1.1 specification for OFPT_PACKET_IN is confusing. The
+definition in OF1.1 openflow.h is[*]:
+
+```
+    /* Packet received on port (datapath -> controller). */
+    struct ofp_packet_in {
+        struct ofp_header header;
+        uint32_t buffer_id;     /* ID assigned by datapath. */
+        uint32_t in_port;       /* Port on which frame was received. */
+        uint32_t in_phy_port;   /* Physical Port on which frame was received. */
+        uint16_t total_len;     /* Full length of frame.
*/
+        uint8_t reason;         /* Reason packet is being sent (one of OFPR_*) */
+        uint8_t table_id;       /* ID of the table that was looked up */
+        uint8_t data[0];        /* Ethernet frame, halfway through 32-bit word,
+                                   so the IP header is 32-bit aligned. The
+                                   amount of data is inferred from the length
+                                   field in the header. Because of padding,
+                                   offsetof(struct ofp_packet_in, data) ==
+                                   sizeof(struct ofp_packet_in) - 2. */
+    };
+    OFP_ASSERT(sizeof(struct ofp_packet_in) == 24);
+```
+
+The confusing part is the comment on the data[] member. This comment
+is a leftover from OF1.0 openflow.h, in which the comment was correct:
+sizeof(struct ofp_packet_in) is 20 in OF1.0 and offsetof(struct
+ofp_packet_in, data) is 18. When OF1.1 was written, the structure
+members were changed but the comment was carelessly not updated, and
+the comment became wrong: sizeof(struct ofp_packet_in) and
+offsetof(struct ofp_packet_in, data) are both 24 in OF1.1.
+
+That leaves the question of how to implement ofp_packet_in in OF1.1.
+The OpenFlow reference implementation for OF1.1 does not include any
+padding, that is, the first byte of the encapsulated frame immediately
+follows the 'table_id' member without a gap. Open vSwitch therefore
+implements it the same way for compatibility.
+
+For an earlier discussion, please see the thread archived at:
+https://mailman.stanford.edu/pipermail/openflow-discuss/2011-August/002604.html
+
+[*] The quoted definition is directly from OF1.1. Definitions used
+    inside OVS omit the 8-byte ofp_header members, so the sizes in
+    this discussion are 8 bytes larger than those declared in OVS
+    header files.
+
+
+VLAN Matching
+=============
+
+The 802.1Q VLAN header causes more trouble than any other 4 bytes in
+networking. More specifically, three versions of OpenFlow and Open
+vSwitch have among them four different ways to match the contents and
+presence of the VLAN header. The following table describes how each
+version works.
+
+       Match  NXM        OF1.0        OF1.1        OF1.2
+       -----  ---------  -----------  -----------  ------------
+       [1]    0000/0000  ????/1,??/?  ????/1,??/?  0000/0000,--
+       [2]    0000/ffff  ffff/0,??/?  ffff/0,??/?  0000/ffff,--
+       [3]    1xxx/1fff  0xxx/0,??/1  0xxx/0,??/1  1xxx/ffff,--
+       [4]    z000/f000  ????/1,0y/0  fffe/0,0y/0  1000/1000,0y
+       [5]    zxxx/ffff  0xxx/0,0y/0  0xxx/0,0y/0  1xxx/ffff,0y
+       [6]    0000/0fff    <none>       <none>        <none>
+       [7]    0000/f000    <none>       <none>        <none>
+       [8]    0000/efff    <none>       <none>        <none>
+       [9]    1001/1001    <none>       <none>     1001/1001,--
+       [10]   3000/3000    <none>       <none>        <none>
+
+Each column is interpreted as follows.
+
+  - Match: See the list below.
+
+  - NXM: xxxx/yyyy means NXM_OF_VLAN_TCI_W with value xxxx and mask
+    yyyy. A mask of 0000 is equivalent to omitting
+    NXM_OF_VLAN_TCI(_W), a mask of ffff is equivalent to
+    NXM_OF_VLAN_TCI.
+
+  - OF1.0 and OF1.1: wwww/x,yy/z means dl_vlan wwww, OFPFW_DL_VLAN
+    x, dl_vlan_pcp yy, and OFPFW_DL_VLAN_PCP z. ? means that the
+    given nibble is ignored (and conventionally 0 for wwww or yy,
+    conventionally 1 for x or z). <none> means that the given match
+    is not supported.
+
+  - OF1.2: xxxx/yyyy,zz means OXM_OF_VLAN_VID_W with value xxxx and
+    mask yyyy, and OXM_OF_VLAN_PCP (which is not maskable) with
+    value zz. A mask of 0000 is equivalent to omitting
+    OXM_OF_VLAN_VID(_W), a mask of ffff is equivalent to
+    OXM_OF_VLAN_VID. -- means that OXM_OF_VLAN_PCP is omitted.
+    <none> means that the given match is not supported.
+
+The matches are:
+
+ [1] Matches any packet, that is, one without an 802.1Q header or with
+     an 802.1Q header with any TCI value.
+
+ [2] Matches only packets without an 802.1Q header.
+
+     NXM: Any match with (vlan_tci == 0) and (vlan_tci_mask & 0x1000)
+     != 0 is equivalent to the one listed in the table.
+
+     OF1.0: The spec doesn't define behavior if dl_vlan is set to
+     0xffff and OFPFW_DL_VLAN_PCP is not set.
+
+     OF1.1: The spec says explicitly to ignore dl_vlan_pcp when
+     dl_vlan is set to 0xffff.
+
+     OF1.2: The spec doesn't say what should happen if (vlan_vid == 0)
+     and (vlan_vid_mask & 0x1000) != 0 but (vlan_vid_mask != 0x1000),
+     but it would be straightforward to also interpret as [2].
+
+ [3] Matches only packets that have an 802.1Q header with VID xxx (and
+     any PCP).
+
+ [4] Matches only packets that have an 802.1Q header with PCP y (and
+     any VID).
+
+     NXM: z is ((y << 1) | 1).
+
+     OF1.0: The spec isn't very clear, but OVS implements it this way.
+
+     OF1.2: Presumably other masks such that (vlan_vid_mask & 0x1fff)
+     == 0x1000 would also work, but the spec doesn't define their
+     behavior.
+
+ [5] Matches only packets that have an 802.1Q header with VID xxx and
+     PCP y.
+
+     NXM: z is ((y << 1) | 1).
+
+     OF1.2: Presumably other masks such that (vlan_vid_mask & 0x1fff)
+     == 0x1fff would also work.
+
+ [6] Matches packets with no 802.1Q header or with an 802.1Q header
+     with a VID of 0. Only possible with NXM.
+
+ [7] Matches packets with no 802.1Q header or with an 802.1Q header
+     with a PCP of 0. Only possible with NXM.
+
+ [8] Matches packets with no 802.1Q header or with an 802.1Q header
+     with both VID and PCP of 0. Only possible with NXM.
+
+ [9] Matches only packets that have an 802.1Q header with an
+     odd-numbered VID (and any PCP). Only possible with NXM and
+     OF1.2. (This is just an example; one can match on any desired
+     VID bit pattern.)
+
+[10] Matches only packets that have an 802.1Q header with an
+     odd-numbered PCP (and any VID). Only possible with NXM. (This
+     is just an example; one can match on any desired PCP bit
+     pattern.)
+
+Additional notes:
+
+  - OF1.2: The top three bits of OXM_OF_VLAN_VID are fixed to zero,
+    so bits 13, 14, and 15 in the masks listed in the table may be
+    set to arbitrary values, as long as the corresponding value bits
+    are also zero. The suggested ffff mask for [2], [3], and [5]
+    allows a shorter OXM representation (the mask is omitted) than
+    the minimal 1fff mask.
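
To make the NXM column concrete, the sketch below applies a value/mask pair to a packet's 16-bit vlan_tci. It assumes the convention implied by the table and by the note on [2]: a packet without an 802.1Q header is represented as vlan_tci 0, and bit 12 (0x1000) is set whenever a header is present. The names `nxm_tci` and `nxm_tci_matches` are made up for this example and are not Open vSwitch functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed "802.1Q header present" bit in the NXM vlan_tci (bit 12). */
#define NXM_VLAN_PRESENT 0x1000

/* Hypothetical helper: builds the NXM vlan_tci for a tagged packet with
 * the given 12-bit VID and 3-bit PCP.  An untagged packet is simply 0. */
static uint16_t
nxm_tci(uint16_t vid, uint8_t pcp)
{
    assert(vid <= 0x0fff && pcp <= 7);
    return (uint16_t) ((pcp << 13) | NXM_VLAN_PRESENT | vid);
}

/* Returns true if 'tci' agrees with 'value' on every bit set in 'mask',
 * which is how a value/mask entry in the NXM column is interpreted. */
static bool
nxm_tci_matches(uint16_t tci, uint16_t value, uint16_t mask)
{
    return (tci & mask) == (value & mask);
}
```

For example, match [4] with PCP 5 uses z = (5 << 1) | 1 = 0xb, that is, value b000 and mask f000. A packet built with `nxm_tci(0x123, 5)` has TCI 0xb123 and matches, while an untagged packet (TCI 0) does not.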
+
+
+Flow Cookies
+============
+
+OpenFlow 1.0 and later versions have the concept of a "flow cookie",
+which is a 64-bit integer value attached to each flow. The treatment
+of the flow cookie has varied greatly across OpenFlow versions,
+however.
+
+In OpenFlow 1.0:
+
+  - OFPFC_ADD set the cookie in the flow that it added.
+
+  - OFPFC_MODIFY and OFPFC_MODIFY_STRICT updated the cookie for
+    the flow or flows that they modified.
+
+  - OFPST_FLOW messages included the flow cookie.
+
+  - OFPT_FLOW_REMOVED messages reported the cookie of the flow
+    that was removed.
+
+OpenFlow 1.1 made the following changes:
+
+  - Flow mod operations OFPFC_MODIFY, OFPFC_MODIFY_STRICT,
+    OFPFC_DELETE, and OFPFC_DELETE_STRICT, plus flow stats
+    requests and aggregate stats requests, gained the ability to
+    match on flow cookies with an arbitrary mask.
+
+  - OFPFC_MODIFY and OFPFC_MODIFY_STRICT were changed to add a
+    new flow, in the case of no match, only if the flow table
+    modification operation did not match on the cookie field.
+    (In OpenFlow 1.0, modify operations always added a new flow
+    when there was no match.)
+
+  - OFPFC_MODIFY and OFPFC_MODIFY_STRICT no longer updated flow
+    cookies.
+
+OpenFlow 1.2 made the following changes:
+
+  - OFPFC_MODIFY and OFPFC_MODIFY_STRICT were changed to never
+    add a new flow, regardless of whether the flow cookie was
+    used for matching.
+
+Open vSwitch support for OpenFlow 1.0 implements the OpenFlow 1.0
+behavior with the following extensions:
+
+  - An NXM extension field NXM_NX_COOKIE(_W) allows the NXM
+    versions of OFPFC_MODIFY, OFPFC_MODIFY_STRICT, OFPFC_DELETE,
+    and OFPFC_DELETE_STRICT flow_mods, plus flow stats requests
+    and aggregate stats requests, to match on flow cookies with
+    arbitrary masks. This is much like the equivalent OpenFlow
+    1.1 feature.
+
+  - Like OpenFlow 1.1, OFPFC_MODIFY and OFPFC_MODIFY_STRICT add a
+    new flow if there is no match and the mask is zero (or not
+    given).
+
+  - The "cookie" field in OFPT_FLOW_MOD and NXT_FLOW_MOD messages
+    is used as the cookie value for OFPFC_ADD commands, as
+    described in OpenFlow 1.0. For OFPFC_MODIFY and
+    OFPFC_MODIFY_STRICT commands, the "cookie" field is used as a
+    new cookie for flows that match unless it is UINT64_MAX, in
+    which case the flow's cookie is not updated.
+
+  - NXT_PACKET_IN (the Nicira extended version of
+    OFPT_PACKET_IN) reports the cookie of the rule that
+    generated the packet, or all-1-bits if no rule generated the
+    packet. (Older versions of OVS used all-0-bits instead of
+    all-1-bits.)
+
+The following table shows the handling of different protocols when
+receiving OFPFC_MODIFY and OFPFC_MODIFY_STRICT messages. A mask of 0
+indicates either an explicit mask of zero or an implicit one by not
+specifying the NXM_NX_COOKIE(_W) field.
+
+```
+                Match   Update   Add on miss   Add on miss
+                cookie  cookie     mask!=0       mask==0
+                ======  ======   ===========   ===========
+OpenFlow 1.0      no     yes     <always add on miss>
+OpenFlow 1.1     yes      no          no           yes
+OpenFlow 1.2     yes      no          no            no
+NXM              yes     yes*         no           yes
+
+* Updates the flow's cookie unless the "cookie" field is UINT64_MAX.
+```
+
+Multiple Table Support
+======================
+
+OpenFlow 1.0 has only rudimentary support for multiple flow tables.
+Notably, OpenFlow 1.0 does not allow the controller to specify the
+flow table to which a flow is to be added. Open vSwitch adds an
+extension for this purpose, which is enabled on a per-OpenFlow
+connection basis using the NXT_FLOW_MOD_TABLE_ID message. When the
+extension is enabled, the upper 8 bits of the 'command' member in an
+OFPT_FLOW_MOD or NXT_FLOW_MOD message designate the table to which a
+flow is to be added.
+
+The Open vSwitch software switch implementation offers 255 flow
+tables. On packet ingress, only the first flow table (table 0) is
+searched, and the contents of the remaining tables are not considered
+in any way.
Tables other than table 0 only come into play when an +NXAST_RESUBMIT_TABLE action specifies another table to search. + +Tables 128 and above are reserved for use by the switch itself. +Controllers should use only tables 0 through 127. + + +IPv6 +==== + +Open vSwitch supports stateless handling of IPv6 packets. Flows can be +written to support matching TCP, UDP, and ICMPv6 headers within an IPv6 +packet. Deeper matching of some Neighbor Discovery messages is also +supported. + +IPv6 was not designed to interact well with middle-boxes. This, +combined with Open vSwitch's stateless nature, have affected the +processing of IPv6 traffic, which is detailed below. + +Extension Headers +----------------- + +The base IPv6 header is incredibly simple with the intention of only +containing information relevant for routing packets between two +endpoints. IPv6 relies heavily on the use of extension headers to +provide any other functionality. Unfortunately, the extension headers +were designed in such a way that it is impossible to move to the next +header (including the layer-4 payload) unless the current header is +understood. + +Open vSwitch will process the following extension headers and continue +to the next header: + + * Fragment (see the next section) + * AH (Authentication Header) + * Hop-by-Hop Options + * Routing + * Destination Options + +When a header is encountered that is not in that list, it is considered +"terminal". A terminal header's IPv6 protocol value is stored in +"nw_proto" for matching purposes. If a terminal header is TCP, UDP, or +ICMPv6, the packet will be further processed in an attempt to extract +layer-4 information. + +Fragments +--------- + +IPv6 requires that every link in the internet have an MTU of 1280 octets +or greater (RFC 2460). As such, a terminal header (as described above in +"Extension Headers") in the first fragment should generally be +reachable. 
In this case, the terminal header's IPv6 protocol type is
+stored in the "nw_proto" field for matching purposes. If a terminal
+header cannot be found in the first fragment (one with a fragment offset
+of zero), the "nw_proto" field is set to 0. Subsequent fragments (those
+with a non-zero fragment offset) have the "nw_proto" field set to the
+IPv6 protocol type for fragments (44).
+
+Jumbograms
+----------
+
+An IPv6 jumbogram (RFC 2675) is a packet containing a payload longer
+than 65,535 octets. Jumbograms are only relevant in subnets with a link
+MTU greater than 65,575 octets, and are not required to be supported on
+nodes that do not connect to links with such large MTUs. Currently, Open
+vSwitch doesn't process jumbograms.
+
+
+In-Band Control
+===============
+
+Motivation
+----------
+
+An OpenFlow switch must establish and maintain a TCP network
+connection to its controller. There are two basic ways to categorize
+the network that this connection traverses: either it is completely
+separate from the one that the switch is otherwise controlling, or its
+path may overlap the network that the switch controls. We call the
+former case "out-of-band control", the latter case "in-band control".
+
+Out-of-band control has the following benefits:
+
+  - Simplicity: Out-of-band control slightly simplifies the switch
+    implementation.
+
+  - Reliability: Excessive switch traffic volume cannot interfere
+    with control traffic.
+
+  - Integrity: Machines not on the control network cannot
+    impersonate a switch or a controller.
+
+  - Confidentiality: Machines not on the control network cannot
+    snoop on control traffic.
+
+In-band control, on the other hand, has the following advantages:
+
+  - No dedicated port: There is no need to dedicate a physical
+    switch port to control, which is important on switches that have
+    few ports (e.g. wireless routers, low-end embedded platforms).
+ + - No dedicated network: There is no need to build and maintain a + separate control network. This is important in many + environments because it reduces proliferation of switches and + wiring. + +Open vSwitch supports both out-of-band and in-band control. This +section describes the principles behind in-band control. See the +description of the Controller table in ovs-vswitchd.conf.db(5) to +configure OVS for in-band control. + +Principles +---------- + +The fundamental principle of in-band control is that an OpenFlow +switch must recognize and switch control traffic without involving the +OpenFlow controller. All the details of implementing in-band control +are special cases of this principle. + +The rationale for this principle is simple. If the switch does not +handle in-band control traffic itself, then it will be caught in a +contradiction: it must contact the controller, but it cannot, because +only the controller can set up the flows that are needed to contact +the controller. + +The following points describe important special cases of this +principle. + + - In-band control must be implemented regardless of whether the + switch is connected. + + It is tempting to implement the in-band control rules only when + the switch is not connected to the controller, using the + reasoning that the controller should have complete control once + it has established a connection with the switch. + + This does not work in practice. Consider the case where the + switch is connected to the controller. Occasionally it can + happen that the controller forgets or otherwise needs to obtain + the MAC address of the switch. To do so, the controller sends a + broadcast ARP request. A switch that implements the in-band + control rules only when it is disconnected will then send an + OFPT_PACKET_IN message up to the controller. The controller will + be unable to respond, because it does not know the MAC address of + the switch. 
This is a deadlock situation that can only be + resolved by the switch noticing that its connection to the + controller has hung and reconnecting. + + - In-band control must override flows set up by the controller. + + It is reasonable to assume that flows set up by the OpenFlow + controller should take precedence over in-band control, on the + basis that the controller should be in charge of the switch. + + Again, this does not work in practice. Reasonable controller + implementations may set up a "last resort" fallback rule that + wildcards every field and, e.g., sends it up to the controller or + discards it. If a controller does that, then it will isolate + itself from the switch. + + - The switch must recognize all control traffic. + + The fundamental principle of in-band control states, in part, + that a switch must recognize control traffic without involving + the OpenFlow controller. More specifically, the switch must + recognize *all* control traffic. "False negatives", that is, + packets that constitute control traffic but that the switch does + not recognize as control traffic, lead to control traffic storms. + + Consider an OpenFlow switch that only recognizes control packets + sent to or from that switch. Now suppose that two switches of + this type, named A and B, are connected to ports on an Ethernet + hub (not a switch) and that an OpenFlow controller is connected + to a third hub port. In this setup, control traffic sent by + switch A will be seen by switch B, which will send it to the + controller as part of an OFPT_PACKET_IN message. Switch A will + then see the OFPT_PACKET_IN message's packet, re-encapsulate it + in another OFPT_PACKET_IN, and send it to the controller. Switch + B will then see that OFPT_PACKET_IN, and so on in an infinite + loop. + + Incidentally, the consequences of "false positives", where + packets that are not control traffic are nevertheless recognized + as control traffic, are much less severe. 
The controller will + not be able to control their behavior, but the network will + remain in working order. False positives do constitute a + security problem. + + - The switch should use echo-requests to detect disconnection. + + TCP will notice that a connection has hung, but this can take a + considerable amount of time. For example, with default settings + the Linux kernel TCP implementation will retransmit for between + 13 and 30 minutes, depending on the connection's retransmission + timeout, according to kernel documentation. This is far too long + for a switch to be disconnected, so an OpenFlow switch should + implement its own connection timeout. OpenFlow OFPT_ECHO_REQUEST + messages are the best way to do this, since they test the + OpenFlow connection itself. + +Implementation +-------------- + +This section describes how Open vSwitch implements in-band control. +Correctly implementing in-band control has proven difficult due to its +many subtleties, and has thus gone through many iterations. Please +read through and understand the reasoning behind the chosen rules +before making modifications. + +Open vSwitch implements in-band control as "hidden" flows, that is, +flows that are not visible through OpenFlow, and at a higher priority +than wildcarded flows can be set up through OpenFlow. This is done so +that the OpenFlow controller cannot interfere with them and possibly +break connectivity with its switches. It is possible to see all +flows, including in-band ones, with the ovs-appctl "bridge/dump-flows" +command. + +The Open vSwitch implementation of in-band control can hide traffic to +arbitrary "remotes", where each remote is one TCP port on one IP address. +Currently the remotes are automatically configured as the in-band OpenFlow +controllers plus the OVSDB managers, if any. 
(The latter is a requirement +because OVSDB managers are responsible for configuring OpenFlow controllers, +so if the manager cannot be reached then OpenFlow cannot be reconfigured.) + +The following rules (with the OFPP_NORMAL action) are set up on any bridge +that has any remotes: + + (a) DHCP requests sent from the local port. + (b) ARP replies to the local port's MAC address. + (c) ARP requests from the local port's MAC address. + +In-band also sets up the following rules for each unique next-hop MAC +address for the remotes' IPs (the "next hop" is either the remote +itself, if it is on a local subnet, or the gateway to reach the remote): + + (d) ARP replies to the next hop's MAC address. + (e) ARP requests from the next hop's MAC address. + +In-band also sets up the following rules for each unique remote IP address: + + (f) ARP replies containing the remote's IP address as a target. + (g) ARP requests containing the remote's IP address as a source. + +In-band also sets up the following rules for each unique remote (IP,port) +pair: + + (h) TCP traffic to the remote's IP and port. + (i) TCP traffic from the remote's IP and port. + +The goal of these rules is to be as narrow as possible to allow a +switch to join a network and be able to communicate with the +remotes. As mentioned earlier, these rules have higher priority +than the controller's rules, so if they are too broad, they may +prevent the controller from implementing its policy. As such, +in-band actively monitors some aspects of flow and packet processing +so that the rules can be made more precise. + +In-band control monitors attempts to add flows into the datapath that +could interfere with its duties. The datapath only allows exact +match entries, so in-band control is able to be very precise about +the flows it prevents. Flows that miss in the datapath are sent to +userspace to be processed, so preventing these flows from being +cached in the "fast path" does not affect correctness. 
The only type +of flow that is currently prevented is one that would prevent DHCP +replies from being seen by the local port. For example, a rule that +forwarded all DHCP traffic to the controller would not be allowed, +but one that forwarded to all ports (including the local port) would. + +As mentioned earlier, packets that miss in the datapath are sent to +the userspace for processing. The userspace has its own flow table, +the "classifier", so in-band checks whether any special processing +is needed before the classifier is consulted. If a packet is a DHCP +response to a request from the local port, the packet is forwarded to +the local port, regardless of the flow table. Note that this requires +L7 processing of DHCP replies to determine whether the 'chaddr' field +matches the MAC address of the local port. + +It is interesting to note that for an L3-based in-band control +mechanism, the majority of rules are devoted to ARP traffic. At first +glance, some of these rules appear redundant. However, each serves an +important role. First, in order to determine the MAC address of the +remote side (controller or gateway) for other ARP rules, we must allow +ARP traffic for our local port with rules (b) and (c). If we are +between a switch and its connection to the remote, we have to +allow the other switch's ARP traffic through. This is done with +rules (d) and (e), since we do not know the addresses of the other +switches a priori, but do know the remote's or gateway's. Finally, +if the remote is running in a local guest VM that is not reached +through the local port, the switch that is connected to the VM must +allow ARP traffic based on the remote's IP address, since it will +not know the MAC address of the local port that is sending the traffic +or the MAC address of the remote in the guest VM. + +With a few notable exceptions below, in-band should work in most +network setups.
The following are considered "supported" in the +current implementation: + + - Locally Connected. The switch and remote are on the same + subnet. This uses rules (a), (b), (c), (h), and (i). + + - Reached through Gateway. The switch and remote are on + different subnets and must go through a gateway. This uses + rules (a), (b), (c), (h), and (i). + + - Between Switch and Remote. This switch is between another + switch and the remote, and we want to allow the other + switch's traffic through. This uses rules (d), (e), (h), and + (i). It uses (b) and (c) indirectly in order to know the MAC + address for rules (d) and (e). Note that DHCP for the other + switch will not work unless an OpenFlow controller explicitly lets this + switch pass the traffic. + + - Between Switch and Gateway. This switch is between another + switch and the gateway, and we want to allow the other switch's + traffic through. This uses the same rules and logic as the + "Between Switch and Remote" configuration described earlier. + + - Remote on Local VM. The remote is a guest VM on the + system running in-band control. This uses rules (a), (b), (c), + (h), and (i). + + - Remote on Local VM with Different Networks. The remote + is a guest VM on the system running in-band control, but the + local port is not used to connect to the remote. For + example, an IP address is configured on eth0 of the switch. The + remote's VM is connected through eth1 of the switch, but an + IP address has not been configured for that port on the switch. + As such, the switch will use eth0 to connect to the remote, + and eth1's rules about the local port will not work. In the + example, the switch attached to eth0 would use rules (a), (b), + (c), (h), and (i) on eth0. The switch attached to eth1 would use + rules (f), (g), (h), and (i). + +The following are explicitly *not* supported by in-band control: + + - Specify Remote by Name. Currently, the remote must be + identified by IP address.
A naive approach would be to permit + all DNS traffic. Unfortunately, this would prevent the + controller from defining any policy over DNS. Since switches + that are located behind us need to connect to the remote, + in-band cannot simply add a rule that allows DNS traffic from + the local port. The "correct" way to support this is to parse + DNS requests to allow all traffic related to a request for the + remote's name through. Due to the potential security + problems and amount of processing, we decided to hold off for + the time-being. + + - Differing Remotes for Switches. All switches must know + the L3 addresses for all the remotes that other switches + may use, since rules need to be set up to allow traffic related + to those remotes through. See rules (f), (g), (h), and (i). + + - Differing Routes for Switches. In order for the switch to + allow other switches to connect to a remote through a + gateway, it allows the gateway's traffic through with rules (d) + and (e). If the routes to the remote differ for the two + switches, we will not know the MAC address of the alternate + gateway. + + +Action Reproduction +=================== + +It seems likely that many controllers, at least at startup, use the +OpenFlow "flow statistics" request to obtain existing flows, then +compare the flows' actions against the actions that they expect to +find. Before version 1.8.0, Open vSwitch always returned exact, +byte-for-byte copies of the actions that had been added to the flow +table. The current version of Open vSwitch does not always do this in +some exceptional cases. This section lists the exceptions that +controller authors must keep in mind if they compare actual actions +against desired actions in a bytewise fashion: + + - Open vSwitch zeros padding bytes in action structures, + regardless of their values when the flows were added. 
+ + - Open vSwitch "normalizes" the instructions in OpenFlow 1.1 + (and later) in the following way: + + * OVS sorts the instructions into the following order: + Apply-Actions, Clear-Actions, Write-Actions, + Write-Metadata, Goto-Table. + + * OVS drops Apply-Actions instructions that have empty + action lists. + + * OVS drops Write-Actions instructions that have empty + action sets. + +Please report other discrepancies, if you notice any, so that we can +fix or document them. + + +Suggestions +=========== + +Suggestions to improve Open vSwitch are welcome at discuss@openvswitch.org. diff --git a/FAQ b/FAQ deleted file mode 100644 index 9e74a3f46..000000000 --- a/FAQ +++ /dev/null @@ -1,1653 +0,0 @@ - Open vSwitch - -Frequently Asked Questions -========================== - -General -------- - -Q: What is Open vSwitch? - -A: Open vSwitch is a production quality open source software switch - designed to be used as a vswitch in virtualized server - environments. A vswitch forwards traffic between different VMs on - the same physical host and also forwards traffic between VMs and - the physical network. Open vSwitch supports standard management - interfaces (e.g. sFlow, NetFlow, IPFIX, RSPAN, CLI), and is open to - programmatic extension and control using OpenFlow and the OVSDB - management protocol. - - Open vSwitch is designed to be compatible with modern switching - chipsets. This means that it can be ported to existing high-fanout - switches allowing the same flexible control of the physical - infrastructure as the virtual infrastructure. It also means that - Open vSwitch will be able to take advantage of on-NIC switching - chipsets as their functionality matures. - -Q: What virtualization platforms can use Open vSwitch? - -A: Open vSwitch can currently run on any Linux-based virtualization - platform (kernel 2.6.32 and newer), including: KVM, VirtualBox, Xen, - Xen Cloud Platform, XenServer. As of Linux 3.3 it is part of the - mainline kernel.
The bulk of the code is written in platform- - independent C and is easily ported to other environments. We welcome - inquiries about integrating Open vSwitch with other virtualization - platforms. - -Q: How can I try Open vSwitch? - -A: The Open vSwitch source code can be built on a Linux system. You can - build and experiment with Open vSwitch on any Linux machine. - Packages for various Linux distributions are available on many - platforms, including: Debian, Ubuntu, Fedora. - - You may also download and run a virtualization platform that already - has Open vSwitch integrated. For example, download a recent ISO for - XenServer or Xen Cloud Platform. Be aware that the version - integrated with a particular platform may not be the most recent Open - vSwitch release. - -Q: Does Open vSwitch only work on Linux? - -A: No, Open vSwitch has been ported to a number of different operating - systems and hardware platforms. Most of the development work occurs - on Linux, but the code should be portable to any POSIX system. We've - seen Open vSwitch ported to a number of different platforms, - including FreeBSD, Windows, and even non-POSIX embedded systems. - - By definition, the Open vSwitch Linux kernel module only works on - Linux and will provide the highest performance. However, a userspace - datapath is available that should be very portable. - -Q: What's involved with porting Open vSwitch to a new platform or - switching ASIC? - -A: The PORTING document describes how one would go about porting Open - vSwitch to a new operating system or hardware platform. - -Q: Why would I use Open vSwitch instead of the Linux bridge? - -A: Open vSwitch is specially designed to make it easier to manage VM - network configuration and monitor state spread across many physical - hosts in dynamic virtualized environments. Please see WHY-OVS for a - more detailed description of how Open vSwitch relates to the Linux - Bridge.
- -Q: How is Open vSwitch related to distributed virtual switches like the - VMware vNetwork distributed switch or the Cisco Nexus 1000V? - -A: Distributed vswitch applications (e.g., VMware vNetwork distributed - switch, Cisco Nexus 1000V) provide a centralized way to configure and - monitor the network state of VMs that are spread across many physical - hosts. Open vSwitch is not a distributed vswitch itself, rather it - runs on each physical host and supports remote management in a way - that makes it easier for developers of virtualization/cloud - management platforms to offer distributed vswitch capabilities. - - To aid in distribution, Open vSwitch provides two open protocols that - are specially designed for remote management in virtualized network - environments: OpenFlow, which exposes flow-based forwarding state, - and the OVSDB management protocol, which exposes switch port state. - In addition to the switch implementation itself, Open vSwitch - includes tools (ovs-ofctl, ovs-vsctl) that developers can script and - extend to provide distributed vswitch capabilities that are closely - integrated with their virtualization management platform. - -Q: Why doesn't Open vSwitch support distribution? - -A: Open vSwitch is intended to be a useful component for building - flexible network infrastructure. There are many different approaches - to distribution which balance trade-offs between simplicity, - scalability, hardware compatibility, convergence times, logical - forwarding model, etc. The goal of Open vSwitch is to be able to - support all as a primitive building block rather than choose a - particular point in the distributed design space. - -Q: How can I contribute to the Open vSwitch Community? - -A: You can start by joining the mailing lists and helping to answer - questions. You can also suggest improvements to documentation. 
If - you have a feature or bug you would like to work on, send a mail to - one of the mailing lists: - - http://openvswitch.org/mlists/ - - -Releases --------- - -Q: What does it mean for an Open vSwitch release to be LTS (long-term - support)? - -A: All official releases have been through a comprehensive testing - process and are suitable for production use. Planned releases will - occur several times a year. If a significant bug is identified in an - LTS release, we will provide an updated release that includes the - fix. Releases that are not LTS may not be fixed and may just be - supplanted by the next major release. The current LTS release is - 1.9.x. - -Q: What Linux kernel versions does each Open vSwitch release work with? - -A: The following table lists the Linux kernel versions against which the - given versions of the Open vSwitch kernel module will successfully - build. The Linux kernel versions are upstream kernel versions, so - Linux kernels modified from the upstream sources may not build in - some cases even if they are based on a supported version. This is - most notably true of Red Hat Enterprise Linux (RHEL) kernels, which - are extensively modified from upstream. - - Open vSwitch Linux kernel - ------------ ------------- - 1.4.x 2.6.18 to 3.2 - 1.5.x 2.6.18 to 3.2 - 1.6.x 2.6.18 to 3.2 - 1.7.x 2.6.18 to 3.3 - 1.8.x 2.6.18 to 3.4 - 1.9.x 2.6.18 to 3.8 - 1.10.x 2.6.18 to 3.8 - 1.11.x 2.6.18 to 3.8 - 2.0.x 2.6.32 to 3.10 - 2.1.x 2.6.32 to 3.11 - 2.3.x 2.6.32 to 3.14 - - Open vSwitch userspace should also work with the Linux kernel module - built into Linux 3.3 and later. - - Open vSwitch userspace is not sensitive to the Linux kernel version. - It should build against almost any kernel, certainly against 2.6.32 - and later. - -Q: I get an error like this when I configure Open vSwitch: - - configure: error: Linux kernel in is version , but - version newer than is not supported (please refer to the - FAQ for advice) - - What should I do? 
- -A: If there is a newer version of Open vSwitch, consider building that - one, because it may support the kernel that you are building - against. (To find out, consult the table in the previous answer.) - - Otherwise, use the Linux kernel module supplied with the kernel - that you are using. All versions of Open vSwitch userspace are - compatible with all versions of the Open vSwitch kernel module, so - this will also work. See also the following question. - -Q: What features are not available in the Open vSwitch kernel datapath - that ships as part of the upstream Linux kernel? - -A: The kernel module in upstream Linux does not include support for - LISP. Work is in progress to add support for LISP to the upstream - Linux version of the Open vSwitch kernel module. For now, if you - need this feature, use the kernel module from the Open vSwitch - distribution instead of the upstream Linux kernel module. - - Certain features require kernel support to function or to have - reasonable performance. If the ovs-vswitchd log file indicates that - a feature is not supported, consider upgrading to a newer upstream - Linux release or using the kernel module paired with the userspace - distribution. - -Q: Why do tunnels not work when using a kernel module other than the - one packaged with Open vSwitch? - -A: Support for tunnels was added to the upstream Linux kernel module - after the rest of Open vSwitch. As a result, some kernels may contain - support for Open vSwitch but not tunnels. The minimum kernel version - that supports each tunnel protocol is: - - Protocol Linux Kernel - -------- ------------ - GRE 3.11 - VXLAN 3.12 - LISP - - If you are using a version of the kernel that is older than the one - listed above, it is still possible to use that tunnel protocol. However, - you must compile and install the kernel module included with the Open - vSwitch distribution rather than the one on your machine. 
If problems - persist after doing this, check to make sure that the module that is - loaded is the one you expect. - -Q: What features are not available when using the userspace datapath? - -A: Tunnel virtual ports are not supported, as described in the - previous answer. It is also not possible to use queue-related - actions. On Linux kernels before 2.6.39, maximum-sized VLAN packets - may not be transmitted. - -Q: What Linux kernel versions does IPFIX flow monitoring work with? - -A: IPFIX flow monitoring requires the Linux kernel module from Open - vSwitch version 1.10.90 or later. - -Q: Should userspace or kernel be upgraded first to minimize downtime? - -A: In general, the Open vSwitch userspace should be used with the - kernel version included in the same release or with the version - from upstream Linux. However, when upgrading between two releases - of Open vSwitch it is best to migrate userspace first to reduce - the possibility of incompatibilities. - -Q: What happened to the bridge compatibility feature? - -A: Bridge compatibility was a feature of Open vSwitch 1.9 and earlier. - When it was enabled, Open vSwitch imitated the interface of the - Linux kernel "bridge" module. This allowed users to drop Open - vSwitch into environments designed to use the Linux kernel bridge - module without adapting the environment to use Open vSwitch. - - Open vSwitch 1.10 and later do not support bridge compatibility. - The feature was dropped because version 1.10 adopted a new internal - architecture that made bridge compatibility difficult to maintain. - Now that many environments use OVS directly, it would rarely be - useful in any case. - - To use bridge compatibility, install OVS 1.9 or earlier, including - the accompanying kernel modules (both the main and bridge - compatibility modules), following the instructions that come with - the release. Be sure to start the ovs-brcompatd daemon.
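The kernel compatibility table earlier in this section can also be expressed as a small lookup. The following is only a sketch: the names `SUPPORTED_KERNELS`, `parse_version`, and `kernel_module_builds` are hypothetical and not part of Open vSwitch, the data is copied from the table above, and it assumes plain dotted upstream version strings (no distribution suffixes) and that stable point releases such as 3.8.13 count as 3.8.

```python
# Hypothetical helper, not part of Open vSwitch. The ranges are copied from
# the "Open vSwitch / Linux kernel" table in the FAQ text above.
SUPPORTED_KERNELS = {
    "1.4": ((2, 6, 18), (3, 2)),
    "1.5": ((2, 6, 18), (3, 2)),
    "1.6": ((2, 6, 18), (3, 2)),
    "1.7": ((2, 6, 18), (3, 3)),
    "1.8": ((2, 6, 18), (3, 4)),
    "1.9": ((2, 6, 18), (3, 8)),
    "1.10": ((2, 6, 18), (3, 8)),
    "1.11": ((2, 6, 18), (3, 8)),
    "2.0": ((2, 6, 32), (3, 10)),
    "2.1": ((2, 6, 32), (3, 11)),
    "2.3": ((2, 6, 32), (3, 14)),
}


def parse_version(text):
    """Turn a dotted version such as '3.8.13' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def kernel_module_builds(ovs_series, kernel):
    """Return True if the table lists `kernel` as buildable for `ovs_series`."""
    low, high = SUPPORTED_KERNELS[ovs_series]
    version = parse_version(kernel)
    # Compare only as many components as each bound specifies, so that a
    # point release like 3.8.13 still falls inside an upper bound of 3.8.
    return version[:len(low)] >= low and version[:len(high)] <= high
```

For example, `kernel_module_builds("1.9", "3.8.13")` is true under these assumptions, while `kernel_module_builds("1.9", "3.9")` is false. Modified distribution kernels (such as RHEL kernels) are outside what this lookup can express, as the answer above notes.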
- - -Terminology ------------ - -Q: I thought Open vSwitch was a virtual Ethernet switch, but the - documentation keeps talking about bridges. What's a bridge? - -A: In networking, the terms "bridge" and "switch" are synonyms. Open - vSwitch implements an Ethernet switch, which means that it is also - an Ethernet bridge. - -Q: What's a VLAN? - -A: See the "VLAN" section below. - - -Basic Configuration -------------------- - -Q: How do I configure a port as an access port? - -A: Add "tag=VLAN" to your "ovs-vsctl add-port" command. For example, - the following commands configure br0 with eth0 as a trunk port (the - default) and tap0 as an access port for VLAN 9: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 tap0 tag=9 - - If you want to configure an already added port as an access port, - use "ovs-vsctl set", e.g.: - - ovs-vsctl set port tap0 tag=9 - -Q: How do I configure a port as a SPAN port, that is, enable mirroring - of all traffic to that port? - -A: The following commands configure br0 with eth0 and tap0 as trunk - ports. All traffic coming in or going out on eth0 or tap0 is also - mirrored to tap1; any traffic arriving on tap1 is dropped: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 tap0 - ovs-vsctl add-port br0 tap1 \ - -- --id=@p get port tap1 \ - -- --id=@m create mirror name=m0 select-all=true output-port=@p \ - -- set bridge br0 mirrors=@m - - To later disable mirroring, run: - - ovs-vsctl clear bridge br0 mirrors - -Q: Does Open vSwitch support configuring a port in promiscuous mode? - -A: Yes. How you configure it depends on what you mean by "promiscuous - mode": - - - Conventionally, "promiscuous mode" is a feature of a network - interface card. Ordinarily, a NIC passes to the CPU only the - packets actually destined to its host machine. It discards - the rest to avoid wasting memory and CPU cycles. When - promiscuous mode is enabled, however, it passes every packet - to the CPU. 
On an old-style shared-media or hub-based - network, this allows the host to spy on all packets on the - network. But in the switched networks that are almost - everywhere these days, promiscuous mode doesn't have much - effect, because few packets not destined to a host are - delivered to the host's NIC. - - This form of promiscuous mode is configured in the guest OS of - the VMs on your bridge, e.g. with "ifconfig". - - - The VMware vSwitch uses a different definition of "promiscuous - mode". When you configure promiscuous mode on a VMware vNIC, - the vSwitch sends a copy of every packet received by the - vSwitch to that vNIC. That has a much bigger effect than just - enabling promiscuous mode in a guest OS. Rather than getting - a few stray packets for which the switch does not yet know the - correct destination, the vNIC gets every packet. The effect - is similar to replacing the vSwitch by a virtual hub. - - This "promiscuous mode" is what switches normally call "port - mirroring" or "SPAN". For information on how to configure - SPAN, see "How do I configure a port as a SPAN port, that is, - enable mirroring of all traffic to that port?" - -Q: How do I configure a VLAN as an RSPAN VLAN, that is, enable - mirroring of all traffic to that VLAN? - -A: The following commands configure br0 with eth0 as a trunk port and - tap0 as an access port for VLAN 10. All traffic coming in or going - out on tap0, as well as traffic coming in or going out on eth0 in - VLAN 10, is also mirrored to VLAN 15 on eth0. 
The original tag for - VLAN 10, in cases where one is present, is dropped as part of - mirroring: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 tap0 tag=10 - ovs-vsctl \ - -- --id=@m create mirror name=m0 select-all=true select-vlan=10 \ - output-vlan=15 \ - -- set bridge br0 mirrors=@m - - To later disable mirroring, run: - - ovs-vsctl clear bridge br0 mirrors - - Mirroring to a VLAN can disrupt a network that contains unmanaged - switches. See ovs-vswitchd.conf.db(5) for details. Mirroring to a - GRE tunnel has fewer caveats than mirroring to a VLAN and should - generally be preferred. - -Q: Can I mirror more than one input VLAN to an RSPAN VLAN? - -A: Yes, but mirroring to a VLAN strips the original VLAN tag in favor - of the specified output-vlan. This loss of information may make - the mirrored traffic too hard to interpret. - - To mirror multiple VLANs, use the commands above, but specify a - comma-separated list of VLANs as the value for select-vlan. To - mirror every VLAN, use the commands above, but omit select-vlan and - its value entirely. - - When a packet arrives on a VLAN that is used as a mirror output - VLAN, the mirror is disregarded. Instead, in standalone mode, OVS - floods the packet across all the ports for which the mirror output - VLAN is configured. (If an OpenFlow controller is in use, then it - can override this behavior through the flow table.) If OVS is used - as an intermediate switch, rather than an edge switch, this ensures - that the RSPAN traffic is distributed through the network. - - Mirroring to a VLAN can disrupt a network that contains unmanaged - switches. See ovs-vswitchd.conf.db(5) for details. Mirroring to a - GRE tunnel has fewer caveats than mirroring to a VLAN and should - generally be preferred. - -Q: How do I configure mirroring of all traffic to a GRE tunnel? - -A: The following commands configure br0 with eth0 and tap0 as trunk - ports. 
All traffic coming in or going out on eth0 or tap0 is also - mirrored to gre0, a GRE tunnel to the remote host 192.168.1.10; any - traffic arriving on gre0 is dropped: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 tap0 - ovs-vsctl add-port br0 gre0 \ - -- set interface gre0 type=gre options:remote_ip=192.168.1.10 \ - -- --id=@p get port gre0 \ - -- --id=@m create mirror name=m0 select-all=true output-port=@p \ - -- set bridge br0 mirrors=@m - - To later disable mirroring and destroy the GRE tunnel: - - ovs-vsctl clear bridge br0 mirrors - ovs-vsctl del-port br0 gre0 - -Q: Does Open vSwitch support ERSPAN? - -A: No. ERSPAN is an undocumented proprietary protocol. As an - alternative, Open vSwitch supports mirroring to a GRE tunnel (see - above). - -Q: How do I connect two bridges? - -A: First, why do you want to do this? Two connected bridges are not - much different from a single bridge, so you might as well just have - a single bridge with all your ports on it. - - If you still want to connect two bridges, you can use a pair of - patch ports. The following example creates bridges br0 and br1, - adds eth0 and tap0 to br0, adds tap1 to br1, and then connects br0 - and br1 with a pair of patch ports. - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 tap0 - ovs-vsctl add-br br1 - ovs-vsctl add-port br1 tap1 - ovs-vsctl \ - -- add-port br0 patch0 \ - -- set interface patch0 type=patch options:peer=patch1 \ - -- add-port br1 patch1 \ - -- set interface patch1 type=patch options:peer=patch0 - - Bridges connected with patch ports are much like a single bridge. - For instance, if the example above also added eth1 to br1, and both - eth0 and eth1 happened to be connected to the same next-hop switch, - then you could loop your network just as you would if you added - eth0 and eth1 to the same bridge (see the "Configuration Problems" - section below for more information).
- - If you are using Open vSwitch 1.9 or an earlier version, then you - need to be using the kernel module bundled with Open vSwitch rather - than the one that is integrated into Linux 3.3 and later, because - Open vSwitch 1.9 and earlier versions need kernel support for patch - ports. This also means that in Open vSwitch 1.9 and earlier, patch - ports will not work with the userspace datapath, only with the - kernel module. - -Q: How do I configure a bridge without an OpenFlow local port? - (Local port in the sense of OFPP_LOCAL) - -A: Open vSwitch does not support such a configuration. - Bridges always have their local ports. - - -Implementation Details ----------------------- - -Q: I hear OVS has a couple of kinds of flows. Can you tell me about them? - -A: Open vSwitch uses different kinds of flows for different purposes: - - - OpenFlow flows are the most important kind of flow. OpenFlow - controllers use these flows to define a switch's policy. - OpenFlow flows support wildcards, priorities, and multiple - tables. - - When in-band control is in use, Open vSwitch sets up a few - "hidden" flows, with priority higher than a controller or the - user can configure, that are not visible via OpenFlow. (See - the "Controller" section of the FAQ for more information - about hidden flows.) - - - The Open vSwitch software switch implementation uses a second - kind of flow internally. These flows, called "datapath" or - "kernel" flows, do not support priorities and comprise only a - single table, which makes them suitable for caching. (Like - OpenFlow flows, datapath flows do support wildcarding, in Open - vSwitch 1.11 and later.) OpenFlow flows and datapath flows - also support different actions and number ports differently. - - Datapath flows are an implementation detail that is subject to - change in future versions of Open vSwitch. Even with the - current version of Open vSwitch, hardware switch - implementations do not necessarily use this architecture. 
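The difference between the two kinds of flows described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Open vSwitch code: the class names, the dictionary-based packet representation, the "drop" default on a table miss, and the caching policy are all hypothetical, and it models only exact-match datapath entries (not the wildcarded datapath flows of Open vSwitch 1.11 and later).

```python
class OpenFlowTable:
    """Toy OpenFlow-style table: wildcarded matches with priorities."""

    def __init__(self):
        self.flows = []  # list of (priority, match dict, action)

    def add_flow(self, priority, match, action):
        # A field absent from `match` is treated as wildcarded.
        self.flows.append((priority, dict(match), action))

    def lookup(self, packet):
        best = None
        for priority, match, action in self.flows:
            if all(packet.get(field) == value for field, value in match.items()):
                if best is None or priority > best[0]:
                    best = (priority, action)
        # Assumption for this sketch: packets matching no flow are dropped.
        return best[1] if best else "drop"


class DatapathCache:
    """Toy datapath: one table, no priorities, keyed on the whole header."""

    def __init__(self, table):
        self.table = table
        self.entries = {}
        self.misses = 0

    def process(self, packet):
        key = tuple(sorted(packet.items()))
        if key not in self.entries:
            # Cache miss: consult the slower, priority-aware table once and
            # remember the decision for identical packets.
            self.misses += 1
            self.entries[key] = self.table.lookup(packet)
        return self.entries[key]
```

In this model the first packet of a given header pays for the full priority lookup and later identical packets hit the cache, which is the sense in which a single priority-free table is "suitable for caching".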
- - Users and controllers directly control only the OpenFlow flow - table. Open vSwitch manages the datapath flow table itself, so - users should not normally be concerned with it. - -Q: Why are there so many different ways to dump flows? - -A: Open vSwitch has two kinds of flows (see the previous question), so - it has commands with different purposes for dumping each kind of - flow: - - - "ovs-ofctl dump-flows <br>" dumps OpenFlow flows, excluding - hidden flows. This is the most commonly useful form of flow - dump. (Unlike the other commands, this should work with any - OpenFlow switch, not just Open vSwitch.) - - - "ovs-appctl bridge/dump-flows <br>" dumps OpenFlow flows, - including hidden flows. This is occasionally useful for - troubleshooting suspected issues with in-band control. - - - "ovs-dpctl dump-flows [dp]" dumps the datapath flow table - entries for a Linux kernel-based datapath. In Open vSwitch - 1.10 and later, ovs-vswitchd merges multiple switches into a - single datapath, so it will show all the flows on all your - kernel-based switches. This command can occasionally be - useful for debugging. - - - "ovs-appctl dpif/dump-flows <br>", new in Open vSwitch 1.10, - dumps datapath flows for only the specified bridge, regardless - of the type. - -Q: How does multicast snooping work with VLANs? - -A: Open vSwitch maintains snooping tables for each VLAN. - - -Performance ------------ - -Q: I just upgraded and I see a performance drop. Why? - -A: The OVS kernel datapath may have been updated to a newer version than - the OVS userspace components. Sometimes new versions of the OVS kernel - module add functionality that is backwards compatible with older - userspace components but may cause a drop in performance with them. - In particular, if a kernel module from OVS 2.1 or newer is paired with - OVS userspace 1.10 or older, there will be a performance drop for - TCP traffic. - - Updating the OVS userspace components to the latest released - version should fix the performance degradation. - - To get the best possible performance and functionality, it is - recommended to pair the same versions of the kernel module and OVS - userspace. - - -Configuration Problems ----------------------- - -Q: I created a bridge and added my Ethernet port to it, using commands - like these: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - - and as soon as I ran the "add-port" command I lost all connectivity - through eth0. Help! - -A: A physical Ethernet device that is part of an Open vSwitch bridge - should not have an IP address. If one does, then that IP address - will not be fully functional. - - You can restore functionality by moving the IP address to an Open - vSwitch "internal" device, such as the network device named after - the bridge itself.
For example, assuming that eth0's IP address is - 192.168.128.5, you could run the commands below to fix up the - situation: - - ifconfig eth0 0.0.0.0 - ifconfig br0 192.168.128.5 - - (If your only connection to the machine running OVS is through the - IP address in question, then you would want to run all of these - commands on a single command line, or put them into a script.) If - there were any additional routes assigned to eth0, then you would - also want to use commands to adjust these routes to go through br0. - - If you use DHCP to obtain an IP address, then you should kill the - DHCP client that was listening on the physical Ethernet interface - (e.g. eth0) and start one listening on the internal interface - (e.g. br0). You might still need to manually clear the IP address - from the physical interface (e.g. with "ifconfig eth0 0.0.0.0"). - - There is no compelling reason why Open vSwitch must work this way. - However, this is the way that the Linux kernel bridge module has - always worked, so it's a model that those accustomed to Linux - bridging are already used to. Also, the model that most people - expect is not implementable without kernel changes on all the - versions of Linux that Open vSwitch supports. - - By the way, this issue is not specific to physical Ethernet - devices. It applies to all network devices except Open vSwitch - "internal" devices. - -Q: I created a bridge and added a couple of Ethernet ports to it, - using commands like these: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 eth1 - - and now my network seems to have melted: connectivity is unreliable - (even connectivity that doesn't go through Open vSwitch), all the - LEDs on my physical switches are blinking, wireshark shows - duplicated packets, and CPU usage is very high. - -A: More than likely, you've looped your network. Probably, eth0 and - eth1 are connected to the same physical Ethernet switch. 
This - yields a scenario where OVS receives a broadcast packet on eth0 and - sends it out on eth1, then the physical switch connected to eth1 - sends the packet back on eth0, and so on forever. More complicated - scenarios, involving a loop through multiple switches, are possible - too. - - The solution depends on what you are trying to do: - - - If you added eth0 and eth1 to get higher bandwidth or higher - reliability between OVS and your physical Ethernet switch, - use a bond. The following commands create br0 and then add - eth0 and eth1 as a bond: - - ovs-vsctl add-br br0 - ovs-vsctl add-bond br0 bond0 eth0 eth1 - - Bonds have many configuration options. Please read the - documentation on the Port table in ovs-vswitchd.conf.db(5) - for all the details. - - - Perhaps you don't actually need eth0 and eth1 to be on the - same bridge. For example, if you simply want to be able to - connect each of them to virtual machines, then you can put - each of them on a bridge of its own: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - - ovs-vsctl add-br br1 - ovs-vsctl add-port br1 eth1 - - and then connect VMs to br0 and br1. (A potential - disadvantage is that traffic cannot directly pass between br0 - and br1. Instead, it will go out eth0 and come back in eth1, - or vice versa.) - - - If you have a redundant or complex network topology and you - want to prevent loops, turn on spanning tree protocol (STP). - The following commands create br0, enable STP, and add eth0 - and eth1 to the bridge. The order is important because you - don't want to have a loop in your network even - transiently: - - ovs-vsctl add-br br0 - ovs-vsctl set bridge br0 stp_enable=true - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 eth1 - - The Open vSwitch implementation of STP is not well tested. - Please report any bugs you observe, but if you'd rather avoid - acting as a beta tester then another option might be your - best shot.
- -Q: I can't seem to use Open vSwitch in a wireless network. - -A: Wireless base stations generally only allow packets with the source - MAC address of the NIC that completed the initial handshake. - Therefore, without MAC rewriting, only a single device can - communicate over a single wireless link. - - This isn't specific to Open vSwitch; it's enforced by the access - point, so the same problems will show up with the Linux bridge or - any other way to do bridging. - -Q: I can't seem to add my PPP interface to an Open vSwitch bridge. - -A: PPP most commonly carries IP packets, but Open vSwitch works only - with Ethernet frames. The correct way to interface PPP to an - Ethernet network is usually to use routing instead of switching. - -Q: Is there any documentation on the database tables and fields? - -A: Yes. ovs-vswitchd.conf.db(5) is a comprehensive reference. - -Q: When I run ovs-dpctl I no longer see the bridges I created. Instead, - I only see a datapath called "ovs-system". How can I see datapath - information about a particular bridge? - -A: In version 1.9.0, OVS switched to using a single datapath that is - shared by all bridges of that type. The "ovs-appctl dpif/*" - commands provide similar functionality that is scoped by the bridge. - -Q: I created a GRE port using ovs-vsctl so why can't I send traffic or - see the port in the datapath? - -A: On Linux kernels before 3.11, the OVS GRE module and Linux GRE module - cannot be loaded at the same time. It is likely that on your system the - Linux GRE module is already loaded and blocking OVS (to confirm, check - dmesg for errors regarding GRE registration). To fix this, unload all - GRE modules that appear in lsmod as well as the OVS kernel module. You - can then reload the OVS module following the directions in INSTALL, - which will ensure that dependencies are satisfied. - -Q: Open vSwitch does not seem to obey my packet filter rules. - -A: It depends on the mechanisms and configurations you want to use.
- - You cannot usefully use typical packet filters, like iptables, on - physical Ethernet ports that you add to an Open vSwitch bridge. - This is because Open vSwitch captures packets from the interface at - a layer below where typical packet-filter implementations - install their hooks. (This actually applies to any interface of - type "system" that you might add to an Open vSwitch bridge.) - - You can usefully use typical packet filters on Open vSwitch - internal ports as they are mostly ordinary interfaces from the point - of view of packet filters. - - For example, suppose you create a bridge br0 and add Ethernet port - eth0 to it. Then you can usefully add iptables rules to affect the - internal interface br0, but not the physical interface eth0. (br0 - is also where you would add an IP address, as discussed elsewhere - in the FAQ.) - - For simple filtering rules, it might be possible to achieve similar - results by installing appropriate OpenFlow flows instead. - - If the use of a particular packet filter setup is essential, Open - vSwitch might not be the best choice for you. On Linux, you might - want to consider using the Linux Bridge. (This is the only choice if - you want to use ebtables rules.) On NetBSD, you might want to - consider using bridge(4) with the BRIDGE_IPF option. - -Q: It seems that Open vSwitch does nothing when I remove a port and - then immediately put it back. For example, consider that p1 is - a port of type=internal: - - ovs-vsctl del-port br0 p1 -- \ - add-port br0 p1 -- \ - set interface p1 type=internal - -A: This is expected behavior. - - If del-port and add-port happen in a single OVSDB transaction as - in your example, Open vSwitch always "skips" the intermediate steps. - Even if they are done in multiple transactions, Open vSwitch is - still allowed to skip the intermediate steps and just implement - the overall effect. In both cases, your example would be turned - into a no-op.
- - If you want to make Open vSwitch actually destroy and then re-create - the port for some side effects like resetting kernel setting for the - corresponding interface, you need to separate operations into multiple - OVSDB transactions and ensure that at least the first one does not have - --no-wait. In the following example, the first ovs-vsctl will block - until Open vSwitch reloads the new configuration and removes the port: - - ovs-vsctl del-port br0 p1 - ovs-vsctl add-port br0 p1 -- \ - set interface p1 type=internal - -Quality of Service (QoS) ------------------------- - -Q: How do I configure Quality of Service (QoS)? - -A: Suppose that you want to set up bridge br0 connected to physical - Ethernet port eth0 (a 1 Gbps device) and virtual machine interfaces - vif1.0 and vif2.0, and that you want to limit traffic from vif1.0 - to eth0 to 10 Mbps and from vif2.0 to eth0 to 20 Mbps. Then, you - could configure the bridge this way: - - ovs-vsctl -- \ - add-br br0 -- \ - add-port br0 eth0 -- \ - add-port br0 vif1.0 -- set interface vif1.0 ofport_request=5 -- \ - add-port br0 vif2.0 -- set interface vif2.0 ofport_request=6 -- \ - set port eth0 qos=@newqos -- \ - --id=@newqos create qos type=linux-htb \ - other-config:max-rate=1000000000 \ - queues:123=@vif10queue \ - queues:234=@vif20queue -- \ - --id=@vif10queue create queue other-config:max-rate=10000000 -- \ - --id=@vif20queue create queue other-config:max-rate=20000000 - - At this point, bridge br0 is configured with the ports and eth0 is - configured with the queues that you need for QoS, but nothing is - actually directing packets from vif1.0 or vif2.0 to the queues that - we have set up for them. That means that all of the packets to - eth0 are going to the "default queue", which is not what we want. 
- - We use OpenFlow to direct packets from vif1.0 and vif2.0 to the - queues reserved for them: - - ovs-ofctl add-flow br0 in_port=5,actions=set_queue:123,normal - ovs-ofctl add-flow br0 in_port=6,actions=set_queue:234,normal - - Each of the above flows matches on the input port, sets up the - appropriate queue (123 for vif1.0, 234 for vif2.0), and then - executes the "normal" action, which performs the same switching - that Open vSwitch would have done without any OpenFlow flows being - present. (We know that vif1.0 and vif2.0 have OpenFlow port - numbers 5 and 6, respectively, because we set their ofport_request - columns above. If we had not done that, then we would have needed - to find out their port numbers before setting up these flows.) - - Now traffic going from vif1.0 or vif2.0 to eth0 should be - rate-limited. - - By the way, if you delete the bridge created by the above commands, - with: - - ovs-vsctl del-br br0 - - then that will leave one unreferenced QoS record and two - unreferenced Queue records in the Open vSwitch database. One way to - clear them out, assuming you don't have other QoS or Queue records - that you want to keep, is: - - ovs-vsctl -- --all destroy QoS -- --all destroy Queue - - If you do want to keep some QoS or Queue records, or the Open - vSwitch you are using is older than version 1.8 (which added the - --all option), then you will have to destroy QoS and Queue records - individually. - -Q: I configured Quality of Service (QoS) in my OpenFlow network by - adding records to the QoS and Queue tables, but the results aren't - what I expect. - -A: Did you install OpenFlow flows that use your queues? This is the - primary way to tell Open vSwitch which queues you want to use. If - you don't do this, then the default queue will be used, which will - probably not have the effect you want. - - Refer to the previous question for an example. - -Q: I'd like to take advantage of some QoS feature that Open vSwitch - doesn't yet support.
How do I do that? - -A: Open vSwitch does not implement QoS itself. Instead, it can - configure some, but not all, of the QoS features built into the - Linux kernel. If you need some QoS feature that OVS cannot - configure itself, then the first step is to figure out whether - Linux QoS supports that feature. If it does, then you can submit a - patch to support Open vSwitch configuration for that feature, or - you can use "tc" directly to configure the feature in Linux. (If - Linux QoS doesn't support the feature you want, then first you have - to add that support to Linux.) - -Q: I configured QoS, correctly, but my measurements show that it isn't - working as well as I expect. - -A: With the Linux kernel, the Open vSwitch implementation of QoS has - two aspects: - - - Open vSwitch configures a subset of Linux kernel QoS - features, according to what is in OVSDB. It is possible that - this code has bugs. If you believe that this is so, then you - can configure the Linux traffic control (QoS) stack directly - with the "tc" program. If you get better results that way, - you can send a detailed bug report to bugs@openvswitch.org. - - It is certain that Open vSwitch cannot configure every Linux - kernel QoS feature. If you need some feature that OVS cannot - configure, then you can also use "tc" directly (or add that - feature to OVS). - - - The Open vSwitch implementation of OpenFlow allows flows to - be directed to particular queues. This is pretty simple and - unlikely to have serious bugs at this point. - - However, most problems with QoS on Linux are not bugs in Open - vSwitch at all. They tend to be either configuration errors - (please see the earlier questions in this section) or issues with - the traffic control (QoS) stack in Linux. The Open vSwitch - developers are not experts on Linux traffic control. We suggest - that, if you believe you are encountering a problem with Linux - traffic control, that you consult the tc manpages (e.g. 
tc(8), - tc-htb(8), tc-hfsc(8)), web resources (e.g. http://lartc.org/), or - mailing lists (e.g. http://vger.kernel.org/vger-lists.html#netdev). - -Q: Does Open vSwitch support OpenFlow meters? - -A: Since version 2.0, Open vSwitch has OpenFlow protocol support for - OpenFlow meters. There is no implementation of meters in the Open - vSwitch software switch (neither the kernel-based nor the userspace - switch). - - -VLANs ------ - -Q: What's a VLAN? - -A: At the simplest level, a VLAN (short for "virtual LAN") is a way to - partition a single switch into multiple switches. Suppose, for - example, that you have two groups of machines, group A and group B. - You want the machines in group A to be able to talk to each other, - and you want the machines in group B to be able to talk to each - other, but you don't want the machines in group A to be able to - talk to the machines in group B. You can do this with two - switches, by plugging the machines in group A into one switch and - the machines in group B into the other switch. - - If you only have one switch, then you can use VLANs to do the same - thing, by configuring the ports for machines in group A as VLAN - "access ports" for one VLAN and the ports for group B as "access - ports" for a different VLAN. The switch will only forward packets - between ports that are assigned to the same VLAN, so this - effectively subdivides your single switch into two independent - switches, one for each group of machines. - - So far we haven't said anything about VLAN headers. With access - ports, as described so far, no VLAN header is present in - the Ethernet frame. This means that the machines (or switches) - connected to access ports need not be aware that VLANs are - involved, just like in the case where we use two different physical - switches.
- - Now suppose that you have a whole bunch of switches in your - network, instead of just one, and that some machines in group A are - connected directly to both switches 1 and 2. To allow these - machines to talk to each other, you could add an access port for - group A's VLAN to switch 1 and another to switch 2, and then - connect an Ethernet cable between those ports. That works fine, - but it doesn't scale well as the number of switches and the number - of VLANs increases, because you use up a lot of valuable switch - ports just connecting together your VLANs. - - This is where VLAN headers come in. Instead of using one cable and - two ports per VLAN to connect a pair of switches, we configure a - port on each switch as a VLAN "trunk port". Packets sent and - received on a trunk port carry a VLAN header that says what VLAN - the packet belongs to, so that only two ports total are required to - connect the switches, regardless of the number of VLANs in use. - Normally, only switches (either physical or virtual) are connected - to a trunk port, not individual hosts, because individual hosts - don't expect to see a VLAN header in the traffic that they receive. - - None of the above discussion says anything about particular VLAN - numbers. This is because VLAN numbers are completely arbitrary. - One must only ensure that a given VLAN is numbered consistently - throughout a network and that different VLANs are given different - numbers. (That said, VLAN 0 is usually synonymous with a packet - that has no VLAN header, and VLAN 4095 is reserved.) - -Q: VLANs don't work. - -A: Many drivers in Linux kernels before version 3.3 had VLAN-related - bugs. If you are having problems with VLANs that you suspect to be - driver related, then you have several options: - - - Upgrade to Linux 3.3 or later. - - - Build and install a fixed version of the particular driver - that is causing trouble, if one is available. - - - Use a NIC whose driver does not have VLAN problems. 
- - - Use "VLAN splinters", a feature in Open vSwitch 1.4 and later - that works around bugs in kernel drivers. To enable VLAN - splinters on interface eth0, use the command: - - ovs-vsctl set interface eth0 other-config:enable-vlan-splinters=true - - For VLAN splinters to be effective, Open vSwitch must know - which VLANs are in use. See the "VLAN splinters" section in - the Interface table in ovs-vswitchd.conf.db(5) for details on - how Open vSwitch infers in-use VLANs. - - VLAN splinters increase memory use and reduce performance, so - use them only if needed. - - - Apply the "vlan workaround" patch from the XenServer kernel - patch queue, build Open vSwitch against this patched kernel, - and then use ovs-vlan-bug-workaround(8) to enable the VLAN - workaround for each interface whose driver is buggy. - - (This is a nontrivial exercise, so this option is included - only for completeness.) - - It is not always easy to tell whether a Linux kernel driver has - buggy VLAN support. The ovs-vlan-test(8) and ovs-test(8) utilities - can help you test. See their manpages for details. Of the two - utilities, ovs-test(8) is newer and more thorough, but - ovs-vlan-test(8) may be easier to use. - -Q: VLANs still don't work. I've tested the driver so I know that it's OK. - -A: Do you have VLANs enabled on the physical switch that OVS is - attached to? Make sure that the port is configured to trunk the - VLAN or VLANs that you are using with OVS. - -Q: Outgoing VLAN-tagged traffic goes through OVS to my physical switch - and to its destination host, but OVS seems to drop incoming return - traffic. - -A: It's possible that you have the VLAN configured on your physical - switch as the "native" VLAN. In this mode, the switch treats - incoming packets either tagged with the native VLAN or untagged as - part of the native VLAN. It may also send outgoing packets in the - native VLAN without a VLAN tag. 
- - If this is the case, you have two choices: - - - Change the physical switch port configuration to tag packets - it forwards to OVS with the native VLAN instead of forwarding - them untagged. - - - Change the OVS configuration for the physical port to a - native VLAN mode. For example, the following sets up a - bridge with port eth0 in "native-tagged" mode in VLAN 9: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 tag=9 vlan_mode=native-tagged - - In this situation, "native-untagged" mode will probably work - equally well. Refer to the documentation for the Port table - in ovs-vswitchd.conf.db(5) for more information. - -Q: I added a pair of VMs on different VLANs, like this: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 tap0 tag=9 - ovs-vsctl add-port br0 tap1 tag=10 - - but the VMs can't access each other, the external network, or the - Internet. - -A: It is to be expected that the VMs can't access each other. VLANs - are a means to partition a network. When you configured tap0 and - tap1 as access ports for different VLANs, you indicated that they - should be isolated from each other. - - As for the external network and the Internet, it seems likely that - the machines you are trying to access are not on VLAN 9 (or 10) and - that the Internet is not available on VLAN 9 (or 10). - -Q: I added a pair of VMs on the same VLAN, like this: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 tap0 tag=9 - ovs-vsctl add-port br0 tap1 tag=9 - - The VMs can access each other, but not the external network or the - Internet. - -A: It seems likely that the machines you are trying to access in the - external network are not on VLAN 9 and that the Internet is not - available on VLAN 9. Also, ensure VLAN 9 is set up as an allowed - trunk VLAN on the upstream switch port to which eth0 is connected. - -Q: Can I configure an IP address on a VLAN? - -A: Yes. Use an "internal port" configured as an access port. 
For - example, the following configures IP address 192.168.0.7 on VLAN 9. - That is, OVS will forward packets from eth0 to 192.168.0.7 only if - they have an 802.1Q header with VLAN 9. Conversely, traffic - forwarded from 192.168.0.7 to eth0 will be tagged with an 802.1Q - header with VLAN 9: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 vlan9 tag=9 -- set interface vlan9 type=internal - ifconfig vlan9 192.168.0.7 - - See also the following question. - -Q: I configured one IP address on VLAN 0 and another on VLAN 9, like - this: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 eth0 - ifconfig br0 192.168.0.5 - ovs-vsctl add-port br0 vlan9 tag=9 -- set interface vlan9 type=internal - ifconfig vlan9 192.168.0.9 - - but other hosts that are only on VLAN 0 can reach the IP address - configured on VLAN 9. What's going on? - -A: RFC 1122 section 3.3.4.2 "Multihoming Requirements" describes two - approaches to IP address handling in Internet hosts: - - - In the "Strong ES Model", where an ES is a host ("End - System"), an IP address is primarily associated with a - particular interface. The host discards packets that arrive - on interface A if they are destined for an IP address that is - configured on interface B. The host never sends packets from - interface A using a source address configured on interface B. - - - In the "Weak ES Model", an IP address is primarily associated - with a host. The host accepts packets that arrive on any - interface if they are destined for any of the host's IP - addresses, even if the address is configured on some - interface other than the one on which it arrived. The host - does not restrict itself to sending packets from an IP - address associated with the originating interface. - - Linux uses the weak ES model. 
That means that when packets - destined to the VLAN 9 IP address arrive on eth0 and are bridged to - br0, the kernel IP stack accepts them there for the VLAN 9 IP - address, even though they were not received on vlan9, the network - device for VLAN 9. - - To simulate the strong ES model on Linux, one may add iptables rules - to filter packets based on source and destination address and - adjust ARP configuration with sysctls. - - BSD uses the strong ES model. - -Q: My OpenFlow controller doesn't see the VLANs that I expect. - -A: The configuration for VLANs in the Open vSwitch database (e.g. via - ovs-vsctl) only affects traffic that goes through Open vSwitch's - implementation of the OpenFlow "normal switching" action. By - default, when Open vSwitch isn't connected to a controller and - nothing has been manually configured in the flow table, all traffic - goes through the "normal switching" action. But, if you set up - OpenFlow flows on your own, through a controller or using ovs-ofctl - or through other means, then you have to implement VLAN handling - yourself. - - You can use "normal switching" as a component of your OpenFlow - actions, e.g. by putting "normal" into the list of actions given to - ovs-ofctl or by outputting to OFPP_NORMAL from an OpenFlow - controller. In situations where this is not suitable, you can - implement VLAN handling yourself, e.g.: - - - If a packet comes in on an access port, and the flow table - needs to send it out on a trunk port, then the flow can add - the appropriate VLAN tag with the "mod_vlan_vid" action. - - - If a packet comes in on a trunk port, and the flow table - needs to send it out on an access port, then the flow can - strip the VLAN tag with the "strip_vlan" action.
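As a rough sketch of the two cases above, suppose (purely for illustration) that OpenFlow port 1 on br0 is an access port for VLAN 9 and OpenFlow port 2 is a trunk port. The helper below is hypothetical and not part of Open vSwitch; its name, the port numbers, and the VLAN ID are all assumptions. It only prints the ovs-ofctl commands, so they can be reviewed before anything is installed:

```shell
#!/bin/sh
# Hypothetical helper (not part of Open vSwitch): print, without running,
# ovs-ofctl commands that tag traffic arriving on an access port and
# untag traffic arriving on a trunk port.  The bridge name, OpenFlow
# port numbers, and VLAN ID are assumptions supplied by the caller.
print_vlan_flows() {
    bridge=$1
    access=$2
    trunk=$3
    vid=$4
    # Access -> trunk: add the VLAN tag, then output on the trunk port.
    echo "ovs-ofctl add-flow $bridge in_port=$access,actions=mod_vlan_vid:$vid,output:$trunk"
    # Trunk -> access: match the VLAN, strip the tag, output on the access port.
    echo "ovs-ofctl add-flow $bridge in_port=$trunk,dl_vlan=$vid,actions=strip_vlan,output:$access"
}

print_vlan_flows br0 1 2 9
```

Piping the printed lines to a shell would install the two flows. This is only a starting point: a real flow table would probably also need to handle broadcast traffic, other VLANs, and additional ports.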
- -Q: I configured ports on a bridge as access ports with different VLAN - tags, like this: - - ovs-vsctl add-br br0 - ovs-vsctl set-controller br0 tcp:192.168.0.10:6633 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 tap0 tag=9 - ovs-vsctl add-port br0 tap1 tag=10 - - but the VMs running behind tap0 and tap1 can still communicate, - that is, they are not isolated from each other even though they are - on different VLANs. - -A: Do you have a controller configured on br0 (as the commands above - do)? If so, then this is a variant on the previous question, "My - OpenFlow controller doesn't see the VLANs that I expect," and you - can refer to the answer there for more information. - -Q: How does MAC learning work with VLANs? - -A: Open vSwitch implements Independent VLAN Learning (IVL) for the - OFPP_NORMAL action, i.e. it logically has a separate learning table - for each VLAN. - - -VXLANs ------- - -Q: What's a VXLAN? - -A: VXLAN stands for Virtual eXtensible Local Area Network, and is a means - to solve the scaling challenges of VLAN networks in a multi-tenant - environment. VXLAN is an overlay network which transports an L2 network - over an existing L3 network. For more information on VXLAN, please see - the IETF draft available here: - - http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03 - -Q: How much of the VXLAN protocol does Open vSwitch currently support? - -A: Open vSwitch currently supports the framing format for packets on the - wire. There is currently no support for the multicast aspects of VXLAN. - To get around the lack of multicast support, it is possible to - pre-provision MAC to IP address mappings either manually or from a - controller. - -Q: What destination UDP port does the VXLAN implementation in Open vSwitch - use? - -A: By default, Open vSwitch will use the assigned IANA port for VXLAN, which - is 4789. However, it is possible to configure the destination UDP port - manually on a per-VXLAN tunnel basis.
An example of this configuration is - provided below. - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 vxlan1 -- set interface vxlan1 - type=vxlan options:remote_ip=192.168.1.2 options:key=flow - options:dst_port=8472 - - -Using OpenFlow (Manually or Via Controller) -------------------------------------------- - -Q: What versions of OpenFlow does Open vSwitch support? - -A: The following table lists the versions of OpenFlow supported by - each version of Open vSwitch:
-
-    Open vSwitch      OF1.0 OF1.1 OF1.2 OF1.3 OF1.4 OF1.5
-    ===============   ===== ===== ===== ===== ===== =====
-    1.9 and earlier    yes   ---   ---   ---   ---   ---
-    1.10               yes   ---   [*]   [*]   ---   ---
-    1.11               yes   ---   [*]   [*]   ---   ---
-    2.0                yes   [*]   [*]   [*]   ---   ---
-    2.1                yes   [*]   [*]   [*]   ---   ---
-    2.2                yes   [*]   [*]   [*]   [%]   [*]
-    2.3                yes   yes   yes   yes   [*]   [*]
-
-    [*] Supported, with one or more missing features.
-    [%] Experimental, unsafe implementation.
-
- Open vSwitch 2.3 enables OpenFlow 1.0, 1.1, 1.2, and 1.3 by default - in ovs-vswitchd. In Open vSwitch 1.10 through 2.2, OpenFlow 1.1, - 1.2, and 1.3 must be enabled manually in ovs-vswitchd. OpenFlow - 1.4 and 1.5 are also supported, with missing features, in Open - vSwitch 2.3 and later, but not enabled by default. In any case, - the user may override the default: - - - To enable OpenFlow 1.0, 1.1, 1.2, and 1.3 on bridge br0: - - ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13 - - - To enable OpenFlow 1.0, 1.1, 1.2, 1.3, 1.4, and 1.5 on bridge br0: - - ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 - - - To enable only OpenFlow 1.0 on bridge br0: - - ovs-vsctl set bridge br0 protocols=OpenFlow10 - - All current versions of ovs-ofctl enable only OpenFlow 1.0 by - default. Use the -O option to enable support for later versions of - OpenFlow in ovs-ofctl.
For example: - - ovs-ofctl -O OpenFlow13 dump-flows br0 - - (Open vSwitch 2.2 had an experimental implementation of OpenFlow - 1.4 that could cause crashes. We don't recommend enabling it.) - - OPENFLOW-1.1+ in the Open vSwitch source tree tracks support for - OpenFlow 1.1 and later features. When support for OpenFlow 1.4 and - 1.5 is solidly implemented, Open vSwitch will enable those versions - by default. Also, the OpenFlow 1.5 specification is still under - development and thus subject to change. - -Q: Does Open vSwitch support MPLS? - -A: Before version 1.11, Open vSwitch did not support MPLS. That is, - these versions can match on MPLS Ethernet types, but they cannot - match, push, or pop MPLS labels, nor can they look past MPLS labels - into the encapsulated packet. - - Open vSwitch versions 1.11, 2.0, and 2.1 have very minimal support - for MPLS. With the userspace datapath only, these versions can - match, push, or pop a single MPLS label, but they still cannot look - past MPLS labels (even after popping them) into the encapsulated - packet. Kernel datapath support is unchanged from earlier - versions. - - Open vSwitch version 2.3 can match, push, or pop up to 3 MPLS - labels. Looking past MPLS labels into the encapsulated packet will - still be unsupported. Both userspace and kernel datapaths will be - supported, but MPLS processing always happens in userspace either - way, so kernel datapath performance will be disappointing. - - Open vSwitch version 2.4 will have kernel support for MPLS, - yielding improved performance. - -Q: I'm getting "error type 45250 code 0". What's that? - -A: This is an Open vSwitch extension to OpenFlow error codes. Open - vSwitch uses this extension when it must report an error to an - OpenFlow controller but no standard OpenFlow error code is - suitable. - - Open vSwitch logs the errors that it sends to controllers, so the - easiest thing to do is probably to look at the ovs-vswitchd log to - find out what the error was.
- - If you want to dissect the extended error message yourself, the - format is documented in include/openflow/nicira-ext.h in the Open - vSwitch source distribution. The extended error codes are - documented in lib/ofp-errors.h. - -Q1: Some of the traffic that I'd expect my OpenFlow controller to see - doesn't actually appear through the OpenFlow connection, even - though I know that it's going through. -Q2: Some of the OpenFlow flows that my controller sets up don't seem - to apply to certain traffic, especially traffic between OVS and - the controller itself. - -A: By default, Open vSwitch assumes that OpenFlow controllers are - connected "in-band", that is, that the controllers are actually - part of the network that is being controlled. In in-band mode, - Open vSwitch sets up special "hidden" flows to make sure that - traffic can make it back and forth between OVS and the controllers. - These hidden flows are higher priority than any flows that can be - set up through OpenFlow, and they are not visible through normal - OpenFlow flow table dumps. - - Usually, the hidden flows are desirable and helpful, but - occasionally they can cause unexpected behavior. You can view the - full OpenFlow flow table, including hidden flows, on bridge br0 - with the command: - - ovs-appctl bridge/dump-flows br0 - - to help you debug. The hidden flows are those with priorities - greater than 65535 (the maximum priority that can be set with - OpenFlow). - - The DESIGN file at the top level of the Open vSwitch source - distribution describes the in-band model in detail. - - If your controllers are not actually in-band (e.g. they are on - localhost via 127.0.0.1, or on a separate network), then you should - configure your controllers in "out-of-band" mode. 
If you have one - controller on bridge br0, then you can configure out-of-band mode - on it with: - - ovs-vsctl set controller br0 connection-mode=out-of-band - -Q: I configured all my controllers for out-of-band control mode but - "ovs-appctl bridge/dump-flows" still shows some hidden flows. - -A: You probably have a remote manager configured (e.g. with "ovs-vsctl - set-manager"). By default, Open vSwitch assumes that managers need - in-band rules set up on every bridge. You can disable these rules - on bridge br0 with: - - ovs-vsctl set bridge br0 other-config:disable-in-band=true - - This actually disables in-band control entirely for the bridge, as - if all the bridge's controllers were configured for out-of-band - control. - -Q: My OpenFlow controller doesn't see the VLANs that I expect. - -A: See answer under "VLANs", above. - -Q: I ran "ovs-ofctl add-flow br0 nw_dst=192.168.0.1,actions=drop" - but I got a funny message like this: - - ofp_util|INFO|normalization changed ofp_match, details: - ofp_util|INFO| pre: nw_dst=192.168.0.1 - ofp_util|INFO|post: - - and when I ran "ovs-ofctl dump-flows br0" I saw that my nw_dst - match had disappeared, so that the flow ends up matching every - packet. - -A: The term "normalization" in the log message means that a flow - cannot match on an L3 field without saying what L3 protocol is in - use. The "ovs-ofctl" command above didn't specify an L3 protocol, - so the L3 field match was dropped. - - In this case, the L3 protocol could be IP or ARP. A correct - command for each possibility is, respectively: - - ovs-ofctl add-flow br0 ip,nw_dst=192.168.0.1,actions=drop - - and - - ovs-ofctl add-flow br0 arp,nw_dst=192.168.0.1,actions=drop - - Similarly, a flow cannot match on an L4 field without saying what - L4 protocol is in use. For example, the flow match "tp_src=1234" - is, by itself, meaningless and will be ignored. 
Instead, to match - TCP source port 1234, write "tcp,tp_src=1234", or to match UDP - source port 1234, write "udp,tp_src=1234". - -Q: How can I figure out the OpenFlow port number for a given port? - -A: The OFPT_FEATURES_REQUEST message requests an OpenFlow switch to - respond with an OFPT_FEATURES_REPLY that, among other information, - includes a mapping between OpenFlow port names and numbers. From a - command prompt, "ovs-ofctl show br0" makes such a request and - prints the response for switch br0. - - The Interface table in the Open vSwitch database also maps OpenFlow - port names to numbers. To print the OpenFlow port number - associated with interface eth0, run: - - ovs-vsctl get Interface eth0 ofport - - You can print the entire mapping with: - - ovs-vsctl -- --columns=name,ofport list Interface - - but the output mixes together interfaces from all bridges in the - database, so it may be confusing if more than one bridge exists. - - In the Open vSwitch database, ofport value -1 means that the - interface could not be created due to an error. (The Open vSwitch - log should indicate the reason.) ofport value [] (the empty set) - means that the interface hasn't been created yet. The latter is - normally an intermittent condition (unless ovs-vswitchd is not - running). - -Q: I added some flows with my controller or with ovs-ofctl, but when I - run "ovs-dpctl dump-flows" I don't see them. - -A: ovs-dpctl queries a kernel datapath, not an OpenFlow switch. It - won't display the information that you want. You want to use - "ovs-ofctl dump-flows" instead. - -Q: It looks like each of the interfaces in my bonded port shows up - as an individual OpenFlow port. Is that right? - -A: Yes, Open vSwitch makes individual bond interfaces visible as - OpenFlow ports, rather than the bond as a whole. The interfaces - are treated together as a bond for only a few purposes: - - - Sending a packet to the OFPP_NORMAL port. 
(When an OpenFlow - controller is not configured, this happens implicitly to - every packet.) - - - Mirrors configured for output to a bonded port. - - It would make a lot of sense for Open vSwitch to present a bond as - a single OpenFlow port. If you want to contribute an - implementation of such a feature, please bring it up on the Open - vSwitch development mailing list at dev@openvswitch.org. - -Q: I have a sophisticated network setup involving Open vSwitch, VMs or - multiple hosts, and other components. The behavior isn't what I - expect. Help! - -A: To debug network behavior problems, trace the path of a packet, - hop-by-hop, from its origin in one host to a remote host. If - that's correct, then trace the path of the response packet back to - the origin. - - Usually a simple ICMP echo request and reply ("ping") packet is - good enough. Start by initiating an ongoing "ping" from the origin - host to a remote host. If you are tracking down a connectivity - problem, the "ping" will not display any successful output, but - packets are still being sent. (In this case the packets being sent - are likely ARP rather than ICMP.) - - Tools available for tracing include the following: - - - "tcpdump" and "wireshark" for observing hops across network - devices, such as Open vSwitch internal devices and physical - wires. - - - "ovs-appctl dpif/dump-flows <br>" in Open vSwitch 1.10 and - later or "ovs-dpctl dump-flows <br>" in earlier versions. - These tools allow one to observe the actions being taken on - packets in ongoing flows. - - See ovs-vswitchd(8) for "ovs-appctl dpif/dump-flows" - documentation, ovs-dpctl(8) for "ovs-dpctl dump-flows" - documentation, and "Why are there so many different ways to - dump flows?" above for some background. - - - "ovs-appctl ofproto/trace" to observe the logic behind how - ovs-vswitchd treats packets. See ovs-vswitchd(8) for - documentation. You can find out more details about a given flow - that "ovs-dpctl dump-flows" displays, by cutting and pasting - a flow from the output into an "ovs-appctl ofproto/trace" - command. - - - SPAN, RSPAN, and ERSPAN features of physical switches, to - observe what goes on at these physical hops. - - Starting at the origin of a given packet, observe the packet at - each hop in turn. For example, in one plausible scenario, you - might: - - 1. "tcpdump" the "eth" interface through which an ARP egresses - a VM, from inside the VM. - - 2. "tcpdump" the "vif" or "tap" interface through which the ARP - ingresses the host machine. - - 3. Use "ovs-dpctl dump-flows" to spot the ARP flow and observe - the host interface through which the ARP egresses the - physical machine. You may need to use "ovs-dpctl show" to - interpret the port numbers. If the output seems surprising, - you can use "ovs-appctl ofproto/trace" to observe details of - how ovs-vswitchd determined the actions in the "ovs-dpctl - dump-flows" output. - - 4. "tcpdump" the "eth" interface through which the ARP egresses - the physical machine. - - 5. "tcpdump" the "eth" interface through which the ARP - ingresses the physical machine, at the remote host that - receives the ARP. - - 6. Use "ovs-dpctl dump-flows" to spot the ARP flow on the - remote host that receives the ARP and observe the VM "vif" - or "tap" interface to which the flow is directed. Again, - "ovs-dpctl show" and "ovs-appctl ofproto/trace" might help. - - 7.
"tcpdump" the "vif" or "tap" interface to which the ARP is - directed. - - 8. "tcpdump" the "eth" interface through which the ARP - ingresses a VM, from inside the VM. - - It is likely that during one of these steps you will figure out the - problem. If not, then follow the ARP reply back to the origin, in - reverse. - -Q: How do I make a flow drop packets? - -A: To drop a packet is to receive it without forwarding it. OpenFlow - explicitly specifies forwarding actions. Thus, a flow with an - empty set of actions does not forward packets anywhere, causing - them to be dropped. You can specify an empty set of actions with - "actions=" on the ovs-ofctl command line. For example: - - ovs-ofctl add-flow br0 priority=65535,actions= - - would cause every packet entering switch br0 to be dropped. - - You can write "drop" explicitly if you like. The effect is the - same. Thus, the following command also causes every packet - entering switch br0 to be dropped: - - ovs-ofctl add-flow br0 priority=65535,actions=drop - - "drop" is not an action, either in OpenFlow or Open vSwitch. - Rather, it is only a way to say that there are no actions. - -Q: I added a flow to send packets out the ingress port, like this: - - ovs-ofctl add-flow br0 in_port=2,actions=2 - - but OVS drops the packets instead. - -A: Yes, OpenFlow requires a switch to ignore attempts to send a packet - out its ingress port. The rationale is that dropping these packets - makes it harder to loop the network. Sometimes this behavior can - even be convenient, e.g. it is often the desired behavior in a flow - that forwards a packet to several ports ("floods" the packet). - - Sometimes one really needs to send a packet out its ingress port - ("hairpin"). In this case, output to OFPP_IN_PORT, which in - ovs-ofctl syntax is expressed as just "in_port", e.g.: - - ovs-ofctl add-flow br0 in_port=2,actions=in_port - - This also works in some circumstances where the flow doesn't match - on the input port. 
For example, if you know that your switch has - five ports numbered 2 through 6, then the following will send every - received packet out every port, even its ingress port: - - ovs-ofctl add-flow br0 actions=2,3,4,5,6,in_port - - or, equivalently: - - ovs-ofctl add-flow br0 actions=all,in_port - - Sometimes, in complicated flow tables with multiple levels of - "resubmit" actions, a flow needs to output to a particular port - that may or may not be the ingress port. It's difficult to take - advantage of OFPP_IN_PORT in this situation. To help, Open vSwitch - provides, as an OpenFlow extension, the ability to modify the - in_port field. Whatever value is currently in the in_port field is - the port to which outputs will be dropped, as well as the - destination for OFPP_IN_PORT. This means that the following will - reliably output to port 2 or to ports 2 through 6, respectively: - - ovs-ofctl add-flow br0 in_port=2,actions=load:0->NXM_OF_IN_PORT[],2 - ovs-ofctl add-flow br0 actions=load:0->NXM_OF_IN_PORT[],2,3,4,5,6 - - If the input port is important, then one may save and restore it on - the stack: - - ovs-ofctl add-flow br0 actions=push:NXM_OF_IN_PORT[],\ - load:0->NXM_OF_IN_PORT[],\ - 2,3,4,5,6,\ - pop:NXM_OF_IN_PORT[] - -Q: My bridge br0 has host 192.168.0.1 on port 1 and host 192.168.0.2 - on port 2. I set up flows to forward only traffic destined to the - other host and drop other traffic, like this: - - priority=5,in_port=1,ip,nw_dst=192.168.0.2,actions=2 - priority=5,in_port=2,ip,nw_dst=192.168.0.1,actions=1 - priority=0,actions=drop - - But it doesn't work--I don't get any connectivity when I do this. - Why? - -A: These flows drop the ARP packets that IP hosts use to establish IP - connectivity over Ethernet. To solve the problem, add flows to - allow ARP to pass between the hosts: - - priority=5,in_port=1,arp,actions=2 - priority=5,in_port=2,arp,actions=1 - - This issue can manifest other ways, too. 
The following flows that - match on Ethernet addresses instead of IP addresses will also drop - ARP packets, because ARP requests are broadcast instead of being - directed to a specific host: - - priority=5,in_port=1,dl_dst=54:00:00:00:00:02,actions=2 - priority=5,in_port=2,dl_dst=54:00:00:00:00:01,actions=1 - priority=0,actions=drop - - The solution already described above will also work in this case. - It may be better to add flows to allow all multicast and broadcast - traffic: - - priority=5,in_port=1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,actions=2 - priority=5,in_port=2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,actions=1 - -Q: My bridge disconnects from my controller on add-port/del-port. - -A: Reconfiguring your bridge can change your bridge's datapath-id because - Open vSwitch generates datapath-id from the MAC address of one of its ports. - In that case, Open vSwitch disconnects from controllers because there's - no graceful way to notify controllers about the change of datapath-id. - - To avoid the behaviour, you can configure datapath-id manually. - - ovs-vsctl set bridge br0 other-config:datapath-id=0123456789abcdef - - -Development ------------ - -Q: How do I implement a new OpenFlow message? - -A: Add your new message to "enum ofpraw" and "enum ofptype" in - lib/ofp-msgs.h, following the existing pattern. Then recompile and - fix all of the new warnings, implementing new functionality for the - new message as needed. (If you configure with --enable-Werror, as - described in INSTALL, then it is impossible to miss any warnings.) - - If you need to add an OpenFlow vendor extension message for a - vendor that doesn't yet have any extension messages, then you will - also need to edit build-aux/extract-ofp-msgs. 
- - -Contact ------- - -bugs@openvswitch.org -http://openvswitch.org/ diff --git a/FAQ.md b/FAQ.md new file mode 100644 index 000000000..01228eb85 --- /dev/null +++ b/FAQ.md @@ -0,0 +1,1645 @@ +Frequently Asked Questions +========================== + +Open vSwitch + +General +------- + +### Q: What is Open vSwitch? + +A: Open vSwitch is a production quality open source software switch + designed to be used as a vswitch in virtualized server + environments. A vswitch forwards traffic between different VMs on + the same physical host and also forwards traffic between VMs and + the physical network. Open vSwitch supports standard management + interfaces (e.g. sFlow, NetFlow, IPFIX, RSPAN, CLI), and is open to + programmatic extension and control using OpenFlow and the OVSDB + management protocol. + + Open vSwitch is designed to be compatible with modern switching + chipsets. This means that it can be ported to existing high-fanout + switches allowing the same flexible control of the physical + infrastructure as the virtual infrastructure. It also means that + Open vSwitch will be able to take advantage of on-NIC switching + chipsets as their functionality matures. + +### Q: What virtualization platforms can use Open vSwitch? + +A: Open vSwitch can currently run on any Linux-based virtualization + platform (kernel 2.6.32 and newer), including: KVM, VirtualBox, Xen, + Xen Cloud Platform, XenServer. As of Linux 3.3 it is part of the + mainline kernel. The bulk of the code is written in platform- + independent C and is easily ported to other environments. We welcome + inquiries about integrating Open vSwitch with other virtualization + platforms. + +### Q: How can I try Open vSwitch? + +A: The Open vSwitch source code can be built on a Linux system. You can + build and experiment with Open vSwitch on any Linux machine. + Packages for various Linux distributions are available on many + platforms, including: Debian, Ubuntu, Fedora.
+ + You may also download and run a virtualization platform that already + has Open vSwitch integrated. For example, download a recent ISO for + XenServer or Xen Cloud Platform. Be aware that the version + integrated with a particular platform may not be the most recent Open + vSwitch release. + +### Q: Does Open vSwitch only work on Linux? + +A: No, Open vSwitch has been ported to a number of different operating + systems and hardware platforms. Most of the development work occurs + on Linux, but the code should be portable to any POSIX system. We've + seen Open vSwitch ported to a number of different platforms, + including FreeBSD, Windows, and even non-POSIX embedded systems. + + By definition, the Open vSwitch Linux kernel module only works on + Linux and will provide the highest performance. However, a userspace + datapath is available that should be very portable. + +### Q: What's involved with porting Open vSwitch to a new platform or switching ASIC? + +A: The [PORTING](PORTING.md) document describes how one would go about + porting Open vSwitch to a new operating system or hardware platform. + +### Q: Why would I use Open vSwitch instead of the Linux bridge? + +A: Open vSwitch is specially designed to make it easier to manage VM + network configuration and monitor state spread across many physical + hosts in dynamic virtualized environments. Please see + [WHY-OVS](WHY-OVS.md) for a more detailed description of how Open + vSwitch relates to the Linux Bridge. + +### Q: How is Open vSwitch related to distributed virtual switches like the VMware vNetwork distributed switch or the Cisco Nexus 1000V? + +A: Distributed vswitch applications (e.g., VMware vNetwork distributed + switch, Cisco Nexus 1000V) provide a centralized way to configure and + monitor the network state of VMs that are spread across many physical + hosts. 
Open vSwitch is not a distributed vswitch itself, rather it + runs on each physical host and supports remote management in a way + that makes it easier for developers of virtualization/cloud + management platforms to offer distributed vswitch capabilities. + + To aid in distribution, Open vSwitch provides two open protocols that + are specially designed for remote management in virtualized network + environments: OpenFlow, which exposes flow-based forwarding state, + and the OVSDB management protocol, which exposes switch port state. + In addition to the switch implementation itself, Open vSwitch + includes tools (ovs-ofctl, ovs-vsctl) that developers can script and + extend to provide distributed vswitch capabilities that are closely + integrated with their virtualization management platform. + +### Q: Why doesn't Open vSwitch support distribution? + +A: Open vSwitch is intended to be a useful component for building + flexible network infrastructure. There are many different approaches + to distribution which balance trade-offs between simplicity, + scalability, hardware compatibility, convergence times, logical + forwarding model, etc. The goal of Open vSwitch is to be able to + support all as a primitive building block rather than choose a + particular point in the distributed design space. + +### Q: How can I contribute to the Open vSwitch Community? + +A: You can start by joining the mailing lists and helping to answer + questions. You can also suggest improvements to documentation. If + you have a feature or bug you would like to work on, send a mail to + one of the mailing lists: + + http://openvswitch.org/mlists/ + + +Releases +-------- + +### Q: What does it mean for an Open vSwitch release to be LTS (long-term support)? + +A: All official releases have been through a comprehensive testing + process and are suitable for production use. Planned releases will + occur several times a year. 
If a significant bug is identified in an + LTS release, we will provide an updated release that includes the + fix. Releases that are not LTS may not be fixed and may just be + supplanted by the next major release. The current LTS release is + 1.9.x. + +### Q: What Linux kernel versions does each Open vSwitch release work with? + +A: The following table lists the Linux kernel versions against which the + given versions of the Open vSwitch kernel module will successfully + build. The Linux kernel versions are upstream kernel versions, so + Linux kernels modified from the upstream sources may not build in + some cases even if they are based on a supported version. This is + most notably true of Red Hat Enterprise Linux (RHEL) kernels, which + are extensively modified from upstream. + +| Open vSwitch | Linux kernel +|:------------:|:-------------: +| 1.4.x | 2.6.18 to 3.2 +| 1.5.x | 2.6.18 to 3.2 +| 1.6.x | 2.6.18 to 3.2 +| 1.7.x | 2.6.18 to 3.3 +| 1.8.x | 2.6.18 to 3.4 +| 1.9.x | 2.6.18 to 3.8 +| 1.10.x | 2.6.18 to 3.8 +| 1.11.x | 2.6.18 to 3.8 +| 2.0.x | 2.6.32 to 3.10 +| 2.1.x | 2.6.32 to 3.11 +| 2.3.x | 2.6.32 to 3.14 + + Open vSwitch userspace should also work with the Linux kernel module + built into Linux 3.3 and later. + + Open vSwitch userspace is not sensitive to the Linux kernel version. + It should build against almost any kernel, certainly against 2.6.32 + and later. + +### Q: I get an error like this when I configure Open vSwitch: + + configure: error: Linux kernel in is version , but + version newer than is not supported (please refer to the + FAQ for advice) + + What should I do? + +A: If there is a newer version of Open vSwitch, consider building that + one, because it may support the kernel that you are building + against. (To find out, consult the table in the previous answer.) + + Otherwise, use the Linux kernel module supplied with the kernel + that you are using. 
All versions of Open vSwitch userspace are + compatible with all versions of the Open vSwitch kernel module, so + this will also work. See also the following question. + +### Q: What features are not available in the Open vSwitch kernel datapath that ships as part of the upstream Linux kernel? + +A: The kernel module in upstream Linux does not include support for + LISP. Work is in progress to add support for LISP to the upstream + Linux version of the Open vSwitch kernel module. For now, if you + need this feature, use the kernel module from the Open vSwitch + distribution instead of the upstream Linux kernel module. + + Certain features require kernel support to function or to have + reasonable performance. If the ovs-vswitchd log file indicates that + a feature is not supported, consider upgrading to a newer upstream + Linux release or using the kernel module paired with the userspace + distribution. + +### Q: Why do tunnels not work when using a kernel module other than the one packaged with Open vSwitch? + +A: Support for tunnels was added to the upstream Linux kernel module + after the rest of Open vSwitch. As a result, some kernels may contain + support for Open vSwitch but not tunnels. The minimum kernel version + that supports each tunnel protocol is: + +| Protocol | Linux Kernel +|:--------:|:-------------: +| GRE | 3.11 +| VXLAN | 3.12 +| LISP | + + If you are using a version of the kernel that is older than the one + listed above, it is still possible to use that tunnel protocol. However, + you must compile and install the kernel module included with the Open + vSwitch distribution rather than the one on your machine. If problems + persist after doing this, check to make sure that the module that is + loaded is the one you expect. + +### Q: What features are not available when using the userspace datapath? + +A: Tunnel virtual ports are not supported, as described in the + previous answer. It is also not possible to use queue-related + actions. 
On Linux kernels before 2.6.39, maximum-sized VLAN packets + may not be transmitted. + +### Q: What Linux kernel versions does IPFIX flow monitoring work with? + +A: IPFIX flow monitoring requires the Linux kernel module from Open + vSwitch version 1.10.90 or later. + +### Q: Should userspace or kernel be upgraded first to minimize downtime? + + In general, the Open vSwitch userspace should be used with the + kernel version included in the same release or with the version + from upstream Linux. However, when upgrading between two releases + of Open vSwitch it is best to migrate userspace first to reduce + the possibility of incompatibilities. + +### Q: What happened to the bridge compatibility feature? + +A: Bridge compatibility was a feature of Open vSwitch 1.9 and earlier. + When it was enabled, Open vSwitch imitated the interface of the + Linux kernel "bridge" module. This allowed users to drop Open + vSwitch into environments designed to use the Linux kernel bridge + module without adapting the environment to use Open vSwitch. + + Open vSwitch 1.10 and later do not support bridge compatibility. + The feature was dropped because version 1.10 adopted a new internal + architecture that made bridge compatibility difficult to maintain. + Now that many environments use OVS directly, it would be rarely + useful in any case. + + To use bridge compatibility, install OVS 1.9 or earlier, including + the accompanying kernel modules (both the main and bridge + compatibility modules), following the instructions that come with + the release. Be sure to start the ovs-brcompatd daemon. + + +Terminology +----------- + +### Q: I thought Open vSwitch was a virtual Ethernet switch, but the documentation keeps talking about bridges. What's a bridge? + +A: In networking, the terms "bridge" and "switch" are synonyms. Open + vSwitch implements an Ethernet switch, which means that it is also + an Ethernet bridge. + +### Q: What's a VLAN? + +A: See the "VLAN" section below. 
+ + +Basic Configuration +------------------- + +### Q: How do I configure a port as an access port? + +A: Add "tag=VLAN" to your "ovs-vsctl add-port" command. For example, + the following commands configure br0 with eth0 as a trunk port (the + default) and tap0 as an access port for VLAN 9: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 tap0 tag=9 + + If you want to configure an already added port as an access port, + use "ovs-vsctl set", e.g.: + + ovs-vsctl set port tap0 tag=9 + +### Q: How do I configure a port as a SPAN port, that is, enable mirroring of all traffic to that port? + +A: The following commands configure br0 with eth0 and tap0 as trunk + ports. All traffic coming in or going out on eth0 or tap0 is also + mirrored to tap1; any traffic arriving on tap1 is dropped: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 tap0 + ovs-vsctl add-port br0 tap1 \ + -- --id=@p get port tap1 \ + -- --id=@m create mirror name=m0 select-all=true output-port=@p \ + -- set bridge br0 mirrors=@m + + To later disable mirroring, run: + + ovs-vsctl clear bridge br0 mirrors + +### Q: Does Open vSwitch support configuring a port in promiscuous mode? + +A: Yes. How you configure it depends on what you mean by "promiscuous + mode": + + - Conventionally, "promiscuous mode" is a feature of a network + interface card. Ordinarily, a NIC passes to the CPU only the + packets actually destined to its host machine. It discards + the rest to avoid wasting memory and CPU cycles. When + promiscuous mode is enabled, however, it passes every packet + to the CPU. On an old-style shared-media or hub-based + network, this allows the host to spy on all packets on the + network. But in the switched networks that are almost + everywhere these days, promiscuous mode doesn't have much + effect, because few packets not destined to a host are + delivered to the host's NIC. 
+ + This form of promiscuous mode is configured in the guest OS of + the VMs on your bridge, e.g. with "ifconfig". + + - The VMware vSwitch uses a different definition of "promiscuous + mode". When you configure promiscuous mode on a VMware vNIC, + the vSwitch sends a copy of every packet received by the + vSwitch to that vNIC. That has a much bigger effect than just + enabling promiscuous mode in a guest OS. Rather than getting + a few stray packets for which the switch does not yet know the + correct destination, the vNIC gets every packet. The effect + is similar to replacing the vSwitch by a virtual hub. + + This "promiscuous mode" is what switches normally call "port + mirroring" or "SPAN". For information on how to configure + SPAN, see "How do I configure a port as a SPAN port, that is, + enable mirroring of all traffic to that port?" + +### Q: How do I configure a VLAN as an RSPAN VLAN, that is, enable mirroring of all traffic to that VLAN? + +A: The following commands configure br0 with eth0 as a trunk port and + tap0 as an access port for VLAN 10. All traffic coming in or going + out on tap0, as well as traffic coming in or going out on eth0 in + VLAN 10, is also mirrored to VLAN 15 on eth0. The original tag for + VLAN 10, in cases where one is present, is dropped as part of + mirroring: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 tap0 tag=10 + ovs-vsctl \ + -- --id=@m create mirror name=m0 select-all=true select-vlan=10 \ + output-vlan=15 \ + -- set bridge br0 mirrors=@m + + To later disable mirroring, run: + + ovs-vsctl clear bridge br0 mirrors + + Mirroring to a VLAN can disrupt a network that contains unmanaged + switches. See ovs-vswitchd.conf.db(5) for details. Mirroring to a + GRE tunnel has fewer caveats than mirroring to a VLAN and should + generally be preferred. + +### Q: Can I mirror more than one input VLAN to an RSPAN VLAN? 
+ +A: Yes, but mirroring to a VLAN strips the original VLAN tag in favor + of the specified output-vlan. This loss of information may make + the mirrored traffic too hard to interpret. + + To mirror multiple VLANs, use the commands above, but specify a + comma-separated list of VLANs as the value for select-vlan. To + mirror every VLAN, use the commands above, but omit select-vlan and + its value entirely. + + When a packet arrives on a VLAN that is used as a mirror output + VLAN, the mirror is disregarded. Instead, in standalone mode, OVS + floods the packet across all the ports for which the mirror output + VLAN is configured. (If an OpenFlow controller is in use, then it + can override this behavior through the flow table.) If OVS is used + as an intermediate switch, rather than an edge switch, this ensures + that the RSPAN traffic is distributed through the network. + + Mirroring to a VLAN can disrupt a network that contains unmanaged + switches. See ovs-vswitchd.conf.db(5) for details. Mirroring to a + GRE tunnel has fewer caveats than mirroring to a VLAN and should + generally be preferred. + +### Q: How do I configure mirroring of all traffic to a GRE tunnel? + +A: The following commands configure br0 with eth0 and tap0 as trunk + ports. All traffic coming in or going out on eth0 or tap0 is also + mirrored to gre0, a GRE tunnel to the remote host 192.168.1.10; any + traffic arriving on gre0 is dropped: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 tap0 + ovs-vsctl add-port br0 gre0 \ + -- set interface gre0 type=gre options:remote_ip=192.168.1.10 \ + -- --id=@p get port gre0 \ + -- --id=@m create mirror name=m0 select-all=true output-port=@p \ + -- set bridge br0 mirrors=@m + + To later disable mirroring and destroy the GRE tunnel: + + ovs-vsctl clear bridge br0 mirrors + ovs-vsctl del-port br0 gre0 + +### Q: Does Open vSwitch support ERSPAN? + +A: No. ERSPAN is an undocumented proprietary protocol.
As an + alternative, Open vSwitch supports mirroring to a GRE tunnel (see + above). + +### Q: How do I connect two bridges? + +A: First, why do you want to do this? Two connected bridges are not + much different from a single bridge, so you might as well just have + a single bridge with all your ports on it. + + If you still want to connect two bridges, you can use a pair of + patch ports. The following example creates bridges br0 and br1, + adds eth0 and tap0 to br0, adds tap1 to br1, and then connects br0 + and br1 with a pair of patch ports. + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 tap0 + ovs-vsctl add-br br1 + ovs-vsctl add-port br1 tap1 + ovs-vsctl \ + -- add-port br0 patch0 \ + -- set interface patch0 type=patch options:peer=patch1 \ + -- add-port br1 patch1 \ + -- set interface patch1 type=patch options:peer=patch0 + + Bridges connected with patch ports are much like a single bridge. + For instance, if the example above also added eth1 to br1, and both + eth0 and eth1 happened to be connected to the same next-hop switch, + then you could loop your network just as you would if you added + eth0 and eth1 to the same bridge (see the "Configuration Problems" + section below for more information). + + If you are using Open vSwitch 1.9 or an earlier version, then you + need to be using the kernel module bundled with Open vSwitch rather + than the one that is integrated into Linux 3.3 and later, because + Open vSwitch 1.9 and earlier versions need kernel support for patch + ports. This also means that in Open vSwitch 1.9 and earlier, patch + ports will not work with the userspace datapath, only with the + kernel module. + +### Q: How do I configure a bridge without an OpenFlow local port? (Local port in the sense of OFPP_LOCAL) + +A: Open vSwitch does not support such a configuration. + Bridges always have their local ports. + + +Implementation Details +---------------------- + +### Q: I hear OVS has a couple of kinds of flows. 
Can you tell me about them? + +A: Open vSwitch uses different kinds of flows for different purposes: + + - OpenFlow flows are the most important kind of flow. OpenFlow + controllers use these flows to define a switch's policy. + OpenFlow flows support wildcards, priorities, and multiple + tables. + + When in-band control is in use, Open vSwitch sets up a few + "hidden" flows, with priority higher than a controller or the + user can configure, that are not visible via OpenFlow. (See + the "Controller" section of the FAQ for more information + about hidden flows.) + + - The Open vSwitch software switch implementation uses a second + kind of flow internally. These flows, called "datapath" or + "kernel" flows, do not support priorities and comprise only a + single table, which makes them suitable for caching. (Like + OpenFlow flows, datapath flows do support wildcarding, in Open + vSwitch 1.11 and later.) OpenFlow flows and datapath flows + also support different actions and number ports differently. + + Datapath flows are an implementation detail that is subject to + change in future versions of Open vSwitch. Even with the + current version of Open vSwitch, hardware switch + implementations do not necessarily use this architecture. + + Users and controllers directly control only the OpenFlow flow + table. Open vSwitch manages the datapath flow table itself, so + users should not normally be concerned with it. + +### Q: Why are there so many different ways to dump flows? + +A: Open vSwitch has two kinds of flows (see the previous question), so + it has commands with different purposes for dumping each kind of + flow: + + - `ovs-ofctl dump-flows <br>` dumps OpenFlow flows, excluding + hidden flows. This is the most commonly useful form of flow + dump. (Unlike the other commands, this should work with any + OpenFlow switch, not just Open vSwitch.) + + - `ovs-appctl bridge/dump-flows <br>` dumps OpenFlow flows, + including hidden flows. This is occasionally useful for + troubleshooting suspected issues with in-band control. + + - `ovs-dpctl dump-flows [dp]` dumps the datapath flow table + entries for a Linux kernel-based datapath. In Open vSwitch + 1.10 and later, ovs-vswitchd merges multiple switches into a + single datapath, so it will show all the flows on all your + kernel-based switches. This command can occasionally be + useful for debugging. + + - `ovs-appctl dpif/dump-flows <br>`, new in Open vSwitch 1.10, + dumps datapath flows for only the specified bridge, regardless + of the type. + +### Q: How does multicast snooping work with VLANs? + +A: Open vSwitch maintains snooping tables for each VLAN. + + +Performance +----------- + +### Q: I just upgraded and I see a performance drop. Why? + +A: The OVS kernel datapath may have been updated to a newer version than + the OVS userspace components. Sometimes new versions of the OVS kernel + module add functionality that is backwards compatible with older + userspace components but may cause a drop in performance with them. + In particular, if a kernel module from OVS 2.1 or newer is paired with + OVS userspace 1.10 or older, there will be a performance drop for + TCP traffic. + + Updating the OVS userspace components to the latest released + version should fix the performance degradation. + + To get the best possible performance and functionality, it is + recommended to pair the same versions of the kernel module and OVS + userspace. + + +Configuration Problems +---------------------- + +### Q: I created a bridge and added my Ethernet port to it, using commands + like these: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + + and as soon as I ran the "add-port" command I lost all connectivity + through eth0. Help! + +A: A physical Ethernet device that is part of an Open vSwitch bridge + should not have an IP address. If one does, then that IP address + will not be fully functional. + + You can restore functionality by moving the IP address to an Open + vSwitch "internal" device, such as the network device named after + the bridge itself.
For example, assuming that eth0's IP address is + 192.168.128.5, you could run the commands below to fix up the + situation: + + ifconfig eth0 0.0.0.0 + ifconfig br0 192.168.128.5 + + (If your only connection to the machine running OVS is through the + IP address in question, then you would want to run all of these + commands on a single command line, or put them into a script.) If + there were any additional routes assigned to eth0, then you would + also want to use commands to adjust these routes to go through br0. + + If you use DHCP to obtain an IP address, then you should kill the + DHCP client that was listening on the physical Ethernet interface + (e.g. eth0) and start one listening on the internal interface + (e.g. br0). You might still need to manually clear the IP address + from the physical interface (e.g. with "ifconfig eth0 0.0.0.0"). + + There is no compelling reason why Open vSwitch must work this way. + However, this is the way that the Linux kernel bridge module has + always worked, so it's a model that those accustomed to Linux + bridging are already used to. Also, the model that most people + expect is not implementable without kernel changes on all the + versions of Linux that Open vSwitch supports. + + By the way, this issue is not specific to physical Ethernet + devices. It applies to all network devices except Open vSwitch + "internal" devices. + +### Q: I created a bridge and added a couple of Ethernet ports to it, +### using commands like these: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 eth1 + + and now my network seems to have melted: connectivity is unreliable + (even connectivity that doesn't go through Open vSwitch), all the + LEDs on my physical switches are blinking, wireshark shows + duplicated packets, and CPU usage is very high. + +A: More than likely, you've looped your network. Probably, eth0 and + eth1 are connected to the same physical Ethernet switch. 
This + yields a scenario where OVS receives a broadcast packet on eth0 and + sends it out on eth1, then the physical switch connected to eth1 + sends the packet back on eth0, and so on forever. More complicated + scenarios, involving a loop through multiple switches, are possible + too. + + The solution depends on what you are trying to do: + + - If you added eth0 and eth1 to get higher bandwidth or higher + reliability between OVS and your physical Ethernet switch, + use a bond. The following commands create br0 and then add + eth0 and eth1 as a bond: + + ovs-vsctl add-br br0 + ovs-vsctl add-bond br0 bond0 eth0 eth1 + + Bonds have tons of configuration options. Please read the + documentation on the Port table in ovs-vswitchd.conf.db(5) + for all the details. + + - Perhaps you don't actually need eth0 and eth1 to be on the + same bridge. For example, if you simply want to be able to + connect each of them to virtual machines, then you can put + each of them on a bridge of its own: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + + ovs-vsctl add-br br1 + ovs-vsctl add-port br1 eth1 + + and then connect VMs to br0 and br1. (A potential + disadvantage is that traffic cannot directly pass between br0 + and br1. Instead, it will go out eth0 and come back in eth1, + or vice versa.) + + - If you have a redundant or complex network topology and you + want to prevent loops, turn on spanning tree protocol (STP). + The following commands create br0, enable STP, and add eth0 + and eth1 to the bridge. The order is important because you + don't want to have a loop in your network even + transiently: + + ovs-vsctl add-br br0 + ovs-vsctl set bridge br0 stp_enable=true + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 eth1 + + The Open vSwitch implementation of STP is not well tested. + Please report any bugs you observe, but if you'd rather avoid + acting as a beta tester then another option might be your + best shot.
+ +### Q: I can't seem to use Open vSwitch in a wireless network. + +A: Wireless base stations generally only allow packets with the source + MAC address of the NIC that completed the initial handshake. + Therefore, without MAC rewriting, only a single device can + communicate over a single wireless link. + + This isn't specific to Open vSwitch; it's enforced by the access + point, so the same problems will show up with the Linux bridge or + any other way to do bridging. + +### Q: I can't seem to add my PPP interface to an Open vSwitch bridge. + +A: PPP most commonly carries IP packets, but Open vSwitch works only + with Ethernet frames. The correct way to interface PPP to an + Ethernet network is usually to use routing instead of switching. + +### Q: Is there any documentation on the database tables and fields? + +A: Yes. ovs-vswitchd.conf.db(5) is a comprehensive reference. + +### Q: When I run ovs-dpctl I no longer see the bridges I created. Instead, + I only see a datapath called "ovs-system". How can I see datapath + information about a particular bridge? + +A: In version 1.9.0, OVS switched to using a single datapath that is + shared by all bridges of that type. The "ovs-appctl dpif/*" + commands provide similar functionality that is scoped by the bridge. + +### Q: I created a GRE port using ovs-vsctl so why can't I send traffic or + see the port in the datapath? + +A: On Linux kernels before 3.11, the OVS GRE module and Linux GRE module + cannot be loaded at the same time. It is likely that on your system the + Linux GRE module is already loaded and blocking OVS (to confirm, check + dmesg for errors regarding GRE registration). To fix this, unload all + GRE modules that appear in lsmod as well as the OVS kernel module. You + can then reload the OVS module following the directions in + [INSTALL](INSTALL.md), which will ensure that dependencies are satisfied. + +### Q: Open vSwitch does not seem to obey my packet filter rules.
+ +A: It depends on the mechanisms and configurations you want to use. + + You cannot usefully use typical packet filters, like iptables, on + physical Ethernet ports that you add to an Open vSwitch bridge. + This is because Open vSwitch captures packets from the interface at + a layer below where typical packet-filter implementations + install their hooks. (This actually applies to any interface of + type "system" that you might add to an Open vSwitch bridge.) + + You can usefully use typical packet filters on Open vSwitch + internal ports as they are mostly ordinary interfaces from the point + of view of packet filters. + + For example, suppose you create a bridge br0 and add Ethernet port + eth0 to it. Then you can usefully add iptables rules to affect the + internal interface br0, but not the physical interface eth0. (br0 + is also where you would add an IP address, as discussed elsewhere + in the FAQ.) + + For simple filtering rules, it might be possible to achieve similar + results by installing appropriate OpenFlow flows instead. + + If the use of a particular packet filter setup is essential, Open + vSwitch might not be the best choice for you. On Linux, you might + want to consider using the Linux Bridge. (This is the only choice if + you want to use ebtables rules.) On NetBSD, you might want to + consider using the bridge(4) with BRIDGE_IPF option. + +### Q: It seems that Open vSwitch does nothing when I remove a port and + then immediately put it back. For example, consider that p1 is + a port of type=internal: + + ovs-vsctl del-port br0 p1 -- \ + add-port br0 p1 -- \ + set interface p1 type=internal + +A: This is expected behavior. + + If del-port and add-port happen in a single OVSDB transaction, as + in your example, Open vSwitch always "skips" the intermediate steps. + Even if they are done in multiple transactions, it's still allowed + for Open vSwitch to skip the intermediate steps and just implement + the overall effect.
In both cases, your example would be turned + into a no-op. + + If you want to make Open vSwitch actually destroy and then re-create + the port for some side effects like resetting kernel settings for the + corresponding interface, you need to separate operations into multiple + OVSDB transactions and ensure that at least the first one does not have + --no-wait. In the following example, the first ovs-vsctl will block + until Open vSwitch reloads the new configuration and removes the port: + + ovs-vsctl del-port br0 p1 + ovs-vsctl add-port br0 p1 -- \ + set interface p1 type=internal + +Quality of Service (QoS) +------------------------ + +### Q: How do I configure Quality of Service (QoS)? + +A: Suppose that you want to set up bridge br0 connected to physical + Ethernet port eth0 (a 1 Gbps device) and virtual machine interfaces + vif1.0 and vif2.0, and that you want to limit traffic from vif1.0 + to eth0 to 10 Mbps and from vif2.0 to eth0 to 20 Mbps. Then, you + could configure the bridge this way: + + ovs-vsctl -- \ + add-br br0 -- \ + add-port br0 eth0 -- \ + add-port br0 vif1.0 -- set interface vif1.0 ofport_request=5 -- \ + add-port br0 vif2.0 -- set interface vif2.0 ofport_request=6 -- \ + set port eth0 qos=@newqos -- \ + --id=@newqos create qos type=linux-htb \ + other-config:max-rate=1000000000 \ + queues:123=@vif10queue \ + queues:234=@vif20queue -- \ + --id=@vif10queue create queue other-config:max-rate=10000000 -- \ + --id=@vif20queue create queue other-config:max-rate=20000000 + + At this point, bridge br0 is configured with the ports and eth0 is + configured with the queues that you need for QoS, but nothing is + actually directing packets from vif1.0 or vif2.0 to the queues that + we have set up for them. That means that all of the packets to + eth0 are going to the "default queue", which is not what we want.
+ + We use OpenFlow to direct packets from vif1.0 and vif2.0 to the + queues reserved for them: + + ovs-ofctl add-flow br0 in_port=5,actions=set_queue:123,normal + ovs-ofctl add-flow br0 in_port=6,actions=set_queue:234,normal + + Each of the above flows matches on the input port, sets up the + appropriate queue (123 for vif1.0, 234 for vif2.0), and then + executes the "normal" action, which performs the same switching + that Open vSwitch would have done without any OpenFlow flows being + present. (We know that vif1.0 and vif2.0 have OpenFlow port + numbers 5 and 6, respectively, because we set their ofport_request + columns above. If we had not done that, then we would have needed + to find out their port numbers before setting up these flows.) + + Now traffic going from vif1.0 or vif2.0 to eth0 should be + rate-limited. + + By the way, if you delete the bridge created by the above commands, + with: + + ovs-vsctl del-br br0 + + then that will leave one unreferenced QoS record and two + unreferenced Queue records in the Open vSwitch database. One way to + clear them out, assuming you don't have other QoS or Queue records + that you want to keep, is: + + ovs-vsctl -- --all destroy QoS -- --all destroy Queue + + If you do want to keep some QoS or Queue records, or the Open + vSwitch you are using is older than version 1.8 (which added the + --all option), then you will have to destroy QoS and Queue records + individually. + +### Q: I configured Quality of Service (QoS) in my OpenFlow network by + adding records to the QoS and Queue tables, but the results aren't + what I expect. + +A: Did you install OpenFlow flows that use your queues? This is the + primary way to tell Open vSwitch which queues you want to use. If + you don't do this, then the default queue will be used, which will + probably not have the effect you want. + + Refer to the previous question for an example. + +### Q: I'd like to take advantage of some QoS feature that Open vSwitch + doesn't yet support.
How do I do that? + +A: Open vSwitch does not implement QoS itself. Instead, it can + configure some, but not all, of the QoS features built into the + Linux kernel. If you need some QoS feature that OVS cannot + configure itself, then the first step is to figure out whether + Linux QoS supports that feature. If it does, then you can submit a + patch to support Open vSwitch configuration for that feature, or + you can use "tc" directly to configure the feature in Linux. (If + Linux QoS doesn't support the feature you want, then first you have + to add that support to Linux.) + +### Q: I configured QoS, correctly, but my measurements show that it isn't + working as well as I expect. + +A: With the Linux kernel, the Open vSwitch implementation of QoS has + two aspects: + + - Open vSwitch configures a subset of Linux kernel QoS + features, according to what is in OVSDB. It is possible that + this code has bugs. If you believe that this is so, then you + can configure the Linux traffic control (QoS) stack directly + with the "tc" program. If you get better results that way, + you can send a detailed bug report to bugs@openvswitch.org. + + It is certain that Open vSwitch cannot configure every Linux + kernel QoS feature. If you need some feature that OVS cannot + configure, then you can also use "tc" directly (or add that + feature to OVS). + + - The Open vSwitch implementation of OpenFlow allows flows to + be directed to particular queues. This is pretty simple and + unlikely to have serious bugs at this point. + + However, most problems with QoS on Linux are not bugs in Open + vSwitch at all. They tend to be either configuration errors + (please see the earlier questions in this section) or issues with + the traffic control (QoS) stack in Linux. The Open vSwitch + developers are not experts on Linux traffic control. If you + believe you are encountering a problem with Linux traffic + control, we suggest that you consult the tc manpages (e.g.
tc(8), + tc-htb(8), tc-hfsc(8)), web resources (e.g. http://lartc.org/), or + mailing lists (e.g. http://vger.kernel.org/vger-lists.html#netdev). + +### Q: Does Open vSwitch support OpenFlow meters? + +A: Since version 2.0, Open vSwitch has OpenFlow protocol support for + OpenFlow meters. There is no implementation of meters in the Open + vSwitch software switch (neither the kernel-based nor userspace + switches). + + +VLANs +----- + +### Q: What's a VLAN? + +A: At the simplest level, a VLAN (short for "virtual LAN") is a way to + partition a single switch into multiple switches. Suppose, for + example, that you have two groups of machines, group A and group B. + You want the machines in group A to be able to talk to each other, + and you want the machines in group B to be able to talk to each + other, but you don't want the machines in group A to be able to + talk to the machines in group B. You can do this with two + switches, by plugging the machines in group A into one switch and + the machines in group B into the other switch. + + If you only have one switch, then you can use VLANs to do the same + thing, by configuring the ports for machines in group A as VLAN + "access ports" for one VLAN and the ports for group B as "access + ports" for a different VLAN. The switch will only forward packets + between ports that are assigned to the same VLAN, so this + effectively subdivides your single switch into two independent + switches, one for each group of machines. + + So far we haven't said anything about VLAN headers. With access + ports, like we've described so far, no VLAN header is present in + the Ethernet frame. This means that the machines (or switches) + connected to access ports need not be aware that VLANs are + involved, just like in the case where we use two different physical + switches.
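
The access-port arrangement described above can be sketched with `ovs-vsctl`, using the same `add-port ... tag=` form that appears in later answers of this FAQ. This is only an illustration under assumptions: the helper `access_port_cmds`, the interface names (`tapA1`, `tapA2`, `tapB1`), and the VLAN numbers 10 and 20 are hypothetical, and the helper only prints the commands rather than running them.

```shell
# Print the ovs-vsctl commands that would add each listed interface to a
# bridge as an access port for one VLAN. All names here are hypothetical.
access_port_cmds() {
    bridge=$1
    vid=$2
    shift 2
    for port in "$@"; do
        echo "ovs-vsctl add-port $bridge $port tag=$vid"
    done
}

# Group A shares VLAN 10; group B is kept apart from it on VLAN 20.
access_port_cmds br0 10 tapA1 tapA2
access_port_cmds br0 20 tapB1
```

On a host where bridge br0 already exists, piping this output to `sh` would apply the configuration. Printing first makes it easy to review the commands before they change anything.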
+ + Now suppose that you have a whole bunch of switches in your + network, instead of just one, and that some machines in group A are + connected directly to both switches 1 and 2. To allow these + machines to talk to each other, you could add an access port for + group A's VLAN to switch 1 and another to switch 2, and then + connect an Ethernet cable between those ports. That works fine, + but it doesn't scale well as the number of switches and the number + of VLANs increases, because you use up a lot of valuable switch + ports just connecting together your VLANs. + + This is where VLAN headers come in. Instead of using one cable and + two ports per VLAN to connect a pair of switches, we configure a + port on each switch as a VLAN "trunk port". Packets sent and + received on a trunk port carry a VLAN header that says what VLAN + the packet belongs to, so that only two ports total are required to + connect the switches, regardless of the number of VLANs in use. + Normally, only switches (either physical or virtual) are connected + to a trunk port, not individual hosts, because individual hosts + don't expect to see a VLAN header in the traffic that they receive. + + None of the above discussion says anything about particular VLAN + numbers. This is because VLAN numbers are completely arbitrary. + One must only ensure that a given VLAN is numbered consistently + throughout a network and that different VLANs are given different + numbers. (That said, VLAN 0 is usually synonymous with a packet + that has no VLAN header, and VLAN 4095 is reserved.) + +### Q: VLANs don't work. + +A: Many drivers in Linux kernels before version 3.3 had VLAN-related + bugs. If you are having problems with VLANs that you suspect to be + driver related, then you have several options: + + - Upgrade to Linux 3.3 or later. + + - Build and install a fixed version of the particular driver + that is causing trouble, if one is available. + + - Use a NIC whose driver does not have VLAN problems. 
+ + - Use "VLAN splinters", a feature in Open vSwitch 1.4 and later + that works around bugs in kernel drivers. To enable VLAN + splinters on interface eth0, use the command: + + ovs-vsctl set interface eth0 other-config:enable-vlan-splinters=true + + For VLAN splinters to be effective, Open vSwitch must know + which VLANs are in use. See the "VLAN splinters" section in + the Interface table in ovs-vswitchd.conf.db(5) for details on + how Open vSwitch infers in-use VLANs. + + VLAN splinters increase memory use and reduce performance, so + use them only if needed. + + - Apply the "vlan workaround" patch from the XenServer kernel + patch queue, build Open vSwitch against this patched kernel, + and then use ovs-vlan-bug-workaround(8) to enable the VLAN + workaround for each interface whose driver is buggy. + + (This is a nontrivial exercise, so this option is included + only for completeness.) + + It is not always easy to tell whether a Linux kernel driver has + buggy VLAN support. The ovs-vlan-test(8) and ovs-test(8) utilities + can help you test. See their manpages for details. Of the two + utilities, ovs-test(8) is newer and more thorough, but + ovs-vlan-test(8) may be easier to use. + +### Q: VLANs still don't work. I've tested the driver so I know that it's OK. + +A: Do you have VLANs enabled on the physical switch that OVS is + attached to? Make sure that the port is configured to trunk the + VLAN or VLANs that you are using with OVS. + +### Q: Outgoing VLAN-tagged traffic goes through OVS to my physical switch + and to its destination host, but OVS seems to drop incoming return + traffic. + +A: It's possible that you have the VLAN configured on your physical + switch as the "native" VLAN. In this mode, the switch treats + incoming packets either tagged with the native VLAN or untagged as + part of the native VLAN. It may also send outgoing packets in the + native VLAN without a VLAN tag. 
+ + If this is the case, you have two choices: + + - Change the physical switch port configuration to tag packets + it forwards to OVS with the native VLAN instead of forwarding + them untagged. + + - Change the OVS configuration for the physical port to a + native VLAN mode. For example, the following sets up a + bridge with port eth0 in "native-tagged" mode in VLAN 9: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 tag=9 vlan_mode=native-tagged + + In this situation, "native-untagged" mode will probably work + equally well. Refer to the documentation for the Port table + in ovs-vswitchd.conf.db(5) for more information. + +### Q: I added a pair of VMs on different VLANs, like this: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 tap0 tag=9 + ovs-vsctl add-port br0 tap1 tag=10 + + but the VMs can't access each other, the external network, or the + Internet. + +A: It is to be expected that the VMs can't access each other. VLANs + are a means to partition a network. When you configured tap0 and + tap1 as access ports for different VLANs, you indicated that they + should be isolated from each other. + + As for the external network and the Internet, it seems likely that + the machines you are trying to access are not on VLAN 9 (or 10) and + that the Internet is not available on VLAN 9 (or 10). + +### Q: I added a pair of VMs on the same VLAN, like this: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 tap0 tag=9 + ovs-vsctl add-port br0 tap1 tag=9 + + The VMs can access each other, but not the external network or the + Internet. + +A: It seems likely that the machines you are trying to access in the + external network are not on VLAN 9 and that the Internet is not + available on VLAN 9. Also, ensure VLAN 9 is set up as an allowed + trunk VLAN on the upstream switch port to which eth0 is connected. + +### Q: Can I configure an IP address on a VLAN? + +A: Yes. 
Use an "internal port" configured as an access port. For + example, the following configures IP address 192.168.0.7 on VLAN 9. + That is, OVS will forward packets from eth0 to 192.168.0.7 only if + they have an 802.1Q header with VLAN 9. Conversely, traffic + forwarded from 192.168.0.7 to eth0 will be tagged with an 802.1Q + header with VLAN 9: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 vlan9 tag=9 -- set interface vlan9 type=internal + ifconfig vlan9 192.168.0.7 + + See also the following question. + +### Q: I configured one IP address on VLAN 0 and another on VLAN 9, like + this: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 eth0 + ifconfig br0 192.168.0.5 + ovs-vsctl add-port br0 vlan9 tag=9 -- set interface vlan9 type=internal + ifconfig vlan9 192.168.0.9 + + but other hosts that are only on VLAN 0 can reach the IP address + configured on VLAN 9. What's going on? + +A: RFC 1122 section 3.3.4.2 "Multihoming Requirements" describes two + approaches to IP address handling in Internet hosts: + + - In the "Strong ES Model", where an ES is a host ("End + System"), an IP address is primarily associated with a + particular interface. The host discards packets that arrive + on interface A if they are destined for an IP address that is + configured on interface B. The host never sends packets from + interface A using a source address configured on interface B. + + - In the "Weak ES Model", an IP address is primarily associated + with a host. The host accepts packets that arrive on any + interface if they are destined for any of the host's IP + addresses, even if the address is configured on some + interface other than the one on which it arrived. The host + does not restrict itself to sending packets from an IP + address associated with the originating interface. + + Linux uses the weak ES model. 
That means that when packets + destined to the VLAN 9 IP address arrive on eth0 and are bridged to + br0, the kernel IP stack accepts them there for the VLAN 9 IP + address, even though they were not received on vlan9, the network + device for VLAN 9. + + To simulate the strong ES model on Linux, one may add iptables rules + to filter packets based on source and destination address and + adjust ARP configuration with sysctls. + + BSD uses the strong ES model. + +### Q: My OpenFlow controller doesn't see the VLANs that I expect. + +A: The configuration for VLANs in the Open vSwitch database (e.g. via + ovs-vsctl) only affects traffic that goes through Open vSwitch's + implementation of the OpenFlow "normal switching" action. By + default, when Open vSwitch isn't connected to a controller and + nothing has been manually configured in the flow table, all traffic + goes through the "normal switching" action. But, if you set up + OpenFlow flows on your own, through a controller or using ovs-ofctl + or through other means, then you have to implement VLAN handling + yourself. + + You can use "normal switching" as a component of your OpenFlow + actions, e.g. by putting "normal" into the lists of actions on + ovs-ofctl or by outputting to OFPP_NORMAL from an OpenFlow + controller. In situations where this is not suitable, you can + implement VLAN handling yourself, e.g.: + + - If a packet comes in on an access port, and the flow table + needs to send it out on a trunk port, then the flow can add + the appropriate VLAN tag with the "mod_vlan_vid" action. + + - If a packet comes in on a trunk port, and the flow table + needs to send it out on an access port, then the flow can + strip the VLAN tag with the "strip_vlan" action.
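
The two cases above might look like the following with `ovs-ofctl add-flow`. This is a sketch under assumptions, not a complete VLAN implementation: OpenFlow port 1 is taken to be an access port for VLAN 9 and port 2 a trunk port, and the helper `vlan_flow_cmds` is hypothetical and only prints the commands.

```shell
# Print one flow per direction between an access port and a trunk port.
# The port numbers and VLAN ID passed below are assumptions for illustration.
vlan_flow_cmds() {
    bridge=$1
    access=$2
    trunk=$3
    vid=$4
    # Access port to trunk port: add the VLAN tag, then output.
    echo "ovs-ofctl add-flow $bridge in_port=$access,actions=mod_vlan_vid:$vid,output:$trunk"
    # Trunk port to access port: match the VLAN, strip the tag, then output.
    echo "ovs-ofctl add-flow $bridge in_port=$trunk,dl_vlan=$vid,actions=strip_vlan,output:$access"
}

vlan_flow_cmds br0 1 2 9
```

A real flow table would also need to handle the other ports on the bridge and traffic in other VLANs, which is why the "normal" action is usually the simpler choice when it is suitable.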
+ +### Q: I configured ports on a bridge as access ports with different VLAN + tags, like this: + + ovs-vsctl add-br br0 + ovs-vsctl set-controller br0 tcp:192.168.0.10:6633 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 tap0 tag=9 + ovs-vsctl add-port br0 tap1 tag=10 + + but the VMs running behind tap0 and tap1 can still communicate, + that is, they are not isolated from each other even though they are + on different VLANs. + +A: Do you have a controller configured on br0 (as the commands above + do)? If so, then this is a variant on the previous question, "My + OpenFlow controller doesn't see the VLANs that I expect," and you + can refer to the answer there for more information. + +### Q: How does MAC learning work with VLANs? + +A: Open vSwitch implements Independent VLAN Learning (IVL) for the + OFPP_NORMAL action. That is, it logically has a separate learning + table for each VLAN. + + +VXLANs +------ + +### Q: What's a VXLAN? + +A: VXLAN stands for Virtual eXtensible Local Area Network, and is a means + to solve the scaling challenges of VLAN networks in a multi-tenant + environment. VXLAN is an overlay network which transports an L2 network + over an existing L3 network. For more information on VXLAN, please see + the IETF draft available here: + + http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03 + +### Q: How much of the VXLAN protocol does Open vSwitch currently support? + +A: Open vSwitch currently supports the framing format for packets on the + wire. There is currently no support for the multicast aspects of VXLAN. + To get around the lack of multicast support, it is possible to + pre-provision MAC to IP address mappings either manually or from a + controller. + +### Q: What destination UDP port does the VXLAN implementation in Open vSwitch + use? + +A: By default, Open vSwitch will use the assigned IANA port for VXLAN, which + is 4789. However, it is possible to configure the destination UDP port + manually on a per-VXLAN tunnel basis.
An example of this configuration is + provided below. + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 vxlan1 -- set interface vxlan1 \ + type=vxlan options:remote_ip=192.168.1.2 options:key=flow \ + options:dst_port=8472 + + +Using OpenFlow (Manually or Via Controller) +------------------------------------------- + +### Q: What versions of OpenFlow does Open vSwitch support? + +A: The following table lists the versions of OpenFlow supported by + each version of Open vSwitch: + + Open vSwitch OF1.0 OF1.1 OF1.2 OF1.3 OF1.4 OF1.5 + =============== ===== ===== ===== ===== ===== ===== + 1.9 and earlier yes --- --- --- --- --- + 1.10 yes --- [*] [*] --- --- + 1.11 yes --- [*] [*] --- --- + 2.0 yes [*] [*] [*] --- --- + 2.1 yes [*] [*] [*] --- --- + 2.2 yes [*] [*] [*] [%] [*] + 2.3 yes yes yes yes [*] [*] + + [*] Supported, with one or more missing features. + [%] Experimental, unsafe implementation. + + Open vSwitch 2.3 enables OpenFlow 1.0, 1.1, 1.2, and 1.3 by default + in ovs-vswitchd. In Open vSwitch 1.10 through 2.2, OpenFlow 1.1, + 1.2, and 1.3 must be enabled manually in ovs-vswitchd. OpenFlow + 1.4 and 1.5 are also supported, with missing features, in Open + vSwitch 2.3 and later, but not enabled by default. In any case, + the user may override the default: + + - To enable OpenFlow 1.0, 1.1, 1.2, and 1.3 on bridge br0: + + ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13 + + - To enable OpenFlow 1.0, 1.1, 1.2, 1.3, 1.4, and 1.5 on bridge br0: + + ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 + + - To enable only OpenFlow 1.0 on bridge br0: + + ovs-vsctl set bridge br0 protocols=OpenFlow10 + + All current versions of ovs-ofctl enable only OpenFlow 1.0 by + default. Use the -O option to enable support for later versions of + OpenFlow in ovs-ofctl.
For example: + + ovs-ofctl -O OpenFlow13 dump-flows br0 + + (Open vSwitch 2.2 had an experimental implementation of OpenFlow + 1.4 that could cause crashes. We don't recommend enabling it.) + + OPENFLOW-1.1+ in the Open vSwitch source tree tracks support for + OpenFlow 1.1 and later features. When support for OpenFlow 1.4 and + 1.5 is solidly implemented, Open vSwitch will enable those versions + by default. Also, the OpenFlow 1.5 specification is still under + development and thus subject to change. + +### Q: Does Open vSwitch support MPLS? + +A: Before version 1.11, Open vSwitch did not support MPLS. That is, + these versions can match on MPLS Ethernet types, but they cannot + match, push, or pop MPLS labels, nor can they look past MPLS labels + into the encapsulated packet. + + Open vSwitch versions 1.11, 2.0, and 2.1 have very minimal support + for MPLS. With the userspace datapath only, these versions can + match, push, or pop a single MPLS label, but they still cannot look + past MPLS labels (even after popping them) into the encapsulated + packet. Kernel datapath support is unchanged from earlier + versions. + + Open vSwitch version 2.3 can match, push, or pop up to 3 MPLS + labels. Looking past MPLS labels into the encapsulated packet will + still be unsupported. Both userspace and kernel datapaths will be + supported, but MPLS processing always happens in userspace either + way, so kernel datapath performance will be disappointing. + + Open vSwitch version 2.4 will have kernel support for MPLS, + yielding improved performance. + +### Q: I'm getting "error type 45250 code 0". What's that? + +A: This is an Open vSwitch extension to OpenFlow error codes. Open + vSwitch uses this extension when it must report an error to an + OpenFlow controller but no standard OpenFlow error code is + suitable. + + Open vSwitch logs the errors that it sends to controllers, so the + easiest thing to do is probably to look at the ovs-vswitchd log to + find out what the error was.
+ + If you want to dissect the extended error message yourself, the + format is documented in include/openflow/nicira-ext.h in the Open + vSwitch source distribution. The extended error codes are + documented in lib/ofp-errors.h. + +Q1: Some of the traffic that I'd expect my OpenFlow controller to see + doesn't actually appear through the OpenFlow connection, even + though I know that it's going through. +Q2: Some of the OpenFlow flows that my controller sets up don't seem + to apply to certain traffic, especially traffic between OVS and + the controller itself. + +A: By default, Open vSwitch assumes that OpenFlow controllers are + connected "in-band", that is, that the controllers are actually + part of the network that is being controlled. In in-band mode, + Open vSwitch sets up special "hidden" flows to make sure that + traffic can make it back and forth between OVS and the controllers. + These hidden flows are higher priority than any flows that can be + set up through OpenFlow, and they are not visible through normal + OpenFlow flow table dumps. + + Usually, the hidden flows are desirable and helpful, but + occasionally they can cause unexpected behavior. You can view the + full OpenFlow flow table, including hidden flows, on bridge br0 + with the command: + + ovs-appctl bridge/dump-flows br0 + + to help you debug. The hidden flows are those with priorities + greater than 65535 (the maximum priority that can be set with + OpenFlow). + + The DESIGN file at the top level of the Open vSwitch source + distribution describes the in-band model in detail. + + If your controllers are not actually in-band (e.g. they are on + localhost via 127.0.0.1, or on a separate network), then you should + configure your controllers in "out-of-band" mode. 
If you have one + controller on bridge br0, then you can configure out-of-band mode + on it with: + + ovs-vsctl set controller br0 connection-mode=out-of-band + +### Q: I configured all my controllers for out-of-band control mode but + "ovs-appctl bridge/dump-flows" still shows some hidden flows. + +A: You probably have a remote manager configured (e.g. with "ovs-vsctl + set-manager"). By default, Open vSwitch assumes that managers need + in-band rules set up on every bridge. You can disable these rules + on bridge br0 with: + + ovs-vsctl set bridge br0 other-config:disable-in-band=true + + This actually disables in-band control entirely for the bridge, as + if all the bridge's controllers were configured for out-of-band + control. + +### Q: My OpenFlow controller doesn't see the VLANs that I expect. + +A: See answer under "VLANs", above. + +### Q: I ran "ovs-ofctl add-flow br0 nw_dst=192.168.0.1,actions=drop" + but I got a funny message like this: + + ofp_util|INFO|normalization changed ofp_match, details: + ofp_util|INFO| pre: nw_dst=192.168.0.1 + ofp_util|INFO|post: + + and when I ran "ovs-ofctl dump-flows br0" I saw that my nw_dst + match had disappeared, so that the flow ends up matching every + packet. + +A: The term "normalization" in the log message means that a flow + cannot match on an L3 field without saying what L3 protocol is in + use. The "ovs-ofctl" command above didn't specify an L3 protocol, + so the L3 field match was dropped. + + In this case, the L3 protocol could be IP or ARP. A correct + command for each possibility is, respectively: + + ovs-ofctl add-flow br0 ip,nw_dst=192.168.0.1,actions=drop + + and + + ovs-ofctl add-flow br0 arp,nw_dst=192.168.0.1,actions=drop + + Similarly, a flow cannot match on an L4 field without saying what + L4 protocol is in use. For example, the flow match "tp_src=1234" + is, by itself, meaningless and will be ignored. 
Instead, to match + TCP source port 1234, write "tcp,tp_src=1234", or to match UDP + source port 1234, write "udp,tp_src=1234". + +### Q: How can I figure out the OpenFlow port number for a given port? + +A: The OFPT_FEATURES_REQUEST message requests an OpenFlow switch to + respond with an OFPT_FEATURES_REPLY that, among other information, + includes a mapping between OpenFlow port names and numbers. From a + command prompt, "ovs-ofctl show br0" makes such a request and + prints the response for switch br0. + + The Interface table in the Open vSwitch database also maps OpenFlow + port names to numbers. To print the OpenFlow port number + associated with interface eth0, run: + + ovs-vsctl get Interface eth0 ofport + + You can print the entire mapping with: + + ovs-vsctl -- --columns=name,ofport list Interface + + but the output mixes together interfaces from all bridges in the + database, so it may be confusing if more than one bridge exists. + + In the Open vSwitch database, ofport value -1 means that the + interface could not be created due to an error. (The Open vSwitch + log should indicate the reason.) ofport value [] (the empty set) + means that the interface hasn't been created yet. The latter is + normally an intermittent condition (unless ovs-vswitchd is not + running). + +### Q: I added some flows with my controller or with ovs-ofctl, but when I + run "ovs-dpctl dump-flows" I don't see them. + +A: ovs-dpctl queries a kernel datapath, not an OpenFlow switch. It + won't display the information that you want. You want to use + "ovs-ofctl dump-flows" instead. + +### Q: It looks like each of the interfaces in my bonded port shows up + as an individual OpenFlow port. Is that right? + +A: Yes, Open vSwitch makes individual bond interfaces visible as + OpenFlow ports, rather than the bond as a whole. The interfaces + are treated together as a bond for only a few purposes: + + - Sending a packet to the OFPP_NORMAL port. 
(When an OpenFlow + controller is not configured, this happens implicitly to + every packet.) + + - Mirrors configured for output to a bonded port. + + It would make a lot of sense for Open vSwitch to present a bond as + a single OpenFlow port. If you want to contribute an + implementation of such a feature, please bring it up on the Open + vSwitch development mailing list at dev@openvswitch.org. + +### Q: I have a sophisticated network setup involving Open vSwitch, VMs or + multiple hosts, and other components. The behavior isn't what I + expect. Help! + +A: To debug network behavior problems, trace the path of a packet, + hop-by-hop, from its origin in one host to a remote host. If + that's correct, then trace the path of the response packet back to + the origin. + + Usually a simple ICMP echo request and reply ("ping") packet is + good enough. Start by initiating an ongoing "ping" from the origin + host to a remote host. If you are tracking down a connectivity + problem, the "ping" will not display any successful output, but + packets are still being sent. (In this case the packets being sent + are likely ARP rather than ICMP.) + + Tools available for tracing include the following: + + - "tcpdump" and "wireshark" for observing hops across network + devices, such as Open vSwitch internal devices and physical + wires. + + - "ovs-appctl dpif/dump-flows <br>" in Open vSwitch 1.10 and + later or "ovs-dpctl dump-flows <br>" in earlier versions. + These tools allow one to observe the actions being taken on + packets in ongoing flows. + + See ovs-vswitchd(8) for "ovs-appctl dpif/dump-flows" + documentation, ovs-dpctl(8) for "ovs-dpctl dump-flows" + documentation, and "Why are there so many different ways to + dump flows?" above for some background. + + - "ovs-appctl ofproto/trace" to observe the logic behind how + ovs-vswitchd treats packets. See ovs-vswitchd(8) for + documentation. You can find out more details about a given flow + that "ovs-dpctl dump-flows" displays, by cutting and pasting + a flow from the output into an "ovs-appctl ofproto/trace" + command. + + - SPAN, RSPAN, and ERSPAN features of physical switches, to + observe what goes on at these physical hops. + + Starting at the origin of a given packet, observe the packet at + each hop in turn. For example, in one plausible scenario, you + might: + + 1. "tcpdump" the "eth" interface through which an ARP egresses + a VM, from inside the VM. + + 2. "tcpdump" the "vif" or "tap" interface through which the ARP + ingresses the host machine. + + 3. Use "ovs-dpctl dump-flows" to spot the ARP flow and observe + the host interface through which the ARP egresses the + physical machine. You may need to use "ovs-dpctl show" to + interpret the port numbers. If the output seems surprising, + you can use "ovs-appctl ofproto/trace" to observe details of + how ovs-vswitchd determined the actions in the "ovs-dpctl + dump-flows" output. + + 4. "tcpdump" the "eth" interface through which the ARP egresses + the physical machine. + + 5. "tcpdump" the "eth" interface through which the ARP + ingresses the physical machine, at the remote host that + receives the ARP. + + 6. Use "ovs-dpctl dump-flows" to spot the ARP flow on the + remote host that receives the ARP and observe the VM "vif" + or "tap" interface to which the flow is directed. Again, + "ovs-dpctl show" and "ovs-appctl ofproto/trace" might help. + + 7.
"tcpdump" the "vif" or "tap" interface to which the ARP is + directed. + + 8. "tcpdump" the "eth" interface through which the ARP + ingresses a VM, from inside the VM. + + It is likely that during one of these steps you will figure out the + problem. If not, then follow the ARP reply back to the origin, in + reverse. + +### Q: How do I make a flow drop packets? + +A: To drop a packet is to receive it without forwarding it. OpenFlow + explicitly specifies forwarding actions. Thus, a flow with an + empty set of actions does not forward packets anywhere, causing + them to be dropped. You can specify an empty set of actions with + "actions=" on the ovs-ofctl command line. For example: + + ovs-ofctl add-flow br0 priority=65535,actions= + + would cause every packet entering switch br0 to be dropped. + + You can write "drop" explicitly if you like. The effect is the + same. Thus, the following command also causes every packet + entering switch br0 to be dropped: + + ovs-ofctl add-flow br0 priority=65535,actions=drop + + "drop" is not an action, either in OpenFlow or Open vSwitch. + Rather, it is only a way to say that there are no actions. + +### Q: I added a flow to send packets out the ingress port, like this: + + ovs-ofctl add-flow br0 in_port=2,actions=2 + + but OVS drops the packets instead. + +A: Yes, OpenFlow requires a switch to ignore attempts to send a packet + out its ingress port. The rationale is that dropping these packets + makes it harder to loop the network. Sometimes this behavior can + even be convenient, e.g. it is often the desired behavior in a flow + that forwards a packet to several ports ("floods" the packet). + + Sometimes one really needs to send a packet out its ingress port + ("hairpin"). In this case, output to OFPP_IN_PORT, which in + ovs-ofctl syntax is expressed as just "in_port", e.g.: + + ovs-ofctl add-flow br0 in_port=2,actions=in_port + + This also works in some circumstances where the flow doesn't match + on the input port. 
For example, if you know that your switch has + five ports numbered 2 through 6, then the following will send every + received packet out every port, even its ingress port: + + ovs-ofctl add-flow br0 actions=2,3,4,5,6,in_port + + or, equivalently: + + ovs-ofctl add-flow br0 actions=all,in_port + + Sometimes, in complicated flow tables with multiple levels of + "resubmit" actions, a flow needs to output to a particular port + that may or may not be the ingress port. It's difficult to take + advantage of OFPP_IN_PORT in this situation. To help, Open vSwitch + provides, as an OpenFlow extension, the ability to modify the + in_port field. Whatever value is currently in the in_port field is + the port to which outputs will be dropped, as well as the + destination for OFPP_IN_PORT. This means that the following will + reliably output to port 2 or to ports 2 through 6, respectively: + + ovs-ofctl add-flow br0 in_port=2,actions=load:0->NXM_OF_IN_PORT[],2 + ovs-ofctl add-flow br0 actions=load:0->NXM_OF_IN_PORT[],2,3,4,5,6 + + If the input port is important, then one may save and restore it on + the stack: + + ovs-ofctl add-flow br0 actions=push:NXM_OF_IN_PORT[],\ + load:0->NXM_OF_IN_PORT[],\ + 2,3,4,5,6,\ + pop:NXM_OF_IN_PORT[] + +### Q: My bridge br0 has host 192.168.0.1 on port 1 and host 192.168.0.2 + on port 2. I set up flows to forward only traffic destined to the + other host and drop other traffic, like this: + + priority=5,in_port=1,ip,nw_dst=192.168.0.2,actions=2 + priority=5,in_port=2,ip,nw_dst=192.168.0.1,actions=1 + priority=0,actions=drop + + But it doesn't work--I don't get any connectivity when I do this. + Why? + +A: These flows drop the ARP packets that IP hosts use to establish IP + connectivity over Ethernet. To solve the problem, add flows to + allow ARP to pass between the hosts: + + priority=5,in_port=1,arp,actions=2 + priority=5,in_port=2,arp,actions=1 + + This issue can manifest other ways, too. 
The following flows that + match on Ethernet addresses instead of IP addresses will also drop + ARP packets, because ARP requests are broadcast instead of being + directed to a specific host: + + priority=5,in_port=1,dl_dst=54:00:00:00:00:02,actions=2 + priority=5,in_port=2,dl_dst=54:00:00:00:00:01,actions=1 + priority=0,actions=drop + + The solution already described above will also work in this case. + It may be better to add flows to allow all multicast and broadcast + traffic: + + priority=5,in_port=1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,actions=2 + priority=5,in_port=2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,actions=1 + +### Q: My bridge disconnects from my controller on add-port/del-port. + +A: Reconfiguring your bridge can change your bridge's datapath-id because + Open vSwitch generates datapath-id from the MAC address of one of its ports. + In that case, Open vSwitch disconnects from controllers because there's + no graceful way to notify controllers about the change of datapath-id. + + To avoid the behaviour, you can configure datapath-id manually. + + ovs-vsctl set bridge br0 other-config:datapath-id=0123456789abcdef + + +Development +----------- + +### Q: How do I implement a new OpenFlow message? + +A: Add your new message to "enum ofpraw" and "enum ofptype" in + lib/ofp-msgs.h, following the existing pattern. Then recompile and + fix all of the new warnings, implementing new functionality for the + new message as needed. (If you configure with --enable-Werror, as + described in [INSTALL](INSTALL.md), then it is impossible to miss + any warnings.) + + If you need to add an OpenFlow vendor extension message for a + vendor that doesn't yet have any extension messages, then you will + also need to edit build-aux/extract-ofp-msgs. 
+ + +Contact +------- + +bugs@openvswitch.org +http://openvswitch.org/ diff --git a/INSTALL b/INSTALL deleted file mode 100644 index 3cfdc0412..000000000 --- a/INSTALL +++ /dev/null @@ -1,602 +0,0 @@ - How to Install Open vSwitch on Linux, FreeBSD and NetBSD - ======================================================== - -This document describes how to build and install Open vSwitch on a -generic Linux, FreeBSD, or NetBSD host. For specifics around installation -on a specific platform, please see one of these files: - - - INSTALL.Debian - - INSTALL.Fedora - - INSTALL.RHEL - - INSTALL.XenServer - - INSTALL.NetBSD - - INSTALL.DPDK - -Build Requirements ------------------- - -To compile the userspace programs in the Open vSwitch distribution, -you will need the following software: - - - GNU make. - - - A C compiler, such as: - - * GCC 4.x. - - * Clang. Clang 3.4 and later provide useful static semantic - analysis and thread-safety checks. For Ubuntu, there are - nightly built packages available on clang's website. - - While OVS may be compatible with other compilers, optimal - support for atomic operations may be missing, making OVS very - slow (see lib/ovs-atomic.h). - - - libssl, from OpenSSL, is optional but recommended if you plan to - connect the Open vSwitch to an OpenFlow controller. libssl is - required to establish confidentiality and authenticity in the - connections from an Open vSwitch to an OpenFlow controller. If - libssl is installed, then Open vSwitch will automatically build - with support for it. - - - Python 2.x, for x >= 4. - -On Linux, you may choose to compile the kernel module that comes with -the Open vSwitch distribution or to use the kernel module built into -the Linux kernel (version 3.3 or later). See the FAQ question "What -features are not available in the Open vSwitch kernel datapath that -ships as part of the upstream Linux kernel?" for more information on -this trade-off. 
You may also use the userspace-only implementation, -at some cost in features and performance (see INSTALL.userspace for -details). To compile the kernel module on Linux, you must also -install the following: - - - A supported Linux kernel version. Please refer to README.md for a - list of supported versions. - - The Open vSwitch datapath requires bridging support - (CONFIG_BRIDGE) to be built as a kernel module. (This is common - in kernels provided by Linux distributions.) The bridge module - must not be loaded or in use. If the bridge module is running - (check with "lsmod | grep bridge"), you must remove it ("rmmod - bridge") before starting the datapath. - - For optional support of ingress policing, you must enable kernel - configuration options NET_CLS_BASIC, NET_SCH_INGRESS, and - NET_ACT_POLICE, either built-in or as modules. (NET_CLS_POLICE is - obsolete and not needed.) - - To use GRE tunneling on Linux 2.6.37 or newer, kernel support - for GRE demultiplexing (CONFIG_NET_IPGRE_DEMUX) must be compiled - in or available as a module. Also, on kernels before 3.11, the - ip_gre module, for GRE tunnels over IP (NET_IPGRE), must not be - loaded or compiled in. - - To configure HTB or HFSC quality of service with Open vSwitch, - you must enable the respective configuration options. - - To use Open vSwitch support for TAP devices, you must enable - CONFIG_TUN. - - - To build a kernel module, you need the same version of GCC that - was used to build that kernel. - - - A kernel build directory corresponding to the Linux kernel image - the module is to run on. Under Debian and Ubuntu, for example, - each linux-image package containing a kernel binary has a - corresponding linux-headers package with the required build - infrastructure. 
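The kernel-module prerequisites above can be spot-checked from a shell before building. This is a minimal sketch, assuming the kernel build directory lives at /lib/modules/RELEASE/build (the usual layout on Debian and Ubuntu, but an assumption for other distributions); `kernel_build_dir` is a hypothetical helper name, not an Open vSwitch command.

```shell
#!/bin/sh
# kernel_build_dir: hypothetical helper that prints the conventional
# build-directory path for a given kernel release string.
kernel_build_dir() {
    echo "/lib/modules/$1/build"
}

# Check for a build directory matching the running kernel.
dir=$(kernel_build_dir "$(uname -r)")
if [ -d "$dir" ]; then
    echo "kernel build directory found: $dir"
else
    echo "no kernel build directory at $dir; install the matching headers package"
fi

# The bridge module must not be loaded when the datapath starts.
# If lsmod is unavailable, this check is silently skipped.
if lsmod 2>/dev/null | grep -q '^bridge '; then
    echo "bridge module is loaded; remove it with: rmmod bridge"
fi
```

The script only reports what it finds; it does not unload modules or install packages, so it should be safe to run without supervisor privilege.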
- -If you are working from a Git tree or snapshot (instead of from a -distribution tarball), or if you modify the Open vSwitch build system -or the database schema, you will also need the following software: - - - Autoconf version 2.63 or later. - - - Automake version 1.10 or later. - - - libtool version 2.4 or later. (Older versions might work too.) - -To run the unit tests, you also need: - - - Perl. Version 5.10.1 is known to work. Earlier versions should - also work. - -The ovs-vswitchd.conf.db(5) manpage will include an E-R diagram, in -formats other than plain text, only if you have the following: - - - "dot" from graphviz (http://www.graphviz.org/). - - - Perl. Version 5.10.1 is known to work. Earlier versions should - also work. - - - Python 2.x, for x >= 4. - -If you are going to extensively modify Open vSwitch, please consider -installing the following to obtain better warnings: - - - "sparse" version 0.4.4 or later - (http://www.kernel.org/pub/software/devel/sparse/dist/). - - - GNU make. - - - clang, version 3.4 or later - -Also, you may find the ovs-dev script found in utilities/ovs-dev.py useful. - -Installation Requirements -------------------------- - -The machine on which Open vSwitch is to be installed must have the -following software: - - - libc compatible with the libc used for build. - - - libssl compatible with the libssl used for build, if OpenSSL was - used for the build. - - - On Linux, the same kernel version configured as part of the build. - - - For optional support of ingress policing on Linux, the "tc" program - from iproute2 (part of all major distributions and available at - http://www.linux-foundation.org/en/Net:Iproute2). - -On Linux you should ensure that /dev/urandom exists. To support TAP -devices, you must also ensure that /dev/net/tun exists. 
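The two device-node requirements above are easy to verify on the target machine. The following is a minimal sketch; `check_nodes` is a hypothetical helper name introduced here for illustration and is not part of Open vSwitch.

```shell
#!/bin/sh
# check_nodes: hypothetical helper that reports, for each path given,
# whether it exists. Returns nonzero if any path is missing.
check_nodes() {
    missing=0
    for node in "$@"; do
        if [ -e "$node" ]; then
            echo "ok: $node"
        else
            echo "missing: $node"
            missing=1
        fi
    done
    return "$missing"
}

# /dev/urandom should exist on Linux; /dev/net/tun is needed only for
# TAP device support.
check_nodes /dev/urandom /dev/net/tun || echo "some device nodes are missing"
```

A missing /dev/net/tun matters only if you intend to use TAP devices, so treat that line of output as informational otherwise.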
- -Building and Installing Open vSwitch for Linux, FreeBSD or NetBSD -================================================================= - -Once you have installed all the prerequisites listed above in the Base -Prerequisites section, follow the procedure below to build. - -1. If you pulled the sources directly from an Open vSwitch Git tree, - run boot.sh in the top source directory: - - % ./boot.sh - -2. Configure the package by running the configure script. You can - usually invoke configure without any arguments. For example: - - % ./configure - - By default all files are installed under /usr/local. If you want - to install into, e.g., /usr and /var instead of /usr/local and - /usr/local/var, add options as shown here: - - % ./configure --prefix=/usr --localstatedir=/var - - To use a specific C compiler for compiling Open vSwitch user - programs, also specify it on the configure command line, like so: - - % ./configure CC=gcc-4.2 - - To use 'clang' compiler: - - % ./configure CC=clang - - To build the Linux kernel module, so that you can run the - kernel-based switch, pass the location of the kernel build - directory on --with-linux. For example, to build for a running - instance of Linux: - - % ./configure --with-linux=/lib/modules/`uname -r`/build - - If --with-linux requests building for an unsupported version of - Linux, then "configure" will fail with an error message. Please - refer to the FAQ for advice in that case. - - If you wish to build the kernel module for an architecture other - than the architecture of the machine used for the build, you may - specify the kernel architecture string using the KARCH variable - when invoking the configure script. For example, to build for MIPS - with Linux: - - % ./configure --with-linux=/path/to/linux KARCH=mips - - If you plan to do much Open vSwitch development, you might want to - add --enable-Werror, which adds the -Werror option to the compiler - command line, turning warnings into errors. 
That makes it - impossible to miss warnings generated by the build. - - To build with gcov code coverage support, add --enable-coverage, - e.g.: - - % ./configure --enable-coverage - - The configure script accepts a number of other options and honors - additional environment variables. For a full list, invoke - configure with the --help option. - - You can also run configure from a separate build directory. This - is helpful if you want to build Open vSwitch in more than one way - from a single source directory, e.g. to try out both GCC and Clang - builds, or to build kernel modules for more than one Linux version. - Here is an example: - - % mkdir _gcc && (cd _gcc && ../configure CC=gcc) - % mkdir _clang && (cd _clang && ../configure CC=clang) - -3. Run GNU make in the build directory, e.g.: - - % make - - or if GNU make is installed as "gmake": - - % gmake - - If you used a separate build directory, run make or gmake from that - directory, e.g.: - - % make -C _gcc - % make -C _clang - - For improved warnings if you installed "sparse" (see - "Prerequisites"), add C=1 to the command line. - -4. Consider running the testsuite. Refer to "Running the Testsuite" - below, for instructions. - -5. Become root by running "su" or another program. - -6. Run "make install" to install the executables and manpages into the - running system, by default under /usr/local. - -7. If you built kernel modules, you may install and load them, e.g.: - - % make modules_install - % /sbin/modprobe openvswitch - - To verify that the modules have been loaded, run "/sbin/lsmod" and - check that openvswitch is listed. - - If the "modprobe" operation fails, look at the last few kernel log - messages (e.g. with "dmesg | tail"): - - - The message "openvswitch: exports duplicate symbol - br_should_route_hook (owned by bridge)" means that the bridge - module is loaded. Run "/sbin/rmmod bridge" to remove it. 
- - If "/sbin/rmmod bridge" fails with "ERROR: Module bridge does - not exist in /proc/modules", then the bridge is compiled into - the kernel, rather than as a module. Open vSwitch does not - support this configuration (see "Build Requirements", above). - - - The message "openvswitch: exports duplicate symbol - dp_ioctl_hook (owned by ofdatapath)" means that the ofdatapath - module from the OpenFlow reference implementation is loaded. - Run "/sbin/rmmod ofdatapath" to remove it. (You might have to - delete any existing datapaths beforehand, using the "dpctl" - program included with the OpenFlow reference implementation. - "ovs-dpctl" will not work.) - - - Otherwise, the most likely problem is that Open vSwitch was - built for a kernel different from the one into which you are - trying to load it. Run "modinfo" on openvswitch.ko and on - a module built for the running kernel, e.g.: - - % /sbin/modinfo openvswitch.ko - % /sbin/modinfo /lib/modules/`uname -r`/kernel/net/bridge/bridge.ko - - Compare the "vermagic" lines output by the two commands. If - they differ, then Open vSwitch was built for the wrong kernel. - - - If you decide to report a bug or ask a question related to - module loading, please include the output from the "dmesg" and - "modinfo" commands mentioned above. - - There is an optional module parameter to openvswitch.ko called - vlan_tso that enables TCP segmentation offload over VLANs on NICs - that support it. Many drivers do not expose support for TSO on VLANs - in a way that Open vSwitch can use but there is no way to detect - whether this is the case. If you know that your particular driver can - handle it (for example by testing sending large TCP packets over VLANs) - then passing in a value of 1 may improve performance. Modules built for - Linux kernels 2.6.37 and later, as well as specially patched versions - of earlier kernels, do not need this and do not have this parameter. 
If - you do not understand what this means or do not know if your driver - will work, do not set this. - -8. Initialize the configuration database using ovsdb-tool, e.g.: - - % mkdir -p /usr/local/etc/openvswitch - % ovsdb-tool create /usr/local/etc/openvswitch/conf.db vswitchd/vswitch.ovsschema - -Startup -======= - -Before starting ovs-vswitchd itself, you need to start its -configuration database, ovsdb-server. Each machine on which Open -vSwitch is installed should run its own copy of ovsdb-server. -Configure it to use the database you created during installation (as -explained above), to listen on a Unix domain socket, to connect to any -managers specified in the database itself, and to use the SSL -configuration in the database: - - % ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \ - --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ - --private-key=db:Open_vSwitch,SSL,private_key \ - --certificate=db:Open_vSwitch,SSL,certificate \ - --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert \ - --pidfile --detach - -(If you built Open vSwitch without SSL support, then omit ---private-key, --certificate, and --bootstrap-ca-cert.) - -Then initialize the database using ovs-vsctl. This is only -necessary the first time after you create the database with -ovsdb-tool (but running it at any time is harmless): - - % ovs-vsctl --no-wait init - -Then start the main Open vSwitch daemon, telling it to connect to the -same Unix domain socket: - - % ovs-vswitchd --pidfile --detach - -Now you may use ovs-vsctl to set up bridges and other Open vSwitch -features. For example, to create a bridge named br0 and add ports -eth0 and vif1.0 to it: - - % ovs-vsctl add-br br0 - % ovs-vsctl add-port br0 eth0 - % ovs-vsctl add-port br0 vif1.0 - -Please refer to ovs-vsctl(8) for more details. - -Upgrading -========= - -When you upgrade Open vSwitch from one version to another, you should -also upgrade the database schema: - -1. 
Stop the Open vSwitch daemons, e.g.: - - % kill `cd /usr/local/var/run/openvswitch && cat ovsdb-server.pid ovs-vswitchd.pid` - -2. Install the new Open vSwitch release. - -3. Upgrade the database, in one of the following two ways: - - - If there is no important data in your database, then you may - delete the database file and recreate it with ovsdb-tool, - following the instructions under "Building and Installing Open - vSwitch for Linux, FreeBSD or NetBSD". - - - If you want to preserve the contents of your database, back it - up first, then use "ovsdb-tool convert" to upgrade it, e.g.: - - % ovsdb-tool convert /usr/local/etc/openvswitch/conf.db vswitchd/vswitch.ovsschema - -4. Start the Open vSwitch daemons as described under "Building and - Installing Open vSwitch for Linux, FreeBSD or NetBSD" above. - -Hot Upgrading -============= -Upgrading Open vSwitch from one version to the next version with minimum -disruption of traffic going through the system that is using that Open vSwitch -needs some considerations: - -1. If the upgrade only involves upgrading the userspace utilities and daemons -of Open vSwitch, make sure that the new userspace version is compatible with -the previously loaded kernel module. - -2. An upgrade of userspace daemons means that they have to be restarted. -Restarting the daemons means that the OpenFlow flows in the ovs-vswitchd daemon -will be lost. One way to restore the flows is to let the controller -re-populate them. Another way is to save the previous flows using a utility -like ovs-ofctl and then re-add them after the restart. Restoring the old flows -is accurate only if the new Open vSwitch interfaces retain the old 'ofport' -values. - -3. When the new userspace daemons get restarted, they automatically flush -the old flows set up in the kernel.
This can be expensive if there are hundreds -of new flows that are entering the kernel but userspace daemons are busy -setting up new userspace flows from either the controller or a utility like -ovs-ofctl. The Open vSwitch database provides an option to solve this problem -through the other_config:flow-restore-wait column of the Open_vSwitch table. -Refer to the ovs-vswitchd.conf.db(5) manpage for details. - -4. If the upgrade also involves upgrading the kernel module, the old kernel -module needs to be unloaded and the new kernel module should be loaded. This -means that the kernel network devices belonging to Open vSwitch are recreated -and the kernel flows are lost. The downtime of the traffic can be reduced -if the userspace daemons are restarted immediately and the userspace flows -are restored as soon as possible. - -The ovs-ctl utility's "restart" function only restarts the userspace daemons, -makes sure that the 'ofport' values remain consistent across restarts, restores -userspace flows using the ovs-ofctl utility and also uses the -other_config:flow-restore-wait column to keep the traffic downtime to a -minimum. The ovs-ctl utility's "force-reload-kmod" function does all of the -above, but also replaces the old kernel module with the new one. Open vSwitch -startup scripts for Debian, XenServer and RHEL use ovs-ctl's functions and it -is recommended that these functions be used for other software platforms too. - -Testsuites -========== - -This section describes Open vSwitch's built-in support for various test -suites. You must configure and build Open vSwitch (steps 1 through 3 -in "Building and Installing Open vSwitch for Linux, FreeBSD or NetBSD" -above) before you run the tests described here. You do not need to -install Open vSwitch or to build or load the kernel module to run -these test suites. You do not need supervisor privilege to run these -test suites. - -Self-Tests ----------- - -Open vSwitch includes a suite of self-tests.
Before you submit patches -upstream, we advise that you run the tests and ensure that they pass. -If you add new features to Open vSwitch, then adding tests for those -features will ensure your features don't break as developers modify -other areas of Open vSwitch. - -Refer to "Testsuites" above for prerequisites. - -To run all the unit tests in Open vSwitch, one at a time: - make check -This takes under 5 minutes on a modern desktop system. - -To run all the unit tests in Open vSwitch, up to 8 in parallel: - make check TESTSUITEFLAGS=-j8 -This takes under a minute on a modern 4-core desktop system. - -To see a list of all the available tests, run: - make check TESTSUITEFLAGS=--list - -To run only a subset of tests, e.g. test 123 and tests 477 through 484: - make check TESTSUITEFLAGS='123 477-484' -(Tests do not have inter-dependencies, so you may run any subset.) - -To run tests matching a keyword, e.g. "ovsdb": - make check TESTSUITEFLAGS='-k ovsdb' - -To see a complete list of test options: - make check TESTSUITEFLAGS=--help - -The results of a testing run are reported in tests/testsuite.log. -Please report test failures as bugs and include the testsuite.log in -your report. - -If you have "valgrind" installed, then you can also run the testsuite -under valgrind by using "make check-valgrind" in place of "make -check". All the same options are available via TESTSUITEFLAGS. When -you do this, the "valgrind" results for test are reported in files -named tests/testsuite.dir//valgrind.*. You may find that the -valgrind results are easier to interpret if you put "-q" in -~/.valgrindrc, since that reduces the amount of output. - -Sometimes a few tests may fail on some runs but not others. This is -usually a bug in the testsuite, not a bug in Open vSwitch itself. If -you find that a test fails intermittently, please report it, since the -developers may not have noticed. - -OFTest ------- - -OFTest is an OpenFlow protocol testing suite. 
Open vSwitch includes a -Makefile target to run OFTest with Open vSwitch in "dummy mode". In -this mode of testing, no packets travel across physical or virtual -networks. Instead, Unix domain sockets stand in as simulated -networks. This simulation is imperfect, but it is much easier to set -up, does not require extra physical or virtual hardware, and does not -require supervisor privileges. - -To run OFTest with Open vSwitch, first read and follow the -instructions under "Testsuites" above. Second, obtain a copy of -OFTest and install its prerequisites. You need a copy of OFTest that -includes commit 406614846c5 (make ovs-dummy platform work again). -This commit was merged into the OFTest repository on Feb 1, 2013, so -any copy of OFTest more recent than that should work. Testing OVS in -dummy mode does not require root privilege, so you may ignore that -requirement. - -Optionally, add the top-level OFTest directory (containing the "oft" -program) to your $PATH. This slightly simplifies running OFTest later. - -To run OFTest in dummy mode, run the following command from your Open -vSwitch build directory: - make check-oftest OFT= -where is the absolute path to the "oft" program in -OFTest. - -If you added "oft" to your $PATH, you may omit the OFT variable -assignment: - make check-oftest -By default, "check-oftest" passes "oft" just enough options to enable -dummy mode. You can use OFTFLAGS to pass additional options. For -example, to run just the basic.Echo test instead of all tests (the -default) and enable verbose logging: - make check-oftest OFT= OFTFLAGS='--verbose -T basic.Echo' - -If you use OFTest that does not include commit 4d1f3eb2c792 (oft: -change default port to 6653), merged into the OFTest repository in -October 2013, then you need to add an option to use the IETF-assigned -controller port: - make check-oftest OFT= OFTFLAGS='--port=6653' - -Please interpret OFTest results cautiously. 
Open vSwitch can fail a -given test in OFTest for many reasons, including bugs in Open vSwitch, -bugs in OFTest, bugs in the "dummy mode" integration, and differing -interpretations of the OpenFlow standard and other standards. - -Open vSwitch has not been validated against OFTest. Please do report -test failures that you believe to represent bugs in Open vSwitch. -Include the precise versions of Open vSwitch and OFTest in your bug -report, plus any other information needed to reproduce the problem. - -Ryu ---- - -Ryu is an OpenFlow controller written in Python that includes an -extensive OpenFlow testsuite. Open vSwitch includes a Makefile target -to run Ryu in "dummy mode". See "OFTest" above for an explanation of -dummy mode. - -To run Ryu tests with Open vSwitch, first read and follow the -instructions under "Testsuites" above. Second, obtain a copy of Ryu, -install its prerequisites, and build it. You do not need to install -Ryu (some of the tests do not get installed, so it does not help). - -To run Ryu tests, run the following command from your Open vSwitch -build directory: - make check-ryu RYUDIR= -where is the absolute path to the root of the Ryu -source distribution. The default is $srcdir/../ryu -where $srcdir is your Open vSwitch source directory, so if this -default is correct then you make simply run "make check-ryu". - -Open vSwitch has not been validated against Ryu. Please do report -test failures that you believe to represent bugs in Open vSwitch. -Include the precise versions of Open vSwitch and Ryu in your bug -report, plus any other information needed to reproduce the problem. - -Continuous Integration with Travis-CI -------------------------------------- - -A .travis.yml file is provided to automatically build Open vSwitch with -various build configurations and run the testsuite using travis-ci. 
-Builds will be performed with gcc, sparse and clang with the -Werror -compiler flag included, therefore the build will fail if a new warning -has been introduced. - -The CI build is triggered via git push (regardless of the specific -branch) or pull request against any Open vSwitch GitHub repository that -is linked to travis-ci. - -Instructions to setup travis-ci for your GitHub repository: - -1. Go to http://travis-ci.org/ and sign in using your GitHub ID. -2. Go to the "Repositories" tab and enable the ovs repository. You - may disable builds for pushes or pull requests. -3. In order to avoid forks sending build failures to the upstream - mailing list, the notification email recipient is encrypted. If you - want to receive email notification for build failures, replace the - the encrypted string: - 3.1) Install the travis-ci CLI (Requires ruby >=2.0): - gem install travis - 3.2) In your Open vSwitch repository: - travis encrypt mylist@mydomain.org - 3.3) Add/replace the notifications section in .travis.yml and fill - in the secure string as returned by travis encrypt: - - notifications: - email: - recipients: - - secure: "....." - - (You may remove/omit the notifications section to fall back to - default notification behaviour which is to send an email directly - to the author and committer of the failing commit. Note that the - email is only sent if the author/committer have commit rights for - the particular GitHub repository). - -4. Pushing a commit to the repository which breaks the build or the - testsuite will now trigger a email sent to mylist@mydomain.org - -Bug Reporting -============= - -Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.DPDK b/INSTALL.DPDK deleted file mode 100644 index d3b03cf0d..000000000 --- a/INSTALL.DPDK +++ /dev/null @@ -1,252 +0,0 @@ - Using Open vSwitch with DPDK - ============================ - -Open vSwitch can use Intel(R) DPDK lib to operate entirely in -userspace. 
This file explains how to install and use Open vSwitch in -such a mode. - -The DPDK support of Open vSwitch is considered experimental. -It has not been thoroughly tested. - -This version of Open vSwitch should be built manually with "configure" -and "make". - -OVS needs a system with 1GB hugepages support. - -Building and Installing: ------------------------- - -Required DPDK 1.7 - -DPDK: -Set dir i.g.: export DPDK_DIR=/usr/src/dpdk-1.7.1 -cd $DPDK_DIR -update config/common_linuxapp so that dpdk generate single lib file. -(modification also required for IVSHMEM build) -CONFIG_RTE_BUILD_COMBINE_LIBS=y - -For default install without IVSHMEM: -make install T=x86_64-native-linuxapp-gcc -To include IVSHMEM (shared memory): -make install T=x86_64-ivshmem-linuxapp-gcc -For details refer to http://dpdk.org/ - -Linux kernel: -Refer to intel-dpdk-getting-started-guide.pdf for understanding -DPDK kernel requirement. - -OVS: -Non IVSHMEM: -export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/ -IVSHMEM: -export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/ - -cd $(OVS_DIR)/openvswitch -./boot.sh -./configure --with-dpdk=$DPDK_BUILD -make - -To have better performance one can enable aggressive compiler optimizations and -use the special instructions(popcnt, crc32) that may not be available on all -machines. Instead of typing 'make', type: - -make CFLAGS='-O3 -march=native' - -Refer to INSTALL.userspace for general requirements of building -userspace OVS. - -Using the DPDK with ovs-vswitchd: ---------------------------------- - -Setup system boot: - kernel bootline, add: default_hugepagesz=1GB hugepagesz=1G hugepages=1 - -First setup DPDK devices: - - insert uio.ko - e.g. modprobe uio - - insert igb_uio.ko - e.g. insmod $DPDK_BUILD/kmod/igb_uio.ko - - Bind network device to igb_uio. - e.g. $DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1 - -Prepare system: - - mount hugetlbfs - e.g. 
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages - -Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup. - -Start ovsdb-server as discussed in INSTALL doc: - Summary e.g.: - First time only db creation (or clearing): - mkdir -p /usr/local/etc/openvswitch - mkdir -p /usr/local/var/run/openvswitch - rm /usr/local/etc/openvswitch/conf.db - cd $OVS_DIR - ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db \ - ./vswitchd/vswitch.ovsschema - start ovsdb-server - cd $OVS_DIR - ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \ - --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ - --private-key=db:Open_vSwitch,SSL,private_key \ - --certificate=Open_vSwitch,SSL,certificate \ - --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach - First time after db creation, initialize: - cd $OVS_DIR - ./utilities/ovs-vsctl --no-wait init - -Start vswitchd: -DPDK configuration arguments can be passed to vswitchd via `--dpdk` -argument. This needs to be first argument passed to vswitchd process. -dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter -for dpdk initialization. - - e.g. - export DB_SOCK=/usr/local/var/run/openvswitch/db.sock - ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach - -If allocated more than one GB hugepage (as for IVSHMEM), set amount and use NUMA -node 0 memory: - - ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \ - -- unix:$DB_SOCK --pidfile --detach - -To use ovs-vswitchd with DPDK, create a bridge with datapath_type -"netdev" in the configuration database. For example: - - ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev - -Now you can add dpdk devices. OVS expect DPDK device name start with dpdk -and end with portid. vswitchd should print (in the log file) the number of dpdk -devices found. 
- - ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk - ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk - -Once first DPDK port is added to vswitchd, it creates a Polling thread and -polls dpdk device in continuous loop. Therefore CPU utilization -for that thread is always 100%. - -Test flow script across NICs (assuming ovs in /usr/src/ovs): - Execute script: - -############################# Script: - -#! /bin/sh -# Move to command directory -cd /usr/src/ovs/utilities/ - -# Clear current flows -./ovs-ofctl del-flows br0 - -# Add flows between port 1 (dpdk0) to port 2 (dpdk1) -./ovs-ofctl add-flow br0 in_port=1,action=output:2 -./ovs-ofctl add-flow br0 in_port=2,action=output:1 - -###################################### - -With pmd multi-threading support, OVS creates one pmd thread for each -numa node as default. The pmd thread handles the I/O of all DPDK -interfaces on the same numa node. The following two commands can be used -to configure the multi-threading behavior. - - ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask= - -The command above asks for a CPU mask for setting the affinity of pmd threads. -A set bit in the mask means a pmd thread is created and pinned to the -corresponding CPU core. For more information, please refer to -`man ovs-vswitchd.conf.db` - - ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs= - -The command above sets the number of rx queues of each DPDK interface. The -rx queues are assigned to pmd threads on the same numa node in round-robin -fashion. For more information, please refer to `man ovs-vswitchd.conf.db` - -Ideally for maximum throughput, the pmd thread should not be scheduled out -which temporarily halts its execution. The following affinitization methods -can help. - -Lets pick core 4,6,8,10 for pmd threads to run on. Also assume a dual 8 core -sandy bridge system with hyperthreading enabled where CPU1 has cores 0,...,7 -and 16,...,23 & CPU2 cores 8,...,15 & 24,...,31. 
(A different cpu -configuration could have different core mask requirements). - -To kernel bootline add core isolation list for cores and associated hype cores -(e.g. isolcpus=4,20,6,22,8,24,10,26,). Reboot system for isolation to take -effect, restart everything. - -Configure pmd threads on core 4,6,8,10 using 'pmd-cpu-mask': - - ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=00000550 - -You should be able to check that pmd threads are pinned to the correct cores -via: - - top -p `pidof ovs-vswitchd` -H -d1 - -Note, the pmd threads on a numa node are only created if there is at least -one DPDK interface from the numa node that has been added to OVS. - -Note, core 0 is always reserved from non-pmd threads and should never be set -in the cpu mask. - -DPDK Rings : ------------- - -Following the steps above to create a bridge, you can now add dpdk rings -as a port to the vswitch. OVS will expect the DPDK ring device name to -start with dpdkr and end with a portid. - - ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr - -DPDK rings client test application - -Included in the test directory is a sample DPDK application for testing -the rings. This is from the base dpdk directory and modified to work -with the ring naming used within ovs. - -location tests/ovs_client - -To run the client : - cd /usr/src/ovs/tests/ - ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr" - -In the case of the dpdkr example above the "port id you gave dpdkr" is 0. - -It is essential to have --proc-type=secondary - -The application simply receives an mbuf on the receive queue of the -ethernet ring and then places that same mbuf on the transmit ring of -the ethernet ring. It is a trivial loopback application. - -DPDK rings in VM (IVSHMEM shared memory communications) -------------------------------------------------------- - -In addition to executing the client in the host, you can execute it within -a guest VM. To do so you will need a patched qemu. 
You can download the -patch and getting started guide at : - -https://01.org/packet-processing/downloads - -A general rule of thumb for better performance is that the client -application should not be assigned the same dpdk core mask "-c" as -the vswitchd. - -Restrictions: -------------- - - - This Support is for Physical NIC. I have tested with Intel NIC only. - - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue. - - Currently DPDK port does not make use any offload functionality. - - ivshmem: - - The shared memory is currently restricted to the use of a 1GB - huge pages. - - All huge pages are shared amongst the host, clients, virtual - machines etc. - -Bug Reporting: --------------- - -Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md new file mode 100644 index 000000000..d612b8fe3 --- /dev/null +++ b/INSTALL.DPDK.md @@ -0,0 +1,284 @@ +Using Open vSwitch with DPDK +============================ + +Open vSwitch can use Intel(R) DPDK lib to operate entirely in +userspace. This file explains how to install and use Open vSwitch in +such a mode. + +The DPDK support of Open vSwitch is considered experimental. +It has not been thoroughly tested. + +This version of Open vSwitch should be built manually with `configure` +and `make`. + +OVS needs a system with 1GB hugepages support. + +Building and Installing: +------------------------ + +Required DPDK 1.7 + +1. Configure build & install DPDK: + 1. Set `$DPDK_DIR` + + ``` + export DPDK_DIR=/usr/src/dpdk-1.7.1 + cd $DPDK_DIR + ``` + + 2. Update `config/common_linuxapp` so that DPDK generate single lib file. + (modification also required for IVSHMEM build) + + `CONFIG_RTE_BUILD_COMBINE_LIBS=y` + + Then run `make install` to build and isntall the library. 
+ For default install without IVSHMEM: + + `make install T=x86_64-native-linuxapp-gcc` + + To include IVSHMEM (shared memory): + + `make install T=x86_64-ivshmem-linuxapp-gcc` + + For further details refer to http://dpdk.org/ + +2. Configure & build the Linux kernel: + + Refer to intel-dpdk-getting-started-guide.pdf for understanding + DPDK kernel requirement. + +3. Configure & build OVS: + + * Non IVSHMEM: + + `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/` + + * IVSHMEM: + + `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/` + + ``` + cd $OVS_DIR/openvswitch + ./boot.sh + ./configure --with-dpdk=$DPDK_BUILD + make + ``` + +To have better performance one can enable aggressive compiler optimizations and +use the special instructions (popcnt, crc32) that may not be available on all +machines. Instead of typing `make`, type: + +`make CFLAGS='-O3 -march=native'` + +Refer to [INSTALL.userspace](INSTALL.userspace.md) for general requirements of +building userspace OVS. + +Using the DPDK with ovs-vswitchd: +--------------------------------- + +1. Setup system boot + Add the following options to the kernel bootline: + + `default_hugepagesz=1GB hugepagesz=1G hugepages=1` + +2. Setup DPDK devices: + 1. insert uio.ko: `modprobe uio` + 2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko` + 3. Bind network device to igb_uio: `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1` + +3. Mount the hugetlbfs filesystem + + `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages` + + Refer to http://www.dpdk.org/doc/quick-start for verifying DPDK setup. + +4. Start ovsdb-server as discussed in [INSTALL](INSTALL.md) doc: + 1. First time only db creation (or clearing): + + ``` + mkdir -p /usr/local/etc/openvswitch + mkdir -p /usr/local/var/run/openvswitch + rm /usr/local/etc/openvswitch/conf.db + cd $OVS_DIR + ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db \ + ./vswitchd/vswitch.ovsschema + ``` + + 2.
start ovsdb-server + + ``` + cd $OVS_DIR + ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \ + --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ + --private-key=db:Open_vSwitch,SSL,private_key \ + --certificate=Open_vSwitch,SSL,certificate \ + --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach + ``` + + 3. First time after db creation, initialize: + + ``` + cd $OVS_DIR + ./utilities/ovs-vsctl --no-wait init + ``` + +5. Start vswitchd: + + DPDK configuration arguments can be passed to vswitchd via `--dpdk` + argument. This needs to be first argument passed to vswitchd process. + dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter + for dpdk initialization. + + export DB_SOCK=/usr/local/var/run/openvswitch/db.sock + ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach + + If allocated more than one GB hugepage (as for IVSHMEM), set amount and use NUMA + node 0 memory: + + ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \ + -- unix:$DB_SOCK --pidfile --detach + +6. Add bridge & ports + + To use ovs-vswitchd with DPDK, create a bridge with datapath_type + "netdev" in the configuration database. For example: + + `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev` + + Now you can add dpdk devices. OVS expect DPDK device name start with dpdk + and end with portid. vswitchd should print (in the log file) the number of dpdk + devices found. + + ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk + ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk + + Once first DPDK port is added to vswitchd, it creates a Polling thread and + polls dpdk device in continuous loop. Therefore CPU utilization + for that thread is always 100%. + +7. Add test flows + + Test flow script across NICs (assuming ovs in /usr/src/ovs): + Execute script: + + ``` + #! 
/bin/sh + # Move to command directory + cd /usr/src/ovs/utilities/ + + # Clear current flows + ./ovs-ofctl del-flows br0 + + # Add flows between port 1 (dpdk0) to port 2 (dpdk1) + ./ovs-ofctl add-flow br0 in_port=1,action=output:2 + ./ovs-ofctl add-flow br0 in_port=2,action=output:1 + ``` + +8. Performance tuning + + With pmd multi-threading support, OVS creates one pmd thread for each + numa node as default. The pmd thread handles the I/O of all DPDK + interfaces on the same numa node. The following two commands can be used + to configure the multi-threading behavior. + + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask= + + The command above asks for a CPU mask for setting the affinity of pmd threads. + A set bit in the mask means a pmd thread is created and pinned to the + corresponding CPU core. For more information, please refer to + `man ovs-vswitchd.conf.db` + + ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs= + + The command above sets the number of rx queues of each DPDK interface. The + rx queues are assigned to pmd threads on the same numa node in round-robin + fashion. For more information, please refer to `man ovs-vswitchd.conf.db` + + Ideally for maximum throughput, the pmd thread should not be scheduled out + which temporarily halts its execution. The following affinitization methods + can help. + + Lets pick core 4,6,8,10 for pmd threads to run on. Also assume a dual 8 core + sandy bridge system with hyperthreading enabled where CPU1 has cores 0,...,7 + and 16,...,23 & CPU2 cores 8,...,15 & 24,...,31. (A different cpu + configuration could have different core mask requirements). + + To kernel bootline add core isolation list for cores and associated hype cores + (e.g. isolcpus=4,20,6,22,8,24,10,26,). Reboot system for isolation to take + effect, restart everything. + + Configure pmd threads on core 4,6,8,10 using 'pmd-cpu-mask': + + ovs-vsctl set Open_vSwitch . 
other_config:pmd-cpu-mask=00000550 + + You should be able to check that pmd threads are pinned to the correct cores + via: + + top -p `pidof ovs-vswitchd` -H -d1 + + Note, the pmd threads on a numa node are only created if there is at least + one DPDK interface from the numa node that has been added to OVS. + + Note, core 0 is always reserved from non-pmd threads and should never be set + in the cpu mask. + +DPDK Rings : +------------ + +Following the steps above to create a bridge, you can now add dpdk rings +as a port to the vswitch. OVS will expect the DPDK ring device name to +start with dpdkr and end with a portid. + + ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr + +DPDK rings client test application + +Included in the test directory is a sample DPDK application for testing +the rings. This is from the base dpdk directory and modified to work +with the ring naming used within ovs. + +location tests/ovs_client + +To run the client : + + cd /usr/src/ovs/tests/ + ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr" + +In the case of the dpdkr example above the "port id you gave dpdkr" is 0. + +It is essential to have --proc-type=secondary + +The application simply receives an mbuf on the receive queue of the +ethernet ring and then places that same mbuf on the transmit ring of +the ethernet ring. It is a trivial loopback application. + +DPDK rings in VM (IVSHMEM shared memory communications) +------------------------------------------------------- + +In addition to executing the client in the host, you can execute it within +a guest VM. To do so you will need a patched qemu. You can download the +patch and getting started guide at : + +https://01.org/packet-processing/downloads + +A general rule of thumb for better performance is that the client +application should not be assigned the same dpdk core mask "-c" as +the vswitchd. + +Restrictions: +------------- + + - This Support is for Physical NIC. 
I have tested with Intel NIC only. + - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue. + - Currently DPDK port does not make use any offload functionality. + + ivshmem: + - The shared memory is currently restricted to the use of a 1GB + huge pages. + - All huge pages are shared amongst the host, clients, virtual + machines etc. + +Bug Reporting: +-------------- + +Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.Debian b/INSTALL.Debian deleted file mode 100644 index 34251a1b5..000000000 --- a/INSTALL.Debian +++ /dev/null @@ -1,91 +0,0 @@ - How to Build Debian Packages for Open vSwitch - ============================================= - -This document describes how to build Debian packages for Open vSwitch. -To install Open vSwitch on Debian without building Debian packages, -see INSTALL instead. - -These instructions should also work on Ubuntu and other Debian -derivative distributions. - - -Before You Begin ----------------- - -Before you begin, consider whether you really need to build packages -yourself. Debian "wheezy" and "sid", as well as recent versions of -Ubuntu, contain pre-built Debian packages for Open vSwitch. It is -easier to install these than to build your own. To use packages from -your distribution, skip ahead to "Installing .deb Packages", below. - - -Building Open vSwitch Debian packages -------------------------------------- - -You may build from an Open vSwitch distribution tarball or from an -Open vSwitch Git tree with these instructions. - -You do not need to be the superuser to build the Debian packages. - -1. Install the "build-essential" and "fakeroot" packages, e.g. with - "apt-get install build-essential fakeroot". - -2. Obtain and unpack an Open vSwitch source distribution and "cd" into - its top level directory. - -3. Install the build dependencies listed under "Build-Depends:" near - the top of debian/control. You can install these any way you like, - e.g. with "apt-get install". 
- - Check your work by running "dpkg-checkbuilddeps" in the top level of - your ovs directory. If you've installed all the dependencies - properly, dpkg-checkbuilddeps will exit without printing anything. - If you forgot to install some dependencies, it will tell you which ones. - -4. Run: - - fakeroot debian/rules binary - - This will do a serial build that runs the unit tests. This will take - approximately 8 to 10 minutes. If you prefer, you can run a faster - parallel build, e.g.: - - DEB_BUILD_OPTIONS='parallel=8' fakeroot debian/rules binary - - If you are in a big hurry, you can even skip the unit tests: - - DEB_BUILD_OPTIONS='parallel=8 nocheck' fakeroot debian/rules binary - -5. The generated .deb files will be in the parent directory of the - Open vSwitch source distribution. - - -Installing .deb Packages ------------------------- - -These instructions apply to installing from Debian packages that you -built yourself, as described in the previous section, or from packages -provided by Debian or a Debian derivative distribution such as Ubuntu. -In the former case, use a command such as "dpkg -i" to install the -.deb files that you build, and in the latter case use a program such -as "apt-get" or "aptitude" to download and install the provided -packages. - -You must be superuser to install Debian packages. - -1. Start by installing an Open vSwitch kernel module. See - debian/openvswitch-switch.README.Debian for the available options. - -2. Install the "openvswitch-switch" and "openvswitch-common" packages. - These packages include the core userspace components of the switch. - -Open vSwitch .deb packages not mentioned above are rarely useful. -Please refer to their individual package descriptions to find out -whether any of them are useful to you. - - -Bug Reporting -------------- - -Please report problems to bugs@openvswitch.org. 
- diff --git a/INSTALL.Debian.md b/INSTALL.Debian.md new file mode 100644 index 000000000..59e0c973b --- /dev/null +++ b/INSTALL.Debian.md @@ -0,0 +1,91 @@ +How to Build Debian Packages for Open vSwitch +============================================= + +This document describes how to build Debian packages for Open vSwitch. +To install Open vSwitch on Debian without building Debian packages, +see [INSTALL](INSTALL.md) instead. + +These instructions should also work on Ubuntu and other Debian +derivative distributions. + + +Before You Begin +---------------- + +Before you begin, consider whether you really need to build packages +yourself. Debian "wheezy" and "sid", as well as recent versions of +Ubuntu, contain pre-built Debian packages for Open vSwitch. It is +easier to install these than to build your own. To use packages from +your distribution, skip ahead to "Installing .deb Packages", below. + + +Building Open vSwitch Debian packages +------------------------------------- + +You may build from an Open vSwitch distribution tarball or from an +Open vSwitch Git tree with these instructions. + +You do not need to be the superuser to build the Debian packages. + +1. Install the "build-essential" and "fakeroot" packages, e.g. with + `apt-get install build-essential fakeroot`. + +2. Obtain and unpack an Open vSwitch source distribution and `cd` into + its top level directory. + +3. Install the build dependencies listed under "Build-Depends:" near + the top of debian/control. You can install these any way you like, + e.g. with `apt-get install`. + + Check your work by running `dpkg-checkbuilddeps` in the top level of + your ovs directory. If you've installed all the dependencies + properly, dpkg-checkbuilddeps will exit without printing anything. + If you forgot to install some dependencies, it will tell you which ones. + +4. Run: + + `fakeroot debian/rules binary` + + This will do a serial build that runs the unit tests. This will take + approximately 8 to 10 minutes. 
If you prefer, you can run a faster + parallel build, e.g.: + + `DEB_BUILD_OPTIONS='parallel=8' fakeroot debian/rules binary` + + If you are in a big hurry, you can even skip the unit tests: + + `DEB_BUILD_OPTIONS='parallel=8 nocheck' fakeroot debian/rules binary` + +5. The generated .deb files will be in the parent directory of the + Open vSwitch source distribution. + + +Installing .deb Packages +------------------------ + +These instructions apply to installing from Debian packages that you +built yourself, as described in the previous section, or from packages +provided by Debian or a Debian derivative distribution such as Ubuntu. +In the former case, use a command such as `dpkg -i` to install the +.deb files that you build, and in the latter case use a program such +as `apt-get` or `aptitude` to download and install the provided +packages. + +You must be superuser to install Debian packages. + +1. Start by installing an Open vSwitch kernel module. See + debian/openvswitch-switch.README.Debian for the available options. + +2. Install the "openvswitch-switch" and "openvswitch-common" packages. + These packages include the core userspace components of the switch. + +Open vSwitch .deb packages not mentioned above are rarely useful. +Please refer to their individual package descriptions to find out +whether any of them are useful to you. + + +Bug Reporting +------------- + +Please report problems to bugs@openvswitch.org. + diff --git a/INSTALL.Docker b/INSTALL.Docker deleted file mode 100644 index 1c492f3fe..000000000 --- a/INSTALL.Docker +++ /dev/null @@ -1,82 +0,0 @@ - How to Use Open vSwitch with Docker - ==================================== - -This document describes how to use Open vSwitch with Docker 1.2.0 or -later. This document assumes that you followed INSTALL or installed -Open vSwitch from distribution packaging such as a .deb or .rpm. -Consult www.docker.com for instructions on how to install Docker. 
- -Limitations ------------ -Currently there is no native integration of Open vSwitch in Docker, i.e., -one cannot use the Docker client to automatically add a container's -network interface to an Open vSwitch bridge during the creation of the -container. This document describes addition of new network interfaces to an -already created container and in turn attaching that interface as a port to an -Open vSwitch bridge. - -Setup ------ -* Create your container, e.g.: - -% docker run -d ubuntu:14.04 /bin/sh -c \ -"while true; do echo hello world; sleep 1; done" - -The above command creates a container with one network interface 'eth0' -and attaches it to a Linux bridge called 'docker0'. 'eth0' by default -gets an IP address in the 172.17.0.0/16 space. Docker sets up iptables -NAT rules to let this interface talk to the outside world. Also since -it is connected to 'docker0' bridge, it can talk to all other containers -connected to the same bridge. If you prefer that no network interface be -created by default, you can start your container with -the option '--net=none', e,g.: - -% docker run -d --net=none ubuntu:14.04 /bin/sh -c \ -"while true; do echo hello world; sleep 1; done" - -The above commands will return a container id. You will need to pass this -value to the utility 'ovs-docker' to create network interfaces attached to an -Open vSwitch bridge as a port. This document will reference this value -as $CONTAINER_ID in the next steps. - -* Add a new network interface to the container and attach it to an Open vSwitch - bridge. e.g.: - -% ovs-docker add-port br-int eth1 $CONTAINER_ID - -The above command will create a network interface 'eth1' inside the container -and then attaches it to the Open vSwitch bridge 'br-int'. This is done by -creating a veth pair. One end of the interface becomes 'eth1' inside the -container and the other end attaches to 'br-int'. - -The script also lets one to add an IP address to the interface. 
e.g.: - -% ovs-docker add-port br-int eth1 $CONTAINER_ID 192.168.1.1/24 - -* A previously added network interface can be deleted. e.g.: - -% ovs-docker del-port br-int eth1 $CONTAINER_ID - -All the previously added Open vSwitch interfaces inside a container can be -deleted. e.g.: - -% ovs-docker del-ports br-int $CONTAINER_ID - -It is important that the same $CONTAINER_ID be passed to both add-port -and del-port[s] commands. - -* More network control. - -Once a container interface is added to an Open vSwitch bridge, one can -set VLANs, create Tunnels, add OpenFlow rules etc for more network control. -Please read the man pages of ovs-vsctl, ovs-ofctl, ovs-vswitchd, -ovsdb-server ovs-vswitchd.conf.db etc for more details. - -Docker networking is quite flexible and can be used in multiple ways. For more -information, please read: -https://docs.docker.com/articles/networking - -Bug Reporting -------------- - -Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.Docker.md b/INSTALL.Docker.md new file mode 100644 index 000000000..29560363f --- /dev/null +++ b/INSTALL.Docker.md @@ -0,0 +1,87 @@ +How to Use Open vSwitch with Docker +==================================== + +This document describes how to use Open vSwitch with Docker 1.2.0 or +later. This document assumes that you followed [INSTALL](INSTALL.md) +or installed Open vSwitch from distribution packaging such as a .deb +or .rpm. Consult www.docker.com for instructions on how to install +Docker. + +Limitations +----------- +Currently there is no native integration of Open vSwitch in Docker, i.e., +one cannot use the Docker client to automatically add a container's +network interface to an Open vSwitch bridge during the creation of the +container. This document describes addition of new network interfaces to an +already created container and in turn attaching that interface as a port to an +Open vSwitch bridge. 
+ +Setup +----- +* Create your container, e.g.: + +``` +% docker run -d ubuntu:14.04 /bin/sh -c \ +"while true; do echo hello world; sleep 1; done" +``` + +The above command creates a container with one network interface 'eth0' +and attaches it to a Linux bridge called 'docker0'. 'eth0' by default +gets an IP address in the 172.17.0.0/16 space. Docker sets up iptables +NAT rules to let this interface talk to the outside world. Also since +it is connected to 'docker0' bridge, it can talk to all other containers +connected to the same bridge. If you prefer that no network interface be +created by default, you can start your container with +the option '--net=none', e,g.: + +``` +% docker run -d --net=none ubuntu:14.04 /bin/sh -c \ +"while true; do echo hello world; sleep 1; done" +``` + +The above commands will return a container id. You will need to pass this +value to the utility 'ovs-docker' to create network interfaces attached to an +Open vSwitch bridge as a port. This document will reference this value +as $CONTAINER_ID in the next steps. + +* Add a new network interface to the container and attach it to an Open vSwitch + bridge. e.g.: + +`% ovs-docker add-port br-int eth1 $CONTAINER_ID` + +The above command will create a network interface 'eth1' inside the container +and then attaches it to the Open vSwitch bridge 'br-int'. This is done by +creating a veth pair. One end of the interface becomes 'eth1' inside the +container and the other end attaches to 'br-int'. + +The script also lets one to add an IP address to the interface. e.g.: + +`% ovs-docker add-port br-int eth1 $CONTAINER_ID 192.168.1.1/24` + +* A previously added network interface can be deleted. e.g.: + +`% ovs-docker del-port br-int eth1 $CONTAINER_ID` + +All the previously added Open vSwitch interfaces inside a container can be +deleted. e.g.: + +`% ovs-docker del-ports br-int $CONTAINER_ID` + +It is important that the same $CONTAINER_ID be passed to both add-port +and del-port[s] commands. 
+ +* More network control. + +Once a container interface is added to an Open vSwitch bridge, one can +set VLANs, create Tunnels, add OpenFlow rules etc for more network control. +Please read the man pages of ovs-vsctl, ovs-ofctl, ovs-vswitchd, +ovsdb-server ovs-vswitchd.conf.db etc for more details. + +Docker networking is quite flexible and can be used in multiple ways. For more +information, please read: +https://docs.docker.com/articles/networking + +Bug Reporting +------------- + +Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.Fedora b/INSTALL.Fedora deleted file mode 100644 index aa76c06ab..000000000 --- a/INSTALL.Fedora +++ /dev/null @@ -1,83 +0,0 @@ - How to Install Open vSwitch on Fedora Linux - =========================================== - -This document describes how to build and install Open vSwitch on a Fedora -Linux host. If you want to install Open vSwitch on a generic Linux host, -see INSTALL.Linux instead. - -We have tested these instructions with Fedora 16 and Fedora 17. - -Building Open vSwitch for Fedora --------------------------------- - -You may build from an Open vSwitch distribution tarball or from an -Open vSwitch Git tree. - -The default RPM build directory (_topdir) has five directories in -the top-level: -1. BUILD/ Where the software is unpacked and built. -2. RPMS/ Where the newly created binary package files are written. -3. SOURCES/ Contains the original sources, patches, and icon files. -4. SPECS/ Contains the spec files for each package to be built. -5. SRPMS/ Where the newly created source package files are written. - -Before you begin, note the RPM sources directory on your version of -Fedora. The command "rpmbuild --showrc" will show the configuration -for each of those directories. Alternatively, the command "rpm --eval - '%{_topdir}'" shows the current configuration for the top level -directory and the command "rpm --eval '%{_sourcedir}'" does the same -for the sources directory. 
On Fedora 17, the default RPM _topdir is -$HOME/rpmbuild and the default RPM sources directory is -$HOME/rpmbuild/SOURCES. - -1. If you are building from a distribution tarball, skip to step 2. - Otherwise, you must be building from an Open vSwitch Git tree. - Create a distribution tarball from the root of the Git tree by - running: - - ./boot.sh - ./configure - make dist - -2. Now you have a distribution tarball, named something like - openvswitch-x.y.z.tar.gz. Copy this file into the RPM sources - directory, e.g.: - - cp openvswitch-x.y.z.tar.gz $HOME/rpmbuild/SOURCES - -3. Make another copy of the distribution tarball in a temporary - directory. Then unpack the tarball and "cd" into its root, e.g.: - - tar xzf openvswitch-x.y.z.tar.gz - cd openvswitch-x.y.z - -4. To build Open vSwitch userspace, run: - - rpmbuild -bb rhel/openvswitch-fedora.spec - - This produces one RPM: "openvswitch". - - The above command automatically runs the Open vSwitch unit tests. - To disable the unit tests, run: - - rpmbuild -bb --without check rhel/openvswitch-fedora.spec - -5. On Fedora 17, to build the Open vSwitch kernel module, run: - - rpmbuild -bb rhel/openvswitch-kmod-fedora.spec - - You might have to specify a kernel version and/or variants, e.g.: - - rpmbuild -bb \ - -D "kversion 2.6.32-131.6.1.el6.x86_64" \ - -D "kflavors default debug kdump" \ - rhel/openvswitch-kmod-rhel6.spec - - This produces an "kmod-openvswitch" RPM for each kernel variant, - in this example: "kmod-openvswitch", "kmod-openvswitch-debug", and - "kmod-openvswitch-kdump". - -Reporting Bugs --------------- - -Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.Fedora.md b/INSTALL.Fedora.md new file mode 100644 index 000000000..bfde181cd --- /dev/null +++ b/INSTALL.Fedora.md @@ -0,0 +1,89 @@ +How to Install Open vSwitch on Fedora Linux +=========================================== + +This document describes how to build and install Open vSwitch on a Fedora +Linux host. 
If you want to install Open vSwitch on a generic Linux host, +see [INSTALL.Linux](INSTALL.Linux.md) instead. + +We have tested these instructions with Fedora 16 and Fedora 17. + +Building Open vSwitch for Fedora +-------------------------------- + +You may build from an Open vSwitch distribution tarball or from an +Open vSwitch Git tree. + +The default RPM build directory (_topdir) has five directories in +the top-level: +1. BUILD/ Where the software is unpacked and built. +2. RPMS/ Where the newly created binary package files are written. +3. SOURCES/ Contains the original sources, patches, and icon files. +4. SPECS/ Contains the spec files for each package to be built. +5. SRPMS/ Where the newly created source package files are written. + +Before you begin, note the RPM sources directory on your version of +Fedora. The command "rpmbuild --showrc" will show the configuration +for each of those directories. Alternatively, the command "rpm --eval + '%{_topdir}'" shows the current configuration for the top level +directory and the command "rpm --eval '%{_sourcedir}'" does the same +for the sources directory. On Fedora 17, the default RPM _topdir is +$HOME/rpmbuild and the default RPM sources directory is +$HOME/rpmbuild/SOURCES. + +1. If you are building from a distribution tarball, skip to step 2. + Otherwise, you must be building from an Open vSwitch Git tree. + Create a distribution tarball from the root of the Git tree by + running: + + ``` + ./boot.sh + ./configure + make dist + ``` + +2. Now you have a distribution tarball, named something like + openvswitch-x.y.z.tar.gz. Copy this file into the RPM sources + directory, e.g.: + + `cp openvswitch-x.y.z.tar.gz $HOME/rpmbuild/SOURCES` + +3. Make another copy of the distribution tarball in a temporary + directory. Then unpack the tarball and "cd" into its root, e.g.: + + ``` + tar xzf openvswitch-x.y.z.tar.gz + cd openvswitch-x.y.z + ``` + +4. 
To build Open vSwitch userspace, run: + + `rpmbuild -bb rhel/openvswitch-fedora.spec` + + This produces one RPM: "openvswitch". + + The above command automatically runs the Open vSwitch unit tests. + To disable the unit tests, run: + + `rpmbuild -bb --without check rhel/openvswitch-fedora.spec` + +5. On Fedora 17, to build the Open vSwitch kernel module, run: + + `rpmbuild -bb rhel/openvswitch-kmod-fedora.spec` + + You might have to specify a kernel version and/or variants, e.g.: + + ``` + rpmbuild -bb \ + -D "kversion 2.6.32-131.6.1.el6.x86_64" \ + -D "kflavors default debug kdump" \ + rhel/openvswitch-kmod-rhel6.spec + ``` + + This produces an "kmod-openvswitch" RPM for each kernel variant, + in this example: "kmod-openvswitch", "kmod-openvswitch-debug", and + "kmod-openvswitch-kdump". + +Reporting Bugs +-------------- + +Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.KVM b/INSTALL.KVM deleted file mode 100644 index 01b11710e..000000000 --- a/INSTALL.KVM +++ /dev/null @@ -1,83 +0,0 @@ - How to Use Open vSwitch with KVM - ================================= - -This document describes how to use Open vSwitch with the Kernel-based -Virtual Machine (KVM). This document assumes that you have read and -followed INSTALL to get Open vSwitch setup on your Linux system. - -Setup ------ - -First, follow the setup instructions in INSTALL to get a working -Open vSwitch installation. - -KVM uses tunctl to handle various bridging modes, which you can -install with the Debian/Ubuntu package uml-utilities. - - % apt-get install uml-utilities - -Next, you will need to modify or create custom versions of the qemu-ifup -and qemu-ifdown scripts. In this guide, we'll create custom versions -that make use of example Open vSwitch bridges that we'll describe in this -guide. - -Create the following two files and store them in known locations. 
- -For example /etc/ovs-ifup and /etc/ovs-ifdown - -/etc/ovs-ifup --------------------------------------------------------------------- -#!/bin/sh - -switch='br0' -/sbin/ifconfig $1 0.0.0.0 up -ovs-vsctl add-port ${switch} $1 --------------------------------------------------------------------- - -/etc/ovs-ifdown --------------------------------------------------------------------- -#!/bin/sh - -switch='br0' -/sbin/ifconfig $1 0.0.0.0 down -ovs-vsctl del-port ${switch} $1 --------------------------------------------------------------------- - -At the end of INSTALL, it describes basic usage of creating -bridges and ports. If you haven't already, create a bridge named -br0 with the following command: - - % ovs-vsctl add-br br0 - -Then, add a port to the bridge for the NIC that you want your guests -to communicate over (e.g. eth0): - - % ovs-vsctl add-port br0 eth0 - -Please refer to ovs-vsctl(8) for more details. - -Next, we'll start a guest that will use our ifup and ifdown scripts. - - % kvm -m 512 -net nic,macaddr=00:11:22:EE:EE:EE -net \ -tap,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown -drive \ -file=/path/to/disk-image,boot=on - -This will start the guest and associate a tap device with it. The -ovs-ifup script will add a port on the br0 bridge so that the -guest will be able to communicate over that bridge. - -To get some more information and for debugging you can use Open -vSwitch utilities such as ovs-dpctl and ovs-ofctl, For example: - - % ovs-dpctl show - % ovs-ofctl show br0 - -You should see tap devices for each KVM guest added as ports to -the bridge (e.g. tap0) - -Please refer to ovs-dpctl(8) and ovs-ofctl(8) for more details. - -Bug Reporting -------------- - -Please report problems to bugs@openvswitch.org. 
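As a hedged illustration of the ovs-ifup helper shown in INSTALL.KVM above, the following shell sketch wraps the same two commands in a function with an argument check. The `run` helper and the `OVS_DRY_RUN` variable are hypothetical names introduced here for illustration; they are not part of Open vSwitch or QEMU.

```shell
#!/bin/sh
# Sketch of an ifup helper for KVM tap devices (illustrative only).

switch='br0'    # bridge name used throughout the KVM guide

# run: hypothetical helper. Prints the command when OVS_DRY_RUN=1,
# otherwise executes it unchanged.
run() {
    if [ "${OVS_DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ovs_ifup TAPDEV: bring the tap device up, then add it to the bridge.
ovs_ifup() {
    if [ -z "${1:-}" ]; then
        echo "usage: ovs_ifup <tap-device>" >&2
        return 1
    fi
    run /sbin/ifconfig "$1" 0.0.0.0 up &&
    run ovs-vsctl add-port "$switch" "$1"
}
```

Setting `OVS_DRY_RUN=1` prints the commands instead of running them, which may help when checking the script without root privileges.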
diff --git a/INSTALL.KVM.md b/INSTALL.KVM.md new file mode 100644 index 000000000..4d847b6e9 --- /dev/null +++ b/INSTALL.KVM.md @@ -0,0 +1,86 @@ +How to Use Open vSwitch with KVM +================================= + +This document describes how to use Open vSwitch with the Kernel-based +Virtual Machine (KVM). This document assumes that you have read and +followed [INSTALL](INSTALL.md) to get Open vSwitch setup on your Linux +system. + +Setup +----- + +First, follow the setup instructions in [INSTALL](INSTALL.md) to get a +working Open vSwitch installation. + +KVM uses tunctl to handle various bridging modes, which you can +install with the Debian/Ubuntu package uml-utilities. + + % apt-get install uml-utilities + +Next, you will need to modify or create custom versions of the qemu-ifup +and qemu-ifdown scripts. In this guide, we'll create custom versions +that make use of example Open vSwitch bridges that we'll describe in this +guide. + +Create the following two files and store them in known locations. + +For example /etc/ovs-ifup and /etc/ovs-ifdown + +/etc/ovs-ifup + +``` +#!/bin/sh + +switch='br0' +/sbin/ifconfig $1 0.0.0.0 up +ovs-vsctl add-port ${switch} $1 +``` + +/etc/ovs-ifdown + +``` +#!/bin/sh + +switch='br0' +/sbin/ifconfig $1 0.0.0.0 down +ovs-vsctl del-port ${switch} $1 +``` + +At the end of [INSTALL](INSTALL.md), it describes basic usage of creating +bridges and ports. If you haven't already, create a bridge named +br0 with the following command: + + % ovs-vsctl add-br br0 + +Then, add a port to the bridge for the NIC that you want your guests +to communicate over (e.g. eth0): + + % ovs-vsctl add-port br0 eth0 + +Please refer to ovs-vsctl(8) for more details. + +Next, we'll start a guest that will use our ifup and ifdown scripts. + + % kvm -m 512 -net nic,macaddr=00:11:22:EE:EE:EE -net \ + tap,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown -drive \ + file=/path/to/disk-image,boot=on + +This will start the guest and associate a tap device with it. 
The +ovs-ifup script will add a port on the br0 bridge so that the +guest will be able to communicate over that bridge. + +To get some more information and for debugging you can use Open +vSwitch utilities such as ovs-dpctl and ovs-ofctl, For example: + + % ovs-dpctl show + % ovs-ofctl show br0 + +You should see tap devices for each KVM guest added as ports to +the bridge (e.g. tap0) + +Please refer to ovs-dpctl(8) and ovs-ofctl(8) for more details. + +Bug Reporting +------------- + +Please report problems to bugs@openvswitch.org.
diff --git a/INSTALL.Libvirt b/INSTALL.Libvirt deleted file mode 100644 index fc5575cb9..000000000 --- a/INSTALL.Libvirt +++ /dev/null @@ -1,70 +0,0 @@ - How to Use Open vSwitch with Libvirt - ==================================== - -This document describes how to use Open vSwitch with Libvirt 0.9.11 or -later. This document assumes that you followed INSTALL or installed -Open vSwitch from distribution packaging such as a .deb or .rpm. The Open -vSwitch support is included by default in Libvirt 0.9.11. Consult -www.libvirt.org for instructions on how to build the latest Libvirt, if your -Linux distribution by default comes with an older Libvirt release. - -Limitations ------------ -Currently there is no Open vSwitch support for networks that are managed -by libvirt (e.g. NAT). As of now, only bridged networks are supported (those -where the user has to manually create the bridge). - -Setup ------ - -First, create the Open vSwitch bridge by using the ovs-vsctl utility (this -must be done with administrative privileges): - - % ovs-vsctl add-br ovsbr - -Once that is done, create a VM, if necessary, and edit its Domain XML file: - - % virsh edit <vm> - -Lookup in the Domain XML file the <interface> section. There should be one -such XML section for each interface the VM has. - - ... - <interface type='network'> - <mac address='52:54:00:71:b1:b6'/> - <source network='default'/> - <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> - </interface> - ... - -And change it to something like this: - - ... - <interface type='bridge'> - <mac address='52:54:00:71:b1:b6'/> - <source bridge='ovsbr'/> - <virtualport type='openvswitch'/> - <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> - </interface> - ... - -The interface type must be set to "bridge". The <source> XML element -specifies to which bridge this interface will be attached to. The -<virtualport> element indicates that the bridge in <source> element is an -Open vSwitch bridge. - -Then (re)start the VM and verify if the guest's vnet interface is attached to -the ovsbr bridge. - - % ovs-vsctl show - -Troubleshooting ---------------- -If the VM does not want to start, then try to run the libvirtd process either -from the terminal, so that all errors are printed in console, or inspect -Libvirt/Open vSwitch log files for possible root cause. - -Bug Reporting -------------- - -Please report problems to bugs@openvswitch.org.
diff --git a/INSTALL.Libvirt.md b/INSTALL.Libvirt.md new file mode 100644 index 000000000..0ee676cb7 --- /dev/null +++ b/INSTALL.Libvirt.md @@ -0,0 +1,70 @@ +How to Use Open vSwitch with Libvirt +==================================== + +This document describes how to use Open vSwitch with Libvirt 0.9.11 or +later. This document assumes that you followed INSTALL or installed +Open vSwitch from distribution packaging such as a .deb or .rpm. The Open +vSwitch support is included by default in Libvirt 0.9.11. Consult +www.libvirt.org for instructions on how to build the latest Libvirt, if your +Linux distribution by default comes with an older Libvirt release. + +Limitations +----------- +Currently there is no Open vSwitch support for networks that are managed +by libvirt (e.g. NAT). As of now, only bridged networks are supported (those +where the user has to manually create the bridge). + +Setup +----- + +First, create the Open vSwitch bridge by using the ovs-vsctl utility (this +must be done with administrative privileges): + + % ovs-vsctl add-br ovsbr + +Once that is done, create a VM, if necessary, and edit its Domain XML file: + + % virsh edit <vm> + +Lookup in the Domain XML file the `<interface>` section. There should be one +such XML section for each interface the VM has. + +``` +<interface type='network'> + <mac address='52:54:00:71:b1:b6'/> + <source network='default'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> +</interface> +``` + +And change it to something like this: + +``` +<interface type='bridge'> + <mac address='52:54:00:71:b1:b6'/> + <source bridge='ovsbr'/> + <virtualport type='openvswitch'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> +</interface> + ``` + +The interface type must be set to "bridge". The `<source>` XML element +specifies to which bridge this interface will be attached to. The +`<virtualport>` element indicates that the bridge in `<source>` element is an +Open vSwitch bridge. + +Then (re)start the VM and verify if the guest's vnet interface is attached to +the ovsbr bridge. + + % ovs-vsctl show + +Troubleshooting +--------------- +If the VM does not want to start, then try to run the libvirtd process either +from the terminal, so that all errors are printed in console, or inspect +Libvirt/Open vSwitch log files for possible root cause. + +Bug Reporting +------------- + +Please report problems to bugs@openvswitch.org.
diff --git a/INSTALL.NetBSD b/INSTALL.NetBSD deleted file mode 100644 index aab8793e9..000000000 --- a/INSTALL.NetBSD +++ /dev/null @@ -1,31 +0,0 @@ - How to Install Open vSwitch on NetBSD - ===================================== - -On NetBSD, you might want to install requirements from pkgsrc. -In that case, you need at least the following packages. - - automake - libtool-base - gmake - python27 - py27-xml - pkg_alternatives - -Some components have additional requirements. (See INSTALL) - -Assuming you are running NetBSD/amd64 6.1.2, you can download and -install pre-built binary packages as the following. -(You might get some warnings about minor version mismatch. Don't care.) - - # PKG_PATH=http://ftp.netbsd.org/pub/pkgsrc/packages/NetBSD/amd64/6.1.2/All/ - # export PKG_PATH - # pkg_add automake libtool-base gmake python27 py27-xml pkg_alternatives - -NetBSD's /usr/bin/make is not GNU make. GNU make is installed as -/usr/pkg/bin/gmake by the above mentioned 'gmake' package. - -As all executables installed with pkgsrc are placed in /usr/pkg/bin/ -directory, it might be a good idea to add it to your PATH. - -Open vSwitch on NetBSD is currently "userspace switch" implementation -in the sense described in INSTALL.userspace and PORTING.
diff --git a/INSTALL.NetBSD.md b/INSTALL.NetBSD.md new file mode 100644 index 000000000..47322505a --- /dev/null +++ b/INSTALL.NetBSD.md @@ -0,0 +1,33 @@ +How to Install Open vSwitch on NetBSD +===================================== + +On NetBSD, you might want to install requirements from pkgsrc. +In that case, you need at least the following packages. + + * automake + * libtool-base + * gmake + * python27 + * py27-xml + * pkg_alternatives + +Some components have additional requirements. (See [INSTALL](INSTALL.md)) + +Assuming you are running NetBSD/amd64 6.1.2, you can download and +install pre-built binary packages as the following. +(You might get some warnings about minor version mismatch. Don't care.) + + ``` + # PKG_PATH=http://ftp.netbsd.org/pub/pkgsrc/packages/NetBSD/amd64/6.1.2/All/ + # export PKG_PATH + # pkg_add automake libtool-base gmake python27 py27-xml pkg_alternatives + ``` + +NetBSD's `/usr/bin/make` is not GNU make. GNU make is installed as +`/usr/pkg/bin/gmake` by the above mentioned `gmake` package. + +As all executables installed with pkgsrc are placed in `/usr/pkg/bin/` +directory, it might be a good idea to add it to your PATH. + +Open vSwitch on NetBSD is currently "userspace switch" implementation +in the sense described in INSTALL.userspace and PORTING. diff --git a/INSTALL.RHEL b/INSTALL.RHEL deleted file mode 100644 index 080296b9c..000000000 --- a/INSTALL.RHEL +++ /dev/null @@ -1,151 +0,0 @@ - How to Install Open vSwitch on Red Hat Enterprise Linux - ======================================================= - -This document describes how to build and install Open vSwitch on a Red -Hat Enterprise Linux (RHEL) host. If you want to install Open vSwitch -on a generic Linux host, see INSTALL instead. - -We have tested these instructions with RHEL 5.6 and RHEL 6.0. - -Building Open vSwitch for RHEL ------------------------------- - -You may build from an Open vSwitch distribution tarball or from an -Open vSwitch Git tree. 
- -The default RPM build directory (_topdir) has five directories in -the top-level: -1. BUILD/ Where the software is unpacked and built. -2. RPMS/ Where the newly created binary package files are written. -3. SOURCES/ Contains the original sources, patches, and icon files. -4. SPECS/ Contains the spec files for each package to be built. -5. SRPMS/ Where the newly created source package files are written. - -Before you begin, note the RPM sources directory on your version of -RHEL. The command "rpmbuild --showrc" will show the configuration -for each of those directories. Alternatively, the command "rpm --eval - '%{_topdir}'" shows the current configuration for the top level -directory and the command "rpm --eval '%{_sourcedir}'" does the same -for the sources directory. On RHEL 5, the default RPM _topdir is -/usr/src/redhat and the default RPM sources directory is -/usr/src/redhat/SOURCES. On RHEL 6, the default _topdir is -$HOME/rpmbuild and the default RPM sources directory is -$HOME/rpmbuild/SOURCES. - -1. Install build prerequisites: - - yum install gcc make python-devel openssl-devel kernel-devel graphviz \ - kernel-debug-devel autoconf automake rpm-build redhat-rpm-config \ - libtool - -2. If you are building from a distribution tarball, skip to step 3. - Otherwise, you must be building from an Open vSwitch Git tree. - Determine what version of Autoconf is installed (e.g. run "autoconf - --version"). If it is not at least version 2.63, then you have two - choices: - - a. Install Autoconf 2.63 or later, one way or another. - - b. Create a distribution tarball on some other machine, by - running "./boot.sh; ./configure; make dist" in the Git tree. - You must run this on a machine that has the tools listed in - INSTALL as prerequisites for building from a Git tree. - Afterward, proceed with the rest of the instructions using the - distribution tarball. - -3. Some versions of the RHEL 6 kernel-devel package contain a broken - "build" symlink. 
If you are using such a version, you must fix - the problem before continuing. - - To find out whether you are affected, run: - - cd /lib/modules/<version> - ls -l build/ - - where <version> is the version number of the RHEL 6 kernel. (The - trailing slash in the final command is important. Be sure to include - it.) If the "ls" command produces a directory listing, your - kernel-devel package is OK. If it produces a "No such file or - directory" error, your kernel-devel package is buggy. - - If your kernel-devel package is buggy, then you can fix it with: - - cd /lib/modules/<version> - rm build - ln -s /usr/src/kernels/<target> build - - where <target> is the name of an existing directory under - /usr/src/kernels, whose name should be similar to <version> but may - contain some extra parts. Once you have done this, verify the fix with - the same procedure you used above to check for the problem. - -4. If you are building from a distribution tarball, skip to step 5. - Otherwise, create a distribution tarball from the root of the Git - tree by running: - - ./boot.sh - ./configure - make dist - -5. Now you have a distribution tarball, named something like - openvswitch-x.y.z.tar.gz. Copy this file into the RPM sources - directory, e.g.: - - cp openvswitch-x.y.z.tar.gz $HOME/rpmbuild/SOURCES - -6. Make another copy of the distribution tarball in a temporary - directory. Then unpack the tarball and "cd" into its root, e.g.: - - tar xzf openvswitch-x.y.z.tar.gz - cd openvswitch-x.y.z - -7. To build Open vSwitch userspace, run: - - rpmbuild -bb rhel/openvswitch.spec - - This produces two RPMs: "openvswitch" and "openvswitch-debuginfo". - - The above command automatically runs the Open vSwitch unit tests. - To disable the unit tests, run: - - rpmbuild -bb --without check rhel/openvswitch.spec - - If the build fails with "configure: error: source dir - /lib/modules/2.6.32-279.el6.x86_64/build doesn't exist" or similar, - then the kernel-devel package is missing or buggy. Go back to step - 1 or 2 and fix the problem. - -8.
On RHEL 6, to build the Open vSwitch kernel module, copy - rhel/openvswitch-kmod.files into the RPM sources directory and run: - - rpmbuild -bb rhel/openvswitch-kmod-rhel6.spec - - You might have to specify a kernel version and/or variants, e.g.: - - rpmbuild -bb \ - -D "kversion 2.6.32-131.6.1.el6.x86_64" \ - -D "kflavors default debug kdump" \ - rhel/openvswitch-kmod-rhel6.spec - - This produces an "kmod-openvswitch" RPM for each kernel variant, in - this example: "kmod-openvswitch", "kmod-openvswitch-debug", and - "kmod-openvswitch-kdump". - -A RHEL host has default firewall rules that prevent any Open vSwitch tunnel -traffic from passing through. If a user configures Open vSwitch tunnels like -Geneve, GRE, VXLAN, LISP etc., they will either have to manually add iptables -firewall rules to allow the tunnel traffic or add it through a startup script -(Please refer to the "enable-protocol" command in the ovs-ctl(8) manpage). - -Red Hat Network Scripts Integration ------------------------------------ - -Simple integration with Red Hat network scripts has been implemented. -Please read rhel/README.RHEL in the source tree or -/usr/share/doc/openvswitch/README.RHEL in the installed openvswitch -package for details. - -Reporting Bugs --------------- - -Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.RHEL.md b/INSTALL.RHEL.md new file mode 100644 index 000000000..1d3d0f8ae --- /dev/null +++ b/INSTALL.RHEL.md @@ -0,0 +1,163 @@ +How to Install Open vSwitch on Red Hat Enterprise Linux +======================================================= + +This document describes how to build and install Open vSwitch on a Red +Hat Enterprise Linux (RHEL) host. If you want to install Open vSwitch +on a generic Linux host, see [INSTALL](INSTALL.md) instead. + +We have tested these instructions with RHEL 5.6 and RHEL 6.0. 
+ +Building Open vSwitch for RHEL +------------------------------ + +You may build from an Open vSwitch distribution tarball or from an +Open vSwitch Git tree. + +The default RPM build directory (_topdir) has five directories in +the top-level: +1. BUILD/ Where the software is unpacked and built. +2. RPMS/ Where the newly created binary package files are written. +3. SOURCES/ Contains the original sources, patches, and icon files. +4. SPECS/ Contains the spec files for each package to be built. +5. SRPMS/ Where the newly created source package files are written. + +Before you begin, note the RPM sources directory on your version of +RHEL. The command "rpmbuild --showrc" will show the configuration +for each of those directories. Alternatively, the command "rpm --eval + '%{_topdir}'" shows the current configuration for the top level +directory and the command "rpm --eval '%{_sourcedir}'" does the same +for the sources directory. On RHEL 5, the default RPM _topdir is +/usr/src/redhat and the default RPM sources directory is +/usr/src/redhat/SOURCES. On RHEL 6, the default _topdir is +$HOME/rpmbuild and the default RPM sources directory is +$HOME/rpmbuild/SOURCES. + +1. Install build prerequisites: + + ``` + yum install gcc make python-devel openssl-devel kernel-devel graphviz \ + kernel-debug-devel autoconf automake rpm-build redhat-rpm-config \ + libtool + ``` + +2. If you are building from a distribution tarball, skip to step 3. + Otherwise, you must be building from an Open vSwitch Git tree. + Determine what version of Autoconf is installed (e.g. run "autoconf + --version"). If it is not at least version 2.63, then you have two + choices: + + a. Install Autoconf 2.63 or later, one way or another. + + b. Create a distribution tarball on some other machine, by + running "./boot.sh; ./configure; make dist" in the Git tree. + You must run this on a machine that has the tools listed in + [INSTALL](INSTALL.md) as prerequisites for building from a Git + tree. 
Afterward, proceed with the rest of the instructions using + the distribution tarball. + +3. Some versions of the RHEL 6 kernel-devel package contain a broken + "build" symlink. If you are using such a version, you must fix + the problem before continuing. + + To find out whether you are affected, run: + + ``` + cd /lib/modules/<version> + ls -l build/ + ``` + + where `<version>` is the version number of the RHEL 6 kernel. (The + trailing slash in the final command is important. Be sure to include + it.) If the "ls" command produces a directory listing, your + kernel-devel package is OK. If it produces a "No such file or + directory" error, your kernel-devel package is buggy. + + If your kernel-devel package is buggy, then you can fix it with: + + ``` + cd /lib/modules/<version> + rm build + ln -s /usr/src/kernels/<target> build + ``` + + where `<target>` is the name of an existing directory under + /usr/src/kernels, whose name should be similar to `<version>` but may + contain some extra parts. Once you have done this, verify the fix with + the same procedure you used above to check for the problem. + +4. If you are building from a distribution tarball, skip to step 5. + Otherwise, create a distribution tarball from the root of the Git + tree by running: + + ``` + ./boot.sh + ./configure + make dist + ``` + +5. Now you have a distribution tarball, named something like + openvswitch-x.y.z.tar.gz. Copy this file into the RPM sources + directory, e.g.: + + `cp openvswitch-x.y.z.tar.gz $HOME/rpmbuild/SOURCES` + +6. Make another copy of the distribution tarball in a temporary + directory. Then unpack the tarball and "cd" into its root, e.g.: + + ``` + tar xzf openvswitch-x.y.z.tar.gz + cd openvswitch-x.y.z + ``` + +7. To build Open vSwitch userspace, run: + + `rpmbuild -bb rhel/openvswitch.spec` + + This produces two RPMs: "openvswitch" and "openvswitch-debuginfo". + + The above command automatically runs the Open vSwitch unit tests.
+ To disable the unit tests, run: + + `rpmbuild -bb --without check rhel/openvswitch.spec` + + If the build fails with "configure: error: source dir + /lib/modules/2.6.32-279.el6.x86_64/build doesn't exist" or similar, + then the kernel-devel package is missing or buggy. Go back to step + 1 or 3 and fix the problem. + +8. On RHEL 6, to build the Open vSwitch kernel module, copy + rhel/openvswitch-kmod.files into the RPM sources directory and run: + + `rpmbuild -bb rhel/openvswitch-kmod-rhel6.spec` + + You might have to specify a kernel version and/or variants, e.g.: + + ``` + rpmbuild -bb \ + -D "kversion 2.6.32-131.6.1.el6.x86_64" \ + -D "kflavors default debug kdump" \ + rhel/openvswitch-kmod-rhel6.spec + ``` + + This produces a "kmod-openvswitch" RPM for each kernel variant, in + this example: "kmod-openvswitch", "kmod-openvswitch-debug", and + "kmod-openvswitch-kdump". + +A RHEL host has default firewall rules that prevent any Open vSwitch tunnel +traffic from passing through. If a user configures Open vSwitch tunnels like +Geneve, GRE, VXLAN, LISP etc., they will either have to manually add iptables +firewall rules to allow the tunnel traffic or add them through a startup script +(please refer to the "enable-protocol" command in the ovs-ctl(8) manpage). + +Red Hat Network Scripts Integration +----------------------------------- + +Simple integration with Red Hat network scripts has been implemented. +Please read rhel/README.RHEL in the source tree or +/usr/share/doc/openvswitch/README.RHEL in the installed openvswitch +package for details. + +Reporting Bugs +-------------- + +Please report problems to bugs@openvswitch.org.
diff --git a/INSTALL.SSL b/INSTALL.SSL deleted file mode 100644 index a6931d20c..000000000 --- a/INSTALL.SSL +++ /dev/null @@ -1,312 +0,0 @@ - Configuring Open vSwitch for SSL - ================================ - -If you plan to configure Open vSwitch to connect across the network to -an OpenFlow controller, then we recommend that you build Open vSwitch -with OpenSSL. SSL support ensures integrity and confidentiality of -the OpenFlow connections, increasing network security. - -This file explains how to configure an Open vSwitch to connect to an -OpenFlow controller over SSL. Refer to INSTALL for instructions on -building Open vSwitch with SSL support. - -Open vSwitch uses TLS version 1.0 or later (TLSv1), as specified by -RFC 2246, which is very similar to SSL version 3.0. TLSv1 was -released in January 1999, so all current software and hardware should -implement it. - -This document assumes basic familiarity with public-key cryptography -and public-key infrastructure. - -SSL Concepts for OpenFlow -------------------------- - -This section is an introduction to the public-key infrastructure -architectures that Open vSwitch supports for SSL authentication. - -To connect over SSL, every Open vSwitch must have a unique -private/public key pair and a certificate that signs that public key. -Typically, the Open vSwitch generates its own public/private key pair. -There are two common ways to obtain a certificate for a switch: - - * Self-signed certificates: The Open vSwitch signs its certificate - with its own private key. In this case, each switch must be - individually approved by the OpenFlow controller(s), since there - is no central authority. - - This is the only switch PKI model currently supported by NOX - (http://noxrepo.org). - - * Switch certificate authority: A certificate authority (the - "switch CA") signs each Open vSwitch's public key. 
The OpenFlow - controllers then check that any connecting switches' - certificates are signed by that certificate authority. - - This is the only switch PKI model supported by the simple - OpenFlow controller included with Open vSwitch. - -Each Open vSwitch must also have a copy of the CA certificate for the -certificate authority that signs OpenFlow controllers' keys (the -"controller CA" certificate). Typically, the same controller CA -certificate is installed on all of the switches within a given -administrative unit. There are two common ways for a switch to obtain -the controller CA certificate: - - * Manually copy the certificate to the switch through some secure - means, e.g. using a USB flash drive, or over the network with - "scp", or even FTP or HTTP followed by manual verification. - - * Open vSwitch "bootstrap" mode, in which Open vSwitch accepts and - saves the controller CA certificate that it obtains from the - OpenFlow controller on its first connection. Thereafter the - switch will only connect to controllers signed by the same CA - certificate. - -Establishing a Public Key Infrastructure ----------------------------------------- - -Open vSwitch can make use of your existing public key infrastructure. -If you already have a PKI, you may skip forward to the next section. -Otherwise, if you do not have a PKI, the ovs-pki script included with -Open vSwitch can help. To create an initial PKI structure, invoke it -as: - - % ovs-pki init - -to create and populate a new PKI directory. The default location for -the PKI directory depends on how the Open vSwitch tree was configured -(to see the configured default, look for the --dir option description -in the output of "ovs-pki --help"). - -The pki directory contains two important subdirectories. The -controllerca subdirectory contains controller CA files, including the -following: - - - cacert.pem: Root certificate for the controller certificate - authority. 
Each Open vSwitch must have a copy of this file to - allow it to authenticate valid controllers. - - - private/cakey.pem: Private signing key for the controller - certificate authority. This file must be kept secret. There is - no need for switches or controllers to have a copy of it. - -The switchca subdirectory contains switch CA files, analogous to those -in the controllerca subdirectory: - - - cacert.pem: Root certificate for the switch certificate - authority. The OpenFlow controller must have this file to - enable it to authenticate valid switches. - - - private/cakey.pem: Private signing key for the switch - certificate authority. This file must be kept secret. There is - no need for switches or controllers to have a copy of it. - -After you create the initial structure, you can create keys and -certificates for switches and controllers with ovs-pki. Refer to the -ovs-pki(8) manage for complete details. A few examples of its use -follow: - -CONTROLLER KEY GENERATION - -To create a controller private key and certificate in files named -ctl-privkey.pem and ctl-cert.pem, run the following on the machine -that contains the PKI structure: - - % ovs-pki req+sign ctl controller - -ctl-privkey.pem and ctl-cert.pem would need to be copied to the -controller for its use at runtime. If, for testing purposes, you were -to use ovs-testcontroller, the simple OpenFlow controller included -with Open vSwitch, then the --private-key and --certificate options, -respectively, would point to these files. - -It is very important to make sure that no stray copies of -ctl-privkey.pem are created, because they could be used to impersonate -the controller. - -SWITCH KEY GENERATION WITH SELF-SIGNED CERTIFICATES - -If you are using self-signed certificates (see "SSL Concepts for -OpenFlow"), this is one way to create an acceptable certificate for -your controller to approve. - -1. 
Run the following command on the Open vSwitch itself: - - % ovs-pki self-sign sc - - (This command does not require a copy of any of the PKI files - generated by "ovs-pki init", and you should not copy them to the - switch because some of them have contents that must remain secret - for security.) - - The "ovs-pki self-sign" command has the following output: - - * sc-privkey.pem, the switch private key file. For security, - the contents of this file must remain secret. There is - ordinarily no need to copy this file off the Open vSwitch. - - * sc-cert.pem, the switch certificate, signed by the switch's - own private key. Its contents are not a secret. - -2. Optionally, copy controllerca/cacert.pem from the machine that has - the OpenFlow PKI structure and verify that it is correct. - (Otherwise, you will have to use CA certificate bootstrapping when - you configure Open vSwitch in the next step.) - -3. Configure Open vSwitch to use the keys and certificates (see - "Configuring SSL Support", below). - -SWITCH KEY GENERATION WITH A SWITCH PKI (EASY METHOD) - -If you are using a switch PKI (see "SSL Concepts for OpenFlow", -above), this method of switch key generation is a little easier than -the alternate method described below, but it is also a little less -secure because it requires copying a sensitive private key from file -from the machine hosting the PKI to the switch. - -1. Run the following on the machine that contains the PKI structure: - - % ovs-pki req+sign sc switch - - This command has the following output: - - * sc-privkey.pem, the switch private key file. For - security, the contents of this file must remain secret. - - * sc-cert.pem, the switch certificate. Its contents are - not a secret. - -2. Copy sc-privkey.pem and sc-cert.pem, plus controllerca/cacert.pem, - to the Open vSwitch. - -3. Delete the copies of sc-privkey.pem and sc-cert.pem on the PKI - machine and any other copies that may have been made in transit. 
- It is very important to make sure that there are no stray copies of - sc-privkey.pem, because they could be used to impersonate the - switch. - - (Don't delete controllerca/cacert.pem! It is not - security-sensitive and you will need it to configure additional - switches.) - -4. Configure Open vSwitch to use the keys and certificates (see - "Configuring SSL Support", below). - -SWITCH KEY GENERATION WITH A SWITCH PKI (MORE SECURE) - -If you are using a switch PKI (see "SSL Concepts for OpenFlow", -above), then, compared to the previous method, the method described -here takes a little more work, but it does not involve copying the -private key from one machine to another, so it may also be a little -more secure. - -1. Run the following command on the Open vSwitch itself: - - % ovs-pki req sc switch - - (This command does not require a copy of any of the PKI files - generated by "ovs-pki init", and you should not copy them to the - switch because some of them have contents that must remain secret - for security.) - - The "ovs-pki req" command has the following output: - - * sc-privkey.pem, the switch private key file. For security, - the contents of this file must remain secret. There is - ordinarily no need to copy this file off the Open vSwitch. - - * sc-req.pem, the switch "certificate request", which is - essentially the switch's public key. Its contents are not a - secret. - - * A fingerprint, on stdout. - -2. Write the fingerprint down on a slip of paper and copy sc-req.pem - to the machine that contains the PKI structure. - -3. On the machine that contains the PKI structure, run: - - % ovs-pki sign sc switch - - This command will output a fingerprint to stdout and request that - you verify it. Check that it is the same as the fingerprint that - you wrote down on the slip of paper before you answer "yes". - - "ovs-pki sign" creates a file named sc-cert.pem, which is the - switch certificate. Its contents are not a secret. - -4. 
Copy the generated sc-cert.pem, plus controllerca/cacert.pem from - the PKI structure, to the Open vSwitch, and verify that they were - copied correctly. - - You may delete sc-cert.pem from the machine that hosts the PKI - structure now, although it is not important that you do so. (Don't - delete controllerca/cacert.pem! It is not security-sensitive and - you will need it to configure additional switches.) - -5. Configure Open vSwitch to use the keys and certificates (see - "Configuring SSL Support", below). - -Configuring SSL Support ------------------------ - -SSL configuration requires three additional configuration files. The -first two of these are unique to each Open vSwitch. If you used the -instructions above to build your PKI, then these files will be named -sc-privkey.pem and sc-cert.pem, respectively: - - - A private key file, which contains the private half of an RSA or - DSA key. - - This file can be generated on the Open vSwitch itself, for the - greatest security, or it can be generated elsewhere and copied - to the Open vSwitch. - - The contents of the private key file are secret and must not be - exposed. - - - A certificate file, which certifies that the private key is that - of a trustworthy Open vSwitch. - - This file has to be generated on a machine that has the private - key for the switch certification authority, which should not be - an Open vSwitch; ideally, it should be a machine that is not - networked at all. - - The certificate file itself is not a secret. - -The third configuration file is typically the same across all the -switches in a given administrative unit. If you used the -instructions above to build your PKI, then this file will be named -cacert.pem: - - - The root certificate for the controller certificate authority. - The Open vSwitch verifies it that is authorized to connect to an - OpenFlow controller by verifying a signature against this CA - certificate. 
- -Once you have these files, configure ovs-vswitchd to use them using -the ovs-vsctl "set-ssl" command, e.g.: - - ovs-vsctl set-ssl /etc/openvswitch/sc-privkey.pem /etc/openvswitch/sc-cert.pem /etc/openvswitch/cacert.pem - -Substitute the correct file names, of course, if they differ from the -ones used above. You should use absolute file names (ones that begin -with "/"), because ovs-vswitchd's current directory is unrelated to -the one from which you run ovs-vsctl. - -If you are using self-signed certificates (see "SSL Concepts for -OpenFlow") and you did not copy controllerca/cacert.pem from the PKI -machine to the Open vSwitch, then add the --bootstrap option, e.g.: - - ovs-vsctl -- --bootstrap set-ssl /etc/openvswitch/sc-privkey.pem /etc/openvswitch/sc-cert.pem /etc/openvswitch/cacert.pem - -After you have added all of these configuration keys, you may specify -"ssl:" connection methods elsewhere in the configuration database. -"tcp:" connection methods are still allowed even after SSL has been -configured, so for security you should use only "ssl:" connections. - -Reporting Bugs --------------- - -Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.SSL.md b/INSTALL.SSL.md new file mode 100644 index 000000000..aecb2b5f3 --- /dev/null +++ b/INSTALL.SSL.md @@ -0,0 +1,312 @@ +Configuring Open vSwitch for SSL +================================ + +If you plan to configure Open vSwitch to connect across the network to +an OpenFlow controller, then we recommend that you build Open vSwitch +with OpenSSL. SSL support ensures integrity and confidentiality of +the OpenFlow connections, increasing network security. + +This file explains how to configure an Open vSwitch to connect to an +OpenFlow controller over SSL. Refer to INSTALL for instructions on +building Open vSwitch with SSL support. + +Open vSwitch uses TLS version 1.0 or later (TLSv1), as specified by +RFC 2246, which is very similar to SSL version 3.0. 
TLSv1 was +released in January 1999, so all current software and hardware should +implement it. + +This document assumes basic familiarity with public-key cryptography +and public-key infrastructure. + +SSL Concepts for OpenFlow +------------------------- + +This section is an introduction to the public-key infrastructure +architectures that Open vSwitch supports for SSL authentication. + +To connect over SSL, every Open vSwitch must have a unique +private/public key pair and a certificate that signs that public key. +Typically, the Open vSwitch generates its own public/private key pair. +There are two common ways to obtain a certificate for a switch: + + * Self-signed certificates: The Open vSwitch signs its certificate + with its own private key. In this case, each switch must be + individually approved by the OpenFlow controller(s), since there + is no central authority. + + This is the only switch PKI model currently supported by NOX + (http://noxrepo.org). + + * Switch certificate authority: A certificate authority (the + "switch CA") signs each Open vSwitch's public key. The OpenFlow + controllers then check that any connecting switches' + certificates are signed by that certificate authority. + + This is the only switch PKI model supported by the simple + OpenFlow controller included with Open vSwitch. + +Each Open vSwitch must also have a copy of the CA certificate for the +certificate authority that signs OpenFlow controllers' keys (the +"controller CA" certificate). Typically, the same controller CA +certificate is installed on all of the switches within a given +administrative unit. There are two common ways for a switch to obtain +the controller CA certificate: + + * Manually copy the certificate to the switch through some secure + means, e.g. using a USB flash drive, or over the network with + "scp", or even FTP or HTTP followed by manual verification. 
+ + * Open vSwitch "bootstrap" mode, in which Open vSwitch accepts and + saves the controller CA certificate that it obtains from the + OpenFlow controller on its first connection. Thereafter the + switch will only connect to controllers signed by the same CA + certificate. + +Establishing a Public Key Infrastructure +---------------------------------------- + +Open vSwitch can make use of your existing public key infrastructure. +If you already have a PKI, you may skip forward to the next section. +Otherwise, if you do not have a PKI, the ovs-pki script included with +Open vSwitch can help. To create an initial PKI structure, invoke it +as: + + % ovs-pki init + +to create and populate a new PKI directory. The default location for +the PKI directory depends on how the Open vSwitch tree was configured +(to see the configured default, look for the --dir option description +in the output of "ovs-pki --help"). + +The pki directory contains two important subdirectories. The +controllerca subdirectory contains controller CA files, including the +following: + + - cacert.pem: Root certificate for the controller certificate + authority. Each Open vSwitch must have a copy of this file to + allow it to authenticate valid controllers. + + - private/cakey.pem: Private signing key for the controller + certificate authority. This file must be kept secret. There is + no need for switches or controllers to have a copy of it. + +The switchca subdirectory contains switch CA files, analogous to those +in the controllerca subdirectory: + + - cacert.pem: Root certificate for the switch certificate + authority. The OpenFlow controller must have this file to + enable it to authenticate valid switches. + + - private/cakey.pem: Private signing key for the switch + certificate authority. This file must be kept secret. There is + no need for switches or controllers to have a copy of it. 
+ +After you create the initial structure, you can create keys and +certificates for switches and controllers with ovs-pki. Refer to the +ovs-pki(8) manpage for complete details. A few examples of its use +follow: + +CONTROLLER KEY GENERATION + +To create a controller private key and certificate in files named +ctl-privkey.pem and ctl-cert.pem, run the following on the machine +that contains the PKI structure: + + % ovs-pki req+sign ctl controller + +ctl-privkey.pem and ctl-cert.pem would need to be copied to the +controller for its use at runtime. If, for testing purposes, you were +to use ovs-testcontroller, the simple OpenFlow controller included +with Open vSwitch, then the --private-key and --certificate options, +respectively, would point to these files. + +It is very important to make sure that no stray copies of +ctl-privkey.pem are created, because they could be used to impersonate +the controller. + +SWITCH KEY GENERATION WITH SELF-SIGNED CERTIFICATES + +If you are using self-signed certificates (see "SSL Concepts for +OpenFlow"), this is one way to create an acceptable certificate for +your controller to approve. + +1. Run the following command on the Open vSwitch itself: + + % ovs-pki self-sign sc + + (This command does not require a copy of any of the PKI files + generated by "ovs-pki init", and you should not copy them to the + switch because some of them have contents that must remain secret + for security.) + + The "ovs-pki self-sign" command has the following output: + + * sc-privkey.pem, the switch private key file. For security, + the contents of this file must remain secret. There is + ordinarily no need to copy this file off the Open vSwitch. + + * sc-cert.pem, the switch certificate, signed by the switch's + own private key. Its contents are not a secret. + +2. Optionally, copy controllerca/cacert.pem from the machine that has + the OpenFlow PKI structure and verify that it is correct.
+ (Otherwise, you will have to use CA certificate bootstrapping when + you configure Open vSwitch in the next step.) + +3. Configure Open vSwitch to use the keys and certificates (see + "Configuring SSL Support", below). + +SWITCH KEY GENERATION WITH A SWITCH PKI (EASY METHOD) + +If you are using a switch PKI (see "SSL Concepts for OpenFlow", +above), this method of switch key generation is a little easier than +the alternate method described below, but it is also a little less +secure because it requires copying a sensitive private key file +from the machine hosting the PKI to the switch. + +1. Run the following on the machine that contains the PKI structure: + + % ovs-pki req+sign sc switch + + This command has the following output: + + * sc-privkey.pem, the switch private key file. For + security, the contents of this file must remain secret. + + * sc-cert.pem, the switch certificate. Its contents are + not a secret. + +2. Copy sc-privkey.pem and sc-cert.pem, plus controllerca/cacert.pem, + to the Open vSwitch. + +3. Delete the copies of sc-privkey.pem and sc-cert.pem on the PKI + machine and any other copies that may have been made in transit. + It is very important to make sure that there are no stray copies of + sc-privkey.pem, because they could be used to impersonate the + switch. + + (Don't delete controllerca/cacert.pem! It is not + security-sensitive and you will need it to configure additional + switches.) + +4. Configure Open vSwitch to use the keys and certificates (see + "Configuring SSL Support", below). + +SWITCH KEY GENERATION WITH A SWITCH PKI (MORE SECURE) + +If you are using a switch PKI (see "SSL Concepts for OpenFlow", +above), then, compared to the previous method, the method described +here takes a little more work, but it does not involve copying the +private key from one machine to another, so it may also be a little +more secure. + +1.
Run the following command on the Open vSwitch itself: + + % ovs-pki req sc switch + + (This command does not require a copy of any of the PKI files + generated by "ovs-pki init", and you should not copy them to the + switch because some of them have contents that must remain secret + for security.) + + The "ovs-pki req" command has the following output: + + * sc-privkey.pem, the switch private key file. For security, + the contents of this file must remain secret. There is + ordinarily no need to copy this file off the Open vSwitch. + + * sc-req.pem, the switch "certificate request", which is + essentially the switch's public key. Its contents are not a + secret. + + * A fingerprint, on stdout. + +2. Write the fingerprint down on a slip of paper and copy sc-req.pem + to the machine that contains the PKI structure. + +3. On the machine that contains the PKI structure, run: + + % ovs-pki sign sc switch + + This command will output a fingerprint to stdout and request that + you verify it. Check that it is the same as the fingerprint that + you wrote down on the slip of paper before you answer "yes". + + "ovs-pki sign" creates a file named sc-cert.pem, which is the + switch certificate. Its contents are not a secret. + +4. Copy the generated sc-cert.pem, plus controllerca/cacert.pem from + the PKI structure, to the Open vSwitch, and verify that they were + copied correctly. + + You may delete sc-cert.pem from the machine that hosts the PKI + structure now, although it is not important that you do so. (Don't + delete controllerca/cacert.pem! It is not security-sensitive and + you will need it to configure additional switches.) + +5. Configure Open vSwitch to use the keys and certificates (see + "Configuring SSL Support", below). + +Configuring SSL Support +----------------------- + +SSL configuration requires three additional configuration files. The +first two of these are unique to each Open vSwitch. 
If you used the +instructions above to build your PKI, then these files will be named +sc-privkey.pem and sc-cert.pem, respectively: + + - A private key file, which contains the private half of an RSA or + DSA key. + + This file can be generated on the Open vSwitch itself, for the + greatest security, or it can be generated elsewhere and copied + to the Open vSwitch. + + The contents of the private key file are secret and must not be + exposed. + + - A certificate file, which certifies that the private key is that + of a trustworthy Open vSwitch. + + This file has to be generated on a machine that has the private + key for the switch certification authority, which should not be + an Open vSwitch; ideally, it should be a machine that is not + networked at all. + + The certificate file itself is not a secret. + +The third configuration file is typically the same across all the +switches in a given administrative unit. If you used the +instructions above to build your PKI, then this file will be named +cacert.pem: + + - The root certificate for the controller certificate authority. + The Open vSwitch verifies that it is authorized to connect to an + OpenFlow controller by verifying a signature against this CA + certificate. + +Once you have these files, configure ovs-vswitchd to use them using +the ovs-vsctl "set-ssl" command, e.g.: + + ovs-vsctl set-ssl /etc/openvswitch/sc-privkey.pem /etc/openvswitch/sc-cert.pem /etc/openvswitch/cacert.pem + +Substitute the correct file names, of course, if they differ from the +ones used above. You should use absolute file names (ones that begin +with "/"), because ovs-vswitchd's current directory is unrelated to +the one from which you run ovs-vsctl.
+ +If you are using self-signed certificates (see "SSL Concepts for +OpenFlow") and you did not copy controllerca/cacert.pem from the PKI +machine to the Open vSwitch, then add the --bootstrap option, e.g.: + + ovs-vsctl -- --bootstrap set-ssl /etc/openvswitch/sc-privkey.pem /etc/openvswitch/sc-cert.pem /etc/openvswitch/cacert.pem + +After you have added all of these configuration keys, you may specify +"ssl:" connection methods elsewhere in the configuration database. +"tcp:" connection methods are still allowed even after SSL has been +configured, so for security you should use only "ssl:" connections. + +Reporting Bugs +-------------- + +Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.Windows b/INSTALL.Windows deleted file mode 100644 index bba0710b8..000000000 --- a/INSTALL.Windows +++ /dev/null @@ -1,263 +0,0 @@ - How to Build the Kernel module & userspace daemons for Windows - ============================================================== - -Autoconf, Automake and Visual C++: ---------------------------------- -Open vSwitch on Linux uses autoconf and automake for generating Makefiles. -It will be useful to maintain the same build system while compiling on Windows -too. One approach is to compile Open vSwitch in a MinGW environment that -contains autoconf and automake utilities and then use Visual C++ as a compiler -and linker. - -The following explains the steps in some detail. - -* Install Mingw on a Windows machine by following the instructions at: -http://www.mingw.org/wiki/Getting_Started - -This should install mingw at C:\Mingw and msys at C:\Mingw\msys. -Add "C:\MinGW\bin" and "C:\Mingw\msys\1.0\bin" to PATH environment variable -of Windows. - -You can either use the MinGW installer or the command line utility 'mingw-get' -to install both the base packages and additional packages like automake and -autoconf(version 2.68). - -Also make sure that /mingw mount point exists. 
If its not, please add/create -the following entry in /etc/fstab - 'C:/MinGW /mingw'. - -* Install the latest Python 2.x from python.org and verify that its path is -part of Windows' PATH environment variable. - -* You will need at least Visual Studio 2013 to compile userspace binaries. In -addition to that, if you want to compile the kernel module you will also need to -install Windows Driver Kit (WDK) 8.1 Update. - -It is important to get the Visual Studio related environment variables and to -have the $PATH inside the bash to point to the proper compiler and linker. One -easy way to achieve this is to get into the "Developer Command prompt for visual -studio" and through it enter into the bash shell available from msys. - -If after the above step, a 'which link' inside MSYS's bash says, -"/bin/link.exe", rename /bin/link.exe to something else so that the -Visual studio's linker is used. - -* For pthread support, install the library, dll and includes of pthreads-win32 -project from -ftp://sourceware.org/pub/pthreads-win32/prebuilt-dll-2-9-1-release to a -directory (e.g.: C:/pthread). - -* Get the Open vSwitch sources from either cloning the repo using git -or from a distribution tar ball. - -* If you pulled the sources directly from an Open vSwitch Git tree, - run boot.sh in the top source directory: - - % ./boot.sh - -* In the top source directory, configure the package by running the - configure script. You should provide some configure options to choose - the right compiler, linker, libraries, Open vSwitch component installation - directories, etc. For example, - - % ./configure CC=./build-aux/cccl LD="`which link`" LIBS="-lws2_32" \ - --prefix="C:/openvswitch/usr" --localstatedir="C:/openvswitch/var" \ - --sysconfdir="C:/openvswitch/etc" --with-pthread="C:/pthread" - - By default, the above enables compiler optimization for fast code. - For default compiler optimization, pass the "--with-debug" configure - option. 
- -* Run make for the ported executables in the top source directory, e.g.: - - % make - -* To run all the unit tests: - - % make check - -OpenSSL, Open vSwitch and Visual C++ ------------------------------------- -To get SSL support for Open vSwitch on Windows, do the following: - -* Install OpenSSL for Windows as suggested at -http://www.openssl.org/related/binaries.html. -The link as of this writing suggests to download it from -http://slproweb.com/products/Win32OpenSSL.html and the latest version is -"Win32 OpenSSL v1.0.1j". - -Note down the directory where OpenSSL is installed (e.g.: C:/OpenSSL-Win32). - -* While configuring the package, specify the OpenSSL directory path. -For example, - - % ./configure CC=./build-aux/cccl LD="`which link`" LIBS="-lws2_32" \ - --prefix="C:/openvswitch/usr" --localstatedir="C:/openvswitch/var" \ - --sysconfdir="C:/openvswitch/etc" --with-pthread="C:/pthread" --enable-ssl \ - --with-openssl="C:/OpenSSL-Win32" - -* Run make for the ported executables. - -Building the Kernel module --------------------------- -We directly use the Visual Studio 2013 IDE to compile the kernel module. You can -open the extensions.sln file in the IDE and build the solution. - -Installing the Kernel module ----------------------------- -Once you have built the solution, you can copy the following files to the -target Hyper-V machines: - - ./datapath-windows/x64/Win8.1Debug/package/ovsext.inf - ./datapath-windows/x64/Win8.1Debug/package/OVSExt.sys - ./datapath-windows/x64/Win8.1Debug/package/ovsext.cat - ./datapath-windows/misc/install.cmd - ./datapath-windows/misc/uninstall.cmd - -Steps to install the module ---------------------------- - -01> Run ./uninstall.cmd to remove the old extension. -02> Run ./install.cmd to insert the new one. For this to work you will have to -turn on TESTSIGNING boot option or 'Disable Driver Signature Enforcement' -during boot. 
-03> In the Virtual Switch Manager configuration you should now see "VMWare OVS -Extension" under 'Virtual Switch Extensions'. Click the check box to enable the -extension. - -Steps to run the user processes & configure VXLAN ports -------------------------------------------------------- - -01> Create the conf db file. -ovsdb\ovsdb-tool.exe create conf.db .\vswitchd\vswitch.ovsschema - -02> Run ovsdb-server -ovsdb\ovsdb-server.exe -v --remote=ptcp:6632:127.0.0.1 conf.db - -03> Create integration bridge & pif bridge -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-br br-int -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-br br-pif - -04> Dump the ports -utilities\ovs-dpctl.exe show - -* Sample output shows up like this. Currently it is not possible to figure out -* the mapping between VIF and VM from the output. - -$ utilities\ovs-dpctl.exe show -2014-06-27T01:55:32Z|00001|socket_util|ERR|4789:0.0.0.0: -socket: Either the application has not called WSAStartup, or WSAStartup failed. - <<< Ignore this error, it is harmless. 
-system@ovs-system: - lookups: hit:0 missed:0 lost:0 - flows: 0 - masks: hit:0 total:0 hit/pkt:0.00 - port 16777216: internal <<< VTEP created by AllowManagementOS - setting - port 16777225: external.1 <<< Physical NIC - port 16777288: vmNICEmu.1000048 <<< VIF #1 - port 16777289: vmNICSyn.1000049 <<< VIF #2 - - -05> Add the physical NIC and the internal port to br-pif -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-pif - -Eg: -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-pif external.1 -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-pif internal - -06> Add the VIFs to br-int -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-int - -Eg: -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-int vmNICEmu.1000048 -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-int vmNICSyn.1000049 - -07> Verify the status -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 show - -Eg: -$ utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 show -4cd86499-74df-48bd-a64d-8d115b12a9f2 - Bridge br-pif - Port internal - Interface internal - Port "external.1" - Interface "external.1" - Port br-pif - Interface br-pif - type: internal - Bridge br-int - Port br-int - Interface br-int - type: internal - Port "vmNICEmu.1000048" - Interface "vmNICEmu.1000048" - Port "vmNICSyn.1000049" - Interface "vmNICSyn.1000049" - - -09> Run vswitchd -vswitchd\ovs-vswitchd.exe -v tcp:127.0.0.1:6632 - -10> You can figure out the port name to MAC address mapping now. (optional) -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 list interface - -//********** VXLAN PORT CONFIGURATION (Supports Multiple ports) ************// -(Remove all patch ports added to create VLAN networks.) 
-11> Add the vxlan port between 172.168.201.101 <-> 172.168.201.102 -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-int vxlan-1 -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-1 type=vxlan -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-1 options:local_ip=172.168.201.101 -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-1 options:remote_ip=172.168.201.102 -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-1 options:in_key=flow -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-1 options:out_key=flow - -12> Add the vxlan port between 172.168.201.101 <-> 172.168.201.105 -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-int vxlan-2 -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-2 type=vxlan -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-2 options:local_ip=172.168.201.102 -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-2 options:remote_ip=172.168.201.105 -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-2 options:in_key=flow -utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-2 options:out_key=flow - - -//********** VLAN CONFIGURATION (Using patch ports) ************// -(Remove all VXLAN ports from the configuration.) 
-13> Add a patch port from br-int to br-pif -utilities/ovs-vsctl.exe -- add-port br-int patch-to-pif -utilities/ovs-vsctl.exe -- set interface patch-to-pif type=patch options:peer=patch-to-int - -14> Add a patch port from br-pif to br-int -utilities/ovs-vsctl.exe -- add-port br-pif patch-to-int -utilities/ovs-vsctl.exe -- set interface patch-to-int type=patch options:peer=patch-to-pif - -15> Re-Add the VIF ports with the VLAN tag -utilities\ovs-vsctl.exe add-port br-int vmNICEmu.1000048 tag=900 -utilities\ovs-vsctl.exe add-port br-int vmNICSyn.1000049 tag=900 - - -Requirements ------------- - -* We require that you don't disable the "Allow management operating system to -share this network adapter" under 'Virtual Switch Properties' > 'Connection -type: External network', in the HyperV virtual network switch configuration. - -* Checksum Offloads - While there is some support for checksum/segmentation offloads in software, -this is still a work in progress. Till the support is complete we recommend -disabling TX/RX offloads for both the VM's as well as the HyperV. - - -TODO ----- - -* OVS currently has no native support for atomics on Windows. Pthreads -are used as a fallback, but some features, such as OVS-RCU are really -slow without native atomics support. Atomics support for Windows has to -be brought in. - -* Investigate the working of sFlow on Windows and re-enable the unit tests. - -* Sign the driver & create an MSI for installing the different OpenvSwitch -components on windows. diff --git a/INSTALL.Windows.md b/INSTALL.Windows.md new file mode 100644 index 000000000..fd333bc0f --- /dev/null +++ b/INSTALL.Windows.md @@ -0,0 +1,263 @@ +How to Build the Kernel module & userspace daemons for Windows +============================================================== + +Autoconf, Automake and Visual C++: +--------------------------------- +Open vSwitch on Linux uses autoconf and automake for generating Makefiles. 
+It will be useful to maintain the same build system while compiling on Windows +too. One approach is to compile Open vSwitch in a MinGW environment that +contains autoconf and automake utilities and then use Visual C++ as a compiler +and linker. + +The following explains the steps in some detail. + +* Install MinGW on a Windows machine by following the instructions at: +http://www.mingw.org/wiki/Getting_Started + +This should install MinGW at C:\MinGW and MSYS at C:\MinGW\msys. +Add "C:\MinGW\bin" and "C:\MinGW\msys\1.0\bin" to the PATH environment variable +of Windows. + +You can either use the MinGW installer or the command line utility 'mingw-get' +to install both the base packages and additional packages like automake and +autoconf (version 2.68). + +Also make sure that the /mingw mount point exists. If it does not, please add +the following entry in /etc/fstab - 'C:/MinGW /mingw'. + +* Install the latest Python 2.x from python.org and verify that its path is +part of Windows' PATH environment variable. + +* You will need at least Visual Studio 2013 to compile userspace binaries. In +addition to that, if you want to compile the kernel module you will also need to +install Windows Driver Kit (WDK) 8.1 Update. + +It is important to get the Visual Studio related environment variables and to +have the $PATH inside the bash to point to the proper compiler and linker. One +easy way to achieve this is to get into the "Developer Command Prompt for Visual +Studio" and through it enter into the bash shell available from MSYS. + +If after the above step, a 'which link' inside MSYS's bash says, +"/bin/link.exe", rename /bin/link.exe to something else so that the +Visual Studio's linker is used. + +* For pthread support, install the library, DLL and includes of pthreads-win32 +project from +ftp://sourceware.org/pub/pthreads-win32/prebuilt-dll-2-9-1-release to a +directory (e.g.: C:/pthread).
+ +* Get the Open vSwitch sources from either cloning the repo using git +or from a distribution tar ball. + +* If you pulled the sources directly from an Open vSwitch Git tree, + run boot.sh in the top source directory: + + % ./boot.sh + +* In the top source directory, configure the package by running the + configure script. You should provide some configure options to choose + the right compiler, linker, libraries, Open vSwitch component installation + directories, etc. For example, + + % ./configure CC=./build-aux/cccl LD="`which link`" LIBS="-lws2_32" \ + --prefix="C:/openvswitch/usr" --localstatedir="C:/openvswitch/var" \ + --sysconfdir="C:/openvswitch/etc" --with-pthread="C:/pthread" + + By default, the above enables compiler optimization for fast code. + For default compiler optimization, pass the "--with-debug" configure + option. + +* Run make for the ported executables in the top source directory, e.g.: + + % make + +* To run all the unit tests: + + % make check + +OpenSSL, Open vSwitch and Visual C++ +------------------------------------ +To get SSL support for Open vSwitch on Windows, do the following: + +* Install OpenSSL for Windows as suggested at +http://www.openssl.org/related/binaries.html. +The link as of this writing suggests to download it from +http://slproweb.com/products/Win32OpenSSL.html and the latest version is +"Win32 OpenSSL v1.0.1j". + +Note down the directory where OpenSSL is installed (e.g.: C:/OpenSSL-Win32). + +* While configuring the package, specify the OpenSSL directory path. +For example, + + % ./configure CC=./build-aux/cccl LD="`which link`" LIBS="-lws2_32" \ + --prefix="C:/openvswitch/usr" --localstatedir="C:/openvswitch/var" \ + --sysconfdir="C:/openvswitch/etc" --with-pthread="C:/pthread" \ + --enable-ssl --with-openssl="C:/OpenSSL-Win32" + +* Run make for the ported executables. + +Building the Kernel module +-------------------------- +We directly use the Visual Studio 2013 IDE to compile the kernel module. 
You can +open the extensions.sln file in the IDE and build the solution. + +Installing the Kernel module +---------------------------- +Once you have built the solution, you can copy the following files to the +target Hyper-V machines: + + ./datapath-windows/x64/Win8.1Debug/package/ovsext.inf + ./datapath-windows/x64/Win8.1Debug/package/OVSExt.sys + ./datapath-windows/x64/Win8.1Debug/package/ovsext.cat + ./datapath-windows/misc/install.cmd + ./datapath-windows/misc/uninstall.cmd + +Steps to install the module +--------------------------- + +01> Run ./uninstall.cmd to remove the old extension. +02> Run ./install.cmd to insert the new one. For this to work you will have to +turn on TESTSIGNING boot option or 'Disable Driver Signature Enforcement' +during boot. +03> In the Virtual Switch Manager configuration you should now see "VMWare OVS +Extension" under 'Virtual Switch Extensions'. Click the check box to enable the +extension. + +Steps to run the user processes & configure VXLAN ports +------------------------------------------------------- + +01> Create the conf db file. +ovsdb\ovsdb-tool.exe create conf.db .\vswitchd\vswitch.ovsschema + +02> Run ovsdb-server +ovsdb\ovsdb-server.exe -v --remote=ptcp:6632:127.0.0.1 conf.db + +03> Create integration bridge & pif bridge +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-br br-int +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-br br-pif + +04> Dump the ports +utilities\ovs-dpctl.exe show + +* Sample output shows up like this. Currently it is not possible to figure out +* the mapping between VIF and VM from the output. + +$ utilities\ovs-dpctl.exe show +2014-06-27T01:55:32Z|00001|socket_util|ERR|4789:0.0.0.0: +socket: Either the application has not called WSAStartup, or WSAStartup failed. + <<< Ignore this error, it is harmless. 
+system@ovs-system: + lookups: hit:0 missed:0 lost:0 + flows: 0 + masks: hit:0 total:0 hit/pkt:0.00 + port 16777216: internal <<< VTEP created by AllowManagementOS + setting + port 16777225: external.1 <<< Physical NIC + port 16777288: vmNICEmu.1000048 <<< VIF #1 + port 16777289: vmNICSyn.1000049 <<< VIF #2 + + +05> Add the physical NIC and the internal port to br-pif +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-pif + +Eg: +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-pif external.1 +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-pif internal + +06> Add the VIFs to br-int +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-int + +Eg: +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-int vmNICEmu.1000048 +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-int vmNICSyn.1000049 + +07> Verify the status +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 show + +Eg: +$ utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 show +4cd86499-74df-48bd-a64d-8d115b12a9f2 + Bridge br-pif + Port internal + Interface internal + Port "external.1" + Interface "external.1" + Port br-pif + Interface br-pif + type: internal + Bridge br-int + Port br-int + Interface br-int + type: internal + Port "vmNICEmu.1000048" + Interface "vmNICEmu.1000048" + Port "vmNICSyn.1000049" + Interface "vmNICSyn.1000049" + + +09> Run vswitchd +vswitchd\ovs-vswitchd.exe -v tcp:127.0.0.1:6632 + +10> You can figure out the port name to MAC address mapping now. (optional) +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 list interface + +//********** VXLAN PORT CONFIGURATION (Supports Multiple ports) ************// +(Remove all patch ports added to create VLAN networks.) 
+11> Add the vxlan port between 172.168.201.101 <-> 172.168.201.102 +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-int vxlan-1 +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-1 type=vxlan +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-1 options:local_ip=172.168.201.101 +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-1 options:remote_ip=172.168.201.102 +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-1 options:in_key=flow +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-1 options:out_key=flow + +12> Add the vxlan port between 172.168.201.101 <-> 172.168.201.105 +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 add-port br-int vxlan-2 +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-2 type=vxlan +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-2 options:local_ip=172.168.201.102 +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-2 options:remote_ip=172.168.201.105 +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-2 options:in_key=flow +utilities\ovs-vsctl.exe --db=tcp:127.0.0.1:6632 set Interface vxlan-2 options:out_key=flow + + +//********** VLAN CONFIGURATION (Using patch ports) ************// +(Remove all VXLAN ports from the configuration.) 
+13> Add a patch port from br-int to br-pif +utilities/ovs-vsctl.exe -- add-port br-int patch-to-pif +utilities/ovs-vsctl.exe -- set interface patch-to-pif type=patch options:peer=patch-to-int + +14> Add a patch port from br-pif to br-int +utilities/ovs-vsctl.exe -- add-port br-pif patch-to-int +utilities/ovs-vsctl.exe -- set interface patch-to-int type=patch options:peer=patch-to-pif + +15> Re-add the VIF ports with the VLAN tag +utilities\ovs-vsctl.exe add-port br-int vmNICEmu.1000048 tag=900 +utilities\ovs-vsctl.exe add-port br-int vmNICSyn.1000049 tag=900 + + +Requirements +------------ + +* We require that you don't disable the "Allow management operating system to +share this network adapter" under 'Virtual Switch Properties' > 'Connection +type: External network', in the Hyper-V virtual network switch configuration. + +* Checksum Offloads + While there is some support for checksum/segmentation offloads in software, +this is still a work in progress. Until the support is complete we recommend +disabling TX/RX offloads on both the VMs and the Hyper-V host. + + +TODO +---- + +* OVS currently has no native support for atomics on Windows. Pthreads +are used as a fallback, but some features, such as OVS-RCU, are very +slow without native atomics support. Atomics support for Windows has to +be brought in. + +* Investigate the working of sFlow on Windows and re-enable the unit tests. + +* Sign the driver & create an MSI for installing the different Open vSwitch +components on Windows. diff --git a/INSTALL.XenServer b/INSTALL.XenServer deleted file mode 100644 index 8c07d24f9..000000000 --- a/INSTALL.XenServer +++ /dev/null @@ -1,188 +0,0 @@ - How to Install Open vSwitch on Citrix XenServer - =============================================== - -This document describes how to build and install Open vSwitch on a -Citrix XenServer host. If you want to install Open vSwitch on a -generic Linux or BSD host, see INSTALL instead.
- -These instructions have been tested with XenServer 5.6 FP1. - -Building Open vSwitch for XenServer ------------------------------------ - -You may build from an Open vSwitch distribution tarball or from an -Open vSwitch Git tree. The recommended build environment to build -RPMs for Citrix XenServer is the DDK VM available from Citrix. - -1. If you are building from an Open vSwitch Git tree, then you will - need to first create a distribution tarball by running "./boot.sh; - ./configure; make dist" in the Git tree. You cannot run this in - the DDK VM, because it lacks tools that are necessary to bootstrap - the Open vSwitch distribution. Instead, you must run this on a - machine that has the tools listed in INSTALL as prerequisites for - building from a Git tree. - -2. Copy the distribution tarball into /usr/src/redhat/SOURCES inside - the DDK VM. - -3. In the DDK VM, unpack the distribution tarball into a temporary - directory and "cd" into the root of the distribution tarball. - -4. To build Open vSwitch userspace, run: - - rpmbuild -bb xenserver/openvswitch-xen.spec - - This produces three RPMs in /usr/src/redhat/RPMS/i386: - "openvswitch", "openvswitch-modules-xen", and - "openvswitch-debuginfo". - - The above command automatically runs the Open vSwitch unit tests. - To disable the unit tests, run: - - rpmbuild -bb --without check xenserver/openvswitch-xen.spec - -Build Parameters ----------------- - -openvswitch-xen.spec needs to know a number of pieces of information -about the XenServer kernel. Usually, it can figure these out for -itself, but if it does not do it correctly then you can specify them -yourself as parameters to the build. 
Thus, the final "rpmbuild" step -above can be elaborated as: - - VERSION= - KERNEL_NAME= - KERNEL_VERSION= - KERNEL_FLAVOR= - rpmbuild \ - -D "openvswitch_version $VERSION" \ - -D "kernel_name $KERNEL_NAME" \ - -D "kernel_version $KERNEL_VERSION" \ - -D "kernel_flavor $KERNEL_FLAVOR" \ - -bb xenserver/openvswitch-xen.spec - -where: - - is the version number that appears in the - name of the Open vSwitch tarball, e.g. 0.90.0. - - is the name of the XenServer kernel package, - e.g. kernel-xen or kernel-NAME-xen, without the "kernel-" prefix. - - is the output of: - rpm -q --queryformat "%{Version}-%{Release}" , - e.g. 2.6.32.12-0.7.1.xs5.6.100.323.170596, where is - the name of the -devel package corresponding to . - - is either "xen" or "kdump". - The "xen" flavor is the main running kernel flavor and the "kdump" flavor is - the crashdump kernel flavor. Commonly, one would specify "xen" here. - -Installing Open vSwitch for XenServer -------------------------------------- - -To install Open vSwitch on a XenServer host, or to upgrade to a newer version, -copy the "openvswitch" and "openvswitch-modules-xen" RPMs to that host with -"scp", then install them with "rpm -U", e.g.: - - scp openvswitch-$VERSION-1.i386.rpm \ - openvswitch-modules-xen-$XEN_KERNEL_VERSION-$VERSION-1.i386.rpm \ - root@: -(At this point you will have to enter 's root password.) - ssh root@ -(At this point you will have to enter 's root password again.) - rpm -U openvswitch-$VERSION-1.i386.rpm \ - openvswitch-modules-xen-$XEN_KERNEL_VERSION-$VERSION-1.i386.rpm - -To uninstall Open vSwitch from a XenServer host, remove the packages: - - ssh root@ -(At this point you will have to enter 's root password again.) - rpm -e openvswitch openvswitch-modules-xen-$XEN_KERNEL_VERSION - -After installing or uninstalling Open vSwitch, the XenServer should be -rebooted as soon as possible. 
- -Open vSwitch Boot Sequence on XenServer ---------------------------------------- - -When Open vSwitch is installed on XenServer, its startup script -/etc/init.d/openvswitch runs early in boot. It does roughly the -following: - - * Loads the OVS kernel module, openvswitch. - - * Starts ovsdb-server, the OVS configuration database. - - * XenServer expects there to be no bridges configured at - startup, but the OVS configuration database likely still has - bridges configured from before reboot. To match XenServer - expectations, the startup script deletes all configured - bridges from the database. - - * Starts ovs-vswitchd, the OVS switching daemon. - -At this point in the boot process, then, there are no Open vSwitch -bridges, even though all of the Open vSwitch daemons are running. -Later on in boot, /etc/init.d/management-interface (part of XenServer, -not Open vSwitch) creates the bridge for the XAPI management interface -by invoking /opt/xensource/libexec/interface-reconfigure. Normally -this program consults XAPI's database to obtain information about how -to configure the bridge, but XAPI is not running yet[*] so it instead -consults /var/xapi/network.dbcache, which is a cached copy of the most -recent network configuration. - -[*] Even if XAPI were running, if this XenServer node is a pool slave - then the query would have to consult the master, which requires - network access, which begs the question of how to configure the - management interface. - -XAPI starts later on in the boot process. XAPI can then create other -bridges on demand using /opt/xensource/libexec/interface-reconfigure. -Now that XAPI is running, that program consults XAPI directly instead -of reading the cache. - -As part of its own startup, XAPI invokes the Open vSwitch XAPI plugin -script /etc/xapi.d/openvswitch-cfg-update passing the "update" -command. 
The plugin script does roughly the following: - - * Calls /opt/xensource/libexec/interface-reconfigure with the - "rewrite" command, to ensure that the network cache is - up-to-date. - - * Queries the Open vSwitch manager setting (named - "vswitch_controller") from the XAPI database for the - XenServer pool. - - * If XAPI and OVS are configured for different managers, or if - OVS is configured for a manager but XAPI is not, runs - "ovs-vsctl emer-reset" to bring the Open vSwitch - configuration to a known state. One effect of emer-reset is - to deconfigure any manager from the OVS database. - - * If XAPI is configured for a manager, configures the OVS - manager to match with "ovs-vsctl set-manager". - -Notes ------ - -* The Open vSwitch boot sequence only configures an OVS configuration -database manager. There is no way to directly configure an OpenFlow -controller on XenServer and, as a consequence of the step above that -deletes all of the bridges at boot time, controller configuration only -persists until XenServer reboot. The configuration database manager -can, however, configure controllers for bridges. See the BUGS section -of ovs-testcontroller(8) for more information on this topic. - -* The Open vSwitch startup script automatically adds a firewall rule -to allow GRE traffic. This rule is needed for the XenServer feature -called "Cross-Host Internal Networks" (CHIN) that uses GRE. If a user -configures tunnels other than GRE (ex: Geneve, VXLAN, LISP), they will have -to either manually add a iptables firewall rule to allow the tunnel traffic -or add it through a startup script (Please refer to the "enable-protocol" -command in the ovs-ctl(8) manpage). - -Reporting Bugs --------------- - -Please report problems to bugs@openvswitch.org. 
diff --git a/INSTALL.XenServer.md b/INSTALL.XenServer.md new file mode 100644 index 000000000..e4408ddf8 --- /dev/null +++ b/INSTALL.XenServer.md @@ -0,0 +1,192 @@ +How to Install Open vSwitch on Citrix XenServer +=============================================== + +This document describes how to build and install Open vSwitch on a +Citrix XenServer host. If you want to install Open vSwitch on a +generic Linux or BSD host, see [INSTALL](INSTALL.md) instead. + +These instructions have been tested with XenServer 5.6 FP1. + +Building Open vSwitch for XenServer +----------------------------------- + +You may build from an Open vSwitch distribution tarball or from an +Open vSwitch Git tree. The recommended build environment to build +RPMs for Citrix XenServer is the DDK VM available from Citrix. + +1. If you are building from an Open vSwitch Git tree, then you will + need to first create a distribution tarball by running `./boot.sh; + ./configure; make dist` in the Git tree. You cannot run this in + the DDK VM, because it lacks tools that are necessary to bootstrap + the Open vSwitch distribution. Instead, you must run this on a + machine that has the tools listed in [INSTALL](INSTALL.md) as + prerequisites for building from a Git tree. + +2. Copy the distribution tarball into /usr/src/redhat/SOURCES inside + the DDK VM. + +3. In the DDK VM, unpack the distribution tarball into a temporary + directory and "cd" into the root of the distribution tarball. + +4. To build Open vSwitch userspace, run: + + `rpmbuild -bb xenserver/openvswitch-xen.spec` + + This produces three RPMs in /usr/src/redhat/RPMS/i386: + "openvswitch", "openvswitch-modules-xen", and + "openvswitch-debuginfo". + + The above command automatically runs the Open vSwitch unit tests. 
+ To disable the unit tests, run: + + `rpmbuild -bb --without check xenserver/openvswitch-xen.spec` + +Build Parameters +---------------- + +openvswitch-xen.spec needs to know a number of pieces of information +about the XenServer kernel. Usually, it can figure these out for +itself, but if it does not do it correctly then you can specify them +yourself as parameters to the build. Thus, the final "rpmbuild" step +above can be elaborated as: + + ``` + VERSION= + KERNEL_NAME= + KERNEL_VERSION= + KERNEL_FLAVOR= + rpmbuild \ + -D "openvswitch_version $VERSION" \ + -D "kernel_name $KERNEL_NAME" \ + -D "kernel_version $KERNEL_VERSION" \ + -D "kernel_flavor $KERNEL_FLAVOR" \ + -bb xenserver/openvswitch-xen.spec + ``` + +where: + + `` is the version number that appears in the + name of the Open vSwitch tarball, e.g. 0.90.0. + + `` is the name of the XenServer kernel package, + e.g. kernel-xen or kernel-NAME-xen, without the "kernel-" prefix. + + `` is the output of: + rpm -q --queryformat "%{Version}-%{Release}" , + e.g. 2.6.32.12-0.7.1.xs5.6.100.323.170596, where is + the name of the -devel package corresponding to . + + `` is either "xen" or "kdump". + The "xen" flavor is the main running kernel flavor and the "kdump" flavor is + the crashdump kernel flavor. Commonly, one would specify "xen" here. + +Installing Open vSwitch for XenServer +------------------------------------- + +To install Open vSwitch on a XenServer host, or to upgrade to a newer version, +copy the "openvswitch" and "openvswitch-modules-xen" RPMs to that host with +"scp", then install them with "rpm -U", e.g.: + + ``` + scp openvswitch-$VERSION-1.i386.rpm \ + openvswitch-modules-xen-$XEN_KERNEL_VERSION-$VERSION-1.i386.rpm \ + root@: +(At this point you will have to enter 's root password.) + ssh root@ +(At this point you will have to enter 's root password again.) 
+ rpm -U openvswitch-$VERSION-1.i386.rpm \ + openvswitch-modules-xen-$XEN_KERNEL_VERSION-$VERSION-1.i386.rpm + ``` + +To uninstall Open vSwitch from a XenServer host, remove the packages: + + `ssh root@` +(At this point you will have to enter 's root password again.) + `rpm -e openvswitch openvswitch-modules-xen-$XEN_KERNEL_VERSION` + +After installing or uninstalling Open vSwitch, the XenServer should be +rebooted as soon as possible. + +Open vSwitch Boot Sequence on XenServer +--------------------------------------- + +When Open vSwitch is installed on XenServer, its startup script +/etc/init.d/openvswitch runs early in boot. It does roughly the +following: + + * Loads the OVS kernel module, openvswitch. + + * Starts ovsdb-server, the OVS configuration database. + + * XenServer expects there to be no bridges configured at + startup, but the OVS configuration database likely still has + bridges configured from before reboot. To match XenServer + expectations, the startup script deletes all configured + bridges from the database. + + * Starts ovs-vswitchd, the OVS switching daemon. + +At this point in the boot process, then, there are no Open vSwitch +bridges, even though all of the Open vSwitch daemons are running. +Later on in boot, /etc/init.d/management-interface (part of XenServer, +not Open vSwitch) creates the bridge for the XAPI management interface +by invoking /opt/xensource/libexec/interface-reconfigure. Normally +this program consults XAPI's database to obtain information about how +to configure the bridge, but XAPI is not running yet[*] so it instead +consults /var/xapi/network.dbcache, which is a cached copy of the most +recent network configuration. + +[*] Even if XAPI were running, if this XenServer node is a pool slave + then the query would have to consult the master, which requires + network access, which begs the question of how to configure the + management interface. + +XAPI starts later on in the boot process. 
XAPI can then create other +bridges on demand using /opt/xensource/libexec/interface-reconfigure. +Now that XAPI is running, that program consults XAPI directly instead +of reading the cache. + +As part of its own startup, XAPI invokes the Open vSwitch XAPI plugin +script /etc/xapi.d/openvswitch-cfg-update passing the "update" +command. The plugin script does roughly the following: + + * Calls /opt/xensource/libexec/interface-reconfigure with the + "rewrite" command, to ensure that the network cache is + up-to-date. + + * Queries the Open vSwitch manager setting (named + "vswitch_controller") from the XAPI database for the + XenServer pool. + + * If XAPI and OVS are configured for different managers, or if + OVS is configured for a manager but XAPI is not, runs + "ovs-vsctl emer-reset" to bring the Open vSwitch + configuration to a known state. One effect of emer-reset is + to deconfigure any manager from the OVS database. + + * If XAPI is configured for a manager, configures the OVS + manager to match with "ovs-vsctl set-manager". + +Notes +----- + +* The Open vSwitch boot sequence only configures an OVS configuration +database manager. There is no way to directly configure an OpenFlow +controller on XenServer and, as a consequence of the step above that +deletes all of the bridges at boot time, controller configuration only +persists until XenServer reboot. The configuration database manager +can, however, configure controllers for bridges. See the BUGS section +of ovs-testcontroller(8) for more information on this topic. + +* The Open vSwitch startup script automatically adds a firewall rule +to allow GRE traffic. This rule is needed for the XenServer feature +called "Cross-Host Internal Networks" (CHIN) that uses GRE. 
If a user +configures tunnels other than GRE (e.g. Geneve, VXLAN, LISP), they will have +to either manually add an iptables firewall rule to allow the tunnel traffic +or add it through a startup script (please refer to the "enable-protocol" +command in the ovs-ctl(8) manpage). + +Reporting Bugs +-------------- + +Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.md b/INSTALL.md new file mode 100644 index 000000000..536cbfbc4 --- /dev/null +++ b/INSTALL.md @@ -0,0 +1,606 @@ +How to Install Open vSwitch on Linux, FreeBSD and NetBSD +======================================================== + +This document describes how to build and install Open vSwitch on a +generic Linux, FreeBSD, or NetBSD host. For details on installing +on a specific platform, please see one of these files: + + - [INSTALL.Debian](INSTALL.Debian.md) + - [INSTALL.Fedora](INSTALL.Fedora.md) + - [INSTALL.RHEL](INSTALL.RHEL.md) + - [INSTALL.XenServer](INSTALL.XenServer.md) + - [INSTALL.NetBSD](INSTALL.NetBSD.md) + - [INSTALL.DPDK](INSTALL.DPDK.md) + +Build Requirements +------------------ + +To compile the userspace programs in the Open vSwitch distribution, +you will need the following software: + + - GNU make. + + - A C compiler, such as: + + * GCC 4.x. + + * Clang. Clang 3.4 and later provide useful static semantic + analysis and thread-safety checks. For Ubuntu, there are + nightly built packages available on clang's website. + + While OVS may be compatible with other compilers, optimal + support for atomic operations may be missing, making OVS very + slow (see lib/ovs-atomic.h). + + - libssl, from OpenSSL, is optional but recommended if you plan to + connect the Open vSwitch to an OpenFlow controller. libssl is + required to establish confidentiality and authenticity in the + connections from an Open vSwitch to an OpenFlow controller. If + libssl is installed, then Open vSwitch will automatically build + with support for it. + + - Python 2.x, for x >= 4.
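As a rough illustration of the userspace build requirements listed above, the following sketch checks whether the named tools are on PATH. It is not part of the Open vSwitch tree. The helper name `check_tool` is hypothetical, `cc` is assumed to stand in for GCC or Clang, and the command names `make` and `python` are assumptions (on FreeBSD and NetBSD, GNU make is often installed as `gmake`). It does not verify versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open vSwitch): report whether a tool
# is available on PATH. Returns non-zero when the tool is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools from the list above. "cc" is an assumed stand-in for GCC or Clang;
# adjust the names for your platform (for example gmake instead of make).
missing=0
for tool in make cc python; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing required tool(s) missing"
```

libssl is optional, so the sketch leaves it out; the configure script detects it on its own.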
+ +On Linux, you may choose to compile the kernel module that comes with +the Open vSwitch distribution or to use the kernel module built into +the Linux kernel (version 3.3 or later). See the FAQ question "What +features are not available in the Open vSwitch kernel datapath that +ships as part of the upstream Linux kernel?" for more information on +this trade-off. You may also use the userspace-only implementation, +at some cost in features and performance (see INSTALL.userspace for +details). To compile the kernel module on Linux, you must also +install the following: + + - A supported Linux kernel version. Please refer to README.md for a + list of supported versions. + + The Open vSwitch datapath requires bridging support + (CONFIG_BRIDGE) to be built as a kernel module. (This is common + in kernels provided by Linux distributions.) The bridge module + must not be loaded or in use. If the bridge module is running + (check with "lsmod | grep bridge"), you must remove it ("rmmod + bridge") before starting the datapath. + + For optional support of ingress policing, you must enable kernel + configuration options NET_CLS_BASIC, NET_SCH_INGRESS, and + NET_ACT_POLICE, either built-in or as modules. (NET_CLS_POLICE is + obsolete and not needed.) + + To use GRE tunneling on Linux 2.6.37 or newer, kernel support + for GRE demultiplexing (CONFIG_NET_IPGRE_DEMUX) must be compiled + in or available as a module. Also, on kernels before 3.11, the + ip_gre module, for GRE tunnels over IP (NET_IPGRE), must not be + loaded or compiled in. + + To configure HTB or HFSC quality of service with Open vSwitch, + you must enable the respective configuration options. + + To use Open vSwitch support for TAP devices, you must enable + CONFIG_TUN. + + - To build a kernel module, you need the same version of GCC that + was used to build that kernel. + + - A kernel build directory corresponding to the Linux kernel image + the module is to run on. 
Under Debian and Ubuntu, for example, + each linux-image package containing a kernel binary has a + corresponding linux-headers package with the required build + infrastructure. + +If you are working from a Git tree or snapshot (instead of from a +distribution tarball), or if you modify the Open vSwitch build system +or the database schema, you will also need the following software: + + - Autoconf version 2.63 or later. + + - Automake version 1.10 or later. + + - libtool version 2.4 or later. (Older versions might work too.) + +To run the unit tests, you also need: + + - Perl. Version 5.10.1 is known to work. Earlier versions should + also work. + +The ovs-vswitchd.conf.db(5) manpage will include an E-R diagram, in +formats other than plain text, only if you have the following: + + - "dot" from graphviz (http://www.graphviz.org/). + + - Perl. Version 5.10.1 is known to work. Earlier versions should + also work. + + - Python 2.x, for x >= 4. + +If you are going to extensively modify Open vSwitch, please consider +installing the following to obtain better warnings: + + - "sparse" version 0.4.4 or later + (http://www.kernel.org/pub/software/devel/sparse/dist/). + + - GNU make. + + - clang, version 3.4 or later + +Also, you may find the ovs-dev script found in utilities/ovs-dev.py useful. + +Installation Requirements +------------------------- + +The machine on which Open vSwitch is to be installed must have the +following software: + + - libc compatible with the libc used for build. + + - libssl compatible with the libssl used for build, if OpenSSL was + used for the build. + + - On Linux, the same kernel version configured as part of the build. + + - For optional support of ingress policing on Linux, the "tc" program + from iproute2 (part of all major distributions and available at + http://www.linux-foundation.org/en/Net:Iproute2). + +On Linux you should ensure that /dev/urandom exists. To support TAP +devices, you must also ensure that /dev/net/tun exists. 
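
The runtime prerequisites listed under "Installation Requirements" can be checked before installing. The following is only a minimal sketch, not part of Open vSwitch: the helper names `missing_paths` and `have_program` are hypothetical and exist only for this illustration. It assumes a POSIX shell and checks just the device nodes and the "tc" program named above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Open vSwitch): print every path
# argument that does not exist, one per line.
missing_paths() {
    for p in "$@"; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}

# Hypothetical helper: succeed if the named program is found on $PATH.
have_program() {
    command -v "$1" >/dev/null 2>&1
}

# Device nodes named in the text; /dev/net/tun matters only for TAP devices.
missing=$(missing_paths /dev/urandom /dev/net/tun)
[ -z "$missing" ] || echo "missing: $missing"

# "tc" (from iproute2) is needed only for optional ingress policing.
have_program tc || echo "tc not found; ingress policing will be unavailable"
```

The script only reports what is missing; it does not create device nodes or load modules, so it is safe to run without supervisor privileges.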
+ +Building and Installing Open vSwitch for Linux, FreeBSD or NetBSD +================================================================= + +Once you have installed all the prerequisites listed above in the "Build +Requirements" section, follow the procedure below to build. + +1. If you pulled the sources directly from an Open vSwitch Git tree, + run boot.sh in the top source directory: + + `% ./boot.sh` + +2. Configure the package by running the configure script. You can + usually invoke configure without any arguments. For example: + + `% ./configure` + + By default all files are installed under /usr/local. If you want + to install into, e.g., /usr and /var instead of /usr/local and + /usr/local/var, add options as shown here: + + `% ./configure --prefix=/usr --localstatedir=/var` + + To use a specific C compiler for compiling Open vSwitch user + programs, also specify it on the configure command line, like so: + + `% ./configure CC=gcc-4.2` + + To use the 'clang' compiler: + + `% ./configure CC=clang` + + To build the Linux kernel module, so that you can run the + kernel-based switch, pass the location of the kernel build + directory on --with-linux. For example, to build for a running + instance of Linux: + + `% ./configure --with-linux=/lib/modules/$(uname -r)/build` + + If --with-linux requests building for an unsupported version of + Linux, then "configure" will fail with an error message. Please + refer to the FAQ for advice in that case. + + If you wish to build the kernel module for an architecture other + than the architecture of the machine used for the build, you may + specify the kernel architecture string using the KARCH variable + when invoking the configure script. For example, to build for MIPS + with Linux: + + `% ./configure --with-linux=/path/to/linux KARCH=mips` + + If you plan to do much Open vSwitch development, you might want to + add --enable-Werror, which adds the -Werror option to the compiler + command line, turning warnings into errors.
That makes it + impossible to miss warnings generated by the build. + + To build with gcov code coverage support, add --enable-coverage, + e.g.: + + `% ./configure --enable-coverage` + + The configure script accepts a number of other options and honors + additional environment variables. For a full list, invoke + configure with the --help option. + + You can also run configure from a separate build directory. This + is helpful if you want to build Open vSwitch in more than one way + from a single source directory, e.g. to try out both GCC and Clang + builds, or to build kernel modules for more than one Linux version. + Here is an example: + + `% mkdir _gcc && (cd _gcc && ../configure CC=gcc)` + `% mkdir _clang && (cd _clang && ../configure CC=clang)` + +3. Run GNU make in the build directory, e.g.: + + `% make` + + or if GNU make is installed as "gmake": + + `% gmake` + + If you used a separate build directory, run make or gmake from that + directory, e.g.: + + `% make -C _gcc` + `% make -C _clang` + + For improved warnings if you installed "sparse" (see + "Prerequisites"), add C=1 to the command line. + +4. Consider running the testsuite. Refer to "Running the Testsuite" + below, for instructions. + +5. Become root by running "su" or another program. + +6. Run "make install" to install the executables and manpages into the + running system, by default under /usr/local. + +7. If you built kernel modules, you may install and load them, e.g.: + + `% make modules_install` + `% /sbin/modprobe openvswitch` + + To verify that the modules have been loaded, run "/sbin/lsmod" and + check that openvswitch is listed. + + If the `modprobe` operation fails, look at the last few kernel log + messages (e.g. with `dmesg | tail`): + + - The message "openvswitch: exports duplicate symbol + br_should_route_hook (owned by bridge)" means that the bridge + module is loaded. Run `/sbin/rmmod bridge` to remove it. 
+ + If `/sbin/rmmod bridge` fails with "ERROR: Module bridge does + not exist in /proc/modules", then the bridge is compiled into + the kernel, rather than as a module. Open vSwitch does not + support this configuration (see "Build Requirements", above). + + - The message "openvswitch: exports duplicate symbol + dp_ioctl_hook (owned by ofdatapath)" means that the ofdatapath + module from the OpenFlow reference implementation is loaded. + Run `/sbin/rmmod ofdatapath` to remove it. (You might have to + delete any existing datapaths beforehand, using the "dpctl" + program included with the OpenFlow reference implementation. + "ovs-dpctl" will not work.) + + - Otherwise, the most likely problem is that Open vSwitch was + built for a kernel different from the one into which you are + trying to load it. Run `modinfo` on openvswitch.ko and on + a module built for the running kernel, e.g.: + + ``` + % /sbin/modinfo openvswitch.ko + % /sbin/modinfo /lib/modules/`uname -r`/kernel/net/bridge/bridge.ko + ``` + + Compare the "vermagic" lines output by the two commands. If + they differ, then Open vSwitch was built for the wrong kernel. + + - If you decide to report a bug or ask a question related to + module loading, please include the output from the `dmesg` and + `modinfo` commands mentioned above. + + There is an optional module parameter to openvswitch.ko called + vlan_tso that enables TCP segmentation offload over VLANs on NICs + that support it. Many drivers do not expose support for TSO on VLANs + in a way that Open vSwitch can use but there is no way to detect + whether this is the case. If you know that your particular driver can + handle it (for example by testing sending large TCP packets over VLANs) + then passing in a value of 1 may improve performance. Modules built for + Linux kernels 2.6.37 and later, as well as specially patched versions + of earlier kernels, do not need this and do not have this parameter. 
If + you do not understand what this means or do not know if your driver + will work, do not set this. + +8. Initialize the configuration database using ovsdb-tool, e.g.: + + `% mkdir -p /usr/local/etc/openvswitch` + `% ovsdb-tool create /usr/local/etc/openvswitch/conf.db vswitchd/vswitch.ovsschema` + +Startup +======= + +Before starting ovs-vswitchd itself, you need to start its +configuration database, ovsdb-server. Each machine on which Open +vSwitch is installed should run its own copy of ovsdb-server. +Configure it to use the database you created during installation (as +explained above), to listen on a Unix domain socket, to connect to any +managers specified in the database itself, and to use the SSL +configuration in the database: + + % ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \ + --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ + --private-key=db:Open_vSwitch,SSL,private_key \ + --certificate=db:Open_vSwitch,SSL,certificate \ + --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert \ + --pidfile --detach + +(If you built Open vSwitch without SSL support, then omit +--private-key, --certificate, and --bootstrap-ca-cert.) + +Then initialize the database using ovs-vsctl. This is only +necessary the first time after you create the database with +ovsdb-tool (but running it at any time is harmless): + + % ovs-vsctl --no-wait init + +Then start the main Open vSwitch daemon, telling it to connect to the +same Unix domain socket: + + % ovs-vswitchd --pidfile --detach + +Now you may use ovs-vsctl to set up bridges and other Open vSwitch +features. For example, to create a bridge named br0 and add ports +eth0 and vif1.0 to it: + + % ovs-vsctl add-br br0 + % ovs-vsctl add-port br0 eth0 + % ovs-vsctl add-port br0 vif1.0 + +Please refer to ovs-vsctl(8) for more details. + +Upgrading +========= + +When you upgrade Open vSwitch from one version to another, you should +also upgrade the database schema: + +1. 
Stop the Open vSwitch daemons, e.g.: + + ``` + % kill `cd /usr/local/var/run/openvswitch && cat ovsdb-server.pid ovs-vswitchd.pid` + ``` + +2. Install the new Open vSwitch release. + +3. Upgrade the database, in one of the following two ways: + + - If there is no important data in your database, then you may + delete the database file and recreate it with ovsdb-tool, + following the instructions under "Building and Installing Open + vSwitch for Linux, FreeBSD or NetBSD". + + - If you want to preserve the contents of your database, back it + up first, then use "ovsdb-tool convert" to upgrade it, e.g.: + + `% ovsdb-tool convert /usr/local/etc/openvswitch/conf.db vswitchd/vswitch.ovsschema` + +4. Start the Open vSwitch daemons as described under "Startup" + above. + +Hot Upgrading +============= +Upgrading Open vSwitch from one version to the next with minimum +disruption of traffic going through the system that is using that Open vSwitch +requires some care: + +1. If the upgrade only involves upgrading the userspace utilities and daemons +of Open vSwitch, make sure that the new userspace version is compatible with +the previously loaded kernel module. + +2. An upgrade of userspace daemons means that they have to be restarted. +Restarting the daemons means that the OpenFlow flows in the ovs-vswitchd daemon +will be lost. One way to restore the flows is to let the controller +re-populate them. Another way is to save the previous flows using a utility +like ovs-ofctl and then re-add them after the restart. Restoring the old flows +is accurate only if the new Open vSwitch interfaces retain the old 'ofport' +values. + +3. When the new userspace daemons get restarted, they automatically flush +the old flows set up in the kernel.
This can be expensive if there are hundreds +of new flows that are entering the kernel but userspace daemons are busy +setting up new userspace flows from either the controller or a utility like +ovs-ofctl. The Open vSwitch database provides an option to solve this problem +through the other_config:flow-restore-wait column of the Open_vSwitch table. +Refer to the ovs-vswitchd.conf.db(5) manpage for details. + +4. If the upgrade also involves upgrading the kernel module, the old kernel +module needs to be unloaded and the new kernel module should be loaded. This +means that the kernel network devices belonging to Open vSwitch are recreated +and the kernel flows are lost. The downtime of the traffic can be reduced +if the userspace daemons are restarted immediately and the userspace flows +are restored as soon as possible. + +The ovs-ctl utility's "restart" function only restarts the userspace daemons, +makes sure that the 'ofport' values remain consistent across restarts, restores +userspace flows using the ovs-ofctl utility and also uses the +other_config:flow-restore-wait column to keep the traffic downtime to the +minimum. The ovs-ctl utility's "force-reload-kmod" function does all of the +above, but also replaces the old kernel module with the new one. Open vSwitch +startup scripts for Debian, XenServer and RHEL use ovs-ctl's functions and it +is recommended that these functions be used for other software platforms too. + +Testsuites +========== + +This section describes Open vSwitch's built-in support for various test +suites. You must configure and build Open vSwitch (steps 1 through 3 +in "Building and Installing Open vSwitch for Linux, FreeBSD or NetBSD" +above) before you run the tests described here. You do not need to +install Open vSwitch or to build or load the kernel module to run +these test suites. You do not need supervisor privilege to run these +test suites. + +Self-Tests +---------- + +Open vSwitch includes a suite of self-tests.
Before you submit patches +upstream, we advise that you run the tests and ensure that they pass. +If you add new features to Open vSwitch, then adding tests for those +features will ensure your features don't break as developers modify +other areas of Open vSwitch. + +Refer to "Testsuites" above for prerequisites. + +To run all the unit tests in Open vSwitch, one at a time: + `make check` +This takes under 5 minutes on a modern desktop system. + +To run all the unit tests in Open vSwitch, up to 8 in parallel: + `make check TESTSUITEFLAGS=-j8` +This takes under a minute on a modern 4-core desktop system. + +To see a list of all the available tests, run: + `make check TESTSUITEFLAGS=--list` + +To run only a subset of tests, e.g. test 123 and tests 477 through 484: + `make check TESTSUITEFLAGS='123 477-484'` +(Tests do not have inter-dependencies, so you may run any subset.) + +To run tests matching a keyword, e.g. "ovsdb": + `make check TESTSUITEFLAGS='-k ovsdb'` + +To see a complete list of test options: + `make check TESTSUITEFLAGS=--help` + +The results of a testing run are reported in tests/testsuite.log. +Please report test failures as bugs and include the testsuite.log in +your report. + +If you have "valgrind" installed, then you can also run the testsuite +under valgrind by using "make check-valgrind" in place of "make +check". All the same options are available via TESTSUITEFLAGS. When +you do this, the "valgrind" results for test `` are reported in files +named `tests/testsuite.dir//valgrind.*`. You may find that the +valgrind results are easier to interpret if you put "-q" in +~/.valgrindrc, since that reduces the amount of output. + +Sometimes a few tests may fail on some runs but not others. This is +usually a bug in the testsuite, not a bug in Open vSwitch itself. If +you find that a test fails intermittently, please report it, since the +developers may not have noticed. + +OFTest +------ + +OFTest is an OpenFlow protocol testing suite. 
Open vSwitch includes a +Makefile target to run OFTest with Open vSwitch in "dummy mode". In +this mode of testing, no packets travel across physical or virtual +networks. Instead, Unix domain sockets stand in as simulated +networks. This simulation is imperfect, but it is much easier to set +up, does not require extra physical or virtual hardware, and does not +require supervisor privileges. + +To run OFTest with Open vSwitch, first read and follow the +instructions under "Testsuites" above. Second, obtain a copy of +OFTest and install its prerequisites. You need a copy of OFTest that +includes commit 406614846c5 (make ovs-dummy platform work again). +This commit was merged into the OFTest repository on Feb 1, 2013, so +any copy of OFTest more recent than that should work. Testing OVS in +dummy mode does not require root privilege, so you may ignore that +requirement. + +Optionally, add the top-level OFTest directory (containing the "oft" +program) to your $PATH. This slightly simplifies running OFTest later. + +To run OFTest in dummy mode, run the following command from your Open +vSwitch build directory: + `make check-oftest OFT=` +where the value of OFT is the absolute path to the "oft" program in +OFTest. + +If you added "oft" to your $PATH, you may omit the OFT variable +assignment: + `make check-oftest` +By default, "check-oftest" passes "oft" just enough options to enable +dummy mode. You can use OFTFLAGS to pass additional options. For +example, to run just the basic.Echo test instead of all tests (the +default) and enable verbose logging: + `make check-oftest OFT= OFTFLAGS='--verbose -T basic.Echo'` + +If you use a copy of OFTest that does not include commit 4d1f3eb2c792 (oft: +change default port to 6653), merged into the OFTest repository in +October 2013, then you need to add an option to use the IETF-assigned +controller port: + `make check-oftest OFT= OFTFLAGS='--port=6653'` + +Please interpret OFTest results cautiously.
Open vSwitch can fail a +given test in OFTest for many reasons, including bugs in Open vSwitch, +bugs in OFTest, bugs in the "dummy mode" integration, and differing +interpretations of the OpenFlow standard and other standards. + +Open vSwitch has not been validated against OFTest. Please do report +test failures that you believe to represent bugs in Open vSwitch. +Include the precise versions of Open vSwitch and OFTest in your bug +report, plus any other information needed to reproduce the problem. + +Ryu +--- + +Ryu is an OpenFlow controller written in Python that includes an +extensive OpenFlow testsuite. Open vSwitch includes a Makefile target +to run Ryu in "dummy mode". See "OFTest" above for an explanation of +dummy mode. + +To run Ryu tests with Open vSwitch, first read and follow the +instructions under "Testsuites" above. Second, obtain a copy of Ryu, +install its prerequisites, and build it. You do not need to install +Ryu (some of the tests do not get installed, so it does not help). + +To run Ryu tests, run the following command from your Open vSwitch +build directory: + `make check-ryu RYUDIR=` +where the value of RYUDIR is the absolute path to the root of the Ryu +source distribution. The default RYUDIR is `$srcdir/../ryu` +where $srcdir is your Open vSwitch source directory, so if this +default is correct then you may simply run `make check-ryu`. + +Open vSwitch has not been validated against Ryu. Please do report +test failures that you believe to represent bugs in Open vSwitch. +Include the precise versions of Open vSwitch and Ryu in your bug +report, plus any other information needed to reproduce the problem. + +Continuous Integration with Travis-CI +------------------------------------- + +A .travis.yml file is provided to automatically build Open vSwitch with +various build configurations and run the testsuite using travis-ci.
+Builds will be performed with gcc, sparse and clang with the -Werror +compiler flag included, so the build will fail if a new warning +has been introduced. + +The CI build is triggered via git push (regardless of the specific +branch) or pull request against any Open vSwitch GitHub repository that +is linked to travis-ci. + +Instructions to set up travis-ci for your GitHub repository: + +1. Go to http://travis-ci.org/ and sign in using your GitHub ID. +2. Go to the "Repositories" tab and enable the ovs repository. You + may disable builds for pushes or pull requests. +3. In order to avoid forks sending build failures to the upstream + mailing list, the notification email recipient is encrypted. If you + want to receive email notification for build failures, replace the + encrypted string: + 3.1) Install the travis-ci CLI (requires ruby >=2.0): + gem install travis + 3.2) In your Open vSwitch repository: + travis encrypt mylist@mydomain.org + 3.3) Add/replace the notifications section in .travis.yml and fill + in the secure string as returned by travis encrypt: + + notifications: + email: + recipients: + - secure: "....." + + (You may remove/omit the notifications section to fall back to + default notification behaviour which is to send an email directly + to the author and committer of the failing commit. Note that the + email is only sent if the author/committer have commit rights for + the particular GitHub repository). + +4. Pushing a commit to the repository which breaks the build or the + testsuite will now trigger an email to mylist@mydomain.org. + +Bug Reporting +============= + +Please report problems to bugs@openvswitch.org.
diff --git a/INSTALL.userspace b/INSTALL.userspace deleted file mode 100644 index f54b93e2e..000000000 --- a/INSTALL.userspace +++ /dev/null @@ -1,84 +0,0 @@ - Using Open vSwitch without kernel support - ========================================= - -Open vSwitch can operate, at a cost in performance, entirely in -userspace, without assistance from a kernel module. This file -explains how to install Open vSwitch in such a mode. - -The userspace-only mode of Open vSwitch is considered experimental. -It has not been thoroughly tested. - -This version of Open vSwitch should be built manually with "configure" -and "make". Debian packaging for Open vSwitch is also included, but -it has not been recently tested, and so Debian packages are not a -recommended way to use this version of Open vSwitch. - -Building and Installing ------------------------ - -The requirements and procedure for building, installing, and -configuring Open vSwitch are the same as those given in INSTALL. -You may omit configuring, building, and installing the kernel module, -and the related requirements. - -On Linux, the userspace switch additionally requires the kernel -TUN/TAP driver to be available, either built into the kernel or loaded -as a module. If you are not sure, check for a directory named -/sys/class/misc/tun. If it does not exist, then attempt to load the -module with "modprobe tun". - -The tun device must also exist as /dev/net/tun. If it does not exist, -then create /dev/net (if necessary) with "mkdir /dev/net", then create -/dev/net/tun with "mknod /dev/net/tun c 10 200". - -On FreeBSD and NetBSD, the userspace switch additionally requires the -kernel tap(4) driver to be available, either built into the kernel or -loaded as a module. - -Using the Userspace Datapath with ovs-vswitchd ----------------------------------------------- - -To use ovs-vswitchd in userspace mode, create a bridge with datapath_type -"netdev" in the configuration database. 
For example: - - ovs-vsctl add-br br0 - ovs-vsctl set bridge br0 datapath_type=netdev - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 eth1 - ovs-vsctl add-port br0 eth2 - -ovs-vswitchd will create a TAP device as the bridge's local interface, -named the same as the bridge, as well as for each configured internal -interface. - -Currently, on FreeBSD, the functionality required for in-band control -support is not implemented. To avoid related errors, you can disable -the in-band support with the following command. - - ovs-vsctl set bridge br0 other_config:disable-in-band=true - -Firewall Rules --------------- - -On Linux, when a physical interface is in use by the userspace -datapath, packets received on the interface still also pass into the -kernel TCP/IP stack. This can cause surprising and incorrect -behavior. You can use "iptables" to avoid this behavior, by using it -to drop received packets. For example, to drop packets received on -eth0: - - iptables -A INPUT -i eth0 -j DROP - iptables -A FORWARD -i eth0 -j DROP - -Other settings --------------- - -On NetBSD, depending on your network topology and applications, the -following configuration might help. See sysctl(7). - - sysctl net.inet.ip.checkinterface=1 - -Bug Reporting -------------- - -Please report problems to bugs@openvswitch.org. diff --git a/INSTALL.userspace.md b/INSTALL.userspace.md new file mode 100644 index 000000000..a9f881c32 --- /dev/null +++ b/INSTALL.userspace.md @@ -0,0 +1,84 @@ +Using Open vSwitch without kernel support +========================================= + +Open vSwitch can operate, at a cost in performance, entirely in +userspace, without assistance from a kernel module. This file +explains how to install Open vSwitch in such a mode. + +The userspace-only mode of Open vSwitch is considered experimental. +It has not been thoroughly tested. + +This version of Open vSwitch should be built manually with `configure` +and `make`. 
Debian packaging for Open vSwitch is also included, but +it has not been recently tested, and so Debian packages are not a +recommended way to use this version of Open vSwitch. + +Building and Installing +----------------------- + +The requirements and procedure for building, installing, and +configuring Open vSwitch are the same as those given in INSTALL. +You may omit configuring, building, and installing the kernel module, +and the related requirements. + +On Linux, the userspace switch additionally requires the kernel +TUN/TAP driver to be available, either built into the kernel or loaded +as a module. If you are not sure, check for a directory named +/sys/class/misc/tun. If it does not exist, then attempt to load the +module with `modprobe tun`. + +The tun device must also exist as `/dev/net/tun`. If it does not exist, +then create /dev/net (if necessary) with `mkdir /dev/net`, then create +`/dev/net/tun` with `mknod /dev/net/tun c 10 200`. + +On FreeBSD and NetBSD, the userspace switch additionally requires the +kernel tap(4) driver to be available, either built into the kernel or +loaded as a module. + +Using the Userspace Datapath with ovs-vswitchd +---------------------------------------------- + +To use ovs-vswitchd in userspace mode, create a bridge with datapath_type +"netdev" in the configuration database. For example: + + ovs-vsctl add-br br0 + ovs-vsctl set bridge br0 datapath_type=netdev + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 eth1 + ovs-vsctl add-port br0 eth2 + +ovs-vswitchd will create a TAP device as the bridge's local interface, +named the same as the bridge, as well as for each configured internal +interface. + +Currently, on FreeBSD, the functionality required for in-band control +support is not implemented. To avoid related errors, you can disable +the in-band support with the following command. 
+ + ovs-vsctl set bridge br0 other_config:disable-in-band=true + +Firewall Rules +-------------- + +On Linux, when a physical interface is in use by the userspace +datapath, packets received on the interface still also pass into the +kernel TCP/IP stack. This can cause surprising and incorrect +behavior. You can use "iptables" to avoid this behavior, by using it +to drop received packets. For example, to drop packets received on +eth0: + + iptables -A INPUT -i eth0 -j DROP + iptables -A FORWARD -i eth0 -j DROP + +Other settings +-------------- + +On NetBSD, depending on your network topology and applications, the +following configuration might help. See sysctl(7). + + sysctl net.inet.ip.checkinterface=1 + +Bug Reporting +------------- + +Please report problems to bugs@openvswitch.org. diff --git a/IntegrationGuide b/IntegrationGuide deleted file mode 100644 index 976936b71..000000000 --- a/IntegrationGuide +++ /dev/null @@ -1,169 +0,0 @@ - Integration Guide for Centralized Control - ========================================= - -This document describes how to integrate Open vSwitch onto a new -platform to expose the state of the switch and attached devices for -centralized control. (If you are looking to port the switching -components of Open vSwitch to a new platform, please see the PORTING -document.) The focus of this guide is on hypervisors, but many of the -interfaces are useful for hardware switches, as well. The XenServer -integration is the most mature implementation, so most of the examples -are drawn from it. - -The externally visible interface to this integration is -platform-agnostic. We encourage anyone who integrates Open vSwitch to -use the same interface, because keeping a uniform interface means that -controllers require less customization for individual platforms (and -perhaps no customization at all). - -Integration centers around the Open vSwitch database and mostly involves -the 'external_ids' columns in several of the tables. 
These columns are -not interpreted by Open vSwitch itself. Instead, they provide -information to a controller that permits it to associate a database -record with a more meaningful entity. In contrast, the 'other_config' -column is used to configure behavior of the switch. The main job of the -integrator, then, is to ensure that these values are correctly populated -and maintained. - -An integrator sets the columns in the database by talking to the -ovsdb-server daemon. A few of the columns can be set during startup by -calling the ovs-ctl tool from inside the startup scripts. The -'xenserver/etc_init.d_openvswitch' script provides examples of its use, -and the ovs-ctl(8) manpage contains complete documentation. At runtime, -ovs-vsctl can be be used to set columns in the database. The script -'xenserver/etc_xensource_scripts_vif' contains examples of its use, and -ovs-vsctl(8) manpage contains complete documentation. - -Python and C bindings to the database are provided if deeper integration -with a program are needed. The XenServer ovs-xapi-sync daemon -('xenserver/usr_share_openvswitch_scripts_ovs-xapi-sync') provides an -example of using the Python bindings. More information on the python -bindings is available at 'python/ovs/db/idl.py'. Information on the C -bindings is available at 'lib/ovsdb-idl.h'. 
- -The following diagram shows how integration scripts fit into the Open vSwitch -architecture: - - +----------------------------------------+ - | Controller Cluster + - +----------------------------------------+ - | - | - +----------------------------------------------------------+ - | | | - | +--------------+---------------+ | - | | | | - | +-------------------+ +------------------+ | - | | ovsdb-server |-----------| ovs-vswitchd | | - | +-------------------+ +------------------+ | - | | | | - | +---------------------+ | | - | | Integration scripts | | | - | | (ex: ovs-xapi-sync) | | | - | +---------------------+ | | - | | Userspace | - |----------------------------------------------------------| - | | Kernel | - | | | - | +---------------------+ | - | | OVS Kernel Module | | - | +---------------------+ | - +----------------------------------------------------------+ - - -A description of the most relevant fields for integration follows. By -setting these values, controllers are able to understand the network and -manage it more dynamically and precisely. For more details about the -database and each individual column, please refer to the -ovs-vswitchd.conf.db(5) manpage. - - -Open_vSwitch table ------------------- -The Open_vSwitch table describes the switch as a whole. The -'system_type' and 'system_version' columns identify the platform to the -controller. The 'external_ids:system-id' key uniquely identifies the -physical host. In XenServer, the system-id will likely be the same as -the UUID returned by 'xe host-list'. This key allows controllers to -distinguish between multiple hypervisors. - -Most of this configuration can be done with the ovs-ctl command at -startup. For example: - - ovs-ctl --system-type="XenServer" --system-version="6.0.0-50762p" \ - --system-id="${UUID}" "${other_options}" start - -Alternatively, the ovs-vsctl command may be used to set a particular -value at runtime. For example: - - ovs-vsctl set open_vswitch . 
external-ids:system-id='"${UUID}"' - -The 'other_config:enable-statistics' key may be set to "true" to have OVS -populate the database with statistics (e.g., number of CPUs, memory, -system load) for the controller's use. - - -Bridge table ------------- -The Bridge table describes individual bridges within an Open vSwitch -instance. The 'external-ids:bridge-id' key uniquely identifies a -particular bridge. In XenServer, this will likely be the same as the -UUID returned by 'xe network-list' for that particular bridge. - -For example, to set the identifier for bridge "br0", the following -command can be used: - - ovs-vsctl set Bridge br0 external-ids:bridge-id='"${UUID}"' - -The MAC address of the bridge may be manually configured by setting it -with the "other_config:hwaddr" key. For example: - - ovs-vsctl set Bridge br0 other_config:hwaddr="12:34:56:78:90:ab" - - -Interface table ---------------- -The Interface table describes an interface under the control of Open -vSwitch. The 'external_ids' column contains keys that are used to -provide additional information about the interface: - - attached-mac - - This field contains the MAC address of the device attached to - the interface. On a hypervisor, this is the MAC address of the - interface as seen inside a VM. It does not necessarily - correlate to the host-side MAC address. For example, on - XenServer, the MAC address on a VIF in the hypervisor is always - FE:FF:FF:FF:FF:FF, but inside the VM a normal MAC address is - seen. - - iface-id - - This field uniquely identifies the interface. In hypervisors, - this allows the controller to follow VM network interfaces as - VMs migrate. A well-chosen identifier should also allow an - administrator or a controller to associate the interface with - the corresponding object in the VM management system. For - example, the Open vSwitch integration with XenServer by default - uses the XenServer assigned UUID for a VIF record as the - iface-id. 
- - iface-status - - In a hypervisor, there are situations where there are multiple - interface choices for a single virtual ethernet interface inside - a VM. Valid values are "active" and "inactive". A complete - description is available in the ovs-vswitchd.conf.db(5) manpage. - - vm-id - - This field uniquely identifies the VM to which this interface - belongs. A single VM may have multiple interfaces attached to - it. - -As in the previous tables, the ovs-vsctl command may be used to -configure the values. For example, to set the 'iface-id' on eth0, the -following command can be used: - - ovs-vsctl set Interface eth0 external-ids:iface-id='"${UUID}"' - diff --git a/IntegrationGuide.md b/IntegrationGuide.md new file mode 100644 index 000000000..5d3e574b6 --- /dev/null +++ b/IntegrationGuide.md @@ -0,0 +1,169 @@ +Integration Guide for Centralized Control +========================================= + +This document describes how to integrate Open vSwitch onto a new +platform to expose the state of the switch and attached devices for +centralized control. (If you are looking to port the switching +components of Open vSwitch to a new platform, please see the PORTING +document.) The focus of this guide is on hypervisors, but many of the +interfaces are useful for hardware switches, as well. The XenServer +integration is the most mature implementation, so most of the examples +are drawn from it. + +The externally visible interface to this integration is +platform-agnostic. We encourage anyone who integrates Open vSwitch to +use the same interface, because keeping a uniform interface means that +controllers require less customization for individual platforms (and +perhaps no customization at all). + +Integration centers around the Open vSwitch database and mostly involves +the 'external_ids' columns in several of the tables. These columns are +not interpreted by Open vSwitch itself. 
Instead, they provide +information to a controller that permits it to associate a database +record with a more meaningful entity. In contrast, the 'other_config' +column is used to configure behavior of the switch. The main job of the +integrator, then, is to ensure that these values are correctly populated +and maintained. + +An integrator sets the columns in the database by talking to the +ovsdb-server daemon. A few of the columns can be set during startup by +calling the ovs-ctl tool from inside the startup scripts. The +'xenserver/etc_init.d_openvswitch' script provides examples of its use, +and the ovs-ctl(8) manpage contains complete documentation. At runtime, +ovs-vsctl can be used to set columns in the database. The script +'xenserver/etc_xensource_scripts_vif' contains examples of its use, and the +ovs-vsctl(8) manpage contains complete documentation. + +Python and C bindings to the database are provided if deeper integration +with a program is needed. The XenServer ovs-xapi-sync daemon +('xenserver/usr_share_openvswitch_scripts_ovs-xapi-sync') provides an +example of using the Python bindings. More information on the Python +bindings is available at 'python/ovs/db/idl.py'. Information on the C +bindings is available at 'lib/ovsdb-idl.h'. 
+ +The following diagram shows how integration scripts fit into the Open vSwitch +architecture: + + +----------------------------------------+ + | Controller Cluster + + +----------------------------------------+ + | + | + +----------------------------------------------------------+ + | | | + | +--------------+---------------+ | + | | | | + | +-------------------+ +------------------+ | + | | ovsdb-server |-----------| ovs-vswitchd | | + | +-------------------+ +------------------+ | + | | | | + | +---------------------+ | | + | | Integration scripts | | | + | | (ex: ovs-xapi-sync) | | | + | +---------------------+ | | + | | Userspace | + |----------------------------------------------------------| + | | Kernel | + | | | + | +---------------------+ | + | | OVS Kernel Module | | + | +---------------------+ | + +----------------------------------------------------------+ + + +A description of the most relevant fields for integration follows. By +setting these values, controllers are able to understand the network and +manage it more dynamically and precisely. For more details about the +database and each individual column, please refer to the +ovs-vswitchd.conf.db(5) manpage. + + +Open_vSwitch table +------------------ +The Open_vSwitch table describes the switch as a whole. The +'system_type' and 'system_version' columns identify the platform to the +controller. The 'external_ids:system-id' key uniquely identifies the +physical host. In XenServer, the system-id will likely be the same as +the UUID returned by 'xe host-list'. This key allows controllers to +distinguish between multiple hypervisors. + +Most of this configuration can be done with the ovs-ctl command at +startup. For example: + + ovs-ctl --system-type="XenServer" --system-version="6.0.0-50762p" \ + --system-id="${UUID}" "${other_options}" start + +Alternatively, the ovs-vsctl command may be used to set a particular +value at runtime. For example: + + ovs-vsctl set open_vswitch . 
external-ids:system-id='"${UUID}"' + +The 'other_config:enable-statistics' key may be set to "true" to have OVS +populate the database with statistics (e.g., number of CPUs, memory, +system load) for the controller's use. + + +Bridge table +------------ +The Bridge table describes individual bridges within an Open vSwitch +instance. The 'external-ids:bridge-id' key uniquely identifies a +particular bridge. In XenServer, this will likely be the same as the +UUID returned by 'xe network-list' for that particular bridge. + +For example, to set the identifier for bridge "br0", the following +command can be used: + + ovs-vsctl set Bridge br0 external-ids:bridge-id='"${UUID}"' + +The MAC address of the bridge may be manually configured by setting it +with the "other_config:hwaddr" key. For example: + + ovs-vsctl set Bridge br0 other_config:hwaddr="12:34:56:78:90:ab" + + +Interface table +--------------- +The Interface table describes an interface under the control of Open +vSwitch. The 'external_ids' column contains keys that are used to +provide additional information about the interface: + + attached-mac + + This field contains the MAC address of the device attached to + the interface. On a hypervisor, this is the MAC address of the + interface as seen inside a VM. It does not necessarily + correlate to the host-side MAC address. For example, on + XenServer, the MAC address on a VIF in the hypervisor is always + FE:FF:FF:FF:FF:FF, but inside the VM a normal MAC address is + seen. + + iface-id + + This field uniquely identifies the interface. In hypervisors, + this allows the controller to follow VM network interfaces as + VMs migrate. A well-chosen identifier should also allow an + administrator or a controller to associate the interface with + the corresponding object in the VM management system. For + example, the Open vSwitch integration with XenServer by default + uses the XenServer assigned UUID for a VIF record as the + iface-id. 
+ + iface-status + + In a hypervisor, there are situations where there are multiple + interface choices for a single virtual ethernet interface inside + a VM. Valid values are "active" and "inactive". A complete + description is available in the ovs-vswitchd.conf.db(5) manpage. + + vm-id + + This field uniquely identifies the VM to which this interface + belongs. A single VM may have multiple interfaces attached to + it. + +As in the previous tables, the ovs-vsctl command may be used to +configure the values. For example, to set the 'iface-id' on eth0, the +following command can be used: + + ovs-vsctl set Interface eth0 external-ids:iface-id='"${UUID}"' + diff --git a/Makefile.am b/Makefile.am index 846172d85..343dffa18 100644 --- a/Makefile.am +++ b/Makefile.am @@ -62,35 +62,35 @@ CLEAN_LOCAL = DISTCLEANFILES = PYCOV_CLEAN_FILES = build-aux/check-structs,cover EXTRA_DIST = \ - CONTRIBUTING \ - CodingStyle \ - DESIGN \ - FAQ \ - INSTALL \ - INSTALL.Debian \ - INSTALL.Docker \ - INSTALL.DPDK \ - INSTALL.Fedora \ - INSTALL.KVM \ - INSTALL.Libvirt \ - INSTALL.NetBSD \ - INSTALL.RHEL \ - INSTALL.SSL \ - INSTALL.XenServer \ - INSTALL.userspace \ - INSTALL.Windows \ - IntegrationGuide \ + CONTRIBUTING.md \ + CodingStyle.md \ + DESIGN.md \ + FAQ.md \ + INSTALL.md \ + INSTALL.Debian.md \ + INSTALL.Docker.md \ + INSTALL.DPDK.md \ + INSTALL.Fedora.md \ + INSTALL.KVM.md \ + INSTALL.Libvirt.md \ + INSTALL.NetBSD.md \ + INSTALL.RHEL.md \ + INSTALL.SSL.md \ + INSTALL.XenServer.md \ + INSTALL.userspace.md \ + INSTALL.Windows.md \ + IntegrationGuide.md \ NOTICE \ - OPENFLOW-1.1+ \ - PORTING \ + OPENFLOW-1.1+.md \ + PORTING.md \ README.md \ - README-lisp \ - REPORTING-BUGS \ - TODO \ + README-lisp.md \ + REPORTING-BUGS.md \ + TODO.md \ .travis.yml \ .travis/build.sh \ .travis/prepare.sh \ - WHY-OVS \ + WHY-OVS.md \ boot.sh \ build-aux/cccl \ build-aux/sodepends.pl \ @@ -225,7 +225,7 @@ printf-check: then \ echo "See above for list of violations of the rule that"; \ echo "'z', 't', 
'j', 'hh' printf() type modifiers are"; \ - echo "forbidden. See CodingStyle for replacements."; \ + echo "forbidden. See CodingStyle.md for replacements."; \ exit 1; \ fi .PHONY: printf-check diff --git a/NEWS b/NEWS index abcb52571..31d00de9c 100644 --- a/NEWS +++ b/NEWS @@ -32,7 +32,7 @@ Post-v2.3.0 who find it useful for testing basic OpenFlow setups. It is still not a necessary or desirable part of most Open vSwitch deployments. - Support for travis-ci.org based continuous integration builds has been - added. Build failures are reported to build@openvswitch.org. See INSTALL + added. Build failures are reported to build@openvswitch.org. See INSTALL.md file for additional details. - Experimental support for the Rapid Spanning Tree Protocol (IEEE 802.1D-2004). More conformance and interoperability testing is @@ -71,7 +71,7 @@ v2.2.0 - Internal Release - The "ovsdbmonitor" graphical tool has been removed, because it was poorly maintained and not widely used. - New "check-ryu" Makefile target for running Ryu tests for OpenFlow - controllers against Open vSwitch. See INSTALL for details. + controllers against Open vSwitch. See INSTALL.md for details. - Added IPFIX support for SCTP flows and templates for ICMPv4/v6 flows. - Upon the receipt of a SIGHUP signal, ovs-vswitchd no longer reopens its log file (it will terminate instead). Please use 'ovs-appctl vlog/reopen' diff --git a/OPENFLOW-1.1+ b/OPENFLOW-1.1+ deleted file mode 100644 index 01adf7267..000000000 --- a/OPENFLOW-1.1+ +++ /dev/null @@ -1,231 +0,0 @@ - OpenFlow 1.1+ support in Open vSwitch - ===================================== - -Open vSwitch support for OpenFlow 1.1 and beyond is a work in -progress. This file describes the work still to be done. - -The Plan --------- - -OpenFlow version support is not a build-time option. A single build -of Open vSwitch must be able to handle all supported versions of -OpenFlow. 
Ideally, even at runtime it should be able to support all -protocol versions at the same time on different OpenFlow bridges (and -perhaps even on the same bridge). - -At the same time, it would be a shame to litter the core of the OVS -code with lots of ugly code concerned with the details of various -OpenFlow protocol versions. - -The primary approach to compatibility is to abstract most of the -details of the differences from the core code, by adding a protocol -layer that translates between OF1.x and a slightly higher-level -abstract representation. The core of this approach is the many struct -ofputil_* structures in lib/ofp-util.h. - -As a consequence of this approach, OVS cannot use OpenFlow protocol -definitions that closely resemble those in the OpenFlow specification, -because openflow.h in different versions of the OpenFlow specification -defines the same identifier with different values. Instead, -openflow-common.h contains definitions that are common to all the -specifications and separate protocol version-specific headers contain -protocol-specific definitions renamed so as not to conflict, -e.g. OFPAT10_ENQUEUE and OFPAT11_ENQUEUE for the OpenFlow 1.0 and 1.1 -values for OFPAT_ENQUEUE. Generally, in cases of conflict, the -protocol layer will define a more abstract OFPUTIL_* or struct -ofputil_*. - -Here are the current approaches in a few tricky areas: - - * Port numbering. OpenFlow 1.0 has 16-bit port numbers and later - OpenFlow versions have 32-bit port numbers. For now, OVS - support for later protocol versions requires all port numbers to - fall into the 16-bit range, translating the reserved OFPP_* port - numbers. - - * Actions. OpenFlow 1.0 and later versions have very different - ideas of actions. OVS reconciles by translating all the - versions' actions (and instructions) to and from a common - internal representation. - -OpenFlow 1.1 ------------- - -The list of remaining work items for OpenFlow 1.1 is below. It is -probably incomplete. 
- - * Match and set double-tagged VLANs (QinQ). This requires kernel - work for reasonable performance. - [optional for OF1.1+] - - * VLANs tagged with 88a8 Ethertype. This requires kernel work for - reasonable performance. - [required for OF1.1+] - -OpenFlow 1.2 ------------- - -OpenFlow 1.2 support requires OpenFlow 1.1 as a prerequisite. All the -additional work specific to Openflow 1.2 are complete. (This is based -on the change log at the end of the OF1.2 spec. I didn't compare the -specs carefully yet.) - -OpenFlow 1.3 ------------- - -OpenFlow 1.3 support requires OpenFlow 1.2 as a prerequisite, plus the -following additional work. (This is based on the change log at the -end of the OF1.3 spec, reusing most of the section titles directly. I -didn't compare the specs carefully yet.) - - * Add support for multipart requests. - Currently we always report OFPBRC_MULTIPART_BUFFER_OVERFLOW. - [optional for OF1.3+] - - * IPv6 extension header handling support. Fully implementing this - requires kernel support. This likely will take some careful and - probably time-consuming design work. The actual coding, once - that is all done, is probably 2 or 3 days work. - [optional for OF1.3+] - - * Per-flow meters. OpenFlow protocol support is now implemented. - Support for the special OFPM_SLOWPATH and OFPM_CONTROLLER meters - is missing. Support for the software switch is under review. - [optional for OF1.3+] - - * Auxiliary connections. An implementation in generic code might - be a week's worth of work. The value of an implementation in - generic code is questionable, though, since much of the benefit - of axuiliary connections is supposed to be to take advantage of - hardware support. (We could make the kernel module somehow - send packets across the auxiliary connections directly, for - some kind of "hardware" support, if we judged it useful enough.) - [optional for OF1.3+] - - * Provider Backbone Bridge tagging. 
I don't plan to implement - this (but we'd accept an implementation). - [optional for OF1.3+] - - * On-demand flow counters. I think this might be a real - optimization in some cases for the software switch. - [optional for OF1.3+] - -ONF OpenFlow Exensions for 1.3.X Pack1 --------------------------------------- - -OpenFlow 1.3 has a bunch of ONF extentions. -Many of them are necessary for OpenFlow 1.4 as well. - - * Flow entry notifications - This seems to be modelled after OVS's NXST_FLOW_MONITOR. - (Simon Horman is working on this.) - [EXT-187] - [required for OF1.4+] - - * Role Status - [EXT-191] - [required for OF1.4+] - - * Flow entry eviction - OVS has flow eviction functionality. - table_mod OFPTC_EVICTION, flow_mod 'importance', and - table_desc ofp_table_mod_prop_eviction need to be implemented. - [EXT-192-e] - [optional for OF1.4+] - - * Vacancy events - [EXT-192-v] - [optional for OF1.4+] - - * Bundle - Transactional modification. OpenFlow 1.4 requires to support - flow_mods and port_mods in a bundle. - (Not related to OVS's 'ofbundle' stuff.) - [EXT-230] - [required for OF1.4+] - - * Table synchronisation - [EXT-232] - [optional for OF1.4+] - - * Group notifications - [EXT-235] - [optional for OF1.4+] - - * Bad flow entry priority error - Probably not so useful to the software switch. - [EXT-236] - [optional for OF1.4+] - - * Set async config error - [EXT-237] - [optional for OF1.4+] - - * PBB UCA header field - [EXT-256] - [optional for OF1.4+] - - * Multipart timeout error - [EXT-264] - [required for OF1.4+] - -OpenFlow 1.4 ------------- - - * More extensible wire protocol - Many on-wire structures got TLVs. - [EXT-262] - [required for OF1.4+] - - * More descriptive reasons for packet-in - Distinguish OFPR_APPLY_ACTION, OFPR_ACTION_SET, OFPR_GROUP, - OFPR_PACKET_OUT. NO_MATCH was renamed to OFPR_TABLE_MISS. 
- [EXT-136] - [required for OF1.4+] - - * Optical port properties - [EXT-154] - [optional for OF1.4+] - - * Meter notifications - [EXT-235] - [optional for OF1.4+] - -General ------ - - * ovs-ofctl(8) often lists as Nicira extensions features that - later OpenFlow versions support in standard ways. - -How to contribute ------------------ - -If you plan to contribute code for a feature, please let everyone know -on ovs-dev before you start work. This will help avoid duplicating -work. - -Please consider the following: - - * Testing. Please test your code. - - * Unit tests. Please consider writing some. The tests directory - has many examples that you can use as a starting point. - - * ovs-ofctl. If you add a feature that is useful for some - ovs-ofctl command then you should add support for it there. - - * Documentation. If you add a user-visible feature, then you - should document it in the appropriate manpage and mention it in - NEWS as well. - - * Coding style (see the CodingStyle file at the top of the source - tree). - - * The patch submission guidelines (see CONTRIBUTING). I - recommend using "git send-email", which automatically follows a - lot of those guidelines. - -Bug Reporting -------------- - -Please report problems to bugs@openvswitch.org. diff --git a/OPENFLOW-1.1+.md b/OPENFLOW-1.1+.md new file mode 100644 index 000000000..36fd1681b --- /dev/null +++ b/OPENFLOW-1.1+.md @@ -0,0 +1,231 @@ +OpenFlow 1.1+ support in Open vSwitch +===================================== + +Open vSwitch support for OpenFlow 1.1 and beyond is a work in +progress. This file describes the work still to be done. + +The Plan +-------- + +OpenFlow version support is not a build-time option. A single build +of Open vSwitch must be able to handle all supported versions of +OpenFlow. Ideally, even at runtime it should be able to support all +protocol versions at the same time on different OpenFlow bridges (and +perhaps even on the same bridge). 
+ +At the same time, it would be a shame to litter the core of the OVS +code with lots of ugly code concerned with the details of various +OpenFlow protocol versions. + +The primary approach to compatibility is to abstract most of the +details of the differences from the core code, by adding a protocol +layer that translates between OF1.x and a slightly higher-level +abstract representation. The core of this approach is the many struct +ofputil_* structures in lib/ofp-util.h. + +As a consequence of this approach, OVS cannot use OpenFlow protocol +definitions that closely resemble those in the OpenFlow specification, +because openflow.h in different versions of the OpenFlow specification +defines the same identifier with different values. Instead, +openflow-common.h contains definitions that are common to all the +specifications and separate protocol version-specific headers contain +protocol-specific definitions renamed so as not to conflict, +e.g. OFPAT10_ENQUEUE and OFPAT11_ENQUEUE for the OpenFlow 1.0 and 1.1 +values for OFPAT_ENQUEUE. Generally, in cases of conflict, the +protocol layer will define a more abstract OFPUTIL_* or struct +ofputil_*. + +Here are the current approaches in a few tricky areas: + + * Port numbering. OpenFlow 1.0 has 16-bit port numbers and later + OpenFlow versions have 32-bit port numbers. For now, OVS + support for later protocol versions requires all port numbers to + fall into the 16-bit range, translating the reserved OFPP_* port + numbers. + + * Actions. OpenFlow 1.0 and later versions have very different + ideas of actions. OVS reconciles by translating all the + versions' actions (and instructions) to and from a common + internal representation. + +OpenFlow 1.1 +------------ + +The list of remaining work items for OpenFlow 1.1 is below. It is +probably incomplete. + + * Match and set double-tagged VLANs (QinQ). This requires kernel + work for reasonable performance. + [optional for OF1.1+] + + * VLANs tagged with 88a8 Ethertype. 
This requires kernel work for + reasonable performance. + [required for OF1.1+] + +OpenFlow 1.2 +------------ + +OpenFlow 1.2 support requires OpenFlow 1.1 as a prerequisite. All the +additional work specific to OpenFlow 1.2 is complete. (This is based +on the change log at the end of the OF1.2 spec. I didn't compare the +specs carefully yet.) + +OpenFlow 1.3 +------------ + +OpenFlow 1.3 support requires OpenFlow 1.2 as a prerequisite, plus the +following additional work. (This is based on the change log at the +end of the OF1.3 spec, reusing most of the section titles directly. I +didn't compare the specs carefully yet.) + + * Add support for multipart requests. + Currently we always report OFPBRC_MULTIPART_BUFFER_OVERFLOW. + [optional for OF1.3+] + + * IPv6 extension header handling support. Fully implementing this + requires kernel support. This likely will take some careful and + probably time-consuming design work. The actual coding, once + that is all done, is probably 2 or 3 days' work. + [optional for OF1.3+] + + * Per-flow meters. OpenFlow protocol support is now implemented. + Support for the special OFPM_SLOWPATH and OFPM_CONTROLLER meters + is missing. Support for the software switch is under review. + [optional for OF1.3+] + + * Auxiliary connections. An implementation in generic code might + be a week's worth of work. The value of an implementation in + generic code is questionable, though, since much of the benefit + of auxiliary connections is supposed to be to take advantage of + hardware support. (We could make the kernel module somehow + send packets across the auxiliary connections directly, for + some kind of "hardware" support, if we judged it useful enough.) + [optional for OF1.3+] + + * Provider Backbone Bridge tagging. I don't plan to implement + this (but we'd accept an implementation). + [optional for OF1.3+] + + * On-demand flow counters. I think this might be a real + optimization in some cases for the software switch. 
+ [optional for OF1.3+] + +ONF OpenFlow Extensions for 1.3.X Pack1 +--------------------------------------- + +OpenFlow 1.3 has a bunch of ONF extensions. +Many of them are necessary for OpenFlow 1.4 as well. + + * Flow entry notifications + This seems to be modelled after OVS's NXST_FLOW_MONITOR. + (Simon Horman is working on this.) + [EXT-187] + [required for OF1.4+] + + * Role Status + [EXT-191] + [required for OF1.4+] + + * Flow entry eviction + OVS has flow eviction functionality. + table_mod OFPTC_EVICTION, flow_mod 'importance', and + table_desc ofp_table_mod_prop_eviction need to be implemented. + [EXT-192-e] + [optional for OF1.4+] + + * Vacancy events + [EXT-192-v] + [optional for OF1.4+] + + * Bundle + Transactional modification. OpenFlow 1.4 requires support for + flow_mods and port_mods in a bundle. + (Not related to OVS's 'ofbundle' stuff.) + [EXT-230] + [required for OF1.4+] + + * Table synchronisation + [EXT-232] + [optional for OF1.4+] + + * Group notifications + [EXT-235] + [optional for OF1.4+] + + * Bad flow entry priority error + Probably not so useful to the software switch. + [EXT-236] + [optional for OF1.4+] + + * Set async config error + [EXT-237] + [optional for OF1.4+] + + * PBB UCA header field + [EXT-256] + [optional for OF1.4+] + + * Multipart timeout error + [EXT-264] + [required for OF1.4+] + +OpenFlow 1.4 +------------ + + * More extensible wire protocol + Many on-wire structures got TLVs. + [EXT-262] + [required for OF1.4+] + + * More descriptive reasons for packet-in + Distinguish OFPR_APPLY_ACTION, OFPR_ACTION_SET, OFPR_GROUP, + OFPR_PACKET_OUT. NO_MATCH was renamed to OFPR_TABLE_MISS. + [EXT-136] + [required for OF1.4+] + + * Optical port properties + [EXT-154] + [optional for OF1.4+] + + * Meter notifications + [EXT-235] + [optional for OF1.4+] + +General +------- + + * ovs-ofctl(8) often lists as Nicira extensions features that + later OpenFlow versions support in standard ways. 
+ +How to contribute +----------------- + +If you plan to contribute code for a feature, please let everyone know +on ovs-dev before you start work. This will help avoid duplicating +work. + +Please consider the following: + + * Testing. Please test your code. + + * Unit tests. Please consider writing some. The tests directory + has many examples that you can use as a starting point. + + * ovs-ofctl. If you add a feature that is useful for some + ovs-ofctl command then you should add support for it there. + + * Documentation. If you add a user-visible feature, then you + should document it in the appropriate manpage and mention it in + NEWS as well. + + * Coding style (see the [CodingStyle](CodingStyle.md) file at the + top of the source tree). + + * The patch submission guidelines (see CONTRIBUTING). I + recommend using "git send-email", which automatically follows a + lot of those guidelines. + +Bug Reporting +------------- + +Please report problems to bugs@openvswitch.org. diff --git a/PORTING b/PORTING deleted file mode 100644 index 79b1aaf84..000000000 --- a/PORTING +++ /dev/null @@ -1,324 +0,0 @@ - How to Port Open vSwitch to New Software or Hardware - ==================================================== - -Open vSwitch (OVS) is intended to be easily ported to new software and -hardware platforms. This document describes the types of changes that -are most likely to be necessary in porting OVS to Unix-like platforms. -(Porting OVS to other kinds of platforms is likely to be more -difficult.) - - -Vocabulary ----------- - -For historical reasons, different words are used for essentially the -same concept in different areas of the Open vSwitch source tree. 
Here -is a concordance, indexed by the area of the source tree: - - datapath/ vport --- - vswitchd/ iface port - ofproto/ port bundle - ofproto/bond.c slave bond - lib/lacp.c slave lacp - lib/netdev.c netdev --- - database Interface Port - - -Open vSwitch Architectural Overview ------------------------------------ - -The following diagram shows the very high-level architecture of Open -vSwitch from a porter's perspective. - - +-------------------+ - | ovs-vswitchd |<-->ovsdb-server - +-------------------+ - | ofproto |<-->OpenFlow controllers - +--------+-+--------+ - | netdev | | ofproto| - +--------+ |provider| - | netdev | +--------+ - |provider| - +--------+ - -Some of the components are generic. Modulo bugs or inadequacies, -these components should not need to be modified as part of a port: - - - "ovs-vswitchd" is the main Open vSwitch userspace program, in - vswitchd/. It reads the desired Open vSwitch configuration from - the ovsdb-server program over an IPC channel and passes this - configuration down to the "ofproto" library. It also passes - certain status and statistical information from ofproto back - into the database. - - - "ofproto" is the Open vSwitch library, in ofproto/, that - implements an OpenFlow switch. It talks to OpenFlow controllers - over the network and to switch hardware or software through an - "ofproto provider", explained further below. - - - "netdev" is the Open vSwitch library, in lib/netdev.c, that - abstracts interacting with network devices, that is, Ethernet - interfaces. The netdev library is a thin layer over "netdev - provider" code, explained further below. - -The other components may need attention during a port. You will -almost certainly have to implement a "netdev provider". Depending on -the type of port you are doing and the desired performance, you may -also have to implement an "ofproto provider" or a lower-level -component called a "dpif" provider. 
- -The following sections talk about these components in more detail. - - -Writing a netdev Provider -------------------------- - -A "netdev provider" implements an operating system and hardware -specific interface to "network devices", e.g. eth0 on Linux. Open -vSwitch must be able to open each port on a switch as a netdev, so you -will need to implement a "netdev provider" that works with your switch -hardware and software. - -struct netdev_class, in lib/netdev-provider.h, defines the interfaces -required to implement a netdev. That structure contains many function -pointers, each of which has a comment that is meant to describe its -behavior in detail. If the requirements are unclear, please report -this as a bug. - -The netdev interface can be divided into a few rough categories: - - * Functions required to properly implement OpenFlow features. For - example, OpenFlow requires the ability to report the Ethernet - hardware address of a port. These functions must be implemented - for minimally correct operation. - - * Functions required to implement optional Open vSwitch features. - For example, the Open vSwitch support for in-band control - requires netdev support for inspecting the TCP/IP stack's ARP - table. These functions must be implemented if the corresponding - OVS features are to work, but may be omitted initially. - - * Functions needed in some implementations but not in others. For - example, most kinds of ports (see below) do not need - functionality to receive packets from a network device. - -The existing netdev implementations may serve as useful examples -during a port: - - * lib/netdev-linux.c implements netdev functionality for Linux - network devices, using Linux kernel calls. It may be a good - place to start for full-featured netdev implementations. - - * lib/netdev-vport.c provides support for "virtual ports" - implemented by the Open vSwitch datapath module for the Linux - kernel. 
This may serve as a model for minimal netdev - implementations. - - * lib/netdev-dummy.c is a fake netdev implementation useful only - for testing. - - -Porting Strategies ------------------- - -After a netdev provider has been implemented for a system's network -devices, you may choose among three basic porting strategies. - -The lowest-effort strategy is to use the "userspace switch" -implementation built into Open vSwitch. This ought to work, without -writing any more code, as long as the netdev provider that you -implemented supports receiving packets. It yields poor performance, -however, because every packet passes through the ovs-vswitchd process. -See INSTALL.userspace for instructions on how to configure a userspace -switch. - -If the userspace switch is not the right choice for your port, then -you will have to write more code. You may implement either an -"ofproto provider" or a "dpif provider". Which you should choose -depends on a few different factors: - - * Only an ofproto provider can take full advantage of hardware - with built-in support for wildcards (e.g. an ACL table or a - TCAM). - - * A dpif provider can take advantage of the Open vSwitch built-in - implementations of bonding, LACP, 802.1ag, 802.1Q VLANs, and - other features. An ofproto provider has to provide its own - implementations, if the hardware can support them at all. - - * A dpif provider is usually easier to implement, but most - appropriate for software switching. It "explodes" wildcard - rules into exact-match entries (with an optional wildcard mask). - This allows fast hash lookups in software, but makes - inefficient use of TCAMs in hardware that support wildcarding. - -The following sections describe how to implement each kind of port. - - -ofproto Providers ------------------ - -An "ofproto provider" is what ofproto uses to directly monitor and -control an OpenFlow-capable switch. 
struct ofproto_class, in -ofproto/ofproto-provider.h, defines the interfaces to implement an -ofproto provider for new hardware or software. That structure contains -many function pointers, each of which has a comment that is meant to -describe its behavior in detail. If the requirements are unclear, -please report this as a bug. - -The ofproto provider interface is preliminary. Please let us know if -it seems unsuitable for your purpose. We will try to improve it. - - -Writing a dpif Provider ------------------------ - -Open vSwitch has a built-in ofproto provider named "ofproto-dpif", -which is built on top of a library for manipulating datapaths, called -"dpif". A "datapath" is a simple flow table, one that is only required -to support exact-match flows, that is, flows without wildcards. When a -packet arrives on a network device, the datapath looks for it in this -table. If there is a match, then it performs the associated actions. -If there is no match, the datapath passes the packet up to ofproto-dpif, -which maintains the full OpenFlow flow table. If the packet matches in -this flow table, then ofproto-dpif executes its actions and inserts a -new entry into the dpif flow table. (Otherwise, ofproto-dpif passes the -packet up to ofproto to send the packet to the OpenFlow controller, if -one is configured.) - -When calculating the dpif flow, ofproto-dpif generates an exact-match -flow that describes the missed packet. It makes an effort to figure out -what fields can be wildcarded based on the switch's configuration and -OpenFlow flow table. The dpif is free to ignore the suggested wildcards -and only support the exact-match entry. However, if the dpif supports -wildcarding, then it can use the masks to match multiple flows with -fewer entries and potentially significantly reduce the number of flow -misses handled by ofproto-dpif. - -The "dpif" library in turn delegates much of its functionality to a -"dpif provider". 
The following diagram shows how dpif providers fit -into the Open vSwitch architecture: - - _ - | +-------------------+ - | | ovs-vswitchd |<-->ovsdb-server - | +-------------------+ - | | ofproto |<-->OpenFlow controllers - | +--------+-+--------+ _ - | | netdev | |ofproto-| | - userspace | +--------+ | dpif | | - | | netdev | +--------+ | - | |provider| | dpif | | - | +---||---+ +--------+ | - | || | dpif | | implementation of - | || |provider| | ofproto provider - |_ || +---||---+ | - || || | - _ +---||-----+---||---+ | - | | |datapath| | - kernel | | +--------+ _| - | | | - |_ +--------||---------+ - || - physical - NIC - -struct dpif_class, in lib/dpif-provider.h, defines the interfaces -required to implement a dpif provider for new hardware or software. -That structure contains many function pointers, each of which has a -comment that is meant to describe its behavior in detail. If the -requirements are unclear, please report this as a bug. - -There are two existing dpif implementations that may serve as -useful examples during a port: - - * lib/dpif-netlink.c is a Linux-specific dpif implementation that - talks to an Open vSwitch-specific kernel module (whose sources - are in the "datapath" directory). The kernel module performs - all of the switching work, passing packets that do not match any - flow table entry up to userspace. This dpif implementation is - essentially a wrapper around calls into the kernel module. - - * lib/dpif-netdev.c is a generic dpif implementation that performs - all switching internally. This is how the Open vSwitch - userspace switch is implemented. - - -Miscellaneous Notes -------------------- - -Open vSwitch source code uses uint16_t, uint32_t, and uint64_t as -fixed-width types in host byte order, and ovs_be16, ovs_be32, and -ovs_be64 as fixed-width types in network byte order. Each of the -latter is equivalent to the one of the former, but the difference in -name makes the intended use obvious. 
- -The default "fail-mode" for Open vSwitch bridges is "standalone", -meaning that, when the OpenFlow controllers cannot be contacted, Open -vSwitch acts as a regular MAC-learning switch. This works well in -virtualization environments where there is normally just one uplink -(either a single physical interface or a bond). In a more general -environment, it can create loops. So, if you are porting to a -general-purpose switch platform, you should consider changing the -default "fail-mode" to "secure", which does not behave this way. See -documentation for the "fail-mode" column in the Bridge table in -ovs-vswitchd.conf.db(5) for more information. - -lib/entropy.c assumes that it can obtain high-quality random number -seeds at startup by reading from /dev/urandom. You will need to -modify it if this is not true on your platform. - -vswitchd/system-stats.c only knows how to obtain some statistics on -Linux. Optionally you may implement them for your platform as well. - - -Why OVS Does Not Support Hybrid Providers ------------------------------------------ - -The "Porting Strategies" section above describes the "ofproto -provider" and "dpif provider" porting strategies. Only an ofproto -provider can take advantage of hardware TCAM support, and only a dpif -provider can take advantage of the OVS built-in implementations of -various features. It is therefore tempting to suggest a hybrid -approach that shares the advantages of both strategies. - -However, Open vSwitch does not support a hybrid approach. Doing so -may be possible, with a significant amount of extra development work, -but it does not yet seem worthwhile, for the reasons explained below. - -First, user surprise is likely when a switch supports a feature only -with a high performance penalty. For example, one user questioned why -adding a particular OpenFlow action to a flow caused a 1,058x slowdown -on a hardware OpenFlow implementation [1]. The action required the -flow to be implemented in software. 
- -Given that implementing a flow in software on the slow management CPU -of a hardware switch causes a major slowdown, software-implemented -flows would only make sense for very low-volume traffic. But many of -the features built into the OVS software switch implementation would -need to apply to every flow to be useful. There is no value, for -example, in applying bonding or 802.1Q VLAN support only to low-volume -traffic. - -Besides supporting features of OpenFlow actions, a hybrid approach -could also support forms of matching not supported by particular -switching hardware, by sending all packets that might match a rule to -software. But again this can cause an unacceptable slowdown by -forcing bulk traffic through software in the hardware switch's slow -management CPU. Consider, for example, a hardware switch that can -match on the IPv6 Ethernet type but not on fields in IPv6 headers. An -OpenFlow table that matched on the IPv6 Ethernet type would perform -well, but adding a rule that matched only UDPv6 would force every IPv6 -packet to software, slowing down not just UDPv6 but all IPv6 -processing. - -[1] Aaron Rosen, "Modify packet fields extremely slow", - openflow-discuss mailing list, June 26, 2011, archived at - https://mailman.stanford.edu/pipermail/openflow-discuss/2011-June/002386.html. - - -Questions ---------- - -Please direct porting questions to dev@openvswitch.org. We will try -to use questions to improve this porting guide. diff --git a/PORTING.md b/PORTING.md new file mode 100644 index 000000000..7e5de00d3 --- /dev/null +++ b/PORTING.md @@ -0,0 +1,324 @@ +How to Port Open vSwitch to New Software or Hardware +==================================================== + +Open vSwitch (OVS) is intended to be easily ported to new software and +hardware platforms. This document describes the types of changes that +are most likely to be necessary in porting OVS to Unix-like platforms. 
+(Porting OVS to other kinds of platforms is likely to be more +difficult.) + + +Vocabulary +---------- + +For historical reasons, different words are used for essentially the +same concept in different areas of the Open vSwitch source tree. Here +is a concordance, indexed by the area of the source tree: + + datapath/ vport --- + vswitchd/ iface port + ofproto/ port bundle + ofproto/bond.c slave bond + lib/lacp.c slave lacp + lib/netdev.c netdev --- + database Interface Port + + +Open vSwitch Architectural Overview +----------------------------------- + +The following diagram shows the very high-level architecture of Open +vSwitch from a porter's perspective. + + +-------------------+ + | ovs-vswitchd |<-->ovsdb-server + +-------------------+ + | ofproto |<-->OpenFlow controllers + +--------+-+--------+ + | netdev | | ofproto| + +--------+ |provider| + | netdev | +--------+ + |provider| + +--------+ + +Some of the components are generic. Modulo bugs or inadequacies, +these components should not need to be modified as part of a port: + + - "ovs-vswitchd" is the main Open vSwitch userspace program, in + vswitchd/. It reads the desired Open vSwitch configuration from + the ovsdb-server program over an IPC channel and passes this + configuration down to the "ofproto" library. It also passes + certain status and statistical information from ofproto back + into the database. + + - "ofproto" is the Open vSwitch library, in ofproto/, that + implements an OpenFlow switch. It talks to OpenFlow controllers + over the network and to switch hardware or software through an + "ofproto provider", explained further below. + + - "netdev" is the Open vSwitch library, in lib/netdev.c, that + abstracts interacting with network devices, that is, Ethernet + interfaces. The netdev library is a thin layer over "netdev + provider" code, explained further below. + +The other components may need attention during a port. You will +almost certainly have to implement a "netdev provider". 
Depending on +the type of port you are doing and the desired performance, you may +also have to implement an "ofproto provider" or a lower-level +component called a "dpif" provider. + +The following sections talk about these components in more detail. + + +Writing a netdev Provider +------------------------- + +A "netdev provider" implements an operating system and hardware +specific interface to "network devices", e.g. eth0 on Linux. Open +vSwitch must be able to open each port on a switch as a netdev, so you +will need to implement a "netdev provider" that works with your switch +hardware and software. + +struct netdev_class, in lib/netdev-provider.h, defines the interfaces +required to implement a netdev. That structure contains many function +pointers, each of which has a comment that is meant to describe its +behavior in detail. If the requirements are unclear, please report +this as a bug. + +The netdev interface can be divided into a few rough categories: + + * Functions required to properly implement OpenFlow features. For + example, OpenFlow requires the ability to report the Ethernet + hardware address of a port. These functions must be implemented + for minimally correct operation. + + * Functions required to implement optional Open vSwitch features. + For example, the Open vSwitch support for in-band control + requires netdev support for inspecting the TCP/IP stack's ARP + table. These functions must be implemented if the corresponding + OVS features are to work, but may be omitted initially. + + * Functions needed in some implementations but not in others. For + example, most kinds of ports (see below) do not need + functionality to receive packets from a network device. + +The existing netdev implementations may serve as useful examples +during a port: + + * lib/netdev-linux.c implements netdev functionality for Linux + network devices, using Linux kernel calls. It may be a good + place to start for full-featured netdev implementations. 
+ + * lib/netdev-vport.c provides support for "virtual ports" + implemented by the Open vSwitch datapath module for the Linux + kernel. This may serve as a model for minimal netdev + implementations. + + * lib/netdev-dummy.c is a fake netdev implementation useful only + for testing. + + +Porting Strategies +------------------ + +After a netdev provider has been implemented for a system's network +devices, you may choose among three basic porting strategies. + +The lowest-effort strategy is to use the "userspace switch" +implementation built into Open vSwitch. This ought to work, without +writing any more code, as long as the netdev provider that you +implemented supports receiving packets. It yields poor performance, +however, because every packet passes through the ovs-vswitchd process. +See [INSTALL.userspace](INSTALL.userspace.md) for instructions on how +to configure a userspace switch. + +If the userspace switch is not the right choice for your port, then +you will have to write more code. You may implement either an +"ofproto provider" or a "dpif provider". Which you should choose +depends on a few different factors: + + * Only an ofproto provider can take full advantage of hardware + with built-in support for wildcards (e.g. an ACL table or a + TCAM). + + * A dpif provider can take advantage of the Open vSwitch built-in + implementations of bonding, LACP, 802.1ag, 802.1Q VLANs, and + other features. An ofproto provider has to provide its own + implementations, if the hardware can support them at all. + + * A dpif provider is usually easier to implement, but most + appropriate for software switching. It "explodes" wildcard + rules into exact-match entries (with an optional wildcard mask). + This allows fast hash lookups in software, but makes + inefficient use of TCAMs in hardware that support wildcarding. + +The following sections describe how to implement each kind of port. 
+ + +ofproto Providers +----------------- + +An "ofproto provider" is what ofproto uses to directly monitor and +control an OpenFlow-capable switch. struct ofproto_class, in +ofproto/ofproto-provider.h, defines the interfaces to implement an +ofproto provider for new hardware or software. That structure contains +many function pointers, each of which has a comment that is meant to +describe its behavior in detail. If the requirements are unclear, +please report this as a bug. + +The ofproto provider interface is preliminary. Please let us know if +it seems unsuitable for your purpose. We will try to improve it. + + +Writing a dpif Provider +----------------------- + +Open vSwitch has a built-in ofproto provider named "ofproto-dpif", +which is built on top of a library for manipulating datapaths, called +"dpif". A "datapath" is a simple flow table, one that is only required +to support exact-match flows, that is, flows without wildcards. When a +packet arrives on a network device, the datapath looks for it in this +table. If there is a match, then it performs the associated actions. +If there is no match, the datapath passes the packet up to ofproto-dpif, +which maintains the full OpenFlow flow table. If the packet matches in +this flow table, then ofproto-dpif executes its actions and inserts a +new entry into the dpif flow table. (Otherwise, ofproto-dpif passes the +packet up to ofproto to send the packet to the OpenFlow controller, if +one is configured.) + +When calculating the dpif flow, ofproto-dpif generates an exact-match +flow that describes the missed packet. It makes an effort to figure out +what fields can be wildcarded based on the switch's configuration and +OpenFlow flow table. The dpif is free to ignore the suggested wildcards +and only support the exact-match entry. 
However, if the dpif supports +wildcarding, then it can use the masks to match multiple flows with +fewer entries and potentially significantly reduce the number of flow +misses handled by ofproto-dpif. + +The "dpif" library in turn delegates much of its functionality to a +"dpif provider". The following diagram shows how dpif providers fit +into the Open vSwitch architecture: + + _ + | +-------------------+ + | | ovs-vswitchd |<-->ovsdb-server + | +-------------------+ + | | ofproto |<-->OpenFlow controllers + | +--------+-+--------+ _ + | | netdev | |ofproto-| | + userspace | +--------+ | dpif | | + | | netdev | +--------+ | + | |provider| | dpif | | + | +---||---+ +--------+ | + | || | dpif | | implementation of + | || |provider| | ofproto provider + |_ || +---||---+ | + || || | + _ +---||-----+---||---+ | + | | |datapath| | + kernel | | +--------+ _| + | | | + |_ +--------||---------+ + || + physical + NIC + +struct dpif_class, in lib/dpif-provider.h, defines the interfaces +required to implement a dpif provider for new hardware or software. +That structure contains many function pointers, each of which has a +comment that is meant to describe its behavior in detail. If the +requirements are unclear, please report this as a bug. + +There are two existing dpif implementations that may serve as +useful examples during a port: + + * lib/dpif-netlink.c is a Linux-specific dpif implementation that + talks to an Open vSwitch-specific kernel module (whose sources + are in the "datapath" directory). The kernel module performs + all of the switching work, passing packets that do not match any + flow table entry up to userspace. This dpif implementation is + essentially a wrapper around calls into the kernel module. + + * lib/dpif-netdev.c is a generic dpif implementation that performs + all switching internally. This is how the Open vSwitch + userspace switch is implemented. 
+ + +Miscellaneous Notes +------------------- + +Open vSwitch source code uses uint16_t, uint32_t, and uint64_t as +fixed-width types in host byte order, and ovs_be16, ovs_be32, and +ovs_be64 as fixed-width types in network byte order. Each of the +latter is equivalent to the corresponding one of the former, but the +difference in name makes the intended use obvious. + +The default "fail-mode" for Open vSwitch bridges is "standalone", +meaning that, when the OpenFlow controllers cannot be contacted, Open +vSwitch acts as a regular MAC-learning switch. This works well in +virtualization environments where there is normally just one uplink +(either a single physical interface or a bond). In a more general +environment, it can create loops. So, if you are porting to a +general-purpose switch platform, you should consider changing the +default "fail-mode" to "secure", which does not behave this way. See +documentation for the "fail-mode" column in the Bridge table in +ovs-vswitchd.conf.db(5) for more information. + +lib/entropy.c assumes that it can obtain high-quality random number +seeds at startup by reading from /dev/urandom. You will need to +modify it if this is not true on your platform. + +vswitchd/system-stats.c only knows how to obtain some statistics on +Linux. Optionally you may implement them for your platform as well. + + +Why OVS Does Not Support Hybrid Providers +----------------------------------------- + +The "Porting Strategies" section above describes the "ofproto +provider" and "dpif provider" porting strategies. Only an ofproto +provider can take advantage of hardware TCAM support, and only a dpif +provider can take advantage of the OVS built-in implementations of +various features. It is therefore tempting to suggest a hybrid +approach that shares the advantages of both strategies. + +However, Open vSwitch does not support a hybrid approach.
Doing so +may be possible, with a significant amount of extra development work, +but it does not yet seem worthwhile, for the reasons explained below. + +First, user surprise is likely when a switch supports a feature only +with a high performance penalty. For example, one user questioned why +adding a particular OpenFlow action to a flow caused a 1,058x slowdown +on a hardware OpenFlow implementation [1]. The action required the +flow to be implemented in software. + +Given that implementing a flow in software on the slow management CPU +of a hardware switch causes a major slowdown, software-implemented +flows would only make sense for very low-volume traffic. But many of +the features built into the OVS software switch implementation would +need to apply to every flow to be useful. There is no value, for +example, in applying bonding or 802.1Q VLAN support only to low-volume +traffic. + +Besides supporting features of OpenFlow actions, a hybrid approach +could also support forms of matching not supported by particular +switching hardware, by sending all packets that might match a rule to +software. But again this can cause an unacceptable slowdown by +forcing bulk traffic through software in the hardware switch's slow +management CPU. Consider, for example, a hardware switch that can +match on the IPv6 Ethernet type but not on fields in IPv6 headers. An +OpenFlow table that matched on the IPv6 Ethernet type would perform +well, but adding a rule that matched only UDPv6 would force every IPv6 +packet to software, slowing down not just UDPv6 but all IPv6 +processing. + +[1] Aaron Rosen, "Modify packet fields extremely slow", + openflow-discuss mailing list, June 26, 2011, archived at + https://mailman.stanford.edu/pipermail/openflow-discuss/2011-June/002386.html. + + +Questions +--------- + +Please direct porting questions to dev@openvswitch.org. We will try +to use questions to improve this porting guide. 
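The miss handling described under "Writing a dpif Provider" in PORTING.md can be sketched in C roughly as follows. This is a toy model under stated assumptions, not OVS code: every name (`toy_key`, `toy_datapath`, `toy_dp_receive`, and so on) is hypothetical, the key has only two fields, the "full OpenFlow table" is reduced to one hard-coded wildcard rule, and wildcard masks in the datapath are left out entirely.

```c
/* Toy model of the datapath / slow-path split.  All names are hypothetical
 * stand-ins chosen for illustration; none of them is an Open vSwitch API. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_FLOWS 16
#define TOY_NO_PORT 0           /* Returned when no rule forwards the packet. */

/* Exact-match key.  A real datapath key carries many more fields. */
struct toy_key {
    uint32_t in_port;
    uint32_t ip_dst;
};

struct toy_flow {
    struct toy_key key;
    uint32_t out_port;          /* The only action modeled: output to a port. */
};

struct toy_datapath {
    struct toy_flow flows[TOY_MAX_FLOWS];
    size_t n_flows;
    unsigned int n_misses;      /* Packets handed up to the slow path. */
};

/* Stand-in for the full flow table kept by the slow path: one wildcarded
 * rule, "ip_dst in 10.0.0.0/8 -> output:2".  Anything else does not match. */
static bool
toy_slow_path_lookup(const struct toy_key *key, uint32_t *out_port)
{
    if ((key->ip_dst & 0xff000000u) == 0x0a000000u) {
        *out_port = 2;
        return true;
    }
    return false;
}

/* Exact-match lookup in the datapath table (linear scan for brevity). */
static const struct toy_flow *
toy_dp_lookup(const struct toy_datapath *dp, const struct toy_key *key)
{
    for (size_t i = 0; i < dp->n_flows; i++) {
        if (dp->flows[i].key.in_port == key->in_port
            && dp->flows[i].key.ip_dst == key->ip_dst) {
            return &dp->flows[i];
        }
    }
    return NULL;
}

/* Handles one packet and returns its output port, or TOY_NO_PORT if neither
 * table matches (the case where the controller would be consulted). */
uint32_t
toy_dp_receive(struct toy_datapath *dp, const struct toy_key *key)
{
    const struct toy_flow *flow = toy_dp_lookup(dp, key);
    uint32_t out_port;

    if (flow) {
        return flow->out_port;              /* Hit: no slow-path work. */
    }

    dp->n_misses++;                         /* Miss: ask the slow path. */
    if (!toy_slow_path_lookup(key, &out_port)) {
        return TOY_NO_PORT;
    }
    if (dp->n_flows < TOY_MAX_FLOWS) {      /* Install an exact-match entry. */
        dp->flows[dp->n_flows].key = *key;
        dp->flows[dp->n_flows].out_port = out_port;
        dp->n_flows++;
    }
    return out_port;
}
```

Calling `toy_dp_receive()` twice with the same key on a zero-initialized `struct toy_datapath` increments `n_misses` only on the first call; the second call is served from the installed exact-match entry. A datapath that honored wildcard masks would install one masked entry instead of one entry per distinct key, which is the reduction in flow misses that the guide describes.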
diff --git a/README-lisp b/README-lisp deleted file mode 100644 index f1e1172a8..000000000 --- a/README-lisp +++ /dev/null @@ -1,81 +0,0 @@ -Using LISP tunneling -==================== - -LISP is a layer 3 tunneling mechanism, meaning that encapsulated packets do -not carry Ethernet headers, and ARP requests shouldn't be sent over the -tunnel. Because of this, there are some additional steps required for setting -up LISP tunnels in Open vSwitch, until support for L3 tunnels will improve. - -This guide assumes tunneling between two VMs connected to OVS bridges on -different hypervisors reachable over IPv4. Of course, more than one VM may be -connected to any of the hypervisors, and a hypervisor may communicate with -several different hypervisors over the same lisp tunneling interface. A LISP -"map-cache" can be implemented using flows, see example at the bottom of this -file. - -There are several scenarios: - - 1) the VMs have IP addresses in the same subnet and the hypervisors are also - in a single subnet (although one different from the VM's); - 2) the VMs have IP addresses in the same subnet but the hypervisors are - separated by a router; - 3) the VMs are in different subnets. - -In cases 1) and 3) ARP resolution can work as normal: ARP traffic is -configured not to go through the LISP tunnel. For case 1) ARP is able to -reach the other VM, if both OVS instances default to MAC address learning. -Case 3) requires the hypervisor be configured as the default router for the -VMs. - -In case 2) the VMs expect ARP replies from each other, but this is not -possible over a layer 3 tunnel. One solution is to have static MAC address -entries preconfigured on the VMs (e.g., `arp -f /etc/ethers` on startup on -Unix based VMs), or have the hypervisor do proxy ARP. In this scenario, the -eth0 interfaces need not be added to the br0 bridge in the examples below. - -On the receiving side, the packet arrives without the original MAC header. 
-The LISP tunneling code attaches a header with harcoded source and destination -MAC address 02:00:00:00:00:00. This address has all bits set to 0, except the -locally administered bit, in order to avoid potential collisions with existing -allocations. In order for packets to reach their intended destination, the -destination MAC address needs to be rewritten. This can be done using the -flow table. - -See below for an example setup, and the associated flow rules to enable LISP -tunneling. - - +---+ +---+ - |VM1| |VM2| - +---+ +---+ - | | - +--[tap0]--+ +--[tap0]---+ - | | | | - [lisp0] OVS1 [eth0]-----------------[eth0] OVS2 [lisp0] - | | | | - +----------+ +-----------+ - -On each hypervisor, interfaces tap0, eth0, and lisp0 are added to a single -bridge instance, and become numbered 1, 2, and 3 respectively: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 tap0 - ovs-vsctl add-port br0 eth0 - ovs-vsctl add-port br0 lisp0 -- set Interface lisp0 type=lisp options:remote_ip=flow options:key=flow - -The last command sets up flow based tunneling on the lisp0 interface. From -the LISP point of view, this is like having the Tunnel Router map cache -implemented as flow rules. - -Flows on br0 should be configured as follows: - - priority=3,dl_dst=02:00:00:00:00:00,action=mod_dl_dst:,output:1 - priority=2,in_port=1,dl_type=0x0806,action=NORMAL - priority=1,in_port=1,dl_type=0x0800,vlan_tci=0,nw_src=,action=set_field:->tun_dst,output:3 - priority=0,action=NORMAL - -The third rule is like a map cache entry: the specified by the -nw_src match field is mapped to the RLOC , which is set as the tunnel -destination for this particular flow. - -Optionally, if you want to use Instance ID in a flow, you can add -"set_tunnel:" to the action list. 
diff --git a/README-lisp.md b/README-lisp.md new file mode 100644 index 000000000..f1e1172a8 --- /dev/null +++ b/README-lisp.md @@ -0,0 +1,81 @@ +Using LISP tunneling +==================== + +LISP is a layer 3 tunneling mechanism, meaning that encapsulated packets do +not carry Ethernet headers, and ARP requests shouldn't be sent over the +tunnel. Because of this, there are some additional steps required for setting +up LISP tunnels in Open vSwitch, until support for L3 tunnels will improve. + +This guide assumes tunneling between two VMs connected to OVS bridges on +different hypervisors reachable over IPv4. Of course, more than one VM may be +connected to any of the hypervisors, and a hypervisor may communicate with +several different hypervisors over the same lisp tunneling interface. A LISP +"map-cache" can be implemented using flows, see example at the bottom of this +file. + +There are several scenarios: + + 1) the VMs have IP addresses in the same subnet and the hypervisors are also + in a single subnet (although one different from the VM's); + 2) the VMs have IP addresses in the same subnet but the hypervisors are + separated by a router; + 3) the VMs are in different subnets. + +In cases 1) and 3) ARP resolution can work as normal: ARP traffic is +configured not to go through the LISP tunnel. For case 1) ARP is able to +reach the other VM, if both OVS instances default to MAC address learning. +Case 3) requires the hypervisor be configured as the default router for the +VMs. + +In case 2) the VMs expect ARP replies from each other, but this is not +possible over a layer 3 tunnel. One solution is to have static MAC address +entries preconfigured on the VMs (e.g., `arp -f /etc/ethers` on startup on +Unix based VMs), or have the hypervisor do proxy ARP. In this scenario, the +eth0 interfaces need not be added to the br0 bridge in the examples below. + +On the receiving side, the packet arrives without the original MAC header. 
+The LISP tunneling code attaches a header with harcoded source and destination +MAC address 02:00:00:00:00:00. This address has all bits set to 0, except the +locally administered bit, in order to avoid potential collisions with existing +allocations. In order for packets to reach their intended destination, the +destination MAC address needs to be rewritten. This can be done using the +flow table. + +See below for an example setup, and the associated flow rules to enable LISP +tunneling. + + +---+ +---+ + |VM1| |VM2| + +---+ +---+ + | | + +--[tap0]--+ +--[tap0]---+ + | | | | + [lisp0] OVS1 [eth0]-----------------[eth0] OVS2 [lisp0] + | | | | + +----------+ +-----------+ + +On each hypervisor, interfaces tap0, eth0, and lisp0 are added to a single +bridge instance, and become numbered 1, 2, and 3 respectively: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 tap0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 lisp0 -- set Interface lisp0 type=lisp options:remote_ip=flow options:key=flow + +The last command sets up flow based tunneling on the lisp0 interface. From +the LISP point of view, this is like having the Tunnel Router map cache +implemented as flow rules. + +Flows on br0 should be configured as follows: + + priority=3,dl_dst=02:00:00:00:00:00,action=mod_dl_dst:,output:1 + priority=2,in_port=1,dl_type=0x0806,action=NORMAL + priority=1,in_port=1,dl_type=0x0800,vlan_tci=0,nw_src=,action=set_field:->tun_dst,output:3 + priority=0,action=NORMAL + +The third rule is like a map cache entry: the specified by the +nw_src match field is mapped to the RLOC , which is set as the tunnel +destination for this particular flow. + +Optionally, if you want to use Instance ID in a flow, you can add +"set_tunnel:" to the action list. diff --git a/README.md b/README.md index 538613062..5084c310d 100644 --- a/README.md +++ b/README.md @@ -81,24 +81,27 @@ To install Open vSwitch on a regular Linux or FreeBSD host, please read INSTALL. 
For specifics around installation on a specific platform, please see one of these files: -- INSTALL.Debian -- INSTALL.Fedora -- INSTALL.RHEL -- INSTALL.XenServer +- [INSTALL.Debian](INSTALL.Debian.md) +- [INSTALL.Fedora](INSTALL.Fedora.md) +- [INSTALL.RHEL](INSTALL.RHEL.md) +- [INSTALL.XenServer](INSTALL.XenServer.md) To use Open vSwitch... -- ...with Docker on Linux, read INSTALL.Docker. +- ...with Docker on Linux, read [INSTALL.Docker](INSTALL.Docker.md) -- ...with KVM on Linux, read INSTALL, read INSTALL.KVM. +- ...with KVM on Linux, read [INSTALL](INSTALL.md), read + [INSTALL.KVM](INSTALL.KVM.md) -- ...with Libvirt, read INSTALL.Libvirt. +- ...with Libvirt, read [INSTALL.Libvirt](INSTALL.Libvirt.md). -- ...without using a kernel module, read INSTALL.userspace. +- ...without using a kernel module, read +[INSTALL.userspace](INSTALL.userspace.md). For answers to common questions, read FAQ. -To learn how to set up SSL support for Open vSwitch, read INSTALL.SSL. +To learn how to set up SSL support for Open vSwitch, read +[INSTALL.SSL](INSTALL.SSL.md). To learn about some advanced features of the Open vSwitch software switch, read the tutorial in tutorial/Tutorial. diff --git a/REPORTING-BUGS b/REPORTING-BUGS deleted file mode 100644 index 86510d276..000000000 --- a/REPORTING-BUGS +++ /dev/null @@ -1,56 +0,0 @@ -Reporting Bugs in Open vSwitch -============================== - -We are eager to hear from users about problems that they have -encountered with Open vSwitch. This file documents how best to report -bugs so as to ensure that they can be fixed as quickly as possible. - -Please report bugs by sending email to bugs@openvswitch.org. - -The most important parts of your bug report are the following: - - * What you did that make the problem appear. - - * What you expected to happen. - - * What actually happened. - -Please also include the following information: - - * The Open vSwitch version number (as output by "ovs-vswitchd - --version"). 
- - * The Git commit number (as output by "git rev-parse HEAD"), - if you built from a Git snapshot. - - * Any local patches or changes you have applied (if any). - -The following are also handy sometimes: - - * The kernel version on which Open vSwitch is running (from - /proc/version) and the distribution and version number of - your OS (e.g. "Centos 5.0"). - - * The contents of the vswitchd configuration database (usually - /etc/openvswitch/conf.db). - - * The output of "ovs-dpctl show". - - * If you have Open vSwitch configured to connect to an - OpenFlow controller, the output of "ovs-ofctl show " - for each configured in the vswitchd configuration - database. - - * A fix or workaround, if you have one. - - * Any other information that you think might be relevant. - -bugs@openvswitch.org is a public mailing list, to which anyone can -subscribe, so please do not include confidential information in your -bug report. - -Contact -------- - -bugs@openvswitch.org -http://openvswitch.org/ diff --git a/REPORTING-BUGS.md b/REPORTING-BUGS.md new file mode 100644 index 000000000..4e08cb1e0 --- /dev/null +++ b/REPORTING-BUGS.md @@ -0,0 +1,56 @@ +Reporting Bugs in Open vSwitch +============================== + +We are eager to hear from users about problems that they have +encountered with Open vSwitch. This file documents how best to report +bugs so as to ensure that they can be fixed as quickly as possible. + +Please report bugs by sending email to bugs@openvswitch.org. + +The most important parts of your bug report are the following: + + * What you did that made the problem appear. + + * What you expected to happen. + + * What actually happened. + +Please also include the following information: + + * The Open vSwitch version number (as output by `ovs-vswitchd + --version`). + + * The Git commit number (as output by `git rev-parse HEAD`), + if you built from a Git snapshot. + + * Any local patches or changes you have applied (if any).
+ +The following are also handy sometimes: + + * The kernel version on which Open vSwitch is running (from + `/proc/version`) and the distribution and version number of + your OS (e.g. "CentOS 5.0"). + + * The contents of the vswitchd configuration database (usually + `/etc/openvswitch/conf.db`). + + * The output of `ovs-dpctl show`. + + * If you have Open vSwitch configured to connect to an + OpenFlow controller, the output of `ovs-ofctl show ` + for each `` configured in the vswitchd configuration + database. + + * A fix or workaround, if you have one. + + * Any other information that you think might be relevant. + +bugs@openvswitch.org is a public mailing list, to which anyone can +subscribe, so please do not include confidential information in your +bug report. + +Contact +------- + +bugs@openvswitch.org +http://openvswitch.org/ diff --git a/TODO b/TODO deleted file mode 100644 index e11089ac7..000000000 --- a/TODO +++ /dev/null @@ -1,285 +0,0 @@ - Open vSwitch Project Ideas - ========================== - -This file lists a number of project ideas for Open vSwitch. The ideas -here overlap somewhat with those in the OPENFLOW-1.1+ file. - - -Programming Project Ideas -========================= - -Each of these projects would ideally result in a patch or a short -series of them posted to ovs-dev. - -Please read CONTRIBUTING and CodingStyle in the top of the source tree -before you begin work. The OPENFLOW-1.1+ file also has an -introduction to how OpenFlow is implemented in Open vSwitch. It is -also a good idea to look around the source tree for related code, and -back through the Git history for commits on related subjects, to allow -you to follow existing patterns and conventions. - -Meters ------- - -Open vSwitch has OpenFlow protocol support for meters, but it does not -have an implementation in the kernel or userspace datapaths.
An -implementation was proposed some time ago (I recommend looking for the -discussion in the ovs-dev mailing list archives), but for a few -different reasons it was not accepted. Some of those reasons apply -only to a kernel implementation of meters. At the time, a userspace -implementation wasn't as interesting, because the userspace switch -did not perform at a production speed, but with the advent of -multithreaded forwarding and, now, DPDK support, userspace-only meters -would be a great way to get started. - -Improve SSL/TLS Security ------------------------- - -Open vSwitch allows some weak ciphers to be used for its secure -connections. Security audits often suggest that the project remove -those ciphers, but there's not a clean way to modify the acceptable -ciphers. At the very least, the cipher list should be audited, but it -would be nice to make it configurable. - -Open vSwitch does not insist on perfect forward security via ephemeral -Diffie-Hellman key exchange when it establishes an SSL/TLS connection. -Given the wiretapping revelations over the last year, it seems wise to -turn this on. (This would probably amount to finding the right -OpenSSL function to call or just reducing the acceptable ciphers -further.) - -These changes might have backward-compatibility implications; one -would have to test the behavior of the reduced cipher list OVS against -older versions. - -Bash Command Completion ------------------------ - -ovs-vsctl and other programs would be easier to use if bash command -completion (with ``tab'', etc.) were supported. Alex Wang - is leading a team for this project. - -Auxiliary Connections ---------------------- - -Auxiliary connections are a feature of OpenFlow 1.3 and later that -allow OpenFlow messages to be carried over datagram channels such as -UDP or DTLS. 
One place to start would be to implement a datagram -abstraction library for OVS analogous to the ``stream'' library -that already abstracts TCP, SSL, and other stream protocols. - -Controller connection logging to pcap file ------------------------------------------- - -http://patchwork.openvswitch.org/patch/2249/ is an RFC patch that -allows the switch to record the traffic on OpenFlow controller -connections to a pcap file for later analysis. The patch lacks a good -way to enable and disable the feature. The task here would be to add -that and repost the patch. - -Basic OpenFlow 1.4 support --------------------------- - -Some basic support for OpenFlow 1.4 is missing and needs to be -implemented. These can be found by looking through lib/ofp-util.c for -mentions of OFP14_VERSION followed by a call to OVS_NOT_REACHED (which -aborts the program). - -OpenFlow 1.4: Flow monitoring ------------------------------ - -OpenFlow 1.4 introduces OFPMP_FLOW_MONITOR for notifying a controller -of changes to selected flow tables. This feature is based on -NXST_FLOW_MONITOR that is already part of Open vSwitch, so to -implement this feature would be to extend that code to handle the -OpenFlow 1.4 wire protocol. - -OpenFlow 1.3 also includes this feature as a ONF-defined extension, so -ideally OVS would support that too. - -OpenFlow 1.4 Role Status Message --------------------------------- - -OpenFlow 1.4 section 7.4.4 ``Controller Role Status Message'' -defines a new message sent by a switch to notify the controller that -its role (whether it is a master or a slave) has changed. OVS should -implement this. - -OpenFlow 1.3 also includes this feature as a ONF-defined extension, so -ideally OVS would support that too. 
- -OpenFlow 1.4 Vacancy Events ---------------------------- - -OpenFlow 1.4 section 7.4.5 ``Table Status Message'' defines a new -message sent by a switch to notify the controller that a flow table is -close to filling up (or that it is no longer close to filling up). -OVS should implement this. - -OpenFlow 1.3 also includes this feature as a ONF-defined extension, so -ideally OVS would support that too. - -OpenFlow 1.4 Group and Meter Change Notification ------------------------------------------------- - -OpenFlow 1.4 adds a feature whereby a controller can ask the switch to -send it copies of messages that change groups and meters. (This is -only useful in the presence of multiple controllers.) OVS should -implement this. - -OpenFlow 1.3 also includes this feature as a ONF-defined extension, so -ideally OVS would support that too. - - -Testing Project Ideas -===================== - -Each of these projects would ideally result in confirmation that -features work or bug reports explaining how they do not. Please sent -bug reports to dev at openvswitch.org, with as many details as you have. - -ONF Plugfest Results Analysis ------------------------------ - -Ben Pfaff has a collection of files reporting Open vSwitch conformance -to OpenFlow 1.3 provided by one of the vendors at the ONF plugfest -last year. Some of the reported failures have been fixed, some of the -other failures probably result from differing interpretations of -OpenFlow 1.3, and others are probably genuine bugs in Open vSwitch. -Open vSwitch has also improved in the meantime. Ben can provide the -results, privately, to some person or team who wishes to check them -out and try to pick out the genuine bugs. - -OpenFlow Fuzzer ---------------- - -Build a ``fuzzer'' for the OpenFlow protocol (or use an existing -one, if there is one) and run it against the Open vSwitch -implementation. One could also build a fuzzer for the OSVDB protocol. 
- -Ryu Certification Tests Analysis --------------------------------- - -The Ryu controller comes with a suite of ``certification tests'' -that check the correctness of a switch's implementation of various -OpenFlow 1.3 features. The INSTALL file in the OVS source tree has a -section that explains how to easily run these tests against an OVS -source tree. Run the tests and figure out whether any tests fail but -should pass. (Some tests fail and should fail because OVS does not -implement the particular feature; for example, OVS does not implement -PBB encapsulation, so related tests fail.) - -OFTest Results Analysis ------------------------ - -OFTest is a test suite for OpenFlow 1.0 compliance. The INSTALL file -in the OVS source tree has a section that explains how to easily run -these tests against an OVS source tree. Run the tests and figure out -whether any tests fail but should pass, and ideally why. OFTest is -not particularly well vetted--in the past, at least, some tests have -failed against OVS due to bugs in OFTest, not in OVS--so some care is -warranted. - - -Documentation Project Ideas -=========================== - -Each of these projects would ideally result in creating some new -documentation for users. Some documentation might be suitable to -accompany Open vSwitch as part of its source tree most likely either -in plain text or ``nroff'' (manpage) format. - -OpenFlow Basics Tutorial ------------------------- - -Open vSwitch has a tutorial that covers its advanced features, but it -does not have a basic tutorial. There are several tutorials on the -Internet already, so a new tutorial would have to distinguish itself -in some way. One way would be to use the Open vSwitch ``sandbox'' -environment already used in the advanced tutorial. The sandbox does -not require any real network or even supervisor privilege on the -machine where it runs, and thus it is easy to use with hardly any -up-front setup, so it is a gentle way to get started. 
- -FlowVisor via patch ports -------------------------- - -FlowVisor is a proxy that sits between OpenFlow controllers and a -switch. It divides up switch resources, allowing each controller to -control a ``slice'' of the network. For example, it can break up a -network based on VLAN, allowing different controllers to handle -packets with different VLANs. - -It seems that Open vSwitch has features that allow it to implement at -least simple forms of FlowVisor control without any need for -FlowVisor. Consider an Open vSwitch instance with three bridges. -Bridge br0 has physical ports eth0 and eth1. Bridge v9 has no -physical ports, but it has two ``patch ports'' that connect it to -br0. Bridge v11 has the same setup. Flows in br0 match packets -received on vlan 9, strip the vlan header, and direct them to the -appropriate patch port leading to v9. Additional flows in br0 match -packets received from v9, attach a VLAN 9 tag to them, and direct them -out eth0 or eth1 as appropriate. Other flows in br0 treat packets on -VLAN 11 similarly. Controllers attached to bridge v9 or v11 may thus -work as if they had full control of a network. - -It seems to me that this is a good example of the power of OpenFlow -and Open vSwitch. The point of this project is to explain how to do -this, with detailed examples, in case someone finds it handy and to -open eyes toward the generality of Open vSwitch usefulness. - -``Cookbooks'' -------------- - -The Open vSwitch website has a few ``cookbook'' entries that -describe how to use Open vSwitch in a few scenarios. There are only a -few of these and all of them are dated. It would be a good idea to -come up with ideas for some more and write them. These could be added -to the Open vSwitch website or the source tree or somewhere else. - -Demos ------ - -Record a demo of Open vSwitch functionality in use (or something else -relevant) and post it to youtube or another video site so that we can -link to it from openvswitch.org. 
- - -How to contribute -================= - -If you plan to contribute code for a feature, please let everyone know -on ovs-dev before you start work. This will help avoid duplicating -work. - -Please consider the following: - - * Testing. Please test your code. - - * Unit tests. Please consider writing some. The tests directory - has many examples that you can use as a starting point. - - * ovs-ofctl. If you add a feature that is useful for some - ovs-ofctl command then you should add support for it there. - - * Documentation. If you add a user-visible feature, then you - should document it in the appropriate manpage and mention it in - NEWS as well. - - * Coding style (see the CodingStyle file at the top of the source - tree). - - * The patch submission guidelines (see CONTRIBUTING). I - recommend using "git send-email", which automatically follows a - lot of those guidelines. - - -Bug Reporting -============= - -Please report problems to bugs@openvswitch.org. - - -Local Variables: -mode: text -End: diff --git a/TODO.md b/TODO.md new file mode 100644 index 000000000..581e213b6 --- /dev/null +++ b/TODO.md @@ -0,0 +1,285 @@ +Open vSwitch Project Ideas +========================== + +This file lists a number of project ideas for Open vSwitch. The ideas +here overlap somewhat with those in the OPENFLOW-1.1+ file. + + +Programming Project Ideas +========================= + +Each of these projects would ideally result in a patch or a short +series of them posted to ovs-dev. + +Please read [CONTRIBUTING](CONTRIBUTING.md) and [CodingStyle](CodingStyle.md) +in the top of the source tree before you begin work. The OPENFLOW-1.1+ +file also has an introduction to how OpenFlow is implemented in Open vSwitch. +It is also a good idea to look around the source tree for related code, and +back through the Git history for commits on related subjects, to allow +you to follow existing patterns and conventions. 
+ +Meters +------ + +Open vSwitch has OpenFlow protocol support for meters, but it does not +have an implementation in the kernel or userspace datapaths. An +implementation was proposed some time ago (I recommend looking for the +discussion in the ovs-dev mailing list archives), but for a few +different reasons it was not accepted. Some of those reasons apply +only to a kernel implementation of meters. At the time, a userspace +implementation wasn't as interesting, because the userspace switch +did not perform at production speed, but with the advent of +multithreaded forwarding and, now, DPDK support, userspace-only meters +would be a great way to get started. + +Improve SSL/TLS Security +------------------------ + +Open vSwitch allows some weak ciphers to be used for its secure +connections. Security audits often suggest that the project remove +those ciphers, but there's not a clean way to modify the acceptable +ciphers. At the very least, the cipher list should be audited, but it +would be nice to make it configurable. + +Open vSwitch does not insist on perfect forward secrecy via ephemeral +Diffie-Hellman key exchange when it establishes an SSL/TLS connection. +Given the wiretapping revelations over the last year, it seems wise to +turn this on. (This would probably amount to finding the right +OpenSSL function to call or just reducing the acceptable ciphers +further.) + +These changes might have backward-compatibility implications; one +would have to test the behavior of OVS with the reduced cipher list +against older versions. + +Bash Command Completion +----------------------- + +ovs-vsctl and other programs would be easier to use if bash command +completion (with ``tab'', etc.) were supported. Alex Wang + is leading a team for this project. + +Auxiliary Connections +--------------------- + +Auxiliary connections are a feature of OpenFlow 1.3 and later that +allow OpenFlow messages to be carried over datagram channels such as +UDP or DTLS.
One place to start would be to implement a datagram +abstraction library for OVS analogous to the ``stream'' library +that already abstracts TCP, SSL, and other stream protocols. + +Controller connection logging to pcap file +------------------------------------------ + +http://patchwork.openvswitch.org/patch/2249/ is an RFC patch that +allows the switch to record the traffic on OpenFlow controller +connections to a pcap file for later analysis. The patch lacks a good +way to enable and disable the feature. The task here would be to add +that and repost the patch. + +Basic OpenFlow 1.4 support +-------------------------- + +Some basic support for OpenFlow 1.4 is missing and needs to be +implemented. The missing pieces can be found by looking through +lib/ofp-util.c for mentions of OFP14_VERSION followed by a call to +OVS_NOT_REACHED (which aborts the program). + +OpenFlow 1.4: Flow monitoring +----------------------------- + +OpenFlow 1.4 introduces OFPMP_FLOW_MONITOR for notifying a controller +of changes to selected flow tables. This feature is based on +NXST_FLOW_MONITOR that is already part of Open vSwitch, so +implementing this feature would mean extending that code to handle the +OpenFlow 1.4 wire protocol. + +OpenFlow 1.3 also includes this feature as an ONF-defined extension, so +ideally OVS would support that too. + +OpenFlow 1.4 Role Status Message +-------------------------------- + +OpenFlow 1.4 section 7.4.4 ``Controller Role Status Message'' +defines a new message sent by a switch to notify the controller that +its role (whether it is a master or a slave) has changed. OVS should +implement this. + +OpenFlow 1.3 also includes this feature as an ONF-defined extension, so +ideally OVS would support that too.
+ +OpenFlow 1.4 Vacancy Events +--------------------------- + +OpenFlow 1.4 section 7.4.5 ``Table Status Message'' defines a new +message sent by a switch to notify the controller that a flow table is +close to filling up (or that it is no longer close to filling up). +OVS should implement this. + +OpenFlow 1.3 also includes this feature as an ONF-defined extension, so +ideally OVS would support that too. + +OpenFlow 1.4 Group and Meter Change Notification +------------------------------------------------ + +OpenFlow 1.4 adds a feature whereby a controller can ask the switch to +send it copies of messages that change groups and meters. (This is +only useful in the presence of multiple controllers.) OVS should +implement this. + +OpenFlow 1.3 also includes this feature as an ONF-defined extension, so +ideally OVS would support that too. + + +Testing Project Ideas +===================== + +Each of these projects would ideally result in confirmation that +features work or bug reports explaining how they do not. Please send +bug reports to dev at openvswitch.org, with as many details as you have. + +ONF Plugfest Results Analysis +----------------------------- + +Ben Pfaff has a collection of files reporting Open vSwitch conformance +to OpenFlow 1.3 provided by one of the vendors at the ONF plugfest +last year. Some of the reported failures have been fixed, some of the +other failures probably result from differing interpretations of +OpenFlow 1.3, and others are probably genuine bugs in Open vSwitch. +Open vSwitch has also improved in the meantime. Ben can provide the +results, privately, to some person or team who wishes to check them +out and try to pick out the genuine bugs. + +OpenFlow Fuzzer +--------------- + +Build a ``fuzzer'' for the OpenFlow protocol (or use an existing +one, if there is one) and run it against the Open vSwitch +implementation. One could also build a fuzzer for the OVSDB protocol.
+ +Ryu Certification Tests Analysis +-------------------------------- + +The Ryu controller comes with a suite of ``certification tests'' +that check the correctness of a switch's implementation of various +OpenFlow 1.3 features. The INSTALL file in the OVS source tree has a +section that explains how to easily run these tests against an OVS +source tree. Run the tests and figure out whether any tests fail but +should pass. (Some tests fail and should fail because OVS does not +implement the particular feature; for example, OVS does not implement +PBB encapsulation, so related tests fail.) + +OFTest Results Analysis +----------------------- + +OFTest is a test suite for OpenFlow 1.0 compliance. The INSTALL file +in the OVS source tree has a section that explains how to easily run +these tests against an OVS source tree. Run the tests and figure out +whether any tests fail but should pass, and ideally why. OFTest is +not particularly well vetted--in the past, at least, some tests have +failed against OVS due to bugs in OFTest, not in OVS--so some care is +warranted. + + +Documentation Project Ideas +=========================== + +Each of these projects would ideally result in creating some new +documentation for users. Some documentation might be suitable to +accompany Open vSwitch as part of its source tree most likely either +in plain text or ``nroff'' (manpage) format. + +OpenFlow Basics Tutorial +------------------------ + +Open vSwitch has a tutorial that covers its advanced features, but it +does not have a basic tutorial. There are several tutorials on the +Internet already, so a new tutorial would have to distinguish itself +in some way. One way would be to use the Open vSwitch ``sandbox'' +environment already used in the advanced tutorial. The sandbox does +not require any real network or even supervisor privilege on the +machine where it runs, and thus it is easy to use with hardly any +up-front setup, so it is a gentle way to get started. 
+ +FlowVisor via patch ports +------------------------- + +FlowVisor is a proxy that sits between OpenFlow controllers and a +switch. It divides up switch resources, allowing each controller to +control a ``slice'' of the network. For example, it can break up a +network based on VLAN, allowing different controllers to handle +packets with different VLANs. + +It seems that Open vSwitch has features that allow it to implement at +least simple forms of FlowVisor control without any need for +FlowVisor. Consider an Open vSwitch instance with three bridges. +Bridge br0 has physical ports eth0 and eth1. Bridge v9 has no +physical ports, but it has two ``patch ports'' that connect it to +br0. Bridge v11 has the same setup. Flows in br0 match packets +received on vlan 9, strip the vlan header, and direct them to the +appropriate patch port leading to v9. Additional flows in br0 match +packets received from v9, attach a VLAN 9 tag to them, and direct them +out eth0 or eth1 as appropriate. Other flows in br0 treat packets on +VLAN 11 similarly. Controllers attached to bridge v9 or v11 may thus +work as if they had full control of a network. + +It seems to me that this is a good example of the power of OpenFlow +and Open vSwitch. The point of this project is to explain how to do +this, with detailed examples, in case someone finds it handy and to +open eyes toward the generality of Open vSwitch usefulness. + +``Cookbooks'' +------------- + +The Open vSwitch website has a few ``cookbook'' entries that +describe how to use Open vSwitch in a few scenarios. There are only a +few of these and all of them are dated. It would be a good idea to +come up with ideas for some more and write them. These could be added +to the Open vSwitch website or the source tree or somewhere else. + +Demos +----- + +Record a demo of Open vSwitch functionality in use (or something else +relevant) and post it to youtube or another video site so that we can +link to it from openvswitch.org. 
+ + +How to contribute +================= + +If you plan to contribute code for a feature, please let everyone know +on ovs-dev before you start work. This will help avoid duplicating +work. + +Please consider the following: + + * Testing. Please test your code. + + * Unit tests. Please consider writing some. The tests directory + has many examples that you can use as a starting point. + + * ovs-ofctl. If you add a feature that is useful for some + ovs-ofctl command then you should add support for it there. + + * Documentation. If you add a user-visible feature, then you + should document it in the appropriate manpage and mention it in + NEWS as well. + + * Coding style (see the [CodingStyle](CodingStyle.md) file at the top + of the source tree). + + * The patch submission guidelines (see [CONTRIBUTING](CONTRIBUTING.md)). + I recommend using "git send-email", which automatically follows a + lot of those guidelines. + + +Bug Reporting +============= + +Please report problems to bugs@openvswitch.org. + + +Local Variables: +mode: text +End: diff --git a/WHY-OVS b/WHY-OVS deleted file mode 100644 index f5f47ff2b..000000000 --- a/WHY-OVS +++ /dev/null @@ -1,106 +0,0 @@ - Why Open vSwitch? - ================= - -Hypervisors need the ability to bridge traffic between VMs and with the -outside world. On Linux-based hypervisors, this used to mean using the -built-in L2 switch (the Linux bridge), which is fast and reliable. So, -it is reasonable to ask why Open vSwitch is used. - -The answer is that Open vSwitch is targeted at multi-server -virtualization deployments, a landscape for which the previous stack is -not well suited. These environments are often characterized by highly -dynamic end-points, the maintenance of logical abstractions, and -(sometimes) integration with or offloading to special purpose switching -hardware. - -The following characteristics and design considerations help Open -vSwitch cope with the above requirements. 
- -* The mobility of state: All network state associated with a network - entity (say a virtual machine) should be easily identifiable and - migratable between different hosts. This may include traditional - "soft state" (such as an entry in an L2 learning table), L3 forwarding - state, policy routing state, ACLs, QoS policy, monitoring - configuration (e.g. NetFlow, IPFIX, sFlow), etc. - - Open vSwitch has support for both configuring and migrating both slow - (configuration) and fast network state between instances. For - example, if a VM migrates between end-hosts, it is possible to not - only migrate associated configuration (SPAN rules, ACLs, QoS) but any - live network state (including, for example, existing state which - may be difficult to reconstruct). Further, Open vSwitch state is - typed and backed by a real data-model allowing for the development of - structured automation systems. - -* Responding to network dynamics: Virtual environments are often - characterized by high-rates of change. VMs coming and going, VMs - moving backwards and forwards in time, changes to the logical network - environments, and so forth. - - Open vSwitch supports a number of features that allow a network - control system to respond and adapt as the environment changes. - This includes simple accounting and visibility support such as - NetFlow, IPFIX, and sFlow. But perhaps more useful, Open vSwitch - supports a network state database (OVSDB) that supports remote - triggers. Therefore, a piece of orchestration software can "watch" - various aspects of the network and respond if/when they change. - This is used heavily today, for example, to respond to and track VM - migrations. - - Open vSwitch also supports OpenFlow as a method of exporting remote - access to control traffic. There are a number of uses for this - including global network discovery through inspection of discovery - or link-state traffic (e.g. LLDP, CDP, OSPF, etc.). 
- -* Maintenance of logical tags: Distributed virtual switches (such as - VMware vDS and Cisco's Nexus 1000V) often maintain logical context - within the network through appending or manipulating tags in network - packets. This can be used to uniquely identify a VM (in a manner - resistant to hardware spoofing), or to hold some other context that - is only relevant in the logical domain. Much of the problem of - building a distributed virtual switch is to efficiently and correctly - manage these tags. - - Open vSwitch includes multiple methods for specifying and maintaining - tagging rules, all of which are accessible to a remote process for - orchestration. Further, in many cases these tagging rules are stored - in an optimized form so they don't have to be coupled with a - heavyweight network device. This allows, for example, thousands of - tagging or address remapping rules to be configured, changed, and - migrated. - - In a similar vein, Open vSwitch supports a GRE implementation that can - handle thousands of simultaneous GRE tunnels and supports remote - configuration for tunnel creation, configuration, and tear-down. - This, for example, can be used to connect private VM networks in - different data centers. - -* Hardware integration: Open vSwitch's forwarding path (the in-kernel - datapath) is designed to be amenable to "offloading" packet processing - to hardware chipsets, whether housed in a classic hardware switch - chassis or in an end-host NIC. This allows for the Open vSwitch - control path to be able to both control a pure software - implementation or a hardware switch. - - There are many ongoing efforts to port Open vSwitch to hardware - chipsets. These include multiple merchant silicon chipsets (Broadcom - and Marvell), as well as a number of vendor-specific platforms. (The - PORTING file discusses how one would go about making such a port.) - - The advantage of hardware integration is not only performance within - virtualized environments. 
If physical switches also expose the Open - vSwitch control abstractions, both bare-metal and virtualized hosting - environments can be managed using the same mechanism for automated - network control. - -In many ways, Open vSwitch targets a different point in the design space -than previous hypervisor networking stacks, focusing on the need for -automated and dynamic network control in large-scale Linux-based -virtualization environments. - -The goal with Open vSwitch is to keep the in-kernel code as small as -possible (as is necessary for performance) and to re-use existing -subsystems when applicable (for example Open vSwitch uses the existing -QoS stack). As of Linux 3.3, Open vSwitch is included as a part of the -kernel and packaging for the userspace utilities are available on most -popular distributions. diff --git a/WHY-OVS.md b/WHY-OVS.md new file mode 100644 index 000000000..d31e69e71 --- /dev/null +++ b/WHY-OVS.md @@ -0,0 +1,106 @@ +Why Open vSwitch? +================= + +Hypervisors need the ability to bridge traffic between VMs and with the +outside world. On Linux-based hypervisors, this used to mean using the +built-in L2 switch (the Linux bridge), which is fast and reliable. So, +it is reasonable to ask why Open vSwitch is used. + +The answer is that Open vSwitch is targeted at multi-server +virtualization deployments, a landscape for which the previous stack is +not well suited. These environments are often characterized by highly +dynamic end-points, the maintenance of logical abstractions, and +(sometimes) integration with or offloading to special purpose switching +hardware. + +The following characteristics and design considerations help Open +vSwitch cope with the above requirements. + +* The mobility of state: All network state associated with a network + entity (say a virtual machine) should be easily identifiable and + migratable between different hosts. 
This may include traditional + "soft state" (such as an entry in an L2 learning table), L3 forwarding + state, policy routing state, ACLs, QoS policy, monitoring + configuration (e.g. NetFlow, IPFIX, sFlow), etc. + + Open vSwitch has support for both configuring and migrating both slow + (configuration) and fast network state between instances. For + example, if a VM migrates between end-hosts, it is possible to not + only migrate associated configuration (SPAN rules, ACLs, QoS) but any + live network state (including, for example, existing state which + may be difficult to reconstruct). Further, Open vSwitch state is + typed and backed by a real data-model allowing for the development of + structured automation systems. + +* Responding to network dynamics: Virtual environments are often + characterized by high-rates of change. VMs coming and going, VMs + moving backwards and forwards in time, changes to the logical network + environments, and so forth. + + Open vSwitch supports a number of features that allow a network + control system to respond and adapt as the environment changes. + This includes simple accounting and visibility support such as + NetFlow, IPFIX, and sFlow. But perhaps more useful, Open vSwitch + supports a network state database (OVSDB) that supports remote + triggers. Therefore, a piece of orchestration software can "watch" + various aspects of the network and respond if/when they change. + This is used heavily today, for example, to respond to and track VM + migrations. + + Open vSwitch also supports OpenFlow as a method of exporting remote + access to control traffic. There are a number of uses for this + including global network discovery through inspection of discovery + or link-state traffic (e.g. LLDP, CDP, OSPF, etc.). 
+ +* Maintenance of logical tags: Distributed virtual switches (such as + VMware vDS and Cisco's Nexus 1000V) often maintain logical context + within the network through appending or manipulating tags in network + packets. This can be used to uniquely identify a VM (in a manner + resistant to hardware spoofing), or to hold some other context that + is only relevant in the logical domain. Much of the problem of + building a distributed virtual switch is to efficiently and correctly + manage these tags. + + Open vSwitch includes multiple methods for specifying and maintaining + tagging rules, all of which are accessible to a remote process for + orchestration. Further, in many cases these tagging rules are stored + in an optimized form so they don't have to be coupled with a + heavyweight network device. This allows, for example, thousands of + tagging or address remapping rules to be configured, changed, and + migrated. + + In a similar vein, Open vSwitch supports a GRE implementation that can + handle thousands of simultaneous GRE tunnels and supports remote + configuration for tunnel creation, configuration, and tear-down. + This, for example, can be used to connect private VM networks in + different data centers. + +* Hardware integration: Open vSwitch's forwarding path (the in-kernel + datapath) is designed to be amenable to "offloading" packet processing + to hardware chipsets, whether housed in a classic hardware switch + chassis or in an end-host NIC. This allows for the Open vSwitch + control path to be able to both control a pure software + implementation or a hardware switch. + + There are many ongoing efforts to port Open vSwitch to hardware + chipsets. These include multiple merchant silicon chipsets (Broadcom + and Marvell), as well as a number of vendor-specific platforms. (The + PORTING file discusses how one would go about making such a port.) + + The advantage of hardware integration is not only performance within + virtualized environments. 
If physical switches also expose the Open + vSwitch control abstractions, both bare-metal and virtualized hosting + environments can be managed using the same mechanism for automated + network control. + +In many ways, Open vSwitch targets a different point in the design space +than previous hypervisor networking stacks, focusing on the need for +automated and dynamic network control in large-scale Linux-based +virtualization environments. + +The goal with Open vSwitch is to keep the in-kernel code as small as +possible (as is necessary for performance) and to re-use existing +subsystems when applicable (for example Open vSwitch uses the existing +QoS stack). As of Linux 3.3, Open vSwitch is included as a part of the +kernel and packaging for the userspace utilities are available on most +popular distributions. diff --git a/datapath/Modules.mk b/datapath/Modules.mk index 90e158cd2..72cb4dcbf 100644 --- a/datapath/Modules.mk +++ b/datapath/Modules.mk @@ -34,7 +34,7 @@ openvswitch_headers = \ vport-netdev.h openvswitch_extras = \ - README + README.md dist_sources = $(foreach module,$(dist_modules),$($(module)_sources)) dist_headers = $(foreach module,$(dist_modules),$($(module)_headers)) diff --git a/datapath/README b/datapath/README deleted file mode 100644 index 37c20ee24..000000000 --- a/datapath/README +++ /dev/null @@ -1,235 +0,0 @@ -Open vSwitch datapath developer documentation -============================================= - -The Open vSwitch kernel module allows flexible userspace control over -flow-level packet processing on selected network devices. It can be -used to implement a plain Ethernet switch, network device bonding, -VLAN processing, network access control, flow-based network control, -and so on. - -The kernel module implements multiple "datapaths" (analogous to -bridges), each of which can have multiple "vports" (analogous to ports -within a bridge). 
Each datapath also has associated with it a "flow -table" that userspace populates with "flows" that map from keys based -on packet headers and metadata to sets of actions. The most common -action forwards the packet to another vport; other actions are also -implemented. - -When a packet arrives on a vport, the kernel module processes it by -extracting its flow key and looking it up in the flow table. If there -is a matching flow, it executes the associated actions. If there is -no match, it queues the packet to userspace for processing (as part of -its processing, userspace will likely set up a flow to handle further -packets of the same type entirely in-kernel). - - -Flow key compatibility ----------------------- - -Network protocols evolve over time. New protocols become important -and existing protocols lose their prominence. For the Open vSwitch -kernel module to remain relevant, it must be possible for newer -versions to parse additional protocols as part of the flow key. It -might even be desirable, someday, to drop support for parsing -protocols that have become obsolete. Therefore, the Netlink interface -to Open vSwitch is designed to allow carefully written userspace -applications to work with any version of the flow key, past or future. - -To support this forward and backward compatibility, whenever the -kernel module passes a packet to userspace, it also passes along the -flow key that it parsed from the packet. Userspace then extracts its -own notion of a flow key from the packet and compares it against the -kernel-provided version: - - - If userspace's notion of the flow key for the packet matches the - kernel's, then nothing special is necessary. - - - If the kernel's flow key includes more fields than the userspace - version of the flow key, for example if the kernel decoded IPv6 - headers but userspace stopped at the Ethernet type (because it - does not understand IPv6), then again nothing special is - necessary. 
Userspace can still set up a flow in the usual way, - as long as it uses the kernel-provided flow key to do it. - - - If the userspace flow key includes more fields than the - kernel's, for example if userspace decoded an IPv6 header but - the kernel stopped at the Ethernet type, then userspace can - forward the packet manually, without setting up a flow in the - kernel. This case is bad for performance because every packet - that the kernel considers part of the flow must go to userspace, - but the forwarding behavior is correct. (If userspace can - determine that the values of the extra fields would not affect - forwarding behavior, then it could set up a flow anyway.) - -How flow keys evolve over time is important to making this work, so -the following sections go into detail. - - -Flow key format ---------------- - -A flow key is passed over a Netlink socket as a sequence of Netlink -attributes. Some attributes represent packet metadata, defined as any -information about a packet that cannot be extracted from the packet -itself, e.g. the vport on which the packet was received. Most -attributes, however, are extracted from headers within the packet, -e.g. source and destination addresses from Ethernet, IP, or TCP -headers. - -The header file defines the exact format of the -flow key attributes. For informal explanatory purposes here, we write -them as comma-separated strings, with parentheses indicating arguments -and nesting. For example, the following could represent a flow key -corresponding to a TCP packet that arrived on vport 1: - - in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4), - eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=17, tos=0, - frag=no), tcp(src=49163, dst=80) - -Often we ellipsize arguments not important to the discussion, e.g.: - - in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...) 
- - -Wildcarded flow key format --------------------------- - -A wildcarded flow is described with two sequences of Netlink attributes -passed over the Netlink socket. A flow key, exactly as described above, and an -optional corresponding flow mask. - -A wildcarded flow can represent a group of exact match flows. Each '1' bit -in the mask specifies a exact match with the corresponding bit in the flow key. -A '0' bit specifies a don't care bit, which will match either a '1' or '0' bit -of a incoming packet. Using wildcarded flow can improve the flow set up rate -by reduce the number of new flows need to be processed by the user space program. - -Support for the mask Netlink attribute is optional for both the kernel and user -space program. The kernel can ignore the mask attribute, installing an exact -match flow, or reduce the number of don't care bits in the kernel to less than -what was specified by the user space program. In this case, variations in bits -that the kernel does not implement will simply result in additional flow setups. -The kernel module will also work with user space programs that neither support -nor supply flow mask attributes. - -Since the kernel may ignore or modify wildcard bits, it can be difficult for -the userspace program to know exactly what matches are installed. There are -two possible approaches: reactively install flows as they miss the kernel -flow table (and therefore not attempt to determine wildcard changes at all) -or use the kernel's response messages to determine the installed wildcards. - -When interacting with userspace, the kernel should maintain the match portion -of the key exactly as originally installed. This will provides a handle to -identify the flow for all future operations. However, when reporting the -mask of an installed flow, the mask should include any restrictions imposed -by the kernel. - -The behavior when using overlapping wildcarded flows is undefined. 
It is the -responsibility of the user space program to ensure that any incoming packet -can match at most one flow, wildcarded or not. The current implementation -performs best-effort detection of overlapping wildcarded flows and may reject -some but not all of them. However, this behavior may change in future versions. - - -Basic rule for evolving flow keys ---------------------------------- - -Some care is needed to really maintain forward and backward -compatibility for applications that follow the rules listed under -"Flow key compatibility" above. - -The basic rule is obvious: - - ------------------------------------------------------------------ - New network protocol support must only supplement existing flow - key attributes. It must not change the meaning of already defined - flow key attributes. - ------------------------------------------------------------------ - -This rule does have less-obvious consequences so it is worth working -through a few examples. Suppose, for example, that the kernel module -did not already implement VLAN parsing. Instead, it just interpreted -the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the -packet. The flow key for any packet with an 802.1Q header would look -essentially like this, ignoring metadata: - - eth(...), eth_type(0x8100) - -Naively, to add VLAN support, it makes sense to add a new "vlan" flow -key attribute to contain the VLAN tag, then continue to decode the -encapsulated headers beyond the VLAN tag using the existing field -definitions. With this change, a TCP packet in VLAN 10 would have a -flow key much like this: - - eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...) - -But this change would negatively affect a userspace application that -has not been updated to understand the new "vlan" flow key attribute. 
-The application could, following the flow compatibility rules above, -ignore the "vlan" attribute that it does not understand and therefore -assume that the flow contained IP packets. This is a bad assumption -(the flow only contains IP packets if one parses and skips over the -802.1Q header) and it could cause the application's behavior to change -across kernel versions even though it follows the compatibility rules. - -The solution is to use a set of nested attributes. This is, for -example, why 802.1Q support uses nested attributes. A TCP packet in -VLAN 10 is actually expressed as: - - eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800), - ip(proto=6, ...), tcp(...))) - -Notice how the "eth_type", "ip", and "tcp" flow key attributes are -nested inside the "encap" attribute. Thus, an application that does -not understand the "vlan" key will not see either of those attributes -and therefore will not misinterpret them. (Also, the outer eth_type -is still 0x8100, not changed to 0x0800.) - -Handling malformed packets --------------------------- - -Don't drop packets in the kernel for malformed protocol headers, bad -checksums, etc. This would prevent userspace from implementing a -simple Ethernet switch that forwards every packet. - -Instead, in such a case, include an attribute with "empty" content. -It doesn't matter if the empty content could be valid protocol values, -as long as those values are rarely seen in practice, because userspace -can always forward all packets with those values to userspace and -handle them individually. - -For example, consider a packet that contains an IP header that -indicates protocol 6 for TCP, but which is truncated just after the IP -header, so that the TCP header is missing. 
The flow key for this -packet would include a tcp attribute with all-zero src and dst, like -this: - - eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0) - -As another example, consider a packet with an Ethernet type of 0x8100, -indicating that a VLAN TCI should follow, but which is truncated just -after the Ethernet type. The flow key for this packet would include -an all-zero-bits vlan and an empty encap attribute, like this: - - eth(...), eth_type(0x8100), vlan(0), encap() - -Unlike a TCP packet with source and destination ports 0, an -all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka -VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan -attribute expressly to allow this situation to be distinguished. -Thus, the flow key in this second example unambiguously indicates a -missing or malformed VLAN TCI. - -Other rules ------------ - -The other rules for flow keys are much less subtle: - - - Duplicate attributes are not allowed at a given nesting level. - - - Ordering of attributes is not significant. - - - When the kernel sends a given flow key to userspace, it always - composes it the same way. This allows userspace to hash and - compare entire flow keys that it may not be able to fully - interpret. diff --git a/datapath/README.md b/datapath/README.md new file mode 100644 index 000000000..a8effa354 --- /dev/null +++ b/datapath/README.md @@ -0,0 +1,235 @@ +Open vSwitch datapath developer documentation +============================================= + +The Open vSwitch kernel module allows flexible userspace control over +flow-level packet processing on selected network devices. It can be +used to implement a plain Ethernet switch, network device bonding, +VLAN processing, network access control, flow-based network control, +and so on. + +The kernel module implements multiple "datapaths" (analogous to +bridges), each of which can have multiple "vports" (analogous to ports +within a bridge). 
Each datapath also has associated with it a "flow +table" that userspace populates with "flows" that map from keys based +on packet headers and metadata to sets of actions. The most common +action forwards the packet to another vport; other actions are also +implemented. + +When a packet arrives on a vport, the kernel module processes it by +extracting its flow key and looking it up in the flow table. If there +is a matching flow, it executes the associated actions. If there is +no match, it queues the packet to userspace for processing (as part of +its processing, userspace will likely set up a flow to handle further +packets of the same type entirely in-kernel). + + +Flow key compatibility +---------------------- + +Network protocols evolve over time. New protocols become important +and existing protocols lose their prominence. For the Open vSwitch +kernel module to remain relevant, it must be possible for newer +versions to parse additional protocols as part of the flow key. It +might even be desirable, someday, to drop support for parsing +protocols that have become obsolete. Therefore, the Netlink interface +to Open vSwitch is designed to allow carefully written userspace +applications to work with any version of the flow key, past or future. + +To support this forward and backward compatibility, whenever the +kernel module passes a packet to userspace, it also passes along the +flow key that it parsed from the packet. Userspace then extracts its +own notion of a flow key from the packet and compares it against the +kernel-provided version: + + - If userspace's notion of the flow key for the packet matches the + kernel's, then nothing special is necessary. + + - If the kernel's flow key includes more fields than the userspace + version of the flow key, for example if the kernel decoded IPv6 + headers but userspace stopped at the Ethernet type (because it + does not understand IPv6), then again nothing special is + necessary. 
Userspace can still set up a flow in the usual way, + as long as it uses the kernel-provided flow key to do it. + + - If the userspace flow key includes more fields than the + kernel's, for example if userspace decoded an IPv6 header but + the kernel stopped at the Ethernet type, then userspace can + forward the packet manually, without setting up a flow in the + kernel. This case is bad for performance because every packet + that the kernel considers part of the flow must go to userspace, + but the forwarding behavior is correct. (If userspace can + determine that the values of the extra fields would not affect + forwarding behavior, then it could set up a flow anyway.) + +How flow keys evolve over time is important to making this work, so +the following sections go into detail. + + +Flow key format +--------------- + +A flow key is passed over a Netlink socket as a sequence of Netlink +attributes. Some attributes represent packet metadata, defined as any +information about a packet that cannot be extracted from the packet +itself, e.g. the vport on which the packet was received. Most +attributes, however, are extracted from headers within the packet, +e.g. source and destination addresses from Ethernet, IP, or TCP +headers. + +The header file defines the exact format of the +flow key attributes. For informal explanatory purposes here, we write +them as comma-separated strings, with parentheses indicating arguments +and nesting. For example, the following could represent a flow key +corresponding to a TCP packet that arrived on vport 1: + + in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4), + eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0, + frag=no), tcp(src=49163, dst=80) + +Often we ellipsize arguments not important to the discussion, e.g.: + + in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
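The three cases listed under "Flow key compatibility" can be sketched as a small decision function. This is only an illustration, not Open vSwitch code: it models a flow key as a tuple of attribute names and ignores attribute values, and the function name `plan_flow_setup` and the returned labels are hypothetical. The source does not say what to do when each side has fields the other lacks, so the sketch assumes the conservative choice of forwarding in userspace.

```python
def plan_flow_setup(kernel_key, userspace_key):
    """Decide how userspace should handle a packet the kernel passed up.

    Both keys are tuples of attribute names (hypothetical model), e.g.
    ("in_port", "eth", "eth_type", "ipv6").
    """
    kernel_fields = set(kernel_key)
    user_fields = set(userspace_key)

    if user_fields <= kernel_fields:
        # Keys match, or the kernel parsed more than userspace did:
        # set up a flow in the usual way, using the kernel-provided key.
        return ("install_flow", kernel_key)

    # Userspace parsed fields the kernel did not: forward the packet
    # manually without installing a kernel flow (correct but slow).
    # Assumption: mixed cases are also handled this way.
    return ("forward_in_userspace", None)


old_userspace = ("in_port", "eth", "eth_type")
new_kernel = ("in_port", "eth", "eth_type", "ipv6")

same = plan_flow_setup(new_kernel, new_kernel)
kernel_ahead = plan_flow_setup(new_kernel, old_userspace)
userspace_ahead = plan_flow_setup(old_userspace, new_kernel)
```

In the first two calls the flow is installed with the kernel's key. In the third, every packet of the flow goes to userspace, which is the performance cost the text describes.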
+ + +Wildcarded flow key format +-------------------------- + +A wildcarded flow is described with two sequences of Netlink attributes +passed over the Netlink socket: a flow key, exactly as described above, and an +optional corresponding flow mask. + +A wildcarded flow can represent a group of exact match flows. Each '1' bit +in the mask specifies an exact match with the corresponding bit in the flow key. +A '0' bit specifies a don't care bit, which will match either a '1' or '0' bit +of an incoming packet. Using wildcarded flows can improve the flow setup rate +by reducing the number of new flows that need to be processed by the user space program. + +Support for the mask Netlink attribute is optional for both the kernel and user +space program. The kernel can ignore the mask attribute, installing an exact +match flow, or reduce the number of don't care bits in the kernel to less than +what was specified by the user space program. In this case, variations in bits +that the kernel does not implement will simply result in additional flow setups. +The kernel module will also work with user space programs that neither support +nor supply flow mask attributes. + +Since the kernel may ignore or modify wildcard bits, it can be difficult for +the userspace program to know exactly what matches are installed. There are +two possible approaches: reactively install flows as they miss the kernel +flow table (and therefore not attempt to determine wildcard changes at all) +or use the kernel's response messages to determine the installed wildcards. + +When interacting with userspace, the kernel should maintain the match portion +of the key exactly as originally installed. This provides a handle to +identify the flow for all future operations. However, when reporting the +mask of an installed flow, the mask should include any restrictions imposed +by the kernel. + +The behavior when using overlapping wildcarded flows is undefined.
It is the +responsibility of the user space program to ensure that any incoming packet +can match at most one flow, wildcarded or not. The current implementation +performs best-effort detection of overlapping wildcarded flows and may reject +some but not all of them. However, this behavior may change in future versions. + + +Basic rule for evolving flow keys +--------------------------------- + +Some care is needed to really maintain forward and backward +compatibility for applications that follow the rules listed under +"Flow key compatibility" above. + +The basic rule is obvious: + + ------------------------------------------------------------------ + New network protocol support must only supplement existing flow + key attributes. It must not change the meaning of already defined + flow key attributes. + ------------------------------------------------------------------ + +This rule does have less-obvious consequences so it is worth working +through a few examples. Suppose, for example, that the kernel module +did not already implement VLAN parsing. Instead, it just interpreted +the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the +packet. The flow key for any packet with an 802.1Q header would look +essentially like this, ignoring metadata: + + eth(...), eth_type(0x8100) + +Naively, to add VLAN support, it makes sense to add a new "vlan" flow +key attribute to contain the VLAN tag, then continue to decode the +encapsulated headers beyond the VLAN tag using the existing field +definitions. With this change, a TCP packet in VLAN 10 would have a +flow key much like this: + + eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...) + +But this change would negatively affect a userspace application that +has not been updated to understand the new "vlan" flow key attribute. 
+The application could, following the flow compatibility rules above, +ignore the "vlan" attribute that it does not understand and therefore +assume that the flow contained IP packets. This is a bad assumption +(the flow only contains IP packets if one parses and skips over the +802.1Q header) and it could cause the application's behavior to change +across kernel versions even though it follows the compatibility rules. + +The solution is to use a set of nested attributes. This is, for +example, why 802.1Q support uses nested attributes. A TCP packet in +VLAN 10 is actually expressed as: + + eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800), + ip(proto=6, ...), tcp(...))) + +Notice how the "eth_type", "ip", and "tcp" flow key attributes are +nested inside the "encap" attribute. Thus, an application that does +not understand the "vlan" key will not see either of those attributes +and therefore will not misinterpret them. (Also, the outer eth_type +is still 0x8100, not changed to 0x0800.) + +Handling malformed packets +-------------------------- + +Don't drop packets in the kernel for malformed protocol headers, bad +checksums, etc. This would prevent userspace from implementing a +simple Ethernet switch that forwards every packet. + +Instead, in such a case, include an attribute with "empty" content. +It doesn't matter if the empty content could be valid protocol values, +as long as those values are rarely seen in practice, because userspace +can always forward all packets with those values to userspace and +handle them individually. + +For example, consider a packet that contains an IP header that +indicates protocol 6 for TCP, but which is truncated just after the IP +header, so that the TCP header is missing. 
The flow key for this +packet would include a tcp attribute with all-zero src and dst, like +this: + + eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0) + +As another example, consider a packet with an Ethernet type of 0x8100, +indicating that a VLAN TCI should follow, but which is truncated just +after the Ethernet type. The flow key for this packet would include +an all-zero-bits vlan and an empty encap attribute, like this: + + eth(...), eth_type(0x8100), vlan(0), encap() + +Unlike a TCP packet with source and destination ports 0, an +all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka +VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan +attribute expressly to allow this situation to be distinguished. +Thus, the flow key in this second example unambiguously indicates a +missing or malformed VLAN TCI. + +Other rules +----------- + +The other rules for flow keys are much less subtle: + + - Duplicate attributes are not allowed at a given nesting level. + + - Ordering of attributes is not significant. + + - When the kernel sends a given flow key to userspace, it always + composes it the same way. This allows userspace to hash and + compare entire flow keys that it may not be able to fully + interpret. diff --git a/lib/dpif.h b/lib/dpif.h index f88fa78d8..c6e045bec 100644 --- a/lib/dpif.h +++ b/lib/dpif.h @@ -113,7 +113,7 @@ * * In Open vSwitch userspace, "struct flow" is the typical way to describe * a flow, but the datapath interface uses a different data format to - * allow ABI forward- and backward-compatibility. datapath/README + * allow ABI forward- and backward-compatibility. datapath/README.md * describes the rationale and design. Refer to OVS_KEY_ATTR_* and * "struct ovs_key_*" in include/odp-netlink.h for details. * lib/odp-util.h defines several functions for working with these flows. 
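The mask semantics described under "Wildcarded flow key format" in datapath/README.md can be illustrated with a short sketch. It is not the kernel's implementation: it treats a single field (an IPv4 destination address) as an integer, and the names `flow_matches` and `ip` are hypothetical helpers. A '1' bit in the mask requires the packet bit to equal the key bit; a '0' bit is a don't care bit.

```python
import ipaddress


def flow_matches(packet_value, key_value, mask):
    """Return True if the packet agrees with the key on every bit set in mask."""
    return (packet_value & mask) == (key_value & mask)


def ip(text):
    """Convert dotted-quad text to an integer (illustrative helper)."""
    return int(ipaddress.IPv4Address(text))


# One wildcarded flow covering 172.18.0.0/16 stands in for many
# exact-match flows, one per destination address in that range.
key = ip("172.18.0.0")
mask = ip("255.255.0.0")

in_subnet = flow_matches(ip("172.18.0.52"), key, mask)
out_of_subnet = flow_matches(ip("172.16.0.20"), key, mask)
```

An all-ones mask reduces to an exact match, which corresponds to the case where the kernel ignores the mask attribute; an all-zero mask matches any value of the field.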
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index d3b0ecb1f..0a1fe00e3 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -204,7 +204,7 @@ systemctl start openvswitch.service %doc /usr/share/man/man8/ovs-test.8.gz %doc /usr/share/man/man8/ovs-l3ping.8.gz %doc /usr/share/man/man8/vtep-ctl.8.gz -%doc COPYING DESIGN INSTALL.SSL NOTICE README.md WHY-OVS FAQ NEWS INSTALL.DPDK +%doc COPYING DESIGN.md INSTALL.SSL.md NOTICE README.md WHY-OVS.md FAQ.md NEWS INSTALL.DPDK.md /var/lib/openvswitch /var/log/openvswitch /usr/share/openvswitch/scripts/ovs-ctl diff --git a/rhel/openvswitch.spec.in b/rhel/openvswitch.spec.in index 5d26d5940..9100ad095 100644 --- a/rhel/openvswitch.spec.in +++ b/rhel/openvswitch.spec.in @@ -175,7 +175,7 @@ exit 0 /usr/share/openvswitch/scripts/sysconfig.template /usr/share/openvswitch/vswitch.ovsschema /usr/share/openvswitch/vtep.ovsschema -%doc COPYING DESIGN INSTALL.SSL NOTICE README.md WHY-OVS FAQ NEWS -%doc INSTALL.DPDK rhel/README.RHEL +%doc COPYING DESIGN.md INSTALL.SSL.md NOTICE README.md WHY-OVS.md FAQ.md NEWS +%doc INSTALL.DPDK.md rhel/README.RHEL /var/lib/openvswitch /var/log/openvswitch diff --git a/third-party/README b/third-party/README deleted file mode 100644 index 0f0e8a962..000000000 --- a/third-party/README +++ /dev/null @@ -1,35 +0,0 @@ -This directory contains third-party software that may be useful for -debugging. - -tcpdump -------- -The "ofp-tcpdump.patch" patch adds the ability to parse OpenFlow -messages to tcpdump. These instructions assume that tcpdump 4.3.0 -is going to be used, but it should work with other versions that are not -substantially different. 
To begin, download tcpdump and apply the -patch: - - wget http://www.tcpdump.org/release/tcpdump-4.3.0.tar.gz - tar xzf tcpdump-4.3.0.tar.gz - ln -s tcpdump-4.3.0 tcpdump - patch -p0 < ofp-tcpdump.patch - -Then build the new version of tcpdump: - - cd tcpdump - ./configure - make - -Clearly, tcpdump can only parse unencrypted packets, so you will need to -connect the controller and datapath using plain TCP. To look at the -traffic, tcpdump will be started in a manner similar to the following: - - sudo ./tcpdump -s0 -i eth0 port 6633 - -The "-s0" flag indicates that tcpdump should capture the entire packet. -If the OpenFlow message is not received in its entirety, "[|openflow]" will -be printed instead of the OpenFlow message contents. - -The verbosity of the output may be increased by adding additional "-v" -flags. If "-vvv" is used, the raw OpenFlow data is also printed in -hex and ASCII. diff --git a/third-party/README.md b/third-party/README.md new file mode 100644 index 000000000..0f0e8a962 --- /dev/null +++ b/third-party/README.md @@ -0,0 +1,35 @@ +This directory contains third-party software that may be useful for +debugging. + +tcpdump +------- +The "ofp-tcpdump.patch" patch adds the ability to parse OpenFlow +messages to tcpdump. These instructions assume that tcpdump 4.3.0 +is going to be used, but it should work with other versions that are not +substantially different. To begin, download tcpdump and apply the +patch: + + wget http://www.tcpdump.org/release/tcpdump-4.3.0.tar.gz + tar xzf tcpdump-4.3.0.tar.gz + ln -s tcpdump-4.3.0 tcpdump + patch -p0 < ofp-tcpdump.patch + +Then build the new version of tcpdump: + + cd tcpdump + ./configure + make + +Clearly, tcpdump can only parse unencrypted packets, so you will need to +connect the controller and datapath using plain TCP. 
To look at the +traffic, tcpdump will be started in a manner similar to the following: + + sudo ./tcpdump -s0 -i eth0 port 6633 + +The "-s0" flag indicates that tcpdump should capture the entire packet. +If the OpenFlow message is not received in its entirety, "[|openflow]" will +be printed instead of the OpenFlow message contents. + +The verbosity of the output may be increased by adding additional "-v" +flags. If "-vvv" is used, the raw OpenFlow data is also printed in +hex and ASCII. diff --git a/third-party/automake.mk b/third-party/automake.mk index 02636bb53..bce5f8b5e 100644 --- a/third-party/automake.mk +++ b/third-party/automake.mk @@ -1,3 +1,3 @@ EXTRA_DIST += \ - third-party/README \ + third-party/README.md \ third-party/ofp-tcpdump.patch diff --git a/tutorial/Tutorial b/tutorial/Tutorial deleted file mode 100644 index 0506a2075..000000000 --- a/tutorial/Tutorial +++ /dev/null @@ -1,835 +0,0 @@ -Open vSwitch Advanced Features Tutorial -======================================= - -Many tutorials cover the basics of OpenFlow. This is not such a -tutorial. Rather, a knowledge of the basics of OpenFlow is a -prerequisite. If you do not already understand how an OpenFlow flow -table works, please go read a basic tutorial and then continue reading -here afterward. - -It is also important to understand the basics of Open vSwitch before -you begin. If you have never used ovs-vsctl or ovs-ofctl before, you -should learn a little about them before proceeding. - -Most of the features covered in this tutorial are Open vSwitch -extensions to OpenFlow. Also, most of the features in this tutorial -are specific to the software Open vSwitch implementation. If you are -using an Open vSwitch port to an ASIC-based hardware switch, this -tutorial will not help you. - -This tutorial does not cover every aspect of the features that it -mentions. 
You can find the details elsewhere in the Open vSwitch -documentation, especially ovs-ofctl(8) and the comments in the -include/openflow/nicira-ext.h header file. - ->>> In this tutorial, paragraphs set off like this designate notes - with additional information that readers may wish to skip on a - first read. - - -Getting Started -=============== - -This is a hands-on tutorial. To get the most out of it, you will need -Open vSwitch binaries. You do not, on the other hand, need any -physical networking hardware or even supervisor privilege on your -system. Instead, we will use a script called "ovs-sandbox", which -accompanies the tutorial, that constructs a software simulated network -environment based on Open vSwitch. - -You can use "ovs-sandbox" three ways: - - * If you have already installed Open vSwitch on your system, then - you should be able to just run "ovs-sandbox" from this directory - without any options. - - * If you have not installed Open vSwitch (and you do not want to - install it), then you can build Open vSwitch according to the - instructions in INSTALL, without installing it. Then run - "./ovs-sandbox -b DIRECTORY" from this directory, substituting - the Open vSwitch build directory for DIRECTORY. - - * As a slight variant on the latter, you can run "make sandbox" - from an Open vSwitch build directory. - -When you run ovs-sandbox, it does the following: - - 1. CAUTION: Deletes any subdirectory of the current directory - named "sandbox" and any files in that directory. - - 2. Creates a new directory "sandbox" in the current directory. - - 3. Sets up special environment variables that ensure that Open - vSwitch programs will look inside the "sandbox" directory - instead of in the Open vSwitch installation directory. - - 4. If you are using a built but not installed Open vSwitch, - installs the Open vSwitch manpages in a subdirectory of - "sandbox" and adjusts the MANPATH environment variable to point - to this directory. 
This means that you can use, for example, - "man ovs-vsctl" to see a manpage for the ovs-vsctl program that - you built. - - 5. Creates an empty Open vSwitch configuration database under - "sandbox". - - 6. Starts ovsdb-server running under "sandbox". - - 7. Starts ovs-vswitchd running under "sandbox", passing special - options that enable a special "dummy" mode for testing. - - 8. Starts a nested interactive shell inside "sandbox". - -At this point, you can run all the usual Open vSwitch utilities from -the nested shell environment. You can, for example, use ovs-vsctl to -create a bridge: - - ovs-vsctl add-br br0 - -From Open vSwitch's perspective, the bridge that you create this way -is as real as any other. You can, for example, connect it to an -OpenFlow controller or use "ovs-ofctl" to examine and modify it and -its OpenFlow flow table. On the other hand, the bridge is not visible -to the operating system's network stack, so "ifconfig" or "ip" cannot -see it or affect it, which means that utilities like "ping" and -"tcpdump" will not work either. (That has its good side, too: you -can't screw up your computer's network stack by manipulating a -sandboxed OVS.) - -When you're done using OVS from the sandbox, exit the nested shell (by -entering the "exit" shell command or pressing Control+D). This will -kill the daemons that ovs-sandbox started, but it leaves the "sandbox" -directory and its contents in place. - -The sandbox directory contains log files for the Open vSwitch dameons. -You can examine them while you're running in the sandboxed environment -or after you exit. - - -Motivation -========== - -The goal of this tutorial is to demonstrate the power of Open vSwitch -flow tables. The tutorial works through the implementation of a -MAC-learning switch with VLAN trunk and access ports. Outside of the -Open vSwitch features that we will discuss, OpenFlow provides at least -two ways to implement such a switch: - - 1. 
An OpenFlow controller to implement MAC learning in a - "reactive" fashion. Whenever a new MAC appears on the switch, - or a MAC moves from one switch port to another, the controller - adjusts the OpenFlow flow table to match. - - 2. The "normal" action. OpenFlow defines this action to submit a - packet to "the traditional non-OpenFlow pipeline of the - switch". That is, if a flow uses this action, then the packets - in the flow go through the switch in the same way that they - would if OpenFlow was not configured on the switch. - -Each of these approaches has unfortunate pitfalls. In the first -approach, using an OpenFlow controller to implement MAC learning, has -a significant cost in terms of network bandwidth and latency. It also -makes the controller more difficult to scale to large numbers of -switches, which is especially important in environments with thousands -of hypervisors (each of which contains a virtual OpenFlow switch). -MAC learning at an OpenFlow controller also behaves poorly if the -OpenFlow controller fails, slows down, or becomes unavailable due to -network problems. - -The second approach, using the "normal" action, has different -problems. First, little about the "normal" action is standardized, so -it behaves differently on switches from different vendors, and the -available features and how those features are configured (usually not -through OpenFlow) varies widely. Second, "normal" does not work well -with other OpenFlow actions. It is "all-or-nothing", with little -potential to adjust its behavior slightly or to compose it with other -features. - - -Scenario -======== - -We will construct Open vSwitch flow tables for a VLAN-capable, -MAC-learning switch that has four ports: - - * p1, a trunk port that carries all VLANs, on OpenFlow port 1. - - * p2, an access port for VLAN 20, on OpenFlow port 2. - - * p3 and p4, both access ports for VLAN 30, on OpenFlow ports 3 - and 4, respectively. - ->>> The ports' names are not significant. 
You could call them eth1 - through eth4, or any other names you like. - ->>> An OpenFlow switch always has a "local" port as well. This - scenario won't use the local port. - -Our switch design will consist of five main flow tables, each of which -implements one stage in the switch pipeline: - - Table 0: Admission control. - - Table 1: VLAN input processing. - - Table 2: Learn source MAC and VLAN for ingress port. - - Table 3: Look up learned port for destination MAC and VLAN. - - Table 4: Output processing. - -The section below describes how to set up the scenario, followed by a -section for each OpenFlow table. - -You can cut and paste the "ovs-vsctl" and "ovs-ofctl" commands in each -of the sections below into your "ovs-sandbox" shell. They are also -available as shell scripts in this directory, named t-setup, t-stage0, -t-stage1, ..., t-stage4. The "ovs-appctl" test commands are intended -for cutting and pasting and are not supplied separately. - - -Setup -===== - -To get started, start "ovs-sandbox". Inside the interactive shell -that it starts, run this command: - - ovs-vsctl add-br br0 -- set Bridge br0 fail-mode=secure - -This command creates a new bridge "br0" and puts "br0" into so-called -"fail-secure" mode. For our purpose, this just means that the -OpenFlow flow table starts out empty. - ->>> If we did not do this, then the flow table would start out with a - single flow that executes the "normal" action. We could use that - feature to yield a switch that behaves the same as the switch we - are currently building, but with the caveats described under - "Motivation" above.) - -The new bridge has only one port on it so far, the "local port" br0. -We need to add p1, p2, p3, and p4. 
A shell "for" loop is one way to -do it: - - for i in 1 2 3 4; do - ovs-vsctl add-port br0 p$i -- set Interface p$i ofport_request=$i - ovs-ofctl mod-port br0 p$i up - done - -In addition to adding a port, the ovs-vsctl command above sets its -"ofport_request" column to ensure that port p1 is assigned OpenFlow -port 1, p2 is assigned OpenFlow port 2, and so on. - ->>> We could omit setting the ofport_request and let Open vSwitch - choose port numbers for us, but it's convenient for the purposes - of this tutorial because we can talk about OpenFlow port 1 and - know that it corresponds to p1. - -The ovs-ofctl command above brings up the simulated interfaces, which -are down initially, using an OpenFlow request. The effect is similar -to "ifconfig up", but the sandbox's interfaces are not visible to the -operating system and therefore "ifconfig" would not affect them. - -We have not configured anything related to VLANs or MAC learning. -That's because we're going to implement those features in the flow -table. - -To see what we've done so far to set up the scenario, you can run a -command like "ovs-vsctl show" or "ovs-ofctl show br0". - - -Implementing Table 0: Admission control -======================================= - -Table 0 is where packets enter the switch. We use this stage to -discard packets that for one reason or another are invalid. For -example, packets with a multicast source address are not valid, so we -can add a flow to drop them at ingress to the switch with: - - ovs-ofctl add-flow br0 \ - "table=0, dl_src=01:00:00:00:00:00/01:00:00:00:00:00, actions=drop" - -A switch should also not forward IEEE 802.1D Spanning Tree Protocol -(STP) packets, so we can also add a flow to drop those and other -packets with reserved multicast protocols: - - ovs-ofctl add-flow br0 \ - "table=0, dl_dst=01:80:c2:00:00:00/ff:ff:ff:ff:ff:f0, actions=drop" - -We could add flows to drop other protocols, but these demonstrate the -pattern. 
- -We need one more flow, with a priority lower than the default, so that -flows that don't match either of the "drop" flows we added above go on -to pipeline stage 1 in OpenFlow table 1: - - ovs-ofctl add-flow br0 "table=0, priority=0, actions=resubmit(,1)" - -(The "resubmit" action is an Open vSwitch extension to OpenFlow.) - - -Testing Table 0 ---------------- - -If we were using Open vSwitch to set up a physical or a virtual -switch, then we would naturally test it by sending packets through it -one way or another, perhaps with common network testing tools like -"ping" and "tcpdump" or more specialized tools like Scapy. That's -difficult with our simulated switch, since it's not visible to the -operating system. - -But our simulated switch has a few specialized testing tools. The -most powerful of these tools is "ofproto/trace". Given a switch and -the specification of a flow, "ofproto/trace" shows, step-by-step, how -such a flow would be treated as it goes through the switch. - - -== EXAMPLE 1 == - -Try this command: - - ovs-appctl ofproto/trace br0 in_port=1,dl_dst=01:80:c2:00:00:05 - -The output should look something like this: - - Flow: metadata=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=01:80:c2:00:00:05,dl_type=0x0000 - Rule: table=0 cookie=0 dl_dst=01:80:c2:00:00:00/ff:ff:ff:ff:ff:f0 - OpenFlow actions=drop - - Final flow: unchanged - Datapath actions: drop - -The first block of lines describes an OpenFlow table lookup. The -first line shows the fields used for the table lookup (which is mostly -zeros because that's the default if we don't specify everything). The -second line gives the OpenFlow flow that the fields matched (called a -"rule" because that is the name used inside Open vSwitch for an -OpenFlow flow). In this case, we see that this packet that has a -reserved multicast destination address matches the rule that drops -those packets. The third line gives the rule's OpenFlow actions. 
- -The second block of lines summarizes the results, which are not very -interesting here. - - -== EXAMPLE 2 == - -Try another command: - - ovs-appctl ofproto/trace br0 in_port=1,dl_dst=01:80:c2:00:00:10 - -The output should be: - - Flow: metadata=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=01:80:c2:00:00:10,dl_type=0x0000 - Rule: table=0 cookie=0 priority=0 - OpenFlow actions=resubmit(,1) - - Resubmitted flow: unchanged - Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 - Resubmitted odp: drop - No match - - Final flow: unchanged - Datapath actions: drop - -This time the flow we handed to "ofproto/trace" doesn't match any of -our "drop" rules, so it falls through to the low-priority "resubmit" -rule, which we see in the rule and the actions selected in the first -block. The "resubmit" causes a second lookup in OpenFlow table 1, -described by the additional block of indented text in the output. We -haven't yet added any flows to OpenFlow table 1, so no flow actually -matches in the second lookup. Therefore, the packet is still actually -dropped, which means that the externally observable results would be -identical to our first example. - - -Implementing Table 1: VLAN Input Processing -=========================================== - -A packet that enters table 1 has already passed basic validation in -table 0. The purpose of table 1 is validate the packet's VLAN, based -on the VLAN configuration of the switch port through which the packet -entered the switch. We will also use it to attach a VLAN header to -packets that arrive on an access port, which allows later processing -stages to rely on the packet's VLAN always being part of the VLAN -header, reducing special cases. - -Let's start by adding a low-priority flow that drops all packets, -before we add flows that pass through acceptable packets. 
You can -think of this as a "default drop" rule: - - ovs-ofctl add-flow br0 "table=1, priority=0, actions=drop" - -Our trunk port p1, on OpenFlow port 1, is an easy case. p1 accepts -any packet regardless of whether it has a VLAN header or what the VLAN -was, so we can add a flow that resubmits everything on input port 1 to -the next table: - - ovs-ofctl add-flow br0 \ - "table=1, priority=99, in_port=1, actions=resubmit(,2)" - -On the access ports, we want to accept any packet that has no VLAN -header, tag it with the access port's VLAN number, and then pass it -along to the next stage: - - ovs-ofctl add-flows br0 - <<'EOF' - table=1, priority=99, in_port=2, vlan_tci=0, actions=mod_vlan_vid:20, resubmit(,2) - table=1, priority=99, in_port=3, vlan_tci=0, actions=mod_vlan_vid:30, resubmit(,2) - table=1, priority=99, in_port=4, vlan_tci=0, actions=mod_vlan_vid:30, resubmit(,2) -EOF - -We don't write any rules that match packets with 802.1Q that enter -this stage on any of the access ports, so the "default drop" rule we -added earlier causes them to be dropped, which is ordinarily what we -want for access ports. - ->>> Another variation of access ports allows ingress of packets tagged - with VLAN 0 (aka 802.1p priority tagged packets). To allow such - packets, replace "vlan_tci=0" by "vlan_tci=0/0xfff" above. - - -Testing Table 1 ---------------- - -"ofproto/trace" allows us to test the ingress VLAN rules that we added -above. 
- - -== EXAMPLE 1: Packet on Trunk Port == - -Here's a test of a packet coming in on the trunk port: - - ovs-appctl ofproto/trace br0 in_port=1,vlan_tci=5 - -The output shows the lookup in table 0, the resubmit to table 1, and -the resubmit to table 2 (which does nothing because we haven't put -anything there yet): - - Flow: metadata=0,in_port=1,vlan_tci=0x0005,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 - Rule: table=0 cookie=0 priority=0 - OpenFlow actions=resubmit(,1) - - Resubmitted flow: unchanged - Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 - Resubmitted odp: drop - Rule: table=1 cookie=0 priority=99,in_port=1 - OpenFlow actions=resubmit(,2) - - Resubmitted flow: unchanged - Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 - Resubmitted odp: drop - No match - - Final flow: unchanged - Datapath actions: drop - - -== EXAMPLE 2: Valid Packet on Access Port == - -Here's a test of a valid packet (a packet without an 802.1Q header) -coming in on access port p2: - - ovs-appctl ofproto/trace br0 in_port=2 - -The output is similar to that for the previous case, except that it -additionally tags the packet with p2's VLAN 20 before it passes it -along to table 2: - - Flow: metadata=0,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 - Rule: table=0 cookie=0 priority=0 - OpenFlow actions=resubmit(,1) - - Resubmitted flow: unchanged - Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 - Resubmitted odp: drop - Rule: table=1 cookie=0 priority=99,in_port=2,vlan_tci=0x0000 - OpenFlow actions=mod_vlan_vid:20,resubmit(,2) - - Resubmitted flow: metadata=0,in_port=2,dl_vlan=20,dl_vlan_pcp=0,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 - Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 - Resubmitted odp: drop - No match - - Final 
flow: unchanged - Datapath actions: drop - - -== EXAMPLE 3: Invalid Packet on Access Port == - -This tests an invalid packet (one that includes an 802.1Q header) -coming in on access port p2: - - ovs-appctl ofproto/trace br0 in_port=2,vlan_tci=5 - -The output shows the packet matching the default drop rule: - - Flow: metadata=0,in_port=2,vlan_tci=0x0005,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 - Rule: table=0 cookie=0 priority=0 - OpenFlow actions=resubmit(,1) - - Resubmitted flow: unchanged - Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 - Resubmitted odp: drop - Rule: table=1 cookie=0 priority=0 - OpenFlow actions=drop - - Final flow: unchanged - Datapath actions: drop - - -Implementing Table 2: MAC+VLAN Learning for Ingress Port -======================================================== - -This table allows the switch we're implementing to learn that the -packet's source MAC is located on the packet's ingress port in the -packet's VLAN. - ->>> This table is a good example why table 1 added a VLAN tag to - packets that entered the switch through an access port. We want - to associate a MAC+VLAN with a port regardless of whether the VLAN - in question was originally part of the packet or whether it was an - assumed VLAN associated with an access port. - -It only takes a single flow to do this. The following command adds -it: - - ovs-ofctl add-flow br0 \ - "table=2 actions=learn(table=10, NXM_OF_VLAN_TCI[0..11], \ - NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], \ - load:NXM_OF_IN_PORT[]->NXM_NX_REG0[0..15]), \ - resubmit(,3)" - -The "learn" action (an Open vSwitch extension to OpenFlow) modifies a -flow table based on the content of the flow currently being processed. -Here's how you can interpret each part of the "learn" action above: - - table=10 - - Modify flow table 10. This will be the MAC learning table. 
- - NXM_OF_VLAN_TCI[0..11] - - Make the flow that we add to flow table 10 match the same VLAN - ID that the packet we're currently processing contains. This - effectively scopes the MAC learning entry to a single VLAN, - which is the ordinary behavior for a VLAN-aware switch. - - NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[] - - Make the flow that we add to flow table 10 match, as Ethernet - destination, the Ethernet source address of the packet we're - currently processing. - - load:NXM_OF_IN_PORT[]->NXM_NX_REG0[0..15] - - Whereas the preceding parts specify fields for the new flow to - match, this specifies an action for the flow to take when it - matches. The action is for the flow to load the ingress port - number of the current packet into register 0 (a special field - that is an Open vSwitch extension to OpenFlow). - ->>> A real use of "learn" for MAC learning would probably involve two - additional elements. First, the "learn" action would specify a - hard_timeout for the new flow, to enable a learned MAC to - eventually expire if no new packets were seen from a given source - within a reasonable interval. Second, one would usually want to - limit resource consumption by using the Flow_Table table in the - Open vSwitch configuration database to specify a maximum number of - flows in table 10. - -This definitely calls for examples. - - -Testing Table 2 ---------------- - -== EXAMPLE 1 == - -Try the following test command: - - ovs-appctl ofproto/trace br0 in_port=1,vlan_tci=20,dl_src=50:00:00:00:00:01 -generate - -The output shows that "learn" was executed, but it isn't otherwise -informative, so we won't include it here. - -The "-generate" keyword is new. Ordinarily, "ofproto/trace" has no -side effects: "output" actions do not actually output packets, "learn" -actions do not actually modify the flow table, and so on. With -"-generate", though, "ofproto/trace" does execute "learn" actions. 
-That's important now, because we want to see the effect of the "learn" -action on table 10. You can see that by running: - - ovs-ofctl dump-flows br0 table=10 - -which (omitting the "duration" and "idle_age" fields, which will vary -based on how soon you ran this command after the previous one, as well -as some other uninteresting fields) prints something like: - - NXST_FLOW reply (xid=0x4): - table=10, vlan_tci=0x0014/0x0fff,dl_dst=50:00:00:00:00:01 actions=load:0x1->NXM_NX_REG0[0..15] - -You can see that the packet coming in on VLAN 20 with source MAC -50:00:00:00:00:01 became a flow that matches VLAN 20 (written in -hexadecimal) and destination MAC 50:00:00:00:00:01. The flow loads -port number 1, the input port for the flow we tested, into register 0. - - -== EXAMPLE 2 == - -Here's a second test command: - - ovs-appctl ofproto/trace br0 in_port=2,dl_src=50:00:00:00:00:01 -generate - -The flow that this command tests has the same source MAC and VLAN as -example 1, although the VLAN comes from an access port VLAN rather -than an 802.1Q header. If we again dump the flows for table 10 with: - - ovs-ofctl dump-flows br0 table=10 - -then we see that the flow we saw previously has changed to indicate -that the learned port is port 2, as we would expect: - - NXST_FLOW reply (xid=0x4): - table=10, vlan_tci=0x0014/0x0fff,dl_dst=50:00:00:00:00:01 actions=load:0x2->NXM_NX_REG0[0..15] - - -Implementing Table 3: Look Up Destination Port -============================================== - -This table figures out what port we should send the packet to based on -the destination MAC and VLAN. That is, if we've learned the location -of the destination (from table 2 processing some previous packet with -that destination as its source), then we want to send the packet -there. 
- -We need only one flow to do the lookup: - - ovs-ofctl add-flow br0 \ - "table=3 priority=50 actions=resubmit(,10), resubmit(,4)" - -The flow's first action resubmits to table 10, the table that the -"learn" action modifies. As you saw previously, the learned flows in -this table write the learned port into register 0. If the destination -for our packet hasn't been learned, then there will be no matching -flow, and so the "resubmit" turns into a no-op. Because registers are -initialized to 0, we can use a register 0 value of 0 in our next -pipeline stage as a signal to flood the packet. - -The second action resubmits to table 4, continuing to the next -pipeline stage. - -We can add another flow to skip the learning table lookup for -multicast and broadcast packets, since those should always be flooded: - - ovs-ofctl add-flow br0 \ - "table=3 priority=99 dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 \ - actions=resubmit(,4)" - ->>> We don't strictly need to add this flow, because multicast - addresses will never show up in our learning table. (In turn, - that's because we put a flow into table 0 to drop packets that - have a multicast source address.) 
- - -Testing Table 3 ---------------- - -== EXAMPLE == - -Here's a command that should cause OVS to learn that f0:00:00:00:00:01 -is on p1 in VLAN 20: - - ovs-appctl ofproto/trace br0 in_port=1,dl_vlan=20,dl_src=f0:00:00:00:00:01,dl_dst=90:00:00:00:00:01 -generate - -Here's an excerpt from the output that shows (from the "no match" -looking up the resubmit to table 10) that the flow's destination was -unknown: - - Resubmitted flow: unchanged - Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 - Resubmitted odp: drop - Rule: table=3 cookie=0 priority=50 - OpenFlow actions=resubmit(,10),resubmit(,4) - - Resubmitted flow: unchanged - Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 - Resubmitted odp: drop - No match - -You can verify that the packet's source was learned two ways. The -most direct way is to dump the learning table with: - - ovs-ofctl dump-flows br0 table=10 - -which ought to show roughly the following, with extraneous details -removed: - - table=10, vlan_tci=0x0014/0x0fff,dl_dst=f0:00:00:00:00:01 actions=load:0x1->NXM_NX_REG0[0..15] - ->>> If you tried the examples for the previous step, or if you did - some of your own experiments, then you might see additional flows - there. These additional flows are harmless. If they bother you, - then you can remove them with "ovs-ofctl del-flows br0 table=10". - -The other way is to inject a packet to take advantage of the learning -entry. For example, we can inject a packet on p2 whose destination is -the MAC address that we just learned on p1: - - ovs-appctl ofproto/trace br0 in_port=2,dl_src=90:00:00:00:00:01,dl_dst=f0:00:00:00:00:01 -generate - -Here's an interesting excerpt from that command's output. 
This group -of lines traces the "resubmit(,10)", showing that the packet matched -the learned flow for the first MAC we used, loading the OpenFlow port -number for the learned port p1 into register 0: - - Resubmitted flow: unchanged - Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 - Resubmitted odp: drop - Rule: table=10 cookie=0 vlan_tci=0x0014/0x0fff,dl_dst=f0:00:00:00:00:01 - OpenFlow actions=load:0x1->NXM_NX_REG0[0..15] - - -If you read the commands above carefully, then you might have noticed -that they simply have the Ethernet source and destination addresses -exchanged. That means that if we now rerun the first ovs-appctl -command above, e.g.: - - ovs-appctl ofproto/trace br0 in_port=1,dl_vlan=20,dl_src=f0:00:00:00:00:01,dl_dst=90:00:00:00:00:01 -generate - -then we see in the output that the destination has now been learned: - - Resubmitted flow: unchanged - Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 - Resubmitted odp: drop - Rule: table=10 cookie=0 vlan_tci=0x0014/0x0fff,dl_dst=90:00:00:00:00:01 - OpenFlow actions=load:0x2->NXM_NX_REG0[0..15] - - -Implementing Table 4: Output Processing -======================================= - -At entry to stage 4, we know that register 0 contains either the -desired output port or is zero if the packet should be flooded. We -also know that the packet's VLAN is in its 802.1Q header, even if the -VLAN was implicit because the packet came in on an access port. - -The job of the final pipeline stage is to actually output packets. 
-The job is trivial for output to our trunk port p1: - - ovs-ofctl add-flow br0 "table=4 reg0=1 actions=1" - -For output to the access ports, we just have to strip the VLAN header -before outputting the packet: - - ovs-ofctl add-flows br0 - <<'EOF' - table=4 reg0=2 actions=strip_vlan,2 - table=4 reg0=3 actions=strip_vlan,3 - table=4 reg0=4 actions=strip_vlan,4 -EOF - -The only slightly tricky part is flooding multicast and broadcast -packets and unicast packets with unlearned destinations. For those, -we need to make sure that we only output the packets to the ports that -carry our packet's VLAN, and that we include the 802.1Q header in the -copy output to the trunk port but not in copies output to access -ports: - - ovs-ofctl add-flows br0 - <<'EOF' - table=4 reg0=0 priority=99 dl_vlan=20 actions=1,strip_vlan,2 - table=4 reg0=0 priority=99 dl_vlan=30 actions=1,strip_vlan,3,4 - table=4 reg0=0 priority=50 actions=1 -EOF - ->>> Our rules rely on the standard OpenFlow behavior that an output - action will not forward a packet back out the port it came in on. - That is, if a packet comes in on p1, and we've learned that the - packet's destination MAC is also on p1, so that we end up with - "actions=1" as our actions, the switch will not forward the packet - back out its input port. The multicast/broadcast/unknown - destination cases above also rely on this behavior. 
- - -Testing Table 4 ---------------- - -== EXAMPLE 1: Broadcast, Multicast, and Unknown Destination == - -Try tracing a broadcast packet arriving on p1 in VLAN 30: - - ovs-appctl ofproto/trace br0 in_port=1,dl_dst=ff:ff:ff:ff:ff:ff,dl_vlan=30 - -The interesting part of the output is the final line, which shows that -the switch would remove the 802.1Q header and then output the packet to -p3 and p4, which are access ports for VLAN 30: - - Datapath actions: pop_vlan,3,4 - -Similarly, if we trace a broadcast packet arriving on p3: - - ovs-appctl ofproto/trace br0 in_port=3,dl_dst=ff:ff:ff:ff:ff:ff - -then we see that it is output to p1 with an 802.1Q tag and then to p4 -without one: - - Datapath actions: push_vlan(vid=30,pcp=0),1,pop_vlan,4 - ->>> Open vSwitch could simplify the datapath actions here to just - "4,push_vlan(vid=30,pcp=0),1" but it is not smart enough to do so. - -The following are also broadcasts, but the result is to drop the -packets because the VLAN only belongs to the input port: - - ovs-appctl ofproto/trace br0 in_port=1,dl_dst=ff:ff:ff:ff:ff:ff - ovs-appctl ofproto/trace br0 in_port=1,dl_dst=ff:ff:ff:ff:ff:ff,dl_vlan=55 - -Try some other broadcast cases on your own: - - ovs-appctl ofproto/trace br0 in_port=1,dl_dst=ff:ff:ff:ff:ff:ff,dl_vlan=20 - ovs-appctl ofproto/trace br0 in_port=2,dl_dst=ff:ff:ff:ff:ff:ff - ovs-appctl ofproto/trace br0 in_port=4,dl_dst=ff:ff:ff:ff:ff:ff - -You can see the same behavior with multicast packets and with unicast -packets whose destination has not been learned, e.g.: - - ovs-appctl ofproto/trace br0 in_port=4,dl_dst=01:00:00:00:00:00 - ovs-appctl ofproto/trace br0 in_port=1,dl_dst=90:12:34:56:78:90,dl_vlan=20 - ovs-appctl ofproto/trace br0 in_port=1,dl_dst=90:12:34:56:78:90,dl_vlan=30 - - -== EXAMPLE 2: MAC Learning == - -Let's follow the same pattern as we did for table 3. 
First learn a -MAC on port p1 in VLAN 30: - - ovs-appctl ofproto/trace br0 in_port=1,dl_vlan=30,dl_src=10:00:00:00:00:01,dl_dst=20:00:00:00:00:01 -generate - -You can see from the last line of output that the packet's destination -is unknown, so it gets flooded to both p3 and p4, the other ports in -VLAN 30: - - Datapath actions: pop_vlan,3,4 - -Then reverse the MACs and learn the first flow's destination on port -p4: - - ovs-appctl ofproto/trace br0 in_port=4,dl_src=20:00:00:00:00:01,dl_dst=10:00:00:00:00:01 -generate - -The last line of output shows that the this packet's destination is -known to be p1, as learned from our previous command: - - Datapath actions: push_vlan(vid=30,pcp=0),1 - -Now, if we rerun our first command: - - ovs-appctl ofproto/trace br0 in_port=1,dl_vlan=30,dl_src=10:00:00:00:00:01,dl_dst=20:00:00:00:00:01 -generate - -we can see that the result is no longer a flood but to the specified -learned destination port p4: - - Datapath actions: pop_vlan,4 - - -Contact -======= - -bugs@openvswitch.org -http://openvswitch.org/ diff --git a/tutorial/Tutorial.md b/tutorial/Tutorial.md new file mode 100644 index 000000000..0cf52fb15 --- /dev/null +++ b/tutorial/Tutorial.md @@ -0,0 +1,830 @@ +Open vSwitch Advanced Features Tutorial +======================================= + +Many tutorials cover the basics of OpenFlow. This is not such a +tutorial. Rather, a knowledge of the basics of OpenFlow is a +prerequisite. If you do not already understand how an OpenFlow flow +table works, please go read a basic tutorial and then continue reading +here afterward. + +It is also important to understand the basics of Open vSwitch before +you begin. If you have never used ovs-vsctl or ovs-ofctl before, you +should learn a little about them before proceeding. + +Most of the features covered in this tutorial are Open vSwitch +extensions to OpenFlow. Also, most of the features in this tutorial +are specific to the software Open vSwitch implementation. 
If you are +using an Open vSwitch port to an ASIC-based hardware switch, this +tutorial will not help you. + +This tutorial does not cover every aspect of the features that it +mentions. You can find the details elsewhere in the Open vSwitch +documentation, especially ovs-ofctl(8) and the comments in the +include/openflow/nicira-ext.h header file. + +>>> In this tutorial, paragraphs set off like this designate notes + with additional information that readers may wish to skip on a + first read. + + +Getting Started +--------------- + +This is a hands-on tutorial. To get the most out of it, you will need +Open vSwitch binaries. You do not, on the other hand, need any +physical networking hardware or even supervisor privilege on your +system. Instead, we will use a script called "ovs-sandbox", which +accompanies the tutorial, that constructs a software simulated network +environment based on Open vSwitch. + +You can use `ovs-sandbox` three ways: + + * If you have already installed Open vSwitch on your system, then + you should be able to just run `ovs-sandbox` from this directory + without any options. + + * If you have not installed Open vSwitch (and you do not want to + install it), then you can build Open vSwitch according to the + instructions in INSTALL, without installing it. Then run + `./ovs-sandbox -b DIRECTORY` from this directory, substituting + the Open vSwitch build directory for DIRECTORY. + + * As a slight variant on the latter, you can run `make sandbox` + from an Open vSwitch build directory. + +When you run ovs-sandbox, it does the following: + + 1. CAUTION: Deletes any subdirectory of the current directory + named "sandbox" and any files in that directory. + + 2. Creates a new directory "sandbox" in the current directory. + + 3. Sets up special environment variables that ensure that Open + vSwitch programs will look inside the "sandbox" directory + instead of in the Open vSwitch installation directory. + + 4. 
If you are using a built but not installed Open vSwitch, + installs the Open vSwitch manpages in a subdirectory of + "sandbox" and adjusts the MANPATH environment variable to point + to this directory. This means that you can use, for example, + `man ovs-vsctl` to see a manpage for the ovs-vsctl program that + you built. + + 5. Creates an empty Open vSwitch configuration database under + "sandbox". + + 6. Starts ovsdb-server running under "sandbox". + + 7. Starts ovs-vswitchd running under "sandbox", passing special + options that enable a special "dummy" mode for testing. + + 8. Starts a nested interactive shell inside "sandbox". + +At this point, you can run all the usual Open vSwitch utilities from +the nested shell environment. You can, for example, use ovs-vsctl to +create a bridge: + + ovs-vsctl add-br br0 + +From Open vSwitch's perspective, the bridge that you create this way +is as real as any other. You can, for example, connect it to an +OpenFlow controller or use "ovs-ofctl" to examine and modify it and +its OpenFlow flow table. On the other hand, the bridge is not visible +to the operating system's network stack, so "ifconfig" or "ip" cannot +see it or affect it, which means that utilities like "ping" and +"tcpdump" will not work either. (That has its good side, too: you +can't screw up your computer's network stack by manipulating a +sandboxed OVS.) + +When you're done using OVS from the sandbox, exit the nested shell (by +entering the "exit" shell command or pressing Control+D). This will +kill the daemons that ovs-sandbox started, but it leaves the "sandbox" +directory and its contents in place. + +The sandbox directory contains log files for the Open vSwitch daemons. +You can examine them while you're running in the sandboxed environment +or after you exit. + + +Motivation +---------- + +The goal of this tutorial is to demonstrate the power of Open vSwitch +flow tables.
The tutorial works through the implementation of a +MAC-learning switch with VLAN trunk and access ports. Outside of the +Open vSwitch features that we will discuss, OpenFlow provides at least +two ways to implement such a switch: + + 1. An OpenFlow controller to implement MAC learning in a + "reactive" fashion. Whenever a new MAC appears on the switch, + or a MAC moves from one switch port to another, the controller + adjusts the OpenFlow flow table to match. + + 2. The "normal" action. OpenFlow defines this action to submit a + packet to "the traditional non-OpenFlow pipeline of the + switch". That is, if a flow uses this action, then the packets + in the flow go through the switch in the same way that they + would if OpenFlow were not configured on the switch. + +Each of these approaches has unfortunate pitfalls. The first +approach, using an OpenFlow controller to implement MAC learning, has +a significant cost in terms of network bandwidth and latency. It also +makes the controller more difficult to scale to large numbers of +switches, which is especially important in environments with thousands +of hypervisors (each of which contains a virtual OpenFlow switch). +MAC learning at an OpenFlow controller also behaves poorly if the +OpenFlow controller fails, slows down, or becomes unavailable due to +network problems. + +The second approach, using the "normal" action, has different +problems. First, little about the "normal" action is standardized, so +it behaves differently on switches from different vendors, and the +available features and how those features are configured (usually not +through OpenFlow) vary widely. Second, "normal" does not work well +with other OpenFlow actions. It is "all-or-nothing", with little +potential to adjust its behavior slightly or to compose it with other +features.
+ + +Scenario +-------- + +We will construct Open vSwitch flow tables for a VLAN-capable, +MAC-learning switch that has four ports: + + * p1, a trunk port that carries all VLANs, on OpenFlow port 1. + + * p2, an access port for VLAN 20, on OpenFlow port 2. + + * p3 and p4, both access ports for VLAN 30, on OpenFlow ports 3 + and 4, respectively. + +>>> The ports' names are not significant. You could call them eth1 + through eth4, or any other names you like. + +>>> An OpenFlow switch always has a "local" port as well. This + scenario won't use the local port. + +Our switch design will consist of five main flow tables, each of which +implements one stage in the switch pipeline: + + Table 0: Admission control. + + Table 1: VLAN input processing. + + Table 2: Learn source MAC and VLAN for ingress port. + + Table 3: Look up learned port for destination MAC and VLAN. + + Table 4: Output processing. + +The section below describes how to set up the scenario, followed by a +section for each OpenFlow table. + +You can cut and paste the "ovs-vsctl" and "ovs-ofctl" commands in each +of the sections below into your "ovs-sandbox" shell. They are also +available as shell scripts in this directory, named t-setup, t-stage0, +t-stage1, ..., t-stage4. The "ovs-appctl" test commands are intended +for cutting and pasting and are not supplied separately. + + +Setup +----- + +To get started, start "ovs-sandbox". Inside the interactive shell +that it starts, run this command: + + ovs-vsctl add-br br0 -- set Bridge br0 fail-mode=secure + +This command creates a new bridge "br0" and puts "br0" into so-called +"fail-secure" mode. For our purpose, this just means that the +OpenFlow flow table starts out empty. + +>>> If we did not do this, then the flow table would start out with a + single flow that executes the "normal" action. 
We could use that + feature to yield a switch that behaves the same as the switch we + are currently building, but with the caveats described under + "Motivation" above. + +The new bridge has only one port on it so far, the "local port" br0. +We need to add p1, p2, p3, and p4. A shell "for" loop is one way to +do it: + + for i in 1 2 3 4; do + ovs-vsctl add-port br0 p$i -- set Interface p$i ofport_request=$i + ovs-ofctl mod-port br0 p$i up + done + +In addition to adding a port, the ovs-vsctl command above sets its +"ofport_request" column to ensure that port p1 is assigned OpenFlow +port 1, p2 is assigned OpenFlow port 2, and so on. + +>>> We could omit setting the ofport_request and let Open vSwitch + choose port numbers for us, but setting it is convenient for the + purposes of this tutorial because we can talk about OpenFlow port 1 + and know that it corresponds to p1. + +The ovs-ofctl command above brings up the simulated interfaces, which +are down initially, using an OpenFlow request. The effect is similar +to "ifconfig up", but the sandbox's interfaces are not visible to the +operating system and therefore "ifconfig" would not affect them. + +We have not configured anything related to VLANs or MAC learning. +That's because we're going to implement those features in the flow +table. + +To see what we've done so far to set up the scenario, you can run a +command like "ovs-vsctl show" or "ovs-ofctl show br0". + + +Implementing Table 0: Admission control +--------------------------------------- + +Table 0 is where packets enter the switch. We use this stage to +discard packets that for one reason or another are invalid.
For +example, packets with a multicast source address are not valid, so we +can add a flow to drop them at ingress to the switch with: + + ovs-ofctl add-flow br0 \ + "table=0, dl_src=01:00:00:00:00:00/01:00:00:00:00:00, actions=drop" + +A switch should also not forward IEEE 802.1D Spanning Tree Protocol +(STP) packets, so we can also add a flow to drop those and other +packets with reserved multicast protocols: + + ovs-ofctl add-flow br0 \ + "table=0, dl_dst=01:80:c2:00:00:00/ff:ff:ff:ff:ff:f0, actions=drop" + +We could add flows to drop other protocols, but these demonstrate the +pattern. + +We need one more flow, with a priority lower than the default, so that +flows that don't match either of the "drop" flows we added above go on +to pipeline stage 1 in OpenFlow table 1: + + ovs-ofctl add-flow br0 "table=0, priority=0, actions=resubmit(,1)" + +(The "resubmit" action is an Open vSwitch extension to OpenFlow.) + + +### Testing Table 0 + +If we were using Open vSwitch to set up a physical or a virtual +switch, then we would naturally test it by sending packets through it +one way or another, perhaps with common network testing tools like +"ping" and "tcpdump" or more specialized tools like Scapy. That's +difficult with our simulated switch, since it's not visible to the +operating system. + +But our simulated switch has a few specialized testing tools. The +most powerful of these tools is "ofproto/trace". Given a switch and +the specification of a flow, "ofproto/trace" shows, step-by-step, how +such a flow would be treated as it goes through the switch. 
+ + +### EXAMPLE 1 + +Try this command: + + ovs-appctl ofproto/trace br0 in_port=1,dl_dst=01:80:c2:00:00:05 + +The output should look something like this: + + Flow: metadata=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=01:80:c2:00:00:05,dl_type=0x0000 + Rule: table=0 cookie=0 dl_dst=01:80:c2:00:00:00/ff:ff:ff:ff:ff:f0 + OpenFlow actions=drop + + Final flow: unchanged + Datapath actions: drop + +The first block of lines describes an OpenFlow table lookup. The +first line shows the fields used for the table lookup (which is mostly +zeros because that's the default if we don't specify everything). The +second line gives the OpenFlow flow that the fields matched (called a +"rule" because that is the name used inside Open vSwitch for an +OpenFlow flow). In this case, we see that this packet that has a +reserved multicast destination address matches the rule that drops +those packets. The third line gives the rule's OpenFlow actions. + +The second block of lines summarizes the results, which are not very +interesting here. + + +### EXAMPLE 2 + +Try another command: + + ovs-appctl ofproto/trace br0 in_port=1,dl_dst=01:80:c2:00:00:10 + +The output should be: + + Flow: metadata=0,in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=01:80:c2:00:00:10,dl_type=0x0000 + Rule: table=0 cookie=0 priority=0 + OpenFlow actions=resubmit(,1) + + Resubmitted flow: unchanged + Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 + Resubmitted odp: drop + No match + + Final flow: unchanged + Datapath actions: drop + +This time the flow we handed to "ofproto/trace" doesn't match any of +our "drop" rules, so it falls through to the low-priority "resubmit" +rule, which we see in the rule and the actions selected in the first +block. The "resubmit" causes a second lookup in OpenFlow table 1, +described by the additional block of indented text in the output. 
We +haven't yet added any flows to OpenFlow table 1, so no flow actually +matches in the second lookup. Therefore, the packet is still actually +dropped, which means that the externally observable results would be +identical to our first example. + + +Implementing Table 1: VLAN Input Processing +------------------------------------------- + +A packet that enters table 1 has already passed basic validation in +table 0. The purpose of table 1 is to validate the packet's VLAN, based +on the VLAN configuration of the switch port through which the packet +entered the switch. We will also use it to attach a VLAN header to +packets that arrive on an access port, which allows later processing +stages to rely on the packet's VLAN always being part of the VLAN +header, reducing special cases. + +Let's start by adding a low-priority flow that drops all packets, +before we add flows that pass through acceptable packets. You can +think of this as a "default drop" rule: + + ovs-ofctl add-flow br0 "table=1, priority=0, actions=drop" + +Our trunk port p1, on OpenFlow port 1, is an easy case.
p1 accepts +any packet regardless of whether it has a VLAN header or what the VLAN +was, so we can add a flow that resubmits everything on input port 1 to +the next table: + + ovs-ofctl add-flow br0 \ + "table=1, priority=99, in_port=1, actions=resubmit(,2)" + +On the access ports, we want to accept any packet that has no VLAN +header, tag it with the access port's VLAN number, and then pass it +along to the next stage: + + ovs-ofctl add-flows br0 - <<'EOF' + table=1, priority=99, in_port=2, vlan_tci=0, actions=mod_vlan_vid:20, resubmit(,2) + table=1, priority=99, in_port=3, vlan_tci=0, actions=mod_vlan_vid:30, resubmit(,2) + table=1, priority=99, in_port=4, vlan_tci=0, actions=mod_vlan_vid:30, resubmit(,2) + EOF + +We don't write any rules that match packets with 802.1Q that enter +this stage on any of the access ports, so the "default drop" rule we +added earlier causes them to be dropped, which is ordinarily what we +want for access ports. + +>>> Another variation of access ports allows ingress of packets tagged + with VLAN 0 (aka 802.1p priority tagged packets). To allow such + packets, replace "vlan_tci=0" by "vlan_tci=0/0xfff" above. + + +### Testing Table 1 + +"ofproto/trace" allows us to test the ingress VLAN rules that we added +above. 
+ + +### EXAMPLE 1: Packet on Trunk Port + +Here's a test of a packet coming in on the trunk port: + + ovs-appctl ofproto/trace br0 in_port=1,vlan_tci=5 + +The output shows the lookup in table 0, the resubmit to table 1, and +the resubmit to table 2 (which does nothing because we haven't put +anything there yet): + + Flow: metadata=0,in_port=1,vlan_tci=0x0005,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 + Rule: table=0 cookie=0 priority=0 + OpenFlow actions=resubmit(,1) + + Resubmitted flow: unchanged + Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 + Resubmitted odp: drop + Rule: table=1 cookie=0 priority=99,in_port=1 + OpenFlow actions=resubmit(,2) + + Resubmitted flow: unchanged + Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 + Resubmitted odp: drop + No match + + Final flow: unchanged + Datapath actions: drop + + +### EXAMPLE 2: Valid Packet on Access Port + +Here's a test of a valid packet (a packet without an 802.1Q header) +coming in on access port p2: + + ovs-appctl ofproto/trace br0 in_port=2 + +The output is similar to that for the previous case, except that it +additionally tags the packet with p2's VLAN 20 before it passes it +along to table 2: + + Flow: metadata=0,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 + Rule: table=0 cookie=0 priority=0 + OpenFlow actions=resubmit(,1) + + Resubmitted flow: unchanged + Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 + Resubmitted odp: drop + Rule: table=1 cookie=0 priority=99,in_port=2,vlan_tci=0x0000 + OpenFlow actions=mod_vlan_vid:20,resubmit(,2) + + Resubmitted flow: metadata=0,in_port=2,dl_vlan=20,dl_vlan_pcp=0,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 + Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 + Resubmitted odp: drop + No match + + Final flow: 
unchanged + Datapath actions: drop + + +### EXAMPLE 3: Invalid Packet on Access Port + +This tests an invalid packet (one that includes an 802.1Q header) +coming in on access port p2: + + ovs-appctl ofproto/trace br0 in_port=2,vlan_tci=5 + +The output shows the packet matching the default drop rule: + + Flow: metadata=0,in_port=2,vlan_tci=0x0005,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 + Rule: table=0 cookie=0 priority=0 + OpenFlow actions=resubmit(,1) + + Resubmitted flow: unchanged + Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 + Resubmitted odp: drop + Rule: table=1 cookie=0 priority=0 + OpenFlow actions=drop + + Final flow: unchanged + Datapath actions: drop + + +Implementing Table 2: MAC+VLAN Learning for Ingress Port +-------------------------------------------------------- + +This table allows the switch we're implementing to learn that the +packet's source MAC is located on the packet's ingress port in the +packet's VLAN. + +>>> This table is a good example why table 1 added a VLAN tag to + packets that entered the switch through an access port. We want + to associate a MAC+VLAN with a port regardless of whether the VLAN + in question was originally part of the packet or whether it was an + assumed VLAN associated with an access port. + +It only takes a single flow to do this. The following command adds +it: + + ovs-ofctl add-flow br0 \ + "table=2 actions=learn(table=10, NXM_OF_VLAN_TCI[0..11], \ + NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], \ + load:NXM_OF_IN_PORT[]->NXM_NX_REG0[0..15]), \ + resubmit(,3)" + +The "learn" action (an Open vSwitch extension to OpenFlow) modifies a +flow table based on the content of the flow currently being processed. +Here's how you can interpret each part of the "learn" action above: + + table=10 + + Modify flow table 10. This will be the MAC learning table. 
+ + NXM_OF_VLAN_TCI[0..11] + + Make the flow that we add to flow table 10 match the same VLAN + ID that the packet we're currently processing contains. This + effectively scopes the MAC learning entry to a single VLAN, + which is the ordinary behavior for a VLAN-aware switch. + + NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[] + + Make the flow that we add to flow table 10 match, as Ethernet + destination, the Ethernet source address of the packet we're + currently processing. + + load:NXM_OF_IN_PORT[]->NXM_NX_REG0[0..15] + + Whereas the preceding parts specify fields for the new flow to + match, this specifies an action for the flow to take when it + matches. The action is for the flow to load the ingress port + number of the current packet into register 0 (a special field + that is an Open vSwitch extension to OpenFlow). + +>>> A real use of "learn" for MAC learning would probably involve two + additional elements. First, the "learn" action would specify a + hard_timeout for the new flow, to enable a learned MAC to + eventually expire if no new packets were seen from a given source + within a reasonable interval. Second, one would usually want to + limit resource consumption by using the Flow_Table table in the + Open vSwitch configuration database to specify a maximum number of + flows in table 10. + +This definitely calls for examples. + + +### Testing Table 2 + +### EXAMPLE 1 + +Try the following test command: + + ovs-appctl ofproto/trace br0 in_port=1,vlan_tci=20,dl_src=50:00:00:00:00:01 -generate + +The output shows that "learn" was executed, but it isn't otherwise +informative, so we won't include it here. + +The "-generate" keyword is new. Ordinarily, "ofproto/trace" has no +side effects: "output" actions do not actually output packets, "learn" +actions do not actually modify the flow table, and so on. With +"-generate", though, "ofproto/trace" does execute "learn" actions. +That's important now, because we want to see the effect of the "learn" +action on table 10. 
You can see that by running: + + ovs-ofctl dump-flows br0 table=10 + +which (omitting the "duration" and "idle_age" fields, which will vary +based on how soon you ran this command after the previous one, as well +as some other uninteresting fields) prints something like: + + NXST_FLOW reply (xid=0x4): + table=10, vlan_tci=0x0014/0x0fff,dl_dst=50:00:00:00:00:01 actions=load:0x1->NXM_NX_REG0[0..15] + +You can see that the packet coming in on VLAN 20 with source MAC +50:00:00:00:00:01 became a flow that matches VLAN 20 (written in +hexadecimal) and destination MAC 50:00:00:00:00:01. The flow loads +port number 1, the input port for the flow we tested, into register 0. + + +### EXAMPLE 2 + +Here's a second test command: + + ovs-appctl ofproto/trace br0 in_port=2,dl_src=50:00:00:00:00:01 -generate + +The flow that this command tests has the same source MAC and VLAN as +example 1, although the VLAN comes from an access port VLAN rather +than an 802.1Q header. If we again dump the flows for table 10 with: + + ovs-ofctl dump-flows br0 table=10 + +then we see that the flow we saw previously has changed to indicate +that the learned port is port 2, as we would expect: + + NXST_FLOW reply (xid=0x4): + table=10, vlan_tci=0x0014/0x0fff,dl_dst=50:00:00:00:00:01 actions=load:0x2->NXM_NX_REG0[0..15] + + +Implementing Table 3: Look Up Destination Port +---------------------------------------------- + +This table figures out what port we should send the packet to based on +the destination MAC and VLAN. That is, if we've learned the location +of the destination (from table 2 processing some previous packet with +that destination as its source), then we want to send the packet +there. + +We need only one flow to do the lookup: + + ovs-ofctl add-flow br0 \ + "table=3 priority=50 actions=resubmit(,10), resubmit(,4)" + +The flow's first action resubmits to table 10, the table that the +"learn" action modifies. 
As you saw previously, the learned flows in +this table write the learned port into register 0. If the destination +for our packet hasn't been learned, then there will be no matching +flow, and so the "resubmit" turns into a no-op. Because registers are +initialized to 0, we can use a register 0 value of 0 in our next +pipeline stage as a signal to flood the packet. + +The second action resubmits to table 4, continuing to the next +pipeline stage. + +We can add another flow to skip the learning table lookup for +multicast and broadcast packets, since those should always be flooded: + + ovs-ofctl add-flow br0 \ + "table=3 priority=99 dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 \ + actions=resubmit(,4)" + +>>> We don't strictly need to add this flow, because multicast + addresses will never show up in our learning table. (In turn, + that's because we put a flow into table 0 to drop packets that + have a multicast source address.) + + +### Testing Table 3 + +### EXAMPLE + +Here's a command that should cause OVS to learn that f0:00:00:00:00:01 +is on p1 in VLAN 20: + + ovs-appctl ofproto/trace br0 in_port=1,dl_vlan=20,dl_src=f0:00:00:00:00:01,dl_dst=90:00:00:00:00:01 -generate + +Here's an excerpt from the output that shows (from the "no match" +looking up the resubmit to table 10) that the flow's destination was +unknown: + + Resubmitted flow: unchanged + Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 + Resubmitted odp: drop + Rule: table=3 cookie=0 priority=50 + OpenFlow actions=resubmit(,10),resubmit(,4) + + Resubmitted flow: unchanged + Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 + Resubmitted odp: drop + No match + +You can verify that the packet's source was learned two ways. 
The +most direct way is to dump the learning table with: + + ovs-ofctl dump-flows br0 table=10 + +which ought to show roughly the following, with extraneous details +removed: + + table=10, vlan_tci=0x0014/0x0fff,dl_dst=f0:00:00:00:00:01 actions=load:0x1->NXM_NX_REG0[0..15] + +>>> If you tried the examples for the previous step, or if you did + some of your own experiments, then you might see additional flows + there. These additional flows are harmless. If they bother you, + then you can remove them with "ovs-ofctl del-flows br0 table=10". + +The other way is to inject a packet to take advantage of the learning +entry. For example, we can inject a packet on p2 whose destination is +the MAC address that we just learned on p1: + + ovs-appctl ofproto/trace br0 in_port=2,dl_src=90:00:00:00:00:01,dl_dst=f0:00:00:00:00:01 -generate + +Here's an interesting excerpt from that command's output. This group +of lines traces the "resubmit(,10)", showing that the packet matched +the learned flow for the first MAC we used, loading the OpenFlow port +number for the learned port p1 into register 0: + + Resubmitted flow: unchanged + Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 + Resubmitted odp: drop + Rule: table=10 cookie=0 vlan_tci=0x0014/0x0fff,dl_dst=f0:00:00:00:00:01 + OpenFlow actions=load:0x1->NXM_NX_REG0[0..15] + + +If you read the commands above carefully, then you might have noticed +that they simply have the Ethernet source and destination addresses +exchanged. 
That means that if we now rerun the first ovs-appctl +command above, e.g.: + + ovs-appctl ofproto/trace br0 in_port=1,dl_vlan=20,dl_src=f0:00:00:00:00:01,dl_dst=90:00:00:00:00:01 -generate + +then we see in the output that the destination has now been learned: + + Resubmitted flow: unchanged + Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 + Resubmitted odp: drop + Rule: table=10 cookie=0 vlan_tci=0x0014/0x0fff,dl_dst=90:00:00:00:00:01 + OpenFlow actions=load:0x2->NXM_NX_REG0[0..15] + + +Implementing Table 4: Output Processing +--------------------------------------- + +At entry to stage 4, we know that register 0 contains either the +desired output port or is zero if the packet should be flooded. We +also know that the packet's VLAN is in its 802.1Q header, even if the +VLAN was implicit because the packet came in on an access port. + +The job of the final pipeline stage is to actually output packets. +The job is trivial for output to our trunk port p1: + + ovs-ofctl add-flow br0 "table=4 reg0=1 actions=1" + +For output to the access ports, we just have to strip the VLAN header +before outputting the packet: + + ovs-ofctl add-flows br0 - <<'EOF' + table=4 reg0=2 actions=strip_vlan,2 + table=4 reg0=3 actions=strip_vlan,3 + table=4 reg0=4 actions=strip_vlan,4 + EOF + +The only slightly tricky part is flooding multicast and broadcast +packets and unicast packets with unlearned destinations. 
For those, +we need to make sure that we only output the packets to the ports that +carry our packet's VLAN, and that we include the 802.1Q header in the +copy output to the trunk port but not in copies output to access +ports: + + ovs-ofctl add-flows br0 - <<'EOF' + table=4 reg0=0 priority=99 dl_vlan=20 actions=1,strip_vlan,2 + table=4 reg0=0 priority=99 dl_vlan=30 actions=1,strip_vlan,3,4 + table=4 reg0=0 priority=50 actions=1 + EOF + +>>> Our rules rely on the standard OpenFlow behavior that an output + action will not forward a packet back out the port it came in on. + That is, if a packet comes in on p1, and we've learned that the + packet's destination MAC is also on p1, so that we end up with + "actions=1" as our actions, the switch will not forward the packet + back out its input port. The multicast/broadcast/unknown + destination cases above also rely on this behavior. + + +### Testing Table 4 + +### EXAMPLE 1: Broadcast, Multicast, and Unknown Destination + +Try tracing a broadcast packet arriving on p1 in VLAN 30: + + ovs-appctl ofproto/trace br0 in_port=1,dl_dst=ff:ff:ff:ff:ff:ff,dl_vlan=30 + +The interesting part of the output is the final line, which shows that +the switch would remove the 802.1Q header and then output the packet to +p3 and p4, which are access ports for VLAN 30: + + Datapath actions: pop_vlan,3,4 + +Similarly, if we trace a broadcast packet arriving on p3: + + ovs-appctl ofproto/trace br0 in_port=3,dl_dst=ff:ff:ff:ff:ff:ff + +then we see that it is output to p1 with an 802.1Q tag and then to p4 +without one: + + Datapath actions: push_vlan(vid=30,pcp=0),1,pop_vlan,4 + +>>> Open vSwitch could simplify the datapath actions here to just + "4,push_vlan(vid=30,pcp=0),1" but it is not smart enough to do so. 
+ +The following are also broadcasts, but the result is to drop the +packets because the VLAN only belongs to the input port: + + ovs-appctl ofproto/trace br0 in_port=1,dl_dst=ff:ff:ff:ff:ff:ff + ovs-appctl ofproto/trace br0 in_port=1,dl_dst=ff:ff:ff:ff:ff:ff,dl_vlan=55 + +Try some other broadcast cases on your own: + + ovs-appctl ofproto/trace br0 in_port=1,dl_dst=ff:ff:ff:ff:ff:ff,dl_vlan=20 + ovs-appctl ofproto/trace br0 in_port=2,dl_dst=ff:ff:ff:ff:ff:ff + ovs-appctl ofproto/trace br0 in_port=4,dl_dst=ff:ff:ff:ff:ff:ff + +You can see the same behavior with multicast packets and with unicast +packets whose destination has not been learned, e.g.: + + ovs-appctl ofproto/trace br0 in_port=4,dl_dst=01:00:00:00:00:00 + ovs-appctl ofproto/trace br0 in_port=1,dl_dst=90:12:34:56:78:90,dl_vlan=20 + ovs-appctl ofproto/trace br0 in_port=1,dl_dst=90:12:34:56:78:90,dl_vlan=30 + + +### EXAMPLE 2: MAC Learning + +Let's follow the same pattern as we did for table 3. First learn a +MAC on port p1 in VLAN 30: + + ovs-appctl ofproto/trace br0 in_port=1,dl_vlan=30,dl_src=10:00:00:00:00:01,dl_dst=20:00:00:00:00:01 -generate + +You can see from the last line of output that the packet's destination +is unknown, so it gets flooded to both p3 and p4, the other ports in +VLAN 30: + + Datapath actions: pop_vlan,3,4 + +Then reverse the MACs and learn the first flow's destination on port +p4: + + ovs-appctl ofproto/trace br0 in_port=4,dl_src=20:00:00:00:00:01,dl_dst=10:00:00:00:00:01 -generate + +The last line of output shows that this packet's destination is +known to be p1, as learned from our previous command: + + Datapath actions: push_vlan(vid=30,pcp=0),1 + +Now, if we rerun our first command: + + ovs-appctl ofproto/trace br0 in_port=1,dl_vlan=30,dl_src=10:00:00:00:00:01,dl_dst=20:00:00:00:00:01 -generate + +we can see that the result is no longer a flood but an output to the +learned destination port p4: + + Datapath actions: pop_vlan,4 + + +Contact +======= +
+bugs@openvswitch.org +http://openvswitch.org/ diff --git a/tutorial/automake.mk b/tutorial/automake.mk index 82ad66d73..8a75a836f 100644 --- a/tutorial/automake.mk +++ b/tutorial/automake.mk @@ -1,5 +1,5 @@ EXTRA_DIST += \ - tutorial/Tutorial \ + tutorial/Tutorial.md \ tutorial/ovs-sandbox \ tutorial/t-setup \ tutorial/t-stage0 \ diff --git a/utilities/ovs-ctl.8 b/utilities/ovs-ctl.8 index 0f7040318..2720d8c8e 100644 --- a/utilities/ovs-ctl.8 +++ b/utilities/ovs-ctl.8 @@ -441,5 +441,5 @@ distribution are good examples of how to use \fBovs\-ctl\fR. . .SH "SEE ALSO" . -\fBREADME.md\fR, \fBINSTALL.Linux\fR, \fBovsdb\-server\fR(8), +\fBREADME.md\fR, \fBINSTALL.Linux.md\fR, \fBovsdb\-server\fR(8), \fBovs\-vswitchd\fR(8). diff --git a/vtep/README.ovs-vtep b/vtep/README.ovs-vtep deleted file mode 100644 index 5ce63c7cb..000000000 --- a/vtep/README.ovs-vtep +++ /dev/null @@ -1,156 +0,0 @@ - How to Use the VTEP Emulator - ============================ - -This document explains how to use ovs-vtep, a VTEP emulator that uses -Open vSwitch for forwarding. - -Requirements -============ - -The VTEP emulator is a Python script that invokes calls to tools like -vtep-ctl and ovs-vsctl and is useful only when OVS daemons like ovsdb-server -and ovs-vswitchd are running. So those components should be installed. This -can be done by either of the following methods. - -1. Follow the instructions in the INSTALL file of the Open vSwitch repository -(don't start any daemons yet). - -2. Follow the instructions in INSTALL.Debian file and then install the -"openvswitch-vtep" package (if operating on a debian based machine). This -will automatically start the daemons. 
- -Design -====== - -At the end of this process, you should have the following setup: - - - +---------------------------------------------------+ - | Host Machine | - | | - | | - | +---------+ +---------+ | - | | | | | | - | | VM1 | | VM2 | | - | | | | | | - | +----o----+ +----o----+ | - | | | | - | br0 +------o-----------o--------------------o--+ | - | p0 p1 br0 | - | | - | | - | +------+ +------+ | - +------------------------------| eth0 |---| eth1 |--+ - +------+ +------+ - 10.1.1.1 10.2.2.1 - MANAGEMENT | | - +-----------------o----+ | - | - DATA/TUNNEL | - +-----------------o---+ - -Notes: - -1. We will use Open vSwitch to create our "physical" switch labeled br0 - -2. Our "physical" switch br0 will have one internal port also named br0 - and two "physical" ports, namely p0 and p1. - -3. The host machine may have two external interfaces. We will use eth0 - for management traffic and eth1 for tunnel traffic (One can use - a single interface to achieve both). Please take note of their IP - addresses in the diagram. You do not have to use exactly - the same IP addresses. Just know that the above will be used in the - steps below. - -4. You can optionally connect physical machines instead of virtual - machines to switch br0. In that case: - - 4.1. Make sure you have two extra physical interfaces in your host - machine, eth2 and eth3. - - 4.2. In the rest of this doc, replace p0 with eth2 and p1 with eth3. - -5. In addition to implementing p0 and p1 as physical interfaces, you can - also optionally implement them as standalone TAP devices, or VM - interfaces for simulation. - -6. Creating and attaching the VMs is outside the scope of this document - and is included in the diagram for reference purposes only. - -Startup -======= - -These instructions describe how to run with a single ovsdb-server -instance that handles both the OVS and VTEP schema. 
You can skip -steps 1-3 if you installed using the debian packages as mentioned in -step 2 of the "Requirements" section. - -1. Create the initial OVS and VTEP schemas: - - ovsdb-tool create /etc/openvswitch/ovs.db vswitchd/vswitch.ovsschema - ovsdb-tool create /etc/openvswitch/vtep.db vtep/vtep.ovsschema - -2. Start ovsdb-server and have it handle both databases: - - ovsdb-server --pidfile --detach --log-file \ - --remote punix:/var/run/openvswitch/db.sock \ - --remote=db:hardware_vtep,Global,managers \ - /etc/openvswitch/ovs.db /etc/openvswitch/vtep.db - -3. Start OVS as normal: - - ovs-vswitchd --log-file --detach --pidfile \ - unix:/var/run/openvswitch/db.sock - -4. Create a "physical" switch and its ports in OVS: - - ovs-vsctl add-br br0 - ovs-vsctl add-port br0 p0 - ovs-vsctl add-port br0 p1 - -5. Configure the physical switch in the VTEP database: - - vtep-ctl add-ps br0 - vtep-ctl set Physical_Switch br0 tunnel_ips=10.2.2.1 - -6. Start the VTEP emulator. If you installed the components by reading the - INSTALL file, run the following from the same directory as this README: - - ./ovs-vtep --log-file=/var/log/openvswitch/ovs-vtep.log \ - --pidfile=/var/run/openvswitch/ovs-vtep.pid \ - --detach br0 - - If the installation was done by installing the openvswitch-vtep - package, you can find ovs-vtep at /usr/share/openvswitch/scripts. - -7. Configure the VTEP database's manager to point at an NVC: - - vtep-ctl set-manager tcp:<CONTROLLER IP>:6632 - - Where CONTROLLER IP is your controller's IP address that is accessible - via the Host Machine's eth0 interface. - -Simulating an NVC -================= - -A VTEP implementation expects to be driven by a Network Virtualization -Controller (NVC), such as NSX. If one does not exist, it's possible to -use vtep-ctl to simulate one: - -1. Create a logical switch: - - vtep-ctl add-ls ls0 - -2. Bind the logical switch to a port: - - vtep-ctl bind-ls br0 p0 0 ls0 - vtep-ctl set Logical_Switch ls0 tunnel_key=33 - -3.
Direct unknown destinations out a tunnel: - - vtep-ctl add-mcast-remote ls0 unknown-dst 10.2.2.2 - -4. Direct unicast destinations out a different tunnel: - - vtep-ctl add-ucast-remote ls0 11:22:33:44:55:66 10.2.2.3 diff --git a/vtep/README.ovs-vtep.md b/vtep/README.ovs-vtep.md new file mode 100644 index 000000000..e3bb8bbed --- /dev/null +++ b/vtep/README.ovs-vtep.md @@ -0,0 +1,157 @@ +How to Use the VTEP Emulator +============================ + +This document explains how to use ovs-vtep, a VTEP emulator that uses +Open vSwitch for forwarding. + +Requirements +------------ + +The VTEP emulator is a Python script that invokes calls to tools like +vtep-ctl and ovs-vsctl and is useful only when OVS daemons like ovsdb-server +and ovs-vswitchd are running. So those components should be installed. This +can be done by either of the following methods. + +1. Follow the instructions in the INSTALL.md file of the Open vSwitch repository +(don't start any daemons yet). + +2. Follow the instructions in INSTALL.Debian.md file and then install the +"openvswitch-vtep" package (if operating on a debian based machine). This +will automatically start the daemons. + +Design +------ + +At the end of this process, you should have the following setup: + + + +---------------------------------------------------+ + | Host Machine | + | | + | | + | +---------+ +---------+ | + | | | | | | + | | VM1 | | VM2 | | + | | | | | | + | +----o----+ +----o----+ | + | | | | + | br0 +------o-----------o--------------------o--+ | + | p0 p1 br0 | + | | + | | + | +------+ +------+ | + +------------------------------| eth0 |---| eth1 |--+ + +------+ +------+ + 10.1.1.1 10.2.2.1 + MANAGEMENT | | + +-----------------o----+ | + | + DATA/TUNNEL | + +-----------------o---+ + +Notes: + +1. We will use Open vSwitch to create our "physical" switch labeled br0 + +2. Our "physical" switch br0 will have one internal port also named br0 + and two "physical" ports, namely p0 and p1. + +3.
The host machine may have two external interfaces. We will use eth0 + for management traffic and eth1 for tunnel traffic (One can use + a single interface to achieve both). Please take note of their IP + addresses in the diagram. You do not have to use exactly + the same IP addresses. Just know that the above will be used in the + steps below. + +4. You can optionally connect physical machines instead of virtual + machines to switch br0. In that case: + + 4.1. Make sure you have two extra physical interfaces in your host + machine, eth2 and eth3. + + 4.2. In the rest of this doc, replace p0 with eth2 and p1 with eth3. + +5. In addition to implementing p0 and p1 as physical interfaces, you can + also optionally implement them as standalone TAP devices, or VM + interfaces for simulation. + +6. Creating and attaching the VMs is outside the scope of this document + and is included in the diagram for reference purposes only. + +Startup +------- + +These instructions describe how to run with a single ovsdb-server +instance that handles both the OVS and VTEP schema. You can skip +steps 1-3 if you installed using the debian packages as mentioned in +step 2 of the "Requirements" section. + +1. Create the initial OVS and VTEP schemas: + + ovsdb-tool create /etc/openvswitch/ovs.db vswitchd/vswitch.ovsschema + ovsdb-tool create /etc/openvswitch/vtep.db vtep/vtep.ovsschema + +2. Start ovsdb-server and have it handle both databases: + + ovsdb-server --pidfile --detach --log-file \ + --remote punix:/var/run/openvswitch/db.sock \ + --remote=db:hardware_vtep,Global,managers \ + /etc/openvswitch/ovs.db /etc/openvswitch/vtep.db + +3. Start OVS as normal: + + ovs-vswitchd --log-file --detach --pidfile \ + unix:/var/run/openvswitch/db.sock + +4. Create a "physical" switch and its ports in OVS: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 p0 + ovs-vsctl add-port br0 p1 + +5. 
Configure the physical switch in the VTEP database: + + vtep-ctl add-ps br0 + vtep-ctl set Physical_Switch br0 tunnel_ips=10.2.2.1 + +6. Start the VTEP emulator. If you installed the components by reading the + INSTALL.md file, run the following from the same directory as this + README.md: + + ./ovs-vtep --log-file=/var/log/openvswitch/ovs-vtep.log \ + --pidfile=/var/run/openvswitch/ovs-vtep.pid \ + --detach br0 + + If the installation was done by installing the openvswitch-vtep + package, you can find ovs-vtep at /usr/share/openvswitch/scripts. + +7. Configure the VTEP database's manager to point at an NVC: + + vtep-ctl set-manager tcp:<CONTROLLER IP>:6632 + + Where CONTROLLER IP is your controller's IP address that is accessible + via the Host Machine's eth0 interface. + +Simulating an NVC +----------------- + +A VTEP implementation expects to be driven by a Network Virtualization +Controller (NVC), such as NSX. If one does not exist, it's possible to +use vtep-ctl to simulate one: + +1. Create a logical switch: + + vtep-ctl add-ls ls0 + +2. Bind the logical switch to a port: + + vtep-ctl bind-ls br0 p0 0 ls0 + vtep-ctl set Logical_Switch ls0 tunnel_key=33 + +3. Direct unknown destinations out a tunnel: + + vtep-ctl add-mcast-remote ls0 unknown-dst 10.2.2.2 + +4. Direct unicast destinations out a different tunnel: + + vtep-ctl add-ucast-remote ls0 11:22:33:44:55:66 10.2.2.3 diff --git a/vtep/automake.mk b/vtep/automake.mk index 360ed3557..f2a1db290 100644 --- a/vtep/automake.mk +++ b/vtep/automake.mk @@ -19,7 +19,7 @@ scripts_SCRIPTS += \ EXTRA_DIST += \ vtep/ovs-vtep \ - vtep/README.ovs-vtep + vtep/README.ovs-vtep.md # VTEP schema and IDL EXTRA_DIST += vtep/vtep.ovsschema