Building and Installing:
------------------------
-Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
+Required: DPDK 2.0
+Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
+on Debian/Ubuntu)
1. Configure build & install DPDK:
1. Set `$DPDK_DIR`
`CONFIG_RTE_BUILD_COMBINE_LIBS=y`
Update `config/common_linuxapp` so that DPDK is built with vhost
- libraries; currently, OVS only supports vhost-cuse, so DPDK vhost-user
- libraries should be explicitly turned off (they are enabled by default
- in DPDK 2.0).
+ libraries.
`CONFIG_RTE_LIBRTE_VHOST=y`
- `CONFIG_RTE_LIBRTE_VHOST_USER=n`
Then run `make install` to build and install the library.
For default install without IVSHMEM:
DPDK vhost:
-----------
-vhost-cuse is only supported at present i.e. not using the standard QEMU
-vhost-user interface. It is intended that vhost-user support will be added
-in future releases when supported in DPDK and that vhost-cuse will eventually
-be deprecated. See [DPDK Docs] for more info on vhost.
+DPDK 2.0 supports two types of vhost:
-Prerequisites:
-1. Insert the Cuse module:
+1. vhost-user
+2. vhost-cuse
- `modprobe cuse`
+Whichever type of vhost is enabled in the specified DPDK build is the type
+that will be enabled in OVS. By default, vhost-user is enabled in DPDK, so
+unless vhost-cuse has been enabled in DPDK, vhost-user ports will be
+enabled in OVS.
+Please note that support for vhost-cuse is intended to be deprecated in OVS
+in a future release.
-2. Build and insert the `eventfd_link` module:
+DPDK vhost-user:
+----------------
- `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
- `make`
- `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
+The following sections describe the use of vhost-user 'dpdkvhostuser' ports
+with OVS.
-Following the steps above to create a bridge, you can now add DPDK vhost
-as a port to the vswitch.
+DPDK vhost-user Prerequisites:
+------------------------------
-`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
+1. DPDK 2.0 with vhost support enabled as documented in the "Building and
+ Installing" section
-Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
+2. QEMU version v2.1.0+
-`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
+ QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if providing
+ your VM with memory greater than 1GB due to potential issues with memory
+ mapping larger areas.
-However, please note that when attaching userspace devices to QEMU, the
-name provided during the add-port operation must match the ifname parameter
-on the QEMU command line.
+Adding DPDK vhost-user ports to the Switch:
+-------------------------------------------
+Following the steps above to create a bridge, you can now add DPDK vhost-user
+as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports can
+have arbitrary names.
-DPDK vhost VM configuration:
-----------------------------
+ - For vhost-user, the name of the port type is `dpdkvhostuser`
- vhost ports use a Linux* character device to communicate with QEMU.
+ ```
+ ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
+ type=dpdkvhostuser
+ ```
+
+ This action creates a socket located at
+ `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
+ to your VM on the QEMU command line. More instructions on this can be
+ found in the next section "DPDK vhost-user VM configuration".
+
+ Note: If you wish for the vhost-user sockets to be created in a
+ directory other than `/usr/local/var/run/openvswitch`, you may specify
+ another location on the ovs-vswitchd command line like so:
+
+ `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
+
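+As a quick sanity check, the socket location follows a simple
+`<sock dir>/<port name>` layout. A sketch (the default directory below is
+taken from this document; the helper name is hypothetical):

```shell
# Compose the expected vhost-user socket path for a port, honouring an
# optional VHOST_SOCK_DIR override (mirroring the -vhost_sock_dir flag).
vhost_sock_path() {
    printf '%s/%s\n' "${VHOST_SOCK_DIR:-/usr/local/var/run/openvswitch}" "$1"
}

vhost_sock_path vhost-user-1
```

+After the `add-port` above, `test -S "$(vhost_sock_path vhost-user-1)"`
+should succeed on the OVS host.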
+DPDK vhost-user VM configuration:
+---------------------------------
+Follow the steps below to attach vhost-user port(s) to a VM.
+
+1. Configure sockets.
+ Pass the following parameters to QEMU to attach a vhost-user device:
+
+ ```
+ -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
+ -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+ -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+ ```
+
+ ...where vhost-user-1 is the name of the vhost-user port added
+ to the switch.
+ Repeat the above parameters for multiple devices, changing the
+ chardev path and id as necessary. Note that a separate and different
+ chardev path needs to be specified for each vhost-user device. For
+ example, if you have a second vhost-user port named 'vhost-user-2',
+ append your QEMU command line with an additional set of parameters:
+
+ ```
+ -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
+ -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
+ -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
+ ```
+
+2. Configure huge pages.
+ QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access
+ a virtio-net device's virtual rings and packet buffers mapping the VM's
+ physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
+ memory into their process address space, pass the following parameters
+ to QEMU:
+
+ ```
+ -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
+ share=on
+ -numa node,memdev=mem -mem-prealloc
+ ```
+
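+Putting steps 1 and 2 together, the full set of QEMU arguments can be
+assembled as below. This is a sketch only: the memory size, hugepage mount
+point, and QEMU binary name are assumptions to adapt to your setup.

```shell
# Assemble QEMU arguments for one vhost-user port plus hugepage-backed
# guest memory, as described in steps 1 and 2 above.
SOCK_DIR=/usr/local/var/run/openvswitch
ARGS="-m 4096M"
ARGS="$ARGS -chardev socket,id=char1,path=$SOCK_DIR/vhost-user-1"
ARGS="$ARGS -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce"
ARGS="$ARGS -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1"
ARGS="$ARGS -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on"
ARGS="$ARGS -numa node,memdev=mem -mem-prealloc"
echo "qemu-system-x86_64 $ARGS"
```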
+DPDK vhost-cuse:
+----------------
+
+The following sections describe the use of vhost-cuse 'dpdkvhostcuse' ports
+with OVS.
+
+DPDK vhost-cuse Prerequisites:
+------------------------------
+
+1. DPDK 2.0 with vhost support enabled as documented in the "Building and
+ Installing" section
+ As an additional step, you must enable vhost-cuse in DPDK by setting the
+ following additional flag in `config/common_linuxapp`:
+
+ `CONFIG_RTE_LIBRTE_VHOST_USER=n`
+
+ Following this, rebuild DPDK as per the instructions in the "Building and
+ Installing" section. Finally, rebuild OVS as per step 3 in the "Building
+ and Installing" section - OVS will detect that DPDK has vhost-cuse libraries
+ compiled and in turn will enable support for it in the switch and disable
+ vhost-user support.
+
+2. Insert the Cuse module:
+
+ `modprobe cuse`
+
+3. Build and insert the `eventfd_link` module:
+
+ ```
+ cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
+ make
+ insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
+ ```
+
+4. QEMU version v2.1.0+
+
+ vhost-cuse will work with QEMU v2.1.0 and above, however it is recommended to
+ use v2.2.0 if providing your VM with memory greater than 1GB due to potential
+ issues with memory mapping larger areas.
+ Note: QEMU v1.6.2 will also work, with slightly different command line parameters,
+ which are specified later in this document.
+
+Adding DPDK vhost-cuse ports to the Switch:
+-------------------------------------------
+
+Following the steps above to create a bridge, you can now add DPDK vhost-cuse
+as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports can have
+arbitrary names.
+
+ - For vhost-cuse, the name of the port type is `dpdkvhostcuse`
+
+ ```
+ ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
+ type=dpdkvhostcuse
+ ```
+
+ When attaching vhost-cuse ports to QEMU, the name provided during the
+ add-port operation must match the ifname parameter on the QEMU command
+ line. More instructions on this can be found in the next section.
+
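+To illustrate the ifname-matching requirement, here is a sketch of tap-based
+QEMU arguments for the port added above; the tap and virtio-net option
+details are assumptions based on typical vhost-net usage, not taken from
+this document:

```shell
# The ifname value must exactly match the OVS port name used in add-port.
PORT=vhost-cuse-1
ARGS="-netdev type=tap,id=net1,script=no,downscript=no,ifname=$PORT,vhost=on"
ARGS="$ARGS -device virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01"
echo "$ARGS"
```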
+DPDK vhost-cuse VM configuration:
+---------------------------------
+
+ vhost-cuse ports use a Linux* character device to communicate with QEMU.
By default it is set to `/dev/vhost-net`. It is possible to reuse this
standard device for DPDK vhost, which makes setup a little simpler but it
is better practice to specify an alternative character device in order to
QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
virtio-net device's virtual rings and packet buffers mapping the VM's
physical memory on hugetlbfs. To enable vhost-ports to map the VM's
- memory into their process address space, pass the following paramters
+ memory into their process address space, pass the following parameters
to QEMU:
`-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
share=on -numa node,memdev=mem -mem-prealloc`
+ Note: For use with an earlier QEMU version such as v1.6.2, use the
+ following to configure hugepages instead:
-DPDK vhost VM configuration with QEMU wrapper:
-----------------------------------------------
+ `-mem-path /dev/hugepages -mem-prealloc`
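+The two hugepage configurations above can be selected by QEMU version; a
+sketch, with the v1.x cutoff being an assumption based on the v1.6.2 note:

```shell
# Pick hugepage-related QEMU flags: older (v1.x) QEMU takes -mem-path
# directly, newer versions use a memory-backend-file object.
hugepage_flags() {
    case "$1" in
        1.*) echo "-mem-path /dev/hugepages -mem-prealloc" ;;
        *)   echo "-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc" ;;
    esac
}

hugepage_flags 1.6.2
```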
+DPDK vhost-cuse VM configuration with QEMU wrapper:
+---------------------------------------------------
The QEMU wrapper script automatically detects and calls QEMU with the
necessary parameters. It performs the following actions:
netdev=net1,mac=00:00:00:00:00:01
```
-DPDK vhost VM configuration with libvirt:
------------------------------------------
+DPDK vhost-cuse VM configuration with libvirt:
+----------------------------------------------
If you are using libvirt, you must enable libvirt to access the character
device by adding it to controllers cgroup for libvirtd using the following
`virsh create my_vhost_vm.xml`
-DPDK vhost VM configuration with libvirt and QEMU wrapper:
+DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
-----------------------------------------------------------
+---------------------------------------------------------------
To use the qemu-wrapper script in conjunction with libvirt, follow the
the correct emulator location and set any additional options. If you are
using an alternative character device name, please set "us_vhost_path" to the
location of that device. The script will automatically detect and insert
- the correct "vhostfd" value in the QEMU command line arguements.
+ the correct "vhostfd" value in the QEMU command line arguments.
5. Use virt-manager to launch the VM
#include <config.h>
-#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <stdlib.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include "dirs.h"
#include "dp-packet.h"
#include "dpif-netdev.h"
#include "list.h"
#define NIC_PORT_RX_Q_SIZE 2048 /* Size of Physical NIC RX Queue, Max (n+32<=4096)*/
#define NIC_PORT_TX_Q_SIZE 2048 /* Size of Physical NIC TX Queue, Max (n+32<=4096)*/
-/* Character device cuse_dev_name. */
-static char *cuse_dev_name = NULL;
+char *cuse_dev_name = NULL; /* Character device cuse_dev_name. */
+char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */
/*
* Maximum amount of time in micro seconds to try and enqueue to vhost.
enum dpdk_dev_type {
DPDK_DEV_ETH = 0,
- DPDK_DEV_VHOST = 1
+ DPDK_DEV_VHOST = 1,
};
static int rte_eal_init_ret = ENODEV;
/* virtio-net structure for vhost device */
OVSRCU_TYPE(struct virtio_net *) virtio_dev;
+ /* Identifier used to distinguish vhost devices from each other */
+ char vhost_id[PATH_MAX];
+
/* In dpdk_list. */
struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
};
}
static int
-netdev_dpdk_vhost_construct(struct netdev *netdev_)
+vhost_construct_helper(struct netdev *netdev_)
{
struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
- int err;
if (rte_eal_init_ret) {
return rte_eal_init_ret;
}
+ rte_spinlock_init(&netdev->vhost_tx_lock);
+ return netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST);
+}
+
+static int
+netdev_dpdk_vhost_cuse_construct(struct netdev *netdev_)
+{
+ struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
+ int err;
+
ovs_mutex_lock(&dpdk_mutex);
- err = netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST);
+ strncpy(netdev->vhost_id, netdev->up.name, sizeof(netdev->vhost_id));
+ err = vhost_construct_helper(netdev_);
ovs_mutex_unlock(&dpdk_mutex);
+ return err;
+}
- rte_spinlock_init(&netdev->vhost_tx_lock);
+static int
+netdev_dpdk_vhost_user_construct(struct netdev *netdev_)
+{
+ struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
+ int err;
+ ovs_mutex_lock(&dpdk_mutex);
+ /* Take the name of the vhost-user port and append it to the location where
+ * the socket is to be created, then register the socket.
+ */
+ snprintf(netdev->vhost_id, sizeof(netdev->vhost_id), "%s/%s",
+ vhost_sock_dir, netdev_->name);
+    err = rte_vhost_driver_register(netdev->vhost_id);
+    if (err) {
+        VLOG_ERR("vhost-user socket device setup failure for socket %s\n",
+                 netdev->vhost_id);
+    } else {
+        VLOG_INFO("Socket %s created for vhost-user port %s\n",
+                  netdev->vhost_id, netdev_->name);
+        err = vhost_construct_helper(netdev_);
+    }
+ ovs_mutex_unlock(&dpdk_mutex);
return err;
}
ovs_mutex_lock(&dpdk_mutex);
/* Add device to the vhost port with the same name as that passed down. */
LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
- if (strncmp(dev->ifname, netdev->up.name, IFNAMSIZ) == 0) {
+ if (strncmp(dev->ifname, netdev->vhost_id, IF_NAME_SZ) == 0) {
ovs_mutex_lock(&netdev->mutex);
ovsrcu_set(&netdev->virtio_dev, dev);
ovs_mutex_unlock(&netdev->mutex);
};
static void *
-start_cuse_session_loop(void *dummy OVS_UNUSED)
+start_vhost_loop(void *dummy OVS_UNUSED)
{
pthread_detach(pthread_self());
/* Put the cuse thread into quiescent state. */
static int
dpdk_vhost_class_init(void)
+{
+ rte_vhost_driver_callback_register(&virtio_net_device_ops);
+ ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
+ return 0;
+}
+
+static int
+dpdk_vhost_cuse_class_init(void)
{
int err = -1;
- rte_vhost_driver_callback_register(&virtio_net_device_ops);
/* Register CUSE device to handle IOCTLs.
* Unless otherwise specified on the vswitchd command line, cuse_dev_name
return -1;
}
- ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL);
+ dpdk_vhost_class_init();
+ return 0;
+}
+
+static int
+dpdk_vhost_user_class_init(void)
+{
+ dpdk_vhost_class_init();
return 0;
}
NULL, /* rxq_drain */ \
}
+static int
+process_vhost_flags(char *flag, char *default_val, int size,
+ char **argv, char **new_val)
+{
+ int changed = 0;
+
+ /* Depending on which version of vhost is in use, process the vhost-specific
+ * flag if it is provided on the vswitchd command line, otherwise resort to
+ * a default value.
+ *
+     * For vhost-user: Process "-vhost_sock_dir" to set the custom location of
+     * the vhost-user socket(s).
+     * For vhost-cuse: Process "-cuse_dev_name" to set the custom name of the
+     * vhost-cuse character device.
+ */
+ if (!strcmp(argv[1], flag) && (strlen(argv[2]) <= size)) {
+ changed = 1;
+ *new_val = strdup(argv[2]);
+ VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
+ } else {
+ VLOG_INFO("No %s provided - defaulting to %s", flag, default_val);
+ *new_val = default_val;
+ }
+
+ return changed;
+}
+
int
dpdk_init(int argc, char **argv)
{
argc--;
argv++;
- /* If the cuse_dev_name parameter has been provided, set 'cuse_dev_name' to
- * this string if it meets the correct criteria. Otherwise, set it to the
- * default (vhost-net).
- */
- if (!strcmp(argv[1], "--cuse_dev_name") &&
- (strlen(argv[2]) <= NAME_MAX)) {
-
- cuse_dev_name = strdup(argv[2]);
+#ifdef VHOST_CUSE
+    if (process_vhost_flags("-cuse_dev_name", strdup("vhost-net"),
+                            NAME_MAX, argv, &cuse_dev_name)) {
+#else
+    if (process_vhost_flags("-vhost_sock_dir", strdup(ovs_rundir()),
+                            PATH_MAX, argv, &vhost_sock_dir)) {
+ struct stat s;
+ int err;
- /* Remove the cuse_dev_name configuration parameters from the argument
+ err = stat(vhost_sock_dir, &s);
+ if (err) {
+ VLOG_ERR("vHostUser socket DIR '%s' does not exist.",
+ vhost_sock_dir);
+ return err;
+ }
+#endif
+ /* Remove the vhost flag configuration parameters from the argument
* list, so that the correct elements are passed to the DPDK
* initialization function
*/
argc -= 2;
- argv += 2; /* Increment by two to bypass the cuse_dev_name arguments */
+ argv += 2; /* Increment by two to bypass the vhost flag arguments */
base = 2;
-
- VLOG_ERR("User-provided cuse_dev_name in use: /dev/%s", cuse_dev_name);
- } else {
- cuse_dev_name = "vhost-net";
- VLOG_INFO("No cuse_dev_name provided - defaulting to /dev/vhost-net");
}
/* Keep the program name argument as this is needed for call to
netdev_dpdk_get_status,
netdev_dpdk_rxq_recv);
-static const struct netdev_class dpdk_vhost_class =
+static const struct netdev_class dpdk_vhost_cuse_class =
NETDEV_DPDK_CLASS(
- "dpdkvhost",
- dpdk_vhost_class_init,
- netdev_dpdk_vhost_construct,
+ "dpdkvhostcuse",
+ dpdk_vhost_cuse_class_init,
+ netdev_dpdk_vhost_cuse_construct,
+ netdev_dpdk_vhost_destruct,
+ netdev_dpdk_vhost_set_multiq,
+ netdev_dpdk_vhost_send,
+ netdev_dpdk_vhost_get_carrier,
+ netdev_dpdk_vhost_get_stats,
+ NULL,
+ NULL,
+ netdev_dpdk_vhost_rxq_recv);
+
+const struct netdev_class dpdk_vhost_user_class =
+ NETDEV_DPDK_CLASS(
+ "dpdkvhostuser",
+ dpdk_vhost_user_class_init,
+ netdev_dpdk_vhost_user_construct,
netdev_dpdk_vhost_destruct,
netdev_dpdk_vhost_set_multiq,
netdev_dpdk_vhost_send,
dpdk_common_init();
netdev_register_provider(&dpdk_class);
netdev_register_provider(&dpdk_ring_class);
- netdev_register_provider(&dpdk_vhost_class);
+#ifdef VHOST_CUSE
+ netdev_register_provider(&dpdk_vhost_cuse_class);
+#else
+ netdev_register_provider(&dpdk_vhost_user_class);
+#endif
ovsthread_once_done(&once);
}
}