[RFC net-next 00/18] virtio_net XDP offload


Prashant Bhole
Note: This RFC has been sent to netdev as well as qemu-devel lists

This series introduces XDP offloading from virtio_net. It is based on
the following work by Jason Wang:
https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net

Current XDP performance in virtio-net is far from what we can achieve
on host. Several major factors cause the difference:
- Cost of virtualization
- Cost of virtio (populating virtqueue and context switching)
- Cost of vhost, which needs more optimization
- Cost of data copy
For these reasons there is a need to offload the XDP program to the
host. This set is an attempt to implement XDP offload from the guest.


* High level design:

virtio_net exposes itself as an offload-capable device and acts as a
transport for commands to load the program on the host. When offload
is requested, it sends the program to Qemu. Qemu then loads the
program and attaches it to the corresponding tap device. Similarly,
virtio_net sends control commands to create and control maps. The tap
device runs the XDP program in its Tx path. The fast datapath stays
on the host, whereas the slow path, in which a user program
reads/updates map values, remains in the guest.

When offloading to actual hardware, the program needs to be translated
and JITed for the target hardware. When offloading from a guest, we
pass an almost raw program to the host, and the verifier on the host
verifies the offloaded program.


* Implementation in Kernel


virtio_net
==========
Creates a bpf offload device and registers itself as an
offload-capable device. It also implements bpf_map_dev_ops to handle
offloaded maps. A new command structure is defined to communicate with
qemu.

Map offload:
- In the offload sequence, maps are always offloaded before the
  program. In the map offloading stage, virtio_net sends control
  commands to qemu to create a map and return a map fd that is valid
  on the host. This fd is stored in a driver-specific map structure.
  A list of such maps is maintained.

- Currently BPF_MAP_TYPE_ARRAY and BPF_MAP_TYPE_HASH are supported.
  Offloading a per cpu array from guest to host doesn't make sense.

Program offload:
- In general, the verifier in the guest replaces map fds in the user
  submitted program with map pointers, and then the
  bpf_prog_offload_ops callbacks are called.

- This set introduces a new program offload callback, 'setup()',
  which the verifier calls before replacing map fds with map
  pointers. This way virtio_net can create a copy of the program with
  the guest map fds intact. It is needed because virtio_net derives
  driver-specific map data from the guest map fd. The guest map fds
  are then replaced with host map fds in the copy of the program, so
  the copy submitted to the host has valid host map fds.

- Alternatively, if we can move the prep() call in the verifier to
  before the map fd replacement happens, there is no need to
  introduce the 'setup()' callback.

- As per the current implementation of the 'setup()' callback in
  virtio_net, it verifies the full program for allowed helper
  functions and performs the map fd replacement mentioned above.

- A list of allowed helper functions is maintained. It is currently
  experimental and will be updated later as needed. Using this list
  we can filter out most non-XDP programs to some extent. It also
  prevents the guest from collecting host-specific information by
  disallowing certain helper calls.

- XDP_PROG_SETUP_HW is called after successful program verification.
  In this call a control buffer is prepared, the program instructions
  are appended to the buffer, and it is sent to qemu.

tun
===
This set makes changes in tun to run an XDP program in the Tx path.
This will be the program offloaded from the guest, and it can be set
using the tun ioctl interface. There were multiple places where this
program could be executed:
- tun_net_xmit
- tun_xdp_xmit
- tun_recvmsg
tun_recvmsg was chosen because it runs in process context; the other
two run in bh context. Running in process context helps in setting up
service chaining using XDP redirect.

The XDP_REDIRECT action of an offloaded program isn't handled,
because the target interface's ndo_xdp_xmit is called when we
redirect a packet. In the offload case the target interface will be
some tap interface, and any packet redirected towards it will be sent
back to the guest, which is not what we expect. Such redirects will
need special handling in the kernel.

The XDP_TX action of an offloaded program is handled: the packet is
injected into the Rx path. Care is taken that the tap's native Rx
path XDP is still executed in this case.


* Implementation in Qemu

Qemu is modified to handle control commands from the guest. When a
program offload command is received, it loads the program in the host
OS and attaches the program fd to the tap device. All program and map
operations are performed using libbpf APIs.


* Performance numbers

Single flow tests were performed; the diagram below shows the setup.
The xdp1 and xdp2 sample programs were modified to use
BPF_MAP_TYPE_ARRAY instead of a per cpu array, and xdp1_user.c was
modified to take a hardware offload parameter.

                     (Rx path XDP to drop      (Tx path XDP.
                      XDP_TX'ed pkts from       Program offloaded
                      tun Tx path XDP)          from virtio_net)
                          XDP_DROP ----------.  XDP_DROP/XDP_TX
                                              \   |
                                    (Case 2)   \  |   XDP_DROP/XDP_TX
 pktgen ---> 10G-NIC === 10G-NIC --- bridge --- tun --- virtio-net
|<------ netns ------>|    |                     ^   |<----guest---->|
                           v                     |
                           '---- XDP_REDIRECT----'
                                  (Case 1)

Case 1: Packets XDP_REDIRECT'ed towards tun.
                        Non-offload        Offload
  xdp1 (XDP_DROP)        2.46 Mpps        12.90 Mpps
  xdp2 (XDP_TX)          1.50 Mpps         7.26 Mpps

Case 2: Packets are not redirected. They pass through the bridge.
                        Non-offload        Offload
  xdp1 (XDP_DROP)        1.03 Mpps         1.01 Mpps
  xdp2 (XDP_TX)          1.10 Mpps         0.99 Mpps

  In case 2, the offload performance is low. Here the producer
  function is tun_net_xmit: it puts a single packet in the ptr ring
  and spends most of its time waking up the vhost thread. On the
  other hand, each time the vhost thread wakes up it calls
  tun_recvmsg. Since the Tx path XDP runs in tun_recvmsg, vhost
  doesn't see any packet, so it sleeps frequently and the producer
  spends most of its time waking it up again. vhost polling improves
  these numbers, but then the non-offload performance also improves
  and remains higher than the offload case. Performance in this case
  can be improved later in separate work.

Since this set makes changes in virtio_net, tun and vhost_net, it was
necessary to measure the performance difference after applying this
set. Performance numbers are in table below:

   Netperf Test         Before      After      Difference
  UDP_STREAM 18byte     89.43       90.74       +1.46%
  UDP_STREAM 1472byte    6882        7026       +2.09%
  TCP_STREAM             9403        9407       +0.04%
  UDP_RR                13520       13478       -0.31%
  TCP_RR                13120       12918       -1.53%


* Points for improvement (TODO)

- In the current implementation, qemu passes the host map fd to the
  guest, which means the guest is peeking at host information. This
  can be avoided by moving the map fd replacement task from the guest
  to qemu.

- Currently there is no way on the host side to show whether a tap
  interface has an offloaded XDP program attached.

- When sending program and map related control commands from the
  guest to the host, it would be better to also pass metadata about
  the program and maps, for example BTF data.

- In the future, virtio can have a feature bit for the offloading
  capability.

- TUNGETFEATURES should have a flag to advertise the offloading
  capability.

- Submit virtio spec patch to describe XDP offloading feature

- When offloading is enabled, it should be a migration blocker.

- DoS: offloaded maps use host memory beyond what has been allocated
  for the guest. Offloading many maps of large size can be a DoS
  strategy, hence qemu should have a parameter to limit how many maps
  a guest can offload or how much memory the offloaded maps use.


* Other dependencies

- Loading a bpf program requires the CAP_SYS_ADMIN capability. We
  tested this set by running qemu as root or by adding CAP_SYS_ADMIN
  to the qemu binary; otherwise Qemu doesn't have this capability.
  Alexei's recent CAP_BPF work can be a solution to this problem; it
  is still being discussed on the mailing list.

Jason Wang (9):
  bpf: introduce bpf_prog_offload_verifier_setup()
  net: core: rename netif_receive_generic_xdp() to do_generic_xdp_core()
  net: core: export do_xdp_generic_core()
  tun: set offloaded xdp program
  virtio-net: store xdp_prog in device
  virtio_net: add XDP prog offload infrastructure
  virtio_net: implement XDP prog offload functionality
  bpf: export function __bpf_map_get
  virtio_net: implement XDP map offload functionality

Prashant Bhole (9):
  tuntap: check tun_msg_ctl type at necessary places
  vhost_net: use tap recvmsg api to access ptr ring
  tuntap: remove usage of ptr ring in vhost_net
  tun: run offloaded XDP program in Tx path
  tun: add a way to inject Tx path packet into Rx path
  tun: handle XDP_TX action of offloaded program
  tun: run xdp prog when tun is read from file interface
  virtio_net: use XDP attachment helpers
  virtio_net: restrict bpf helper calls from offloaded program

 drivers/net/tap.c               |  42 ++-
 drivers/net/tun.c               | 257 +++++++++++++--
 drivers/net/virtio_net.c        | 552 +++++++++++++++++++++++++++++---
 drivers/vhost/net.c             |  77 ++---
 include/linux/bpf.h             |   1 +
 include/linux/bpf_verifier.h    |   1 +
 include/linux/if_tap.h          |   5 -
 include/linux/if_tun.h          |  23 +-
 include/linux/netdevice.h       |   2 +
 include/uapi/linux/if_tun.h     |   1 +
 include/uapi/linux/virtio_net.h |  50 +++
 kernel/bpf/offload.c            |  14 +
 kernel/bpf/syscall.c            |   1 +
 kernel/bpf/verifier.c           |   6 +
 net/core/dev.c                  |   8 +-
 15 files changed, 901 insertions(+), 139 deletions(-)

--
2.20.1



[RFC net-next 01/18] bpf: introduce bpf_prog_offload_verifier_setup()

Prashant Bhole
From: Jason Wang <[hidden email]>

Background:
This change was initiated by the virtio_net XDP offload work. As per
the implementation plan, a copy of the original program, with the
guest map fds replaced by host map fds, needs to be offloaded to the
host. To implement this fd replacement, insn_hook() must provide an
insn with the map fd intact. The bpf_map and driver-specific map data
can be derived from the map_fd.

Since the verifier calls all the offload callbacks after replacing
map fds, it was difficult to implement the virtio_net XDP offload
feature. If virtio_net gets a callback with the original bpf program,
it gets a chance to perform the fd replacement in its own copy of the
program.

Solution:
Let's introduce a setup() callback in bpf_prog_offload_ops. It will
be non-mandatory. The verifier will call it just before replacing the
map fds.

Signed-off-by: Jason Wang <[hidden email]>
Signed-off-by: Prashant Bhole <[hidden email]>
---
 include/linux/bpf.h          |  1 +
 include/linux/bpf_verifier.h |  1 +
 kernel/bpf/offload.c         | 14 ++++++++++++++
 kernel/bpf/verifier.c        |  6 ++++++
 4 files changed, 22 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 35903f148be5..1cdba120357c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -361,6 +361,7 @@ struct bpf_prog_offload_ops {
     struct bpf_insn *insn);
  int (*remove_insns)(struct bpf_verifier_env *env, u32 off, u32 cnt);
  /* program management callbacks */
+ int (*setup)(struct bpf_prog *prog);
  int (*prepare)(struct bpf_prog *prog);
  int (*translate)(struct bpf_prog *prog);
  void (*destroy)(struct bpf_prog *prog);
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 26e40de9ef55..de7028e17c0d 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -418,6 +418,7 @@ static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
  return cur_func(env)->regs;
 }
 
+int bpf_prog_offload_verifier_setup(struct bpf_prog *prog);
 int bpf_prog_offload_verifier_prep(struct bpf_prog *prog);
 int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
  int insn_idx, int prev_insn_idx);
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 5b9da0954a27..04ca7a31d947 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -124,6 +124,20 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
  return err;
 }
 
+int bpf_prog_offload_verifier_setup(struct bpf_prog *prog)
+{
+ struct bpf_prog_offload *offload;
+ int ret = 0;
+
+ down_read(&bpf_devs_lock);
+ offload = prog->aux->offload;
+ if (offload && offload->offdev->ops->setup)
+ ret = offload->offdev->ops->setup(prog);
+ up_read(&bpf_devs_lock);
+
+ return ret;
+}
+
 int bpf_prog_offload_verifier_prep(struct bpf_prog *prog)
 {
  struct bpf_prog_offload *offload;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a0482e1c4a77..94b43542439e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9737,6 +9737,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 
  env->allow_ptr_leaks = is_priv;
 
+ if (bpf_prog_is_dev_bound(env->prog->aux)) {
+ ret = bpf_prog_offload_verifier_setup(env->prog);
+ if (ret)
+ goto skip_full_check;
+ }
+
  if (is_priv)
  env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
 
--
2.20.1



[RFC net-next 02/18] net: core: rename netif_receive_generic_xdp() to do_generic_xdp_core()

Prashant Bhole
From: Jason Wang <[hidden email]>

In the skb generic path, we need a way to run an XDP program on an
skb but with customized handling of XDP actions.
netif_receive_generic_xdp() will be more helpful in such cases than
do_xdp_generic().

This patch prepares netif_receive_generic_xdp() to be used as a
general purpose function by renaming it.

Signed-off-by: Jason Wang <[hidden email]>
Signed-off-by: Prashant Bhole <[hidden email]>
---
 net/core/dev.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c7fc902ccbdc..5ae647b9914f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4461,9 +4461,9 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
  return rxqueue;
 }
 
-static u32 netif_receive_generic_xdp(struct sk_buff *skb,
-     struct xdp_buff *xdp,
-     struct bpf_prog *xdp_prog)
+static u32 do_xdp_generic_core(struct sk_buff *skb,
+       struct xdp_buff *xdp,
+       struct bpf_prog *xdp_prog)
 {
  struct netdev_rx_queue *rxqueue;
  void *orig_data, *orig_data_end;
@@ -4610,7 +4610,7 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
  u32 act;
  int err;
 
- act = netif_receive_generic_xdp(skb, &xdp, xdp_prog);
+ act = do_xdp_generic_core(skb, &xdp, xdp_prog);
  if (act != XDP_PASS) {
  switch (act) {
  case XDP_REDIRECT:
--
2.20.1



[RFC net-next 03/18] net: core: export do_xdp_generic_core()

Prashant Bhole
From: Jason Wang <[hidden email]>

Let's export do_xdp_generic_core() as a general purpose function. It
will just run the XDP program on an skb but will not handle XDP
actions.

Signed-off-by: Jason Wang <[hidden email]>
Signed-off-by: Prashant Bhole <[hidden email]>
---
 include/linux/netdevice.h | 2 ++
 net/core/dev.c            | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9e6fb8524d91..2b6317ac9795 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3648,6 +3648,8 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
+u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+ struct bpf_prog *xdp_prog);
 int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 5ae647b9914f..d97c3f35e047 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4461,9 +4461,8 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
  return rxqueue;
 }
 
-static u32 do_xdp_generic_core(struct sk_buff *skb,
-       struct xdp_buff *xdp,
-       struct bpf_prog *xdp_prog)
+u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+ struct bpf_prog *xdp_prog)
 {
  struct netdev_rx_queue *rxqueue;
  void *orig_data, *orig_data_end;
@@ -4574,6 +4573,7 @@ static u32 do_xdp_generic_core(struct sk_buff *skb,
 
  return act;
 }
+EXPORT_SYMBOL_GPL(do_xdp_generic_core);
 
 /* When doing generic XDP we have to bypass the qdisc layer and the
  * network taps in order to match in-driver-XDP behavior.
--
2.20.1



[RFC net-next 04/18] tuntap: check tun_msg_ctl type at necessary places

Prashant Bhole
tun_msg_ctl is used by vhost_net to communicate with tuntap. We will
introduce another type soon. As a preparation, this patch adds
conditions to check the tun_msg_ctl type at the necessary places.

Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/tap.c | 7 +++++--
 drivers/net/tun.c | 6 +++++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 3ae70c7e6860..4df7bf00af66 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1213,6 +1213,7 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
  struct tap_queue *q = container_of(sock, struct tap_queue, sock);
  struct tun_msg_ctl *ctl = m->msg_control;
  struct xdp_buff *xdp;
+ void *ptr = NULL;
  int i;
 
  if (ctl && (ctl->type == TUN_MSG_PTR)) {
@@ -1223,8 +1224,10 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
  return 0;
  }
 
- return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter,
-    m->msg_flags & MSG_DONTWAIT);
+ if (ctl && ctl->type == TUN_MSG_UBUF)
+ ptr = ctl->ptr;
+
+ return tap_get_user(q, ptr, &m->msg_iter, m->msg_flags & MSG_DONTWAIT);
 }
 
 static int tap_recvmsg(struct socket *sock, struct msghdr *m,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 683d371e6e82..1e436d9ec4e1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2529,6 +2529,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
  struct tun_struct *tun = tun_get(tfile);
  struct tun_msg_ctl *ctl = m->msg_control;
  struct xdp_buff *xdp;
+ void *ptr = NULL;
 
  if (!tun)
  return -EBADFD;
@@ -2560,7 +2561,10 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
  goto out;
  }
 
- ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
+ if (ctl && ctl->type == TUN_MSG_UBUF)
+ ptr = ctl->ptr;
+
+ ret = tun_get_user(tun, tfile, ptr, &m->msg_iter,
    m->msg_flags & MSG_DONTWAIT,
    m->msg_flags & MSG_MORE);
 out:
--
2.20.1



[RFC net-next 05/18] vhost_net: use tap recvmsg api to access ptr ring

Prashant Bhole
Currently vhost_net directly accesses the ptr ring of the tap driver
to fetch Rx packet pointers. To avoid this, this patch modifies the
tap driver's recvmsg api to do the additional task of fetching Rx
packet pointers.

A special struct tun_msg_ctl is already passed via msg_control for
tun Rx XDP batching. This patch extends tun_msg_ctl usage to send sub
commands to the recvmsg api. Now tun_recvmsg will handle commands to
consume and unconsume packet pointers from the ptr ring.

This will be useful in the implementation of the virtio-net XDP
offload feature, where packets will be XDP processed before they are
passed to vhost_net.

Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/tap.c      | 22 ++++++++++++++++++-
 drivers/net/tun.c      | 24 ++++++++++++++++++++-
 drivers/vhost/net.c    | 48 +++++++++++++++++++++++++++++++-----------
 include/linux/if_tun.h | 18 ++++++++++++++++
 4 files changed, 98 insertions(+), 14 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 4df7bf00af66..8635cdfd7aa4 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1234,8 +1234,28 @@ static int tap_recvmsg(struct socket *sock, struct msghdr *m,
        size_t total_len, int flags)
 {
  struct tap_queue *q = container_of(sock, struct tap_queue, sock);
- struct sk_buff *skb = m->msg_control;
+ struct tun_msg_ctl *ctl = m->msg_control;
+ struct sk_buff *skb = NULL;
  int ret;
+
+ if (ctl) {
+ switch (ctl->type) {
+ case TUN_MSG_PKT:
+ skb = ctl->ptr;
+ break;
+ case TUN_MSG_CONSUME_PKTS:
+ return ptr_ring_consume_batched(&q->ring,
+ ctl->ptr,
+ ctl->num);
+ case TUN_MSG_UNCONSUME_PKTS:
+ ptr_ring_unconsume(&q->ring, ctl->ptr, ctl->num,
+   tun_ptr_free);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+ }
+
  if (flags & ~(MSG_DONTWAIT|MSG_TRUNC)) {
  kfree_skb(skb);
  return -EINVAL;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1e436d9ec4e1..4f28f2387435 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2577,7 +2577,8 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 {
  struct tun_file *tfile = container_of(sock, struct tun_file, socket);
  struct tun_struct *tun = tun_get(tfile);
- void *ptr = m->msg_control;
+ struct tun_msg_ctl *ctl = m->msg_control;
+ void *ptr = NULL;
  int ret;
 
  if (!tun) {
@@ -2585,6 +2586,27 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
  goto out_free;
  }
 
+ if (ctl) {
+ switch (ctl->type) {
+ case TUN_MSG_PKT:
+ ptr = ctl->ptr;
+ break;
+ case TUN_MSG_CONSUME_PKTS:
+ ret = ptr_ring_consume_batched(&tfile->tx_ring,
+       ctl->ptr,
+       ctl->num);
+ goto out;
+ case TUN_MSG_UNCONSUME_PKTS:
+ ptr_ring_unconsume(&tfile->tx_ring, ctl->ptr,
+   ctl->num, tun_ptr_free);
+ ret = 0;
+ goto out;
+ default:
+ ret = -EINVAL;
+ goto out_put_tun;
+ }
+ }
+
  if (flags & ~(MSG_DONTWAIT|MSG_TRUNC|MSG_ERRQUEUE)) {
  ret = -EINVAL;
  goto out_put_tun;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1a2dd53caade..0f91b374a558 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -175,24 +175,44 @@ static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)
 
 static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq)
 {
+ struct vhost_virtqueue *vq = &nvq->vq;
+ struct socket *sock = vq->private_data;
  struct vhost_net_buf *rxq = &nvq->rxq;
+ struct tun_msg_ctl ctl = {
+ .type = TUN_MSG_CONSUME_PKTS,
+ .ptr = (void *) rxq->queue,
+ .num = VHOST_NET_BATCH,
+ };
+ struct msghdr msg = {
+ .msg_control = &ctl,
+ };
 
  rxq->head = 0;
- rxq->tail = ptr_ring_consume_batched(nvq->rx_ring, rxq->queue,
-      VHOST_NET_BATCH);
+ rxq->tail = sock->ops->recvmsg(sock, &msg, 0, 0);
+ if (WARN_ON_ONCE(rxq->tail < 0))
+ rxq->tail = 0;
+
  return rxq->tail;
 }
 
 static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq)
 {
+ struct vhost_virtqueue *vq = &nvq->vq;
+ struct socket *sock = vq->private_data;
  struct vhost_net_buf *rxq = &nvq->rxq;
+ struct tun_msg_ctl ctl = {
+ .type = TUN_MSG_UNCONSUME_PKTS,
+ .ptr = (void *) (rxq->queue + rxq->head),
+ .num = vhost_net_buf_get_size(rxq),
+ };
+ struct msghdr msg = {
+ .msg_control = &ctl,
+ };
 
- if (nvq->rx_ring && !vhost_net_buf_is_empty(rxq)) {
- ptr_ring_unconsume(nvq->rx_ring, rxq->queue + rxq->head,
-   vhost_net_buf_get_size(rxq),
-   tun_ptr_free);
- rxq->head = rxq->tail = 0;
- }
+ if (!vhost_net_buf_is_empty(rxq))
+ sock->ops->recvmsg(sock, &msg, 0, 0);
+
+ rxq->head = rxq->tail = 0;
 }
 
 static int vhost_net_buf_peek_len(void *ptr)
@@ -1109,6 +1129,7 @@ static void handle_rx(struct vhost_net *net)
  .flags = 0,
  .gso_type = VIRTIO_NET_HDR_GSO_NONE
  };
+ struct tun_msg_ctl ctl;
  size_t total_len = 0;
  int err, mergeable;
  s16 headcount;
@@ -1166,8 +1187,11 @@ static void handle_rx(struct vhost_net *net)
  goto out;
  }
  busyloop_intr = false;
- if (nvq->rx_ring)
- msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
+ if (nvq->rx_ring) {
+ ctl.type = TUN_MSG_PKT;
+ ctl.ptr = vhost_net_buf_consume(&nvq->rxq);
+ msg.msg_control = &ctl;
+ }
  /* On overrun, truncate and discard */
  if (unlikely(headcount > UIO_MAXIOV)) {
  iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
@@ -1346,8 +1370,8 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
  mutex_lock(&vq->mutex);
  sock = vq->private_data;
  vhost_net_disable_vq(n, vq);
- vq->private_data = NULL;
  vhost_net_buf_unproduce(nvq);
+ vq->private_data = NULL;
  nvq->rx_ring = NULL;
  mutex_unlock(&vq->mutex);
  return sock;
@@ -1538,8 +1562,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
  }
 
  vhost_net_disable_vq(n, vq);
- vq->private_data = sock;
  vhost_net_buf_unproduce(nvq);
+ vq->private_data = sock;
  r = vhost_vq_init_access(vq);
  if (r)
  goto err_used;
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 5bda8cf457b6..bb94843e3829 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -11,8 +11,26 @@
 
 #define TUN_XDP_FLAG 0x1UL
 
+/*
+ * tun_msg_ctl types
+ */
+
 #define TUN_MSG_UBUF 1
 #define TUN_MSG_PTR  2
+/*
+ * Used for passing a packet pointer from vhost to tun
+ */
+#define TUN_MSG_PKT  3
+/*
+ * Used for passing an array of pointers from vhost to tun.
+ * tun consumes packets from the ptr ring and stores them in the array.
+ */
+#define TUN_MSG_CONSUME_PKTS    4
+/*
+ * Used for passing an array of pointers from vhost to tun.
+ * tun takes pointers from the array and puts them back into the ptr ring.
+ */
+#define TUN_MSG_UNCONSUME_PKTS  5
 struct tun_msg_ctl {
  unsigned short type;
  unsigned short num;
--
2.20.1



[RFC net-next 06/18] tuntap: remove usage of ptr ring in vhost_net

Prashant Bhole
Remove the usage of tuntap's ptr ring in vhost_net, and remove the
functions exported from the tuntap drivers to get the ptr ring.

Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/tap.c      | 13 -------------
 drivers/net/tun.c      | 13 -------------
 drivers/vhost/net.c    | 31 ++++---------------------------
 include/linux/if_tap.h |  5 -----
 include/linux/if_tun.h |  5 -----
 5 files changed, 4 insertions(+), 63 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 8635cdfd7aa4..6426501b8d0e 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1298,19 +1298,6 @@ struct socket *tap_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tap_get_socket);
 
-struct ptr_ring *tap_get_ptr_ring(struct file *file)
-{
- struct tap_queue *q;
-
- if (file->f_op != &tap_fops)
- return ERR_PTR(-EINVAL);
- q = file->private_data;
- if (!q)
- return ERR_PTR(-EBADFD);
- return &q->ring;
-}
-EXPORT_SYMBOL_GPL(tap_get_ptr_ring);
-
 int tap_queue_resize(struct tap_dev *tap)
 {
  struct net_device *dev = tap->dev;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4f28f2387435..d078b4659897 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3750,19 +3750,6 @@ struct socket *tun_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tun_get_socket);
 
-struct ptr_ring *tun_get_tx_ring(struct file *file)
-{
- struct tun_file *tfile;
-
- if (file->f_op != &tun_fops)
- return ERR_PTR(-EINVAL);
- tfile = file->private_data;
- if (!tfile)
- return ERR_PTR(-EBADFD);
- return &tfile->tx_ring;
-}
-EXPORT_SYMBOL_GPL(tun_get_tx_ring);
-
 module_init(tun_init);
 module_exit(tun_cleanup);
 MODULE_DESCRIPTION(DRV_DESCRIPTION);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 0f91b374a558..2e069d1ef946 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -122,7 +122,6 @@ struct vhost_net_virtqueue {
  /* Reference counting for outstanding ubufs.
  * Protected by vq mutex. Writers must also take device mutex. */
  struct vhost_net_ubuf_ref *ubufs;
- struct ptr_ring *rx_ring;
  struct vhost_net_buf rxq;
  /* Batched XDP buffs */
  struct xdp_buff *xdp;
@@ -997,8 +996,9 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
  int len = 0;
  unsigned long flags;
 
- if (rvq->rx_ring)
- return vhost_net_buf_peek(rvq);
+ len = vhost_net_buf_peek(rvq);
+ if (len)
+ return len;
 
  spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
  head = skb_peek(&sk->sk_receive_queue);
@@ -1187,7 +1187,7 @@ static void handle_rx(struct vhost_net *net)
  goto out;
  }
  busyloop_intr = false;
- if (nvq->rx_ring) {
+ if (!vhost_net_buf_is_empty(&nvq->rxq)) {
  ctl.type = TUN_MSG_PKT;
  ctl.ptr = vhost_net_buf_consume(&nvq->rxq);
  msg.msg_control = &ctl;
@@ -1343,7 +1343,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
  n->vqs[i].batched_xdp = 0;
  n->vqs[i].vhost_hlen = 0;
  n->vqs[i].sock_hlen = 0;
- n->vqs[i].rx_ring = NULL;
  vhost_net_buf_init(&n->vqs[i].rxq);
  }
  vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
@@ -1372,7 +1371,6 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
  vhost_net_disable_vq(n, vq);
  vhost_net_buf_unproduce(nvq);
  vq->private_data = NULL;
- nvq->rx_ring = NULL;
  mutex_unlock(&vq->mutex);
  return sock;
 }
@@ -1468,25 +1466,6 @@ static struct socket *get_raw_socket(int fd)
  return ERR_PTR(r);
 }
 
-static struct ptr_ring *get_tap_ptr_ring(int fd)
-{
- struct ptr_ring *ring;
- struct file *file = fget(fd);
-
- if (!file)
- return NULL;
- ring = tun_get_tx_ring(file);
- if (!IS_ERR(ring))
- goto out;
- ring = tap_get_ptr_ring(file);
- if (!IS_ERR(ring))
- goto out;
- ring = NULL;
-out:
- fput(file);
- return ring;
-}
-
 static struct socket *get_tap_socket(int fd)
 {
  struct file *file = fget(fd);
@@ -1570,8 +1549,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
  r = vhost_net_enable_vq(n, vq);
  if (r)
  goto err_used;
- if (index == VHOST_NET_VQ_RX)
- nvq->rx_ring = get_tap_ptr_ring(fd);
 
  oldubufs = nvq->ubufs;
  nvq->ubufs = ubufs;
diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
index 915a187cfabd..68fe366fb185 100644
--- a/include/linux/if_tap.h
+++ b/include/linux/if_tap.h
@@ -4,7 +4,6 @@
 
 #if IS_ENABLED(CONFIG_TAP)
 struct socket *tap_get_socket(struct file *);
-struct ptr_ring *tap_get_ptr_ring(struct file *file);
 #else
 #include <linux/err.h>
 #include <linux/errno.h>
@@ -14,10 +13,6 @@ static inline struct socket *tap_get_socket(struct file *f)
 {
  return ERR_PTR(-EINVAL);
 }
-static inline struct ptr_ring *tap_get_ptr_ring(struct file *f)
-{
- return ERR_PTR(-EINVAL);
-}
 #endif /* CONFIG_TAP */
 
 #include <net/sock.h>
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index bb94843e3829..f01a255e076d 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -44,7 +44,6 @@ struct tun_xdp_hdr {
 
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
-struct ptr_ring *tun_get_tx_ring(struct file *file);
 bool tun_is_xdp_frame(void *ptr);
 void *tun_xdp_to_ptr(void *ptr);
 void *tun_ptr_to_xdp(void *ptr);
@@ -58,10 +57,6 @@ static inline struct socket *tun_get_socket(struct file *f)
 {
  return ERR_PTR(-EINVAL);
 }
-static inline struct ptr_ring *tun_get_tx_ring(struct file *f)
-{
- return ERR_PTR(-EINVAL);
-}
 static inline bool tun_is_xdp_frame(void *ptr)
 {
  return false;
--
2.20.1



[RFC net-next 07/18] tun: set offloaded xdp program

Prashant Bhole
From: Jason Wang <[hidden email]>

This patch introduces an ioctl to set an offloaded XDP program on the
tun driver. This ioctl will be used by qemu to offload an XDP program
from virtio_net in the guest.

Signed-off-by: Jason Wang <[hidden email]>
Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/tun.c           | 19 ++++++++++++++-----
 include/uapi/linux/if_tun.h |  1 +
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d078b4659897..ecb49101b0b5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -241,6 +241,7 @@ struct tun_struct {
  struct bpf_prog __rcu *xdp_prog;
  struct tun_prog __rcu *steering_prog;
  struct tun_prog __rcu *filter_prog;
+ struct tun_prog __rcu *offloaded_xdp_prog;
  struct ethtool_link_ksettings link_ksettings;
 };
 
@@ -2256,7 +2257,7 @@ static void tun_prog_free(struct rcu_head *rcu)
 {
  struct tun_prog *prog = container_of(rcu, struct tun_prog, rcu);
 
- bpf_prog_destroy(prog->prog);
+ bpf_prog_put(prog->prog);
  kfree(prog);
 }
 
@@ -2301,6 +2302,7 @@ static void tun_free_netdev(struct net_device *dev)
  security_tun_dev_free_security(tun->security);
  __tun_set_ebpf(tun, &tun->steering_prog, NULL);
  __tun_set_ebpf(tun, &tun->filter_prog, NULL);
+ __tun_set_ebpf(tun, &tun->offloaded_xdp_prog, NULL);
 }
 
 static void tun_setup(struct net_device *dev)
@@ -3036,7 +3038,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
 }
 
 static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog **prog_p,
- void __user *data)
+ void __user *data, int type)
 {
  struct bpf_prog *prog;
  int fd;
@@ -3047,7 +3049,7 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog **prog_p,
  if (fd == -1) {
  prog = NULL;
  } else {
- prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
+ prog = bpf_prog_get_type(fd, type);
  if (IS_ERR(prog))
  return PTR_ERR(prog);
  }
@@ -3345,11 +3347,18 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
  break;
 
  case TUNSETSTEERINGEBPF:
- ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
+ ret = tun_set_ebpf(tun, &tun->steering_prog, argp,
+   BPF_PROG_TYPE_SOCKET_FILTER);
  break;
 
  case TUNSETFILTEREBPF:
- ret = tun_set_ebpf(tun, &tun->filter_prog, argp);
+ ret = tun_set_ebpf(tun, &tun->filter_prog, argp,
+   BPF_PROG_TYPE_SOCKET_FILTER);
+ break;
+
+ case TUNSETOFFLOADEDXDP:
+ ret = tun_set_ebpf(tun, &tun->offloaded_xdp_prog, argp,
+   BPF_PROG_TYPE_XDP);
  break;
 
  case TUNSETCARRIER:
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 454ae31b93c7..21dbd8db2401 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -61,6 +61,7 @@
 #define TUNSETFILTEREBPF _IOR('T', 225, int)
 #define TUNSETCARRIER _IOW('T', 226, int)
 #define TUNGETDEVNETNS _IO('T', 227)
+#define TUNSETOFFLOADEDXDP _IOW('T', 228, int)
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN 0x0001
--
2.20.1



[RFC net-next 08/18] tun: run offloaded XDP program in Tx path

Prashant Bhole
Run the offloaded XDP program as soon as a packet is removed from the
ptr ring. Since this is XDP in the Tx path, the traditional handling
of the XDP_TX/XDP_REDIRECT actions isn't valid. For this reason we
call do_xdp_generic_core() instead of do_xdp_generic();
do_xdp_generic_core() just runs the program and leaves the action
handling to us.

Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/tun.c | 149 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ecb49101b0b5..466ea69f00ee 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -131,6 +131,7 @@ struct tap_filter {
 /* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
  * to max number of VCPUs in guest. */
 #define MAX_TAP_QUEUES 256
+#define MAX_TAP_BATCH 64
 #define MAX_TAP_FLOWS  4096
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
@@ -2156,6 +2157,109 @@ static ssize_t tun_put_user(struct tun_struct *tun,
  return total;
 }
 
+static struct sk_buff *tun_prepare_xdp_skb(struct sk_buff *skb)
+{
+ struct sk_buff *nskb;
+
+ if (skb_shared(skb) || skb_cloned(skb)) {
+ nskb = skb_copy(skb, GFP_ATOMIC);
+ consume_skb(skb);
+ return nskb;
+ }
+
+ return skb;
+}
+
+static u32 tun_do_xdp_offload_generic(struct tun_struct *tun,
+      struct sk_buff *skb)
+{
+ struct tun_prog *xdp_prog;
+ struct xdp_buff xdp;
+ u32 act = XDP_PASS;
+
+ xdp_prog = rcu_dereference(tun->offloaded_xdp_prog);
+ if (xdp_prog) {
+ skb = tun_prepare_xdp_skb(skb);
+ if (!skb) {
+ act = XDP_DROP;
+ kfree_skb(skb);
+ goto drop;
+ }
+
+ act = do_xdp_generic_core(skb, &xdp, xdp_prog->prog);
+ switch (act) {
+ case XDP_TX:
+ /*
+ * Rx path generic XDP will be called in this path
+ */
+ netif_receive_skb(skb);
+ break;
+ case XDP_PASS:
+ break;
+ case XDP_REDIRECT:
+ /*
+ * Since we are not handling this case yet, let's free
+ * skb here. In case of XDP_DROP/XDP_ABORTED, the skb
+ * was already freed in do_xdp_generic_core()
+ */
+ kfree_skb(skb);
+ /* fall through */
+ default:
+ bpf_warn_invalid_xdp_action(act);
+ /* fall through */
+ case XDP_ABORTED:
+ trace_xdp_exception(tun->dev, xdp_prog->prog, act);
+ /* fall through */
+ case XDP_DROP:
+ goto drop;
+ }
+ }
+
+ return act;
+drop:
+ this_cpu_inc(tun->pcpu_stats->tx_dropped);
+ return act;
+}
+
+static u32 tun_do_xdp_offload(struct tun_struct *tun, struct tun_file *tfile,
+      struct xdp_frame *frame)
+{
+ struct tun_prog *xdp_prog;
+ struct tun_page tpage;
+ struct xdp_buff xdp;
+ u32 act = XDP_PASS;
+ int flush = 0;
+
+ xdp_prog = rcu_dereference(tun->offloaded_xdp_prog);
+ if (xdp_prog) {
+ xdp.data_hard_start = frame->data - frame->headroom;
+ xdp.data = frame->data;
+ xdp.data_end = xdp.data + frame->len;
+ xdp.data_meta = xdp.data - frame->metasize;
+
+ act = bpf_prog_run_xdp(xdp_prog->prog, &xdp);
+ switch (act) {
+ case XDP_PASS:
+ break;
+ case XDP_TX:
+ /* fall through */
+ case XDP_REDIRECT:
+ /* fall through */
+ default:
+ bpf_warn_invalid_xdp_action(act);
+ /* fall through */
+ case XDP_ABORTED:
+ trace_xdp_exception(tun->dev, xdp_prog->prog, act);
+ /* fall through */
+ case XDP_DROP:
+ xdp_return_frame_rx_napi(frame);
+ break;
+ }
+ }
+
+ return act;
+}
+
 static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
 {
  DECLARE_WAITQUEUE(wait, current);
@@ -2574,6 +2678,47 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
  return ret;
 }
 
+static int tun_consume_packets(struct tun_file *tfile, void **ptr_array, int n)
+{
+ struct tun_prog *xdp_prog;
+ struct xdp_frame *frame;
+ struct tun_struct *tun;
+ int i, num_ptrs;
+ int pkt_cnt = 0;
+ void *pkts[MAX_TAP_BATCH];
+ void *ptr;
+ u32 act;
+
+ if (unlikely(!tfile))
+ return 0;
+
+ if (n > MAX_TAP_BATCH)
+ n = MAX_TAP_BATCH;
+
+ rcu_read_lock();
+ tun = rcu_dereference(tfile->tun);
+ if (unlikely(!tun)) {
+ rcu_read_unlock();
+ return 0;
+ }
+ xdp_prog = rcu_dereference(tun->offloaded_xdp_prog);
+
+ num_ptrs = ptr_ring_consume_batched(&tfile->tx_ring, pkts, n);
+ for (i = 0; i < num_ptrs; i++) {
+ ptr = pkts[i];
+ if (tun_is_xdp_frame(ptr)) {
+ frame = tun_ptr_to_xdp(ptr);
+ act = tun_do_xdp_offload(tun, tfile, frame);
+ } else {
+ act = tun_do_xdp_offload_generic(tun, ptr);
+ }
+
+ if (act == XDP_PASS)
+ ptr_array[pkt_cnt++] = ptr;
+ }
+
+ rcu_read_unlock();
+ return pkt_cnt;
+}
+
 static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
        int flags)
 {
@@ -2594,9 +2739,7 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
  ptr = ctl->ptr;
  break;
  case TUN_MSG_CONSUME_PKTS:
- ret = ptr_ring_consume_batched(&tfile->tx_ring,
-       ctl->ptr,
-       ctl->num);
+ ret = tun_consume_packets(tfile, ctl->ptr, ctl->num);
  goto out;
  case TUN_MSG_UNCONSUME_PKTS:
  ptr_ring_unconsume(&tfile->tx_ring, ctl->ptr,
--
2.20.1



[RFC net-next 09/18] tun: add a way to inject Tx path packet into Rx path

Prashant Bhole
In order to support XDP_TX from an offloaded XDP program, we need a
way to inject a Tx path packet into the Rx path. Let's modify the Rx
path function tun_xdp_one() for this purpose.

This patch adds a parameter indicating whether the packet has a
virtio_net header. When the header isn't present, the packet is
treated as one XDP_TX'ed by the offloaded program.

Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/tun.c | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 466ea69f00ee..8d6cdd3e5139 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2221,6 +2221,13 @@ static u32 tun_do_xdp_offload_generic(struct tun_struct *tun,
  return act;
 }
 
+static int tun_xdp_one(struct tun_struct *tun,
+       struct tun_file *tfile,
+       struct xdp_buff *xdp, int *flush,
+       struct tun_page *tpage, int has_hdr);
+
+static void tun_put_page(struct tun_page *tpage);
+
 static u32 tun_do_xdp_offload(struct tun_struct *tun, struct tun_file *tfile,
       struct xdp_frame *frame)
 {
@@ -2527,23 +2534,36 @@ static void tun_put_page(struct tun_page *tpage)
 static int tun_xdp_one(struct tun_struct *tun,
        struct tun_file *tfile,
        struct xdp_buff *xdp, int *flush,
-       struct tun_page *tpage)
+       struct tun_page *tpage, int has_hdr)
 {
  unsigned int datasize = xdp->data_end - xdp->data;
- struct tun_xdp_hdr *hdr = xdp->data_hard_start;
- struct virtio_net_hdr *gso = &hdr->gso;
+ struct tun_xdp_hdr *hdr;
+ struct virtio_net_hdr *gso;
  struct tun_pcpu_stats *stats;
  struct bpf_prog *xdp_prog;
  struct sk_buff *skb = NULL;
+ unsigned int headroom;
  u32 rxhash = 0, act;
- int buflen = hdr->buflen;
+ int buflen;
  int err = 0;
  bool skb_xdp = false;
  struct page *page;
 
+ if (has_hdr) {
+ hdr = xdp->data_hard_start;
+ gso = &hdr->gso;
+ buflen = hdr->buflen;
+ } else {
+ /* came here from tun tx path */
+ xdp->data_hard_start -= sizeof(struct xdp_frame);
+ headroom = xdp->data - xdp->data_hard_start;
+ buflen = datasize + headroom +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ }
+
  xdp_prog = rcu_dereference(tun->xdp_prog);
  if (xdp_prog) {
- if (gso->gso_type) {
+ if (has_hdr && gso->gso_type) {
  skb_xdp = true;
  goto build;
  }
@@ -2588,7 +2608,8 @@ static int tun_xdp_one(struct tun_struct *tun,
  skb_reserve(skb, xdp->data - xdp->data_hard_start);
  skb_put(skb, xdp->data_end - xdp->data);
 
- if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
+ if (has_hdr &&
+    virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
  this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
  kfree_skb(skb);
  err = -EINVAL;
@@ -2652,7 +2673,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 
  for (i = 0; i < n; i++) {
  xdp = &((struct xdp_buff *)ctl->ptr)[i];
- tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
+ tun_xdp_one(tun, tfile, xdp, &flush, &tpage, true);
  }
 
  if (flush)
--
2.20.1



[RFC net-next 10/18] tun: handle XDP_TX action of offloaded program

Prashant Bhole
When the offloaded program returns XDP_TX, we need to inject the
packet into the Rx path of tun. This patch injects such packets into
the Rx path using tun_xdp_one().

Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/tun.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8d6cdd3e5139..084ca95358fe 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2249,7 +2249,13 @@ static u32 tun_do_xdp_offload(struct tun_struct *tun, struct tun_file *tfile,
  case XDP_PASS:
  break;
  case XDP_TX:
- /* fall through */
+ tpage.page = NULL;
+ tpage.count = 0;
+ tun_xdp_one(tun, tfile, &xdp, &flush, &tpage, false);
+ tun_put_page(&tpage);
+ if (flush)
+ xdp_do_flush_map();
+ break;
  case XDP_REDIRECT:
  /* fall through */
  default:
--
2.20.1



[RFC net-next 11/18] tun: run xdp prog when tun is read from file interface

Prashant Bhole
Run the offloaded XDP program in the case where qemu performs a read
on tun using the file operations interface.

Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/tun.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 084ca95358fe..639921c10e32 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2318,8 +2318,10 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
    struct iov_iter *to,
    int noblock, void *ptr)
 {
+ struct xdp_frame *frame;
  ssize_t ret;
  int err;
+ u32 act;
 
  tun_debug(KERN_INFO, tun, "tun_do_read\n");
 
@@ -2333,6 +2335,15 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
  ptr = tun_ring_recv(tfile, noblock, &err);
  if (!ptr)
  return err;
+
+ if (tun_is_xdp_frame(ptr)) {
+ frame = tun_ptr_to_xdp(ptr);
+ act = tun_do_xdp_offload(tun, tfile, frame);
+ } else {
+ act = tun_do_xdp_offload_generic(tun, ptr);
+ }
+ if (act != XDP_PASS)
+ return err;
  }
 
  if (tun_is_xdp_frame(ptr)) {
--
2.20.1



[RFC net-next 12/18] virtio-net: store xdp_prog in device

Prashant Bhole
From: Jason Wang <[hidden email]>

This is a preparation for adding XDP offload support in the
virtio_net driver. Storing the XDP program in virtnet_info makes it
consistent with the offloaded program, which will be introduced in
later patches.

Signed-off-by: Jason Wang <[hidden email]>
Co-developed-by: Prashant Bhole <[hidden email]>
Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/virtio_net.c | 62 ++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 37 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4d7d5434cc5d..c8bbb1b90c1c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -137,8 +137,6 @@ struct receive_queue {
 
  struct napi_struct napi;
 
- struct bpf_prog __rcu *xdp_prog;
-
  struct virtnet_rq_stats stats;
 
  /* Chain pages by the private ptr. */
@@ -229,6 +227,8 @@ struct virtnet_info {
 
  /* failover when STANDBY feature enabled */
  struct failover *failover;
+
+ struct bpf_prog __rcu *xdp_prog;
 };
 
 struct padded_vnet_hdr {
@@ -486,7 +486,6 @@ static int virtnet_xdp_xmit(struct net_device *dev,
     int n, struct xdp_frame **frames, u32 flags)
 {
  struct virtnet_info *vi = netdev_priv(dev);
- struct receive_queue *rq = vi->rq;
  struct bpf_prog *xdp_prog;
  struct send_queue *sq;
  unsigned int len;
@@ -501,7 +500,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
  /* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
  * indicate XDP resources have been successfully allocated.
  */
- xdp_prog = rcu_dereference(rq->xdp_prog);
+ xdp_prog = rcu_dereference(vi->xdp_prog);
  if (!xdp_prog)
  return -ENXIO;
 
@@ -649,7 +648,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
  stats->bytes += len;
 
  rcu_read_lock();
- xdp_prog = rcu_dereference(rq->xdp_prog);
+ xdp_prog = rcu_dereference(vi->xdp_prog);
  if (xdp_prog) {
  struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
  struct xdp_frame *xdpf;
@@ -798,7 +797,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
  stats->bytes += len - vi->hdr_len;
 
  rcu_read_lock();
- xdp_prog = rcu_dereference(rq->xdp_prog);
+ xdp_prog = rcu_dereference(vi->xdp_prog);
  if (xdp_prog) {
  struct xdp_frame *xdpf;
  struct page *xdp_page;
@@ -2060,7 +2059,7 @@ static int virtnet_set_channels(struct net_device *dev,
  * also when XDP is loaded all RX queues have XDP programs so we only
  * need to check a single RX queue.
  */
- if (vi->rq[0].xdp_prog)
+ if (vi->xdp_prog)
  return -EINVAL;
 
  get_online_cpus();
@@ -2441,13 +2440,10 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
  return -ENOMEM;
  }
 
- old_prog = rtnl_dereference(vi->rq[0].xdp_prog);
+ old_prog = rtnl_dereference(vi->xdp_prog);
  if (!prog && !old_prog)
  return 0;
 
- if (prog)
- bpf_prog_add(prog, vi->max_queue_pairs - 1);
-
  /* Make sure NAPI is not using any XDP TX queues for RX. */
  if (netif_running(dev)) {
  for (i = 0; i < vi->max_queue_pairs; i++) {
@@ -2457,11 +2453,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
  }
 
  if (!prog) {
- for (i = 0; i < vi->max_queue_pairs; i++) {
- rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
- if (i == 0)
- virtnet_restore_guest_offloads(vi);
- }
+ rcu_assign_pointer(vi->xdp_prog, prog);
+ virtnet_restore_guest_offloads(vi);
  synchronize_net();
  }
 
@@ -2472,16 +2465,12 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
  vi->xdp_queue_pairs = xdp_qp;
 
  if (prog) {
- for (i = 0; i < vi->max_queue_pairs; i++) {
- rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
- if (i == 0 && !old_prog)
- virtnet_clear_guest_offloads(vi);
- }
+ rcu_assign_pointer(vi->xdp_prog, prog);
+ if (!old_prog)
+ virtnet_clear_guest_offloads(vi);
  }
 
  for (i = 0; i < vi->max_queue_pairs; i++) {
- if (old_prog)
- bpf_prog_put(old_prog);
  if (netif_running(dev)) {
  virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
  virtnet_napi_tx_enable(vi, vi->sq[i].vq,
@@ -2489,13 +2478,15 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
  }
  }
 
+ if (old_prog)
+ bpf_prog_put(old_prog);
+
  return 0;
 
 err:
  if (!prog) {
  virtnet_clear_guest_offloads(vi);
- for (i = 0; i < vi->max_queue_pairs; i++)
- rcu_assign_pointer(vi->rq[i].xdp_prog, old_prog);
+ rcu_assign_pointer(vi->xdp_prog, old_prog);
  }
 
  if (netif_running(dev)) {
@@ -2514,13 +2505,11 @@ static u32 virtnet_xdp_query(struct net_device *dev)
 {
  struct virtnet_info *vi = netdev_priv(dev);
  const struct bpf_prog *xdp_prog;
- int i;
 
- for (i = 0; i < vi->max_queue_pairs; i++) {
- xdp_prog = rtnl_dereference(vi->rq[i].xdp_prog);
- if (xdp_prog)
- return xdp_prog->aux->id;
- }
+ xdp_prog = rtnl_dereference(vi->xdp_prog);
+ if (xdp_prog)
+ return xdp_prog->aux->id;
+
  return 0;
 }
 
@@ -2657,18 +2646,17 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 
 static void _free_receive_bufs(struct virtnet_info *vi)
 {
- struct bpf_prog *old_prog;
+ struct bpf_prog *old_prog = rtnl_dereference(vi->xdp_prog);
  int i;
 
  for (i = 0; i < vi->max_queue_pairs; i++) {
  while (vi->rq[i].pages)
  __free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
-
- old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
- RCU_INIT_POINTER(vi->rq[i].xdp_prog, NULL);
- if (old_prog)
- bpf_prog_put(old_prog);
  }
+
+ RCU_INIT_POINTER(vi->xdp_prog, NULL);
+ if (old_prog)
+ bpf_prog_put(old_prog);
 }
 
 static void free_receive_bufs(struct virtnet_info *vi)
--
2.20.1



[RFC net-next 13/18] virtio_net: use XDP attachment helpers

Prashant Bhole
Next patches will introduce virtio_net XDP offloading. We will then
need to manage both offloaded and non-offloaded programs. To keep
that handling consistent, this patch switches to the XDP attachment
helpers.

Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/virtio_net.c | 30 +++++++++++-------------------
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c8bbb1b90c1c..cee5c2b15c62 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -229,6 +229,8 @@ struct virtnet_info {
  struct failover *failover;
 
  struct bpf_prog __rcu *xdp_prog;
+
+ struct xdp_attachment_info xdp;
 };
 
 struct padded_vnet_hdr {
@@ -2398,15 +2400,19 @@ static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
  return virtnet_set_guest_offloads(vi, offloads);
 }
 
-static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
-   struct netlink_ext_ack *extack)
+static int virtnet_xdp_set(struct net_device *dev, struct netdev_bpf *bpf)
 {
  unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
+ struct netlink_ext_ack *extack = bpf->extack;
  struct virtnet_info *vi = netdev_priv(dev);
+ struct bpf_prog *prog = bpf->prog;
  struct bpf_prog *old_prog;
  u16 xdp_qp = 0, curr_qp;
  int i, err;
 
+ if (!xdp_attachment_flags_ok(&vi->xdp, bpf))
+ return -EBUSY;
+
  if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
     && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
         virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
@@ -2478,8 +2484,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
  }
  }
 
- if (old_prog)
- bpf_prog_put(old_prog);
+ xdp_attachment_setup(&vi->xdp, bpf);
 
  return 0;
 
@@ -2501,26 +2506,13 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
  return err;
 }
 
-static u32 virtnet_xdp_query(struct net_device *dev)
-{
- struct virtnet_info *vi = netdev_priv(dev);
- const struct bpf_prog *xdp_prog;
-
- xdp_prog = rtnl_dereference(vi->xdp_prog);
- if (xdp_prog)
- return xdp_prog->aux->id;
-
- return 0;
-}
-
 static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
  switch (xdp->command) {
  case XDP_SETUP_PROG:
- return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
+ return virtnet_xdp_set(dev, xdp);
  case XDP_QUERY_PROG:
- xdp->prog_id = virtnet_xdp_query(dev);
- return 0;
+ return xdp_attachment_query(&vi->xdp, xdp);
  default:
  return -EINVAL;
  }
--
2.20.1



[RFC net-next 14/18] virtio_net: add XDP prog offload infrastructure

Prashant Bhole
From: Jason Wang <[hidden email]>

This patch prepares virtio_net for XDP offloading. It adds data
structures and blank callback implementations for
bpf_prog_offload_ops. It also implements the ndo_init and ndo_uninit
operations for setting up offload-related data structures.

Signed-off-by: Jason Wang <[hidden email]>
Co-developed-by: Prashant Bhole <[hidden email]>
Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/virtio_net.c | 103 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index cee5c2b15c62..a1088d0114f2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -229,8 +229,14 @@ struct virtnet_info {
  struct failover *failover;
 
  struct bpf_prog __rcu *xdp_prog;
+ struct bpf_prog __rcu *offload_xdp_prog;
 
  struct xdp_attachment_info xdp;
+ struct xdp_attachment_info xdp_hw;
+
+ struct bpf_offload_dev *bpf_dev;
+
+ struct list_head bpf_bound_progs;
 };
 
 struct padded_vnet_hdr {
@@ -258,6 +264,14 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
  return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
 }
 
+struct virtnet_bpf_bound_prog {
+ struct virtnet_info *vi;
+ struct bpf_prog *prog;
+ struct list_head list;
+ u32 len;
+ struct bpf_insn insnsi[0];
+};
+
 /* Converting between virtqueue no. and kernel tx/rx queue no.
  * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
  */
@@ -2506,13 +2520,63 @@ static int virtnet_xdp_set(struct net_device *dev, struct netdev_bpf *bpf)
  return err;
 }
 
+static int virtnet_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx,
+   int prev_insn)
+{
+ return 0;
+}
+
+static void virtnet_bpf_destroy_prog(struct bpf_prog *prog)
+{
+}
+
+static int virtnet_xdp_set_offload(struct virtnet_info *vi,
+   struct netdev_bpf *bpf)
+{
+ return -EBUSY;
+}
+
+static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
+{
+ return -ENOMEM;
+}
+
+static int virtnet_bpf_verifier_prep(struct bpf_prog *prog)
+{
+ return 0;
+}
+
+static int virtnet_bpf_translate(struct bpf_prog *prog)
+{
+ return 0;
+}
+
+static int virtnet_bpf_finalize(struct bpf_verifier_env *env)
+{
+ return 0;
+}
+
+static const struct bpf_prog_offload_ops virtnet_bpf_dev_ops = {
+ .setup = virtnet_bpf_verifier_setup,
+ .prepare = virtnet_bpf_verifier_prep,
+ .insn_hook = virtnet_bpf_verify_insn,
+ .finalize = virtnet_bpf_finalize,
+ .translate = virtnet_bpf_translate,
+ .destroy = virtnet_bpf_destroy_prog,
+};
+
 static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
+ struct virtnet_info *vi = netdev_priv(dev);
  switch (xdp->command) {
  case XDP_SETUP_PROG:
  return virtnet_xdp_set(dev, xdp);
  case XDP_QUERY_PROG:
  return xdp_attachment_query(&vi->xdp, xdp);
+ case XDP_SETUP_PROG_HW:
+ return virtnet_xdp_set_offload(vi, xdp);
+ case XDP_QUERY_PROG_HW:
+ return xdp_attachment_query(&vi->xdp_hw, xdp);
  default:
  return -EINVAL;
  }
@@ -2559,7 +2623,46 @@ static int virtnet_set_features(struct net_device *dev,
  return 0;
 }
 
+static int virtnet_bpf_init(struct virtnet_info *vi)
+{
+ int err;
+
+ vi->bpf_dev = bpf_offload_dev_create(&virtnet_bpf_dev_ops, NULL);
+ err = PTR_ERR_OR_ZERO(vi->bpf_dev);
+ if (err)
+ return err;
+
+ err = bpf_offload_dev_netdev_register(vi->bpf_dev, vi->dev);
+ if (err)
+ goto err_netdev_register;
+
+ INIT_LIST_HEAD(&vi->bpf_bound_progs);
+
+ return 0;
+
+err_netdev_register:
+ bpf_offload_dev_destroy(vi->bpf_dev);
+ return err;
+}
+
+static int virtnet_init(struct net_device *dev)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+
+ return virtnet_bpf_init(vi);
+}
+
+static void virtnet_uninit(struct net_device *dev)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+
+ bpf_offload_dev_netdev_unregister(vi->bpf_dev, vi->dev);
+ bpf_offload_dev_destroy(vi->bpf_dev);
+}
+
 static const struct net_device_ops virtnet_netdev = {
+ .ndo_init            = virtnet_init,
+ .ndo_uninit          = virtnet_uninit,
  .ndo_open            = virtnet_open,
  .ndo_stop       = virtnet_close,
  .ndo_start_xmit      = start_xmit,
--
2.20.1



[RFC net-next 15/18] virtio_net: implement XDP prog offload functionality

Prashant Bhole
From: Jason Wang <[hidden email]>

This patch implements the bpf_prog_offload_ops callbacks and adds
handling for XDP_SETUP_PROG_HW. Handling XDP_SETUP_PROG_HW involves
setting up a struct virtio_net_ctrl_ebpf_prog and appending the
program instructions to it. This control buffer is sent to Qemu with
class VIRTIO_NET_CTRL_EBPF and command VIRTIO_NET_BPF_CMD_SET_OFFLOAD.
The expected behavior from Qemu is that it should try to load the
program in the host OS and report the status.

It also adds a restriction to allow either a driver-mode or an
offloaded program at a time. This restriction can be removed later
after proper testing.

Signed-off-by: Jason Wang <[hidden email]>
Co-developed-by: Prashant Bhole <[hidden email]>
Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/virtio_net.c        | 114 +++++++++++++++++++++++++++++++-
 include/uapi/linux/virtio_net.h |  27 ++++++++
 2 files changed, 139 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a1088d0114f2..dddfbb2a2075 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -169,6 +169,7 @@ struct control_buf {
  u8 allmulti;
  __virtio16 vid;
  __virtio64 offloads;
+ struct virtio_net_ctrl_ebpf_prog prog_ctrl;
 };
 
 struct virtnet_info {
@@ -272,6 +273,8 @@ struct virtnet_bpf_bound_prog {
  struct bpf_insn insnsi[0];
 };
 
+#define VIRTNET_EA(extack, msg) NL_SET_ERR_MSG_MOD((extack), msg)
+
 /* Converting between virtqueue no. and kernel tx/rx queue no.
  * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
  */
@@ -2427,6 +2430,11 @@ static int virtnet_xdp_set(struct net_device *dev, struct netdev_bpf *bpf)
  if (!xdp_attachment_flags_ok(&vi->xdp, bpf))
  return -EBUSY;
 
+ if (rtnl_dereference(vi->offload_xdp_prog)) {
+ VIRTNET_EA(bpf->extack, "program already attached in offload mode");
+ return -EINVAL;
+ }
+
  if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
     && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
         virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
@@ -2528,17 +2536,114 @@ static int virtnet_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx,
 
 static void virtnet_bpf_destroy_prog(struct bpf_prog *prog)
 {
+ struct virtnet_bpf_bound_prog *state;
+
+ state = prog->aux->offload->dev_priv;
+ list_del(&state->list);
+ kfree(state);
+}
+
+static int virtnet_xdp_offload_check(struct virtnet_info *vi,
+     struct netdev_bpf *bpf)
+{
+ if (!bpf->prog)
+ return 0;
+
+ if (!bpf->prog->aux->offload) {
+ VIRTNET_EA(bpf->extack, "xdpoffload of non-bound program");
+ return -EINVAL;
+ }
+ if (bpf->prog->aux->offload->netdev != vi->dev) {
+ VIRTNET_EA(bpf->extack, "program bound to different dev");
+ return -EINVAL;
+ }
+
+ if (rtnl_dereference(vi->xdp_prog)) {
+ VIRTNET_EA(bpf->extack, "program already attached in driver mode");
+ return -EINVAL;
+ }
+
+ return 0;
 }
 
 static int virtnet_xdp_set_offload(struct virtnet_info *vi,
    struct netdev_bpf *bpf)
 {
- return -EBUSY;
+ struct virtio_net_ctrl_ebpf_prog *ctrl;
+ struct virtnet_bpf_bound_prog *bound_prog = NULL;
+ struct virtio_device *vdev = vi->vdev;
+ struct bpf_prog *prog = bpf->prog;
+ void *ctrl_buf = NULL;
+ struct scatterlist sg;
+ int prog_len;
+ int err = 0;
+
+ if (!xdp_attachment_flags_ok(&vi->xdp_hw, bpf))
+ return -EBUSY;
+
+ if (prog) {
+ if (prog->type != BPF_PROG_TYPE_XDP)
+ return -EOPNOTSUPP;
+ bound_prog = prog->aux->offload->dev_priv;
+ prog_len = prog->len * sizeof(bound_prog->insnsi[0]);
+
+ ctrl_buf = kmalloc(sizeof(*ctrl) + prog_len, GFP_KERNEL);
+ if (!ctrl_buf)
+ return -ENOMEM;
+ ctrl = ctrl_buf;
+ ctrl->cmd = cpu_to_virtio32(vi->vdev,
+    VIRTIO_NET_BPF_CMD_SET_OFFLOAD);
+ ctrl->len = cpu_to_virtio32(vi->vdev, prog_len);
+ ctrl->gpl_compatible = cpu_to_virtio16(vi->vdev,
+       prog->gpl_compatible);
+ memcpy(ctrl->insns, bound_prog->insnsi,
+       prog->len * sizeof(bound_prog->insnsi[0]));
+ sg_init_one(&sg, ctrl_buf, sizeof(*ctrl) + prog_len);
+ } else {
+ ctrl = &vi->ctrl->prog_ctrl;
+ ctrl->cmd  = cpu_to_virtio32(vi->vdev,
+     VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD);
+ sg_init_one(&sg, ctrl, sizeof(*ctrl));
+ }
+
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_EBPF,
+  VIRTIO_NET_CTRL_EBPF_PROG,
+  &sg)) {
+ dev_warn(&vdev->dev, "Failed to set bpf offload prog\n");
+ err = -EFAULT;
+ goto out;
+ }
+
+ rcu_assign_pointer(vi->offload_xdp_prog, prog);
+
+ xdp_attachment_setup(&vi->xdp_hw, bpf);
+
+out:
+ kfree(ctrl_buf);
+ return err;
 }
 
 static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
 {
- return -ENOMEM;
+ struct virtnet_info *vi = netdev_priv(prog->aux->offload->netdev);
+ size_t insn_len = prog->len * sizeof(struct bpf_insn);
+ struct virtnet_bpf_bound_prog *state;
+
+ state = kzalloc(sizeof(*state) + insn_len, GFP_KERNEL);
+ if (!state)
+ return -ENOMEM;
+
+ memcpy(&state->insnsi[0], prog->insnsi, insn_len);
+
+ state->vi = vi;
+ state->prog = prog;
+ state->len = prog->len;
+
+ list_add_tail(&state->list, &vi->bpf_bound_progs);
+
+ prog->aux->offload->dev_priv = state;
+
+ return 0;
 }
 
 static int virtnet_bpf_verifier_prep(struct bpf_prog *prog)
@@ -2568,12 +2673,17 @@ static const struct bpf_prog_offload_ops virtnet_bpf_dev_ops = {
 static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
  struct virtnet_info *vi = netdev_priv(dev);
+ int err;
+
  switch (xdp->command) {
  case XDP_SETUP_PROG:
  return virtnet_xdp_set(dev, xdp);
  case XDP_QUERY_PROG:
  return xdp_attachment_query(&vi->xdp, xdp);
  case XDP_SETUP_PROG_HW:
+ err = virtnet_xdp_offload_check(vi, xdp);
+ if (err)
+ return err;
  return virtnet_xdp_set_offload(vi, xdp);
  case XDP_QUERY_PROG_HW:
  return xdp_attachment_query(&vi->xdp_hw, xdp);
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index a3715a3224c1..0ea2f7910a5a 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -261,4 +261,31 @@ struct virtio_net_ctrl_mq {
 #define VIRTIO_NET_CTRL_GUEST_OFFLOADS   5
 #define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET        0
 
+/*
+ * Control XDP offload
+ *
+ * When the guest wants to offload an XDP program to the device, it issues
+ * VIRTIO_NET_CTRL_EBPF_PROG with the VIRTIO_NET_BPF_CMD_SET_OFFLOAD
+ * subcommand. When offloading is successful, the device runs the offloaded
+ * XDP program on each packet before sending it to the guest.
+ *
+ * VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD removes the offloaded program from
+ * the device, if one exists.
+ */
+
+struct virtio_net_ctrl_ebpf_prog {
+ /* program length in bytes */
+ __virtio32 len;
+ __virtio16 cmd;
+ __virtio16 gpl_compatible;
+ __u8 insns[0];
+};
+
+#define VIRTIO_NET_CTRL_EBPF 6
+ #define VIRTIO_NET_CTRL_EBPF_PROG 1
+
+/* Commands for VIRTIO_NET_CTRL_EBPF_PROG */
+#define VIRTIO_NET_BPF_CMD_SET_OFFLOAD 1
+#define VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD 2
+
 #endif /* _UAPI_LINUX_VIRTIO_NET_H */
--
2.20.1



[RFC net-next 16/18] bpf: export function __bpf_map_get

Prashant Bhole
In reply to this post by Prashant Bhole
From: Jason Wang <[hidden email]>

__bpf_map_get is necessary to verify whether an fd corresponds to
a bpf map, without adding a refcount on that map. Exporting it
allows it to be used by kernel modules.
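As a rough userspace sketch of the pattern this export enables (all names
below are hypothetical mocks, not kernel API): validate that a file's ops
identify it as a bpf map, then read its private_data without taking a
reference:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace mock of the kernel pattern: __bpf_map_get()
 * checks f.file->f_op against bpf_map_fops and, on a match, returns
 * f.file->private_data without bumping the map's refcount. */
struct file_ops { const char *name; };

static const struct file_ops bpf_map_fops = { "bpf-map" };
static const struct file_ops sock_fops    = { "socket"  };

struct file {
	const struct file_ops *f_op;
	void *private_data;	/* the struct bpf_map, in the kernel */
};

/* mock of __bpf_map_get(); NULL stands in for ERR_PTR(-EINVAL) */
static void *mock_bpf_map_get(const struct file *f)
{
	if (!f || f->f_op != &bpf_map_fops)
		return NULL;
	return f->private_data;
}
```

Because no reference is taken, the caller must hold the fd (fdget/fdput in
the kernel) for as long as it uses the returned map.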

Signed-off-by: Jason Wang <[hidden email]>
Signed-off-by: Prashant Bhole <[hidden email]>
---
 kernel/bpf/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e3461ec59570..e524ab1e7c64 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -737,6 +737,7 @@ struct bpf_map *__bpf_map_get(struct fd f)
 
  return f.file->private_data;
 }
+EXPORT_SYMBOL(__bpf_map_get);
 
 void bpf_map_inc(struct bpf_map *map)
 {
--
2.20.1



[RFC net-next 17/18] virtio_net: implement XDP map offload functionality

Prashant Bhole
In reply to this post by Prashant Bhole
From: Jason Wang <[hidden email]>

This patch implements:
* Handling of BPF_OFFLOAD_MAP_ALLOC, BPF_OFFLOAD_MAP_FREE:
  Allocate a driver-specific map data structure. Set up struct
  virtio_net_ctrl_ebpf_map and send the control buffer to Qemu with
  class VIRTIO_NET_CTRL_EBPF, cmd VIRTIO_NET_CTRL_EBPF_MAP. The cmd
  in the control buffer is set to VIRTIO_NET_BPF_CMD_CREATE_MAP. Qemu
  is expected to perform the action per the command and return the
  status (and map data). For the create map command, Qemu should set
  the map_fd in the control buffer.

* bpf_map_dev_ops operations:
  Common map operations are implemented using the above-mentioned
  struct virtio_net_ctrl_ebpf_map. The control buffer has space for
  storing either key + value or key + key (for get_next_key).

* map_fd replacement in a copy of the program:
  Since maps are created before program verification begins, we
  already have a host-side map fd for each offloaded map when
  verification starts. The map fds in the copy of the program are
  replaced with the host-side fds. This copy of the program is used
  for offloading.
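A minimal userspace sketch of that rewrite step (simplified: lookup_host_fd()
is a hypothetical stand-in for the virtnet_get_bpf_map() + ctrl->map_fd
lookup, and error handling is omitted):

```c
#include <stdint.h>

/* eBPF ISA encoding of ld_imm64, the only instruction that can carry
 * a map fd in its imm field (BPF_LD | BPF_IMM | BPF_DW = 0x18) */
#define BPF_LD	0x00
#define BPF_IMM	0x00
#define BPF_DW	0x18

struct bpf_insn {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

/* hypothetical stand-in for virtnet_get_bpf_map() + ctrl->map_fd */
static int32_t lookup_host_fd(int32_t guest_fd)
{
	return guest_fd + 100;
}

/* walk the program copy and swap guest map fds for host-side fds */
static void replace_map_fds(struct bpf_insn *insns, int len)
{
	int i;

	for (i = 0; i < len; i++) {
		if (insns[i].code != (BPF_LD | BPF_IMM | BPF_DW))
			continue;	/* only ld_imm64 references a map */
		insns[i].imm = lookup_host_fd(insns[i].imm);
	}
}
```

The real driver additionally validates each fd via __bpf_map_get() and its
per-device map list before substituting it.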

Signed-off-by: Jason Wang <[hidden email]>
Co-developed-by: Prashant Bhole <[hidden email]>
Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/virtio_net.c        | 241 ++++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_net.h |  23 +++
 2 files changed, 264 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index dddfbb2a2075..91a94b787c64 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -238,6 +238,7 @@ struct virtnet_info {
  struct bpf_offload_dev *bpf_dev;
 
  struct list_head bpf_bound_progs;
+ struct list_head map_list;
 };
 
 struct padded_vnet_hdr {
@@ -275,6 +276,13 @@ struct virtnet_bpf_bound_prog {
 
 #define VIRTNET_EA(extack, msg) NL_SET_ERR_MSG_MOD((extack), msg)
 
+struct virtnet_bpf_map {
+ struct bpf_offloaded_map *offmap;
+ struct virtnet_info *vi;
+ struct virtio_net_ctrl_ebpf_map *ctrl;
+ struct list_head list;
+};
+
 /* Converting between virtqueue no. and kernel tx/rx queue no.
  * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
  */
@@ -2528,6 +2536,19 @@ static int virtnet_xdp_set(struct net_device *dev, struct netdev_bpf *bpf)
  return err;
 }
 
+static struct virtnet_bpf_map *virtnet_get_bpf_map(struct virtnet_info *vi,
+   struct bpf_map *map)
+{
+ struct virtnet_bpf_map *virtnet_map;
+
+ list_for_each_entry(virtnet_map, &vi->map_list, list) {
+ if (&virtnet_map->offmap->map == map)
+ return virtnet_map;
+ }
+
+ return NULL;
+}
+
 static int virtnet_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx,
    int prev_insn)
 {
@@ -2623,11 +2644,194 @@ static int virtnet_xdp_set_offload(struct virtnet_info *vi,
  return err;
 }
 
+static int virtnet_bpf_ctrl_map(struct bpf_offloaded_map *offmap,
+ int cmd, u8 *key, u8 *value, u64 flags,
+ u8 *out_key, u8 *out_value)
+{
+ struct virtio_net_ctrl_ebpf_map *ctrl;
+ struct virtnet_bpf_map *virtnet_map;
+ struct bpf_map *map = &offmap->map;
+ unsigned char *keyptr, *valptr;
+ struct virtnet_info *vi;
+ struct scatterlist sg;
+
+ virtnet_map = offmap->dev_priv;
+ vi = virtnet_map->vi;
+ ctrl = virtnet_map->ctrl;
+
+ keyptr = ctrl->buf;
+ valptr = ctrl->buf + ctrl->key_size;
+
+ if (key)
+ memcpy(keyptr, key, map->key_size);
+ if (value)
+ memcpy(valptr, value, map->value_size);
+
+ ctrl->cmd = cpu_to_virtio32(vi->vdev, cmd);
+ ctrl->flags = cpu_to_virtio64(vi->vdev, flags);
+
+ sg_init_one(&sg, ctrl, sizeof(*ctrl) + ctrl->buf_len);
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_EBPF,
+  VIRTIO_NET_CTRL_EBPF_MAP,
+  &sg))
+ return -EFAULT;
+
+ if (out_key)
+ memcpy(out_key, valptr, map->key_size);
+ if (out_value)
+ memcpy(out_value, valptr, map->value_size);
+ return 0;
+}
+
+static int virtnet_bpf_map_update_entry(struct bpf_offloaded_map *offmap,
+ void *key, void *value, u64 flags)
+{
+ return virtnet_bpf_ctrl_map(offmap,
+    VIRTIO_NET_BPF_CMD_UPDATE_ELEM,
+    key, value, flags, NULL, NULL);
+}
+
+static int virtnet_bpf_map_delete_elem(struct bpf_offloaded_map *offmap,
+       void *key)
+{
+ return virtnet_bpf_ctrl_map(offmap,
+    VIRTIO_NET_BPF_CMD_DELETE_ELEM,
+    key, NULL, 0, NULL, NULL);
+}
+
+static int virtnet_bpf_map_lookup_entry(struct bpf_offloaded_map *offmap,
+ void *key, void *value)
+{
+ return virtnet_bpf_ctrl_map(offmap,
+    VIRTIO_NET_BPF_CMD_LOOKUP_ELEM,
+    key, NULL, 0, NULL, value);
+}
+
+static int virtnet_bpf_map_get_first_key(struct bpf_offloaded_map *offmap,
+ void *next_key)
+{
+ return virtnet_bpf_ctrl_map(offmap,
+    VIRTIO_NET_BPF_CMD_GET_FIRST,
+    NULL, NULL, 0, next_key, NULL);
+}
+
+static int virtnet_bpf_map_get_next_key(struct bpf_offloaded_map *offmap,
+ void *key, void *next_key)
+{
+ if (!key)
+ return virtnet_bpf_map_get_first_key(offmap, next_key);
+
+ return virtnet_bpf_ctrl_map(offmap,
+    VIRTIO_NET_BPF_CMD_GET_NEXT,
+    key, NULL, 0, next_key, NULL);
+}
+
+static const struct bpf_map_dev_ops virtnet_bpf_map_ops = {
+ .map_get_next_key = virtnet_bpf_map_get_next_key,
+ .map_lookup_elem = virtnet_bpf_map_lookup_entry,
+ .map_update_elem = virtnet_bpf_map_update_entry,
+ .map_delete_elem = virtnet_bpf_map_delete_elem,
+};
+
+static int virtnet_bpf_map_alloc(struct virtnet_info *vi,
+ struct bpf_offloaded_map *offmap)
+{
+ struct virtnet_bpf_map *virtnet_map = NULL;
+ struct virtio_net_ctrl_ebpf_map *ctrl = NULL;
+ struct bpf_map *map = &offmap->map;
+ struct scatterlist sg;
+ int buf_len;
+
+ if (map->map_type != BPF_MAP_TYPE_ARRAY &&
+    map->map_type != BPF_MAP_TYPE_HASH)
+ goto err;
+
+ virtnet_map = kzalloc(sizeof(*virtnet_map), GFP_KERNEL);
+ if (!virtnet_map)
+ goto err;
+
+ /* allocate a buffer large enough to fit
+ * - sizeof(struct virtio_net_ctrl_ebpf_map)
+ * - max(key_size + value_size, key_size + key_size)
+ */
+ buf_len = map->key_size;
+ buf_len += (map->key_size > map->value_size) ?
+  map->key_size : map->value_size;
+ ctrl = kzalloc(sizeof(*ctrl) + buf_len, GFP_KERNEL);
+ if (!ctrl)
+ goto err;
+
+ ctrl->buf_len = cpu_to_virtio32(vi->vdev, buf_len);
+ ctrl->key_size = cpu_to_virtio32(vi->vdev, map->key_size);
+ ctrl->value_size = cpu_to_virtio32(vi->vdev, map->value_size);
+ ctrl->max_entries = cpu_to_virtio32(vi->vdev, map->max_entries);
+ ctrl->map_type = cpu_to_virtio32(vi->vdev, map->map_type);
+ ctrl->map_flags = 0;
+ ctrl->cmd = cpu_to_virtio32(vi->vdev, VIRTIO_NET_BPF_CMD_CREATE_MAP);
+
+ sg_init_one(&sg, ctrl, sizeof(*ctrl) + ctrl->buf_len);
+
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_EBPF,
+  VIRTIO_NET_CTRL_EBPF_MAP,
+  &sg)) {
+ dev_warn(&vi->vdev->dev, "Failed to create ebpf map\n");
+ goto err;
+ }
+
+ offmap->dev_ops = &virtnet_bpf_map_ops;
+ offmap->dev_priv = virtnet_map;
+
+ virtnet_map->offmap = offmap;
+ virtnet_map->vi = vi;
+ virtnet_map->ctrl = ctrl;
+
+ list_add_tail(&virtnet_map->list, &vi->map_list);
+
+ return 0;
+err:
+ kfree(virtnet_map);
+ kfree(ctrl);
+ return -EFAULT;
+}
+
+static int virtnet_bpf_map_free(struct virtnet_info *vi,
+ struct bpf_offloaded_map *offmap)
+{
+ struct bpf_map *map = &offmap->map;
+ struct virtnet_bpf_map *virtnet_map = virtnet_get_bpf_map(vi, map);
+ struct virtio_net_ctrl_ebpf_map *ctrl;
+ struct scatterlist sg;
+
+ if (!virtnet_map)
+ return -EINVAL;
+
+ ctrl = virtnet_map->ctrl;
+
+ ctrl->cmd = cpu_to_virtio32(vi->vdev, VIRTIO_NET_BPF_CMD_FREE_MAP);
+
+ sg_init_one(&sg, ctrl, sizeof(*ctrl) + ctrl->buf_len);
+
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_EBPF,
+  VIRTIO_NET_CTRL_EBPF_MAP,
+  &sg)) {
+ dev_warn(&vi->vdev->dev, "Failed to free ebpf map\n");
+ return -EFAULT;
+ }
+
+ list_del(&virtnet_map->list);
+ kfree(virtnet_map->ctrl);
+ kfree(virtnet_map);
+ return 0;
+}
+
 static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
 {
  struct virtnet_info *vi = netdev_priv(prog->aux->offload->netdev);
  size_t insn_len = prog->len * sizeof(struct bpf_insn);
  struct virtnet_bpf_bound_prog *state;
+ struct virtnet_bpf_map *virtnet_map;
+ struct bpf_map *map;
+ struct fd mapfd;
+ int i, err = 0;
 
  state = kzalloc(sizeof(*state) + insn_len, GFP_KERNEL);
  if (!state)
@@ -2639,11 +2843,43 @@ static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
  state->prog = prog;
  state->len = prog->len;
 
+ for (i = 0; i < state->len; i++) {
+ struct bpf_insn *insn = &state->insnsi[i];
+
+ if (insn->code != (BPF_LD | BPF_IMM | BPF_DW))
+ continue;
+
+ mapfd = fdget(insn->imm);
+ map = __bpf_map_get(mapfd);
+ if (IS_ERR(map)) {
+ pr_warn("fd %d is not pointing to valid bpf_map\n",
+ insn->imm);
+ err = -EINVAL;
+ goto err_replace;
+ }
+
+ virtnet_map = virtnet_get_bpf_map(vi, map);
+ if (!virtnet_map) {
+ pr_warn("could not find an offloaded map for fd %d\n",
+ insn->imm);
+ fdput(mapfd);
+ err = -EINVAL;
+ goto err_replace;
+ }
+
+ /* Note: no need for virtio32_to_cpu */
+ insn->imm = virtnet_map->ctrl->map_fd;
+ fdput(mapfd);
+ }
+
  list_add_tail(&state->list, &vi->bpf_bound_progs);
 
  prog->aux->offload->dev_priv = state;
 
  return 0;
+
+err_replace:
+ kfree(state);
+ return err;
 }
 
 static int virtnet_bpf_verifier_prep(struct bpf_prog *prog)
@@ -2676,6 +2912,10 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
  int err;
 
  switch (xdp->command) {
+ case BPF_OFFLOAD_MAP_ALLOC:
+ return virtnet_bpf_map_alloc(vi, xdp->offmap);
+ case BPF_OFFLOAD_MAP_FREE:
+ return virtnet_bpf_map_free(vi, xdp->offmap);
  case XDP_SETUP_PROG:
  return virtnet_xdp_set(dev, xdp);
  case XDP_QUERY_PROG:
@@ -2747,6 +2987,7 @@ static int virtnet_bpf_init(struct virtnet_info *vi)
  goto err_netdev_register;
 
  INIT_LIST_HEAD(&vi->bpf_bound_progs);
+ INIT_LIST_HEAD(&vi->map_list);
 
  return 0;
 
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 0ea2f7910a5a..1d330a0883ac 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -281,11 +281,34 @@ struct virtio_net_ctrl_ebpf_prog {
  __u8 insns[0];
 };
 
+struct virtio_net_ctrl_ebpf_map {
+ __virtio32 buf_len;
+ __virtio32 cmd;
+ __virtio32 map_type;
+ __virtio32 key_size;
+ __virtio32 value_size;
+ __virtio32 max_entries;
+ __virtio32 map_flags;
+ __virtio32 map_fd;
+ __virtio64 flags;
+ __u8 buf[0];
+};
+
 #define VIRTIO_NET_CTRL_EBPF 6
  #define VIRTIO_NET_CTRL_EBPF_PROG 1
+ #define VIRTIO_NET_CTRL_EBPF_MAP 2
 
 /* Commands for VIRTIO_NET_CTRL_EBPF_PROG */
 #define VIRTIO_NET_BPF_CMD_SET_OFFLOAD 1
 #define VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD 2
 
+/* Commands for VIRTIO_NET_CTRL_EBPF_MAP */
+#define VIRTIO_NET_BPF_CMD_CREATE_MAP 1
+#define VIRTIO_NET_BPF_CMD_FREE_MAP 2
+#define VIRTIO_NET_BPF_CMD_UPDATE_ELEM 3
+#define VIRTIO_NET_BPF_CMD_LOOKUP_ELEM 4
+#define VIRTIO_NET_BPF_CMD_DELETE_ELEM 5
+#define VIRTIO_NET_BPF_CMD_GET_FIRST 6
+#define VIRTIO_NET_BPF_CMD_GET_NEXT 7
+
 #endif /* _UAPI_LINUX_VIRTIO_NET_H */
--
2.20.1



[RFC net-next 18/18] virtio_net: restrict bpf helper calls from offloaded program

Prashant Bhole
In reply to this post by Prashant Bhole
Since we are offloading this program to the host, some of the helper
calls do not make sense there, for example get_numa_node_id. Other
helpers cannot be used because we don't handle them yet.

So let's allow only a small set of helper calls for now.
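A hedged userspace sketch of the whitelist check (the FUNC_* values below
are illustrative stand-ins; the real driver switches on insn->imm against
the BPF_FUNC_* IDs from include/uapi/linux/bpf.h):

```c
#include <errno.h>

/* illustrative helper IDs -- stand-ins for the uapi BPF_FUNC_* enum */
enum mock_bpf_func {
	FUNC_map_lookup_elem = 1,
	FUNC_map_update_elem,
	FUNC_map_delete_elem,
	FUNC_ktime_get_ns,
	FUNC_get_prandom_u32,
	FUNC_get_numa_node_id,	/* meaningless on the host: rejected */
};

/* mirrors virtnet_bpf_check_helper_call(): allow only helpers the
 * host-side offload can service, reject everything else */
static int check_helper_call(int func_id)
{
	switch (func_id) {
	case FUNC_map_lookup_elem:
	case FUNC_map_update_elem:
	case FUNC_map_delete_elem:
	case FUNC_ktime_get_ns:
	case FUNC_get_prandom_u32:
		return 0;
	}
	return -EOPNOTSUPP;
}
```

An explicit whitelist (rather than a blacklist) means newly added kernel
helpers stay rejected until offload support for them is reviewed.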

Signed-off-by: Prashant Bhole <[hidden email]>
---
 drivers/net/virtio_net.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 91a94b787c64..ab5be6b95bbd 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2549,6 +2549,25 @@ static struct virtnet_bpf_map *virtnet_get_bpf_map(struct virtnet_info *vi,
  return NULL;
 }
 
+static int virtnet_bpf_check_helper_call(struct bpf_insn *insn)
+{
+ switch (insn->imm) {
+ case BPF_FUNC_map_lookup_elem:
+ case BPF_FUNC_map_update_elem:
+ case BPF_FUNC_map_delete_elem:
+ case BPF_FUNC_ktime_get_ns:
+ case BPF_FUNC_get_prandom_u32:
+ case BPF_FUNC_csum_update:
+ case BPF_FUNC_xdp_adjust_head:
+ case BPF_FUNC_xdp_adjust_meta:
+ case BPF_FUNC_xdp_adjust_tail:
+ case BPF_FUNC_strtol:
+ case BPF_FUNC_strtoul:
+ return 0;
+ }
+ return -EOPNOTSUPP;
+}
+
 static int virtnet_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx,
    int prev_insn)
 {
@@ -2830,6 +2849,7 @@ static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
  struct virtnet_bpf_bound_prog *state;
  struct virtnet_bpf_map *virtnet_map;
  struct bpf_map *map;
+ u8 opcode, class;
  struct fd mapfd;
  int i, err = 0;
 
@@ -2846,6 +2866,16 @@ static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
  for (i = 0; i < state->len; i++) {
  struct bpf_insn *insn = &state->insnsi[i];
 
+ opcode = BPF_OP(insn->code);
+ class = BPF_CLASS(insn->code);
+
+ if ((class == BPF_JMP || class == BPF_JMP32) &&
+    opcode == BPF_CALL && insn->src_reg != BPF_PSEUDO_CALL) {
+ if (virtnet_bpf_check_helper_call(insn)) {
+ err = -EOPNOTSUPP;
+ goto err_replace;
+ }
+ continue;
+ }
+
  if (insn->code != (BPF_LD | BPF_IMM | BPF_DW))
  continue;
 
--
2.20.1



Re: [RFC net-next 00/18] virtio_net XDP offload

Jakub Kicinski
In reply to this post by Prashant Bhole
On Tue, 26 Nov 2019 19:07:26 +0900, Prashant Bhole wrote:

> Note: This RFC has been sent to netdev as well as qemu-devel lists
>
> This series introduces XDP offloading from virtio_net. It is based on
> the following work by Jason Wang:
> https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net
>
> Current XDP performance in virtio-net is far from what we can achieve
> on host. Several major factors cause the difference:
> - Cost of virtualization
> - Cost of virtio (populating virtqueue and context switching)
> - Cost of vhost, it needs more optimization
> - Cost of data copy
> Because of above reasons there is a need of offloading XDP program to
> host. This set is an attempt to implement XDP offload from the guest.

This turns the guest kernel into a uAPI proxy.

BPF uAPI calls related to the "offloaded" BPF objects are forwarded
to the hypervisor; they pop up in QEMU, which makes the requested call
to the hypervisor kernel. Today it's the Linux kernel; tomorrow it may
be someone's proprietary "SmartNIC" implementation.

Why can't those calls be forwarded at the higher layer? Why do they
have to go through the guest kernel?

If the kernel performs no significant work (or "adds value", pardon the
expression), and the problem can easily be solved otherwise, we shouldn't
take on the work of maintaining the mechanism.

The approach of kernel generating actual machine code which is then
loaded into a sandbox on the hypervisor/SmartNIC is another story.

I'd appreciate if others could chime in.
