.. _virtio-net:

Virtio-Net
##########

Virtio-net is the para-virtualization solution used in ACRN for
networking. The ACRN Device Model emulates virtual NICs for the User VM,
and the frontend virtio network driver in the User VM drives the virtual
NIC following the virtio specification. (Refer to :ref:`introduction` and
:ref:`virtio-hld` for background introductions to ACRN and virtio.)

Here are some notes about virtio-net support in ACRN:

- Legacy devices are supported; modern devices are not
- Two virtqueues are used in virtio-net: an RX queue and a TX queue
- Indirect descriptors are supported
- The TAP backend is supported
- The control queue is not supported
- Multiple queues per NIC (multi-queue) are not supported

Network Virtualization Architecture
***********************************

ACRN's network virtualization architecture is shown in
:numref:`net-virt-arch`. It illustrates the network virtualization
components that must cooperate for the User VM to send data to, and
receive data from, the outside world.

.. figure:: images/network-virt-arch.png
   :align: center
   :width: 900px
   :name: net-virt-arch

   Network Virtualization Architecture

(The green components are parts of the ACRN solution, while the gray
components are parts of the Linux kernel.)

Let's explore these components further.

Service VM/User VM Network Stack:
   This is the standard Linux TCP/IP stack and the most
   feature-rich TCP/IP implementation.

virtio-net Frontend Driver:
   This is the standard driver in the Linux kernel for virtual Ethernet
   devices. This driver matches devices with PCI vendor ID 0x1AF4 and PCI
   device ID 0x1000 (for legacy devices in our case) or 0x1041 (for modern
   devices). The virtual NIC supports two virtqueues, one for transmitting
   packets and the other for receiving packets. The frontend driver places
   empty buffers into one virtqueue for receiving packets, and enqueues
   outgoing packets into another virtqueue for transmission. The size of
   each virtqueue is 1024, configurable in the virtio-net backend driver.

ACRN Hypervisor:
   The ACRN hypervisor is a type 1 hypervisor, running directly on the
   bare-metal hardware, and suitable for a variety of IoT and embedded
   device solutions. It fetches and analyzes the guest instructions, puts
   the decoded information into the shared page as an IOREQ, and notifies
   or interrupts the HSM module in the Service VM for processing.

HSM Module:
   The Hypervisor Service Module (HSM) is a kernel module in the
   Service VM acting as a middle layer to support the Device Model
   and the hypervisor. The HSM forwards an IOREQ to the virtio-net backend
   driver for processing.

ACRN Device Model and virtio-net Backend Driver:
   The ACRN Device Model (DM) gets an IOREQ from a shared page and calls
   the virtio-net backend driver to process the request. The backend driver
   receives the data in a shared virtqueue and sends it to the TAP device.

Bridge and TAP Device:
   Bridge and TAP are standard virtual network infrastructures. They play
   an important role in communication among the Service VM, the User VM,
   and the outside world.

IGB Driver:
   IGB is the physical Network Interface Card (NIC) Linux kernel driver
   responsible for sending data to and receiving data from the physical
   NIC.

The virtual network card (NIC) is implemented as a virtio legacy device
in the ACRN Device Model (DM). It is registered as a PCI virtio device
to the guest OS (User VM) and is driven by the standard virtio-net driver
in the Linux kernel (the guest kernel should be built with
``CONFIG_VIRTIO_NET=y``).

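Inside the User VM, you can confirm that the virtual NIC was enumerated as
a legacy virtio network device by checking its PCI IDs. The output below is
illustrative only; the slot number depends on the acrn-dm command line, and
``/proc/config.gz`` is available only if the guest kernel was built with
``CONFIG_IKCONFIG_PROC``:

.. code-block:: none

   $ lspci -nn | grep -i virtio
   00:04.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]

   $ zcat /proc/config.gz | grep CONFIG_VIRTIO_NET
   CONFIG_VIRTIO_NET=y
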
The virtio-net backend in the DM forwards the data received from the
frontend to the TAP device, then from the TAP device to the bridge, and
finally from the bridge to the physical NIC driver, and vice versa for
returning data from the NIC to the frontend.

ACRN Virtio-Network Calling Stack
*********************************

The components of ACRN network virtualization are shown in the
architecture diagram in :numref:`net-virt-arch`. In this section,
we use User VM data transmission (TX) and reception (RX) examples to
explain step by step how these components work together to implement
ACRN network virtualization.

Initialization in Device Model
==============================

**virtio_net_init**

- Present a virtual PCI-based NIC to the frontend
- Set up control plane callbacks
- Set up data plane callbacks, including TX and RX
- Set up the TAP backend (see the sketch after this list)

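As an illustration of the "Set up the TAP backend" step, a TAP file
descriptor is typically obtained through the standard Linux TUN/TAP API as
sketched below. This is a minimal, generic sketch, not the exact ACRN DM
implementation, which adds error reporting and further configuration:

.. code-block:: c

   #include <fcntl.h>
   #include <string.h>
   #include <unistd.h>
   #include <sys/ioctl.h>
   #include <linux/if.h>
   #include <linux/if_tun.h>

   /* Open the named TAP device (e.g. "tap0") and return its fd, or -1 on
    * failure. IFF_NO_PI asks the kernel not to prepend packet-information
    * headers to each frame. */
   static int tap_open(const char *name)
   {
       struct ifreq ifr;
       int fd = open("/dev/net/tun", O_RDWR);

       if (fd < 0)
           return -1;

       memset(&ifr, 0, sizeof(ifr));
       ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
       strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

       if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
           close(fd);
           return -1;
       }
       return fd;
   }
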
Initialization in Virtio-Net Frontend Driver
============================================

**virtio_pci_probe**

- Construct the virtio device from the virtual PCI device and register it
  with the virtio bus

**virtio_dev_probe --> virtnet_probe --> init_vqs**

- Register the network driver
- Set up the shared virtqueues

ACRN User VM TX FLOW
====================

The following shows the ACRN User VM network TX flow, using TCP as an
example, through each layer:

**User VM TCP Layer**

.. code-block:: c

   tcp_sendmsg -->
       tcp_sendmsg_locked -->
           tcp_push_one -->
               tcp_write_xmit -->
                   tcp_transmit_skb -->

**User VM IP Layer**

.. code-block:: c

   ip_queue_xmit -->
       ip_local_out -->
           __ip_local_out -->
               dst_output -->
                   ip_output -->
                       ip_finish_output -->
                           ip_finish_output2 -->
                               neigh_output -->
                                   neigh_resolve_output -->

**User VM MAC Layer**

.. code-block:: c

   dev_queue_xmit -->
       __dev_queue_xmit -->
           dev_hard_start_xmit -->
               xmit_one -->
                   netdev_start_xmit -->
                       __netdev_start_xmit -->


**User VM MAC Layer virtio-net Frontend Driver**

.. code-block:: c

   start_xmit -->                   // virtual NIC driver xmit in virtio_net
       xmit_skb -->
           virtqueue_add_outbuf --> // add out buffer to shared virtqueue
               virtqueue_add -->

       virtqueue_kick -->           // notify the backend
           virtqueue_notify -->
               vp_notify -->
                   iowrite16 -->    // trap here, HV will first get notified

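For reference, in the Linux legacy virtio-PCI transport the kick in
``vp_notify`` boils down to a single 16-bit port write of the queue index
to the device's notify register; that port I/O access is exactly what the
hypervisor traps in the next step. A simplified sketch (illustrative, based
on the upstream kernel implementation):

.. code-block:: c

   /* Simplified: vq->priv points at the device's QUEUE_NOTIFY register.
    * Writing the queue index there causes a port-I/O VM exit, which the
    * hypervisor handles in pio_instr_vmexit_handler() below. */
   static bool vp_notify(struct virtqueue *vq)
   {
       /* write the queue's selector into the notification register */
       iowrite16(vq->index, (void __iomem *)vq->priv);
       return true;
   }
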
**ACRN Hypervisor**

.. code-block:: c

   vmexit_handler -->                      // vmexit because VMX_EXIT_REASON_IO_INSTRUCTION
       pio_instr_vmexit_handler -->
           emulate_io -->                  // ioreq can't be processed in HV, forward it to HSM
               acrn_insert_request_wait -->
                   fire_hsm_interrupt -->  // interrupt Service VM, HSM will get notified

**HSM Module**

.. code-block:: c

   hsm_intr_handler -->                          // HSM interrupt handler
       tasklet_schedule -->
           io_req_tasklet -->
               acrn_ioreq_distribute_request --> // ioreq can't be processed in HSM, forward it to the DM
                   acrn_ioreq_notify_client -->
                       wake_up_interruptible --> // wake up DM to handle ioreq

**ACRN Device Model / virtio-net Backend Driver**

.. code-block:: c

   handle_vmexit -->
       vmexit_inout -->
           emulate_inout -->
               pci_emul_io_handler -->
                   virtio_pci_write -->
                       virtio_pci_legacy_write -->
                           virtio_net_ping_txq -->       // wake up the TX thread, then the notify returns
                               virtio_net_tx_thread -->  // this is the TX thread
                                   virtio_net_proctx --> // call the corresponding backend (tap) to process
                                       virtio_net_tap_tx -->
                                           writev -->    // write data to the tap device

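The final ``writev`` amounts to handing the guest's TX buffers to the TAP
file descriptor as one Ethernet frame. A minimal sketch, not the exact
ACRN DM code, assuming the descriptor chain has already been gathered into
an ``iovec`` array and that the TAP fd was opened without ``IFF_VNET_HDR``
(so the leading virtio-net header must be skipped):

.. code-block:: c

   #include <sys/uio.h>

   /* Illustrative only: send one guest TX frame to the TAP device.
    * 'iov' holds the buffers pulled from the TX virtqueue; the first
    * vnet_hdr_len bytes are the virtio-net header, which a TAP fd opened
    * without IFF_VNET_HDR does not expect. */
   static ssize_t tap_tx_frame(int tapfd, struct iovec *iov, int iovcnt,
                               size_t vnet_hdr_len)
   {
       if (iovcnt > 0 && iov[0].iov_len >= vnet_hdr_len) {
           iov[0].iov_base = (char *)iov[0].iov_base + vnet_hdr_len;
           iov[0].iov_len -= vnet_hdr_len;
       }
       return writev(tapfd, iov, iovcnt);
   }
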
**Service VM TAP Device Forwarding**

.. code-block:: c

   do_writev -->
       vfs_writev -->
           do_iter_write -->
               do_iter_readv_writev -->
                   call_write_iter -->
                       tun_chr_write_iter -->
                           tun_get_user -->
                               netif_receive_skb -->
                                   netif_receive_skb_internal -->
                                       __netif_receive_skb -->
                                           __netif_receive_skb_core -->


**Service VM Bridge Forwarding**

.. code-block:: c

   br_handle_frame -->
       br_handle_frame_finish -->
           br_forward -->
               __br_forward -->
                   br_forward_finish -->
                       br_dev_queue_push_xmit -->

**Service VM MAC Layer**

.. code-block:: c

   dev_queue_xmit -->
       __dev_queue_xmit -->
           dev_hard_start_xmit -->
               xmit_one -->
                   netdev_start_xmit -->
                       __netdev_start_xmit -->


**Service VM MAC Layer IGB Driver**

.. code-block:: c

   igb_xmit_frame --> // IGB physical NIC driver xmit function

ACRN User VM RX FLOW
====================

The following shows the ACRN User VM network RX flow, using TCP as an example.
Let's start from the device interrupt. (Note that the hypervisor
is notified first when an interrupt arrives, even in passthrough
cases.)

**Hypervisor Interrupt Dispatch**

.. code-block:: c

   vmexit_handler -->                          // vmexit because VMX_EXIT_REASON_EXTERNAL_INTERRUPT
       external_interrupt_vmexit_handler -->
           dispatch_interrupt -->
               common_handler_edge -->
                   ptdev_interrupt_handler -->
                       ptdev_enqueue_softirq --> // Interrupt will be delivered in bottom-half softirq


**Hypervisor Interrupt Injection**

.. code-block:: c

   do_softirq -->
       ptdev_softirq -->
           vlapic_intr_msi -->     // insert the interrupt into the Service VM

   start_vcpu -->                  // VM entry here, will process the pending interrupts

**Service VM MAC Layer IGB Driver**

.. code-block:: c

   do_IRQ -->
       ...
       igb_msix_ring -->
           igb_poll -->
               napi_gro_receive -->
                   napi_skb_finish -->
                       netif_receive_skb_internal -->
                           __netif_receive_skb -->
                               __netif_receive_skb_core -->

**Service VM Bridge Forwarding**

.. code-block:: c

   br_handle_frame -->
       br_handle_frame_finish -->
           br_forward -->
               __br_forward -->
                   br_forward_finish -->
                       br_dev_queue_push_xmit -->

**Service VM MAC Layer**

.. code-block:: c

   dev_queue_xmit -->
       __dev_queue_xmit -->
           dev_hard_start_xmit -->
               xmit_one -->
                   netdev_start_xmit -->
                       __netdev_start_xmit -->

**Service VM MAC Layer TAP Driver**

.. code-block:: c

   tun_net_xmit --> // Notify and wake up reader process

**ACRN Device Model / virtio-net Backend Driver**

.. code-block:: c

   virtio_net_rx_callback -->       // the tap fd gets notified and this function is invoked
       virtio_net_tap_rx -->        // read data from tap, prepare virtqueue, insert interrupt into the User VM
           vq_endchains -->
               vq_interrupt -->
                   pci_generate_msi -->

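``virtio_net_rx_callback`` runs because the Device Model watches the TAP fd
in its event loop (ACRN DM uses its own mevent mechanism for this). The
pattern is sketched generically below with ``epoll`` and ``readv``; this is
an illustration of the approach, not the ACRN implementation:

.. code-block:: c

   #include <sys/epoll.h>
   #include <sys/uio.h>

   /* Generic illustration: watch the TAP fd and, when it becomes readable,
    * pull one frame into buffers taken from the guest's RX virtqueue. The
    * real backend would then mark the chain used and inject an interrupt
    * (vq_endchains/vq_interrupt in the call chain above). */
   static void rx_loop(int tapfd, struct iovec *guest_iov, int iovcnt)
   {
       struct epoll_event ev = { .events = EPOLLIN, .data.fd = tapfd };
       int epfd = epoll_create1(0);

       epoll_ctl(epfd, EPOLL_CTL_ADD, tapfd, &ev);
       for (;;) {
           struct epoll_event ready;

           if (epoll_wait(epfd, &ready, 1, -1) <= 0)
               continue;
           readv(tapfd, guest_iov, iovcnt);   /* one Ethernet frame per read */
       }
   }
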
**HSM Module**

.. code-block:: c

   hsm_dev_ioctl -->                // process the IOCTL and call hypercall to inject interrupt
       hcall_inject_msi -->

**ACRN Hypervisor**

.. code-block:: c

   vmexit_handler -->               // vmexit because VMX_EXIT_REASON_VMCALL
       vmcall_vmexit_handler -->
           hcall_inject_msi -->     // insert interrupt into User VM
               vlapic_intr_msi -->

**User VM MAC Layer virtio-net Frontend Driver**

.. code-block:: c

   vring_interrupt -->              // virtio-net frontend driver interrupt handler
       skb_recv_done -->            // registered by virtnet_probe-->init_vqs-->virtnet_find_vqs
           virtqueue_napi_schedule -->
               __napi_schedule -->
                   virtnet_poll -->
                       virtnet_receive -->
                           receive_buf -->

**User VM MAC Layer**

.. code-block:: c

   napi_gro_receive -->
       napi_skb_finish -->
           netif_receive_skb_internal -->
               __netif_receive_skb -->
                   __netif_receive_skb_core -->

**User VM IP Layer**

.. code-block:: c

   ip_rcv -->
       ip_rcv_finish -->
           dst_input -->
               ip_local_deliver -->
                   ip_local_deliver_finish -->


**User VM TCP Layer**

.. code-block:: c

   tcp_v4_rcv -->
       tcp_v4_do_rcv -->
           tcp_rcv_established -->
               tcp_data_queue -->
                   tcp_queue_rcv -->
                       __skb_queue_tail -->

                   sk->sk_data_ready --> // application will get notified

How to Use TAP Interface
========================

The network infrastructure shown in :numref:`net-virt-infra` needs to be
prepared in the Service VM before we start. We need to create a bridge and at
least one TAP device (two TAP devices are needed to create a dual
virtual NIC) and attach both the physical NIC and the TAP device to the bridge.

.. figure:: images/network-virt-service-vm-infrastruct.png
   :align: center
   :width: 900px
   :name: net-virt-infra

   Network Infrastructure in Service VM

You can use Linux commands (e.g., ``ip`` and ``brctl``) to create this
network. In our case, systemd creates the network automatically by default.
You can check the files with the ``50-`` prefix in the Service VM under
``/usr/lib/systemd/network/``:

- :acrn_raw:`50-acrn.netdev <misc/services/acrn_bridge/acrn.netdev>`
- :acrn_raw:`50-acrn.network <misc/services/acrn_bridge/acrn.network>`
- :acrn_raw:`50-tap0.netdev <misc/services/acrn_bridge/tap0.netdev>`
- :acrn_raw:`50-eth.network <misc/services/acrn_bridge/eth.network>`

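If you prefer to create the same infrastructure manually (for example, on a
system without these systemd files), an equivalent sequence of ``ip``
commands looks like the following. Here ``enp3s0`` is an example physical
NIC name; adjust it for your system:

.. code-block:: none

   sudo ip link add name acrn-br0 type bridge
   sudo ip tuntap add dev tap0 mode tap
   sudo ip link set dev tap0 master acrn-br0
   sudo ip link set dev enp3s0 master acrn-br0
   sudo ip link set dev acrn-br0 up
   sudo ip link set dev tap0 up
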
When the Service VM is started, run ``ifconfig`` to show the devices created by
this systemd configuration:

.. code-block:: none

   acrn-br0 Link encap:Ethernet HWaddr B2:50:41:FE:F7:A3
      inet addr:10.239.154.43 Bcast:10.239.154.255 Mask:255.255.255.0
      inet6 addr: fe80::b050:41ff:fefe:f7a3/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
      RX packets:226932 errors:0 dropped:21383 overruns:0 frame:0
      TX packets:14816 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:100457754 (95.8 Mb) TX bytes:83481244 (79.6 Mb)

   tap0 Link encap:Ethernet HWaddr F6:A7:7E:52:50:C6
      UP BROADCAST MULTICAST MTU:1500 Metric:1
      RX packets:0 errors:0 dropped:0 overruns:0 frame:0
      TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

   enp3s0 Link encap:Ethernet HWaddr 98:4F:EE:14:5B:74
      inet6 addr: fe80::9a4f:eeff:fe14:5b74/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
      RX packets:279174 errors:0 dropped:0 overruns:0 frame:0
      TX packets:69923 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:107312294 (102.3 Mb) TX bytes:87117507 (83.0 Mb)
      Memory:82200000-8227ffff

   lo Link encap:Local Loopback
      inet addr:127.0.0.1 Mask:255.0.0.0
      inet6 addr: ::1/128 Scope:Host
      UP LOOPBACK RUNNING MTU:65536 Metric:1
      RX packets:16 errors:0 dropped:0 overruns:0 frame:0
      TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:1216 (1.1 Kb) TX bytes:1216 (1.1 Kb)

Run ``brctl show`` to see the bridge ``acrn-br0`` and attached devices:

.. code-block:: none

   bridge name   bridge id           STP enabled   interfaces
   acrn-br0      8000.b25041fef7a3   no            tap0
                                                   enp3s0

Add a PCI slot to the Device Model acrn-dm command line (the MAC address is
optional):

.. code-block:: none

    -s 4,virtio-net,tap=<name>,[mac=<XX:XX:XX:XX:XX:XX>]

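For example, to attach the ``tap0`` device created above with a fixed MAC
address (the MAC value here is only illustrative):

.. code-block:: none

    -s 4,virtio-net,tap=tap0,mac=00:16:3E:39:0F:CD
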
When the User VM is launched, run ``ifconfig`` to check the network. ``enp0s4``
is the virtual NIC created by acrn-dm:

.. code-block:: none

   enp0s4 Link encap:Ethernet HWaddr 00:16:3E:39:0F:CD
      inet addr:10.239.154.186 Bcast:10.239.154.255 Mask:255.255.255.0
      inet6 addr: fe80::216:3eff:fe39:fcd/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
      RX packets:140 errors:0 dropped:8 overruns:0 frame:0
      TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:110727 (108.1 Kb) TX bytes:4474 (4.3 Kb)

   lo Link encap:Local Loopback
      inet addr:127.0.0.1 Mask:255.0.0.0
      inet6 addr: ::1/128 Scope:Host
      UP LOOPBACK RUNNING MTU:65536 Metric:1
      RX packets:0 errors:0 dropped:0 overruns:0 frame:0
      TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

How to Use MacVTap Interface
============================

In addition to the TAP interface, ACRN also supports the MacVTap interface.
MacVTap replaces the combination of the TAP and bridge drivers with a
single module based on the MacVLan driver. With MacVTap, each virtual
network interface is assigned its own MAC and IP address and is attached
directly to the physical interface of the host machine, which improves
throughput and latency.

Create a MacVTap interface in the Service VM as shown here:

.. code-block:: none

   sudo ip link add link eth0 name macvtap0 type macvtap

where ``eth0`` is the name of the physical network interface, and
``macvtap0`` is the name of the MacVTap interface being created.

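Bring the new interface up. On most systems, the character device that
acrn-dm will open appears as ``/dev/tapN``, where ``N`` is the interface
index of ``macvtap0``; you can locate it as shown below:

.. code-block:: none

   sudo ip link set macvtap0 up
   ls -l /dev/tap$(cat /sys/class/net/macvtap0/ifindex)
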
Once the MacVTap interface is created, the User VM can be launched by adding
a PCI slot to the Device Model acrn-dm command line as shown below.

.. code-block:: none

   -s 4,virtio-net,tap=macvtap0,[mac=<XX:XX:XX:XX:XX:XX>]

Performance Estimation
======================

We have introduced the network virtualization solution in ACRN, from the
top-level architecture down to the detailed TX and RX flows. Both the
control plane and the data plane are processed in the ACRN Device Model,
which introduces some overhead. This is not a bottleneck for NICs at
1000 Mbit/s or below, and virtualized network bandwidth can be very close
to native bandwidth. For a high-speed NIC (for example, 10 Gb or above), it
is necessary to separate the data plane from the control plane; vhost can
be used for that acceleration. For most IoT scenarios, processing in user
space is simple and sufficient.

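To get a rough estimate of virtual NIC throughput on your own setup, a
simple ``iperf3`` run between the User VM and another machine on the same
network is usually enough. The tool availability and addresses below are
assumptions for this example:

.. code-block:: none

   # on the remote machine (server side)
   iperf3 -s

   # in the User VM (client side)
   iperf3 -c <server-ip> -t 30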