[QA-virtio]:Why vring size is limited to 1024?


[QA-virtio]:Why vring size is limited to 1024?

Zhangjie (HZ)
Hi,
There is packet loss when we do packet forwarding in a VM,
especially when we use dpdk to do the forwarding. Enlarging the vring
can alleviate the problem. But vring size is currently limited to 1024, as follows:
VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                            void (*handle_output)(VirtIODevice *, VirtQueue *))
{
    ...
    if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
        abort();
}
ps: #define VIRTQUEUE_MAX_SIZE 1024
I deleted the check and set the vring size to 2048:
the VM starts successfully, and the network is ok too.
So, why is vring size limited to 1024, and what is the influence?

Thanks!
--
Best Wishes!
Zhang Jie



Re: [QA-virtio]:Why vring size is limited to 1024?

Michael S. Tsirkin-4
On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
> Hi,
> There is packet loss when we do packet forwarding in a VM,
> especially when we use dpdk to do the forwarding. Enlarging the vring
> can alleviate the problem.

I think this has to do with the fact that dpdk disables
checksum offloading, this has the side effect of disabling
segmentation offloading.

Please fix dpdk to support checksum offloading, and
I think the problem will go away.


> But now vring size is limited to 1024 [...]
> So, why is vring size limited to 1024, and what is the influence?

There are several reasons for this limit.
First, the guest has to allocate the descriptor buffer, which is
16 * vring size bytes. With a 1K ring that is already 16K, which might
be tricky to allocate contiguously if memory is fragmented when the
device is added by hotplug.
The second issue is that we want to be able to implement
the device on top of the linux kernel, and
a single descriptor chain might use all of
the virtqueue. In that case we won't be able to pass the
descriptor directly to linux as a single iov, since
that is limited to 1K entries.



Re: [QA-virtio]:Why vring size is limited to 1024?

Zhangjie (HZ)
Thanks for your patient answer! :-)

On 2014/9/30 17:33, Michael S. Tsirkin wrote:

> I think this has to do with the fact that dpdk disables
> checksum offloading, this has the side effect of disabling
> segmentation offloading.
>
> Please fix dpdk to support checksum offloading, and
> I think the problem will go away.
In some application scenarios, loss of UDP packets is not allowed,
and the UDP packets are always shorter than the MTU.
So we need to support high-pps (e.g. 0.3M packets/s) forwarding, and
offloading cannot fix that.

> There are several reasons for this limit.
> First, the guest has to allocate the descriptor buffer, which is
> 16 * vring size bytes. With a 1K ring that is already 16K, which might
> be tricky to allocate contiguously if memory is fragmented when the
> device is added by hotplug.
That is very
> The second issue is that we want to be able to implement
> the device on top of the linux kernel, and
> a single descriptor chain might use all of
> the virtqueue. In that case we won't be able to pass the
> descriptor directly to linux as a single iov, since
> that is limited to 1K entries.
For the second issue, I wonder if it is ok to set the vring size of virtio-net to larger than 1024:
for networking there are at most 18 pages in an skb, so it will not exceed the iov limit.

--
Best Wishes!
Zhang Jie



Re: [QA-virtio]:Why vring size is limited to 1024?

Michael S. Tsirkin-4
On Wed, Oct 08, 2014 at 03:17:56PM +0800, Zhangjie (HZ) wrote:

> In some application scenarios, loss of UDP packets is not allowed,
> and the UDP packets are always shorter than the MTU.
> So we need to support high-pps (e.g. 0.3M packets/s) forwarding, and
> offloading cannot fix that.

That's the point. With UFO you get larger-than-MTU UDP packets:
http://www.linuxfoundation.org/collaborate/workgroups/networking/ufo

Additionally, checksum offloading reduces CPU utilization
and reduces the number of data copies, allowing higher pps
with smaller buffers.

It might look like queue depth helps performance for netperf, but in
real-life workloads the latency under load will suffer; with more
protocols implementing tunnelling on top of UDP, such extreme bufferbloat
will not be tolerated.



Re: [QA-virtio]:Why vring size is limited to 1024?

Avi Kivity-2
In reply to this post by Michael S. Tsirkin-4

On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
> a single descriptor might use all of
> the virtqueue. In this case we wont to be able to pass the
> descriptor directly to linux as a single iov, since
>

You could separate maximum request scatter/gather list size from the
virtqueue size.  They are totally unrelated - even now you can have a
larger request by using indirect descriptors.



Re: [QA-virtio]:Why vring size is limited to 1024?

Zhangjie (HZ)
In reply to this post by Michael S. Tsirkin-4
MST, Thanks very much, I get it.

On 2014/10/8 15:37, Michael S. Tsirkin wrote:

> That's the point. With UFO you get larger than MTU UDP packets:
> http://www.linuxfoundation.org/collaborate/workgroups/networking/ufo
The VM only does forwarding, and does not create packets itself.
As we cannot GRO normal UDP packets, when UDP packets come from the host NIC, UFO cannot help.


--
Best Wishes!
Zhang Jie



Re: [QA-virtio]:Why vring size is limited to 1024?

Zhangjie (HZ)
In reply to this post by Avi Kivity-2


On 2014/10/8 15:43, Avi Kivity wrote:
> You could separate maximum request scatter/gather list size from the virtqueue size.  They are totally unrelated - even now you can have a larger request by using indirect descriptors.
Yes, from the code there is no strong correlation between virtqueue size and the iov limit.
--
Best Wishes!
Zhang Jie



Re: [QA-virtio]:Why vring size is limited to 1024?

Michael S. Tsirkin-4
In reply to this post by Zhangjie (HZ)
On Wed, Oct 08, 2014 at 04:07:47PM +0800, Zhangjie (HZ) wrote:

> The VM only does forwarding, and does not create packets itself.
> As we cannot GRO normal UDP packets, when UDP packets come from the host NIC, UFO cannot help.

This is something I've been thinking about for a while now.
We really should add a GRO-like path for UDP; it isn't
too different from the TCP case.

LRO can often work with UDP too, but linux discards too much
info on LRO; if you are doing drivers in userspace, though,
you might be able to support this.



Re: [QA-virtio]:Why vring size is limited to 1024?

Michael S. Tsirkin-4
In reply to this post by Avi Kivity-2
On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:

>
> On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
> >a single descriptor might use all of
> >the virtqueue. In this case we wont to be able to pass the
> >descriptor directly to linux as a single iov, since
> >
>
> You could separate maximum request scatter/gather list size from the
> virtqueue size.  They are totally unrelated - even now you can have a larger
> request by using indirect descriptors.

We could add a feature to have a smaller or larger S/G length limit.
Is this something useful?

--
MST


Re: [QA-virtio]:Why vring size is limited to 1024?

Avi Kivity-2

On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:

> We could add a feature to have a smaller or larger S/G length limit.
> Is this something useful?
>

Having a larger ring size is useful, especially with zero-copy transmit, and
you would need the sglist length limit in order not to require
linearization on linux hosts.  So the limit is not useful in itself,
only indirectly.

Google Compute Engine exposes virtio ring sizes of 16384.

Even more useful is getting rid of the desc array and instead passing
descs inline in avail and used.



Re: [QA-virtio]:Why vring size is limited to 1024?

Michael S. Tsirkin-4
On Wed, Oct 08, 2014 at 12:51:21PM +0300, Avi Kivity wrote:

> Having a larger ring size is useful, esp. with zero-copy transmit, and you
> would need the sglist length limit in order to not require linearization on
> linux hosts.  So the limit is not useful in itself, only indirectly.
>
> Google Compute Engine exposes virtio ring sizes of 16384.

OK this sounds useful, I'll queue this up for consideration.
Thanks!

> Even more useful is getting rid of the desc array and instead passing descs
> inline in avail and used.

You expect this to improve performance?
Quite possibly but this will have to be demonstrated.

--
MST


Re: [QA-virtio]:Why vring size is limited to 1024?

Avi Kivity-2

On 10/08/2014 01:14 PM, Michael S. Tsirkin wrote:

> OK this sounds useful, I'll queue this up for consideration.
> Thanks!

Thanks.

>> Even more useful is getting rid of the desc array and instead passing descs
>> inline in avail and used.
> You expect this to improve performance?
> Quite possibly but this will have to be demonstrated.
>

The top vhost function in small-packet workloads is vhost_get_vq_desc,
and the top instruction within it (50%) is the one that reads the
first 8 bytes of desc.  It's a guaranteed cache-line miss (and again on
the guest side when it's time to reuse the descriptor).

Inline descriptors will amortize the cache miss over 4 descriptors, and
will allow the hardware to prefetch, since the descriptors are linear in
memory.




Re: [QA-virtio]:Why vring size is limited to 1024?

Michael S. Tsirkin-4
On Wed, Oct 08, 2014 at 01:37:25PM +0300, Avi Kivity wrote:

> The top vhost function in small packet workloads is vhost_get_vq_desc, and
> the top instruction within that (50%) is the one that reads the first 8
> bytes of desc.  It's a guaranteed cache line miss (and again on the guest
> side when it's time to reuse).

OK, so basically what you are pointing out is that we get 5 accesses:
read of the available head, read of the available ring, read of the
descriptor, write of the used ring, write of the used ring head.

If processing is in-order, we could build a much simpler design, with a
valid bit in the descriptor, cleared by the host as descriptors are
consumed.

Basically, get rid of both the used and available rings.

Sounds good in theory.

> Inline descriptors will amortize the cache miss over 4 descriptors, and will
> allow the hardware to prefetch, since the descriptors are linear in memory.

If descriptors are used in order (as they are with current qemu)
then aren't they amortized already?

--
MST


Re: [QA-virtio]:Why vring size is limited to 1024?

Avi Kivity-2

On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:

>>>> Even more useful is getting rid of the desc array and instead passing descs
>>>> inline in avail and used.
>>> You expect this to improve performance?
>>> Quite possibly but this will have to be demonstrated.
>>>
>> The top vhost function in small packet workloads is vhost_get_vq_desc, and
>> the top instruction within that (50%) is the one that reads the first 8
>> bytes of desc.  It's a guaranteed cache line miss (and again on the guest
>> side when it's time to reuse).
> OK so basically what you are pointing out is that we get 5 accesses:
> read of available head, read of available ring, read of descriptor,
> write of used ring, write of used ring head.

Right.  And only read of descriptor is not amortized.

> If processing is in-order, we could build a much simpler design, with a
> valid bit in the descriptor, cleared by host as descriptors are
> consumed.
>
> Basically get rid of both used and available ring.

That only works if you don't allow reordering, which is never the case
for block, and not the case for zero-copy net.  It also has writers on
both sides of the ring.

The right design is to keep avail and used, but instead of making them
rings of pointers to descs, make them rings of descs.

The host reads descs from avail, processes them, then writes them back
on used (possibly out-of-order).  The guest writes descs to avail and
reads them back from used.

You'll probably have to add a 64-bit cookie to desc so you can complete
without an additional lookup.




Re: [QA-virtio]:Why vring size is limited to 1024?

Avi Kivity-2
In reply to this post by Michael S. Tsirkin-4

On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
>
>> Inline descriptors will amortize the cache miss over 4 descriptors, and will
>> allow the hardware to prefetch, since the descriptors are linear in memory.
> If descriptors are used in order (as they are with current qemu)
> then aren't they amortized already?
>

The descriptors are only in-order for non-zero-copy net.  They are out
of order for block and zero-copy net.

(also, the guest has to be careful in how it allocates descriptors).


Re: [QA-virtio]:Why vring size is limited to 1024?

Michael S. Tsirkin-4
In reply to this post by Avi Kivity-2
On Wed, Oct 08, 2014 at 01:59:13PM +0300, Avi Kivity wrote:

> You'll probably have to add a 64-bit cookie to desc so you can complete
> without an additional lookup.

My old presentation from 2012 or so suggested something like this.
We don't need a 64-bit cookie, I think - a small 16-bit one
should be enough.



Re: [QA-virtio]:Why vring size is limited to 1024?

Avi Kivity-2

On 10/08/2014 03:22 PM, Michael S. Tsirkin wrote:

> My old presentation from 2012 or so suggested something like this.
> We don't need a 64 bit cookie I think - a small 16 bit one
> should be enough.
>

A 16-bit cookie means you need an extra table to hold the real request
pointers.

With a 64-bit cookie you can store a pointer to the skbuff or bio in the
ring itself, and avoid the extra lookup.

The extra lookup isn't the end of the world, since it doesn't cross core
boundaries, but it's worth avoiding.



Re: [QA-virtio]:Why vring size is limited to 1024?

Avi Kivity-2

On 10/08/2014 03:28 PM, Avi Kivity wrote:


What you can do is have two types of descriptors, head and fragment:

union desc {
    struct head {
        u16 nfrags;
        u16 flags;
        u64 cookie;
    };
    struct frag {
        u64 paddr;
        u16 flen;
        u16 flags;
    };
};

so now a request length is 12*(nfrags+1) bytes.

You can be evil and steal some bits from paddr/cookie, and have each
descriptor 8 bytes long.

btw, I also recommend storing things like vnet_hdr in the ring itself,
instead of out-of-line in memory.  Maybe the ring should just transport
bytes and let the upper layer decide how it's formatted.