[RFC] create a single workqueue for each vm to update vm irq routing table


[RFC] create a single workqueue for each vm to update vm irq routing table

Zhanghaoyu (A)
Hi all,

When the guest sets an irq's smp_affinity, a VMEXIT occurs and the vcpu thread's IOCTL returns from the hypervisor to QEMU; the vcpu thread then asks the hypervisor to update the irq routing table.
In kvm_set_irq_routing, synchronize_rcu is called, and the current vcpu thread blocks for a long time waiting for the RCU grace period to end. During this period the vcpu cannot service the VM,
so interrupts delivered to this vcpu cannot be handled in time, and the applications running on it are not serviced either.
This is unacceptable in some real-time scenarios, e.g. telecom.

So I want to create a single workqueue for each VM to perform the RCU synchronization for the irq routing table asynchronously,
letting the vcpu thread return and VMENTRY to service the VM immediately, with no need to block waiting for the RCU grace period.
I have implemented a rough patch and tested it in our telecom environment; the problem above disappeared.
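
In outline, the change looks roughly like this (a minimal sketch rather than the actual patch; the per-VM workqueue field kvm->irq_routing_wq and all function names here are invented for illustration):

struct irq_routing_free_work {
        struct work_struct work;
        struct kvm_irq_routing_table *old;
};

static void free_old_routing_fn(struct work_struct *work)
{
        struct irq_routing_free_work *w =
                container_of(work, struct irq_routing_free_work, work);

        synchronize_rcu();      /* wait out the grace period off the vcpu thread */
        kfree(w->old);          /* now the old table can be freed safely */
        kfree(w);
}

/* Called from kvm_set_irq_routing() in place of synchronize_rcu()+kfree(),
 * after kvm_irq_routing_update() has published the new table. */
static void defer_free_old_routing(struct kvm *kvm,
                                   struct kvm_irq_routing_table *old)
{
        struct irq_routing_free_work *w = kzalloc(sizeof(*w), GFP_KERNEL);

        if (!w) {
                synchronize_rcu();      /* fall back to the blocking path */
                kfree(old);
                return;
        }
        INIT_WORK(&w->work, free_old_routing_fn);
        w->old = old;
        queue_work(kvm->irq_routing_wq, &w->work);
}

This way the ioctl returns, and the vcpu can VMENTRY, as soon as the new table is published.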

Any better ideas?

Thanks,
Zhang Haoyu



Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Paolo Bonzini-5
On 26/11/2013 13:40, Zhanghaoyu (A) wrote:
> When the guest sets an irq's smp_affinity, a VMEXIT occurs and the vcpu thread's IOCTL returns from the hypervisor to QEMU; the vcpu thread then asks the hypervisor to update the irq routing table.
> In kvm_set_irq_routing, synchronize_rcu is called, and the current vcpu thread blocks for a long time waiting for the RCU grace period to end. During this period the vcpu cannot service the VM,
> so interrupts delivered to this vcpu cannot be handled in time, and the applications running on it are not serviced either.
> This is unacceptable in some real-time scenarios, e.g. telecom.
>
> So I want to create a single workqueue for each VM to perform the RCU synchronization for the irq routing table asynchronously,
> letting the vcpu thread return and VMENTRY to service the VM immediately, with no need to block waiting for the RCU grace period.
> I have implemented a rough patch and tested it in our telecom environment; the problem above disappeared.

I don't think a workqueue is even needed.  You just need to use call_rcu
to free "old" after releasing kvm->irq_lock.

What do you think?

Paolo


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Gleb Natapov-4
In reply to this post by Zhanghaoyu (A)
On Tue, Nov 26, 2013 at 12:40:36PM +0000, Zhanghaoyu (A) wrote:
> Hi all,
>
> When the guest sets an irq's smp_affinity, a VMEXIT occurs and the vcpu thread's IOCTL returns from the hypervisor to QEMU; the vcpu thread then asks the hypervisor to update the irq routing table.
Why does the vcpu thread ask the hypervisor to update the irq routing table on
pcpu migration?

> In kvm_set_irq_routing, synchronize_rcu is called, and the current vcpu thread blocks for a long time waiting for the RCU grace period to end. During this period the vcpu cannot service the VM,
> so interrupts delivered to this vcpu cannot be handled in time, and the applications running on it are not serviced either.
> This is unacceptable in some real-time scenarios, e.g. telecom.
>
> So I want to create a single workqueue for each VM to perform the RCU synchronization for the irq routing table asynchronously,
> letting the vcpu thread return and VMENTRY to service the VM immediately, with no need to block waiting for the RCU grace period.
> I have implemented a rough patch and tested it in our telecom environment; the problem above disappeared.
>
> Any better ideas?
>
> Thanks,
> Zhang Haoyu

--
                        Gleb.


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Gleb Natapov-4
On Tue, Nov 26, 2013 at 02:48:10PM +0200, Gleb Natapov wrote:
> On Tue, Nov 26, 2013 at 12:40:36PM +0000, Zhanghaoyu (A) wrote:
> > Hi all,
> >
> > When the guest sets an irq's smp_affinity, a VMEXIT occurs and the vcpu thread's IOCTL returns from the hypervisor to QEMU; the vcpu thread then asks the hypervisor to update the irq routing table.
> Why does the vcpu thread ask the hypervisor to update the irq routing table on
> pcpu migration?
>
Ah, I misread. The guest sets irq smp_affinity, not the host.

--
                        Gleb.


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Gleb Natapov-4
In reply to this post by Paolo Bonzini-5
On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote:

> On 26/11/2013 13:40, Zhanghaoyu (A) wrote:
> > When the guest sets an irq's smp_affinity, a VMEXIT occurs and the vcpu thread's IOCTL returns from the hypervisor to QEMU; the vcpu thread then asks the hypervisor to update the irq routing table.
> > In kvm_set_irq_routing, synchronize_rcu is called, and the current vcpu thread blocks for a long time waiting for the RCU grace period to end. During this period the vcpu cannot service the VM,
> > so interrupts delivered to this vcpu cannot be handled in time, and the applications running on it are not serviced either.
> > This is unacceptable in some real-time scenarios, e.g. telecom.
> >
> > So I want to create a single workqueue for each VM to perform the RCU synchronization for the irq routing table asynchronously,
> > letting the vcpu thread return and VMENTRY to service the VM immediately, with no need to block waiting for the RCU grace period.
> > I have implemented a rough patch and tested it in our telecom environment; the problem above disappeared.
>
> I don't think a workqueue is even needed.  You just need to use call_rcu
> to free "old" after releasing kvm->irq_lock.
>
> What do you think?
>
It should be rate-limited somehow. Since it is guest-triggerable, a guest may cause the
host to allocate a lot of memory this way.

Is this about MSI interrupt affinity? IIRC, changing INTx interrupt
affinity should not trigger a kvm_set_irq_routing update. If this is about
MSI only, then what about changing userspace to use KVM_SIGNAL_MSI for
MSI injection?
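
(For reference, KVM_SIGNAL_MSI makes MSI injection a single ioctl on the VM
fd, with no routing-table entry involved.  A minimal userspace sketch, where
vmfd, addr and data are assumed to be provided by the caller:)

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Inject one MSI directly; no kvm_set_irq_routing update (and hence
 * no synchronize_rcu) is needed on this path. */
static int inject_msi(int vmfd, uint64_t addr, uint32_t data)
{
        struct kvm_msi msi = {
                .address_lo = (uint32_t)addr,
                .address_hi = (uint32_t)(addr >> 32),
                .data       = data,
        };

        return ioctl(vmfd, KVM_SIGNAL_MSI, &msi);
}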

--
                        Gleb.


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Paolo Bonzini-5
On 26/11/2013 13:56, Gleb Natapov wrote:
> > I don't think a workqueue is even needed.  You just need to use call_rcu
> > to free "old" after releasing kvm->irq_lock.
> >
> > What do you think?
>
> It should be rate-limited somehow. Since it is guest-triggerable, a guest may cause the
> host to allocate a lot of memory this way.

True, though if I understand Zhanghaoyu's proposal correctly, a workqueue would be
even worse.

Paolo


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Avi Kivity
In reply to this post by Paolo Bonzini-5


On Tue, Nov 26, 2013 at 2:47 PM, Paolo Bonzini <[hidden email]> wrote:
> On 26/11/2013 13:40, Zhanghaoyu (A) wrote:
>> When the guest sets an irq's smp_affinity, a VMEXIT occurs and the vcpu thread's IOCTL returns from the hypervisor to QEMU; the vcpu thread then asks the hypervisor to update the irq routing table.
>> In kvm_set_irq_routing, synchronize_rcu is called, and the current vcpu thread blocks for a long time waiting for the RCU grace period to end. During this period the vcpu cannot service the VM,
>> so interrupts delivered to this vcpu cannot be handled in time, and the applications running on it are not serviced either.
>> This is unacceptable in some real-time scenarios, e.g. telecom.
>>
>> So I want to create a single workqueue for each VM to perform the RCU synchronization for the irq routing table asynchronously,
>> letting the vcpu thread return and VMENTRY to service the VM immediately, with no need to block waiting for the RCU grace period.
>> I have implemented a rough patch and tested it in our telecom environment; the problem above disappeared.
>
> I don't think a workqueue is even needed.  You just need to use call_rcu
> to free "old" after releasing kvm->irq_lock.
>
> What do you think?

Can this cause an interrupt to be delivered to the wrong (old) vcpu?

The way Linux sets interrupt affinity, it cannot, since changing the affinity is (IIRC) done in the interrupt handler, so the next interrupt cannot be in flight and thus pick up the old interrupt routing table.

However it may be vulnerable in other ways.



Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Paolo Bonzini-5
On 26/11/2013 14:18, Avi Kivity wrote:
>
>> I don't think a workqueue is even needed.  You just need to use call_rcu
>> to free "old" after releasing kvm->irq_lock.
>>
>> What do you think?
>
> Can this cause an interrupt to be delivered to the wrong (old) vcpu?

No, this would be exactly the same code that is running now:

        mutex_lock(&kvm->irq_lock);
        old = kvm->irq_routing;
        kvm_irq_routing_update(kvm, new);
        mutex_unlock(&kvm->irq_lock);

        synchronize_rcu();
        kfree(old);
        return 0;

Except that the kfree would run in the call_rcu kernel thread instead of
the vcpu thread.  But the vcpus already see the new routing table after
the rcu_assign_pointer that is in kvm_irq_routing_update.
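
Concretely, the call_rcu variant could look like this (a sketch; it assumes
a struct rcu_head member, here called "rcu", is added to struct
kvm_irq_routing_table):

        static void free_irq_routing_table_rcu(struct rcu_head *head)
        {
                struct kvm_irq_routing_table *old =
                        container_of(head, struct kvm_irq_routing_table, rcu);

                kfree(old);
        }

        mutex_lock(&kvm->irq_lock);
        old = kvm->irq_routing;
        kvm_irq_routing_update(kvm, new);
        mutex_unlock(&kvm->irq_lock);

        /* no synchronize_rcu(): the vcpu thread resumes immediately and
         * "old" is freed from the RCU callback after the grace period */
        call_rcu(&old->rcu, free_irq_routing_table_rcu);
        return 0;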

There is still the problem that Gleb pointed out, though.

Paolo

> The way Linux sets interrupt affinity, it cannot, since changing the
> affinity is (IIRC) done in the interrupt handler, so the next interrupt
> cannot be in flight and thus pick up the old interrupt routing table.
>
> However it may be vulnerable in other ways.
>
>



Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Avi Kivity-2



On Tue, Nov 26, 2013 at 3:47 PM, Paolo Bonzini <[hidden email]> wrote:
On 26/11/2013 14:18, Avi Kivity wrote:
>
>> I don't think a workqueue is even needed.  You just need to use call_rcu
>> to free "old" after releasing kvm->irq_lock.
>>
>> What do you think?
>
> Can this cause an interrupt to be delivered to the wrong (old) vcpu?

No, this would be exactly the same code that is running now:

        mutex_lock(&kvm->irq_lock);
        old = kvm->irq_routing;
        kvm_irq_routing_update(kvm, new);
        mutex_unlock(&kvm->irq_lock);

        synchronize_rcu();
        kfree(old);
        return 0;

Except that the kfree would run in the call_rcu kernel thread instead of
the vcpu thread.  But the vcpus already see the new routing table after
the rcu_assign_pointer that is in kvm_irq_routing_update.



I understood the proposal was also to eliminate the synchronize_rcu(), so while new interrupts would see the new routing table, interrupts already in flight could pick up the old one.

 


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Paolo Bonzini-5
On 26/11/2013 15:36, Avi Kivity wrote:

>
>     No, this would be exactly the same code that is running now:
>
>             mutex_lock(&kvm->irq_lock);
>             old = kvm->irq_routing;
>             kvm_irq_routing_update(kvm, new);
>             mutex_unlock(&kvm->irq_lock);
>
>             synchronize_rcu();
>             kfree(old);
>             return 0;
>
>     Except that the kfree would run in the call_rcu kernel thread instead of
>     the vcpu thread.  But the vcpus already see the new routing table after
>     the rcu_assign_pointer that is in kvm_irq_routing_update.
>
> I understood the proposal was also to eliminate the synchronize_rcu(),
> so while new interrupts would see the new routing table, interrupts
> already in flight could pick up the old one.

Isn't that always the case with RCU?  (See my answer above: "the vcpus
already see the new routing table after the rcu_assign_pointer that is
in kvm_irq_routing_update").

If you eliminate the synchronize_rcu, new interrupts would see the new
routing table, while interrupts already in flight will get a dangling
pointer.

Paolo


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Avi Kivity-2
On 11/26/2013 04:46 PM, Paolo Bonzini wrote:

> On 26/11/2013 15:36, Avi Kivity wrote:
>>      No, this would be exactly the same code that is running now:
>>
>>              mutex_lock(&kvm->irq_lock);
>>              old = kvm->irq_routing;
>>              kvm_irq_routing_update(kvm, new);
>>              mutex_unlock(&kvm->irq_lock);
>>
>>              synchronize_rcu();
>>              kfree(old);
>>              return 0;
>>
>>      Except that the kfree would run in the call_rcu kernel thread instead of
>>      the vcpu thread.  But the vcpus already see the new routing table after
>>      the rcu_assign_pointer that is in kvm_irq_routing_update.
>>
>> I understood the proposal was also to eliminate the synchronize_rcu(),
>> so while new interrupts would see the new routing table, interrupts
>> already in flight could pick up the old one.
> Isn't that always the case with RCU?  (See my answer above: "the vcpus
> already see the new routing table after the rcu_assign_pointer that is
> in kvm_irq_routing_update").

With synchronize_rcu(), you have the additional guarantee that any
parallel accesses to the old routing table have completed.  Since we
also trigger the irq from rcu context, you know that after
synchronize_rcu() you won't get any interrupts to the old destination
(see kvm_set_irq_inatomic()).
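
(The read side in question looks roughly like this; a simplified sketch of
the delivery path around kvm_set_irq_inatomic(), not the verbatim source:)

        rcu_read_lock();
        irq_rt = rcu_dereference(kvm->irq_routing);
        if (irq < irq_rt->nr_rt_entries)
                hlist_for_each_entry(e, &irq_rt->map[irq], link)
                        if (e->type == KVM_IRQ_ROUTING_MSI)
                                ret = kvm_set_msi_inatomic(e, kvm);
        rcu_read_unlock();
        /* once synchronize_rcu() returns on the update side, no CPU can
         * still be in this section holding the old table */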

It's another question whether the hardware provides the same guarantee.

> If you eliminate the synchronize_rcu, new interrupts would see the new
> routing table, while interrupts already in flight will get a dangling
> pointer.

Sure, if you drop the synchronize_rcu(), you have to add call_rcu().


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Gleb Natapov-4
On Tue, Nov 26, 2013 at 04:54:44PM +0200, Avi Kivity wrote:

> On 11/26/2013 04:46 PM, Paolo Bonzini wrote:
> >On 26/11/2013 15:36, Avi Kivity wrote:
> >>     No, this would be exactly the same code that is running now:
> >>
> >>             mutex_lock(&kvm->irq_lock);
> >>             old = kvm->irq_routing;
> >>             kvm_irq_routing_update(kvm, new);
> >>             mutex_unlock(&kvm->irq_lock);
> >>
> >>             synchronize_rcu();
> >>             kfree(old);
> >>             return 0;
> >>
> >>     Except that the kfree would run in the call_rcu kernel thread instead of
> >>     the vcpu thread.  But the vcpus already see the new routing table after
> >>     the rcu_assign_pointer that is in kvm_irq_routing_update.
> >>
> >>I understood the proposal was also to eliminate the synchronize_rcu(),
> >>so while new interrupts would see the new routing table, interrupts
> >>already in flight could pick up the old one.
> >Isn't that always the case with RCU?  (See my answer above: "the vcpus
> >already see the new routing table after the rcu_assign_pointer that is
> >in kvm_irq_routing_update").
>
> With synchronize_rcu(), you have the additional guarantee that any
> parallel accesses to the old routing table have completed.  Since we
> also trigger the irq from rcu context, you know that after
> synchronize_rcu() you won't get any interrupts to the old
> destination (see kvm_set_irq_inatomic()).
We do not have this guarantee for other vcpus that do not call
synchronize_rcu(). They may still use an outdated routing table while the vcpu
or iothread that performed the table update sits in synchronize_rcu().

--
                        Gleb.


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Paolo Bonzini-5
On 26/11/2013 16:03, Gleb Natapov wrote:

>>>> I understood the proposal was also to eliminate the synchronize_rcu(),
>>>> so while new interrupts would see the new routing table, interrupts
>>>> already in flight could pick up the old one.
>>> Isn't that always the case with RCU?  (See my answer above: "the vcpus
>>> already see the new routing table after the rcu_assign_pointer that is
>>> in kvm_irq_routing_update").
>>
>> With synchronize_rcu(), you have the additional guarantee that any
>> parallel accesses to the old routing table have completed.  Since we
>> also trigger the irq from rcu context, you know that after
>> synchronize_rcu() you won't get any interrupts to the old
>> destination (see kvm_set_irq_inatomic()).
> We do not have this guarantee for other vcpus that do not call
> synchronize_rcu(). They may still use an outdated routing table while the vcpu
> or iothread that performed the table update sits in synchronize_rcu().

Avi's point is that, after the VCPU resumes execution, you know that no
interrupt will be sent to the old destination because
kvm_set_msi_inatomic (and ultimately kvm_irq_delivery_to_apic_fast) is
also called within the RCU read-side critical section.

Without synchronize_rcu you could have

    VCPU writes to routing table
                                       e = entry from IRQ routing table
    kvm_irq_routing_update(kvm, new);
    VCPU resumes execution
                                       kvm_set_msi_irq(e, &irq);
                                       kvm_irq_delivery_to_apic_fast();

where the entry is stale but the VCPU has already resumed execution.

If we want to keep this guarantee, we need to use a different mechanism for
synchronization than the global RCU.  QRCU would work; readers are not
wait-free, but only if there is a concurrent synchronize_qrcu, which
should be rare.
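
(QRCU was never merged into the mainline kernel; SRCU is the nearest in-tree
mechanism.  A per-VM SRCU variant could look roughly like the sketch below,
where kvm->irq_srcu is a hypothetical srcu_struct initialized with
init_srcu_struct() at VM creation:)

        /* read side, e.g. in the interrupt delivery path */
        idx = srcu_read_lock(&kvm->irq_srcu);
        irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
        /* ... deliver using irq_rt ... */
        srcu_read_unlock(&kvm->irq_srcu, idx);

        /* update side */
        mutex_lock(&kvm->irq_lock);
        old = kvm->irq_routing;
        rcu_assign_pointer(kvm->irq_routing, new);
        mutex_unlock(&kvm->irq_lock);

        synchronize_srcu(&kvm->irq_srcu);  /* waits only for irq_srcu readers */
        kfree(old);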

Paolo


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Avi Kivity-2
In reply to this post by Gleb Natapov-4
On 11/26/2013 05:03 PM, Gleb Natapov wrote:

> On Tue, Nov 26, 2013 at 04:54:44PM +0200, Avi Kivity wrote:
>> On 11/26/2013 04:46 PM, Paolo Bonzini wrote:
>>> On 26/11/2013 15:36, Avi Kivity wrote:
>>>>      No, this would be exactly the same code that is running now:
>>>>
>>>>              mutex_lock(&kvm->irq_lock);
>>>>              old = kvm->irq_routing;
>>>>              kvm_irq_routing_update(kvm, new);
>>>>              mutex_unlock(&kvm->irq_lock);
>>>>
>>>>              synchronize_rcu();
>>>>              kfree(old);
>>>>              return 0;
>>>>
>>>>      Except that the kfree would run in the call_rcu kernel thread instead of
>>>>      the vcpu thread.  But the vcpus already see the new routing table after
>>>>      the rcu_assign_pointer that is in kvm_irq_routing_update.
>>>>
>>>> I understood the proposal was also to eliminate the synchronize_rcu(),
>>>> so while new interrupts would see the new routing table, interrupts
>>>> already in flight could pick up the old one.
>>> Isn't that always the case with RCU?  (See my answer above: "the vcpus
>>> already see the new routing table after the rcu_assign_pointer that is
>>> in kvm_irq_routing_update").
>> With synchronize_rcu(), you have the additional guarantee that any
>> parallel accesses to the old routing table have completed.  Since we
>> also trigger the irq from rcu context, you know that after
>> synchronize_rcu() you won't get any interrupts to the old
>> destination (see kvm_set_irq_inatomic()).
> We do not have this guarantee for other vcpus that do not call
> synchronize_rcu(). They may still use an outdated routing table while the vcpu
> or iothread that performed the table update sits in synchronize_rcu().
>

Consider this guest code:

   write msi entry, directing the interrupt away from this vcpu
   nop
   memset(&idt, 0, sizeof(idt));   /* any interrupt taken after this faults */

Currently, this code will never trigger a triple fault: synchronize_rcu()
ensures the vcpu only resumes after every in-flight delivery through the
old routing table (which targets this vcpu) has completed.  With the change
to call_rcu(), it may.

Now it may be that the guest does not expect this to work (PCI writes
are posted; and interrupts can be delayed indefinitely by the pci
fabric), but we don't know if there's a path that guarantees the guest
something that we're taking away with this change.





Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Avi Kivity-2
In reply to this post by Paolo Bonzini-5
On 11/26/2013 05:20 PM, Paolo Bonzini wrote:

> On 26/11/2013 16:03, Gleb Natapov wrote:
>>>>> I understood the proposal was also to eliminate the synchronize_rcu(),
>>>>> so while new interrupts would see the new routing table, interrupts
>>>>> already in flight could pick up the old one.
>>>> Isn't that always the case with RCU?  (See my answer above: "the vcpus
>>>> already see the new routing table after the rcu_assign_pointer that is
>>>> in kvm_irq_routing_update").
>>> With synchronize_rcu(), you have the additional guarantee that any
>>> parallel accesses to the old routing table have completed.  Since we
>>> also trigger the irq from rcu context, you know that after
>>> synchronize_rcu() you won't get any interrupts to the old
>>> destination (see kvm_set_irq_inatomic()).
>> We do not have this guarantee for other vcpus that do not call
>> synchronize_rcu(). They may still use an outdated routing table while the vcpu
>> or iothread that performed the table update sits in synchronize_rcu().
> Avi's point is that, after the VCPU resumes execution, you know that no
> interrupt will be sent to the old destination because
> kvm_set_msi_inatomic (and ultimately kvm_irq_delivery_to_apic_fast) is
> also called within the RCU read-side critical section.
>
> Without synchronize_rcu you could have
>
>      VCPU writes to routing table
>                                         e = entry from IRQ routing table
>      kvm_irq_routing_update(kvm, new);
>      VCPU resumes execution
>                                         kvm_set_msi_irq(e, &irq);
>                                         kvm_irq_delivery_to_apic_fast();
>
> where the entry is stale but the VCPU has already resumed execution.
>
> If we want to keep this guarantee, we need to use a different mechanism for
> synchronization than the global RCU.  QRCU would work; readers are not
> wait-free, but only if there is a concurrent synchronize_qrcu, which
> should be rare.

An alternative path is to convince ourselves that the hardware does not
provide the guarantees that the current code provides, and so we can
relax them.


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Paolo Bonzini-5
On 26/11/2013 16:25, Avi Kivity wrote:
>> If we want to keep this guarantee, we need to use a different mechanism for
>> synchronization than the global RCU.  QRCU would work; readers are not
>> wait-free, but only if there is a concurrent synchronize_qrcu, which
>> should be rare.
>
> An alternative path is to convince ourselves that the hardware does not
> provide the guarantees that the current code provides, and so we can
> relax them.

No, I think it's a reasonable guarantee to provide.

Paolo


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Avi Kivity-2
On 11/26/2013 05:28 PM, Paolo Bonzini wrote:

> On 26/11/2013 16:25, Avi Kivity wrote:
>>> If we want to keep this guarantee, we need to use a different mechanism for
>>> synchronization than the global RCU.  QRCU would work; readers are not
>>> wait-free, but only if there is a concurrent synchronize_qrcu, which
>>> should be rare.
>> An alternative path is to convince ourselves that the hardware does not
>> provide the guarantees that the current code provides, and so we can
>> relax them.
> No, I think it's a reasonable guarantee to provide.
>

Why?


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Paolo Bonzini-5
On 26/11/2013 16:35, Avi Kivity wrote:

>>>> If we want to keep this guarantee, we need to use a different mechanism for
>>>> synchronization than the global RCU.  QRCU would work; readers are not
>>>> wait-free, but only if there is a concurrent synchronize_qrcu, which
>>>> should be rare.
>>> An alternative path is to convince ourselves that the hardware does not
>>> provide the guarantees that the current code provides, and so we can
>>> relax them.
>> No, I think it's a reasonable guarantee to provide.
>
> Why?

Because IIUC the semantics may depend not just on the interrupt
controller, but also on the specific PCI device.  It seems safer to
assume that at least one device/driver pair wants this to work.

(BTW, PCI memory writes are posted, but configuration writes are not).

Paolo


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Michael S. Tsirkin-4
In reply to this post by Gleb Natapov-4
On Tue, Nov 26, 2013 at 02:56:10PM +0200, Gleb Natapov wrote:

> On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote:
> > On 26/11/2013 13:40, Zhanghaoyu (A) wrote:
> > > When the guest sets an irq's smp_affinity, a VMEXIT occurs and the vcpu thread's IOCTL returns from the hypervisor to QEMU; the vcpu thread then asks the hypervisor to update the irq routing table.
> > > In kvm_set_irq_routing, synchronize_rcu is called, and the current vcpu thread blocks for a long time waiting for the RCU grace period to end. During this period the vcpu cannot service the VM,
> > > so interrupts delivered to this vcpu cannot be handled in time, and the applications running on it are not serviced either.
> > > This is unacceptable in some real-time scenarios, e.g. telecom.
> > >
> > > So I want to create a single workqueue for each VM to perform the RCU synchronization for the irq routing table asynchronously,
> > > letting the vcpu thread return and VMENTRY to service the VM immediately, with no need to block waiting for the RCU grace period.
> > > I have implemented a rough patch and tested it in our telecom environment; the problem above disappeared.
> >
> > I don't think a workqueue is even needed.  You just need to use call_rcu
> > to free "old" after releasing kvm->irq_lock.
> >
> > What do you think?
> >
> It should be rate-limited somehow. Since it is guest-triggerable, a guest may cause the
> host to allocate a lot of memory this way.

The checks in __call_rcu() should handle this, I think.  These keep a per-CPU
counter, which can be adjusted via rcutree.blimit, and which defaults
to taking evasive action if more than 10K callbacks are waiting on a
given CPU.



> Is this about MSI interrupt affinity? IIRC, changing INTx interrupt
> affinity should not trigger a kvm_set_irq_routing update. If this is about
> MSI only, then what about changing userspace to use KVM_SIGNAL_MSI for
> MSI injection?
>
> --
> Gleb.


Re: [RFC] create a single workqueue for each vm to update vm irq routing table

Avi Kivity-2
In reply to this post by Paolo Bonzini-5
On 11/26/2013 05:58 PM, Paolo Bonzini wrote:

> On 26/11/2013 16:35, Avi Kivity wrote:
>>>>> If we want to keep this guarantee, we need to use a different mechanism for
>>>>> synchronization than the global RCU.  QRCU would work; readers are not
>>>>> wait-free, but only if there is a concurrent synchronize_qrcu, which
>>>>> should be rare.
>>>> An alternative path is to convince ourselves that the hardware does not
>>>> provide the guarantees that the current code provides, and so we can
>>>> relax them.
>>> No, I think it's a reasonable guarantee to provide.
>> Why?
> Because IIUC the semantics may depend not just on the interrupt
> controller, but also on the specific PCI device.  It seems safer to
> assume that at least one device/driver pair wants this to work.

It's indeed safe, but I think there's a nice win to be had if we drop
the assumption.

> (BTW, PCI memory writes are posted, but configuration writes are not).

MSIs are configured via PCI memory writes.

By itself, that doesn't buy us anything, since the guest could flush the
write via a read.  But I think the fact that the interrupt messages
themselves are posted proves that it is safe.  The fact that Linux does
interrupt migration from within the interrupt handler also shows that
someone else believes that it is the only safe place to do it.


