[PATCH v3 0/4] block: fix 'savevm' hang with -object iothread

Stefan Hajnoczi-7
v3:
 * Add missing bdrv_drain_all_end() in error code paths [Kevin]
v2:
 * New patch to use bdrv_drain_all_begin/end() in savevm/loadvm [Kevin]
   (All other patches unchanged)

The 'savevm' command hangs when -object iothread is used.  See patches for
details, but basically the vmstate read/write code didn't conform to the latest
block layer locking rules.

Stefan Hajnoczi (4):
  block: count bdrv_co_rw_vmstate() requests
  block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
  migration: avoid recursive AioContext locking in save_vmstate()
  migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()

 block/io.c         | 21 +++++++++++++--------
 migration/savevm.c | 30 ++++++++++++++++++++++++++----
 2 files changed, 39 insertions(+), 12 deletions(-)

--
2.9.3


[PATCH v3 1/4] block: count bdrv_co_rw_vmstate() requests

Stefan Hajnoczi-7
Call bdrv_inc/dec_in_flight() for vmstate reads/writes.  This seems
unnecessary at first glance because vmstate reads/writes are done
synchronously while the guest is stopped.  But we need the bdrv_wakeup()
in bdrv_dec_in_flight() so the main loop sees request completion.
Besides, it's cleaner to count vmstate reads/writes like ordinary
read/write requests.

The bdrv_wakeup() partially fixes a 'savevm' hang with -object iothread.
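
As a minimal illustration of the pattern (a standalone pthreads sketch, not
QEMU code; bdrv_inc/dec_in_flight(), bdrv_wakeup() and BDRV_POLL_WHILE() are
more involved than this), the completion side has to wake the waiter, or the
main loop never re-checks its condition:

/* build: cc -o inflight inflight.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wakeup = PTHREAD_COND_INITIALIZER;
static int in_flight;

static void request_begin(void)         /* models bdrv_inc_in_flight() */
{
    pthread_mutex_lock(&lock);
    in_flight++;
    pthread_mutex_unlock(&lock);
}

static void request_end(void)           /* models bdrv_dec_in_flight()... */
{
    pthread_mutex_lock(&lock);
    in_flight--;
    pthread_cond_broadcast(&wakeup);     /* ...including the bdrv_wakeup() */
    pthread_mutex_unlock(&lock);
}

static void *vmstate_io(void *arg)
{
    /* the vmstate read/write would run here */
    request_end();
    return NULL;
}

int main(void)
{
    pthread_t tid;

    request_begin();                     /* count the request up front */
    pthread_create(&tid, NULL, vmstate_io, NULL);

    /* The "main loop": without the wakeup it has no reason to re-check
     * the condition and could wait forever. */
    pthread_mutex_lock(&lock);
    while (in_flight > 0) {
        pthread_cond_wait(&wakeup, &lock);
    }
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    printf("request completed, main loop noticed\n");
    return 0;
}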

Signed-off-by: Stefan Hajnoczi <[hidden email]>
Reviewed-by: Eric Blake <[hidden email]>
Reviewed-by: Paolo Bonzini <[hidden email]>
---
 block/io.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index fdd7485..cc56e90 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1988,17 +1988,24 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
                    bool is_read)
 {
     BlockDriver *drv = bs->drv;
+    int ret = -ENOTSUP;
+
+    bdrv_inc_in_flight(bs);
 
     if (!drv) {
-        return -ENOMEDIUM;
+        ret = -ENOMEDIUM;
     } else if (drv->bdrv_load_vmstate) {
-        return is_read ? drv->bdrv_load_vmstate(bs, qiov, pos)
-                       : drv->bdrv_save_vmstate(bs, qiov, pos);
+        if (is_read) {
+            ret = drv->bdrv_load_vmstate(bs, qiov, pos);
+        } else {
+            ret = drv->bdrv_save_vmstate(bs, qiov, pos);
+        }
     } else if (bs->file) {
-        return bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
+        ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
     }
 
-    return -ENOTSUP;
+    bdrv_dec_in_flight(bs);
+    return ret;
 }
 
 static void coroutine_fn bdrv_co_rw_vmstate_entry(void *opaque)
--
2.9.3


[PATCH v3 2/4] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()

Stefan Hajnoczi-7
In reply to this post by Stefan Hajnoczi-7
Calling aio_poll() directly may have been fine previously, but this is
the future, man!  The difference between an aio_poll() loop and
BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext
around aio_poll().

This allows the IOThread to run fd handlers or BHs to complete the
request.  Failure to release the AioContext causes deadlocks.

Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object
iothread.
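
The deadlock shape is easiest to see in a standalone sketch (plain pthreads,
not QEMU code): the waiter must drop the lock while it waits, just as
BDRV_POLL_WHILE() drops the AioContext around aio_poll(), otherwise the
other thread can never get in to complete the request.

/* build: cc -o pollwhile pollwhile.c -lpthread */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool done;

static void *iothread_fn(void *arg)
{
    /* The completion handler needs the "AioContext" lock. */
    pthread_mutex_lock(&ctx_lock);
    done = true;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&ctx_lock);
    return NULL;
}

int main(void)
{
    pthread_t tid;

    pthread_mutex_lock(&ctx_lock);       /* main loop holds the lock */
    pthread_create(&tid, NULL, iothread_fn, NULL);

    while (!done) {
        /* pthread_cond_wait() releases ctx_lock while sleeping and
         * re-takes it on wakeup, the same shape as BDRV_POLL_WHILE().
         * Busy-waiting here with the lock held would never terminate. */
        pthread_cond_wait(&done_cond, &ctx_lock);
    }
    pthread_mutex_unlock(&ctx_lock);

    pthread_join(tid, NULL);
    printf("request completed without deadlock\n");
    return 0;
}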

Signed-off-by: Stefan Hajnoczi <[hidden email]>
Reviewed-by: Eric Blake <[hidden email]>
Reviewed-by: Paolo Bonzini <[hidden email]>
---
 block/io.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index cc56e90..f0041cd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2031,9 +2031,7 @@ bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
         Coroutine *co = qemu_coroutine_create(bdrv_co_rw_vmstate_entry, &data);
 
         bdrv_coroutine_enter(bs, co);
-        while (data.ret == -EINPROGRESS) {
-            aio_poll(bdrv_get_aio_context(bs), true);
-        }
+        BDRV_POLL_WHILE(bs, data.ret == -EINPROGRESS);
         return data.ret;
     }
 }
--
2.9.3


[PATCH v3 3/4] migration: avoid recursive AioContext locking in save_vmstate()

Stefan Hajnoczi-7
In reply to this post by Stefan Hajnoczi-7
AioContext was designed to allow nested acquire/release calls.  It uses
a recursive mutex so callers don't need to worry about nesting...or so
we thought.

BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
the AioContext temporarily around aio_poll().  This gives IOThreads a
chance to acquire the AioContext to process I/O completions.

It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
will not be able to acquire the AioContext if it was acquired
multiple times.

Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
this patch simply avoids nested locking in save_vmstate().  It's the
simplest fix and we should step back to consider the big picture with
all the recent changes to block layer threading.

This patch is the final fix to solve 'savevm' hanging with -object
iothread.
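
The "released once" problem can be reproduced with nothing but a recursive
mutex; a standalone sketch (plain pthreads, not QEMU code):

/* build: cc -o nested nested.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock;

static void *iothread_fn(void *arg)
{
    /* Models the IOThread: it needs the "AioContext" lock to make progress. */
    if (pthread_mutex_trylock(&lock) != 0) {
        printf("IOThread cannot take the lock -> no progress -> hang\n");
    } else {
        printf("IOThread took the lock\n");
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&lock, &attr);

    pthread_mutex_lock(&lock);           /* outer acquire (save_vmstate) */
    pthread_mutex_lock(&lock);           /* nested acquire (a callee) */
    pthread_mutex_unlock(&lock);         /* BDRV_POLL_WHILE() releases once */

    /* The lock count is still 1, so another thread cannot acquire it. */
    pthread_create(&tid, NULL, iothread_fn, NULL);
    pthread_join(tid, NULL);

    pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);
    pthread_mutexattr_destroy(&attr);
    return 0;
}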

Signed-off-by: Stefan Hajnoczi <[hidden email]>
Reviewed-by: Eric Blake <[hidden email]>
Reviewed-by: Paolo Bonzini <[hidden email]>
---
 migration/savevm.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index f5e8194..3ca319f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2150,6 +2150,14 @@ int save_vmstate(const char *name, Error **errp)
         goto the_end;
     }
 
+    /* The bdrv_all_create_snapshot() call that follows acquires the AioContext
+     * for itself.  BDRV_POLL_WHILE() does not support nested locking because
+     * it only releases the lock once.  Therefore synchronous I/O will deadlock
+     * unless we release the AioContext before bdrv_all_create_snapshot().
+     */
+    aio_context_release(aio_context);
+    aio_context = NULL;
+
     ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
     if (ret < 0) {
         error_setg(errp, "Error while creating snapshot on '%s'",
@@ -2160,7 +2168,9 @@ int save_vmstate(const char *name, Error **errp)
     ret = 0;
 
  the_end:
-    aio_context_release(aio_context);
+    if (aio_context) {
+        aio_context_release(aio_context);
+    }
     if (saved_vm_running) {
         vm_start();
     }
--
2.9.3


[PATCH v3 4/4] migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()

Stefan Hajnoczi-7
In reply to this post by Stefan Hajnoczi-7
blk/bdrv_drain_all() only takes effect for a single instant and then
resumes block jobs, guest devices, and other external clients like the
NBD server.  This can be handy when performing a synchronous drain
before terminating the program, for example.

Monitor commands usually need to quiesce I/O across an entire code
region so blk/bdrv_drain_all() is not suitable.  They must use
bdrv_drain_all_begin/end() to mark the region.  This prevents new I/O
requests from slipping in or worse - block jobs completing and modifying
the graph.

I audited other blk/bdrv_drain_all() callers but did not find anything
that needs a similar fix.  This patch fixes the savevm/loadvm commands.
Although I haven't encountered a real-world issue, this makes the code
safer.
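
The intended usage difference, as a standalone sketch (not the real QEMU
functions, which also wait for in-flight requests to drain): begin/end
brackets a whole critical region during which external clients are held
off, whereas a one-shot drain lets them resume immediately.

/* build: cc -std=c11 -o drain drain.c */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool quiesced;

/* External clients (block jobs, NBD server, ...) would check this before
 * submitting new requests. */
static bool try_submit_request(void)
{
    if (atomic_load(&quiesced)) {
        return false;                /* held off until drain_all_end() */
    }
    /* ... submit the request ... */
    return true;
}

/* Toy models of bdrv_drain_all_begin()/bdrv_drain_all_end(). */
static void drain_all_begin(void) { atomic_store(&quiesced, true); }
static void drain_all_end(void)   { atomic_store(&quiesced, false); }

int main(void)
{
    drain_all_begin();
    /* savevm/loadvm critical region: no new requests, no graph changes */
    printf("submit inside region: %s\n",
           try_submit_request() ? "allowed" : "blocked");
    drain_all_end();

    printf("submit after region:  %s\n",
           try_submit_request() ? "allowed" : "blocked");
    return 0;
}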

Suggested-by: Kevin Wolf <[hidden email]>
Signed-off-by: Stefan Hajnoczi <[hidden email]>
---
 migration/savevm.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 3ca319f..c7c5ea5 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2113,6 +2113,8 @@ int save_vmstate(const char *name, Error **errp)
     }
     vm_stop(RUN_STATE_SAVE_VM);
 
+    bdrv_drain_all_begin();
+
     aio_context_acquire(aio_context);
 
     memset(sn, 0, sizeof(*sn));
@@ -2171,6 +2173,9 @@ int save_vmstate(const char *name, Error **errp)
     if (aio_context) {
         aio_context_release(aio_context);
     }
+
+    bdrv_drain_all_end();
+
     if (saved_vm_running) {
         vm_start();
     }
@@ -2279,20 +2284,21 @@ int load_vmstate(const char *name, Error **errp)
     }
 
     /* Flush all IO requests so they don't interfere with the new state.  */
-    bdrv_drain_all();
+    bdrv_drain_all_begin();
 
     ret = bdrv_all_goto_snapshot(name, &bs);
     if (ret < 0) {
         error_setg(errp, "Error %d while activating snapshot '%s' on '%s'",
                      ret, name, bdrv_get_device_name(bs));
-        return ret;
+        goto err_drain;
     }
 
     /* restore the VM state */
     f = qemu_fopen_bdrv(bs_vm_state, 0);
     if (!f) {
         error_setg(errp, "Could not open VM state file");
-        return -EINVAL;
+        ret = -EINVAL;
+        goto err_drain;
     }
 
     qemu_system_reset(VMRESET_SILENT);
@@ -2303,6 +2309,8 @@ int load_vmstate(const char *name, Error **errp)
     qemu_fclose(f);
     aio_context_release(aio_context);
 
+    bdrv_drain_all_end();
+
     migration_incoming_state_destroy();
     if (ret < 0) {
         error_setg(errp, "Error %d while loading VM state", ret);
@@ -2310,6 +2318,10 @@ int load_vmstate(const char *name, Error **errp)
     }
 
     return 0;
+
+err_drain:
+    bdrv_drain_all_end();
+    return ret;
 }
 
 void vmstate_register_ram(MemoryRegion *mr, DeviceState *dev)
--
2.9.3


Re: [PATCH v3 4/4] migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()

Eric Blake
On 05/22/2017 08:57 AM, Stefan Hajnoczi wrote:

> blk/bdrv_drain_all() only takes effect for a single instant and then
> resumes block jobs, guest devices, and other external clients like the
> NBD server.  This can be handy when performing a synchronous drain
> before terminating the program, for example.
>
> Monitor commands usually need to quiesce I/O across an entire code
> region so blk/bdrv_drain_all() is not suitable.  They must use
> bdrv_drain_all_begin/end() to mark the region.  This prevents new I/O
> requests from slipping in or worse - block jobs completing and modifying
> the graph.
>
> I audited other blk/bdrv_drain_all() callers but did not find anything
> that needs a similar fix.  This patch fixes the savevm/loadvm commands.
> Although I haven't encountered a real-world issue, this makes the code
> safer.
>
> Suggested-by: Kevin Wolf <[hidden email]>
> Signed-off-by: Stefan Hajnoczi <[hidden email]>
> ---
>  migration/savevm.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
Reviewed-by: Eric Blake <[hidden email]>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-block] [PATCH v3 0/4] block: fix 'savevm' hang with -object iothread

Stefan Hajnoczi
In reply to this post by Stefan Hajnoczi-7
On Mon, May 22, 2017 at 02:57:00PM +0100, Stefan Hajnoczi wrote:

> v3:
>  * Add missing bdrv_drain_all_end() in error code paths [Kevin]
> v2:
>  * New patch to use bdrv_drain_all_begin/end() in savevm/loadvm [Kevin]
>    (All other patches unchanged)
>
> The 'savevm' command hangs when -object iothread is used.  See patches for
> details, but basically the vmstate read/write code didn't conform to the latest
> block layer locking rules.
>
> Stefan Hajnoczi (4):
>   block: count bdrv_co_rw_vmstate() requests
>   block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
>   migration: avoid recursive AioContext locking in save_vmstate()
>   migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()
>
>  block/io.c         | 21 +++++++++++++--------
>  migration/savevm.c | 30 ++++++++++++++++++++++++++----
>  2 files changed, 39 insertions(+), 12 deletions(-)
Ping

Re: [Qemu-block] [PATCH v3 0/4] block: fix 'savevm' hang with -object iothread

Stefan Hajnoczi
In reply to this post by Stefan Hajnoczi-7
On Mon, May 22, 2017 at 02:57:00PM +0100, Stefan Hajnoczi wrote:

> v3:
>  * Add missing bdrv_drain_all_end() in error code paths [Kevin]
> v2:
>  * New patch to use bdrv_drain_all_begin/end() in savevm/loadvm [Kevin]
>    (All other patches unchanged)
>
> The 'savevm' command hangs when -object iothread is used.  See patches for
> details, but basically the vmstate read/write code didn't conform to the latest
> block layer locking rules.
>
> Stefan Hajnoczi (4):
>   block: count bdrv_co_rw_vmstate() requests
>   block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
>   migration: avoid recursive AioContext locking in save_vmstate()
>   migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()
>
>  block/io.c         | 21 +++++++++++++--------
>  migration/savevm.c | 30 ++++++++++++++++++++++++++----
>  2 files changed, 39 insertions(+), 12 deletions(-)
Ping ^ 2

Stefan

Re: [Qemu-block] [PATCH v3 0/4] block: fix 'savevm' hang with -object iothread

Kevin Wolf-5
On 12.06.2017 at 15:47, Stefan Hajnoczi wrote:

> On Mon, May 22, 2017 at 02:57:00PM +0100, Stefan Hajnoczi wrote:
> > v3:
> >  * Add missing bdrv_drain_all_end() in error code paths [Kevin]
> > v2:
> >  * New patch to use bdrv_drain_all_begin/end() in savevm/loadvm [Kevin]
> >    (All other patches unchanged)
> >
> > The 'savevm' command hangs when -object iothread is used.  See patches for
> > details, but basically the vmstate read/write code didn't conform to the latest
> > block layer locking rules.
> >
> > Stefan Hajnoczi (4):
> >   block: count bdrv_co_rw_vmstate() requests
> >   block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
> >   migration: avoid recursive AioContext locking in save_vmstate()
> >   migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()
> >
> >  block/io.c         | 21 +++++++++++++--------
> >  migration/savevm.c | 30 ++++++++++++++++++++++++++----
> >  2 files changed, 39 insertions(+), 12 deletions(-)
>
> Ping ^ 2
Thanks, applied to the block branch.

Kevin

Re: [PATCH v3 3/4] migration: avoid recursive AioContext locking in save_vmstate()

Pavel Butsykin-2
In reply to this post by Stefan Hajnoczi-7

On 22.05.2017 16:57, Stefan Hajnoczi wrote:

> AioContext was designed to allow nested acquire/release calls.  It uses
> a recursive mutex so callers don't need to worry about nesting...or so
> we thought.
>
> BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
> the AioContext temporarily around aio_poll().  This gives IOThreads a
> chance to acquire the AioContext to process I/O completions.
>
> It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
> BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
> will not be able to acquire the AioContext if it was acquired
> multiple times.
>
> Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
> this patch simply avoids nested locking in save_vmstate().  It's the
> simplest fix and we should step back to consider the big picture with
> all the recent changes to block layer threading.
>
> This patch is the final fix to solve 'savevm' hanging with -object
> iothread.

I see the same problem in external_snapshot_prepare():
     /* Acquire AioContext now so any threads operating on old_bs stop */
     state->aio_context = bdrv_get_aio_context(state->old_bs);
     aio_context_acquire(state->aio_context);
     bdrv_drained_begin(state->old_bs);

     if (!bdrv_is_inserted(state->old_bs)) {
         error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
         return;
     }

     if (bdrv_op_is_blocked(state->old_bs,
                            BLOCK_OP_TYPE_EXTERNAL_SNAPSHOT, errp)) {
         return;
     }

     if (!bdrv_is_read_only(state->old_bs)) {
         if (bdrv_flush(state->old_bs)) {      <---!!!

and at the point of BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE)
we hold the lock at least twice, so here is another deadlock.

> Signed-off-by: Stefan Hajnoczi <[hidden email]>
> Reviewed-by: Eric Blake <[hidden email]>
> Reviewed-by: Paolo Bonzini <[hidden email]>
> ---
>   migration/savevm.c | 12 +++++++++++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index f5e8194..3ca319f 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2150,6 +2150,14 @@ int save_vmstate(const char *name, Error **errp)
>           goto the_end;
>       }
>  
> +    /* The bdrv_all_create_snapshot() call that follows acquires the AioContext
> +     * for itself.  BDRV_POLL_WHILE() does not support nested locking because
> +     * it only releases the lock once.  Therefore synchronous I/O will deadlock
> +     * unless we release the AioContext before bdrv_all_create_snapshot().
> +     */
> +    aio_context_release(aio_context);
> +    aio_context = NULL;
> +
>       ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
>       if (ret < 0) {
>           error_setg(errp, "Error while creating snapshot on '%s'",
> @@ -2160,7 +2168,9 @@ int save_vmstate(const char *name, Error **errp)
>       ret = 0;
>  
>    the_end:
> -    aio_context_release(aio_context);
> +    if (aio_context) {
> +        aio_context_release(aio_context);
> +    }
>       if (saved_vm_running) {
>           vm_start();
>       }
>

Re: [PATCH v3 3/4] migration: avoid recursive AioContext locking in save_vmstate()

Pavel Butsykin-2
On 14.06.2017 13:10, Pavel Butsykin wrote:

>
> On 22.05.2017 16:57, Stefan Hajnoczi wrote:
>> AioContext was designed to allow nested acquire/release calls.  It uses
>> a recursive mutex so callers don't need to worry about nesting...or so
>> we thought.
>>
>> BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
>> the AioContext temporarily around aio_poll().  This gives IOThreads a
>> chance to acquire the AioContext to process I/O completions.
>>
>> It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
>> BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
>> will not be able to acquire the AioContext if it was acquired
>> multiple times.
>>
>> Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
>> this patch simply avoids nested locking in save_vmstate().  It's the
>> simplest fix and we should step back to consider the big picture with
>> all the recent changes to block layer threading.
>>
>> This patch is the final fix to solve 'savevm' hanging with -object
>> iothread.
>
> The same I see in external_snapshot_prepare():
>      /* Acquire AioContext now so any threads operating on old_bs stop */
>      state->aio_context = bdrv_get_aio_context(state->old_bs);
>      aio_context_acquire(state->aio_context);
>      bdrv_drained_begin(state->old_bs);
>
>      if (!bdrv_is_inserted(state->old_bs)) {
>          error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
>          return;
>      }
>
>      if (bdrv_op_is_blocked(state->old_bs,
>                             BLOCK_OP_TYPE_EXTERNAL_SNAPSHOT, errp)) {
>          return;
>      }
>
>      if (!bdrv_is_read_only(state->old_bs)) {
>          if (bdrv_flush(state->old_bs)) {      <---!!!
>
> and at the moment BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE),
> we have at least two locks.. So here is another deadlock.

Sorry, this is a different kind of deadlock. In the external_snapshot case,
the deadlock can happen only if state->old_bs->aio_context == my_iothread->ctx,
because in that case aio_co_enter() always calls aio_co_schedule():

void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
{
     aio_co_enter(bdrv_get_aio_context(bs), co);
}
...
void aio_co_enter(AioContext *ctx, struct Coroutine *co)
{
     if (ctx != qemu_get_current_aio_context()) {
         aio_co_schedule(ctx, co);
         return;
     }
...

But the IOThread will not be able to run the coroutine, because
my_iothread->ctx is already locked here:

static void external_snapshot_prepare(BlkActionState *common,
...
     /* Acquire AioContext now so any threads operating on old_bs stop */
     state->aio_context = bdrv_get_aio_context(state->old_bs);
     aio_context_acquire(state->aio_context);
     bdrv_drained_begin(state->old_bs);
...

As a result, we can see the following deadlock:

Thread 1 (Thread 0x7f0da6af9bc0 (LWP 973294)):
#0  0x00007f0da49b5ebf in __GI_ppoll (fds=0x7f0da87434b0, nfds=1,
timeout=<optimized out>, sigmask=0x0)
     at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1  0x00007f0da71e8234 in qemu_poll_ns (fds=0x7f0da87434b0, nfds=1,
timeout=-1) at util/qemu-timer.c:322
#2  0x00007f0da71eaf32 in aio_poll (ctx=0x7f0da870e7c0, blocking=true)
at util/aio-posix.c:622
#3  0x00007f0da715ff9c in bdrv_flush (bs=0x7f0da876c270) at block/io.c:2397
#4  0x00007f0da6e98344 in external_snapshot_prepare
(common=0x7f0da9d727b0, errp=0x7ffec3893f38)
     at blockdev.c:1686
#5  0x00007f0da6e99537 in qmp_transaction (dev_list=0x7f0da98abb40,
has_props=false,
     props=0x7f0daa2788c0, errp=0x7ffec3893fc0) at blockdev.c:2205
#6  0x00007f0da6ebee21 in qmp_marshal_transaction (args=0x7f0da9e7b700,
ret=0x7ffec3894030,
     errp=0x7ffec3894028) at qmp-marshal.c:5952
#7  0x00007f0da71d9783 in do_qmp_dispatch (cmds=0x7f0da785f940
<qmp_commands>, request=0x7f0da87b8400,
     errp=0x7ffec3894090) at qapi/qmp-dispatch.c:104
#8  0x00007f0da71d98bb in qmp_dispatch (cmds=0x7f0da785f940
<qmp_commands>, request=0x7f0da87b8400)
     at qapi/qmp-dispatch.c:131
#9  0x00007f0da6d6a0a1 in handle_qmp_command (parser=0x7f0da874c150,
tokens=0x7f0da870e2e0)
     at /root/qemu/monitor.c:3834
#10 0x00007f0da71e0c62 in json_message_process_token
(lexer=0x7f0da874c158, input=0x7f0da870e200,
     type=JSON_RCURLY, x=422, y=35) at qobject/json-streamer.c:105
#11 0x00007f0da720ba27 in json_lexer_feed_char (lexer=0x7f0da874c158,
ch=125 '}', flush=false)
     at qobject/json-lexer.c:319
#12 0x00007f0da720bb67 in json_lexer_feed (lexer=0x7f0da874c158,
buffer=0x7ffec3894320 "}", size=1)
     at qobject/json-lexer.c:369
#13 0x00007f0da71e0d02 in json_message_parser_feed
(parser=0x7f0da874c150, buffer=0x7ffec3894320 "}",
     size=1) at qobject/json-streamer.c:124
#14 0x00007f0da6d6a263 in monitor_qmp_read (opaque=0x7f0da874c0d0,
buf=0x7ffec3894320 "}", size=1)
     at /root/qemu/monitor.c:3876
#15 0x00007f0da717af53 in qemu_chr_be_write_impl (s=0x7f0da871c3f0,
buf=0x7ffec3894320 "}", len=1)
     at chardev/char.c:167
#16 0x00007f0da717afbb in qemu_chr_be_write (s=0x7f0da871c3f0,
buf=0x7ffec3894320 "}", len=1)
     at chardev/char.c:179
#17 0x00007f0da7183204 in tcp_chr_read (chan=0x7f0daa211680,
cond=G_IO_IN, opaque=0x7f0da871c3f0)
     at chardev/char-socket.c:441
#18 0x00007f0da7196e93 in qio_channel_fd_source_dispatch
(source=0x7f0da871cb80,
     callback=0x7f0da7183061 <tcp_chr_read>, user_data=0x7f0da871c3f0)
at io/channel-watch.c:84
#19 0x00007f0da55fbd7a in g_main_dispatch (context=0x7f0da870eb90) at
gmain.c:3152
#20 g_main_context_dispatch (context=0x7f0da870eb90) at gmain.c:3767
#21 0x00007f0da71e9224 in glib_pollfds_poll () at util/main-loop.c:213
#22 0x00007f0da71e9322 in os_host_main_loop_wait (timeout=811021610) at
util/main-loop.c:261
#23 0x00007f0da71e93de in main_loop_wait (nonblocking=0) at
util/main-loop.c:517
#24 0x00007f0da6ea6786 in main_loop () at vl.c:1918
#25 0x00007f0da6eae43d in main (argc=65, argv=0x7ffec38958b8,
envp=0x7ffec3895ac8) at vl.c:4752

Thread 7 (Thread 0x7f0da04f6700 (LWP 973297)):
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f0da4c93d1d in _L_lock_840 () from /lib64/libpthread.so.0
#2  0x00007f0da4c93c3a in __GI___pthread_mutex_lock
(mutex=0x7f0da871dc80) at pthread_mutex_lock.c:85
#3  0x00007f0da71ed62a in qemu_mutex_lock (mutex=0x7f0da871dc80) at
util/qemu-thread-posix.c:61
#4  0x00007f0da71e6d1d in aio_context_acquire (ctx=0x7f0da871dc20) at
util/async.c:489
#5  0x00007f0da71e6945 in co_schedule_bh_cb (opaque=0x7f0da871dc20) at
util/async.c:390
#6  0x00007f0da71e60ee in aio_bh_call (bh=0x7f0da871dd50) at util/async.c:90
#7  0x00007f0da71e6185 in aio_bh_poll (ctx=0x7f0da871dc20) at
util/async.c:118
#8  0x00007f0da71eb212 in aio_poll (ctx=0x7f0da871dc20, blocking=true)
at util/aio-posix.c:682
#9  0x00007f0da6e9e1ec in iothread_run (opaque=0x7f0da871da90) at
iothread.c:59
#10 0x00007f0da4c91dc5 in start_thread (arg=0x7f0da04f6700) at
pthread_create.c:308
#11 0x00007f0da49c073d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 is waiting for the IOThread (Thread 7) to execute the coroutine
(bdrv_flush_co_entry), but the IOThread is blocked on the lock held by
Thread 1.

Maybe we should take the lock only if state->aio_context ==
qemu_get_current_aio_context().

>> Signed-off-by: Stefan Hajnoczi <[hidden email]>
>> Reviewed-by: Eric Blake <[hidden email]>
>> Reviewed-by: Paolo Bonzini <[hidden email]>
>> ---
>>   migration/savevm.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index f5e8194..3ca319f 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -2150,6 +2150,14 @@ int save_vmstate(const char *name, Error **errp)
>>           goto the_end;
>>       }
>> +    /* The bdrv_all_create_snapshot() call that follows acquires the
>> AioContext
>> +     * for itself.  BDRV_POLL_WHILE() does not support nested locking
>> because
>> +     * it only releases the lock once.  Therefore synchronous I/O
>> will deadlock
>> +     * unless we release the AioContext before
>> bdrv_all_create_snapshot().
>> +     */
>> +    aio_context_release(aio_context);
>> +    aio_context = NULL;
>> +
>>       ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
>>       if (ret < 0) {
>>           error_setg(errp, "Error while creating snapshot on '%s'",
>> @@ -2160,7 +2168,9 @@ int save_vmstate(const char *name, Error **errp)
>>       ret = 0;
>>    the_end:
>> -    aio_context_release(aio_context);
>> +    if (aio_context) {
>> +        aio_context_release(aio_context);
>> +    }
>>       if (saved_vm_running) {
>>           vm_start();
>>       }
>>
>

Re: [PATCH v3 3/4] migration: avoid recursive AioContext locking in save_vmstate()

Kevin Wolf-5
On 14.06.2017 at 15:15, Pavel Butsykin wrote:

> On 14.06.2017 13:10, Pavel Butsykin wrote:
> >
> >On 22.05.2017 16:57, Stefan Hajnoczi wrote:
> >>AioContext was designed to allow nested acquire/release calls.  It uses
> >>a recursive mutex so callers don't need to worry about nesting...or so
> >>we thought.
> >>
> >>BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
> >>the AioContext temporarily around aio_poll().  This gives IOThreads a
> >>chance to acquire the AioContext to process I/O completions.
> >>
> >>It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
> >>BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
> >>will not be able to acquire the AioContext if it was acquired
> >>multiple times.
> >>
> >>Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
> >>this patch simply avoids nested locking in save_vmstate().  It's the
> >>simplest fix and we should step back to consider the big picture with
> >>all the recent changes to block layer threading.
> >>
> >>This patch is the final fix to solve 'savevm' hanging with -object
> >>iothread.
> >
> >I see the same problem in external_snapshot_prepare():
> >[...]
> >and at the point of BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE)
> >we hold the lock at least twice, so here is another deadlock.
>
> Sorry, this is a different kind of deadlock. In the external_snapshot case,
> the deadlock can happen only if state->old_bs->aio_context == my_iothread->ctx,
> because in that case aio_co_enter() always calls aio_co_schedule():

Can you please write a qemu-iotests case for any deadlock case that we're
seeing? Stefan, we could also use one for the bug fixed in this series.

Kevin

Re: [PATCH v3 3/4] migration: avoid recursive AioContext locking in save_vmstate()

Pavel Butsykin-2
On 14.06.2017 17:43, Kevin Wolf wrote:

> On 14.06.2017 at 15:15, Pavel Butsykin wrote:
>> On 14.06.2017 13:10, Pavel Butsykin wrote:
>>>
>>> On 22.05.2017 16:57, Stefan Hajnoczi wrote:
>>>> AioContext was designed to allow nested acquire/release calls.  It uses
>>>> a recursive mutex so callers don't need to worry about nesting...or so
>>>> we thought.
>>>>
>>>> BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
>>>> the AioContext temporarily around aio_poll().  This gives IOThreads a
>>>> chance to acquire the AioContext to process I/O completions.
>>>>
>>>> It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
>>>> BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
>>>> will not be able to acquire the AioContext if it was acquired
>>>> multiple times.
>>>>
>>>> Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
>>>> this patch simply avoids nested locking in save_vmstate().  It's the
>>>> simplest fix and we should step back to consider the big picture with
>>>> all the recent changes to block layer threading.
>>>>
>>>> This patch is the final fix to solve 'savevm' hanging with -object
>>>> iothread.
>>>
>>> I see the same problem in external_snapshot_prepare():
>>> [...]
>>> and at the point of BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE)
>>> we hold the lock at least twice, so here is another deadlock.
>>
>> Sorry, this is a different kind of deadlock. In the external_snapshot case,
>> the deadlock can happen only if state->old_bs->aio_context == my_iothread->ctx,
>> because in that case aio_co_enter() always calls aio_co_schedule():
>
> Can you please write a qemu-iotests case for any deadlock case that we're
> seeing? Stefan, we could also use one for the bug fixed in this series.
It's the 085 test; you only need to enable the iothread (patch attached).
# ./check -qcow2 085 -iothread
...
+Timeout waiting for return on handle 0
Failures: 085
Failed 1 of 1 tests

The timeout is because of the deadlock.


Actually, the deadlock happens for the same reason :D

A recursive lock occurs when the transaction contains more than one action:
void qmp_transaction(TransactionActionList *dev_list,
...
     /* We don't do anything in this loop that commits us to the operations */
     while (NULL != dev_entry) {
...
         state->ops->prepare(state, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             goto delete_and_fail;
         }
     }

     QSIMPLEQ_FOREACH(state, &snap_bdrv_states, entry) {
         if (state->ops->commit) {
             state->ops->commit(state);

There the context lock is acquired in state->ops->prepare(), but is
released in state->ops->commit() (at bdrv_reopen_multiple()). And when
a transaction has two or more actions, we will see a recursive lock.
Unfortunately I have no idea how cheap it is to fix this.

> Kevin
>

Attachment: 0001-qemu-iotests-add-iothread-option.patch (3K)