[PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu

Lukas Straub
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks which get
executed with the yank command. For example the callback can shutdown() a
socket. This is intended for the colo use-case, but it can be used for other
things too of course.

Regards,
Lukas Straub

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before  creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_aqcuire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Lukas Straub (8):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document thread-safety of qio_channel_shutdown
  MAINTAINERS: Add myself as maintainer for yank feature
  tests/test-char.c: Wait for the chardev to connect in
    char_socket_client_dupid_test

 MAINTAINERS                   |   6 ++
 block/nbd.c                   | 129 +++++++++++++++---------
 chardev/char-socket.c         |  31 ++++++
 include/io/channel.h          |   2 +
 include/qemu/yank.h           |  80 +++++++++++++++
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  12 +++
 migration/migration.c         |  25 ++++-
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   6 ++
 migration/savevm.c            |   6 ++
 qapi/misc.json                |  45 +++++++++
 tests/Makefile.include        |   2 +-
 tests/test-char.c             |   1 +
 util/Makefile.objs            |   1 +
 util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
 16 files changed, 493 insertions(+), 53 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

--
2.20.1

attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[PATCH v7 1/8] Introduce yank feature

Lukas Straub
The yank feature allows to recover from hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub <[hidden email]>
Acked-by: Stefan Hajnoczi <[hidden email]>
---
 include/qemu/yank.h |  80 +++++++++++++++++++
 qapi/misc.json      |  45 +++++++++++
 util/Makefile.objs  |   1 +
 util/yank.c         | 184 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 310 insertions(+)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

diff --git a/include/qemu/yank.h b/include/qemu/yank.h
new file mode 100644
index 0000000000..cd184fcd05
--- /dev/null
+++ b/include/qemu/yank.h
@@ -0,0 +1,80 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <[hidden email]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef YANK_H
+#define YANK_H
+
+typedef void (YankFn) (void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The globally unique name of the instance.
+ * @errp: ...
+ */
+void yank_register_instance(const char *instance_name, Error **errp);
+
+/**
+ * yank_unregister_instance: Unregister a instance.
+ *
+ * This unregisters a instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance.
+ */
+void yank_unregister_instance(const char *instance_name);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance
+ * @func: The yank function
+ * @opaque: Will be passed to the yank function
+ */
+void yank_register_function(const char *instance_name,
+                            YankFn *func,
+                            void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance
+ * @func: func that was passed to yank_register_function
+ * @opaque: opaque that was passed to yank_register_function
+ */
+void yank_unregister_function(const char *instance_name,
+                              YankFn *func,
+                              void *opaque);
+
+/**
+ * yank_unregister_function: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+#endif
diff --git a/qapi/misc.json b/qapi/misc.json
index 9d32820dc1..0d6a8f20b7 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1615,3 +1615,48 @@
 ##
 { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }

+##
+# @YankInstances:
+#
+# @instances: List of yank instances.
+#
+# Yank instances are named after the following schema:
+# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
+#
+# Since: 5.1
+##
+{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
+
+##
+# @yank:
+#
+# Recover from hanging qemu by yanking the specified instances.
+#
+# Takes @YankInstances as argument.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
+# <- { "return": {} }
+#
+# Since: 5.1
+##
+{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
+
+##
+# @query-yank:
+#
+# Query yank instances.
+#
+# Returns: @YankInstances
+#
+# Example:
+#
+# -> { "execute": "query-yank" }
+# <- { "return": { "instances": ["blockdev:nbd0"] } }
+#
+# Since: 5.1
+##
+{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
diff --git a/util/Makefile.objs b/util/Makefile.objs
index cc5e37177a..13faa98425 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -45,6 +45,7 @@ util-obj-$(CONFIG_GIO) += dbus.o
 dbus.o-cflags = $(GIO_CFLAGS)
 dbus.o-libs = $(GIO_LIBS)
 util-obj-$(CONFIG_USER_ONLY) += selfmap.o
+util-obj-y += yank.o

 #######################################################################
 # code used by both qemu system emulation and qemu-img
diff --git a/util/yank.c b/util/yank.c
new file mode 100644
index 0000000000..b0cd27728b
--- /dev/null
+++ b/util/yank.c
@@ -0,0 +1,184 @@
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <[hidden email]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/thread.h"
+#include "qemu/queue.h"
+#include "qapi/qapi-commands-misc.h"
+#include "io/channel.h"
+#include "qemu/yank.h"
+
+struct YankFuncAndParam {
+    YankFn *func;
+    void *opaque;
+    QLIST_ENTRY(YankFuncAndParam) next;
+};
+
+struct YankInstance {
+    char *name;
+    QLIST_HEAD(, YankFuncAndParam) yankfns;
+    QLIST_ENTRY(YankInstance) next;
+};
+
+static QemuMutex lock;
+static QLIST_HEAD(yankinst_list, YankInstance) head
+    = QLIST_HEAD_INITIALIZER(head);
+
+static struct YankInstance *yank_find_instance(const char *name)
+{
+    struct YankInstance *tmp, *instance;
+    instance = NULL;
+    QLIST_FOREACH(tmp, &head, next) {
+        if (!strcmp(tmp->name, name)) {
+            instance = tmp;
+        }
+    }
+    return instance;
+}
+
+void yank_register_instance(const char *instance_name, Error **errp)
+{
+    struct YankInstance *instance;
+
+    qemu_mutex_lock(&lock);
+
+    if (yank_find_instance(instance_name)) {
+        error_setg(errp, "duplicate yank instance name: '%s'", instance_name);
+        qemu_mutex_unlock(&lock);
+        return;
+    }
+
+    instance = g_slice_new(struct YankInstance);
+    instance->name = g_strdup(instance_name);
+    QLIST_INIT(&instance->yankfns);
+    QLIST_INSERT_HEAD(&head, instance, next);
+
+    qemu_mutex_unlock(&lock);
+}
+
+void yank_unregister_instance(const char *instance_name)
+{
+    struct YankInstance *instance;
+
+    qemu_mutex_lock(&lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    assert(QLIST_EMPTY(&instance->yankfns));
+    QLIST_REMOVE(instance, next);
+    g_free(instance->name);
+    g_slice_free(struct YankInstance, instance);
+
+    qemu_mutex_unlock(&lock);
+}
+
+void yank_register_function(const char *instance_name,
+                            YankFn *func,
+                            void *opaque)
+{
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    entry = g_slice_new(struct YankFuncAndParam);
+    entry->func = func;
+    entry->opaque = opaque;
+
+    QLIST_INSERT_HEAD(&instance->yankfns, entry, next);
+    qemu_mutex_unlock(&lock);
+}
+
+void yank_unregister_function(const char *instance_name,
+                              YankFn *func,
+                              void *opaque)
+{
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    QLIST_FOREACH(entry, &instance->yankfns, next) {
+        if (entry->func == func && entry->opaque == opaque) {
+            QLIST_REMOVE(entry, next);
+            g_slice_free(struct YankFuncAndParam, entry);
+            qemu_mutex_unlock(&lock);
+            return;
+        }
+    }
+
+    abort();
+}
+
+void yank_generic_iochannel(void *opaque)
+{
+    QIOChannel *ioc = QIO_CHANNEL(opaque);
+
+    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
+void qmp_yank(strList *instances,
+              Error **errp)
+{
+    strList *tmp;
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&lock);
+    tmp = instances;
+    for (; tmp; tmp = tmp->next) {
+        instance = yank_find_instance(tmp->value);
+        if (!instance) {
+            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+                      "Instance '%s' not found", tmp->value);
+            qemu_mutex_unlock(&lock);
+            return;
+        }
+    }
+    tmp = instances;
+    for (; tmp; tmp = tmp->next) {
+        instance = yank_find_instance(tmp->value);
+        assert(instance);
+        QLIST_FOREACH(entry, &instance->yankfns, next) {
+            entry->func(entry->opaque);
+        }
+    }
+    qemu_mutex_unlock(&lock);
+}
+
+YankInstances *qmp_query_yank(Error **errp)
+{
+    struct YankInstance *instance;
+    YankInstances *ret;
+
+    ret = g_new0(YankInstances, 1);
+    ret->instances = NULL;
+
+    qemu_mutex_lock(&lock);
+    QLIST_FOREACH(instance, &head, next) {
+        strList *entry;
+        entry = g_new0(strList, 1);
+        entry->value = g_strdup(instance->name);
+        entry->next = ret->instances;
+        ret->instances = entry;
+    }
+    qemu_mutex_unlock(&lock);
+
+    return ret;
+}
+
+static void __attribute__((__constructor__)) yank_init(void)
+{
+    qemu_mutex_init(&lock);
+}
--
2.20.1


attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[PATCH v7 2/8] block/nbd.c: Add yank feature

Lukas Straub
In reply to this post by Lukas Straub
Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occured.

Signed-off-by: Lukas Straub <[hidden email]>
Acked-by: Stefan Hajnoczi <[hidden email]>
---
 block/nbd.c | 129 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 80 insertions(+), 49 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 7bb881fef4..8632cf5340 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,6 +35,7 @@
 #include "qemu/option.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
+#include "qemu/atomic.h"

 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
@@ -43,6 +44,8 @@
 #include "block/nbd.h"
 #include "block/block_int.h"

+#include "qemu/yank.h"
+
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS    16

@@ -84,6 +87,8 @@ typedef struct BDRVNBDState {
     NBDReply reply;
     BlockDriverState *bs;

+    char *yank_name;
+
     /* Connection parameters */
     uint32_t reconnect_delay;
     SocketAddress *saddr;
@@ -93,10 +98,10 @@ typedef struct BDRVNBDState {
     char *x_dirty_bitmap;
 } BDRVNBDState;

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp);
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-                                Error **errp);
+static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
+                                    Error **errp);
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
+static void nbd_yank(void *opaque);

 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
@@ -109,17 +114,19 @@ static void nbd_clear_bdrvstate(BDRVNBDState *s)
     s->tlscredsid = NULL;
     g_free(s->x_dirty_bitmap);
     s->x_dirty_bitmap = NULL;
+    g_free(s->yank_name);
+    s->yank_name = NULL;
 }

 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
     if (ret == -EIO) {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
                                             NBD_CLIENT_CONNECTING_NOWAIT;
         }
     } else {
-        if (s->state == NBD_CLIENT_CONNECTED) {
+        if (atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
         }
         s->state = NBD_CLIENT_QUIT;
@@ -170,7 +177,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs,
      * s->connection_co is either yielded from nbd_receive_reply or from
      * nbd_co_reconnect_loop()
      */
-    if (s->state == NBD_CLIENT_CONNECTED) {
+    if (atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
         qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
     }

@@ -237,20 +244,20 @@ static void nbd_teardown_connection(BlockDriverState *bs)

 static bool nbd_client_connecting(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT ||
-        s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+    NBDClientState state = atomic_load_acquire(&s->state);
+    return state == NBD_CLIENT_CONNECTING_WAIT ||
+        state == NBD_CLIENT_CONNECTING_NOWAIT;
 }

 static bool nbd_client_connecting_wait(BDRVNBDState *s)
 {
-    return s->state == NBD_CLIENT_CONNECTING_WAIT;
+    return atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }

 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
     int ret;
     Error *local_err = NULL;
-    QIOChannelSocket *sioc;

     if (!nbd_client_connecting(s)) {
         return;
@@ -283,21 +290,21 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     /* Finalize previous connection if any */
     if (s->ioc) {
         nbd_client_detach_aio_context(s->bs);
+        yank_unregister_function(s->yank_name, nbd_yank, s->bs);
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
         s->ioc = NULL;
     }

-    sioc = nbd_establish_connection(s->saddr, &local_err);
-    if (!sioc) {
+    if (nbd_establish_connection(s->bs, s->saddr, &local_err) < 0) {
         ret = -ECONNREFUSED;
         goto out;
     }

     bdrv_dec_in_flight(s->bs);

-    ret = nbd_client_handshake(s->bs, sioc, &local_err);
+    ret = nbd_client_handshake(s->bs, &local_err);

     if (s->drained) {
         s->wait_drained_end = true;
@@ -334,7 +341,7 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
     nbd_reconnect_attempt(s);

     while (nbd_client_connecting(s)) {
-        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
+        if (atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT &&
             qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
         {
             s->state = NBD_CLIENT_CONNECTING_NOWAIT;
@@ -371,7 +378,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     int ret = 0;
     Error *local_err = NULL;

-    while (s->state != NBD_CLIENT_QUIT) {
+    while (atomic_load_acquire(&s->state) != NBD_CLIENT_QUIT) {
         /*
          * The NBD client can only really be considered idle when it has
          * yielded from qio_channel_readv_all_eof(), waiting for data. This is
@@ -386,7 +393,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
             nbd_co_reconnect_loop(s);
         }

-        if (s->state != NBD_CLIENT_CONNECTED) {
+        if (atomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
             continue;
         }

@@ -441,6 +448,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     s->connection_co = NULL;
     if (s->ioc) {
         nbd_client_detach_aio_context(s->bs);
+        yank_unregister_function(s->yank_name, nbd_yank, s->bs);
         object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         object_unref(OBJECT(s->ioc));
@@ -465,7 +473,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
         qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
     }

-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (atomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         rc = -EIO;
         goto err;
     }
@@ -492,7 +500,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
-        if (rc >= 0 && s->state == NBD_CLIENT_CONNECTED) {
+        if (rc >= 0 && atomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
             if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                        NULL) < 0) {
                 rc = -EIO;
@@ -807,7 +815,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     s->requests[i].receiving = true;
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (atomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         error_setg(errp, "Connection closed");
         return -EIO;
     }
@@ -966,7 +974,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
     NBDReply local_reply;
     NBDStructuredReplyChunk *chunk;
     Error *local_err = NULL;
-    if (s->state != NBD_CLIENT_CONNECTED) {
+    if (atomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         error_setg(&local_err, "Connection closed");
         nbd_iter_channel_error(iter, -EIO, &local_err);
         goto break_loop;
@@ -991,7 +999,8 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
     }

     /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
-    if (nbd_reply_is_simple(reply) || s->state != NBD_CLIENT_CONNECTED) {
+    if (nbd_reply_is_simple(reply) ||
+        atomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
         goto break_loop;
     }

@@ -1423,6 +1432,15 @@ static int nbd_client_reopen_prepare(BDRVReopenState *state,
     return 0;
 }

+static void nbd_yank(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+
+    atomic_store_release(&s->state, NBD_CLIENT_QUIT);
+    qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
 static void nbd_client_close(BlockDriverState *bs)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
@@ -1435,52 +1453,53 @@ static void nbd_client_close(BlockDriverState *bs)
     nbd_teardown_connection(bs);
 }

-static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
-                                                  Error **errp)
+static int nbd_establish_connection(BlockDriverState *bs,
+                                    SocketAddress *saddr,
+                                    Error **errp)
 {
     ERRP_GUARD();
-    QIOChannelSocket *sioc;
+    BDRVNBDState *s = (BDRVNBDState *)bs->opaque;

-    sioc = qio_channel_socket_new();
-    qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client");
+    s->sioc = qio_channel_socket_new();
+    qio_channel_set_name(QIO_CHANNEL(s->sioc), "nbd-client");
+    yank_register_function(s->yank_name, nbd_yank, bs);

-    qio_channel_socket_connect_sync(sioc, saddr, errp);
+    qio_channel_socket_connect_sync(s->sioc, saddr, errp);
     if (*errp) {
-        object_unref(OBJECT(sioc));
-        return NULL;
+        yank_unregister_function(s->yank_name, nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
+        s->sioc = NULL;
+        return -1;
     }

-    qio_channel_set_delay(QIO_CHANNEL(sioc), false);
+    qio_channel_set_delay(QIO_CHANNEL(s->sioc), false);

-    return sioc;
+    return 0;
 }

-/* nbd_client_handshake takes ownership on sioc. On failure it is unref'ed. */
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-                                Error **errp)
+/* nbd_client_handshake takes ownership on s->sioc. On failure it's unref'ed. */
+static int nbd_client_handshake(BlockDriverState *bs, Error **errp)
 {
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
     AioContext *aio_context = bdrv_get_aio_context(bs);
     int ret;

     trace_nbd_client_handshake(s->export);
-
-    s->sioc = sioc;
-
-    qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-    qio_channel_attach_aio_context(QIO_CHANNEL(sioc), aio_context);
+    qio_channel_set_blocking(QIO_CHANNEL(s->sioc), false, NULL);
+    qio_channel_attach_aio_context(QIO_CHANNEL(s->sioc), aio_context);

     s->info.request_sizes = true;
     s->info.structured_reply = true;
     s->info.base_allocation = true;
     s->info.x_dirty_bitmap = g_strdup(s->x_dirty_bitmap);
     s->info.name = g_strdup(s->export ?: "");
-    ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(sioc), s->tlscreds,
+    ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(s->sioc), s->tlscreds,
                                 s->hostname, &s->ioc, &s->info, errp);
     g_free(s->info.x_dirty_bitmap);
     g_free(s->info.name);
     if (ret < 0) {
-        object_unref(OBJECT(sioc));
+        yank_unregister_function(s->yank_name, nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
         s->sioc = NULL;
         return ret;
     }
@@ -1508,7 +1527,7 @@ static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
     }

     if (!s->ioc) {
-        s->ioc = QIO_CHANNEL(sioc);
+        s->ioc = QIO_CHANNEL(s->sioc);
         object_ref(OBJECT(s->ioc));
     }

@@ -1524,9 +1543,10 @@ static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
     {
         NBDRequest request = { .type = NBD_CMD_DISC };

-        nbd_send_request(s->ioc ?: QIO_CHANNEL(sioc), &request);
+        nbd_send_request(s->ioc ?: QIO_CHANNEL(s->sioc), &request);

-        object_unref(OBJECT(sioc));
+        yank_unregister_function(s->yank_name, nbd_yank, bs);
+        object_unref(OBJECT(s->sioc));
         s->sioc = NULL;

         return ret;
@@ -1918,7 +1938,6 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
 {
     int ret;
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-    QIOChannelSocket *sioc;

     ret = nbd_process_options(bs, options, errp);
     if (ret < 0) {
@@ -1929,17 +1948,28 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_co_mutex_init(&s->send_mutex);
     qemu_co_queue_init(&s->free_sema);

+    s->yank_name = g_strconcat("blockdev:", bs->node_name, NULL);
+    yank_register_instance(s->yank_name, errp);
+    if (*errp) {
+        g_free(s->yank_name);
+        s->yank_name = NULL;
+        return -EEXIST;
+    }
+
     /*
      * establish TCP connection, return error if it fails
      * TODO: Configurable retry-until-timeout behaviour.
      */
-    sioc = nbd_establish_connection(s->saddr, errp);
-    if (!sioc) {
+    if (nbd_establish_connection(bs, s->saddr, errp) < 0) {
+        yank_unregister_instance(s->yank_name);
+        g_free(s->yank_name);
+        s->yank_name = NULL;
         return -ECONNREFUSED;
     }

-    ret = nbd_client_handshake(bs, sioc, errp);
+    ret = nbd_client_handshake(bs, errp);
     if (ret < 0) {
+        yank_unregister_instance(s->yank_name);
         nbd_clear_bdrvstate(s);
         return ret;
     }
@@ -1997,6 +2027,7 @@ static void nbd_close(BlockDriverState *bs)
     BDRVNBDState *s = bs->opaque;

     nbd_client_close(bs);
+    yank_unregister_instance(s->yank_name);
     nbd_clear_bdrvstate(s);
 }

--
2.20.1


attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[PATCH v7 3/8] chardev/char-socket.c: Add yank feature

Lukas Straub
In reply to this post by Lukas Straub
Register a yank function to shutdown the socket on yank.

Signed-off-by: Lukas Straub <[hidden email]>
Acked-by: Stefan Hajnoczi <[hidden email]>
---
 chardev/char-socket.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index ef62dbf3d7..8e2865ca83 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -34,6 +34,7 @@
 #include "qapi/error.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/yank.h"

 #include "chardev/char-io.h"

@@ -69,6 +70,7 @@ typedef struct {
     size_t read_msgfds_num;
     int *write_msgfds;
     size_t write_msgfds_num;
+    char *yank_name;

     SocketAddress *addr;
     bool is_listen;
@@ -413,6 +415,11 @@ static void tcp_chr_free_connection(Chardev *chr)

     tcp_set_msgfds(chr, NULL, 0);
     remove_fd_in_watch(chr);
+    if (s->state == TCP_CHARDEV_STATE_CONNECTING
+        || s->state == TCP_CHARDEV_STATE_CONNECTED) {
+        yank_unregister_function(s->yank_name, yank_generic_iochannel,
+                                 QIO_CHANNEL(s->sioc));
+    }
     object_unref(OBJECT(s->sioc));
     s->sioc = NULL;
     object_unref(OBJECT(s->ioc));
@@ -916,6 +923,8 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
     }
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     ret = tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return ret;
@@ -930,6 +939,8 @@ static void tcp_chr_accept(QIONetListener *listener,

     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     tcp_chr_set_client_ioc_name(chr, cioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(cioc));
     tcp_chr_new_client(chr, cioc);
 }

@@ -945,6 +956,8 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp)
         object_unref(OBJECT(sioc));
         return -1;
     }
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
     return 0;
@@ -960,6 +973,8 @@ static void tcp_chr_accept_server_sync(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_net_listener_wait_client(s->listener);
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     tcp_chr_new_client(chr, sioc);
     object_unref(OBJECT(sioc));
 }
@@ -1070,6 +1085,10 @@ static void char_socket_finalize(Object *obj)
         object_unref(OBJECT(s->tls_creds));
     }
     g_free(s->tls_authz);
+    if (s->yank_name) {
+        yank_unregister_instance(s->yank_name);
+        g_free(s->yank_name);
+    }

     qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -1085,6 +1104,8 @@ static void qemu_chr_socket_connected(QIOTask *task, void *opaque)

     if (qio_task_propagate_error(task, &err)) {
         tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+        yank_unregister_function(s->yank_name, yank_generic_iochannel,
+                                 QIO_CHANNEL(sioc));
         check_report_connect_error(chr, err);
         goto cleanup;
     }
@@ -1118,6 +1139,8 @@ static void tcp_chr_connect_client_async(Chardev *chr)
     tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
     sioc = qio_channel_socket_new();
     tcp_chr_set_client_ioc_name(chr, sioc);
+    yank_register_function(s->yank_name, yank_generic_iochannel,
+                           QIO_CHANNEL(sioc));
     /*
      * Normally code would use the qio_channel_socket_connect_async
      * method which uses a QIOTask + qio_task_set_error internally
@@ -1360,6 +1383,14 @@ static void qmp_chardev_open_socket(Chardev *chr,
         qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
     }

+    s->yank_name = g_strconcat("chardev:", chr->label, NULL);
+    yank_register_instance(s->yank_name, errp);
+    if (*errp) {
+        g_free(s->yank_name);
+        s->yank_name = NULL;
+        return;
+    }
+
     /* be isn't opened until we get a connection */
     *be_opened = false;

--
2.20.1


attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[PATCH v7 4/8] migration: Add yank feature

Lukas Straub
In reply to this post by Lukas Straub
Register yank functions on sockets to shut them down.

Signed-off-by: Lukas Straub <[hidden email]>
Acked-by: Stefan Hajnoczi <[hidden email]>
---
 migration/channel.c           | 12 ++++++++++++
 migration/migration.c         | 25 ++++++++++++++++++++++++-
 migration/multifd.c           | 10 ++++++++++
 migration/qemu-file-channel.c |  6 ++++++
 migration/savevm.c            |  6 ++++++
 tests/Makefile.include        |  2 +-
 6 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index 20e4c8e2dc..21fc8046b9 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -18,6 +18,8 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
+#include "io/channel-socket.h"
+#include "qemu/yank.h"

 /**
  * @migration_channel_process_incoming - Create new incoming migration channel
@@ -35,6 +37,11 @@ void migration_channel_process_incoming(QIOChannel *ioc)
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));

+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+        yank_register_function("migration", yank_generic_iochannel,
+                               QIO_CHANNEL(ioc));
+    }
+
     if (s->parameters.tls_creds &&
         *s->parameters.tls_creds &&
         !object_dynamic_cast(OBJECT(ioc),
@@ -67,6 +74,11 @@ void migration_channel_connect(MigrationState *s,
         ioc, object_get_typename(OBJECT(ioc)), hostname, error);

     if (!error) {
+        if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)) {
+            yank_register_function("migration", yank_generic_iochannel,
+                                   QIO_CHANNEL(ioc));
+        }
+
         if (s->parameters.tls_creds &&
             *s->parameters.tls_creds &&
             !object_dynamic_cast(OBJECT(ioc),
diff --git a/migration/migration.c b/migration/migration.c
index 8fe36339db..e4818edb2a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -55,6 +55,7 @@
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "qemu/yank.h"

 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */

@@ -243,6 +244,8 @@ void migration_incoming_state_destroy(void)
         qapi_free_SocketAddressList(mis->socket_address_list);
         mis->socket_address_list = NULL;
     }
+
+    yank_unregister_instance("migration");
 }

 static void migrate_generate_event(int new_state)
@@ -379,8 +382,14 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;

+    yank_register_instance("migration", errp);
+    if (*errp) {
+        return;
+    }
+
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (!strcmp(uri, "defer")) {
+        yank_unregister_instance("migration");
         deferred_incoming_migration(errp);
     } else if (strstart(uri, "tcp:", &p)) {
         tcp_start_incoming_migration(p, errp);
@@ -395,6 +404,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_incoming_migration(p, errp);
     } else {
+        yank_unregister_instance("migration");
         error_setg(errp, "unknown migration protocol: %s", uri);
     }
 }
@@ -1662,6 +1672,7 @@ static void migrate_fd_cleanup(MigrationState *s)
     }
     notifier_list_notify(&migration_state_notifiers, s);
     block_cleanup_parameters(s);
+    yank_unregister_instance("migration");
 }

 static void migrate_fd_cleanup_schedule(MigrationState *s)
@@ -1935,6 +1946,7 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * only re-setup the migration stream and poke existing migration
      * to continue using that newly established channel.
      */
+    yank_unregister_instance("migration");
     qemu_start_incoming_migration(uri, errp);
 }

@@ -2071,7 +2083,12 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         /* Error detected, put into errp */
         return;
     }
-
+    if (!(has_resume && resume)) {
+        yank_register_instance("migration", errp);
+        if (*errp) {
+            return;
+        }
+    }
     if (strstart(uri, "tcp:", &p)) {
         tcp_start_outgoing_migration(s, p, &local_err);
 #ifdef CONFIG_RDMA
@@ -2085,6 +2102,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     } else if (strstart(uri, "fd:", &p)) {
         fd_start_outgoing_migration(s, p, &local_err);
     } else {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance("migration");
+        }
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
                    "a valid migration protocol");
         migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
@@ -2094,6 +2114,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     }

     if (local_err) {
+        if (!(has_resume && resume)) {
+            yank_unregister_instance("migration");
+        }
         migrate_fd_error(s, local_err);
         error_propagate(errp, local_err);
         return;
diff --git a/migration/multifd.c b/migration/multifd.c
index d0441202aa..2c9863e770 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -24,6 +24,9 @@
 #include "trace.h"
 #include "multifd.h"

+#include "qemu/yank.h"
+#include "io/channel-socket.h"
+
 /* Multiple fd's */

 #define MULTIFD_MAGIC 0x11223344U
@@ -866,6 +869,13 @@ int multifd_load_cleanup(Error **errp)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];

+        if (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET)
+            && OBJECT(p->c)->ref == 1) {
+            yank_unregister_function("migration",
+                                     yank_generic_iochannel,
+                                     QIO_CHANNEL(p->c));
+        }
+
         object_unref(OBJECT(p->c));
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index d2ce32f4b9..d8f8384fea 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -27,6 +27,7 @@
 #include "qemu-file.h"
 #include "io/channel-socket.h"
 #include "qemu/iov.h"
+#include "qemu/yank.h"


 static ssize_t channel_writev_buffer(void *opaque,
@@ -104,6 +105,11 @@ static int channel_close(void *opaque, Error **errp)
     int ret;
     QIOChannel *ioc = QIO_CHANNEL(opaque);
     ret = qio_channel_close(ioc, errp);
+    if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET)
+        && OBJECT(ioc)->ref == 1) {
+        yank_unregister_function("migration", yank_generic_iochannel,
+                                 QIO_CHANNEL(ioc));
+    }
     object_unref(OBJECT(ioc));
     return ret;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index a843d202b5..395b8fa704 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -63,6 +63,7 @@
 #include "migration/colo.h"
 #include "qemu/bitmap.h"
 #include "net/announce.h"
+#include "qemu/yank.h"

 const unsigned int postcopy_ram_discard_version = 0;

@@ -2935,6 +2936,11 @@ int load_snapshot(const char *name, Error **errp)
     qemu_system_reset(SHUTDOWN_CAUSE_NONE);
     mis->from_src_file = f;

+    yank_register_instance("migration", errp);
+    if (*errp) {
+        ret = -EINVAL;
+        goto err_drain;
+    }
     aio_context_acquire(aio_context);
     ret = qemu_loadvm_state(f);
     migration_incoming_state_destroy();
diff --git a/tests/Makefile.include b/tests/Makefile.include
index c7e4646ded..e733918269 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -452,7 +452,7 @@ tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
  $(test-qapi-obj-y)
 tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
  migration/vmstate.o migration/vmstate-types.o migration/qemu-file.o \
-        migration/qemu-file-channel.o migration/qjson.o \
+        migration/qemu-file-channel.o migration/qjson.o util/yank.o \
  $(test-io-obj-y)
 tests/test-timed-average$(EXESUF): tests/test-timed-average.o $(test-util-obj-y)
 tests/test-base64$(EXESUF): tests/test-base64.o $(test-util-obj-y)
--
2.20.1


attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[PATCH v7 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe

Lukas Straub
In reply to this post by Lukas Straub
Make qio_channel_tls_shutdown thread-safe by using atomics when
accessing tioc->shutdown.

Signed-off-by: Lukas Straub <[hidden email]>
Reviewed-by: Daniel P. Berrangé <[hidden email]>
Acked-by: Stefan Hajnoczi <[hidden email]>
---
 io/channel-tls.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f..b350c84640 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -23,6 +23,7 @@
 #include "qemu/module.h"
 #include "io/channel-tls.h"
 #include "trace.h"
+#include "qemu/atomic.h"


 static ssize_t qio_channel_tls_write_handler(const char *buf,
@@ -277,7 +278,8 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                     return QIO_CHANNEL_ERR_BLOCK;
                 }
             } else if (errno == ECONNABORTED &&
-                       (tioc->shutdown & QIO_CHANNEL_SHUTDOWN_READ)) {
+                       (atomic_load_acquire(&tioc->shutdown) &
+                        QIO_CHANNEL_SHUTDOWN_READ)) {
                 return 0;
             }

@@ -361,7 +363,7 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

-    tioc->shutdown |= how;
+    atomic_or(&tioc->shutdown, how);

     return qio_channel_shutdown(tioc->master, how, errp);
 }
--
2.20.1


attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[PATCH v7 6/8] io: Document thread-safety of qio_channel_shutdown

Lukas Straub
In reply to this post by Lukas Straub
Migration and yank code assume that qio_channel_shutdown is thread
-safe. Document this after checking the code.

Signed-off-by: Lukas Straub <[hidden email]>
Reviewed-by: Daniel P. Berrangé <[hidden email]>
Acked-by: Stefan Hajnoczi <[hidden email]>
---
 include/io/channel.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index d4557f0930..6b8a2b87b8 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -516,6 +516,8 @@ int qio_channel_close(QIOChannel *ioc,
  * QIO_CHANNEL_FEATURE_SHUTDOWN prior to calling
  * this method.
  *
+ * This function is thread-safe.
+ *
  * Returns: 0 on success, -1 on error
  */
 int qio_channel_shutdown(QIOChannel *ioc,
--
2.20.1


attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[PATCH v7 7/8] MAINTAINERS: Add myself as maintainer for yank feature

Lukas Straub
In reply to this post by Lukas Straub
I'll maintain this for now as the colo usecase is the first user
of this functionality.

Signed-off-by: Lukas Straub <[hidden email]>
Reviewed-by: Daniel P. Berrangé <[hidden email]>
Acked-by: Stefan Hajnoczi <[hidden email]>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0886eb3d2b..bf5075a637 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2614,6 +2614,12 @@ F: util/uuid.c
 F: include/qemu/uuid.h
 F: tests/test-uuid.c

+Yank feature
+M: Lukas Straub <[hidden email]>
+S: Odd fixes
+F: util/yank.c
+F: include/qemu/yank.h
+
 COLO Framework
 M: zhanghailiang <[hidden email]>
 S: Maintained
--
2.20.1


attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[PATCH v7 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test

Lukas Straub
In reply to this post by Lukas Straub
A connecting chardev object has an additional reference by the connecting
thread, so if the chardev is still connecting by the end of the test,
then the chardev object won't be freed. This in turn means that the yank
instance won't be unregistered and when running the next test-case
yank_register_instance will abort, because the yank instance is
already/still registered.

Signed-off-by: Lukas Straub <[hidden email]>
---
 tests/test-char.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index d35cc839bc..2ced07de69 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -937,6 +937,7 @@ static void char_socket_client_dupid_test(gconstpointer opaque)
     g_assert_nonnull(opts);
     chr1 = qemu_chr_new_from_opts(opts, NULL, &error_abort);
     g_assert_nonnull(chr1);
+    qemu_chr_wait_connected(chr1, &error_abort);

     chr2 = qemu_chr_new_from_opts(opts, NULL, &local_err);
     g_assert_null(chr2);
--
2.20.1

attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu

Lukas Straub
In reply to this post by Lukas Straub
On Tue, 4 Aug 2020 10:11:22 +0200
Lukas Straub <[hidden email]> wrote:

> Hello Everyone,
> In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> to some other server and that server dies or hangs, qemu hangs too.
> These patches introduce the new 'yank' out-of-band qmp command to recover from
> these kinds of hangs. The different subsystems register callbacks which get
> executed with the yank command. For example the callback can shutdown() a
> socket. This is intended for the colo use-case, but it can be used for other
> things too of course.
>
> Regards,
> Lukas Straub
>
> v7:
>  -yank_register_instance now returns error via Error **errp instead of aborting
>  -dropped "chardev/char.c: Check for duplicate id before  creating chardev"
>
> v6:
>  -add Reviewed-by and Acked-by tags
>  -rebase on master
>  -lots of changes in nbd due to rebase
>  -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
>  -fix a crash discovered by the newly added chardev test
>  -fix the test itself
>
> v5:
>  -move yank.c to util/
>  -move yank.h to include/qemu/
>  -add license to yank.h
>  -use const char*
>  -nbd: use atomic_store_release and atomic_load_aqcuire
>  -io-channel: ensure thread-safety and document it
>  -add myself as maintainer for yank
>
> v4:
>  -fix build errors...
>
> v3:
>  -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
>  -fix build errors
>  -rewrite migration patch so it actually passes all tests
>
> v2:
>  -don't touch io/ code anymore
>  -always register yank functions
>  -'yank' now takes a list of instances to yank
>  -'query-yank' returns a list of yankable instances
>
> Lukas Straub (8):
>   Introduce yank feature
>   block/nbd.c: Add yank feature
>   chardev/char-socket.c: Add yank feature
>   migration: Add yank feature
>   io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
>   io: Document thread-safety of qio_channel_shutdown
>   MAINTAINERS: Add myself as maintainer for yank feature
>   tests/test-char.c: Wait for the chardev to connect in
>     char_socket_client_dupid_test
>
>  MAINTAINERS                   |   6 ++
>  block/nbd.c                   | 129 +++++++++++++++---------
>  chardev/char-socket.c         |  31 ++++++
>  include/io/channel.h          |   2 +
>  include/qemu/yank.h           |  80 +++++++++++++++
>  io/channel-tls.c              |   6 +-
>  migration/channel.c           |  12 +++
>  migration/migration.c         |  25 ++++-
>  migration/multifd.c           |  10 ++
>  migration/qemu-file-channel.c |   6 ++
>  migration/savevm.c            |   6 ++
>  qapi/misc.json                |  45 +++++++++
>  tests/Makefile.include        |   2 +-
>  tests/test-char.c             |   1 +
>  util/Makefile.objs            |   1 +
>  util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
>  16 files changed, 493 insertions(+), 53 deletions(-)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 util/yank.c
>
> --
> 2.20.1
Ping...

attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu

Lukas Straub
On Tue, 18 Aug 2020 14:26:31 +0200
Lukas Straub <[hidden email]> wrote:

> On Tue, 4 Aug 2020 10:11:22 +0200
> Lukas Straub <[hidden email]> wrote:
>
> > Hello Everyone,
> > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > to some other server and that server dies or hangs, qemu hangs too.
> > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > these kinds of hangs. The different subsystems register callbacks which get
> > executed with the yank command. For example the callback can shutdown() a
> > socket. This is intended for the colo use-case, but it can be used for other
> > things too of course.
> >
> > Regards,
> > Lukas Straub
> >
> > v7:
> >  -yank_register_instance now returns error via Error **errp instead of aborting
> >  -dropped "chardev/char.c: Check for duplicate id before  creating chardev"
> >
> > v6:
> >  -add Reviewed-by and Acked-by tags
> >  -rebase on master
> >  -lots of changes in nbd due to rebase
> >  -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
> >  -fix a crash discovered by the newly added chardev test
> >  -fix the test itself
> >
> > v5:
> >  -move yank.c to util/
> >  -move yank.h to include/qemu/
> >  -add license to yank.h
> >  -use const char*
> >  -nbd: use atomic_store_release and atomic_load_aqcuire
> >  -io-channel: ensure thread-safety and document it
> >  -add myself as maintainer for yank
> >
> > v4:
> >  -fix build errors...
> >
> > v3:
> >  -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
> >  -fix build errors
> >  -rewrite migration patch so it actually passes all tests
> >
> > v2:
> >  -don't touch io/ code anymore
> >  -always register yank functions
> >  -'yank' now takes a list of instances to yank
> >  -'query-yank' returns a list of yankable instances
> >
> > Lukas Straub (8):
> >   Introduce yank feature
> >   block/nbd.c: Add yank feature
> >   chardev/char-socket.c: Add yank feature
> >   migration: Add yank feature
> >   io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
> >   io: Document thread-safety of qio_channel_shutdown
> >   MAINTAINERS: Add myself as maintainer for yank feature
> >   tests/test-char.c: Wait for the chardev to connect in
> >     char_socket_client_dupid_test
> >
> >  MAINTAINERS                   |   6 ++
> >  block/nbd.c                   | 129 +++++++++++++++---------
> >  chardev/char-socket.c         |  31 ++++++
> >  include/io/channel.h          |   2 +
> >  include/qemu/yank.h           |  80 +++++++++++++++
> >  io/channel-tls.c              |   6 +-
> >  migration/channel.c           |  12 +++
> >  migration/migration.c         |  25 ++++-
> >  migration/multifd.c           |  10 ++
> >  migration/qemu-file-channel.c |   6 ++
> >  migration/savevm.c            |   6 ++
> >  qapi/misc.json                |  45 +++++++++
> >  tests/Makefile.include        |   2 +-
> >  tests/test-char.c             |   1 +
> >  util/Makefile.objs            |   1 +
> >  util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
> >  16 files changed, 493 insertions(+), 53 deletions(-)
> >  create mode 100644 include/qemu/yank.h
> >  create mode 100644 util/yank.c
> >
> > --
> > 2.20.1  
>
> Ping...
Ping 2...

Also, can the different subsystems have a look at this and give their ok?

Regards,
Lukas Straub

attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test

Daniel P. Berrangé
In reply to this post by Lukas Straub
On Tue, Aug 04, 2020 at 10:12:01AM +0200, Lukas Straub wrote:

> A connecting chardev object has an additional reference by the connecting
> thread, so if the chardev is still connecting by the end of the test,
> then the chardev object won't be freed. This in turn means that the yank
> instance won't be unregistered and when running the next test-case
> yank_register_instance will abort, because the yank instance is
> already/still registered.
>
> Signed-off-by: Lukas Straub <[hidden email]>
> ---
>  tests/test-char.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Daniel P. Berrangé <[hidden email]>


Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 1/8] Introduce yank feature

Daniel P. Berrangé
In reply to this post by Lukas Straub
On Tue, Aug 04, 2020 at 10:11:34AM +0200, Lukas Straub wrote:

> The yank feature allows to recover from hanging qemu by "yanking"
> at various parts. Other qemu systems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
>
> Signed-off-by: Lukas Straub <[hidden email]>
> Acked-by: Stefan Hajnoczi <[hidden email]>
> ---
>  include/qemu/yank.h |  80 +++++++++++++++++++
>  qapi/misc.json      |  45 +++++++++++
>  util/Makefile.objs  |   1 +
>  util/yank.c         | 184 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 310 insertions(+)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 util/yank.c

Reviewed-by: Daniel P. Berrangé <[hidden email]>


Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 2/8] block/nbd.c: Add yank feature

Daniel P. Berrangé
In reply to this post by Lukas Straub
On Tue, Aug 04, 2020 at 10:11:37AM +0200, Lukas Straub wrote:
> Register a yank function which shuts down the socket and sets
> s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
> error occured.
>
> Signed-off-by: Lukas Straub <[hidden email]>
> Acked-by: Stefan Hajnoczi <[hidden email]>
> ---
>  block/nbd.c | 129 ++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 80 insertions(+), 49 deletions(-)

Reviewed-by: Daniel P. Berrangé <[hidden email]>


Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 3/8] chardev/char-socket.c: Add yank feature

Daniel P. Berrangé
In reply to this post by Lukas Straub
On Tue, Aug 04, 2020 at 10:11:41AM +0200, Lukas Straub wrote:
> Register a yank function to shutdown the socket on yank.
>
> Signed-off-by: Lukas Straub <[hidden email]>
> Acked-by: Stefan Hajnoczi <[hidden email]>
> ---
>  chardev/char-socket.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)

Reviewed-by: Daniel P. Berrangé <[hidden email]>


Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 4/8] migration: Add yank feature

Daniel P. Berrangé
In reply to this post by Lukas Straub
On Tue, Aug 04, 2020 at 10:11:45AM +0200, Lukas Straub wrote:

> Register yank functions on sockets to shut them down.
>
> Signed-off-by: Lukas Straub <[hidden email]>
> Acked-by: Stefan Hajnoczi <[hidden email]>
> ---
>  migration/channel.c           | 12 ++++++++++++
>  migration/migration.c         | 25 ++++++++++++++++++++++++-
>  migration/multifd.c           | 10 ++++++++++
>  migration/qemu-file-channel.c |  6 ++++++
>  migration/savevm.c            |  6 ++++++
>  tests/Makefile.include        |  2 +-
>  6 files changed, 59 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé <[hidden email]>


Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu

Daniel P. Berrangé
In reply to this post by Lukas Straub
On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:

> On Tue, 18 Aug 2020 14:26:31 +0200
> Lukas Straub <[hidden email]> wrote:
>
> > On Tue, 4 Aug 2020 10:11:22 +0200
> > Lukas Straub <[hidden email]> wrote:
> >
> > > Hello Everyone,
> > > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > > to some other server and that server dies or hangs, qemu hangs too.
> > > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > > these kinds of hangs. The different subsystems register callbacks which get
> > > executed with the yank command. For example the callback can shutdown() a
> > > socket. This is intended for the colo use-case, but it can be used for other
> > > things too of course.
> > >
> > > Regards,
> > > Lukas Straub
> > >
> > > v7:
> > >  -yank_register_instance now returns error via Error **errp instead of aborting
> > >  -dropped "chardev/char.c: Check for duplicate id before  creating chardev"
> > >
> > > v6:
> > >  -add Reviewed-by and Acked-by tags
> > >  -rebase on master
> > >  -lots of changes in nbd due to rebase
> > >  -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
> > >  -fix a crash discovered by the newly added chardev test
> > >  -fix the test itself
> > >
> > > v5:
> > >  -move yank.c to util/
> > >  -move yank.h to include/qemu/
> > >  -add license to yank.h
> > >  -use const char*
> > >  -nbd: use atomic_store_release and atomic_load_aqcuire
> > >  -io-channel: ensure thread-safety and document it
> > >  -add myself as maintainer for yank
> > >
> > > v4:
> > >  -fix build errors...
> > >
> > > v3:
> > >  -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
> > >  -fix build errors
> > >  -rewrite migration patch so it actually passes all tests
> > >
> > > v2:
> > >  -don't touch io/ code anymore
> > >  -always register yank functions
> > >  -'yank' now takes a list of instances to yank
> > >  -'query-yank' returns a list of yankable instances
> > >
> > > Lukas Straub (8):
> > >   Introduce yank feature
> > >   block/nbd.c: Add yank feature
> > >   chardev/char-socket.c: Add yank feature
> > >   migration: Add yank feature
> > >   io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
> > >   io: Document thread-safety of qio_channel_shutdown
> > >   MAINTAINERS: Add myself as maintainer for yank feature
> > >   tests/test-char.c: Wait for the chardev to connect in
> > >     char_socket_client_dupid_test
> > >
> > >  MAINTAINERS                   |   6 ++
> > >  block/nbd.c                   | 129 +++++++++++++++---------
> > >  chardev/char-socket.c         |  31 ++++++
> > >  include/io/channel.h          |   2 +
> > >  include/qemu/yank.h           |  80 +++++++++++++++
> > >  io/channel-tls.c              |   6 +-
> > >  migration/channel.c           |  12 +++
> > >  migration/migration.c         |  25 ++++-
> > >  migration/multifd.c           |  10 ++
> > >  migration/qemu-file-channel.c |   6 ++
> > >  migration/savevm.c            |   6 ++
> > >  qapi/misc.json                |  45 +++++++++
> > >  tests/Makefile.include        |   2 +-
> > >  tests/test-char.c             |   1 +
> > >  util/Makefile.objs            |   1 +
> > >  util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
> > >  16 files changed, 493 insertions(+), 53 deletions(-)
> > >  create mode 100644 include/qemu/yank.h
> > >  create mode 100644 util/yank.c
> > >
> > > --
> > > 2.20.1  
> >
> > Ping...
>
> Ping 2...
>
> Also, can the different subsystems have a look at this and give their ok?

We need ACKs from the NBD, migration and chardev maintainers, for the
respective patches, then I think this series is ready for a pull request.

Once acks arrive, I'm happy to send a PULL unless someone else has a
desire todo it.


Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 1/8] Introduce yank feature

Markus Armbruster
In reply to this post by Lukas Straub
I apologize for not reviewing this much earlier.

Lukas Straub <[hidden email]> writes:

> The yank feature allows to recover from hanging qemu by "yanking"
> at various parts. Other qemu systems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
>
> Signed-off-by: Lukas Straub <[hidden email]>
> Acked-by: Stefan Hajnoczi <[hidden email]>
> ---
>  include/qemu/yank.h |  80 +++++++++++++++++++
>  qapi/misc.json      |  45 +++++++++++
>  util/Makefile.objs  |   1 +
>  util/yank.c         | 184 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 310 insertions(+)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 util/yank.c
>
> diff --git a/include/qemu/yank.h b/include/qemu/yank.h
> new file mode 100644
> index 0000000000..cd184fcd05
> --- /dev/null
> +++ b/include/qemu/yank.h
> @@ -0,0 +1,80 @@
> +/*
> + * QEMU yank feature
> + *
> + * Copyright (c) Lukas Straub <[hidden email]>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef YANK_H
> +#define YANK_H
> +
> +typedef void (YankFn) (void *opaque);

No space before parameter lists, please.

> +
> +/**
> + * yank_register_instance: Register a new instance.
> + *
> + * This registers a new instance for yanking. Must be called before any yank
> + * function is registered for this instance.
> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The globally unique name of the instance.
> + * @errp: ...
> + */
> +void yank_register_instance(const char *instance_name, Error **errp);
> +
> +/**
> + * yank_unregister_instance: Unregister a instance.
> + *
> + * This unregisters a instance. Must be called only after every yank function
> + * of the instance has been unregistered.
> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The name of the instance.
> + */
> +void yank_unregister_instance(const char *instance_name);
> +
> +/**
> + * yank_register_function: Register a yank function
> + *
> + * This registers a yank function. All limitations of qmp oob commands apply
> + * to the yank function as well.

The restrictions are documented in docs/devel/qapi-code-gen.txt under
"An OOB-capable command handler must satisfy the following conditions".
Let's point there,

> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The name of the instance
> + * @func: The yank function
> + * @opaque: Will be passed to the yank function
> + */
> +void yank_register_function(const char *instance_name,
> +                            YankFn *func,
> +                            void *opaque);

Pardon my ignorance... can you explain to me why a yank instance may
have multiple functions?

> +
> +/**
> + * yank_unregister_function: Unregister a yank function
> + *
> + * This unregisters a yank function.
> + *
> + * This function is thread-safe.
> + *
> + * @instance_name: The name of the instance
> + * @func: func that was passed to yank_register_function
> + * @opaque: opaque that was passed to yank_register_function
> + */
> +void yank_unregister_function(const char *instance_name,
> +                              YankFn *func,
> +                              void *opaque);
> +
> +/**
> + * yank_unregister_function: Generic yank function for iochannel

Pasto, should be

    * yank_generic_iochannel: ...

> + *
> + * This is a generic yank function which will call qio_channel_shutdown on the
> + * provided QIOChannel.
> + *
> + * @opaque: QIOChannel to shutdown
> + */
> +void yank_generic_iochannel(void *opaque);
> +#endif
> diff --git a/qapi/misc.json b/qapi/misc.json
> index 9d32820dc1..0d6a8f20b7 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -1615,3 +1615,48 @@
>  ##
>  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
>
> +##
> +# @YankInstances:
> +#
> +# @instances: List of yank instances.
> +#
> +# Yank instances are named after the following schema:
> +# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
> +#
> +# Since: 5.1
> +##
> +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }

I'm afraid this is a problematic QMP interface.

By making YankInstances a struct, you keep the door open to adding more
members, which is good.

But by making its 'instances' member a ['str'], you close the door to
using anything but a single string for the individual instances.  Not so
good.

The single string encodes information which QMP client will need to
parse from the string.  We frown on that in QMP.  Use QAPI complex types
capabilities for structured data.

Could you use something like this instead?

{ 'enum': 'YankInstanceType',
  'data': { 'block-node', 'chardev', 'migration' } }

{ 'struct': 'YankInstanceBlockNode',
  'data': { 'node-name': 'str' } }

{ 'struct': 'YankInstanceChardev',
  'data' { 'label': 'str' } }

{ 'union': 'YankInstance',
  'base': { 'type': 'YankInstanceType' },
  'discriminator': 'type',
  'data': {
      'block-node': 'YankInstanceBlockNode',
      'chardev': 'YankInstanceChardev' } }

{ 'command': 'yank',
  'data': { 'instances': ['YankInstance'] },
  'allow-oob': true }

If you're confident nothing will ever be added to YankInstanceBlockNode
and YankInstanceChardev, you could use str instead.

> +
> +##
> +# @yank:
> +#
> +# Recover from hanging qemu by yanking the specified instances.

What's an "instance", and what does it mean to "yank" it?

The documentation of YankInstances above gives a clue on what an
"instance" is: presumably a block node, a character device or the
migration job.

I guess a YankInstance is whatever the code chooses to make one, and the
current code makes these three kinds.

Does it make every block node a YankInstance?  If not, which ones?

Does it make every character device a YankInstance?  If not, which ones?

Does it make migration always a YankInstance?  If not, when?

> +#
> +# Takes @YankInstances as argument.
> +#
> +# Returns: nothing.
> +#
> +# Example:
> +#
> +# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
> +# <- { "return": {} }
> +#
> +# Since: 5.1
> +##
> +{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
> +
> +##
> +# @query-yank:
> +#
> +# Query yank instances.
> +#
> +# Returns: @YankInstances
> +#
> +# Example:
> +#
> +# -> { "execute": "query-yank" }
> +# <- { "return": { "instances": ["blockdev:nbd0"] } }
> +#
> +# Since: 5.1
> +##
> +{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
> diff --git a/util/Makefile.objs b/util/Makefile.objs
> index cc5e37177a..13faa98425 100644
> --- a/util/Makefile.objs
> +++ b/util/Makefile.objs
> @@ -45,6 +45,7 @@ util-obj-$(CONFIG_GIO) += dbus.o
>  dbus.o-cflags = $(GIO_CFLAGS)
>  dbus.o-libs = $(GIO_LIBS)
>  util-obj-$(CONFIG_USER_ONLY) += selfmap.o
> +util-obj-y += yank.o
>
>  #######################################################################
>  # code used by both qemu system emulation and qemu-img
> diff --git a/util/yank.c b/util/yank.c
> new file mode 100644
> index 0000000000..b0cd27728b
> --- /dev/null
> +++ b/util/yank.c
> @@ -0,0 +1,184 @@
> +/*
> + * QEMU yank feature
> + *
> + * Copyright (c) Lukas Straub <[hidden email]>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/thread.h"
> +#include "qemu/queue.h"
> +#include "qapi/qapi-commands-misc.h"
> +#include "io/channel.h"
> +#include "qemu/yank.h"
> +
> +struct YankFuncAndParam {
> +    YankFn *func;
> +    void *opaque;
> +    QLIST_ENTRY(YankFuncAndParam) next;
> +};
> +
> +struct YankInstance {
> +    char *name;
> +    QLIST_HEAD(, YankFuncAndParam) yankfns;
> +    QLIST_ENTRY(YankInstance) next;
> +};
> +
> +static QemuMutex lock;

Please give the variable a more telling name, such as @yank_lock, and
document what exactly the lock protects.  I can guess it's just the list
that immediately follows, but I prefer to be explicit when it comes to
locking.

> +static QLIST_HEAD(yankinst_list, YankInstance) head
> +    = QLIST_HEAD_INITIALIZER(head);

Please give the variable a more telling name, such as
@yank_instance_list.

I doubt there is a need to name the struct.  This should do:

   static QLIST_HEAD(, YankInstance) yank_instance_list

> +
> +static struct YankInstance *yank_find_instance(const char *name)
> +{
> +    struct YankInstance *tmp, *instance;
> +    instance = NULL;
> +    QLIST_FOREACH(tmp, &head, next) {
> +        if (!strcmp(tmp->name, name)) {
> +            instance = tmp;
> +        }
> +    }
> +    return instance;
> +}

Suggest

   static struct YankInstance *yank_find_instance(const char *name)
   {
       struct YankInstance *instance;

       QLIST_FOREACH(instance, &head, next) {
           if (!strcmp(instance->name, name)) {
               return instance;
           }
       }
       return NULL;
   }

> +
> +void yank_register_instance(const char *instance_name, Error **errp)
> +{
> +    struct YankInstance *instance;
> +
> +    qemu_mutex_lock(&lock);
> +
> +    if (yank_find_instance(instance_name)) {
> +        error_setg(errp, "duplicate yank instance name: '%s'", instance_name);

Rather long line, suggest to wrap before the last argument.

> +        qemu_mutex_unlock(&lock);
> +        return;
> +    }
> +
> +    instance = g_slice_new(struct YankInstance);
> +    instance->name = g_strdup(instance_name);
> +    QLIST_INIT(&instance->yankfns);
> +    QLIST_INSERT_HEAD(&head, instance, next);
> +
> +    qemu_mutex_unlock(&lock);
> +}
> +
> +void yank_unregister_instance(const char *instance_name)
> +{
> +    struct YankInstance *instance;
> +
> +    qemu_mutex_lock(&lock);
> +    instance = yank_find_instance(instance_name);
> +    assert(instance);
> +
> +    assert(QLIST_EMPTY(&instance->yankfns));
> +    QLIST_REMOVE(instance, next);
> +    g_free(instance->name);
> +    g_slice_free(struct YankInstance, instance);
> +
> +    qemu_mutex_unlock(&lock);
> +}
> +
> +void yank_register_function(const char *instance_name,
> +                            YankFn *func,
> +                            void *opaque)
> +{
> +    struct YankInstance *instance;
> +    struct YankFuncAndParam *entry;
> +
> +    qemu_mutex_lock(&lock);
> +    instance = yank_find_instance(instance_name);
> +    assert(instance);
> +
> +    entry = g_slice_new(struct YankFuncAndParam);
> +    entry->func = func;
> +    entry->opaque = opaque;
> +
> +    QLIST_INSERT_HEAD(&instance->yankfns, entry, next);
> +    qemu_mutex_unlock(&lock);
> +}
> +
> +void yank_unregister_function(const char *instance_name,
> +                              YankFn *func,
> +                              void *opaque)
> +{
> +    struct YankInstance *instance;
> +    struct YankFuncAndParam *entry;
> +
> +    qemu_mutex_lock(&lock);
> +    instance = yank_find_instance(instance_name);
> +    assert(instance);
> +
> +    QLIST_FOREACH(entry, &instance->yankfns, next) {
> +        if (entry->func == func && entry->opaque == opaque) {
> +            QLIST_REMOVE(entry, next);
> +            g_slice_free(struct YankFuncAndParam, entry);
> +            qemu_mutex_unlock(&lock);
> +            return;
> +        }
> +    }
> +
> +    abort();
> +}
> +
> +void yank_generic_iochannel(void *opaque)
> +{
> +    QIOChannel *ioc = QIO_CHANNEL(opaque);
> +
> +    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> +}
> +
> +void qmp_yank(strList *instances,
> +              Error **errp)
> +{
> +    strList *tmp;
> +    struct YankInstance *instance;
> +    struct YankFuncAndParam *entry;
> +
> +    qemu_mutex_lock(&lock);
> +    tmp = instances;
> +    for (; tmp; tmp = tmp->next) {

Make that

       for (tail = instances; tail; tail = tail->next) {

> +        instance = yank_find_instance(tmp->value);
> +        if (!instance) {
> +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> +                      "Instance '%s' not found", tmp->value);
> +            qemu_mutex_unlock(&lock);
> +            return;
> +        }
> +    }
> +    tmp = instances;
> +    for (; tmp; tmp = tmp->next) {

Likewise.

> +        instance = yank_find_instance(tmp->value);
> +        assert(instance);
> +        QLIST_FOREACH(entry, &instance->yankfns, next) {
> +            entry->func(entry->opaque);
> +        }
> +    }
> +    qemu_mutex_unlock(&lock);
> +}
> +
> +YankInstances *qmp_query_yank(Error **errp)
> +{
> +    struct YankInstance *instance;
> +    YankInstances *ret;
> +
> +    ret = g_new0(YankInstances, 1);
> +    ret->instances = NULL;
> +
> +    qemu_mutex_lock(&lock);
> +    QLIST_FOREACH(instance, &head, next) {
> +        strList *entry;
> +        entry = g_new0(strList, 1);
> +        entry->value = g_strdup(instance->name);
> +        entry->next = ret->instances;
> +        ret->instances = entry;
> +    }
> +    qemu_mutex_unlock(&lock);
> +
> +    return ret;
> +}
> +
> +static void __attribute__((__constructor__)) yank_init(void)
> +{
> +    qemu_mutex_init(&lock);
> +}
> --
> 2.20.1

The two QMP commands permit out-of-band execution ('allow-oob': true).
OOB is easy to get wrong, but I figure you have a legitimate use case.
Let's review the restrictions documented in
docs/devel/qapi-code-gen.txt:

    An OOB-capable command handler must satisfy the following conditions:

    - It terminates quickly.
    - It does not invoke system calls that may block.
    - It does not access guest RAM that may block when userfaultfd is
      enabled for postcopy live migration.
    - It takes only "fast" locks, i.e. all critical sections protected by
      any lock it takes also satisfy the conditions for OOB command
      handler code.

Since the command handlers take &lock, the restrictions apply to the
other critical sections protected by &lock as well.  I believe these are
all okay: they do nothing but allocate, initialize and free memory.

The restrictions also apply to the YankFn callbacks, but you documented
that.  Okay.

The one such callback included in this patch is
yank_generic_iochannel(), which is a thin wrapper around
qio_channel_shutdown(), which in turn runs the io_shutdown method.
Thus, the restructions also apply to all the io_shutdown methods.
That's not documented.

Daniel, should it be documented?


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu

Markus Armbruster
In reply to this post by Daniel P. Berrangé
Daniel P. Berrangé <[hidden email]> writes:

> On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:
[...]
>> Also, can the different subsystems have a look at this and give their ok?
>
> We need ACKs from the NBD, migration and chardev maintainers, for the
> respective patches, then I think this series is ready for a pull request.

The QMP interface and its documentation need a bit of work, see my
review of PATCH 1.  I'm hopeful v8 will nail it.

> Once acks arrive, I'm happy to send a PULL unless someone else has a
> desire todo it.

Not yet, please.


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v7 1/8] Introduce yank feature

Daniel P. Berrangé
In reply to this post by Markus Armbruster
On Thu, Aug 27, 2020 at 02:37:00PM +0200, Markus Armbruster wrote:

> I apologize for not reviewing this much earlier.
>
> Lukas Straub <[hidden email]> writes:
>
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub <[hidden email]>
> > Acked-by: Stefan Hajnoczi <[hidden email]>
> > ---
> >  include/qemu/yank.h |  80 +++++++++++++++++++
> >  qapi/misc.json      |  45 +++++++++++
> >  util/Makefile.objs  |   1 +
> >  util/yank.c         | 184 ++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 310 insertions(+)
> >  create mode 100644 include/qemu/yank.h
> >  create mode 100644 util/yank.c


> > + *
> > + * This function is thread-safe.
> > + *
> > + * @instance_name: The name of the instance
> > + * @func: The yank function
> > + * @opaque: Will be passed to the yank function
> > + */
> > +void yank_register_function(const char *instance_name,
> > +                            YankFn *func,
> > +                            void *opaque);
>
> Pardon my ignorance... can you explain to me why a yank instance may
> have multiple functions?

multifd migration - there's a single migration "instance", which
has multiple FDs open, each of which has a func registered.


> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 9d32820dc1..0d6a8f20b7 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -1615,3 +1615,48 @@
> >  ##
> >  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
> >
> > +##
> > +# @YankInstances:
> > +#
> > +# @instances: List of yank instances.
> > +#
> > +# Yank instances are named after the following schema:
> > +# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
>
> I'm afraid this is a problematic QMP interface.
>
> By making YankInstances a struct, you keep the door open to adding more
> members, which is good.
>
> But by making its 'instances' member a ['str'], you close the door to
> using anything but a single string for the individual instances.  Not so
> good.
>
> The single string encodes information which QMP client will need to
> parse from the string.  We frown on that in QMP.  Use QAPI complex types
> capabilities for structured data.
>
> Could you use something like this instead?
>
> { 'enum': 'YankInstanceType',
>   'data': { 'block-node', 'chardev', 'migration' } }
>
> { 'struct': 'YankInstanceBlockNode',
>   'data': { 'node-name': 'str' } }
>
> { 'struct': 'YankInstanceChardev',
>   'data' { 'label': 'str' } }
>
> { 'union': 'YankInstance',
>   'base': { 'type': 'YankInstanceType' },
>   'discriminator': 'type',
>   'data': {
>       'block-node': 'YankInstanceBlockNode',
>       'chardev': 'YankInstanceChardev' } }
>
> { 'command': 'yank',
>   'data': { 'instances': ['YankInstance'] },
>   'allow-oob': true }
>
> If you're confident nothing will ever be added to YankInstanceBlockNode
> and YankInstanceChardev, you could use str instead.

I raised this idea back in the v1 posting. There's a thread starting
at this:

  https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg02881.html

which eventually concluded that plain string is best.

I think that's right because the yank feature is providing generic
infrastructure with zero knowledge of backends that are using it.
The only interaction between the yank functionality and its callers
is via an opaque function callback. So there's no conceptual place
at which the yank infra would want to know about the backends nor
pass any backend specific config params to it.

> > +##
> > +# @yank:
> > +#
> > +# Recover from hanging qemu by yanking the specified instances.
>
> What's an "instance", and what does it mean to "yank" it?
>
> The documentation of YankInstances above gives a clue on what an
> "instance" is: presumably a block node, a character device or the
> migration job.
>
> I guess a YankInstance is whatever the code chooses to make one, and the
> current code makes these three kinds.
>
> Does it make every block node a YankInstance?  If not, which ones?
>
> Does it make every character device a YankInstance?  If not, which ones?
>
> Does it make migration always a YankInstance?  If not, when?

From the POV of the yank code, the "instance" is intentionally opaque.
Whatever the instance wants todo with its callback is acceptable, as
long as it isn't violating the rules for the callback by doing stuff
which can block. In practice right now, an instance is anything which
has a network connection associated with it, but it doesn't have to be
restricted to just networking. Anything which is talking to a service
which can get "stuck" is in scope for supporting yanking.

eg I could imagine an instance having some external helper process and
the yank callback would kill the process



> > +static void __attribute__((__constructor__)) yank_init(void)
> > +{
> > +    qemu_mutex_init(&lock);
> > +}
> > --
> > 2.20.1
>
> The two QMP commands permit out-of-band execution ('allow-oob': true).
> OOB is easy to get wrong, but I figure you have a legitimate use case.
> Let's review the restrictions documented in
> docs/devel/qapi-code-gen.txt:
>
>     An OOB-capable command handler must satisfy the following conditions:
>
>     - It terminates quickly.
>     - It does not invoke system calls that may block.
>     - It does not access guest RAM that may block when userfaultfd is
>       enabled for postcopy live migration.
>     - It takes only "fast" locks, i.e. all critical sections protected by
>       any lock it takes also satisfy the conditions for OOB command
>       handler code.
>
> Since the command handlers take &lock, the restrictions apply to the
> other critical sections protected by &lock as well.  I believe these are
> all okay: they do nothing but allocate, initialize and free memory.
>
> The restrictions also apply to the YankFn callbacks, but you documented
> that.  Okay.
>
> The one such callback included in this patch is
> yank_generic_iochannel(), which is a thin wrapper around
> qio_channel_shutdown(), which in turn runs the io_shutdown method.
> Thus, the restructions also apply to all the io_shutdown methods.
> That's not documented.
>
> Daniel, should it be documented?

Patch 6 documents that the qio_channel_shutdown method must be
thread safe but perhaps the doc could be more explicit

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


12