[PATCH v2 0/7] qcow2: Reduce the number of I/O ops when doing COW

[PATCH v2 0/7] qcow2: Reduce the number of I/O ops when doing COW

Alberto Garcia
Hi all,

here's a patch series that rewrites the copy-on-write code in the
qcow2 driver to reduce the number of I/O operations.

This is v2; please refer to the original e-mail for a complete
description:

https://lists.gnu.org/archive/html/qemu-block/2017-05/msg00882.html

Regards,

Berto

Changes:

v2:
- Patch 1: Update commit message [Eric]
- Patch 7: Make sure that the number of iovs does not exceed IOV_MAX [Anton]
- Patch 7: Don't add zero-length buffers to the qiov in perform_cow()

v1: https://lists.gnu.org/archive/html/qemu-block/2017-05/msg00882.html
- Initial version

Output of git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/7:[down] 'qcow2: Remove unused Error variable in do_perform_cow()'
002/7:[----] [--] 'qcow2: Use unsigned int for both members of Qcow2COWRegion'
003/7:[----] [--] 'qcow2: Make perform_cow() call do_perform_cow() twice'
004/7:[----] [--] 'qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()'
005/7:[----] [--] 'qcow2: Allow reading both COW regions with only one request'
006/7:[----] [--] 'qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()'
007/7:[0014] [FC] 'qcow2: Merge the writing of the COW regions with the guest data'

Alberto Garcia (7):
  qcow2: Remove unused Error variable in do_perform_cow()
  qcow2: Use unsigned int for both members of Qcow2COWRegion
  qcow2: Make perform_cow() call do_perform_cow() twice
  qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()
  qcow2: Allow reading both COW regions with only one request
  qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()
  qcow2: Merge the writing of the COW regions with the guest data

 block/qcow2-cluster.c | 192 +++++++++++++++++++++++++++++++++++++-------------
 block/qcow2.c         |  64 ++++++++++++++---
 block/qcow2.h         |  11 ++-
 3 files changed, 207 insertions(+), 60 deletions(-)

--
2.11.0


[PATCH v2 1/7] qcow2: Remove unused Error variable in do_perform_cow()

Alberto Garcia
We are using the return value of qcow2_encrypt_sectors() to detect
problems but we are throwing away the returned Error since we have no
way to report it to the user. Therefore we can simply get rid of the
local Error variable and pass NULL instead.

Alternatively we could try to figure out a way to pass the original
error instead of simply returning -EIO, but that would be more
invasive, so let's keep the current approach.

Signed-off-by: Alberto Garcia <[hidden email]>
---
 block/qcow2-cluster.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d779ea19cf..d1c419f52b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -440,16 +440,14 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
     }
 
     if (bs->encrypted) {
-        Error *err = NULL;
         int64_t sector = (src_cluster_offset + offset_in_cluster)
                          >> BDRV_SECTOR_BITS;
         assert(s->cipher);
         assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
         assert((bytes & ~BDRV_SECTOR_MASK) == 0);
         if (qcow2_encrypt_sectors(s, sector, iov.iov_base, iov.iov_base,
-                                  bytes >> BDRV_SECTOR_BITS, true, &err) < 0) {
+                                  bytes >> BDRV_SECTOR_BITS, true, NULL) < 0) {
             ret = -EIO;
-            error_free(err);
             goto out;
         }
     }
--
2.11.0


[PATCH v2 2/7] qcow2: Use unsigned int for both members of Qcow2COWRegion

Alberto Garcia
In reply to this post by Alberto Garcia
Qcow2COWRegion has two attributes:

- The offset of the COW region from the start of the first cluster
  touched by the I/O request. Since it's always going to be positive
  and the maximum request size is at most INT_MAX, we can use a
  regular unsigned int to store this offset.

- The size of the COW region in bytes. This is guaranteed to be >= 0,
  so we should use an unsigned type instead.

On x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes.
It will also help keep some assertions simpler now that we know that
there are no negative numbers.

The prototype of do_perform_cow() is also updated to reflect these
changes.

Signed-off-by: Alberto Garcia <[hidden email]>
---
 block/qcow2-cluster.c | 4 ++--
 block/qcow2.h         | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d1c419f52b..a86c5a75a9 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -406,8 +406,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
 static int coroutine_fn do_perform_cow(BlockDriverState *bs,
                                        uint64_t src_cluster_offset,
                                        uint64_t cluster_offset,
-                                       int offset_in_cluster,
-                                       int bytes)
+                                       unsigned offset_in_cluster,
+                                       unsigned bytes)
 {
     BDRVQcow2State *s = bs->opaque;
     QEMUIOVector qiov;
diff --git a/block/qcow2.h b/block/qcow2.h
index 1801dc30dc..c26ee0a33d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -301,10 +301,10 @@ typedef struct Qcow2COWRegion {
      * Offset of the COW region in bytes from the start of the first cluster
      * touched by the request.
      */
-    uint64_t    offset;
+    unsigned    offset;
 
     /** Number of bytes to copy */
-    int         nb_bytes;
+    unsigned    nb_bytes;
 } Qcow2COWRegion;
 
 /**
--
2.11.0


[PATCH v2 3/7] qcow2: Make perform_cow() call do_perform_cow() twice

Alberto Garcia
In reply to this post by Alberto Garcia
Instead of calling perform_cow() twice with a different COW region
each time, call it just once and make perform_cow() handle both
regions.

This patch simply moves code around. The next one will do the actual
reordering of the COW operations.

Signed-off-by: Alberto Garcia <[hidden email]>
---
 block/qcow2-cluster.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a86c5a75a9..4c03639a72 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -414,6 +414,10 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
     struct iovec iov;
     int ret;
 
+    if (bytes == 0) {
+        return 0;
+    }
+
     iov.iov_len = bytes;
     iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
     if (iov.iov_base == NULL) {
@@ -751,31 +755,40 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     return cluster_offset;
 }
 
-static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
+static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
 {
     BDRVQcow2State *s = bs->opaque;
+    Qcow2COWRegion *start = &m->cow_start;
+    Qcow2COWRegion *end = &m->cow_end;
     int ret;
 
-    if (r->nb_bytes == 0) {
+    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
     }
 
     qemu_co_mutex_unlock(&s->lock);
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset, r->offset, r->nb_bytes);
+    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
+                         start->offset, start->nb_bytes);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
+                         end->offset, end->nb_bytes);
+
+fail:
     qemu_co_mutex_lock(&s->lock);
 
-    if (ret < 0) {
-        return ret;
-    }
-
     /*
      * Before we update the L2 table to actually point to the new cluster, we
      * need to be sure that the refcounts have been increased and COW was
      * handled.
      */
-    qcow2_cache_depends_on_flush(s->l2_table_cache);
+    if (ret == 0) {
+        qcow2_cache_depends_on_flush(s->l2_table_cache);
+    }
 
-    return 0;
+    return ret;
 }
 
 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
@@ -795,12 +808,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
     }
 
     /* copy content of unmodified sectors */
-    ret = perform_cow(bs, m, &m->cow_start);
-    if (ret < 0) {
-        goto err;
-    }
-
-    ret = perform_cow(bs, m, &m->cow_end);
+    ret = perform_cow(bs, m);
     if (ret < 0) {
         goto err;
     }
--
2.11.0


[PATCH v2 4/7] qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()

Alberto Garcia
In reply to this post by Alberto Garcia
This patch splits do_perform_cow() into three separate functions to
read, encrypt and write the COW regions.

perform_cow() can now read both regions first, then encrypt them and
finally write them to disk. The memory allocation is also done in
this function now, using one single buffer large enough to hold both
regions.

Signed-off-by: Alberto Garcia <[hidden email]>
---
 block/qcow2-cluster.c | 114 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 84 insertions(+), 30 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4c03639a72..af43e6a34f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -403,34 +403,26 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
     return 0;
 }
 
-static int coroutine_fn do_perform_cow(BlockDriverState *bs,
-                                       uint64_t src_cluster_offset,
-                                       uint64_t cluster_offset,
-                                       unsigned offset_in_cluster,
-                                       unsigned bytes)
+static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
+                                            uint64_t src_cluster_offset,
+                                            unsigned offset_in_cluster,
+                                            uint8_t *buffer,
+                                            unsigned bytes)
 {
-    BDRVQcow2State *s = bs->opaque;
     QEMUIOVector qiov;
-    struct iovec iov;
+    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
     int ret;
 
     if (bytes == 0) {
         return 0;
     }
 
-    iov.iov_len = bytes;
-    iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
-    if (iov.iov_base == NULL) {
-        return -ENOMEM;
-    }
-
     qemu_iovec_init_external(&qiov, &iov, 1);
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
 
     if (!bs->drv) {
-        ret = -ENOMEDIUM;
-        goto out;
+        return -ENOMEDIUM;
     }
 
     /* Call .bdrv_co_readv() directly instead of using the public block-layer
@@ -440,39 +432,63 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
     ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
                                   bytes, &qiov, 0);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
-    if (bs->encrypted) {
+    return 0;
+}
+
+static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
+                                                uint64_t src_cluster_offset,
+                                                unsigned offset_in_cluster,
+                                                uint8_t *buffer,
+                                                unsigned bytes)
+{
+    if (bytes && bs->encrypted) {
+        BDRVQcow2State *s = bs->opaque;
         int64_t sector = (src_cluster_offset + offset_in_cluster)
                          >> BDRV_SECTOR_BITS;
         assert(s->cipher);
         assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
         assert((bytes & ~BDRV_SECTOR_MASK) == 0);
-        if (qcow2_encrypt_sectors(s, sector, iov.iov_base, iov.iov_base,
+        if (qcow2_encrypt_sectors(s, sector, buffer, buffer,
                                   bytes >> BDRV_SECTOR_BITS, true, NULL) < 0) {
-            ret = -EIO;
-            goto out;
+            return false;
         }
     }
+    return true;
+}
+
+static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
+                                             uint64_t cluster_offset,
+                                             unsigned offset_in_cluster,
+                                             uint8_t *buffer,
+                                             unsigned bytes)
+{
+    QEMUIOVector qiov;
+    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
+    int ret;
+
+    if (bytes == 0) {
+        return 0;
+    }
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
 
     ret = qcow2_pre_write_overlap_check(bs, 0,
             cluster_offset + offset_in_cluster, bytes);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
     ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
                           bytes, &qiov, 0);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
-    ret = 0;
-out:
-    qemu_vfree(iov.iov_base);
-    return ret;
+    return 0;
 }
 
 
@@ -760,22 +776,59 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     BDRVQcow2State *s = bs->opaque;
     Qcow2COWRegion *start = &m->cow_start;
     Qcow2COWRegion *end = &m->cow_end;
+    unsigned buffer_size = start->nb_bytes + end->nb_bytes;
+    uint8_t *start_buffer, *end_buffer;
     int ret;
 
+    assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
+
     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
     }
 
+    /* Reserve a buffer large enough to store the data from both the
+     * start and end COW regions */
+    start_buffer = qemu_try_blockalign(bs, buffer_size);
+    if (start_buffer == NULL) {
+        return -ENOMEM;
+    }
+    /* The part of the buffer where the end region is located */
+    end_buffer = start_buffer + start->nb_bytes;
+
     qemu_co_mutex_unlock(&s->lock);
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
-                         start->offset, start->nb_bytes);
+    /* First we read the existing data from both COW regions */
+    ret = do_perform_cow_read(bs, m->offset, start->offset,
+                              start_buffer, start->nb_bytes);
     if (ret < 0) {
         goto fail;
     }
 
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
-                         end->offset, end->nb_bytes);
+    ret = do_perform_cow_read(bs, m->offset, end->offset,
+                              end_buffer, end->nb_bytes);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* Encrypt the data if necessary before writing it */
+    if (bs->encrypted) {
+        if (!do_perform_cow_encrypt(bs, m->offset, start->offset,
+                                    start_buffer, start->nb_bytes) ||
+            !do_perform_cow_encrypt(bs, m->offset, end->offset,
+                                    end_buffer, end->nb_bytes)) {
+            ret = -EIO;
+            goto fail;
+        }
+    }
+
+    /* And now we can write everything */
+    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
+                               start_buffer, start->nb_bytes);
+    if (ret < 0) {
+        goto fail;
+    }
 
+    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
+                               end_buffer, end->nb_bytes);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
@@ -788,6 +841,7 @@ fail:
         qcow2_cache_depends_on_flush(s->l2_table_cache);
     }
 
+    qemu_vfree(start_buffer);
     return ret;
 }
 
--
2.11.0


[PATCH v2 5/7] qcow2: Allow reading both COW regions with only one request

Alberto Garcia
In reply to this post by Alberto Garcia
Reading both COW regions requires two separate requests, but it's
perfectly possible to merge them and perform only one. This generally
improves performance, particularly on rotating disk drives. The
downside is that the data in the middle region is read but discarded.

This patch takes a conservative approach and only merges reads when
the size of the middle region is <= 16KB.

Signed-off-by: Alberto Garcia <[hidden email]>
---
 block/qcow2-cluster.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index af43e6a34f..8f6bc3d0b9 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -777,34 +777,53 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     Qcow2COWRegion *start = &m->cow_start;
     Qcow2COWRegion *end = &m->cow_end;
     unsigned buffer_size = start->nb_bytes + end->nb_bytes;
+    unsigned data_bytes = end->offset - (start->offset + start->nb_bytes);
+    bool merge_reads = false;
     uint8_t *start_buffer, *end_buffer;
     int ret;
 
     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
+    assert(buffer_size <= UINT_MAX - data_bytes);
+    assert(start->offset + start->nb_bytes <= end->offset);
 
     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
     }
 
-    /* Reserve a buffer large enough to store the data from both the
-     * start and end COW regions */
+    /* If we have to read both the start and end COW regions and the
+     * middle region is not too large then perform just one read
+     * operation */
+    if (start->nb_bytes && end->nb_bytes && data_bytes <= 16384) {
+        merge_reads = true;
+        buffer_size += data_bytes;
+    }
+
+    /* Reserve a buffer large enough to store all the data that we're
+     * going to read */
     start_buffer = qemu_try_blockalign(bs, buffer_size);
     if (start_buffer == NULL) {
         return -ENOMEM;
     }
     /* The part of the buffer where the end region is located */
-    end_buffer = start_buffer + start->nb_bytes;
+    end_buffer = start_buffer + buffer_size - end->nb_bytes;
 
     qemu_co_mutex_unlock(&s->lock);
-    /* First we read the existing data from both COW regions */
-    ret = do_perform_cow_read(bs, m->offset, start->offset,
-                              start_buffer, start->nb_bytes);
-    if (ret < 0) {
-        goto fail;
-    }
+    /* First we read the existing data from both COW regions. We
+     * either read the whole region in one go, or the start and end
+     * regions separately. */
+    if (merge_reads) {
+        ret = do_perform_cow_read(bs, m->offset, start->offset,
+                                  start_buffer, buffer_size);
+    } else {
+        ret = do_perform_cow_read(bs, m->offset, start->offset,
+                                  start_buffer, start->nb_bytes);
+        if (ret < 0) {
+            goto fail;
+        }
 
-    ret = do_perform_cow_read(bs, m->offset, end->offset,
-                              end_buffer, end->nb_bytes);
+        ret = do_perform_cow_read(bs, m->offset, end->offset,
+                                  end_buffer, end->nb_bytes);
+    }
     if (ret < 0) {
         goto fail;
     }
--
2.11.0


[PATCH v2 6/7] qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()

Alberto Garcia
In reply to this post by Alberto Garcia
Instead of passing a single buffer pointer to do_perform_cow_write(),
pass a QEMUIOVector. This will allow us to merge the write requests
for the COW regions and the actual data into a single one.

Although do_perform_cow_read() does not strictly need to change its
API, we're doing it here as well for consistency.

Signed-off-by: Alberto Garcia <[hidden email]>
---
 block/qcow2-cluster.c | 51 ++++++++++++++++++++++++---------------------------
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8f6bc3d0b9..71609ff7a2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -406,19 +406,14 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
 static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
                                             uint64_t src_cluster_offset,
                                             unsigned offset_in_cluster,
-                                            uint8_t *buffer,
-                                            unsigned bytes)
+                                            QEMUIOVector *qiov)
 {
-    QEMUIOVector qiov;
-    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
     int ret;
 
-    if (bytes == 0) {
+    if (qiov->size == 0) {
         return 0;
     }
 
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
     BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
 
     if (!bs->drv) {
@@ -430,7 +425,7 @@ static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
      * which can lead to deadlock when block layer copy-on-read is enabled.
      */
     ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
-                                  bytes, &qiov, 0);
+                                  qiov->size, qiov, 0);
     if (ret < 0) {
         return ret;
     }
@@ -462,28 +457,23 @@ static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
 static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
                                              uint64_t cluster_offset,
                                              unsigned offset_in_cluster,
-                                             uint8_t *buffer,
-                                             unsigned bytes)
+                                             QEMUIOVector *qiov)
 {
-    QEMUIOVector qiov;
-    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
     int ret;
 
-    if (bytes == 0) {
+    if (qiov->size == 0) {
         return 0;
     }
 
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
     ret = qcow2_pre_write_overlap_check(bs, 0,
-            cluster_offset + offset_in_cluster, bytes);
+            cluster_offset + offset_in_cluster, qiov->size);
     if (ret < 0) {
         return ret;
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
     ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
-                          bytes, &qiov, 0);
+                          qiov->size, qiov, 0);
     if (ret < 0) {
         return ret;
     }
@@ -780,6 +770,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     unsigned data_bytes = end->offset - (start->offset + start->nb_bytes);
     bool merge_reads = false;
     uint8_t *start_buffer, *end_buffer;
+    QEMUIOVector qiov;
     int ret;
 
     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
@@ -807,22 +798,25 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     /* The part of the buffer where the end region is located */
     end_buffer = start_buffer + buffer_size - end->nb_bytes;
 
+    qemu_iovec_init(&qiov, 3);
+
     qemu_co_mutex_unlock(&s->lock);
     /* First we read the existing data from both COW regions. We
      * either read the whole region in one go, or the start and end
      * regions separately. */
     if (merge_reads) {
-        ret = do_perform_cow_read(bs, m->offset, start->offset,
-                                  start_buffer, buffer_size);
+        qemu_iovec_add(&qiov, start_buffer, buffer_size);
+        ret = do_perform_cow_read(bs, m->offset, start->offset, &qiov);
     } else {
-        ret = do_perform_cow_read(bs, m->offset, start->offset,
-                                  start_buffer, start->nb_bytes);
+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        ret = do_perform_cow_read(bs, m->offset, start->offset, &qiov);
         if (ret < 0) {
             goto fail;
         }
 
-        ret = do_perform_cow_read(bs, m->offset, end->offset,
-                                  end_buffer, end->nb_bytes);
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        ret = do_perform_cow_read(bs, m->offset, end->offset, &qiov);
     }
     if (ret < 0) {
         goto fail;
@@ -840,14 +834,16 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     }
 
     /* And now we can write everything */
-    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
-                               start_buffer, start->nb_bytes);
+    qemu_iovec_reset(&qiov);
+    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
     if (ret < 0) {
         goto fail;
     }
 
-    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
-                               end_buffer, end->nb_bytes);
+    qemu_iovec_reset(&qiov);
+    qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
@@ -861,6 +857,7 @@ fail:
     }
 
     qemu_vfree(start_buffer);
+    qemu_iovec_destroy(&qiov);
     return ret;
 }
 
--
2.11.0


[PATCH v2 7/7] qcow2: Merge the writing of the COW regions with the guest data

Alberto Garcia
In reply to this post by Alberto Garcia
If the guest tries to write data that results in the allocation of a
new cluster, instead of writing the guest data first and then the data
from the COW regions, write everything together using one single I/O
operation.

This can improve the write performance by 25% or more, depending on
several factors such as the media type, the cluster size and the I/O
request size.

Signed-off-by: Alberto Garcia <[hidden email]>
---
 block/qcow2-cluster.c | 38 ++++++++++++++++++++++--------
 block/qcow2.c         | 64 +++++++++++++++++++++++++++++++++++++++++++--------
 block/qcow2.h         |  7 ++++++
 3 files changed, 90 insertions(+), 19 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 71609ff7a2..8e789e790b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -776,6 +776,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
     assert(buffer_size <= UINT_MAX - data_bytes);
     assert(start->offset + start->nb_bytes <= end->offset);
+    assert(!m->data_qiov || m->data_qiov->size == data_bytes);
 
     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
@@ -833,17 +834,36 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
         }
     }
 
-    /* And now we can write everything */
-    qemu_iovec_reset(&qiov);
-    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
-    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
-    if (ret < 0) {
-        goto fail;
+    /* And now we can write everything. If we have the guest data we
+     * can write everything in one single operation */
+    if (m->data_qiov) {
+        qemu_iovec_reset(&qiov);
+        if (start->nb_bytes) {
+            qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        }
+        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
+        if (end->nb_bytes) {
+            qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        }
+        /* NOTE: we have a write_aio blkdebug event here followed by
+         * a cow_write one in do_perform_cow_write(), but there's only
+         * one single I/O operation */
+        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
+    } else {
+        /* If there's no guest data then write both COW regions separately */
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
     }
 
-    qemu_iovec_reset(&qiov);
-    qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
-    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
diff --git a/block/qcow2.c b/block/qcow2.c
index b3ba5daa93..89be2083d4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1575,6 +1575,44 @@ fail:
     return ret;
 }
 
+/* Check if it's possible to merge a write request with the writing of
+ * the data from the COW regions */
+static bool can_merge_cow(uint64_t offset, unsigned bytes,
+                          QEMUIOVector *hd_qiov, QCowL2Meta *l2meta)
+{
+    QCowL2Meta *m;
+
+    for (m = l2meta; m != NULL; m = m->next) {
+        /* If both COW regions are empty then there's nothing to merge */
+        if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
+            continue;
+        }
+
+        /* The data (middle) region must be immediately after the
+         * start region */
+        if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
+            continue;
+        }
+
+        /* The end region must be immediately after the data (middle)
+         * region */
+        if (m->offset + m->cow_end.offset != offset + bytes) {
+            continue;
+        }
+
+        /* Make sure that adding both COW regions to the QEMUIOVector
+         * does not exceed IOV_MAX */
+        if (hd_qiov->niov > IOV_MAX - 2) {
+            continue;
+        }
+
+        m->data_qiov = hd_qiov;
+        return true;
+    }
+
+    return false;
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
@@ -1657,16 +1695,22 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             goto fail;
         }
 
-        qemu_co_mutex_unlock(&s->lock);
-        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-        trace_qcow2_writev_data(qemu_coroutine_self(),
-                                cluster_offset + offset_in_cluster);
-        ret = bdrv_co_pwritev(bs->file,
-                              cluster_offset + offset_in_cluster,
-                              cur_bytes, &hd_qiov, 0);
-        qemu_co_mutex_lock(&s->lock);
-        if (ret < 0) {
-            goto fail;
+        /* If we need to do COW, check if it's possible to merge the
+         * writing of the guest data together with that of the COW regions.
+         * If it's not possible (or not necessary) then write the
+         * guest data now. */
+        if (!can_merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
+            qemu_co_mutex_unlock(&s->lock);
+            BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+            trace_qcow2_writev_data(qemu_coroutine_self(),
+                                    cluster_offset + offset_in_cluster);
+            ret = bdrv_co_pwritev(bs->file,
+                                  cluster_offset + offset_in_cluster,
+                                  cur_bytes, &hd_qiov, 0);
+            qemu_co_mutex_lock(&s->lock);
+            if (ret < 0) {
+                goto fail;
+            }
         }
 
         while (l2meta != NULL) {
diff --git a/block/qcow2.h b/block/qcow2.h
index c26ee0a33d..87b15eb4aa 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -343,6 +343,13 @@ typedef struct QCowL2Meta
      */
     Qcow2COWRegion cow_end;
 
+    /**
+     * The I/O vector with the data from the actual guest write request.
+     * If non-NULL, this is meant to be merged together with the data
+     * from @cow_start and @cow_end into one single write operation.
+     */
+    QEMUIOVector *data_qiov;
+
     /** Pointer to next L2Meta of the same write request */
     struct QCowL2Meta *next;
 
--
2.11.0


Re: [PATCH v2 1/7] qcow2: Remove unused Error variable in do_perform_cow()

Eric Blake
In reply to this post by Alberto Garcia
On 06/07/2017 09:08 AM, Alberto Garcia wrote:

> We are using the return value of qcow2_encrypt_sectors() to detect
> problems but we are throwing away the returned Error since we have no
> way to report it to the user. Therefore we can simply get rid of the
> local Error variable and pass NULL instead.
>
> Alternatively we could try to figure out a way to pass the original
> error instead of simply returning -EIO, but that would be more
> invasive, so let's keep the current approach.
>
> Signed-off-by: Alberto Garcia <[hidden email]>
> ---
>  block/qcow2-cluster.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
Reviewed-by: Eric Blake <[hidden email]>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [PATCH v2 2/7] qcow2: Use unsigned int for both members of Qcow2COWRegion

Eric Blake
In reply to this post by Alberto Garcia
On 06/07/2017 09:08 AM, Alberto Garcia wrote:
> Qcow2COWRegion has two attributes:
>
> - The offset of the COW region from the start of the first cluster
>   touched by the I/O request. Since it's always going to be positive
>   and the maximum request size is at most INT_MAX, we can use a
>   regular unsigned int to store this offset.

I don't know if we will ever get to the point that we allow a 64-bit
request at the block layer (and then the block layer guarantees it is
split down to the driver's limits, which works when a driver is still
bound by 32-bit limits).  But we ALSO know that a cluster is at most 2M
(in our current implementation of qcow2), so the offset of where the COW
region starts in relation to the start of a cluster is < 2M.

>
> - The size of the COW region in bytes. This is guaranteed to be >= 0,
>   so we should use an unsigned type instead.

And likewise, since a COW region is a sub-cluster, and clusters are
bounded at 2M, we also have a sub-int upper bound on the size of the region.

>
> On x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes.
> It will also help keep some assertions simpler now that we know that
> there are no negative numbers.
>
> The prototype of do_perform_cow() is also updated to reflect these
> changes.
>
> Signed-off-by: Alberto Garcia <[hidden email]>
> ---
>  block/qcow2-cluster.c | 4 ++--
>  block/qcow2.h         | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index d1c419f52b..a86c5a75a9 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -406,8 +406,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
>  static int coroutine_fn do_perform_cow(BlockDriverState *bs,
>                                         uint64_t src_cluster_offset,
>                                         uint64_t cluster_offset,
> -                                       int offset_in_cluster,
> -                                       int bytes)
> +                                       unsigned offset_in_cluster,
> +                                       unsigned bytes)
I don't know if the code base has a strong preference for 'unsigned int'
over 'unsigned', but it doesn't bother me.

Reviewed-by: Eric Blake <[hidden email]>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [PATCH v2 6/7] qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()

Manos Pitsidianakis
In reply to this post by Alberto Garcia
On Wed, Jun 07, 2017 at 05:08:27PM +0300, Alberto Garcia wrote:
>Instead of passing a single buffer pointer to do_perform_cow_write(),
>pass a QEMUIOVector. This will allow us to merge the write requests
>for the COW regions and the actual data into a single one.
>
>Although do_perform_cow_read() does not strictly need to change its
>API, we're doing it here as well for consistency.
>
>Signed-off-by: Alberto Garcia <[hidden email]>

Reviewed-by: Manos Pitsidianakis <[hidden email]>


Re: [PATCH v2 3/7] qcow2: Make perform_cow() call do_perform_cow() twice

Eric Blake
In reply to this post by Alberto Garcia
On 06/07/2017 09:08 AM, Alberto Garcia wrote:

> Instead of calling perform_cow() twice with a different COW region
> each time, call it just once and make perform_cow() handle both
> regions.
>
> This patch simply moves code around. The next one will do the actual
> reordering of the COW operations.
>
> Signed-off-by: Alberto Garcia <[hidden email]>
> ---
>  block/qcow2-cluster.c | 38 +++++++++++++++++++++++---------------
>  1 file changed, 23 insertions(+), 15 deletions(-)

>      qemu_co_mutex_unlock(&s->lock);
> -    ret = do_perform_cow(bs, m->offset, m->alloc_offset, r->offset, r->nb_bytes);
> +    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
> +                         start->offset, start->nb_bytes);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
> +                         end->offset, end->nb_bytes);
> +
> +fail:
Since you reach this label even on success, it feels a bit awkward. I
probably would have avoided the goto and used fewer lines by refactoring
the logic as:

ret = do_perform_cow(..., start->...);
if (ret >= 0) {
    ret = do_perform_cow(..., end->...);
}

But that doesn't affect correctness, so whether or not you redo the
logic in the shorter fashion:

Reviewed-by: Eric Blake <[hidden email]>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [PATCH v2 3/7] qcow2: Make perform_cow() call do_perform_cow() twice

Alberto Garcia
On Wed 07 Jun 2017 11:43:34 PM CEST, Eric Blake wrote:

>>  block/qcow2-cluster.c | 38 +++++++++++++++++++++++---------------
>>  1 file changed, 23 insertions(+), 15 deletions(-)
>
>>      qemu_co_mutex_unlock(&s->lock);
>> -    ret = do_perform_cow(bs, m->offset, m->alloc_offset, r->offset, r->nb_bytes);
>> +    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
>> +                         start->offset, start->nb_bytes);
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
>> +                         end->offset, end->nb_bytes);
>> +
>> +fail:
>
> Since you reach this label even on success, it feels a bit awkward. I
> probably would have avoided the goto and used fewer lines by
> refactoring the logic as:
>
> ret = do_perform_cow(..., start->...);
> if (ret >= 0) {
>     ret = do_perform_cow(..., end->...);
> }

I actually wrote this code without any gotos, the way you mention, and
it works fine in this patch, but as I was making more changes I ended
up with quite a large piece of code that needed an "if (!ret)" guard
around almost every statement, and I didn't quite like the final result.

I'm open to reconsidering it, but first take a look at the code after
the last patch and you'll see that it would not be as neat as it
appears now.
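
For illustration, here's a rough sketch only (using the buffer-based
helpers from patch 4 rather than the final QEMUIOVector-based ones, and
leaving out the locking) of what the no-goto version ends up looking
like once all the steps are in place:

    /* Sketch, not the actual patch: every step needs its own guard */
    ret = do_perform_cow_read(bs, m->offset, start->offset,
                              start_buffer, start->nb_bytes);
    if (ret == 0) {
        ret = do_perform_cow_read(bs, m->offset, end->offset,
                                  end_buffer, end->nb_bytes);
    }
    if (ret == 0 && bs->encrypted &&
        (!do_perform_cow_encrypt(bs, m->offset, start->offset,
                                 start_buffer, start->nb_bytes) ||
         !do_perform_cow_encrypt(bs, m->offset, end->offset,
                                 end_buffer, end->nb_bytes))) {
        ret = -EIO;
    }
    if (ret == 0) {
        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
                                   start_buffer, start->nb_bytes);
    }
    if (ret == 0) {
        ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
                                   end_buffer, end->nb_bytes);
    }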

Berto


Re: [PATCH v2 2/7] qcow2: Use unsigned int for both members of Qcow2COWRegion

Alberto Garcia
In reply to this post by Eric Blake
On Wed 07 Jun 2017 06:02:06 PM CEST, Eric Blake wrote:

>> - The offset of the COW region from the start of the first cluster
>>   touched by the I/O request. Since it's always going to be positive
>>   and the maximum request size is at most INT_MAX, we can use a
>>   regular unsigned int to store this offset.
>
> I don't know if we will ever get to the point that we allow a 64-bit
> request at the block layer (and then the block layer guarantees it is
> split down to the driver's limits, which works when a driver is still
> bound by 32-bit limits).  But we ALSO know that a cluster is at most
> 2M (in our current implementation of qcow2), so the offset of where
> the COW region starts in relation to the start of a cluster is < 2M.

That's correct for the offset of the first region (m->cow_start);
however, m->cow_end is also relative to the offset of the first
allocated cluster, so it can be > 2M if the request spans several
clusters.

So I guess the maximum theoretical offset would be something like
INT_MAX + 2*cluster_size (a bit below that, actually).

>> - The size of the COW region in bytes. This is guaranteed to be >= 0,
>>   so we should use an unsigned type instead.
>
> And likewise, since a COW region is a sub-cluster, and clusters are
> bounded at 2M, we also have a sub-int upper bound on the size of the
> region.

That's correct.

Berto


Re: [PATCH v2 2/7] qcow2: Use unsigned int for both members of Qcow2COWRegion

Eric Blake
On 06/08/2017 08:06 AM, Alberto Garcia wrote:

> On Wed 07 Jun 2017 06:02:06 PM CEST, Eric Blake wrote:
>
>>> - The offset of the COW region from the start of the first cluster
>>>   touched by the I/O request. Since it's always going to be positive
>>>   and the maximum request size is at most INT_MAX, we can use a
>>>   regular unsigned int to store this offset.
>>
>> I don't know if we will ever get to the point that we allow a 64-bit
>> request at the block layer (and then the block layer guarantees it is
>> split down to the driver's limits, which works when a driver is still
>> bound by 32-bit limits).  But we ALSO know that a cluster is at most
>> 2M (in our current implementation of qcow2), so the offset of where
>> the COW region starts in relation to the start of a cluster is < 2M.
>
> That's correct for the offset of the first region (m->cow_start),
> however m->cow_end is also relative to the offset of the first allocated
> cluster, so it can be > 2M if the request spans several clusters.
>
> So I guess the maximum theoretical offset would be something like
> INT_MAX + 2*cluster_size (a bit below that actually).
At which point, your earlier claim that we bound requests at INT_MAX
takes over.  But thanks for correcting me that the end cow region can
indeed be more than a cluster away from the beginning region.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [PATCH v2 4/7] qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()

Eric Blake
In reply to this post by Alberto Garcia
On 06/07/2017 09:08 AM, Alberto Garcia wrote:

> This patch splits do_perform_cow() into three separate functions to
> read, encrypt and write the COW regions.
>
> perform_cow() can now read both regions first, then encrypt them and
> finally write them to disk. The memory allocation is also done in
> this function now, using one single buffer large enough to hold both
> regions.
>
> Signed-off-by: Alberto Garcia <[hidden email]>
> ---
>  block/qcow2-cluster.c | 114 +++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 84 insertions(+), 30 deletions(-)
>
Let's suppose we have a guest issuing 512-byte aligned requests and a
host that requires 4k alignment; and the guest does an operation that
needs a COW with one sector at both the front and end of the cluster.

> @@ -760,22 +776,59 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
>      BDRVQcow2State *s = bs->opaque;
>      Qcow2COWRegion *start = &m->cow_start;
>      Qcow2COWRegion *end = &m->cow_end;
> +    unsigned buffer_size = start->nb_bytes + end->nb_bytes;

This sets buffer_size to 1024 initially.

> +    uint8_t *start_buffer, *end_buffer;
>      int ret;
>  
> +    assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
> +
>      if (start->nb_bytes == 0 && end->nb_bytes == 0) {
>          return 0;
>      }
>  
> +    /* Reserve a buffer large enough to store the data from both the
> +     * start and end COW regions */
> +    start_buffer = qemu_try_blockalign(bs, buffer_size);
This is going to allocate a bigger buffer, namely one that is at least
4k in size (at least, that's my understanding - the block device is able
to track its preferred IO size/alignment).

> +    if (start_buffer == NULL) {
> +        return -ENOMEM;
> +    }
> +    /* The part of the buffer where the end region is located */
> +    end_buffer = start_buffer + start->nb_bytes;

But now end_buffer does not have optimal alignment.  In the old code, we
called qemu_try_blockalign() twice, so that both read()s were called on
a 4k boundary; but now, the end read() is called unaligned to a 4k
boundary.  Of course, since we're only reading 512 bytes, instead of 4k,
it MIGHT not matter, but I don't know if we are going to cause a bounce
buffer to come into play that we could otherwise avoid if we were
smarter with our alignments.  Is that something we need to analyze
further, or even possibly intentionally over-allocate our buffer to
ensure optimal read alignments?

> +    /* And now we can write everything */
> +    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
> +                               start_buffer, start->nb_bytes);
> +    if (ret < 0) {
> +        goto fail;
> +    }
>  
> +    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
> +                               end_buffer, end->nb_bytes);

At any rate, other than the potential of a pessimization due to poor
alignments, your split looks sane, and it makes it more obvious that if
we set up the write iov, a later patch could then call a single
do_perform_cow_write using pwritev() over both chunks in one syscall.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [PATCH v2 4/7] qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()

Alberto Garcia
On Fri 09 Jun 2017 04:53:05 PM CEST, Eric Blake wrote:

> Let's suppose we have a guest issuing 512-byte aligned requests and a
> host that requires 4k alignment; and the guest does an operation that
> needs a COW with one sector at both the front and end of the cluster.
>
>> @@ -760,22 +776,59 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
>>      BDRVQcow2State *s = bs->opaque;
>>      Qcow2COWRegion *start = &m->cow_start;
>>      Qcow2COWRegion *end = &m->cow_end;
>> +    unsigned buffer_size = start->nb_bytes + end->nb_bytes;
>
> This sets buffer_size to 1024 initially.
>
>> +    uint8_t *start_buffer, *end_buffer;
>>      int ret;
>>  
>> +    assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
>> +
>>      if (start->nb_bytes == 0 && end->nb_bytes == 0) {
>>          return 0;
>>      }
>>  
>> +    /* Reserve a buffer large enough to store the data from both the
>> +     * start and end COW regions */
>> +    start_buffer = qemu_try_blockalign(bs, buffer_size);
>
> This is going to allocate a bigger buffer, namely one that is at least
> 4k in size (at least, that's my understanding - the block device is
> able to track its preferred IO size/alignment).
>
>> +    if (start_buffer == NULL) {
>> +        return -ENOMEM;
>> +    }
>> +    /* The part of the buffer where the end region is located */
>> +    end_buffer = start_buffer + start->nb_bytes;
>
> But now end_buffer does not have optimal alignment.  In the old code,
> we called qemu_try_blockalign() twice, so that both read()s were
> called on a 4k boundary; but now, the end read() is called unaligned
> to a 4k boundary.  Of course, since we're only reading 512 bytes,
> instead of 4k, it MIGHT not matter, but I don't know if we are going
> to cause a bounce buffer to come into play that we could otherwise
> avoid if we were smarter with our alignments.  Is that something we
> need to analyze further, or even possibly intentionally over-allocate
> our buffer to ensure optimal read alignments?

I think you're right; I actually thought about this but forgot about it
when preparing the series for publication.

If I'm not wrong, it should simply be a matter of replacing

   buffer_size = start->nb_bytes + end->nb_bytes;

with

   buffer_size = QEMU_ALIGN_UP(start->nb_bytes, bdrv_opt_mem_align(bs))
                 + end->nb_bytes;

and then

    end_buffer = start_buffer + buffer_size - end->nb_bytes;

    (this one we do anyway in patch 5)

Berto


Re: [PATCH v2 6/7] qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()

Kevin Wolf-5
In reply to this post by Alberto Garcia
Am 07.06.2017 um 16:08 hat Alberto Garcia geschrieben:

> Instead of passing a single buffer pointer to do_perform_cow_write(),
> pass a QEMUIOVector. This will allow us to merge the write requests
> for the COW regions and the actual data into a single one.
>
> Although do_perform_cow_read() does not strictly need to change its
> API, we're doing it here as well for consistency.
>
> Signed-off-by: Alberto Garcia <[hidden email]>
> ---
>  block/qcow2-cluster.c | 51 ++++++++++++++++++++++++---------------------------
>  1 file changed, 24 insertions(+), 27 deletions(-)

> @@ -807,22 +798,25 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
>      /* The part of the buffer where the end region is located */
>      end_buffer = start_buffer + buffer_size - end->nb_bytes;
>  
> +    qemu_iovec_init(&qiov, 3);
> +

You don't actually make use of more than one iovec in this patch, and
after the last patch the value is potentially too low. That isn't fatal,
because this is not a fixed maximum but just a hint, yet it can cause
unnecessary memory allocations.

It would probably be better to use 1 here and change it to
2 + m->data_qiov->niov when you add write merging.
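
Something along these lines, just as a sketch (the value is only an
allocation hint, so the exact number isn't critical):

    /* this patch: only one iovec is ever added at a time */
    qemu_iovec_init(&qiov, 1);

    /* after the write-merging patch, if m->data_qiov has been set */
    qemu_iovec_init(&qiov, 2 + (m->data_qiov ? m->data_qiov->niov : 0));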

Kevin


Re: [PATCH v2 7/7] qcow2: Merge the writing of the COW regions with the guest data

Kevin Wolf-5
In reply to this post by Alberto Garcia
Am 07.06.2017 um 16:08 hat Alberto Garcia geschrieben:

> If the guest tries to write data that results in the allocation of a
> new cluster, instead of writing the guest data first and then the data
> from the COW regions, write everything together using one single I/O
> operation.
>
> This can improve the write performance by 25% or more, depending on
> several factors such as the media type, the cluster size and the I/O
> request size.
>
> Signed-off-by: Alberto Garcia <[hidden email]>

> diff --git a/block/qcow2.c b/block/qcow2.c
> index b3ba5daa93..89be2083d4 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1575,6 +1575,44 @@ fail:
>      return ret;
>  }
>  
> +/* Check if it's possible to merge a write request with the writing of
> + * the data from the COW regions */
> +static bool can_merge_cow(uint64_t offset, unsigned bytes,
> +                          QEMUIOVector *hd_qiov, QCowL2Meta *l2meta)
> +{
> +    QCowL2Meta *m;
> +
> +    for (m = l2meta; m != NULL; m = m->next) {
> +        /* If both COW regions are empty then there's nothing to merge */
> +        if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
> +            continue;
> +        }
> +
> +        /* The data (middle) region must be immediately after the
> +         * start region */
> +        if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
> +            continue;
> +        }
> +
> +        /* The end region must be immediately after the data (middle)
> +         * region */
> +        if (m->offset + m->cow_end.offset != offset + bytes) {
> +            continue;
> +        }
> +
> +        /* Make sure that adding both COW regions to the QEMUIOVector
> +         * does not exceed IOV_MAX */
> +        if (hd_qiov->niov > IOV_MAX - 2) {
> +            continue;
> +        }
> +
> +        m->data_qiov = hd_qiov;

I don't think having this side effect is a good idea for a function
called can_xyz(). I'd either call it merge_cow() or move the assignment
to the caller.

If we ever consider merging multiple L2Metas in the future, the actual
merging could become more complicated, so maybe merge_cow() is the
better option.
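
As a sketch (keeping the m->data_qiov assignment inside the renamed
function), the call site from this patch would then read:

    /* merge_cow() is just can_merge_cow() renamed, with the side
     * effect of setting m->data_qiov made explicit by the name */
    if (!merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
        /* nothing to merge with: write the guest data now */
        qemu_co_mutex_unlock(&s->lock);
        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
        trace_qcow2_writev_data(qemu_coroutine_self(),
                                cluster_offset + offset_in_cluster);
        ret = bdrv_co_pwritev(bs->file,
                              cluster_offset + offset_in_cluster,
                              cur_bytes, &hd_qiov, 0);
        qemu_co_mutex_lock(&s->lock);
        if (ret < 0) {
            goto fail;
        }
    }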

Kevin


Re: [PATCH v2 0/7] qcow2: Reduce the number of I/O ops when doing COW

Kevin Wolf-5
In reply to this post by Alberto Garcia
Am 07.06.2017 um 16:08 hat Alberto Garcia geschrieben:
> Hi all,
>
> here's a patch series that rewrites the copy-on-write code in the
> qcow2 driver to reduce the number of I/O operations.
>
> This is version v2, please refer to the original e-mail for a complete
> description:
>
> https://lists.gnu.org/archive/html/qemu-block/2017-05/msg00882.html

I had two minor comments, and there's the one from Eric about the buffer
alignment. Other than that, this series looks good. Feel free to add my
Reviewed-by after addressing these three comments.

Kevin
