migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only


Christian Ehrhardt
Hi,
testing on 2.10-rc2 I ran into an issue around:
  unable to execute QEMU command 'nbd-server-add': Block node is read-only

### TL;DR ###
- triggered by libvirt-driven live migration with --copy-storage-all
- boils down to nbd_server_add failing
- can be reproduced on a single system without libvirt
- last known working version (so far) was qemu 2.8

Simple repro:
$ qemu-img create -f qcow2 /tmp/test.qcow2 100M
$ qemu-system-x86_64 -S -m 512 -smp 1 -nodefaults --nographic -monitor
stdio -drive
file=/tmp/test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -incoming
defer
QEMU 2.9.92 monitor - type 'help' for more information
(qemu) warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx
[bit 5]
nbd_server_start 0.0.0.0:49153
(qemu) nbd_server_add -w drive-virtio-disk0
Block node is read-only

### Details ###

Trigger:
virsh migrate --live --copy-storage-all kvmguest-artful-normal qemu+ssh://
10.22.69.61/system

Setup:
- Two systems without shared storage
- An equal image is synced before the test, so it would only migrate
minimal remaining changes
=> Only the --copy-storage-all case triggers this, other migrations work as
far as I tested.
=> No related apparmor denials

The last combination I knew to be successful was libvirt 3.5 and qemu 2.8.
So I downgraded libvirt (but kept qemu) and retested.
=> Still an issue
=> So it is a qemu 2.10 issue and not libvirt 3.6
Continuing with libvirt 3.6 to have the latest updates.

On the migration target I see the following in the log:
libvirtd[11829]: 2017-08-11 08:51:49.283+0000: 11842: warning :
qemuDomainObjTaint:4545 : Domain id=2 name='kvmguest-artful-normal'
uuid=b6f4cdab-77b0-43b1-933d-9683567f3a89 is tainted: high-privileges
libvirtd[11829]: 2017-08-11 08:51:49.386+0000: 11842: error :
qemuMonitorJSONCheckError:389 : internal error: unable to execute QEMU
command 'nbd-server-add': Block node is read-only

I checked the images on the source (active since the guest is running) and
the target (inactive, out-of-sync copies):
source $ virsh domblklist kvmguest-artful-normal
Target     Source
------------------------------------------------
vda        /var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow
vdb        /var/lib/uvtool/libvirt/images/kvmguest-artful-normal-ds.qcow

But when checking details on the source, I couldn't even inspect the images:
qemu-img info --backing-chain
/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow
qemu-img: Could not open
'/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow': Failed to get
shared "write" lock
Is another process using the image?

On the target these are inactive, so the inquiry works (content is the same
anyway).
All files are there and the backing chain looks correct.

I'm not sure if being unable to run qemu-img on an in-use image is considered
an issue of its own, but this could be related to:
(qemu) commit 244a5668106297378391b768e7288eb157616f64
Author: Fam Zheng <[hidden email]>
    file-posix: Add image locking to perm operations


The add should come from libvirt:
 - the chain here should be
   qemuMigrationPrepareAny -> qemuMigrationStartNBDServer ->
qemuMonitorNBDServerAdd -> qemuMonitorJSONNBDServerAdd
 - there is a debug statement in qemuMonitorNBDServerAdd:
   VIR_DEBUG("deviceID=%s", deviceID);
   This shows it is the NBD server add for the first disk
 - it seems libvirt is adding an NBD server for the copy-storage-all case
 - that QMP command fails in qemu
 - afterwards everything else in the logs is libvirt cleaning up and killing
   the qemu that was prepped (with -S) on the target

With libvirt debugging and gdb I caught the device used for the QMP command
and stopped it while doing so:
debug : qemuMonitorNBDServerStart:3993 : host=:: port=49153
debug : qemuMonitorNBDServerStart:3995 : mon:0x7f6d4c016500
vm:0x7f6d4c011580 json:1 fd:27
debug : qemuMonitorNBDServerAdd:4006 : deviceID=drive-virtio-disk0
debug : qemuMonitorNBDServerAdd:4008 : mon:0x7f6d4c016500 vm:0x7f6d4c011580
json:1 fd:27
[...]
debug : qemuMonitorJSONCheckError:378 : unable to execute QEMU command
{"execute":"nbd-server-add","arguments":{"device":"drive-virtio-disk0","writable":true},"id":"libvirt-24"}:
{"id":"libvirt-24","error":{"class":"GenericError","desc":"Block node is
read-only"}}
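
For reference, the failing QMP exchange as a whole (the nbd-server-add command
and the error are verbatim from the log above; the nbd-server-start arguments
are my reconstruction of what libvirt sends for this host/port):

```json
{"execute": "nbd-server-start",
 "arguments": {"addr": {"type": "inet",
                        "data": {"host": "::", "port": "49153"}}}}
{"execute": "nbd-server-add",
 "arguments": {"device": "drive-virtio-disk0", "writable": true}}
{"id": "libvirt-24",
 "error": {"class": "GenericError", "desc": "Block node is read-only"}}
```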


Reducing the qemu invocation libvirt makes to some basics that don't need
libvirt, I tried to reproduce.
At the -S starting point that the migration uses the devices are already
there, so going on in the monitor from there:
(qemu) info block
drive-virtio-disk0 (#block164):
/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow (qcow2)
    Attached to:      /machine/peripheral/virtio-disk0/virtio-backend
    Cache mode:       writeback
    Backing file:
/var/lib/uvtool/libvirt/images/x-uvt-b64-Y29tLnVidW50dS5jbG91ZC5kYWlseTpzZXJ2ZXI6MTcuMTA6YW1kNjQgMjAxNzA4MTA=
(chain depth: 1)

drive-virtio-disk1 (#block507):
/var/lib/uvtool/libvirt/images/kvmguest-artful-normal-ds.qcow (qcow2)
    Attached to:      /machine/peripheral/virtio-disk1/virtio-backend
    Cache mode:       writeback

# starting the server
(qemu) nbd_server_start 0.0.0.0:49153
# This gave me a valid server
tcp        0      0 0.0.0.0:49153           0.0.0.0:*               LISTEN
     0          2328989    13593/qemu-system-x

# adding the disk
(qemu) nbd_server_add -w drive-virtio-disk0
Block node is read-only

Ok, so reproducible without libvirt on the receiving side.
Simplifying that to a smaller test:

$ qemu-img create -f qcow2 /tmp/test.qcow2 100M
$ qemu-system-x86_64 -S -m 512 -smp 1 -nodefaults --nographic -monitor
stdio -drive
file=/tmp/test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -incoming
defer
QEMU 2.9.92 monitor - type 'help' for more information
(qemu) warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx
[bit 5]
nbd_server_start 0.0.0.0:49153
(qemu) nbd_server_add -w drive-virtio-disk0
Block node is read-only
(qemu) quit

It might be reasonable to keep the -device section to specify how the disk
is included.
Finally, dropping -incoming defer makes a difference in the error message:

qemu-system-x86_64 -S -m 512 -smp 1 -nodefaults --nographic -monitor stdio
-drive file=/tmp/test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
QEMU 2.9.92 monitor - type 'help' for more information
(qemu) warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx
[bit 5]
info block
drive-virtio-disk0 (#block163): /tmp/test.qcow2 (qcow2)
    Attached to:      /machine/peripheral/virtio-disk0/virtio-backend
    Cache mode:       writeback
(qemu) nbd_server_start 0.0.0.0:49153
(qemu) nbd_server_add -w drive-virtio-disk0
Conflicts with use by drive-virtio-disk0 as 'root', which does not allow
'write' on #block163

Without -incoming defer in strace I see:
[pid 13720] fcntl(14, F_OFD_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET,
l_start=203, l_len=1}) = 0
[pid 13720] fcntl(14, F_OFD_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET,
l_start=204, l_len=1}) = 0
[pid 13720] writev(1, [{iov_base="Conflicts with use by drive-virt"...,
iov_len=95}], 1) = 95
Which makes me even more suspicious about the patch initially mentioned.

With -incoming defer set it seems to only poll the monitor fds and then
gives up with "Block node is read-only":
[pid 13750] readv(0, [{iov_base="\33", iov_len=1}], 1) = 1
[pid 13750] ppoll([{fd=0, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7,
events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=15,
events=POLLIN}], 6, {tv_sec=47226, tv_nsec=197570000}, NULL, 8) = 1
([{fd=0, revents=POLLIN}], left {tv_sec=47226, tv_nsec=197566613})
[pid 13750] readv(0, [{iov_base="[", iov_len=1}], 1) = 1
[pid 13750] ppoll([{fd=0, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7,
events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=15,
events=POLLIN}], 6, {tv_sec=47226, tv_nsec=197134000}, NULL, 8) = 1
([{fd=0, revents=POLLIN}], left {tv_sec=47226, tv_nsec=197130643})
[pid 13750] readv(0, [{iov_base="A", iov_len=1}], 1) = 1
[pid 13750] writev(1, [{iov_base="nbd_server_add -w drive-virtio-d"...,
iov_len=39}], 1) = 39
[pid 13750] ppoll([{fd=0, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7,
events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=15,
events=POLLIN}], 6, {tv_sec=47226, tv_nsec=196721000}, NULL, 8) = 1
([{fd=0, revents=POLLIN}], left {tv_sec=47225, tv_nsec=589631744})
[pid 13750] readv(0, [{iov_base="\r", iov_len=1}], 1) = 1
[pid 13750] writev(1, [{iov_base="\r\n", iov_len=2}], 1) = 2
[pid 13750] writev(1, [{iov_base="Block node is read-only\r\n",
iov_len=25}], 1) = 25

The last qemu I had around was 2.8, where this still works fine.
If needed I can go bisecting, but I have the feeling that I provided
enough data for the experts to easily "spot" it - so I'm holding back the
bisect effort until needed.


--
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

Re: migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

Fam Zheng-2
On Fri, 08/11 13:07, Christian Ehrhardt wrote:

> Hi,
> testing on 2.10-rc2 I ran into an issue around:
>   unable to execute QEMU command 'nbd-server-add': Block node is read-only
>
[...]
>
> Simplifying that to a smaller test:
>
> $ qemu-img create -f qcow2 /tmp/test.qcow2 100M
> $ qemu-system-x86_64 -S -m 512 -smp 1 -nodefaults --nographic -monitor
> stdio -drive
> file=/tmp/test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -incoming
> defer
> QEMU 2.9.92 monitor - type 'help' for more information
> (qemu) nbd_server_start 0.0.0.0:49153
> (qemu) nbd_server_add -w drive-virtio-disk0
> Block node is read-only
> (qemu) quit
>
> It might be reasonable to keep the -device section to specify how it is
> included.

This is actually caused by

commit 9c5e6594f15b7364624a3ad40306c396c93a2145
Author: Kevin Wolf <[hidden email]>
Date:   Thu May 4 18:52:40 2017 +0200

    block: Fix write/resize permissions for inactive images

which forbids "nbd_server_add -w" in the "inactive" state set by -incoming.

But I'm not sure what is a proper fix. Maybe revert the bdrv_is_writable() part
of the commit? Kevin?

Fam


Re: migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

Kevin Wolf-5
Am 11.08.2017 um 14:04 hat Fam Zheng geschrieben:

> On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> > Simplifying that to a smaller test:
> >
> > [...]
> >
> > It might be reasonable to keep the -device section to specify how it is
> > included.
>
> This is actually caused by
>
> commit 9c5e6594f15b7364624a3ad40306c396c93a2145
> Author: Kevin Wolf <[hidden email]>
> Date:   Thu May 4 18:52:40 2017 +0200
>
>     block: Fix write/resize permissions for inactive images
>
> which forbids "nbd_server_add -w" in the "inactive" state set by -incoming.
>
> But I'm not sure what is a proper fix. Maybe revert the bdrv_is_writable() part
> of the commit? Kevin?

No, the reason why this fails is simply that nbd_export_new() does things
in the wrong order. It always had to activate the image, otherwise the
first write would run into an assertion failure, but it only does so
after requesting the permissions.

blk_insert_bs() in line 1061 requests the permissions,
blk_invalidate_cache() in line 1097 activates the image.
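
Schematically (pseudocode of the ordering only, not the actual nbd/server.c
source), the broken sequence is:

```
blk_insert_bs(blk, bs)       /* requests write permission while the node is
                                still BDRV_O_INACTIVE -> "read-only" error */
...
blk_invalidate_cache(blk)    /* activates the image - only happens later */
```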

We could either use bdrv_invalidate_cache() instead and call that before
blk_insert_bs(), or start with perm=0, shared=BLK_PERM_ALL and then use
blk_set_perm() only after activating the image. The third option would
be to set blk->disable_perm = true, which would ensure that the
permissions are only effective after the image gets activated; but this
field isn't accessible outside block-backend.c at the moment.

Kevin


Re: migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

Christian Ehrhardt
On Fri, Aug 11, 2017 at 2:37 PM, Kevin Wolf <[hidden email]> wrote:

> Am 11.08.2017 um 14:04 hat Fam Zheng geschrieben:
> > On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> > > Simplifying that to a smaller test:
> > >
>
[...]

> > > Block node is read-only
>
[...]

> >
> > This is actually caused by
> >
> > commit 9c5e6594f15b7364624a3ad40306c396c93a2145
> > Author: Kevin Wolf <[hidden email]>
> > Date:   Thu May 4 18:52:40 2017 +0200
> >
> >     block: Fix write/resize permissions for inactive images
> >
>

Yeah, in the meantime an automated bisect run has finished and agrees.
Thanks for pointing out the right change triggering that so early!

Thanks Kevin for all the suggestions already; I quickly looked at the code
but I'm lost there without taking much more time.
Is anybody experienced in that area looking at fixing that?
Because IMHO that is kind of a blocker bug for 2.10, as it breaks formerly
working behavior (even if, as Kevin identified, it seems to have done it
wrong all the time).

Re: migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

Kevin Wolf-5
Am 11.08.2017 um 17:34 hat Christian Ehrhardt geschrieben:

> On Fri, Aug 11, 2017 at 2:37 PM, Kevin Wolf <[hidden email]> wrote:
>
> > Am 11.08.2017 um 14:04 hat Fam Zheng geschrieben:
> > > On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> > > > Simplifying that to a smaller test:
> > > >
> >
> [...]
>
> > > > Block node is read-only
> >
> [...]
>
> > >
> > > This is actually caused by
> > >
> > > commit 9c5e6594f15b7364624a3ad40306c396c93a2145
> > > Author: Kevin Wolf <[hidden email]>
> > > Date:   Thu May 4 18:52:40 2017 +0200
> > >
> > >     block: Fix write/resize permissions for inactive images
> > >
> >
>
> Yeah, in the meantime an automated bisect run has finished and agrees.
> Thanks for pointing out the right change triggering that so early!
>
> Thanks Kevin for all the suggestions already; I quickly looked at the code
> but I'm lost there without taking much more time.
> Is anybody experienced in that area looking at fixing that?
> Because IMHO that is kind of a blocker bug for 2.10, as it breaks formerly
> working behavior (even if, as Kevin identified, it seems to have done it
> wrong all the time).

The patch below implements what I was thinking of, and it seems to fix
this problem. However, you'll immediately run into the next one, which
is a message like 'Conflicts with use by ide0-hd0 as 'root', which does
not allow 'write' on #block172'.

The reason for this is that since commit 4417ab7adf1,
bdrv_invalidate_cache() not only activates the image, but also is the
signal for guest device BlockBackends that migration has completed and
they should now request exclusive access to the image.

As the NBD server shows, this assumption is wrong. Images can be
activated even before migration completes. Maybe this really needs to
be done in a VM state change handler.

We can't simply revert commit 4417ab7adf1 because for image file
locking, it is important that we drop locks for inactive images, so
BdrvChildRole.activate/inactivate will probably stay. Maybe only one
bool blk->disable_perm isn't enough, we might need a different one for
devices before migration completed, or at least a counter.

I'll be on vacation starting tomorrow, so someone else needs to tackle
this. Fam, I think you are relatively familiar with the op blockers?

Kevin


diff --git a/nbd/server.c b/nbd/server.c
index 82a78bf439..b68b6fb535 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1045,11 +1045,22 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
                           bool writethrough, BlockBackend *on_eject_blk,
                           Error **errp)
 {
+    AioContext  *ctx;
     BlockBackend *blk;
     NBDExport *exp = g_malloc0(sizeof(NBDExport));
     uint64_t perm;
     int ret;
 
+    /*
+     * NBD exports are used for non-shared storage migration.  Make sure
+     * that BDRV_O_INACTIVE is cleared and the image is ready for write
+     * access since the export could be available before migration handover.
+     */
+    ctx = bdrv_get_aio_context(bs);
+    aio_context_acquire(ctx);
+    bdrv_invalidate_cache(bs, NULL);
+    aio_context_release(ctx);
+
     /* Don't allow resize while the NBD server is running, otherwise we don't
      * care what happens with the node. */
     perm = BLK_PERM_CONSISTENT_READ;
@@ -1087,15 +1098,6 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
         exp->eject_notifier.notify = nbd_eject_notifier;
         blk_add_remove_bs_notifier(on_eject_blk, &exp->eject_notifier);
     }
-
-    /*
-     * NBD exports are used for non-shared storage migration.  Make sure
-     * that BDRV_O_INACTIVE is cleared and the image is ready for write
-     * access since the export could be available before migration handover.
-     */
-    aio_context_acquire(exp->ctx);
-    blk_invalidate_cache(blk, NULL);
-    aio_context_release(exp->ctx);
     return exp;
 
 fail:


Re: migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

Fam Zheng-2
On Fri, 08/11 18:48, Kevin Wolf wrote:

> The patch below implements what I was thinking of, and it seems to fix
> this problem. However, you'll immediately run into the next one, which
> is a message like 'Conflicts with use by ide0-hd0 as 'root', which does
> not allow 'write' on #block172'.
>
> The reason for this is that since commit 4417ab7adf1,
> bdrv_invalidate_cache() not only activates the image, but also is the
> signal for guest device BlockBackends that migration has completed and
> they should now request exclusive access to the image.
>
> As the NBD server shows, this assumption is wrong. Images can be
> activated even before migration completes. Maybe this really needs to
> be done in a VM state change handler.
>
> We can't simply revert commit 4417ab7adf1 because for image file
> locking, it is important that we drop locks for inactive images, so
> BdrvChildRole.activate/inactivate will probably stay. Maybe only one
> bool blk->disable_perm isn't enough, we might need a different one for
> devices before migration completed, or at least a counter.

I'll make your diff a formal patch and add a VM state change handler for 2.10.

I'm not very confident with a counter because it's not obvious to me that
inactivate/activate pairs are always balanced.

>
> I'll be on vacation starting tomorrow, so someone else needs to tackle
> this. Fam, I think you are relatively familiar with the op blockers?
>
> Kevin
>
> [...]

Fam
