VFIO VGA test branches

Alex Williamson-3
Hi folks,

A number of people have been trying VFIO's VGA support, and a few have even
been successful.  Resetting devices has been a problem and makes it
very, very difficult to really use VGA assignment effectively.  The code
in the branches below attempts to address this.  Discrete graphics
devices are typically on their own bus, which we can reset so we
theoretically get something pretty close to a power-on state for the GPU
on each run (or after each guest reset).  With this I'm able to get
multiple runs on my HD7850 with no need to reset the host.  Hopefully
this will also clean up after any host use of the device so we can
unload drivers rather than blacklisting them.

If you've been playing with VFIO and VGA, please give the branches below
a shot and report successes and failures.  Note that this new reset is
only enabled with the x-vga=on option, so it should not do gratuitous bus
resets for other devices.  Thanks,

Alex

git://github.com/awilliam/linux-vfio.git vfio-vga-reset
git://github.com/awilliam/qemu-vfio.git vfio-vga-reset

PS - The above linux branch is v3.9-based, which has a known KVM emulator
bug.  If you're on Intel and nothing happens, try:

sudo modprobe -r kvm_intel
sudo modprobe kvm_intel emulate_invalid_guest_state=0

This is required to execute the VGA BIOS on my HD7850.
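
To confirm the option took effect after reloading the module, the current value can be read back from sysfs. This is a minimal sketch; the helper name kvm_intel_param is made up for illustration, and it assumes the parameter is exported under /sys/module/kvm_intel/parameters/.

```shell
# Print the current value of a kvm_intel module parameter.
# $1: parameter name; $2: sysfs module root (defaults to /sys/module).
# kvm_intel_param is a hypothetical helper name, not part of any tool.
kvm_intel_param() {
    param_file="${2:-/sys/module}/kvm_intel/parameters/$1"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo "kvm_intel not loaded or parameter not exported: $1" >&2
        return 1
    fi
}

# Example: kvm_intel_param emulate_invalid_guest_state
```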

If things still don't work, apply the following patch:

--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -40,7 +40,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
 
-/* #define DEBUG_VFIO */
+#define DEBUG_VFIO
 #ifdef DEBUG_VFIO
 #define DPRINTF(fmt, ...) \
     do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)

And log the output (there will be lots).
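
Since DPRINTF in the patch above writes to stderr, capturing the debug output comes down to redirecting stderr of the QEMU process. A minimal sketch follows; run_with_stderr_log is a hypothetical wrapper name and the log path is whatever you choose.

```shell
# Run a command, appending its stderr to a log file while leaving stdout alone.
# $1: log file; remaining arguments: the command to run (e.g. the QEMU command line).
# run_with_stderr_log is a hypothetical helper name.
run_with_stderr_log() {
    log_file="$1"
    shift
    "$@" 2>>"$log_file"
}

# Example: run_with_stderr_log /tmp/vfio-debug.log <your usual qemu command line>
```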

Also, AMD/ATI and Nvidia are the only devices expected to have a
reasonable shot at working.  I'm seeing reports of success on AMD/ATI HD
5xxx, 6xxx, and 7xxx, as well as Nvidia Geforce 7-series, 8-series, 4xx
series, and 6xx series.  Older cards from those vendors probably aren't
very interesting to support (honestly I wouldn't care much about 7/8
series Nvidia or HD5xxx AMD, except I happen to have some for testing -
use emulated VGA if you don't care about performance).

Intel IGD graphics has numerous issues since it's partially incorporated
into the chipset.  Please don't bother to report IGD is broken unless
you're interested in fixing it.



Re: VFIO VGA test branches

Alex Williamson-3
A few notes for anyone trying this...

      * I recommend the q35 machine type and using the default config
        file found in the docs directory.  This means your command line
        should include:

         -M q35 -nodefconfig -readconfig /path/to/qemu.git/docs/q35-chipset.cfg

      * You're likely passing through a graphics card that is attached
        to the host system below a root port, so make it appear that way
        to the guest too.  If your graphics card has a graphics function
        and audio function, assign them as:

        -device vfio-pci,host=2:00.0,x-vga=on,multifunction=on,bus=ich9-pcie-port-1,addr=0.0 \
        -device vfio-pci,host=2:00.1,bus=ich9-pcie-port-1,addr=0.1
       
        The bus name comes from the q35-chipset.cfg above.  If your
        graphics card doesn't include a separate audio function, drop
        the second line and the multifunction option from the first
        (addr is also optional at that point; 0.0 is the default).
       
      * If you follow both of the above, your VGA device is now below a
        root port, but the version of seabios in qemu doesn't support
        initializing VGA routing to that device.  To fix, use upstream
        seabios: git://git.seabios.org/seabios.git  The default config
        should work.  Then add the following to your qemu commandline:
       
        -L /path/to/seabios.git/out/ -L /path/to/qemu/bios/files/
       
        (the latter is likely /usr/local/share/qemu/)
       
      * You can use -nographic to prevent QEMU from trying to start SDL
        or needing a vnc parameter.  Alternatively, specify a -vnc option
        and use the VNC window for mouse input.

      * Use -vga none.  At this point I'm not really interested in
        dual-headed VMs unless you're interested in working on it.
        Having an emulated VGA means we're not really testing VGA
        support through VFIO.

      * Do not use the vfio-pci romfile option unless you need it (i.e.
        try without it first).  Option ROMs check an internal signature
        against the hardware.  If they don't match, the ROM isn't run,
        so a ROM downloaded from the internet may get you nowhere.  If
        you do need a ROM, it's best to scrape it off the device you're
        using.  You can do this through the "rom" file in sysfs for the
        device: "echo 1 > rom" to enable it, then read it with "cat rom
        > /tmp/rom".  For this to work, the device should be a secondary
        graphics device and be untouched by host drivers.  You may have
        better luck booting from an install CD to get an environment
        where the device is untouched.

      * USB passthrough is handy for input and easier than figuring out
        which ports are connected to which USB controllers for vfio-pci
        assignment.  Use lsusb to find the devices, note the bus and
        device numbers, then use:

        -device usb-host,hostbus=8,hostaddr=2
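
The lsusb-to-option step in the last item can be sketched as a small helper. This assumes lsusb lines of the usual form "Bus 008 Device 002: ID ..."; usb_host_arg is a made-up name for illustration.

```shell
# Convert one line of lsusb output into a QEMU usb-host option.
# Assumed input format: "Bus 008 Device 002: ID vvvv:pppp <description>"
# usb_host_arg is a hypothetical helper name.
usb_host_arg() {
    printf '%s\n' "$1" | awk '
        $1 == "Bus" && $3 == "Device" {
            sub(/:$/, "", $4)   # drop the trailing colon from the device number
            # %d strips the leading zeros that lsusb prints
            printf "-device usb-host,hostbus=%d,hostaddr=%d\n", $2, $4
            found = 1
        }
        END { if (!found) exit 1 }'
}
```

Given a line starting with "Bus 008 Device 002:", this prints the same option as shown above; input that doesn't look like an lsusb line produces no output and a non-zero exit status.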

I think that's it.  Feel free to reply with other best practices.
Thanks,

Alex






Re: VFIO VGA test branches

Justin Gottula
In reply to this post by Alex Williamson-3
Hi,

The kernel won't compile with CONFIG_HOTPLUG_PCI=m:

drivers/pci/hotplug/pci_hotplug_core.c:548:5: error: redefinition of ‘pci_hp_reset_slot’
 int pci_hp_reset_slot(struct hotplug_slot *hotplug, int probe)
     ^
In file included from drivers/pci/hotplug/pci_hotplug_core.c:41:0:
include/linux/pci_hotplug.h:141:19: note: previous definition of ‘pci_hp_reset_slot’ was here
 static inline int pci_hp_reset_slot(struct hotplug_slot *slot, int probe)
                   ^
make[3]: *** [drivers/pci/hotplug/pci_hotplug_core.o] Error 1


Justin

Re: VFIO VGA test branches

Alex Williamson-3
On Fri, 2013-05-10 at 14:31 -0700, Justin Gottula wrote:

> Hi,
>
> The kernel won't compile with CONFIG_HOTPLUG_PCI=m:
>
> drivers/pci/hotplug/pci_hotplug_core.c:548:5: error: redefinition of
> ‘pci_hp_reset_slot’
>  int pci_hp_reset_slot(struct hotplug_slot *hotplug, int probe)
>      ^
> In file included from drivers/pci/hotplug/pci_hotplug_core.c:41:0:
> include/linux/pci_hotplug.h:141:19: note: previous definition of
> ‘pci_hp_reset_slot’ was here
>  static inline int pci_hp_reset_slot(struct hotplug_slot *slot, int probe)
>                    ^
> make[3]: *** [drivers/pci/hotplug/pci_hotplug_core.o] Error 1

Oops, thanks.  Hopefully you can use =y or off for now.  These trees
aren't yet ready for upstream.  Thanks,

Alex
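
For anyone unsure which setting their build uses, the kernel .config can be checked before compiling. A minimal sketch; hotplug_pci_setting is a hypothetical helper name and the default path assumes you run it from the top of the kernel tree.

```shell
# Report how CONFIG_HOTPLUG_PCI is set in a kernel .config file.
# $1: path to the .config (defaults to ./.config). Prints y, m, or n.
# hotplug_pci_setting is a hypothetical helper name.
hotplug_pci_setting() {
    config="${1:-.config}"
    [ -r "$config" ] || { echo "cannot read $config" >&2; return 1; }
    # Match the exact symbol only, not CONFIG_HOTPLUG_PCI_PCIE and friends.
    value=$(sed -n 's/^CONFIG_HOTPLUG_PCI=\([ym]\)$/\1/p' "$config")
    echo "${value:-n}"
}
```

If it prints m, the build error above applies to these branches; y or n should avoid it.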




Re: VFIO VGA test branches

Knut Omang-2
In reply to this post by Alex Williamson-3
Hi all,

Perfect timing from my perspective, thanks Alex!

I spent the better part of the weekend testing your branches on a new system
I just put together for this purpose, results below..

On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
...
> git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> git://github.com/awilliam/qemu-vfio.git vfio-vga-reset

System setup:

- Fedora 18 on
- Gigabyte Z77X-UD5H motherboard
- Intel Core i7 3770 (Ivy bridge w/integrated graphics)
- 2 discrete graphics cards:

lspci | egrep 'VGA|Audio'
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]

Short summary:

- Once I got past a few time-consuming obstacles explained below
   - the graphics part of the graphics/hdmi audio passthrough seems to work perfectly
     on both discrete graphics cards
     (though so far only one at a time and with some minor issues, see below)
   - no success with the hdmi audio yet (ideas for further investigation appreciated!)

- Contrary to [hidden email] I had no success with using pci-assign for VGA,
  neither with a standard Fedora 18 kernel and fairly recent qemu nor with your branches.

Details:

- I started off with the required kernel parameter 'intel_iommu=on' + necessary parameters for disabling radeon
   (radeon.modeset=0 rd.driver.blacklist=radeon) using the integrated graphics as primary display
   - this caused the system to freeze (with color artifacts on the console)

- In my naivety and because of the "i" in igfx I tried both with
  'intel_iommu=igfx_off' and then 'intel_iommu=on,igfx_off'
  and a full set of combinations of vfio, cards, kernels and pci-assign before I suspected
  that iommu support was turned off for **all** graphics cards with igfx_off

- The solution was to have integrated graphics turned off in the BIOS, and 'intel_iommu=on':

- iommu groups:

ls -l /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices
total 0
lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.1 -> ../../../../devices/pci0000:00/0000:00:01.1
lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.1 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1
lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.0
lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.1 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.1

- i.e. both VGA/HDMI audio pairs plus the two root ports they are plugged into are in the same group.
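
The group membership check above can be wrapped in a small helper for use on other devices. A minimal sketch; iommu_group_members is a made-up name, and the sysfs layout is the one shown in the listing.

```shell
# List the PCI devices that share an IOMMU group with the given device.
# $1: PCI address such as 0000:01:00.0; $2: sysfs root (defaults to /sys).
# iommu_group_members is a hypothetical helper name.
iommu_group_members() {
    group_dir="${2:-/sys}/bus/pci/devices/$1/iommu_group/devices"
    if [ ! -d "$group_dir" ]; then
        echo "no IOMMU group for $1 (is the IOMMU enabled?)" >&2
        return 1
    fi
    ls "$group_dir"
}

# Example: iommu_group_members 0000:01:00.0
```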

# lspci -n
...
01:00.0 0300: 1002:683f
01:00.1 0403: 1002:aab0
02:00.0 0300: 1002:6779
02:00.1 0403: 1002:aa98
...

modprobe vfio_pci
echo 0000:01:00.1 > /sys/bus/pci/devices/0000\:01\:00.1/driver/unbind
echo 0000:02:00.1 > /sys/bus/pci/devices/0000\:02\:00.1/driver/unbind
echo 1002 683f > /sys/bus/pci/drivers/vfio-pci/new_id
echo 1002 aab0 > /sys/bus/pci/drivers/vfio-pci/new_id
echo 1002 6779 > /sys/bus/pci/drivers/vfio-pci/new_id
echo 1002 aa98 > /sys/bus/pci/drivers/vfio-pci/new_id
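
The vendor/device pairs written to new_id above were copied by hand from lspci -n. As a sketch, they can also be read from the vendor and device files in sysfs; pci_new_id is a hypothetical helper name.

```shell
# Print the "vendor device" pair for a PCI device in the form written to new_id above.
# $1: PCI address such as 0000:01:00.0; $2: sysfs root (defaults to /sys).
# pci_new_id is a hypothetical helper name.
pci_new_id() {
    dev_dir="${2:-/sys}/bus/pci/devices/$1"
    [ -r "$dev_dir/vendor" ] && [ -r "$dev_dir/device" ] || return 1
    vendor=$(cat "$dev_dir/vendor")
    device=$(cat "$dev_dir/device")
    # sysfs reports values such as 0x1002; the commands above omit the 0x prefix.
    echo "${vendor#0x} ${device#0x}"
}

# Example (as root):
#   pci_new_id 0000:01:00.0 > /sys/bus/pci/drivers/vfio-pci/new_id
```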

# lsusb
...
Bus 001 Device 008: ID 046d:c315 Logitech, Inc. Classic New Touch Keyboard
Bus 001 Device 004: ID 046d:c05b Logitech, Inc. M-U0004 810-001317 [B110 Optical USB Mouse]
...

- I also applied your suggested patch to the quirk function in VFIO (see below)

- Here is a (trimmed for readability) command line I successfully used to boot from the Windows 7 install DVD,
  notice the cd and disk device descriptions and the bus parameter - I struggled a while with that
  until I came across a comment by Gerd Hoffmann here: https://bugzilla.redhat.com/show_bug.cgi?id=922670 (Thanks, Gerd!)


qemu-kvm -M q35 \
  -nodefconfig -readconfig $SRC/qemu/docs/q35-chipset.cfg \
  -device vfio-pci,host=2:00.0,x-vga=on,multifunction=on,bus=ich9-pcie-port-1,addr=0.0 \
  -device vfio-pci,host=2:00.1,bus=ich9-pcie-port-1,addr=0.1 \
  -L $SRC/seabios/out/ -L $SRC/qemu/pc-bios \
  -vga none -nographic -cpu host -rtc base=localtime -k no -m 8192 -smp 2 \
  -drive file=/dev/sr0,index=2,media=cdrom,id=cd \
  -drive file=ivm03.img,index=0,media=disk,id=ivm03 \
  -device ide-drive,drive=ivm03,bus=ide.0 \
  -device ide-cd,drive=cd,bus=ide.1 \
  -net nic,vlan=0,model=virtio -net tap,vlan=0 \
  -enable-kvm \
  -device usb-host,hostbus=1,hostaddr=8 \
  -device usb-host,hostbus=1,hostaddr=4

- Both graphics cards seem to have a ROM, but only the HD6450 lent itself to "scraping".
Anyway, supplying it to vfio did not seem to make any difference.

find /sys -name rom
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
/sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom
...
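
For reference, the scraping procedure (enable the rom file, read it, disable it again) can be sketched as below. dump_pci_rom is a hypothetical helper name; on a real system it needs root, and whether the read succeeds depends on the device and on host drivers leaving it alone.

```shell
# Copy a device's option ROM out of sysfs.
# $1: device directory, e.g. /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
# $2: output file.
# dump_pci_rom is a hypothetical helper name.
dump_pci_rom() {
    rom="$1/rom"
    [ -e "$rom" ] || { echo "no rom file under $1" >&2; return 1; }
    echo 1 > "$rom" || return 1   # enable reads of the ROM
    cat "$rom" > "$2"
    status=$?
    echo 0 > "$rom"               # disable reads again
    return $status
}
```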

Some observations and remaining unresolved issues:

- VFIO patch:
  Initially (while still running with igfx_off) I observed exactly the same behaviour as [hidden email]
  reported a while ago: With vfio_pci debug enabled, vfio_pci ended up spinning with repeated calls to
  vfio_ati_3c3_quirk_read and repeated logs:
    vfio: vfio_vga_read(0x3c3, 1) = 0x0
  I patched up accordingly with


diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index da0e5f9..a361d06 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -1291,7 +1291,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
     uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
                                   addr + quirk->data.base_offset, size);
 
-    if (data == quirk->data.address_match) {
+    if (1 || data == quirk->data.address_match) {
         data = vfio_pci_read_config(&vdev->pdev, quirk->data.address_val, size);
         DPRINTF("%s(0x3c3, 1) = 0x%"PRIx64"\n", __func__, data);
     }


  This of course did not help much until I actually got the iommu
  enabled for the radeons (similar "repeated patterns" to what deniv reported),
  but what I have observed after I got it working is that if
  I disable the patch above, things do not go that well: the Fedora VM
  comes up with VGA and the Fedora boot screen, then goes blank when
  switching to X.

- The fact that the iommu group now extends across all my available graphics
  devices makes it difficult to get the radeon (or catalyst) driver to use
  the other card, since the vfio_pci driver needs to hold it.
  Not a complete showstopper since the vesa driver comes up with 1024x768..
  Might it be a good idea to have an override option (exception list or similar?)
  to allow vfio_pci to be less restrictive about owning the whole group
   - allow functionality over security in such a case? This of course is further complicated
  by the need for graphics drivers to be disabled/enabled already at the kernel prompt..

- There seems to be a bug in the (version F8) UEFI BIOS on the motherboard.
  The BIOS offers (undocumented) a full range of selections of which PCIe
  (or PCIe 1x) graphics card to use as primary, but any selection other
  than the first PCIe 16x slot has no effect and the motherboard reverts
  to the first slot, so to be able to test both cards, I had to put the card under test
  into the second (8x) PCIe slot. I am waiting for feedback from Gigabyte on possible
  fixes for this in newer BIOSes.

- The ultimate goal is to try to consolidate some older Windows desktops as "seats"
  on the new system, using the discrete graphics with HDMI/Displayport audio.
  With the HD7700 moved to the second PCIe slot I tested both Windows and
  Linux guests to try to get some sound through the HDMI audio device.
  Windows complains that no usable device is available. On Linux (Fedora 18, KDE desktop),
  the system settings -> multimedia dialogue never opens up which seems to indicate that
  PulseAudio has problems communicating with the passed through device (?),
  any hints/pointers here appreciated. From the vfio log it seems at least
  config space is accessed ok.

- There also seem to be issues with radeon and intel_iommu=on - if I try
  to enable modesetting and normal X support for the radeon cards, X fails to start.

- It would be nice if the integrated graphics could be used as the host primary display -
  I would be happy if someone has any hints as to if/how the igfx_off option
  could be extended/modified to only affect iommu operation on selected device(s),
  if at all possible..

Thanks,

Knut Omang



Re: VFIO VGA test branches

Alex Williamson-3
On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:

> Hi all,
>
> Perfect timing from my perspective, thanks Alex!
>
> I spent the better part of the weekend testing your branches on a new system
> I just put together for this purpose, results below..
>
> On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> ...
> > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
>
> System setup:
>
> - Fedora 18 on
> - Gigabyte Z77X-UD5H motherboard
> - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> - 2 discrete graphics cards:
>
> lspci | egrep 'VGA|Audio'
> 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
>
> Short summary:
>
> - Once I got past a few time consuming obstacles explained below
>    - the graphics part of the graphics/hdmi audio passthrough seems to work perfect
>      on both discrete graphics cards
>      (though so far only one at at time and with some minor issues, see below)
>    - no success with the hdmi audio yet (ideas for further investigation appreciated!)

I've had hdmi audio working with an HD7850, but only in Windows (7) and
it was using legacy interrupts for some reason instead of MSI.  I wonder
if Linux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
what's wrong with MSI, but the problem seems to be new with the PCI bus
reset support.
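
One way to see from inside a Linux guest whether the audio function ended up on MSI or legacy interrupts is to look for allocated vectors in sysfs. This is a rough sketch that assumes the guest kernel exposes an msi_irqs directory for the device; pci_irq_mode is a made-up helper name.

```shell
# Guess whether a PCI device currently has MSI/MSI-X vectors allocated.
# $1: PCI address; $2: sysfs root (defaults to /sys). Prints "msi" or "legacy".
# Assumption: the kernel populates <device>/msi_irqs only while MSI/MSI-X is enabled.
# pci_irq_mode is a hypothetical helper name.
pci_irq_mode() {
    msi_dir="${2:-/sys}/bus/pci/devices/$1/msi_irqs"
    if [ -d "$msi_dir" ] && [ -n "$(ls -A "$msi_dir" 2>/dev/null)" ]; then
        echo msi
    else
        echo legacy
    fi
}

# Example: pci_irq_mode 0000:01:00.1
```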

> - Contrary to [hidden email] I had no success with using pci-assign for VGA
>   with a standard fedora 18 kernel and fairly recent qemu, nor with your branches,
>
> Details:
>
> - I started off with the required kernel parameter 'intel_iommu=on' + necessary parameters for disabling radeon
>    (radeon.modeset=0 rd.driver.blacklist=radeon) using the integrated graphics as primary display
>    - this caused the system to freeze (with color artifacts on the console)
>
> - In my naivity and because of the "i" in ifgx I tried both with
>   'intel_iommu=ifgx_off' and then 'intel_iommu=on,igfx_off'
>   and a full set of combinations of vfio, cards, kernels and pci-assign before I suspected
>   that iommu support was turned off for **all** graphics cards with igfx_off

I'm not sure why this is; it looks like the code only tries to turn it off
when only graphics is under the remapping device.  We'd probably need to
see the DMAR to know more (/sys/firmware/acpi/tables/DMAR).
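
Grabbing the table for a report is a plain file copy from the path above. A minimal sketch; save_dmar is a hypothetical helper name, and reading the table normally requires root.

```shell
# Save a copy of the ACPI DMAR table so it can be attached to a report.
# $1: output file; $2: ACPI tables directory (defaults to /sys/firmware/acpi/tables).
# save_dmar is a hypothetical helper name.
save_dmar() {
    table="${2:-/sys/firmware/acpi/tables}/DMAR"
    if [ ! -r "$table" ]; then
        echo "cannot read $table (try as root)" >&2
        return 1
    fi
    cat "$table" > "$1"
}

# Example (as root): save_dmar /tmp/dmar.dat
```

If the ACPICA tools are installed, iasl -d on the saved file should produce a readable disassembly, though the raw file is enough to send along.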

> - The solution was to have integrated graphics turned off in the BIOS, and 'intel_iommu=on':
>
> - iommu groups:
>
> ls -l /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices
> total 0
> lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
> lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.1 -> ../../../../devices/pci0000:00/0000:00:01.1
> lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
> lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.1 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1
> lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.0
> lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.1 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.1
>
> - eg. both the VGA/HDMI Audio pairs + the two root ports they are plugged into are in the same group:

Ick.  Intel has been pretty good about advertising ACS support on their
root ports.  I wonder if this is an oversight or if they are actually
not isolated from each other.

> # lspci -n
> ...
> 01:00.0 0300: 1002:683f
> 01:00.1 0403: 1002:aab0
> 02:00.0 0300: 1002:6779
> 02:00.1 0403: 1002:aa98
> ...
>
> modprobe vfio_pci
> echo 0000:01:00.1 > /sys/bus/pci/devices/0000\:01\:00.1/driver/unbind
> echo 0000:02:00.1 > /sys/bus/pci/devices/0000\:02\:00.1/driver/unbind
> echo 1002 683f > /sys/bus/pci/drivers/vfio-pci/new_id
> echo 1002 aab0 > /sys/bus/pci/drivers/vfio-pci/new_id
> echo 1002 6779 > /sys/bus/pci/drivers/vfio-pci/new_id
> echo 1002 aa98 > /sys/bus/pci/drivers/vfio-pci/new_id
>
> # lsusb
> ...
> Bus 001 Device 008: ID 046d:c315 Logitech, Inc. Classic New Touch Keyboard
> Bus 001 Device 004: ID 046d:c05b Logitech, Inc. M-U0004 810-001317 [B110 Optical USB Mouse]
> ...
>
> - I also applied your suggested patch to the quirk function in VFIO (see below)
>
> - Here is a (trimmed for readability) command line I successfully used to boot from the Windows 7 install DVD,
>   notice the cd and disk device descriptions and the bus parameter - I struggled a while with that
>   until I came across a comment by Gerd Hoffmann here: https://bugzilla.redhat.com/show_bug.cgi?id=922670 (Thanks, Gerd!)
>
>
> qemu-kvm -M q35 \
>   -nodefconfig -readconfig $SRC/qemu/docs/q35-chipset.cfg \
>   -device vfio-pci,host=2:00.0,x-vga=on,multifunction=on,bus=ich9-pcie-port-1,addr=0.0 \
>   -device vfio-pci,host=2:00.1,bus=ich9-pcie-port-1,addr=0.1 \
>   -L $SRC/seabios/out/ -L $SRC/qemu/pc-bios \
>   -vga none -nographic -cpu host -rtc base=localtime -k no -m 8192 -smp 2 \
>   -drive file=/dev/sr0,index=2,media=cdrom,id=cd \
>   -drive file=ivm03.img,index=0,media=disk,id=ivm03 \
>   -device ide-drive,drive=ivm03,bus=ide.0 \
>   -device ide-cd,drive=cd,bus=ide.1 \
>   -net nic,vlan=0,model=virtio -net tap,vlan=0 \
>   -enable-kvm \
>   -device usb-host,hostbus=1,hostaddr=8 \
>   -device usb-host,hostbus=1,hostaddr=4
>
> - Both the graphics card seems to have a rom but only the HD6450 let itself to "scraping".

Did you try scraping the HD6450 while the HD7700 was the boot VGA and
vice versa?  The boot VGA ROM is handled in a special way and what you
really get is the shadow copy, which isn't what we want.
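
A quick check before scraping is to see which card the host considers its boot VGA device. This sketch relies on the boot_vga attribute that Linux exposes in sysfs for VGA-class PCI devices; is_boot_vga is a made-up helper name.

```shell
# Tell whether a PCI VGA device is the one the host booted with.
# $1: PCI address; $2: sysfs root (defaults to /sys).
# Exit status 0 means boot VGA; non-zero means not boot VGA or no such attribute.
# is_boot_vga is a hypothetical helper name.
is_boot_vga() {
    flag="${2:-/sys}/bus/pci/devices/$1/boot_vga"
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# Example:
#   is_boot_vga 0000:01:00.0 && echo "boot VGA: rom will be the shadow copy"
```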

> Anyway, supplying it to vfio did not seem to make any difference.
>
> find /sys -name rom
> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
> /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom
> ...
>
> Some observations and remaining unresolved issues:
>
> - VFIO patch:
>   Initially (while still running with igfx_off) I observed exactly the same behaviour as [hidden email]
>   reported a while ago: With vfio_pci debug enabled, vfio_pci ended up spinning with repeated calls to
>   vfio_ati_3c3_quirk_read and repeated logs:
>     vfio: vfio_vga_read(0x3c3, 1) = 0x0
>   I patched up accordingly with
>
>
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index da0e5f9..a361d06 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -1291,7 +1291,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
>      uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
>                                    addr + quirk->data.base_offset, size);
>  
> -    if (data == quirk->data.address_match) {
> +    if (1 || data == quirk->data.address_match) {
>          data = vfio_pci_read_config(&vdev->pdev, quirk->data.address_val, size);
>          DPRINTF("%s(0x3c3, 1) = 0x%"PRIx64"\n", __func__, data);
>      }
>
>
>   This of course did not help much until I actually got the iommu
>   enabled for the radeons (similar "repeated patters" as deniv reported)
>   but what I have observed after I got it working is that if
>   I disable the patch above, things are not that well: the Fedora VM
>   comes up with VGA and the Fedora boot screen, then goes blank when
>   switching to X.

Hmm, I think we'd probably have better luck making that unconditional
until we have reason to do otherwise.

> - The fact that the iommu group now extends across all my available graphics
>   devices now makes it difficult to  get the radeon (or catalyst) driver use to
>   the other card since the vfio_pci driver needs to hold it.
>   Not a complete showstopper since the vesa driver comes up with 1024x768..
>   Might it be a good idea to have an override option (exception list or similar?)
>   to allow the vfio_pci to be less restrictive about owning the whole group
>    - allow functionality over security in such case? This of course is further complicated
>   by the need for graphics drivers to be disabled/enabled already at the kernel prompt..

We have a quirk in the kernel that enables us to whitelist devices, but
yes, there is no flexibility in this w/o modifying the code and
rebuilding.  (See drivers/pci/quirks.c:pci_dev_acs_enabled and follow
the example above w/ pci_dev_dma_source; the function can just return 1.)

> - There seems to be a bug in the (version F8) UEFI BIOS on the motherboard,
>   The BIOS offers (undocumented) a full range of selections of which PCIe
>   (or PCIe 1x) graphics card to use as primary, but any other selection
>   than the first PCIe 16x slot has no effect and the motherboard reverts
>   to the first slot, so to be able to test both cards, I had to put the card under test
>   into the second (8x) PCIe slot. I am waiting for feedback from Gigabyte on possible
>   fixes for this in newer BIOSes.
>
> - The ultimate goal is to try to consolidate some older Windows desktops as "seats"
>   on the new system, using the discrete graphics with HDMI/Displayport audio.
>   With the HD7700 moved to the second PCIe slot I tested both Windows and
>   Linux guests to try to get some sound through the HDMI audio device.
>   Windows complains that no usable device is available. On Linux (Fedora 18, KDE desktop),
>   the system settings -> multimedia dialogue never opens up which seems to indicate that
>   PulseAudio has problems communicating with the passed through device (?),
>   any hints/pointers here appreciated. From the vfio log it seems at least
>   config space is accessed ok.
>
> - There also seems to be issues with radeon and intel_iommu=on - if I try
>   to enable modesetting and normal X support for the radeon cards, X fails to start.
>
> - It would be nice if the integrated graphics could be used as the host primary display -
>   I would be happy if someone has any hints as to if/how the ifgx_off option
>   could be extended/modified to only affect iommu operation on selected device(s),
>   if at all possible..

Let's see what we can discover from your DMAR.  Also send along the
output of sudo lspci -vvv.  Thanks,

Alex



Re: VFIO VGA test branches

Knut Omang-2
On Mon, 2013-05-13 at 16:23 -0600, Alex Williamson wrote:

> On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:
> > Hi all,
> >
> > Perfect timing from my perspective, thanks Alex!
> >
> > I spent the better part of the weekend testing your branches on a new system
> > I just put together for this purpose, results below..
> >
> > On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> > ...
> > > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> >
> > System setup:
> >
> > - Fedora 18 on
> > - Gigabyte Z77X-UD5H motherboard
> > - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> > - 2 discrete graphics cards:
> >
> > lspci | egrep 'VGA|Audio'
> > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> > 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> > 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> > 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> >
> > Short summary:
> >
> > - Once I got past a few time consuming obstacles explained below
> >    - the graphics part of the graphics/hdmi audio passthrough seems to work perfect
> >      on both discrete graphics cards
> >      (though so far only one at at time and with some minor issues, see below)
> >    - no success with the hdmi audio yet (ideas for further investigation appreciated!)
>
> I've had hdmi audio working with an HD7850, but only in Windows (7) and
> it was using legacy interrupts for some reason instead of MSI.  I wonder
> if Liux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
> what's wrong with MSI, but it seems to be new with the PCI bus reset
> support.
I tried

modprobe -r snd_hda_intel
modprobe snd_hda_intel enable_msi=0

This did not seem to have any effect on Linux.
Here is the guest's lspci -vvv entry for the audio device after the above:

01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
        Subsystem: PC Partner Limited Device aab0
        Physical Slot: 0
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 17
        Region 0: Memory at fea60000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #1, Speed 8GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 00000000fee00000  Data: 4072
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Kernel driver in use: snd_hda_intel
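
For anyone repeating this check, a minimal sketch of pulling the MSI state out of a listing like the one above. The function name msi_state is made up for illustration; it only looks at the "MSI: Enable+/-" flag that lspci -vvv prints and assumes nothing else about the device.

```shell
#!/bin/sh
# Read `lspci -vvv` output on stdin and report the state of the MSI
# capability: "enabled", "disabled", or "none" if no MSI capability
# line is present. (msi_state is a hypothetical helper name.)
msi_state() {
    awk '
        /Capabilities: \[[0-9a-f]+\] MSI:/ {
            if ($0 ~ /Enable[+]/) print "enabled"; else print "disabled"
            found = 1
            exit
        }
        END { if (!found) print "none" }
    '
}

# Usage inside the guest (address taken from the listing above):
#   lspci -vvv -s 01:00.1 | msi_state
```

On the listing above this would report "disabled", which matches the "MSI: Enable-" flag but does not by itself say why audio fails.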

> > - Contrary to [hidden email] I had no success with using pci-assign for VGA
> >   with a standard fedora 18 kernel and fairly recent qemu, nor with your branches,
> >
> > Details:
> >
> > - I started off with the required kernel parameter 'intel_iommu=on' + necessary parameters for disabling radeon
> >    (radeon.modeset=0 rd.driver.blacklist=radeon) using the integrated graphics as primary display
> >    - this caused the system to freeze (with color artifacts on the console)
> >
> > - In my naivety and because of the "i" in igfx I tried both with
> >   'intel_iommu=igfx_off' and then 'intel_iommu=on,igfx_off'
> >   and a full set of combinations of vfio, cards, kernels and pci-assign before I suspected
> >   that iommu support was turned off for **all** graphics cards with igfx_off
>
> I'm not sure why this is, looks like the code only tries to turn it off
> when only graphics is under the remapping device.  We'd probably need to
> see the DMAR to know more (/sys/firmware/acpi/tables/DMAR).
Attaching both a decoded one and the raw dump.

> > - The solution was to have integrated graphics turned off in the BIOS, and 'intel_iommu=on':
> >
> > - iommu groups:
> >
> > ls -l /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices
> > total 0
> > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
> > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.1 -> ../../../../devices/pci0000:00/0000:00:01.1
> > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
> > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.1 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1
> > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.0
> > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.1 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.1
> >
> > - i.e. both the VGA/HDMI Audio pairs + the two root ports they are plugged into are in the same group:
>
> Ick.  Intel has been pretty good about advertising ACS support on their
> root ports.  I wonder if this is an oversight or if they are actually
> not isolated from each other.
Sad state I'm afraid - one of the reasons I went for Intel this time - I
have usually chosen AMD in the past but had a bad experience with an FM1
board with no IOMMU support..

No ACS on any of the root ports (or anything else..) - see attachment..

I wish there were lspci -vvv's out there for all hardware - it is quite a
gamble to buy motherboards if one wishes to use them for something more
than a plain Windows install..
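
A small sketch for checking the grouping on other boards, based on the same sysfs path as the ls above. The function name and the optional second argument (an alternative sysfs root, only there so the function can be tried against a mock directory tree) are assumptions made for this example.

```shell
#!/bin/sh
# Print the PCI addresses that share an IOMMU group with a device.
# $1: PCI address, e.g. 0000:01:00.0
# $2: sysfs root, defaults to /sys (hypothetical parameter for testing)
iommu_group_members() {
    dev=$1
    sysfs=${2:-/sys}
    dir="$sysfs/bus/pci/devices/$dev/iommu_group/devices"
    if [ ! -d "$dir" ]; then
        echo "no iommu_group for $dev (is the IOMMU enabled?)" >&2
        return 1
    fi
    ls "$dir"
}

# Usage:
#   iommu_group_members 0000:01:00.0
```

If the output lists more than the card and its audio function, as it does on this board, everything listed has to be handed to vfio-pci (or left without a host driver) together.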

> > # lspci -n
> > ...
> > 01:00.0 0300: 1002:683f
> > 01:00.1 0403: 1002:aab0
> > 02:00.0 0300: 1002:6779
> > 02:00.1 0403: 1002:aa98
> > ...
> >
> > modprobe vfio_pci
> > echo 0000:01:00.1 > /sys/bus/pci/devices/0000\:01\:00.1/driver/unbind
> > echo 0000:02:00.1 > /sys/bus/pci/devices/0000\:02\:00.1/driver/unbind
> > echo 1002 683f > /sys/bus/pci/drivers/vfio-pci/new_id
> > echo 1002 aab0 > /sys/bus/pci/drivers/vfio-pci/new_id
> > echo 1002 6779 > /sys/bus/pci/drivers/vfio-pci/new_id
> > echo 1002 aa98 > /sys/bus/pci/drivers/vfio-pci/new_id
> >
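
The vendor/device pairs written to new_id above come straight from the lspci -n listing, so they can be derived rather than typed by hand. A minimal sketch, assuming the three-column lspci -n format shown above; new_id_pairs is a name invented for this example.

```shell
#!/bin/sh
# Read `lspci -n` lines on stdin and print "vendor device" (the format
# new_id expects) for each PCI address passed as an argument.
# (new_id_pairs is a hypothetical helper name.)
new_id_pairs() {
    addrs=" $* "
    awk -v addrs="$addrs" '
        # $1 is the address, $3 is vendor:device
        index(addrs, " " $1 " ") {
            split($3, id, ":")
            print id[1], id[2]
        }
    '
}

# Usage (as root, after modprobe vfio_pci and after unbinding any
# host driver that still holds the devices):
#   lspci -n | new_id_pairs 02:00.0 02:00.1 | while read -r pair; do
#       echo "$pair" > /sys/bus/pci/drivers/vfio-pci/new_id
#   done
```

Note that a new_id entry applies to every device with that vendor/device ID, so two identical cards cannot be told apart this way.
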
> > # lsusb
> > ...
> > Bus 001 Device 008: ID 046d:c315 Logitech, Inc. Classic New Touch Keyboard
> > Bus 001 Device 004: ID 046d:c05b Logitech, Inc. M-U0004 810-001317 [B110 Optical USB Mouse]
> > ...
> >
> > - I also applied your suggested patch to the quirk function in VFIO (see below)
> >
> > - Here is a (trimmed for readability) command line I successfully used to boot from the Windows 7 install DVD,
> >   notice the cd and disk device descriptions and the bus parameter - I struggled a while with that
> >   until I came across a comment by Gerd Hoffmann here: https://bugzilla.redhat.com/show_bug.cgi?id=922670 (Thanks, Gerd!)
> >
> >
> > qemu-kvm -M q35 \
> >   -nodefconfig -readconfig $SRC/qemu/docs/q35-chipset.cfg \
> >   -device vfio-pci,host=2:00.0,x-vga=on,multifunction=on,bus=ich9-pcie-port-1,addr=0.0 \
> >   -device vfio-pci,host=2:00.1,bus=ich9-pcie-port-1,addr=0.1 \
> >   -L $SRC/seabios/out/ -L $SRC/qemu/pc-bios \
> >   -vga none -nographic -cpu host -rtc base=localtime -k no -m 8192 -smp 2 \
> >   -drive file=/dev/sr0,index=2,media=cdrom,id=cd \
> >   -drive file=ivm03.img,index=0,media=disk,id=ivm03 \
> >   -device ide-drive,drive=ivm03,bus=ide.0 \
> >   -device ide-cd,drive=cd,bus=ide.1 \
> >   -net nic,vlan=0,model=virtio -net tap,vlan=0 \
> >   -enable-kvm \
> >   -device usb-host,hostbus=1,hostaddr=8 \
> >   -device usb-host,hostbus=1,hostaddr=4
> >
> > - Both graphics cards seem to have a ROM, but only the HD6450 lent itself to "scraping".
>
> Did you try scraping the HD6450 while the HD7700 was the boot VGA and
> vice versa?  The boot VGA ROM is handled in a special way and what you
> really get is the shadow copy, which isn't what we want.
I did all the scraping work with the Radeons with my initial setup while
the integrated graphics was the primary display. I tried once more now
to scrape the HD7700 while the HD6450 is the primary VGA and still get
the same result:

# echo 1 > /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom
# cat /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom > HD7700.rom
cat: /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom: Input/output error

system log reports:
May 14 07:15:54 asu kernel: [   82.344189] pci 0000:02:00.0: Invalid ROM contents
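
For completeness, a sketch of the scraping steps as a pair of shell functions. The function names are made up for this example. The check relies on the standard PCI expansion ROM signature (bytes 0x55 0xAA at offset 0); it is only a sanity test on the dumped file and says nothing about whether the image is the real ROM or a shadow copy.

```shell
#!/bin/sh
# Succeed if the file starts with the PCI expansion ROM signature 55 aa.
rom_has_signature() {
    sig=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Dump a device ROM through sysfs.
# $1: sysfs device directory, $2: output file. Needs root.
dump_rom() {
    echo 1 > "$1/rom" || return 1   # enable reading of the ROM
    cat "$1/rom" > "$2"
    status=$?
    echo 0 > "$1/rom"               # disable again whatever happened
    [ "$status" -eq 0 ] && rom_has_signature "$2"
}

# Usage:
#   dump_rom /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0 HD7700.rom
```

With the HD7700 here the cat step itself fails with an I/O error, so dump_rom would return non-zero before the signature is ever looked at.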

>
> > Anyway, supplying it to vfio did not seem to make any difference.
> >
> > find /sys -name rom
> > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
> > /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom
> > ...
> >
> > Some observations and remaining unresolved issues:
> >
> > - VFIO patch:
> >   Initially (while still running with igfx_off) I observed exactly the same behaviour as [hidden email]
> >   reported a while ago: With vfio_pci debug enabled, vfio_pci ended up spinning with repeated calls to
> >   vfio_ati_3c3_quirk_read and repeated logs:
> >     vfio: vfio_vga_read(0x3c3, 1) = 0x0
> >   I patched up accordingly with
> >
> >
> > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > index da0e5f9..a361d06 100644
> > --- a/hw/misc/vfio.c
> > +++ b/hw/misc/vfio.c
> > @@ -1291,7 +1291,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
> >      uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
> >                                    addr + quirk->data.base_offset, size);
> >  
> > -    if (data == quirk->data.address_match) {
> > +    if (1 || data == quirk->data.address_match) {
> >          data = vfio_pci_read_config(&vdev->pdev, quirk->data.address_val, size);
> >          DPRINTF("%s(0x3c3, 1) = 0x%"PRIx64"\n", __func__, data);
> >      }
> >
> >
> >   This of course did not help much until I actually got the iommu
> >   enabled for the radeons (similar "repeated patterns" as deniv reported)
> >   but what I have observed after I got it working is that if
> >   I disable the patch above, things do not work that well: the Fedora VM
> >   comes up with VGA and the Fedora boot screen, then goes blank when
> >   switching to X.
>
> Hmm, I think we'd probably have better luck making that unconditional
> until we have reason to do otherwise.
I'm starting to wonder whether there's some timing issue or maybe
something with the initial state of the hardware affecting this.
It might be that the blank screen situation is more likely to occur if
debug is enabled - this morning I saw the same behavior even with the
patch enabled - then tried once more and got success. This was right
after a reboot so I tried a "warm" reboot and the same happened again:
First attempt got through the initial VGA phase then blanked, I ^C'ed
the VM then restarted and got all the way to the GUI again..

> > - The fact that the iommu group extends across all my available graphics
> >   devices now makes it difficult to use the radeon (or catalyst) driver with
> >   the other card since the vfio_pci driver needs to hold it.
> >   Not a complete showstopper since the vesa driver comes up with 1024x768..
> >   Might it be a good idea to have an override option (exception list or similar?)
> >   to allow the vfio_pci to be less restrictive about owning the whole group
> >    - allow functionality over security in such case? This of course is further complicated
> >   by the need for graphics drivers to be disabled/enabled already at the kernel prompt..
>
> We have a quirk in the kernel that enables us to whitelist devices, but
> yes, there is no flexibility in this w/o modifying the code and
> rebuilding.  (see drivers/pci/quirks.c:pci_dev_acs_enabled and follow
> the example above w/ pci_dev_dma_source - function can just return 1)
Thanks, I'll have a look at that.

> > - There seems to be a bug in the (version F8) UEFI BIOS on the motherboard.
> >   The BIOS offers (undocumented) a full range of selections of which PCIe
> >   (or PCIe 1x) graphics card to use as primary, but any other selection
> >   than the first PCIe 16x slot has no effect and the motherboard reverts
> >   to the first slot, so to be able to test both cards, I had to put the card under test
> >   into the second (8x) PCIe slot. I am waiting for feedback from Gigabyte on possible
> >   fixes for this in newer BIOSes.
> >
> > - The ultimate goal is to try to consolidate some older Windows desktops as "seats"
> >   on the new system, using the discrete graphics with HDMI/Displayport audio.
> >   With the HD7700 moved to the second PCIe slot I tested both Windows and
> >   Linux guests to try to get some sound through the HDMI audio device.
> >   Windows complains that no usable device is available. On Linux (Fedora 18, KDE desktop),
> >   the system settings -> multimedia dialogue never opens up which seems to indicate that
> >   PulseAudio has problems communicating with the passed through device (?),
> >   any hints/pointers here appreciated. From the vfio log it seems at least
> >   config space is accessed ok.
> >
> > - There also seem to be issues with radeon and intel_iommu=on - if I try
> >   to enable modesetting and normal X support for the radeon cards, X fails to start.
> >
> > - It would be nice if the integrated graphics could be used as the host primary display -
> >   I would be happy if someone has any hints as to if/how the igfx_off option
> >   could be extended/modified to only affect iommu operation on selected device(s),
> >   if at all possible..
>
> Let's see what we can discover from your DMAR.  Also send along sudo
> lspci -vvv.  Thanks,
Attached.

One "interesting feature" I have never seen before on a motherboard is
that I get "pcilib: sysfs_read_vpd: read failed: Connection timed out"
while doing the lspci -vvv but this appears to come from trying to read
the Vital Product Data capability of the secondary onboard ethernet
[07:00.0 Ethernet controller: Atheros Communications Inc. AR8151 v2.0
Gigabit Ethernet (rev c0)] which should not have any significance here..

Thanks,

Knut


lspci_asu.txt (50K) Download Attachment
DMAR_asu.dsl (3K) Download Attachment
DMAR_asu.raw (180 bytes) Download Attachment

Re: VFIO VGA test branches

Justin Gottula
In reply to this post by Alex Williamson-3
Hi Alex,

VGA passthrough is working great here, with the exception of device reset.

In short, everything works the first time the guest runs. But the second time I start the guest, before anything comes on the screen, the host grinds to a halt and freezes (gradually, until after a few moments, magic sysrq doesn't even work). No 'reduced performance' here, just a completely frozen system.

Suspending and waking the host in between guest runs, while inconvenient, completely avoids the problem. No more freezes and full graphics performance in the guest. So my guess is that the PCI reset from software just isn't happening for some reason.

The passthru devices are being assigned to pci-stub and vfio-pci as they should be. Secondary passthrough (-vga cirrus or -vga std) doesn't change much: things still work, and reset still doesn't. One small difference is that the freeze on second boot is delayed until Windows initializes the secondary graphics adapter, since the device isn't touched by the BIOS prior to that.

Overriding the video BIOS doesn't seem to change anything. Passing through just the VGA device (excluding the HDMI audio device) doesn't seem to make much of a difference either.

- hardware
ASUS M5A99X EVO (AMD 990X/SB950; the IVRS is broken and overridden)
AMD Radeon HD 5750 (for the host)
AMD Radeon HD 7870 (for passthru)

- software: host
linux (Joerg Roedel's iommu tree) with linux-vfio merged in
qemu (latest git with vfio)
seabios (latest git)

- software: guest
windows 8
amd catalyst 13.5b2
virtio drivers

- lspci (abbreviated)
00:00.0 Host bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 port B) (rev 02)
00:00.2 IOMMU: Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory Management Unit (IOMMU)
00:02.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port B)
00:03.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port C)
00:14.0 SMBus: Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller (rev 42)
01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Juniper [Radeon HD 5700 Series]
01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Juniper HDMI Audio [Radeon HD 5700 Series]
02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Pitcairn [Radeon HD 7800]
02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]

- lspci -n (abbreviated)
01:00.0 0300: 1002:68be
01:00.1 0403: 1002:aa58
02:00.0 0300: 1002:6818
02:00.1 0403: 1002:aab0

- lspci -t (abbreviated)
-[0000:00]-+-00.0
           +-00.2
           +-02.0-[01]--+-00.0
           |            \-00.1
           +-03.0-[02]--+-00.0
                        \-00.1

- kernel cmdline
/vmlinuz-linux-iommu initrd=/initramfs-linux-iommu.img root=/dev/CorsairVG/ArchLinux rootflags=subvol=root systemd.unit=graphical.target debug nomodeset vga=804 acpi.debug_level=0x2 acpi.debug_layer=0xFFFFFFFF amd_iommu_dump vfio_iommu_type1.allow_unsafe_interrupts=1 ivrs_ioapic[9]=00:14.0 ivrs_ioapic[10]=00:00.1

- qemu options
qemu-system-x86_64 -enable-kvm -name Windows8 \
 -M q35 -nodefconfig -readconfig /pool/KVM/Windows8/q35-chipset.cfg \
 -m 4096 -balloon none \
 -rtc base=localtime \
 -cpu host -smp 8,sockets=1,cores=4,threads=2 \
 -bios /usr/share/qemu/bios.bin \
 -vga none \
 -drive if=virtio,format=raw,discard=on,cache=none,file=/dev/CorsairVG/Windows \
 -drive if=virtio,format=raw,file=/pool/KVM/Windows8/WinData.ntfs.img \
 -drive id=cdrom,media=cdrom,format=raw,file=/dev/null \
 -device ide-cd,bus=ide.0,drive=cdrom \
 -boot order=dc,menu=on \
 -net nic,model=virtio,macaddr=00:55:aa:00:00:01 -net bridge,br=vm_br \
 -soundhw hda \
 -usbdevice tablet \
 -device vfio-pci,host=02:00.0,bus=ich9-pcie-port-1,addr=0.0,multifunction=on,x-vga=on \
 -device vfio-pci,host=02:00.1,bus=ich9-pcie-port-1,addr=0.1

- dmesg | egrep -i '(iommu|ioapic|pci-stub|vfio)' | grep -vi 'command line'
[    0.000000] ACPI: IOAPIC (id[0x09] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 9, version 33, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: IOAPIC (id[0x0a] address[0xfec20000] gsi_base[24])
[    0.000000] IOAPIC[1]: apic_id 10, version 33, address 0xfec20000, GSI 24-55
[    0.223213] AMD-Vi:   DEV_SPECIAL(IOAPIC[0])         devid: 00:14.0
[    0.223219] AMD-Vi:   DEV_SPECIAL(IOAPIC[255])               devid: 00:00.1
[    0.600863] ACPI: Using IOAPIC for interrupt routing
[    2.033551] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    2.191528] VFIO - User Level meta-driver version: 0.3
[    3.601002] pci-stub: add 1002:6818 sub=FFFFFFFF:FFFFFFFF cls=00000000/00000000
[    3.607895] pci-stub 0000:02:00.0: claimed by stub
[    3.614688] pci-stub: add 1002:AAB0 sub=FFFFFFFF:FFFFFFFF cls=00000000/00000000
[    3.621640] pci-stub 0000:02:00.1: claimed by stub
[ 5137.969990] vfio-pci 0000:02:00.0: enabling device (0000 -> 0003)
[ 5137.995842] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x19@0x270
[ 5137.995849] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x1b@0x2d0
[ 5166.727114] vfio-pci 0000:02:00.0: irq 93 for MSI/MSI-X
(this is from before the second boot attempt)

- last two lines from dmesg before the freeze (netcat'd to another box)
Clocksource tsc unstable (delta = -416526709 ns)
Switching to clocksource hpet

- output with DEBUG_VFIO: lots and lots, see attachment

If you need any more information, I'll be glad to provide it.

Justin

qemu-logs.tar.gz (393K) Download Attachment

Re: VFIO VGA test branches

Knut Omang-2
In reply to this post by Alex Williamson-3

On Mon, 2013-05-13 at 16:23 -0600, Alex Williamson wrote:

> On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:
> > Hi all,
> >
> > Perfect timing from my perspective, thanks Alex!
> >
> > I spent the better part of the weekend testing your branches on a new system
> > I just put together for this purpose, results below..
> >
> > On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> > ...
> > > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> >
> > System setup:
> >
> > - Fedora 18 on
> > - Gigabyte Z77X-UD5H motherboard
> > - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> > - 2 discrete graphics cards:
> >
> > lspci | egrep 'VGA|Audio'
> > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> > 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> > 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> > 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> >
> > Short summary:
> >
> > - Once I got past a few time consuming obstacles explained below
> >    - the graphics part of the graphics/hdmi audio passthrough seems to work perfectly
> >      on both discrete graphics cards
> >      (though so far only one at a time and with some minor issues, see below)
> >    - no success with the hdmi audio yet (ideas for further investigation appreciated!)
>
> I've had hdmi audio working with an HD7850, but only in Windows (7) and
> it was using legacy interrupts for some reason instead of MSI.  I wonder
> if Linux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
> what's wrong with MSI, but it seems to be new with the PCI bus reset
> support.
In my first tries, Windows was just using a generic
VGA driver, which still seems to work perfectly with reboots and everything
and in full screen resolution (1920x1200).
However, after installing the AMD Catalyst driver stack, Windows 7 now
consistently gets a BSOD from the graphics driver upon boot,
with the message:

"Attempt to reset the display driver and recover from timeout failed"
- a picture of the BSOD screen attached.

I attach the corresponding vfio log where I added some timing code to
make it easier to see when the BSOD happens (with 2 seconds of silence
in the log before the VM reboots). I believe this is at 09:28:32-34 in
the log.
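
The timing code mentioned above lives in the patched QEMU source and is not reproduced here. For anyone who wants similar timestamps without touching the source, one possible alternative is to stamp the debug output as it is written. A minimal sketch; timestamp_lines is a name invented for this example.

```shell
#!/bin/sh
# Prefix every line read from stdin with the current wall-clock time.
# (timestamp_lines is a hypothetical helper name.)
timestamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%H:%M:%S)" "$line"
    done
}

# Usage (DEBUG_VFIO output goes to stderr):
#   qemu-kvm ... 2>&1 | timestamp_lines > vfio.log
```

This runs date once per line, so with the volume of output DEBUG_VFIO produces it will slow things down and the stamps reflect when a line was read from the pipe, not when QEMU emitted it.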

Similar behaviour both just after reboot/power cycle of the host and
subsequent VM boot attempts.

This is still with the HD7700 as the passed-through device, but after a
motherboard firmware upgrade (to F14). The upgrade did not seem to affect
the behaviour observed on Windows prior to the Catalyst install or with a
Linux guest, nor did it fix the bug in selecting primary devices as I
was hoping.

Let me know if you have ideas for further debugging this,

Thanks,

Knut



bsod_ivm03.jpg (144K)
ivm03_bsod.log.bz2 (182K)

Re: VFIO VGA test branches

Maik Broemme
Hi Knut,

Knut Omang <[hidden email]> wrote:

>
> On Mon, 2013-05-13 at 16:23 -0600, Alex Williamson wrote:
> > On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:
> > > Hi all,
> > >
> > > Perfect timing from my perspective, thanks Alex!
> > >
> > > I spent the better part of the weekend testing your branches on a new system
> > > I just put together for this purpose, results below..
> > >
> > > On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> > > ...
> > > > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > > > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> > >
> > > System setup:
> > >
> > > - Fedora 18 on
> > > - Gigabyte Z77X-UD5H motherboard
> > > - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> > > - 2 discrete graphics cards:
> > >
> > > lspci | egrep 'VGA|Audio'
> > > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> > > 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> > > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> > > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> > > 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> > > 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > >
> > > Short summary:
> > >
> > > - Once I got past a few time consuming obstacles explained below
> > >    - the graphics part of the graphics/hdmi audio passthrough seems to work perfect
> > >      on both discrete graphics cards
> > >      (though so far only one at a time and with some minor issues, see below)
> > >    - no success with the hdmi audio yet (ideas for further investigation appreciated!)
> >
> > I've had hdmi audio working with an HD7850, but only in Windows (7) and
> > it was using legacy interrupts for some reason instead of MSI.  I wonder
> > if Linux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
> > what's wrong with MSI, but it seems to be new with the PCI bus reset
> > support.
>
> In my first tries, Windows was just using a generic
> VGA driver, which still seems to work perfectly with reboots and everything
> and in full screen resolution (1920x1200).
> However, after installing the Catalyst AMD driver stack, upon boot
> Windows 7 now consistently gets a BSOD from the graphics driver
> with the message:
>
> "Attempt to reset the display driver and recover from timeout failed"
> - a picture of the BSOD screen attached.
>
> I attach the corresponding vfio log where I added some timing code to
> make it easier to see when the BSOD happens (with 2 seconds of silence
> in the log before the VM reboots); I believe this is at 09:28:32-34 in
> the log.
>
> Similar behaviour both just after reboot/power cycle of the host and
> subsequent VM boot attempts.
>
> This is still with the HD7700 as passed through device, but after a
> motherboard firmware upgrade (to F14) which did not seem to affect the
> observed behaviour on Windows prior to Catalyst install or with Linux
> guest, neither did it fix the bug in selecting primary devices as I
> was hoping for.
>
> Let me know if you have ideas for further debugging this,
>

I had a similar problem a couple of days ago and posted it to this list.
I got a similar BSOD and have already tested the following configurations:

1) machine: q35 / kvm: yes / vga: none   / x-vga: on  = qemu freeze
2) machine: q35 / kvm: no  / vga: none   / x-vga: on  = qemu freeze
   (with errors below)
3) machine: q35 / kvm: yes / vga: none   / x-vga: off = qemu runs with
   100% CPU due to no VGA init (no picture)
4) machine: q35 / kvm: yes / vga: cirrus / x-vga: off = qemu runs with
   BSOD on loading atikmpag.sys
5) machine: pc  / kvm: yes / vga: cirrus / x-vga: off = qemu runs fine

However, I've already re-run the BSOD case with the following branches
from Alex:

git://github.com/awilliam/linux-vfio.git vfio-vga-reset
git://github.com/awilliam/qemu-vfio.git vfio-vga-reset

Also with the latest seabios, and it has worked so far. No more BSOD, and
rebooting the VM was also possible without suspending / resuming the host
in between.

> Thanks,
>
> Knut
>
> > > - Contrary to [hidden email] I had no success with using pci-assign for VGA
> > >   with a standard fedora 18 kernel and fairly recent qemu, nor with your branches,
> > >
> > > Details:
> > >
> > > - I started off with the required kernel parameter 'intel_iommu=on' + necessary parameters for disabling radeon
> > >    (radeon.modeset=0 rd.driver.blacklist=radeon) using the integrated graphics as primary display
> > >    - this caused the system to freeze (with color artifacts on the console)
> > >
> > > - In my naivety and because of the "i" in igfx I tried both with
> > >   'intel_iommu=igfx_off' and then 'intel_iommu=on,igfx_off'
> > >   and a full set of combinations of vfio, cards, kernels and pci-assign before I suspected
> > >   that iommu support was turned off for **all** graphics cards with igfx_off
> >
> > I'm not sure why this is, looks like the code only tries to turn it off
> > when only graphics is under the remapping device.  We'd probably need to
> > see the DMAR to know more (/sys/firmware/acpi/tables/DMAR).
> >
> > > - The solution was to have integrated graphics turned off in the BIOS, and 'intel_iommu=on':
> > >
> > > - iommu groups:
> > >
> > > ls -l /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices
> > > total 0
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.1 -> ../../../../devices/pci0000:00/0000:00:01.1
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.1 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.0
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.1 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.1
> > >
> > > - eg. both the VGA/HDMI Audio pairs + the two root ports they are plugged into are in the same group:
> >
> > Ick.  Intel has been pretty good about advertising ACS support on their
> > root ports.  I wonder if this is an oversight or if they are actually
> > not isolated from each other.
> >
> > > # lspci -n
> > > ...
> > > 01:00.0 0300: 1002:683f
> > > 01:00.1 0403: 1002:aab0
> > > 02:00.0 0300: 1002:6779
> > > 02:00.1 0403: 1002:aa98
> > > ...
> > >
> > > modprobe vfio_pci
> > > echo 0000:01:00.1 > /sys/bus/pci/devices/0000\:01\:00.1/driver/unbind
> > > echo 0000:02:00.1 > /sys/bus/pci/devices/0000\:02\:00.1/driver/unbind
> > > echo 1002 683f > /sys/bus/pci/drivers/vfio-pci/new_id
> > > echo 1002 aab0 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > echo 1002 6779 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > echo 1002 aa98 > /sys/bus/pci/drivers/vfio-pci/new_id
> > >
> > > # lsusb
> > > ...
> > > Bus 001 Device 008: ID 046d:c315 Logitech, Inc. Classic New Touch Keyboard
> > > Bus 001 Device 004: ID 046d:c05b Logitech, Inc. M-U0004 810-001317 [B110 Optical USB Mouse]
> > > ...
> > >
> > > - I also applied your suggested patch to the quirk function in VFIO (see below)
> > >
> > > - Here is a (trimmed for readability) command line I successfully used to boot from the Windows 7 install DVD,
> > >   notice the cd and disk device descriptions and the bus parameter - I struggled a while with that
> > >   until I came across a comment by Gerd Hoffmann here: https://bugzilla.redhat.com/show_bug.cgi?id=922670 (Thanks, Gerd!)
> > >
> > >
> > > qemu-kvm -M q35 \
> > >   -nodefconfig -readconfig $SRC/qemu/docs/q35-chipset.cfg \
> > >   -device vfio-pci,host=2:00.0,x-vga=on,multifunction=on,bus=ich9-pcie-port-1,addr=0.0 \
> > >   -device vfio-pci,host=2:00.1,bus=ich9-pcie-port-1,addr=0.1 \
> > >   -L $SRC/seabios/out/ -L $SRC/qemu/pc-bios \
> > >   -vga none -nographic -cpu host -rtc base=localtime -k no -m 8192 -smp 2 \
> > >   -drive file=/dev/sr0,index=2,media=cdrom,id=cd \
> > >   -drive file=ivm03.img,index=0,media=disk,id=ivm03 \
> > >   -device ide-drive,drive=ivm03,bus=ide.0 \
> > >   -device ide-cd,drive=cd,bus=ide.1 \
> > >   -net nic,vlan=0,model=virtio -net tap,vlan=0 \
> > >   -enable-kvm \
> > >   -device usb-host,hostbus=1,hostaddr=8 \
> > >   -device usb-host,hostbus=1,hostaddr=4
> > >
> > > - Both graphics cards seem to have a ROM, but only the HD6450 lent itself to "scraping".
> >
> > Did you try scraping the HD6450 while the HD7700 was the boot VGA and
> > vice versa?  The boot VGA ROM is handled in a special way and what you
> > really get is the shadow copy, which isn't what we want.
> >
> > > Anyway, supplying it to vfio did not seem to make any difference.
> > >
> > > find /sys -name rom
> > > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
> > > /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom
> > > ...
> > >
> > > Some observations and remaining unresolved issues:
> > >
> > > - VFIO patch:
> > >   Initially (while still running with igfx_off) I observed exactly the same behaviour as [hidden email]
> > >   reported a while ago: With vfio_pci debug enabled, vfio_pci ended up spinning with repeated calls to
> > >   vfio_ati_3c3_quirk_read and repeated logs:
> > >     vfio: vfio_vga_read(0x3c3, 1) = 0x0
> > >   I patched up accordingly with
> > >
> > >
> > > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > > index da0e5f9..a361d06 100644
> > > --- a/hw/misc/vfio.c
> > > +++ b/hw/misc/vfio.c
> > > @@ -1291,7 +1291,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
> > >      uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
> > >                                    addr + quirk->data.base_offset, size);
> > >  
> > > -    if (data == quirk->data.address_match) {
> > > +    if (1 || data == quirk->data.address_match) {
> > >          data = vfio_pci_read_config(&vdev->pdev, quirk->data.address_val, size);
> > >          DPRINTF("%s(0x3c3, 1) = 0x%"PRIx64"\n", __func__, data);
> > >      }
> > >
> > >
> > >   This of course did not help much until I actually got the iommu
> > >   enabled for the radeons (similar "repeated patterns" as deniv reported)
> > >   but what I have observed after I got it working is that if
> > >   I disable the patch above, things are not that well: the Fedora VM
> > >   comes up with VGA and the Fedora boot screen, then goes blank when
> > >   switching to X.
> >
> > Hmm, I think we'd probably have better luck making that unconditional
> > until we have reason to do otherwise.
> >
> > > - The fact that the iommu group now extends across all my available graphics
> > >   devices makes it difficult to get the radeon (or catalyst) driver to use
> > >   the other card since the vfio_pci driver needs to hold it.
> > >   Not a complete showstopper since the vesa driver comes up with 1024x768..
> > >   Might it be a good idea to have an override option (exception list or similar?)
> > >   to allow the vfio_pci to be less restrictive about owning the whole group
> > >    - allow functionality over security in such case? This of course is further complicated
> > >   by the need for graphics drivers to be disabled/enabled already at the kernel prompt..
> >
> > We have a quirk in the kernel that enables us to whitelist devices, but
> > yes, there is no flexibility in this w/o modifying the code and
> > rebuilding.  (see drivers/pci/quirks.c:pci_dev_acs_enabled and follow
> > the example above w/ pci_dev_dma_source - function can just return 1)
> >
> > > - There seems to be a bug in the (version F8) UEFI BIOS on the motherboard,
> > >   The BIOS offers (undocumented) a full range of selections of which PCIe
> > >   (or PCIe 1x) graphics card to use as primary, but any other selection
> > >   than the first PCIe 16x slot has no effect and the motherboard reverts
> > >   to the first slot, so to be able to test both cards, I had to put the card under test
> > >   into the second (8x) PCIe slot. I am waiting for feedback from Gigabyte on possible
> > >   fixes for this in newer BIOSes.
> > >
> > > - The ultimate goal is to try to consolidate some older Windows desktops as "seats"
> > >   on the new system, using the discrete graphics with HDMI/Displayport audio.
> > >   With the HD7700 moved to the second PCIe slot I tested both Windows and
> > >   Linux guests to try to get some sound through the HDMI audio device.
> > >   Windows complains that no usable device is available. On Linux (Fedora 18, KDE desktop),
> > >   the system settings -> multimedia dialogue never opens up which seems to indicate that
> > >   PulseAudio has problems communicating with the passed through device (?),
> > >   any hints/pointers here appreciated. From the vfio log it seems at least
> > >   config space is accessed ok.
> > >
> > > - There also seem to be issues with radeon and intel_iommu=on - if I try
> > >   to enable modesetting and normal X support for the radeon cards, X fails to start.
> > >
> > > - It would be nice if the integrated graphics could be used as the host primary display -
> > >   I would be happy if someone has any hints as to if/how the igfx_off option
> > >   could be extended/modified to only affect iommu operation on selected device(s),
> > >   if at all possible..
> >
> > Let's see what we can discover from your DMAR.  Also send along sudo
> > lspci -vvv.  Thanks,
> >
> > Alex
> >
>
>

--Maik


Re: VFIO VGA test branches

Alex Williamson-3
On Sun, 2013-05-19 at 23:26 +0400, Maik Broemme wrote:

> Hi Knut,
>
> Knut Omang <[hidden email]> wrote:
> >
> > On Mon, 2013-05-13 at 16:23 -0600, Alex Williamson wrote:
> > > On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:
> > > > Hi all,
> > > >
> > > > Perfect timing from my perspective, thanks Alex!
> > > >
> > > > I spent the better part of the weekend testing your branches on a new system
> > > > I just put together for this purpose, results below..
> > > >
> > > > On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> > > > ...
> > > > > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > > > > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> > > >
> > > > System setup:
> > > >
> > > > - Fedora 18 on
> > > > - Gigabyte Z77X-UD5H motherboard
> > > > - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> > > > - 2 discrete graphics cards:
> > > >
> > > > lspci | egrep 'VGA|Audio'
> > > > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> > > > 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> > > > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> > > > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> > > > 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> > > > 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > > >
> > > > Short summary:
> > > >
> > > > - Once I got past a few time consuming obstacles explained below
> > > >    - the graphics part of the graphics/hdmi audio passthrough seems to work perfect
> > > >      on both discrete graphics cards
> > > >      (though so far only one at a time and with some minor issues, see below)
> > > >    - no success with the hdmi audio yet (ideas for further investigation appreciated!)
> > >
> > > I've had hdmi audio working with an HD7850, but only in Windows (7) and
> > > it was using legacy interrupts for some reason instead of MSI.  I wonder
> > > if Linux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
> > > what's wrong with MSI, but it seems to be new with the PCI bus reset
> > > support.
> >
> > In my first tries, Windows was just using a generic
> > VGA driver, which still seems to work perfectly with reboots and everything
> > and in full screen resolution (1920x1200).
> > However, after installing the Catalyst AMD driver stack, upon boot
> > Windows 7 now consistently gets a BSOD from the graphics driver
> > with the message:
> >
> > "Attempt to reset the display driver and recover from timeout failed"
> > - a picture of the BSOD screen attached.
> >
> > I attach the corresponding vfio log where I added some timing code to
> > make it easier to see when the BSOD happens (with 2 seconds of silence
> > in the log before the VM reboots); I believe this is at 09:28:32-34 in
> > the log.
> >
> > Similar behaviour both just after reboot/power cycle of the host and
> > subsequent VM boot attempts.
> >
> > This is still with the HD7700 as passed through device, but after a
> > motherboard firmware upgrade (to F14) which did not seem to affect the
> > observed behaviour on Windows prior to Catalyst install or with Linux
> > guest, neither did it fix the bug in selecting primary devices as I
> > was hoping for.
> >
> > Let me know if you have ideas for further debugging this,
> >
>
> I had a similar problem a couple of days ago and posted it to this list.
> I got a similar BSOD and have already tested the following configurations:
>
> 1) machine: q35 / kvm: yes / vga: none   / x-vga: on  = qemu freeze
> 2) machine: q35 / kvm: no  / vga: none   / x-vga: on  = qemu freeze
>    (with errors below)
> 3) machine: q35 / kvm: yes / vga: none   / x-vga: off = qemu runs with
>    100% CPU due to no VGA init (no picture)
> 4) machine: q35 / kvm: yes / vga: cirrus / x-vga: off = qemu runs with
>    BSOD on loading atikmpag.sys
> 5) machine: pc  / kvm: yes / vga: cirrus / x-vga: off = qemu runs fine
>
> However, I've already re-run the BSOD case with the following branches
> from Alex:
>
> git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
>
> Also with the latest seabios, and it has worked so far. No more BSOD, and
> rebooting the VM was also possible without suspending / resuming the host
> in between.

Good to hear.  It looks like you have the same motherboard as my AMD
test system.  An HD7850 in that system runs quite reliably with the
branches above although I do occasionally get VGA palette corruption.

Do you still require -vga cirrus, or do the -vga none, x-vga=on cases
work now too?  Thanks,

Alex



Re: VFIO VGA test branches

Alex Williamson-3
In reply to this post by Justin Gottula
On Fri, 2013-05-17 at 01:09 -0700, Justin Gottula wrote:

> Hi Alex,
>
> VGA passthrough is working great here, with the exception of device reset.
>
> In short, everything works the first time the guest runs. But the second
> time I start the guest, before anything comes on the screen, the host
> grinds to a halt and freezes (gradually, until after a few moments, magic
> sysrq doesn't even work). No 'reduced performance' here, just a completely
> frozen system.
>
> Suspending and waking the host in between guest runs, while inconvenient,
> completely avoids the problem. No more freezes and full graphics
> performance in the guest. So my guess is that the PCI reset from software
> just isn't happening for some reason.
>
> The passthru devices are being assigned to pci-stub and vfio-pci as they
> should be. Secondary passthrough (-vga cirrus or -vga std) doesn't change
> much: things still work, and reset still doesn't. One small difference is
> that the freeze on second boot is delayed until Windows initializes the
> secondary graphics adapter, since the device isn't touched by the BIOS
> prior to that.
>
> Overriding the video BIOS doesn't seem to change anything. Passing through
> just the VGA device (excluding the HDMI audio device) doesn't seem to make
> much of a difference either.
>
> - hardware
> ASUS M5A99X EVO (AMD 990X/SB950; the IVRS is broken and overridden)
> AMD Radeon HD 5750 (for the host)
> AMD Radeon HD 7870 (for passthru)
>
> - software: host
> linux (Joerg Roedel's iommu tree) with linux-vfio merged in
> qemu (latest git with vfio)

Are you dependent on Joerg's tree for the IVRS fixup?  It would be
preferable to start with just my vfio-vga-reset branches before adding
more variables.  Also, be sure you're using the correct branch to get
the PCI bus reset code.  You can verify with something like:

grep VFIO_DEVICE_PCI_BUS_RESET qemu.git/hw/misc/vfio.c
grep VFIO_DEVICE_PCI_BUS_RESET linux.git/drivers/vfio/pci/vfio_pci.c

I have seen timer messages from the host and they can cause the host to
become very unresponsive.  I'm not sure I've seen a full freeze though.
In one case I've also seen this with the PCI bus reset code where the
bus didn't return to a working state (the reason I haven't reposted the
PCI changes upstream yet).  When I saw this it was in a system where
resetting another slot works quite well, so when I get a chance to look
at it again, I'll probably start with seeing if the problem is unique to
the slot.  If you can manage to get the system to run lspci once it's in
this state, you can tell that the bus is still in reset if those devices
report rev FF and an unknown header type 7F.
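
A minimal sketch of that check, assuming lspci prints "(rev ff)" on the line of a function that is still held in reset. The helper name devices_in_reset is made up for illustration and is not part of pciutils. It reads lspci output on stdin, so it can also be run against a saved log:

```shell
#!/bin/sh
# devices_in_reset: hypothetical helper, not part of pciutils.
# Reads `lspci` output on stdin and prints the address (first field)
# of every device whose line reports revision ff (case-insensitive).
devices_in_reset() {
    awk 'tolower($0) ~ /\(rev ff\)/ { print $1 }'
}

# On a live host, for the passthrough card on bus 02 in this thread:
#   lspci -s 02: | devices_in_reset
```

An empty result only means no function on that bus reported rev ff; it does not prove the reset itself completed cleanly.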

> seabios (latest git)
>
> - software: guest
> windows 8
> amd catalyst 13.5b2
> virtio drivers
>
> - lspci (abbreviated)
> 00:00.0 Host bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI
> bridge (external gfx0 port B) (rev 02)
> 00:00.2 IOMMU: Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory
> Management Unit (IOMMU)
> 00:02.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI
> bridge (PCI express gpp port B)
> 00:03.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI
> bridge (PCI express gpp port C)
> 00:14.0 SMBus: Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller
> (rev 42)
> 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI
> Juniper [Radeon HD 5700 Series]
> 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Juniper HDMI
> Audio [Radeon HD 5700 Series]
> 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI
> Pitcairn [Radeon HD 7800]
> 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape
> Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
>
> - lspci -n (abbreviated)
> 01:00.0 0300: 1002:68be
> 01:00.1 0403: 1002:aa58
> 02:00.0 0300: 1002:6818
> 02:00.1 0403: 1002:aab0
>
> - lspci -t (abbreviated)
> -[0000:00]-+-00.0
>            +-00.2
>            +-02.0-[01]--+-00.0
>            |            \-00.1
>            +-03.0-[02]--+-00.0
>                         \-00.1
>
> - kernel cmdline
> /vmlinuz-linux-iommu initrd=/initramfs-linux-iommu.img
> root=/dev/CorsairVG/ArchLinux rootflags=subvol=root
> systemd.unit=graphical.target debug nomodeset vga=804 acpi.debug_level=0x2
> acpi.debug_layer=0xFFFFFFFF amd_iommu_dump
> vfio_iommu_type1.allow_unsafe_interrupts=1 ivrs_ioapic[9]=00:14.0
> ivrs_ioapic[10]=00:00.1
>
> - qemu options
> qemu-system-x86_64 -enable-kvm -name Windows8 \
>  -M q35 -nodefconfig -readconfig /pool/KVM/Windows8/q35-chipset.cfg \
>  -m 4096 -balloon none \
>  -rtc base=localtime \
>  -cpu host -smp 8,sockets=1,cores=4,threads=2 \
>  -bios /usr/share/qemu/bios.bin \
>  -vga none \
>  -drive if=virtio,format=raw,discard=on,cache=none,file=/dev/CorsairVG/Windows \
>  -drive if=virtio,format=raw,file=/pool/KVM/Windows8/WinData.ntfs.img \
>  -drive id=cdrom,media=cdrom,format=raw,file=/dev/null \
>  -device ide-cd,bus=ide.0,drive=cdrom \
>  -boot order=dc,menu=on \
>  -net nic,model=virtio,macaddr=00:55:aa:00:00:01 -net bridge,br=vm_br \
>  -soundhw hda \
>  -usbdevice tablet \
>  -device vfio-pci,host=02:00.0,bus=ich9-pcie-port-1,addr=0.0,multifunction=on,x-vga=on \
>  -device vfio-pci,host=02:00.1,bus=ich9-pcie-port-1,addr=0.1
>
> - dmesg | egrep -i '(iommu|ioapic|pci-stub|vfio)' | grep -vi 'command line'
> [    0.000000] ACPI: IOAPIC (id[0x09] address[0xfec00000] gsi_base[0])
> [    0.000000] IOAPIC[0]: apic_id 9, version 33, address 0xfec00000, GSI
> 0-23
> [    0.000000] ACPI: IOAPIC (id[0x0a] address[0xfec20000] gsi_base[24])
> [    0.000000] IOAPIC[1]: apic_id 10, version 33, address 0xfec20000, GSI
> 24-55
> [    0.223213] AMD-Vi:   DEV_SPECIAL(IOAPIC[0])         devid: 00:14.0
> [    0.223219] AMD-Vi:   DEV_SPECIAL(IOAPIC[255])               devid:
> 00:00.1
> [    0.600863] ACPI: Using IOAPIC for interrupt routing
> [    2.033551] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
> [    2.191528] VFIO - User Level meta-driver version: 0.3
> [    3.601002] pci-stub: add 1002:6818 sub=FFFFFFFF:FFFFFFFF
> cls=00000000/00000000
> [    3.607895] pci-stub 0000:02:00.0: claimed by stub
> [    3.614688] pci-stub: add 1002:AAB0 sub=FFFFFFFF:FFFFFFFF
> cls=00000000/00000000
> [    3.621640] pci-stub 0000:02:00.1: claimed by stub

2:00.0/1 is added to pci-stub here, but used by vfio-pci below.  Is
pci-stub just temporary to keep radeon from binding to it?
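
One way to see which driver currently holds each function is to read the driver symlink in sysfs. A small sketch; bound_driver is a hypothetical helper name, and the optional second argument (an alternate sysfs root) is only there so the function can be tried against a mock tree:

```shell
#!/bin/sh
# bound_driver: hypothetical helper.
# $1 = full PCI address (e.g. 0000:02:00.0)
# $2 = sysfs root, defaults to /sys (assumption: only useful for mock trees)
# Prints the name of the bound driver, or "(none)" if nothing is bound.
bound_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "(none)"
    fi
}

# Example for the two functions discussed here:
#   bound_driver 0000:02:00.0
#   bound_driver 0000:02:00.1
```

Running it before and after starting qemu would show whether the functions move from pci-stub to vfio-pci.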

> [ 5137.969990] vfio-pci 0000:02:00.0: enabling device (0000 -> 0003)
> [ 5137.995842] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x19@0x270
> [ 5137.995849] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x1b@0x2d0
> [ 5166.727114] vfio-pci 0000:02:00.0: irq 93 for MSI/MSI-X
> (this is from before the second boot attempt)
>
> - last two lines from dmesg before the freeze (netcat'd to another box)
> Clocksource tsc unstable (delta = -416526709 ns)
> Switching to clocksource hpet
>
> - output with DEBUG_VFIO: lots and lots, see attachment
>
> If you need any more information, I'll be glad to provide it.

Thanks,

Alex




Re: VFIO VGA test branches

Alex Williamson-3
In reply to this post by Knut Omang-2
On Sun, 2013-05-19 at 17:35 +0200, Knut Omang wrote:

> On Mon, 2013-05-13 at 16:23 -0600, Alex Williamson wrote:
> > On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:
> > > Hi all,
> > >
> > > Perfect timing from my perspective, thanks Alex!
> > >
> > > I spent the better part of the weekend testing your branches on a new system
> > > I just put together for this purpose, results below..
> > >
> > > On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> > > ...
> > > > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > > > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> > >
> > > System setup:
> > >
> > > - Fedora 18 on
> > > - Gigabyte Z77X-UD5H motherboard
> > > - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> > > - 2 discrete graphics cards:
> > >
> > > lspci | egrep 'VGA|Audio'
> > > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> > > 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> > > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> > > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> > > 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> > > 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > >
> > > Short summary:
> > >
> > > - Once I got past a few time consuming obstacles explained below
> > >    - the graphics part of the graphics/hdmi audio passthrough seems to work perfect
> > >      on both discrete graphics cards
> > >      (though so far only one at a time and with some minor issues, see below)
> > >    - no success with the hdmi audio yet (ideas for further investigation appreciated!)
> >
> > I've had hdmi audio working with an HD7850, but only in Windows (7) and
> > it was using legacy interrupts for some reason instead of MSI.  I wonder
> > if Linux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
> > what's wrong with MSI, but it seems to be new with the PCI bus reset
> > support.
>
> In my first tries, Windows was just using a generic
> VGA driver, which still seems to work perfectly with reboots and everything
> and in full screen resolution (1920x1200).
> However, after installing the Catalyst AMD driver stack, upon boot
> Windows 7 now consistently gets a BSOD from the graphics driver
> with the message:
>
> "Attempt to reset the display driver and recover from timeout failed"
> - a picture of the BSOD screen attached.

I've seen that BSOD before, but I don't know how to reproduce it.  It
seems like I haven't seen it with the PCI bus reset code.  I'm running
version 13.1 of the catalyst driver, you?

> I attach the corresponding vfio log where I added some timing code to
> make it easier to see when the BSOD happens (with 2 seconds of silence
> in the log before the VM reboots); I believe this is at 09:28:32-34 in
> the log.

Yep, looks like that's where windows starts the BSOD.

> Similar behaviour both just after reboot/power cycle of the host and
> subsequent VM boot attempts.
>
> This is still with the HD7700 as passed through device, but after a
> motherboard firmware upgrade (to F14) which did not seem to affect the
> observed behaviour on Windows prior to Catalyst install or with Linux
> guest, neither did it fix the bug in selecting primary devices as I
> was hoping for.
>
> Let me know if you have ideas for further debugging this,

I don't have any great ideas since I don't know how to reproduce the
timeout.  Double/triple check that you're using the correct
vfio-vga-reset branches in both qemu and kernel:

# grep VFIO_DEVICE_PCI_BUS_RESET qemu.git/hw/misc/vfio.c
# grep VFIO_DEVICE_PCI_BUS_RESET linux.git/drivers/vfio/pci/vfio_pci.c
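
If it helps, the two greps can be wrapped so that a missing symbol is reported explicitly. This is only a sketch; check_bus_reset is a made-up name, and the two arguments are whatever directories the qemu and kernel trees were cloned into:

```shell
#!/bin/sh
# check_bus_reset: hypothetical wrapper around the two greps above.
# $1 = qemu source tree, $2 = kernel source tree.
# Returns non-zero if either file lacks VFIO_DEVICE_PCI_BUS_RESET
# (or cannot be read at all).
check_bus_reset() {
    rc=0
    for f in "$1/hw/misc/vfio.c" "$2/drivers/vfio/pci/vfio_pci.c"; do
        if grep -q VFIO_DEVICE_PCI_BUS_RESET "$f" 2>/dev/null; then
            echo "ok:      $f"
        else
            echo "MISSING: $f"
            rc=1
        fi
    done
    return $rc
}

# Example:
#   check_bus_reset qemu.git linux.git
```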

I didn't see anything telling in your DMAR either.  The system seems to
have just one DRHD that includes everything, so I'm not sure why you saw
any behavior change from igfx_off.  Thanks,

Alex

> > > - Contrary to [hidden email] I had no success with using pci-assign for VGA
> > >   with a standard fedora 18 kernel and fairly recent qemu, nor with your branches,
> > >
> > > Details:
> > >
> > > - I started off with the required kernel parameter 'intel_iommu=on' + necessary parameters for disabling radeon
> > >    (radeon.modeset=0 rd.driver.blacklist=radeon) using the integrated graphics as primary display
> > >    - this caused the system to freeze (with color artifacts on the console)
> > >
> > > - In my naivety and because of the "i" in igfx I tried both with
> > >   'intel_iommu=igfx_off' and then 'intel_iommu=on,igfx_off'
> > >   and a full set of combinations of vfio, cards, kernels and pci-assign before I suspected
> > >   that iommu support was turned off for **all** graphics cards with igfx_off
> >
> > I'm not sure why this is, looks like the code only tries to turn it off
> > when only graphics is under the remapping device.  We'd probably need to
> > see the DMAR to know more (/sys/firmware/acpi/tables/DMAR).
> >
> > > - The solution was to have integrated graphics turned off in the BIOS, and 'intel_iommu=on':
> > >
> > > - iommu groups:
> > >
> > > ls -l /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices
> > > total 0
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.1 -> ../../../../devices/pci0000:00/0000:00:01.1
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.1 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.0
> > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.1 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.1
> > >
> > > - i.e. both VGA/HDMI audio pairs plus the two root ports they are plugged into are in the same group:
> >
> > Ick.  Intel has been pretty good about advertising ACS support on their
> > root ports.  I wonder if this is an oversight or if they are actually
> > not isolated from each other.
> >
> > > # lspci -n
> > > ...
> > > 01:00.0 0300: 1002:683f
> > > 01:00.1 0403: 1002:aab0
> > > 02:00.0 0300: 1002:6779
> > > 02:00.1 0403: 1002:aa98
> > > ...
> > >
> > > modprobe vfio_pci
> > > echo 0000:01:00.1 > /sys/bus/pci/devices/0000\:01\:00.1/driver/unbind
> > > echo 0000:02:00.1 > /sys/bus/pci/devices/0000\:02\:00.1/driver/unbind
> > > echo 1002 683f > /sys/bus/pci/drivers/vfio-pci/new_id
> > > echo 1002 aab0 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > echo 1002 6779 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > echo 1002 aa98 > /sys/bus/pci/drivers/vfio-pci/new_id
> > >
> > > # lsusb
> > > ...
> > > Bus 001 Device 008: ID 046d:c315 Logitech, Inc. Classic New Touch Keyboard
> > > Bus 001 Device 004: ID 046d:c05b Logitech, Inc. M-U0004 810-001317 [B110 Optical USB Mouse]
> > > ...
> > >
> > > - I also applied your suggested patch to the quirk function in VFIO (see below)
> > >
> > > - Here is a (trimmed for readability) command line I successfully used to boot from the Windows 7 install DVD,
> > >   notice the cd and disk device descriptions and the bus parameter - I struggled a while with that
> > >   until I came across a comment by Gerd Hoffmann here: https://bugzilla.redhat.com/show_bug.cgi?id=922670 (Thanks, Gerd!)
> > >
> > >
> > > qemu-kvm -M q35 \
> > >   -nodefconfig -readconfig $SRC/qemu/docs/q35-chipset.cfg \
> > >   -device vfio-pci,host=2:00.0,x-vga=on,multifunction=on,bus=ich9-pcie-port-1,addr=0.0 \
> > >   -device vfio-pci,host=2:00.1,bus=ich9-pcie-port-1,addr=0.1 \
> > >   -L $SRC/seabios/out/ -L $SRC/qemu/pc-bios \
> > >   -vga none -nographic -cpu host -rtc base=localtime -k no -m 8192 -smp 2 \
> > >   -drive file=/dev/sr0,index=2,media=cdrom,id=cd \
> > >   -drive file=ivm03.img,index=0,media=disk,id=ivm03 \
> > >   -device ide-drive,drive=ivm03,bus=ide.0 \
> > >   -device ide-cd,drive=cd,bus=ide.1 \
> > >   -net nic,vlan=0,model=virtio -net tap,vlan=0 \
> > >   -enable-kvm \
> > >   -device usb-host,hostbus=1,hostaddr=8 \
> > >   -device usb-host,hostbus=1,hostaddr=4
> > >
> > > - Both graphics cards seem to have a ROM, but only the HD6450 lent itself to "scraping".
> >
> > Did you try scraping the HD6450 while the HD7700 was the boot VGA and
> > vice versa?  The boot VGA ROM is handled in a special way and what you
> > really get is the shadow copy, which isn't what we want.
> >
> > > Anyway, supplying it to vfio did not seem to make any difference.
> > >
> > > find /sys -name rom
> > > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
> > > /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom
> > > ...
> > >
> > > Some observations and remaining unresolved issues:
> > >
> > > - VFIO patch:
> > >   Initially (while still running with igfx_off) I observed exactly the same behaviour as [hidden email]
> > >   reported a while ago: With vfio_pci debug enabled, vfio_pci ended up spinning with repeated calls to
> > >   vfio_ati_3c3_quirk_read and repeated logs:
> > >     vfio: vfio_vga_read(0x3c3, 1) = 0x0
> > >   I patched up accordingly with
> > >
> > >
> > > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > > index da0e5f9..a361d06 100644
> > > --- a/hw/misc/vfio.c
> > > +++ b/hw/misc/vfio.c
> > > @@ -1291,7 +1291,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
> > >      uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
> > >                                    addr + quirk->data.base_offset, size);
> > >  
> > > -    if (data == quirk->data.address_match) {
> > > +    if (1 || data == quirk->data.address_match) {
> > >          data = vfio_pci_read_config(&vdev->pdev, quirk->data.address_val, size);
> > >          DPRINTF("%s(0x3c3, 1) = 0x%"PRIx64"\n", __func__, data);
> > >      }
> > >
> > >
> > >   This of course did not help much until I actually got the iommu
> > >   enabled for the radeons (similar "repeated patterns" as deniv reported)
> > >   but what I have observed after I got it working is that if
> > >   I disable the patch above, things do not work as well: the Fedora VM
> > >   comes up with VGA and the Fedora boot screen, then goes blank when
> > >   switching to X.
> >
> > Hmm, I think we'd probably have better luck making that unconditional
> > until we have reason to do otherwise.
> >
> > > - The fact that the iommu group now extends across all my available graphics
> > >   devices makes it difficult to get the radeon (or catalyst) driver to use
> > >   the other card since the vfio_pci driver needs to hold it.
> > >   Not a complete showstopper since the vesa driver comes up with 1024x768..
> > >   Might it be a good idea to have an override option (exception list or similar?)
> > >   to allow the vfio_pci to be less restrictive about owning the whole group
> > >    - allow functionality over security in such case? This of course is further complicated
> > >   by the need for graphics drivers to be disabled/enabled already at the kernel prompt..
> >
> > We have a quirk in the kernel that enables us to whitelist devices, but
> > yes, there is no flexibility in this w/o modifying the code and
> > rebuilding.  (see drivers/pci/quirks.c:pci_dev_acs_enabled and follow
> > the example above w/ pci_dev_dma_source - function can just return 1)
> >
> > > - There seems to be a bug in the (version F8) UEFI BIOS on the motherboard,
> > >   The BIOS offers (undocumented) a full range of selections of which PCIe
> > >   (or PCIe 1x) graphics card to use as primary, but any other selection
> > >   than the first PCIe 16x slot has no effect and the motherboard reverts
> > >   to the first slot, so to be able to test both cards, I had to put the card under test
> > >   into the second (8x) PCIe slot. I am waiting for feedback from Gigabyte on possible
> > >   fixes for this in newer BIOSes.
> > >
> > > - The ultimate goal is to try to consolidate some older Windows desktops as "seats"
> > >   on the new system, using the discrete graphics with HDMI/Displayport audio.
> > >   With the HD7700 moved to the second PCIe slot I tested both Windows and
> > >   Linux guests to try to get some sound through the HDMI audio device.
> > >   Windows complains that no usable device is available. On Linux (Fedora 18, KDE desktop),
> > >   the system settings -> multimedia dialogue never opens up which seems to indicate that
> > >   PulseAudio has problems communicating with the passed through device (?),
> > >   any hints/pointers here appreciated. From the vfio log it seems at least
> > >   config space is accessed ok.
> > >
> > > - There also seems to be issues with radeon and intel_iommu=on - if I try
> > >   to enable modesetting and normal X support for the radeon cards, X fails to start.
> > >
> > > - It would be nice if the integrated graphics could be used as the host primary display -
> > >   I would be happy if someone has any hints as to if/how the igfx_off option
> > >   could be extended/modified to only affect iommu operation on selected device(s),
> > >   if at all possible..
> >
> > Let's see what we can discover from your DMAR.  Also send along sudo
> > lspci -vvv.  Thanks,
> >
> > Alex
> >
>
>
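
The IOMMU group listing quoted above can be reproduced for any device with a short helper. This is a minimal sketch, assuming the usual sysfs layout shown in the quoted ls output; the function name iommu_group_members and its optional second argument (an alternative sysfs root, handy for trying it out against a scratch directory) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the PCI addresses that share an IOMMU group
# with the given device, one per line.
#   $1 = PCI address, e.g. 0000:01:00.0
#   $2 = sysfs root (optional, defaults to /sys)
iommu_group_members() {
    dev=$1
    root=${2:-/sys}
    dir="$root/bus/pci/devices/$dev/iommu_group/devices"
    if [ ! -d "$dir" ]; then
        echo "no IOMMU group found for $dev" >&2
        return 1
    fi
    for entry in "$dir"/*; do
        # Skip the unexpanded glob pattern if the directory is empty.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        basename "$entry"
    done
}

# Usage:
#   iommu_group_members 0000:01:00.0
```

Every device printed has to be held by vfio-pci (or be unbound) before the group can be used, which is why a group spanning both cards and their root ports is a problem in the setup described above.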





Re: VFIO VGA test branches

Maik Broemme
In reply to this post by Alex Williamson-3
Hi Alex,

Alex Williamson <[hidden email]> wrote:

> On Sun, 2013-05-19 at 23:26 +0400, Maik Broemme wrote:
> > Hi Knut,
> >
> > Knut Omang <[hidden email]> wrote:
> > >
> > > On Mon, 2013-05-13 at 16:23 -0600, Alex Williamson wrote:
> > > > On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:
> > > > > Hi all,
> > > > >
> > > > > Perfect timing from my perspective, thanks Alex!
> > > > >
> > > > > I spent the better part of the weekend testing your branches on a new system
> > > > > I just put together for this purpose, results below..
> > > > >
> > > > > On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> > > > > ...
> > > > > > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > > > > > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> > > > >
> > > > > System setup:
> > > > >
> > > > > - Fedora 18 on
> > > > > - Gigabyte Z77X-UD5H motherboard
> > > > > - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> > > > > - 2 discrete graphics cards:
> > > > >
> > > > > lspci | egrep 'VGA|Audio'
> > > > > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> > > > > 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> > > > > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> > > > > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> > > > > 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> > > > > 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > > > >
> > > > > Short summary:
> > > > >
> > > > > - Once I got past a few time consuming obstacles explained below
> > > > >    - the graphics part of the graphics/hdmi audio passthrough seems to work perfect
> > > > >      on both discrete graphics cards
> > > > >      (though so far only one at a time and with some minor issues, see below)
> > > > >    - no success with the hdmi audio yet (ideas for further investigation appreciated!)
> > > >
> > > > I've had hdmi audio working with an HD7850, but only in Windows (7) and
> > > > it was using legacy interrupts for some reason instead of MSI.  I wonder
> > > > if Linux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
> > > > what's wrong with MSI, but it seems to be new with the PCI bus reset
> > > > support.
> > >
> > > In my first tries, Windows was just using a generic
> > > VGA driver, which still seems to work perfectly with reboots and everything
> > > and in full screen resolution (1920x1200).
> > > However, after installing the AMD Catalyst driver stack, Windows 7
> > > now consistently gets a BSOD from the graphics driver on boot
> > > with the message:
> > >
> > > "Attempt to reset the display driver and recover from timeout failed"
> > > - a picture of the BSOD screen attached.
> > >
> > > I attach the corresponding vfio log where I added some timing code to
> > > make it easier to see when the BSOD happens (with 2 seconds of silence
> > > in the log before the VM reboots). I believe this is at 09:28:32-34 in
> > > the log.
> > >
> > > Similar behaviour both just after reboot/power cycle of the host and
> > > subsequent VM boot attempts.
> > >
> > > This is still with the HD7700 as passed through device, but after a
> > > motherboard firmware upgrade (to F14) which did not seem to affect the
> > > observed behaviour on Windows prior to Catalyst install or with Linux
> > > guest, neither did it fix the bug in selecting primary devices as I
> > > was hoping for.
> > >
> > > Let me know if you have ideas for further debugging this,
> > >
> >
> > I had a similar problem a couple of days ago and posted it to this list.
> > I got a similar BSOD and have already tested the following configurations:
> >
> > 1) machine: q35 / kvm: yes / vga: none   / x-vga: on  = qemu freeze
> > 2) machine: q35 / kvm: no  / vga: none   / x-vga: on  = qemu freeze
> >    (with errors below)
> > 3) machine: q35 / kvm: yes / vga: none   / x-vga: off = qemu runs with
> >    100% CPU due to no VGA init (no picture)
> > 4) machine: q35 / kvm: yes / vga: cirrus / x-vga: off = qemu runs with
> >    BSOD on loading atikmpag.sys
> > 5) machine: pc  / kvm: yes / vga: cirrus / x-vga: off = qemu runs fine
> >
> > However I've re-run the BSOD case already with the following branches
> > from Alex:
> >
> > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> >
> > Also with the latest seabios, and it has worked so far. No more BSOD, and rebooting
> > the VM was also possible without suspending / resuming the host in between.
>
> Good to hear.  It looks like you have the same motherboard as my AMD
> test system.  An HD7850 in that system runs quite reliably with the
> branches above although I do occasionally get VGA palette corruption.
>

Good to know. I'm using a Radeon HD7870, which works fine now. I
occasionally see the same VGA palette corruption, but only until the
Catalyst driver is loaded: sometimes during VGA init the Windows 7 boot
logo is shown with very strange colors, which go away once the Catalyst
driver is loaded.

> Do you still require -vga cirrus or do the -vga none, x-vga=on cases
> work now too?  Thanks,
>

No longer required; -vga none with x-vga=on works fine on your branches
now. I'm not sure whether something else changed as well, because with the
original Fedora 3.9.2 kernel it still doesn't work.

> Alex
>

--Maik


Re: VFIO VGA test branches

Knut Omang-2
In reply to this post by Maik Broemme
On Sun, 2013-05-19 at 23:26 +0400, Maik Broemme wrote:

> Hi Knut,
>
> Knut Omang <[hidden email]> wrote:
> >
> > On Mon, 2013-05-13 at 16:23 -0600, Alex Williamson wrote:
> > > On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:
> > > > Hi all,
> > > >
> > > > Perfect timing from my perspective, thanks Alex!
> > > >
> > > > I spent the better part of the weekend testing your branches on a new system
> > > > I just put together for this purpose, results below..
> > > >
> > > > On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> > > > ...
> > > > > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > > > > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> > > >
> > > > System setup:
> > > >
> > > > - Fedora 18 on
> > > > - Gigabyte Z77X-UD5H motherboard
> > > > - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> > > > - 2 discrete graphics cards:
> > > >
> > > > lspci | egrep 'VGA|Audio'
> > > > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> > > > 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> > > > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> > > > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> > > > 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> > > > 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > > >
> > > > Short summary:
> > > >
> > > > - Once I got past a few time consuming obstacles explained below
> > > >    - the graphics part of the graphics/hdmi audio passthrough seems to work perfect
> > > >      on both discrete graphics cards
> > > >      (though so far only one at a time and with some minor issues, see below)
> > > >    - no success with the hdmi audio yet (ideas for further investigation appreciated!)
> > >
> > > I've had hdmi audio working with an HD7850, but only in Windows (7) and
> > > it was using legacy interrupts for some reason instead of MSI.  I wonder
> > > if Linux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
> > > what's wrong with MSI, but it seems to be new with the PCI bus reset
> > > support.
> >
> > In my first tries, Windows was just using a generic
> > VGA driver, which still seems to work perfectly with reboots and everything
> > and in full screen resolution (1920x1200).
> > However, after installing the AMD Catalyst driver stack, Windows 7
> > now consistently gets a BSOD from the graphics driver on boot
> > with the message:
> >
> > "Attempt to reset the display driver and recover from timeout failed"
> > - a picture of the BSOD screen attached.
> >
> > I attach the corresponding vfio log where I added some timing code to
> > make it easier to see when the BSOD happens (with 2 seconds of silence
> > in the log before the VM reboots). I believe this is at 09:28:32-34 in
> > the log.
> >
> > Similar behaviour both just after reboot/power cycle of the host and
> > subsequent VM boot attempts.
> >
> > This is still with the HD7700 as passed through device, but after a
> > motherboard firmware upgrade (to F14) which did not seem to affect the
> > observed behaviour on Windows prior to Catalyst install or with Linux
> > guest, neither did it fix the bug in selecting primary devices as I
> > was hoping for.
> >
> > Let me know if you have ideas for further debugging this,
> >
>
> I had a similar problem a couple of days ago and posted it to this list.
> I got a similar BSOD and have already tested the following configurations:
>
> 1) machine: q35 / kvm: yes / vga: none   / x-vga: on  = qemu freeze
> 2) machine: q35 / kvm: no  / vga: none   / x-vga: on  = qemu freeze
>    (with errors below)
> 3) machine: q35 / kvm: yes / vga: none   / x-vga: off = qemu runs with
>    100% CPU due to no VGA init (no picture)
> 4) machine: q35 / kvm: yes / vga: cirrus / x-vga: off = qemu runs with
>    BSOD on loading atikmpag.sys
> 5) machine: pc  / kvm: yes / vga: cirrus / x-vga: off = qemu runs fine
>
> However I've re-run the BSOD case already with the following branches
> from Alex:
>
> git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
>
> Also with the latest seabios, and it has worked so far. No more BSOD, and rebooting
> the VM was also possible without suspending / resuming the host in between.

Hmm, it seems you have been luckier with the choice of chipset/CPU than
I have - all my tests are also with these branches, but on Intel...

Knut





Re: VFIO VGA test branches

Knut Omang-2
In reply to this post by Alex Williamson-3
On Sun, 2013-05-19 at 22:15 -0600, Alex Williamson wrote:

> On Sun, 2013-05-19 at 17:35 +0200, Knut Omang wrote:
> > On Mon, 2013-05-13 at 16:23 -0600, Alex Williamson wrote:
> > > On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:
> > > > Hi all,
> > > >
> > > > Perfect timing from my perspective, thanks Alex!
> > > >
> > > > I spent the better part of the weekend testing your branches on a new system
> > > > I just put together for this purpose, results below..
> > > >
> > > > On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> > > > ...
> > > > > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > > > > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> > > >
> > > > System setup:
> > > >
> > > > - Fedora 18 on
> > > > - Gigabyte Z77X-UD5H motherboard
> > > > - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> > > > - 2 discrete graphics cards:
> > > >
> > > > lspci | egrep 'VGA|Audio'
> > > > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> > > > 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> > > > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> > > > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> > > > 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> > > > 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > > >
> > > > Short summary:
> > > >
> > > > - Once I got past a few time consuming obstacles explained below
> > > >    - the graphics part of the graphics/hdmi audio passthrough seems to work perfect
> > > >      on both discrete graphics cards
> > > >      (though so far only one at a time and with some minor issues, see below)
> > > >    - no success with the hdmi audio yet (ideas for further investigation appreciated!)
> > >
> > > I've had hdmi audio working with an HD7850, but only in Windows (7) and
> > > it was using legacy interrupts for some reason instead of MSI.  I wonder
> > > if Linux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
> > > what's wrong with MSI, but it seems to be new with the PCI bus reset
> > > support.
> >
> > In my first tries, Windows was just using a generic
> > VGA driver, which still seems to work perfectly with reboots and everything
> > and in full screen resolution (1920x1200).
> > However, after installing the AMD Catalyst driver stack, Windows 7
> > now consistently gets a BSOD from the graphics driver on boot
> > with the message:
> >
> > "Attempt to reset the display driver and recover from timeout failed"
> > - a picture of the BSOD screen attached.
>
> I've seen that BSOD before, but I don't know how to reproduce it.  It
> seems like I haven't seen it with the PCI bus reset code.  I'm running
> version 13.1 of the catalyst driver, you?
I first tried with the install CD that came with the card - v.13-045
then upgraded to the latest from AMD, catalyst v.13.4 which appears to
be driver v.12.104 - similar behaviour for both. This was with a plain
Windows 7 install from my SP1 DVD.

With most of the recommended Windows updates and the latest Catalyst driver,
the BSOD is gone, but instead I see the initial VGA boot screen and the
Windows logo, then the monitor syncs but shows nothing, and then the guest
reboots into recovery mode. (If I install all updates, Windows never seems
to be able to recover from the last reboot.)

I have tried without kvm and also with vnc or spice graphics in addition
but in those cases it seems Windows is not able to allocate MMIO
resources for both adapters so I haven't been able to test the catalyst
driver as a secondary windows display.

> > I attach the corresponding vfio log where I added some timing code to
> > make it easier to see when the BSOD happens (with 2 seconds of silence
> > in the log before the VM reboots); I believe this is at 09:28:32-34 in
> > the log.
>
> Yep, looks like that's where windows starts the BSOD.
>
> > Similar behaviour both just after reboot/power cycle of the host and
> > subsequent VM boot attempts.
> >
> > This is still with the HD7700 as passed through device, but after a
> > motherboard firmware upgrade (to F14) which did not seem to affect the
> > observed behaviour on Windows prior to Catalyst install or with Linux
> > guest, neither did it fix the bug in selecting primary devices as I
> > was hoping for.
> >
> > Let me know if you have ideas for further debugging this,
>
> I don't have any great ideas since I don't know how to reproduce the
> timeout.  Double/triple check that you're using the correct
> vfio-vga-reset branches in both qemu and kernel
>
> # grep VFIO_DEVICE_PCI_BUS_RESET qemu.git/hw/misc/vfio.c
> # grep VFIO_DEVICE_PCI_BUS_RESET linux.git/drivers/vfio/pci/vfio_pci.c
[Matches in both..]
I do believe I have used the right branches all along.
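
The two grep checks can be wrapped so that a missing match in either tree is reported explicitly. This is only a sketch: the function names are invented here, and the tree locations are whatever the local checkouts are called.

```shell
#!/bin/sh
# Sketch: confirm that both source trees contain the bus reset code.
# "qemu.git" and "linux.git" in the thread are just checkout names,
# so the tree locations are passed in as arguments.

has_bus_reset() {
    # $1 = top of a source tree, $2 = file to inspect, relative to $1
    grep -q VFIO_DEVICE_PCI_BUS_RESET "$1/$2" 2>/dev/null
}

check_vfio_vga_reset() {
    # $1 = qemu tree, $2 = kernel tree; returns 0 only if both match
    rc=0
    if has_bus_reset "$1" hw/misc/vfio.c; then
        echo "qemu: ok"
    else
        echo "qemu: VFIO_DEVICE_PCI_BUS_RESET not found"
        rc=1
    fi
    if has_bus_reset "$2" drivers/vfio/pci/vfio_pci.c; then
        echo "kernel: ok"
    else
        echo "kernel: VFIO_DEVICE_PCI_BUS_RESET not found"
        rc=1
    fi
    return $rc
}
```

Usage would be something like: check_vfio_vga_reset ./qemu.git ./linux.git. Note that this only checks the sources, not which kernel and qemu binary are actually running.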

> I didn't see anything telling in your DMAR either.  The system seems to
> have just one DRHD that includes everything, so I'm not sure why you saw
> any behavior change from igfx_off.  Thanks,

After the firmware upgrade, I tried again with the integrated graphics
enabled, this time with more success - I am now able to get a GUI fedora
console on the integrated graphics, but see some colorful artifacts
there during the VGA startup on one of the Radeon cards, which goes away
with a toggle to another console and back.

Seems I have slightly misled you with the DMAR table - sorry about that
- the table I posted was with the igfx disabled, with the igfx enabled I
see one more hardware unit dedicated to the igfx if I am able to
interpret it right (attached)
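
For anyone wanting to compare their own table, a rough sketch of how such a dump can be produced. It assumes the sysfs path Alex mentioned earlier and, optionally, the ACPICA iasl disassembler; the function name and output prefix are illustrative only.

```shell
#!/bin/sh
# Sketch: save the ACPI DMAR table and, if the ACPICA "iasl" tool is
# installed, disassemble it into a readable .dsl file.

dump_dmar() {
    # $1 = output prefix (e.g. DMAR_igfx)
    # $2 = table path; defaults to the sysfs location named in the thread
    src=${2:-/sys/firmware/acpi/tables/DMAR}
    cp "$src" "$1.dat" || return 1      # reading sysfs usually needs root
    if command -v iasl >/dev/null 2>&1; then
        # iasl -d disassembles the binary table into a .dsl file
        iasl -d "$1.dat" </dev/null >/dev/null 2>&1 \
            || echo "iasl could not decode $1.dat" >&2
    fi
}
```

Running it once with the integrated graphics enabled and once with it disabled, under different prefixes, gives two files that can be diffed.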

Both the HD7700 and the HD6450 behave very similarly and both still start
and display Windows fine if I disable the Catalyst driver.

Knut

> Alex
>
> > > > - Contrary to [hidden email] I had no success with using pci-assign for VGA
> > > >   with a standard fedora 18 kernel and fairly recent qemu, nor with your branches,
> > > >
> > > > Details:
> > > >
> > > > - I started off with the required kernel parameter 'intel_iommu=on' + necessary parameters for disabling radeon
> > > >    (radeon.modeset=0 rd.driver.blacklist=radeon) using the integrated graphics as primary display
> > > >    - this caused the system to freeze (with color artifacts on the console)
> > > >
> > > > - In my naivety and because of the "i" in igfx I tried both with
> > > >   'intel_iommu=igfx_off' and then 'intel_iommu=on,igfx_off'
> > > >   and a full set of combinations of vfio, cards, kernels and pci-assign before I suspected
> > > >   that iommu support was turned off for **all** graphics cards with igfx_off
> > >
> > > I'm not sure why this is, looks like the code only tries to turn it off
> > > when only graphics is under the remapping device.  We'd probably need to
> > > see the DMAR to know more (/sys/firmware/acpi/tables/DMAR).
> > >
> > > > - The solution was to have integrated graphics turned off in the BIOS, and 'intel_iommu=on':
> > > >
> > > > - iommu groups:
> > > >
> > > > ls -l /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices
> > > > total 0
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.1 -> ../../../../devices/pci0000:00/0000:00:01.1
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.1 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.0
> > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.1 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.1
> > > >
> > > > - eg. both the VGA/HDMI Audio pairs + the two root ports they are plugged into are in the same group:
> > >
> > > Ick.  Intel has been pretty good about advertising ACS support on their
> > > root ports.  I wonder if this is an oversight or if they are actually
> > > not isolated from each other.
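
The group listing above can be reproduced for any device with a small helper. A minimal sketch, assuming the usual sysfs layout; the function name is invented, and the second argument exists only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Sketch: print every device sharing an IOMMU group with a given device.

iommu_group_members() {
    # $1 = PCI address such as 0000:01:00.0
    # $2 = sysfs mount point (default /sys); overridable for testing
    dir=${2:-/sys}/bus/pci/devices/$1/iommu_group/devices
    [ -d "$dir" ] || { echo "no IOMMU group for $1" >&2; return 1; }
    for dev in "$dir"/*; do
        [ -e "$dev" ] || continue
        basename "$dev"
    done
}
```

On the system described here, iommu_group_members 0000:01:00.0 would be expected to list the same six addresses as the ls output above, root ports included.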
> > >
> > > > # lspci -n
> > > > ...
> > > > 01:00.0 0300: 1002:683f
> > > > 01:00.1 0403: 1002:aab0
> > > > 02:00.0 0300: 1002:6779
> > > > 02:00.1 0403: 1002:aa98
> > > > ...
> > > >
> > > > modprobe vfio_pci
> > > > echo 0000:01:00.1 > /sys/bus/pci/devices/0000\:01\:00.1/driver/unbind
> > > > echo 0000:02:00.1 > /sys/bus/pci/devices/0000\:02\:00.1/driver/unbind
> > > > echo 1002 683f > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > echo 1002 aab0 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > echo 1002 6779 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > echo 1002 aa98 > /sys/bus/pci/drivers/vfio-pci/new_id
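
The unbind and new_id steps above can be folded into one helper that looks the IDs up instead of hard-coding them. This is a sketch under the assumption that vfio_pci is already loaded and the script runs as root; the function name is hypothetical and the second argument is there only for testing against a fake tree.

```shell
#!/bin/sh
# Sketch of the manual steps above: detach a device from its current
# driver and register its vendor/device ID with vfio-pci.

vfio_claim() {
    # $1 = PCI address such as 0000:01:00.1
    # $2 = sysfs mount point (default /sys); overridable for testing
    sys=${2:-/sys}
    dev=$sys/bus/pci/devices/$1
    [ -d "$dev" ] || { echo "no such device: $1" >&2; return 1; }

    # sysfs reports IDs as 0x1002; the thread feeds new_id bare hex
    vendor=$(sed 's/^0x//' "$dev/vendor")
    device=$(sed 's/^0x//' "$dev/device")

    # Unbind only if some driver currently owns the device
    if [ -e "$dev/driver/unbind" ]; then
        echo "$1" > "$dev/driver/unbind"
    fi

    echo "$vendor $device" > "$sys/bus/pci/drivers/vfio-pci/new_id"
}
```

Since every device in the IOMMU group has to be held by vfio-pci (or have no driver), the helper would be called once per function in the group, for example vfio_claim 0000:01:00.0 followed by vfio_claim 0000:01:00.1.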
> > > >
> > > > # lsusb
> > > > ...
> > > > Bus 001 Device 008: ID 046d:c315 Logitech, Inc. Classic New Touch Keyboard
> > > > Bus 001 Device 004: ID 046d:c05b Logitech, Inc. M-U0004 810-001317 [B110 Optical USB Mouse]
> > > > ...
> > > >
> > > > - I also applied your suggested patch to the quirk function in VFIO (see below)
> > > >
> > > > - Here is a (trimmed for readability) command line I successfully used to boot from the Windows 7 install DVD,
> > > >   notice the cd and disk device descriptions and the bus parameter - I struggled a while with that
> > > >   until I came across a comment by Gerd Hoffmann here: https://bugzilla.redhat.com/show_bug.cgi?id=922670 (Thanks, Gerd!)
> > > >
> > > >
> > > > qemu-kvm -M q35 \
> > > >   -nodefconfig -readconfig $SRC/qemu/docs/q35-chipset.cfg \
> > > >   -device vfio-pci,host=2:00.0,x-vga=on,multifunction=on,bus=ich9-pcie-port-1,addr=0.0 \
> > > >   -device vfio-pci,host=2:00.1,bus=ich9-pcie-port-1,addr=0.1 \
> > > >   -L $SRC/seabios/out/ -L $SRC/qemu/pc-bios \
> > > >   -vga none -nographic -cpu host -rtc base=localtime -k no -m 8192 -smp 2 \
> > > >   -drive file=/dev/sr0,index=2,media=cdrom,id=cd \
> > > >   -drive file=ivm03.img,index=0,media=disk,id=ivm03 \
> > > >   -device ide-drive,drive=ivm03,bus=ide.0 \
> > > >   -device ide-cd,drive=cd,bus=ide.1 \
> > > >   -net nic,vlan=0,model=virtio -net tap,vlan=0 \
> > > >   -enable-kvm \
> > > >   -device usb-host,hostbus=1,hostaddr=8 \
> > > >   -device usb-host,hostbus=1,hostaddr=4
> > > >
> > > > - Both the graphics cards seem to have a rom but only the HD6450 lends itself to "scraping".
> > >
> > > Did you try scraping the HD6450 while the HD7700 was the boot VGA and
> > > vice versa?  The boot VGA ROM is handled in a special way and what you
> > > really get is the shadow copy, which isn't what we want.
> > >
> > > > Anyway, supplying it to vfio did not seem to make any difference.
> > > >
> > > > find /sys -name rom
> > > > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
> > > > /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom
> > > > ...
> > > >
> > > > Some observations and remaining unresolved issues:
> > > >
> > > > - VFIO patch:
> > > >   Initially (while still running with igfx_off) I observed exactly the same behaviour as [hidden email]
> > > >   reported a while ago: With vfio_pci debug enabled, vfio_pci ended up spinning with repeated calls to
> > > >   vfio_ati_3c3_quirk_read and repeated logs:
> > > >     vfio: vfio_vga_read(0x3c3, 1) = 0x0
> > > >   I patched up accordingly with
> > > >
> > > >
> > > > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > > > index da0e5f9..a361d06 100644
> > > > --- a/hw/misc/vfio.c
> > > > +++ b/hw/misc/vfio.c
> > > > @@ -1291,7 +1291,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
> > > >      uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
> > > >                                    addr + quirk->data.base_offset, size);
> > > >  
> > > > -    if (data == quirk->data.address_match) {
> > > > +    if (1 || data == quirk->data.address_match) {
> > > >          data = vfio_pci_read_config(&vdev->pdev, quirk->data.address_val, size);
> > > >          DPRINTF("%s(0x3c3, 1) = 0x%"PRIx64"\n", __func__, data);
> > > >      }
> > > >
> > > >
> > > >   This of course did not help much until I actually got the iommu
> > > >   enabled for the radeons (similar "repeated patterns" as deniv reported)
> > > >   but what I have observed after I got it working is that if
> > > >   I disable the patch above, things do not go that well: the Fedora VM
> > > >   comes up with VGA and the Fedora boot screen, then goes blank when
> > > >   switching to X.
> > >
> > > Hmm, I think we'd probably have better luck making that unconditional
> > > until we have reason to do otherwise.
> > >
> > > > - The fact that the iommu group now extends across all my available graphics
> > > >   devices makes it difficult to get the radeon (or catalyst) driver to use
> > > >   the other card since the vfio_pci driver needs to hold it.
> > > >   Not a complete showstopper since the vesa driver comes up with 1024x768..
> > > >   Might it be a good idea to have an override option (exception list or similar?)
> > > >   to allow the vfio_pci to be less restrictive about owning the whole group
> > > >    - allow functionality over security in such case? This of course is further complicated
> > > >   by the need for graphics drivers to be disabled/enabled already at the kernel prompt..
> > >
> > > We have a quirk in the kernel that enables us to whitelist devices, but
> > > yes, there is no flexibility in this w/o modifying the code and
> > > rebuilding.  (see drivers/pci/quirks.c:pci_dev_acs_enabled and follow
> > > the example above w/ pci_dev_dma_source - function can just return 1)
> > >
> > > > - There seems to be a bug in the (version F8) UEFI BIOS on the motherboard.
> > > >   The BIOS offers (undocumented) a full range of selections of which PCIe
> > > >   (or PCIe 1x) graphics card to use as primary, but any other selection
> > > >   than the first PCIe 16x slot has no effect and the motherboard reverts
> > > >   to the first slot, so to be able to test both cards, I had to put the card under test
> > > >   into the second (8x) PCIe slot. I am waiting for feedback from Gigabyte on possible
> > > >   fixes for this in newer BIOSes.
> > > >
> > > > - The ultimate goal is to try to consolidate some older Windows desktops as "seats"
> > > >   on the new system, using the discrete graphics with HDMI/Displayport audio.
> > > >   With the HD7700 moved to the second PCIe slot I tested both Windows and
> > > >   Linux guests to try to get some sound through the HDMI audio device.
> > > >   Windows complains that no usable device is available. On Linux (Fedora 18, KDE desktop),
> > > >   the system settings -> multimedia dialogue never opens up which seems to indicate that
> > > >   PulseAudio has problems communicating with the passed through device (?),
> > > >   any hints/pointers here appreciated. From the vfio log it seems at least
> > > >   config space is accessed ok.
> > > >
> > > > - There also seem to be issues with radeon and intel_iommu=on - if I try
> > > >   to enable modesetting and normal X support for the radeon cards, X fails to start.
> > > >
> > > > - It would be nice if the integrated graphics could be used as the host primary display -
> > > >   I would be happy if someone has any hints as to if/how the igfx_off option
> > > >   could be extended/modified to only affect iommu operation on selected device(s),
> > > >   if at all possible..
> > >
> > > Let's see what we can discover from your DMAR.  Also send along sudo
> > > lspci -vvv.  Thanks,
> > >
> > > Alex
> > >
> >
> >
>
>
>


Attachment: DMAR_igfx.dsl (5K)

Re: VFIO VGA test branches

Justin Gottula
In reply to this post by Alex Williamson-3
On Sun, May 19, 2013 at 8:44 PM, Alex Williamson <[hidden email]> wrote:
> Also, be sure you're using the correct branch to get
> the PCI bus reset code.  You can verify with something like:
>
> grep VFIO_DEVICE_PCI_BUS_RESET qemu.git/hw/misc/vfio.c
> grep VFIO_DEVICE_PCI_BUS_RESET linux.git/drivers/vfio/pci/vfio_pci.c

Oops, that's embarrassing. I'm pretty sure I pulled the wrong qemu branch.

With that goof eliminated, everything is working flawlessly: the host doesn't freeze up anymore, and I no longer need to suspend between VM runs. Very nice!

Thanks for the patches and the help,

Justin

Re: VFIO VGA test branches

Maik Broemme
In reply to this post by Maik Broemme
Hi Alex,

Maik Broemme <[hidden email]> wrote:

> Hi Alex,
>
> Alex Williamson <[hidden email]> wrote:
> >
> > Good to hear.  It looks like you have the same motherboard as my AMD
> > test system.  An HD7850 in that system runs quite reliably with the
> > branches above although I do occasionally get VGA palette corruption.
> >
>
> Good to know. I'm using a Radeon HD7870 which works fine now. I have the
> same VGA palette corruption occasionally but only until the Catalyst driver
> is loaded. It sometimes happens during VGA init, when the Windows 7 boot
> logo is shown with very strange colors, and goes away once the Catalyst
> driver is loaded.
>
> > Do you still require -vga cirrus or do the -vga none, x-vga=on cases
> > work now too?  Thanks,
> >
>
> No longer required, -vga none with x-vga=on work on your branches fine
> now. Not sure if there was something more changed because with original
> Fedora 3.9.2 kernel it still doesn't work.
>

Alex, I have a strange issue now with either the 'vfio-vga-reset'
branches or with the stable 3.9.4 kernel. This is my 'lspci' output:

00:14.2 Audio device: Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA) (rev 40)
01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 520] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Pitcairn [Radeon HD 7800]
02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]

The '01:00.0' is my primary device used for Linux and '02:00.0' my
secondary for QEMU. Two new different problems:

1) If the 'nvidia.ko' binary driver is loaded for the first card, QEMU
immediately gets stuck after startup and hangs with:

1140  futex(0x7f0ad9b21300, FUTEX_WAIT_PRIVATE, 2, NULL

I have the complete strace output if needed. After that I can only
terminate qemu with 'kill -9' and if I start it again the following
Oops occurs:

[  655.684121] ------------[ cut here ]------------
[  655.684134] WARNING: at lib/list_debug.c:29 __list_add+0x77/0xd0()
[  655.684151] Hardware name: GA-990FXA-UD3
[  655.684271] list_add corruption. next->prev should be prev (ffffffff81ca3d98), but was           (null). (next=ffff88041bc3fe08).
[  655.684477] Modules linked in: vhost_net macvtap macvlan tun arc4 md4 nls_utf8 cifs dns_resolver fscache vfio_pci vfio_iommu_type1 vfio bridge stp llc ip6table_filter ip6_tables it87 hwmon_vid snd_hda_codec_hdmi nvidia(POF) acpi_cpufreq mperf kvm_amd snd_hda_codec_realtek kvm crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel snd_hda_codec microcode edac_core snd_hwdep fam15h_power snd_seq edac_mce_amd snd_seq_device k10temp r8169 sp5100_tco snd_pcm mii i2c_piix4 snd_page_alloc snd_timer i2c_core snd soundcore mxm_wmi firewire_ohci firewire_core crc_itu_t wmi
[  655.685451] Pid: 2097, comm: qemu-system-x86 Tainted: PF          O 3.9.4-200.fc18.x86_64 #1
[  655.685642] Call Trace:
[  655.685738]  [<ffffffff8105f125>] warn_slowpath_common+0x75/0xa0
[  655.685851]  [<ffffffff8105f206>] warn_slowpath_fmt+0x46/0x50
[  655.685955]  [<ffffffff81316ef7>] __list_add+0x77/0xd0
[  655.686058]  [<ffffffff8108392c>] add_wait_queue+0x3c/0x60
[  655.686162]  [<ffffffff813f241d>] vga_get+0xdd/0x190
[  655.686266]  [<ffffffff81093e40>] ? try_to_wake_up+0x2d0/0x2d0
[  655.686373]  [<ffffffffa01ac625>] vfio_pci_vga_rw+0xb5/0x230 [vfio_pci]
[  655.686481]  [<ffffffffa01aa279>] vfio_pci_rw+0x39/0x80 [vfio_pci]
[  655.686587]  [<ffffffffa01aa30c>] vfio_pci_read+0x1c/0x20 [vfio_pci]
[  655.686701]  [<ffffffffa01a40e3>] vfio_device_fops_read+0x23/0x30 [vfio]
[  655.686814]  [<ffffffff811a01b9>] vfs_read+0xa9/0x180
[  655.686915]  [<ffffffff811a05ba>] sys_pread64+0x9a/0xb0
[  655.687018]  [<ffffffff81669f59>] system_call_fastpath+0x16/0x1b
[  655.687123] ---[ end trace a68eabc3660237b1 ]---

This is always reproducible. I know it is the binary driver and maybe
nobody cares but it is widely used. :)

2) If the 'nouveau.ko' driver is loaded it is even more strange. As soon
as I start qemu all my SATA links get a hard reset and kernel freezes.
No SysRQs are working anymore and only reboot helps. If needed I can
look if I can get some dumps from this freeze because it writes nothing
more to the disks.

But it is getting even more strange. I was putting the secondary card
in another PCI slot and then it started to work with nouveau module
loaded and passthrough ATI card to QEMU. But this worked only until I
started X server with nouveau X driver. As soon as X is running and I
started QEMU it hung again in FUTEX_WAIT_PRIVATE.

3) Without loading 'nvidia.ko' or 'nouveau.ko' modules it works out of
the box with several start/stop cycles. However I have no X in this
case. ;)

Any ideas? :)

> > Alex
> >
>
> --Maik
>

--Maik


Re: VFIO VGA test branches

Alex Williamson-3
On Tue, 2013-05-28 at 03:40 +0200, Maik Broemme wrote:

> Hi Alex,
>
> Maik Broemme <[hidden email]> wrote:
> > Hi Alex,
> >
> > Alex Williamson <[hidden email]> wrote:
> > >
> > > Good to hear.  It looks like you have the same motherboard as my AMD
> > > test system.  An HD7850 in that system runs quite reliably with the
> > > branches above although I do occasionally get VGA palette corruption.
> > >
> >
> > Good to know. I'm using a Radeon HD7870 which works fine now. I have the
> > same VGA palette corruption occasionally but only until the Catalyst driver
> > is loaded. It sometimes happens during VGA init, when the Windows 7 boot
> > logo is shown with very strange colors, and goes away once the Catalyst
> > driver is loaded.
> >
> > > Do you still require -vga cirrus or do the -vga none, x-vga=on cases
> > > work now too?  Thanks,
> > >
> >
> > No longer required, -vga none with x-vga=on work on your branches fine
> > now. Not sure if there was something more changed because with original
> > Fedora 3.9.2 kernel it still doesn't work.
> >
>
> Alex, I have a strange issue now with either the 'vfio-vga-reset'
> branches or with the stable 3.9.4 kernel. This is my 'lspci' output:
>
> 00:14.2 Audio device: Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA) (rev 40)
> 01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 520] (rev a1)
> 01:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
> 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Pitcairn [Radeon HD 7800]
> 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
>
> The '01:00.0' is my primary device used for Linux and '02:00.0' my
> secondary for QEMU. Two new different problems:
>
> 1) If the 'nvidia.ko' binary driver is loaded for the first card, QEMU
> immediately gets stuck after startup and hangs with:
>
> 1140  futex(0x7f0ad9b21300, FUTEX_WAIT_PRIVATE, 2, NULL
>
> I have the complete strace output if needed. After that I can only
> terminate qemu with 'kill -9' and if I start it again the following
> Oops occurs:
>
> [  655.684121] ------------[ cut here ]------------
> [  655.684134] WARNING: at lib/list_debug.c:29 __list_add+0x77/0xd0()
> [  655.684151] Hardware name: GA-990FXA-UD3
> [  655.684271] list_add corruption. next->prev should be prev (ffffffff81ca3d98), but was           (null). (next=ffff88041bc3fe08).
> [  655.684477] Modules linked in: vhost_net macvtap macvlan tun arc4 md4 nls_utf8 cifs dns_resolver fscache vfio_pci vfio_iommu_type1 vfio bridge stp llc ip6table_filter ip6_tables it87 hwmon_vid snd_hda_codec_hdmi nvidia(POF) acpi_cpufreq mperf kvm_amd snd_hda_codec_realtek kvm crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel snd_hda_codec microcode edac_core snd_hwdep fam15h_power snd_seq edac_mce_amd snd_seq_device k10temp r8169 sp5100_tco snd_pcm mii i2c_piix4 snd_page_alloc snd_timer i2c_core snd soundcore mxm_wmi firewire_ohci firewire_core crc_itu_t wmi
> [  655.685451] Pid: 2097, comm: qemu-system-x86 Tainted: PF          O 3.9.4-200.fc18.x86_64 #1
> [  655.685642] Call Trace:
> [  655.685738]  [<ffffffff8105f125>] warn_slowpath_common+0x75/0xa0
> [  655.685851]  [<ffffffff8105f206>] warn_slowpath_fmt+0x46/0x50
> [  655.685955]  [<ffffffff81316ef7>] __list_add+0x77/0xd0
> [  655.686058]  [<ffffffff8108392c>] add_wait_queue+0x3c/0x60
> [  655.686162]  [<ffffffff813f241d>] vga_get+0xdd/0x190
> [  655.686266]  [<ffffffff81093e40>] ? try_to_wake_up+0x2d0/0x2d0
> [  655.686373]  [<ffffffffa01ac625>] vfio_pci_vga_rw+0xb5/0x230 [vfio_pci]
> [  655.686481]  [<ffffffffa01aa279>] vfio_pci_rw+0x39/0x80 [vfio_pci]
> [  655.686587]  [<ffffffffa01aa30c>] vfio_pci_read+0x1c/0x20 [vfio_pci]
> [  655.686701]  [<ffffffffa01a40e3>] vfio_device_fops_read+0x23/0x30 [vfio]
> [  655.686814]  [<ffffffff811a01b9>] vfs_read+0xa9/0x180
> [  655.686915]  [<ffffffff811a05ba>] sys_pread64+0x9a/0xb0
> [  655.687018]  [<ffffffff81669f59>] system_call_fastpath+0x16/0x1b
> [  655.687123] ---[ end trace a68eabc3660237b1 ]---
>
> This is always reproducible. I know it is the binary driver and maybe
> nobody cares but it is widely used. :)

Hmm, so perhaps the first attempt called into the vga arbiter to get the
VGA resources and the hang is because it was never able to get them.
VFIO only uses the interruptible vga_get call, so you were able to kill
the process, but maybe the vga arbiter didn't clean up so well.  There's
not much we can do if nvidia.ko never releases the VGA resources.  The
VGA arbiter could do a better job with list cleanup on interruption, but
it doesn't seem like it would help you run w/ nvidia.ko in the host.

> 2) If the 'nouveau.ko' driver is loaded it is even more strange. As soon
> as I start qemu all my SATA links get a hard reset and kernel freezes.
> No SysRQs are working anymore and only reboot helps. If needed I can
> look if I can get some dumps from this freeze because it writes nothing
> more to the disks.

This was after a host reboot, I hope?  If you're using the
vfio-vga-reset kernel branch then a secondary bus reset happens when the
guest is started.  I have seen cases on my GA-990FXA-UD3 where the bus
doesn't come back from reset (possibly due to queued I/O on the bus).
After reset we attempt to restore device config space which hangs with
the PCI config space access lock held.  This generally results in soft
lockups and a mostly unusable system.  Does this sound similar to what
you're seeing?  This is the main problem preventing me from trying to
push the PCI bus reset patches upstream.

> But it is getting even more strange. I was putting the secondary card
> in another PCI slot and then it started to work with nouveau module
> loaded and passthrough ATI card to QEMU. But this worked only until I
> started X server with nouveau X driver. As soon as X is running and I
> started QEMU it hung again in FUTEX_WAIT_PRIVATE.
>
> 3) Without loading 'nvidia.ko' or 'nouveau.ko' modules it works out of
> the box with several start/stop cycles. However I have no X in this
> case. ;)
>
> Any ideas? :)

I'd suspect that the nouveau X driver also isn't releasing the VGA
resources through the VGA arbiter.  That's pretty disappointing.  You
can check from userspace who owns VGA via: sudo head /dev/vga_arbiter.
Without VGA arbiter we have no coordination of legacy VGA resources
between various drivers, but not all drivers support it (vgacon) and
those that do apparently don't attempt to be very fair.  We'll need to
look into fixing Xorg on the host if this is actually the problem.
Thanks,

Alex
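
The /dev/vga_arbiter check mentioned above can be scripted. The sketch below assumes the status line has the form given in the kernel's VGA arbiter documentation, "count:N,PCI:<address>,decodes=<state>,owns=<state>,locks=<state>(io:mem)", and that a plain read reports the default VGA device; both are worth confirming against the running kernel. The function names are made up.

```shell
#!/bin/sh
# Sketch: read the VGA arbiter status line and pull out one field.

vga_arb_status() {
    # $1 = arbiter device (default /dev/vga_arbiter); usually needs root
    head -n 1 "${1:-/dev/vga_arbiter}"
}

vga_arb_field() {
    # $1 = status line, $2 = field name (decodes, owns or locks)
    # Assumed format: count:2,PCI:0000:01:00.0,decodes=io+mem,owns=...
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}
```

For example, vga_arb_field "$(vga_arb_status)" locks shows whether the legacy VGA resources of the reported card are currently locked, which is the state a host driver would need to release before vfio can take them.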



Re: VFIO VGA test branches

Knut Omang-2
In reply to this post by Knut Omang-2
On Mon, 2013-05-20 at 23:11 +0200, Knut Omang wrote:

> On Sun, 2013-05-19 at 22:15 -0600, Alex Williamson wrote:
> > On Sun, 2013-05-19 at 17:35 +0200, Knut Omang wrote:
> > > On Mon, 2013-05-13 at 16:23 -0600, Alex Williamson wrote:
> > > > On Mon, 2013-05-13 at 22:55 +0200, Knut Omang wrote:
> > > > > Hi all,
> > > > >
> > > > > Perfect timing from my perspective, thanks Alex!
> > > > >
> > > > > I spent the better part of the weekend testing your branches on a new system
> > > > > I just put together for this purpose, results below..
> > > > >
> > > > > On Fri, 2013-05-03 at 16:56 -0600, Alex Williamson wrote:
> > > > > ...
> > > > > > git://github.com/awilliam/linux-vfio.git vfio-vga-reset
> > > > > > git://github.com/awilliam/qemu-vfio.git vfio-vga-reset
> > > > >
> > > > > System setup:
> > > > >
> > > > > - Fedora 18 on
> > > > > - Gigabyte Z77X-UD5H motherboard
> > > > > - Intel Core i7 3770 (Ivy bridge w/integrated graphics)
> > > > > - 2 discrete graphics cards:
> > > > >
> > > > > lspci | egrep 'VGA|Audio'
> > > > > 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
> > > > > 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> > > > > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Caicos [Radeon HD 6450]
> > > > > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Caicos HDMI Audio [Radeon HD 6400 Series]
> > > > > 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Cape Verde PRO [Radeon HD 7700 Series]
> > > > > 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > > > >
> > > > > Short summary:
> > > > >
> > > > > - Once I got past a few time consuming obstacles explained below
> > > > >    - the graphics part of the graphics/hdmi audio passthrough seems to work perfectly
> > > > >      on both discrete graphics cards
> > > > >      (though so far only one at a time and with some minor issues, see below)
> > > > >    - no success with the hdmi audio yet (ideas for further investigation appreciated!)
> > > >
> > > > I've had hdmi audio working with an HD7850, but only in Windows (7) and
> > > > it was using legacy interrupts for some reason instead of MSI.  I wonder
> > > > if Linux guests might work with snd_hda_intel.enable_msi=0.  I'm not sure
> > > > what's wrong with MSI, but it seems to be new with the PCI bus reset
> > > > support.
> > >
> > > In my first tries, Windows was just using a generic
> > > VGA driver, which still seems to work perfectly with reboots and everything
> > > and in full screen resolution (1920x1200).
> > > However after installing the Catalyst AMD driver stack, upon boot
> > > Windows 7 now consistently gets a BSOD from the graphics driver
> > > with the message:
> > >
> > > "Attempt to reset the display driver and recover from timeout failed"
> > > - a picture of the BSOD screen attached.
> >
> > I've seen that BSOD before, but I don't know how to reproduce it.  It
> > seems like I haven't seen it with the PCI bus reset code.  I'm running
> > version 13.1 of the catalyst driver, you?
>
> I first tried with the install CD that came with the card - v.13-045
> then upgraded to the latest from AMD, catalyst v.13.4 which appears to
> be driver v.12.104 - similar behaviour for both. This was with a plain
> Windows 7 install from my SP1 DVD.
>
> With most recommended windows updates and the latest catalyst driver,
> the BSOD is gone but instead I see the initial VGA boot screen and the
> windows logo, then syncs but no display and then reboot into recovery
> mode. (If I try all updates, Windows seems never to be able to recover
> from the last reboot)
>
> I have tried without kvm and also with vnc or spice graphics in addition
> but in those cases it seems Windows is not able to allocate MMIO
> resources for both adapters so I haven't been able to test the catalyst
> driver as a secondary windows display.
>
> > > I attach the corresponding vfio log where I added some timing code to
> > > make it easier to see when the BSOD happens (with 2 seconds of silence
> > > in the log before the VM reboots); I believe this is at 09:28:32-34 in
> > > the log.
> >
> > Yep, looks like that's where windows starts the BSOD.
> >
> > > Similar behaviour both just after reboot/power cycle of the host and
> > > subsequent VM boot attempts.
> > >
> > > This is still with the HD7700 as passed through device, but after a
> > > motherboard firmware upgrade (to F14) which did not seem to affect the
> > > observed behaviour on Windows prior to Catalyst install or with Linux
> > > guest, neither did it fix the bug in selecting primary devices as I
> > > was hoping for.
> > >
> > > Let me know if you have ideas for further debugging this,
> >
> > I don't have any great ideas since I don't know how to reproduce the
> > timeout.  Double/triple check that you're using the correct
> > vfio-vga-reset branches in both qemu and kernel
> >
> > # grep VFIO_DEVICE_PCI_BUS_RESET qemu.git/hw/misc/vfio.c
> > # grep VFIO_DEVICE_PCI_BUS_RESET linux.git/drivers/vfio/pci/vfio_pci.c
>
> [Matches in both..]
> I do believe I have used the right branches all along.
>
> > I didn't see anything telling in your DMAR either.  The system seems to
> > have just one DRHD that includes everything, so I'm not sure why you saw
> > any behavior change from igfx_off.  Thanks,
>
> After the firmware upgrade, I tried again with the integrated graphics
> enabled, this time with more success - I am now able to get a GUI fedora
> console on the integrated graphics, but see some colorful artifacts
> there during the VGA startup on one of the Radeon cards, which goes away
> with a toggle to another console and back.
>
> Seems I have slightly misled you with the DMAR table - sorry about that
> - the table I posted was with the igfx disabled, with the igfx enabled I
> see one more hardware unit dedicated to the igfx if I am able to
> interpret it right (attached)

I noticed this warning in the host log - I suppose it is unrelated but
thought I'd mention it just in case there is some side effect I do not
understand here:

[    0.538124] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    0.538619] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
[    0.538676] ------------[ cut here ]------------
[    0.538681] WARNING: at drivers/pci/search.c:46 pci_find_upstream_pcie_bridge+0x58/0x80()
[    0.538683] Hardware name: To be filled by O.E.M.
[    0.538685] Modules linked in:
[    0.538687] Pid: 1, comm: swapper/0 Not tainted 3.9.0+ #1
[    0.538689] Call Trace:
[    0.538694]  [<ffffffff8105ed2f>] warn_slowpath_common+0x7f/0xc0
[    0.538697]  [<ffffffff8105ed8a>] warn_slowpath_null+0x1a/0x20
[    0.538699]  [<ffffffff8132dc28>] pci_find_upstream_pcie_bridge+0x58/0x80
[    0.538703]  [<ffffffff8152e26b>] intel_iommu_add_device+0x4b/0x1f0
[    0.538706]  [<ffffffff81525b30>] ? bus_set_iommu+0x60/0x60
[    0.538708]  [<ffffffff81525b63>] add_iommu_group+0x33/0x60
[    0.538712]  [<ffffffff813f38fd>] bus_for_each_dev+0x5d/0xa0
[    0.538714]  [<ffffffff81525b1b>] bus_set_iommu+0x4b/0x60
[    0.538718]  [<ffffffff81d47d61>] intel_iommu_init+0xa72/0xb9a
[    0.538722]  [<ffffffff81d0db94>] ? memblock_find_dma_reserve+0x13d/0x13d
[    0.538724]  [<ffffffff81d0dba7>] pci_iommu_init+0x13/0x3e
[    0.538727]  [<ffffffff8100215a>] do_one_initcall+0x12a/0x180
[    0.538730]  [<ffffffff81d0603b>] kernel_init_freeable+0x150/0x1df
[    0.538732]  [<ffffffff81d0588d>] ? do_early_param+0x8c/0x8c
[    0.538736]  [<ffffffff81646580>] ? rest_init+0x80/0x80
[    0.538738]  [<ffffffff8164658e>] kernel_init+0xe/0xf0
[    0.538742]  [<ffffffff8166af6c>] ret_from_fork+0x7c/0xb0
[    0.538744]  [<ffffffff81646580>] ? rest_init+0x80/0x80
[    0.538749] ---[ end trace f4e8b5168095f9c1 ]---


> Both the HD7700 and the HD6450 behave very similarly, and both still start
> and display Windows fine if I disable the Catalyst driver.
>
> Knut
>
> > Alex
> >
> > > > > - Contrary to [hidden email] I had no success with using pci-assign for VGA
> > > > >   with a standard fedora 18 kernel and fairly recent qemu, nor with your branches,
> > > > >
> > > > > Details:
> > > > >
> > > > > - I started off with the required kernel parameter 'intel_iommu=on' + necessary parameters for disabling radeon
> > > > >    (radeon.modeset=0 rd.driver.blacklist=radeon) using the integrated graphics as primary display
> > > > >    - this caused the system to freeze (with color artifacts on the console)
> > > > >
> > > > > > - In my naivety and because of the "i" in igfx I tried both with
> > > > >   'intel_iommu=ifgx_off' and then 'intel_iommu=on,igfx_off'
> > > > >   and a full set of combinations of vfio, cards, kernels and pci-assign before I suspected
> > > > >   that iommu support was turned off for **all** graphics cards with igfx_off
> > > >
> > > > I'm not sure why this is, looks like the code only tries to turn it off
> > > > when only graphics is under the remapping device.  We'd probably need to
> > > > see the DMAR to know more (/sys/firmware/acpi/tables/DMAR).
> > > >
> > > > > - The solution was to have integrated graphics turned off in the BIOS, and 'intel_iommu=on':
> > > > >
> > > > > - iommu groups:
> > > > >
> > > > > ls -l /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices
> > > > > total 0
> > > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
> > > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:00:01.1 -> ../../../../devices/pci0000:00/0000:00:01.1
> > > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
> > > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:01:00.1 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1
> > > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.0
> > > > > lrwxrwxrwx 1 root root 0 May 11 08:55 0000:02:00.1 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.1
> > > > >
> > > > > > - i.e. both the VGA/HDMI Audio pairs + the two root ports they are plugged into are in the same group:
> > > >
> > > > Ick.  Intel has been pretty good about advertising ACS support on their
> > > > root ports.  I wonder if this is an oversight or if they are actually
> > > > not isolated from each other.
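
For anyone who wants to check their own grouping, here is a minimal sketch that walks /sys/kernel/iommu_groups and prints one line per device. The function name list_iommu_groups and the optional root argument are my own invention for illustration (the argument only exists so the function can be pointed at a different directory); nothing here is part of vfio itself.

```shell
# Hypothetical helper: print "group <N>: <device>" for every device in
# every IOMMU group. The optional first argument overrides the sysfs
# directory that is scanned (an assumption added for easy testing).
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for group in "$root"/*; do
        # Skip the unexpanded glob when no groups exist.
        [ -d "$group/devices" ] || continue
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
        done
    done
}
```

Calling list_iommu_groups with no argument on a host booted with intel_iommu=on should show devices that share a group on adjacent lines, as in the listing above where both cards and both root ports land together.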
> > > >
> > > > > # lspci -n
> > > > > ...
> > > > > 01:00.0 0300: 1002:683f
> > > > > 01:00.1 0403: 1002:aab0
> > > > > 02:00.0 0300: 1002:6779
> > > > > 02:00.1 0403: 1002:aa98
> > > > > ...
> > > > >
> > > > > modprobe vfio_pci
> > > > > echo 0000:01:00.1 > /sys/bus/pci/devices/0000\:01\:00.1/driver/unbind
> > > > > echo 0000:02:00.1 > /sys/bus/pci/devices/0000\:02\:00.1/driver/unbind
> > > > > echo 1002 683f > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > > echo 1002 aab0 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > > echo 1002 6779 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > > echo 1002 aa98 > /sys/bus/pci/drivers/vfio-pci/new_id
> > > > >
> > > > > # lsusb
> > > > > ...
> > > > > Bus 001 Device 008: ID 046d:c315 Logitech, Inc. Classic New Touch Keyboard
> > > > > Bus 001 Device 004: ID 046d:c05b Logitech, Inc. M-U0004 810-001317 [B110 Optical USB Mouse]
> > > > > ...
> > > > >
> > > > > - I also applied your suggested patch to the quirk function in VFIO (see below)
> > > > >
> > > > > - Here is a (trimmed for readability) command line I successfully used to boot from the Windows 7 install DVD,
> > > > >   notice the cd and disk device descriptions and the bus parameter - I struggled a while with that
> > > > >   until I came across a comment by Gerd Hoffmann here: https://bugzilla.redhat.com/show_bug.cgi?id=922670 (Thanks, Gerd!)
> > > > >
> > > > >
> > > > > qemu-kvm -M q35 \
> > > > >   -nodefconfig -readconfig $SRC/qemu/docs/q35-chipset.cfg \
> > > > >   -device vfio-pci,host=2:00.0,x-vga=on,multifunction=on,bus=ich9-pcie-port-1,addr=0.0 \
> > > > >   -device vfio-pci,host=2:00.1,bus=ich9-pcie-port-1,addr=0.1 \
> > > > >   -L $SRC/seabios/out/ -L $SRC/qemu/pc-bios \
> > > > >   -vga none -nographic -cpu host -rtc base=localtime -k no -m 8192 -smp 2 \
> > > > >   -drive file=/dev/sr0,index=2,media=cdrom,id=cd \
> > > > >   -drive file=ivm03.img,index=0,media=disk,id=ivm03 \
> > > > >   -device ide-drive,drive=ivm03,bus=ide.0 \
> > > > >   -device ide-cd,drive=cd,bus=ide.1 \
> > > > >   -net nic,vlan=0,model=virtio -net tap,vlan=0 \
> > > > >   -enable-kvm \
> > > > >   -device usb-host,hostbus=1,hostaddr=8 \
> > > > >   -device usb-host,hostbus=1,hostaddr=4
> > > > >
> > > > > > - Both graphics cards seem to have a rom, but only the HD6450 lent itself to "scraping".
> > > >
> > > > Did you try scraping the HD6450 while the HD7700 was the boot VGA and
> > > > vice versa?  The boot VGA ROM is handled in a special way and what you
> > > > really get is the shadow copy, which isn't what we want.
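
For reference, the scraping itself is usually done through the sysfs rom file: write 1 to enable reads, copy the contents out, write 0 again. The sketch below follows that sequence and adds a check for the 0x55 0xAA signature that every PCI expansion ROM image starts with. dump_rom and rom_has_signature are names I made up; the paths are whatever find /sys -name rom reports.

```shell
# Hypothetical helper: copy a PCI ROM exposed in sysfs to a regular file.
# $1 = sysfs rom file, $2 = output file.
dump_rom() {
    rom=$1
    out=$2
    echo 1 > "$rom" || return 1      # enable reading the ROM
    cat "$rom" > "$out"
    status=$?
    echo 0 > "$rom"                  # disable it again, even on failure
    return $status
}

# Hypothetical helper: succeed if the file starts with the 55 AA signature.
rom_has_signature() {
    sig=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

A valid signature only shows that something ROM-shaped was read; per the comment above, a dump taken from the boot VGA device may still be the shadow copy and not the image we want.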
> > > >
> > > > > Anyway, supplying it to vfio did not seem to make any difference.
> > > > >
> > > > > find /sys -name rom
> > > > > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
> > > > > /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0/rom
> > > > > ...
> > > > >
> > > > > Some observations and remaining unresolved issues:
> > > > >
> > > > > - VFIO patch:
> > > > >   Initially (while still running with igfx_off) I observed exactly the same behaviour as [hidden email]
> > > > >   reported a while ago: With vfio_pci debug enabled, vfio_pci ended up spinning with repeated calls to
> > > > >   vfio_ati_3c3_quirk_read and repeated logs:
> > > > >     vfio: vfio_vga_read(0x3c3, 1) = 0x0
> > > > >   I patched up accordingly with
> > > > >
> > > > >
> > > > > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > > > > index da0e5f9..a361d06 100644
> > > > > --- a/hw/misc/vfio.c
> > > > > +++ b/hw/misc/vfio.c
> > > > > @@ -1291,7 +1291,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
> > > > >      uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
> > > > >                                    addr + quirk->data.base_offset, size);
> > > > >  
> > > > > -    if (data == quirk->data.address_match) {
> > > > > +    if (1 || data == quirk->data.address_match) {
> > > > >          data = vfio_pci_read_config(&vdev->pdev, quirk->data.address_val, size);
> > > > >          DPRINTF("%s(0x3c3, 1) = 0x%"PRIx64"\n", __func__, data);
> > > > >      }
> > > > >
> > > > >
> > > > >   This of course did not help much until I actually got the iommu
> > > > > >   enabled for the radeons (similar "repeated patterns" as deniv reported)
> > > > > >   but what I have observed after I got it working is that if
> > > > > >   I disable the patch above, things do not go that well: the Fedora VM
> > > > >   comes up with VGA and the Fedora boot screen, then goes blank when
> > > > >   switching to X.
> > > >
> > > > Hmm, I think we'd probably have better luck making that unconditional
> > > > until we have reason to do otherwise.
> > > >
> > > > > > - The fact that the iommu group now extends across all my available graphics
> > > > > >   devices makes it difficult to get the radeon (or catalyst) driver to use
> > > > > >   the other card since the vfio_pci driver needs to hold it.
> > > > >   Not a complete showstopper since the vesa driver comes up with 1024x768..
> > > > >   Might it be a good idea to have an override option (exception list or similar?)
> > > > >   to allow the vfio_pci to be less restrictive about owning the whole group
> > > > >    - allow functionality over security in such case? This of course is further complicated
> > > > >   by the need for graphics drivers to be disabled/enabled already at the kernel prompt..
> > > >
> > > > We have a quirk in the kernel that enables us to whitelist devices, but
> > > > yes, there is no flexibility in this w/o modifying the code and
> > > > rebuilding.  (see drivers/pci/quirks.c:pci_dev_acs_enabled and follow
> > > > the example above w/ pci_dev_dma_source - function can just return 1)
> > > >
> > > > > - There seems to be a bug in the (version F8) UEFI BIOS on the motherboard,
> > > > >   The BIOS offers (undocumented) a full range of selections of which PCIe
> > > > >   (or PCIe 1x) graphics card to use as primary, but any other selection
> > > > >   than the first PCIe 16x slot has no effect and the motherboard reverts
> > > > >   to the first slot, so to be able to test both cards, I had to put the card under test
> > > > >   into the second (8x) PCIe slot. I am waiting for feedback from Gigabyte on possible
> > > > >   fixes for this in newer BIOSes.
> > > > >
> > > > > - The ultimate goal is to try to consolidate some older Windows desktops as "seats"
> > > > >   on the new system, using the discrete graphics with HDMI/Displayport audio.
> > > > >   With the HD7700 moved to the second PCIe slot I tested both Windows and
> > > > >   Linux guests to try to get some sound through the HDMI audio device.
> > > > >   Windows complains that no usable device is available. On Linux (Fedora 18, KDE desktop),
> > > > >   the system settings -> multimedia dialogue never opens up which seems to indicate that
> > > > >   PulseAudio has problems communicating with the passed through device (?),
> > > > >   any hints/pointers here appreciated. From the vfio log it seems at least
> > > > >   config space is accessed ok.
> > > > >
> > > > > > - There also seem to be issues with radeon and intel_iommu=on - if I try
> > > > >   to enable modesetting and normal X support for the radeon cards, X fails to start.
> > > > >
> > > > > - It would be nice if the integrated graphics could be used as the host primary display -
> > > > > >   I would be happy if someone has any hints as to if/how the igfx_off option
> > > > >   could be extended/modified to only affect iommu operation on selected device(s),
> > > > >   if at all possible..
> > > >
> > > > Let's see what we can discover from your DMAR.  Also send along sudo
> > > > lspci -vvv.  Thanks,
> > > >
> > > > Alex
> > > >
> > >
> > >
> >
> >
> >
>


