[PATCH xserver 4/4] xfree86: promote one GPU screen if (NumScreens == 0 && NumGPUScreens > 0)
Hans de Goede
hdegoede at redhat.com
Wed Sep 7 10:33:35 UTC 2016
Hi,
On 07-09-16 12:20, Laszlo Ersek wrote:
> On 09/07/16 11:54, Hans de Goede wrote:
>> Hi,
>>
>> On 06-09-16 22:32, Laszlo Ersek wrote:
>>> On 09/06/16 14:46, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 06-09-16 13:47, Laszlo Ersek wrote:
>>>>> Adding Matt and Ard
>>>>>
>>>>> On 09/06/16 00:32, Dave Airlie wrote:
>>>>
>>>> <snip>
>>>>
>>>>> Picking different primaries on different boots sounds perfectly
>>>>> acceptable to me as long as:
>>>>> - no explicit DeviceId.BusID is specified by the user,
>>>>> - no primary emerges otherwise,
>>>>> - there are several secondaries.
>>>>>
>>>>> This behavior is clearly not ideal in the long term (like at the third,
>>>>> fourth, fifth etc boots of the same machine), but the user can
>>>>> trivially
>>>>> fix it *after* the installation finishes (by setting Device.BusID).
>>>>> Plus, the suggested documentation (man page) and the server log are
>>>>> clear about it.
>>>>>
>>>>> Importantly, the suggested behavior (even if non-deterministic) is
>>>>> clearly superior to the X Server not starting at all. "Output
>>>>> somewhere"
>>>>> is better than no output at all. (And, in 99% of the cases, there's
>>>>> only
>>>>> one secondary, which gives exactly the expected result.)
>>>>>
>>>>> ... If it made the series acceptable, I'd be happy to tweak patch #4 so
>>>>> that the code get active only for
>>>>>
>>>>> (NumScreens == 0 && NumGPUScreens == 1)
>>>>
>>>> I agree that this is a reasonable thing to do, if there is only
>>>> one GPUScreen we don't get the problem of which GPUScreen to pick
>>>> and then falling back to the single available GPUScreen seems
>>>> like the sensible thing to do.
>>>>
>>>> That still leaves the problem though, that as I wrote on your original
>>>> submission, doing this after the probing is done is too late, because
>>>> the flags argumet passed into the probeSingleDevice call in
>>>> hw/xfree86/common/xf86platformBus.c: xf86platformProbeDev()
>>>> (and passed into the driver from there) must match if the screen is
>>>> going to be a normal screen or a GPU-screen.
>>>
>>> It is true that patch #3 does not take care of this universally (i.e.,
>>> in a future-proof way).
>>>
>>> It does take care of all of the *current* effects of
>>> PLATFORM_PROBE_GPU_SCREEN, which is that the modesetting driver sets
>>> XF86_ALLOCATE_GPU_SCREEN for xf86AllocateScreen().
>>>
>>> That in turn makes a difference for the "scrnIndex" and "is_gpu" fields
>>> of the newly allocated ScrnInfoRec object.
>>>
>>> And, patch #3 does handle those as part of the promotion.
>>>
>>> If the modesetting driver (or any other driver) used
>>> PLATFORM_PROBE_GPU_SCREEN for anything more than passing
>>> XF86_ALLOCATE_GPU_SCREEN to xf86AllocateScreen(), then this approach
>>> wouldn't work.
>>
>> And as I've already tried to tell you, the modesetting driver
>> does actually use it for more then just XF86_ALLOCATE_GPU_SCREEN.
>>
>> Specifically in its ScreenInit it looks at isGPU and sets
>> some internal flags based on that, so clearing isGPU after probing
>> is too late.
>
> Ah! Okay.
>
> I obviously didn't ignore your feedback, and I did grep the tree for XF86_ALLOCATE_GPU_SCREEN (and saw its only one *direct* use in the modesetting driver that I already mentioned). However, I did miss the fact that the modesetting driver looks at is_gpu after xf86AllocateScreen() returns.
>
> So yea, patch #3 is entirely busted. Thanks for explaining that in depth.
>
>>
>> Either way this is just really fragile and needlessly complicated.
>>
>>>> Hmm, looking at this closer, the easy fix I had in mind is not
>>>> possible, at least not in the place where I wanted to add it.
>>>>
>>>> Looking even further, we should already do a fallback to the first
>>>> non-primary dev for seat0 servers and at the right time (before calling
>>>> probe), take a look at: hw/xfree86/common/xf86Bus.c: xf86BusProbe()
>>>>
>>>> void
>>>> xf86BusProbe(void)
>>>> {
>>>> #ifdef XSERVER_PLATFORM_BUS
>>>> xf86platformProbe();
>>>> if (ServerIsNotSeat0() && xf86_num_platform_devices > 0)
>>>> return;
>>>> #endif
>>>> #ifdef XSERVER_LIBPCIACCESS
>>>> xf86PciProbe();
>>>> #endif
>>>> #if (defined(__sparc__) || defined(__sparc)) && !defined(__OpenBSD__)
>>>> xf86SbusProbe();
>>>> #endif
>>>> #ifdef XSERVER_PLATFORM_BUS
>>>> xf86platformPrimary();
>>>> #endif
>>>> }
>>>>
>>>> A normal server is seat0, so ServerIsNotSeat0() returns False
>>>> and we end up calling xf86platformPrimary() at the end of the
>>>> function which falls back to the first non-primary dev.
>>>>
>>>> The only reason this would not work is if xf86PciProbe()
>>>> has set primaryBus.type to a value other then BUS_NONE.
>>>
>>> Sure, that does happen (I've been aware of it, but I haven't realized it
>>> as a problem); in this part of the function:
>>>
>>> /* If we haven't found a primary device try a different heuristic */
>>> if (primaryBus.type == BUS_NONE && num) {
>>> for (i = 0; i < num; i++) {
>>> uint16_t command;
>>>
>>> info = xf86PciVideoInfo[i];
>>> pci_device_cfg_read_u16(info, &command, 4);
>>>
>>> if ((command & PCI_CMD_MEM_ENABLE)
>>> && ((num == 1) || IS_VGA(info->device_class))) {
>>> if (primaryBus.type == BUS_NONE) {
>>> primaryBus.type = BUS_PCI;
>>> primaryBus.id.pci = info;
>>>
>>> IS_VGA does not apply to virtio-gpu-pci, but num==1 does.
>>>
>>> Also, virtio-gpu-pci has MMIO decoding enabled in the command register,
>>> because virtio-gpu-pci is virtio-1.0 only, and it uses an MMIO BAR for
>>> virtio register access and signaling.
>>>
>>> The corresponding log snippet is
>>>
>>> [ 190.155] (II) xfree86: Adding drm device (/dev/dri/card0)
>>> [ 389.067] (--) PCI:*(0:0:1:0) 1af4:1050:1af4:1100 rev 1, Mem @
>>> 0x10040000/4096, 0x8000000000/8388608, BIOS @ 0x????????/65536
>>>
>>> Note the "*" in "PCI:*" -- that comes from
>>>
>>> /* Print a summary of the video devices found */
>>> for (k = 0; k < num; k++) {
>>> const char *prim = " ";
>>> Bool memdone = FALSE, iodone = FALSE;
>>>
>>> info = xf86PciVideoInfo[k];
>>>
>>> if (!PCIALWAYSPRINTCLASSES(info->device_class))
>>> continue;
>>>
>>> if (xf86IsPrimaryPci(info))
>>> prim = "*";
>>>
>>> xf86Msg(X_PROBED, "PCI:%s(%u:%u:%u:%u) %04x:%04x:%04x:%04x ",
>>> prim,
>>> info->domain, info->bus, info->dev, info->func,
>>> info->vendor_id, info->device_id,
>>> info->subvendor_id, info->subdevice_id);
>>>
>>>> So I think we need to figure out why the existing fallback
>>>> path is not working.
>>>
>>> It's because the num==1 part makes xf86PciProbe() assign BUS_PCI to
>>> primaryBus.type.
>>>
>>> Here's a more complete call tree:
>>>
>>> InitOutput()
>>>
>>> xf86BusProbe()
>>> xf86platformProbe()
>>> xf86PciProbe()
>>> primaryBus.type := BUS_PCI /* due to num == 1 */
>>> xf86platformPrimary()
>>> /* does nothing */
>>>
>>> xf86BusConfig()
>>> xf86CallDriverProbe()
>>> foundScreen := xf86platformProbeDev()
>>> /* xf86PlatformDeviceCheckBusID() is never called because
>>> * the user never specified Device.BusID */
>>>
>>> /* xf86IsPrimaryPlatform() does not match
>>> * because primaryBus.type == BUS_PCI */
>>>
>>> /* autoAddGPU is TRUE, so we call: */
>>> probeSingleDevice(... PLATFORM_PROBE_GPU_SCREEN ...)
>>> /* and because that succeeds, we set: */
>>> foundScreen := TRUE
>>>
>>> /* the libpciaccess branch is not reached
>>> * because foundScreen is TRUE from xf86platformProbeDev() */
>>>
>>> I looked into why xf86IsPrimaryPlatform() wouldn't match this device in
>>> xf86platformProbeDev(). Here's my notes from earlier, standing in
>>> xf86IsPrimaryPlatform():
>>>
>>> - primaryBus.type == BUS_PCI
>>>
>>> - primaryBus.id.pci == 0x671600 (pointer to "struct pci_device")
>>>
>>> - xf86_platform_devices[0].attribs->busid == "pci:0000:00:01.0"
>>> (good, it matches the location of the virtio-gpu-pci device)
>>>
>>> - xf86_platform_devices[0].pdev == 0x670d80 (pointer to a
>>> different "struct pci_device" object)
>>>
>>> - however, the *contents* of "*primaryBus.id.pci" and
>>> "*xf86_platform_devices[0].pdev" are identical! They both
>>> describe the "virtio-gpu-pci" device.
>>>
>>> This means that xf86IsPrimaryPlatform() returns false despite
>>> "primaryBus" describing the exact same PCI device as
>>> "xf86_platform_devices[0]".
>>
>> Ah, see now we're getting somewhere, so maybe we just need to
>> teach xf86IsPrimaryPlatform to also accept devices marked
>> as primary by the xf86PciProbe() iow, make it deal with
>> primaryBus.type == BUS_PCI and in that case compare the
>> pci busid and if they match return TRUE ?
>
> Nota bene: this is *exactly* what xf86IsPrimaryPci() does, as far as I can see, just approaching it from the "other side". Compare:
>
> static Bool
> xf86IsPrimaryPlatform(struct xf86_platform_device *plat)
> {
> return ((primaryBus.type == BUS_PLATFORM) && (plat == primaryBus.id.plat));
> }
>
> versus
>
> Bool
> xf86IsPrimaryPci(struct pci_device *pPci)
> {
> if (primaryBus.type == BUS_PCI)
> return pPci == primaryBus.id.pci;
> #ifdef XSERVER_PLATFORM_BUS
> if (primaryBus.type == BUS_PLATFORM)
> if (primaryBus.id.plat->pdev)
> if (MATCH_PCI_DEVICES(primaryBus.id.plat->pdev, pPci))
> return TRUE;
> #endif
> return FALSE;
> }
>
> So xf86IsPrimaryPci() accepts a platform device as primary PCI device if the "deep description" matches, but xf86IsPrimaryPlatform() does not implement the "vice versa".
>
> I didn't recommend extending xf86IsPrimaryPlatform() like this (that is, to practically unify the two "is primary?" functions) because I found it extremely confusing that the two separate enumeration paths (platform *and* PCI) would *both* have to "look the other way" for determining a primary. I mean, what sense does it make to accept a BUS_PLATFORM device as primary PCI device, but then *also* accept a BUS_PCI device as primary platform device? One of those should really be unnecessary, dependent on the order between the enumerations.
>
>>
>> I think that should fix your problem and it matches the intend
>> of how things are supposed to work.
>
> Yes, unifying xf86IsPrimaryPlatform() and xf86IsPrimaryPci() would likely work (both including the same "deep comparison"), but as I said above, I can't fathom why "looking the other way" would be necessary in both places.
The problem really is that the new platform-bus and the old pci-bus
code overlap, platform-bus can handle any type of device including
pci devices (but also e.g. usb) where as the pci code can only
handle pci devices. Some drivers only support the old style
pci-probe methods, but the primary device detection code is
server based, not driver based, so we might end up with
a primary device which only has a pci-bus capable driver
which was detected as primary by the platform code, or
as in your case the other way around.
TL;DR: I believe that modifying the xf86IsPrimaryPlatform()
code to more or less mirror xf86IsPrimaryPci is the right thing
to do, please give this a try.
Regards,
Hans
More information about the xorg-devel
mailing list