[PATCH] dri2: add initial prime support. (v1.2)

Thu Jul 5 10:28:20 PDT 2012

On 03.07.2012, at 20:47, Dave Airlie wrote:

> On Tue, Jul 3, 2012 at 7:39 PM, Mario Kleiner
> <mario.kleiner at tuebingen.mpg.de> wrote:
>> 
>> On 03.07.2012, at 16:22, Dave Airlie wrote:
>> 
>>> From: Dave Airlie <airlied at redhat.com>
>>> 
>>> This adds the initial prime support for dri2 offload. The main thing is
>>> when we get a connection from a prime client, we stored the information
>>> and mark all drawables from that client as prime. We then create all
>>> buffers for that drawable on the prime device dri2screen.
>>> 

...

>> 
>>> +DrawablePtr DRI2UpdatePrime(DrawablePtr pDraw, DRI2BufferPtr pDest)
>>> +{
>>> +    DRI2DrawablePtr pPriv = DRI2GetDrawable(pDraw);
>>> +    PixmapPtr spix;
>>> +    PixmapPtr mpix = GetDrawablePixmap(pDraw);
>>> +    ScreenPtr master, slave;
>>> +    Bool ret;
>>> +
>>> +    master = mpix->drawable.pScreen;
>>> +
>>> +    if (pDraw->type == DRAWABLE_WINDOW) {
>>> +     WindowPtr pWin = (WindowPtr)pDraw;
>>> +     PixmapPtr pPixmap = pDraw->pScreen->GetWindowPixmap(pWin);
>>> +
>>> +     if (pDraw->pScreen->GetScreenPixmap(pDraw->pScreen) == pPixmap) {
>>> +         if (pPriv->redirectpixmap &&
>>> +             pPriv->redirectpixmap->drawable.width == pDraw->width &&
>>> +             pPriv->redirectpixmap->drawable.height == pDraw->height &&
>>> +             pPriv->redirectpixmap->drawable.depth == pDraw->depth) {
>>> +             mpix = pPriv->redirectpixmap;
>>> +         } else {
>>> +                if (master->ReplaceScanoutPixmap) {
>>> +                    mpix = (*master->CreatePixmap)(master, pDraw->width, pDraw->height,
>>> +                                                   pDraw->depth, CREATE_PIXMAP_USAGE_SHARED);
>>> +                    if (!mpix)
>>> +                        return NULL;
>>> +
>>> +                    ret = (*master->ReplaceScanoutPixmap)(pDraw, mpix, TRUE);
>>> +                    if (ret == FALSE) {
>>> +                        (*master->DestroyPixmap)(mpix);
>>> +                        return NULL;
>>> +                    }
>> 

The following bit ok?

>> I probably just don't understand the code well enough, but is a ...
>> 
>> if (pPriv->redirectpixmap) (*master->DestroyPixmap)(pPriv->redirectpixmap);
>> 
>> … missing here before the assignment of the new mpix?
>> 
>>> +                    pPriv->redirectpixmap = mpix;
>>> +                } else
>>> +                    return NULL;
>>> +         }
>>> +     } else if (pPriv->redirectpixmap) {
>>> +            (*master->ReplaceScanoutPixmap)(pDraw, pPriv->redirectpixmap, FALSE);
>>> +         (*master->DestroyPixmap)(pPriv->redirectpixmap);
>>> +         pPriv->redirectpixmap = NULL;
>>> +     }
>>> +    }
>>> +

…

>> 
>> If you define a new CopyRegion2 hook, it would be good to now also pass along the pPriv->
>> swap_interval parameter, or a "bool vsync = pPriv->swap_interval ? true : false" flag. That would allow the ddx CopyRegion2() to skip sync to vblank during copy if swap_interval is zero. Other OS'es and the NVidia/AMD binary blobs on Linux treat a zero swap interval as "don't sync to retrace". DRI2SwapBuffers() currently skips all swap scheduling on zero swap interval and uses CopyRegion(), but CopyRegion doesn't know it is a zero swap interval/no-vsync swap, so it syncs anyway. All ddx'es have a "SwapBuffersWait" xorg.conf option to disable this behaviour, but i think that was added as a workaround, not really the amount of control that applications expect.
> 
> I could do that but I really would prefer we do things in iterations
> and not confuse the interfaces, so add a CopyRegion3 with this sort of
> thing.

Ok. Also Chris Wilson just posted an updated patch for an AsyncSwap hook, which would solve the same problem, depending on what gets in, in what order.

> 
> The thing is I've not really considered timing because its really
> messy with two GPUs, current offload design is offload GPU calls
> CopyRegion2 which uses the hw copy engine to copy from the VRAM to
> GTT, the GTT object is shared with the other GPU and is the backing
> for the redirected window pixmap. So when the first driver copies to
> the second, it causes damage on the backing pixmap which gets sent to
> compositor which copies from it. Doing sync and flipping is a lot
> messier, since you probably want to sync everything on the displaying
> GPU.

Thanks for the explanation. Posting damage at the current place could also cause composition of partially copied frames, right? Just a question to see if i understand the damage tracking correctly.

> 
> 
>> If so, how does a compositor in the redirected case know when the offload slave has finished its blit to the redirectPixmap? How does the offload slave sync to vblank of the masters crtc in the unredirected case when its pPriv->redirectpixmap is set as a crtc's scanout pixmap?
> 
> Again sync to vblank is definitely part 2, we've been looking at
> inter-gpu semaphores (by we I mean mlankhorst), but I'd appreciate any
> input you can give since you know this stuff fairly well ;-).

Not so sure how well i know this stuff, but i totally agreed to the "definitely part 2" thing :). I'm happy if your current patches get in soonish. I think i will just shut up and wait until your first iteration has landed and stabilized in 1.13 and then also try to play with it a bit. Coincidentally i just got donated a used 2010 MacBookPro three weeks ago with builtin Intel HD graphics + NVidia GeForce GT 330M + Mini-Displayport, didn't have any time to play with it under Linux yet, but hopefully that will be a suitable setup for testing this stuff?

This is only a sketchy thought for a future iteration, and probably i'm missing something, but couldn't we use a notification mechanism similar to the current pageflip completion events and vblank events to solve it in a relatively generic way that doesn't need lots of potentially hw-specific low-level interaction between different gpu combos - or are inter-gpu semaphores "easy" to handle?

Basically:

Add a new "hw copy finished" event and callbacks to the ddx/libdrm/kernel/kms like the current pageflip events.

The offload-slave ddx could submit its CopyRegion2 command for the hw copy engine together with a request for the "hw copy finished" event. The kms driver would detect the hw copy being done (is there some hw copy engine idle interrupt maybe?), emit the event and thereby trigger the callback in the masters ddx or the x-server dri2 code.

In the redirected case, the callback handler could then post the damage to the redirectpixmap to trigger the compositor.

In the non-redirected fullscreen case, maybe we could do double-buffering on the redirectpixmap(s) and the callback would just trigger the 2nd half of the current regular DRI2Swapbuffers path, as if this is a regular bufferswap on the master, scheduling the swap via vblank events, followed by a CopyRegion or kms-pageflip as usual.

thanks,
-mario

> 
>> Currently muxless Optimus hardware is an absolute no-go for timing sensitive applications, making it difficult to find Laptops that are useable. It would be super-cool for such apps if the prime implementation could do well timing-wise on such setups.
> 
> Yes, step 1 is just enabling basic apps on them so users can play
> games etc, I think we'll have to revisit things for timing sensitive
> apps etc.
> 
> Dave.