[PATCH] DRI2: Expose API to set drawable swap limit.

Pauli Nieminen ext-pauli.nieminen at nokia.com
Tue Oct 26 01:35:56 PDT 2010


On 26/10/10 02:54 +0200, ext Mario Kleiner wrote:
> On Oct 25, 2010, at 6:59 PM, Jesse Barnes wrote:
> > On Mon, 25 Oct 2010 17:13:55 +0300
> > Pauli Nieminen <ext-pauli.nieminen at nokia.com> wrote:
> >
> >> This allows ddx to set swap_limit if there is more than one back
> >> buffer for drawable. Setting swap_limit has to also check if change
> >> affects a client that is blocked.
> >>
> >> This can be used to implement N-buffering in driver with minimal
> >> logic in allocation and selecting next back.
> >>
> >> Signed-off-by: Pauli Nieminen <ext-pauli.nieminen at nokia.com>
> >> Tested-by: Francisco Jerez <currojerez at riseup.net>
> >> CC: Kristian Høgsberg <krh at bitplanet.net>
> >> CC: Jesse Barnes <jbarnes at virtuousgeek.org>
> >> ---
> >>  hw/xfree86/dri2/dri2.c |   23 +++++++++++++++++++++++
> >>  hw/xfree86/dri2/dri2.h |    1 +
> >>  2 files changed, 24 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/hw/xfree86/dri2/dri2.c b/hw/xfree86/dri2/dri2.c
> >> index 34f735f..c96eb35 100644
> >> --- a/hw/xfree86/dri2/dri2.c
> >> +++ b/hw/xfree86/dri2/dri2.c
> >> @@ -190,6 +190,29 @@ DRI2AllocateDrawable(DrawablePtr pDraw)
> >>      return pPriv;
> >>  }
> >>
> >> +Bool
> >> +DRI2SwapLimit(DrawablePtr pDraw, int swap_limit)
> >> +{
> >> +    DRI2DrawablePtr pPriv = DRI2GetDrawable(pDraw);
> >> +    if (!pPriv)
> >> +	return FALSE;
> >> +
> >> +    pPriv->swap_limit = swap_limit;
> >> +
> >> +    /* Check throttling */
> >> +    if (pPriv->swapsPending >= pPriv->swap_limit)
> >> +	return TRUE;
> >> +
> >> +    if (pPriv->target_sbc == -1 && !pPriv->blockedOnMsc) {
> >> +	if (pPriv->blockedClient) {
> >> +	    AttendClient(pPriv->blockedClient);
> >> +	    pPriv->blockedClient = NULL;
> >> +	}
> >> +    }
> >> +
> >> +    return TRUE;
> >> +}
> >> +
> >>  typedef struct DRI2DrawableRefRec {
> >>      XID		  id;
> >>      XID		  dri2_id;
> >> diff --git a/hw/xfree86/dri2/dri2.h b/hw/xfree86/dri2/dri2.h
> >> index fe0bf6c..0e0bea4 100644
> >> --- a/hw/xfree86/dri2/dri2.h
> >> +++ b/hw/xfree86/dri2/dri2.h
> >> @@ -251,6 +251,7 @@ extern _X_EXPORT DRI2BufferPtr  
> >> *DRI2GetBuffersWithFormat(DrawablePtr pDraw,
> >>  	int *out_count);
> >>
> >>  extern _X_EXPORT void DRI2SwapInterval(DrawablePtr pDrawable, int  
> >> interval);
> >> +extern _X_EXPORT Bool DRI2SwapLimit(DrawablePtr pDraw, int  
> >> swap_limit);
> >>  extern _X_EXPORT int DRI2SwapBuffers(ClientPtr client,  
> >> DrawablePtr pDrawable,
> >>  				     CARD64 target_msc, CARD64 divisor,
> >>  				     CARD64 remainder, CARD64 *swap_target,
> >
> > I think this is ok, but I'd like Mario to ack it as well; I think we
> > can still get into trouble if multiple clients are sharing a drawable
> > and doing waits on it.
> > -- 
> > Jesse Barnes, Intel Open Source Technology Center
> 
> 
> Ja, i think the patch is ok for doing what it does. But i think it  
> isn't sufficient to implement n-buffering reliably. We'd need more  
> clever scheduling of swaps and vblank events if multiple swaps are  
> pending.
> 
> The problem is wrong presentation timing for copyswaps if the gpu is  
> loaded and really corrupt/wrong presentation in the pageflip case.
> 
> Assume a client A submits 10 render + swapbuffers requests in a row,  
> for presentation at ten consecutive refresh intervals (and the  
> swaplimit is >=10). The driver would happily queue 10 vblank events  
> in the kernel for consecutive vblank counts.
> 
> Meanwhile client B submits a huge rendering request into the command  
> stream, worth multiple video refresh durations of rendering time.
> 
> 1. BBBBBBBBB -> gpu
> 
> The 1st vblank event fires, gets processed by the server and a  
> copyswap commandbuffer A gets submitted into the cs and a  
> intel_swap_event or OML_sync_control timestamp delivered to the  
> client about the supposed time of swap completion.
> 
> 2. ABBBBBBBB -> gpu
> 
> After 1st swap request is stuck behind client B's command buffers for  
> at least 1 video refresh cycle, the next vblank event fires as  
> scheduled and the next copyswap buffer gets queued, more wrong  
> timestamps get delivered:
> 
> 3. AABBBBBBB -> gpu
> 
> In the end they all get executed at some point, but their true swap  
> timing has nothing to do with what was specified in the  
> glXSwapBuffersMscOML() call and the (divisor, remainder) mechanism  
> for dealing with such delayed swaps fails to anything useful  
> completely. Also all delivered timestamps and msc counts are wrong -  
> they indicate perfect timing which has nothing to do with reality.
> 

IMO, That is driver side problem. There is 2 thing missing in Intel driver
for this case.

- Fair scheduling/GPU preemption.

Preemption would give you better latency when there is multiple clients with
only one having expensive rendering operations.

- Intel driver should have logic to synchronize flips with rendering.

It is drivers job to know what goes on in HW side. That why Intel driver
should know when rendering completes and only after that assume flip happens.
If it doesn't do that then it is driver bug that has nothing to do with DRI2
swap limit.

> Pageflipped swaps are worse. Step 1 and 2 would be the same for  
> pageflip, but in step 2 a pageflip would be queued inside the  
> kernel's pageflip ioctl(). One refresh cycle later, when the next  
> vblank fires, the ioctl() gets called again to schedule the next  
> pageflip while the 1st one hasn't completed yet. Result is that the  
> ioctl() returns EBUSY and the swap is discarded. Same for some of the  
> following swaps if the gpu is a bit busy. So with pageflipping, not  
> only the timting is wrong, but many of the rendered frames aren't  
> presented at all and stuff gets really out of sequence.
> 
> These funny things can't happen as long as we only allow 1 pending  
> swaps (swaplimit == 1).

This patch doesn't change anything unless driver is ready to change
swaplimit. Driver should change swaplimit only when it is good enough to
handle it.

> 
> My idea was to solve it as follows for n-buffering. If somebody feels  
> like implementing that or something better now, instead of me in a  
> couple of weeks or months, go for it:
> 
> 1. We maintain a little fifo queue / list of pending swap requests  
> for each drawable.
> 2. The DRI2ScheduleSwap() function just converts the incoming  
> swapbuffers request into a little struct with all the parameters  
> (target_msc, divisor, remainder, ...) and inserts it at the tail of  
> that queue, after checking that swaps_pending < current swaplimit.  
> If  (++swaps_pending) == 1, it kicks off a little  
> DRI2ReallyScheduleOneSwap() function which dequeues this first  
> request from the queue and calls into the ddx->ScheduleSwap()  
> routine. At the end of the DRI2SwapComplete() function, when a swap  
> really got completed, that DRI2ReallyScheduleOneSwap() function gets  
> called and checks if more swap requests are pending in the queue and  
> dequeues and processes the oldest request. If the queue is empty, it  
> is a no op and the whole thing goes idle again. Maybe  
> DRI2ReallyScheduleOneSwap() would also take care of msc jumps due to  
> drawables being moved between crtc's etc.
> 
> This can't perfectly solve delays in one swapbuffer request due to  
> traffic jams in the command stream, but it could prevent that one  
> misscheduled swap will trigger the followup ones to go ugly in random  
> ways.
> 
>

This would break gl and x interaction. When swap is scheduled iver has to
already exchange the to be new front in case next request is xrender
operation affecting same pixmap/window. In that case swap buffers is barrier
that allows client to assume old back buffer is already front. But that is
old bug for Intel.

Driver might have to do other operations also for swap request. In that case
driver wants knowledge for about swap request as early as possible.

This patch works perfectly fine if ddx driver handles hw correctly. If driver
can't handle hw right (synchronize flips with render completion) then driver
should change swaplimit. Render completion is hw specific detail that shouldn't
affect common code.

Also DRI2 code can't handle MSC jumps correctly without help from DDX driver.
Only way to handle it right is to get MSC from old and new crtc and apply
difference as offset to target_msc. Only driver can read these both values
when it has to be done.

> The problems with multiple clients blocking on glXWaitForMscOML or  
> glXWaitForSbcOML() is a separate issue which would need  
> implementation of additional wait queues.
> 

yes. This patch doesn't try to implement the missing linked list for blocked
clients.

> And then, as we discussed on xds, there's the need for proper  
> timestamping of copyswaps in the kernel, otherwise this proposal  
> would only fix the pageflipped case, not the windowed case.

yes. But that is kernel driver problem that it doesn't provide the info.
Too bad copy timestamp isn't what application needs. It needs MSC timestamp
from compositor when that swap gets to screen. But that isn't trivial to
implement without proved protocol. Maybe the synchronization extension work
from NVIDIA could be used to provide that information.

> 
> thanks,
> -mario
> 
> *********************************************************************
> Mario Kleiner
> Max Planck Institute for Biological Cybernetics
> Spemannstr. 38
> 72076 Tuebingen
> Germany
> 
> e-mail: mario.kleiner at tuebingen.mpg.de
> office: +49 (0)7071/601-1623
> fax:    +49 (0)7071/601-616
> www:    http://www.kyb.tuebingen.mpg.de/~kleinerm
> *********************************************************************
> "For a successful technology, reality must take precedence
> over public relations, for Nature cannot be fooled."
> (Richard Feynman)


More information about the xorg-devel mailing list