[PATCH] DRI2: Expose API to set drawable swap limit.

Mon Oct 25 17:54:04 PDT 2010

On Oct 25, 2010, at 6:59 PM, Jesse Barnes wrote:
> On Mon, 25 Oct 2010 17:13:55 +0300
> Pauli Nieminen <ext-pauli.nieminen at nokia.com> wrote:
>
>> This allows ddx to set swap_limit if there is more than one back
>> buffer for drawable. Setting swap_limit has to also check if change
>> affects a client that is blocked.
>>
>> This can be used to implement N-buffering in driver with minimal
>> logic in allocation and selecting next back.
>>
>> Signed-off-by: Pauli Nieminen <ext-pauli.nieminen at nokia.com>
>> Tested-by: Francisco Jerez <currojerez at riseup.net>
>> CC: Kristian HÃ¸gsberg <krh at bitplanet.net>
>> CC: Jesse Barnes <jbarnes at virtuousgeek.org>
>> ---
>>  hw/xfree86/dri2/dri2.c |   23 +++++++++++++++++++++++
>>  hw/xfree86/dri2/dri2.h |    1 +
>>  2 files changed, 24 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/xfree86/dri2/dri2.c b/hw/xfree86/dri2/dri2.c
>> index 34f735f..c96eb35 100644
>> --- a/hw/xfree86/dri2/dri2.c
>> +++ b/hw/xfree86/dri2/dri2.c
>> @@ -190,6 +190,29 @@ DRI2AllocateDrawable(DrawablePtr pDraw)
>>      return pPriv;
>>  }
>>
>> +Bool
>> +DRI2SwapLimit(DrawablePtr pDraw, int swap_limit)
>> +{
>> +    DRI2DrawablePtr pPriv = DRI2GetDrawable(pDraw);
>> +    if (!pPriv)
>> +	return FALSE;
>> +
>> +    pPriv->swap_limit = swap_limit;
>> +
>> +    /* Check throttling */
>> +    if (pPriv->swapsPending >= pPriv->swap_limit)
>> +	return TRUE;
>> +
>> +    if (pPriv->target_sbc == -1 && !pPriv->blockedOnMsc) {
>> +	if (pPriv->blockedClient) {
>> +	    AttendClient(pPriv->blockedClient);
>> +	    pPriv->blockedClient = NULL;
>> +	}
>> +    }
>> +
>> +    return TRUE;
>> +}
>> +
>>  typedef struct DRI2DrawableRefRec {
>>      XID		  id;
>>      XID		  dri2_id;
>> diff --git a/hw/xfree86/dri2/dri2.h b/hw/xfree86/dri2/dri2.h
>> index fe0bf6c..0e0bea4 100644
>> --- a/hw/xfree86/dri2/dri2.h
>> +++ b/hw/xfree86/dri2/dri2.h
>> @@ -251,6 +251,7 @@ extern _X_EXPORT DRI2BufferPtr  
>> *DRI2GetBuffersWithFormat(DrawablePtr pDraw,
>>  	int *out_count);
>>
>>  extern _X_EXPORT void DRI2SwapInterval(DrawablePtr pDrawable, int  
>> interval);
>> +extern _X_EXPORT Bool DRI2SwapLimit(DrawablePtr pDraw, int  
>> swap_limit);
>>  extern _X_EXPORT int DRI2SwapBuffers(ClientPtr client,  
>> DrawablePtr pDrawable,
>>  				     CARD64 target_msc, CARD64 divisor,
>>  				     CARD64 remainder, CARD64 *swap_target,
>
> I think this is ok, but I'd like Mario to ack it as well; I think we
> can still get into trouble if multiple clients are sharing a drawable
> and doing waits on it.
> -- 
> Jesse Barnes, Intel Open Source Technology Center

Ja, i think the patch is ok for doing what it does. But i think it  
isn't sufficient to implement n-buffering reliably. We'd need more  
clever scheduling of swaps and vblank events if multiple swaps are  
pending.

The problem is wrong presentation timing for copyswaps if the gpu is  
loaded and really corrupt/wrong presentation in the pageflip case.

Assume a client A submits 10 render + swapbuffers requests in a row,  
for presentation at ten consecutive refresh intervals (and the  
swaplimit is >=10). The driver would happily queue 10 vblank events  
in the kernel for consecutive vblank counts.

Meanwhile client B submits a huge rendering request into the command  
stream, worth multiple video refresh durations of rendering time.

1. BBBBBBBBB -> gpu

The 1st vblank event fires, gets processed by the server and a  
copyswap commandbuffer A gets submitted into the cs and a  
intel_swap_event or OML_sync_control timestamp delivered to the  
client about the supposed time of swap completion.

2. ABBBBBBBB -> gpu

After 1st swap request is stuck behind client B's command buffers for  
at least 1 video refresh cycle, the next vblank event fires as  
scheduled and the next copyswap buffer gets queued, more wrong  
timestamps get delivered:

3. AABBBBBBB -> gpu

In the end they all get executed at some point, but their true swap  
timing has nothing to do with what was specified in the  
glXSwapBuffersMscOML() call and the (divisor, remainder) mechanism  
for dealing with such delayed swaps fails to anything useful  
completely. Also all delivered timestamps and msc counts are wrong -  
they indicate perfect timing which has nothing to do with reality.

Pageflipped swaps are worse. Step 1 and 2 would be the same for  
pageflip, but in step 2 a pageflip would be queued inside the  
kernel's pageflip ioctl(). One refresh cycle later, when the next  
vblank fires, the ioctl() gets called again to schedule the next  
pageflip while the 1st one hasn't completed yet. Result is that the  
ioctl() returns EBUSY and the swap is discarded. Same for some of the  
following swaps if the gpu is a bit busy. So with pageflipping, not  
only the timting is wrong, but many of the rendered frames aren't  
presented at all and stuff gets really out of sequence.

These funny things can't happen as long as we only allow 1 pending  
swaps (swaplimit == 1).

My idea was to solve it as follows for n-buffering. If somebody feels  
like implementing that or something better now, instead of me in a  
couple of weeks or months, go for it:

1. We maintain a little fifo queue / list of pending swap requests  
for each drawable.
2. The DRI2ScheduleSwap() function just converts the incoming  
swapbuffers request into a little struct with all the parameters  
(target_msc, divisor, remainder, ...) and inserts it at the tail of  
that queue, after checking that swaps_pending < current swaplimit.  
If  (++swaps_pending) == 1, it kicks off a little  
DRI2ReallyScheduleOneSwap() function which dequeues this first  
request from the queue and calls into the ddx->ScheduleSwap()  
routine. At the end of the DRI2SwapComplete() function, when a swap  
really got completed, that DRI2ReallyScheduleOneSwap() function gets  
called and checks if more swap requests are pending in the queue and  
dequeues and processes the oldest request. If the queue is empty, it  
is a no op and the whole thing goes idle again. Maybe  
DRI2ReallyScheduleOneSwap() would also take care of msc jumps due to  
drawables being moved between crtc's etc.

This can't perfectly solve delays in one swapbuffer request due to  
traffic jams in the command stream, but it could prevent that one  
misscheduled swap will trigger the followup ones to go ugly in random  
ways.

The problems with multiple clients blocking on glXWaitForMscOML or  
glXWaitForSbcOML() is a separate issue which would need  
implementation of additional wait queues.

And then, as we discussed on xds, there's the need for proper  
timestamping of copyswaps in the kernel, otherwise this proposal  
would only fix the pageflipped case, not the windowed case.

thanks,
-mario

*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.kleiner at tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:    +49 (0)7071/601-616
www:    http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)