[PATCH] DRI2: Expose API to set drawable swap limit.

Wed Oct 27 08:38:17 PDT 2010

On Oct 26, 2010, at 10:35 AM, Pauli Nieminen wrote:

> On 26/10/10 02:54 +0200, ext Mario Kleiner wrote:
>> On Oct 25, 2010, at 6:59 PM, Jesse Barnes wrote:
>>> On Mon, 25 Oct 2010 17:13:55 +0300
>>> Pauli Nieminen <ext-pauli.nieminen at nokia.com> wrote:
>>>
>>>> This allows ddx to set swap_limit if there is more than one back
>>>> buffer for drawable. Setting swap_limit has to also check if change
>>>> affects a client that is blocked.
>>>>
>>>>
...
>>>> +Bool
>>>> +DRI2SwapLimit(DrawablePtr pDraw, int swap_limit)
>>>> +{
>>>> +    DRI2DrawablePtr pPriv = DRI2GetDrawable(pDraw);
>>>> +    if (!pPriv)
>>>> +	return FALSE;
>>>> +
>>>> +    pPriv->swap_limit = swap_limit;
>>>> +
>>>> +    /* Check throttling */
>>>> +    if (pPriv->swapsPending >= pPriv->swap_limit)
>>>> +	return TRUE;
>>>> +
>>>> +    if (pPriv->target_sbc == -1 && !pPriv->blockedOnMsc) {
>>>> +	if (pPriv->blockedClient) {
>>>> +	    AttendClient(pPriv->blockedClient);
>>>> +	    pPriv->blockedClient = NULL;
>>>> +	}
>>>> +    }
>>>> +
>>>> +    return TRUE;
>>>> +}
>>
>> Ja, i think the patch is ok for doing what it does. But i think it
>> isn't sufficient to implement n-buffering reliably. We'd need more
>> clever scheduling of swaps and vblank events if multiple swaps are
>> pending.
>

<... snip ...>

> IMO, That is driver side problem. There is 2 thing missing in Intel  
> driver
> for this case.
>
> - Fair scheduling/GPU preemption.
>
> Preemption would give you better latency when there is multiple  
> clients with
> only one having expensive rendering operations.
>
> - Intel driver should have logic to synchronize flips with rendering.
>
> It is drivers job to know what goes on in HW side. That why Intel  
> driver
> should know when rendering completes and only after that assume  
> flip happens.
> If it doesn't do that then it is driver bug that has nothing to do  
> with DRI2
> swap limit.
>

Yes, but it is a limitation that i think all drivers are sharing at  
the moment.

>
> This patch doesn't change anything unless driver is ready to change
> swaplimit. Driver should change swaplimit only when it is good  
> enough to
> handle it.
>

I'm not against merging your patch for changing the swap limit. I was  
just summarizing for everybody the problems that current drivers (at  
least intel and radeon and the kernel drm part of dri2) will have/ 
cause for a swap_limit > 1. However, currently your patch allows  
changing the swap limit without approval from the ddx, which can't  
handle swap_limit > 1 reliably at the moment and that's not a bug of  
the ddx but a valid limitation of the current ddx implementations.

I think we should have some way for the drivers to back out of this  
gracefully or at least cover their tails. E.g., allow the ddx to set  
a hard upper limit for the swap_limit, in a new field max_swap_limit.  
Your patch could make sure that swap_limit is never set to a value  
higher than that. Then the intel and radeon ddx could set  
max_swap_limit = 1 unless users somehow (xorg.conf option? dri.conf?)  
opt into risky higher max_swap_limits.

Or some part of the implementation (your patch? the ddx itself?)  
should output some warning in the x log if a value is selected that  
it can't reliably handle, so app developers aren't confused to death  
about weird behaviour.

>>
>> My idea was to solve it as follows for n-buffering. If somebody feels
>> like implementing that or something better now, instead of me in a
>> couple of weeks or months, go for it:
>>
>> 1. We maintain a little fifo queue / list of pending swap requests
>> for each drawable.
>> 2. The DRI2ScheduleSwap() function just converts the incoming
>> swapbuffers request into a little struct with all the parameters
>> (target_msc, divisor, remainder, ...) and inserts it at the tail of
>> that queue, after checking that swaps_pending < current swaplimit.
>> If  (++swaps_pending) == 1, it kicks off a little
>> DRI2ReallyScheduleOneSwap() function which dequeues this first
>> request from the queue and calls into the ddx->ScheduleSwap()
>> routine. At the end of the DRI2SwapComplete() function, when a swap
>> really got completed, that DRI2ReallyScheduleOneSwap() function gets
>> called and checks if more swap requests are pending in the queue and
>> dequeues and processes the oldest request. If the queue is empty, it
>> is a no op and the whole thing goes idle again. Maybe
>> DRI2ReallyScheduleOneSwap() would also take care of msc jumps due to
>> drawables being moved between crtc's etc.
>>
>> This can't perfectly solve delays in one swapbuffer request due to
>> traffic jams in the command stream, but it could prevent that one
>> misscheduled swap will trigger the followup ones to go ugly in random
>> ways.
>>
>>
>
> This would break gl and x interaction. When swap is scheduled iver  
> has to
> already exchange the to be new front in case next request is xrender
> operation affecting same pixmap/window. In that case swap buffers  
> is barrier
> that allows client to assume old back buffer is already front. But  
> that is
> old bug for Intel.
>

Good point. I wasn't aware of that. So something like sketched above  
should/could be implemented inside the ddx instead of the common  
server code. But there, where it can already exchange to the new  
front, it should work, right?

> Driver might have to do other operations also for swap request. In  
> that case
> driver wants knowledge for about swap request as early as possible.
>
> This patch works perfectly fine if ddx driver handles hw correctly.  
> If driver
> can't handle hw right (synchronize flips with render completion)  
> then driver
> should change swaplimit. Render completion is hw specific detail  
> that shouldn't
> affect common code.
>

Good point.

> Also DRI2 code can't handle MSC jumps correctly without help from  
> DDX driver.
> Only way to handle it right is to get MSC from old and new crtc and  
> apply
> difference as offset to target_msc. Only driver can read these both  
> values
> when it has to be done.
>

I half-agree, see the response to the other patch.

>> The problems with multiple clients blocking on glXWaitForMscOML or
>> glXWaitForSbcOML() is a separate issue which would need
>> implementation of additional wait queues.
>>
>
> yes. This patch doesn't try to implement the missing linked list  
> for blocked
> clients.
>

Ja, that's just another todo item, unrelated to your patch.

>> And then, as we discussed on xds, there's the need for proper
>> timestamping of copyswaps in the kernel, otherwise this proposal
>> would only fix the pageflipped case, not the windowed case.
>
> yes. But that is kernel driver problem that it doesn't provide the  
> info.
> Too bad copy timestamp isn't what application needs.

We need copy timestamps for non-composited desktop. pageflip  
timestamps are hopefully done in a reliable way soon at the kernel  
level. We had some ideas for reliable copy timestamps to implement  
after pageflip timestamps are done.

> It needs MSC timestamp
> from compositor when that swap gets to screen. But that isn't  
> trivial to
> implement without proved protocol. Maybe the synchronization  
> extension work
> from NVIDIA could be used to provide that information.
>

Agreed, that's yet another todo.
-mario

*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.kleiner at tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:    +49 (0)7071/601-616
www:    http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)