[PATCH] DRI2: Expose API to set drawable swap limit.
Mario Kleiner
mario.kleiner at tuebingen.mpg.de
Mon Oct 25 17:54:04 PDT 2010
On Oct 25, 2010, at 6:59 PM, Jesse Barnes wrote:
> On Mon, 25 Oct 2010 17:13:55 +0300
> Pauli Nieminen <ext-pauli.nieminen at nokia.com> wrote:
>
>> This allows ddx to set swap_limit if there is more than one back
>> buffer for drawable. Setting swap_limit has to also check if change
>> affects a client that is blocked.
>>
>> This can be used to implement N-buffering in driver with minimal
>> logic in allocation and selecting next back.
>>
>> Signed-off-by: Pauli Nieminen <ext-pauli.nieminen at nokia.com>
>> Tested-by: Francisco Jerez <currojerez at riseup.net>
>> CC: Kristian Høgsberg <krh at bitplanet.net>
>> CC: Jesse Barnes <jbarnes at virtuousgeek.org>
>> ---
>> hw/xfree86/dri2/dri2.c | 23 +++++++++++++++++++++++
>> hw/xfree86/dri2/dri2.h | 1 +
>> 2 files changed, 24 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/xfree86/dri2/dri2.c b/hw/xfree86/dri2/dri2.c
>> index 34f735f..c96eb35 100644
>> --- a/hw/xfree86/dri2/dri2.c
>> +++ b/hw/xfree86/dri2/dri2.c
>> @@ -190,6 +190,29 @@ DRI2AllocateDrawable(DrawablePtr pDraw)
>> return pPriv;
>> }
>>
>> +Bool
>> +DRI2SwapLimit(DrawablePtr pDraw, int swap_limit)
>> +{
>> + DRI2DrawablePtr pPriv = DRI2GetDrawable(pDraw);
>> + if (!pPriv)
>> + return FALSE;
>> +
>> + pPriv->swap_limit = swap_limit;
>> +
>> + /* Check throttling */
>> + if (pPriv->swapsPending >= pPriv->swap_limit)
>> + return TRUE;
>> +
>> + if (pPriv->target_sbc == -1 && !pPriv->blockedOnMsc) {
>> + if (pPriv->blockedClient) {
>> + AttendClient(pPriv->blockedClient);
>> + pPriv->blockedClient = NULL;
>> + }
>> + }
>> +
>> + return TRUE;
>> +}
>> +
>> typedef struct DRI2DrawableRefRec {
>> XID id;
>> XID dri2_id;
>> diff --git a/hw/xfree86/dri2/dri2.h b/hw/xfree86/dri2/dri2.h
>> index fe0bf6c..0e0bea4 100644
>> --- a/hw/xfree86/dri2/dri2.h
>> +++ b/hw/xfree86/dri2/dri2.h
>> @@ -251,6 +251,7 @@ extern _X_EXPORT DRI2BufferPtr
>> *DRI2GetBuffersWithFormat(DrawablePtr pDraw,
>> int *out_count);
>>
>> extern _X_EXPORT void DRI2SwapInterval(DrawablePtr pDrawable, int
>> interval);
>> +extern _X_EXPORT Bool DRI2SwapLimit(DrawablePtr pDraw, int
>> swap_limit);
>> extern _X_EXPORT int DRI2SwapBuffers(ClientPtr client,
>> DrawablePtr pDrawable,
>> CARD64 target_msc, CARD64 divisor,
>> CARD64 remainder, CARD64 *swap_target,
>
> I think this is ok, but I'd like Mario to ack it as well; I think we
> can still get into trouble if multiple clients are sharing a drawable
> and doing waits on it.
> --
> Jesse Barnes, Intel Open Source Technology Center
Ja, i think the patch is ok for doing what it does. But i think it
isn't sufficient to implement n-buffering reliably. We'd need more
clever scheduling of swaps and vblank events if multiple swaps are
pending.
The problem is wrong presentation timing for copyswaps if the gpu is
loaded and really corrupt/wrong presentation in the pageflip case.
Assume a client A submits 10 render + swapbuffers requests in a row,
for presentation at ten consecutive refresh intervals (and the
swaplimit is >=10). The driver would happily queue 10 vblank events
in the kernel for consecutive vblank counts.
Meanwhile client B submits a huge rendering request into the command
stream, worth multiple video refresh durations of rendering time.
1. BBBBBBBBB -> gpu
The 1st vblank event fires, gets processed by the server and a
copyswap commandbuffer A gets submitted into the cs and a
intel_swap_event or OML_sync_control timestamp delivered to the
client about the supposed time of swap completion.
2. ABBBBBBBB -> gpu
After 1st swap request is stuck behind client B's command buffers for
at least 1 video refresh cycle, the next vblank event fires as
scheduled and the next copyswap buffer gets queued, more wrong
timestamps get delivered:
3. AABBBBBBB -> gpu
In the end they all get executed at some point, but their true swap
timing has nothing to do with what was specified in the
glXSwapBuffersMscOML() call and the (divisor, remainder) mechanism
for dealing with such delayed swaps fails to anything useful
completely. Also all delivered timestamps and msc counts are wrong -
they indicate perfect timing which has nothing to do with reality.
Pageflipped swaps are worse. Step 1 and 2 would be the same for
pageflip, but in step 2 a pageflip would be queued inside the
kernel's pageflip ioctl(). One refresh cycle later, when the next
vblank fires, the ioctl() gets called again to schedule the next
pageflip while the 1st one hasn't completed yet. Result is that the
ioctl() returns EBUSY and the swap is discarded. Same for some of the
following swaps if the gpu is a bit busy. So with pageflipping, not
only the timting is wrong, but many of the rendered frames aren't
presented at all and stuff gets really out of sequence.
These funny things can't happen as long as we only allow 1 pending
swaps (swaplimit == 1).
My idea was to solve it as follows for n-buffering. If somebody feels
like implementing that or something better now, instead of me in a
couple of weeks or months, go for it:
1. We maintain a little fifo queue / list of pending swap requests
for each drawable.
2. The DRI2ScheduleSwap() function just converts the incoming
swapbuffers request into a little struct with all the parameters
(target_msc, divisor, remainder, ...) and inserts it at the tail of
that queue, after checking that swaps_pending < current swaplimit.
If (++swaps_pending) == 1, it kicks off a little
DRI2ReallyScheduleOneSwap() function which dequeues this first
request from the queue and calls into the ddx->ScheduleSwap()
routine. At the end of the DRI2SwapComplete() function, when a swap
really got completed, that DRI2ReallyScheduleOneSwap() function gets
called and checks if more swap requests are pending in the queue and
dequeues and processes the oldest request. If the queue is empty, it
is a no op and the whole thing goes idle again. Maybe
DRI2ReallyScheduleOneSwap() would also take care of msc jumps due to
drawables being moved between crtc's etc.
This can't perfectly solve delays in one swapbuffer request due to
traffic jams in the command stream, but it could prevent that one
misscheduled swap will trigger the followup ones to go ugly in random
ways.
The problems with multiple clients blocking on glXWaitForMscOML or
glXWaitForSbcOML() is a separate issue which would need
implementation of additional wait queues.
And then, as we discussed on xds, there's the need for proper
timestamping of copyswaps in the kernel, otherwise this proposal
would only fix the pageflipped case, not the windowed case.
thanks,
-mario
*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany
e-mail: mario.kleiner at tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax: +49 (0)7071/601-616
www: http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)
More information about the xorg-devel
mailing list