Swap limit

Fri Dec 10 07:50:07 PST 2010

On 10/12/10 15:40 +0100, ext Mario Kleiner wrote:
> On 12/08/2010 05:15 PM, Pauli Nieminen wrote:
> > On 08/12/10 16:55 +0100, ext Alex Deucher wrote:
> >>
> >> One other thing that might be worth adding to DRI2 is a way for the
> >> driver to access the swap interval.  If we could, then the driver
> >> could dynamically disable things like vline waits for buffer blits or
> >> do non-vsynced pageflipping more easily if the swap interval was 0.
> >>
> >
> > Actually I thought DRI2 was already telling SwapInterval with
> >
> > *swap_target = pPriv->last_swap_target + pPriv->swap_interval;
> >
> > If swap_target doesn't advance then driver would know to flip as fast as
> > possible.
> >
> > But then I noticed a bug.
> >
> > The first one is that we always blit when the swap interval is zero, when it
> > should be possible to flip (even tear free).
> >
> 
> Let it tear :). I think swap_interval == 0 should cause immediate swaps 
> with tearing. All OpenGL implementations i know (Windows, MacOS classic, 
> MacOS/X, Linux with proprietary drivers, old SGI's) interpret a 
> swapinterval of zero as "swap as soon as rendering is finished, 
> immediately, don't sync to scanout cycle" and i think that makes the 
> most sense. It's good for benchmarking how fast your system can go if 
> not throttled by the monitor (useful), for crazy gamers that trade 
> visual quality for fps and for special applications that need to 
> sometimes control swap timing themselves (like my toolkit).
> 

You can run the whole graphics pipeline without waiting for the display and
without tearing. That would provide maximum performance for benchmarks while
still maintaining visual quality.

> > Code from DRI2ScheduleSwap:
> > /* Old DDX or no swap interval, just blit */
> > if (!ds->ScheduleSwap || !pPriv->swap_interval) {
> >
> 
> Not a "bug", just a half-done way of achieving the expected 
> swap_interval zero ;-) [ok, maybe that is a bug].
> 

If the driver can't manage that without copying, then the driver can fall back
to a copy.

Simple life isn't it? :)

> It schedules an immediate copy-swap via blitting. Unfortunately the ddx 
> doesn't know about the swap_interval, so it still synchronizes the 
> execution of the blit to vsync via vline waits. That's tear-free, but it 
> depends on the location and size of the drawable and the current 
> position of the scanout if this will cause an immediate swap (if scanout 
> is outside the drawables area) or a vsync'ed swap. It's a bit undefined 
> behaviour for non-fullscreen drawables and it effectively enforces a 
> minimum swap interval of 1 for fullscreen drawables, which is not what 
> we want.
> 
> Both the intel ddx and (soon) the ati ddx have a xorg.conf parameter 
> "SwapBuffersWait" to disable vsync completely, then you'd get copy-swaps 
> as fast as possible for swap_interval zero, but for a non-zero 
> swap_interval you'd have some chance of tearing, so this is also just a 
> band-aid.
>

I agree.

> If we wanted this for page-flipping we'd need a shortcut similar to this 
> one which would bypass the vblank scheduling and call the ddx pageflip 
> routine directly, plus an extension to the kernel's pageflip ioctl() to 
> allow non-vsynced flips.
> 
> And we need a new interface to tell the ddx that a swap should be 
> non-vsync'ed.
> 
> -mario

Isn't it as simple as this: if swap_target is the current vblank count, then
the flip should happen immediately? I don't see any need for a special path.
With the MSC hack in that path, swap_target will always be the current frame
for drivers that support GetMSC. For others, swap_target stays the same all
the time for the drawable.
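
A minimal sketch of that check, assuming hypothetical helper names
(next_swap_target and swap_is_immediate are illustrations, not DRI2 or DDX
functions). The first helper mirrors the swap_target line quoted above; the
second is the comparison a driver could make against the current vblank count:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mirrors
 *   *swap_target = pPriv->last_swap_target + pPriv->swap_interval;
 * With swap_interval == 0 the target does not advance. */
static uint64_t next_swap_target(uint64_t last_swap_target,
                                 uint32_t swap_interval)
{
    return last_swap_target + swap_interval;
}

/* Hypothetical helper: a target at (or behind) the current vblank count
 * means there is nothing to wait for, so the flip can be issued now. */
static bool swap_is_immediate(uint64_t swap_target, uint64_t current_msc)
{
    return swap_target <= current_msc;
}
```

With a swap interval of 0 the target equals the current count and the flip is
immediate; with an interval of 1 the target is one vblank ahead and the driver
waits.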

"Tearing swap limit 0"

Just toggle the hw to a mode where it flips immediately after the flip
registers are written. Now you can just reserve a buffer to be front, a buffer
to be back and N buffers for the GPU queue. In practice that would mean 3
buffers, as the application doesn't run more than 1 frame ahead of the GPU.

"Tear free swap limit 0"

How to do non-blocking flipping without forcing the flip to happen while
scanout is in progress.

At first there are only 2 buffers in use. One for front and one for back.

1st frame comes to schedule swap

The driver queues the flip to happen after the frame completes. The flip will
take effect at the next vblank. Now we need to allocate one more buffer and
increase the swap limit, because we have 2 buffers reserved as front.

1st frame completes -> flip is actually written to hw

2nd frame comes to schedule swap

The driver queues a flip to the pipeline that will overwrite the first flip if
the frame completes before the next vblank. Because no buffers have been freed
by the hw so far, we have to allocate one more buffer. Now we are up to 4
buffers already.

2nd frame completes -> flip is actually written to hw

Now the back buffer for the 1st frame is free and we can reuse it. It never
ended up on the display.

3rd frame comes to schedule swap

We can reuse that buffer, so we stay at 4 buffers.

vblank makes the 2nd frame visible on the screen.

Now the original front buffer is free and ready to be reused. But the back
buffer for the 2nd frame is going to stay as front until the next vblank.

3rd frame completes -> flip

If the GPU is never more than one frame behind the CPU, we can manage with 4
buffers without blocking.

The front buffer is always reserved until the vblank event that flips away
from the buffer.

The back buffer is reserved when getbuffers is called. The reservation
continues until one of the following conditions is met:
1. The back is turned into front at vblank, so the reservation for that buffer
ends after the next flip.
2. If the next frame completes before vblank, this back never got to the
screen and it is free.
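
The walkthrough and the two release rules above can be modelled with a small
state table. This is only a sketch under my own assumptions: the state names,
the fixed pool of 4 buffers and the model_* functions are hypothetical and do
not correspond to anything in the server or a DDX.

```c
enum buf_state {
    BUF_FREE,     /* not reserved, can be handed out by getbuffers */
    BUF_BACK,     /* reserved as back (being rendered or queued on GPU) */
    BUF_PENDING,  /* frame complete, flip written to hw, waits for vblank */
    BUF_FRONT     /* currently scanned out */
};

#define NBUF 4    /* assumption: GPU is at most one frame behind the CPU */

struct swap_model {
    enum buf_state state[NBUF];
};

/* Return the index of the first buffer in state s, or -1 if none. */
static int find_state(const struct swap_model *m, enum buf_state s)
{
    for (int i = 0; i < NBUF; i++)
        if (m->state[i] == s)
            return i;
    return -1;
}

/* getbuffers: reserve a free buffer as back; -1 means we would block. */
static int model_get_back(struct swap_model *m)
{
    int i = find_state(m, BUF_FREE);
    if (i >= 0)
        m->state[i] = BUF_BACK;
    return i;
}

/* Frame in buf completed: its flip is written to hw. A flip that was still
 * pending is overwritten, so that buffer never reaches the screen (rule 2). */
static void model_frame_complete(struct swap_model *m, int buf)
{
    int old = find_state(m, BUF_PENDING);
    if (old >= 0)
        m->state[old] = BUF_FREE;
    m->state[buf] = BUF_PENDING;
}

/* vblank: the pending buffer becomes front and the old front is released
 * (rule 1, seen from the buffer that is flipped away from). */
static void model_vblank(struct swap_model *m)
{
    int pending = find_state(m, BUF_PENDING);
    if (pending < 0)
        return;
    int front = find_state(m, BUF_FRONT);
    if (front >= 0)
        m->state[front] = BUF_FREE;
    m->state[pending] = BUF_FRONT;
}
```

Starting from one front and one back buffer and replaying the sequence above
(frame 1 completes, frame 2 completes, vblank, frame 3 completes),
model_get_back never returns -1 in this model.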

So the formula for reserved buffers is
3 + number of frames that the GPU can be behind the CPU
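
As a worked check of the buffer counts, here is a small sketch with
hypothetical function names. The tear-free case is the formula above; the
tearing case is my reading of "front + back + N buffers for GPU queue" from the
"Tearing swap limit 0" paragraph.

```c
/* Hypothetical helper: tearing case, one front + one back + N queued. */
static unsigned reserved_buffers_tearing(unsigned gpu_frames_behind)
{
    return 2u + gpu_frames_behind;
}

/* Hypothetical helper: tear-free case, 3 + frames the GPU can be behind. */
static unsigned reserved_buffers_tear_free(unsigned gpu_frames_behind)
{
    return 3u + gpu_frames_behind;
}
```

With the GPU at most one frame behind, this gives 3 buffers for the tearing
case and 4 for the tear-free case, matching the numbers in the text.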

Pauli