Swap limit

Mon Dec 13 09:31:56 PST 2010

On Fri, Dec 10, 2010 at 10:50 AM, Pauli Nieminen
<ext-pauli.nieminen at nokia.com> wrote:
> On 10/12/10 15:40 +0100, ext Mario Kleiner wrote:
>> On 12/08/2010 05:15 PM, Pauli Nieminen wrote:
>> > On 08/12/10 16:55 +0100, ext Alex Deucher wrote:
>> >>
>> >> One other thing that might be worth adding to DRI2 is a way for the
>> >> driver to access the swap interval.  If we could, then the driver
>> >> could dynamically disable things like vline waits for buffer blits or
>> >> do non-vsynced pageflipping more easily if the swap interval was 0.
>> >>
>> >
>> > Actually I thought DRI2 was already telling SwapInterval with
>> >
>> > *swap_target = pPriv->last_swap_target + pPriv->swap_interval;
>> >
>> > If swap_target doesn't advance then driver would know to flip as fast as
>> > possible.
>> >
>> > But then I noticed a bug.
>> >
>> > First one is that we always blit when swapinterval is zero when it should be
>> > possible to flip (even tear free).
>> >
>>
>> Let it tear :). I think swap_interval == 0 should cause immediate swaps
>> with tearing. All OpenGL implementations i know (Windows, MacOS classic,
>> MacOS/X, Linux with proprietary drivers, old SGI's) interpret a
>> swapinterval of zero as "swap as soon as rendering is finished,
>> immediately, don't sync to scanout cycle" and i think that makes the
>> most sense. It's good for benchmarking how fast your system can go if
>> not throttled by the monitor (useful), for crazy gamers that trade
>> visual quality for fps and for special applications that need to
>> sometimes control swap timing themselves (like my toolkit).
>>
>
> You can run whole graphics pipeline without waiting for display and without
> tearing. That would provide maximum performance for benchmarks while still
> maintaining visual quality.
>
>> > Code from DRI2ScheduleSwap:
>> > /* Old DDX or no swap interval, just blit */
>> > if (!ds->ScheduleSwap || !pPriv->swap_interval) {
>> >
>>
>> Not a "bug", just a half-done way of achieving the expected
>> swap_interval zero ;-) [ok, maybe that is a bug].
>>
>
> If driver can't manage that without copying then driver can fall back to copy.
>
> Simple life isn't it? :)
>
>> It schedules an immediate copy-swap via blitting. Unfortunately the ddx
>> doesn't know about the swap_interval, so it still synchronizes the
>> execution of the blit to vsync via vline waits. That's tear-free, but it
>> depends on the location and size of the drawable and the current
>> position of the scanout if this will cause an immediate swap (if scanout
>> is outside the drawables area) or a vsync'ed swap. It's a bit undefined
>> behaviour for non-fullscreen drawables and it effectively enforces a
>> minimum swap interval of 1 for fullscreen drawables, which is not what
>> we want.
>>
>> Both the intel ddx and (soon) the ati ddx have a xorg.conf parameter
>> "SwapBuffersWait" to disable vsync completely, then you'd get copy-swaps
>> as fast as possible for swap_interval zero, but for a non-zero
>> swap_interval you'd have some chance of tearing, so this is also just a
>> band-aid.
>>
>
> I agree.
>
>> If we wanted this for page-flipping we'd need a shortcut similar to this
>> one which would bypass the vblank scheduling and call the ddx pageflip
>> routine directly, plus an extension to the kernel's pageflip ioctl() to
>> allow non-vsynced flips.
>>
>> And we need a new interface to tell the ddx that a swap should be
>> non-vsync'ed.
>>
>> -mario
>
> Isn't it simple that if swap_target is current vblank count then flip should
> happen immediately. I don't see any need for special path. With the MSC hack
> in that path swap_target will be always current frame for drivers that
> support GetMSC. For others swap_target stays same all the time for the
> drawable.
>
> "Tearing swap limit 0"
>
> Just toggle hw to mode that it flips immediately after writing to flip
> registers. Now you can just reserve a buffer to be front, a buffer to back
> and N buffers for GPU queue. In practice that would mean 3 buffers as
> application doesn't run over 1 frame ahead of GPU.
>
> "Tear free swap limit 0"
>
> How to do non-blocking flipping without forcing the flip to happen while
> scanout happens.
>
> First there is only 2 buffers used. One for front and one for back.
>
> 1st frame comes to schedule swap
>
> Driver queues flip to happen after frame completes. Flip will take effect in
> next vblank. Now there is need to allocate one more buffer and increase swap
> limit because we have 2 buffers reserved as front.
>
> 1st frame completes -> flip is actually written to hw
>
> 2nd frame comes to schedule swap
>
> Driver queues flip to pipeline that will overwrite first flip if frame
> completes before next vblank. Because no buffers have been freed by hw by now
> we have to allocate one more buffer. Now we are up to 4 buffers already.
>
> 2nd frame completes -> flip is actually written to hw
>
> Now the back buffer for 1st frame is free and we can reuse it. It never ended
> up to display.
>
> 3rd frame comes to schedule swap
>
> We can reuse buffer so we can stay at 4 buffers.
>
> vblank makes 2nd frame visible on screen.
>
> Now original front buffer is free and ready to be reused. But back buffer for
> 2nd frame is going to stay as front until next vblank.
>
> 3rd frame completes -> flip
>
> If GPU is never over one frame behind CPU we can manage with 4 buffers
> without blocking.
>
> Front buffer is always reserved until vblank event that flips away from the
> buffer.
>
> Back buffer is reserved when getbuffers is called. Reservation continues
> until one of following conditions is fulfilled:
> 1. Back is turned to be front in vblank so reservation ends for that buffer
> after next flip.
> 2. If the next frame completes before vblank this back never got to screen
> and it is free.
>
> So formula for reserved buffers is
> 3 + number of frames that GPU can be behind CPU
>
> Pauli

As everything is serialized, the question is whether you want to
synchronize or not. If you don't want to synchronize, just use 2
buffers and draw & swap between them; the worst case is that some
rendering happens on a buffer that is still being scanned out. This is
basically benchmark mode.

If you want to present full frames without glitches, then you likely
want to rate limit your application to the refresh rate; in that case
you need at most 3 buffers. Buffer A is the one currently displayed,
buffer B is the one for which rendering is already scheduled and which
will be swapped in once rendering is done, and buffer C is the one for
the next rendering. So userspace never waits; buffer C is always
available. Buffer A becomes buffer C once buffer B is scanned out, and
C becomes B.
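
To make the rotation concrete, here is a rough C sketch. The struct and
function names are made up for illustration and are not taken from the
DRI2 code or any ddx. It also assumes that B becomes the displayed
buffer at the moment its flip takes effect, which is how I read the
scheme above.

```c
#include <assert.h>

/* Hypothetical names, for illustration only (not from the X server or any ddx). */
struct swap_chain {
    int front;   /* buffer A: currently displayed */
    int queued;  /* buffer B: rendering scheduled, flips once rendering is done */
    int render;  /* buffer C: free for the next rendering */
};

static void swap_chain_init(struct swap_chain *sc)
{
    sc->front = 0;
    sc->queued = 1;
    sc->render = 2;
}

/* Call once the flip to the queued buffer has taken effect. */
static void swap_chain_flip_done(struct swap_chain *sc)
{
    int old_front = sc->front;

    sc->front = sc->queued;   /* B is now on screen */
    sc->queued = sc->render;  /* C becomes B */
    sc->render = old_front;   /* A becomes C */

    /* The three roles always refer to three distinct buffers. */
    assert(sc->front != sc->queued &&
           sc->queued != sc->render &&
           sc->render != sc->front);
}
```

After three completed flips every buffer is back in its original role,
so the application always has a render buffer without waiting, as long
as it is rate limited to the refresh rate.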

The case you are describing is when you don't want to rate limit but
still want complete, non-glitched frames. I think this use case is
useless if we implement the tearing swapbuffer. With a tearing
swapbuffer, a buffer with a full picture can become current right away,
and thus right after the swap you can queue rendering to the previous
buffer. I am pretty sure this is what games want, so they know that
they go full speed; they can also try to frame limit by asking for
vsync. Thus I think what Mario asks for is what we should try to
implement. This would need a new ioctl for pageflip (asking whether the
flip should be vsynced or not) and maybe also for the vblank stuff.
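
As a rough idea of what the userspace side could look like, here is a
small C sketch. Everything in it is hypothetical: the flag names, their
values and the helper are placeholders for discussion, not an existing
kernel or libdrm interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for a pageflip request; not an existing interface. */
#define SKETCH_FLIP_EVENT   0x01u  /* send a completion event to userspace */
#define SKETCH_FLIP_NOVSYNC 0x02u  /* flip immediately, tearing allowed */

/*
 * Pick the flags for a flip request from the drawable's swap interval.
 * Interval 0 asks for an immediate, non-vsynced flip; any other interval
 * keeps the flip synchronized to vblank.
 */
static uint32_t sketch_flip_flags(int swap_interval)
{
    uint32_t flags = SKETCH_FLIP_EVENT;

    if (swap_interval == 0)
        flags |= SKETCH_FLIP_NOVSYNC;

    return flags;
}
```

The point is only that the swap interval has to reach the place where
the flip is issued; how the flag is actually passed down to the kernel
is open.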

Cheers,
Jerome