Fence Sync patches

Fri Dec 3 11:16:43 PST 2010

On Fri, 2010-12-03 at 10:13 -0800, James Jones wrote:

> I wrote a slide deck on synchronization and presentation ideas for X a year 
> ago or so before starting this work:
> 
> http://people.freedesktop.org/~aplattner/x-presentation-and-synchronization.pdf
> 
> Aaron presented it at XDevConf last year.  However, that doesn't really cover 
> the immediately useful function for GL composite managers: 
> XDamageSubtractAndTrigger().  I plan on putting some patches to compiz 
> together demonstrating the usage, but basically the flow is:
> 
> -Create a bunch of sync objects at startup time in your GL/GLES-based 
> compositor, and import them into your GL context as GL sync objects.  I'll 
> call those syncObjectsX[] and syncObjectsGL[] respectively.
> 
> -Rather than calling XDamageSubtract(), call 
> XDamageSubtractAndTrigger(syncObjectsX[current]).

So the basic flow here is:

 Client => X server     [render this]
 X server => GPU        [render this]
 X server => compositor [something was rendered]
 compositor => X server [trigger the fence]
 compositor => GPU      [render this after the fence]
 X server => GPU        [trigger the fence]

In the normal case, where there is a single damage event per frame, the
extra round trip bothers me: the compositor has to go back to the X
server, and the X server then has to go back to the GPU.
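
To put a number on that, here is a toy model of the flow listed above.
It only encodes the six messages as data and counts the ones that happen
after the damage event; the type and function names are made up for
illustration and nothing here talks to a real X server or GPU.

```c
#include <stddef.h>

/* Participants in the flow (names are illustrative only). */
typedef enum { CLIENT, XSERVER, COMPOSITOR, GPU } Actor;

typedef struct {
    Actor from;
    Actor to;
    const char *what;
} Hop;

/* The six messages from the flow, in order. */
const Hop fence_flow[] = {
    { CLIENT,     XSERVER,    "render this" },
    { XSERVER,    GPU,        "render this" },
    { XSERVER,    COMPOSITOR, "something was rendered" },  /* damage event */
    { COMPOSITOR, XSERVER,    "trigger the fence" },
    { COMPOSITOR, GPU,        "render this after the fence" },
    { XSERVER,    GPU,        "trigger the fence" },
};
const size_t fence_flow_len = sizeof fence_flow / sizeof fence_flow[0];

/* Index of the damage event in fence_flow. */
#define DAMAGE_EVENT_INDEX 2

/* Number of messages sent after the damage event. */
size_t hops_after_damage(size_t len, size_t damage_idx)
{
    return len - damage_idx - 1;
}

/* Of those, how many have the X server as sender or receiver. */
size_t xserver_hops_after_damage(const Hop *flow, size_t len,
                                 size_t damage_idx)
{
    size_t count = 0;
    for (size_t i = damage_idx + 1; i < len; i++) {
        if (flow[i].from == XSERVER || flow[i].to == XSERVER)
            count++;
    }
    return count;
}
```

In this model three messages follow the damage event, and two of them
pass through the X server. Without the fence only the compositor => GPU
message would remain.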

It's perhaps especially problematic in the case of the open source
drivers, where the synchronization is already handled correctly without
this extra work, and the extra work would just be a complete waste of
time. [*]

But it doesn't seem like a particularly efficient or low-latency way of
handling things even in the case of a driver with no built-in
synchronization.

Can you go into the reasoning for this approach?  

> -Prefix all the GL rendering that repairs the damage subtracted with a sync 
> wait: glWaitSync(syncObjectsGL[current++])
> 
> The GL rendering will then wait (on the GPU.  It won't block the application 
> unless it gets really backed up) until all rendering that created the damage 
> has finished on the GPU.  Managing the ring-buffer of sync objects is a little 
> more complicated than that in practice, but that's the basic idea. 

Can you be more specific about that? Do you need to do a
glClientWaitSync() when you wrap around and reuse the first sync object
pair?
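
To make the question concrete, here is a toy model of the wrap-around
case as I imagine it. This is a guess, not something confirmed in this
thread: the ring size, the struct fields, and every function name are
hypothetical, and the real X and GL calls are only mentioned in comments.
The model assumes a slot cannot be reused until the GPU has passed the
wait queued on it, so reusing a busy slot costs a CPU-side wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_FENCES 4  /* hypothetical ring size */

/* Stands in for one syncObjectsX[i] / syncObjectsGL[i] pair. */
typedef struct {
    bool trigger_requested;  /* XDamageSubtractAndTrigger() was issued */
    bool gpu_wait_queued;    /* glWaitSync() was issued */
    bool gpu_wait_retired;   /* the GPU has passed that wait */
} FencePair;

typedef struct {
    FencePair ring[NUM_FENCES];
    size_t current;          /* next slot to hand out */
    unsigned client_waits;   /* times the CPU would have had to block */
} FenceRing;

void fence_ring_init(FenceRing *r)
{
    for (size_t i = 0; i < NUM_FENCES; i++)
        r->ring[i] = (FencePair){ false, false, false };
    r->current = 0;
    r->client_waits = 0;
}

/* Pick the slot for the next frame. If the slot is still in flight,
 * model a blocking wait (assumption: real code might call
 * glClientWaitSync() here), then clear the slot for reuse. */
size_t fence_ring_acquire(FenceRing *r)
{
    size_t idx = r->current;
    FencePair *f = &r->ring[idx];

    if (f->gpu_wait_queued && !f->gpu_wait_retired) {
        r->client_waits++;
        f->gpu_wait_retired = true;
    }
    *f = (FencePair){ false, false, false };
    return idx;
}

/* Model the trigger request plus the GPU-side wait for one frame,
 * then advance to the next slot. */
void fence_ring_submit(FenceRing *r, size_t idx)
{
    assert(idx == r->current);
    r->ring[idx].trigger_requested = true;
    r->ring[idx].gpu_wait_queued = true;
    r->current = (r->current + 1) % NUM_FENCES;
}

/* Model the GPU getting past the wait queued on slot idx. */
void fence_ring_retire(FenceRing *r, size_t idx)
{
    assert(idx < NUM_FENCES);
    r->ring[idx].gpu_wait_retired = true;
}
```

In this model, submitting NUM_FENCES frames without the GPU retiring any
of them makes the next fence_ring_acquire() count one client wait. If
the GPU keeps up, the counter stays at zero. Whether the real scheme
behaves this way is exactly what I'm asking.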

[...]

> I admit this isn't an ideal work-flow, and yes it is one more layer of hard-
> to-test voodoo needed to write a robust TFP/EGLimage based composite manager, 
> but it's the best we can do without modifying client applications.  However, 
> fence sync objects provide a basis for all kinds of cooler stuff once you 
> start defining new ways that client applications can use them to notify the 
> composite manager when they've finished rendering a frame explicitly.  Then 
> the extra step of telling X you want notification when some rendering you've 
> already been notified of has completed will go away.  The rendering 
> notification (damage event) and a pointer of some sort to the sync object that 
> tracks it will arrive together.  That's what I'll be working on after the 
> initial object support is wrapped up.

It worries me to see a pretty complex, somewhat expensive band-aid going
in *without* knowing more about that long term picture. Obviously if the
fence objects are useful for other things, then that reduces the
complexity of the band-aid a bit.

- Owen

[*] If it was documented that a Damage event didn't imply the rendering
had hit the GPU, then the X server could be changed not to flush
rendering before sending damage events. In the normal case where the
rendering is just a single glXSwapBuffers() or XCopyArea() that doesn't
actually improve efficiency, but it does slightly reduce extra work from
the fence. On the other hand, that would change this exercise from
"fixing a corner case that misrenders on one driver" to "breaking every
non-updated compositing manager".