Fence Sync patches
jajones at nvidia.com
Fri Dec 3 13:08:17 PST 2010
On Friday 03 December 2010 11:16:43 am Owen Taylor wrote:
> On Fri, 2010-12-03 at 10:13 -0800, James Jones wrote:
> > I wrote a slide deck on synchronization and presentation ideas for X a
> > year ago or so before starting this work:
> > http://people.freedesktop.org/~aplattner/x-presentation-and-synchronization.pdf
> > Aaron presented it at XDevConf last year. However, that doesn't really
> > cover the immediately useful function for GL composite managers:
> > XDamageSubtractAndTrigger(). I plan on putting some patches to compiz
> > together demonstrating the usage, but basically the flow is:
> > -Create a bunch of sync objects at startup time in your GL/GLES-based
> > compositor, and import them into your GL context as GL sync objects.
> > I'll call those syncObjectsX and syncObjectsGL respectively.
> > -Rather than calling XDamageSubtract(), call
> > XDamageSubtractAndTrigger(syncObjectsX[current]).
> So the basic flow here is:
> Client => X server [render this]
> X server => GPU [render this]
> X server => compositor [something was rendered]
> compositor => xserver [trigger the fence]
> compositor => GPU [render this after the fence]
> xserver => GPU [trigger the fence]
Roughly, but I think that ordering implies a very worst-case scenario. In
reality the last two steps will most likely occur simultaneously, and the last
step might even be a no-op: If the X server already knows the rendering has
completed it can simply mark the fence triggered immediately without going out
to the GPU. This is often the case in our driver, though I haven't
implemented that particular optimization yet.
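To make that optimization concrete, here is a rough sketch of the decision the server would make when it receives the trigger request. It is a toy model, not driver code: struct gpu_state, struct fence, and server_trigger_fence() are hypothetical names, and the pair of sequence counters is just an assumed way for the server to know whether earlier rendering has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of what the server knows about the GPU. */
struct gpu_state {
    uint64_t submitted;  /* sequence number of the last work submitted */
    uint64_t completed;  /* sequence number of the last work known done */
};

/* Hypothetical stand-in for a fence sync object. */
struct fence {
    bool triggered;      /* fence is already in the triggered state */
    bool queued_on_gpu;  /* a trigger command was sent to the GPU */
};

/* Handle a trigger request for fence f. */
static void server_trigger_fence(const struct gpu_state *gpu, struct fence *f)
{
    assert(gpu != NULL && f != NULL);

    if (gpu->completed >= gpu->submitted) {
        /* Nothing outstanding: mark the fence triggered right away
         * without going out to the GPU at all. */
        f->triggered = true;
        f->queued_on_gpu = false;
    } else {
        /* Rendering still in flight: let the GPU trigger the fence
         * once it reaches this point in its command stream. */
        f->triggered = false;
        f->queued_on_gpu = true;
    }
}
```

In the first branch the last step of the flow above costs nothing on the GPU side, which is the no-op case I mentioned.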
> In the normal case where there is a single damage event per frame, the
> fact that we have this round trip where the compositor has to go back to
> the X server, and the X server has to go back to the GPU bothers me.
I'd like to point out that it's not really a round trip, but rather two trips to
the same destination in parallel. A round trip would add more latency.
> It's perhaps especially problematic in the case of the open source
> drivers where the synchronization is already handled correctly without
> this extra work and the extra work would just be a complete waste of
> time. [*]
The default implementation assumes the Open Source driver behavior and marks
the fence triggered as soon as the server receives the request, so the only
added time will be a single asynchronous X request if the open source OpenGL-
side implementation is done efficiently.
> But it doesn't seem like a particularly efficient or low-latency way of
> handling things even in the case of a driver with no built in
> Can you go into the reasoning for this approach?
As I said, this definitely isn't the ideal approach, but it's the best fully
backwards-compatible approach we could come up with. Things we considered:
-Add a GL call to wait on the GPU for the damage event sequence number. We
got bogged down here worrying about wrapping of 32-bit values, the lack of
ability to do a "wait for >=" on 64-bit values on GPUs, and the discussion
rat-holed. This was discussed on IRC so long ago I don't even remember all
-Have the server generate a sync object ID, trigger it, and send that with the
damage event if clients opt in somehow. This seemed very anti-X design
(clients should create, or at least name, transient resources), and has the
potential of generating tons and tons of objects if the client forgets to
delete them or can't keep up with the damage events. Also, importing X sync
objects to GL is expensive, so it's desirable to make that a startup-time
-Have the client provide the ringbuffer of objects to X and have it figure out
which one to trigger on every damage event. I don't think I ever discussed
this with anyone. I dismissed it as hiding too much magic in X.
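For reference, the 32-bit wrapping worry in the first option is the kind of thing usually handled on the CPU with a windowed comparison. The sketch below is a generic illustration and not something that came out of the IRC discussion; seq_at_or_after() is a made-up name, and it only gives the right answer while the two sequence numbers are less than 2^31 apart. It also does nothing for the other half of the problem, the lack of a "wait for >=" operation on the GPU side.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if sequence number a is at or after b, treating the
 * 32-bit counter as circular. Assumes a and b are less than 2^31
 * apart; beyond that the result is meaningless. */
static bool seq_at_or_after(uint32_t a, uint32_t b)
{
    /* Unsigned subtraction wraps modulo 2^32, so a small forward
     * distance stays small even when the counter has wrapped. */
    return (uint32_t)(a - b) < UINT32_C(0x80000000);
}
```

With this, seq_at_or_after(2, 0xFFFFFFFE) is true because 2 is four steps past 0xFFFFFFFE once the counter wraps, while a plain 2 >= 0xFFFFFFFE is false.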
> > -Prefix all the GL rendering that repairs the damage subtracted with a
> > sync wait: glWaitSync(syncObjectsGL[current++])
> > The GL rendering will then wait (on the GPU. It won't block the
> > application unless it gets really backed up) until all rendering that
> > created the damage has finished on the GPU. Managing the ring-buffer of
> > sync objects is a little more complicated than that in practice, but
> > that's the basic idea.
> Can you be more specific about that? Do you need to do a
> glClientWaitSync() when you wrap around and reuse the first sync object
Yeah, that's about it.
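A rough sketch of the bookkeeping, with the X and GL calls mocked out so it stands alone. Everything here is hypothetical (NUM_FENCES, struct fence_slot, struct fence_ring, ring_begin_repair(), ring_gpu_passed()); the comments note where a real compositor would presumably make the actual calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_FENCES 4  /* hypothetical ring size */

/* Stand-in for one X fence and the GL sync object imported from it. */
struct fence_slot {
    bool in_use;    /* a trigger and a GPU wait were issued on this slot */
    bool signaled;  /* the GPU has gotten past the wait on this slot */
};

struct fence_ring {
    struct fence_slot slots[NUM_FENCES];
    size_t current;         /* next slot to hand out */
    unsigned client_waits;  /* how often reuse had to block */
};

/* Claim the next slot for one damage-repair pass and return its index. */
static size_t ring_begin_repair(struct fence_ring *ring)
{
    size_t i = ring->current;
    struct fence_slot *slot = &ring->slots[i];

    if (slot->in_use && !slot->signaled) {
        /* We wrapped onto a fence the GPU has not passed yet. A real
         * compositor would block here, e.g. with glClientWaitSync().
         * The mock just counts the wait and treats it as finished. */
        ring->client_waits++;
        slot->signaled = true;
    }

    /* Reset the fence before reuse, then (in a real compositor) call
     * XDamageSubtractAndTrigger() on the X object and glWaitSync() on
     * the matching GL object before the repair rendering. */
    slot->signaled = false;
    slot->in_use = true;

    ring->current = (i + 1) % NUM_FENCES;
    return i;
}

/* Mock of the GPU getting past the wait queued on slot i. */
static void ring_gpu_passed(struct fence_ring *ring, size_t i)
{
    assert(i < NUM_FENCES);
    ring->slots[i].signaled = true;
}
```

As long as the GPU keeps up, client_waits stays at zero and the application never blocks. It only goes up when the ring wraps onto a slot whose wait is still outstanding, which is the "really backed up" case.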
> > I admit this isn't an ideal work-flow, and yes it is one more layer of
> > hard-to-test voodoo needed to write a robust TFP/EGLimage based
> > composite manager, but it's the best we can do without modifying client
> > applications. However, fence sync objects provide a basis for all kinds
> > of cooler stuff once you start defining new ways that client
> > applications can use them to notify the composite manager when they've
> > finished rendering a frame explicitly. Then the extra step of telling X
> > you want notification when some rendering you've already been notified
> > of has completed will go away. The rendering notification (damage
> > event) and a pointer of some sort to the sync object that tracks it will
> > arrive together. That's what I'll be working on after the initial
> > object support is wrapped up.
> It worries me to see a pretty complex, somewhat expensive band-aid going
> in *without* knowing more about that long term picture. Obviously if the
> fence objects are useful for other things, then that reduces the
> complexity of the band-aid a bit.
While I don't have the code changes that rely on this change ready for the
"cooler stuff" yet, one such future application, multi-buffering, is discussed
in the second half of the PDF link I sent in the last response. I understand
there is some hesitance to reintroduce a multi-buffered approach to rendering
in X when the previous multibuffer extension was mostly (completely?) unused
and it can bloat memory footprints, but I do think layering it on top of
composite and taking into account the efficiency gained by offloading the
buffer swapping to composite managers makes it a lot more interesting. Multi-
buffering also allows true tear-free rendering in X. Right now, composite
provides double buffering, but doesn't eliminate all tearing because
applications can be asynchronously rendering to the composite backing buffer
while the composite manager is texturing from it. Applications eliminate most
of that by doing their own double-buffering: They allocate a window-size
pixmap, render to it, then blit it all to the window/composite backing buffer
at once. However, that blit is wasteful when the composite manager is just
going to then blit the contents to the screen. All that's really needed in
most cases is to switch which backing pixmap the composite manager textures
from, but that needs to be supported in the composite protocol.
Even if the application doesn't want multiple buffers, it could use sync
objects to properly mutex accesses to a single backing buffer with the
Fence syncs can also be used as a more powerful, API-agnostic version of
glXWaitX()/glXWaitGL(). While glXWaitX() waits for X rendering on a particular
display to complete before allowing GL rendering on a particular context to
continue (and vice-versa for glXWaitGL()), fence sync objects can operate
across any asynchronous rendering stream on an X screen. A multi-threaded
client with one display per thread, one for X, one for GL, could synchronize
the two using fence sync objects.
In general I believe explicit back-end synchronization objects are a powerful
tool. I don't doubt there are more uses for them out there than I can
enumerate at this time.
> - Owen
> [*] If it was documented that a Damage event didn't imply the rendering
> had hit the GPU, then the X server could be changed not to flush
> rendering before sending damage events. In the normal case where the
> rendering is just a single glXSwapBuffers() or XCopyArea() that doesn't
> actually improve efficiency, but it does slightly reduce extra work from
> the fence. On the other hand, that would change this exercise from
> "fixing a corner case that misrenders on one driver" to "breaking every
> non-updated compositing manager".
It's been noted several times that X damage events only guarantee subsequent X
rendering (and by extension, any rendering of extensions that are defined to
occur in-band with X rendering, which GLX explicitly does not guarantee) will
happen after the damage has landed, and my updates to the damage documentation
explicitly document this.