Fence Sync patches

Fri Dec 3 14:14:34 PST 2010

On Friday 03 December 2010 12:18:10 pm Keith Packard wrote:
> On Fri, 03 Dec 2010 14:16:43 -0500, Owen Taylor <otaylor at redhat.com> wrote:
> > It's perhaps especially problematic in the case of the open source
> > drivers where the synchronization is already handled correctly without
> > this extra work and the extra work would just be a complete waste of
> > time. [*]
> 
> I have hesitated to argue against this plan as it may be perceived as
> corporate bias. But, I would sure like to see something better than an
> argument from authority as to why this is necessary.

I appreciate the effort to remain unbiased, but I always welcome any technical 
feedback.  In this case, the issues you bring up were already considered and 
discussed on IRC.

For my part, I regret that this work may be seen by some as an NVIDIA-backed 
attempt to dump functionality only needed by a closed-source driver on an 
under-staffed open-source project.  That is not my intention, and while the 
discussion has centered around the one initial application of the changes that 
currently only benefits our driver, I do think fence objects will be generally 
useful on all platforms.  Further, I'll be around to support this code, answer 
questions about it, etc, as long as NVIDIA keeps paying me, and probably even 
if they don't just because I find it interesting.  Working on this code has 
also taught me a lot about the X code base and development process, and I hope 
I can help out more in the future with other issues too as a result.  I'll be 
applying for a fd.org account shortly.

> The trouble here is that all of the drivers we can look at don't need
> this to work efficiently, so there's no way to evaluate the
> design to see if it makes any sense at all.
> 
> Requiring changes to all compositing managers just to support one driver
> seems like a failure in design, and it will be fragile as none of it
> will matter until the compositing manager is run against that driver.

I disagree that this is a failure of design.  I always try to strike a balance 
between the most efficient solution for the end users and the constraints of 
development and maintenance.  And we're used to being the odd man out.  We 
regularly need to provide patches for, or at least point out, bugs in many 
projects that only happen on our drivers for whatever reason (we support a 
different set of GL extensions, we accelerate a different set of X render 
operations, older apps rely on non-compliant SGI weirdness, newer apps rely on 
non-compliant/undefined DRI/mesa/whatever behavior, etc.).  It's not in our 
interest to needlessly burden our users, but different isn't necessarily 
always bad.

> > But it doesn't seem like a particularly efficient or low-latency way of
> > handling things even in the case of a driver with no built in
> > synchronization.
> 
> I don't think it's all that different from the mechanisms used in the
> open source drivers, it's just that the open source drivers do the
> synchronization between multiple clients using the same objects
> automatically inside the kernel. It looks like there are about the same
> number of context switches, and a similar amount of user/kernel
> traffic. For the other drivers, using similar language, we do:
> 
>     Client => xserver      [render this]
>     X server => GPU        [render this, fence A]
>     X server => compositor [something was rendered]
>     compositor => xserver  [subtract damage]
>     compositor => GPU      [wait A, render this]
> 
> With the explicit fencing solution, nothing appears on the screen until
> the X server queues the fence trigger to the GPU and that gets executed,
> so it may be that another client-xserver context switch is required,
> once per frame.

Agreed, as stated in my response to Owen, there should be little difference 
overall.

> The question I have is that if these fence objects can be explicitly
> managed, why can't they be implicitly managed? Set a fence before
> delivering damage, wait for the fence before accessing those objects
> from another application, just as in the diagram above. The only
> rendering from the client that we're talking about is the back->front
> swap/copy operation, not exactly a high-frequency activity.
> 
> That doesn't depend on having a single GPU hardware ring, just on having
> a kernel driver that tracks these fences for each object to insert
> appropriate inter-application synchronization operations. Heck, we can
> even tell you which drawables have damage objects registered so that you
> could avoid fencing anything except the Composite buffers for each
> window.

A couple of things here:

The open source drivers have chosen to perform this implicit synchronization.  
However, it isn't required by any specification.  GLX explicitly notes this 
synchronization is not required in the second paragraph of the spec.  The 
burden is on clients to implement the synchronization with the assumption that 
they will know how much synchronization is needed, and which is the most 
efficient method of synchronization for their needs.  I think that was a very 
good design decision, and continues to be so.  Just because new applications 
and X extensions introduce new synchronization needs doesn't mean new forms of 
implicit synchronization should be shoe-horned in at the driver level.  
Rather, a clean form of explicit synchronization should be exposed to 
applications that they can use in any way they see fit.
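
As a rough illustration of the explicit model, here is a toy Python 
simulation.  It is only a sketch: it does not use the X Sync or GL APIs, and 
`Fence`, `Channel`, and `run` are made-up names for this example.  It assumes 
each client has its own in-order GPU command queue and that a wait on an 
untriggered fence stalls only the queue that issued it.

```python
from collections import deque


class Fence:
    """Hypothetical fence object: untriggered until the GPU reaches its trigger."""

    def __init__(self):
        self.triggered = False


class Channel:
    """One client's in-order GPU command queue (an assumption of this model)."""

    def __init__(self, name):
        self.name = name
        self.commands = deque()

    def render(self, label):
        self.commands.append(("render", label))

    def trigger(self, fence):
        self.commands.append(("trigger", fence))

    def wait(self, fence):
        self.commands.append(("wait", fence))


def run(channels):
    """Round-robin over the channels, executing one command per turn.

    A channel whose next command waits on an untriggered fence is skipped
    until some other channel triggers that fence.
    """
    log = []
    while any(ch.commands for ch in channels):
        progressed = False
        for ch in channels:
            if not ch.commands:
                continue
            op, arg = ch.commands[0]
            if op == "wait" and not arg.triggered:
                continue  # stalled; try the next channel
            ch.commands.popleft()
            progressed = True
            if op == "render":
                log.append(f"{ch.name}: {arg}")
            elif op == "trigger":
                arg.triggered = True
        if not progressed:
            raise RuntimeError("deadlock: all channels wait on untriggered fences")
    return log


xserver = Channel("xserver")
compositor = Channel("compositor")
fence_a = Fence()

# The X server queues the client's rendering, then the fence trigger.
xserver.render("client drawing")
xserver.trigger(fence_a)

# The compositor waits on the fence before touching the window contents.
compositor.wait(fence_a)
compositor.render("composite window")

# Schedule the compositor first on purpose; the fence still orders the work.
log = run([compositor, xserver])
print(log)
```

Even with the compositor's queue scheduled first, its rendering lands after 
the client's drawing, and only the application that cares about the ordering 
pays for the wait.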

In theory, implicit synchronization would be possible in our driver model.  
However, it would almost certainly be more work than adding and maintaining 
this X extension and modifying a half-dozen composite managers to use it (I 
consider the composite manager modifications needed fairly trivial, and as I 
mentioned, I'll be updating at least one of them myself as an example).  Our 
kernel driver has very, very little awareness of surface management.

Thanks,
-James

> > [*] If it was documented that a Damage event didn't imply the rendering
> > had hit the GPU, then the X server could be changed not to flush
> > rendering before sending damage events.
> 
> Doing the flush is a recent change; we had similar rendering issues with
> open source drivers until recently.