GLX redirection extension

Fri Jul 21 21:19:05 PDT 2006

On Fri, 2006-07-21 at 08:44 -0700, Ian Romanick wrote:
> 
> Felix Bellaby wrote:
> > On Thu, 2006-07-20 at 13:57 -0400, Kristian Høgsberg wrote:
> >>On 7/20/06, Felix Bellaby <felix at bellaby.plus.com> wrote:
> >>
> >>>I think that I may have found a potentially useful way out of some
> >>>problems within the current compositing framework. Basically, the server
> >>>has information that the compositor would find useful.
> >>
> >>I'm not sure what problems you're trying to solve here, but we don't
> >>need extra protocol for redirecting GLX.  When a window is redirected
> >>with COMPOSITE, all rendering to that window is supposed to go to an
> >>offscreen pixmap.
> > 
> > That is the principal problem. Rendering GL directly onto the screen is
> > much faster than rendering into a pixmap and copying the pixmap onto the
> > screen, especially if the second operation is performed using a texture
> > map. The current situation allows you to use a composited desktop with
> > 2D applications _OR_ use GL for serious applications like 3D modeling.
> > You can not do both at the same time.
> > 
> > With GLX redirection, the compositor could allocate areas of the screen
> > for use by GL applications and they could run at full speed unhampered
> > by the draw -> pixmap -> texture -> screen set up. This is how direct
> > GLX has worked under X in the past, and I can not see how this technique
> > can be used in the future unless the compositor treats different apps in
> > different ways based on their GLX requests. 
> 
> Right now rendering to pixmaps is not accelerated for GL.  We need to
> fix that.  Kristian and I spent some time talking about how this might
> be accomplished at DDC.  The "only" thing that we really need is a way
> for the DRI driver, whether it's loaded in the client or in the server, to:
> 
> 1. Find out where the off-screen pixmap is located in card memory.  This
> may require the server to migrate the pixmap back to the card.
> 
> 2. Ask the core server to not migrate the pixmap off the card while it
> is doing 3D rendering.
> 
> Once we can do those two things, the DRI driver will treat the pixmap
> the same way that it treats any other rendering target.  Since the data
> is still in a pixmap, the compositor can treat it just like any other window.

I was not worried about how to accelerate rendering into an off-screen
Pixmap, though it is obviously an important step. Rather, I was thinking
about integrating accelerated on-screen rendering into a composited
environment. On-screen rendering will always be quicker than off-screen
rendering + transferring to the screen. I do not see how on-screen
rendering can be accommodated within the existing COMPOSITE framework.

> The big trick in the compositor is that you have to make sure the IDs
> used for the hidden FBOs don't conflict with the IDs of FBOs used by the
> application.  Since the application can specify any 32-bit integer for
> an FBO ID that it wants, this is non-trivial.

I guess that might need a spec change to reserve some high end of the
FBO range. I doubt that it is much of an issue in practice. Clients are
unlikely to use a scattergun approach to choosing IDs, especially with
glGenFramebuffersEXT available to hand out unused names.
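
To make the idea of reserving the high end of the FBO range concrete,
here is a minimal sketch in C. Nothing in it comes from the GL or GLX
specs: the boundary RESERVED_FBO_BASE, the allocator type and all of the
function names are hypothetical, chosen only for illustration. It just
shows how a split of the 32-bit ID space could keep hidden FBO IDs and
application-chosen IDs apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boundary: IDs at or above this value would be reserved
 * for hidden FBOs and refused to applications. */
#define RESERVED_FBO_BASE 0xFFFF0000u

/* True if the ID falls inside the (assumed) reserved range. */
static bool fbo_id_is_reserved(uint32_t id)
{
    return id >= RESERVED_FBO_BASE;
}

/* Check that would run when an application binds an ID of its own choice. */
static bool client_may_bind_fbo(uint32_t id)
{
    return !fbo_id_is_reserved(id);
}

/* Hands out hidden IDs from the top of the range downwards. */
typedef struct {
    uint32_t next;
} hidden_fbo_allocator;

static void hidden_fbo_allocator_init(hidden_fbo_allocator *a)
{
    assert(a != NULL);
    a->next = UINT32_MAX;
}

/* Returns a reserved ID, or 0 once the reserved range is exhausted.
 * 0 is safe as a failure value because it names the default framebuffer. */
static uint32_t hidden_fbo_alloc(hidden_fbo_allocator *a)
{
    assert(a != NULL);
    if (a->next < RESERVED_FBO_BASE)
        return 0;
    return a->next--;
}
```

With this split, an application asking for ID 1 would pass the check,
while one asking for 0xFFFF0000 would be refused. Whether refusing such
IDs is acceptable is exactly the spec question raised above.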

> The other issue is that we need to be able to do X rendering to a
> window.  There is no way to do X rendering to an FBO, you can only do GL
> rendering.  I think that's the deal breaker right there.

The lack of X rendering to FBOs does prevent them from providing a one
size fits all solution, but my proposal was to enable the compositor to
identify the kind of drawing that was to occur, and so get round this
deal breaker.

> > Furthermore, rendering into an off screen pixmap currently fails to take
> > full advantage of the capabilities of GL hardware. If the compositor /
> > server responded to CreateWindow requests by offering a framebuffer
> > object using a texture as the renderbuffer then the drawing could go
> > directly into the texture with no intermediate pixmap. SwapBuffer
> > requests on the GLXWindow drawable could move the rendering into a
> > different texture. This would probably provide the fastest possible
> > implementation of texture based compositing available on existing
> > hardware, and it could be done using redirected GLX in combination with
> > existing GLX/GL drivers. I am not convinced that patching the drivers
> > will achieve the same speeds using texture_from_pixmap.
> 
> I thought about that route too.  After all, I'm in the framebuffer object
> working group in the ARB. :) The difficulty with going this route is
> that it adds complexity to the compositor and the server-side GLX
> implementation.

The complexity has to go somewhere. I would rather have it in the open
source code in the compositor, where lots of GL developers will see it.
I do not want it ending up inside closed source proprietary drivers.

>   Since we need to have server-side accelerated rendering
> to pixmaps anyway, I don't think going the framebuffer object route
> really buys us anything.

It depends on how well GL and X get on together rendering into the same
framebuffer. The fact that only GL can render into FBOs should open up
performance gains. I find GL rendering into an FBO texture is currently
25% faster than accelerated GL into a Pixmap on my ultra-cheap nvidia
6150. Of course, that might change, but it is suggestive.

> > Sharing Pixmaps between the applications doing their drawing and the
> > compositor drawing the screen can not be trivial when the applications
> > and compositor occupy separate processes. Telling the compositor what is
> > going on after the event using Damage reports works to a point, but the
> > GLX specs are deliberately vague about what happens when you share
> > drawables between uncoordinated processes. Perfectly compliant
> > implementations are free to respond in undefined ways, and may do so in
> > order to milk a bit more speed. 
> 
> The spec uses a lot of words like "implementation defined".  We're
> defining our implementation. :)

> > For example, the current nvidia drivers crash when GLXPixmaps are used
> > with Pixmaps that COMPOSITE has already discarded following the
> > unmap/resize of a window. nvidia have undertaken to fix this, but I do
> > not think that it is strictly a bug. The nvidia drivers also seem to
> > freeze GL applications when a compositor tries to access the COMPOSITE
> > Pixmaps directly using GL. Again, this may be a bug that they can fix,
> > but it may have a performance cost.
> > 
> > We seem set on the hope that every single GL driver developer will be
> > able to solve these kinds of problems on behalf of the compositor,
> > without losing performance or expecting any thanks. I think that it
> > might be more sensible and efficient to design the compositor to do
> > things using GLX spec compliant techniques, rather than leaving it
> > largely to others to try to fix things up behind the scenes in their
> > drivers.
> 
> I think you're reading too much into what the GLX spec says.  I don't
> think we're doing or trying to do anything that the spec says is
> illegal.  We are straying into some areas where the spec says the
> behavior is undefined or platform specific.  We're defining the way that
> we want our platform to work.  

I agree that what you are doing is not prohibited by the specs, and that
the specs must continuously evolve as practice finds reasons to extend,
refine and rewrite them. However, changes in the misty world of hardware
drivers are more likely to encounter problems than changes in the very
visible world of ordinary GL programming. 

My hope was that the existing specs might provide a common ground
between existing implementations from which to develop portable
compositing code without changing driver internals. 

> I can assure you that Apple and Microsoft both do the same thing on
> their platforms.

Now, there's an example to follow. :)

> >>  The two big issues with COMPOSITE today are that
> >>accelerated OpenGL (direct or indirect) and Xv don't respect this
> >>and render directly to the front buffer.
> > 
> > The latest nvidia drivers do perform accelerated GL drawing into off
> > screen pixmaps when compositing is enabled. However, this is slower than
> > doing it directly onto the screen back buffer. Xv renders directly onto
> > a screen overlay because this is the only way to render quality video at
> > acceptable framerates.
> 
> Up until the existence of the compositor, rendering to pixmaps was a
> very, *very* uncommon usage.  I suspect that the issue is that Nvidia
> has spent zero time optimizing this uncommon path. The other potential 
> issue that I see is that the compositor necessarily adds a certain
> amount of latency.

The nvidia code does look alpha and they might squeeze more speed out
given time. However, the latency and pixmap -> screen copying/mapping
are here to stay. Crude tests on my hardware suggest that 90% of the
extra drawing time taken by GL->Pixmap/texture->screen compared with
GL->screen occurs in the texture->screen stage.

> That's just the nature of that particular beast.

But it does not need to be. There is no reason why compositors can not
reserve areas of the screen for the special use of apps that need to do
special kinds of rendering. All that it needs is a bit of thought on how
to make it look pretty, and some means of telling the compositor that
some windows need special treatment.

Felix