Enhancements for Render composite request

Thu Sep 24 00:45:39 PDT 2009

Hi Keith,

Sorry for the slow follow-up on this thread.

There seems to be a lot of overlap between A) the synchronizing of
Composite rendering with vblank (your operation #2 below), and B) the
buffer presentation/synchronization issues Jesse Barnes put together here:

      http://dri.freedesktop.org/wiki/CompositeSwap

A few of us at NVIDIA discussed this a while ago.  I'll offer a few
thoughts for now, and I'm looking forward to the discussions at the
Linux Plumbers Conference and XDevConf.

      - A third problem today in the X Composited desktop world, beyond A)
        and B) above, is sequencing between X rendering to a redirected
        window, and a GLX_EXT_texture_from_pixmap user (such as Compiz).
        With at least the NVIDIA driver (and I suspect others?), the X
        driver and OpenGL driver have independent channels of communication
        to the GPU, with no guaranteed automatic ordering between them.

        Users often observe that X may send rendering commands for a
        pixmap to the GPU (using the X driver's GPU channel) and then
        send a damage event to the composite manager, which then uses
        TFP (texture_from_pixmap) to texture from the pixmap (using
        OpenGL's GPU channel).  Today, the texture operation may be
        performed before the X rendering is guaranteed to be complete.
        Without expensive stalls of the GPU, there is not a great way
        for the composite manager to express that the texture operation
        should happen *after* any specific X rendering.

        It would be nice to solve this problem as long as we're solving
        other related synchronization issues.

      - For best performance, we really want to keep the GPU and CPU busy
        in parallel; we shouldn't force the GPU to wait for the CPU,
        or vice versa, unnecessarily.  We should try to avoid things
        like waiting in the CPU for a GPU operation to complete before
        submitting more GPU work.  Instead, something like fence or sync
        objects could be used to express that one GPU channel should
        not progress past a certain point until another GPU channel has
        progressed to at least a certain point.

      - The OpenGL ARB_sync extension:

          http://www.opengl.org/registry/specs/ARB/sync.txt

        introduces the flexible concept of sync objects as a mechanism to
        express ordering between different GPU channels.  This seems like
        an interesting model to follow within the X Window System.

        In particular, the CPU can continue to submit commands to the GPU;
        that GPU channel just won't proceed past the sync object until
        the sync object is released.

      - On a lot of modern graphics hardware, the hardware itself can
        release a sync object at vblank time.  This avoids some of the
        latency concerns of getting the CPU involved to do:

          a) wait in CPU for vblank interrupt
          b) submit some commands to GPU to perform blit
          c) hope GPU is able to perform blit before vblank interval is over

        There are also power consumption benefits to not incurring the
        vblank interrupt.

      - The Render X extension doesn't really seem like the right place
        to describe syncing.  E.g., it doesn't address synchronization
        for core X primitives.
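
To make the sync object idea above a bit more concrete, here is a minimal
sketch in C.  It is only a toy model, not driver code: the Cmd and Channel
types, channel_step(), and demo_x_then_gl() are all hypothetical names made
up for illustration, and the "GPU" is just a loop that polls two command
queues.  The CPU queues all commands for both channels up front; the GL
channel then stalls at its wait until the X channel releases the sync
object, so the texture step lands after the X rendering without the CPU
waiting on anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only; every name below is hypothetical, not a real driver API. */
typedef enum { CMD_WORK, CMD_SIGNAL_SYNC, CMD_WAIT_SYNC } CmdType;

typedef struct {
    CmdType type;
    const char *label;  /* CMD_WORK: description of the GPU work */
    int sync;           /* CMD_SIGNAL_SYNC / CMD_WAIT_SYNC: sync object index */
} Cmd;

typedef struct {
    const Cmd *cmds;    /* commands the CPU has already queued */
    size_t count;
    size_t pc;          /* next command the "GPU" will execute */
} Channel;

#define MAX_SYNC 4
static bool sync_released[MAX_SYNC];

/* Execute at most one command; return false if the channel is idle or blocked. */
static bool channel_step(Channel *ch, const char **log, size_t *nlog)
{
    if (ch->pc >= ch->count)
        return false;

    const Cmd *c = &ch->cmds[ch->pc];
    switch (c->type) {
    case CMD_WAIT_SYNC:
        assert(c->sync >= 0 && c->sync < MAX_SYNC);
        if (!sync_released[c->sync])
            return false;   /* only this channel stalls; the CPU is not involved */
        break;
    case CMD_SIGNAL_SYNC:
        assert(c->sync >= 0 && c->sync < MAX_SYNC);
        sync_released[c->sync] = true;
        break;
    case CMD_WORK:
        log[(*nlog)++] = c->label;
        break;
    }
    ch->pc++;
    return true;
}

/* Queue everything up front, then let the "GPU" run.  The GL channel is
 * polled first on purpose: without its wait it would texture before X had
 * rendered.  The caller's log array must hold at least two entries. */
size_t demo_x_then_gl(const char **log)
{
    static const Cmd x_cmds[] = {
        { CMD_WORK,        "X: render to pixmap", 0 },
        { CMD_SIGNAL_SYNC, NULL,                  0 },
    };
    static const Cmd gl_cmds[] = {
        { CMD_WAIT_SYNC, NULL,                      0 },
        { CMD_WORK,      "GL: texture from pixmap", 0 },
    };
    Channel x  = { x_cmds,  2, 0 };
    Channel gl = { gl_cmds, 2, 0 };
    size_t nlog = 0;
    bool progressed;

    memset(sync_released, 0, sizeof sync_released);
    do {
        progressed  = channel_step(&gl, log, &nlog);
        progressed |= channel_step(&x, log, &nlog);
    } while (progressed);
    return nlog;
}
```

Calling demo_x_then_gl() with a two-entry log array records the X render
first and the GL texture step second, even though the GL channel gets the
first chance to run on every pass.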

The slides below go into a little more detail on some of the above,
and also include some brainstorming for possible ways to approach the
composite + buffer presentation issues, but they are far from complete.

      http://people.freedesktop.org/~aplattner/x-presentation-and-synchronization

Thanks,
- Andy

On Tue, 25 Aug 2009, Keith Packard wrote:

> The render composite request has a couple of glaring failures:
>
> 1) Only one rectangle per request. Apps generate a lot of protocol,
>    the server spends a lot of time decoding requests and the driver
>    has to merge requests back together to hand more than one polygon
>    to the hardware. It's interesting that exa (and hence uxa by
>    derivation) have a poly-rectangle composite operation in their
>    driver interface.
>
> 2) No vblank synchronization. Anyone wanting to double buffer 2D apps
>    has no way of avoiding tearing. I'd like this inside the X server
>    to make updates under a RandR transform sync to vblank.
>
> As operation 1) is already supported by the EXA API, and can be emulated
> in DIX by executing multiple one-rectangle composite requests, this
> seems easy to add to the protocol in a completely compatible fashion:
>
> COMPOSITERECT	[
> 			src-x, src-y:	INT16
> 			msk-x, msk-y:	INT16
> 			dst-x, dst-y:	INT16
> 			width, height:	CARD16
> 		]
>
> CompositeRectangles
>
> 	op:		PICTOP
> 	src:		PICTURE
> 	mask:		PICTURE or None
> 	dst:		PICTURE
> 	rects:		LISTofCOMPOSITERECT
>
> 	This request is equivalent to a sequence of Composite requests
> 	using the same op/src/mask/dst values and stepping through
> 	rects.
>
> It seems like operation 2) should be an option on the picture object;
> set a sync mode on the picture and all operations would be covered by
> that mode. It would be 'best effort', so that drivers not supporting the
> sync mode would simply skip it. The question is how fancy this option
> should be; in the simple case, we'd make it just avoid tearing, more
> complex cases could involve having sequential operations to the same
> picture wait for a specific frame number. I'd love to have comments on
> precisely which 'swap modes' would be useful here.
>
> -- 
> keith.packard at intel.com
>
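
For reference, the DIX-level emulation Keith describes for
CompositeRectangles (stepping through rects and issuing one Composite per
entry with the same op/src/mask/dst) might look roughly like the sketch
below.  The CompositeRect fields follow the COMPOSITERECT layout quoted
above; everything else (the PictOp and Picture stand-ins, the
CompositeOneFn hook, composite_rectangles(), and count_pixels()) is
hypothetical and is not the X server's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout follows the COMPOSITERECT proposal quoted above. */
typedef struct {
    int16_t  src_x, src_y;
    int16_t  msk_x, msk_y;
    int16_t  dst_x, dst_y;
    uint16_t width, height;
} CompositeRect;

/* Hypothetical stand-ins for PICTOP and PICTURE. */
typedef int PictOp;
typedef struct { int id; } Picture;

/* Hypothetical hook standing in for the existing one-rectangle Composite path. */
typedef void (*CompositeOneFn)(PictOp op, const Picture *src,
                               const Picture *mask, Picture *dst,
                               const CompositeRect *r, void *closure);

/* Emulate CompositeRectangles as a sequence of single Composite operations
 * that share op/src/mask/dst.  A NULL mask plays the role of None.
 * Returns the number of rectangles processed. */
size_t composite_rectangles(PictOp op, const Picture *src,
                            const Picture *mask, Picture *dst,
                            const CompositeRect *rects, size_t nrects,
                            CompositeOneFn composite_one, void *closure)
{
    assert(rects != NULL || nrects == 0);
    for (size_t i = 0; i < nrects; i++)
        composite_one(op, src, mask, dst, &rects[i], closure);
    return nrects;
}

/* Example hook: counts destination pixels instead of drawing anything.
 * The closure must point to a uint32_t accumulator. */
void count_pixels(PictOp op, const Picture *src, const Picture *mask,
                  Picture *dst, const CompositeRect *r, void *closure)
{
    (void)op; (void)src; (void)mask; (void)dst;
    *(uint32_t *)closure += (uint32_t)r->width * r->height;
}
```

A driver with a poly-rectangle composite entry point could take the whole
rects array in one call instead of going through the per-rectangle hook.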