Enhancements for Render composite request
aritger at nvidia.com
Thu Sep 24 00:45:39 PDT 2009
Sorry for the slow follow-up on this thread.
There seems to be a lot of overlap between A) the synchronizing of
Composite rendering with vblank (your operation #2 below), and B) the
buffer presentation/synchronization issues Jesse Barnes put together here:
A few of us at NVIDIA discussed this a while ago. I'll offer a few
thoughts for now, and I'm looking forward to the discussions at the
Linux Plumbers Conference and XDevConf.
- A third problem today in the X Composited desktop world, beyond A)
and B) above, is sequencing between X rendering to a redirected
window, and a GLX_EXT_texture_from_pixmap user (such as Compiz).
With at least the NVIDIA driver (and I suspect others?), the X
driver and OpenGL driver have independent channels of communication
to the GPU, with no guaranteed automatic ordering between them.
Users often observe that X may send rendering commands for a
pixmap to the GPU (using the X driver's GPU channel) and then
send a damage event to the composite manager, who will use TFP
to texture from the pixmap (using OpenGL's GPU channel). Today,
the texture operation may be performed before the X rendering is
guaranteed to be complete. Without expensive stalls of the GPU,
there is not a great way for the composite manager to express
that the texture operation should happen *after* any specific
X rendering.
It would be nice to solve this problem as long as we're solving
other related synchronization issues.
- For best performance, we really want to keep the GPU and CPU busy
in parallel... we shouldn't force the GPU to wait for the CPU,
or vice versa, unnecessarily. We should try to avoid things
like waiting in the CPU for a GPU operation to complete before
submitting more GPU work. Instead, something like fence or sync
objects could be used to express that one GPU channel should
not progress past a certain point until another GPU channel has
progressed to at least a certain point.
- The OpenGL ARB_sync extension introduces the flexible concept of
sync objects as a mechanism to
express ordering between different GPU channels. This seems like
an interesting model to follow within the X Window System.
In particular, the CPU can continue to submit commands to the GPU;
that GPU channel just won't proceed past the sync object until
the sync object is released.
- On a lot of modern graphics hardware, the hardware itself can
release a sync object at vblank time. This avoids some of the
latency concerns of getting the CPU involved to do:
a) wait in CPU for vblank interrupt
b) submit some commands to GPU to perform blit
c) hope GPU is able to perform blit before vblank interval is over
There are also power consumption benefits to not incurring the
CPU wakeup for each vblank interrupt.
- The Render X extension doesn't really seem like the right place
to describe syncing. E.g., it doesn't address synchronization
for core X primitives.
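To make the TFP sequencing problem and the sync-object idea concrete, here is a minimal C model, not driver code. Two GPU channels each drain their own queue, which the CPU has already filled. OP_SIGNAL/OP_WAIT are hypothetical names that only loosely mirror ARB_sync's glFenceSync/glWaitSync; Op, Channel, Gpu, step() and run() are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not driver code. OP_SIGNAL/OP_WAIT loosely mirror
 * ARB_sync's glFenceSync/glWaitSync; everything else is illustrative. */
typedef enum { OP_RENDER, OP_TEXTURE, OP_SIGNAL, OP_WAIT } OpKind;

typedef struct {
    OpKind kind;
    int sync;           /* sync object index, used by OP_SIGNAL/OP_WAIT */
} Op;

/* One GPU channel: a queue the CPU has already filled. */
typedef struct {
    const Op *ops;
    size_t count;
    size_t next;        /* next op the GPU will execute */
} Channel;

#define NUM_SYNCS 4

typedef struct {
    bool signaled[NUM_SYNCS];
    int pixmap_gen;     /* bumped by each X render into the pixmap */
    int sampled_gen;    /* pixmap generation the TFP texture op read */
} Gpu;

/* Advance one channel by one op. Returns false if the channel is done
 * or stalled on an unsignaled sync object; the stall affects only this
 * channel, never the other channel or the CPU. */
static bool step(Gpu *gpu, Channel *ch)
{
    if (ch->next == ch->count)
        return false;
    const Op *op = &ch->ops[ch->next];
    switch (op->kind) {
    case OP_WAIT:
        assert(op->sync >= 0 && op->sync < NUM_SYNCS);
        if (!gpu->signaled[op->sync])
            return false;
        break;
    case OP_SIGNAL:
        assert(op->sync >= 0 && op->sync < NUM_SYNCS);
        gpu->signaled[op->sync] = true;
        break;
    case OP_RENDER:
        gpu->pixmap_gen++;
        break;
    case OP_TEXTURE:
        gpu->sampled_gen = gpu->pixmap_gen;
        break;
    }
    ch->next++;
    return true;
}

/* Alternate between the channels, `first` getting each turn first,
 * until neither can progress. Returns the generation the texture saw. */
static int run(Channel *first, Channel *second, Gpu *gpu)
{
    bool progressed = true;
    while (progressed) {
        bool a = step(gpu, first);
        bool b = step(gpu, second);
        progressed = a || b;
    }
    return gpu->sampled_gen;
}
```

With no OP_WAIT/OP_SIGNAL pair and the GL channel scheduled first, run() reports generation 0: the texture sampled the pixmap before the X rendering landed. Queuing OP_SIGNAL after the render on the X channel and OP_WAIT before the texture on the GL channel yields generation 1 in either scheduling order, and only the GL channel stalls.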
The slides below go into a little more detail on some of the above,
and also include some brainstorming for possible ways to approach the
composite + buffer presentation issues, but they are far from complete.
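The vblank latency point can be put as simple arithmetic. This is a sketch with hypothetical timing figures supplied by the caller, not measurements: on the CPU-driven path, steps a), b) and c) all have to fit in the vblank interval, while with a sync object released by the hardware the blit is already queued, so only c) has to fit. VblankTiming and both functions are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timing inputs in microseconds, supplied by the caller. */
typedef struct {
    double vblank_us;   /* length of the vertical blanking interval */
    double irq_us;      /* a) CPU wakes up for the vblank interrupt */
    double submit_us;   /* b) CPU submits the blit commands */
    double blit_us;     /* c) GPU performs the blit */
} VblankTiming;

/* CPU-driven path: a), b) and c) all have to happen inside the interval. */
static bool cpu_path_fits(const VblankTiming *t)
{
    assert(t->vblank_us > 0.0);
    return t->irq_us + t->submit_us + t->blit_us <= t->vblank_us;
}

/* Sync object released by the hardware at vblank: the blit is already
 * queued behind it, so only c) has to fit. */
static bool hw_release_fits(const VblankTiming *t)
{
    assert(t->vblank_us > 0.0);
    return t->blit_us <= t->vblank_us;
}
```

With illustrative figures of a 500 us interval, 200 us interrupt latency, 100 us submission and a 300 us blit, the CPU path misses the interval (600 us total) while the hardware-released path fits.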
On Tue, 25 Aug 2009, Keith Packard wrote:
> The render composite request has a couple of glaring failures:
> 1) Only one rectangle per request. Apps generate a lot of protocol,
> the server spends a lot of time decoding requests and the driver
> has to merge requests back together to hand more than one polygon
> to the hardware. It's interesting that exa (and hence uxa by
> derivation) have a poly-rectangle composite operation in their
> driver interface.
> 2) No vblank synchronization. Anyone wanting to double buffer 2D apps
> has no way of avoiding tearing. I'd like this inside the X server
> to make updates under a RandR transform sync to vblank.
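For item 1), a back-of-the-envelope comparison of bytes on the wire. The 36-byte size of a Render Composite request is real; the 20-byte fixed part assumed for a batched request (request header, op plus padding, three PICTURE ids) is only a guess at a plausible encoding, not a proposed one.

```c
#include <assert.h>
#include <stddef.h>

/* Render Composite request: 36 bytes (9 four-byte units). */
#define COMPOSITE_REQ_BYTES 36u

/* ASSUMED layout of a batched request's fixed part: 4-byte request
 * header, op plus 3 pad bytes, then src/mask/dst PICTURE ids. */
#define BATCH_FIXED_BYTES (4u + 4u + 3u * 4u)

/* One COMPOSITERECT: six INT16 offsets plus CARD16 width and height. */
#define COMPOSITERECT_BYTES (6u * 2u + 2u * 2u)

/* Bytes on the wire when each rectangle is its own Composite request. */
static size_t bytes_one_per_request(size_t nrects)
{
    return nrects * COMPOSITE_REQ_BYTES;
}

/* Bytes on the wire for one hypothetical batched request. */
static size_t bytes_batched(size_t nrects)
{
    assert(nrects > 0);
    return BATCH_FIXED_BYTES + nrects * COMPOSITERECT_BYTES;
}
```

For 100 rectangles that is 3600 bytes in 100 requests versus 1620 bytes in one request. The byte counts don't capture the per-request decoding cost in the server, which is the other half of the complaint.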
> As operation 1) is already supported by the EXA API, and can be emulated
> in DIX by executing multiple one-rectangle composite requests, this
> seems easy to add to the protocol in a completely compatible fashion:
> COMPOSITERECT [
>     src-x, src-y: INT16
>     msk-x, msk-y: INT16
>     dst-x, dst-y: INT16
>     width, height: CARD16
> ]
>
>     op: PICTOP
>     src: PICTURE
>     mask: PICTURE or None
>     dst: PICTURE
>     rects: LISTofCOMPOSITERECT
> This request is equivalent to a sequence of Composite requests
> using the same op/src/mask/dst values and stepping through rects.
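The stated equivalence suggests how DIX could emulate the request when a driver has no batched hook. A minimal sketch: CompositeRect mirrors the quoted COMPOSITERECT fields, while CompositeOneFn, composite_rects() and accumulate_area() are hypothetical stand-ins, not existing X server functions; picture ids are plain uint32_t values with 0 playing the role of None.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the quoted COMPOSITERECT fields. */
typedef struct {
    int16_t  src_x, src_y;
    int16_t  msk_x, msk_y;
    int16_t  dst_x, dst_y;
    uint16_t width, height;
} CompositeRect;

/* Hypothetical hook standing in for the existing one-rectangle Composite
 * path. Picture ids are plain integers; mask == 0 plays the role of None. */
typedef void (*CompositeOneFn)(void *closure, int op, uint32_t src,
                               uint32_t mask, uint32_t dst,
                               const CompositeRect *r);

/* Emulate the batched request: same op/src/mask/dst for every rectangle,
 * processed in list order. Returns the number of rectangles handled. */
static size_t composite_rects(CompositeOneFn one, void *closure, int op,
                              uint32_t src, uint32_t mask, uint32_t dst,
                              const CompositeRect *rects, size_t nrects)
{
    assert(one != NULL);
    assert(rects != NULL || nrects == 0);
    for (size_t i = 0; i < nrects; i++)
        one(closure, op, src, mask, dst, &rects[i]);
    return nrects;
}

/* Example hook: sums the destination area covered, ignoring pictures. */
static void accumulate_area(void *closure, int op, uint32_t src,
                            uint32_t mask, uint32_t dst,
                            const CompositeRect *r)
{
    (void)op; (void)src; (void)mask; (void)dst;
    *(unsigned long *)closure += (unsigned long)r->width * r->height;
}
```

A driver with a native poly-rectangle path, as EXA/UXA have, could take the whole list at once instead; the loop is only the compatibility fallback.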
> It seems like operation 2) should be an option on the picture object;
> set a sync mode on the picture and all operations would be covered by
> that mode. It would be 'best effort', so that drivers not supporting the
> sync mode would simply skip it. The question is how fancy this option
> should be; in the simple case, we'd make it just avoid tearing, more
> complex cases could involve having sequential operations to the same
> picture wait for a specific frame number. I'd love to have comments on
> precisely which 'swap modes' would be useful here.
> keith.packard at intel.com
More information about the xorg-devel mailing list