render improvements

Owen Taylor otaylor at
Fri Apr 15 14:30:05 PDT 2005

On Fri, 2005-04-15 at 17:02 -0400, Zack Rusin wrote:
> On Friday 15 April 2005 16:18, Owen Taylor wrote:
> > The xorg patch above removes all the changes from the xorg tree that
> > my patch merges from the xorg tree into the xserver tree ... but
> > I think those changes don't have much to do with the bulk of your
> > code, which is all about the general case, instead of special
> > case optimizations.
> Right.
> > To avoid further confusion, I'll go ahead and commit my changes this
> > evening ... hopefully that won't cause you too much pain.
> Well, to be honest we removed the mmx code for now. I can add it back 
> without any problems, but it shouldn't be necessary. 

That's a confident statement :-)

I admit to not having studied your approach in great detail; but if
my understanding is correct, you are doing

 Copy from source, dest into temporary ARGB32 buffers
 Composite in temporary ARGB32 buffers
 Copy from temporary ARGB32 buffers to dest

Assuming that the temporary buffer fits into L1 cache, that isn't
horribly bad, but it is only going to be as fast as an in-place
algorithm if you don't get any pipelining between memory
accesses and arithmetic in the in-place algorithm.
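
To make the comparison concrete, here is a rough sketch of the two
shapes of loop, assuming a premultiplied ARGB32 source, an r5g6b5
destination and the OVER operator. The function names, the CHUNK size
and the choice of formats are all made up for illustration; this is not
code from either tree.

```c
#include <stdint.h>
#include <string.h>

#define CHUNK 256 /* hypothetical chunk size, chosen to stay well inside L1 */

/* Premultiplied OVER for one ARGB32 pixel: src + dst * (255 - src_alpha) / 255. */
static uint32_t over_argb32(uint32_t src, uint32_t dst)
{
    uint32_t inv = 255 - (src >> 24);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;
        uint32_t t = d * inv + 0x80;            /* rounded d * inv / 255 */
        uint32_t c = s + ((t + (t >> 8)) >> 8);
        if (c > 255)                            /* only hit by non-premultiplied input */
            c = 255;
        out |= c << shift;
    }
    return out;
}

static uint32_t rgb565_to_argb32(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;
    r = (r << 3) | (r >> 2);                    /* replicate high bits into low bits */
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

static uint16_t argb32_to_rgb565(uint32_t p)
{
    return (uint16_t)(((p >> 8) & 0xf800) | ((p >> 5) & 0x07e0) | ((p >> 3) & 0x001f));
}

/* Three separate passes over temporary ARGB32 buffers. */
void composite_buffered(const uint32_t *src, uint16_t *dst, int n)
{
    uint32_t s_tmp[CHUNK], d_tmp[CHUNK];
    while (n > 0) {
        int len = n < CHUNK ? n : CHUNK;
        memcpy(s_tmp, src, (size_t)len * sizeof *src);   /* copy source */
        for (int i = 0; i < len; i++)                    /* copy dest */
            d_tmp[i] = rgb565_to_argb32(dst[i]);
        for (int i = 0; i < len; i++)                    /* composite */
            d_tmp[i] = over_argb32(s_tmp[i], d_tmp[i]);
        for (int i = 0; i < len; i++)                    /* copy back */
            dst[i] = argb32_to_rgb565(d_tmp[i]);
        src += len;
        dst += len;
        n -= len;
    }
}

/* One pass: fetch, combine and store each pixel in the same iteration. */
void composite_in_place(const uint32_t *src, uint16_t *dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = argb32_to_rgb565(over_argb32(src[i], rgb565_to_argb32(dst[i])));
}
```

Both functions produce the same pixels. The difference is only in how
the loads, the arithmetic and the stores are arranged: in the single
pass they sit in one iteration and can overlap, while the buffered
version touches each pixel several times.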

Also, for compositing to video memory, you get a fairly big win
by optimizing alpha=0, alpha=255 source pixels to not read from
the destination, something that you can't do with your method.
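
A minimal sketch of that shortcut, assuming premultiplied ARGB32 for
both source and destination and the OVER operator (the function names
are hypothetical, not taken from the fb/ code):

```c
#include <stdint.h>

/* Premultiplied OVER for one ARGB32 pixel: src + dst * (255 - src_alpha) / 255. */
static uint32_t over_argb32(uint32_t src, uint32_t dst)
{
    uint32_t inv = 255 - (src >> 24);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;
        uint32_t t = d * inv + 0x80;            /* rounded d * inv / 255 */
        uint32_t c = s + ((t + (t >> 8)) >> 8);
        if (c > 255)
            c = 255;
        out |= c << shift;
    }
    return out;
}

/* OVER on a scanline, touching the destination as little as possible. */
void over_scanline_skip_dest(const uint32_t *src, uint32_t *dst, int n)
{
    for (int i = 0; i < n; i++) {
        uint32_t s = src[i];
        uint32_t a = s >> 24;

        if (a == 0)          /* premultiplied and transparent: dest is unchanged, */
            continue;        /* so neither read nor write it                      */
        if (a == 0xff) {     /* opaque: result is the source, write only */
            dst[i] = s;
            continue;
        }
        dst[i] = over_argb32(s, dst[i]);   /* general case: read, blend, write */
    }
}
```

The a == 0 branch relies on the source being premultiplied, so that a
transparent pixel has all-zero color channels. The destination is read
only for partially transparent source pixels, which is where the
saving comes from when reads from video memory are slow.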

While it's better to have a fast general case and no special cases
than a slow general case that gets hit and some ultra-fast special
cases, it's still better to have a fast general case and some
ultra-fast special cases.

I don't think you did any testing of the xorg code rendering to
system memory? Once I get my patch merged, it might be interesting
to try your benchmarks against Xephyr and compare that to your

> Also since now the combining methods operate on scanlines adding code 
> that would in a common way accelerate all operations by combining a 
> couple of pixels in one pass should be rather easy. 

You do most likely want to MMX optimize the pieces of your algorithm.
All my experience is that MMX makes a large (> 2x) improvement for
this kind of code.

> > I think it would be very good if someone went through and merged up
> > the remaining differences in fb/ between xserver and xorg. There's
> > no reason that they should differ at all.
> I guess I could do it early next week, but...
> > If we did that, then we could have a simple way of treating fb/
> > changes ... to say that all changes must go first into xserver then
> > get merged into xorg.
> Before we do that, let's decide what to do about convolution filters. 
> Start of them is in the xserver but not in the xorg or the specs. 
> Glitz implements them already. We haven't implemented them in our 
> implementation. I wasn't sure whether I should bother quite yet. This 
> might be the right moment to figure out what to do with them :)

Hmm. I don't think that needs to block merging the rest. (it's mostly
small bug fixes that got put into one bit of the xorg code or the

> > In the slightly longer term, the work that needs to be done is to
> > make the X server trees use libpixman.
> Yeah, that'd be good. One other short-term thing I'd like to see is 
> improving the tessellation code in XRenderCompositeDoublePoly, or maybe 
> even having one common trapezoidation algorithm that everyone could 
> share. I guess even merging back the changes from the Cairo tessellator 
> back into Render would be enough for now.

Do we want to link libxrender against libpixman and move the tessellator
there? Do we think that XRenderCompositeDoublePoly() is something people
should be using at all?

