render improvements

Lars Knoll lars at
Tue Apr 19 00:23:18 PDT 2005

On Monday 18 April 2005 23:51, Keith Packard wrote:
> > > b) operating on scanlines in general gives us more power to use MMX to
> > > optimize the general case itself,
> scanlines don't deal with filters and transforms well at all; I'd like
> to see this code use square patches (8x8 or so) which seems like a good
> fit for both MMX and transforms.

There are a few reasons we used scanlines for the implementation. 

The first one was that it's rather easy to implement. 

The other is that we had rather good experience with them in our 
client-side painter. Even for rather large images that don't fit into the L1 
cache, general affine transformations were decently fast using this approach. 
As long as you can use the processor cache, the time to fetch the scanline is 
small compared to the time the composition takes.
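
To make the scanline approach concrete, here is a rough sketch of what is meant: fetch one transformed scanline into a small buffer that stays in cache, then composite that buffer onto the destination. This is not the actual render or painter code. The image_t type, all function names, the 16.16 fixed-point stepping, nearest-neighbour sampling, the 256-pixel chunk size and the premultiplied ARGB32 OVER operator are assumptions made for illustration only.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical image type for illustration; premultiplied ARGB32 pixels. */
typedef struct {
    const uint32_t *pixels;
    int width, height;
    int stride;                     /* in pixels, not bytes */
} image_t;

enum { CHUNK = 256 };               /* assumed size of the cache-resident buffer */

/* Multiply all four 8-bit channels of x by a (0..255), with rounding. */
static uint32_t mul_un8x4(uint32_t x, uint32_t a)
{
    uint32_t rb = (x & 0x00ff00ffu) * a + 0x00800080u;
    uint32_t ag = ((x >> 8) & 0x00ff00ffu) * a + 0x00800080u;
    rb = ((rb + ((rb >> 8) & 0x00ff00ffu)) >> 8) & 0x00ff00ffu;
    ag = (ag + ((ag >> 8) & 0x00ff00ffu)) & 0xff00ff00u;
    return rb | ag;
}

/* Fetch n pixels of one destination scanline through an affine transform.
 * (sx, sy) is the 16.16 source position of the first pixel, (dx, dy) the
 * step per destination pixel. Samples outside the source are transparent.
 * Assumes >> on negative values is an arithmetic shift, as on common
 * compilers. */
static void fetch_affine_scanline(const image_t *src, int32_t sx, int32_t sy,
                                  int32_t dx, int32_t dy, uint32_t *buf, int n)
{
    for (int i = 0; i < n; i++) {
        int x = sx >> 16, y = sy >> 16;
        buf[i] = (x >= 0 && x < src->width && y >= 0 && y < src->height)
                     ? src->pixels[(size_t)y * src->stride + x] : 0;
        sx += dx;
        sy += dy;
    }
}

/* Porter-Duff OVER on premultiplied pixels: dst = src + dst * (1 - alpha). */
static void composite_over_scanline(uint32_t *dst, const uint32_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] + mul_un8x4(dst[i], 255 - (src[i] >> 24));
}

/* Composite a transformed source over a w x h destination, scanline by
 * scanline. (x0, y0) is the source position for destination pixel (0, 0);
 * (dxx, dyx) is the source step per destination column and (dxy, dyy) the
 * step per destination row, all in 16.16 fixed point. */
static void composite_affine_over(uint32_t *dst, int dst_stride, int w, int h,
                                  const image_t *src, int32_t x0, int32_t y0,
                                  int32_t dxx, int32_t dyx,
                                  int32_t dxy, int32_t dyy)
{
    uint32_t buf[CHUNK];
    for (int y = 0; y < h; y++) {
        int32_t sx = x0 + y * dxy, sy = y0 + y * dyy;
        for (int x = 0; x < w; x += CHUNK) {
            int n = (w - x < CHUNK) ? w - x : CHUNK;
            fetch_affine_scanline(src, sx + x * dxx, sy + x * dyx,
                                  dxx, dyx, buf, n);
            composite_over_scanline(dst + (size_t)y * dst_stride + x, buf, n);
        }
    }
}
```

The source is read in whatever order the transform dictates, but the composition loop only ever touches the small buffer and one destination scanline, which is why the fetch cost stays small as long as the buffer sits in cache.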

It might also make an implementation easier where we use DMA transfers to get 
the pixmap data from the framebuffer into the processor cache, but I might be 
wrong here.

We know that the biggest performance bottleneck is currently framebuffer 
access, so I think that's the place we should focus on first. Using MMX 
instructions to fetch/store a scanline from the framebuffer is a good start, 
but in the long term we need DMA if we want to get any reasonable 

We can try a patch-based approach later on, once we have found a way to get 
fast access to the framebuffer data. As long as this is not solved, it IMO doesn't 
make a whole lot of sense to try to improve the implementation in this 


More information about the xorg mailing list