render improvements

Fri Apr 15 16:39:12 PDT 2005

On Fri, 2005-04-15 at 18:13 -0400, Zack Rusin wrote:
> On Friday 15 April 2005 17:30, Owen Taylor wrote:
> > Assuming that the temporary buffer fits into L1 cache, that isn't
> > horribly bad, but it's only going to be as fast as an in-place
> > algorithm if you don't get any pipelining between memory
> > accesses and arithmetic in the in-place algorithm.
> >
> > Also, for compositing to video memory, you get a fairly big win
> > by optimizing alpha=0, alpha=255 source pixels to not read from
> > the destination, something that you can't do with your method.
> 
> Right. It's not exactly impossible because we still get the source first. 
> It's just less practical because you would have to scan the source 
> before fetching the destination. So you'd be forced to scan the source 
> twice. At this point it might be worth it.

Reading from framebuffer memory is so slow that it might be worth
it even if you had to send off for the pixels to read by postal mail ;-)

You'd basically just need to pass the read-in source buffer to the
functions that read and write the destination. (You need it at the
write stage to optimize the normal case of alpha == 0).

But I think an integrated loop is going to be significantly faster.
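
As a minimal sketch of what such an integrated loop could look like,
assuming premultiplied ARGB32 pixels and the OVER operator: the names
over_scanline and mul_un8 are hypothetical and are not taken from the
xorg fb code, and the alpha == 0 shortcut assumes a premultiplied pixel
with zero alpha also has zero color.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply one 8-bit channel by an 8-bit factor, with a rounded
 * division by 255. */
static uint8_t mul_un8(uint8_t c, uint8_t a)
{
    uint32_t t = (uint32_t)c * a + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Hypothetical helper: "src OVER dst" for one scanline of
 * premultiplied ARGB32 pixels. The destination is only read when
 * the source pixel is partially transparent. */
static void over_scanline(uint32_t *dst, const uint32_t *src, size_t width)
{
    for (size_t i = 0; i < width; i++) {
        uint32_t s = src[i];
        uint8_t a = (uint8_t)(s >> 24);

        if (a == 0xff) {
            /* Opaque source: plain store, no destination read. */
            dst[i] = s;
            continue;
        }
        if (a == 0) {
            /* Transparent source (assumed to carry no color):
             * no destination read and no write. */
            continue;
        }

        /* General case: this is the only path that reads dst. */
        uint32_t d = dst[i];
        uint8_t ia = (uint8_t)(0xff - a);
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t c = ((s >> shift) & 0xff)
                       + mul_un8((uint8_t)(d >> shift), ia);
            if (c > 0xff)
                c = 0xff;   /* clamp in case src is not validly premultiplied */
            out |= c << shift;
        }
        dst[i] = out;
    }
}
```

The point of the sketch is only the control flow: the two shortcut
branches never touch dst for reading, which is where the win for video
memory would come from.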

> > While it's better to have a fast general case and no special cases
> > than a slow general case that gets hit and some ultra-fast special
> > cases, it's still better to have a fast general case and some
> > ultra-fast special cases.
> >
> > I don't think you did any testing of the xorg code rendering to
> > system memory? Once I get my patch merged, it might be interesting
> > to try your benchmarks against Xephyr and compare that to your
> > code.
> 
> Is that a challenge? ;) 

More curiosity.

> Either way that's not something I'm worried about for two reasons:
> a) merging special cases is trivial so we can do it in a few minutes 
> without any problems,

Yes, it should just be a matter of not deleting the MMX code :-)

> b) operating on scanlines in general gives us more power to use MMX to 
> optimize the general case itself
>
> Right now the fact that Lars was sitting in front of the assembly dump 
> trying to figure out how to combine everything in a most efficient 
> manner helps quite a bit :) On a real server the combining methods are 
> hardly visible though. It's the fetch/store cycle that's killing us. If 
> we could optimize fetching we could easily get a huge improvement. I'd 
> like to look into what Alan suggested.
> 
> > > Also since now the combining methods operate on scanlines adding
> > > code that would in a common way accelerate all operations by
> > > combining a couple of pixels in one pass should be rather easy.
> >
> > You do most likely want to MMX optimize the pieces of your algorithm.
> > All my experience is that MMX makes a large (> 2x) improvement for
> > this kind of code.
> 
> I'm a PPC fan. Your MMX foo does nothing for me ;) 

Well, feel free to write fbaltivec.c ... the same compiler intrinsics
method will work. 
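
For what it's worth, some of the "several channels per operation" idea
can be illustrated without any vector unit at all. The following is a
sketch in portable C, not MMX or AltiVec intrinsics: it scales all four
8-bit channels of a pixel by one factor using two 32-bit multiplies.
The name mul_un8x4 is hypothetical.

```c
#include <stdint.h>

/* Scale each 8-bit channel of a 32-bit pixel by a/255 (rounded),
 * handling two channels per multiply. Each channel sits in its own
 * 16-bit lane, and the largest intermediate (255*255 + 0x80 + 254)
 * still fits in 16 bits, so lanes never carry into each other. */
static uint32_t mul_un8x4(uint32_t x, uint8_t a)
{
    /* Low bytes of each 16-bit half: channels 0 and 2. */
    uint32_t rb = (x & 0x00ff00ffu) * a + 0x00800080u;
    rb = ((rb + ((rb >> 8) & 0x00ff00ffu)) >> 8) & 0x00ff00ffu;

    /* High bytes of each 16-bit half: channels 1 and 3, shifted down
     * first and left in the upper byte of each lane afterwards. */
    uint32_t ag = ((x >> 8) & 0x00ff00ffu) * a + 0x00800080u;
    ag = (ag + ((ag >> 8) & 0x00ff00ffu)) & 0xff00ff00u;

    return rb | ag;
}
```

An intrinsics version would follow the same per-lane arithmetic, just
with wider registers and more lanes per instruction.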

> > > Before we do that, let's decide what to do about convolution
> > > filters. The start of them is in the xserver but not in the xorg
> > > or the specs. Glitz implements them already. We haven't implemented
> > > them in our implementation. I wasn't sure whether I should bother
> > > quite yet. This might be the right moment to figure out what to do
> > > with them :)
> >
> > Hmm. I don't think that needs to block merging the rest. (it's mostly
> > small bug fixes that got put into one bit of the xorg code or the
> > other).
> 
> Personally I'd just like to know what's the official word on convolution 
> filters.

"official" here is I think pretty much whatever the people doing the
work decide on. 

> > Do we want to link libxrender against libpixman and move the
> > tessellator there? 
> 
> Ideally, yes!
> 
> > Do we think that XRenderCompositeDoublePoly() is 
> > something people should be using at all?
> 
> To be honest my biggest worry is having tessellation code duplicated in 
> a few places. Granted that right now it's only Arthur and Cairo but 
> that's already two places where it should be shared. So having 
> tessellator in a library that we could share would be very nice.

The challenge for Cairo is to make it good enough that you just use it
from Arthur :-), but certainly we don't want to block code sharing to
achieve that goal.

Regards,
						Owen