[PATCH] EXA: Move floating point math to the GPU as much as possible for R1-5xx.

Mon Oct 5 02:42:24 PDT 2009

On Mon, 2009-10-05 at 01:38 -0400, Alex Deucher wrote: 
> 2009/10/3 Michel Dänzer <michel at daenzer.net>:
> > From: Michel Dänzer <daenzer at vmware.com>
> >
> > Also add fast paths for untransformed Composite operations.
> >
> > This can significantly reduce the CPU overhead in RadeonCompositeTileCP, at
> > least for TCL capable GPUs.
> > ---
> >
> > I think the basic idea is sound, but I'm not sure if some parts are going too
> > far, e.g. the float fw, fh locals in the fastpath. Opinions?
> 
> 
> Looks pretty good.  What sort of improvements are you seeing?

Not sure I've measured this one separately, but together with the
changes I pushed recently I've seen an x11perf -aa10text speedup on the
order of 10-20%, both with and without KMS.

> Are there any improvements to the non-tcl path? 

Hmm, probably not as is, but it might be possible to use the fast path
there as well, at least in the untransformed case.

> If you wanted to take this a step further you could add some instructions
> to take the reciprocal in the shader.

Right, but I wouldn't expect that to make any significant difference;
the setup overhead seems small compared to RadeonCompositeTileCP. Also,
I'm not planning to mess with shaders in such a low-level form, but feel
free. :)
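To illustrate what an "untransformed fast path" means here, below is a
minimal sketch assuming a four-vertex quad with texture coordinates
normalized to [0, 1]. The struct vtx layout and emit_untransformed_quad()
are hypothetical and are not taken from the patch or the radeon driver.
With no transform, the source rectangle maps 1:1 onto the destination,
so no per-vertex matrix multiply is needed. The reciprocals of the
texture size are computed once per quad, so each vertex needs only
multiplications. This is the CPU-side version of the work that could
instead be moved into the shader.

```c
/* Hypothetical vertex layout, for illustration only. */
struct vtx {
    float x, y;   /* destination position in pixels */
    float s, t;   /* source texture coordinate, normalized to [0, 1] */
};

/* Hypothetical fast path for an untransformed Composite: the source
 * rectangle is copied 1:1, so texture coordinates are simply the source
 * pixel position scaled by the reciprocal of the texture size. */
static void emit_untransformed_quad(int src_x, int src_y,
                                    int dst_x, int dst_y,
                                    int w, int h,
                                    int tex_w, int tex_h,
                                    struct vtx out[4])
{
    /* Computed once per quad instead of dividing at every vertex. */
    const float inv_w = 1.0f / (float)tex_w;
    const float inv_h = 1.0f / (float)tex_h;
    /* Corner order: top-left, bottom-left, bottom-right, top-right. */
    static const int cx[4] = {0, 0, 1, 1};
    static const int cy[4] = {0, 1, 1, 0};
    int i;

    for (i = 0; i < 4; i++) {
        int dx = cx[i] * w;
        int dy = cy[i] * h;
        out[i].x = (float)(dst_x + dx);
        out[i].y = (float)(dst_y + dy);
        out[i].s = (float)(src_x + dx) * inv_w;
        out[i].t = (float)(src_y + dy) * inv_h;
    }
}
```

For example, copying a 64x32 region from the origin of a 128x64 texture
to (10, 20) yields a bottom-right vertex at (74, 52) with texture
coordinates (0.5, 0.5).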

> Also, we don't yet take advantage of the tcl hw on r1xx and r2xx chips.

Yeah, that might be a worthwhile project for those with such hardware.

-- 
Earthling Michel Dänzer           |                http://www.vmware.com
Libre software enthusiast         |          Debian, X and DRI developer