EXA for radeon experimental patch

Thu Sep 1 00:16:34 PDT 2005

On Thursday 01 September 2005 00:41, Benjamin Herrenschmidt wrote:
> > Another thing I saw is that compositing onto the framebuffer is still
> > always slow. It might be a good idea if EXA always used
> > DownloadFromScreen (if it exists) to copy all pixmaps for a composite
> > call into main memory before attempting to use fbComposite.
>
> DownloadFromScreen will be dead slow in many cases. In particular, you
> can't really rely on DMA to AGP memory here, as a lot of chipsets have
> non-working writes from the GPU to AGP :(

What about writes from the GPU to PCI? Maybe those work.
If you can't provide an implementation that is significantly faster than just
a series of memcpy calls, it's probably best not to implement the hook at
all, as it won't do anything more than EXA's fallback handling already does.
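
A minimal sketch of what "not implementing the hook" amounts to, assuming a
generic layer that tries an optional driver callback and otherwise copies
row by row with memcpy. The names SketchPixmap, DownloadHook, copy_rows and
download_area are made up for illustration; they are not the real EXA
structures or entry points.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pixmap living in video memory. */
typedef struct {
    const unsigned char *bits; /* start of the pixel data */
    int pitch;                 /* bytes per row */
} SketchPixmap;

/* Hypothetical optional driver hook; returns nonzero on success. */
typedef int (*DownloadHook)(const SketchPixmap *src, int x, int y,
                            int w, int h, unsigned char *dst,
                            int dst_pitch, int bytes_pp);

/* Generic path: one memcpy per row of the requested rectangle. */
static void copy_rows(const SketchPixmap *src, int x, int y, int w, int h,
                      unsigned char *dst, int dst_pitch, int bytes_pp)
{
    for (int row = 0; row < h; row++)
        memcpy(dst + (size_t)row * dst_pitch,
               src->bits + (size_t)(y + row) * src->pitch
                         + (size_t)x * bytes_pp,
               (size_t)w * bytes_pp);
}

/* Use the hook when it exists and succeeds, else fall back to memcpy.
 * Returns 1 if the hook did the work, 0 if the fallback did. */
static int download_area(DownloadHook hook, const SketchPixmap *src,
                         int x, int y, int w, int h,
                         unsigned char *dst, int dst_pitch, int bytes_pp)
{
    if (hook && hook(src, x, y, w, h, dst, dst_pitch, bytes_pp))
        return 1;
    copy_rows(src, x, y, w, h, dst, dst_pitch, bytes_pp);
    return 0;
}
```

With a NULL hook the caller gets the memcpy path, so a driver whose hook
would be no faster than that loses nothing by leaving it out.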

> > I know this would give a huge speedup in some cases. Especially
> > compositing onto the framebuffer is currently extremely slow as it can't
> > be migrated over to main memory. Using DownloadFromScreen to make a copy
> > of the framebuffer area in question (and of the other two operands to
> > composite), doing the composition completely in main memory and then
> > copying the result back into the framebuffer would probably be a factor
> > of 10-50 faster than calling fbComposite with something still left
> > in video mem.
>
> We might need to "hint" EXA about how good DownloadFromScreen is?

Wouldn't whether the hook is implemented or not be enough of a hint?
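
To make the staging idea quoted above concrete, here is a rough sketch. It
assumes a made-up per-driver hint (AccessHint) and uses plain memcpy where
a real driver would do the download and upload; add_saturate is only a
placeholder for fbComposite. None of these names exist in EXA.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hint a driver could give about its framebuffer. */
typedef struct {
    int fb_reads_slow; /* nonzero when CPU reads from video memory are slow */
} AccessHint;

typedef enum { COMPOSITE_IN_PLACE, COMPOSITE_STAGED } Strategy;

/* Stage through main memory only when direct reads are slow. */
static Strategy choose_strategy(AccessHint hint)
{
    return hint.fb_reads_slow ? COMPOSITE_STAGED : COMPOSITE_IN_PLACE;
}

/* Software operator working purely on system memory. */
typedef void (*CompositeFn)(unsigned char *dst, const unsigned char *src,
                            size_t n);

/* Placeholder operator: saturating add per byte (not a real Render op). */
static void add_saturate(unsigned char *dst, const unsigned char *src,
                         size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = sum > 255 ? 255 : (unsigned char)sum;
    }
}

/* Copy the destination area out, composite in main memory, copy it back.
 * Returns 0 if the scratch buffer could not be allocated. */
static int composite_staged(unsigned char *fb, const unsigned char *src,
                            size_t n, CompositeFn op)
{
    unsigned char *tmp = malloc(n);
    if (!tmp)
        return 0;
    memcpy(tmp, fb, n); /* stands in for the download from video memory */
    op(tmp, src, n);    /* all reads and writes hit system memory */
    memcpy(fb, tmp, n); /* stands in for the upload of the result */
    free(tmp);
    return 1;
}
```

The point of the detour is that the operator touches video memory only
through two bulk copies instead of many small reads.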

> > Now this is not true for shared memory architectures such as the i810, so we
> > would probably need some way to find out how slow framebuffer reads are
> > (and how fast DownloadFromScreen is) and decide the strategy to use based
> > on this information.
>
> BTW. Another issue I'm tackling at the moment is endianness & swappers.
> When falling back, composite will end up drawing directly into pixmaps
> in vram which have a different bit depth than the front buffer.

As long as the Picture has the correct format (i.e. the one that is actually
in VRAM) it should all just work.

> This will of course not work on big endian machines as the swapper on
> the PCI -> VRAM path will be configured for the front buffer.
>
> I'm about to add to EXA a couple of new hooks PrepareAccess() &
> FinishAccess() that will wrap such direct accesses to vram. They can be
> stacked though, up to 3 times for composite. On Radeon, that is fine as
> I can use the surface registers to setup different swapper settings over
> the 3 pixmaps, but not all cards can do that. So I'll have a fallback
> mechanism: when PrepareAccess() fails, then the pixmap is downloaded
> using DownloadFromScreen() and compositing will be done from memory.
> DownloadFromScreen() should always work as it can save the main swapper
> setting, change it to the pixmap bit depth, do the transfer, restore the
> swapper.

How did this work in XAA? XAA also fell back to fbComposite operating
directly on VRAM. The only change now is that you have more freedom in
what kind of format you store in VRAM.

> What kind of mechanism does nvidia have for dealing with that issue?

Nvidia has an endianness flag you can set in various places which tells the
HW about the endianness of pixmaps etc. They are always set to host
endianness.

Lars