EXA for radeon experimental patch

Thu Sep 1 14:36:47 PDT 2005

On Thu, 2005-09-01 at 14:57 +0200, Lars Knoll wrote:
> On Thursday 01 September 2005 13:15, Benjamin Herrenschmidt wrote:
> > > > They do, though they require some kernel support to get to the physical
> > > > address of pages and need some proper scatter/gather support on the
> > > > card side.
> > >
> > > Then all you need is a drm module for your card. Even if the card doesn't
> > > have scatter/gather support, drm allows you to allocate a piece of
> > > consistent physical ram, and mmap it in the server. The handle you get is
> > > the physical address, so you should be able to use that to implement PCI
> > > dma transfers.
> >
> > Yup, but it's very likely that allocating physically contiguous memory
> > will fail. The kernel isn't that good at keeping physical memory non
> > fragmented, and thus, physical allocations above PAGE_SIZE are quite
> > likely to fail after boot.
> 
> Why this? The kernel has support for paging, so it could easily free up some 
> contiguous pages just by swapping them out if they are used.

Not really. Not everything can be paged out; the kernel's own allocations,
such as network buffers, can't be. The kernel is bad at allocating
physically contiguous memory: the more you ask for, the more likely it is
to fail. Above a few pages, it's hopeless.
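
To illustrate the point above, here is a toy sketch (not kernel code). It
assumes a hypothetical page map in which some pages are pinned and can
never be swapped out or moved; largest_movable_run() is a made-up helper
name. Even if every unpinned page were paged out, the largest contiguous
block you could get is bounded by the gaps between pinned pages.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not the kernel's allocator: pinned[i] is true when page i
 * holds an allocation that cannot be paged out (e.g. a network buffer).
 * Returns the longest run of consecutive pages that could, at best, be
 * freed and handed out as one physically contiguous block.
 */
static size_t largest_movable_run(const bool *pinned, size_t npages)
{
    size_t best = 0;
    size_t run = 0;

    for (size_t i = 0; i < npages; i++) {
        /* A pinned page breaks the run; any other page extends it. */
        run = pinned[i] ? 0 : run + 1;
        if (run > best)
            best = run;
    }
    return best;
}
```

With 16 pages and only every fourth one pinned, 12 pages are reclaimable
but no contiguous block longer than 3 pages can ever be built.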

> > On cards that don't have such capability, I intend to have the
> > PrepareAccess() hook fail, causing EXA to DownloadFromScreen() the
> > pixmap to RAM before the composite operation.
> 
> Ok, I get it now. This makes sense for operations where you only write to a 
> pixmap. 
> 
> For pixmaps that operate as a source for the fbXxx commands (and this includes 
> the dest pixmap in fbComposite) it might be better to download them directly, 
> as you have to read the data from the framebuffer anyway. Doing this by mmio 
> will be extremely slow on most cards, so the download hook gives you at least 
> the chance that it's faster.

Yes, and by using my PrepareAccess() hook, you can experiment with that by
making it fail all the time ;) In fact, that's why I'm thinking of passing
down "hints" to let you decide what to do, like the "direction".
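
A minimal sketch of what such a hook with a "direction" hint could look
like. Everything here is hypothetical and for illustration only: the
types, the names (access_dir, toy_pixmap, get_cpu_pointer) and the
signatures are assumptions, not the actual EXA interface. The driver's
hook refuses direct access whenever the fallback would read from video
memory, and the core then downloads the pixmap to system RAM instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hint: how the software fallback will touch the pixmap. */
typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_READ_WRITE } access_dir;

/* Toy pixmap: 'vram' stands in for card memory, 'sys' for a RAM copy. */
typedef struct {
    uint8_t *vram;
    uint8_t *sys;       /* NULL until the pixmap has been downloaded */
    size_t   size;
    int      downloads; /* counts download_from_screen() calls */
} toy_pixmap;

/* Hypothetical driver hook: return false to refuse direct CPU access. */
typedef bool (*prepare_access_fn)(toy_pixmap *pix, access_dir dir);

/* Example policy: allow write-only access, refuse anything that reads. */
static bool prepare_refuse_reads(toy_pixmap *pix, access_dir dir)
{
    (void)pix;
    return dir == ACCESS_WRITE;
}

/* Stand-in for a DownloadFromScreen()-style copy into system RAM. */
static bool download_from_screen(toy_pixmap *pix)
{
    if (pix->sys == NULL) {
        pix->sys = malloc(pix->size);
        if (pix->sys == NULL)
            return false;
    }
    memcpy(pix->sys, pix->vram, pix->size);
    pix->downloads++;
    return true;
}

/*
 * Core side: ask the driver first; if it refuses, fall back to a
 * download. Returns the pointer the software path should use, or NULL
 * if the download failed.
 */
static uint8_t *get_cpu_pointer(toy_pixmap *pix, access_dir dir,
                                prepare_access_fn prepare)
{
    assert(pix != NULL && prepare != NULL);

    if (prepare(pix, dir))
        return pix->vram;
    return download_from_screen(pix) ? pix->sys : NULL;
}
```

Swapping in a hook that always returns false gives the "fail all the
time" experiment: every access then goes through the download path.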

> When you have to read in something in 16bit color depth, even a simple memcpy 
> based download implementation will be faster than mmio, as you at least copy 
> the stuff 32bit wise from vidmem and not 16bit wise.
> 
> Lars