EXA support for nv driver

Tue Sep 6 00:55:18 PDT 2005

On Tuesday 06 September 2005 09:38, Benjamin Herrenschmidt wrote:
> On Tue, 2005-09-06 at 09:29 +0200, Lars Knoll wrote:
> > > You mean PCIe chipsets? ;) AFAIK AGP is explicitly specified not to be
> > > cache coherent.
> >
> > AGP 3.0 seems to allow for cache coherency:
>
> Yah, but any idea which chipsets actually provide it ? I've been
> thinking about various ways of "using" those features lately with EXA.
> There have been several things I had in mind.

If I read the AGP 3.0 spec correctly, every 3.0 chipset has to support
cacheable AGP.

Seems like newer AMD chipsets are 3.0. At least mine is, so I could do some 
testing.

> 1) Create a 3rd class of migration for pixmaps -> AGP memory (or
> whatever "GART" memory, whether it's AGP, PCI-GART or PCI-E GART).
> Pixmaps put there are the ones that get regularly banged from both the
> CPU (software fallbacks) and the engine. The nice thing with AGP memory
> is that it's faster than video memory for CPU accesses (especially if
> your chipset supports cacheable AGP!) and it's directly accessible by
> the engine.
>
> The way to do that at first (incremental implementation) would probably
> be to modify the offscreen allocator to actually have N instances of the
> allocator itself (zones). Then, the driver could create a memory zone
> and an AGP zone. The pixmaps themselves would have a flag indicating in
> which zone their backup is. That also means more dynamic allocation of
> the vram/agp storage as we don't want to permanently allocate space in
> both for all pixmaps, and thus more causes for failures due to
> fragmentation but since we can kick pixmaps around, and possibly move
> them around as well, that should be doable...

Sounds very good. Having pixmaps that will be accessed both by the graphics HW 
and by CPU in the GART will be a definite win.
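
To make the zone idea concrete, here is a minimal sketch in C. All names
(zone, pixmap, choose_zone, pixmap_place) are hypothetical and are not the
existing EXA or nv driver API. A simple bump allocator stands in for the real
offscreen allocator, and the placement policy (both CPU and engine touch it
-> GART, engine only -> VRAM) is only my assumption of how the flag could be
chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not the actual EXA or nv driver API. */
enum zone_id { ZONE_NONE = -1, ZONE_VRAM = 0, ZONE_GART = 1, ZONE_COUNT = 2 };

/* One allocator instance per zone. A bump allocator stands in for the
 * real offscreen allocator, so there is no free or compaction here. */
struct zone {
    size_t size;   /* total bytes in the zone */
    size_t used;   /* bytes handed out so far */
};

struct pixmap {
    size_t bytes;
    enum zone_id zone;   /* flag: which zone holds the backing storage */
    size_t offset;       /* offset inside that zone, valid if zone != NONE */
    unsigned cpu_hits;   /* accesses from software fallbacks */
    unsigned gpu_hits;   /* accesses from the acceleration engine */
};

/* Returns 1 and stores the offset on success, 0 if the zone is full. */
static int zone_alloc(struct zone *z, size_t bytes, size_t *offset)
{
    assert(z != NULL && offset != NULL);
    if (bytes > z->size - z->used)
        return 0;
    *offset = z->used;
    z->used += bytes;
    return 1;
}

/* Assumed policy: pixmaps touched by both CPU and engine go to GART,
 * engine-only pixmaps go to VRAM, everything else stays in system memory. */
static enum zone_id choose_zone(const struct pixmap *p)
{
    if (p->cpu_hits > 0 && p->gpu_hits > 0)
        return ZONE_GART;
    if (p->gpu_hits > 0)
        return ZONE_VRAM;
    return ZONE_NONE;
}

/* Returns 1 if the pixmap was placed (or needs no zone), 0 if the chosen
 * zone is full; a real driver would kick other pixmaps out and retry. */
static int pixmap_place(struct zone zones[ZONE_COUNT], struct pixmap *p)
{
    enum zone_id want = choose_zone(p);
    if (want == ZONE_NONE) {
        p->zone = ZONE_NONE;
        return 1;
    }
    if (!zone_alloc(&zones[want], p->bytes, &p->offset))
        return 0;
    p->zone = want;
    return 1;
}
```

The failure path is where fragmentation shows up: when pixmap_place returns
0, the pixmap keeps its old zone flag and the caller has to evict or move
something before retrying.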

> 2) With PCI GART and PCI-E GART (and with cache coherent AGP too) we are
> in a new situation where we don't really need to permanently bind a
> whole aperture in the "GART" space at all. We could dynamically map the
> pixmap there "on demand". Which means that instead of migrating the
> pixmaps to "AGP" (or rather call it GART space)

This would be even more flexible, and could be quite a bit faster in a lot of 
cases. But it would probably require some kernel/drm support (to be able to 
map a specified region of virtual memory into the GART).

> This is really only a low level implementation detail compared to 1).
> That is, we still have to allocate GART virtual space (though we can
> afford to have a much larger GART) and instead of "migrating" from
> memory to GART, we would just alter the mappings. Sort-of a faster
> version of 1).

I don't think the virtual space is a problem, if we limit ourselves to 256MB
or similar on 32-bit machines.
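
For a rough feel of what "just alter the mappings" means with a 256MB limit,
here is a small sketch in C. It assumes 4KB pages (so 65536 GART slots) and
models the GART as a plain table in user memory; the names (gart, gart_bind,
gart_unbind) are hypothetical, and the real thing would need the kernel/drm
support mentioned above rather than a userspace array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GART_SIZE      (256u * 1024u * 1024u)        /* the 256MB limit */
#define GART_PAGE_SIZE 4096u                         /* assumed page size */
#define GART_PAGES     (GART_SIZE / GART_PAGE_SIZE)  /* 65536 slots */

/* Hypothetical model of the GART: one entry per page of GART space.
 * 0 means "unmapped", so physical address 0 is not representable here. */
struct gart {
    uint64_t entry[GART_PAGES];
};

/* Map n (possibly scattered) physical pages into the first free run of
 * n consecutive GART slots. Returns the first slot index, or -1 if no
 * run is large enough. No pixel data is copied, only table entries. */
static long gart_bind(struct gart *g, const uint64_t *phys, size_t n)
{
    size_t run = 0;

    assert(g != NULL && phys != NULL);
    if (n == 0 || n > GART_PAGES)
        return -1;
    for (size_t i = 0; i < GART_PAGES; i++) {
        run = (g->entry[i] == 0) ? run + 1 : 0;
        if (run == n) {
            size_t first = i + 1 - n;
            for (size_t k = 0; k < n; k++)
                g->entry[first + k] = phys[k];
            return (long)first;
        }
    }
    return -1;
}

/* Drop the mapping again; the pixmap itself stays where it is in RAM. */
static void gart_unbind(struct gart *g, size_t first, size_t n)
{
    assert(g != NULL);
    for (size_t k = 0; k < n && first + k < GART_PAGES; k++)
        g->entry[first + k] = 0;
}
```

Binding a pixmap then costs one table write per page instead of a copy of
the pixel data, which is where the speedup over real migration would come
from.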

I think it would be best to start out with the first proposal and add some 
sort of GART handling to the memory manager and EXA.

Cheers,
Lars