[Linux-fbdev-devel] Re: radeon, apertures & memory mapping

Sun Mar 13 19:59:42 PST 2005

On Sun, 2005-03-13 at 20:47 -0500, Jon Smirl wrote:
> On Mon, 14 Mar 2005 12:05:59 +1100, Benjamin Herrenschmidt
> <benh at kernel.crashing.org> wrote:
> > 
> > > It should be the responsibility of the memory manager. If anything wants
> > > to access the memory it would call lock() and when it's done with the
> > > memory it calls unlock(). That's exactly how DirectFB's memory manager
> > > works.
> > 
> > In an ideal world ... However, since we are planning to move the memory
> > manager to the kernel, that would mean a kernel access (syscall, ioctl,
> > whatever...) twice per access to AGP memory. Not realistic.
> 
> I'm only suggesting this for the DRM/fbdev stack. Anything else from
> user space can use a non-cached mapping.

Then I don't see the point. Especially since the problem I explained
would still be there as long as there is a non-cached mapping.
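
The lock()/unlock() protocol discussed above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
names gfx_buffer, gfx_lock and gfx_unlock are hypothetical (they are
not DirectFB's or the DRM's actual API), the cache maintenance calls
are empty stand-ins, and the "crossings" counter merely models the
objection that a kernel-side manager costs two kernel entries per
access.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer handle; not DirectFB's or the DRM's real type. */
struct gfx_buffer {
    void         *cpu_addr;   /* cached CPU mapping of the buffer */
    size_t        size;
    int           locked;     /* nonzero while the CPU owns the buffer */
    unsigned long crossings;  /* kernel entries this would cost */
};

/* Stand-ins for real cache maintenance; they do nothing here. */
static void cache_invalidate_range(void *addr, size_t size) { (void)addr; (void)size; }
static void cache_flush_range(void *addr, size_t size) { (void)addr; (void)size; }

/* Take CPU ownership. Returns the mapping, or NULL if already locked. */
static void *gfx_lock(struct gfx_buffer *buf)
{
    assert(buf != NULL);
    if (buf->locked)
        return NULL;
    /* Drop stale cache lines so the CPU sees what the GPU wrote. */
    cache_invalidate_range(buf->cpu_addr, buf->size);
    buf->locked = 1;
    buf->crossings++;   /* one syscall/ioctl if the manager is in the kernel */
    return buf->cpu_addr;
}

/* Hand the buffer back. Returns 0 on success, -1 if it was not locked. */
static int gfx_unlock(struct gfx_buffer *buf)
{
    assert(buf != NULL);
    if (!buf->locked)
        return -1;
    /* Write back dirty lines so the GPU sees what the CPU wrote. */
    cache_flush_range(buf->cpu_addr, buf->size);
    buf->locked = 0;
    buf->crossings++;   /* second syscall/ioctl for the same access */
    return 0;
}
```

One lock/unlock pair leaves crossings at 2, which is the "twice per
access" cost raised earlier in the thread.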

> It shouldn't hurt to have a parallel non-cached mapping being used in
> conjunction with this protocol. By definition the non-cached mapping
> never gets into an inconsistent state.

Wrong :) It can badly conflict with the existence of a cached mapping.
Re-read my mail that explains the problem carefully.

> > The case of the CP ring is easy to deal with by the macros we have there
> > already and it would be kernel-kernel. But it would be a hit for a lot
> > of other things I suppose.
> 
> The performance trade-off is: how long does the invalidate take? If
> the CPU has 2MB of unflushed write data, the instruction is going to
> take a while to finish. In the non-cached scheme this data is flushed
> in parallel with us playing with the AGP memory. To flush 2MB takes
> something like 2MB / (400MHz * 64 bytes * 2 (DDR)) = 40 microseconds,
> but it may be more like 1 microsecond on average.
> 
> Thinking about this for a while, you can't compute which is the better
> strategy because everything depends on the workload and how dirty the
> cache is. Best thing to do would be to code it up and try it. But I
> want to get a dual head radeon driver working first.
> 
> It may also be true that the CP Ring is better left non-cached and
> only access to the graphics buffers be done with the caching scheme.

Using a write-through cache might be an interesting tradeoff.
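
The flush-time estimate quoted above can be checked with a few lines of
arithmetic. The sketch below assumes only peak bus bandwidth matters
(no latency, no contention), which is optimistic. The function name
flush_time_us and its parameters are made up for illustration. Read
literally (a 400 MHz bus moving 64 bytes per transfer, doubled for
DDR), 2 MB comes out at about 41 microseconds, the same order of
magnitude as the quoted figure. If the bus were 64 bits (8 bytes) wide
instead, which is purely an assumption on my part, the same formula
gives roughly 328 microseconds.

```c
#include <assert.h>

/* Estimated time, in microseconds, to write back `bytes` of dirty data
 * over a bus clocked at `clock_hz` that moves `bytes_per_transfer` per
 * transfer, `transfers_per_clock` times per cycle (2 for DDR).
 * Peak-bandwidth model only; real flushes will be slower. */
static double flush_time_us(double bytes, double clock_hz,
                            double bytes_per_transfer,
                            double transfers_per_clock)
{
    assert(clock_hz > 0 && bytes_per_transfer > 0 && transfers_per_clock > 0);
    double bytes_per_second =
        clock_hz * bytes_per_transfer * transfers_per_clock;
    return bytes / bytes_per_second * 1e6;
}
```

For example, flush_time_us(2.0 * 1024 * 1024, 400e6, 64, 2) models the
literal reading, and passing 8 instead of 64 models the 64-bit case.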

> BTW, you can implement super fast texture load/unload using a similar
> scheme. Start with the texture in the user space program. Program
> wants to upload the texture. Flush CPU cache. Point the GART at the
> physical pages allocated to the user holding the texture. Now walk the
> user's page table and mark those pages copy-on-write. Free the
> pages the GART was originally pointing at. Reverse the scheme to
> get data from the GPU. For small textures it is faster to copy them
> but if you are moving 20MB of data this is much faster.
> 
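
The remapping steps in the texture scheme quoted above can be modelled
as a toy state machine. Nothing here touches real hardware or uses any
real kernel interface: struct pte, struct upload_state and
texture_upload_remap are hypothetical names, the page frame numbers are
plain integers, and the cache flush is reduced to a flag. It only shows
the bookkeeping: flush, retarget the GART, mark the user's pages
copy-on-write, release the old pages.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 4   /* pages in the texture; arbitrary for the sketch */

/* Toy page-table entry: which physical page, and is it copy-on-write? */
struct pte { unsigned long pfn; int cow; };

/* Toy state: the user's mapping of the texture and the GART window. */
struct upload_state {
    struct pte    user_pte[NPAGES];  /* user page table for the texture */
    unsigned long gart_pfn[NPAGES];  /* physical pages the GART points at */
    unsigned long freed[NPAGES];     /* pages handed back to the allocator */
    size_t        nfreed;
    int           cache_flushed;
};

/* Zero-copy "upload": retarget the GART at the user's pages. */
static void texture_upload_remap(struct upload_state *s)
{
    assert(s != NULL);

    /* 1. Flush the CPU cache so the pages hold the final texture data. */
    s->cache_flushed = 1;

    s->nfreed = 0;
    for (size_t i = 0; i < NPAGES; i++) {
        unsigned long old = s->gart_pfn[i];

        /* 2. Point the GART entry at the user's physical page. */
        s->gart_pfn[i] = s->user_pte[i].pfn;

        /* 3. Mark the user's mapping copy-on-write, so a later CPU
         *    write gets a private copy instead of changing the texture. */
        s->user_pte[i].cow = 1;

        /* 4. Release the page the GART entry used to reference. */
        s->freed[s->nfreed++] = old;
    }
}
```

No pixel data is copied at any point; only NPAGES table entries change,
which is why the approach pays off for large textures and not for
small ones.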
-- 
Benjamin Herrenschmidt <benh at kernel.crashing.org>