Looking for API to possibly implement "mastered image transfer" for xf86-video-savage

Fri Nov 14 05:05:37 PST 2008

On Thu, 2008-11-13 at 13:03 -0500, Alex Villacís Lasso wrote:
> Michel Dänzer wrote:
> > On Mon, 2008-11-10 at 13:37 -0500, Alex Villacís Lasso wrote:
> >   
> >> The question is: is there any xserver support that might enable a driver 
> >> to get pixmap data into either kind of situation? Either get the pixmap 
> >> into physically contiguous pages and obtain the physical address of the 
> >> start of data, or pre-copy the pixmap data into AGP memory (allocated by 
> >> the driver on startup, if necessary)
> >>     
> >
> > There isn't any such support in the X server; it would be mostly a
> > kernel level thing anyway.
> >
> >   
> >> so that the driver does not need to copy it into AGP memory every
> >> single time.
> >>     
> >
> > FWIW, that's what the radeon driver does currently.
> >
> >
> >   
> I could do that in savage too (copy into AGP and set up a mastered image 
> transfer), but I have seen the radeon code, and it seems that the copy 
> from system memory to AGP is done using a standard memcpy. Probably I am 
> missing a clue, but I fail to understand how a memcpy from system memory 
> to AGP, followed by an accelerated blit from AGP to framebuffer, can be 
> any faster than a direct memcpy from system memory into the framebuffer. 
> Particularly when (in the case of XVideo for savage) the copy must be 
> done every single time for every frame.

memcpy throughput may be higher to AGP memory than to the framebuffer,
and the copy can be done asynchronously with respect to other GPU
operations. Of course there's no guarantee that it helps, but it's
certainly possible depending on the setup and workload.
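
As a rough illustration of the CPU-side half of that path, here is a minimal
sketch of a pitch-aware copy into a staging buffer. It is not taken from the
radeon or savage code: copy_to_staging and all of its parameters are
hypothetical, and the staging buffer merely stands in for AGP memory. The
accelerated blit from the staging buffer to the framebuffer is
hardware-specific and is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from any real driver): copy `height` rows of
 * `row_bytes` bytes each from a system-memory image into a staging buffer
 * whose pitch (bytes per row) may differ from the source pitch.
 * Returns 0 on success, -1 if the arguments are inconsistent. */
int copy_to_staging(uint8_t *dst, size_t dst_pitch,
                    const uint8_t *src, size_t src_pitch,
                    size_t row_bytes, size_t height)
{
    if (dst == NULL || src == NULL)
        return -1;
    if (row_bytes > dst_pitch || row_bytes > src_pitch)
        return -1;

    /* Rows are tightly packed on both sides: one large copy suffices. */
    if (dst_pitch == row_bytes && src_pitch == row_bytes) {
        memcpy(dst, src, row_bytes * height);
        return 0;
    }

    /* Otherwise copy row by row, skipping the padding on each side. */
    for (size_t y = 0; y < height; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);

    return 0;
}
```

Whether this extra copy pays off still depends on the write throughput to
each kind of memory and on how much GPU work can overlap with it.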

> I was reading http://dri.freedesktop.org/wiki/AGP?highlight=(CategoryFaq) ,
> and the start of the document describes the "DMA model" of AGP usage, which 
> closely matches what I think I need. Except that instead of allocating an 
> AGP buffer and using it for all memory transfers, it would be useful to 
> make the GART (temporarily) point into a buffer in userspace while 
> preserving its contents. Even one page might be useful to make a faster 
> transfer. But I might be wrong.

That's only possible when the data in system memory satisfies the
address/pitch/etc. constraints of the GPU. Even then, the overhead of
(un)binding memory to/from the GART may be too high. Again, it's hard to
say without trying it.
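
To make the first condition concrete, here is a minimal sketch of the kind of
check a driver would need before pointing the GART at an existing buffer. The
struct, its fields, and the function name are all hypothetical, and the actual
alignment and pitch limits are hardware-specific. The check also says nothing
about the cost of (un)binding, which would have to be measured.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of what a GPU requires of a source image.
 * The fields and any values used with them are assumptions for
 * illustration, not taken from savage or radeon documentation. */
struct gpu_constraints {
    size_t addr_align;   /* start address must be a multiple of this (power of two) */
    size_t pitch_align;  /* pitch must be a multiple of this (power of two) */
    size_t max_pitch;    /* largest pitch, in bytes, that the engine accepts */
};

/* Return true if the buffer could be handed to the GPU as-is,
 * false if it would have to be copied into a conforming buffer first. */
bool buffer_is_directly_usable(const void *data, size_t pitch,
                               const struct gpu_constraints *c)
{
    uintptr_t addr = (uintptr_t)data;

    if (addr & (c->addr_align - 1))
        return false;               /* misaligned start address */
    if (pitch == 0 || pitch > c->max_pitch)
        return false;               /* pitch out of range */
    if (pitch & (c->pitch_align - 1))
        return false;               /* misaligned pitch */
    return true;
}
```

A pixmap that fails such a check would fall back to the copy-into-AGP path.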

-- 
Earthling Michel Dänzer           |          http://tungstengraphics.com
Libre software enthusiast         |          Debian, X and DRI developer