memcpy to AGP vs memcpy to framebuffer - which is faster?

Alex Deucher alexdeucher at
Fri Dec 26 08:42:22 PST 2008

On Wed, Dec 24, 2008 at 1:28 PM, Alex Villacís Lasso
<a_villacis at> wrote:
> This is probably a very basic question, but it is important for me to know:
> If I do an ordinary (non-accelerated) memcpy of a frame from system
> memory to a buffer in AGP memory (allocated via DRI), is it any faster
> than a (non-accelerated) memcpy of the exact same frame to a buffer area
> in offscreen memory in the framebuffer? Does it make a difference whether
> the chipset is integrated into the mainboard (using memory stolen from
> main RAM as video memory) rather than being a plug-in card in the
> AGP port?

It's going to be faster to copy to GART buffers than to access
framebuffer memory directly, since GART buffers live in system RAM and
you don't have to go across the PCI/AGP bus.  For IGP chips you could
probably copy stuff directly to the stolen memory pool (if you know
where it is), but in that case the data doesn't go through the GPU's
host data path, which handles tiling and things like that.

> I want to know because I am evaluating whether it is worthwhile to
> implement allocation of AGP buffers via DRI in the XVideo code path of
> xf86-video-savage. The "mastered image transfer" (used to transform from
> planar YV12 to packed YUV) present in the chipset can choose between
> framebuffer memory and AGP memory as a source for the conversion.
> Currently it uses an upload from system memory to an area in offscreen
> memory, followed by the conversion. However, the upload to the
> framebuffer is the main source of delay, and measurements with mplayer
> and a 640x480 movie show that software conversion (BCIForXV=off) is
> *faster* (90 seconds) than BCI-mediated conversion (110 seconds).
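Roughly speaking, a software conversion has to do something like the following for every frame. This is a minimal sketch, not the savage driver's actual code: it assumes the packed target is YUY2 (byte order Y0 U Y1 V), that width and height are even, and that the planes have no row padding; the function names are hypothetical. The size helpers show how much data moves per frame: a 640x480 YV12 frame is 460800 bytes, while the packed YUY2 result is 614400 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in one planar YV12 frame: a full-size Y plane plus quarter-size
 * V and U planes (4:2:0 chroma subsampling). */
static size_t yv12_frame_size(int width, int height)
{
    return (size_t)width * height + 2 * ((size_t)(width / 2) * (height / 2));
}

/* Bytes in one packed YUY2 frame: 2 bytes per pixel. */
static size_t yuy2_frame_size(int width, int height)
{
    return (size_t)width * height * 2;
}

/* Hypothetical helper: convert planar YV12 (Y plane, then V plane, then
 * U plane) to packed YUY2 (Y0 U Y1 V). Assumes even width/height and no
 * padding between rows. */
static void yv12_to_yuy2(const uint8_t *src, uint8_t *dst, int width, int height)
{
    assert(width % 2 == 0 && height % 2 == 0);

    const uint8_t *y = src;
    const uint8_t *v = src + (size_t)width * height;
    const uint8_t *u = v + (size_t)(width / 2) * (height / 2);

    for (int row = 0; row < height; row++) {
        /* Each chroma row is shared by two consecutive luma rows. */
        const uint8_t *yrow = y + (size_t)row * width;
        const uint8_t *urow = u + (size_t)(row / 2) * (width / 2);
        const uint8_t *vrow = v + (size_t)(row / 2) * (width / 2);
        uint8_t *out = dst + (size_t)row * width * 2;

        /* Two luma samples share one U and one V sample horizontally. */
        for (int col = 0; col < width; col += 2) {
            *out++ = yrow[col];
            *out++ = urow[col / 2];
            *out++ = yrow[col + 1];
            *out++ = vrow[col / 2];
        }
    }
}
```

In the driver the destination would be the offscreen area (or, with the change being considered, a GART buffer) rather than ordinary memory, which is exactly where the copy cost discussed in this thread comes in.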

You'd have to benchmark it to be sure. The advantage of using GART
buffers is that the GPU does the fetch across the bus, freeing the
CPU to do other work.
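A timing harness for such a benchmark could be as small as the sketch below. It is only an assumption about how one might measure this: it times CPU memcpy() calls into whatever `dst` points at, and obtaining that pointer (a GART buffer allocated through DRI, or a mapping of the offscreen framebuffer area) is not shown. `time_copy_us` is a hypothetical name. Note that it cannot capture the benefit of the GPU fetching from GART while the CPU does other work, so end-to-end playback time, as in the mplayer runs, remains the more meaningful number.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() and CLOCK_MONOTONIC */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `iterations` memcpy() calls of `len` bytes from
 * `src` to `dst` and return the average time per copy in microseconds.
 * `dst` is assumed to be an already-mapped GART buffer or offscreen
 * framebuffer area; mapping it is outside the scope of this sketch. */
static double time_copy_us(void *dst, const void *src, size_t len, int iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++)
        memcpy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_us = (double)(end.tv_sec - start.tv_sec) * 1e6 +
                        (double)(end.tv_nsec - start.tv_nsec) / 1e3;
    return elapsed_us / iterations;
}
```

For example, calling it with a 460800-byte source (one 640x480 YV12 frame) once with each destination gives per-copy times that can be compared directly.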


More information about the xorg mailing list