Strange speed differences on composite

Tue Feb 14 17:07:31 PST 2006

On Tue, 14 Feb 2006 12:07:59 -0500 Adam Jackson <ajax at nwnk.net> babbled:

> On Saturday 11 February 2006 14:13, Tomasz Torcz wrote:
> > On Fri, Feb 10, 2006 at 07:00:03AM -0500, Owen Taylor wrote:
> > > On Thu, 2006-02-09 at 18:55 +0100, Tomasz Torcz wrote:
> > > >   The question: how it comes that software-only Xephyr is _faster_ at
> > > > drawing shadows than fully hw accelerated X server with mga driver?
> > >
> > > If you ever have to read data from the frame buffer, that's *slow* -
> > > you lose far more than you gain from any hardware acceleration you
> > > might be getting.
> >
> >   But why read data? Shouldn't it be composed by graphic card in VRAM?
> > Isn't that what acceleration is for?
> 
> "Acceleration" isn't an all-or-nothing thing.  Almost all the drivers we have 
> are accelerated to one degree or another.  The issue is that very few of them 
> accelerate the image composition operations that Render exposes.
> 
> When you do an Over blend in Render, you're computing values for each pixel:
> 
> output = 1 * src_color + (1 - src_alpha) * dst_color
> 
> When Render is accelerated in hardware, you can simply load the 1 and
> (1-srca) blend factors into the hardware and it runs like a normal
> screen-to-screen blit; you can think of a plain blit as one where the source
> blend factor is 1 but the dest blend factor is 0.  When it's not accelerated
> in hardware, you have to compute each output pixel by reading the dst_color
> pixel from the framebuffer, blending it with the src_color pixel in the CPU,
> and then writing that result back to the framebuffer.  
> 
> That's essentially the same as reading the entire image back from the 
> framebuffer into host memory.  Write speeds are pretty fast, but framebuffer 
> readback speeds pretty much top out at 50M/sec or so, so that's how fast 
> you're going to go.
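
For reference, the unaccelerated path described above can be sketched roughly as follows. This is a minimal illustration only, assuming 32-bit ARGB pixels with premultiplied alpha; the names mul_div_255, over_pixel and over_span are made up for this sketch and are not taken from the X server source.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_div_255(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Over for one premultiplied ARGB32 pixel:
 * out = 1 * src + (1 - src_alpha) * dst, applied to every channel. */
uint32_t over_pixel(uint32_t src, uint32_t dst)
{
    uint8_t inv_a = (uint8_t)(255 - (src >> 24));
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = mul_div_255((uint8_t)((dst >> shift) & 0xff), inv_a);
        uint32_t c = s + d;
        if (c > 255)
            c = 255;            /* clamp in case the input was not premultiplied */
        out |= c << shift;
    }
    return out;
}

/* Blend n source pixels over n destination pixels in place. */
void over_span(const uint32_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Reading dst[i] is the expensive step when dst lives in video RAM. */
        dst[i] = over_pixel(src[i], dst[i]);
    }
}
```

Every iteration reads dst[i] before it can write it, so the whole destination area ends up being pulled back across the bus once per blend.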

it's actually a bit worse than that - since both src and dst are likely in
video ram, you have to read the src pixel AND the dst pixel from video ram
(read 2 lots of the image) and THEN blend and write back. :) not to mention it
gets even nastier when doing transforms with interpolation or super-sampling.
with interpolation, for example, it has to read 4 pixels from src, calculate an
interpolation, read a dst pixel, blend, then write back (so read 5 lots of the
image from the fb) :/ if the src and dst are in system memory, these reads
have somewhere between 1/10th and 1/100th of the cost they do when reading
FROM the video card, so on modern cpus you can manage some usable framerates,
given optimised blending/interpolation etc. algorithms. technically speaking,
xrender COULD get smart and learn to cache parts of drawables in system ram and
use dirty masks (or tiles) to know when they are invalidated, to speed this up
and avoid the read from video ram wherever possible - but this is going to be
very involved code.
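
a very rough sketch of the tile idea, just to show the shape of it. everything here is hypothetical: the shadow_cache struct, the function names and the 16 pixel tile size are invented for illustration, and the "vram" pointer is ordinary memory standing in for the framebuffer. callers are assumed to pass in-bounds coordinates and non-empty rectangles. real code would also have to hook every path that can write to the drawable, which is where it gets involved.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TILE 16  /* tile edge in pixels (arbitrary choice for this sketch) */

typedef struct {
    const uint32_t *vram;    /* stand-in for the slow-to-read framebuffer */
    uint32_t *shadow;        /* system-ram copy, same dimensions as vram */
    bool *stale;             /* one flag per tile: true = shadow out of date */
    int width, height;       /* surface size in pixels */
    int tiles_x, tiles_y;    /* surface size in tiles */
    long pixels_read_back;   /* counts pixels fetched from vram */
} shadow_cache;

/* Caller supplies all buffers; stale needs tiles_x * tiles_y entries. */
void cache_init(shadow_cache *c, const uint32_t *vram, uint32_t *shadow,
                bool *stale, int width, int height)
{
    c->vram = vram;
    c->shadow = shadow;
    c->stale = stale;
    c->width = width;
    c->height = height;
    c->tiles_x = (width + TILE - 1) / TILE;
    c->tiles_y = (height + TILE - 1) / TILE;
    c->pixels_read_back = 0;
    for (int i = 0; i < c->tiles_x * c->tiles_y; i++)
        stale[i] = true;     /* nothing has been fetched yet */
}

/* Call whenever something writes to this rectangle behind the cache's back. */
void cache_invalidate(shadow_cache *c, int x, int y, int w, int h)
{
    for (int ty = y / TILE; ty <= (y + h - 1) / TILE; ty++)
        for (int tx = x / TILE; tx <= (x + w - 1) / TILE; tx++)
            c->stale[ty * c->tiles_x + tx] = true;
}

/* Read one pixel, touching vram only if the containing tile is stale. */
uint32_t cache_read(shadow_cache *c, int x, int y)
{
    int tx = x / TILE, ty = y / TILE;
    bool *flag = &c->stale[ty * c->tiles_x + tx];

    if (*flag) {
        int x0 = tx * TILE, y0 = ty * TILE;
        int x1 = x0 + TILE < c->width ? x0 + TILE : c->width;
        int y1 = y0 + TILE < c->height ? y0 + TILE : c->height;

        /* Refresh the whole tile, one row at a time. */
        for (int row = y0; row < y1; row++)
            memcpy(&c->shadow[row * c->width + x0],
                   &c->vram[row * c->width + x0],
                   (size_t)(x1 - x0) * sizeof(uint32_t));
        c->pixels_read_back += (long)(x1 - x0) * (y1 - y0);
        *flag = false;
    }
    return c->shadow[y * c->width + x];
}
```

with something like this, blending over the same area twice only pays for the readback once, and a small change to the drawable only costs the tiles it touches.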

> Phrased another way, your assertion above:
> 
> > > > drawing shadows than fully hw accelerated X server with mga driver?
> 
> is that the mga driver is fully hardware accelerated.  It's not, it doesn't 
> accelerate Render in hardware.
> 
> - ajax
> 

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com
裸好多
Tokyo, Japan (東京 日本)