[Xorg-driver-geode] Glyph rendering

Mon Jul 19 23:25:08 PDT 2010

On Tue, 2010-07-20 at 10:52 +0800, Huang, FrankR wrote:
> 
> Can we have the X server give us a PICT_a8r8g8b8 source and destination
> for the PictOpAdd?

No, the source will be PICT_a8.

> On Tue, 2010-07-20 at 04:32 +0300, Jonathan Morton wrote:
> > On 19 July 2010 12:35, Huang, FrankR <FrankR.Huang at amd.com> wrote:
> > > I found that our HW cannot support a PICT_a8 destination; it only supports a PICT_r3g3b2 destination. Can you use PICT_r3g3b2? In my experiment, the result was not correct. I suspect the HW splits the 8-bit value into 3, 3 and 2 bits and does the PictOpAdd on each field separately, instead of on the whole 8-bit value.
> > > Or should we do it in software (as I wrote in the driver)?
> > 
> > You have better information on the hardware than I do.  I can only
> > tell you what needs to be done.
> > 
> > For this particular purpose you can treat A8 as being equivalent to
> > 8-bit greyscale.  I don't know if that helps you.
> > 
> > If there is no way to make Geode do the A8 addition in one pass, then
> > it is probably better to do it in software.  This is trivially
> > achieved by rejecting the operation and allowing Pixman to take it
> > over.  A recent enough version of Pixman will automatically take
> > advantage of MMX or SSE2 if available (I don't know if your hardware
> > has it).
> 
> This means 5000 glyphs per second on x11perf -aa10text instead of 56500
> if fully accelerated (Frank tested with 8bpp color, which obviously gave
> the wrong result but was fast), or 20-28k if we fall back for the other
> glyph operations as well, to avoid pixmap migration ping-pong.

Yeah, basically exaGlyphs() is only expected to be a win if all of its
steps can be accelerated.

> Could we somehow do it via ARGB with perhaps some necessary tricks
> regarding pitch/width, as the end result would be the same if it were
> a8r8g8b8 or a8a8a8a8?

I brought up this idea on IRC, but after more thought I'm afraid it's
not feasible, as the Check/PrepareComposite hooks don't know the widths
used for the subsequent Composite hook calls.

> Or can we do this operation with pixman in video memory, so the other
> operations don't need any pixmap migration? After all, it's shared
> memory. Sort of UMA, but not fully in the way e.g. Intel has it (I
> believe the destinations of hardware operations have to be in the reserved
> video memory space, but Frank should know more about the hardware
> limitations in that area by now). Of course, if that were the case, maybe
> we could "hardware accelerate" everything in video memory and cheat by
> handing pixman video memory pointers (video memory being just a reserved
> area of system memory) whenever the hardware can't do an operation
> itself, avoiding pixmap migration entirely, given enough offscreen video
> memory...

You could experiment with using the 'driver' or maybe 'mixed' scheme
instead of 'classic', but beware that migration isn't the only reason
for software rendering to kill performance: It requires synchronisation
between the GPU and CPU, which incurs overhead and prevents GPU
pipelining.

-- 
Earthling Michel Dänzer           |                http://www.vmware.com
Libre software enthusiast         |          Debian, X and DRI developer