[Xorg] Faster render with gcc 3.4 mmx intrinsics

Thu Jul 8 10:01:45 PDT 2004

On bug 839

        http://freedesktop.org/bugzilla/show_bug.cgi?id=839

there is a patch adding faster versions of some of the render ops.

A benchmark rendering a paragraph of component alpha text to a pixmap
gave these results on a 1200 MHz laptop with an i830 chip running
Fedora Core 1:

Unmodified X server and the pixmap in system RAM:

        [ssp@localhost x]$ ./a.out
        total time: 41.394618
        average rect time: 0.683200
        worst rect: 9
        average glyph time: 3.550500

with the MMX optimizations:

        [ssp@localhost x]$ ./a.out
        total time: 22.972553
        average rect time: 0.677900
        worst rect: 9
        average glyph time: 1.692000

I.e., text rendering is more than twice as fast (3.55 vs. 1.69). The
'average glyph time' here is the time it takes to render the entire
paragraph of text.

With the pixmap in video RAM, the speedup is not quite as
spectacular:

Unmodified X server:

        [ssp@localhost x]$ ./a.out
        total time: 95.900768
        average rect time: 0.003300
        worst rect: 1
        average glyph time: 9.693500

With MMXified compositing:

        total time: 66.559287
        average rect time: 0.015100
        worst rect: 6
        average glyph time: 6.720500

That is still a nice improvement, roughly 1.4 times faster (9.69
vs. 6.72). The patch includes improved code for these cases:

Subpixel text:
- (constant color) in (component alpha mask) over 565 destination
- (constant color) in (component alpha mask) over 32 bit destination
- (32 bit component alpha) Saturate (32 bit destination)

Regular antialiased text:
- (8 bit alpha) Saturate (8 bit destination)
- (constant color) in (8 bit alpha mask) over 565 destination
- (constant color) in (8 bit alpha mask) over 32 bit destination

GdkPixbuf:
- (reversed, non-premultiplied source) over 32 bit destination
- (reversed, non-premultiplied source) over 565 destination

Alpha rectangle (e.g., Nautilus selection rectangle):
- (constant color) over 32 bit destination
- (constant color) over 565 destination

Solid fill:
- solid fill of 32 bit drawable
- solid fill of 16 bit drawable
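
For readers who want to see what one of these cases computes, here is
a plain scalar sketch of "(constant color) over 32 bit destination"
and its 565 variant. It is not the code from the patch and uses no MMX;
it assumes premultiplied ARGB32 pixels, and every function name in it
(mul_un8, over_8888, over_0565 and so on) is made up for illustration.
The MMX versions would do the same arithmetic on several channels at
once.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounded (a * b) / 255 for a, b in 0..255, without a division. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* src OVER dest for one premultiplied ARGB32 pixel:
 * each channel becomes src + dest * (255 - src_alpha) / 255. */
static uint32_t over_8888(uint32_t src, uint32_t dest)
{
    uint8_t inv_alpha = (uint8_t)(255 - (src >> 24));
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = mul_un8((uint8_t)(dest >> shift), inv_alpha);
        uint32_t sum = s + d;
        if (sum > 255)          /* clamp in case src is not premultiplied */
            sum = 255;
        result |= sum << shift;
    }
    return result;
}

/* Expand r5g6b5 to opaque ARGB32 by replicating the high bits. */
static uint32_t expand_0565(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* Truncate ARGB32 back to r5g6b5, dropping the alpha channel. */
static uint16_t pack_0565(uint32_t p)
{
    return (uint16_t)((((p >> 19) & 0x1f) << 11) |
                      (((p >> 10) & 0x3f) << 5) |
                      ((p >> 3) & 0x1f));
}

/* Same operator with a 565 destination pixel. */
static uint16_t over_0565(uint32_t src, uint16_t dest)
{
    return pack_0565(over_8888(src, expand_0565(dest)));
}

/* Composite one constant color over a row of n 32 bit pixels. */
static void solid_over_8888(uint32_t src, uint32_t *dest, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = over_8888(src, dest[i]);
}
```

For example, over_8888(0x80000000, 0xffffffff) puts half-transparent
black over opaque white and yields 0xff7f7f7f.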

The code can optionally be compiled to use the pshufw instruction, which
is only available on Pentium III and later processors.

One question: The patch has a bad hack where it redefines
DefaultCCOptions for all of the framebuffer code. How should this be
done properly? The problem with the existing DefaultCCOptions is that
they include -pedantic, which doesn't work with the MMX intrinsics.

Søren