[Xorg] Faster render with gcc 3.4 mmx intrinsics
Soeren Sandmann
sandmann at daimi.au.dk
Thu Jul 8 10:01:45 PDT 2004
On bug 839
http://freedesktop.org/bugzilla/show_bug.cgi?id=839
there is a patch adding faster versions of some of the render ops.
A benchmark rendering a paragraph of component alpha text to a pixmap
gave these results on a 1200 MHz laptop with an i830 chip running
Fedora Core I:
Unmodified X server and the pixmap in system RAM:
[ssp@localhost x]$ ./a.out
total time: 41.394618
average rect time: 0.683200
worst rect: 9
average glyph time: 3.550500
With the MMX optimizations:
[ssp@localhost x]$ ./a.out
total time: 22.972553
average rect time: 0.677900
worst rect: 9
average glyph time: 1.692000
That is, text rendering is more than twice as fast. The 'average glyph
time' here is the time it takes to render the entire paragraph of
text.
With the pixmap in video RAM, the speedup is not quite as
spectacular:
Unmodified X server:
[ssp@localhost x]$ ./a.out
total time: 95.900768
average rect time: 0.003300
worst rect: 1
average glyph time: 9.693500
With MMXified compositing:
total time: 66.559287
average rect time: 0.015100
worst rect: 6
average glyph time: 6.720500
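For reference, the speedups implied by the numbers above, computed as the ratio of unmodified time to MMX time (the inputs are the reported figures, rounded):

```latex
\text{speedup} = \frac{t_{\text{unmodified}}}{t_{\text{MMX}}}

\text{system RAM:}\quad \frac{3.5505}{1.6920} \approx 2.10 \ \text{(glyphs)}, \qquad \frac{41.39}{22.97} \approx 1.80 \ \text{(total)}

\text{video RAM:}\quad \frac{9.6935}{6.7205} \approx 1.44 \ \text{(glyphs)}, \qquad \frac{95.90}{66.56} \approx 1.44 \ \text{(total)}
```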
But still a nice improvement. The patch includes improved code for
these cases:
Subpixel text:
- (constant color) in (component alpha mask) over 565 destination
- (constant color) in (component alpha mask) over 32bit destination
- (32 bit component alpha) Saturate (32 bit destination)
Regular antialiased text:
- (8 bit alpha) Saturate (8 bit destination)
- (constant color) in (8 bit alpha mask) over 565 destination
- (constant color) in (8 bit alpha mask) over 32bit destination
GdkPixbuf:
- (reversed, non-premultiplied source) over 32bit destination
- (reversed, non-premultiplied source) over 565 destination
Alpha rectangle (e.g., Nautilus selection rectangle):
- (constant color) over 32bit destination
- (constant color) over 565 destination
Solid fill:
- solid fill of 32 bit drawable
- solid fill of 16 bit drawable
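To illustrate the kind of arithmetic these fast paths vectorize, here is a minimal single-pixel sketch of one case from the list: (constant color) in (8 bit alpha mask) over 32bit destination. It is not the code from the patch. The helper names (`mul_un8`, `pix_multiply`, `expand_alpha`, `in_over_ref`, `in_over_mmx`) are hypothetical, and the sketch assumes premultiplied a8r8g8b8 pixels and the common rounded `a * b / 255` approximation. The MMX path is compiled only when `__MMX__` is defined; elsewhere only the scalar reference exists.

```c
/* Sketch only, not the patch's code: (constant color) IN (8-bit mask) OVER
   a 32-bit premultiplied ARGB destination, one pixel. Helper names are
   hypothetical. */
#include <stdint.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Rounded a * b / 255 for a, b in 0..255. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Scalar reference: per channel, out = s*m + d*(255 - alpha(s*m)), all /255. */
static uint32_t in_over_ref(uint32_t src, uint8_t m, uint32_t dst)
{
    uint8_t sa = mul_un8((uint8_t)(src >> 24), m);   /* alpha of (src IN mask) */
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = mul_un8((uint8_t)(src >> shift), m);
        uint32_t d = mul_un8((uint8_t)(dst >> shift), (uint8_t)(255 - sa));
        uint32_t c = s + d;
        out |= (c > 255 ? 255 : c) << shift;         /* saturate like packuswb */
    }
    return out;
}

#if defined(__MMX__)
/* Same rounded multiply on four 16-bit lanes at once. */
static __m64 pix_multiply(__m64 a, __m64 b)
{
    __m64 t = _mm_mullo_pi16(a, b);                  /* <= 65025, fits in 16 bits */
    t = _mm_adds_pu16(t, _mm_set1_pi16(0x80));
    t = _mm_adds_pu16(t, _mm_srli_pi16(t, 8));
    return _mm_srli_pi16(t, 8);
}

/* Copy the alpha lane (lane 3) into all four lanes with MMX unpacks only. */
static __m64 expand_alpha(__m64 x)
{
    __m64 t = _mm_unpackhi_pi16(x, x);               /* lanes: x2 x2 x3 x3 */
    return _mm_unpackhi_pi32(t, t);                  /* lanes: x3 x3 x3 x3 */
}

static uint32_t in_over_mmx(uint32_t src, uint8_t m, uint32_t dst)
{
    __m64 zero = _mm_setzero_si64();
    /* Widen each 8-bit channel to a 16-bit lane: lane 0 = blue ... lane 3 = alpha. */
    __m64 s = _mm_unpacklo_pi8(_mm_cvtsi32_si64((int)src), zero);
    __m64 d = _mm_unpacklo_pi8(_mm_cvtsi32_si64((int)dst), zero);

    s = pix_multiply(s, _mm_set1_pi16(m));                              /* src IN mask */
    __m64 inv_a = _mm_xor_si64(expand_alpha(s), _mm_set1_pi16(0xff));  /* 255 - alpha */
    __m64 r = _mm_adds_pu16(s, pix_multiply(d, inv_a));                 /* OVER */

    /* Pack back to bytes with unsigned saturation and take the low 32 bits. */
    uint32_t out = (uint32_t)_mm_cvtsi64_si32(_mm_packs_pu16(r, zero));
    _mm_empty();   /* leave MMX state before any x87 floating-point code */
    return out;
}
#endif
```

A real fast path would more likely loop over a scanline, unpack the constant source once outside the loop, leave pixels with a zero mask untouched, and call `_mm_empty()` once after the loop instead of per pixel.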
The code can optionally be compiled to use the pshufw instruction, which
is only available on Pentium III and newer processors.
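One place an optional pshufw path helps: compositing code repeatedly needs a pixel's alpha copied into all four 16-bit lanes. With pshufw (exposed by GCC as `_mm_shuffle_pi16` in `xmmintrin.h`) that is a single instruction; with plain MMX it takes a few shifts and ORs. The helper names below are hypothetical, and choosing the variant via the `__SSE__` macro is an assumption about how such a build switch could look, not how the patch does it.

```c
/* Sketch only: broadcast lane 3 (alpha) of four 16-bit lanes to all lanes.
   Helper names are hypothetical. */
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_shuffle_pi16 (pshufw); also includes mmintrin.h */
#elif defined(__MMX__)
#include <mmintrin.h>
#endif

/* Portable reference on a uint64_t holding four 16-bit lanes, lane 3 = alpha. */
static uint64_t expand_alpha_ref(uint64_t x)
{
    uint64_t a = x >> 48;
    return a | (a << 16) | (a << 32) | (a << 48);
}

#if defined(__MMX__)
/* MMX-only fallback: three shifts and two ORs. */
static __m64 expand_alpha_shift(__m64 x)
{
    __m64 t = _mm_srli_si64(x, 48);                 /* alpha -> lane 0, rest zero */
    t = _mm_or_si64(t, _mm_slli_si64(t, 16));       /* lanes 0-1 */
    return _mm_or_si64(t, _mm_slli_si64(t, 32));    /* lanes 0-3 */
}

/* Run a variant on a uint64_t so it can be compared with the reference. */
static uint64_t run_m64(__m64 (*f)(__m64), uint64_t x)
{
    __m64 v, r;
    uint64_t out;
    memcpy(&v, &x, sizeof v);
    r = f(v);
    memcpy(&out, &r, sizeof out);
    _mm_empty();   /* leave MMX state */
    return out;
}
#endif

#if defined(__SSE__)
/* pshufw: _MM_SHUFFLE(3, 3, 3, 3) selects source lane 3 for every lane. */
static __m64 expand_alpha_pshufw(__m64 x)
{
    return _mm_shuffle_pi16(x, _MM_SHUFFLE(3, 3, 3, 3));
}
#endif
```

The shift version works on any MMX CPU; the pshufw version saves instructions in an inner loop but requires the Pentium III instruction set, hence a compile-time option rather than the default.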
One question: the patch has a bad hack where it redefines
DefaultCCOptions for all of the framebuffer code. How should this be
done properly? The problem with the existing DefaultCCOptions is that
they include -pedantic, which doesn't work with the MMX intrinsics.
Søren