Solved! (was Re: client-side font rendering very very slow in xserver 1.5.3 w/r200: massive fetches from VRAM, why?)

Nix nix at
Wed Feb 4 16:29:38 PST 2009

On 3 Feb 2009, Dan Nicholson uttered the following:
> The output isn't quite what I'd expect, but I think this is because
> it's using the builtin fonts only. Try rebuilding the server with
> --disable-builtin-fonts, or apply this patch that's a candidate for

Augh! Blasted --disable-builtin-fonts, why isn't it the default? (More
to the point, where did the option go? I was passing it to configure
when I built 1.5.x; did it edit itself out of my configure-switches
file? ;) )

With that done, I've been able to get more crude not-a-benchmark
results, and they are *vastly* better. Here are the 1.5 and 1.6 results,
with a bit of extra info showing how much time the pathological
megascrolling app (cat in a tight loop) gets to run rather than being
blocked by its scrolling xterm. This is really crude, because the fonts
I'm using are of rather different sizes, but it gives you the gist (I've
used piles of stars to give a visual cue as well):

XAA first, then EXA, leaving the best till last:

1.5: XAA, 16,     AA: fbCompositeSolidMask_nx8888x0565Cmmx 59.25 (X: 90.12)
1.5: XAA, 24,     AA: fbCompositeSolidMask_nx8888x8888Cmmx 57.12 (X: 85.03)
1.6: XAA, 16,     AA: fbCompositeSolidMask_nx8888x0565Cmmx 58.41 (X: 84.70)
                      speed of underlying painting app on arbitrary scale
                      (forks of cat per second): 10
1.6: XAA, 24,     AA: fbCompositeSolidMask_nx8888x8888Cmmx 58.14 (X: 85.27)
                      App speed: 10

1.5: XAA, 16, non-AA: fbFetch_a1 12.43 (X: 79.96)
1.5: XAA, 24, non-AA: fbFetch_a1 14.51 (X: 87.60)
1.6: XAA, 16, non-AA: fbFetch_a1 14.34 (X: 80.23); App speed: 7
1.6: XAA, 24, non-AA: fbFetch_a1 12.23 (X: 82.12); App speed: 15

1.5: XAA, 16,   core: cat, bash, xterm; CPU load nearly nil; screen a blur
                      far too fast to read
                      highest consumer in X, at <1s, DrawTETextScanlineWidth7()
1.5: XAA, 24,   core: as above
1.6: XAA, 16,   core: highest consumer in X, at 1.5s, DrawTETextScanlineWidth7()
                      App speed: 150
1.6: XAA, 24,   core: as above; highest consumer in X, at 1.63s,
                      DrawTETextScanlineWidth7(). App speed: 140

So far, so mostly unchanged and boring: XAA hasn't changed much. The good stuff,
as Michel suggested, is in EXA.

1.5: EXA, 16,     AA: dixLookupPrivate 23.17
                      generally much faster than XAA, occasionally degrades to
                      XAA speed
1.5: EXA, 24,     AA: dixLookupPrivate 26.83
1.6: EXA, 16,     AA: exaBufferGlyph 5.74 (X: 49.42); xterm: 9.48; kernel: 20.10
1.6: EXA, 24,     AA: exaBufferGlyph 4.82 (X: 57.88); xterm: 7.09; kernel: 33.5
                      App speed: 50, considerably slower than 16bpp. X seems to
                      be spending longer in the kernel.

1.5: EXA, 16, non-AA: fbFetch_r5g6b5 53.40, fbFetch_a1 5.75 (X: 95.88)
                      horrendously, impossibly slow, >10s for a single screen
1.5: EXA, 24, non-AA: fbFetch_a1 12.40 (X: 89.34)
1.6: EXA, 16, non-AA: exaBufferGlyph 6.41 (X: 49.89); xterm: 28.91; kernel: 17
                      App speed: 80.
1.6: EXA, 24, non-AA: exaBufferGlyph 6.31 (X: 52.58); xterm: 26.35;
                      kernel: 20.81. App speed: 50.

1.5: EXA, 16,   core: pixman_fill_mmx 22.17, fbGlyph16 15.83 (X: 62.58)
1.5: EXA, 24,   core: pixman_fill_mmx 37.51, fbGlyph32 13.63 (X: 69.48)
1.6: EXA, 16,   core: pixman_fill_mmx 21.28, fbGlyph16 15.71 (X: 59.59).
                      App speed: 60, slower than non-core fonts, as expected,
                      and much slower than XAA, as expected. Figures basically
                      identical to 1.5 core fonts.
1.6: EXA, 24,   core: pixman_fill_mmx 30.17, fbGlyph32 11.69 (X: 65.14)
                      App speed: 40.
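The "app speed" figures count runs of cat while its output scrolls through the terminal under test, so a slow terminal blocks cat's writes and lowers the number. The post doesn't say exactly how the rate was taken, so the following is only a sketch of that kind of measurement, assuming the measured loop simply re-runs `cat` on a file; the function name and parameters are hypothetical.

```python
import subprocess
import sys
import time


def cat_forks_per_second(path, duration=5.0, stdout=None):
    """Re-run `cat path` for `duration` seconds; return completed runs per second.

    Hypothetical helper. With stdout=None the child inherits this process's
    stdout, so inside a terminal emulator every byte must be painted by that
    terminal, and a slow terminal blocks cat's writes and lowers the rate.
    """
    runs = 0
    start = time.monotonic()
    deadline = start + duration
    while time.monotonic() < deadline:
        # Each iteration forks and execs a fresh cat, as a shell loop would.
        subprocess.run(["cat", path], stdout=stdout, check=True)
        runs += 1
    elapsed = time.monotonic() - start
    return runs / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    rate = cat_forks_per_second(sys.argv[1])
    # Report on stderr so the summary isn't lost in the scrolling output.
    print(f"{rate:.1f} cat runs per second", file=sys.stderr)
```

Run it from inside the terminal being measured, with a reasonably large text file as its argument; passing `stdout=subprocess.DEVNULL` instead gives a baseline with no terminal painting involved.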

The changes between EXA and XAA are interesting. Core fonts are somewhat
slower; 24bpp has moved from being faster than 16bpp to being slower
than it; but the most significant change from my perspective is that the
crude little 4-entry glyph cache has raised the performance of font
rendering by about an order of magnitude. That one change has made my
desktop seem fast again, as *everything* that does a lot of client-side
text painting has just got a lot faster, which is to say virtually
everything I use. (The performance reduction for non-antialiased
client-side fonts is probably down to xterm: with konsole, they're
equally fast.)
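Why a small glyph cache helps so much for text-heavy clients can be shown with a toy model: a fixed-capacity cache of rendered glyph images keyed by glyph ID, where only a miss pays the expensive rasterize-and-upload cost. This is a hypothetical sketch using plain LRU eviction. It is not EXA's implementation, and the class, method, and parameter names are illustrative only.

```python
from collections import OrderedDict


class GlyphCache:
    """Toy fixed-capacity glyph cache with LRU eviction (hypothetical model).

    It only counts hits and misses; a real server-side cache would keep the
    glyph images in offscreen video memory so a repeat draw becomes a cheap
    composite from the cache instead of a fresh upload.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # glyph id -> cached image
        self.hits = 0
        self.misses = 0

    def lookup(self, glyph_id, rasterize):
        """Return the image for glyph_id, calling rasterize only on a miss."""
        if glyph_id in self._entries:
            self._entries.move_to_end(glyph_id)  # mark as most recently used
            self.hits += 1
            return self._entries[glyph_id]
        self.misses += 1
        image = rasterize(glyph_id)  # stands in for the expensive path
        self._entries[glyph_id] = image
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return image


def draw_text(cache, text):
    """Draw a string one glyph at a time through the cache."""
    for ch in text:
        cache.lookup(ch, rasterize=lambda g: f"bitmap<{g}>")
```

Terminal text draws from a small set of glyphs, so nearly every lookup after warm-up is a hit: drawing `"hello world\n" * 1000` through `GlyphCache(256)` misses only 9 times (once per distinct character) and hits 11991 times.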

So it looks like I'm going to be using prerelease X servers, and I owe
Owen Taylor a beer for implementing the glyph cache and Michel Dänzer
another beer for making it work with non-antialiased fonts :)

(The only downside of prerelease X servers: dropping XFree86-Misc
means that xset can no longer set the keyboard repeat rate. KDE et
al. can still do it, though. I suppose xset needs fixing, if anyone
but me still uses it.)
