Initial attempts at i965 text batching
Carl Worth
cworth at cworth.org
Wed Dec 19 08:12:02 PST 2007
So a long time ago I reported that with my i965 I could get about
290,000 glyphs/sec. from "x11perf -aa10text" by using the NoAccel
option to the X server, and similar performance with XAA, but only
95,000 glyphs/sec. with EXA due to the synchronous compositing bug in
the driver.
Since then, Dave Airlie rewrote the driver to use batch buffers,
which completely eliminated all of the syncs. By design, his work was
"functional, not performant" as it would go through all the effort of
allocating a new batch buffer, initializing all device state, and
emitting the batch for every compositing operation.
Needless to say, that's more work than we really want to do, and it
showed: performance fell to the range of 1,000 to 10,000
glyphs/sec.
Since then, I've rewritten parts of the driver to attempt to take
advantage of the batch buffers by actually batching up as much as
possible. General device state is only initialized once, then
surface-specific state is initialized on a per-batch basis within a
buffer object.
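To illustrate the structure described above, here is a minimal sketch in C. It is not the driver code: the struct, the function names, and MAX_OPS_PER_BATCH are all hypothetical, and the "state setup" is reduced to counters so the control flow is visible. It assumes general state is set up once, surface state once per batch, and that a full batch is flushed before more operations are queued.

```c
#include <assert.h>

/* Hypothetical capacity; stands in for whatever BATCH_SZ allows. */
#define MAX_OPS_PER_BATCH 16

struct batcher {
    int general_state_ready; /* nonzero once general device state is set up */
    int ops_in_batch;        /* operations queued in the current batch */
    int general_inits;       /* counters, for illustration only */
    int surface_inits;
    int flushes;
};

/* Submit the current batch, if it holds any operations. */
static void batcher_flush(struct batcher *b)
{
    if (b->ops_in_batch == 0)
        return;
    b->flushes++;
    b->ops_in_batch = 0;
}

/* Queue one compositing operation, flushing when the batch is full. */
static void batcher_composite(struct batcher *b)
{
    /* General device state: initialized once, then reused. */
    if (!b->general_state_ready) {
        b->general_inits++;
        b->general_state_ready = 1;
    }

    /* Surface-specific state: set up once at the start of each batch. */
    if (b->ops_in_batch == 0)
        b->surface_inits++;

    /* Explicit capacity check, so an overfull batch fails loudly. */
    assert(b->ops_in_batch < MAX_OPS_PER_BATCH);
    b->ops_in_batch++;

    if (b->ops_in_batch == MAX_OPS_PER_BATCH)
        batcher_flush(b);
}

/* Queue n_ops operations, then flush whatever is left over. */
static void batcher_run(struct batcher *b, int n_ops)
{
    for (int i = 0; i < n_ops; i++)
        batcher_composite(b);
    batcher_flush(b);
}
```

With a zero-initialized struct batcher, batcher_run(&b, 40) performs one general-state setup and three flushes (16 + 16 + 8 operations), with one surface-state setup per flush.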
My work is available in the master branch of my personal
xf86-video-intel repository:
http://cgit.freedesktop.org/~cworth/xf86-video-intel/log/
This work required some changes to the drm interface which Dave kindly
provided here, (in a hacked form---a cleaner version merged together
with Keith's recent work will come soon):
http://cgit.freedesktop.org/~airlied/drm/log/?h=i965-hack-drm
So both of those are required for anyone that wants to experiment with
this.
As for performance, batching seems to help a lot initially, but we hit
a ceiling sooner than I would like:
	Ops/batch    Glyphs/sec.
	---------    -----------
	        1         10,000
	        2         20,000
	        4         37,000
	        8         67,000
	       16        110,000
	       32        120,000
	       64        120,000
	      128        120,000
For people that saw an earlier version of this table, I should
explain two differences:
	* Earlier, it stopped at 64 since it started crashing after
	  that. This was easy to work around by increasing the BATCH_SZ
	  value. Clearly there's some missing error-checking around
	  that value.

	* Previously, it looked like things kept improving all the way
	  to 64 ops/batch. That was because that version was
	  unconditionally allocating a maximally large buffer object
	  for the surface state, (so the allocation overhead hit
	  every case). Here, the surface state buffer object is
	  allocated at the appropriate size, so the smaller batching
	  cases improve and we hit the 120,000 glyphs/sec. ceiling
	  earlier.
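To put numbers on the ceiling, the measurements in the table can be reduced to a speedup over the unbatched case and a fraction of ideal linear scaling. The sketch below only does arithmetic on the figures reported above; the function names are made up for illustration and nothing here models why the ceiling exists.

```c
#include <stddef.h>

/* Measurements copied from the table in this post. */
static const int ops_per_batch[] = {1, 2, 4, 8, 16, 32, 64, 128};
static const double glyphs_per_sec[] = {
    10000, 20000, 37000, 67000, 110000, 120000, 120000, 120000
};
#define N_ROWS (sizeof(ops_per_batch) / sizeof(ops_per_batch[0]))

/* Throughput of row i relative to the 1 op/batch row. */
static double speedup(size_t i)
{
    return glyphs_per_sec[i] / glyphs_per_sec[0];
}

/* Fraction of ideal scaling: 1.0 would mean N ops/batch is N times faster. */
static double efficiency(size_t i)
{
    return speedup(i) / ops_per_batch[i];
}

/* Smallest batch size at which the maximum measured throughput is reached. */
static int ceiling_ops_per_batch(void)
{
    double max = glyphs_per_sec[0];
    size_t first = 0;

    for (size_t i = 1; i < N_ROWS; i++) {
        if (glyphs_per_sec[i] > max) {
            max = glyphs_per_sec[i];
            first = i;
        }
    }
    return ops_per_batch[first];
}
```

By this measure, 16 ops/batch gives an 11x speedup (about 69% of linear), while 32 ops/batch gives 12x (37.5% of linear), which is where the measured throughput stops improving.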
I'll be looking into why things aren't faster than this, but first
I'll need to get oprofile working on my system again.[*]
-Carl
[*] Right now opreport is complaining with:
opreport: error while loading shared libraries: libbfd-2.18.so: cannot
open shared object file: No such file or directory
Does that mean anything to anybody? I'm doing a general system upgrade
now to see if that helps.