Intel ( i845G ) profiling

Eric Anholt eric at anholt.net
Wed Mar 5 11:04:59 PST 2008


On Wed, 2008-03-05 at 15:44 +1100, Daniel Kasak wrote:
> Some discussion regarding EXA performance for my graphics chip motivated
> me to do some profiling. 
> 
> I've used 2 different apps to benchmark, expedite ( an evas-based
> benchmark from the Enlightenment team ) and an internal application
> which uses Gtk2. I tested with a couple of different driver setups
> ( exa, xaa, xaa with no-offscreen-pixmaps ). I also tested some
> combinations with a compositing manager named ecomorph, which is a port
> of compiz to run on Enlightenment. Profiling was done with sysprof from
> svn.
> 
> For the expedite tests, I ran the full auto test, with various renderers
> ( software, xrender, gl ). I kept the output of each expedite run ( see
> the .txt files in the profile tarball ). Quite counter-intuitively,
> expedite gives EXA a better score overall than XAA. I'm not sure what to
> say about that. Certainly for some benchmarks it scores lower. I suppose
> Gtk2 makes most use of those features which EXA doesn't score so well at
> in the expedite tests? Or something like that. At any rate, all of the
> apps that I use are Gtk2 apps ( other than Enlightenment ).
> 
> For the internal application ( see
> http://entropy.homelinux.org/axis/images/client_services.png for a
> screenshot ), I loaded it up, moved to a client, and then clicked on
> each of the pages. I waited until the page finished rendering, then I
> clicked on the next page. With XAA, each page will render in just under
> half a second, which isn't too bad. With EXA, each page takes between 1
> and 2 seconds, which is quite noticeably slower than XAA, and a little
> painful to watch. Whack a compositing manager on top, and it gets worse
> still ( i.e. completely unusable, whereas with XAA it's OK ... not
> brilliant, but OK ).
> 
> A tarball of the sysprof profiles and expedite output can be found at:
> http://entropy.homelinux.org/intel_benchmarking.tar.bz2
> 
> Software setup - all these are compiled with gcc-4.1.2 with
> CFLAGS="-march=pentium4 -g -O2 -pipe -ftracer -fweb" ...
>  - mesa-7.0.2
>  - xorg-server-1.4.0.90
>  - xf86-video-i810-2.2.0.90
>  - libpixman-0.1.6
>  - glibc-2.7
> 
> If someone wants me to profile something different, please feel free to
> ask :)

Excellent, this is what I was hoping someone would do.  So, from your
profile, you're spending 67% of CPU time memcpying data out for software
fallbacks from the composite path.  I'm betting TTM will improve that
somewhat (you'd be avoiding an uncached copy in favor of cached access,
though there's clflushing and chipset flushing going on to make that
possible).

But the real problem is that you're hitting the software fallback path
at all.  Looking over the code, it looks like we're falling back on
rendering from a8 textures because you're on 830 or 845.  That would be
all non-subpixel text.  Ouch.  You could remove the #ifdef I830DEBUG
guard around #define DEBUG_I830FALLBACK to confirm that that's where the
majority of your fallbacks are occurring.
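
To illustrate the kind of debug switch meant here, a rough standalone
sketch follows.  It is not the driver's code: FALLBACK, DEBUG_FALLBACK,
check_texture, fallback_count and the two enums are made-up names for
illustration, and the real macro and format checks may look different.
The idea is just that every early-out goes through one macro, so
defining the debug symbol unconditionally makes each fallback announce
its reason.

```c
#include <stdio.h>

/* Hypothetical stand-in for the driver's debug symbol.  Defining it
 * unconditionally (instead of behind another #ifdef) turns logging on. */
#define DEBUG_FALLBACK 1

/* Illustration only: counts how many times a fallback was taken. */
static int fallback_count = 0;

#ifdef DEBUG_FALLBACK
/* Log the reason, then bail out of the calling function with 0 (false).
 * The first argument must be a string literal format. */
#define FALLBACK(...)                                      \
    do {                                                   \
        fallback_count++;                                  \
        fprintf(stderr, "EXA fallback: " __VA_ARGS__);     \
        fputc('\n', stderr);                               \
        return 0;                                          \
    } while (0)
#else
/* Same control flow, no logging. */
#define FALLBACK(...) do { return 0; } while (0)
#endif

/* Made-up enums; only the a8-on-845 case from this thread is modelled. */
enum chipset { CHIP_I845, CHIP_OTHER };
enum pict_format { FMT_A8R8G8B8, FMT_A8 };

/* Returns 1 if the texture can be accelerated, 0 if we must fall back. */
static int check_texture(enum chipset chip, enum pict_format fmt)
{
    if (fmt == FMT_A8 && chip == CHIP_I845)
        FALLBACK("a8 texture unsupported on chipset %d", (int)chip);
    return 1;
}
```

With logging enabled, a run of the Gtk2 application should show which
reason string dominates the output; if it is the a8 one, that confirms
the guess above.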

If so, we're falling back because the A8 format (0,0,0,a) is
unsupported on that chipset.  But we've got the I8 format, which
produces (a,a,a,a), and if we're using the a8 picture as a
non-component-alpha mask, we'll only use that fourth component.  For a
more general implementation, those color channels could be treated as
zero by using ARGx_SEL_ONE | ARGx_INVERT when reading them instead of
ARGx_SEL_TEXELy.
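
The arithmetic behind that can be sketched in plain C.  This is a
software model only, not hardware programming: texel, sample_a8,
sample_i8, mul8, in_mask and i8_as_a8 are hypothetical helpers, and the
"one inverted" step merely mimics what ARGx_SEL_ONE | ARGx_INVERT is
described as doing above.  It shows that a non-component-alpha mask
gives the same result whether it is sampled as (0,0,0,a) or (a,a,a,a),
and that forcing the color channels to zero recovers A8 from I8.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } texel;

/* A8 sampling as described in the thread: color reads as zero. */
static texel sample_a8(uint8_t v) { return (texel){0, 0, 0, v}; }

/* I8 sampling as described in the thread: value in every channel. */
static texel sample_i8(uint8_t v) { return (texel){v, v, v, v}; }

/* Rounded 8-bit multiply, approximating a * b / 255. */
static uint8_t mul8(uint8_t a, uint8_t b)
{
    unsigned t = (unsigned)a * b + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Non-component-alpha masking: every source channel is scaled by the
 * mask's alpha only, so the mask's color channels never matter. */
static texel in_mask(texel src, texel mask)
{
    return (texel){ mul8(src.r, mask.a), mul8(src.g, mask.a),
                    mul8(src.b, mask.a), mul8(src.a, mask.a) };
}

/* Model of the more general approach: take constant one, invert it to
 * get zero for the color channels, and keep alpha from the texel. */
static texel i8_as_a8(texel t)
{
    uint8_t one = 0xff;
    uint8_t zero = (uint8_t)~one;
    return (texel){ zero, zero, zero, t.a };
}
```

For any source pixel s and mask value v, in_mask(s, sample_a8(v)) and
in_mask(s, sample_i8(v)) come out identical, which is why the I8
substitution is safe in the mask case.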

-- 
Eric Anholt                             anholt at FreeBSD.org
eric at anholt.net                         eric.anholt at intel.com
