Overhead of X?

Carsten Haitzler (The Rasterman) raster at rasterman.com
Thu Apr 20 17:01:45 PDT 2006


On Thu, 20 Apr 2006 18:29:12 +0200 Egbert Eich <eich at suse.de> babbled:

> Bilderbeek, Manuel writes:
>  > Hi,
>  > 
>  > > X does a little more than just a memset() of the entire framebuffer.
>  > > Filling the entire framebuffer with just one solid color is nothing
>  > > that the solid fill code in fb (that's what you are using
>  > > unaccelerated) is really optimized for. Therefore it will perform
>  > > some tests and operations that are not required for this particular
>  > > operation.
>  > 
>  > Would all those tests take so much CPU that it makes it 3 times slower?
>  > So, 2/3 of the time is used for calculations and checks?
>  > After all, the fill code is just supposed to fill a solid rectangle, so
>  > the filling itself should be comparable. Or not?
> 
> 
> memset() is highly optimized to fill one single large range of 
> memory with a single byte value.

you may also find it is faster to memset to 0 than to other values (there is a
special case for 0 value fills). you also might find that memset isn't always
that optimised: it varies from libc to libc (and with patches, compile
options, distribution, etc.). also remember that x will likely be filling in
pixel by pixel. if you are running 8 or 16bpp then x will most likely be
writing 1 or 2 bytes at a time, while memset may be filling in runs of 4 or 8
or even more bytes in 1 instruction. and as has been mentioned, solid color
fills with the cpu are not one of the most common cases, as they are most often
hardware accelerated. i actually spent a bit of time on this myself (for my own
software routines) and found it interesting to compare memset against the
myriad of ways you can fill memory and get differing performance. the test
suite i have gave these results:

[000]                      copy_color_dst-argb           C: 374.824 mpix/sec
[001]                      copy_color_dst-argb          OS: 350.372 mpix/sec
[002]                      copy_color_dst-argb MMX/ALTIVEC: 375.546 mpix/sec
[003]                      copy_color_dst-argb         SSE: 970.894 mpix/sec
[004]                      copy_color_dst-argb        SSE2: 853.097 mpix/sec

(where a pixel is 32bit ARGB). the "OS" variant is memset in this case. you
will find the results vary from cpu to cpu - this is a p4; on an amd64 in 64bit
mode it's quite different. the C routine was simply a for loop writing 32bits
at a time (one scanline at a time). so note above that with some handcrafted
assembly (the SSE variant here) you can get roughly 2.6 times the performance
of a loop doing it 1 pixel (32bits) at a time in plain "C". what you are seeing
is likely some combination of an optimised libc memset(), x simply doing things
1 pixel at a time, and maybe your bit depth being low.
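
to make the comparison above concrete, here is a minimal sketch of the kind of routines being compared. it is not the code from my test suite or from the x server's fb layer: the function names (fill_argb32, fill_argb32_memset, fill_rgb16_pairs) and the stride_px parameter are made up for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain C fill: one 32-bit ARGB pixel per store, one scanline at a time.
 * stride_px (hypothetical parameter) is the distance between the starts
 * of two consecutive scanlines, measured in pixels. */
static void fill_argb32(uint32_t *dst, size_t stride_px,
                        size_t w, size_t h, uint32_t color)
{
    for (size_t y = 0; y < h; y++) {
        uint32_t *row = dst + y * stride_px;
        for (size_t x = 0; x < w; x++)
            row[x] = color;
    }
}

/* memset() variant. memset writes a single byte value, so this only works
 * when all four bytes of the colour are identical (0x00000000, 0xffffffff,
 * ...). Returns 1 if the fill was done, 0 if the colour cannot be
 * expressed as a byte fill. */
static int fill_argb32_memset(uint32_t *dst, size_t stride_px,
                              size_t w, size_t h, uint32_t color)
{
    uint8_t b = (uint8_t)(color & 0xffu);
    if (color != (uint32_t)b * 0x01010101u)
        return 0;
    for (size_t y = 0; y < h; y++)
        memset(dst + y * stride_px, b, w * sizeof(uint32_t));
    return 1;
}

/* 16bpp fill that stores two pixels per 32-bit write instead of one
 * pixel per 16-bit write. memcpy of a fixed 4-byte value avoids alignment
 * and aliasing problems; both halves of "pair" are the same pixel, so
 * byte order does not matter. */
static void fill_rgb16_pairs(uint16_t *dst, size_t n, uint16_t color)
{
    uint32_t pair = ((uint32_t)color << 16) | color;
    size_t i = 0;
    for (; i + 2 <= n; i += 2)
        memcpy(dst + i, &pair, sizeof pair);
    if (i < n)              /* odd pixel left over at the end */
        dst[i] = color;
}
```

note that the memset variant only applies to a small set of colours, which is part of why a general fill routine cannot simply call memset. how these compare in speed depends on the cpu, compiler and libc, so time them on your own setup rather than assuming.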

> Some implementations use hand crafted machine code for this, and
> select the right code sequence for the size and alignment of
> the memory. 
> XDrawRectangle() cannot use a lot of these optimizations, or it would
> be useless to add them just for your corner case.
> Maybe you want to profile your problem a little better. This
> will tell you where the CPU cycles go - instead of letting us
> guess.
> 
>  > > Furthermore memset() is usually highly optimized for your 
>  > > architecture.
>  > 
>  > If the fb fill routine would use memset, would that matter? (I don't
>  > know if it does that, but I don't see why it couldn't...)
> 
> It's not feasible as it will only be applicable to a few corner cases
> - like yours.
> Not very many people are interested in a solid color full screen
> rectangle.
> 
>  >  
>  > > This seems to be a poor comparison.
>  > 
>  > How can I make it better?
> 
> Maybe you should look for a more useful, less corner-case test scenario?
> 
>  > 
>  > Maybe running the application in the root window might help? (Could save
>  > some checks!)
>  > How would I do that, anyway? (Run a Java app in the root window...)
> 
> Well, even if you paint right into the root window your operations will
> still observe clipping by children.
> 
> Cheers,
> 	Egbert.
> _______________________________________________
> xorg mailing list
> xorg at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/xorg
> 


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com
裸好多
Tokyo, Japan (東京 日本)


