[cairo] [PATCH] SSE2 support for pixman (v2)

Mon Mar 17 08:16:00 PDT 2008

On Mon, Mar 17, 2008 at 11:57 AM, André Tupinambá <andrelrt at gmail.com>
wrote:

> Hi Rodrigo,
>
> > Did you see why there are some big performance regressions between
> > perf-mmx-base-run4 and  perf-sse2-run4?
> >
> > With cairo-perf-diff there are a few cases that are quite serious:
>
> Do you want to see something quite curious? Try to compare
> perf-mmx-base-run1 and perf-mmx-base-run3 :)
>

Running cairo-perf with nice -20 and -i 500 or -i 1000 did the trick of
giving me stable numbers, but others more experienced with this could give
us further advice.

> > Overall, I found that sse is not that much of a help for a Core 2 cpu,
> > that can sustain the same memory bandwidth with mmx code. The same
> > cannot be said for other models such as the P4, which gets a pretty
> > good speedup.
>
> It sounds strange. The performance on a Core 2 machine should
> increase too. The MMX code loads a pixel, does the transformation and
> saves a pixel. The SSE2 code loads 4 pixels, does 4 transformations
> at the same time and saves 4 pixels.

It's not that strange if you think about it from the memory fetching
perspective. Both the mmx code and the sse code will do the same number of
main memory fetches, as the cache line is 32 bytes wide (or is it 64?). The
same can be said about memory writes, as the same number of bus operations
will be done. Since main memory operations are on the order of many dozens
of cycles, the mmx/sse transformation code will basically be noise in the
pipeline.
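
To make the cache line argument concrete, here is a rough sketch in C that
counts how many distinct cache lines a sequential read touches for a given
register width. The function name and parameters are made up for this
illustration, and it assumes an aligned buffer and an access width that
divides the line size evenly; it models nothing else about the memory system.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct cache lines touched when `total_bytes` are read
 * sequentially in chunks of `access_bytes` (8 for an MMX register,
 * 16 for an SSE2 register). `line_bytes` is the assumed cache line size.
 * Assumption: the buffer starts on a line boundary and no access
 * straddles two lines. */
size_t cache_lines_touched(size_t total_bytes, size_t access_bytes,
                           size_t line_bytes)
{
    assert(access_bytes > 0 && line_bytes > 0);

    size_t lines = 0;
    size_t last_line = (size_t)-1;   /* sentinel: no line fetched yet */

    for (size_t off = 0; off < total_bytes; off += access_bytes) {
        size_t line = off / line_bytes;
        if (line != last_line) {     /* first access to a new line */
            lines++;
            last_line = line;
        }
    }
    return lines;
}
```

For a 4096-byte span (1024 pixels at 32 bits each) the count is the same for
8-byte and 16-byte accesses, whichever line size is assumed: 64 lines with
64-byte lines, 128 lines with 32-byte lines. Only the number of instructions
issued per line changes.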

This means in the end that the big win would be to do a single pass
combining multiple operations in one. I guess even an interpreted
software-based shader script would have significantly better performance
than applying a long sequence of passes.
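
As a minimal sketch of what I mean by a single pass, compare the two
functions below. The two per-pixel operations are hypothetical stand-ins
chosen for illustration, not pixman operators, and the code is plain scalar
C rather than mmx or sse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pixel operations on ARGB32 values, for illustration
 * only; they are not pixman operators. */
static uint32_t op_invert_rgb(uint32_t p)   { return p ^ 0x00FFFFFFu; }
static uint32_t op_force_opaque(uint32_t p) { return p | 0xFF000000u; }

/* Two passes: every pixel is loaded and stored twice. */
void apply_two_passes(uint32_t *px, size_t n)
{
    assert(px != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        px[i] = op_invert_rgb(px[i]);
    for (size_t i = 0; i < n; i++)
        px[i] = op_force_opaque(px[i]);
}

/* One fused pass: every pixel is loaded and stored once, and both
 * operations run while the value is still in a register. */
void apply_fused(uint32_t *px, size_t n)
{
    assert(px != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        px[i] = op_force_opaque(op_invert_rgb(px[i]));
}
```

Both functions produce the same pixels. The difference is that the fused
version walks the buffer once, which should matter most when the buffer is
too large to stay in cache between passes.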

Rodrigo