[cairo] [PATCH] SSE2 support to pixman

André Tupinambá andrelrt at gmail.com
Thu Mar 13 14:18:15 PDT 2008


Hmm...

On my machine it works fine (software engineer's frequent expressions #2 :)

This doesn't make any sense to me. I looked at the assembly code
generated when compiling pixman-sse.c, and it is too well optimized to
be slower than the MMX version.

Did you change save128WriteCombining to save128Aligned and run the
perf test to check this?
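
To make the question concrete, the swap can be sketched as below. This is only an illustration: the patch itself is not shown in this mail, so the function bodies are assumptions (a non-temporal store via _mm_stream_si128 versus an ordinary aligned store via _mm_store_si128), and fill_pixels is a hypothetical helper that exists only to exercise both paths.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Assumed body: non-temporal (streaming) store that bypasses the cache. */
static inline void
save128WriteCombining (__m128i *dst, __m128i data)
{
    assert (((uintptr_t) dst & 15) == 0);  /* needs 16-byte alignment */
    _mm_stream_si128 (dst, data);
}

/* Assumed body: ordinary aligned store that goes through the cache. */
static inline void
save128Aligned (__m128i *dst, __m128i data)
{
    assert (((uintptr_t) dst & 15) == 0);  /* needs 16-byte alignment */
    _mm_store_si128 (dst, data);
}

/* Hypothetical helper: write `value` to n_pixels 32-bit pixels.
 * dst must be 16-byte aligned and n_pixels a multiple of 4. */
static void
fill_pixels (uint32_t *dst, uint32_t value, int n_pixels, int use_streaming)
{
    __m128i v = _mm_set1_epi32 ((int) value);
    int i;

    for (i = 0; i < n_pixels; i += 4)
    {
        if (use_streaming)
            save128WriteCombining ((__m128i *) (dst + i), v);
        else
            save128Aligned ((__m128i *) (dst + i), v);
    }

    /* Order the streaming stores before any later loads or stores. */
    if (use_streaming)
        _mm_sfence ();
}
```

Both variants produce the same pixels; only the cache behaviour differs, which is why the comparison has to be made with cairo-perf rather than with the test suite.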

OK, I didn't implement the opaque check, I know. I was going to run a
performance test to see which one is faster (look at the TODO in the
code), but if you have already tested it...

I will check this late tonight...

Thanks

André

On 13 Mar 2008 21:26:50 +0100, Soeren Sandmann <sandmann at daimi.au.dk> wrote:
> "André Tupinambá" <andrelrt at gmail.com> writes:
>
>  > I just finished the patch to the pixman library to add SSE2 support.
>  > The patch was made using Kumpera's files and my proof of concept.
>  >
>  > I ran cairo's tests and perf, and everything seems to be ok.
>
>  Overall, this looks great. It's well-written, and GCC actually
>  generates decent code for the intrinsics. However, when I tested this
>  with cairo-perf it came out slower than the MMX code. Here are the
>  numbers I get:
>
>  Before:
>  c-24-61-65-93:~/cairo/perf% env CAIRO_TEST_TARGET=image ./cairo-perf -i 5000 paint_image_rgba_over
>  [ # ]  backend-content                    test-size min(ticks)  min(ms) median(ms) stddev. iterations
>  [  0]    image-rgba       paint_image_rgba_over-256    2426056    0.810    0.830  0.93% 4841
>  [  1]    image-rgba       paint_image_rgba_over-512    9577644    3.199    3.242  0.52% 4708
>  [  0]    image-rgb        paint_image_rgba_over-256    3023492    1.010    1.030  0.70% 4780
>  [  1]    image-rgb        paint_image_rgba_over-512    9587464    3.202    3.242  0.49% 4747
>
>  After:
>  c-24-61-65-93:~/cairo/perf% env CAIRO_TEST_TARGET=image ./cairo-perf -i 5000 paint_image_rgba_over
>  [ # ]  backend-content                    test-size min(ticks)  min(ms) median(ms) stddev. iterations
>  [  0]    image-rgba       paint_image_rgba_over-256    3857756    1.288    1.297  0.35% 4207
>  [  1]    image-rgba       paint_image_rgba_over-512   15787128    5.270    5.287  0.12% 4140
>  [  0]    image-rgb        paint_image_rgba_over-256    4169408    1.392    1.409  0.56% 4019
>  [  1]    image-rgb        paint_image_rgba_over-512   15727612    5.250    5.267  0.12% 4339
>
>  c-24-61-65-93:~/cairo/perf% ./cairo-perf-diff old.perf new.perf
>  old: old
>  new: new
>  Slowdowns
>  =========
>  image-rgba      paint_image_rgba_over-512    3.24 0.52% ->   5.29 0.12%:  1.65x slowdown
>  image-rgb       paint_image_rgba_over-512    3.24 0.49% ->   5.27 0.12%:  1.64x slowdown
>  image-rgba      paint_image_rgba_over-256    0.83 0.93% ->   1.30 0.35%:  1.59x slowdown
>  image-rgb       paint_image_rgba_over-256    1.03 0.70% ->   1.41 0.56%:  1.38x slowdown
>
>  Would you mind posting the numbers you got?
>
>  I suspect two things going on:
>
>  (1) I don't think streaming writes are appropriate here. The problem
>  is that they force the cache line in question out of the cache
>  hierarchy altogether. For a function like this one, this means that
>  basically every destination read will be uncached due to the previous
>  iteration having used a streaming write.
>
>  So I'd suggest simply using save128Aligned() instead.
>
>  (2) The MMX version is careful to avoid reading from the destination
>  whenever the source pixels are fully opaque. My experience is that
>  this is enough of a win that it easily pays for the check.
>
>  SSE2 has pretty good support for this. We can use something like this
>  function:
>
>     static inline int
>     is_opaque (__m128i src)
>     {
>         __m128i alpha = _mm_and_si128 (src, Maskff000000);
>         __m128i cmp = _mm_cmpeq_epi8 (alpha, Maskff000000);
>         int x = _mm_movemask_epi8 (cmp);
>
>         return x == 0xffff;
>     }
>
>  where Maskff000000 is four copies of 0xff000000.
>
>
>  Thanks,
>  Soren
>
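
As a rough illustration of point (2) above, the opaque check could be wired into an OVER loop along these lines. This is a sketch under stated assumptions, not the code from the patch or from the MMX path: pixels are assumed to be premultiplied ARGB32, the mask is built locally with _mm_set1_epi32, and mul_un8, over_pixel and over_4_pixels are hypothetical names. A scalar fallback stands in for the real SSE2 blend to keep the example short.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Returns 1 if all four pixels in src have alpha == 0xff. */
static inline int
is_opaque (__m128i src)
{
    const __m128i Maskff000000 = _mm_set1_epi32 ((int) 0xff000000u);
    __m128i alpha = _mm_and_si128 (src, Maskff000000);
    __m128i cmp = _mm_cmpeq_epi8 (alpha, Maskff000000);

    /* One mask bit per byte; all 16 set means every alpha byte matched. */
    return _mm_movemask_epi8 (cmp) == 0xffff;
}

/* Hypothetical helper: a * b / 255 with rounding, for 8-bit values. */
static inline uint32_t
mul_un8 (uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return (t + (t >> 8)) >> 8;
}

/* Hypothetical scalar fallback: premultiplied OVER for one pixel. */
static uint32_t
over_pixel (uint32_t s, uint32_t d)
{
    uint32_t inv_alpha = 255 - (s >> 24);
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t c = ((s >> shift) & 0xff)
                   + mul_un8 ((d >> shift) & 0xff, inv_alpha);
        if (c > 255)
            c = 255;
        result |= c << shift;
    }
    return result;
}

/* Hypothetical: composite 4 source pixels over 4 destination pixels.
 * dst must be 16-byte aligned. */
static void
over_4_pixels (const uint32_t *src, uint32_t *dst)
{
    __m128i s = _mm_loadu_si128 ((const __m128i *) src);
    int i;

    assert (((uintptr_t) dst & 15) == 0);

    if (is_opaque (s))
    {
        /* Fast path: the destination is never read. */
        _mm_store_si128 ((__m128i *) dst, s);
        return;
    }

    for (i = 0; i < 4; i++)
        dst[i] = over_pixel (src[i], dst[i]);
}
```

The point of the fast path is that a fully opaque group of source pixels replaces the destination outright, so the destination load and the blend arithmetic are both skipped.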

