[cairo] [PATCH] SSE2 support to pixman
André Tupinambá
andrelrt at gmail.com
Thu Mar 13 14:18:15 PDT 2008
Hmm...
On my machine it works fine (software engineer's frequent expressions #2 :)
This doesn't make any sense to me. I looked at the assembly code generated
from compiling pixman-sse.c, and it is too well optimized to be
slower than the MMX one.
Did you change save128WriteCombining to save128Aligned and run the
perf test to check this?
OK, I didn't implement the opaque check, I know. I was going to run a
performance test to check which one is faster (look at the TODO in the
code), but if you have already tested it...
I will check this late tonight...
Thanks
André
On 13 Mar 2008 21:26:50 +0100, Soeren Sandmann <sandmann at daimi.au.dk> wrote:
> "André Tupinambá" <andrelrt at gmail.com> writes:
>
> > I just finished the patch to the pixman library to add SSE2 support.
> > The patch was made using Kumpera's files and my proof of concept.
> >
> > I ran cairo's tests and perf, and everything seems to be ok.
>
> Overall, this looks great. It's well-written, and GCC actually
> generates decent code for the intrinsics. However, when I tested this
> with cairo-perf it came out slower than the MMX code. Here are the
> numbers I get:
>
> Before:
> c-24-61-65-93:~/cairo/perf% env CAIRO_TEST_TARGET=image ./cairo-perf -i 5000 paint_image_rgba_over
> [ # ] backend-content test-size                 min(ticks) min(ms) median(ms) stddev. iterations
> [ 0]  image-rgba      paint_image_rgba_over-256 2426056    0.810   0.830      0.93%   4841
> [ 1]  image-rgba      paint_image_rgba_over-512 9577644    3.199   3.242      0.52%   4708
> [ 0]  image-rgb       paint_image_rgba_over-256 3023492    1.010   1.030      0.70%   4780
> [ 1]  image-rgb       paint_image_rgba_over-512 9587464    3.202   3.242      0.49%   4747
>
> After:
> c-24-61-65-93:~/cairo/perf% env CAIRO_TEST_TARGET=image ./cairo-perf -i 5000 paint_image_rgba_over
> [ # ] backend-content test-size                 min(ticks) min(ms) median(ms) stddev. iterations
> [ 0]  image-rgba      paint_image_rgba_over-256 3857756    1.288   1.297      0.35%   4207
> [ 1]  image-rgba      paint_image_rgba_over-512 15787128   5.270   5.287      0.12%   4140
> [ 0]  image-rgb       paint_image_rgba_over-256 4169408    1.392   1.409      0.56%   4019
> [ 1]  image-rgb       paint_image_rgba_over-512 15727612   5.250   5.267      0.12%   4339
>
> c-24-61-65-93:~/cairo/perf% ./cairo-perf-diff old.perf new.perf
> old: old
> new: new
> Slowdowns
> =========
> image-rgba paint_image_rgba_over-512 3.24 0.52% -> 5.29 0.12%: 1.65x slowdown
> image-rgb paint_image_rgba_over-512 3.24 0.49% -> 5.27 0.12%: 1.64x slowdown
> image-rgba paint_image_rgba_over-256 0.83 0.93% -> 1.30 0.35%: 1.59x slowdown
> image-rgb paint_image_rgba_over-256 1.03 0.70% -> 1.41 0.56%: 1.38x slowdown
>
> Would you mind posting the numbers you got?
>
> I suspect two things are going on:
>
> (1) I don't think streaming writes are appropriate here. The problem
> is that they force the cache line in question out of the cache hierarchy
> altogether. For a function like this one, this means that basically
> every destination read will be uncached due to the previous iteration
> having used a streaming write.
>
> So I'd suggest simply using save128Aligned() instead.
>
> (2) The MMX version is careful to avoid reading from the destination
> whenever the source pixels are fully opaque. My experience is that
> this is enough of a win that it easily pays for the check.
>
> SSE2 has pretty good support for this. We can use something like this
> function:
>
> static inline int
> is_opaque (__m128i src)
> {
>     __m128i alpha = _mm_and_si128 (src, Maskff000000);
>     __m128i cmp = _mm_cmpeq_epi8 (alpha, Maskff000000);
>     int x = _mm_movemask_epi8 (cmp);
>
>     return x == 0xffff;
> }
>
> where Maskff000000 is four copies of 0xff000000.
>
>
> Thanks,
> Soren
>
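For reference, the two suggestions in the quoted review (plain aligned stores instead of streaming writes, and skipping the destination read when the source pixels are opaque) could be combined roughly as in the sketch below. It is only an illustration, not the code from the patch: over_pixel and over_four_at_a_time are hypothetical names, the non-opaque path falls back to scalar code to keep the example short, and it assumes premultiplied ARGB32 pixels, a 16-byte aligned destination and a pixel count that is a multiple of 4. The mask is built with _mm_set1_epi32 in place of the Maskff000000 constant.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Nonzero when all four ARGB32 pixels in src have alpha == 0xff. */
static inline int
is_opaque (__m128i src)
{
    /* Four copies of 0xff000000 (stands in for Maskff000000). */
    const __m128i mask = _mm_set1_epi32 ((int) 0xff000000u);
    __m128i alpha = _mm_and_si128 (src, mask);
    __m128i cmp = _mm_cmpeq_epi8 (alpha, mask);

    return _mm_movemask_epi8 (cmp) == 0xffff;
}

/* Hypothetical scalar fallback: premultiplied OVER for one pixel. */
static uint32_t
over_pixel (uint32_t s, uint32_t d)
{
    uint32_t ia = 255 - (s >> 24);  /* inverse source alpha */
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t sc = (s >> shift) & 0xff;
        uint32_t dc = (d >> shift) & 0xff;
        uint32_t t = dc * ia + 0x80;               /* rounded dc * ia / 255 */
        uint32_t c = sc + ((t + (t >> 8)) >> 8);

        if (c > 255)
            c = 255;                               /* saturate */
        result |= c << shift;
    }
    return result;
}

/* Hypothetical driver: processes four pixels per iteration. */
static void
over_four_at_a_time (uint32_t *dst, const uint32_t *src, int n)
{
    int i, j;

    assert (n % 4 == 0);
    assert (((uintptr_t) dst & 15) == 0);

    for (i = 0; i < n; i += 4)
    {
        __m128i s = _mm_loadu_si128 ((const __m128i *) (src + i));

        if (is_opaque (s))
        {
            /* Opaque source: the result is the source itself, so the
             * destination is never read. A regular aligned store keeps
             * the line in cache for later reads. */
            _mm_store_si128 ((__m128i *) (dst + i), s);
        }
        else
        {
            for (j = 0; j < 4; j++)
                dst[i + j] = over_pixel (src[i + j], dst[i + j]);
        }
    }
}
```

Replacing _mm_store_si128 with _mm_stream_si128 in the opaque branch gives the streaming variant, which should make it easy to time the two against each other with cairo-perf.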