[cairo] [PATCH] SSE2 support to pixman
Soeren Sandmann
sandmann at daimi.au.dk
Thu Mar 13 13:26:50 PDT 2008
"André Tupinambá" <andrelrt at gmail.com> writes:
> I just finished the patch to the pixman library to add SSE2 support.
> The patch was made using Kumpera's files and my proof of concept.
>
> I ran cairo's tests and perf, and everything seems to be ok.
Overall, this looks great. It's well-written, and GCC actually
generates decent code for the intrinsics. However, when I tested this
with cairo-perf it came out slower than the MMX code. Here are the
numbers I get:
Before:
c-24-61-65-93:~/cairo/perf% env CAIRO_TEST_TARGET=image ./cairo-perf -i 5000 paint_image_rgba_over
[ # ] backend-content test-size min(ticks) min(ms) median(ms) stddev. iterations
[ 0] image-rgba paint_image_rgba_over-256 2426056 0.810 0.830 0.93% 4841
[ 1] image-rgba paint_image_rgba_over-512 9577644 3.199 3.242 0.52% 4708
[ 0] image-rgb paint_image_rgba_over-256 3023492 1.010 1.030 0.70% 4780
[ 1] image-rgb paint_image_rgba_over-512 9587464 3.202 3.242 0.49% 4747
After:
c-24-61-65-93:~/cairo/perf% env CAIRO_TEST_TARGET=image ./cairo-perf -i 5000 paint_image_rgba_over
[ # ] backend-content test-size min(ticks) min(ms) median(ms) stddev. iterations
[ 0] image-rgba paint_image_rgba_over-256 3857756 1.288 1.297 0.35% 4207
[ 1] image-rgba paint_image_rgba_over-512 15787128 5.270 5.287 0.12% 4140
[ 0] image-rgb paint_image_rgba_over-256 4169408 1.392 1.409 0.56% 4019
[ 1] image-rgb paint_image_rgba_over-512 15727612 5.250 5.267 0.12% 4339
c-24-61-65-93:~/cairo/perf% ./cairo-perf-diff old.perf new.perf
old: old
new: new
Slowdowns
=========
image-rgba paint_image_rgba_over-512 3.24 0.52% -> 5.29 0.12%: 1.65x slowdown
image-rgb paint_image_rgba_over-512 3.24 0.49% -> 5.27 0.12%: 1.64x slowdown
image-rgba paint_image_rgba_over-256 0.83 0.93% -> 1.30 0.35%: 1.59x slowdown
image-rgb paint_image_rgba_over-256 1.03 0.70% -> 1.41 0.56%: 1.38x slowdown
Would you mind posting the numbers you got?
I suspect two things are going on:
(1) I don't think streaming writes are appropriate here. The problem
is that they force the cache line in question out of the cache
hierarchy altogether. For a function like this one, this means that
basically every destination read will miss the cache because the
previous iteration used a streaming write.
So I'd suggest simply using save128Aligned() instead.
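To make the distinction concrete, here is a minimal sketch of the two kinds of store using the standard SSE2 intrinsics. The helper names store_cached() and store_streaming() are made up for illustration; I am assuming that the patch's save128Aligned() is a thin wrapper around a regular aligned store like the first one, which may not match the actual patch.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: regular aligned 128-bit store.  The data goes
 * through the normal cache hierarchy, so a later read of the same
 * destination can hit the cache.  dst must be 16-byte aligned. */
static inline void
store_cached (uint32_t *dst, __m128i data)
{
    assert (((uintptr_t) dst & 15) == 0);
    _mm_store_si128 ((__m128i *) dst, data);
}

/* Hypothetical helper: non-temporal ("streaming") aligned store.  It
 * hints that the data will not be reused soon, so the write is meant to
 * bypass the caches.  dst must be 16-byte aligned. */
static inline void
store_streaming (uint32_t *dst, __m128i data)
{
    assert (((uintptr_t) dst & 15) == 0);
    _mm_stream_si128 ((__m128i *) dst, data);
}
```

Streaming stores tend to make sense for write-only destinations, such as a fill or a plain copy into a large buffer. An OVER operation reads the destination before writing it, which is why the regular store looks like the better fit here.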
(2) The MMX version is careful to avoid reading from the destination
whenever the source pixels are fully opaque. My experience is that
this is enough of a win that it easily pays for the check.
SSE2 has pretty good support for this. We can use something like this
function:
    static inline int
    is_opaque (__m128i src)
    {
        __m128i alpha = _mm_and_si128 (src, Maskff000000);
        __m128i cmp = _mm_cmpeq_epi8 (alpha, Maskff000000);
        int x = _mm_movemask_epi8 (cmp);

        return x == 0xffff;
    }
where Maskff000000 is four copies of 0xff000000.
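As a rough sketch of how such a check could be used, the following self-contained version builds the mask locally and adds a fast path that copies four source pixels straight to the destination when they are all opaque. The helper copy_if_opaque() is hypothetical and is not part of the patch or of pixman; it uses unaligned loads and stores only so that the sketch makes no alignment assumptions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Returns nonzero when all four ARGB32 pixels in src have alpha 0xff.
 * After masking, the three low bytes of each pixel are zero in both
 * operands, so the byte-wise compare yields all ones only when every
 * alpha byte equals 0xff. */
static inline int
is_opaque (__m128i src)
{
    const __m128i mask = _mm_set1_epi32 ((int) 0xff000000u);
    __m128i alpha = _mm_and_si128 (src, mask);
    __m128i cmp = _mm_cmpeq_epi8 (alpha, mask);

    return _mm_movemask_epi8 (cmp) == 0xffff;
}

/* Hypothetical helper: if the four source pixels are fully opaque, OVER
 * reduces to a copy, so store them without reading dst and return 1.
 * Otherwise leave dst untouched and return 0, so that the caller can
 * fall back to the general blend. */
static inline int
copy_if_opaque (uint32_t *dst, const uint32_t *src)
{
    __m128i s = _mm_loadu_si128 ((const __m128i *) src);

    if (!is_opaque (s))
        return 0;

    _mm_storeu_si128 ((__m128i *) dst, s);
    return 1;
}
```

The point of the fast path is that the destination is never loaded for opaque runs, which is the same saving the MMX code gets from its check.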
Thanks,
Soren