Road map for remaining pixman refactoring

Soeren Sandmann sandmann at daimi.au.dk
Tue Jun 2 01:37:56 PDT 2009


        Executive summary:

        Lots of text about refactoring; if you are an ARM developer,
        be advised that there are upcoming coding style changes that
        will make outstanding patches more difficult to maintain, so
        get them in quickly. Skip down to the last paragraph for more
        information.

The next batch of refactoring changes is the many-pixels branch:

        http://cgit.freedesktop.org/~sandmann/pixman/log/?h=many-pixels

(patch to master appended).

The main goal of this is to get rid of the pixman-transformed.c file
and move the functionality (fetching from transformed images) into
pixman-bits-image.c.

Along the way I changed the way the general code works. Previously it
would fetch each individual pixel through an indirect function
call. The new scheme works like this:

    - bits_image_fetch_transformed() generates a list of fixed-point
      coordinates in a temporary buffer.

    - bits_image_fetch_filtered(), depending on the image's filter and
      repeat mode, turns this list of coordinates into a list of pixel
      coordinates.

    - bits_image_fetch_pixels() then calls a low-level pixel fetcher
      whose job it is to convert from whatever the image format is to
      PIXMAN_a8r8g8b8.
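
To make the three phases concrete, here is a rough sketch of how they
could fit together for one scanline. It is not the code in the branch:
every sketch_* name, the 16.16 fixed-point type, the separate coords
array and the simplified nearest filter are assumptions made for
illustration. The only details taken from the patch are that the
fetcher reads (offset, line) pairs from the buffer, writes pixels back
into the same buffer, and treats 0xffffffff as "outside the image".

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t sketch_fixed_t;                 /* hypothetical 16.16 fixed point */
#define SKETCH_FIXED_ONE  ((sketch_fixed_t) 0x10000)
#define SKETCH_OUTSIDE    0xffffffffu           /* "no pixel here" marker, as in the patch */

typedef struct
{
    const uint32_t *bits;       /* a8r8g8b8 pixels */
    int             width;
    int             height;
    int             rowstride;  /* in uint32_t units */
} sketch_image_t;

/* Floor of a 16.16 value without relying on signed right shift
 * (INT32_MIN is not handled in this sketch). */
static int
sketch_fixed_to_int (sketch_fixed_t f)
{
    return f >= 0 ? f >> 16 : -((-f + 0xffff) >> 16);
}

/* Phase 1: one fixed-point (x, y) pair per destination pixel. */
static void
sketch_generate_coords (sketch_fixed_t x, sketch_fixed_t y,
                        sketch_fixed_t dx, sketch_fixed_t dy,
                        sketch_fixed_t *coords, int n)
{
    int i;

    for (i = 0; i < n; ++i)
    {
        coords[2 * i] = x;
        coords[2 * i + 1] = y;
        x += dx;
        y += dy;
    }
}

/* Phase 2: simplified nearest filter without repeat; turns fixed-point
 * coordinates into integer (offset, line) pairs or the outside marker. */
static void
sketch_filter_nearest (const sketch_image_t *image,
                       const sketch_fixed_t *coords,
                       uint32_t *buffer, int n)
{
    int i;

    for (i = 0; i < n; ++i)
    {
        int x = sketch_fixed_to_int (coords[2 * i]);
        int y = sketch_fixed_to_int (coords[2 * i + 1]);
        int inside = x >= 0 && x < image->width && y >= 0 && y < image->height;

        buffer[2 * i] = inside ? (uint32_t) x : SKETCH_OUTSIDE;
        buffer[2 * i + 1] = inside ? (uint32_t) y : SKETCH_OUTSIDE;
    }
}

/* Phase 3: replace each (offset, line) pair with the pixel, in place.
 * Entry i is written only after entries 2i and 2i+1 have been read. */
static void
sketch_fetch_a8r8g8b8 (const sketch_image_t *image, uint32_t *buffer, int n)
{
    int i;

    assert (image->rowstride >= image->width);

    for (i = 0; i < n; ++i)
    {
        uint32_t offset = buffer[2 * i];
        uint32_t line = buffer[2 * i + 1];

        if (offset == SKETCH_OUTSIDE || line == SKETCH_OUTSIDE)
            buffer[i] = 0;
        else
            buffer[i] = image->bits[line * (uint32_t) image->rowstride + offset];
    }
}
```

A caller would allocate room for 2 * n entries, run the three
functions in order, and then read the first n entries of the buffer as
a8r8g8b8 pixels.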

So basically, instead of working on one pixel at a time, the new
scheme works with lists of coordinates and pixels. It seems to
be slightly faster for many benchmarks:

        http://www.daimi.au.dk/~sandmann/complete.perf

The first slow-down on that list is for one-rounded-rectangle. If you
run just those benchmarks with a higher iteration count, you get this:

        http://www.daimi.au.dk/~sandmann/rounded.perf

I.e., a slight speed-up.

The main advantage of this scheme is that it opens up the possibility
of vectorizing parts of the general implementation. By turning the
three phases into function pointers, it becomes possible for the
various implementations to plug in their own SIMD versions of the
transforming, filtering and fetching phases.
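
As a sketch of what "turning the three phases into function pointers"
might look like: the struct layout, the delegate fallback and all
sketch_*, general_* and simd_* names below are hypothetical, and the
phase functions are trivial stand-ins that only tag the buffer so the
dispatch can be observed. The point is just that an implementation
overrides the phases it has SIMD code for and inherits the rest.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*sketch_phase_func_t) (void *image, uint32_t *buffer, int n);

enum { SKETCH_TRANSFORM, SKETCH_FILTER, SKETCH_FETCH, SKETCH_N_PHASES };

typedef struct sketch_implementation sketch_implementation_t;
struct sketch_implementation
{
    const sketch_implementation_t *delegate;   /* fallback for missing phases */
    sketch_phase_func_t phases[SKETCH_N_PHASES];
};

/* Walk the delegate chain until some implementation provides the phase. */
static sketch_phase_func_t
sketch_lookup (const sketch_implementation_t *imp, int phase)
{
    while (imp && !imp->phases[phase])
        imp = imp->delegate;

    return imp ? imp->phases[phase] : NULL;
}

/* Run all three phases in order; returns 0 if any phase is unavailable. */
static int
sketch_run (const sketch_implementation_t *imp,
            void *image, uint32_t *buffer, int n)
{
    int phase;

    for (phase = 0; phase < SKETCH_N_PHASES; ++phase)
    {
        sketch_phase_func_t func = sketch_lookup (imp, phase);

        if (!func)
            return 0;

        func (image, buffer, n);
    }

    return 1;
}

/* Stand-in phases: each one only marks buffer[0] so calls are visible. */
static void
general_transform (void *image, uint32_t *buffer, int n)
{
    (void) image;
    if (n > 0)
        buffer[0] += 1;
}

static void
general_filter (void *image, uint32_t *buffer, int n)
{
    (void) image;
    if (n > 0)
        buffer[0] += 10;
}

static void
general_fetch (void *image, uint32_t *buffer, int n)
{
    (void) image;
    if (n > 0)
        buffer[0] += 100;
}

/* A pretend SIMD fetcher that replaces only the fetch phase. */
static void
simd_fetch (void *image, uint32_t *buffer, int n)
{
    (void) image;
    if (n > 0)
        buffer[0] += 1000;
}
```

An implementation that fills in only the fetch slot and points its
delegate at the general one would then get the general transforming
and filtering phases followed by its own fetcher.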

Unless I hear otherwise, I'll push this branch soon, and then make a
0.15.10 release, hopefully Friday.


-=-=-=- Road map for the rest of the refactoring -=-=-=-

Two things remain to be done:

* Fix clipping

Clipping will be computed up front in pixman_compute_composite_region();
the rest of the code will not be concerned with clipping at all. This
has some consequences:

    - pixman_run_fast_path() will only run a fast path when

        - The composite region is completely covered by both source
          images. (Otherwise, the fast paths would have to generate
          transparent pixels).

        - Both source and mask are either solid, non-repeating, or
          normal-repeating and bigger than 16x16.

      Everything else will go through the general path.

    - The general path must deal with repeats in the non-transformed
      case.  REPEAT_NONE and REPEAT_NORMAL are fairly easy to deal
      with; the other repeat types will just go through the
      transformed path.

      (Eventually, the untransformed path should be seen as a fast
      path that belongs in an image created by the fast path
      implementation).
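
To illustrate why REPEAT_NONE and REPEAT_NORMAL are the easy cases in
the untransformed path, here is a minimal sketch of the per-coordinate
mapping. The sketch_* names and the enum are hypothetical; the
0xffffffff marker is the one the pixel fetchers in the appended patch
check for. REPEAT_NORMAL is a wrap-around modulo, and REPEAT_NONE
only needs a bounds check.

```c
#include <stdint.h>

#define SKETCH_OUTSIDE 0xffffffffu   /* marker the fetchers turn into a 0 pixel */

typedef enum
{
    SKETCH_REPEAT_NONE,
    SKETCH_REPEAT_NORMAL
} sketch_repeat_t;

/* Map one integer coordinate into [0, size) according to the repeat
 * mode, or return SKETCH_OUTSIDE when there is no pixel to fetch.
 * size must be positive. */
static uint32_t
sketch_repeat_coord (sketch_repeat_t repeat, int coord, int size)
{
    if (repeat == SKETCH_REPEAT_NORMAL)
    {
        /* C's % truncates toward zero, so fix up negative remainders. */
        int m = coord % size;

        return (uint32_t) (m < 0 ? m + size : m);
    }

    if (coord < 0 || coord >= size)
        return SKETCH_OUTSIDE;

    return (uint32_t) coord;
}
```

Applied to both x and y, this yields either a valid (offset, line)
pair or the outside marker, which is the input the pixel fetchers
expect.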

This will be followed by a 0.15.12 release.

* Formatting/naming/indenting

Finally, the naming and formatting style will be changed to match the
one described in cairo/CODING_STYLE with one change: braces go on
their own line, not the same line as "if/while/for".

     Note to ARM people: this will make outstanding patches more
     difficult to apply. Within reason, I can postpone this
     reformatting for the ARM files, but you should get your patches
     in quickly.

After this, following the coding style will be a requirement for
getting patches accepted into pixman.
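
For reference, a small made-up function showing the one deviation
mentioned above: braces on their own line after "if/while/for". The
function itself (sketch_count_opaque) is purely illustrative, and the
remaining layout choices here are my reading of the cairo style rather
than an authoritative rendering of cairo/CODING_STYLE.

```c
#include <stdint.h>

/* Count the a8r8g8b8 pixels whose alpha channel is fully opaque. */
static uint32_t
sketch_count_opaque (const uint32_t *pixels,
                     int             n_pixels)
{
    uint32_t count = 0;
    int i;

    for (i = 0; i < n_pixels; ++i)
    {
        if ((pixels[i] >> 24) == 0xff)
        {
            count++;
        }
    }

    return count;
}
```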

And that's it. After that I have no more refactoring planned in the
near future. A 0.16.0 release candidate and release will follow.


Soren



diff --git a/pixman/Makefile.am b/pixman/Makefile.am
index 863caa3..c75ff87 100644
--- a/pixman/Makefile.am
+++ b/pixman/Makefile.am
@@ -25,8 +25,6 @@ libpixman_1_la_SOURCES =			\
 	pixman-linear-gradient.c		\
 	pixman-radial-gradient.c		\
 	pixman-bits-image.c			\
-	pixman-transformed.c			\
-	pixman-transformed-accessors.c		\
 	pixman-utils.c				\
 	pixman-edge.c				\
 	pixman-edge-accessors.c			\
diff --git a/pixman/pixman-access.c b/pixman/pixman-access.c
index 6b3ce34..3704c73 100644
--- a/pixman/pixman-access.c
+++ b/pixman/pixman-access.c
@@ -799,519 +799,1074 @@ fetchProc64 ACCESS(pixman_fetchProcForPicture64) (bits_image_t * pict)
 
 /**************************** Pixel wise fetching *****************************/
 
-static FASTCALL uint64_t
-fbFetchPixel_a2b10g10r10 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a2b10g10r10 (bits_image_t *pict, uint64_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t p = READ(pict, bits + offset);
-    uint64_t a = p >> 30;
-    uint64_t b = (p >> 20) & 0x3ff;
-    uint64_t g = (p >> 10) & 0x3ff;
-    uint64_t r = p & 0x3ff;
-
-    r = r << 6 | r >> 4;
-    g = g << 6 | g >> 4;
-    b = b << 6 | b >> 4;
-
-    a <<= 62;
-    a |= a >> 2;
-    a |= a >> 4;
-    a |= a >> 8;
-
-    return a << 48 | r << 32 | g << 16 | b;
-}
+    int i;
 
-static FASTCALL uint64_t
-fbFetchPixel_x2b10g10r10 (bits_image_t *pict, int offset, int line)
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = ((uint32_t *)buffer)[2 * i];
+	int line = ((uint32_t *)buffer)[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t p = READ(pict, bits + offset);
+	    uint64_t a = p >> 30;
+	    uint64_t b = (p >> 20) & 0x3ff;
+	    uint64_t g = (p >> 10) & 0x3ff;
+	    uint64_t r = p & 0x3ff;
+	    
+	    r = r << 6 | r >> 4;
+	    g = g << 6 | g >> 4;
+	    b = b << 6 | b >> 4;
+	    
+	    a <<= 62;
+	    a |= a >> 2;
+	    a |= a >> 4;
+	    a |= a >> 8;
+	    
+	    buffer[i] = a << 48 | r << 32 | g << 16 | b;
+	}
+    }
+}
+
+static FASTCALL void
+fbFetchPixel_x2b10g10r10 (bits_image_t *pict, uint64_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t p = READ(pict, bits + offset);
-    uint64_t b = (p >> 20) & 0x3ff;
-    uint64_t g = (p >> 10) & 0x3ff;
-    uint64_t r = p & 0x3ff;
-
-    r = r << 6 | r >> 4;
-    g = g << 6 | g >> 4;
-    b = b << 6 | b >> 4;
+    int i;
+    
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = ((uint32_t *)buffer)[2 * i];
+	int line = ((uint32_t *)buffer)[2 * i + 1];
 
-    return 0xffffULL << 48 | r << 32 | g << 16 | b;
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t p = READ(pict, bits + offset);
+	    uint64_t b = (p >> 20) & 0x3ff;
+	    uint64_t g = (p >> 10) & 0x3ff;
+	    uint64_t r = p & 0x3ff;
+	    
+	    r = r << 6 | r >> 4;
+	    g = g << 6 | g >> 4;
+	    b = b << 6 | b >> 4;
+	    
+	    buffer[i] = 0xffffULL << 48 | r << 32 | g << 16 | b;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a8r8g8b8 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a8r8g8b8 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    return READ(pict, (uint32_t *)bits + offset);
+    int i;
+    
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+	
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    buffer[i] = READ(pict, (uint32_t *)bits + offset);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_x8r8g8b8 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_x8r8g8b8 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    return READ(pict, (uint32_t *)bits + offset) | 0xff000000;
+    int i;
+    
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+	
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    buffer[i] = READ(pict, (uint32_t *)bits + offset) | 0xff000000;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a8b8g8r8 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a8b8g8r8 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint32_t *)bits + offset);
-
-    return ((pixel & 0xff000000) |
-	    ((pixel >> 16) & 0xff) |
-	    (pixel & 0x0000ff00) |
-	    ((pixel & 0xff) << 16));
+    int i;
+    
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+	
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint32_t *)bits + offset);
+	    
+	    buffer[i] = ((pixel & 0xff000000) |
+			 ((pixel >> 16) & 0xff) |
+			 (pixel & 0x0000ff00) |
+			 ((pixel & 0xff) << 16));
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_x8b8g8r8 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_x8b8g8r8 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint32_t *)bits + offset);
-
-    return ((0xff000000) |
-	    ((pixel >> 16) & 0xff) |
-	    (pixel & 0x0000ff00) |
-	    ((pixel & 0xff) << 16));
+    int i;
+    
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+	
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint32_t *)bits + offset);
+	    
+	    buffer[i] = ((0xff000000) |
+			 ((pixel >> 16) & 0xff) |
+			 (pixel & 0x0000ff00) |
+			 ((pixel & 0xff) << 16));
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_b8g8r8a8 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_b8g8r8a8 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint32_t *)bits + offset);
+    int i;
 
-    return ((pixel & 0xff000000) >> 24 |
-	    (pixel & 0x00ff0000) >> 8 |
-	    (pixel & 0x0000ff00) << 8 |
-	    (pixel & 0x000000ff) << 24);
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint32_t *)bits + offset);
+	    
+	    buffer[i] = ((pixel & 0xff000000) >> 24 |
+			 (pixel & 0x00ff0000) >> 8 |
+			 (pixel & 0x0000ff00) << 8 |
+			 (pixel & 0x000000ff) << 24);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_b8g8r8x8 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_b8g8r8x8 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint32_t *)bits + offset);
+    int i;
 
-    return ((0xff000000) |
-	    (pixel & 0xff000000) >> 24 |
-	    (pixel & 0x00ff0000) >> 8 |
-	    (pixel & 0x0000ff00) << 8);
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+	
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint32_t *)bits + offset);
+	    
+	    buffer[i] = ((0xff000000) |
+			 (pixel & 0xff000000) >> 24 |
+			 (pixel & 0x00ff0000) >> 8 |
+			 (pixel & 0x0000ff00) << 8);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_r8g8b8 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_r8g8b8 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint8_t   *pixel = ((uint8_t *) bits) + (offset*3);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint8_t   *pixel = ((uint8_t *) bits) + (offset*3);
 #if IMAGE_BYTE_ORDER == MSBFirst
-    return (0xff000000 |
-	    (READ(pict, pixel + 0) << 16) |
-	    (READ(pict, pixel + 1) << 8) |
-	    (READ(pict, pixel + 2)));
+	    buffer[i] = (0xff000000 |
+			 (READ(pict, pixel + 0) << 16) |
+			 (READ(pict, pixel + 1) << 8) |
+			 (READ(pict, pixel + 2)));
 #else
-    return (0xff000000 |
-	    (READ(pict, pixel + 2) << 16) |
-	    (READ(pict, pixel + 1) << 8) |
-	    (READ(pict, pixel + 0)));
+	    buffer[i] = (0xff000000 |
+			 (READ(pict, pixel + 2) << 16) |
+			 (READ(pict, pixel + 1) << 8) |
+			 (READ(pict, pixel + 0)));
 #endif
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_b8g8r8 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_b8g8r8 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint8_t   *pixel = ((uint8_t *) bits) + (offset*3);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint8_t   *pixel = ((uint8_t *) bits) + (offset*3);
 #if IMAGE_BYTE_ORDER == MSBFirst
-    return (0xff000000 |
-	    (READ(pict, pixel + 2) << 16) |
-	    (READ(pict, pixel + 1) << 8) |
-	    (READ(pict, pixel + 0)));
+	    buffer[i] = (0xff000000 |
+			 (READ(pict, pixel + 2) << 16) |
+			 (READ(pict, pixel + 1) << 8) |
+			 (READ(pict, pixel + 0)));
 #else
-    return (0xff000000 |
-	    (READ(pict, pixel + 0) << 16) |
-	    (READ(pict, pixel + 1) << 8) |
-	    (READ(pict, pixel + 2)));
+	    buffer[i] = (0xff000000 |
+			 (READ(pict, pixel + 0) << 16) |
+			 (READ(pict, pixel + 1) << 8) |
+			 (READ(pict, pixel + 2)));
 #endif
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_r5g6b5 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_r5g6b5 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-    r = ((pixel & 0xf800) | ((pixel & 0xe000) >> 5)) << 8;
-    g = ((pixel & 0x07e0) | ((pixel & 0x0600) >> 6)) << 5;
-    b = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) >> 2;
-    return (0xff000000 | r | g | b);
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+	    
+	    r = ((pixel & 0xf800) | ((pixel & 0xe000) >> 5)) << 8;
+	    g = ((pixel & 0x07e0) | ((pixel & 0x0600) >> 6)) << 5;
+	    b = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) >> 2;
+	    buffer[i] = (0xff000000 | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_b5g6r5 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_b5g6r5 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+    int i;
 
-    b = ((pixel & 0xf800) | ((pixel & 0xe000) >> 5)) >> 8;
-    g = ((pixel & 0x07e0) | ((pixel & 0x0600) >> 6)) << 5;
-    r = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) << 14;
-    return (0xff000000 | r | g | b);
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+	    
+	    b = ((pixel & 0xf800) | ((pixel & 0xe000) >> 5)) >> 8;
+	    g = ((pixel & 0x07e0) | ((pixel & 0x0600) >> 6)) << 5;
+	    r = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) << 14;
+	    buffer[i] = (0xff000000 | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a1r5g5b5 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a1r5g5b5 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  a,r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-    a = (uint32_t) ((uint8_t) (0 - ((pixel & 0x8000) >> 15))) << 24;
-    r = ((pixel & 0x7c00) | ((pixel & 0x7000) >> 5)) << 9;
-    g = ((pixel & 0x03e0) | ((pixel & 0x0380) >> 5)) << 6;
-    b = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) >> 2;
-    return (a | r | g | b);
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  a,r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+	    
+	    a = (uint32_t) ((uint8_t) (0 - ((pixel & 0x8000) >> 15))) << 24;
+	    r = ((pixel & 0x7c00) | ((pixel & 0x7000) >> 5)) << 9;
+	    g = ((pixel & 0x03e0) | ((pixel & 0x0380) >> 5)) << 6;
+	    b = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) >> 2;
+	    buffer[i] = (a | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_x1r5g5b5 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_x1r5g5b5 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+    int i;
 
-    r = ((pixel & 0x7c00) | ((pixel & 0x7000) >> 5)) << 9;
-    g = ((pixel & 0x03e0) | ((pixel & 0x0380) >> 5)) << 6;
-    b = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) >> 2;
-    return (0xff000000 | r | g | b);
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+	    
+	    r = ((pixel & 0x7c00) | ((pixel & 0x7000) >> 5)) << 9;
+	    g = ((pixel & 0x03e0) | ((pixel & 0x0380) >> 5)) << 6;
+	    b = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) >> 2;
+	    buffer[i] = (0xff000000 | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a1b5g5r5 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a1b5g5r5 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  a,r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-    a = (uint32_t) ((uint8_t) (0 - ((pixel & 0x8000) >> 15))) << 24;
-    b = ((pixel & 0x7c00) | ((pixel & 0x7000) >> 5)) >> 7;
-    g = ((pixel & 0x03e0) | ((pixel & 0x0380) >> 5)) << 6;
-    r = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) << 14;
-    return (a | r | g | b);
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  a,r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+	    
+	    a = (uint32_t) ((uint8_t) (0 - ((pixel & 0x8000) >> 15))) << 24;
+	    b = ((pixel & 0x7c00) | ((pixel & 0x7000) >> 5)) >> 7;
+	    g = ((pixel & 0x03e0) | ((pixel & 0x0380) >> 5)) << 6;
+	    r = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) << 14;
+	    buffer[i] = (a | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_x1b5g5r5 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_x1b5g5r5 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-    b = ((pixel & 0x7c00) | ((pixel & 0x7000) >> 5)) >> 7;
-    g = ((pixel & 0x03e0) | ((pixel & 0x0380) >> 5)) << 6;
-    r = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) << 14;
-    return (0xff000000 | r | g | b);
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+	    
+	    b = ((pixel & 0x7c00) | ((pixel & 0x7000) >> 5)) >> 7;
+	    g = ((pixel & 0x03e0) | ((pixel & 0x0380) >> 5)) << 6;
+	    r = ((pixel & 0x001c) | ((pixel & 0x001f) << 5)) << 14;
+	    buffer[i] = (0xff000000 | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a4r4g4b4 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a4r4g4b4 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  a,r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+    int i;
 
-    a = ((pixel & 0xf000) | ((pixel & 0xf000) >> 4)) << 16;
-    r = ((pixel & 0x0f00) | ((pixel & 0x0f00) >> 4)) << 12;
-    g = ((pixel & 0x00f0) | ((pixel & 0x00f0) >> 4)) << 8;
-    b = ((pixel & 0x000f) | ((pixel & 0x000f) << 4));
-    return (a | r | g | b);
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  a,r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+	    
+	    a = ((pixel & 0xf000) | ((pixel & 0xf000) >> 4)) << 16;
+	    r = ((pixel & 0x0f00) | ((pixel & 0x0f00) >> 4)) << 12;
+	    g = ((pixel & 0x00f0) | ((pixel & 0x00f0) >> 4)) << 8;
+	    b = ((pixel & 0x000f) | ((pixel & 0x000f) << 4));
+	    buffer[i] = (a | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_x4r4g4b4 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_x4r4g4b4 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-    r = ((pixel & 0x0f00) | ((pixel & 0x0f00) >> 4)) << 12;
-    g = ((pixel & 0x00f0) | ((pixel & 0x00f0) >> 4)) << 8;
-    b = ((pixel & 0x000f) | ((pixel & 0x000f) << 4));
-    return (0xff000000 | r | g | b);
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+	    
+	    r = ((pixel & 0x0f00) | ((pixel & 0x0f00) >> 4)) << 12;
+	    g = ((pixel & 0x00f0) | ((pixel & 0x00f0) >> 4)) << 8;
+	    b = ((pixel & 0x000f) | ((pixel & 0x000f) << 4));
+	    buffer[i] = (0xff000000 | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a4b4g4r4 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a4b4g4r4 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  a,r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+    int i;
 
-    a = ((pixel & 0xf000) | ((pixel & 0xf000) >> 4)) << 16;
-    b = ((pixel & 0x0f00) | ((pixel & 0x0f00) >> 4)) >> 4;
-    g = ((pixel & 0x00f0) | ((pixel & 0x00f0) >> 4)) << 8;
-    r = ((pixel & 0x000f) | ((pixel & 0x000f) << 4)) << 16;
-    return (a | r | g | b);
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  a,r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+	    
+	    a = ((pixel & 0xf000) | ((pixel & 0xf000) >> 4)) << 16;
+	    b = ((pixel & 0x0f00) | ((pixel & 0x0f00) >> 4)) >> 4;
+	    g = ((pixel & 0x00f0) | ((pixel & 0x00f0) >> 4)) << 8;
+	    r = ((pixel & 0x000f) | ((pixel & 0x000f) << 4)) << 16;
+	    buffer[i] = (a | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_x4b4g4r4 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_x4b4g4r4 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-    b = ((pixel & 0x0f00) | ((pixel & 0x0f00) >> 4)) >> 4;
-    g = ((pixel & 0x00f0) | ((pixel & 0x00f0) >> 4)) << 8;
-    r = ((pixel & 0x000f) | ((pixel & 0x000f) << 4)) << 16;
-    return (0xff000000 | r | g | b);
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, (uint16_t *) bits + offset);
+	    
+	    b = ((pixel & 0x0f00) | ((pixel & 0x0f00) >> 4)) >> 4;
+	    g = ((pixel & 0x00f0) | ((pixel & 0x00f0) >> 4)) << 8;
+	    r = ((pixel & 0x000f) | ((pixel & 0x000f) << 4)) << 16;
+	    buffer[i] = (0xff000000 | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a8 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a8 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+    int i;
 
-    return pixel << 24;
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+	    
+	    buffer[i] = pixel << 24;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_r3g3b2 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_r3g3b2 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-    r = ((pixel & 0xe0) | ((pixel & 0xe0) >> 3) | ((pixel & 0xc0) >> 6)) << 16;
-    g = ((pixel & 0x1c) | ((pixel & 0x18) >> 3) | ((pixel & 0x1c) << 3)) << 8;
-    b = (((pixel & 0x03)     ) |
-	 ((pixel & 0x03) << 2) |
-	 ((pixel & 0x03) << 4) |
-	 ((pixel & 0x03) << 6));
-    return (0xff000000 | r | g | b);
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+	    
+	    r = ((pixel & 0xe0) | ((pixel & 0xe0) >> 3) | ((pixel & 0xc0) >> 6)) << 16;
+	    g = ((pixel & 0x1c) | ((pixel & 0x18) >> 3) | ((pixel & 0x1c) << 3)) << 8;
+	    b = (((pixel & 0x03)     ) |
+		 ((pixel & 0x03) << 2) |
+		 ((pixel & 0x03) << 4) |
+		 ((pixel & 0x03) << 6));
+	    buffer[i] = (0xff000000 | r | g | b);
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_b2g3r3 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_b2g3r3 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
-
-    b = (((pixel & 0xc0)     ) |
-	 ((pixel & 0xc0) >> 2) |
-	 ((pixel & 0xc0) >> 4) |
-	 ((pixel & 0xc0) >> 6));
-    g = ((pixel & 0x38) | ((pixel & 0x38) >> 3) | ((pixel & 0x30) << 2)) << 8;
-    r = (((pixel & 0x07)     ) |
-	 ((pixel & 0x07) << 3) |
-	 ((pixel & 0x06) << 6)) << 16;
-    return (0xff000000 | r | g | b);
-}
+    int i;
 
-static FASTCALL uint32_t
-fbFetchPixel_a2r2g2b2 (bits_image_t *pict, int offset, int line)
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+	    
+	    b = (((pixel & 0xc0)     ) |
+		 ((pixel & 0xc0) >> 2) |
+		 ((pixel & 0xc0) >> 4) |
+		 ((pixel & 0xc0) >> 6));
+	    g = ((pixel & 0x38) | ((pixel & 0x38) >> 3) | ((pixel & 0x30) << 2)) << 8;
+	    r = (((pixel & 0x07)     ) |
+		 ((pixel & 0x07) << 3) |
+		 ((pixel & 0x06) << 6)) << 16;
+	    buffer[i] = (0xff000000 | r | g | b);
+	}
+    }
+}
+
+static FASTCALL void
+fbFetchPixel_a2r2g2b2 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t   a,r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-    a = ((pixel & 0xc0) * 0x55) << 18;
-    r = ((pixel & 0x30) * 0x55) << 12;
-    g = ((pixel & 0x0c) * 0x55) << 6;
-    b = ((pixel & 0x03) * 0x55);
-    return a|r|g|b;
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t   a,r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+	    
+	    a = ((pixel & 0xc0) * 0x55) << 18;
+	    r = ((pixel & 0x30) * 0x55) << 12;
+	    g = ((pixel & 0x0c) * 0x55) << 6;
+	    b = ((pixel & 0x03) * 0x55);
+	    buffer[i] = a|r|g|b;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a2b2g2r2 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a2b2g2r2 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t   a,r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+    int i;
 
-    a = ((pixel & 0xc0) * 0x55) << 18;
-    b = ((pixel & 0x30) * 0x55) >> 6;
-    g = ((pixel & 0x0c) * 0x55) << 6;
-    r = ((pixel & 0x03) * 0x55) << 16;
-    return a|r|g|b;
-}
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-static FASTCALL uint32_t
-fbFetchPixel_c8 (bits_image_t *pict, int offset, int line)
-{
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
-    const pixman_indexed_t * indexed = pict->indexed;
-    return indexed->rgba[pixel];
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t   a,r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+	    
+	    a = ((pixel & 0xc0) * 0x55) << 18;
+	    b = ((pixel & 0x30) * 0x55) >> 6;
+	    g = ((pixel & 0x0c) * 0x55) << 6;
+	    r = ((pixel & 0x03) * 0x55) << 16;
+	    buffer[i] = a|r|g|b;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_x4a4 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_c8 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-    return ((pixel & 0xf) | ((pixel & 0xf) << 4)) << 24;
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+	    const pixman_indexed_t * indexed = pict->indexed;
+	    buffer[i] = indexed->rgba[pixel];
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a4 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_x4a4 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = Fetch4(pict, bits, offset);
+    int i;
 
-    pixel |= pixel << 4;
-    return pixel << 24;
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t   pixel = READ(pict, (uint8_t *) bits + offset);
+	    
+	    buffer[i] = ((pixel & 0xf) | ((pixel & 0xf) << 4)) << 24;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_r1g2b1 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a4 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = Fetch4(pict, bits, offset);
+    int i;
 
-    r = ((pixel & 0x8) * 0xff) << 13;
-    g = ((pixel & 0x6) * 0x55) << 7;
-    b = ((pixel & 0x1) * 0xff);
-    return 0xff000000|r|g|b;
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = Fetch4(pict, bits, offset);
+	    
+	    pixel |= pixel << 4;
+	    buffer[i] = pixel << 24;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_b1g2r1 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_r1g2b1 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = Fetch4(pict, bits, offset);
+    int i;
 
-    b = ((pixel & 0x8) * 0xff) >> 3;
-    g = ((pixel & 0x6) * 0x55) << 7;
-    r = ((pixel & 0x1) * 0xff) << 16;
-    return 0xff000000|r|g|b;
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+	
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = Fetch4(pict, bits, offset);
+	    
+	    r = ((pixel & 0x8) * 0xff) << 13;
+	    g = ((pixel & 0x6) * 0x55) << 7;
+	    b = ((pixel & 0x1) * 0xff);
+	    buffer[i] = 0xff000000|r|g|b;
+	}
+    }
+}
+
+static FASTCALL void
+fbFetchPixel_b1g2r1 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
+{
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = Fetch4(pict, bits, offset);
+	    
+	    b = ((pixel & 0x8) * 0xff) >> 3;
+	    g = ((pixel & 0x6) * 0x55) << 7;
+	    r = ((pixel & 0x1) * 0xff) << 16;
+	    buffer[i] = 0xff000000|r|g|b;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a1r1g1b1 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a1r1g1b1 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  a,r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = Fetch4(pict, bits, offset);
+    int i;
 
-    a = ((pixel & 0x8) * 0xff) << 21;
-    r = ((pixel & 0x4) * 0xff) << 14;
-    g = ((pixel & 0x2) * 0xff) << 7;
-    b = ((pixel & 0x1) * 0xff);
-    return a|r|g|b;
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  a,r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = Fetch4(pict, bits, offset);
+	    
+	    a = ((pixel & 0x8) * 0xff) << 21;
+	    r = ((pixel & 0x4) * 0xff) << 14;
+	    g = ((pixel & 0x2) * 0xff) << 7;
+	    b = ((pixel & 0x1) * 0xff);
+	    buffer[i] = a|r|g|b;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_a1b1g1r1 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a1b1g1r1 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t  a,r,g,b;
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = Fetch4(pict, bits, offset);
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
 
-    a = ((pixel & 0x8) * 0xff) << 21;
-    r = ((pixel & 0x4) * 0xff) >> 3;
-    g = ((pixel & 0x2) * 0xff) << 7;
-    b = ((pixel & 0x1) * 0xff) << 16;
-    return a|r|g|b;
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t  a,r,g,b;
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = Fetch4(pict, bits, offset);
+	    
+	    a = ((pixel & 0x8) * 0xff) << 21;
+	    r = ((pixel & 0x4) * 0xff) >> 3;
+	    g = ((pixel & 0x2) * 0xff) << 7;
+	    b = ((pixel & 0x1) * 0xff) << 16;
+	    buffer[i] = a|r|g|b;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_c4 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_c4 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = Fetch4(pict, bits, offset);
-    const pixman_indexed_t * indexed = pict->indexed;
+    int i;
 
-    return indexed->rgba[pixel];
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = Fetch4(pict, bits, offset);
+	    const pixman_indexed_t * indexed = pict->indexed;
+	    
+	    buffer[i] = indexed->rgba[pixel];
+	}
+    }
 }
 
 
-static FASTCALL uint32_t
-fbFetchPixel_a1 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_a1 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t  pixel = READ(pict, bits + (offset >> 5));
-    uint32_t  a;
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t  pixel = READ(pict, bits + (offset >> 5));
+	    uint32_t  a;
 #if BITMAP_BIT_ORDER == MSBFirst
-    a = pixel >> (0x1f - (offset & 0x1f));
+	    a = pixel >> (0x1f - (offset & 0x1f));
 #else
-    a = pixel >> (offset & 0x1f);
+	    a = pixel >> (offset & 0x1f);
 #endif
-    a = a & 1;
-    a |= a << 1;
-    a |= a << 2;
-    a |= a << 4;
-    return a << 24;
+	    a = a & 1;
+	    a |= a << 1;
+	    a |= a << 2;
+	    a |= a << 4;
+	    buffer[i] = a << 24;
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_g1 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_g1 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits = pict->bits + line*pict->rowstride;
-    uint32_t pixel = READ(pict, bits + (offset >> 5));
-    const pixman_indexed_t * indexed = pict->indexed;
-    uint32_t a;
+    int i;
+
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    uint32_t *bits = pict->bits + line*pict->rowstride;
+	    uint32_t pixel = READ(pict, bits + (offset >> 5));
+	    const pixman_indexed_t * indexed = pict->indexed;
+	    uint32_t a;
 #if BITMAP_BIT_ORDER == MSBFirst
-    a = pixel >> (0x1f - (offset & 0x1f));
+	    a = pixel >> (0x1f - (offset & 0x1f));
 #else
-    a = pixel >> (offset & 0x1f);
+	    a = pixel >> (offset & 0x1f);
 #endif
-    a = a & 1;
-    return indexed->rgba[a];
+	    a = a & 1;
+	    buffer[i] = indexed->rgba[a];
+	}
+    }
 }
 
-static FASTCALL uint32_t
-fbFetchPixel_yuy2 (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel_yuy2 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    int16_t y, u, v;
-    int32_t r, g, b;
-
-    const uint32_t *bits = pict->bits + pict->rowstride * line;
-
-    y = ((uint8_t *) bits)[offset << 1] - 16;
-    u = ((uint8_t *) bits)[((offset << 1) & -4) + 1] - 128;
-    v = ((uint8_t *) bits)[((offset << 1) & -4) + 3] - 128;
-
-    /* R = 1.164(Y - 16) + 1.596(V - 128) */
-    r = 0x012b27 * y + 0x019a2e * v;
-    /* G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128) */
-    g = 0x012b27 * y - 0x00d0f2 * v - 0x00647e * u;
-    /* B = 1.164(Y - 16) + 2.018(U - 128) */
-    b = 0x012b27 * y + 0x0206a2 * u;
-
-    return 0xff000000 |
-	(r >= 0 ? r < 0x1000000 ? r         & 0xff0000 : 0xff0000 : 0) |
-	(g >= 0 ? g < 0x1000000 ? (g >> 8)  & 0x00ff00 : 0x00ff00 : 0) |
-	(b >= 0 ? b < 0x1000000 ? (b >> 16) & 0x0000ff : 0x0000ff : 0);
-}
+    int i;
 
-static FASTCALL uint32_t
-fbFetchPixel_yv12 (bits_image_t *pict, int offset, int line)
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    int16_t y, u, v;
+	    int32_t r, g, b;
+	    
+	    const uint32_t *bits = pict->bits + pict->rowstride * line;
+	    
+	    y = ((uint8_t *) bits)[offset << 1] - 16;
+	    u = ((uint8_t *) bits)[((offset << 1) & -4) + 1] - 128;
+	    v = ((uint8_t *) bits)[((offset << 1) & -4) + 3] - 128;
+	    
+	    /* R = 1.164(Y - 16) + 1.596(V - 128) */
+	    r = 0x012b27 * y + 0x019a2e * v;
+	    /* G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128) */
+	    g = 0x012b27 * y - 0x00d0f2 * v - 0x00647e * u;
+	    /* B = 1.164(Y - 16) + 2.018(U - 128) */
+	    b = 0x012b27 * y + 0x0206a2 * u;
+	    
+	    buffer[i] = 0xff000000 |
+		(r >= 0 ? r < 0x1000000 ? r         & 0xff0000 : 0xff0000 : 0) |
+		(g >= 0 ? g < 0x1000000 ? (g >> 8)  & 0x00ff00 : 0x00ff00 : 0) |
+		(b >= 0 ? b < 0x1000000 ? (b >> 16) & 0x0000ff : 0x0000ff : 0);
+	}
+    }
+}
+
+static FASTCALL void
+fbFetchPixel_yv12 (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    YV12_SETUP(pict);
-    int16_t y = YV12_Y (line)[offset] - 16;
-    int16_t u = YV12_U (line)[offset >> 1] - 128;
-    int16_t v = YV12_V (line)[offset >> 1] - 128;
-    int32_t r, g, b;
-
-    /* R = 1.164(Y - 16) + 1.596(V - 128) */
-    r = 0x012b27 * y + 0x019a2e * v;
-    /* G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128) */
-    g = 0x012b27 * y - 0x00d0f2 * v - 0x00647e * u;
-    /* B = 1.164(Y - 16) + 2.018(U - 128) */
-    b = 0x012b27 * y + 0x0206a2 * u;
+    int i;
 
-    return 0xff000000 |
-	(r >= 0 ? r < 0x1000000 ? r         & 0xff0000 : 0xff0000 : 0) |
-	(g >= 0 ? g < 0x1000000 ? (g >> 8)  & 0x00ff00 : 0x00ff00 : 0) |
-	(b >= 0 ? b < 0x1000000 ? (b >> 16) & 0x0000ff : 0x0000ff : 0);
+    for (i = 0; i < n_pixels; ++i)
+    {
+	int offset = buffer[2 * i];
+	int line = buffer[2 * i + 1];
+
+	if (offset == 0xffffffff || line == 0xffffffff)
+	{
+	    buffer[i] = 0;
+	}
+	else
+	{
+	    YV12_SETUP(pict);
+	    int16_t y = YV12_Y (line)[offset] - 16;
+	    int16_t u = YV12_U (line)[offset >> 1] - 128;
+	    int16_t v = YV12_V (line)[offset >> 1] - 128;
+	    int32_t r, g, b;
+	    
+	    /* R = 1.164(Y - 16) + 1.596(V - 128) */
+	    r = 0x012b27 * y + 0x019a2e * v;
+	    /* G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128) */
+	    g = 0x012b27 * y - 0x00d0f2 * v - 0x00647e * u;
+	    /* B = 1.164(Y - 16) + 2.018(U - 128) */
+	    b = 0x012b27 * y + 0x0206a2 * u;
+	    
+	    buffer[i] = 0xff000000 |
+		(r >= 0 ? r < 0x1000000 ? r         & 0xff0000 : 0xff0000 : 0) |
+		(g >= 0 ? g < 0x1000000 ? (g >> 8)  & 0x00ff00 : 0x00ff00 : 0) |
+		(b >= 0 ? b < 0x1000000 ? (b >> 16) & 0x0000ff : 0x0000ff : 0);
+	}
+    }
 }
 
 /*
@@ -1320,19 +1875,21 @@ fbFetchPixel_yv12 (bits_image_t *pict, int offset, int line)
  *
  * WARNING: This function loses precision!
  */
-static FASTCALL uint32_t
-fbFetchPixel32_generic_lossy (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel32_generic_lossy (bits_image_t *pict, uint32_t *buffer, int n_pixels)
 {
-    fetchPixelProc64 fetchPixel64 = ACCESS(pixman_fetchPixelProcForPicture64) (pict);
-    const uint64_t argb16Pixel = fetchPixel64(pict, offset, line);
-    uint32_t argb8Pixel;
+    fetch_pixels_64_t fetch_pixels_64 = ACCESS(pixman_fetchPixelProcForPicture64) (pict);
 
-    pixman_contract(&argb8Pixel, &argb16Pixel, 1);
+    /* Since buffer holds n_pixels coordinate pairs (two 32-bit values each),
+     * it also has enough room for n_pixels 64-bit pixels.
+     */
+    fetch_pixels_64 (pict, (uint64_t *)buffer, n_pixels);
 
-    return argb8Pixel;
+    pixman_contract (buffer, (uint64_t *)buffer, n_pixels);
 }
 
-fetchPixelProc32 ACCESS(pixman_fetchPixelProcForPicture32) (bits_image_t * pict)
+fetch_pixels_32_t
+ACCESS(pixman_fetchPixelProcForPicture32) (bits_image_t * pict)
 {
     switch(pict->format) {
     case PIXMAN_a8r8g8b8: return fbFetchPixel_a8r8g8b8;
@@ -1393,19 +1950,18 @@ fetchPixelProc32 ACCESS(pixman_fetchPixelProcForPicture32) (bits_image_t * pict)
     return NULL;
 }
 
-static FASTCALL uint64_t
-fbFetchPixel64_generic (bits_image_t *pict, int offset, int line)
+static FASTCALL void
+fbFetchPixel64_generic (bits_image_t *pict, uint64_t *buffer, int n_pixels)
 {
-    fetchPixelProc32 fetchPixel32 = ACCESS(pixman_fetchPixelProcForPicture32) (pict);
-    uint32_t argb8Pixel = fetchPixel32(pict, offset, line);
-    uint64_t argb16Pixel;
+    fetch_pixels_32_t fetch_pixels_32 = ACCESS(pixman_fetchPixelProcForPicture32) (pict);
 
-    pixman_expand(&argb16Pixel, &argb8Pixel, pict->format, 1);
+    fetch_pixels_32 (pict, (uint32_t *)buffer, n_pixels);
 
-    return argb16Pixel;
+    pixman_expand (buffer, (uint32_t *)buffer, pict->format, n_pixels);
 }
 
-fetchPixelProc64 ACCESS(pixman_fetchPixelProcForPicture64) (bits_image_t * pict)
+fetch_pixels_64_t
+ACCESS(pixman_fetchPixelProcForPicture64) (bits_image_t * pict)
 {
     switch(pict->format) {
     case PIXMAN_a2b10g10r10: return fbFetchPixel_a2b10g10r10;
@@ -1421,7 +1977,8 @@ fetchPixelProc64 ACCESS(pixman_fetchPixelProcForPicture64) (bits_image_t * pict)
 
 static FASTCALL void
 fbStore_a2b10g10r10 (pixman_image_t *image,
-		     uint32_t *bits, const uint64_t *values, int x, int width, const pixman_indexed_t * indexed)
+		     uint32_t *bits, const uint64_t *values,
+		     int x, int width, const pixman_indexed_t * indexed)
 {
     int i;
     uint32_t *pixel = bits + x;
@@ -1981,93 +2538,3 @@ storeProc64 ACCESS(pixman_storeProcForPicture64) (bits_image_t * pict)
     default: return fbStore64_generic;
     }
 }
-
-#ifndef PIXMAN_FB_ACCESSORS
-/*
- * Helper routine to expand a color component from 0 < n <= 8 bits to 16 bits by
- * replication.
- */
-static inline uint64_t expand16(const uint8_t val, int nbits)
-{
-    // Start out with the high bit of val in the high bit of result.
-    uint16_t result = (uint16_t)val << (16 - nbits);
-
-    if (nbits == 0)
-        return 0;
-
-    // Copy the bits in result, doubling the number of bits each time, until we
-    // fill all 16 bits.
-    while (nbits < 16) {
-        result |= result >> nbits;
-        nbits *= 2;
-    }
-
-    return result;
-}
-
-/*
- * This function expands images from ARGB8 format to ARGB16.  To preserve
- * precision, it needs to know the original source format.  For example, if the
- * source was PIXMAN_x1r5g5b5 and the red component contained bits 12345, then
- * the expanded value is 12345123.  To correctly expand this to 16 bits, it
- * should be 1234512345123451 and not 1234512312345123.
- */
-void pixman_expand(uint64_t *dst, const uint32_t *src,
-                   pixman_format_code_t format, int width)
-{
-    /*
-     * Determine the sizes of each component and the masks and shifts required
-     * to extract them from the source pixel.
-     */
-    const int a_size = PIXMAN_FORMAT_A(format),
-              r_size = PIXMAN_FORMAT_R(format),
-              g_size = PIXMAN_FORMAT_G(format),
-              b_size = PIXMAN_FORMAT_B(format);
-    const int a_shift = 32 - a_size,
-              r_shift = 24 - r_size,
-              g_shift = 16 - g_size,
-              b_shift =  8 - b_size;
-    const uint8_t a_mask = ~(~0 << a_size),
-                  r_mask = ~(~0 << r_size),
-                  g_mask = ~(~0 << g_size),
-                  b_mask = ~(~0 << b_size);
-    int i;
-
-    /* Start at the end so that we can do the expansion in place when src == dst */
-    for (i = width - 1; i >= 0; i--)
-    {
-        const uint32_t pixel = src[i];
-        // Extract the components.
-        const uint8_t a = (pixel >> a_shift) & a_mask,
-                      r = (pixel >> r_shift) & r_mask,
-                      g = (pixel >> g_shift) & g_mask,
-                      b = (pixel >> b_shift) & b_mask;
-        const uint64_t a16 = a_size ? expand16(a, a_size) : 0xffff,
-                       r16 = expand16(r, r_size),
-                       g16 = expand16(g, g_size),
-                       b16 = expand16(b, b_size);
-
-        dst[i] = a16 << 48 | r16 << 32 | g16 << 16 | b16;
-    }
-}
-
-/*
- * Contracting is easier than expanding.  We just need to truncate the
- * components.
- */
-void pixman_contract(uint32_t *dst, const uint64_t *src, int width)
-{
-    int i;
-
-    /* Start at the beginning so that we can do the contraction in place when
-     * src == dst */
-    for (i = 0; i < width; i++)
-    {
-        const uint8_t a = src[i] >> 56,
-                      r = src[i] >> 40,
-                      g = src[i] >> 24,
-                      b = src[i] >> 8;
-        dst[i] = a << 24 | r << 16 | g << 8 | b;
-    }
-}
-#endif // PIXMAN_FB_ACCESSORS
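
A note on the calling convention before the pixman-bits-image.c part of the
patch: each fetch_pixels function in the pixman-access.c hunks above receives
a buffer of n_pixels (x, y) pairs and overwrites it in place with n_pixels
a8r8g8b8 pixels, writing 0 wherever a coordinate is 0xffffffff. The following
is a minimal standalone sketch of that convention, not code from the patch.
toy_image_t, COORD_NONE and toy_fetch_pixels_a2r2g2b2 are hypothetical names,
and the toy image stores one byte per pixel with a byte stride, which
sidesteps the READ() accessor and the uint32_t rowstride of the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel the patch uses for "no pixel here" (hypothetical name). */
#define COORD_NONE 0xffffffffu

/* Hypothetical stand-in for bits_image_t: 8 bits per pixel, stride in bytes. */
typedef struct
{
    const uint8_t *bits;
    int            stride;
} toy_image_t;

/* On entry buffer holds n_pixels (x, y) pairs; on exit it holds n_pixels
 * a8r8g8b8 pixels in buffer[0] .. buffer[n_pixels - 1].
 */
static void
toy_fetch_pixels_a2r2g2b2 (const toy_image_t *image, uint32_t *buffer, int n_pixels)
{
    int i;

    assert (n_pixels >= 0);

    for (i = 0; i < n_pixels; ++i)
    {
	/* Read both coordinates before buffer[i] is overwritten. */
	uint32_t x = buffer[2 * i];
	uint32_t y = buffer[2 * i + 1];

	if (x == COORD_NONE || y == COORD_NONE)
	{
	    buffer[i] = 0;
	}
	else
	{
	    uint32_t pixel = image->bits[(size_t) y * (size_t) image->stride + x];
	    uint32_t a, r, g, b;

	    /* Multiplying a 2-bit field by 0x55 replicates it to 8 bits. */
	    a = ((pixel & 0xc0) * 0x55) << 18;
	    r = ((pixel & 0x30) * 0x55) << 12;
	    g = ((pixel & 0x0c) * 0x55) << 6;
	    b = ((pixel & 0x03) * 0x55);

	    buffer[i] = a | r | g | b;
	}
    }
}
```

Slot i is written only after slots 2*i and 2*i + 1 have been read, and i is
never larger than 2*i, so the in-place conversion cannot overwrite a
coordinate it still needs.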
diff --git a/pixman/pixman-bits-image.c b/pixman/pixman-bits-image.c
index 888e487..e9f12d0 100644
--- a/pixman/pixman-bits-image.c
+++ b/pixman/pixman-bits-image.c
@@ -1,189 +1,676 @@
 /*
+ * Copyright © 2000 Keith Packard, member of The XFree86 Project, Inc.
+ *             2005 Lars Knoll & Zack Rusin, Trolltech
+ *             2008 Aaron Plattner, NVIDIA Corporation
  * Copyright © 2000 SuSE, Inc.
- * Copyright © 2007 Red Hat, Inc.
+ * Copyright © 2007, 2009 Red Hat, Inc.
  *
  * Permission to use, copy, modify, distribute, and sell this software and its
  * documentation for any purpose is hereby granted without fee, provided that
  * the above copyright notice appear in all copies and that both that
  * copyright notice and this permission notice appear in supporting
- * documentation, and that the name of SuSE not be used in advertising or
- * publicity pertaining to distribution of the software without specific,
- * written prior permission.  SuSE makes no representations about the
- * suitability of this software for any purpose.  It is provided "as is"
- * without express or implied warranty.
+ * documentation, and that the name of Keith Packard not be used in
+ * advertising or publicity pertaining to distribution of the software without
+ * specific, written prior permission.  Keith Packard makes no
+ * representations about the suitability of this software for any purpose.  It
+ * is provided "as is" without express or implied warranty.
  *
- * SuSE DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL SuSE
- * BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
- * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION
- * OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
- * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
+ * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
+ * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
+ * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
+ * SOFTWARE.
  */
 
 #include <config.h>
 #include <stdlib.h>
+#include <string.h>
 #include "pixman-private.h"
 
+#define Alpha(x) ((x) >> 24)
+#define Red(x) (((x) >> 16) & 0xff)
+#define Green(x) (((x) >> 8) & 0xff)
+#define Blue(x) ((x) & 0xff)
 
 #define READ_ACCESS(f) ((image->common.read_func)? f##_accessors : f)
 #define WRITE_ACCESS(f) ((image->common.write_func)? f##_accessors : f)
 
+/* Store functions */
+
 static void
-fbFetchSolid(bits_image_t * image,
-	     int x, int y, int width,
-	     uint32_t *buffer,
-	     uint32_t *mask, uint32_t maskBits)
+bits_image_store_scanline_32 (bits_image_t *image, int x, int y, int width, uint32_t *buffer)
 {
-    uint32_t color;
-    uint32_t *end;
-    fetchPixelProc32 fetch =
-	READ_ACCESS(pixman_fetchPixelProcForPicture32)(image);
-    
-    color = fetch(image, 0, 0);
-    
-    end = buffer + width;
-    while (buffer < end)
-	*(buffer++) = color;
+    uint32_t *bits;
+    int32_t stride;
+    const pixman_indexed_t *indexed = image->indexed;
+
+    bits = image->bits;
+    stride = image->rowstride;
+    bits += y*stride;
+
+    image->store_scanline_raw_32 ((pixman_image_t *)image, bits, buffer, x, width, indexed);
+
+    if (image->common.alpha_map)
+    {
+	x -= image->common.alpha_origin.x;
+	y -= image->common.alpha_origin.y;
+
+	bits_image_store_scanline_32 (image->common.alpha_map, x, y, width, buffer);
+    }
 }
 
 static void
-fbFetchSolid64(bits_image_t * image,
-	       int x, int y, int width,
-	       uint64_t *buffer, void *unused, uint32_t unused2)
+bits_image_store_scanline_64 (bits_image_t *image, int x, int y, int width, uint32_t *buffer)
 {
-    uint64_t color;
-    uint64_t *end;
-    fetchPixelProc64 fetch =
-	READ_ACCESS(pixman_fetchPixelProcForPicture64)(image);
-    
-    color = fetch(image, 0, 0);
-    
-    end = buffer + width;
-    while (buffer < end)
-	*(buffer++) = color;
+    uint32_t *bits;
+    int32_t stride;
+    const pixman_indexed_t *indexed = image->indexed;
+
+    bits = image->bits;
+    stride = image->rowstride;
+    bits += y*stride;
+
+    image->store_scanline_raw_64 ((pixman_image_t *)image, bits,
+				  (uint64_t *)buffer, x, width, indexed);
+
+    if (image->common.alpha_map)
+    {
+	x -= image->common.alpha_origin.x;
+	y -= image->common.alpha_origin.y;
+
+	bits_image_store_scanline_64 (image->common.alpha_map, x, y, width, buffer);
+    }
 }
 
-static void
-fbFetch(bits_image_t * image,
-	int x, int y, int width,
-	uint32_t *buffer, uint32_t *mask, uint32_t maskBits)
+void
+_pixman_image_store_scanline_32 (bits_image_t *image, int x, int y, int width,
+				 uint32_t *buffer)
 {
-    fetchProc32 fetch = READ_ACCESS(pixman_fetchProcForPicture32)(image);
-    
-    fetch(image, x, y, width, buffer);
+    image->store_scanline_32 (image, x, y, width, buffer);
 }
 
+void
+_pixman_image_store_scanline_64 (bits_image_t *image, int x, int y, int width,
+				 uint32_t *buffer)
+{
+    image->store_scanline_64 (image, x, y, width, buffer);
+}
+
+/* Fetch functions */
+
+/* On entry, @buffer should contain @n_pixels (x, y) coordinate pairs, where
+ * x and y are both uint32_ts. On exit, buffer will contain the corresponding
+ * pixels.
+ *
+ * The coordinates must be within the sample grid. If either x or y is 0xffffffff,
+ * the pixel returned will be 0.
+ */
 static void
-fbFetch64(bits_image_t * image,
-	  int x, int y, int width,
-	  uint64_t *buffer, void *unused, uint32_t unused2)
+bits_image_fetch_raw_pixels (bits_image_t *image, uint32_t *buffer, int n_pixels)
 {
-    fetchProc64 fetch = READ_ACCESS(pixman_fetchProcForPicture64)(image);
-    
-    fetch(image, x, y, width, buffer);
+    image->fetch_pixels_32 (image, buffer, n_pixels);
 }
 
 static void
-fbStore(bits_image_t * image, int x, int y, int width, uint32_t *buffer)
+bits_image_fetch_alpha_pixels (bits_image_t *image, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits;
-    int32_t stride;
-    storeProc32 store = WRITE_ACCESS(pixman_storeProcForPicture32)(image);
-    const pixman_indexed_t * indexed = image->indexed;
+#define N_ALPHA_PIXELS 256
+    
+    uint32_t alpha_pixels[N_ALPHA_PIXELS * 2];
+    int i;
+    
+    if (!image->common.alpha_map)
+    {
+	bits_image_fetch_raw_pixels (image, buffer, n_pixels);
+	return;
+    }
 
-    bits = image->bits;
-    stride = image->rowstride;
-    bits += y*stride;
-    store((pixman_image_t *)image, bits, buffer, x, width, indexed);
+    /* Alpha map */
+    i = 0;
+    while (i < n_pixels)
+    {
+	int tmp_n_pixels = MIN (N_ALPHA_PIXELS, n_pixels - i);
+	int j;
+	int32_t *coords;
+	
+	memcpy (alpha_pixels, buffer + 2 * i, tmp_n_pixels * 2 * sizeof (int32_t));
+	coords = (int32_t *)alpha_pixels;
+	for (j = 0; j < tmp_n_pixels; ++j)
+	{
+	    int32_t x = coords[0];
+	    int32_t y = coords[1];
+	    
+	    if (x != 0xffffffff)
+	    {
+		x -= image->common.alpha_origin.x;
+		
+		if (x < 0 || x >= image->common.alpha_map->width)
+		    x = 0xffffffff;
+	    }
+	    
+	    if (y != 0xffffffff)
+	    {
+		y -= image->common.alpha_origin.y;
+		
+		if (y < 0 || y >= image->common.alpha_map->height)
+		    y = 0xffffffff;
+	    }
+	    
+	    coords[0] = x;
+	    coords[1] = y;
+	    
+	    coords += 2;
+	}
+	
+	bits_image_fetch_raw_pixels (image->common.alpha_map, alpha_pixels, tmp_n_pixels);
+	bits_image_fetch_raw_pixels (image, buffer + 2 * i, tmp_n_pixels);
+	
+	for (j = 0; j < tmp_n_pixels; ++j)
+	{
+	    uint32_t a = alpha_pixels[j] >> 24;
+	    
+	    buffer[i] =
+		(a << 24)					|
+		div_255 (Red (buffer[2 * i - j]) * a) << 16	|
+		div_255 (Green (buffer[2 * i - j]) * a) << 8	|
+		div_255 (Blue (buffer[2 * i - j]) * a);
+	    
+	    i++;
+	}
+    }
 }
 
 static void
-fbStore64 (bits_image_t * image, int x, int y, int width, uint64_t *buffer)
+bits_image_fetch_pixels_src_clip (bits_image_t *image, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits;
-    int32_t stride;
-    storeProc64 store = WRITE_ACCESS(pixman_storeProcForPicture64)(image);
-    const pixman_indexed_t * indexed = image->indexed;
+    if (image->common.src_clip != &(image->common.full_region) &&
+	!pixman_region32_equal (image->common.src_clip, &(image->common.full_region)))
+    {
+	int32_t *coords = (int32_t *)buffer;
+	int i;
+
+	for (i = 0; i < n_pixels; ++i)
+	{
+	    int32_t x = coords[0];
+	    int32_t y = coords[1];
+
+	    if (!pixman_region32_contains_point (image->common.src_clip, x, y, NULL))
+	    {
+		coords[0] = 0xffffffff;
+		coords[1] = 0xffffffff;
+	    }
+
+	    coords += 2;
+	}
+    }
 
-    bits = image->bits;
-    stride = image->rowstride;
-    bits += y*stride;
-    store((pixman_image_t *)image, bits, buffer, x, width, indexed);
+    bits_image_fetch_alpha_pixels (image, buffer, n_pixels);
 }
 
-static void
-fbStoreExternalAlpha (bits_image_t * image, int x, int y, int width,
-		      uint32_t *buffer)
+static force_inline void
+repeat (pixman_repeat_t repeat, int width, int height, int *x, int *y)
 {
-    uint32_t *bits, *alpha_bits;
-    int32_t stride, astride;
-    int ax, ay;
-    storeProc32 store;
-    storeProc32 astore;
-    const pixman_indexed_t * indexed = image->indexed;
-    const pixman_indexed_t * aindexed;
-
-    if (!image->common.alpha_map) {
-        // XXX[AGP]: This should never happen!
-        // fbStore(image, x, y, width, buffer);
-        abort();
-	return;
+    switch (repeat)
+    {
+    case PIXMAN_REPEAT_NORMAL:
+	*x = MOD (*x, width);
+	*y = MOD (*y, height);
+	break;
+
+    case PIXMAN_REPEAT_PAD:
+	*x = CLIP (*x, 0, width - 1);
+	*y = CLIP (*y, 0, height - 1);
+	break;
+	
+    case PIXMAN_REPEAT_REFLECT:
+	*x = MOD (*x, width * 2);
+	*y = MOD (*y, height * 2);
+
+	if (*x >= width)
+	    *x = width * 2 - *x - 1;
+	
+	if (*y >= height)
+	    *y = height * 2 - *y - 1;
+	break;
+
+    case PIXMAN_REPEAT_NONE:
+	if (*x < 0 || *x >= width)
+	    *x = 0xffffffff;
+
+	if (*y < 0 || *y >= height)
+	    *y = 0xffffffff;
+	break;
     }
+}
 
-    store = WRITE_ACCESS(pixman_storeProcForPicture32)(image);
-    astore = WRITE_ACCESS(pixman_storeProcForPicture32)(image->common.alpha_map);
-    aindexed = image->common.alpha_map->indexed;
+/* Buffer contains list of fixed-point coordinates on input,
+ * a list of pixels on output
+ */
+static void
+bits_image_fetch_nearest_pixels (bits_image_t *image, uint32_t *buffer, int n_pixels)
+{
+    pixman_repeat_t repeat_mode = image->common.repeat;
+    int width = image->width;
+    int height = image->height;
+    int i;
 
-    ax = x;
-    ay = y;
+    for (i = 0; i < 2 * n_pixels; i += 2)
+    {
+	int32_t *coords = (int32_t *)buffer;
+	int32_t x, y;
 
-    bits = image->bits;
-    stride = image->rowstride;
+	/* Subtract pixman_fixed_e to ensure that 0.5 rounds to 0, not 1 */
+	x = pixman_fixed_to_int (coords[i] - pixman_fixed_e);
+	y = pixman_fixed_to_int (coords[i + 1] - pixman_fixed_e);
 
-    alpha_bits = image->common.alpha_map->bits;
-    astride = image->common.alpha_map->rowstride;
+	repeat (repeat_mode, width, height, &x, &y);
 
-    bits       += y*stride;
-    alpha_bits += (ay - image->common.alpha_origin.y)*astride;
+	coords[i] = x;
+	coords[i + 1] = y;
+    }
 
+    bits_image_fetch_pixels_src_clip (image, buffer, n_pixels);
+}
 
-    store((pixman_image_t *)image, bits, buffer, x, width, indexed);
-    astore((pixman_image_t *)image->common.alpha_map,
-	   alpha_bits, buffer, ax - image->common.alpha_origin.x, width, aindexed);
+#define N_TMP_PIXELS	(256)
+
+/* Buffer contains list of fixed-point coordinates on input,
+ * a list of pixels on output
+ */
+static void
+bits_image_fetch_bilinear_pixels (bits_image_t *image, uint32_t *buffer, int n_pixels)
+{
+/* Each output pixel needs four source pixels, each with two coordinates */
+#define N_TEMPS		(N_TMP_PIXELS * 8)
+#define N_DISTS		(N_TMP_PIXELS * 2)
+    
+    uint32_t temps[N_TEMPS];
+    int32_t  dists[N_DISTS];
+    pixman_repeat_t repeat_mode = image->common.repeat;
+    int width = image->width;
+    int height = image->height;
+    int32_t *coords;
+    int i;
+
+    i = 0;
+    coords = (int32_t *)buffer;
+    while (i < n_pixels)
+    {
+	int tmp_n_pixels = MIN(N_TMP_PIXELS, n_pixels - i);
+	int32_t distx, disty;
+	uint32_t *u;
+	int32_t *t, *d;
+	int j;
+	
+	t = (int32_t *)temps;
+	d = dists;
+	for (j = 0; j < tmp_n_pixels; ++j)
+	{
+	    int32_t x1, y1, x2, y2;
+	    
+	    x1 = coords[0] - pixman_fixed_1 / 2;
+	    y1 = coords[1] - pixman_fixed_1 / 2;
+	    
+	    distx = (x1 >> 8) & 0xff;
+	    disty = (y1 >> 8) & 0xff;
+	    
+	    x1 >>= 16;
+	    y1 >>= 16;
+	    x2 = x1 + 1;
+	    y2 = y1 + 1;
+
+	    repeat (repeat_mode, width, height, &x1, &y1);
+	    repeat (repeat_mode, width, height, &x2, &y2);
+	    
+	    *t++ = x1;
+	    *t++ = y1;
+	    *t++ = x2;
+	    *t++ = y1;
+	    *t++ = x1;
+	    *t++ = y2;
+	    *t++ = x2;
+	    *t++ = y2;
+
+	    *d++ = distx;
+	    *d++ = disty;
+
+	    coords += 2;
+	}
+
+	bits_image_fetch_pixels_src_clip (image, temps, tmp_n_pixels * 4);
+
+	u = (uint32_t *)temps;
+	d = dists;
+	for (j = 0; j < tmp_n_pixels; ++j)
+	{
+	    uint32_t tl, tr, bl, br, r;
+	    int32_t idistx, idisty;
+	    uint32_t ft, fb;
+	    
+	    tl = *u++;
+	    tr = *u++;
+	    bl = *u++;
+	    br = *u++;
+
+	    distx = *d++;
+	    disty = *d++;
+
+	    idistx = 256 - distx;
+	    idisty = 256 - disty;
+	    
+	    ft = FbGet8(tl,0) * idistx + FbGet8(tr,0) * distx;
+	    fb = FbGet8(bl,0) * idistx + FbGet8(br,0) * distx;
+	    r = (((ft * idisty + fb * disty) >> 16) & 0xff);
+	    ft = FbGet8(tl,8) * idistx + FbGet8(tr,8) * distx;
+	    fb = FbGet8(bl,8) * idistx + FbGet8(br,8) * distx;
+	    r |= (((ft * idisty + fb * disty) >> 8) & 0xff00);
+	    ft = FbGet8(tl,16) * idistx + FbGet8(tr,16) * distx;
+	    fb = FbGet8(bl,16) * idistx + FbGet8(br,16) * distx;
+	    r |= (((ft * idisty + fb * disty)) & 0xff0000);
+	    ft = FbGet8(tl,24) * idistx + FbGet8(tr,24) * distx;
+	    fb = FbGet8(bl,24) * idistx + FbGet8(br,24) * distx;
+	    r |= (((ft * idisty + fb * disty) << 8) & 0xff000000);
+
+	    buffer[i++] = r;
+	}
+    }
 }
 
+/* Buffer contains list of fixed-point coordinates on input,
+ * a list of pixels on output
+ */
 static void
-fbStoreExternalAlpha64 (bits_image_t * image, int x, int y, int width,
-			uint64_t *buffer)
+bits_image_fetch_convolution_pixels (bits_image_t *image, uint32_t *buffer, int n_pixels)
 {
-    uint32_t *bits, *alpha_bits;
-    int32_t stride, astride;
-    int ax, ay;
-    storeProc64 store;
-    storeProc64 astore;
-    const pixman_indexed_t * indexed = image->indexed;
-    const pixman_indexed_t * aindexed;
+    uint32_t tmp_pixels_stack[N_TMP_PIXELS * 2]; /* Two coordinates per pixel */
+    uint32_t *tmp_pixels = tmp_pixels_stack;
+    pixman_fixed_t *params = image->common.filter_params;
+    int x_off = (params[0] - pixman_fixed_1) >> 1;
+    int y_off = (params[1] - pixman_fixed_1) >> 1;
+    int n_tmp_pixels;
+    int32_t *coords;
+    int32_t *t;
+    uint32_t *u;
+    int i;
+    int max_n_kernels;
+
+    int32_t cwidth = pixman_fixed_to_int (params[0]);
+    int32_t cheight = pixman_fixed_to_int (params[1]);
+    int kernel_size = cwidth * cheight;
+
+    params += 2;
+
+    n_tmp_pixels = N_TMP_PIXELS;
+    if (kernel_size > n_tmp_pixels)
+    {
+	/* Two coordinates per pixel */
+	tmp_pixels = malloc (kernel_size * 2 * sizeof (uint32_t));
+	n_tmp_pixels = kernel_size;
+
+	if (!tmp_pixels)
+	{
+	    /* We ignore out-of-memory during rendering */
+	    return;
+	}
+    }
 
-    store = ACCESS(pixman_storeProcForPicture64)(image);
-    astore = ACCESS(pixman_storeProcForPicture64)(image->common.alpha_map);
-    aindexed = image->common.alpha_map->indexed;
+    max_n_kernels = n_tmp_pixels / kernel_size;
+    
+    i = 0;
+    coords = (int32_t *)buffer;
+    while (i < n_pixels)
+    {
+	int n_kernels = MIN (max_n_kernels, (n_pixels - i));
+	pixman_repeat_t repeat_mode = image->common.repeat;
+	int width = image->width;
+	int height = image->height;
+	int j;
+	
+	t = (int32_t *)tmp_pixels;
+	for (j = 0; j < n_kernels; ++j)
+	{
+	    int32_t x, y, x1, x2, y1, y2;
+
+	    /* Subtract pixman_fixed_e to ensure that 0.5 rounds to 0, not 1 */
+	    x1 = pixman_fixed_to_int (coords[0] - pixman_fixed_e) - x_off;
+	    y1 = pixman_fixed_to_int (coords[1] - pixman_fixed_e) - y_off;
+	    x2 = x1 + cwidth;
+	    y2 = y1 + cheight;
+
+	    for (y = y1; y < y2; ++y)
+	    {
+		for (x = x1; x < x2; ++x)
+		{
+		    int rx = x;
+		    int ry = y;
+		    
+		    repeat (repeat_mode, width, height, &rx, &ry);
+		    
+		    *t++ = rx;
+		    *t++ = ry;
+		}
+	    }
+
+	    coords += 2;
+	}
+
+	bits_image_fetch_pixels_src_clip (image, tmp_pixels, n_kernels * kernel_size);
+
+	u = tmp_pixels;
+	for (j = 0; j < n_kernels; ++j)
+	{
+	    int32_t srtot, sgtot, sbtot, satot;
+	    pixman_fixed_t *p = params;
+	    int k;
+
+	    srtot = sgtot = sbtot = satot = 0;
+		
+	    for (k = 0; k < kernel_size; ++k)
+	    {
+		pixman_fixed_t f = *p++;
+		uint32_t c = *u++; /* one pixel was fetched per kernel entry, so always consume it */
+
+		if (f)
+		{
+		    srtot += Red(c) * f;
+		    sgtot += Green(c) * f;
+		    sbtot += Blue(c) * f;
+		    satot += Alpha(c) * f;
+		}
+	    }
+
+	    satot >>= 16;
+	    srtot >>= 16;
+	    sgtot >>= 16;
+	    sbtot >>= 16;
+	    
+	    if (satot < 0) satot = 0; else if (satot > 0xff) satot = 0xff;
+	    if (srtot < 0) srtot = 0; else if (srtot > 0xff) srtot = 0xff;
+	    if (sgtot < 0) sgtot = 0; else if (sgtot > 0xff) sgtot = 0xff;
+	    if (sbtot < 0) sbtot = 0; else if (sbtot > 0xff) sbtot = 0xff;
+
+	    buffer[i++] = ((satot << 24) |
+			   (srtot << 16) |
+			   (sgtot <<  8) |
+			   (sbtot       ));
+	}
+    }
+    
+    if (tmp_pixels != tmp_pixels_stack)
+	free (tmp_pixels);
+}
 
-    ax = x;
-    ay = y;
+static void
+bits_image_fetch_filtered (bits_image_t *pict, uint32_t *buffer, int n_pixels)
+{
+    switch (pict->common.filter)
+    {
+    case PIXMAN_FILTER_NEAREST:
+    case PIXMAN_FILTER_FAST:
+	bits_image_fetch_nearest_pixels (pict, buffer, n_pixels);
+	break;
+	
+    case PIXMAN_FILTER_BILINEAR:
+    case PIXMAN_FILTER_GOOD:
+    case PIXMAN_FILTER_BEST:
+	bits_image_fetch_bilinear_pixels (pict, buffer, n_pixels);
+	break;
+	
+    case PIXMAN_FILTER_CONVOLUTION:
+	bits_image_fetch_convolution_pixels (pict, buffer, n_pixels);
+	break;
+    }
+}
 
-    bits = image->bits;
-    stride = image->rowstride;
+static void
+bits_image_fetch_transformed (bits_image_t * pict, int x, int y, int width,
+			      uint32_t *buffer, uint32_t *mask, uint32_t maskBits)
+{
+    uint32_t     *bits;
+    int32_t    stride;
+    pixman_vector_t v;
+    pixman_vector_t unit;
+    pixman_bool_t affine = TRUE;
+    uint32_t tmp_buffer[2 * N_TMP_PIXELS];
+    int32_t *coords;
+    int i;
+
+    bits = pict->bits;
+    stride = pict->rowstride;
+
+    /* reference point is the center of the pixel */
+    v.vector[0] = pixman_int_to_fixed(x) + pixman_fixed_1 / 2;
+    v.vector[1] = pixman_int_to_fixed(y) + pixman_fixed_1 / 2;
+    v.vector[2] = pixman_fixed_1;
+
+    /* when using convolution filters or PIXMAN_REPEAT_PAD one
+     * might get here without a transform */
+    if (pict->common.transform)
+    {
+        if (!pixman_transform_point_3d (pict->common.transform, &v))
+            return;
+	
+        unit.vector[0] = pict->common.transform->matrix[0][0];
+        unit.vector[1] = pict->common.transform->matrix[1][0];
+        unit.vector[2] = pict->common.transform->matrix[2][0];
+
+        affine = (v.vector[2] == pixman_fixed_1 && unit.vector[2] == 0);
+    }
+    else
+    {
+        unit.vector[0] = pixman_fixed_1;
+        unit.vector[1] = 0;
+        unit.vector[2] = 0;
+    }
+
+    i = 0;
+    while (i < width)
+    {
+	int n_pixels = MIN (N_TMP_PIXELS, width - i);
+	int j;
+	
+	coords = (int32_t *)tmp_buffer;
+
+	for (j = 0; j < n_pixels; ++j)
+	{
+	    if (affine)
+	    {
+		coords[0] = v.vector[0];
+		coords[1] = v.vector[1];
+	    }
+	    else
+	    {
+		pixman_fixed_48_16_t div;
+		
+		div = ((pixman_fixed_48_16_t)v.vector[0] << 16) / v.vector[2];
+
+		if ((div >> 16) > 0x7fff)
+		    coords[0] = 0x7fffffff; 
+		else if ((div >> 16) < -0x8000)
+		    coords[0] = 0x80000000;
+		else
+		    coords[0] = div;
+		
+		div = ((pixman_fixed_48_16_t)v.vector[1] << 16) / v.vector[2];
+
+		if ((div >> 16) > 0x7fff)
+		    coords[1] = 0x7fffffff;
+		else if ((div >> 16) < -0x8000)
+		    coords[1] = 0x80000000;
+		else
+		    coords[1] = div;
+
+		v.vector[2] += unit.vector[2];
+	    }
+
+	    coords += 2;
+
+	    v.vector[0] += unit.vector[0];
+	    v.vector[1] += unit.vector[1];
+	}
+
+	bits_image_fetch_filtered (pict, tmp_buffer, n_pixels);
+	
+	for (j = 0; j < n_pixels; ++j)
+	    buffer[i++] = tmp_buffer[j];
+    }
+}
+
+static void
+bits_image_fetch_solid_32 (bits_image_t * image,
+			   int x, int y, int width,
+			   uint32_t *buffer,
+			   uint32_t *mask, uint32_t maskBits)
+{
+    uint32_t color[2];
+    uint32_t *end;
+
+    color[0] = 0;
+    color[1] = 0;
+    
+    image->fetch_pixels_32 (image, color, 1);
+    
+    end = buffer + width;
+    while (buffer < end)
+	*(buffer++) = color[0];
+}
 
-    alpha_bits = image->common.alpha_map->bits;
-    astride = image->common.alpha_map->rowstride;
+static void
+bits_image_fetch_solid_64 (bits_image_t * image,
+			   int x, int y, int width,
+			   uint64_t *buffer, void *unused, uint32_t unused2)
+{
+    uint32_t color[2];
+    uint64_t *end;
+    uint32_t *coords = (uint32_t *)color;
 
-    bits       += y*stride;
-    alpha_bits += (ay - image->common.alpha_origin.y)*astride;
+    coords[0] = 0;
+    coords[1] = 0;
+    
+    image->fetch_pixels_64 (image, (uint64_t *)color, 1);
+    
+    end = buffer + width;
+    while (buffer < end)
+	*(buffer++) = *(uint64_t *)color;
+}
 
+static void
+bits_image_fetch_untransformed_32 (bits_image_t * image,
+				   int x, int y, int width,
+				   uint32_t *buffer, uint32_t *mask, uint32_t maskBits)
+{
+    image->fetch_scanline_raw_32 (image, x, y, width, buffer);
+}
 
-    store((pixman_image_t *)image, bits, buffer, x, width, indexed);
-    astore((pixman_image_t *)image->common.alpha_map,
-	   alpha_bits, buffer, ax - image->common.alpha_origin.x, width, aindexed);
+static void
+bits_image_fetch_untransformed_64 (bits_image_t * image,
+				   int x, int y, int width,
+				   uint64_t *buffer, void *unused, uint32_t unused2)
+{
+    image->fetch_scanline_raw_64 (image, x, y, width, buffer);
 }
 
 static void
@@ -196,55 +683,46 @@ bits_image_property_changed (pixman_image_t *image)
 	image->common.get_scanline_64 =
 	    (scanFetchProc)_pixman_image_get_scanline_64_generic;
 	image->common.get_scanline_32 =
-	    (scanFetchProc)READ_ACCESS(fbFetchExternalAlpha);
+	    (scanFetchProc)bits_image_fetch_transformed;
     }
     else if ((bits->common.repeat != PIXMAN_REPEAT_NONE) &&
 	    bits->width == 1 &&
 	    bits->height == 1)
     {
-	image->common.get_scanline_64 = (scanFetchProc)fbFetchSolid64;
-	image->common.get_scanline_32 = (scanFetchProc)fbFetchSolid;
+	image->common.get_scanline_64 = (scanFetchProc)bits_image_fetch_solid_64;
+	image->common.get_scanline_32 = (scanFetchProc)bits_image_fetch_solid_32;
     }
     else if (!bits->common.transform &&
 	     bits->common.filter != PIXMAN_FILTER_CONVOLUTION &&
 	     bits->common.repeat != PIXMAN_REPEAT_PAD &&
 	     bits->common.repeat != PIXMAN_REPEAT_REFLECT)
     {
-	image->common.get_scanline_64 = (scanFetchProc)fbFetch64;
-	image->common.get_scanline_32 = (scanFetchProc)fbFetch;
+	image->common.get_scanline_64 = (scanFetchProc)bits_image_fetch_untransformed_64;
+	image->common.get_scanline_32 = (scanFetchProc)bits_image_fetch_untransformed_32;
     }
     else
     {
 	image->common.get_scanline_64 =
 	    (scanFetchProc)_pixman_image_get_scanline_64_generic;
 	image->common.get_scanline_32 =
-	    (scanFetchProc)READ_ACCESS(fbFetchTransformed);
+	    (scanFetchProc)bits_image_fetch_transformed;
     }
     
-    if (bits->common.alpha_map)
-    {
-	bits->store_scanline_64 = (scanStoreProc)fbStoreExternalAlpha64;
-	bits->store_scanline_32 = fbStoreExternalAlpha;
-    }
-    else
-    {
-	bits->store_scanline_64 = (scanStoreProc)fbStore64;
-	bits->store_scanline_32 = fbStore;
-    }
-}
+    bits->fetch_scanline_raw_32 =
+	READ_ACCESS(pixman_fetchProcForPicture32)(bits);
+    bits->fetch_scanline_raw_64 =
+	READ_ACCESS(pixman_fetchProcForPicture64)(bits);
+    
+    bits->fetch_pixels_32 = READ_ACCESS(pixman_fetchPixelProcForPicture32)(bits);
+    bits->fetch_pixels_64 = READ_ACCESS(pixman_fetchPixelProcForPicture64)(bits);
 
-void
-_pixman_image_store_scanline_32 (bits_image_t *image, int x, int y, int width,
-				 uint32_t *buffer)
-{
-    image->store_scanline_32 (image, x, y, width, buffer);
-}
+    bits->store_scanline_64 = bits_image_store_scanline_64;
+    bits->store_scanline_32 = bits_image_store_scanline_32;
 
-void
-_pixman_image_store_scanline_64 (bits_image_t *image, int x, int y, int width,
-				 uint32_t *buffer)
-{
-    image->store_scanline_64 (image, x, y, width, buffer);
+    bits->store_scanline_raw_32 =
+	WRITE_ACCESS(pixman_storeProcForPicture32)(bits);
+    bits->store_scanline_raw_64 =
+	WRITE_ACCESS(pixman_storeProcForPicture64)(bits);
 }
 
 static uint32_t *
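Editorial aside (not part of the patch): the fixed-point arithmetic used by
bits_image_fetch_nearest_pixels() and bits_image_fetch_bilinear_pixels() in
the pixman-bits-image.c changes above can be tried in isolation. The sketch
below is only an approximation under stated assumptions: coordinates are
16.16 fixed point as in pixman_fixed_t, and right-shifting a negative value
is arithmetic (true for gcc, but implementation-defined in C). The names
fixed_16_16, FIXED_1, FIXED_E, nearest_index, bilinear_split and
blend_channel are made up for illustration and do not exist in pixman.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for pixman_fixed_t, pixman_fixed_1 and pixman_fixed_e. */
typedef int32_t fixed_16_16;
#define FIXED_1 ((fixed_16_16) 0x10000)
#define FIXED_E ((fixed_16_16) 1)

/* Nearest filter: subtract one epsilon before truncating, so a sample
 * that lies exactly on a pixel boundary selects the lower pixel.
 */
static int
nearest_index (fixed_16_16 coord)
{
    return (coord - FIXED_E) >> 16;
}

/* Bilinear filter: shift by half a pixel so that pixel centres fall on
 * integers, then split into an integer index and an 8-bit weight.
 */
static void
bilinear_split (fixed_16_16 coord, int *index, int *dist)
{
    fixed_16_16 c = coord - FIXED_1 / 2;

    *dist = (c >> 8) & 0xff;
    *index = c >> 16;
}

/* Blend one 8-bit channel of the four neighbouring pixels. The weights
 * in each direction sum to 256, hence the final shift by 16.
 */
static uint32_t
blend_channel (uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
               int distx, int disty)
{
    uint32_t idistx, idisty, ft, fb;

    assert (distx >= 0 && distx < 256 && disty >= 0 && disty < 256);

    idistx = 256 - distx;
    idisty = 256 - disty;
    ft = tl * idistx + tr * distx;
    fb = bl * idistx + br * distx;

    return (ft * idisty + fb * disty) >> 16;
}
```

With these definitions, nearest_index (FIXED_1) selects pixel 0 and
nearest_index (FIXED_1 + FIXED_E) selects pixel 1, while bilinear_split()
applied to FIXED_1 yields index 0 with weight 128, i.e. halfway between
pixels 0 and 1.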
diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
index 9e770f6..78e0e96 100644
--- a/pixman/pixman-private.h
+++ b/pixman/pixman-private.h
@@ -184,6 +184,9 @@ typedef struct _FbComposeData {
     uint16_t	 height;
 } FbComposeData;
 
+typedef void (* fetch_pixels_32_t) (bits_image_t *image, uint32_t *buffer, int n_pixels);
+typedef void (* fetch_pixels_64_t) (bits_image_t *image, uint64_t *buffer, int n_pixels);
+
 void pixman_composite_rect_general_accessors (const FbComposeData *data,
                                               void *src_buffer,
                                               void *mask_buffer,
@@ -192,17 +195,17 @@ void pixman_composite_rect_general_accessors (const FbComposeData *data,
 void pixman_composite_rect_general (const FbComposeData *data);
 
 fetchProc32 pixman_fetchProcForPicture32 (bits_image_t *);
-fetchPixelProc32 pixman_fetchPixelProcForPicture32 (bits_image_t *);
+fetch_pixels_32_t pixman_fetchPixelProcForPicture32 (bits_image_t *);
 storeProc32 pixman_storeProcForPicture32 (bits_image_t *);
 fetchProc32 pixman_fetchProcForPicture32_accessors (bits_image_t *);
-fetchPixelProc32 pixman_fetchPixelProcForPicture32_accessors (bits_image_t *);
+fetch_pixels_32_t pixman_fetchPixelProcForPicture32_accessors (bits_image_t *);
 storeProc32 pixman_storeProcForPicture32_accessors (bits_image_t *);
 
 fetchProc64 pixman_fetchProcForPicture64 (bits_image_t *);
-fetchPixelProc64 pixman_fetchPixelProcForPicture64 (bits_image_t *);
+fetch_pixels_64_t pixman_fetchPixelProcForPicture64 (bits_image_t *);
 storeProc64 pixman_storeProcForPicture64 (bits_image_t *);
 fetchProc64 pixman_fetchProcForPicture64_accessors (bits_image_t *);
-fetchPixelProc64 pixman_fetchPixelProcForPicture64_accessors (bits_image_t *);
+fetch_pixels_64_t pixman_fetchPixelProcForPicture64_accessors (bits_image_t *);
 storeProc64 pixman_storeProcForPicture64_accessors (bits_image_t *);
 
 void pixman_expand(uint64_t *dst, const uint32_t *src, pixman_format_code_t, int width);
@@ -212,19 +215,6 @@ void pixmanFetchGradient (gradient_t *, int x, int y, int width,
                            uint32_t *buffer, uint32_t *mask, uint32_t maskBits);
 void _pixman_image_get_scanline_64_generic (pixman_image_t * pict, int x, int y, int width,
 					    uint64_t *buffer, uint64_t *mask, uint32_t maskBits);
-void fbFetchTransformed(bits_image_t *, int x, int y, int width,
-                        uint32_t *buffer, uint32_t *mask, uint32_t maskBits);
-void fbFetchExternalAlpha(bits_image_t *, int x, int y, int width,
-                          uint32_t *buffer, uint32_t *mask, uint32_t maskBits);
-
-void fbFetchTransformed_accessors(bits_image_t *, int x, int y, int width,
-                                  uint32_t *buffer, uint32_t *mask,
-                                  uint32_t maskBits);
-void fbStoreExternalAlpha_accessors(bits_image_t *, int x, int y, int width,
-                                    uint32_t *buffer);
-void fbFetchExternalAlpha_accessors(bits_image_t *, int x, int y, int width,
-                                    uint32_t *buffer, uint32_t *mask,
-                                    uint32_t maskBits);
 
 /* end */
 
@@ -270,6 +260,9 @@ _pixman_image_get_scanline_64 (pixman_image_t *image, int x, int y, int width,
 void
 _pixman_image_store_scanline_32 (bits_image_t *image, int x, int y, int width,
 				 uint32_t *buffer);
+void
+_pixman_image_fetch_pixels (bits_image_t *image, uint32_t *buffer, int n_pixels);
+
 /* Even thought the type of buffer is uint32_t *, the function actually expects
  * a uint64_t *buffer.
  */
@@ -389,8 +382,17 @@ struct bits_image
     uint32_t *			free_me;
     int				rowstride; /* in number of uint32_t's */
 
+    fetch_pixels_32_t		fetch_pixels_32;
+    fetch_pixels_64_t		fetch_pixels_64;
+
     scanStoreProc		store_scanline_32;
     scanStoreProc		store_scanline_64;
+
+    storeProc32			store_scanline_raw_32;
+    storeProc64			store_scanline_raw_64;
+
+    fetchProc32			fetch_scanline_raw_32;
+    fetchProc64			fetch_scanline_raw_64;
 };
 
 union pixman_image
@@ -552,11 +554,15 @@ _pixman_gradient_walker_pixel (GradientWalker       *walker,
 
 #define MOD(a,b) ((a) < 0 ? ((b) - ((-(a) - 1) % (b))) - 1 : (a) % (b))
 
+/* Divides two fixed-point numbers and returns an integer */
 #define DIV(a,b) ((((a) < 0) == ((b) < 0)) ? (a) / (b) :		\
 		  ((a) - (b) + 1 - (((b) < 0) << 1)) / (b))
 
 #define CLIP(a,b,c) ((a) < (b) ? (b) : ((a) > (c) ? (c) : (a)))
 
+#define MIN(a,b) (((a) < (b)) ? (a) : (b))
+#define MAX(a,b) (((a) > (b)) ? (a) : (b))
+
 #if 0
 /* FIXME: the MOD macro above is equivalent, but faster I think */
 #define mod(a,b) ((b) == 1 ? 0 : (a) >= 0 ? (a) % (b) : (b) - (-a) % (b))
@@ -567,40 +573,6 @@ _pixman_gradient_walker_pixel (GradientWalker       *walker,
  * where Fetch4 doesn't have a READ
  */
 
-#if 0
-/* Framebuffer access support macros */
-#define ACCESS_MEM(code)						\
-    do {								\
-	const image_common_t *const com__ =				\
-	    (image_common_t *)image;					\
-									\
-	if (!com__->read_func && !com__->write_func)			\
-	{								\
-	    const int do_access__ = 0;					\
-	    const pixman_read_memory_func_t read_func__ = NULL;		\
-	    const pixman_write_memory_func_t write_func__ = NULL;	\
-	    (void)read_func__;						\
-	    (void)write_func__;						\
-	    (void)do_access__;						\
-									\
-	    {code}							\
-	}								\
-	else								\
-	{								\
-	    const int do_access__ = 1;					\
-	    const pixman_read_memory_func_t read_func__ =		\
-		com__->read_func;					\
-	    const pixman_write_memory_func_t write_func__ =		\
-		com__->write_func;					\
-	    (void)read_func__;						\
-	    (void)write_func__;						\
-	    (void)do_access__;						\
-	    								\
-	    {code}							\
-	}								\
-    } while (0)
-#endif
-
 #ifdef PIXMAN_FB_ACCESSORS
 
 #define ACCESS(sym) sym##_accessors
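Editorial aside (not part of the patch): the new fetchers pass every
coordinate through repeat(), whose body is not quoted in this part of the
patch. The code being deleted from pixman-transformed.c spells the same
idea out with the MOD and CLIP macros that remain in pixman-private.h. The
sketch below reproduces that per-coordinate logic for one axis. The helper
wrap_coord and the wrap_mode enum are hypothetical stand-ins, not pixman's
repeat() or pixman_repeat_t; only the two macros are copied from the patch
context.

```c
#include <assert.h>

/* Copied from pixman-private.h as quoted in the patch context. */
#define MOD(a,b) ((a) < 0 ? ((b) - ((-(a) - 1) % (b))) - 1 : (a) % (b))
#define CLIP(a,b,c) ((a) < (b) ? (b) : ((a) > (c) ? (c) : (a)))

/* Hypothetical stand-in for pixman_repeat_t. */
typedef enum { WRAP_NONE, WRAP_NORMAL, WRAP_PAD, WRAP_REFLECT } wrap_mode;

/* Map coordinate c into [0, size) according to the repeat mode.
 * WRAP_NONE leaves the coordinate alone; the caller is then expected
 * to treat out-of-bounds coordinates as transparent.
 */
static int
wrap_coord (wrap_mode mode, int size, int c)
{
    assert (size > 0);

    switch (mode)
    {
    case WRAP_NORMAL:
        /* Tile: MOD stays non-negative even for negative c. */
        return MOD (c, size);

    case WRAP_PAD:
        /* Clamp to the nearest edge pixel. */
        return CLIP (c, 0, size - 1);

    case WRAP_REFLECT:
        /* Tile with period 2 * size, mirroring the second half. */
        c = MOD (c, size * 2);
        return c >= size ? size * 2 - c - 1 : c;

    case WRAP_NONE:
    default:
        return c;
    }
}
```

For a 4-pixel-wide image, coordinate -1 maps to 3 under WRAP_NORMAL, to 0
under WRAP_PAD and to 0 under WRAP_REFLECT.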
diff --git a/pixman/pixman-transformed-accessors.c b/pixman/pixman-transformed-accessors.c
deleted file mode 100644
index 442ca24..0000000
--- a/pixman/pixman-transformed-accessors.c
+++ /dev/null
@@ -1,3 +0,0 @@
-#define PIXMAN_FB_ACCESSORS
-
-#include "pixman-transformed.c"
diff --git a/pixman/pixman-transformed.c b/pixman/pixman-transformed.c
deleted file mode 100644
index d721b35..0000000
--- a/pixman/pixman-transformed.c
+++ /dev/null
@@ -1,510 +0,0 @@
-/*
- *
- * Copyright © 2000 Keith Packard, member of The XFree86 Project, Inc.
- *             2005 Lars Knoll & Zack Rusin, Trolltech
- *             2008 Aaron Plattner, NVIDIA Corporation
- *
- * Permission to use, copy, modify, distribute, and sell this software and its
- * documentation for any purpose is hereby granted without fee, provided that
- * the above copyright notice appear in all copies and that both that
- * copyright notice and this permission notice appear in supporting
- * documentation, and that the name of Keith Packard not be used in
- * advertising or publicity pertaining to distribution of the software without
- * specific, written prior permission.  Keith Packard makes no
- * representations about the suitability of this software for any purpose.  It
- * is provided "as is" without express or implied warranty.
- *
- * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
- * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
- * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
- * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
- * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
- * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
- * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
- * SOFTWARE.
- */
-
-#ifdef HAVE_CONFIG_H
-#include <config.h>
-#endif
-
-#include <stdlib.h>
-
-#include "pixman-private.h"
-
-#define Alpha(x) ((x) >> 24)
-#define Red(x) (((x) >> 16) & 0xff)
-#define Green(x) (((x) >> 8) & 0xff)
-#define Blue(x) ((x) & 0xff)
-
-#define Alpha64(x) ((x) >> 48)
-#define Red64(x) (((x) >> 32) & 0xffff)
-#define Green64(x) (((x) >> 16) & 0xffff)
-#define Blue64(x) ((x) & 0xffff)
-
-/*
- * Fetch from region strategies
- */
-typedef FASTCALL uint32_t (*fetchFromRegionProc)(bits_image_t *pict, int x, int y, uint32_t *buffer, fetchPixelProc32 fetch, pixman_box32_t *box);
-
-/*
- * There are two properties we can make use of when fetching pixels
- *
- * (a) Is the source clip just the image itself?
- *
- * (b) Do we know the coordinates of the pixel to fetch are
- *     within the image boundaries;
- *
- * Source clips are almost never used, so the important case to optimize
- * for is when src_clip is false. Since inside_bounds is statically known,
- * the last part of the if statement will normally be optimized away.
- */
-static force_inline uint32_t
-do_fetch (bits_image_t *pict, int x, int y, fetchPixelProc32 fetch,
-	  pixman_bool_t src_clip,
-	  pixman_bool_t inside_bounds)
-{
-    if (src_clip)
-    {
-	if (pixman_region32_contains_point (pict->common.src_clip, x, y,NULL))
-	    return fetch (pict, x, y);
-	else
-	    return 0;
-    }
-    else if (inside_bounds)
-    {
-	return fetch (pict, x, y);
-    }
-    else
-    {
-	if (x >= 0 && x < pict->width && y >= 0 && y < pict->height)
-	    return fetch (pict, x, y);
-	else
-	    return 0;
-    }
-}
-
-/*
- * Fetching Algorithms
- */
-static inline uint32_t
-fetch_nearest (bits_image_t		*pict,
-	       fetchPixelProc32		 fetch,
-	       pixman_bool_t		 affine,
-	       pixman_repeat_t		 repeat,
-	       pixman_bool_t             has_src_clip,
-	       const pixman_vector_t    *v)
-{
-    if (!v->vector[2])
-    {
-	return 0;
-    }
-    else
-    {
-	int x, y;
-	pixman_bool_t inside_bounds;
-
-	if (!affine)
-	{
-	    x = DIV(v->vector[0], v->vector[2]);
-	    y = DIV(v->vector[1], v->vector[2]);
-	}
-	else
-	{
-	    x = v->vector[0]>>16;
-	    y = v->vector[1]>>16;
-	}
-
-	switch (repeat)
-	{
-	case PIXMAN_REPEAT_NORMAL:
-	    x = MOD (x, pict->width);
-	    y = MOD (y, pict->height);
-	    inside_bounds = TRUE;
-	    break;
-	    
-	case PIXMAN_REPEAT_PAD:
-	    x = CLIP (x, 0, pict->width-1);
-	    y = CLIP (y, 0, pict->height-1);
-	    inside_bounds = TRUE;
-	    break;
-	    
-	case PIXMAN_REPEAT_REFLECT:
-	    x = MOD (x, pict->width * 2);
-	    if (x >= pict->width)
-		x = pict->width * 2 - x - 1;
-	    y = MOD (y, pict->height * 2);
-	    if (y >= pict->height)
-		y = pict->height * 2 - y - 1;
-	    inside_bounds = TRUE;
-	    break;
-
-	case PIXMAN_REPEAT_NONE:
-	    inside_bounds = FALSE;
-	    break;
-
-	default:
-	    return 0;
-	}
-
-	return do_fetch (pict, x, y, fetch, has_src_clip, inside_bounds);
-    }
-}
-
-static inline uint32_t
-fetch_bilinear (bits_image_t		*pict,
-		fetchPixelProc32	 fetch,
-		pixman_bool_t		 affine,
-		pixman_repeat_t		 repeat,
-		pixman_bool_t		 has_src_clip,
-		const pixman_vector_t   *v)
-{
-    if (!v->vector[2])
-    {
-	return 0;
-    }
-    else
-    {
-	int x1, x2, y1, y2, distx, idistx, disty, idisty;
-	uint32_t tl, tr, bl, br, r;
-	uint32_t ft, fb;
-	pixman_bool_t inside_bounds;
-	
-	if (!affine)
-	{
-	    pixman_fixed_48_16_t div;
-	    div = ((pixman_fixed_48_16_t)v->vector[0] << 16)/v->vector[2];
-	    x1 = div >> 16;
-	    distx = ((pixman_fixed_t)div >> 8) & 0xff;
-	    div = ((pixman_fixed_48_16_t)v->vector[1] << 16)/v->vector[2];
-	    y1 = div >> 16;
-	    disty = ((pixman_fixed_t)div >> 8) & 0xff;
-	}
-	else
-	{
-	    x1 = v->vector[0] >> 16;
-	    distx = (v->vector[0] >> 8) & 0xff;
-	    y1 = v->vector[1] >> 16;
-	    disty = (v->vector[1] >> 8) & 0xff;
-	}
-	x2 = x1 + 1;
-	y2 = y1 + 1;
-	
-	idistx = 256 - distx;
-	idisty = 256 - disty;
-
-	switch (repeat)
-	{
-	case PIXMAN_REPEAT_NORMAL:
-	    x1 = MOD (x1, pict->width);
-	    x2 = MOD (x2, pict->width);
-	    y1 = MOD (y1, pict->height);
-	    y2 = MOD (y2, pict->height);
-	    inside_bounds = TRUE;
-	    break;
-	    
-	case PIXMAN_REPEAT_PAD:
-	    x1 = CLIP (x1, 0, pict->width-1);
-	    x2 = CLIP (x2, 0, pict->width-1);
-	    y1 = CLIP (y1, 0, pict->height-1);
-	    y2 = CLIP (y2, 0, pict->height-1);
-	    inside_bounds = TRUE;
-	    break;
-	    
-	case PIXMAN_REPEAT_REFLECT:
-	    x1 = MOD (x1, pict->width * 2);
-	    if (x1 >= pict->width)
-		x1 = pict->width * 2 - x1 - 1;
-	    x2 = MOD (x2, pict->width * 2);
-	    if (x2 >= pict->width)
-		x2 = pict->width * 2 - x2 - 1;
-	    y1 = MOD (y1, pict->height * 2);
-	    if (y1 >= pict->height)
-		y1 = pict->height * 2 - y1 - 1;
-	    y2 = MOD (y2, pict->height * 2);
-	    if (y2 >= pict->height)
-		y2 = pict->height * 2 - y2 - 1;
-	    inside_bounds = TRUE;
-	    break;
-
-	case PIXMAN_REPEAT_NONE:
-	    inside_bounds = FALSE;
-	    break;
-
-	default:
-	    return 0;
-	}
-	
-	tl = do_fetch(pict, x1, y1, fetch, has_src_clip, inside_bounds);
-	tr = do_fetch(pict, x2, y1, fetch, has_src_clip, inside_bounds);
-	bl = do_fetch(pict, x1, y2, fetch, has_src_clip, inside_bounds);
-	br = do_fetch(pict, x2, y2, fetch, has_src_clip, inside_bounds);
-	
-	ft = FbGet8(tl,0) * idistx + FbGet8(tr,0) * distx;
-	fb = FbGet8(bl,0) * idistx + FbGet8(br,0) * distx;
-	r = (((ft * idisty + fb * disty) >> 16) & 0xff);
-	ft = FbGet8(tl,8) * idistx + FbGet8(tr,8) * distx;
-	fb = FbGet8(bl,8) * idistx + FbGet8(br,8) * distx;
-	r |= (((ft * idisty + fb * disty) >> 8) & 0xff00);
-	ft = FbGet8(tl,16) * idistx + FbGet8(tr,16) * distx;
-	fb = FbGet8(bl,16) * idistx + FbGet8(br,16) * distx;
-	r |= (((ft * idisty + fb * disty)) & 0xff0000);
-	ft = FbGet8(tl,24) * idistx + FbGet8(tr,24) * distx;
-	fb = FbGet8(bl,24) * idistx + FbGet8(br,24) * distx;
-	r |= (((ft * idisty + fb * disty) << 8) & 0xff000000);
-
-	return r;
-    }
-}
-
-static void
-fbFetchTransformed_Convolution(bits_image_t * pict, int width, uint32_t *buffer, uint32_t *mask, uint32_t maskBits,
-			       pixman_bool_t affine, pixman_vector_t v, pixman_vector_t unit)
-{
-    fetchPixelProc32 fetch;
-    int i;
-
-    pixman_fixed_t *params = pict->common.filter_params;
-    int32_t cwidth = pixman_fixed_to_int(params[0]);
-    int32_t cheight = pixman_fixed_to_int(params[1]);
-    int xoff = (params[0] - pixman_fixed_1) >> 1;
-    int yoff = (params[1] - pixman_fixed_1) >> 1;
-    fetch = ACCESS(pixman_fetchPixelProcForPicture32)(pict);
-
-    params += 2;
-    for (i = 0; i < width; ++i) {
-        if (!mask || mask[i] & maskBits)
-        {
-            if (!v.vector[2]) {
-                *(buffer + i) = 0;
-            } else {
-                int x1, x2, y1, y2, x, y;
-                int32_t srtot, sgtot, sbtot, satot;
-                pixman_fixed_t *p = params;
-
-                if (!affine) {
-                    pixman_fixed_48_16_t tmp;
-                    tmp = ((pixman_fixed_48_16_t)v.vector[0] << 16)/v.vector[2] - xoff;
-                    x1 = pixman_fixed_to_int(tmp);
-                    tmp = ((pixman_fixed_48_16_t)v.vector[1] << 16)/v.vector[2] - yoff;
-                    y1 = pixman_fixed_to_int(tmp);
-                } else {
-                    x1 = pixman_fixed_to_int(v.vector[0] - xoff);
-                    y1 = pixman_fixed_to_int(v.vector[1] - yoff);
-                }
-                x2 = x1 + cwidth;
-                y2 = y1 + cheight;
-
-                srtot = sgtot = sbtot = satot = 0;
-
-                for (y = y1; y < y2; y++) {
-                    int ty;
-                    switch (pict->common.repeat) {
-                        case PIXMAN_REPEAT_NORMAL:
-                            ty = MOD (y, pict->height);
-                            break;
-                        case PIXMAN_REPEAT_PAD:
-                            ty = CLIP (y, 0, pict->height-1);
-                            break;
-			case PIXMAN_REPEAT_REFLECT:
-			    ty = MOD (y, pict->height * 2);
-			    if (ty >= pict->height)
-				ty = pict->height * 2 - ty - 1;
-			    break;
-                        default:
-                            ty = y;
-                    }
-                    for (x = x1; x < x2; x++) {
-                        if (*p) {
-                            int tx;
-                            switch (pict->common.repeat) {
-                                case PIXMAN_REPEAT_NORMAL:
-                                    tx = MOD (x, pict->width);
-                                    break;
-                                case PIXMAN_REPEAT_PAD:
-                                    tx = CLIP (x, 0, pict->width-1);
-                                    break;
-				case PIXMAN_REPEAT_REFLECT:
-				    tx = MOD (x, pict->width * 2);
-				    if (tx >= pict->width)
-					tx = pict->width * 2 - tx - 1;
-				    break;
-                                default:
-                                    tx = x;
-                            }
-                            if (pixman_region32_contains_point (pict->common.src_clip, tx, ty, NULL)) {
-                                uint32_t c = fetch(pict, tx, ty);
-
-                                srtot += Red(c) * *p;
-                                sgtot += Green(c) * *p;
-                                sbtot += Blue(c) * *p;
-                                satot += Alpha(c) * *p;
-                            }
-                        }
-                        p++;
-                    }
-                }
-
-                satot >>= 16;
-                srtot >>= 16;
-                sgtot >>= 16;
-                sbtot >>= 16;
-
-                if (satot < 0) satot = 0; else if (satot > 0xff) satot = 0xff;
-                if (srtot < 0) srtot = 0; else if (srtot > 0xff) srtot = 0xff;
-                if (sgtot < 0) sgtot = 0; else if (sgtot > 0xff) sgtot = 0xff;
-                if (sbtot < 0) sbtot = 0; else if (sbtot > 0xff) sbtot = 0xff;
-
-                *(buffer + i) = ((satot << 24) |
-                                 (srtot << 16) |
-                                 (sgtot <<  8) |
-                                 (sbtot       ));
-            }
-        }
-        v.vector[0] += unit.vector[0];
-        v.vector[1] += unit.vector[1];
-        v.vector[2] += unit.vector[2];
-    }
-}
-
-static void
-adjust (pixman_vector_t *v, pixman_vector_t *u, pixman_fixed_t adjustment)
-{
-    int delta_v = (adjustment * v->vector[2]) >> 16;
-    int delta_u = (adjustment * u->vector[2]) >> 16;
-    
-    v->vector[0] += delta_v;
-    v->vector[1] += delta_v;
-    
-    u->vector[0] += delta_u;
-    u->vector[1] += delta_u;
-}
-
-void
-ACCESS(fbFetchTransformed)(bits_image_t * pict, int x, int y, int width,
-                           uint32_t *buffer, uint32_t *mask, uint32_t maskBits)
-{
-    uint32_t     *bits;
-    int32_t    stride;
-    pixman_vector_t v;
-    pixman_vector_t unit;
-    pixman_bool_t affine = TRUE;
-
-    bits = pict->bits;
-    stride = pict->rowstride;
-
-    /* reference point is the center of the pixel */
-    v.vector[0] = pixman_int_to_fixed(x) + pixman_fixed_1 / 2;
-    v.vector[1] = pixman_int_to_fixed(y) + pixman_fixed_1 / 2;
-    v.vector[2] = pixman_fixed_1;
-
-    /* when using convolution filters or PIXMAN_REPEAT_PAD one might get here without a transform */
-    if (pict->common.transform)
-    {
-        if (!pixman_transform_point_3d (pict->common.transform, &v))
-            return;
-        unit.vector[0] = pict->common.transform->matrix[0][0];
-        unit.vector[1] = pict->common.transform->matrix[1][0];
-        unit.vector[2] = pict->common.transform->matrix[2][0];
-
-        affine = (v.vector[2] == pixman_fixed_1 && unit.vector[2] == 0);
-    }
-    else
-    {
-        unit.vector[0] = pixman_fixed_1;
-        unit.vector[1] = 0;
-        unit.vector[2] = 0;
-    }
-
-    if (pict->common.filter == PIXMAN_FILTER_NEAREST || pict->common.filter == PIXMAN_FILTER_FAST)
-    {
-	fetchPixelProc32   fetch;
-	pixman_bool_t src_clip;
-	int i;
-
-	/* Round down to closest integer, ensuring that 0.5 rounds to 0, not 1 */
-	adjust (&v, &unit, - pixman_fixed_e);
-
-	fetch = ACCESS(pixman_fetchPixelProcForPicture32)(pict);
-	
-	src_clip = pict->common.src_clip != &(pict->common.full_region);
-	
-	for ( i = 0; i < width; ++i)
-	{
-	    if (!mask || mask[i] & maskBits)
-		*(buffer + i) = fetch_nearest (pict, fetch, affine, pict->common.repeat, src_clip, &v);
-	    
-	    v.vector[0] += unit.vector[0];
-	    v.vector[1] += unit.vector[1];
-	    v.vector[2] += unit.vector[2];
-	}
-    }
-    else if (pict->common.filter == PIXMAN_FILTER_BILINEAR	||
-	       pict->common.filter == PIXMAN_FILTER_GOOD	||
-	       pict->common.filter == PIXMAN_FILTER_BEST)
-    {
-	pixman_bool_t src_clip;
-	fetchPixelProc32   fetch;
-	int i;
-
-	/* Let the bilinear code pretend that pixels fall on integer coordinaters */
-	adjust (&v, &unit, -(pixman_fixed_1 / 2));
-
-	fetch = ACCESS(pixman_fetchPixelProcForPicture32)(pict);
-	src_clip = pict->common.src_clip != &(pict->common.full_region);
-	
-	for (i = 0; i < width; ++i)
-	{
-	    if (!mask || mask[i] & maskBits)
-		*(buffer + i) = fetch_bilinear (pict, fetch, affine, pict->common.repeat, src_clip, &v);
-	    
-	    v.vector[0] += unit.vector[0];
-	    v.vector[1] += unit.vector[1];
-	    v.vector[2] += unit.vector[2];
-	}
-    }
-    else if (pict->common.filter == PIXMAN_FILTER_CONVOLUTION)
-    {
-	/* Round to closest integer, ensuring that 0.5 rounds to 0, not 1 */
-	adjust (&v, &unit, - pixman_fixed_e);
-	
-        fbFetchTransformed_Convolution(pict, width, buffer, mask, maskBits, affine, v, unit);
-    }
-}
-
-#define SCANLINE_BUFFER_LENGTH 2048
-
-void
-ACCESS(fbFetchExternalAlpha)(bits_image_t * pict, int x, int y, int width,
-                             uint32_t *buffer, uint32_t *mask,
-                             uint32_t maskBits)
-{
-    int i;
-    uint32_t _alpha_buffer[SCANLINE_BUFFER_LENGTH];
-    uint32_t *alpha_buffer = _alpha_buffer;
-
-    if (!pict->common.alpha_map) {
-        ACCESS(fbFetchTransformed) (pict, x, y, width, buffer, mask, maskBits);
-	return;
-    }
-    if (width > SCANLINE_BUFFER_LENGTH)
-        alpha_buffer = (uint32_t *) pixman_malloc_ab (width, sizeof(uint32_t));
-
-    ACCESS(fbFetchTransformed)(pict, x, y, width, buffer, mask, maskBits);
-    ACCESS(fbFetchTransformed)((bits_image_t *)pict->common.alpha_map, x - pict->common.alpha_origin.x,
-                               y - pict->common.alpha_origin.y, width,
-                               alpha_buffer, mask, maskBits);
-    for (i = 0; i < width; ++i) {
-        if (!mask || mask[i] & maskBits)
-	{
-	    int a = alpha_buffer[i]>>24;
-	    *(buffer + i) = (a << 24)
-		| (div_255(Red(*(buffer + i)) * a) << 16)
-		| (div_255(Green(*(buffer + i)) * a) << 8)
-		| (div_255(Blue(*(buffer + i)) * a));
-	}
-    }
-
-    if (alpha_buffer != _alpha_buffer)
-        free(alpha_buffer);
-}
diff --git a/pixman/pixman-utils.c b/pixman/pixman-utils.c
index ffb1444..8139947 100644
--- a/pixman/pixman-utils.c
+++ b/pixman/pixman-utils.c
@@ -274,6 +274,97 @@ pixman_version (void)
     return PIXMAN_VERSION;
 }
 
+/*
+ * Helper routine to expand a color component from 0 < n <= 8 bits to 16 bits by
+ * replication.
+ */
+static inline uint64_t
+expand16(const uint8_t val, int nbits)
+{
+    // Start out with the high bit of val in the high bit of result.
+    uint16_t result = (uint16_t)val << (16 - nbits);
+
+    if (nbits == 0)
+        return 0;
+
+    // Copy the bits in result, doubling the number of bits each time, until we
+    // fill all 16 bits.
+    while (nbits < 16) {
+        result |= result >> nbits;
+        nbits *= 2;
+    }
+
+    return result;
+}
+
+/*
+ * This function expands images from ARGB8 format to ARGB16.  To preserve
+ * precision, it needs to know the original source format.  For example, if the
+ * source was PIXMAN_x1r5g5b5 and the red component contained bits 12345, then
+ * the 8-bit expanded value is 12345123.  To correctly expand this to 16 bits,
+ * it should be 1234512345123451 and not 1234512312345123.
+ */
+void
+pixman_expand(uint64_t *dst, const uint32_t *src,
+	      pixman_format_code_t format, int width)
+{
+    /*
+     * Determine the sizes of each component and the masks and shifts required
+     * to extract them from the source pixel.
+     */
+    const int a_size = PIXMAN_FORMAT_A(format),
+              r_size = PIXMAN_FORMAT_R(format),
+              g_size = PIXMAN_FORMAT_G(format),
+              b_size = PIXMAN_FORMAT_B(format);
+    const int a_shift = 32 - a_size,
+              r_shift = 24 - r_size,
+              g_shift = 16 - g_size,
+              b_shift =  8 - b_size;
+    const uint8_t a_mask = ~(~0 << a_size),
+                  r_mask = ~(~0 << r_size),
+                  g_mask = ~(~0 << g_size),
+                  b_mask = ~(~0 << b_size);
+    int i;
+
+    /* Start at the end so that we can do the expansion in place when src == dst */
+    for (i = width - 1; i >= 0; i--)
+    {
+        const uint32_t pixel = src[i];
+        // Extract the components.
+        const uint8_t a = (pixel >> a_shift) & a_mask,
+                      r = (pixel >> r_shift) & r_mask,
+                      g = (pixel >> g_shift) & g_mask,
+                      b = (pixel >> b_shift) & b_mask;
+        const uint64_t a16 = a_size ? expand16(a, a_size) : 0xffff,
+                       r16 = expand16(r, r_size),
+                       g16 = expand16(g, g_size),
+                       b16 = expand16(b, b_size);
+
+        dst[i] = a16 << 48 | r16 << 32 | g16 << 16 | b16;
+    }
+}
+
+/*
+ * Contracting is easier than expanding.  We just need to truncate the
+ * components.
+ */
+void
+pixman_contract(uint32_t *dst, const uint64_t *src, int width)
+{
+    int i;
+
+    /* Start at the beginning so that we can do the contraction in place when
+     * src == dst */
+    for (i = 0; i < width; i++)
+    {
+        const uint8_t a = src[i] >> 56,
+                      r = src[i] >> 40,
+                      g = src[i] >> 24,
+                      b = src[i] >> 8;
+        dst[i] = a << 24 | r << 16 | g << 8 | b;
+    }
+}
+
 /**
  * pixman_version_string:
  *
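
To make the replication comment in the pixman-utils.c hunk concrete, here
is a minimal stand-alone sketch. The names replicate_to_16() and
replicate_via_8() are hypothetical and not part of pixman; the first one
only mirrors what expand16() in the patch does, and the second shows the
lossy route through 8 bits that the comment warns against. Both assume the
component has already been masked down to nbits bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of pixman), modelled on expand16() in the
 * patch: widen an nbits-wide component to 16 bits by replicating its bits. */
static uint16_t
replicate_to_16 (uint8_t val, int nbits)
{
    uint16_t result;

    assert (nbits > 0 && nbits <= 8);

    /* Place the component's high bit in the high bit of the result. */
    result = (uint16_t) ((unsigned) val << (16 - nbits));

    /* Copy the bits filled so far downwards; each pass doubles the
     * filled width until all 16 bits are covered. */
    while (nbits < 16)
    {
        result |= result >> nbits;
        nbits *= 2;
    }

    return result;
}

/* Hypothetical counter-example: expand to 8 bits first, then duplicate
 * the byte.  This loses the original bit pattern for widths that do not
 * divide 8. */
static uint16_t
replicate_via_8 (uint8_t val, int nbits)
{
    uint8_t v8;

    assert (nbits > 0 && nbits <= 8);

    v8 = (uint8_t) (replicate_to_16 (val, nbits) >> 8);
    return (uint16_t) ((v8 << 8) | v8);
}
```

For the 5-bit value 10110 (0x16), replicate_to_16 (0x16, 5) gives 0xb5ad
(1011010110101101), whereas replicate_via_8 (0x16, 5) gives 0xb5b5, which
is why pixman_expand() needs to know the original source format.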

