[PATCH] fb: Fix memcpy abuse

Fri Apr 29 05:19:50 PDT 2011

On Thu, Apr 21, 2011 at 11:37 PM, Adam Jackson <ajax at redhat.com> wrote:
> The memcpy fast path implicitly assumes that the copy walks
> left-to-right.  That's not something memcpy guarantees, and newer glibc
> on some processors will indeed break that assumption.  Since we walk a
> line at a time, check the source and destination against the width of
> the blit to determine whether we can be sloppy enough to allow memcpy.
> (Having done this, we can remove the check for !reverse as well.)
>
> On an Intel Core i7-2630QM with an NVIDIA GeForce GTX 460M running in
> NoAccel, the broken code and various fixes for -copywinwin{10,100,500}
> gives (edited to fit in 80 columns):
>
> 1: Disable the fastpath entirely
> 2: Replace memcpy with memmove
> 3: This fix
> 4: The code before this fix
>
>  1            2                 3                 4           Operation
> ------   ---------------   ---------------   ---------------   ------------
> 258000   269000 (  1.04)   544000 (  2.11)   552000 (  2.14)   Copy 10x10
>  21300    23000 (  1.08)    43700 (  2.05)    47100 (  2.21)   Copy 100x100
>   960      962 (  1.00)     1990 (  2.09)     1990 (  2.07)   Copy 500x500
>
> So it's a modest performance hit, but correctness demands it, and it's
> probably worth keeping the 2x speedup from having the fast path in the
> first place.

In my opinion, still the best solution is to do this stuff in pixman,
because it has its own SIMD optimized code and can do a lot better job
than memcpy/memmove (for example, it can be improved to use MOVNTx
instructions for x86 when scrolling large areas exceeding L2/L3 cache
size, and this optimization can't be easily done by glibc if
memcpy/memmove is used separately for each individual scanline). I did
some experiments with improving scrolling performance when using
non-hardware accelerated framebuffer earlier and it showed a really
major speedup on ARM:
http://lists.x.org/archives/xorg-devel/2009-November/003536.html

I'm a bit surprised that there were no other people even trying to
push for something similar because the scrolling performance issue is
so obvious. You are more likely to see the people trying to use ARM
devices without hardware acceleration, become horrified by poor
scrolling performance, and then start searching for a way to obtain
and install the proprietary binary drivers :)

> Signed-off-by: Adam Jackson <ajax at redhat.com>
> ---
>  fb/fbblt.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fb/fbblt.c b/fb/fbblt.c
> index a040298..b955245 100644
> --- a/fb/fbblt.c
> +++ b/fb/fbblt.c
> @@ -65,6 +65,7 @@ fbBlt (FbBits   *srcLine,
>     int            n, nmiddle;
>     Bool    destInvarient;
>     int            startbyte, endbyte;
> +    int     careful;
>     FbDeclareMergeRop ();
>
>     if (bpp == 24 && !FbCheck24Pix (pm))
> @@ -74,12 +75,16 @@ fbBlt (FbBits   *srcLine,
>        return;
>     }
>
> -    if (alu == GXcopy && pm == FB_ALLONES && !reverse &&
> +    careful = !((srcLine < dstLine && srcLine + width * (bpp/8) > dstLine) ||
> +                (dstLine < srcLine && dstLine + width * (bpp/8) > srcLine)) ||
> +              (bpp % 8);
> +
> +    if (alu == GXcopy && pm == FB_ALLONES && !careful &&
>             !(srcX & 7) && !(dstX & 7) && !(width & 7)) {
>         int i;
>         CARD8 *src = (CARD8 *) srcLine;
>         CARD8 *dst = (CARD8 *) dstLine;
> -
> +
>         srcStride *= sizeof(FbBits);
>         dstStride *= sizeof(FbBits);
>         width >>= 3;

What is the story with the 'reverse' variable? Does it have no use
anymore and needs to be purged later?

-- 
Best regards,
Siarhei Siamashka