xf86XVCopyPacked() and friends : why so slow ?

Thu Jul 21 14:15:41 PDT 2011

On Thu, Feb 4, 2010 at 7:45 PM,  <rixed at happyleptic.org> wrote:
> When playing some video with mplayer I noticed with oprofile that
> half the time is spent in xf86XVCopyPacked() or xf86XVCopyYUV12ToPacked().
>
> Looking at the former, I wonder why a plain memcpy was not used instead
> of "manually" copying each word. glibc's memcpy is usually optimized
> for the target architecture, while there is little the compiler can do
> to optimize the given code.
> Also, for the planar to packed version, you can achieve much better
> performance using vector instructions, but it's harder to do that
> portably.
>
> So I suppose there is a good reason why these functions are so slow.
> Maybe because the video drivers are supposed to provide better ones?
> Or maybe because it's planned to use an external library like pixman
> to do this kind of job in the future?
>
>
> More to the point, what I'm trying to find out is whether I'm supposed to
> optimize my video driver to not use these functions, or whether it's OK to
> optimize them instead. Which path should I follow?

I was digging through some old patches and came across a
Loongson-optimized xf86XVCopyYUV12ToPacked function (attached). Do you
know who wrote it?

Did we ever come to some conclusion as to how this was supposed to be
handled? Would optimized implementations be acceptable to put in
hw/xfree86/common/xf86xv.c?
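
If optimized versions did go in, a plain scalar routine would be useful to
check them against. Below is a minimal sketch of one, not the code in
xf86xv.c. The name yuv12_to_packed_ref is hypothetical, and I am assuming
4:2:0 planar input, packed 4:2:2 output in Y0 U Y1 V byte order, and an
even width.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference routine (not taken from xf86xv.c).
 * Converts 4:2:0 planar YUV to packed 4:2:2, assuming the output
 * byte order is Y0 U Y1 V and that w (in pixels) is even.
 * Each chroma row is shared by two consecutive luma rows.
 */
static void
yuv12_to_packed_ref(const uint8_t *src_y, const uint8_t *src_v,
                    const uint8_t *src_u, uint8_t *dst,
                    int pitch_y, int pitch_uv, int dst_pitch,
                    int h, int w)
{
    for (int j = 0; j < h; j++) {
        const uint8_t *y = src_y + (ptrdiff_t) j * pitch_y;
        const uint8_t *u = src_u + (ptrdiff_t) (j / 2) * pitch_uv;
        const uint8_t *v = src_v + (ptrdiff_t) (j / 2) * pitch_uv;
        uint8_t *d = dst + (ptrdiff_t) j * dst_pitch;

        /* One iteration emits two pixels (four bytes). */
        for (int i = 0; i < w / 2; i++) {
            d[4 * i + 0] = y[2 * i];
            d[4 * i + 1] = u[i];
            d[4 * i + 2] = y[2 * i + 1];
            d[4 * i + 3] = v[i];
        }
    }
}
```

An architecture-specific version could then be run on the same input and
compared byte for byte against this output.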

Also, I see no reason why xf86XVCopyPacked can't be simplified by
using memcpy (or maybe memmove?). Any reason why not?
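
To make the question concrete, here is a rough sketch of what I have in
mind. It is not a tested patch. The name copy_packed_rows is hypothetical,
and I am assuming that w is a width in pixels at 2 bytes per pixel, copied
in whole 32-bit words, and that the source and destination never overlap
(if they could, memmove would be the safe choice).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical memcpy-based row copy (a sketch, not the xf86xv.c code).
 * Assumes packed 4:2:2 data at 2 bytes per pixel, copied in whole
 * 32-bit words, and non-overlapping src/dst buffers.
 */
static void
copy_packed_rows(const void *src, void *dst,
                 int src_pitch, int dst_pitch, int h, int w)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    size_t row_bytes;

    if (h <= 0 || w <= 0)
        return;

    /* Two pixels per 32-bit word, so round the width down to a pair. */
    row_bytes = (size_t) (w >> 1) * 4;

    /* Rows are contiguous in both buffers: one copy covers everything. */
    if (src_pitch == dst_pitch && (size_t) src_pitch == row_bytes) {
        memcpy(d, s, row_bytes * (size_t) h);
        return;
    }

    /* Otherwise copy row by row, skipping the padding in each pitch. */
    while (h-- > 0) {
        memcpy(d, s, row_bytes);
        s += src_pitch;
        d += dst_pitch;
    }
}
```

The single-copy branch only triggers when both pitches equal the row size;
in the general case the per-row loop is still needed because the pitches
can differ.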

Matt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xf86-video-siliconmotion-1.7.5-loongson-video-accl.patch
Type: text/x-patch
Size: 3491 bytes
Desc: not available
URL: <http://lists.x.org/archives/xorg-devel/attachments/20110721/50095333/attachment.bin>