Optimization idea: soft XvPutImage
Eeri.Kask at inf.tu-dresden.de
Mon Sep 22 04:55:27 PDT 2008
>> Even if you do not want to do stretch, I believe that the X Render
>> extension would require first copying the YUV data to a drawable and
>> then doing a drawable->drawable block transfer operation to do the
>> YUV transformation. In comparison, XvPutImage is a single call that
>> takes an XImage, which can be in shared memory, and would normally
>> be in YUV, and specifies the YUV->RGB conversion and stretch in a
>> single operation.
> As other people pointed out, XRender does allow arbitrary 3x3
> transformations of source images, but you are right that the XRender
> protocol would require putting the data in a drawable first.
To recapitulate, (1) YUV->RGB conversion combined with (2)
up-/downscaling is not a mere pixel-shuffling problem.
(1) is a linear color-space conversion, a 3x3 matrix multiplication
that one definitely does at most once per pixel;
(2) involves convolution, i.e. antialiasing or low-pass filtering,
which is probably the reason to decouple (1) and (2) (in the sense of
computational efficiency, not of software engineering).
If (2) is neglected, upscaling PAL to HDTV size results in garbage.
(For entertainment applications "linear interpolation" is good enough;
in science/medicine it often is not.)
Intel SSE2 intrinsics do a perfect job here; a speedup approaching an
order of magnitude can happen.
Once, in a test connecting 4 FireWire cameras concurrently at 640x480
frame size each and doing no image rescaling (i.e. basically only
YUV->RGB), the (Pentium 4) CPU usage easily doubled after nothing more
than light-mindedly injecting a "redundant" (i.e. not performing any
computation along the way) frame copy somewhere. :-)