Xvideo performance on Radeon 7500 vs Intel 915

Roland Scheidegger rscheidegger_lists at hispeed.ch
Mon Nov 27 12:21:45 PST 2006


Thomas Hellström wrote:
>>> Also, I tried moving down to 16 bit depth from 24 and didn't see a 
>>> difference.
>>>     
>> This all makes sense, as the bottleneck is probably the transfer from
>> system RAM to video RAM. Integrated chipsets actually have an advantage
>> there.
>>
>>   
> I agree.
I'm not sure this is actually that much of an advantage. You may not 
need to copy the data, but you need to read it out from main memory 
multiple times, negating any bandwidth advantage (assuming 24 fps video 
and a 72 Hz display refresh, the chip's scaler needs to read each frame 
three times - at least if the chip actually uses a "true" overlay scaler 
and doesn't just do some sort of blit). (Note that the radeon chips should 
probably be able to do video overlays from main memory too, though I 
haven't tried that.)
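
A rough sketch of that arithmetic, comparing one copy per decoded frame
against a scaler that re-reads the frame from main memory on every refresh.
The 720x576 frame size and the 16 bits per pixel are illustrative
assumptions, not measurements from either chip:

```python
# Rough estimate of main-memory read traffic for overlay video.
# WIDTH, HEIGHT and BPP are assumed values chosen only for illustration.

def frame_bytes(width, height, bits_per_pixel):
    """Size of one video frame in bytes."""
    return width * height * bits_per_pixel // 8

def copy_bandwidth(width, height, bits_per_pixel, video_fps):
    """Bytes/s read when each decoded frame is copied once to video RAM."""
    return frame_bytes(width, height, bits_per_pixel) * video_fps

def scanout_bandwidth(width, height, bits_per_pixel, refresh_hz):
    """Bytes/s read when the scaler fetches the frame on every refresh."""
    return frame_bytes(width, height, bits_per_pixel) * refresh_hz

WIDTH, HEIGHT = 720, 576  # assumed frame size
BPP = 16                  # assumed packed yuv, 16 bits per pixel
VIDEO_FPS = 24
REFRESH_HZ = 72

copy_bw = copy_bandwidth(WIDTH, HEIGHT, BPP, VIDEO_FPS)
scan_bw = scanout_bandwidth(WIDTH, HEIGHT, BPP, REFRESH_HZ)
reads_per_frame = REFRESH_HZ // VIDEO_FPS

print("reads per video frame:", reads_per_frame)
print("copy once per frame  : %.1f MB/s" % (copy_bw / 1e6))
print("read on every refresh: %.1f MB/s" % (scan_bw / 1e6))
```

Under these assumptions the scan-out reads are three times the traffic of a
single copy, which is why skipping the copy need not save main-memory
bandwidth overall.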

Interestingly, I've tried both with dma xv and without on my good old 
celeron 1.0A overclocked to 1.33GHz (with sdram) - xorg cpu time seemed 
to be just the same. I'm assuming it just burns up cpu cycles in some 
wait-for-idle loop in the dma case.

Anyway, the driver really should support planar yuv natively instead of 
converting to packed yuv. Not only would it be faster, it would also 
need less memory (as the common planar yuv format has the cr and cb 
subsampled both vertically and horizontally while packed yuv has it only 
subsampled horizontally).
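
To make the memory difference concrete, here is a minimal sketch of the two
layouts and of the planar-to-packed interleave. It assumes the common
I420-style planar layout (full-size Y plane, Cb and Cr each subsampled 2x2),
a YUY2-style packed layout (Y0 Cb Y1 Cr) and even frame dimensions; the
function names are hypothetical and this is not the driver's code:

```python
def planar_420_bytes(width, height):
    """Full-resolution Y plane plus Cb and Cr planes subsampled 2x2."""
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h

def packed_422_bytes(width, height):
    """Y for every pixel; one Cb/Cr pair per two horizontal pixels, every line."""
    pairs = (width + 1) // 2
    return pairs * 4 * height

def i420_to_yuy2(y, cb, cr, width, height):
    """Interleave planar data into packed Y0 Cb Y1 Cr (assumes even width/height).

    Each chroma row is reused for two output lines, which is where the
    extra memory of the packed format comes from.
    """
    out = bytearray()
    chroma_w = width // 2
    for row in range(height):
        chroma_row = row // 2
        for col in range(0, width, 2):
            c = chroma_row * chroma_w + col // 2
            out += bytes((y[row * width + col], cb[c],
                          y[row * width + col + 1], cr[c]))
    return bytes(out)

# Tiny 2x2 frame: 4 luma samples, 1 Cb sample, 1 Cr sample.
packed = i420_to_yuy2(bytes([1, 2, 3, 4]), bytes([5]), bytes([6]), 2, 2)
print(list(packed))
print(planar_420_bytes(720, 576), packed_422_bytes(720, 576))
```

With these layouts the planar frame takes three quarters of the memory of
the packed one (12 versus 16 bits per pixel).
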
With the attached patch, xorg cpu time seems to be significantly lower 
(roughly half here) - still not good enough for full mpeg4 hd video on 
that old box, however :-). Note though that the patch is old (against 
monolithic xorg no less!) so it won't apply cleanly. Also, it is quite 
broken and needs fixing before it could be committed: it allocates too 
much ram, may not work on big endian, and worst of all the offset 
calculations are wrong when the overlay window is moved off the screen, 
resulting in garbled video, corrupted pixmaps and segfaults... There are 
likely issues with source videos not aligned to 32 pixels too. It might 
be worth fixing though.
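
For illustration only, this is the kind of per-plane offset calculation that
has to stay consistent once part of the source is clipped away. It is a
sketch under stated assumptions and is not taken from the attached patch:
planes are tightly packed (pitch equals width), stored in Y, Cb, Cr order,
frame dimensions are even, and the function name is hypothetical:

```python
def i420_plane_offsets(width, height, left, top):
    """Byte offsets of the first visible sample in the Y, Cb and Cr planes.

    (left, top) is the first visible source pixel after clipping.
    Assumes pitch == width, even width/height, plane order Y, Cb, Cr.
    """
    if not (0 <= left < width and 0 <= top < height):
        raise ValueError("clip origin lies outside the source frame")

    # Round down to even coordinates so luma and chroma start on the
    # same 2x2 block; otherwise the planes drift apart by one sample.
    left &= ~1
    top &= ~1

    chroma_w = width // 2
    chroma_h = height // 2

    y_offset = top * width + left
    cb_base = width * height               # Cb plane follows the Y plane
    cr_base = cb_base + chroma_w * chroma_h
    chroma_offset = (top // 2) * chroma_w + left // 2

    return y_offset, cb_base + chroma_offset, cr_base + chroma_offset

# 8x4 frame with the first 3 columns and 2 rows clipped away.
print(i420_plane_offsets(8, 4, 3, 2))
```

Rejecting an origin outside the frame matters as much as the arithmetic:
an unchecked negative or oversized offset reads or writes outside the
buffer, which fits the garbled video and segfault symptoms described above.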

I'm not sure, but increasing agp mode to 4 (if not already done) might 
help performance too (if dma for xv is used).
A newer player (decoding library) version could help too. In any case, if 
it needs 60% cpu time on a pentium m 2ghz, I would expect cpu time to get 
awfully close to 100% on a 2ghz p4, even under optimal conditions.

Roland
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: radeon_video_planar2.diff
URL: <http://lists.x.org/archives/xorg/attachments/20061127/537fdc09/attachment.ksh>

