[PATCH] Allocate Xv buffers to GTT.

Thu Feb 11 15:36:18 PST 2010

2010/2/11 Michel Dänzer <michel at daenzer.net>:
> On Wed, 2010-02-10 at 22:44 +0200, Pauli Nieminen wrote:
>> KMS doesn't have acceleration for uploads to VRAM. A direct
>> memcpy/memmove to VRAM is very slow (40M/s in a benchmark), which
>> causes visible problems in video playback.
>>
>> Allocating the video buffer in GTT gives good performance (350-450M/s)
>> for the memmove operation. This is a nice performance boost for Xv
>> under KMS.
>>
>> There is still a possibility of improvement by adding a BITBLT transfer
>> to VRAM, which would handle tiling and endian swapping.
>
> What tiling? Byte swapping is done as part of copying to the BO, so I'm
> not sure how an additional blit could improve anything.
>

I would think that a byte-swapping copy on the CPU would be slow (unless
Power has an instruction for that), while the GPU can probably do it
without much of a performance hit. Of course, that would need some
benchmarking to see whether there is any significant difference.
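
To make the cost concrete, a CPU-side byte-swapping copy could look
roughly like the sketch below. This is only an illustration, not code
from the driver: the names swap32 and copy_swap32 are made up here, and
I am assuming 32-bit swapping. Whether the compiler turns the swap into
a single instruction depends on the target, which is exactly what would
need benchmarking.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of one 32-bit word. */
static inline uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}

/*
 * Hypothetical helper: copy n_words 32-bit words from src to dst,
 * byte-swapping each word on the way. Every word costs a load, a swap
 * and a store on the CPU, compared to a plain memcpy/memmove.
 */
static void copy_swap32(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++)
        dst[i] = swap32(src[i]);
}
```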

>
>> diff --git a/src/radeon_crtc.c b/src/radeon_crtc.c
>> index 556b461..8384af1 100644
>> --- a/src/radeon_crtc.c
>> +++ b/src/radeon_crtc.c
>> @@ -564,7 +564,7 @@ radeon_crtc_shadow_allocate (xf86CrtcPtr crtc, int width, int height)
>>       * setter for offscreen area locking in EXA currently.  So, we just
>>       * allocate offscreen memory and fake up a pixmap header for it.
>>       */
>> -    rotate_offset = radeon_legacy_allocate_memory(pScrn, &radeon_crtc->crtc_rotate_mem, size, align);
>> +    rotate_offset = radeon_legacy_allocate_memory(pScrn, &radeon_crtc->crtc_rotate_mem, size, align, 0);
>
> This should probably be in VRAM.
>

If EXA is using upload-to-screen to copy the cursor to this memory
location, then yes. I don't know how that stuff works.

>
>> @@ -3179,7 +3180,7 @@ RADEONAllocateSurface(
>>      pitch = ((w << 1) + 15) & ~15;
>>      size = pitch * h;
>>
>> -    offset = radeon_legacy_allocate_memory(pScrn, &surface_memory, size, 64);
>> +    offset = radeon_legacy_allocate_memory(pScrn, &surface_memory, size, 64, 0);
>>      if (offset == 0)
>>       return BadAlloc;
>
> And this?
>

No. It would be better to allocate in GTT and then let TTM move the BO
to VRAM. In fact, I think it would be simpler if all buffers were just
allocated in system memory or GTT (depending on whether the CPU needs
read access while pushing data there). Then just let TTM move everything
to VRAM and back to GTT depending on the operations that happen on the
buffer.
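
As a rough sketch of that policy: the enum and function below are
hypothetical and are not the driver or TTM API. I am also assuming that
CPU reads are cheap from cached system memory and expensive from GTT, so
a buffer the CPU needs to read back would start in system memory and
everything else in GTT. Nothing is placed in VRAM up front; migration is
left to the memory manager.

```c
#include <stdbool.h>

/* Hypothetical placement domains, for illustration only. */
enum buf_placement {
    BUF_PLACE_SYSTEM, /* cached system memory */
    BUF_PLACE_GTT,    /* GPU-visible system memory */
    BUF_PLACE_VRAM    /* never chosen here; left to later migration */
};

/*
 * Pick the initial placement for a new buffer.
 * Assumption: a buffer the CPU must read back goes to system memory,
 * any other buffer goes to GTT.
 */
static enum buf_placement initial_placement(bool cpu_needs_read)
{
    return cpu_needs_read ? BUF_PLACE_SYSTEM : BUF_PLACE_GTT;
}
```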

>
> --
> Earthling Michel Dänzer           |                http://www.vmware.com
> Libre software enthusiast         |          Debian, X and DRI developer
>