[PATCH 2/4] EXA/evergreen/ni: optimize non-overlapping Copy

Grigori Goronzy greg at chown.ath.cx
Tue Jul 30 16:08:33 PDT 2013


On 30.07.2013 17:40, Michel Dänzer wrote:
> On Mon, 2013-07-22 at 06:06 +0200, Grigori Goronzy wrote:
>> In case dst and src rectangles of a Copy operation in the same surface
>> don't overlap, it is safe to skip the scratch surface. This is a
>> common case.
>> ---
>>   src/evergreen_exa.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/evergreen_exa.c b/src/evergreen_exa.c
>> index 86f455d..2cdce0f 100644
>> --- a/src/evergreen_exa.c
>> +++ b/src/evergreen_exa.c
>> @@ -575,7 +575,12 @@ EVERGREENCopy(PixmapPtr pDst,
>>       if (accel_state->vsync)
>>   	RADEONVlineHelperSet(pScrn, dstX, dstY, dstX + w, dstY + h);
>>
>> -    if (accel_state->same_surface && accel_state->copy_area) {
>> +    if (accel_state->same_surface &&
>> +	    (srcX + w <= dstX || dstX + w <= srcX || srcY + h <= dstY || dstY + h <= srcY)) {
>> +	EVERGREENDoPrepareCopy(pScrn);
>> +	EVERGREENAppendCopyVertex(pScrn, srcX, srcY, dstX, dstY, w, h);
>> +	EVERGREENDoCopyVline(pDst);
>> +    } else if (accel_state->same_surface && accel_state->copy_area) {
>>   	uint32_t orig_dst_domain = accel_state->dst_obj.domain;
>>   	uint32_t orig_src_domain = accel_state->src_obj[0].domain;
>>   	uint32_t orig_src_tiling_flags = accel_state->src_obj[0].tiling_flags;
>
> This looks a bit weird. EVERGREENCopy() may be called any number of
> times between calls to EVERGREENPrepareCopy() and EVERGREENDoneCopy(),
> and the source and destination rectangles may or may not overlap for
> each call. Does this change handle arbitrary transitions between the
> overlapping and non-overlapping case correctly?
>

It should be safe. EVERGREENDoPrepareCopy sets up all state for the
operation, including flushing the texture cache for the src, and is
always called before a same-surface Copy operation.
EVERGREENDoCopyVline queues the operation and flushes the dst surface
after every same-surface Copy operation. In the overlapping case, src
and dst are always properly restored. So the state is always correct
and flushing is done correctly, whichever of the two paths the
previous Copy call took.
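
For reference, the non-overlap condition from the patch can be written as a
standalone check. This is only a sketch: copy_rects_disjoint is a hypothetical
helper name that does not exist in the driver, and it assumes non-negative w
and h, as the condition in the patch implicitly does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of the driver: returns true when the
 * w x h rectangle at (srcX, srcY) and the w x h rectangle at
 * (dstX, dstY) share no pixel. Rectangles that merely touch along an
 * edge count as non-overlapping. */
static bool
copy_rects_disjoint(int srcX, int srcY, int dstX, int dstY, int w, int h)
{
    assert(w >= 0 && h >= 0);   /* assumption: sizes are never negative */

    return srcX + w <= dstX ||  /* src lies entirely left of dst */
           dstX + w <= srcX ||  /* dst lies entirely left of src */
           srcY + h <= dstY ||  /* src lies entirely above dst */
           dstY + h <= srcY;    /* dst lies entirely above src */
}
```

When the check returns true, no pixel is both read and written by the same
Copy call, which is why the scratch surface can be skipped in that case.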

> Also, I think you might get even more gains by not allocating a
> temporary BO at all when none of the rectangles overlap.
>

I actually tested this (I moved the scratch pixmap allocation into the
overlapping copy case) and didn't see an improvement in cairo-perf-traces.

