[PATCH 6/6] RADEONPrepareAccess_CS: fallback to DFS when pixmap is in VRAM

Karl Tomlinson xmail at karlt.net
Tue Aug 31 15:37:40 PDT 2010


Thank you for taking the time to look at this, Michel.

Michel Dänzer writes:

> First of all, thanks a lot for splitting up the patch from the bug
> report, this was very helpful for review. Unfortunately though, the
> split isn't quite perfect:
>
>       * Patch 4 introduces a big endian build break (typo, pSrc instead
>         of pDst in RADEONUploadToScreenCS) which is silently fixed in
>         patch 5. 

Sorry about that.  I can touch that up, but, if you don't mind,
I'll wait for feedback to see whether other changes are requested.

>       * The r600 changes aren't split up properly,

I'm not clear exactly what you had in mind for a proper split.

The r600 code doesn't have the big-endian paths, so there's
nothing to fix there until little-endian code also completes
UTS/DFS even when a scratch BO is not necessary (part 5 in this
series).

I don't want to activate the unflushed CS bug, fixed in part 3, in
little-endian code (until it is fixed).

I guess part 4 (complete UTS and DFS even when scratch allocation
fails) could be moved to after what is currently part 5, but there
seems little point in completing UTS/DFS in this situation unless
it ensures some reliability in the methods.

>                                                    so changes to one
>         patch may require changing an otherwise unrelated patch as well.

OK.  I won't submit another part 5 until I have feedback on the
earlier parts.

> On Mon, 2010-08-23 at 19:25 +1200, Karl Tomlinson wrote: 
>> 
>> I wondered whether PrepareAccess could fail for the visible screen
>> with mixed pixmaps as suggested here
>> http://www.mentby.com/maarten-maathuis/exa-classic-problem-with-xv.html
>> When I tried that, however, I ended up with pixels in the wrong
>> places, a bit like what I would expect if the pitch were wrong.
>
> I thought I tried that before and it worked. If the problem isn't a
> straightforward pitch bug, it sounds like it might be tiling related?

Could be.  I have color tiling enabled.  I expect you know better
than I whether this is likely.

Things worked fine for a while, then suddenly, on some apparently
random activity, the whole screen got upset.  Colors looked
reasonable AFAICT, though I wasn't sure which bits of the screen I
was looking at.  IIRC lines bordering regions weren't perfect
diagonals but more staircases, and it felt like there as a lot of
repeating going on.

> Basically, the reason why these paths were so far limited to big endian
> (and even there excluded the screen pixmap, as that doesn't require
> byte-swapping thanks to the surface registers) is that I thought other
> people like Dave were opposed to using them everywhere based on the
> belief that it was an overall loss compared to direct CPU access even in
> VRAM.

Would you be able to point me any relevant discussion, please?

VRAM reads are not cached so I find it hard to imagine how reading
one pixel at a time (4 times in the case of bi-linear filters) can
be faster than a GPU blit.

I am certain that switching to direct CPU VRAM reads was a VERY
BAD regression on my system, symptoms similar to trying to use
offscreen pixmaps in the old days with XAA.

> If that turns out to be wrong, or at least the worst case
> behaviour is better, we should be able to make these paths work for all
> pixmaps.

Sounds good, but doable as an additional improvement, I assume.

> Have you made any measurements with e.g. gtkperf or x11perf to
> see if there are any bad regressions?

No.  What I measured was RepeatPad operator-over RGB24-source with
transform.  The cost of direct VRAM read was at least 50 times
the cost of the GPU blit.

Naturally, I've also been using my system with these patches for
some time now.

One possible (smaller) regression is when EXA first creates
a system copy during a write-only operation, it does an unnecessary
DFS:
http://lists.x.org/archives/xorg-devel/2010-August/012298.html

Patch here:
http://lists.x.org/archives/xorg-devel/2010-August/012299.html
(The exaCopyDirty* API could perhaps do with an update, but I
didn't want to get into that, and this patch is enough to fix this
regression.)

Please do remember that most often improving the slowest paths is
the most important.  It is the slow paths that get noticed.  In
this case, it would take many gains in many operations to recover
the 2 seconds lost trying to read from VRAM.


More information about the xorg-driver-ati mailing list