[PATCH 6/6] RADEONPrepareAccess_CS: fallback to DFS when pixmap is in VRAM
Michel Dänzer
michel at daenzer.net
Wed Sep 1 04:26:09 PDT 2010
On Wed, 2010-09-01 at 10:37 +1200, Karl Tomlinson wrote:
>
> Michel Dänzer writes:
>
> > * Patch 4 introduces a big endian build break (typo, pSrc instead
> > of pDst in RADEONUploadToScreenCS) which is silently fixed in
> > patch 5.
>
> Sorry about that. I can touch that up, but, if you don't mind,
> I'll wait for feedback to see whether other changes are requested.
Sure.
> > * The r600 changes aren't split up properly,
>
> I'm not clear exactly what you had in mind for a proper split.
>
> The r600 code doesn't have the big-endian paths, so there's
> nothing to fix there until little-endian code also completes
> UTS/DFS even when a scratch BO is not necessary (part 5 in this
> series).
>
> I don't want to activate the unflushed CS bug, fixed in part 3, in
> little-endian code (until it is fixed).
>
> I guess part 4 (complete UTS and DFS even when scratch allocation
> fails) could be moved to after what is currently part 5, but there
> seems little point in completing UTS/DFS in this situation unless
> it ensures some reliability in the methods.
Okay, I'm convinced; thanks for elaborating.
> >> I wondered whether PrepareAccess could fail for the visible screen
> >> with mixed pixmaps as suggested here
> >> http://www.mentby.com/maarten-maathuis/exa-classic-problem-with-xv.html
> >> When I tried that, however, I ended up with pixels in the wrong
> >> places, a bit like what I would expect if the pitch were wrong.
> >
> > I thought I tried that before and it worked. If the problem isn't a
> > straightforward pitch bug, it sounds like it might be tiling related?
>
> Could be. I have color tiling enabled. I expect you know better
> than I whether this is likely.
>
> Things worked fine for a while, then suddenly, on some apparently
> random activity, the whole screen got upset. Colors looked
> reasonable AFAICT, though I wasn't sure which bits of the screen I
> was looking at. IIRC lines bordering regions weren't perfect
> diagonals but more like staircases, and it felt like there was a
> lot of repeating going on.
Have you tried whether it also happens with tiling disabled? Can you
capture the effect in a screenshot?
> > Basically, the reason why these paths were so far limited to big endian
> > (and even there excluded the screen pixmap, as that doesn't require
> > byte-swapping thanks to the surface registers) is that I thought other
> > people like Dave were opposed to using them everywhere based on the
> > belief that it was an overall loss compared to direct CPU access even in
> > VRAM.
>
> Would you be able to point me to any relevant discussion, please?
Not offhand I'm afraid, maybe we only discussed it on IRC.
> VRAM reads are not cached so I find it hard to imagine how reading
> one pixel at a time (4 times in the case of bi-linear filters) can
> be faster than a GPU blit.
>
> I am certain that switching to direct CPU VRAM reads was a VERY
> BAD regression on my system, symptoms similar to trying to use
> offscreen pixmaps in the old days with XAA.
There's no question that direct VRAM access is very slow if the
operation needs to read a large number of pixels. However, that isn't
always the case; consider, e.g., a 0-width line spanning a diagonal of a
large pixmap.
> > If that turns out to be wrong, or at least the worst case
> > behaviour is better, we should be able to make these paths work for all
> > pixmaps.
>
> Sounds good, but doable as an additional improvement, I assume.
Sure.
> > Have you made any measurements with e.g. gtkperf or x11perf to
> > see if there are any bad regressions?
>
> No. What I measured was a RepeatPad operator-over RGB24-source
> composite with a transform. The cost of the direct VRAM read was at
> least 50 times that of the GPU blit.
>
> Naturally, I've also been using my system with these patches for
> some time now.
That's fine; still, it would be good to make some additional
measurements to make sure there aren't any bad regressions.
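For reference, such a comparison could look something like the following (a sketch, not a prescribed procedure; the test names are from x11perf's standard set, chosen here as plausible candidates, and a running X server on the driver under test is required):

```shell
# Hedged sketch: collect a few x11perf numbers with the patches applied,
# then repeat on the unpatched driver and compare the rates.
# -getimage500 exercises screen reads, which these patches affect most.
x11perf -getimage500   -repeat 3 > patched-getimage.txt
x11perf -copywinwin500 -repeat 3 > patched-copywin.txt
x11perf -comppixwin500 -repeat 3 > patched-comppix.txt
```

gtkperf gives a rougher, higher-level view of the same thing and can be run alongside for a sanity check.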
> One possible (smaller) regression is when EXA first creates
> a system copy during a write-only operation, it does an unnecessary
> DFS:
> http://lists.x.org/archives/xorg-devel/2010-August/012298.html
>
> Patch here:
> http://lists.x.org/archives/xorg-devel/2010-August/012299.html
I saw that but haven't got around to reviewing it yet.
> Please do remember that improving the slowest paths is most often
> what matters most. It is the slow paths that get noticed. In this
> case, it would take gains in many operations to recover the
> 2 seconds lost trying to read from VRAM.
I agree, and I am basically in favour of your changes (I'm just
discussing some details); I only wanted to give some background on how
the code came to be in its current form.
--
Earthling Michel Dänzer | http://www.vmware.com
Libre software enthusiast | Debian, X and DRI developer