[PATCH] include: introduce byte counting macros.

Luc Verhaegen libv at skynet.be
Tue Jun 16 13:03:56 PDT 2009


On Tue, Jun 16, 2009 at 07:52:13PM +0200, Matthias Hopf wrote:
> On Jun 16, 09 13:43:55 -0400, Adam Jackson wrote:
> > > In my experience they are not as efficient. Even with today's compilers
> > > :-(
> > I'd love to see an existence proof for this.
> 
> Because we all disliked the use of macros in the radeon driver in the
> acceleration routines, we had all acceleration macros for radeonhd first
> coded up as static inline functions, but switched over to macros because
> of something like a 30% performance impact (AFAIR, it definitely was not
> negligible).
> 
> Later I discussed this with Richard Guenther (one of the major gcc
> developers, who happens to work here at SuSE), and he basically agreed
> that gcc is by far not optimal with respect to optimizing static inline
> functions yet.
> 
> That's as close to a proof as I can come. Luc probably remembers the
> git commits (if there were any, and we didn't only commit the optimized
> version), because he coded that stuff. gcc version was 4.2 AFAIR.
> 
> Matthias

When Command Submission got coded up, just about a year ago, I first got 
rid of the cancerous macros the radeon driver used (uses?) for writing 
into the command buffers, and provided general infrastructure 
(CS), with callbacks, to handle this transparently for both MMIO and CP 
(and CP in FB/directCP, which I never properly finished).

The intention was to take as wide an arc around the bog as possible, 
and write fully macro-less C. Trivial functions like RHDCSGrab, 
RHDCSWrite and RHDCSRegWrite were created, as that made the code a lot 
easier on the eye. These were inlined in the hope that the compiler 
would then also optimise them properly.

When all was nicely coded up, I started benchmarking. Amazingly, the
inlines ran at just about 1/10th of the speed of the nasty radeon macros.
A rather baffling difference. But by simply recoding those functions
into just as simple and transparent macros, I saw the same throughput
as before.
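
To make the comparison concrete, here is a minimal sketch of the two
styles side by side. Everything in it is an assumption for illustration:
the struct, the names and the "register offset, then value" layout are
hypothetical and are not the actual code from rhd_cs.h. It only shows
the same trivial buffer write expressed once as a static inline function
and once as a macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command-stream state; NOT the real struct from rhd_cs.h. */
struct cs_sketch {
    uint32_t *buf;   /* command buffer */
    size_t    wptr;  /* index of the next free slot */
};

/* Style 1: static inline function. Type-checked, arguments evaluated once. */
static inline void cs_write_inline(struct cs_sketch *cs, uint32_t value)
{
    cs->buf[cs->wptr++] = value;
}

/* Style 2: macro. Expanded textually at every call site. Note that
 * (cs) is evaluated twice, so it must be free of side effects. */
#define CS_WRITE_MACRO(cs, value) \
    do { (cs)->buf[(cs)->wptr++] = (value); } while (0)

/* A register write built on top of each style. The layout used here
 * (dword register index followed by the value) is an assumption. */
static inline void cs_reg_write_inline(struct cs_sketch *cs,
                                       uint32_t reg, uint32_t value)
{
    cs_write_inline(cs, reg >> 2);
    cs_write_inline(cs, value);
}

#define CS_REG_WRITE_MACRO(cs, reg, value) \
    do { \
        CS_WRITE_MACRO(cs, (reg) >> 2); \
        CS_WRITE_MACRO(cs, value); \
    } while (0)
```

Both variants put identical data into the buffer. Any difference is
purely in what the compiler generates for the call sites, and that is
what the benchmark measures.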

I do not firmly remember what gcc version came with openSUSE 10.3, but
at the time, when 11.0 was close to release, our gcc people (richi and
matz) said that it had improved significantly there. But I doubt it
will be fully on par. I too wish that this was different, but I can live
with the small concessions that were made in this case.

As for proof, feel free to get an r5xx, and then trivially (re-)create 
inlines out of the few small macros in rhd_cs.h, and x11perf this 
against the macros.

Because CS was not exactly welcomed by our technology partners at ATI,
it became rather hard to preserve history in a sane way for upstreaming
once CS was eventually accepted.

Luc Verhaegen.

