New acceleration architecture

Thu Jun 30 13:44:09 PDT 2005

On Thu, 2005-06-30 at 14:14, Gioele Barabucci wrote:
> I fear that
> 
>   prepare(var, value) { opt[var] = value; }
>   job(v) { return opt[VAR1] + opt[VAR2] + v; }
> 
> with current compilers and current processors (PPC, x86_64) will be much
> slower than
> 
>   job(opt1, opt2, v) { return opt1 + opt2 + v; }

(Just IMHO...)

On many processors it will generate longer, uglier code, because global
loads are long-winded operations that are not the optimal path. Also the
stack top is hot in L1, and globals scattered around may not be, or will
increase cache misses.
With register parameters it's even more visible.

That said, since the compiler can assume nobody else changes the variable,
it shouldn't be much different.
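
For concreteness, a minimal C sketch of the two variants from the quoted
mail. The int types, the NUM_OPTS bound and the job_global/job_params
names are assumptions made for illustration; none of this is taken from
the actual EXA code.

```c
#include <assert.h>

/* Hypothetical option slots; the names mirror the quoted pseudocode. */
enum { VAR1, VAR2, NUM_OPTS };

/* Variant A: state is parked in a global table before the call. */
static int opt[NUM_OPTS];

void prepare(int var, int value)
{
    assert(var >= 0 && var < NUM_OPTS);
    opt[var] = value;
}

int job_global(int v)
{
    /* Typically two loads from the global table, which may or may
     * not be in L1 by the time job_global() runs. */
    return opt[VAR1] + opt[VAR2] + v;
}

/* Variant B: everything arrives as arguments, usually in registers
 * on ABIs that pass parameters that way. */
int job_params(int opt1, int opt2, int v)
{
    return opt1 + opt2 + v;
}
```

Both variants compute the same value; the difference is only where the
operands live when the job runs.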

It is bad programming practice, and it is also IMHO poor design, because
not only is it slower, or at best the same speed, but it prevents a future
threaded EXA-using server. Now given that many EXA operations are doing
CPU work in host memory on the host CPU, and there is a tile cache, it
seems rather dumb to assume a non-threaded server. Similarly for waits
on one GPU while another is free for multi-head. If I am dragging a
translucent window on a dual CPU box with four cores, my window manager
is using 0.01 of a core, and it would be nice to do the software
rendering work on 3.99 cores, not 0.99.
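
A rough sketch of why explicit parameters keep the threaded option open,
assuming POSIX threads. The struct job_args layout and the run_two_jobs
helper are hypothetical names for illustration only. Each thread gets its
own argument block, so nothing is shared; with a single global option
table the two jobs would overwrite each other's settings.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-operation state, passed explicitly, not via globals. */
struct job_args {
    int opt1;
    int opt2;
    int v;
    int result;
};

static void *job_thread(void *p)
{
    struct job_args *a = p;

    assert(a != NULL);
    /* Touches only this thread's own argument block. */
    a->result = a->opt1 + a->opt2 + a->v;
    return NULL;
}

/* Run two independent jobs concurrently; returns 0 on success. */
int run_two_jobs(struct job_args *a, struct job_args *b)
{
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, job_thread, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, job_thread, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

The same shape would apply to per-tile rendering work: one argument
block per tile, handed to whichever core is free.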

CPU direction is now "more threads", not "more ops/sec". It seems unwise
for X to make that change hard at the server level when the cost is
essentially nil right now. I don't see a fully threaded server making
sense, but threading rendering work (and even balancing between host and
GPU) does appear useful with a tile cache.

Alan