[Mesa-dev] [RFC] nir: compiler options for addressing modes

Tue Apr 14 16:08:07 PDT 2015

On Tue, Apr 14, 2015 at 6:24 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Tue, Apr 14, 2015 at 5:16 PM, Rob Clark <robdclark at gmail.com> wrote:
>> On Tue, Apr 14, 2015 at 4:59 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>>>>>> +   /**
>>>>>>>> +    * Addressing mode for corresponding _indirect intrinsics:
>>>>>>>> +    */
>>>>>>>> +   nir_addressing_mode var_addressing_mode;
>>>>>>>> +   nir_addressing_mode input_addressing_mode;
>>>>>>>> +   nir_addressing_mode output_addressing_mode;
>>>>>>>> +   nir_addressing_mode uniform_addressing_mode;
>>>>>>>> +   nir_addressing_mode ubo_addressing_mode;
>>>
>>> What is var_addressing_mode?  Sorry for not bringing that up before.
>>
>>
>> well, originally in my thinking it was for load_var/store_var..  but
>> perhaps that doesn't make sense (given lower_io).  Maybe it makes more
>> sense to define it as applying to var_local/var_global (where the
>> others apply to shader_in/shader_out/uniform and their equivalent
>> intrinsic forms)?
>>
>> Maybe it's a bit weird since I don't lower vars to regs before feeding
>> to my ir3 frontend, but the whole load_var/store_var for array access,
>> and ssa for everything else form works kind of nicely for me.
>>
>> BR,
>> -R
>
> I don't think we should be letting the driver define the stride of
> variable array accesses. Variables are supposed to be structured,
> backend-independent things that core NIR can manipulate and optimize
> as it pleases; it shouldn't need to know anything about how the driver
> will index the data. For doing the kinds of optimizations you're
> talking about, you have registers that are backend-dependent, and core
> NIR (other than the lower_locals_to_regs) doesn't need to know what
> the indices mean. What you're doing right now is a hack, and if you
> want to get the benefits of optimizing the index expression in core
> NIR you should be using lower_locals_to_regs(). Having scalars be SSA
> values and arrays be registers can't be that much more complicated
> than having arrays be variables, and that's how it was set up to work
> from the beginning.

Well, it is pretty convenient for me to have direct and indirect array
access come via intrinsics, since that gives me a single point to do
all the magic I need to set up instruction dependencies for scheduling
and register assignment when the arrays get allocated in registers.
Possibly that means we need an option to lower array access to some
new sort of intrinsic?  Not sure.  I'll play with lower_locals_to_regs
without first coming out of SSA; if the only things left in regs are
array accesses, maybe I can achieve the same result.

But more immediately, I hit a snag:  I cannot seem to narrow from 32b
to 16b at the same time as I move to the address register.  That means
I need a mov from 32b to 16b followed by a 2nd mov to get it into the
address register, which sort of defeats the purpose of this whole
exercise.  I need to do some more r/e around this, but it may end up
being better the way it was before.  And if we end up needing to do
the shl in half-precision registers, then solving this in NIR would (I
think) require NIR to be aware of half-precision.  Which sounds
useful, but -EBIGGER_FIRES.
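
As a rough illustration of the shl-versus-narrowing question (a sketch
only, not code from the patch or from ir3): assuming the address value
is simply the low 16 bits of the scaled index, doing the shift in 32b
and then narrowing gives the same bits as narrowing first and shifting
in 16b.  The helper names and the 16-bit wraparound are assumptions
made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: scale the index in a full 32-bit register, then narrow. */
static uint16_t scale_then_narrow(uint32_t index, unsigned shift)
{
    assert(shift < 16);
    return (uint16_t)(index << shift);
}

/* Hypothetical: narrow to 16 bits first, then scale in "half" width. */
static uint16_t narrow_then_scale(uint32_t index, unsigned shift)
{
    assert(shift < 16);
    uint16_t half = (uint16_t)index;
    /* Widen explicitly so the shift itself cannot overflow a signed int. */
    return (uint16_t)((uint32_t)half << shift);
}

/* Usage: both orderings agree on the low 16 bits. */
static void addr_scale_selfcheck(void)
{
    assert(scale_then_narrow(5, 2) == narrow_then_scale(5, 2));
    assert(scale_then_narrow(0x12345u, 2) == narrow_then_scale(0x12345u, 2));
}
```

So the arithmetic does not care about the ordering; the cost is in how
many movs the hardware needs, which is the part still to be worked out.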

The other problem is that currently ttn gives addr src in float, which
is how things are in tgsi land.  I'm not sure if changing this will be
a problem for Eric.

An interesting alternative to consider is to allow the backend to
lower to driver-specific alu opcodes, and then somehow run those
through the other generic NIR opt passes.  I'm not quite sure yet
*how* that will work (esp. considering my need for doing some things
in half precision), but if we come up with something it would help in
other cases too (such as lowering integer multiply).
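
For a sense of what "lowering integer multiply" can look like, here is
a generic C sketch that builds a 32-bit multiply out of 16x16
multiplies.  This is the textbook decomposition, not necessarily what
any particular backend emits, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lowering: 32-bit multiply (mod 2^32) from 16-bit halves. */
static uint32_t mul32_via_16(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xffffu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xffffu, b_hi = b >> 16;

    /* Low halves contribute a full 32-bit product. */
    uint32_t low = a_lo * b_lo;

    /* Cross terms only matter in their low 16 bits once shifted up. */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi;

    /* a_hi * b_hi would be shifted by 32 and so drops out entirely. */
    return low + (cross << 16);
}

/* Usage: compare against the native multiply. */
static void mul32_selfcheck(void)
{
    assert(mul32_via_16(3u, 5u) == 15u);
    assert(mul32_via_16(0x12345678u, 0x9abcdef0u)
           == 0x12345678u * 0x9abcdef0u);
}
```

The pieces here (16-bit multiplies, a shift, an add) are the sort of
thing that would still benefit from generic opt passes after lowering.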

For now, I think I'll go back to having ttn UBO support not depend on
this patch, since in the short term I need to sort out if/else
flattening and flow control so that we can drop the tgsi f/e.

BR,
-R

> Connor