[Mesa-dev] [RFC] nir: compiler options for addressing modes

Connor Abbott cwabbott0 at gmail.com
Wed Apr 15 07:32:06 PDT 2015


On Tue, Apr 14, 2015 at 7:08 PM, Rob Clark <robdclark at gmail.com> wrote:
> On Tue, Apr 14, 2015 at 6:24 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>> On Tue, Apr 14, 2015 at 5:16 PM, Rob Clark <robdclark at gmail.com> wrote:
>>> On Tue, Apr 14, 2015 at 4:59 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>>>>>>> +   /**
>>>>>>>>> +    * Addressing mode for corresponding _indirect intrinsics:
>>>>>>>>> +    */
>>>>>>>>> +   nir_addressing_mode var_addressing_mode;
>>>>>>>>> +   nir_addressing_mode input_addressing_mode;
>>>>>>>>> +   nir_addressing_mode output_addressing_mode;
>>>>>>>>> +   nir_addressing_mode uniform_addressing_mode;
>>>>>>>>> +   nir_addressing_mode ubo_addressing_mode;
>>>>
>>>> What is var_addressing_mode?  Sorry for not bringing that up before.
>>>
>>>
>>> well, originally in my thinking it was for load_var/store_var..  but
>>> perhaps that doesn't make sense (given lower_io).  Maybe it makes more
>>> sense to define it as applying to var_local/var_global (where the
>>> others apply to shader_in/shader_out/uniform and their equivalent
>>> intrinsic forms)?
>>>
>>> Maybe it's a bit weird since I don't lower vars to regs before feeding
>>> to my ir3 frontend, but the form of load_var/store_var for array access
>>> and SSA for everything else works kind of nicely for me.
>>>
>>> BR,
>>> -R
>>
>> I don't think we should be letting the driver define the stride of
>> variable array accesses. Variables are supposed to be structured,
>> backend-independent things that core NIR can manipulate and optimize
>> as it pleases; it shouldn't need to know anything about how the driver
>> will index the data. For doing the kinds of optimizations you're
>> talking about, you have registers that are backend-dependent, and core
>> NIR (other than the lower_locals_to_regs pass) doesn't need to know what
>> the indices mean. What you're doing right now is a hack, and if you
>> want to get the benefits of optimizing the index expression in core
>> NIR you should be using lower_locals_to_regs(). Having scalars be SSA
>> values and arrays be registers can't be that much more complicated
>> than having arrays be variables, and that's how it was set up to work
>> from the beginning.
>
> well, it is pretty convenient for me to have direct and indirect array
> access come via intrinsics, since that gives me a nice single point to
> do all the magic I need to do to set up instruction dependencies for
> scheduling and register assignment where the arrays get allocated in
> registers.  Possibly that means we need an option to lower array
> access to some new sort of intrinsic?  Not sure, I'll play with
> lower_locals_to_regs without first coming out of SSA... maybe if the
> only things left in regs are then array accesses, I can achieve the
> same result.

Yeah, it seems like some sort of load_register intrinsic might be more
useful to you... there are a few reasons I never added it from the
beginning:

- An intrinsic can't hold a pointer to the nir_register, which is what
contains useful info like the number of vector components; you would
have to look the register up by index in the register list, which takes
O(n) time and is a lot more annoying.
- There's another use case for registers, namely backends that don't
support SSA at all; there, the possibility of register reads/writes
being ordered arbitrarily relative to the instructions that use them
seems like no fun. Having more than one way to represent a register
load/store doesn't seem like a great idea either.

That said, I can't imagine supporting the current way of loading/storing
registers would be that much more complicated. Variables definitely
aren't what you want.
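
For reference, the current form is roughly the sketch below -- I'm
paraphrasing nir.h with simplified stand-in structs (nir_register,
nir_reg_src, nir_src), and my field names may be slightly off, so check
the real header rather than trusting this:

/* Simplified stand-ins for the relevant nir.h structures -- field names
 * paraphrased from memory, treat this as a sketch only. */

#include <stdbool.h>
#include <stddef.h>

typedef struct {
   unsigned num_components;    /* vector width of each element */
   unsigned num_array_elems;   /* 0 for a plain, non-array register */
   unsigned index;
} sketch_register;

typedef struct sketch_src sketch_src;

typedef struct {
   sketch_register *reg;
   unsigned base_offset;       /* constant part of the array offset */
   sketch_src *indirect;       /* non-constant part of the offset, or NULL */
} sketch_reg_src;

struct sketch_src {
   bool is_ssa;
   union {
      sketch_reg_src reg;      /* used when is_ssa == false */
      void *ssa;               /* really a nir_ssa_def *, when is_ssa == true */
   };
};

/* After nir_lower_vars_to_ssa() and nir_lower_locals_to_regs(), an access
 * like "x = arr[i]" becomes a read of a register with num_array_elems > 0:
 * base_offset carries the constant part of the index and ->indirect points
 * at the source for the variable part, so the backend still sees the whole
 * index expression without needing a new intrinsic. */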

>
> But more immediately, I hit a sort of snag: I cannot seem to narrow
> from 32b to 16b at the same time as I move into the address register.
> Which ends up meaning I need a mov from 32b to 16b followed by a 2nd
> mov to get it into the address register... which sort of defeats the
> purpose of this whole exercise.  I need to do some more r/e around
> this, but it may end up being better the way it was before.  And if we
> end up needing to do the shl in half-precision registers, then solving
> this in NIR would (I think) require NIR to be aware of half-precision.
> Which sounds useful, but -EBIGGER_FIRES

I don't quite understand... is this just a problem with using
registers? Would the entire sequence of operations need to be in 16
bits, or can you have whatever instruction computes your address do
the conversion to 16 bits as part of its output? If it's the latter,
you could just re-emit a 16-bit-outputting version of that instruction
and use it instead, although it's a bit of a hack.
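
If it helps, the shape of what I mean is something like this -- every
type and helper name below is made up for illustration (I don't know the
ir3 internals), so treat it purely as a sketch:

/* Hypothetical sketch of "re-emit a 16-bit-outputting version"; none of
 * these names are real ir3 (or NIR) API. */

#include <stdbool.h>

enum fake_type { FAKE_TYPE_U32, FAKE_TYPE_U16 };

struct fake_ctx;
struct fake_instr {
   bool is_alu;
   bool can_write_half;        /* hw allows a 16-bit destination here */
   enum fake_type dst_type;
};

struct fake_instr *fake_clone_instr(struct fake_ctx *ctx, struct fake_instr *i);
struct fake_instr *fake_emit_mov(struct fake_ctx *ctx, struct fake_instr *src,
                                 enum fake_type dst_type);

/* Produce a 16-bit value to feed the address register: clone the ALU op
 * that computed the index and give the clone a 16-bit destination, and
 * only fall back to the explicit 32b->16b mov (the extra mov you want to
 * avoid) when the op can't write half regs. */
static struct fake_instr *
address_index_in_16bit(struct fake_ctx *ctx, struct fake_instr *index_src)
{
   if (index_src->is_alu && index_src->can_write_half) {
      struct fake_instr *clone = fake_clone_instr(ctx, index_src);
      clone->dst_type = FAKE_TYPE_U16;   /* re-emitted 16-bit version */
      return clone;
   }
   return fake_emit_mov(ctx, index_src, FAKE_TYPE_U16);
}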

Long-term, support for half-precision in NIR is definitely in the
cards, but it'll probably have to wait for fp64 support, as the two are
very similar with respect to the changes we'd have to make in the IR.
Unless someone has a burning desire to do half-precision first :).

>
> The other problem is that currently ttn gives the addr src as a float,
> which is how things are in tgsi land.  I'm not sure if changing this
> will be a problem for Eric.
>
> An interesting alternative solution to consider is to allow the
> backend to lower to driver-specific ALU opcodes... and then somehow run
> those through the other generic NIR opt passes.  I'm not quite sure
> yet *how* that will work (esp. considering my need to do some things
> in half precision), but if we come up with something it would help in
> other cases too (such as lowering integer multiply).
>
> For now, I think I'll go back to having ttn UBO support not depend on
> this patch, since in the short term I need to sort out if/else
> flattening and flow control so that we can drop the tgsi f/e.
>
> BR,
> -R
>
>> Connor

