[Mesa-dev] [PATCH v3 002/104] nir: Add a deref instruction type

Sun Apr 8 21:20:20 UTC 2018

>>>>>>>> +
>>>>>>>> +   /** The mode of the underlying variable */
>>>>>>>> +   nir_variable_mode mode;
>>>>>>>
>>>>>>> In fact, it seems like deref->mode is unused outside of nir_print and
>>>>>>> nir_validate.. for logical addressing we can get the mode from the
>>>>>>> deref_var->var at the start of the chain, and deref->mode has no
>>>>>>> meaning for physical addressing (where the mode comes from the
>>>>>>> pointer).
>>>>>>>
>>>>>>> So maybe just drop deref->mode?
>>>>>>
>>>>>> Isn't it still useful with logical addressing in case a var is not
>>>>>> immediately available? (think VK_KHR_variable_pointers)
>>>>>
>>>>> not sure, maybe this should just also use fat-pointers like physical
>>>>> addressing does??
>>>>>
>>>>>> Also I could see this being useful in physical addressing too to avoid
>>>>>> all passes working with derefs needing to do the constant folding?
>>>>>
>>>>> The problem is that you don't necessarily know the type at compile
>>>>> time (and in the case where you do, you need to do constant folding to
>>>>> figure it out)
>>>>
>>>> So I have two considerations here
>>>>
>>>> 1) for vulkan you always know the mode, even when you don't know the var.
>>>> 2) In CL the mode can still get annotated in the source program (CL C
>>>> non-generic pointers) in cases in which we cannot reasonably figure it
>>>> out with just constant folding. In those cases the mode is extra
>>>> information that you really lose.
>>>
>>> so, even in cl 1.x, you could do things like 'somefxn(foo ? global_ptr
>>> : local_ptr)'.. depending on how much we inline all the things, that
>>> might not get CF'd away.

How does this even work, btw? somefxn has a definition, and the
definition specifies a mode for the argument, right? (Which is
implicitly __private if the app does not specify anything?)

>>
>> But something like
>> __constant int *ptr_value = ...;
>> store ptr in complex data structure.
>> __constant int* ptr2 = load from complex data structure.
>>
>> Without explicitly annotating ptr2 it is unlikely that constant
>> folding would find that ptr2 is pointing to __constant address space.
>> Hence removing the modes loses valuable information that you cannot
>> get back by constant folding. However, if you have a pointer with
>> unknown mode, we could have a special mode (or mode_all?) and you can
>> use the uvec2 representation in that case?
>
> hmm, I'm not really getting how deref->mode could magically have
> information that fatptr.y doesn't have.. if the mode is known, vtn
> could stash it in fatptr.y and everyone is happy?  If vtn doesn't know
> this, then I don't see how deref->mode helps..

You mean insert it into the fatptr every time deref_cast is called?

Wouldn't that blow up the IR size significantly for very little benefit?

>
>>>
>>> I think I'm leaning towards using fat ptrs for the vk case, since I
>>> guess that is a case where you could always expect
>>> nir_src_as_const_value() to work, to get the variable mode.  If for no
>>> other reason than I guess these deref's, if the var is not known,
>>> start w/ deref_cast, and it would be ugly for deref_cast to have to
>>> work differently for compute vs vk.  But maybe Jason already has some
>>> thoughts about it?
>>
>> I'd like to avoid fat pointers altogether on AMD since we would not
>> use them even for CL. A generic pointer is just a uint64_t for us, with
>> no bitfield in there for the address space.
>>
>> I think we may need to think a bit more about representation however,
>> as e.g. for AMD a pointer is typically 64-bits (but we can do e.g.
>> 32-bits for known workgroup pointers), the current deref instructions
>> return 32-bit, and you want something like a uvec2 as pointer
>> representation?
>
> afaiu, newer AMD (and NV) hw can remap shared/private into a single
> global address space..  But I guess that is an easy subset of the
> harder case where drivers need to use different instructions.. so a
> pretty simple lowering pass run before lower_io could remap things
> that use fatptrs into something that ignores fatptr.y.  Then opt
> passes make fatptr.y go away.  So both AMD and hw that doesn't have a
> flat address space are happy.

But then you run into other issues, like how are you going to stuff a
64-bit fatptr.x + a ?-bit fatptr.y into a 64-bit value for Physical64
addressing? Also this means we have to track the sources back to
the cast/var any time we want to do anything at all with any deref
which seems less efficient to me than just stuffing the deref in
there.
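
To make the size problem concrete, here is a rough C sketch. It is not
NIR code: every example_* name is made up for illustration, and the
2-bit mode field is an arbitrary assumption. It only shows that a full
64-bit address plus a mode tag cannot fit in one 64-bit value unless
bits are taken away from the address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode tags, standing in for nir_variable_mode values. */
enum example_mode {
   EXAMPLE_MODE_GLOBAL = 0,
   EXAMPLE_MODE_SHARED = 1,
   EXAMPLE_MODE_PRIVATE = 2,
};

/* A fat pointer in the sense used in this thread:
 * .x carries the address, .y carries the mode. */
struct example_fatptr {
   uint64_t x;
   uint32_t y;
};

/* Address plus mode is strictly larger than a 64-bit value. */
static_assert(sizeof(struct example_fatptr) > sizeof(uint64_t),
              "fat pointer does not fit in 64 bits as-is");

/* Assumption: steal the top 2 bits of the address for the mode. */
#define EXAMPLE_MODE_BITS 2
#define EXAMPLE_ADDR_BITS (64 - EXAMPLE_MODE_BITS)
#define EXAMPLE_ADDR_MASK ((UINT64_C(1) << EXAMPLE_ADDR_BITS) - 1)

/* Pack into 64 bits. Returns false when the address needs the stolen
 * bits or the mode does not fit in the tag field. */
static bool
example_pack(struct example_fatptr p, uint64_t *out)
{
   if (p.x & ~EXAMPLE_ADDR_MASK)
      return false;
   if (p.y >= (1u << EXAMPLE_MODE_BITS))
      return false;

   *out = ((uint64_t)p.y << EXAMPLE_ADDR_BITS) | p.x;
   return true;
}

/* Recover the address and the mode from a packed value. */
static struct example_fatptr
example_unpack(uint64_t v)
{
   struct example_fatptr p;
   p.x = v & EXAMPLE_ADDR_MASK;
   p.y = (uint32_t)(v >> EXAMPLE_ADDR_BITS);
   return p;
}
```

So packing only works if the address range is restricted, and every
consumer has to mask the tag off before using the value as an address.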

Also, what would the something which ignores fatptr.y be? I'd assume
that would be the normal deref based stuff, but requiring fatptr
contradicts that?

>
> BR,
> -R