[Mesa-dev] 16-bit comparisons in NIR

Sat Apr 21 08:28:43 UTC 2018

On 21.04.2018 02:32, Bas Nieuwenhuizen wrote:
> On Fri, Apr 20, 2018 at 5:16 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Fri, Apr 20, 2018 at 5:16 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>>
>>> On 20.04.2018 10:21, Iago Toral wrote:
>>>>
>>>> Hi,
>>>>
>>>> while developing support for Vulkan shaderInt16 on Anvil I came across
>>>> a feature of NIR that was a bit inconvenient: bools are always 32-bit
>>>> by design, but the Intel hardware produces 16-bit bool results for
>>>> 16-bit comparisons, so that creates a problem that manifests like this:
>>>>
>>>> vec1 32 ssa_21 = fge ssa_20, ssa_16
>>>> vec1 16 ssa_22 = b2f ssa_21
>>
>>
>> I was thinking about this a bit this morning and it gets even more sticky.
>> What happens if you have
>>
>> bool e = (a < b) && (c < d);
>>
>> where a and b are 16-bit and c and d are 32-bit?  In this case, one
>> comparison has a 32-bit value and one has a 16-bit value and you have to pick
>> one for the &&.
>>
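The mixed-width case can be sketched in plain C. This is only an illustration, not NIR or driver code: it assumes a comparison yields all-ones for true and zero for false at the width of its operands, and the helper names (cmp_lt_16, cmp_lt_32, and_naive, and_widened) are made up for the example.

```c
#include <stdint.h>

/* Assumption: a 16-bit compare yields 0xFFFF (true) or 0x0000 (false). */
static uint16_t cmp_lt_16(int16_t a, int16_t b)
{
   return (a < b) ? UINT16_MAX : 0;
}

/* Assumption: a 32-bit compare yields 0xFFFFFFFF (true) or 0 (false). */
static uint32_t cmp_lt_32(int32_t c, int32_t d)
{
   return (c < d) ? UINT32_MAX : 0;
}

/* Naive AND: the 16-bit result is zero-extended, so "true && true"
 * comes out as 0x0000FFFF, which is not a canonical 32-bit true. */
static uint32_t and_naive(int16_t a, int16_t b, int32_t c, int32_t d)
{
   return (uint32_t)cmp_lt_16(a, b) & cmp_lt_32(c, d);
}

/* Picking 32 bits for the &&: widen the 16-bit bool to a canonical
 * 32-bit bool first, then AND the two values. */
static uint32_t and_widened(int16_t a, int16_t b, int32_t c, int32_t d)
{
   uint32_t lhs = cmp_lt_16(a, b) ? UINT32_MAX : 0;
   return lhs & cmp_lt_32(c, d);
}
```

With a = 1, b = 2, c = 3, d = 4 the naive version returns 0x0000FFFF while the widened version returns 0xFFFFFFFF, which is the mismatch described above.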
>>>>
>>>> Our CMP instruction will produce a 16-bit boolean result for the first
>>>> NIR instruction (where NIR expects it to be 32-bit), so by the time we
>>>> emit the second instruction in the driver the bit-size for the operand
>>>> of b2f provided by NIR no longer matches the reality and we emit
>>>> incorrect code.
>>>>
>>>> This seems to have been a conscious design choice in NIR, and while
>>>> discussing this with Jason he was unsure how much we wanted to change
>>>> this or how to do it, given how thoroughly 32-bit bools are baked into
>>>> NIR and the complexities that modifying this would also bring to our
>>>> bit-size validation code.
>>>>
>>>> I have been considering alternatives that didn't involve changing NIR
>>>> to support multiple bit-sizes for booleans:
>>>>
>>>> 1) Drivers that need to emit smaller booleans could try to fix the
>>>> generated NIR by correcting the expected bit-sizes for CMP
>>>> instructions. This would be rather trivial to implement in drivers (and
>>>> maybe we could even make a generic pass for other drivers to use if
>>>> they need it) but this will make the validator complain because it
>>>> won't recognize comparisons with 16-bit bool outputs as valid NIR
>>>> opcodes. I also found instances where nir_search would complain about
>>>> mismatching bit-sizes. I haven't looked any further into it yet though,
>>>> so maybe we can reasonably work around these issues.
>>>>
>>>> 2) Drivers could handle this specially when they emit code from NIR.
>>>> Specifically, when they see a 32-bit boolean source in an instruction,
>>>> they would have to search for the instruction that produced that source
>>>> value and check whether it is a 16-bit or a 32-bit boolean to emit
>>>> proper code for the instruction.
>>>>
>>>> 3) Drivers can just convert the 16-bit bool result they generate for
>>>> 16-bit cmp to the 32-bit bool that NIR expects, and then possibly run
>>>> an optimization pass to eliminate these extra conversions and fix up
>>>> the code accordingly.
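Option 3 can be modelled roughly as follows. This is a hedged sketch in plain C rather than Anvil or NIR code: integers stand in for the 16-bit operands, the all-ones/zero boolean encoding is an assumption, and every function name here is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the hardware compare writes a bool at the operand width,
 * i.e. 0xFFFF or 0x0000 for a 16-bit comparison. */
static uint16_t hw_cmp_ge_16(int16_t a, int16_t b)
{
   return (a >= b) ? UINT16_MAX : 0;
}

/* The extra conversion option 3 inserts: turn the 16-bit bool into the
 * 32-bit bool the rest of the IR expects (same effect as sign-extension). */
static uint32_t bool16_to_bool32(uint16_t b16)
{
   assert(b16 == 0 || b16 == UINT16_MAX);
   return b16 ? UINT32_MAX : 0;
}

/* b2f consuming the 32-bit bool, as the IR describes it. */
static float b2f_from_bool32(uint32_t b32)
{
   return b32 ? 1.0f : 0.0f;
}

/* What a later clean-up pass could emit instead: b2f reading the 16-bit
 * bool directly, which makes the widening above unnecessary. */
static float b2f_from_bool16(uint16_t b16)
{
   return b16 ? 1.0f : 0.0f;
}
```

Both paths give the same float for any valid 16-bit bool, which is what would let an optimization pass drop the conversion when every user of the value can read the narrow form.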
>>>
>>>
>>> radeonsi(NIR) and radv already use option 3, since GCN hardware really
>>> wants to treat bools as 1-bit values, so that's what I'd suggest. The
>>> optimizations that clean up the conversions happen in LLVM for us.
>>
>>
>> Is this a GCN thing or an LLVM thing?  It would be neat if your hardware had
>> 1-bit registers. :-)  We sort-of do but they're special flag registers and
>> we have very few of them.
> 
> LLVM. For GCN HW we use a 64-bit register that is shared between
> lanes (i.e. having 1 bit for each lane).

Which means, if you think about it, that using i1 for bool _is_ a GCN 
thing in the end ;)

But admittedly it's semantics.
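
To make the "1 bit for each lane" point concrete, here is a small sketch in plain C. It is not LLVM or radeonsi code; the 64-lane wave size matches the 64-bit register mentioned above, and the names wave_cmp_ge and lane_bool are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAVE_SIZE 64

/* Compare a[i] >= b[i] for each lane and pack the results into one
 * 64-bit mask: bit i holds the boolean for lane i. */
static uint64_t wave_cmp_ge(const float *a, const float *b, size_t lanes)
{
   assert(lanes <= WAVE_SIZE);
   uint64_t mask = 0;
   for (size_t i = 0; i < lanes; i++) {
      if (a[i] >= b[i])
         mask |= UINT64_C(1) << i;
   }
   return mask;
}

/* From a single lane's point of view the bool is one bit (an i1). */
static int lane_bool(uint64_t mask, unsigned lane)
{
   assert(lane < WAVE_SIZE);
   return (int)((mask >> lane) & 1);
}
```

Each lane only ever sees its own bit of the shared register, which is why a 1-bit bool type is a natural fit for this layout.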

Cheers,
Nicolai

>>
>> --Jason

-- 
Learn how the world really is,
but never forget how it should be.