[Mesa-dev] 16-bit comparisons in NIR

Fri Apr 20 08:21:40 UTC 2018

Hi,

while developing support for Vulkan shaderInt16 on Anvil I came across
a feature of NIR that was a bit inconvenient: bools are always 32-bit
by design, but the Intel hardware produces 16-bit bool results for
16-bit comparisons, which creates a problem that manifests like this:

vec1 32 ssa_21 = fge ssa_20, ssa_16
vec1 16 ssa_22 = b2f ssa_21

Our CMP instruction will produce a 16-bit boolean result for the first
NIR instruction (where NIR expects it to be 32-bit), so by the time we
emit the second instruction in the driver, the 32-bit size that NIR
reports for the b2f operand no longer matches the 16-bit value the
hardware actually produced, and we emit incorrect code.

This seems to have been a conscious design choice in NIR, and while
discussing this with Jason he was unsure how much we wanted to change
this or how to do it, given how thoroughly 32-bit bools are baked into
NIR and the complexities that modifying this would also bring to our
bit-size validation code.

I have been considering alternatives that didn't involve changing NIR
to support multiple bit-sizes for booleans:

1) Drivers that need to emit smaller booleans could try to fix the
generated NIR by correcting the expected bit-sizes for CMP
instructions. This would be rather trivial to implement in drivers (and
maybe we could even make a generic pass for other drivers to use if
they need it) but this will make the validator complain because it
won't recognize comparisons with 16-bit bool outputs as valid NIR
opcodes. I also found instances where nir_search would complain about
mismatching bit-sizes. I haven't looked any further into it yet though,
so maybe we can reasonably work around these issues.

2) Drivers could handle this specially when they emit code from NIR.
Specifically, when they see a 32-bit boolean source in an instruction,
they would have to search for the instruction that produced that source
value and check whether it is a 16-bit or a 32-bit boolean to emit
proper code for the instruction.

3) Drivers can just convert the 16-bit bool result they generate for
16-bit cmp to the 32-bit bool that NIR expects, and then possibly run
an optimization pass to eliminate these extra conversions and fix up
the code accordingly.
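
To make options 2 and 3 a bit more concrete, here is a minimal sketch
in plain C. It uses a toy IR rather than the real NIR data structures,
and every name in it (toy_instr, toy_hw_bool_bits, toy_widen_bool16) is
made up for illustration. It also assumes that a comparison produces a
boolean as wide as its operands and that true is encoded as all ones at
that width, so treat it as a rough illustration and not as driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy IR, NOT the real NIR structures: all names here are hypothetical. */
enum toy_op { TOY_OP_LOAD, TOY_OP_FGE, TOY_OP_B2F };

struct toy_instr {
    enum toy_op op;
    unsigned dest_bits;              /* bit size the IR declares for the result */
    const struct toy_instr *src[2];  /* producing instructions, NULL if unused */
};

/* Option 2: inspect the producer of a boolean source and report the width
 * the hardware really wrote. Assumption: a comparison yields a boolean as
 * wide as its operands; any other producer matches its declared size. */
unsigned toy_hw_bool_bits(const struct toy_instr *producer)
{
    assert(producer != NULL);
    if (producer->op == TOY_OP_FGE && producer->src[0] != NULL)
        return producer->src[0]->dest_bits;
    return producer->dest_bits;
}

/* Option 3: widen a 16-bit boolean to 32 bits by replicating its top bit.
 * Assumption: true is all ones (0xFFFF) and false is zero. */
uint32_t toy_widen_bool16(uint16_t b16)
{
    uint32_t high = (b16 & 0x8000u) ? 0xFFFF0000u : 0u;
    return high | (uint32_t)b16;
}
```

For the fge/b2f pair shown earlier, the lookup would report 16 bits for
the source of b2f even though the IR declares it as 32-bit, and the
widening helper turns 0xFFFF into 0xFFFFFFFF. The trade-off is the same
as in the text: option 2 pays with a producer lookup at every boolean
use, while option 3 pays with an extra conversion per 16-bit comparison
that a later pass would have to clean up.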

Does anyone else have any better ideas?

Iago