[Mesa-dev] [PATCH 4/4] RFC: nir: add lowering for idiv/udiv/umod

Wed Apr 1 06:50:40 PDT 2015

On Wed, Apr 1, 2015 at 7:09 AM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 01.04.2015 um 03:44 schrieb Rob Clark:
>> On Tue, Mar 31, 2015 at 9:03 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> Am 01.04.2015 um 00:57 schrieb Rob Clark:
>>>> From: Rob Clark <robclark at freedesktop.org>
>>>>
>>>> Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD().
>>>> See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an
>>>> adaptation of the nv50 code from Ilia).
>>>>
>>>> Just sending as an rfc right now, since I'm not quite at the point to be
>>>> able to test it on actual hw.
>>>>
>>>> Signed-off-by: Rob Clark <robclark at freedesktop.org>
>>>> ---
>>>>  src/glsl/Makefile.sources     |   1 +
>>>>  src/glsl/nir/nir.h            |   1 +
>>>>  src/glsl/nir/nir_lower_idiv.c | 212 ++++++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 214 insertions(+)
>>>>  create mode 100644 src/glsl/nir/nir_lower_idiv.c
>>>>
>>>> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
>>>> index 18fff38..e426970 100644
>>>> --- a/src/glsl/Makefile.sources
>>>> +++ b/src/glsl/Makefile.sources
>>>> @@ -32,6 +32,7 @@ NIR_FILES = \
>>>>       nir/nir_lower_atomics.c \
>>>>       nir/nir_lower_global_vars_to_local.c \
>>>>       nir/nir_lower_locals_to_regs.c \
>>>> +     nir/nir_lower_idiv.c \
>>>>       nir/nir_lower_io.c \
>>>>       nir/nir_lower_phis_to_scalar.c \
>>>>       nir/nir_lower_samplers.cpp \
>>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>>> index cd03d6b..e002d6f 100644
>>>> --- a/src/glsl/nir/nir.h
>>>> +++ b/src/glsl/nir/nir.h
>>>> @@ -1605,6 +1605,7 @@ void nir_lower_samplers(nir_shader *shader,
>>>>
>>>>  void nir_lower_system_values(nir_shader *shader);
>>>>  void nir_lower_tex_projector(nir_shader *shader);
>>>> +void nir_lower_idiv(nir_shader *shader);
>>>>
>>>>  void nir_lower_atomics(nir_shader *shader);
>>>>  void nir_lower_to_source_mods(nir_shader *shader);
>>>> diff --git a/src/glsl/nir/nir_lower_idiv.c b/src/glsl/nir/nir_lower_idiv.c
>>>> new file mode 100644
>>>> index 0000000..e95c57e
>>>> --- /dev/null
>>>> +++ b/src/glsl/nir/nir_lower_idiv.c
>>>> @@ -0,0 +1,212 @@
>>>> +/*
>>>> + * Copyright © 2015 Red Hat
>>>> + *
>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>>> + * copy of this software and associated documentation files (the "Software"),
>>>> + * to deal in the Software without restriction, including without limitation
>>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>>> + * Software is furnished to do so, subject to the following conditions:
>>>> + *
>>>> + * The above copyright notice and this permission notice (including the next
>>>> + * paragraph) shall be included in all copies or substantial portions of the
>>>> + * Software.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>>>> + * IN THE SOFTWARE.
>>>> + *
>>>> + * Authors:
>>>> + *    Rob Clark <robclark at freedesktop.org>
>>>> + */
>>>> +
>>>> +#include "nir.h"
>>>> +#include "nir_builder.h"
>>>> +
>>>> +/* Lowers idiv/udiv/umod
>>>> + * Based on NV50LegalizeSSA::handleDIV()
>>>> + *
>>>> + * Note that this is probably not enough precision for compute shaders.
>>>> + * Perhaps we want a second higher precision (looping) version of this?
>>>> + * Or perhaps we assume if you can do compute shaders you can also
>>>> + * branch out to a pre-optimized shader library routine..
>>>
>>> So if this is not enough precision, maybe should state how large the
>>> error can be?
>>>
>>
>> tbh, if I knew what the error for this approach was, I would have
>> included it.  I'm not the original author, but this is based on
>> nouveau codegen code (as mentioned in the comment).  I guess it is
>> better than converting to float and dividing and converting back, but
>> worse than an iterative (ie. looping, ie. divergent flow control)
>> approach.  It is apparently enough to keep piglit happy.
>>
>> The original algo in nv50 lowering code is from
>> 322bc7ed68ed92233c97168c036d0aa50c11a20e (ie. 'nv50/ir: import nv50
>> target') which doesn't really give more clue about the origin..
>>
>> if anyone knows, I'm all ears and will add relevant links/info to comment..
>
> Ah ok. Well it isn't even obvious to me if the results are not actually
> always exact.

Should be easy enough to take the algo, express it in terms of e.g.
numpy (or even, *gasp*, a C program), and then do a randomized search
over the 32bit x 32bit input space to see if there are any errors, and
what they are. (Since the full input space would take too long...)

Looks like I did just that when debugging the freedreno impl...
available at http://hastebin.com/ewimuvobin.py

  -ilia