[Mesa-dev] [PATCH 4/9] nir: Move the compare-with-zero optimizations to the late section

Jason Ekstrand jason at jlekstrand.net
Wed Apr 1 11:51:02 PDT 2015


On Tue, Mar 31, 2015 at 11:04 AM, Matt Turner <mattst88 at gmail.com> wrote:
> On Mon, Mar 23, 2015 at 8:43 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Mon, Mar 23, 2015 at 8:34 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>> On Mon, Mar 23, 2015 at 8:13 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>> total instructions in shared programs: 4422307 -> 4422363 (0.00%)
>>>> instructions in affected programs:     4230 -> 4286 (1.32%)
>>>> helped:                                0
>>>> HURT:                                  12
>>>>
>>>> While this does hurt some things, the losses are minor and it prevents the
>>>> compare-with-zero optimization from fighting with ffma which is much more
>>>> important.
>>>
>>> Is it actually "fighting" (i.e., undoing the other pass' work) or just
>>> preventing some ffmas from being generated?
>>>
>>> If we did have something that would be recognized by both these and
>>> the ffma pattern, it'd look like
>>>
>>> fge(fadd(a, fmul(b, c)), 0.0)
>>>
>>> which we could turn into
>>>
>>> fge(ffma(b, c, a), 0.0) if ffma runs first; or
>>> fge(a, fneg(fmul(b, c))) otherwise
>>>
>>> I guess the first one is better for i965, since we can do that in one
>>> instruction. In fact, maybe we don't want to do these optimizations at
>>> all? I'm kind of surprised that it hurts.
>>
>> Right.  In one sense it doesn't help anything because we can do a
>> compare with zero for free in i965.  However, losing it does hurt
>> quite a bit in the case where the optimization allows us to remove the
>> add instruction.  The problem is when the add is part of a potential
>> ffma, in which case pulling things into the comparison keeps the more
>> optimized ffma peephole from actually converting to an ffma.  In this
>> case we keep both the add and the multiply even though we could have
>> done it with an ffma and a compare with zero.
>
> So to confirm, in the case of
>
>> (('flt', ('fadd', a, b), 0.0), ('flt', a, ('fneg', b))),
>
> you want to keep the a+b around so that if a or b is a multiplication,
> the ffma peephole can recognize it?

Exactly.
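
For illustration only, here is a toy sketch of the ordering issue.  It is
not Mesa code: expressions are plain nested tuples, and rewrite(),
cmp_zero() and fuse_ffma() are hypothetical stand-ins for the real
passes.  It assumes ffma(x, y, z) means x * y + z and only looks for the
multiply in the second operand of the add.

```python
# Toy model only: expressions are nested tuples such as ('fadd', a, b).
# None of these helpers exist in Mesa; they are made up for this sketch.

def rewrite(expr, rule):
    """Apply `rule` once to every node of `expr`, children first."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite(e, rule) for e in expr[1:])
    return rule(expr)

def cmp_zero(expr):
    """flt(fadd(a, b), 0.0) -> flt(a, fneg(b)), the pattern quoted above."""
    if (isinstance(expr, tuple) and expr[0] == 'flt' and expr[2] == 0.0
            and isinstance(expr[1], tuple) and expr[1][0] == 'fadd'):
        _, a, b = expr[1]
        return ('flt', a, ('fneg', b))
    return expr

def fuse_ffma(expr):
    """fadd(a, fmul(b, c)) -> ffma(b, c, a), assuming ffma(x, y, z) = x*y + z."""
    if (isinstance(expr, tuple) and expr[0] == 'fadd'
            and isinstance(expr[2], tuple) and expr[2][0] == 'fmul'):
        _, a, (_, b, c) = expr
        return ('ffma', b, c, a)
    return expr

shader = ('flt', ('fadd', 'a', ('fmul', 'b', 'c')), 0.0)

# Compare-with-zero first: the fadd is consumed, so ffma has nothing to match.
early = rewrite(rewrite(shader, cmp_zero), fuse_ffma)

# ffma first: the add and multiply fuse, and the compare with zero stays.
late = rewrite(rewrite(shader, fuse_ffma), cmp_zero)

print(early)
print(late)
```

With the compare-with-zero rule applied first, the fadd is gone by the
time the ffma rule runs; with the rule moved late, the ffma gets formed
and the comparison against zero is left alone.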

> If that's the case,
>
> Reviewed-by: Matt Turner <mattst88 at gmail.com>

Thanks.

