[Mesa-dev] [PATCH 03/23] i965/fs: Use MOV.nz instead of AND.nz to generate flag on GEN6+

Thu Apr 9 11:12:58 PDT 2015

On 04/06/2015 11:35 AM, Matt Turner wrote:
> On Fri, Mar 20, 2015 at 1:58 PM, Ian Romanick <idr at freedesktop.org> wrote:
>> From: Ian Romanick <ian.d.romanick at intel.com>
>>
>> On SNB+, the Boolean result is always 0 or ~0, so MOV.nz produces the
>> same effect as AND.nz.  However, later cmod propagation passes can
>> handle the MOV.nz, but they cannot handle the AND.nz because the source
>> is not generated by a CMP.
>>
>> It's worth noting that this commit was a lot more effective before
>> commit bb22aa0 (i965/fs: Ignore type in cmod prop if scan_inst is CMP.).
>> Without that commit, this commit improved ~2,500 shaders on each
>> affected platform, including Sandy Bridge.
>>
>> Ivy Bridge (0x0166):
>> total instructions in shared programs: 6291794 -> 6291668 (-0.00%)
>> instructions in affected programs:     41207 -> 41081 (-0.31%)
>> helped:                                154
>> HURT:                                  28
>>
>> Haswell (0x0426):
>> total instructions in shared programs: 5779180 -> 5779054 (-0.00%)
>> instructions in affected programs:     37210 -> 37084 (-0.34%)
>> helped:                                154
>> HURT:                                  28
>>
>> Broadwell (0x162E):
>> total instructions in shared programs: 6823014 -> 6822848 (-0.00%)
>> instructions in affected programs:     40195 -> 40029 (-0.41%)
>> helped:                                164
>> HURT:                                  28
>>
>> No change on GM45, Iron Lake, Sandy Bridge, Ivy Bridge with NIR, or
>> Haswell with NIR.
>>
>> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
>> ---
> 
> I looked at some helped shaders. They seem to be doing this:
> 
> const vec4 ps_c0 = vec4(1.0, -1.0, 0.0, -0.0);
> ...
>         t0_ps.x = (gl_FrontFacing ? ps_c0.x : ps_c0.y);
>         t0_ps.y = (gl_FrontFacing ? ps_c0.w : ps_c0.y);
>         t0_ps.x = ((-t0_ps.x >= 0.0) ? ps_c0.z : ps_c0.x);
> 
> so before this patch we hit the
> fs_visitor::try_opt_frontfacing_ternary path for t0_ps.x and not for
> t0_ps.y, generating:
> 
> asr(8)          g26<1>D         -g0<0,1,0>W     15D
> or(8)           g36.1<2>W       g0<0,1,0>W      0x3f80UW
> mov(1)          g25<1>F         [0F, 0F, 0F, 0F]VF
> and.nz.f0(8)    null            g26<8,8,1>D     1D    <--- this gets removed with this patch
> and(8)          g35<1>D         g36<8,8,1>D     0xbf800000UD
> mov(8)          g38<1>F         -g25<0,1,0>F
> mov(8)          g40<1>F         g25<0,1,0>F
> (+f0) sel(8)    g37<1>F         g38<8,8,1>F     -1F
> cmp.ge.f0(8)    null            -g35<8,8,1>F    g25<0,1,0>F
> (+f0) sel(8)    g39<1>F         g40<8,8,1>F     1F
> 
> After this patch we generate
> asr.nz.f0(8)    null            -g0<0,1,0>W     15D
> or(8)           g35.1<2>W       g0<0,1,0>W      0x3f80UW
> mov(1)          g25<1>F         [0F, 0F, 0F, 0F]VF
> and(8)          g34<1>D         g35<8,8,1>D     0xbf800000UD
> mov(8)          g37<1>F         -g25<0,1,0>F
> mov(8)          g39<1>F         g25<0,1,0>F
> (+f0) sel(8)    g36<1>F         g37<8,8,1>F     -1F
> cmp.ge.f0(8)    null            -g34<8,8,1>F    g25<0,1,0>F
> (+f0) sel(8)    g38<1>F         g39<8,8,1>F     1F
> 
> 10 instructions to 9. That's an annoying amount of assembly to digest,
> but basically we're just benefiting from the order of the uses of
> the flag. If we could simply rearrange the flag writes and reads, we
> would generate better code, and...

Removing this patch at the end of the series affects quite a few shaders
in all GEN6+ non-NIR runs:

total instructions in shared programs: 7268653 -> 7270887 (0.03%)
instructions in affected programs:     408532 -> 410766 (0.55%)
helped:                                31
HURT:                                  2265

My current guess is that "glsl: Optimize certain if-statements to
ir_triop_csel" creates a boatload more opportunities for this accidental
optimization.

> If we could recognize that there are multiple gl_FrontFacing ? ... :
> ... expressions, we probably would have just emitted asr.nz.f0 and a
> couple of SELs.

The weird thing is that the two uses of gl_FrontFacing don't both get
the asr treatment... It's also weird that this code doesn't change after
"i965/fs: Optimize gl_FrontFacing used alone as a condition".

> So I don't really think this patch is helping anything except by accident. :)