[Mesa-dev] [PATCH 03/23] i965/fs: Use MOV.nz instead of AND.nz to generate flag on GEN6+
Ian Romanick
idr at freedesktop.org
Mon Apr 6 11:51:15 PDT 2015
On 04/06/2015 11:35 AM, Matt Turner wrote:
> On Fri, Mar 20, 2015 at 1:58 PM, Ian Romanick <idr at freedesktop.org> wrote:
>> From: Ian Romanick <ian.d.romanick at intel.com>
>>
>> On SNB+, the Boolean result is always 0 or ~0, so MOV.nz produces the
>> same effect as AND.nz. However, later cmod propagation passes can
>> handle the MOV.nz, but they cannot handle the AND.nz because the source
>> is not generated by a CMP.
>>
>> It's worth noting that this commit was a lot more effective before
>> commit bb22aa0 (i965/fs: Ignore type in cmod prop if scan_inst is CMP.).
>> Without that commit, this commit improved ~2,500 shaders on each
>> affected platform, including Sandy Bridge.
>>
>> Ivy Bridge (0x0166):
>> total instructions in shared programs: 6291794 -> 6291668 (-0.00%)
>> instructions in affected programs: 41207 -> 41081 (-0.31%)
>> helped: 154
>> HURT: 28
>>
>> Haswell (0x0426):
>> total instructions in shared programs: 5779180 -> 5779054 (-0.00%)
>> instructions in affected programs: 37210 -> 37084 (-0.34%)
>> helped: 154
>> HURT: 28
>>
>> Broadwell (0x162E):
>> total instructions in shared programs: 6823014 -> 6822848 (-0.00%)
>> instructions in affected programs: 40195 -> 40029 (-0.41%)
>> helped: 164
>> HURT: 28
>>
>> No change on GM45, Iron Lake, Sandy Bridge, Ivy Bridge with NIR, or
>> Haswell with NIR.
>>
>> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
>> ---
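
For anyone skimming the thread, the equivalence the commit message relies on can be sketched in a few lines of Python. This is only an illustration and not Mesa code: the helper names and_nz and mov_nz and the 32-bit mask are made up here, and the AND mask of 1 is taken from the and.nz in the assembly quoted below.

```python
MASK32 = 0xFFFFFFFF  # assumption: model a 32-bit D-type register value


def and_nz(value, mask=1):
    """Flag result of a hypothetical AND.nz: set when (value & mask) is nonzero."""
    return ((value & mask) & MASK32) != 0


def mov_nz(value):
    """Flag result of a hypothetical MOV.nz: set when the moved value is nonzero."""
    return (value & MASK32) != 0


# Booleans as described for SNB+: all bits clear or all bits set.
for boolean in (0x00000000, 0xFFFFFFFF):
    assert and_nz(boolean) == mov_nz(boolean)

# For an arbitrary integer the two flags can differ: 2 is nonzero but has bit 0 clear.
assert and_nz(2) != mov_nz(2)
```

The second assert is why the substitution is only safe when the source is known to be 0 or ~0.
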
>
> I looked at some helped shaders. They seem to be doing this:
>
> const vec4 ps_c0 = vec4(1.0, -1.0, 0.0, -0.0);
> ...
> t0_ps.x = (gl_FrontFacing ? ps_c0.x : ps_c0.y);
> t0_ps.y = (gl_FrontFacing ? ps_c0.w : ps_c0.y);
> t0_ps.x = ((-t0_ps.x >= 0.0) ? ps_c0.z : ps_c0.x);
>
> so before this patch we hit the
> fs_visitor::try_opt_frontfacing_ternary path for t0_ps.x and not for
> t0_ps.y, generating:
>
> asr(8) g26<1>D -g0<0,1,0>W 15D
> or(8) g36.1<2>W g0<0,1,0>W 0x3f80UW
> mov(1) g25<1>F [0F, 0F, 0F, 0F]VF
> and.nz.f0(8) null g26<8,8,1>D 1D <--- this gets removed with this patch
> and(8) g35<1>D g36<8,8,1>D 0xbf800000UD
> mov(8) g38<1>F -g25<0,1,0>F
> mov(8) g40<1>F g25<0,1,0>F
> (+f0) sel(8) g37<1>F g38<8,8,1>F -1F
> cmp.ge.f0(8) null -g35<8,8,1>F g25<0,1,0>F
> (+f0) sel(8) g39<1>F g40<8,8,1>F 1F
>
> After this patch we generate
> asr.nz.f0(8) null -g0<0,1,0>W 15D
> or(8) g35.1<2>W g0<0,1,0>W 0x3f80UW
> mov(1) g25<1>F [0F, 0F, 0F, 0F]VF
> and(8) g34<1>D g35<8,8,1>D 0xbf800000UD
> mov(8) g37<1>F -g25<0,1,0>F
> mov(8) g39<1>F g25<0,1,0>F
> (+f0) sel(8) g36<1>F g37<8,8,1>F -1F
> cmp.ge.f0(8) null -g34<8,8,1>F g25<0,1,0>F
> (+f0) sel(8) g38<1>F g39<8,8,1>F 1F
>
> 10 instructions to 9. That's an annoying amount of assembly to digest,
> but basically we're just benefiting because of the order of the uses of
> the flag. If we could simply rearrange the flag writes and reads, we
> would generate better code, and...
>
> If we could recognize that there are multiple gl_FrontFacing ? ... :
> ... expressions, we probably would have just emitted asr.nz.f0 and a
> couple of SELs.

Right... I wonder what happens to these shaders after patch 14. The
t0_ps.y calculation will change to 't0_ps.y = -float(gl_FrontFacing)'
after patch 10. The final t0_ps.x calculation will get changed to
't0_ps.x = float(t0_ps.x == 0)' after patch 12. With patches 13 and 14,
tree grafting will enable some other changes.

> So I don't really think this patch is helping anything except by accident. :)

That is definitely possible. Before some of the changes to the cmod
propagation pass, this patch helped a couple thousand shaders. I'll
test the series with this patch reverted and see if there are still any
benefits.
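
When comparing the two runs, the percentages in the quoted summary are easy to recompute from the raw instruction counts. A minimal sketch, assuming only the affected-program numbers from the commit message above; the percent_change helper is hypothetical and is not part of any existing tool:

```python
def percent_change(before, after):
    """Relative change in percent; negative means fewer instructions."""
    return (after - before) / before * 100.0


# (before, after) instruction counts in affected programs,
# copied from the quoted commit message.
affected = {
    "Ivy Bridge": (41207, 41081),
    "Haswell": (37210, 37084),
    "Broadwell": (40195, 40029),
}

for platform, (before, after) in affected.items():
    print(f"{platform}: {after - before:+d} instructions "
          f"({percent_change(before, after):.2f}%)")
```

Rounded to two places this gives -0.31%, -0.34% and -0.41%, matching the figures in the commit message.
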