[Mesa-dev] [PATCH 2/2] i965/fs: Combine tex/fb_write operations (opt)

Wed Apr 1 12:44:25 PDT 2015

On Wednesday, April 01, 2015 10:39:17 AM Ben Widawsky wrote:
> Certain platforms support the ability to sample from a texture, and write it out
> to the file RT - thus saving a costly send instructions (note that this is a
> potnential win if one wanted to backport to a tag that didn't have the patch
> from Topi which removed excess MOVs from LOAD_PAYLOAD - 97caf5fa04dbd2),
> 
> v2: Modify the algorithm. Instead of iterating in reverse through blocks and
> insts, since the last block/inst is the only thing which can benefit. Rebased
> on top of Ken's patching modifying is_last_send
> 
> v3: Rebased over almost 2 months, and Incorporated feedback from Matt:
> Some comment typo fixes and rewordings.
> Whitespace
> Move the optimization pass outside of the optimize loop
> (Matt is unconvinced of the need for the helper function to be discrete, but I
> am keeping it as it is to reduce the size of the optimization's implementation.
> I thank him for his willingness to concede to my wishes on this).
> 
> The v1 of this patch did have performance data associated with it.
> Unfortunately, my systems are not stable enough at the moment to reproduce the
> performance data (I get too many GPU hangs). There are no regressions on piglit
> however. I will continue to look at getting the performance data more stable.
> 
> Cc: Matt Turner <mattst88 at gmail.com>
> Cc: Kenneth Graunke <kenneth at whitecape.org>
> Cc: Jason Ekstrand <jason at jlekstrand.net>
> Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp           | 104 +++++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/brw_fs.h             |   3 +
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  12 +++
>  3 files changed, 119 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index e540dd1..a8d9a41 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2554,6 +2554,108 @@ fs_visitor::opt_algebraic()
>     return progress;
>  }
>  
> +/* Set up the sampler message to also have EOT set */

/**
 * Set up the sampler message to also have EOT set.
 */

> +void
> +fs_visitor::combine_send_tex(struct bblock_t *block,
> +                             fs_inst *tex_inst,
> +                             fs_inst *fb_write)
> +{
> +   assert((tex_inst->offset & (0xff << 24)) == 0);
> +   assert(!tex_inst->eot); /* We can't get here twice */
> +
> +   fs_inst *old_load_payload = (fs_inst *) tex_inst->prev;
> +
> +   if (old_load_payload->opcode != SHADER_OPCODE_LOAD_PAYLOAD) {

It might be nice to check old_load_payload->is_head_sentinel() || ...
here.  Realistically, the texturing instruction won't be the first in
the shader, but it's nice to be robust.

> +      /* It's unlikely yet possible the load_payload is further above */
> +      return;

It seems like this should "return false" back up through the caller, so
we properly report that we didn't optimize anything.

(Even though you're not in the optimization loop, the return value is still
 used to control whether INTEL_DEBUG=optimizer prints anything.)

Alternatively, you could make the caller check for the existence of the
LOAD_PAYLOAD, to avoid making this function return a bool.  I've
sketched that out below...your call, they're your patches. :)

> +   }
> +
> +   tex_inst->offset |= fb_write->target << 24;
> +   tex_inst->eot = true;
> +   fb_write->remove(block);
> +
> +   /* The texture instruction with EOT requires a header. If one is already
> +    * present, there is nothing to do.
> +    */
> +   if (tex_inst->header_present)
> +      return;
> +
> +   /* If a header is not present, we need to create a new LOAD_PAYLOAD command
> +    * with the same sources and a space saved for the header. Using a new
> +    * destination register not only makes sure we have enough space, but it will
> +    * make sure the dead code eliminator kills the instruction that this will
> +    * replace.
> +    */
> +   tex_inst->mlen++;
> +   tex_inst->header_present = true;
> +
> +   fs_reg send_header = vgrf(old_load_payload->sources + 1);
> +   fs_reg *new_sources =
> +      ralloc_array(mem_ctx, fs_reg, old_load_payload->sources + 1);
> +
> +   new_sources[0] = fs_reg();
> +   for (int i = 0; i < old_load_payload->sources; i++)
> +      new_sources[i+1] = old_load_payload->src[i];
> +
> +   /* The LOAD_PAYLOAD helper seems like the obvious choice here. However, it
> +    * requires a lot of information about the sources to appropriately figure
> +    * out the number of registers needed to be used. Given this stage in our
> +    * optimization, we may not have the appropriate GRFs required by
> +    * LOAD_PAYLOAD at this point (copy propagation). Therefore, we need to
> +    * manually emit the instruction.
> +    */
> +   fs_inst *new_load_payload = new(mem_ctx) fs_inst(SHADER_OPCODE_LOAD_PAYLOAD,
> +                                                    old_load_payload->exec_size,
> +                                                    send_header,
> +                                                    new_sources,
> +                                                    old_load_payload->sources + 1);
> +
> +   new_load_payload->regs_written = old_load_payload->regs_written + 1;
> +
> +   tex_inst->insert_before(block, new_load_payload);
> +   tex_inst->src[0] = send_header;
> +}
> +

Proper doxygen would be:

/**
 * <brief description of what the function does>
 *
 * <long description>
 */

> +/* CHV, and gen9+ can mark a texturing SEND instruction with EOT to
> + * have its results sent directly to the framebuffer, bypassing the EU.
> + *
> + * Recognize the final texturing results copied to the framebuffer write payload
> + * and modify them to write to the framebuffer directly.
> + */
> +bool
> +fs_visitor::opt_sampler_eot()
> +{
> +   brw_wm_prog_key *key = (brw_wm_prog_key*) this->key;
> +
> +   if (brw->gen < 9 && !brw->is_cherryview)
> +      return false;
> +
> +   /* FINISME: It should be possible to implement this optimization when there
> +    * are multiple drawbuffers.
> +    */
> +   if (key->nr_color_regions != 1)
> +      return false;
> +

How about:

   /* Look for a texturing instruction immediately before the final
    * FB_WRITE.
    */
   fs_inst *fb_write = (fs_inst *) cfg->blocks[cfg->num_blocks - 1]->end();
   fs_inst *tex_inst = (fs_inst *) fb_write->prev;

   /* There wasn't one; nothing to do. */
   if (tex_inst->is_head_sentinel() || !tex_inst->is_tex())
      return false;

   /* If there's no header present, we need to munge the LOAD_PAYLOAD as
    * well.  It's very likely to be the previous instruction.
    */
   fs_inst *load_payload = (fs_inst *) tex_inst->prev;
   if (!tex_inst->header_present && (load_payload->is_head_sentinel() ||
       load_payload->opcode != SHADER_OPCODE_LOAD_PAYLOAD))
      return false;

> +   fs_inst *fb_write = (fs_inst *) cfg->blocks[cfg->num_blocks - 1]->end();
> +   fs_inst *tex_inst = (fs_inst *) fb_write->prev;
> +
> +   if (tex_inst->is_head_sentinel())
> +      return false;
> +
> +   /* It will potentially be multiple iterations of optimizations before we get
> +    * the condition we wish to optimize for.
> +    */

The thing is...the condition we wish to optimize for may not ever
happen, even after multiple iterations of optimizations.  

> +   if (!tex_inst->is_tex())
> +      return false;
> +
> +   assert(fb_write->eot);
> +   assert(fb_write->opcode == FS_OPCODE_FB_WRITE);
> +
> +   combine_send_tex(cfg->blocks[cfg->num_blocks - 1], tex_inst, fb_write);
> +
> +   return true;
> +}

This looks good to me.  While I like my suggestions, they're mostly
cosmetic, so:
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

You should get a review from Matt before pushing, though.

> +
>  bool
>  fs_visitor::opt_register_renaming()
>  {
> @@ -3760,6 +3862,8 @@ fs_visitor::optimize()
>  
>     pass_num = 0;
>  
> +   OPT(opt_sampler_eot);
> +
>     if (OPT(lower_load_payload)) {
>        split_virtual_grfs();
>        OPT(register_coalesce);
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> index 278a8ee..fcb4b63 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -231,6 +231,9 @@ public:
>     bool compute_to_mrf();
>     bool dead_code_eliminate();
>     bool remove_duplicate_mrf_writes();
> +
> +   void combine_send_tex(bblock_t *block, fs_inst *tex_inst, fs_inst *fb_write);
> +   bool opt_sampler_eot();
>     bool virtual_grf_interferes(int a, int b);
>     void schedule_instructions(instruction_scheduler_mode mode);
>     void insert_gen4_send_dependency_workarounds();
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index bd12147..3adcf84 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -517,6 +517,7 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>     int rlen = 4;
>     uint32_t simd_mode;
>     uint32_t return_format;
> +   bool is_combined_send = inst->eot;
>  
>     switch (dst.type) {
>     case BRW_REGISTER_TYPE_D:
> @@ -676,6 +677,11 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>        dst = vec16(dst);
>     }
>  
> +   if (is_combined_send) {
> +      assert(brw->gen >= 9 || brw->is_cherryview);
> +      rlen = 0;
> +   }
> +
>     assert(brw->gen < 7 || !inst->header_present ||
>            src.file == BRW_GENERAL_REGISTER_FILE);
>  
> @@ -781,6 +787,12 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>         * so has already done marking.
>         */
>     }
> +
> +   if (is_combined_send) {
> +      brw_inst_set_eot(brw, brw_last_inst, true);
> +      brw_set_dest(p, brw_last_inst, brw_null_reg());
> +      brw_inst_set_opcode(brw, brw_last_inst, BRW_OPCODE_SENDC);
> +   }
>  }
>  
>  
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150401/3247ac7b/attachment-0001.sig>