[Mesa-dev] [PATCH 6/7] i965/cs: Implement brw_emit_gpgpu_walker
Jordan Justen
jordan.l.justen at intel.com
Wed Apr 29 17:00:27 PDT 2015
On 2015-04-27 19:02:38, Kenneth Graunke wrote:
> On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote:
> > Tested on Ivybridge, Haswell and Broadwell.
> >
> > Signed-off-by: Jordan Justen <jordan.l.justen at intel.com>
> > ---
> > src/mesa/drivers/dri/i965/brw_compute.c | 39 ++++++++++++++++++++++++++++++++-
> > src/mesa/drivers/dri/i965/brw_defines.h | 1 +
> > 2 files changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c
> > index baed701..06ef448 100644
> > --- a/src/mesa/drivers/dri/i965/brw_compute.c
> > +++ b/src/mesa/drivers/dri/i965/brw_compute.c
> > @@ -31,12 +31,49 @@
> > #include "brw_draw.h"
> > #include "brw_state.h"
> > #include "intel_batchbuffer.h"
> > +#include "brw_defines.h"
> >
> >
> > static void
> > brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups)
> > {
> > - _mesa_problem(&brw->ctx, "TODO: implement brw_emit_gpgpu_walker");
> > + const struct brw_cs_prog_data *prog_data = brw->cs.prog_data;
> > +
> > + const unsigned simd_size = prog_data->simd_size;
> > + unsigned group_size = prog_data->local_size[0] *
> > + prog_data->local_size[1] * prog_data->local_size[2];
> > + unsigned thread_width_max =
> > + (group_size + simd_size - 1) / simd_size;
> > +
> > + uint32_t right_mask = (1u << simd_size) - 1;
> > + const unsigned right_non_aligned = group_size & (simd_size - 1);
> > + if (right_non_aligned != 0)
> > + right_mask >>= (simd_size - right_non_aligned);
>
> I think this is equvalent to:
>
> uint32_t right_mask = (1u << (simd_size - (group_size % simd_size))) - 1;
>
> which might be a bit simpler...
>
> > + BEGIN_BATCH(dwords);
> > + OUT_BATCH(GPGPU_WALKER << 16 | (dwords - 2));
>
> I was going to suggest splitting this into separate Gen8+ and Gen7
> blocks, but now that I look at the code...these two are slightly
> different indirect handling, and the later one is just a DWord of MBZ,
> so...it's not really that different. I think what you have is fine :)
>
> > + uint32_t dwords = brw->gen < 8 ? 11 : 15;
> > + OUT_BATCH(0);
> > + if (brw->gen >= 8) {
> > + OUT_BATCH(0);
> > + OUT_BATCH(0);
> > + }
> > + assert(thread_width_max <= brw->max_cs_threads);
> > + OUT_BATCH(((simd_size == 8) ? 0 : 1) << 30 |
>
> You might want to write this as ((simd_size / 8) - 1). That will work
> for SIMD8/16/32.
Good idea, but I think simd_size / 16 will be needed, since we need 2
for SIMD32.
> Topi would probably suggest using SET_FIELD, i.e.
>
> #define BRW_GPGPU_SIMD_SIZE_SHIFT 30
> #define BRW_GPGPU_SIMD_SIZE_MASK INTEL_MASK(31, 30)
>
> SET_FIELD((simd_size / 8) - 1, BRW_GPGPU_SIMD_SIZE)
>
> It's probably a good idea here too.
Will do.
> > + (thread_width_max - 1));
>
> Don't you need to set the thread height/depth maximums as well?
> I'm not really sure how this works.
We flatten the 3-dims out above in group_size, and then
thread_width_max. So, this basically focuses getting it to execute the
correct number of times. When height is not used, we can set
bottom_mask to all 1's, and only use the right_mask.
In terms of GLSL's gl_LocalInvocationID, that is a whole separate
matter. (And a whole separate patch series! :)
> > + OUT_BATCH(0);
>
> It'd be nice to label the 0's, i.e.
Will do.
Thanks for the review!
-Jordan
> OUT_BATCH(0); /* Thread Group ID Starting X */
> OUT_BATCH(num_groups[0]); /* Thread Group ID X Dimension */
>
> With those changes, the whole series is:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
> I haven't verified that these execution masks are really what you want.
> You know more about this than I do. :)
>
> > + if (brw->gen >= 8)
> > + OUT_BATCH(0);
> > + OUT_BATCH(num_groups[0]);
> > + OUT_BATCH(0);
> > + if (brw->gen >= 8)
> > + OUT_BATCH(0);
> > + OUT_BATCH(num_groups[1]);
> > + OUT_BATCH(0);
> > + OUT_BATCH(num_groups[2]);
> > + OUT_BATCH(right_mask);
> > + OUT_BATCH(0xffffffff);
> > + ADVANCE_BATCH();
> > }
> >
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> > index 36f46af..cd25511 100644
> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > @@ -2451,5 +2451,6 @@ enum brw_wm_barycentric_interp_mode {
> >
> > #define MEDIA_VFE_STATE 0x7000
> > #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002
> > +#define GPGPU_WALKER 0x7105
> >
> > #endif
> >
More information about the mesa-dev
mailing list