[Mesa-dev] [PATCH 1/3] radeonsi: implement mechanism for IBs without partial flushes at the end (v6)
Nicolai Hähnle
nhaehnle at gmail.com
Sun Apr 15 18:46:21 UTC 2018
On 07.04.2018 04:31, Marek Olšák wrote:
> From: Marek Olšák <marek.olsak at amd.com>
>
> (This patch doesn't enable the behavior. It will be enabled in a later
> commit.)
>
> Draw calls from multiple IBs can be executed in parallel.
>
> v2: do emit partial flushes on SI
> v3: invalidate all shader caches at the beginning of IBs
> v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
> only do this for flushes invoked internally
> v5: empty IBs should wait for idle if the flush requires it
> v6: split the commit
>
> If we artificially limit the number of draw calls per IB to 5, we'll get
> a lot more IBs, leading to a lot more partial flushes. Let's see how
> the removal of partial flushes changes GPU utilization in that scenario:
>
> With partial flushes (time busy):
> CP: 99%
> SPI: 86%
> CB: 73%
>
> Without partial flushes (time busy):
> CP: 99%
> SPI: 93%
> CB: 81%
> ---
> src/gallium/drivers/radeon/radeon_winsys.h | 7 ++++
> src/gallium/drivers/radeonsi/si_gfx_cs.c | 52 ++++++++++++++++++++++--------
> src/gallium/drivers/radeonsi/si_pipe.h | 1 +
> 3 files changed, 46 insertions(+), 14 deletions(-)
> [snip]
> + /* Always invalidate caches at the beginning of IBs, because external
> + * users (e.g. BO evictions and SDMA/UVD/VCE IBs) can modify our
> + * buffers.
> + *
> + * Note that the cache flush done by the kernel at the end of GFX IBs
> + * isn't useful here, because that flush can finish after the following
> + * IB starts drawing.
> + *
> + * TODO: Do we also need to invalidate CB & DB caches?
I don't think so.

Kernel buffer move: CB & DB caches use logical addressing, so they should
be unaffected.

UVD: APIs should forbid writing to the currently bound framebuffer.

CPU: It shouldn't be writing directly to the framebuffer, and even if it
does (linear framebuffer?), I believe OpenGL requires re-binding the
framebuffer.

Cheers,
Nicolai
> + */
> + ctx->flags |= SI_CONTEXT_INV_ICACHE |
> + SI_CONTEXT_INV_SMEM_L1 |
> + SI_CONTEXT_INV_VMEM_L1 |
> + SI_CONTEXT_INV_GLOBAL_L2 |
> + SI_CONTEXT_START_PIPELINE_STATS;
>
> /* set all valid group as dirty so they get reemited on
> * next draw command
> */
> si_pm4_reset_emitted(ctx);
>
> /* The CS initialization should be emitted before everything else. */
> si_pm4_emit(ctx, ctx->init_config);
> if (ctx->init_config_gs_rings)
> si_pm4_emit(ctx, ctx->init_config_gs_rings);
> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
> index 0c90a6c6e46..f0f323ff3a7 100644
> --- a/src/gallium/drivers/radeonsi/si_pipe.h
> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
> @@ -540,20 +540,21 @@ struct si_context {
> void *vs_blit_texcoord;
> struct si_screen *screen;
> struct pipe_debug_callback debug;
> LLVMTargetMachineRef tm; /* only non-threaded compilation */
> struct si_shader_ctx_state fixed_func_tcs_shader;
> struct r600_resource *wait_mem_scratch;
> unsigned wait_mem_number;
> uint16_t prefetch_L2_mask;
>
> bool gfx_flush_in_progress:1;
> + bool gfx_last_ib_is_busy:1;
> bool compute_is_busy:1;
>
> unsigned num_gfx_cs_flushes;
> unsigned initial_gfx_cs_size;
> unsigned gpu_reset_counter;
> unsigned last_dirty_tex_counter;
> unsigned last_compressed_colortex_counter;
> unsigned last_num_draw_calls;
> unsigned flags; /* flush flags */
> /* Current unaccounted memory usage. */
>
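For illustration, the invalidation policy discussed above (invalidate the shader caches and L2 at the start of every IB, leave the CB & DB caches alone) could be sketched roughly as follows. This is only a standalone sketch, not the driver code: the SKETCH_* names, their bit values, struct sketch_context and sketch_begin_new_ib() are all hypothetical stand-ins for the real definitions in si_pipe.h and si_gfx_cs.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for illustration only; the real SI_CONTEXT_*
 * values are defined in si_pipe.h and may differ. */
#define SKETCH_INV_ICACHE           (1u << 0)
#define SKETCH_INV_SMEM_L1          (1u << 1)
#define SKETCH_INV_VMEM_L1          (1u << 2)
#define SKETCH_INV_GLOBAL_L2        (1u << 3)
#define SKETCH_START_PIPELINE_STATS (1u << 4)
#define SKETCH_FLUSH_AND_INV_CB     (1u << 5)
#define SKETCH_FLUSH_AND_INV_DB     (1u << 6)

/* Hypothetical stand-in for the flush-flags field of the context. */
struct sketch_context {
	unsigned flags; /* pending flush flags */
};

/* Request the invalidations needed at the beginning of a new IB.
 * External users (BO evictions, SDMA/UVD/VCE IBs) may have modified our
 * buffers, so the shader caches and L2 are invalidated. The CB & DB
 * bits are deliberately not requested here, following the reasoning in
 * the reply above. Flags that are already pending are preserved. */
static void sketch_begin_new_ib(struct sketch_context *ctx)
{
	assert(ctx != NULL);

	ctx->flags |= SKETCH_INV_ICACHE |
		      SKETCH_INV_SMEM_L1 |
		      SKETCH_INV_VMEM_L1 |
		      SKETCH_INV_GLOBAL_L2 |
		      SKETCH_START_PIPELINE_STATS;
}
```

Calling sketch_begin_new_ib() on a context with no pending flags leaves only the five bits above set. A CB or DB flush that some other code path had already requested stays pending, because the flags are OR-ed in rather than assigned.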
--
Learn how the world really is,
but never forget how it ought to be.