[Mesa-dev] [PATCH 1/3] radeonsi: implement mechanism for IBs without partial flushes at the end (v6)
Nicolai Hähnle
nhaehnle at gmail.com
Mon Apr 16 10:23:35 UTC 2018
On 16.04.2018 10:51, Christian König wrote:
> On 15.04.2018 at 20:46, Nicolai Hähnle wrote:
>> On 07.04.2018 04:31, Marek Olšák wrote:
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> (This patch doesn't enable the behavior. It will be enabled in a later
>>> commit.)
>>>
>>> Draw calls from multiple IBs can be executed in parallel.
>>>
>>> v2: do emit partial flushes on SI
>>> v3: invalidate all shader caches at the beginning of IBs
>>> v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
>>> only do this for flushes invoked internally
>>> v5: empty IBs should wait for idle if the flush requires it
>>> v6: split the commit
>>>
>>> If we artificially limit the number of draw calls per IB to 5, we'll get
>>> a lot more IBs, leading to a lot more partial flushes. Let's see how
>>> the removal of partial flushes changes GPU utilization in that scenario:
>>>
>>> With partial flushes (time busy):
>>> CP: 99%
>>> SPI: 86%
>>> CB: 73%
>>>
>>> Without partial flushes (time busy):
>>> CP: 99%
>>> SPI: 93%
>>> CB: 81%
>>> ---
>>> src/gallium/drivers/radeon/radeon_winsys.h | 7 ++++
>>> src/gallium/drivers/radeonsi/si_gfx_cs.c | 52 ++++++++++++++++++++++--------
>>> src/gallium/drivers/radeonsi/si_pipe.h | 1 +
>>> 3 files changed, 46 insertions(+), 14 deletions(-)
>>> [snip]
>>> + /* Always invalidate caches at the beginning of IBs, because external
>>> + * users (e.g. BO evictions and SDMA/UVD/VCE IBs) can modify our
>>> + * buffers.
>>> + *
>>> + * Note that the cache flush done by the kernel at the end of GFX IBs
>>> + * isn't useful here, because that flush can finish after the following
>>> + * IB starts drawing.
>>> + *
>>> + * TODO: Do we also need to invalidate CB & DB caches?
>>
>> I don't think so.
>>
>> Kernel buffer move: CB & DB caches use logical addressing, so they
>> should be unaffected.
>
> Are you sure about that? Basically we don't do any extra invalidation
> when BOs are moved by the kernel.
>
> But on the other hand, the worst that could happen when we skip
> invalidation is that the caches keep data they would otherwise have
> re-read, and that data is identical: the content of the BO doesn't
> change, just its location.
>
> In other words, it depends on how the CB & DB caches work.
Yes, that's why I mentioned the logical addressing. And yes, I'm sure
that they're not using physical addresses in the CB/DB-internal caches.
Cheers,
Nicolai
>
> Christian.
>
>>
>> UVD: APIs should forbid writing to the currently bound framebuffer.
>>
>> CPU: Shouldn't be writing directly to the framebuffer, and even if it
>> does (linear framebuffer?), I believe OpenGL requires re-binding the
>> framebuffer.
>>
>> Cheers,
>> Nicolai
>>
>>
>>> + */
>>> + ctx->flags |= SI_CONTEXT_INV_ICACHE |
>>> + SI_CONTEXT_INV_SMEM_L1 |
>>> + SI_CONTEXT_INV_VMEM_L1 |
>>> + SI_CONTEXT_INV_GLOBAL_L2 |
>>> + SI_CONTEXT_START_PIPELINE_STATS;
>>> /* set all valid groups as dirty so they get re-emitted on
>>> * the next draw command
>>> */
>>> si_pm4_reset_emitted(ctx);
>>> /* The CS initialization should be emitted before everything else. */
>>> si_pm4_emit(ctx, ctx->init_config);
>>> if (ctx->init_config_gs_rings)
>>> si_pm4_emit(ctx, ctx->init_config_gs_rings);
>>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
>>> index 0c90a6c6e46..f0f323ff3a7 100644
>>> --- a/src/gallium/drivers/radeonsi/si_pipe.h
>>> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
>>> @@ -540,20 +540,21 @@ struct si_context {
>>> void *vs_blit_texcoord;
>>> struct si_screen *screen;
>>> struct pipe_debug_callback debug;
>>> LLVMTargetMachineRef tm; /* only non-threaded compilation */
>>> struct si_shader_ctx_state fixed_func_tcs_shader;
>>> struct r600_resource *wait_mem_scratch;
>>> unsigned wait_mem_number;
>>> uint16_t prefetch_L2_mask;
>>> bool gfx_flush_in_progress:1;
>>> + bool gfx_last_ib_is_busy:1;
>>> bool compute_is_busy:1;
>>> unsigned num_gfx_cs_flushes;
>>> unsigned initial_gfx_cs_size;
>>> unsigned gpu_reset_counter;
>>> unsigned last_dirty_tex_counter;
>>> unsigned last_compressed_colortex_counter;
>>> unsigned last_num_draw_calls;
>>> unsigned flags; /* flush flags */
>>> /* Current unaccounted memory usage. */
>>>
>>
>>
>
--
Learn what the world is really like,
But never forget how it ought to be.