[Mesa-dev] [PATCH 1/3] radeonsi: implement mechanism for IBs without partial flushes at the end (v6)

Mon Apr 16 08:51:53 UTC 2018

Am 15.04.2018 um 20:46 schrieb Nicolai Hähnle:
> On 07.04.2018 04:31, Marek Olšák wrote:
>> From: Marek Olšák <marek.olsak at amd.com>
>>
>> (This patch doesn't enable the behavior. It will be enabled in a later
>> commit.)
>>
>> Draw calls from multiple IBs can be executed in parallel.
>>
>> v2: do emit partial flushes on SI
>> v3: invalidate all shader caches at the beginning of IBs
>> v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
>>      only do this for flushes invoked internally
>> v5: empty IBs should wait for idle if the flush requires it
>> v6: split the commit
>>
>> If we artificially limit the number of draw calls per IB to 5, we'll get
>> a lot more IBs, leading to a lot more partial flushes. Let's see how
>> the removal of partial flushes changes GPU utilization in that scenario:
>>
>> With partial flushes (time busy):
>>      CP: 99%
>>      SPI: 86%
>>      CB: 73%
>>
>> Without partial flushes (time busy):
>>      CP: 99%
>>      SPI: 93%
>>      CB: 81%
>> ---
>>   src/gallium/drivers/radeon/radeon_winsys.h |  7 ++++
>>   src/gallium/drivers/radeonsi/si_gfx_cs.c   | 52 ++++++++++++++++++++++--------
>>   src/gallium/drivers/radeonsi/si_pipe.h     |  1 +
>>   3 files changed, 46 insertions(+), 14 deletions(-)
>> [snip]
>> +    /* Always invalidate caches at the beginning of IBs, because external
>> +     * users (e.g. BO evictions and SDMA/UVD/VCE IBs) can modify our
>> +     * buffers.
>> +     *
>> +     * Note that the cache flush done by the kernel at the end of GFX IBs
>> +     * isn't useful here, because that flush can finish after the following
>> +     * IB starts drawing.
>> +     *
>> +     * TODO: Do we also need to invalidate CB & DB caches?
>
> I don't think so.
>
> Kernel buffer move: CB & DB caches use logical addressing, so they
> should be unaffected.

Are you sure about that? We don't do any extra invalidation when BOs 
are moved by the kernel.

On the other hand, the worst that could happen if we skip the 
invalidation is that we don't re-read data that is already in the 
caches: the content of the BO doesn't change, just its location.

In other words, it depends on how the CB & DB caches work.

Christian.

>
> UVD: APIs should forbid writing to the currently bound framebuffer.
>
> CPU: It shouldn't be writing directly to the framebuffer, and even if
> it does (linear framebuffer?), I believe OpenGL requires re-binding the
> framebuffer.
>
> Cheers,
> Nicolai
>
>
>> +     */
>> +    ctx->flags |= SI_CONTEXT_INV_ICACHE |
>> +              SI_CONTEXT_INV_SMEM_L1 |
>> +              SI_CONTEXT_INV_VMEM_L1 |
>> +              SI_CONTEXT_INV_GLOBAL_L2 |
>> +              SI_CONTEXT_START_PIPELINE_STATS;
>>         /* set all valid groups as dirty so they get re-emitted on
>>        * next draw command
>>        */
>>       si_pm4_reset_emitted(ctx);
>>         /* The CS initialization should be emitted before everything else. */
>>       si_pm4_emit(ctx, ctx->init_config);
>>       if (ctx->init_config_gs_rings)
>>           si_pm4_emit(ctx, ctx->init_config_gs_rings);
>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
>> index 0c90a6c6e46..f0f323ff3a7 100644
>> --- a/src/gallium/drivers/radeonsi/si_pipe.h
>> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
>> @@ -540,20 +540,21 @@ struct si_context {
>>       void                *vs_blit_texcoord;
>>       struct si_screen        *screen;
>>       struct pipe_debug_callback    debug;
>>       LLVMTargetMachineRef        tm; /* only non-threaded compilation */
>>       struct si_shader_ctx_state    fixed_func_tcs_shader;
>>       struct r600_resource        *wait_mem_scratch;
>>       unsigned            wait_mem_number;
>>       uint16_t            prefetch_L2_mask;
>>         bool                gfx_flush_in_progress:1;
>> +    bool                gfx_last_ib_is_busy:1;
>>       bool                compute_is_busy:1;
>>         unsigned            num_gfx_cs_flushes;
>>       unsigned            initial_gfx_cs_size;
>>       unsigned            gpu_reset_counter;
>>       unsigned            last_dirty_tex_counter;
>>       unsigned            last_compressed_colortex_counter;
>>       unsigned            last_num_draw_calls;
>>       unsigned            flags; /* flush flags */
>>       /* Current unaccounted memory usage. */
>>
>
>