[Mesa-dev] [PATCH 1/3] radeonsi: implement mechanism for IBs without partial flushes at the end (v6)

Nicolai Hähnle nhaehnle at gmail.com
Mon Apr 16 10:23:35 UTC 2018


On 16.04.2018 10:51, Christian König wrote:
> On 15.04.2018 at 20:46, Nicolai Hähnle wrote:
>> On 07.04.2018 04:31, Marek Olšák wrote:
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> (This patch doesn't enable the behavior. It will be enabled in a later
>>> commit.)
>>>
>>> Draw calls from multiple IBs can be executed in parallel.
>>>
>>> v2: do emit partial flushes on SI
>>> v3: invalidate all shader caches at the beginning of IBs
>>> v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
>>>      only do this for flushes invoked internally
>>> v5: empty IBs should wait for idle if the flush requires it
>>> v6: split the commit
>>>
>>> If we artificially limit the number of draw calls per IB to 5, we'll get
>>> a lot more IBs, leading to a lot more partial flushes. Let's see how
>>> the removal of partial flushes changes GPU utilization in that scenario:
>>>
>>> With partial flushes (time busy):
>>>      CP: 99%
>>>      SPI: 86%
>>>      CB: 73%
>>>
>>> Without partial flushes (time busy):
>>>      CP: 99%
>>>      SPI: 93%
>>>      CB: 81%
>>> ---
>>>   src/gallium/drivers/radeon/radeon_winsys.h |  7 ++++
>>>   src/gallium/drivers/radeonsi/si_gfx_cs.c   | 52 ++++++++++++++++++++++--------
>>>   src/gallium/drivers/radeonsi/si_pipe.h     |  1 +
>>>   3 files changed, 46 insertions(+), 14 deletions(-)
>>> [snip]
>>> +    /* Always invalidate caches at the beginning of IBs, because external
>>> +     * users (e.g. BO evictions and SDMA/UVD/VCE IBs) can modify our
>>> +     * buffers.
>>> +     *
>>> +     * Note that the cache flush done by the kernel at the end of GFX IBs
>>> +     * isn't useful here, because that flush can finish after the following
>>> +     * IB starts drawing.
>>> +     * TODO: Do we also need to invalidate CB & DB caches?
>>
>> I don't think so.
>>
>> Kernel buffer move: CB & DB caches use logical addressing, so should 
>> be unaffected.
> 
> Are you sure about that? Basically we don't do any extra invalidation 
> when BOs are moved by the kernel.
> 
> But on the other hand, the worst that could happen when we skip
> invalidation is that we don't re-read data into the caches that is
> already in the caches; the content of the BO doesn't change, just
> its location.
> 
> In other words, it depends on how the CB & DB caches work.

Yes, that's why I mentioned the logical addressing. And yes, I'm sure 
that they're not using physical addresses in the CB/DB-internal caches.

Cheers,
Nicolai


> 
> Christian.
> 
>>
>> UVD: APIs should forbid writing to the currently bound framebuffer.
>>
>> CPU: Shouldn't be writing directly to the framebuffer, and even if it 
>> does (linear framebuffer?), I believe OpenGL requires re-binding the 
>> framebuffer.
>>
>> Cheers,
>> Nicolai
>>
>>
>>> +     */
>>> +    ctx->flags |= SI_CONTEXT_INV_ICACHE |
>>> +              SI_CONTEXT_INV_SMEM_L1 |
>>> +              SI_CONTEXT_INV_VMEM_L1 |
>>> +              SI_CONTEXT_INV_GLOBAL_L2 |
>>> +              SI_CONTEXT_START_PIPELINE_STATS;
>>>         /* set all valid group as dirty so they get reemited on
>>>        * next draw command
>>>        */
>>>       si_pm4_reset_emitted(ctx);
>>>         /* The CS initialization should be emitted before everything else. */
>>>       si_pm4_emit(ctx, ctx->init_config);
>>>       if (ctx->init_config_gs_rings)
>>>           si_pm4_emit(ctx, ctx->init_config_gs_rings);
>>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
>>> index 0c90a6c6e46..f0f323ff3a7 100644
>>> --- a/src/gallium/drivers/radeonsi/si_pipe.h
>>> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
>>> @@ -540,20 +540,21 @@ struct si_context {
>>>       void                *vs_blit_texcoord;
>>>       struct si_screen        *screen;
>>>       struct pipe_debug_callback    debug;
>>>       LLVMTargetMachineRef        tm; /* only non-threaded compilation */
>>>       struct si_shader_ctx_state    fixed_func_tcs_shader;
>>>       struct r600_resource        *wait_mem_scratch;
>>>       unsigned            wait_mem_number;
>>>       uint16_t            prefetch_L2_mask;
>>>         bool                gfx_flush_in_progress:1;
>>> +    bool                gfx_last_ib_is_busy:1;
>>>       bool                compute_is_busy:1;
>>>         unsigned            num_gfx_cs_flushes;
>>>       unsigned            initial_gfx_cs_size;
>>>       unsigned            gpu_reset_counter;
>>>       unsigned            last_dirty_tex_counter;
>>>       unsigned            last_compressed_colortex_counter;
>>>       unsigned            last_num_draw_calls;
>>>       unsigned            flags; /* flush flags */
>>>       /* Current unaccounted memory usage. */
>>>
>>
>>
> 


-- 
Learn how the world really is,
But never forget how it ought to be.

