[Mesa-dev] [PATCH] radv: Implement alternate GFX9 scissor workaround.

Nicolai Hähnle nhaehnle at gmail.com
Mon May 28 10:39:42 UTC 2018


On 28.05.2018 12:21, Bas Nieuwenhuizen wrote:
> On Mon, May 28, 2018 at 12:19 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On 27.05.2018 18:57, Bas Nieuwenhuizen wrote:
>>>
>>> This improves dota2 performance for me by 11% when I force the
>>> GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my
>>> threadripper), which should be a large part of the radv-amdvlk gap.
>>> (For me with that was radv 60.3 -> 66.6, while AMDVLK does about 68
>>> fps)
>>>
>>> It looks like dota2 rendered the GUI with a bunch of draws with
>>> a SetScissors before almost each draw, causing a lot of pipeline
>>> stalls.
>>>
>>> I'm not really happy with the duplication of code, but overriding
>>> radeon_set_context_reg would also be messy since we have the
>>> pre-recorded pipelines and a bunch of si_cmd_buffer code, as well
>>> as some memory->context reg loads for which things would be more
>>> complicated.
>>> ---
>>>    src/amd/vulkan/radv_cmd_buffer.c | 56 +++++++++++++++++++++++++++-----
>>>    1 file changed, 47 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/src/amd/vulkan/radv_cmd_buffer.c
>>> b/src/amd/vulkan/radv_cmd_buffer.c
>>> index 5ab577b4c59..08be50995c1 100644
>>> --- a/src/amd/vulkan/radv_cmd_buffer.c
>>> +++ b/src/amd/vulkan/radv_cmd_buffer.c
>>> @@ -869,14 +869,6 @@ radv_emit_scissor(struct radv_cmd_buffer *cmd_buffer)
>>>    {
>>>          uint32_t count = cmd_buffer->state.dynamic.scissor.count;
>>>    -     /* Vega10/Raven scissor bug workaround. This must be done before
>>> VPORT
>>> -        * scissor registers are changed. There is also a more efficient
>>> but
>>> -        * more involved alternative workaround.
>>> -        */
>>> -       if (cmd_buffer->device->physical_device->has_scissor_bug) {
>>> -               cmd_buffer->state.flush_bits |=
>>> RADV_CMD_FLAG_PS_PARTIAL_FLUSH;
>>> -               si_emit_cache_flush(cmd_buffer);
>>> -       }
>>>          si_write_scissors(cmd_buffer->cs, 0, count,
>>>                            cmd_buffer->state.dynamic.scissor.scissors,
>>>                            cmd_buffer->state.dynamic.viewport.viewports,
>>> @@ -1403,7 +1395,8 @@ radv_cmd_buffer_flush_dynamic_state(struct
>>> radv_cmd_buffer *cmd_buffer)
>>>          if (states & (RADV_CMD_DIRTY_DYNAMIC_VIEWPORT))
>>>                  radv_emit_viewport(cmd_buffer);
>>>    -     if (states & (RADV_CMD_DIRTY_DYNAMIC_SCISSOR |
>>> RADV_CMD_DIRTY_DYNAMIC_VIEWPORT))
>>> +       if (states & (RADV_CMD_DIRTY_DYNAMIC_SCISSOR |
>>> RADV_CMD_DIRTY_DYNAMIC_VIEWPORT) &&
>>> +           !cmd_buffer->device->physical_device->has_scissor_bug)
>>>                  radv_emit_scissor(cmd_buffer);
>>>          if (states & RADV_CMD_DIRTY_DYNAMIC_LINE_WIDTH)
>>> @@ -3144,10 +3137,52 @@ radv_emit_draw_packets(struct radv_cmd_buffer
>>> *cmd_buffer,
>>>          }
>>>    }
>>>    +/*
>>> + * Vega and raven have a bug which triggers if there are multiple context
>>> + * register contexts active at the same time with different scissor
>>> values.
>>> + *
>>> + * There are two possible workarounds:
>>> + * 1) Wait for PS_PARTIAL_FLUSH every time the scissor is changed. That
>>> way
>>> + *    there is only ever 1 active set of scissor values at the same time.
>>> + *
>>> + * 2) Whenever the hardware switches contexts we have to set the scissor
>>> + *    registers again even if it is a noop. That way the new context gets
>>> + *    the correct scissor values.
>>> + *
>>> + * This implements option 2. radv_need_late_scissor_emission needs to
>>> + * return true on affected HW if radv_emit_all_graphics_states sets
>>> + * any context registers.
>>> + */
>>> +static bool radv_need_late_scissor_emission(struct radv_cmd_buffer
>>> *cmd_buffer,
>>> +                                            bool indexed_draw)
>>> +{
>>> +       struct radv_cmd_state *state = &cmd_buffer->state;
>>> +
>>> +       if (!cmd_buffer->device->physical_device->has_scissor_bug)
>>> +               return false;
>>> +
>>> +       /* Assume all state changes except  these two can imply context
>>> rolls. */
>>> +       if (cmd_buffer->state.dirty & ~(RADV_CMD_DIRTY_INDEX_BUFFER |
>>> +                                       RADV_CMD_DIRTY_VERTEX_BUFFER |
>>> +                                       RADV_CMD_DIRTY_PIPELINE))
>>
>>
>> Why are you excluding the pipeline state? Anything that has set_context_reg
>> will trigger a context roll, I'd expect pipelines to have those?
> 
> Because that is checked in the if directly below it: " if
> (cmd_buffer->state.emitted_pipeline != cmd_buffer->state.pipeline)"
> 
> The reason I'm doing this is that with our copy&blit implementations
> we have some cases of switching pipelines and immediately switching
> back before a draw. The state is dirty but the pipeline is not
> changed. We do the same check before emitting a pipeline.

Oh, that makes sense, sorry for the confusion.

Cheers,
Nicolai


> 
> 
>>
>> Cheers,
>> Nicolai
>>
>>
>>
>>> +               return true;
>>> +
>>> +       if (cmd_buffer->state.emitted_pipeline !=
>>> cmd_buffer->state.pipeline)
>>> +               return true;
>>> +
>>> +       if (indexed_draw && state->pipeline->graphics.prim_restart_enable
>>> &&
>>> +           (state->index_type ? 0xffffffffu : 0xffffu) !=
>>> state->last_primitive_reset_index)
>>> +               return true;
>>> +
>>> +       return false;
>>> +}
>>> +
>>>    static void
>>>    radv_emit_all_graphics_states(struct radv_cmd_buffer *cmd_buffer,
>>>                                const struct radv_draw_info *info)
>>>    {
>>> +       bool late_scissor_emission =
>>> radv_need_late_scissor_emission(cmd_buffer, info->indexed);
>>> +
>>>          if ((cmd_buffer->state.dirty & RADV_CMD_DIRTY_FRAMEBUFFER) ||
>>>              cmd_buffer->state.emitted_pipeline !=
>>> cmd_buffer->state.pipeline)
>>>                  radv_emit_rbplus_state(cmd_buffer);
>>> @@ -3177,6 +3212,9 @@ radv_emit_all_graphics_states(struct radv_cmd_buffer
>>> *cmd_buffer,
>>>          radv_emit_draw_registers(cmd_buffer, info->indexed,
>>>                                   info->instance_count > 1, info->indirect,
>>>                                   info->indirect ? 0 : info->count);
>>> +
>>> +       if (late_scissor_emission)
>>> +               radv_emit_scissor(cmd_buffer);
>>>    }
>>>      static void
>>>
>>
>>
>> --
>> Lerne, wie die Welt wirklich ist,
>> Aber vergiss niemals, wie sie sein sollte.


-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.


More information about the mesa-dev mailing list