[Mesa-dev] [PATCH] radv: Implement alternate GFX9 scissor workaround.
Bas Nieuwenhuizen
bas at basnieuwenhuizen.nl
Mon May 28 10:21:59 UTC 2018
On Mon, May 28, 2018 at 12:19 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 27.05.2018 18:57, Bas Nieuwenhuizen wrote:
>>
>> This improves dota2 performance for me by 11% when I force the
>> GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my
>> threadripper), which should be a large part of the radv-amdvlk gap.
>> (For me with that was radv 60.3 -> 66.6, while AMDVLK does about 68
>> fps)
>>
>> It looks like dota2 rendered the GUI with a bunch of draws with
>> a SetScissors before almost each draw, causing a lot of pipeline
>> stalls.
>>
>> I'm not really happy with the duplication of code, but overriding
>> radeon_set_context_reg would also be messy since we have the
>> pre-recorded pipelines and a bunch of si_cmd_buffer code, as well
>> as some memory->context reg loads for which things would be more
>> complicated.
>> ---
>> src/amd/vulkan/radv_cmd_buffer.c | 56 +++++++++++++++++++++++++++-----
>> 1 file changed, 47 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/amd/vulkan/radv_cmd_buffer.c
>> b/src/amd/vulkan/radv_cmd_buffer.c
>> index 5ab577b4c59..08be50995c1 100644
>> --- a/src/amd/vulkan/radv_cmd_buffer.c
>> +++ b/src/amd/vulkan/radv_cmd_buffer.c
>> @@ -869,14 +869,6 @@ radv_emit_scissor(struct radv_cmd_buffer *cmd_buffer)
>> {
>> uint32_t count = cmd_buffer->state.dynamic.scissor.count;
>> - /* Vega10/Raven scissor bug workaround. This must be done before
>> VPORT
>> - * scissor registers are changed. There is also a more efficient
>> but
>> - * more involved alternative workaround.
>> - */
>> - if (cmd_buffer->device->physical_device->has_scissor_bug) {
>> - cmd_buffer->state.flush_bits |=
>> RADV_CMD_FLAG_PS_PARTIAL_FLUSH;
>> - si_emit_cache_flush(cmd_buffer);
>> - }
>> si_write_scissors(cmd_buffer->cs, 0, count,
>> cmd_buffer->state.dynamic.scissor.scissors,
>> cmd_buffer->state.dynamic.viewport.viewports,
>> @@ -1403,7 +1395,8 @@ radv_cmd_buffer_flush_dynamic_state(struct
>> radv_cmd_buffer *cmd_buffer)
>> if (states & (RADV_CMD_DIRTY_DYNAMIC_VIEWPORT))
>> radv_emit_viewport(cmd_buffer);
>> - if (states & (RADV_CMD_DIRTY_DYNAMIC_SCISSOR |
>> RADV_CMD_DIRTY_DYNAMIC_VIEWPORT))
>> + if (states & (RADV_CMD_DIRTY_DYNAMIC_SCISSOR |
>> RADV_CMD_DIRTY_DYNAMIC_VIEWPORT) &&
>> + !cmd_buffer->device->physical_device->has_scissor_bug)
>> radv_emit_scissor(cmd_buffer);
>> if (states & RADV_CMD_DIRTY_DYNAMIC_LINE_WIDTH)
>> @@ -3144,10 +3137,52 @@ radv_emit_draw_packets(struct radv_cmd_buffer
>> *cmd_buffer,
>> }
>> }
>> +/*
>> + * Vega and raven have a bug which triggers if there are multiple context
>> + * register contexts active at the same time with different scissor
>> values.
>> + *
>> + * There are two possible workarounds:
>> + * 1) Wait for PS_PARTIAL_FLUSH every time the scissor is changed. That
>> way
>> + * there is only ever 1 active set of scissor values at the same time.
>> + *
>> + * 2) Whenever the hardware switches contexts we have to set the scissor
>> + * registers again even if it is a noop. That way the new context gets
>> + * the correct scissor values.
>> + *
>> + * This implements option 2. radv_need_late_scissor_emission needs to
>> + * return true on affected HW if radv_emit_all_graphics_states sets
>> + * any context registers.
>> + */
>> +static bool radv_need_late_scissor_emission(struct radv_cmd_buffer
>> *cmd_buffer,
>> + bool indexed_draw)
>> +{
>> + struct radv_cmd_state *state = &cmd_buffer->state;
>> +
>> + if (!cmd_buffer->device->physical_device->has_scissor_bug)
>> + return false;
>> +
>> + /* Assume all state changes except these two can imply context
>> rolls. */
>> + if (cmd_buffer->state.dirty & ~(RADV_CMD_DIRTY_INDEX_BUFFER |
>> + RADV_CMD_DIRTY_VERTEX_BUFFER |
>> + RADV_CMD_DIRTY_PIPELINE))
>
>
> Why are you excluding the pipeline state? Anything that has set_context_reg
> will trigger a context roll, I'd expect pipelines to have those?
Because that is checked in the if directly below it: " if
(cmd_buffer->state.emitted_pipeline != cmd_buffer->state.pipeline)"
The reason I'm doing this is that with our copy&blit implementations
we have some cases of switching pipelines and immediately switching
back before a draw. The state is dirty but the pipeline is not
changed. We do the same check before emitting a pipeline.
>
> Cheers,
> Nicolai
>
>
>
>> + return true;
>> +
>> + if (cmd_buffer->state.emitted_pipeline !=
>> cmd_buffer->state.pipeline)
>> + return true;
>> +
>> + if (indexed_draw && state->pipeline->graphics.prim_restart_enable
>> &&
>> + (state->index_type ? 0xffffffffu : 0xffffu) !=
>> state->last_primitive_reset_index)
>> + return true;
>> +
>> + return false;
>> +}
>> +
>> static void
>> radv_emit_all_graphics_states(struct radv_cmd_buffer *cmd_buffer,
>> const struct radv_draw_info *info)
>> {
>> + bool late_scissor_emission =
>> radv_need_late_scissor_emission(cmd_buffer, info->indexed);
>> +
>> if ((cmd_buffer->state.dirty & RADV_CMD_DIRTY_FRAMEBUFFER) ||
>> cmd_buffer->state.emitted_pipeline !=
>> cmd_buffer->state.pipeline)
>> radv_emit_rbplus_state(cmd_buffer);
>> @@ -3177,6 +3212,9 @@ radv_emit_all_graphics_states(struct radv_cmd_buffer
>> *cmd_buffer,
>> radv_emit_draw_registers(cmd_buffer, info->indexed,
>> info->instance_count > 1, info->indirect,
>> info->indirect ? 0 : info->count);
>> +
>> + if (late_scissor_emission)
>> + radv_emit_scissor(cmd_buffer);
>> }
>> static void
>>
>
>
> --
> Lerne, wie die Welt wirklich ist,
> Aber vergiss niemals, wie sie sein sollte.
More information about the mesa-dev
mailing list