[PATCH v5 5/8] drm/i915/pxp: Add ARB session creation and cleanup
Teres Alexis, Alan Previn
alan.previn.teres.alexis at intel.com
Fri Feb 17 03:12:26 UTC 2023
On Tue, 2023-02-14 at 13:38 -0800, Teres Alexis, Alan Previn wrote:
> Add MTL's function for ARB session creation using PXP firmware
> version 4.3 ABI structure format.
>
> Also add MTL's function for ARB session invalidation but this
> reuses PXP firmware version 4.2 ABI structure format.
>
> Before checking the return status, look at the GSC-CS-Mem-Header's
> pending-bit which means the GSC firmware is busy and we should
> resubmit.
>
> Signed-off-by: Alan Previn <alan.previn.teres.alexis at intel.com>
> ---
alan:snip
Not part of this patch today, but a new modification is required that would end up going into this patch --->
So from the internal testing we are doing on MTL, I have noticed that the first time the GSC firmware
is requested to init the arb session (right after a cold-boot or driver-reload-after-flr), it takes much longer.
This has resulted in the following problematic event flow:
1. An app or igt calls gem-context-create to create a protected context (after a fresh boot or driver reload).
2. intel_pxp_start begins the global teardown and recreation where:
2-a: the first part (i.e. session teardown) is skipped (since the arb session wasn't created before this)
2-b: the second part (i.e. arb session init commands via the gsc firmware) does happen and takes a long time (the first time)
3. Step 2 is queued through a worker while the main call into intel_pxp_start continues to wait for the arb
session to start and finally bails out with a timeout (back up through gem-context-create).
4. The app retries, and now we get a second call that repeats step 1 while 2-b is still wrapping up.
So depending on how this step 4 (the repeat of step 1) races against the completion of step 2-b, we could end up
getting a second teardown (i.e. step 2-a going in) right after the first arb-session creation completed,
even though in both cases the app just wants the creation.
The simplest fix (with minimal code changes) would be to add a complementary "is_arb_creation_pending" flag
alongside the is_arb_valid flag, with both remaining protected by the arb-mutex. That said, I'll respin rev6
with this fix along with the other mutex fix on Patch4.
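
To make the idea concrete, here is a rough userspace sketch of how such a flag could gate the start path. It is only a model, not the actual i915 code: struct pxp_model, the start_action enum, pxp_model_start(), pxp_model_creation_done() and the creations_queued counter are all hypothetical names made up for illustration, and a pthread mutex stands in for the arb-mutex. Only is_arb_valid and is_arb_creation_pending come from the description above.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PXP state; NOT the real struct intel_pxp. */
struct pxp_model {
	pthread_mutex_t arb_mutex;    /* models the arb-mutex */
	bool is_arb_valid;            /* ARB session is up and usable */
	bool is_arb_creation_pending; /* proposed flag: creation already queued */
	int creations_queued;         /* illustration only: teardown+create requests */
};

/* What a caller of the (modelled) start path should do next. */
enum start_action {
	START_NOTHING,      /* session already valid */
	START_WAIT,         /* creation in flight: wait, do not tear down again */
	START_QUEUE_CREATE  /* nothing in flight: queue teardown + creation */
};

static void pxp_model_init(struct pxp_model *pxp)
{
	pthread_mutex_init(&pxp->arb_mutex, NULL);
	pxp->is_arb_valid = false;
	pxp->is_arb_creation_pending = false;
	pxp->creations_queued = 0;
}

/* Models the decision taken on entry to the start path (steps 1 and 4). */
static enum start_action pxp_model_start(struct pxp_model *pxp)
{
	enum start_action action;

	pthread_mutex_lock(&pxp->arb_mutex);
	if (pxp->is_arb_valid) {
		action = START_NOTHING;
	} else if (pxp->is_arb_creation_pending) {
		/* A retry arrived while step 2-b is still running. */
		action = START_WAIT;
	} else {
		pxp->is_arb_creation_pending = true;
		pxp->creations_queued++;
		action = START_QUEUE_CREATE;
	}
	pthread_mutex_unlock(&pxp->arb_mutex);

	return action;
}

/* Models the worker finishing step 2-b, successfully or not. */
static void pxp_model_creation_done(struct pxp_model *pxp, bool success)
{
	pthread_mutex_lock(&pxp->arb_mutex);
	pxp->is_arb_creation_pending = false;
	pxp->is_arb_valid = success;
	pthread_mutex_unlock(&pxp->arb_mutex);
}
```

In this model, a retry that lands while the first creation is still in flight gets START_WAIT instead of queuing a second teardown, which is the behaviour the flag is meant to provide. Whether the real fix waits or returns early in that case is left open here.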