[Mesa-dev] [PATCH] RFC: Extend IMG_context_priority with NV_context_priority_realtime
Daniel Stone
daniel at fooishbar.org
Mon Apr 2 12:47:14 UTC 2018
On 30 March 2018 at 19:20, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Friday, March 30, 2018 7:40:13 AM PDT Chris Wilson wrote:
>> For i915, we are proposing to use a quality-of-service parameter in
>> addition to that of just a priority that usurps everyone. Due to our HW,
>> preemption may not be immediate and will be forced to wait until an
>> uncooperative process hits an arbitration point. To prevent that unduly
>> impacting the privileged RealTime context, we back up the preemption
>> request with a timeout to reset the GPU and forcibly evict the GPU hog
>> in order to execute the new context.
>
> I am strongly against exposing this in general. Performing a GPU reset
> in the middle of a batch can completely screw up whatever application
> was running. If the application is using robustness extensions, we may
> be forced to return GL_CONTEXT_LOST, causing the application to have to
> recreate their entire GL context and start over. If not, we may try to
> let them limp on(*) - and hope they didn't get too badly damaged by some
> of their commands not executing, or executing twice (if the kernel tries
> to resubmit it). But it may very well cause the app to misrender, or
> even crash.
>
> This seems like a crazy plan to me. Scheduling has never been allowed
> to just kill random processes. If you ever hit that case, then your
> customers will see random application crashes, glitches, GPU hangs,
> and be pretty unhappy with the result. And not because something was
> broken, but because somebody was impatient and an app was a bit slow.
>
> If you have work that is so mission critical, maybe you shouldn't run it
> on the same machine as one that runs applications which you care so
> little about that you're willing to watch them crash and burn. Don't
> run the entertainment system on the flight computer, so to speak.
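To make the trade-off in the i915 proposal concrete, here is a toy model of "preempt at the next arbitration point, otherwise reset the GPU once a timeout expires". It is a minimal sketch under stated assumptions, not i915 code: the function name `request_preemption`, the `PreemptResult` type, and the idea of modelling a batch as a sorted list of arbitration-point timestamps (in microseconds) are all hypothetical illustrations.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class PreemptResult:
    start_us: int  # when the realtime context gets the GPU
    reset: bool    # True if the running batch was evicted by a GPU reset


def request_preemption(arb_points_us: Sequence[int], request_us: int,
                       timeout_us: int) -> PreemptResult:
    """Hypothetical model of priority preemption backed by a reset watchdog.

    arb_points_us: sorted times at which the running batch reaches an
    arbitration point and can be preempted cooperatively.
    """
    deadline = request_us + timeout_us
    for t in arb_points_us:
        if t < request_us:
            continue  # this arbitration point has already passed
        if t <= deadline:
            # The batch yields in time: clean preemption, no work lost.
            return PreemptResult(start_us=t, reset=False)
        break  # the next arbitration point arrives too late
    # Watchdog fires: reset the GPU and evict the batch. Its commands may be
    # lost or replayed, which is the damage the reply above worries about.
    return PreemptResult(start_us=deadline, reset=True)


# A cooperative batch yields every 100 us; a GPU hog only yields after 5 ms.
print(request_preemption([100, 200, 300], request_us=150, timeout_us=1000))
print(request_preemption([5000], request_us=150, timeout_us=1000))
```

With a 1000 us timeout, the cooperative batch is preempted cleanly at its next arbitration point (200 us), while the hog is forcibly reset at the deadline (1150 us). In this model the timeout is the only knob: a shorter one bounds the realtime context's latency more tightly but turns more slow-but-correct batches into resets.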
I don't know what the automotive equivalent of 'that boat has already
sailed' is, but that car has already driven off (under the control of
those researchers in Wired). For better or worse, running infotainment
and the instrument-cluster UI on the same silicon is incredibly common
nowadays.
Virtualisation platforms have been big business for a while now, and
GPU sharing is already very much a part of that.
Cheers,
Daniel