[Nouveau] [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces

Thu Feb 2 11:53:52 UTC 2023

Am 01.02.23 um 09:10 schrieb Dave Airlie:
> [SNIP]
>>> For drivers that don't intend to merge at all and (somehow) are
>>> capable of dealing with sparse regions without knowing the sparse
>>> region's boundaries, it'd be easy to make those gpuva_regions optional.
>> Yeah, but this then defeats the approach of having the same hw
>> independent interface/implementation for all drivers.
> I think you are running a few steps ahead here. The plan isn't to have
> an independent interface, it's to provide a set of routines and
> tracking that will be consistent across drivers, so that all drivers
> once using them will operate in mostly the same fashion with respect
> to GPU VA tracking and VA/BO lifetimes. Already in the tree we have
> amdgpu and freedreno which I think end up operating slightly different
> around lifetimes. I'd like to save future driver writers the effort of
> dealing with those decisions and this should drive their user api
> design so to enable vulkan sparse bindings.

Ok in this case I'm pretty sure this is *NOT* a good idea.

See this means that we define the UAPI implicitly by saying to drivers 
to use a common framework for their VM implementation which then results 
in behavior A,B,C,D....

If a driver strides away from this common framework because it has 
different requirements based on how his hw work you certainly get 
different behavior again (and you have tons of hw specific requirements 
in here).

What we should do instead if we want to have some common handling among 
drivers (which I totally agree on makes sense) then we should define the 
UAPI explicitly.

For example we could have a DRM_IOCTL_GPU_VM which takes both driver 
independent as well as driver dependent information and then has the 
documented behavior:
a) VAs do (or don't) vanish automatically when the GEM handle is closed.
b) GEM BOs do (or don't) get an additional reference for each VM they 
are used in.
c) Can handle some common use cases driver independent (BO mappings, 
readonly, writeonly, sparse etc...).
d) Has a well defined behavior when the operation is executed async. 
E.g. in/out fences.
e) Can still handle hw specific stuff like (for example) trap on access 
etc....
...

Especially d is what Bas and I have pretty much already created a 
prototype for the amdgpu specific IOCTL for, but essentially this is 
completely driver independent and actually the more complex stuff. 
Compared to that common lifetime of BOs is just nice to have.

I strongly think we should concentrate on getting this right as well.

> Now if merging is a feature that makes sense to one driver maybe it
> makes sense to all, however there may be reasons amdgpu gets away
> without merging that other drivers might not benefit from, there might
> also be a benefit to amdgpu from merging that you haven't looked at
> yet, so I think we could leave merging as an optional extra driver
> knob here. The userspace API should operate the same, it would just be
> the gpu pagetables that would end up different sizes.

Yeah, agree completely. The point is that we should not have complexity 
inside the kernel which is not necessarily needed in the kernel.

So merging or not is something we have gone back and forth for amdgpu, 
one the one hand it reduces the memory footprint of the housekeeping 
overhead on the other hand it makes the handling more complex, error 
prone and use a few more CPU cycles.

For amdgpu merging is mostly beneficial when you can get rid of a whole 
page tables layer in the hierarchy, but for this you need to merge at 
least 2MiB or 1GiB together. And since that case doesn't happen that 
often we stopped doing it.

But for my understanding why you need the ranges for the merging? Isn't 
it sufficient to check that the mappings have the same type, flags, BO, 
whatever backing them?

Regards,
Christian.

>
> Dave.