[Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

Fri Jul 28 10:46:02 UTC 2017

Hi,

On 28 July 2017 at 10:24, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 28.07.2017 09:44, Daniel Stone wrote:
>> No, I don't think it is. Tiled layouts still have a stride: if you
>> look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an
>> auxiliary compression/fast-clear buffer), iMX/etnaviv
>> tiled/supertiled, or VC4 T-tiled modifiers and how they're handled
>> both for DRIImage and KMS interchange, they all specify a stride which
>> is conceptually the same as linear, if you imagine linear to be 1x1
>> tiled.
>>
>> Most tiling users accept any integer units of tiles (buffer width
>> aligned to tile width), but not always. The NV12MT format used by
>> Samsung media decoders (also shipped in some Qualcomm SoCs) is
>> particularly special, IIRC requiring height to be aligned to a
>> multiple of two tiles.
>
> Fair enough, but I think you need to distinguish between the chosen stride
> and stride *requirements*. I do think it makes sense to consider the stride
> requirement as part of the format/layout description, but more below.

Right. Stride is a property of one buffer, stride requirements are a
property of the users of that buffer (GPU, display control, media
encode, etc). The requirements also depend on use, e.g. trying to do
rotation through your scanout engine can change those requirements.

>> It definitely seems attractive to kill two birds with one stone, but
>> I'd really much rather not conflate format description/advertisement,
>> and allocation restriction, into one enum. I'm still on the side of
>> saying that this is a problem modifiers do not solve, deferring to the
>> allocator we need anyway in order to determine things like placement
>> and global optimality (e.g. rotated scanout placing further
>> restrictions on allocation).
>
> Okay, the original issue here is that the allocator *cannot* determine the
> alignment requirement in the use case that prompted this sub-thread.
>
> The use case is PRIME off-loading, where the rendering GPU supports linear
> layouts with a 64 byte stride, while the display GPU requires a 256 byte
> stride.
>
> The allocator *cannot* solve this issue, because the allocation happens on
> the rendering GPU. We need to communicate somehow what the display GPU's
> stride requirements are.
>
> How do you propose to do that?

The allocator[0] in itself can't magically reach across processes to
determine desired usage and resolve dependencies. But the entire
design behind it was to be able to solve cross-device usage: between
GPU and scanout, between both of those and media encode/decode, etc.
Obviously it can't do that without help, so winsys will need to gain
protocol in order to express those in terms the allocator will
understand.

The idea was to split information into positive capabilities and
negative constraints. Modifier queries fall into the same boat as
format queries: you're expressing an additional capability ('I can
speak tiled'). Stride alignment, for me, falls into a negative
constraint ('linear allocations must have stride aligned to 256
bytes'). Similarly, placement constraints (VRAM possibly only
accessible to SLI-type paired GPU vs. GTT vs. pure system RAM, etc)
are in the same boat AFAICT. So this helps solve one side of the
equation, but not the other.

Cheers,
Daniel

[0]: Read as, 'the design we discussed for the allocator at XDC'.