[RFC dri3proto v6 01/14] Add modifier/multi-plane requests, bump to v1.1

Tue Feb 27 23:30:50 UTC 2018

Hi,
This reply covers the comments on v7 as well; anything I've snipped or
elided, I agree with.

On 27 February 2018 at 20:59, Keith Packard <keithp at keithp.com> wrote:
> Daniel Stone <daniel at fooishbar.org> writes:
>> Oof. Originally we tried to match the rest of the stack by using
>> formats rather than depth/bpp, but that rather breaks the current
>> model where DRI3 is a transport for a bag of bits, and Present is the
>> mechanism by which you make those bag of bits visible in a Window.
>
> Yeah, thanks for the good discussion, I was just poking at the
> boundaries here, and I now agree that depth+bpp is probably our best
> option. Including bpp lets us go beyond the core protocol and use
> formats which are different from image formats, and that seems like a
> good idea.

Never a bad thing to question assumptions! :)

>> Hm, why would they? For DirectColour, presumably either your only
>> difference from a hardware-accelerated TrueColor source is that you
>> stick a huge LUT at the end of the frag shader, or you have a software
>> pipeline.
>
> Nah, DirectColor has a LUT in the scanout engine so you can do colormap
> animation effects. As I said, it's not exactly popular these days for
> very good reasons.

I thought you needed a 3D LUT for DirectColor though, rather than the
3x 1D LUTs we have now? Anyway, onwards ...

>> DRI3 v1.2 - sent to the list in August but with very little comment -
>> adds support for dma-fence FDs. Michel questioned whether it would be
>> best to insert a GPU command-stream stall, or use an asynchronous
>> fence-completion mechanism to only later trigger presentation when the
>> fence has passed, similar to if the client had specified that
>> presentation should wait for a UST equivalent to when the fence was
>> signalled. My intuition is that he's right, but it would be great to
>> get more feedback on this.
>>
>> Louis-Francis hooked this up to the Vulkan WSI in Mesa, so you could
>> use real semaphore objects in QueuePresent and AcquireNextImage.
>>
>> https://lists.freedesktop.org/archives/xorg-devel/2017-August/054439.html
>> https://lists.freedesktop.org/archives/mesa-dev/2017-August/168201.html
>
> Those look like reasonable patches on a first read-through; I was a bit
> confused by the lack of any way for the server to wakeup when the DMA
> fence became readable? Does it not need to be able to do that in case
> the driver doesn't deal with it?

Discussion on IRC wasn't massively conclusive here, but it seems like
it's probably a good idea to follow this suggestion (made earlier by
Nicolai). That is, rather than inserting blocking waits into the
consumer client APIs (the GL command stream and KMS configuration
requests), we should poll on the fence and only 'execute' (per
Present's definition of the term) when the fence has been signalled.
Admittedly that does lose us pipelining, so may not be ideal for
performance when we're cutting it fine on overall capacity? Needs
further investigation.

>>>> > +   DRM_FORMAT_MOD_INVALID may be passed for 'modifier', in
>>>> > which
>>>> > +   case the driver may make its own inference as to the exact
>>>> > +   layout of the buffer(s).
>>>>
>>>> Presumably using information acquired externally?
>>>
>>> Or internally; some drivers add metadata to the buffer to know about
>>> tiling, etc.
>
> I'm not terribly excited about this name; is there some way we could
> make usage clearer in code?
> DRM_FORMAT_MOD_FIGURE_IT_OUT_AUTOMATICALLY_SOMEHOW seems a bit long
> though?

Clearer in intent, but sadly some years too late. That being said, if
we had a time machine yet could only make one change, I'd make the
token be 0 rather than 0xffffffff.

>> Since that's just advisory
>> text, we could explicitly make it 'undefined', or just leave it
>> completely unstated, or ...
>
> Well, I'd love to have this usage documented in the protocol spec so
> that when people read the code there's someplace that says what it
> means. And that would be more clear if we used a constant with a better
> name than INVALID.

I'll shedcraft some text.

>>   Synchronization of access to buffers shared between processes is not
>>   defined by this protocol [ed: well, until DRI3 v1.2 adds explicit fencing].
>>   Without the use of additional extensions not defined by the DRI3
>>   protocol as of version 1.1, synchronization between multiple processes
>>   and contexts is considered to follow the implicit model.
>>
>>   In this model, the underlying driver is responsible for enforcing a strict
>>   ordering such that any reads requested by one process or context are
>>   fulfilled before any writes requested by another process or context, as
>>   long as that read was definitively submitted before the write (e.g.
>>   a through a blocking read command such as glReadPixels or explicitly
>>   flushing the command stream through glFlush). A similar dependency
>>   exists for reads submitted after writes: the driver must ensure that the
>>   write is fully visible and coherent to the read request.
>>
>>   This restriction also applies for cross-device usage.
>
> Yes, a section like this would be awesome. Add it as a top-level section
> describing synchronization in the protocol.

Gotcha.

> As an X extension, I think we should be more focused on X APIs instead
> of higher level abstractions (like GL or Vulkan). In particular, I think
> we need things like this:
>
>         When Damage events for a pixmap are received by a client,
>         changes made through the X protocol will be visible in any
>         buffers associated through this extension.
>
>         Clients must ensure that all drawing done to buffers outside of
>         X are visible in those buffers before sending X requests
>         including a pixmap associated with them.
>
> As far as the bare protocol and implementation of the extension, it all
> looks good at this point. The synchronization stuff is just an old bug
> that needs fixing.

We need to be super careful about this. My interpretation of 'are
visible in those buffers' is that if I were to blindly DMA from the
underlying pages, I would see the result of that operation. That isn't
actually true ... I'll just bin the 'e.g.' and note something about
operations being recorded to an underlying driver which has global
responsibility for synchronising operations.

Cheers,
Daniel