[RFC dri3proto v6 01/14] Add modifier/multi-plane requests, bump to v1.1

Mon Feb 26 12:40:42 UTC 2018

Hi,

On 22 February 2018 at 08:52, Louis-Francis Ratté-Boulianne
<lfrb at collabora.com> wrote:
> On Wed, 2018-02-21 at 11:32 -0800, Keith Packard wrote:
>> Louis-Francis Ratté-Boulianne <lfrb at collabora.com> writes:
>> > index dac11d3..636c789 100644
>> > --- a/dri3proto.txt
>> > +++ b/dri3proto.txt
>> > @@ -1,11 +1,12 @@
>> >                       The DRI3 Extension
>> > -                        Version 1.0
>> > -                          2013-6-4
>> > +                        Version 1.1
>> > +                         2017-06-27
>> >
>> >                         Keith Packard
>> >                       keithp at keithp.com
>> >                       Intel Corporation
>>
>> You should add yourself to the authors list here.
>
> Actually, Daniel is the one who should add his name :)

Heh, right. For background, of the three new requests, two are obvious
analogues of 1.0 functions.

PixmapFromBuffer and BufferFromPixmap have width/height/depth/bpp
describing the Pixmap, and fd/stride/size describing the buffer. The
multi-buffer variant adds a modifier to the Pixmap descriptor, omits
the size from the buffer descriptor (adding an offset in its place),
and allows there to be up to four buffers attached. (Why four? Because
that's the limit encoded in drmModeAddFB2WithModifiers, the EGL
extension, and Chad's proposed Vulkan extension. Adding an arbitrary
number of buffers forced us to use the awkward XCB list API, for no
clear benefit I could see.)

GetSupportedModifiers is new, allowing the client to discover which
format modifiers the server supports when it imports the client
buffers, as well as what the server would prefer.

>> > @@ -199,6 +202,125 @@ The name of this extension is "DRI3"
>> >     associated with a direct rendering device that 'fence' can
>> >     work with, otherwise a Match error results.
>> >
>> > +┌───
>> > +    DRI3GetSupportedModifiers
>> > +   window: WINDOW
>> > +   depth: CARD8
>> > +   bpp: CARD8
>>
>> Hrm. You've uncovered a weird "feature" of the existing DRI3
>> protocol. Why does it include both depth and bpp? In the server, each
>> depth has a defined bpp for image format purposes, and (currently at
>> least), image bpp always matches the pixmap bpp. I'm afraid the
>> original
>> DRI3 proposal has no clues for us, and I'm afraid the author doesn't
>> remember...
>>
>> The only thing I can think is that including both allows us to catch
>> circumstances where the client guesses wrong about the bpp and
>> creates
>> an image in the wrong format. Checking a few drivers, I find that
>> modesetting uses it to detect this kind of error, but (at least)
>> nouveau
>> simply ignores it.

My only guess is that it was cribbed from the old drmModeAddFB API,
which uses the two to infer format; presumably at the time the author
wanted to be able to disambiguate XRGB8888 from RGB888. I would be
very happy to see the death of that API.

>> Should you use a visual here instead of depth? While visuals are
>> actually allowed to be shared across depths in the core protocol
>> specification, we currently rely on them being unique, and in fact
>> the
>> server implementation has always enforced that.
>>
>> I think using a visual would also let you support DeepColor visuals
>> and
>> gain access to those extended pixel formats. I'm not sure what bpp
>> would
>> need to be in those cases. Alternatively, we could explicitly allow
>> depth to be more than 32 here?

Oof. Originally we tried to match the rest of the stack by using
formats rather than depth/bpp, but that rather breaks the current
model where DRI3 is a transport for a bag of bits, and Present is the
mechanism by which you make those bag of bits visible in a Window.

Specifying a Visual muddies the line between the two; what happens
when you specify a DeepColor Visual for a DRI3 buffer which is
different from the Window's DeepColor visual? Or you mix and match
DeepColor and TrueColor? Or even more prosaically, a XBGR buffer
Visual and an XRGB Window Visual? Then you have Visuals which exist to
specify sRGB FBConfigs ...

I can only really see two possibilities. One is extra busywork to
ensure everything matches: look up and store the Visual, derive the
format to store bits in from the visual, then ensure at Present time
that it's 'compatible' with the window (to whatever extent
'compatible' is defined), and not do any conversions. The other is
getting into the world of implicit conversion: HDR <-> non-HDR
conversion, intra-HDR conversion, channel swizzling, maybe sRGB <->
linear encodings as well? Seems like a world of hurt.

It took me a little while to work out Michel and Eric's point from
review rounds, but given that we already have the Visual implacably
associated with the Window, I don't think adding one here is helpful
at all. But it would be trivial for any DeepColor implementation to
add a depth/bpp mapping which allowed FP1616 imports.

>> Using a visual would also allow other visual-dependent modifiers to
>> be
>> supported; one can imagine that TrueColor and DirectColor having a
>> different set of modifiers supported. Of course, anything other than
>> TrueColor isn't well supported these days anyways, so that's probably
>> less important.
>
> Wouldn't that be weird to have a totally different way of specifying
> the format from the old requests. I honestly don't know enough about
> DeepColor/HDR to really have an opinion about this though.

Hm, why would they? For DirectColour, presumably either your only
difference from a hardware-accelerated TrueColor source is that you
stick a huge LUT at the end of the frag shader, or you have a software
pipeline. For the former, modifiers aren't any different: everything
up to and including the texture sample is identical. For the latter,
it's only really relevant if you want to write software detiling I
guess ...

>> > +      ▶
>> > +   num_drawable_modifiers: CARD32
>> > +   num_screen_modifiers: CARD32
>> > +   drawable_modifiers: ListOfCARD64
>> > +   screen_modifiers: ListOfCARD64
>> > +└───
>> > +   Errors: Window, Match
>> > +
>> > +   For the Screen associated with 'window', return two lists
>> > of
>> > +   supported DRM FourCC modifiers, as defined in
>> > drm_fourcc.h,
>> > +   supported for DRI3 pixmap/buffer interchange. The first
>> > list
>> > +   'drawable_modifiers' contains modifiers that are specific
>> > to
>> > +   this window and should be used over the more generic
>> > +   'screen_modifiers' set.
>>
>> This is somewhat confusing. We use 'window' as a proxy for 'screen'
>> in
>> many requests, but in this request it doesn't seem to be just a proxy
>> as
>> you also have a set of modifiers that are specific to it.
>>
>> I think what makes sense is to use 'window' only to name the screen,
>> and
>> then to have the specific modifiers depend on the depth requested,
>> not
>> the drawable specified. That way you can reliably request information
>> about how to create pixmaps without needing to create a temporary
>> window
>> of the right format before using DRI3PixmapFromBuffers.
>
> Like was mentionned by Daniel on #xorg-devel, we really need to have
> the 'drawable' here and can't easily retrieve the drawable-specific
> modifiers just from the 'depth'. (We check if the window is a possible
> candidate for scanout to avoid useless client buffer reallocation).
>
> We don't explicitely state that in the spec though to be as general as
> possible. I guess it wouldn't hurt to add an example though if you
> think it would make things clearer.

Maybe something like:
  Return supported DRM FourCC modifiers for the specified 'window'.

  The first list of 'drawable_modifiers' contains a set of modifiers which
  the server considers optimal for the window's current configuration.
  Using these modifiers to allocate, even if locally suboptimal to the
  client driver, may result in a more optimal display pipeline, e.g. by
  avoiding composition.

  The second list of 'screen_modifiers', is the total set of modifiers
  which are acceptable for use on the Screen associated with 'window'.
  This set of modifiers will not change over the lifetime of the client.
  Using this set of modifiers to allocate may not result in a globally
  optimal pipeline, if separate 'drawable_modifiers' are available.

  The meaning of any modifier is canonically defined in drm_fourcc.h.

>> > +   Creates a pixmap for the direct rendering object
>> > associated
>> > +   with 'buffers'. Changes to pixmap will be visible in that
>> > +   direct rendered object and changes to the direct rendered
>> > +   object will be visible in the pixmap.
>
> That paragraph was directly copy/pasted from PixmapFromBuffer. So if
> things are confusing, we might want to update the documentation there
> as well.
>
>> Do we want to define synchronization mechanisms for this? We might
>> explicitly state the ordering between X rendering and Damage events,
>> for
>> instance?
>
> Do you mean adding a new synchronization mechanism (between X rendering
> and client rendering) that would only apply to this request (not
> PixapFromBuffer)? Was it considered problematic with DRI3 v1.0?

DRI3 v1.2 - sent to the list in August but with very little comment -
adds support for dma-fence FDs. Michel questioned whether it would be
best to insert a GPU command-stream stall, or use an asynchronous
fence-completion mechanism to only later trigger presentation when the
fence has passed, similar to if the client had specified that
presentation should wait for a UST equivalent to when the fence was
signalled. My intuition is that he's right, but it would be great to
get more feedback on this.

Louis-Francis hooked this up to the Vulkan WSI in Mesa, so you could
use real semaphore objects in QueuePresent and AcquireNextImage.

https://lists.freedesktop.org/archives/xorg-devel/2017-August/054439.html
https://lists.freedesktop.org/archives/mesa-dev/2017-August/168201.html

>> I assume that the pixmap is created for the screen associated with
>> 'drawable'?
>
> Indeed

It's the only possible answer, but yeah, would be good to specify this
explicitly, and retrofit that text into the 1.0 request.

>> > +   DRM_FORMAT_MOD_INVALID may be passed for 'modifier', in
>> > which
>> > +   case the driver may make its own inference as to the exact
>> > +   layout of the buffer(s).
>>
>> Presumably using information acquired externally?
>
> Or internally; some drivers add metadata to the buffer to know about
> tiling, etc.

Right, I think Keith means externally to the protocol, i.e. from the
protocol's point of view it's an implicit / implementation-defined way
of acquiring the information. Something like i915's GEM_BO_GET_TILING
ioctl, or amdgpu's GEM_GET_METADATA ioctl (22 bits of tiling data,
plus arbitrary 'metadata' which is only used to pass information about
auxiliary surfaces and mipmap levels). Since that's just advisory
text, we could explicitly make it 'undefined', or just leave it
completely unstated, or ...

>> > +   Returns direct rendering objects associated with 'pixmap'.
>> > +   Changes to 'pixmap' will be visible in the direct rendered
>> > +   objects and changes to the direct rendered objects will be
>> > +   visible in 'pixmap' after flushing and synchronization.
>>
>> What kind of flushing and synchronization? Maybe synchronization
>> stuff
>> should be added to the overall description of the extension so that
>> it
>> applies equally to all buffer/pixmap associations?

Yeah. First cut at something like this:
  Synchronization of access to buffers shared between processes is not
  defined by this protocol [ed: well, until DRI3 v1.2 adds explicit fencing].
  Without the use of additional extensions not defined by the DRI3
  protocol as of version 1.1, synchronization between multiple processes
  and contexts is considered to follow the implicit model.

  In this model, the underlying driver is responsible for enforcing a strict
  ordering such that any reads requested by one process or context are
  fulfilled before any writes requested by another process or context, as
  long as that read was definitively submitted before the write (e.g.
  a through a blocking read command such as glReadPixels or explicitly
  flushing the command stream through glFlush). A similar dependency
  exists for reads submitted after writes: the driver must ensure that the
  write is fully visible and coherent to the read request.

  This restriction also applies for cross-device usage.

>> > +   Precisely how any additional information about the buffer
>> > is
>> > +   shared is outside the scope of this extension.
>>
>> What additional information about the buffer might be relevant?
>
> Memory alignment, compression, tiling (when modifier is not
> used/supported).

Memory placement in particular.

Cheers,
Daniel