X Gesture Extension protocol - draft proposal v1

Thu Aug 19 20:20:45 PDT 2010

On Wed, Aug 18, 2010 at 10:29:20AM -0400, Chase Douglas wrote:
> > * What's not clear is how exactly this integrates with the XI2.1 extension
> > proposal I sent the draft out for earlier. Is this supposed to be parallel
> > or exclusive, i.e., will a set of touchpoints that generate a gesture still be
> > sent as touchpoints to the client? Or will they be converted and the raw
> > touchpoint data thus discarded? Or is it to replace the other draft
> > completely (I don't think so, given section 1)?
> 
> First, I see multitouch and gestures as two separate concepts, even
> though they're closely linked. Multitouch is defined by the ability to
> send raw data about each touch wherever it is on the screen, while
> gestures are a grouping of multitouch touches as a higher order event.
> There are times when an application wants only multitouch events, and
> times when an application only wants gesture events. For example, Google Maps
> may just want pan and zoom gestures, while Inkscape only wants
> multitouch events for drawing.

I have to disagree very strongly here. IMO, gestures are merely an
interpretation of input events conveying a specific meaning, much in the
same manner as a double-click conveys a meaning.

And quite frankly, I'm pretty sure that most multitouch applications will
use _both_ gestures and multitouch in the future. That's an assumption based
on my beergut feeling; feel free to prove me wrong in a few years' time ;)

> The strict dependency here is the event stream to the GE that I didn't
> define as fully as I should have. The GE receives MT events through XI
> 2.1 as its raw input stream to perform recognition. It only makes sense
> to me to send these events using the XI 2.1 protocol instead of defining
> a new protocol just to send events to the GE.

Yeah, that makes sense. But why only allow touch events to be interpreted as
gesture events? Opera, I think, was the first browser to come out with mouse
gestures and I used to love it. There is a use-case for other gestures.

> > * If this supplements the XI2.1 MT proposal, how does gesture recognition work
> > when there's a grab active on any device?
> 
> Good question. I'll admit that I haven't fully worked out this issue. In
> Maverick we are actually siphoning off events inside the evdev input
> module, so gestures would supersede input device grabs. However, my goal
> is to move the gesture event handling inside the server where we could
> reverse the ordering. If I had to make an educated guess as to what
> should happen, I would say that an active grab should override gesture
> support for the grabbed device.
> 
> > * What kind of event stream is sent to the gesture engine (GE)? it just says
> >   "stream of events" but doesn't specify anything more.
> 
> XI 2.1 events. I'll be sure to make that more clear in the next
> revision.

Make sure to be clear _what_ type of events. XI 2.1 is not finished yet, the
proposed bits cover the touch parts of it but there are several other ideas
floating around that may (or more likely may not) go in.

Also, you realise that the draft I sent out is just that - a draft? You're
building on top of a potentially moving target here (just a warning).

> > * Why does the GE need to be registered by the X server? Kristian
> > asked that already and I don't think Section 3 answers this sufficiently.
> 
> I like the idea of a pluggable gesture engine. Gesture recognition is a
> new field IMHO, and to assume that we know the best mechanism for
> recognition seems a little naive :). I want people to be able to play
> with recognition to see if they can come up with better algorithms
> and/or better gestures. We allow this in the protocol by letting an X
> client be the gesture engine.

Gesture recognition is not really new; researchers have been all over it for
years if not decades. I really recommend looking through UIST, CHI, CSCW,
Tabletop and whatnot proceedings; I've seen several different attempts to
define gestures. There's unfortunately a lot of noise in those proceedings
and from my experience most implementations aren't usable immediately, but
you can find the occasional useful bit.

Bonus points - many researchers would be quite excited to have their ideas
end up in a real product so you can likely expect some collaboration.

> To me, this is similar to the kernel having multiple memory allocators
> and I/O schedulers. The first implementation of each has been thrown
> away by now, and we have a good selection of interesting approaches right
> now. Enabling the same capabilities in X seems appropriate to me, and
> having them exist outside of the X process space enables capabilities
> like threaded recognition.
> 
> That's not to say that we would oppose inclusion of a GE inside X
> itself, but having the GE be a separate client also means GEs can have
> different licensing and not be tied directly to the X release schedule.

I think my point may have been ambiguous. What I wanted to say is: I'm for X
not having _anything_ at all to do with the GE. X forwards the events to the
client, the client then may or may not pass it to the GE (e.g. over dbus). X
just sends the raw events, contextual interpretation of these events is done
purely client-side.

> > I'll be upfront again and restate what I've said in the past - gestures do
> > not, IMO, belong in the X server, so consider my view biased in this regard.
> > The X protocol is not the ideal vehicle for gesture recognition and
> > inter-client communication, especially given its total lack of knowledge of
> > user and/or session.
> 
> Believe me, we tried our hardest not to throw all this into the X
> server :). It's not that gesture recognition needs to be inside the
> server. The issue is correct event propagation. Gestures occur in
> specific regions of the screen, and as such they must be propagated with
> full knowledge of the X window environment.
> 
> Say you have one application with a parent window and a child window.
> Both windows select for MT input events. The application wants to
> receive gesture events on the parent window. I then make a gesture with
> some fingers in the parent window and some in the child window. Without
> the gesture recognition and propagation inside the X server, some of the
> input events would be sent to the parent window and some to the child
> window. It becomes very difficult to assimilate all the data properly
> for gestures if the raw inputs are spread out among various child
> windows.

OTOH, something that may look like a gesture when viewed from the parent
window may indeed be independent interaction in two different windows.
Humans are (un)surprisingly adept at using two hands independently and
ruling out this use-case by assuming gestures by default is limiting.

The much better approach here is to teach users not to make wrong gestures.
I recommend a read of "Ripples: Utilizing Per-Contact Visualizations to
Improve User Interaction with Touch Displays" by Wigdor et al.

> > I think the same goal could be achieved by having a daemon that communicates
> > out-of-band with the applications to interpret gestures as needed, in the
> > spirit of e.g. ibus.
> 
> I'm having a hard time finding any documentation on how ibus interacts
> with X and applications for input. Can you provide any details on how it
> works?

Short story: the app gets a key event from the server and passes it over dbus
to ibus; ibus passes back the actual symbol depending on the language selected.

> > In your reply to Kristian, you said :
> > > Also, we think that there's a case to be made for environmental gestures
> > > that should override gestures recognized by clients. This is provided by
> > > the mutual exclusion flag when selecting for events. Again, this
> > > wouldn't be possible without integrating it into the X propagation
> > > mechanism.
> > http://lists.freedesktop.org/archives/xorg-devel/2010-August/012045.html
> > 
> > Unless I screwed up, the passive grab mechanism for touches allows you to
> > intercept touchpoints and interpret them as gestures as required. The mutex
> > mask described below is like a synchronised passive grab mask. 
> > The major difference here is that unlike a truly synchronised grab, the
> > events are being continuously sent to the client until the replay is
> > requested (called "flushing" here in this document). I think this
> > requirement can be folded into the XI2.1 draft.
> 
> I think your idea would be useful for sending events to the gesture
> engine and then replaying input events if no gestures are found. That
> could simplify some of the GE part of this spec.
> 
> However, my reply to Kristian is more about a "passive grab" mechanism
> for gestures themselves. This partly comes back to the fact that you
> can't easily do proper gesture propagation outside of X, so you have to
> be able to define a passive grab mechanism for gestures.
> 
> > >  Gesture primitives comprise events such as pans, pinches, and rotation.
> > 
> > Question: what are some more gestures that are not pan/swipe (horizontal or
> > vertical), pinch/zoom and rotation? Can you think of any?
> 
> Tapping, which we support as well, but I don't think that's your real
> point :).
> 
> > The problem with gestures is that there are a million of them, but at the
> > same time there are only a handful that are "obvious". Beyond that you're in
> > uncharted territory and _highly_ context dependent (at this point I'd like
> > to thank Daniel Wigdor for a highly interesting talk I recently attended
> > that outlined these issues).
> 
> Yes, I agree. I have yet to think of any further gesture types that
> would be obvious. However, I don't want to exclude the possibility of
> recognizing other primitives at a later time.
> 
> If your question is geared more towards why we call these primitives
> instead of just gestures, it comes from an idea we have that primitives
> may be strung together at a high level (maybe a toolkit?) to have some
> predefined meaning. For example, a DJ application may define a gesture
> sequence as two fingers down, release one finger, drag one finger, then
> tap the second finger again. This gesture may be defined to have a
> specific meaning when it occurs over a mixer level control. This is very
> much a new idea, though; it hasn't been implemented and tested yet.

IMO, this is _way_ too complicated to be regarded as a gesture by the
engine. And this is my main grievance here: by having a single engine you
require that engine to be a catchall one. Having a completely client-side
engine allows you to do crazy gestures in one app but have other apps use a
simpler engine.

Use-case of Firefox again: how does FF deal with gestures on other OSes? Does
it take what the OS provides or does it have its own engine? If the latter,
it's likely that FF does not want to rely on any other engine to keep the
behaviour consistent across platforms.
(I might be wrong here, please let me know if this is the case)

Substitute FF with any app, just to be sure.

> > One example (though gestures are notoriously difficult to explain via
> > email): 
> > I have a map application like google maps. I put two fingers down and move
> > them apart. What does the app do?  Zoom? No, what I'd like it to do is to
> > move the two waypoints I touched to two different locations. But only the
> > app that controls these objects can know that. The app needs to decide
> > whether this particular event is to be a gesture or not.
> > 
> > The current proposal allows for an all-or-nothing approach, so the app
> > above must choose between supporting this particular type of interaction
> > and supporting the X-provided gestures. A more flexible approach would be to have the app
> > hand events to the GE, thus making gestures a last resort if no other
> > meaningful interaction can be performed.
> > (Exactly the inverse of the AllowEvents approach which only flushes the
> > events once no gesture has been recognised)
> 
> I think the easiest answer to this question is to give an overview of
> how this would work when the application is running inside the Unity
> window manager:
> 
> Our overall idea is that three and four finger initiated gestures are
> window manager/environment gestures, and two finger initiated gestures
> are application gestures. The Unity window manager will listen for three
> and four finger initiated gestures, and it will use these gestures for
> window management and other high level environment use cases. Firefox
> will listen for two finger initiated events.
> 
> Firefox will likely listen for two finger scroll on the entire page for
> normal pages. When a web page requests multitouch input in a given
> region, like a Google map, FF would create a new subwindow for it and
> cut out the subwindow's input region from the web page parent window
> using X Shape. The subwindow would then be free to listen to all one or
> two finger XI 2.1 input events, or more fingers as long as the touches
> are initiated with one or two fingers.
> 
> Of course, if FF only listened to XI 2.1 events instead of using X
> Gesture, then this simplifies greatly.
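A rough C sketch of the finger-count split described above. It assumes the owner of a touch sequence is fixed by the number of touches down when the sequence starts, with one-finger sequences left as plain XI 2.1 events. The enum and function names are hypothetical and not part of the draft, and routing five or more initial touches to raw events is a guess, since the description above does not cover that case.

```c
/* Hypothetical routing policy modelled on the Unity description above.
 * The owner of a touch sequence is decided once, from the number of
 * touches down when the sequence starts; touches added later do not
 * change it. */
enum gesture_owner {
    OWNER_RAW_EVENTS,     /* plain XI 2.1 touch events, no gesture */
    OWNER_APPLICATION,    /* two-finger-initiated application gestures */
    OWNER_WINDOW_MANAGER  /* three- or four-finger environment gestures */
};

enum gesture_owner route_by_initial_touches(int initial_touches)
{
    if (initial_touches == 3 || initial_touches == 4)
        return OWNER_WINDOW_MANAGER;
    if (initial_touches == 2)
        return OWNER_APPLICATION;
    /* One finger, and (as an assumption) five or more, fall through to
     * raw XI 2.1 events. */
    return OWNER_RAW_EVENTS;
}
```

Because the decision is keyed only on the initial count, a sequence started with two fingers stays with the application even if more fingers join later, which matches the "as long as the touches are initiated with one or two fingers" rule.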

This is a problem then. As soon as an approach simplifies things greatly,
you've found what most developers will want to use, be it out of
convenience, laziness, or any other reason.

> To answer the last part of your thought, having the client hand events
> to the gesture engine breaks down when gestures occur over multiple
> windows that select for input events.

I do wonder how often this happens. The fat-finger problem is well-known but
many gestures have to at least _initiate_ within the window they are to be
used in (note that I am thinking about direct-touch interaction here). I'm
trying hard to think of a meaningful gesture that works across multiple
windows and cannot be solved with grabs but so far I have not come up with
any use-cases.
Note that gesture input is reasonably new and users won't be used to it
either way. So there is some amount of "getting used to it" that we can
expect too, once we're past the initial confusion.

> > > When a gesture engine recognizes a primitive that occurs within a window
> > > with a client that selected to receive events for the primitive, a gesture
> > > event is sent to the client.
> > 
> > What happens if a group of touches represent a gesture but another group of
> > touches present are not represented as part of a gesture? Use-case: I pinch
> > to zoom while trying to rotate with another finger in the same app.
> 
> Multiple gestures can be performed at the same time, whether in one
> window or many windows. The gesture engine can separate out input events
> into separate groupings as it sees fit before performing gesture
> recognition. If one grouping is a gesture but another isn't recognized,
> then there may be gesture events and XI 2.1 events interleaved.
> 
> > > It is necessary for a gesture engine to recognize mutually exclusive gesture
> > > primitives for each set of primitives defined for a given number of touches.
> > 
> > In light of the often-used pinch and rotate gestures - how do you plan to
> > handle this then? Require the user to lift the fingers before rotating the
photo they just zoomed? I doubt it, but I don't quite know how this is
> > supposed to work given the details here.
> 
> If possible, I would suggest giving what we have in Maverick a spin to
> see how this works.

I'd like to but I lack the hardware.

> Say you have a picture on screen that you can drag, rotate, and resize
> at will. You would listen for pan, pinch, and rotate gestures. When you
> start touching the screen, you begin a series of gesture primitives. If
> you pinch and then rotate, you'll see a pinch primitive start, update,
> then end at the same time a rotate primitive starts. If you are
> listening to all three of these primitives, you end up seeing a lot of
> gesture primitive transitions over the lifetime of a gesture.
> 
> Perhaps I should give such an example in the protocol document?
> 
> > On Mon, Aug 16, 2010 at 11:13:20AM -0400, Chase Douglas wrote:
> > > 
> > >                             The X Gesture Extension
> > >                                   Version 1.0
> > 
> > 1.0 is optimistic if you haven't had any exposure yet, I recommend changing
> > this to something pre 1.0, especially given the massive draft proposal
> > warning just below.
> 
> My idea was that this protocol as proposed can change, but once people
> agree to it and an implementation is available, it then becomes version
> 1.0 and is no longer a draft proposal. If it makes more sense to
> you or others to call it 0.9.x until it's really implemented, then
> we can switch to that scheme.

I think that's a better approach, yes. I don't have a problem with calling
it 1.0 once it's set in stone.

> > > ********************************************************************************
> > > ********************************************************************************
> > > **************************                           ***************************
> > > **************************    DRAFT PROPOSAL (v1)    ***************************
> > > **************************                           ***************************
> > > ********************************************************************************
> > > ********************************************************************************
> > > 
> > > 
> > > 1. Introduction
> > > 
> > > The X Gesture Extension is a mechanism to provide the following:
> > > - Interface for X clients to register and receive gesture primitive events
> > > - Interface for an X client to act as a gesture engine
> > > 
> > > Gestures may be seen as logical groupings of multitouch input events. Thus,
> > > this extension is dependent on the X Input Extension version 2.1, which
> > > implements multitouch input support.
> > > 
> > >                               ❧❧❧❧❧❧❧❧❧❧❧
> > > 
> > > 2. Notations used in this document
> > 
> > <skip>
> > 
> > > 
> > >                               ❧❧❧❧❧❧❧❧❧❧❧
> > > 3. Data types
> > > 
> > > DEVICE { DEVICEID, AllDevices }
> > >         A DEVICE specifies either an X Input DEVICEID or AllDevices.
> > 
> > AllMasterDevices is missing here, and in the other mentions below.
> 
> This is intentional. Gestures are tied to absolute input devices, and
> properties are given in screen coordinates. Thus, you should only be
> listening for input on individual devices themselves, not the aggregates
> that are master devices.
> 
> I'm not 100% convinced of this approach though, it just feels right to
> me. I'd be happy to add in master devices if it makes sense.

I think it's best to add them here. While aggregate master devices may not
make a lot of sense for direct touch devices (well, maybe in clone mode,
but...) I don't see why we shouldn't just pass them through.

> In Maverick, since we're siphoning events off inside the input module,
> master input devices will never have gesture events.
> 
> > > GESTUREID { CARD16 }
> > >         A GESTUREID is a numerical ID for an X Gesture type currently available
> > >         in the server. The server may re-use a gesture ID if available gesture
> > >         types change.
> > > 
> > > GESTUREFLAG { MUTEX }
> > >         A flag telling the server to not propagate gestures to child clients
> > >         when a gesture type in the associated mask is set. The gesture will only
> > >         be sent to the registering client.
> > >         When registering for gestures, a client using this flag may not listen
> > >         for gesture IDs that any other client has registered for with the MUTEX
> > >         flag.
> > > 
> > > GESTUREMASK
> > >         A GESTUREMASK is a binary mask defined as (1 << gesture ID). A
> > >         SETofGESTUREMASK is a binary OR of zero or more GESTUREMASK.
> > > 
> > > GESTUREPROP { property:             ATOM
> > >               property_type:        ATOM }
> > >         A GESTUREPROP is the definition of a single property of a gesture. The
> > >         property field specifies a label for the gesture. The property_type
> > >         field specifies the data type of the property. For example, the property
> > >         type may be the atom representing "FLOAT" for an IEEE 754 32-bit
> > >         representation of a floating point number. Where applicable, both the
> > >         property and the type should conform to the standard gesture
> > >         definitions.
> > > 
> > > EVENTRANGE { start:                 CARD16
> > >              end:                   CARD16 }
> > >         An EVENTRANGE specifies a range of event sequence numbers, inclusively.
> > > 
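For illustration, a minimal C sketch of the GESTUREMASK bit layout and the inclusive EVENTRANGE check. It assumes the mask is a byte array with bit (1 << id) stored LSB-first in byte id >> 3, as XI2's XISetMask() does, and that mask_len counts 4-byte units as GestureSelectEvents states. The wraparound handling for CARD16 sequence numbers is an assumption the draft does not address, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set bit (1 << gesture_id) in a byte-array mask, LSB-first per byte
 * (assumed layout, same as XI2's XISetMask). */
void gesture_mask_set(uint8_t *mask, uint16_t gesture_id)
{
    mask[gesture_id >> 3] |= (uint8_t)(1u << (gesture_id & 7));
}

bool gesture_mask_is_set(const uint8_t *mask, uint16_t gesture_id)
{
    return (mask[gesture_id >> 3] >> (gesture_id & 7)) & 1u;
}

/* mask_len in 4-byte units needed to hold the bit for max_gesture_id. */
uint16_t gesture_mask_len(uint16_t max_gesture_id)
{
    return (uint16_t)(max_gesture_id / 32 + 1);
}

/* EVENTRANGE is inclusive at both ends. Modular arithmetic lets a range
 * wrap past 65535 (an assumption; the draft does not say). */
typedef struct {
    uint16_t start;
    uint16_t end;
} EventRange;

bool event_range_contains(EventRange r, uint16_t seq)
{
    return (uint16_t)(seq - r.start) <= (uint16_t)(r.end - r.start);
}
```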
> > >                               ❧❧❧❧❧❧❧❧❧❧❧
> > > 4. Gesture Primitives and Events

> > > 5.1 Gesture Engine Operation
> > > 
> > > Once a gesture engine is registered, it will begin receiving a stream of events
> > > from the X server. The events are buffered inside the server until a request by
> > > the engine is received with instructions for how to handle the events.
> > > 
> > > When the engine recognizes a gesture primitive, it sends a gesture event to the
> > > server with the set of input event sequence numbers that comprise the gesture
> > > primitive. The server then selects and propagates the gesture event to clients.
> > > If clients were selected for propagation, the input events comprising the
> > > gesture primitive are discarded. Otherwise, the input events are released to
> > > propagate through to clients as normal XInput events.
> > > 
> > > When the engine does not recognize any gesture primitive for a set of input
> > > events, it sends a request to the server to release the input events to
> > > propagate through to clients as normal XInput events.
> > > 
> > > The server may set a timeout for receiving requests from the gesture engine. If
> > > no request from the engine is received within the timeout period, the server may
> > > release input events to propagate through to clients as normal XInput events.
> > > The timeout value is implementation specific.
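To make the three outcomes above concrete, here is a hypothetical C sketch of the server's decision for one batch of buffered events. The enum and function names are illustrative only, the sketch assumes the server always applies the optional timeout, and timeout_ms is left to the caller since the draft calls it implementation specific.

```c
#include <stdbool.h>

enum engine_reply {
    REPLY_NONE,        /* nothing heard from the engine yet */
    REPLY_GESTURE,     /* engine recognized a primitive for these events */
    REPLY_NO_GESTURE   /* engine asked the server to release the events */
};

enum buffer_action {
    ACTION_KEEP_BUFFERED,  /* keep waiting for the engine */
    ACTION_DISCARD,        /* gesture event delivered; drop raw events */
    ACTION_RELEASE         /* deliver raw events as normal XInput events */
};

enum buffer_action resolve_buffered_events(enum engine_reply reply,
                                           bool gesture_has_listener,
                                           unsigned elapsed_ms,
                                           unsigned timeout_ms)
{
    switch (reply) {
    case REPLY_GESTURE:
        /* Raw events are only discarded if some client was selected to
         * receive the gesture event; otherwise they propagate normally. */
        return gesture_has_listener ? ACTION_DISCARD : ACTION_RELEASE;
    case REPLY_NO_GESTURE:
        return ACTION_RELEASE;
    case REPLY_NONE:
    default:
        /* The draft says the server "may" time out; assume it always does. */
        return elapsed_ms >= timeout_ms ? ACTION_RELEASE : ACTION_KEEP_BUFFERED;
    }
}
```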
> > 
> > Any guesses as to what this timeout may be? Do we have any data on how long
> > the average gesture takes to be recognised?
> 
> Currently we do all the event processing in the same thread as the rest
> of X, and I think there's minimal latency such that it doesn't impact
> performance. The gesture recognition code is only a few hundred lines.
> 
> Since we haven't split it out into a separate client yet, we haven't had
> to deal with acceptable GE latency. I think there's some research that
> says any UI latency above 100 ms can become an issue, so perhaps a
> timeout value based on that would be a good starting point?

As I said in the other email, the GE implementation is the least of your
worries. It's the nature of gestures and how quickly they can be identified.

> > As I read this at the moment, this approach means that _any_ touch event is
> > delayed until the server gets the ok from the GE. The passive grab approach
> > for gesture recognition also means that any event is delayed if there is at
> > least one client that wants gestures on the root window. What's the impact
> > on the UI here?
> 
> I don't think I understand :). The gesture engine does its recognition
> and then hands gesture events off to the server or tells the server to
> allow XI 2.1 events it's queued up. At this point, it's just a matter of
> propagation and selection. Maybe your argument is that you have to check
> the full lineage from the child window to the root window to find if
> anyone is listening for the gesture event, but that shouldn't take very
> long.

My argument here is that you get an input event, send it to the GE, then
wait for the GE to return with some value before you either send the raw
events or the gesture event. Given a timeout of e.g. 50 ms, how much does
this accumulate to before the actual event arrives at the client?

I don't know how the GE works, but if you send multiple subsequent events to
the GE, does the timeout accumulate or reset on each event?
E.g., finger 1 sets the timeout to 50 ms, but 40 ms into it another finger
arrives. You now need to wait another 50 ms before you can give the go. For a
4 finger gesture, you're up to 200 ms already before the GE can give the go
or no-go for the gesture. By the time the event actually arrives, a delay is
surely noticeable.
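The two timeout policies in question can be compared with a small C sketch. Both functions are hypothetical: decision_time_reset() restarts the timeout whenever a touch arrives before the current deadline, while decision_time_fixed() counts only from the first touch. arrivals_ms is assumed to be sorted and non-empty.

```c
#include <stddef.h>

/* Policy A: every touch arriving while the decision is still pending
 * restarts the timeout. Returns the time at which the GE must decide. */
unsigned decision_time_reset(const unsigned *arrivals_ms, size_t n,
                             unsigned timeout_ms)
{
    unsigned deadline = arrivals_ms[0] + timeout_ms;
    for (size_t i = 1; i < n; i++) {
        if (arrivals_ms[i] >= deadline)
            break;  /* decision was already made before this touch */
        deadline = arrivals_ms[i] + timeout_ms;
    }
    return deadline;
}

/* Policy B: the deadline is fixed by the first touch of the sequence. */
unsigned decision_time_fixed(const unsigned *arrivals_ms, size_t n,
                             unsigned timeout_ms)
{
    (void)n;
    return arrivals_ms[0] + timeout_ms;
}
```

With touches at 0, 40, 80 and 120 ms and a 50 ms timeout, the resetting policy decides at 170 ms; if each touch lands just before the deadline (0, 49, 98, 147 ms) it decides at 197 ms, approaching the 4 * 50 = 200 ms worst case. The fixed policy always decides at 50 ms, which bounds latency but may decide before all fingers of a slowly formed gesture are down.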

> > >         ▶
> > >         num_gestures:           CARD16
> > >         gestures:               LISTofGESTUREMAP
> > >     └───
> > > 
> > >     GESTUREMAP  { gesture_id:           GESTUREID
> > >                   gesture_type:         ATOM }
> > > 
> > >     GestureQueryAvailableGestures details information about the gesture types
> > >     that are recognizable by the gesture recognizer registered with the server.
> > > 
> > >     Each gesture is detailed as follows:
> > >     gesture_id
> > >         The unique ID of the gesture. Gesture IDs may be re-used when the
> > >         available gestures to be recognized change.
> > >     gesture_type
> > >         An ATOM specifying the type of gesture. The gesture type should conform
> > >         to the list of standard gesture types if applicable.
> > 
> > Why do we need id and type? Can there be more IDs of one type? The
> > QueryProperties request doesn't allow for per-id selection, so the ID seems
> > superfluous; it might as well be just the type, right?
> 
> Type should be a universally global identifier. Any gesture engine that
> can recognize a two-finger pinch should use the same type, "2-Pinch"
> perhaps. A thought of mine is to publish a standard for gesture types
> which could be extended as more useful primitives are defined, but maybe
> there really are only four "obvious" gestures in this world :).
> 
> When a GE registers with the server, the type is assigned to an ID,
> which is also used as the bitmask position for selecting events. I think
> I can make that clearer in a second revision.

Or you could just supply a list of atoms in the SelectEvents request and
convert this to bitmasks internally. I don't really see the benefit the
bitmasks provide to the client and we're not strapped for bandwidth here
either.
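A minimal C sketch of that suggestion: the client sends gesture type atoms, and the server maps them to its internal ID bitmask. The GestureTable structure, the 64-ID limit, and the -1 error return (where a real server would presumably send BadValue) are assumptions for illustration; Atom here is a local stand-in typedef, not the Xlib type.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Atom;  /* stand-in for the 32-bit protocol ATOM */

/* Hypothetical server table filled in at GestureEngineRegister time:
 * types[i] is the gesture type atom assigned gesture ID i. */
typedef struct {
    const Atom *types;
    size_t count;
} GestureTable;

/* Convert a list of gesture type atoms to the internal bitmask.
 * Returns 0 on success, -1 if an atom is not a registered type. */
int atoms_to_mask(const GestureTable *t, const Atom *atoms, size_t n,
                  uint64_t *mask_out)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < n; i++) {
        size_t id;
        for (id = 0; id < t->count; id++)
            if (t->types[id] == atoms[i])
                break;
        if (id == t->count || id >= 64)
            return -1;  /* unknown type, or beyond this sketch's limit */
        mask |= UINT64_C(1) << id;
    }
    *mask_out = mask;
    return 0;
}
```

This keeps the ID-to-bit mapping a server-internal detail, so IDs can be reassigned when the engine changes without clients ever seeing them.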

> > >     GestureListenForGestureChanges registers the client to be sent a
> > >     GesturesChanged event.
> > > 
> > >     listen
> > >         Nonzero if the client wants to receive GestureChanged events
> > 
> > I don't quite see why this is necessary. If a client doesn't care, let it
> > not register the event mask, otherwise just send the event and let the
> > client ignore it. A simple bit in SelectEvents is enough here.
> 
> I want a client to be able to connect while a GE is unavailable, but
> then receive an event when a GE registers so it knows it can begin
> listening for gestures. One way we could do this is to define the
> GestureChanged event as one of the events in the bitmask as you
> suggested.
> 
> > > 
> > >     On receipt of a GestureChanged event, the client may send the
> > >     GestureQueryAvailableGestures request to receive the new list of available
> > >     gestures.
> > >     Note that all clients listening for any gesture events on any window will
> > >     receive a GestureChanged event regardless of whether they have called this
> > >     request with any value for listen. However, the server will honor the last
> > >     listen value sent in this request whenever the client is not listening for
> > >     gesture events.
> > 
> > I'm confused...
> 
> Hopefully my comments above help sort out the issue :).
> 
> > >     ┌───
> > >         GestureQueryGestureProperties
> > >         gesture_type:           ATOM
> > >         ▶
> > >         num_properties:         CARD16
> > >         properties:             LISTofGESTUREPROP
> > >     └───
> > > 
> > >     GestureQueryGestureProperties details properties of the requested gesture
> > >     type.
> > > 
> > >     ┌───
> > >         GestureSelectEvents
> > >             window:         Window
> > >             device_id:      CARD16
> > >             flags:          SETofGESTUREFLAG
> > >             mask_len:       CARD16
> > >             init_mask:      GESTUREMASK
> > >             cont_mask:      GESTUREMASK
> > >     └───
> > > 
> > >     window
> > >         The window to select the events on.
> > >     device_id
> > >         Numerical device ID, or AllDevices.
> > >     flags
> > >         Flags that may affect gesture recognition, selection, or propagation.
> > >     mask_len
> > >         Length of mask in 4 byte units.
> > >     init_mask
> > >         Gesture mask for initiation. A gesture mask for an event type T is
> > >         defined as (1 << T).
> > 
> > Don't do this (1 << T) thing; this wasn't one of the smarter decisions in
> > XI2. Simply define the masks as they are, don't bind them to event types.
> > Though it hasn't become a problem yet, I already ran into a few proposals
> > where this would either be too inflexible or would create holes in the mask
> > sets (latter not really a problem, but...).
> 
> This was lazy copy and paste from me :). The proposal should read:
> 
>     init_mask
>         Gesture mask for initiation. A gesture mask for an event ID I is
>         defined as (1 << I).
> 
> Does the distinction between types and IDs resolve your issue, or are
> you referring to some other issue?

I think the specification as (1 << I) may cause issues long term.

> > >     cont_mask
> > >         Gesture mask for continuation. A gesture mask for an event type T is
> > >         defined as (1 << T).
> > > 
> > >     GestureSelectEvents selects for gesture events on window.
> > > 
> > >     The mask sets the (and overwrites a previous) gesture event mask for the
> > >     DEVICE specified through device_id. The device AllDevices is treated as a
> > >     separate device by the server. A client's gesture mask is the union of
> > >     AllDevices and the per-device gesture mask.
> > 
> > I'd add a reference to the XI2 definition of AllDevices and AllMasterDevices
> > event mask handling here to avoid duplicating (and possibly accidentally
> > changing) the definition.
> 
> Alright. I'll fix it up for the next revision.
> 
> > >     The removal of a device from the server unsets the gesture masks for the
> > >     device. If a gesture mask is set for AllDevices, the gesture mask is not
> > >     cleared on device removal and affects all future devices.
> > > 
> > >     If mask_len is 0, the gesture mask for the given device is cleared. However,
> > >     a client requesting mutual exclusion may register a mask of any valid
> > >     mask_len with all bits set to 0. This allows a mutual exclusion client
> > >     to prohibit any other client from gaining exclusive privilege.
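The selection rules above can be sketched in C as follows. This is a hypothetical model, not the draft's implementation: it stores only init_mask (cont_mask would be handled identically), truncates masks to 32 bits, assumes device ID 0 means AllDevices as in XI2, and does not attempt to model the MUTEX flag.

```c
#include <stdint.h>

#define MAX_DEVICES 16  /* hypothetical limit; device_id must be below it */
#define ALL_DEVICES 0   /* XI2 uses device ID 0 for AllDevices */

/* One client's selection on one window, indexed by device ID. */
typedef struct {
    uint32_t init_mask[MAX_DEVICES];
} ClientSelection;

/* Each request overwrites the previous mask; mask_len == 0 clears it. */
void select_events(ClientSelection *s, int device_id, uint16_t mask_len,
                   uint32_t init_mask)
{
    s->init_mask[device_id] = mask_len ? init_mask : 0;
}

/* The client's effective mask for a device is the union of the
 * AllDevices mask and the per-device mask. */
uint32_t effective_init_mask(const ClientSelection *s, int device_id)
{
    return s->init_mask[ALL_DEVICES] | s->init_mask[device_id];
}

/* Removing a device clears its own mask; the AllDevices mask survives
 * and keeps applying to devices added later. */
void device_removed(ClientSelection *s, int device_id)
{
    s->init_mask[device_id] = 0;
}
```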
> > > 
> > >     ┌───
> > >         GestureGetSelectedEvents
> > 
> > <skip>
> > 
> > >     ┌───
> > >         GestureGetAllSelectedEvents
> > >             window:         Window
> > >             ▶
> > >             num_masks:      CARD8
> > >             masks:          LISTofCLIENTEVENTMASK
> > >     └───
> > > 
> > >     CLIENTEVENTMASK { client_id:   CLIENT,
> > >                       device_id:   DEVICE,
> > >                       mask_len:    CARD8,
> > >                       init_mask:   GESTUREMASK
> > >                       cont_mask:   GESTUREMASK }
> > > 
> > <skip>
> > > 
> > >     GestureGetAllSelectedEvents retrieves the gesture selections for all clients
> > >     on the given window.
> > 
> > Is there a specific need for this request?
> 
> One idea is to make a legacy translation application. It is a client of
> the server that listens for gestures such as pinch and pan to do zooming
> and scrolling. It would then translate the gesture events into key and
> button events and send them to clients using XTest. Note that this is
> useful only until toolkits start listening to gestures to interpret them
> properly.
> 
> In order to do this correctly, the application needs to listen only on
> windows where no other client is already listening for gestures.
> This request provides a way to determine if any other clients are
> listening already.
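A rough sketch of the check such a translator might make on the GestureGetAllSelectedEvents reply. ClientEventMask mirrors CLIENTEVENTMASK with mask_len implied by the byte-string length; both it and the helper function are hypothetical. The sketch assumes that an all-zero mask (for example a mutual-exclusion placeholder) does not count as listening.

```python
from dataclasses import dataclass


@dataclass
class ClientEventMask:
    # Mirrors CLIENTEVENTMASK; mask_len is len(init_mask) == len(cont_mask).
    client_id: int
    device_id: int
    init_mask: bytes
    cont_mask: bytes


def another_client_listening(masks, own_client_id):
    """Return True if any other client selects at least one gesture bit."""
    for m in masks:
        if m.client_id == own_client_id:
            continue  # our own selections do not block us
        if any(m.init_mask) or any(m.cont_mask):
            return True
    return False
```

The translator would then select gestures only on windows where this returns False.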
> 
> I tried to find out how a client could get the XI 2 event selection mask
> of all clients, but I couldn't find any mechanism. If there is a
> mechanism, let me know and I'll take a look to see if it would work here
> as well.

There isn't one; I didn't really see the need for it. XI 1.x has it, but it
handles devices very differently to XI2, so that wasn't really a reference
point.

> > > 7.2 Gesture engine client requests
> > > 
> > >     ┌───
> > >         GestureEngineRegister
> > >             num_gestures:   CARD16
> > >             gestures:       LISTofGESTUREINFO
> > >     └───
> > > 
> > >     GESTUREINFO { gesture_type:      ATOM,
> > >                   num_properties:    CARD16,
> > >                   properties:        LISTofGESTUREPROP }
> > > 
> > >     GestureEngineRegister is the mechanism by which a gesture engine registers
> > >     with the X Gesture extension to be able to process gesture events. Only one
> > >     gesture engine may be registered to the server at any given time. Further
> > >     registration requests will cause a GestureEngineRegistered error.
> > >     When the gesture engine is registered successfully, a GesturesChanged event
> > >     is sent to all clients registered to listen for the event. The clients may
> > >     then request the new list of available gestures from the server.
> > > 
> > >     ┌───
> > >         GestureEngineUnregister
> > >     └───
> > > 
> > >     GestureEngineUnregister unregisters the gesture engine from the server. If
> > >     the client has not registered a gesture engine successfully through the
> > >     GestureEngineRegister request, a BadValue error will result. Otherwise, a
> > >     GesturesChanged event will be sent to all clients registered to listen for
> > >     the event.
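A toy model of the single-engine rule for GestureEngineRegister and GestureEngineUnregister. The error classes reuse the names from the draft, but the server object, the listener bookkeeping and the string used for the GesturesChanged event are hypothetical.

```python
class GestureEngineRegistered(Exception):
    """Second GestureEngineRegister while an engine is registered."""


class BadValue(Exception):
    """GestureEngineUnregister from a client that is not the engine."""


class GestureServer:
    def __init__(self):
        self.engine = None       # client that owns the single engine slot
        self.gestures = []       # GESTUREINFO entries from that engine
        self.listeners = {}      # client -> queued events (toy event queue)

    def listen_gestures_changed(self, client):
        self.listeners.setdefault(client, [])

    def register(self, client, gestures):
        if self.engine is not None:
            raise GestureEngineRegistered()
        self.engine = client
        self.gestures = list(gestures)
        self._send_gestures_changed()

    def unregister(self, client):
        if self.engine is None or self.engine != client:
            raise BadValue()
        self.engine = None
        self.gestures = []
        self._send_gestures_changed()

    def _send_gestures_changed(self):
        # Listeners would then re-query the server for the new gesture list.
        for queue in self.listeners.values():
            queue.append("GesturesChanged")
```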
> > > 
> > >     ┌───
> > >         GestureAllowInputEvents
> > >             num_ranges:     CARD16
> > >             ranges:         LISTofEVENTRANGE
> > >     └───
> > > 
> > >     GestureAllowInputEvents instructs the server to flush the input events to
> > >     clients unmodified. This is used when no gestures are recognized from
> > >     sequences of input events.
> > >     If any of the EVENTRANGE values are invalid, the BadValue error is reported
> > >     and no input events are flushed.
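EVENTRANGE is not defined in this excerpt, so the sketch below assumes it is an inclusive (first, last) pair of identifiers for events the server is holding back. It only illustrates the all-or-nothing rule: every range is validated before anything is flushed. The function name is hypothetical.

```python
class BadValue(Exception):
    pass


def allow_input_events(pending, ranges):
    """Release held-back input events covered by `ranges`, all or nothing.

    pending: list of (event_id, event) the server is holding for the engine
    ranges:  list of (first_id, last_id), inclusive (assumed EVENTRANGE shape)
    Returns (flushed_events, still_pending).
    """
    held = {event_id for event_id, _ in pending}
    # Validate every range first, so one bad range flushes nothing.
    for first, last in ranges:
        if first > last or first not in held or last not in held:
            raise BadValue((first, last))

    def covered(event_id):
        return any(first <= event_id <= last for first, last in ranges)

    flushed = [event for event_id, event in pending if covered(event_id)]
    still_pending = [(eid, ev) for eid, ev in pending if not covered(eid)]
    return flushed, still_pending
```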
> > > 
> > >     ┌───
> > >         GestureRecognized
> > >             num_ranges:                 CARD16
> > >             ranges:                     LISTofEVENTRANGE
> > 
> > Sequence numbers are not suited for this type of range. The sequence number
> > set in the EVENTHEADER only increments if additional requests are processed.
> > For clients that purely listen to events and, e.g., dump them into a file,
> > the sequence number does not change after the first set of requests.
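A small illustration of the point above, using a hypothetical connection model: each event carries the (16-bit) sequence number of the last request the server processed for that client, so a client that stops sending requests sees the same number on every later event.

```python
class ClientConnection:
    """Toy model of one X client connection (names are hypothetical)."""

    def __init__(self):
        self.last_request_seq = 0
        self.events = []

    def send_request(self):
        # The core protocol carries only the low 16 bits of the sequence number.
        self.last_request_seq = (self.last_request_seq + 1) & 0xFFFF

    def deliver_event(self, name):
        # Each event is stamped with the last request's sequence number.
        self.events.append((self.last_request_seq, name))


conn = ClientConnection()
for _ in range(3):        # setup requests, e.g. selecting for events
    conn.send_request()
for name in ("touch-begin", "touch-update", "touch-end"):
    conn.deliver_event(name)   # a pure listener sends no further requests

distinct = {seq for seq, _ in conn.events}
print(distinct)  # {3}: one number for three events, so it cannot delimit a range
```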
> 
> Hmmm. I guess some other mechanism will be needed. I'll have to think
> about this. Hopefully your suggestion of using active grabbing and
> replaying events could work instead, making this issue moot.
> 
> > >             gesture_id:                 CARD16
> > >             gesture_instance:           CARD16
> > >             device_id:                  CARD16
> > >             root_x:                     Float
> > >             root_y:                     Float
> > >             event_x:                    Float
> > >             event_y:                    Float
> > 
> > Probably better to use the same type as in the XI2 spec.
> 
> I want to provide a protocol using XCB, and I couldn't figure out an
> easy way to do so with an FP1616 type. If there's a way, then that would
> be fine with me. If not, which would be easier? Fixing up XCB to provide
> a way or just using IEEE 754 floats instead?
> 
> Admittedly, I didn't spend a large amount of time looking for an FP1616
> solution in XCB since I don't understand the appeal of FP1616 :).

It was added to avoid requiring a particular float format on the protocol and
as an alternative for devices without a useful FPU. Mixing datatypes between
extensions this similar (or the same, if this is just folded into XI2) is IMO
a bad idea; it gives us very little benefit.
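For reference, a sketch of what FP1616 encoding amounts to: a signed 32-bit integer holding the value scaled by 65536, i.e. 16 integer bits and 16 fraction bits. A server or device without a useful FPU can therefore produce and consume the values with integer arithmetic only; the cost is a fixed resolution of 1/65536 and a range of roughly ±32768. The helper names are hypothetical, and little-endian wire order is assumed only for the example.

```python
import struct

FP1616_ONE = 1 << 16  # 16 fraction bits


def to_fp1616(value):
    """Encode a float as signed 16.16 fixed point (an INT32 on the wire)."""
    raw = int(round(value * FP1616_ONE))
    if not -(1 << 31) <= raw < (1 << 31):
        raise OverflowError(f"{value} does not fit in FP1616")
    return raw


def from_fp1616(raw):
    return raw / FP1616_ONE


def wire_fp1616(value):
    # Little-endian byte order assumed for the example.
    return struct.pack("<i", to_fp1616(value))


def wire_ieee754(value):
    # The alternative discussed above: a 32-bit IEEE 754 float.
    return struct.pack("<f", value)
```

Both forms are 32 bits on the wire; the trade-off is fixed resolution versus a floating-point format that clients must agree on.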

> > >         Values of all valuators for the touch. Valuators are defined in the
> > >         XInput protocol specification. The specific meaning of each valuator is
> > >         specific to the input device.
> > > 
> > >     A GestureNotify event is generated whenever a gesture occurs in a window for
> > >     which the client has requested the gesture ID.
> > > 
> > >     gesture_id
> > >         Gesture ID of the gesture type.
> > >     gesture_instance
> > >         Unique ID of this gesture instance. The ID will be maintained as the
> > >         gesture progresses from start to end as signified in the status field.
> > >         This value is incremented monotonically by the server for every gesture
> > >         that causes events to be sent to a client. The ID will only be reused
> > >         once a 32-bit wrap occurs.
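A minimal sketch of the ID allocation described for gesture_instance: one ID per gesture instance, incremented for each new gesture and reused only after wrapping. The text says 32-bit wrap while the field above is declared CARD16, so the width is a parameter here. The class is hypothetical.

```python
class InstanceIdAllocator:
    """Monotonically increasing gesture instance IDs, wrapping modulo 2**bits."""

    def __init__(self, bits=32):
        self.modulus = 1 << bits
        self.next_id = 0

    def allocate(self):
        # Called once when a gesture starts; the returned ID is then copied
        # into every event of that gesture until it ends.
        gid = self.next_id
        self.next_id = (self.next_id + 1) % self.modulus
        return gid
```

With bits=16 an ID comes back around after 65536 gestures, which is far sooner than with 32 bits.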
> > >     device_id
> > >         X Input device ID of the slave device generating this gesture.
> > 
> > Why the slave device? Why not use device_id and source_id?
> 
> For the same reason I didn't provide for all master devices when
> selecting for events. If there's a good reason for all master device
> selection, then the device_id and source_id would be warranted.
> 
> > I think that's all I have for now, but I guess more will come up in the
> > discussion. Thanks again for this draft!
> 
> I really appreciate the time and effort you put into this review. Thanks
> a lot!
> 
> As you said earlier, you're biased against putting gesture stuff inside
> X. I admit that I had the exact same bias at first too. However, I
> couldn't find any mechanism to handle event propagation and selection
> properly without embedding the logic within X itself. If you can come up
> with some scheme that fulfills the needs outlined here and sits outside
> of X, we'd be quite happy to scrap this proposal and move on :).
> 
> If it would help, perhaps we should start a new thread just on the
> question of whether gestures should propagate through X or if there's
> some better solution outside of X.

I have not yet completely formed my thoughts on this. I know what you're
trying to do but I'm not sure yet if either alternative is possible,
necessary and/or feasible. Both approaches have their advantages and
disadvantages and I am mostly trying to find the one that lets us do the
most while getting in the way the least.

Cheers,
  Peter

Sorry about the slow answers; this email did indeed take me two days to
write, coming back, rethinking and rewriting bits and pieces.