X Gesture Extension protocol - draft proposal v1

Chase Douglas chase.douglas at canonical.com
Thu Aug 19 08:35:07 PDT 2010


On Thu, 2010-08-19 at 09:10 +1000, Peter Hutterer wrote:
> On Wed, Aug 18, 2010 at 05:02:57PM -0400, Chase Douglas wrote:
> > On Wed, 2010-08-18 at 21:54 +0200, Simon Thum wrote:
> > This is another reason why I like to decouple the GE from X: it allows
> > people to play with touch to gesture assignment. 
> 
> but aren't you coupling GE and X with this protocol? apps now must go
> through the GE picked by the server* and take it or leave it.
> 
> * yeah, not 100% true but you get the point

By adding the protocol to the server, we just provide a mechanism for
gesture recognition if people want it. If you don't have recognition and
you want it, you can just install a recognizer package and it will run
when you start X. If you have recognition and you don't want it, you can
remove the recognizer or stop it from starting up. If people want, we
could define an option so that you could forbid gesture recognizers from
registering.

Now, once a gesture engine is registered with the server, it adds an
extra code path for all input to all clients. However, there's no way
around that if you want to support environment/window manager gestures.

If this is all terribly wrong-headed and you are right that gestures
should be done above X, we haven't caused much harm by attempting this.
Just disable gesture support in X as described above and use a
recognizer in a toolkit or some other client-side library.

> > > > 4. Gesture Primitive Events
> > > > 
> > > > Gesture primitive events provide a complete picture of the gesture and the
> > > > touches that comprise it. Each gesture provides typical X data such as the
> > > > window ID of the window the gesture occurred within, and the time when the event
> > > > occurred. The gesture specific data includes information such as the focus point
> > > > of the event, the location and IDs of each touch comprising the gesture, the
> > > > properties of the gesture, and the status of the gesture.
> > > Is the focus point the coordinate in GestureRecognized? If yes, why do
> > > you call it gesture-specific? It's present in all gestures recognized.
> > 
> > It's gesture specific because it may vary even among the same gesture
> > type. Think of a rotation gesture. It may be performed by moving two
> > fingers around a pivot point halfway between the two. The focus point
> > would thus be that pivot point. A rotation may also be performed by
> > rotating one finger around a stationary finger. The focus point would be
> > under the stationary finger in this case.
> 
> I think you may be arguing about the wording here only, not about the
> content. The focus point is present in every gesture, right? If so, 
> maybe a better wording may be
> "Each gesture provides generic data such as the window the gesture occurred
> within, the time of the event and the focus point of the gesture. Data
> specific to a particular type of gesture includes the location and IDs
> of each touch, ..."

You're probably right :). I'll try to incorporate your wording to make
this more comprehensible.
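To make the generic vs. gesture-specific split more concrete, here's a
rough sketch of the kind of data a gesture primitive event would carry.
The field names, types, and limits below are illustrative placeholders,
not the draft's actual wire format:

/* Sketch only: illustrative layout of a gesture primitive event.
 * Field names and types are placeholders, not the draft's wire format. */
typedef struct {
    unsigned long window;       /* window the gesture occurred within */
    unsigned long time;         /* server time of the event */
    float         focus_x;      /* focus point of the gesture */
    float         focus_y;
    unsigned int  gesture_type; /* which primitive (e.g. rotate, pinch) */
    unsigned int  status;       /* e.g. begin / update / end */
    unsigned int  num_touches;  /* number of touches in the gesture */
    struct {
        unsigned int id;        /* touch ID */
        float        x, y;      /* touch location */
    } touches[5];               /* fixed cap only for this sketch */
    float properties[8];        /* gesture-specific values, e.g. the
                                   rotation angle around the focus point */
} GesturePrimitiveSketch;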

> > > > When the engine recognizes a gesture primitive, it sends a gesture event to the
> > > > server with the set of input event sequence numbers that comprise the gesture
> > > > primitive. The server then selects and propagates the gesture event to clients.
> > > > If clients were selected for propagation, the input events comprising the
> > > > gesture primitive are discarded. Otherwise, the input events are released to
> > > > propagate through to clients as normal XInput events.
> > > I understand what you want to achieve, but I'd argue that apps shouldn't
> > > be listening to xinput events when they register for gestures. Or at
> > > least, only in properly constrained areas. Think of a mouse/pad gesture
> > > detection - how to avoid the latency implied by that approach?
> > 
> > I'm not sure I understand your last sentence, but I'll address the rest.
> > 
> > It may be true that a client should choose whether to receive only
> > gestures or only XInput events on a given window, but I'm not sure. X
> > was designed to be as flexible as possible, and leave policy up to the
> > toolkits and libraries that sit on top of X (or so wikipedia tells
> > me :). This mechanism is following that spirit.
> > 
> > Beyond that, it sounds like we're of the same mindset here. You "argue
> > that apps shouldn't be listening to xinput events when they register for
> > gestures." This protocol discards XInput events when a gesture is
> > recognized and an event is sent to a client. I hope I haven't misread
> > anything :).
> 
> You need to clearly define whether this is a "shouldn't" or a "mustn't",
> because only the latter is something you can safely work with. In
> particular, you need to define what happens to XI events creating a gesture
> if a client has a grab on a particular device.

I don't think it's a "mustn't", and as you note a "shouldn't" isn't
something we can safely work with, so I don't state any particular
requirement about it in the protocol. I don't see any reason why an app
must listen only to gestures or only to multitouch events, and I don't
want to block off an avenue of input if a client is written well enough
to take advantage of both types of events in the same window.

As for defining what happens to events when a client has a grab on a
device, I do need to state that in the protocol. As I noted in another
thread, I think XI active grabs take precedence, so I'll state that
formally in the next revision of the protocol unless there are
objections. I think gestures should take precedence over passive grabs
though.
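To make that precedence concrete, here's a rough pseudo-C sketch of the
delivery decision as I currently picture it. This is illustrative only,
not actual server code:

/* Sketch of the intended delivery policy when the gesture engine
 * reports a recognized primitive; illustrative only, not server code. */
enum delivery {
    DELIVER_XI_TO_GRAB,   /* active XI grab wins; raw events go to it */
    DELIVER_GESTURE,      /* gesture sent; comprising XI events dropped */
    REPLAY_XI_EVENTS      /* nobody selected; release the XI events */
};

enum delivery route_gesture(int has_active_xi_grab,
                            int client_selected_for_gesture)
{
    /* XI active grabs take precedence over gestures. */
    if (has_active_xi_grab)
        return DELIVER_XI_TO_GRAB;

    /* A gesture selection (and, I'd argue, gestures over passive grabs)
     * gets the gesture event; the input events that comprise it are
     * discarded. */
    if (client_selected_for_gesture)
        return DELIVER_GESTURE;

    /* Otherwise the held input events propagate as normal XInput events. */
    return REPLAY_XI_EVENTS;
}

The middle case is the one where the draft discards the XInput events
that comprise the gesture; the other two leave them untouched.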

> > I don't fully understand your last sentence, but I will try to address
> > latency concerns. I think our current gesture recognition code is less
> > than 500 lines of code (probably closer to 300 lines; Henrik wrote it and
> > has more details if you are interested). Obviously, you can do a lot in
> > a small amount of code to kill latency, but I think Henrik has crafted a
> > tight and fast algorithm. I was skeptical about latency at first too,
> > but human interfaces being what they are, we should have plenty of cpu
> > cycles to do all the gesture primitive recognition we need (please don't
> > read this and assume we're pegging the processors either :). So far in
> > testing, I haven't seen any noticeable delay, but it's still rather
> > early in development.
> 
> I don't think the algorithm is what's holding you back anyway, it's the
> nature of gestures and human input in general. Even if your GE is
> instantaneous in its recognition, you may not know for N milliseconds if the
> given input may even translate into a gesture. Example: middle mouse button
> emulation code - you can't solve it without a timeout.

Yes, that's true. This is an area where I think we need to do some
research and user testing once we have the foundation implemented. As
for specifics, I feel Henrik is more qualified to respond. I'll just
note that in my own testing I think Henrik's implementation works well
all around. We'll have more useful experiences once the Unity window
manager is fully integrated and we can test it out. That should occur
within the next week or two.
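For what it's worth, the timeout issue boils down to holding input
events until the recognizer can commit one way or the other. A very
rough sketch, with made-up names and a made-up timeout value:

/* Sketch of the timeout problem: held input events can only be
 * committed as "gesture" or "not a gesture" after enough time or
 * movement has accumulated. Names and the timeout value are made up. */
#define GESTURE_DECISION_TIMEOUT_MS 60

enum verdict { UNDECIDED, IS_GESTURE, NOT_GESTURE };

enum verdict classify(unsigned long first_event_ms,
                      unsigned long now_ms,
                      int matched_a_primitive)
{
    if (matched_a_primitive)
        return IS_GESTURE;     /* deliver the gesture, drop the XI events */

    if (now_ms - first_event_ms >= GESTURE_DECISION_TIMEOUT_MS)
        return NOT_GESTURE;    /* give up and replay the XI events */

    return UNDECIDED;          /* keep holding events a little longer */
}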

-- Chase


