X and gestures architecture review

Carsten Haitzler (The Rasterman) raster at rasterman.com
Fri Aug 27 17:57:56 PDT 2010


On Fri, 27 Aug 2010 12:15:08 -0400 Chase Douglas <chase.douglas at canonical.com>
said:

> Hi all,
> 
> I've been formulating some thoughts about gesture recognition in an X
> environment over the past week. I haven't had time to sit down and write
> them until now due to the Maverick beta freeze deadline yesterday.
> 
> I think there's a fundamental issue that would be worth hammering out
> above all other issues with the proposed X Gesture extension: should
> gesture recognition occur server side, before events are propagated to
> clients?
> 
> I've been doing some thinking based on feedback from Peter Hutterer. He
> is of the mindset that gestures should be recognized on
> the client side only, whereas the X Gesture extension recognizes
> gestures on the server side. I've been trying to think of the reasons
> for us going with server side handling so far, and here's what I've come
> up with:

i absolutely agree with peter on this. frankly the problem is that input from a
mouse, a tablet, or multiple touch points is ambiguous. you may be painting in
gimp - not trying to "swipe to scroll". i can go on with examples (dragging a
slider inside a scrollable area as opposed to wanting to scroll with a drag).
only the client has enough detailed knowledge of the window contents,
application mode etc. to make a reliable guess as to which one the user wants.
it's x's job to provide as much device input to the client as it needs, in a
sane and digestible way, to make such a decision, but... that's imho where the
server's job ends.

> 1. Say you have two windows side by side and you do a four finger drag
> with two fingers over each window. With X Gesture, the four fingers
> would be recognized together and sent only to the root window, where
> a window manager might be listening. This is because all fingers of a
> gesture must start within the same window, and the recognition occurs
> without any context of where windows are on screen. One could view this
> as a policy decision we have made.

and that goes against the x tradition of mechanism, not policy, in the
server: put policy in clients.

> 2. If recognition occurs client side, it needs to occur twice so that a
> window manager can attempt recognition and replay events to clients who
> also attempt recognition. Anecdotally, I think our utouch-grail gesture
> engine is fast enough to do recognition twice with little latency, but
> it wouldn't seem to be the most optimal solution.
> 
> 3. We don't have access to MT events on the client side in Maverick.

that's an issue that imho shouldn't affect the broad design of such a thing. i
sit here with a multi-touch device running xorg. the driver exposes up to 2
touch points for now (the hardware is capable of more, but it's a small screen
so there is little point having more anyway). from the client side all i care
about is that there is xinput2 support for the multiple touch points and that
i can get them and do something - paint, for example, or interpret them as a
zoom + rotate control, or just use them as multiple presses on a virtual
keyboard (like in real life, where you may press more than 1 key at a time by
hitting the next key before you have released the previous one).
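
to make that concrete, here is a minimal sketch of what such a client could do:
select per-touch events on its window and turn two touch points into a zoom
factor plus a rotation angle. note that the per-touch event types used here
(XI_TouchBegin/Update/End) are the ones that later shipped in XI 2.2, not
something available at the time of writing, so treat it as an illustration of
the idea rather than the api under discussion:

  /* sketch: client-side selection of per-touch events plus the trivial
   * math for a 2-finger zoom + rotate. assumes the XI 2.2 touch event
   * types, which postdate this mail. */
  #include <math.h>
  #include <X11/Xlib.h>
  #include <X11/extensions/XInput2.h>

  static void select_touches(Display *dpy, Window win)
  {
     unsigned char bits[XIMaskLen(XI_LASTEVENT)] = { 0 };
     XIEventMask mask = { XIAllMasterDevices, sizeof(bits), bits };

     XISetMask(bits, XI_TouchBegin);
     XISetMask(bits, XI_TouchUpdate);
     XISetMask(bits, XI_TouchEnd);
     XISelectEvents(dpy, win, &mask, 1);
  }

  /* given the 2 touch positions at gesture start (0a/0b) and now (1a/1b),
   * work out how much their separation changed (zoom) and how far the
   * line between them has turned (rotation in radians) */
  static void zoom_rotate(double x0a, double y0a, double x0b, double y0b,
                          double x1a, double y1a, double x1b, double y1b,
                          double *zoom, double *rot)
  {
     double d0 = hypot(x0b - x0a, y0b - y0a);
     double d1 = hypot(x1b - x1a, y1b - y1a);

     *zoom = (d0 > 0.0) ? (d1 / d0) : 1.0;
     *rot  = atan2(y1b - y1a, x1b - x1a) - atan2(y0b - y0a, x0b - x0a);
  }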

in order for this to work right there also needs to be a way for wms to listen
to events in general. xevie was an awesome idea in many ways: a wm could not
just transform events, but listen to, intercept and replay any sequence of
events it liked and implement policy there. the problem was the reality of its
implementation and the added latency.
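
for reference, the xevie usage pattern went roughly like the sketch below
(from memory - check <X11/extensions/Xevie.h> for the exact signatures). the
key point is that every selected core event takes a round trip through the
intercepting client before being delivered to anyone else, which is exactly
where the implementation and latency pain came from. wm_wants_to_eat() is a
made-up placeholder for whatever policy the wm would apply:

  #include <X11/Xlib.h>
  #include <X11/extensions/Xevie.h>

  /* placeholder for the wm's own policy decision */
  static int wm_wants_to_eat(XEvent *ev) { (void)ev; return 0; }

  static void intercept_loop(Display *dpy)
  {
     XEvent ev;

     if (!XevieStart(dpy)) return; /* only one interceptor at a time */
     XevieSelectInput(dpy, KeyPressMask | ButtonPressMask |
                           ButtonReleaseMask | PointerMotionMask);
     for (;;)
       {
          XNextEvent(dpy, &ev);
          if (wm_wants_to_eat(&ev)) continue; /* swallowed - never delivered */
          /* replay it (possibly modified) into the normal delivery path */
          XevieSendEvent(dpy, &ev, XEVIE_UNMODIFIED);
       }
     /* XevieEnd(dpy); -- unreachable in this sketch */
  }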

> I think I've been conflating the third issue with general gesture
> recognition and propagation logic to the point that when I look back
> it's hard to remember exactly why we need recognition to be server side
> due to point one alone. I know that we have to do gestures through X in
> Maverick due to a lack of MT events, but maybe that's been coloring my
> view on how it should be done in the future when MT events are
> available?
> 
> It's true that the logic behind point one may be perfectly fine, but
> having recognition through X inserts a lot of extra code into the
> server. If we are content with touches being split up into window
> regions before recognition occurs, then we may be able to get around the
> need for the X Gesture extension completely. The window manager use case
> could be supplied through input grabbing and event replaying.

you definitely have some logic behind point 1 - but i think this needs to be
solved in another way. maybe some client-level protocol (netwm-style) where
multiple clients can and do listen for gestures; if one decides to "act", it
needs to inform the others that it has now "stolen" the meaning of that
gesture for now. of course this leads to some wonderful race conditions, but
this is where it should be solved. but first mt events need to be out there in
client space (like i am happily enjoying here), then there needs to be a way
for multiple clients to listen in (eg select for mt events such that they are
always delivered to that client, regardless of target window), and then, since
you can now have conflicts over events, a way as above of resolving that
conflict sanely.
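
just to sketch the shape of such a protocol (everything here - the atom name,
the data layout, the semantics - is invented purely for illustration, not part
of any existing spec): a client that decides to act on a gesture could
broadcast a netwm-style client message via the root window so the other
listeners know to back off:

  #include <string.h>
  #include <X11/Xlib.h>

  /* hypothetical: announce that window 'me' has taken ownership of the
   * gesture identified by 'gesture_id'. _NET_GESTURE_CLAIM is a made-up
   * atom name. */
  static void claim_gesture(Display *dpy, Window me, long gesture_id, Time t)
  {
     XEvent ev;
     Atom claim = XInternAtom(dpy, "_NET_GESTURE_CLAIM", False);

     memset(&ev, 0, sizeof(ev));
     ev.xclient.type = ClientMessage;
     ev.xclient.window = me;             /* who is claiming */
     ev.xclient.message_type = claim;
     ev.xclient.format = 32;
     ev.xclient.data.l[0] = gesture_id;  /* which gesture sequence */
     ev.xclient.data.l[1] = t;           /* timestamp, to help untangle races */

     /* broadcast through the root window, ewmh-style, so every client
      * that selected these masks on the root sees the claim */
     XSendEvent(dpy, DefaultRootWindow(dpy), False,
                SubstructureRedirectMask | SubstructureNotifyMask, &ev);
     XFlush(dpy);
  }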

> However, what's the best way to resolve point 2?

i don't see it as an issue - you will most likely have 2 layers checking for
gestures: the clients themselves (targets for mt) and some "wm" handling
screen-wide gestures. that is the majority case by a long shot. gesture
recognition is not hard for swipes, rotates and so on, and it's pretty lean
cpu-wise. the hard part is guessing the intended gesture just right every
time - that's the trick - and then comes filtering of input. doing it is cheap
and easy. (remember that i compare this to the cost of managing the scene
graph of a ui and then rendering its updates pixel by pixel on the cpu... and
that can be done in realtime with smooth framerates even on embedded-level
cpus (arm / low-end atoms).) so in the scheme of things gesture processing
overhead is minimally intrusive (if done efficiently). admittedly the only
"gestures" i bother handling right now are swipes for scrolling with
fingers + momentum etc.
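
as a rough idea of how little cpu that kind of swipe + momentum handling
takes, here is a minimal sketch - the smoothing factor, friction and
thresholds are invented numbers, tune to taste:

  /* feed finger deltas while the finger is down; low-pass the
   * instantaneous velocity so one jittery sample doesn't dominate the
   * fling speed */
  typedef struct
  {
     double vx, vy;             /* smoothed finger velocity (px/s) */
     double scroll_x, scroll_y; /* accumulated scroll position */
  } Swipe;

  static void swipe_feed(Swipe *s, double dx, double dy, double dt)
  {
     if (dt <= 0.0) return;
     s->vx = (s->vx * 0.8) + ((dx / dt) * 0.2);
     s->vy = (s->vy * 0.8) + ((dy / dt) * 0.2);
     s->scroll_x += dx;
     s->scroll_y += dy;
  }

  /* call once per frame after release: keep coasting with friction,
   * returns 0 once the velocity has decayed below a threshold */
  static int swipe_momentum(Swipe *s, double dt)
  {
     const double friction = 4.0; /* 1/s, invented for the sketch */

     s->scroll_x += s->vx * dt;
     s->scroll_y += s->vy * dt;
     s->vx -= s->vx * friction * dt;
     s->vy -= s->vy * friction * dt;
     return ((s->vx * s->vx) + (s->vy * s->vy)) > 25.0;
  }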

> We have not yet begun development for the next Ubuntu cycle, but we will
> be tackling it shortly. We are open to trying out any approach that
> seems reasonable, whether it's client side or server side recognition.
> At this point I'm on the fence due to the amount of work required to
> implement the X gesture extension vs the potential latencies encountered
> by gesture recognition being performed twice.
> 
> Thanks,
> 
> -- Chase
> 
> _______________________________________________
> xorg-devel at lists.x.org: X.Org development
> Archives: http://lists.x.org/archives/xorg-devel
> Info: http://lists.x.org/mailman/listinfo/xorg-devel
> 


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com


