X and gestures architecture review

Chase Douglas chase.douglas at canonical.com
Fri Aug 27 09:15:08 PDT 2010


Hi all,

I've been formulating some thoughts about gesture recognition in an X
environment over the past week. I haven't had time to sit down and write
them until now due to the Maverick beta freeze deadline yesterday.

I think there's one fundamental question worth hammering out before all
the other issues with the proposed X Gesture extension: should gesture
recognition occur server side, before events are propagated to clients?

I've been doing some thinking based on feedback from Peter Hutterer. He
is of the mindset that gestures should be recognized on the client side
only, whereas the X Gesture extension recognizes gestures on the server
side. I've been trying to recall the reasons we went with server-side
handling so far, and here's what I've come up with:

1. Say you have two windows side by side and you do a four-finger drag
with two fingers over each window. With X Gesture, the four fingers
would be recognized together and sent only to the root window, where a
window manager might be listening. This is because all fingers of a
gesture must start within the same window, and recognition occurs
without any context of where windows are on screen. One could view this
as a policy decision we have made.

2. If recognition occurs client side, it needs to occur twice: once so
a window manager can attempt recognition, and again by the clients the
events are replayed to, who also attempt recognition. Anecdotally, I
think our utouch-grail gesture engine is fast enough to run recognition
twice with little added latency, but it doesn't seem like the optimal
solution.

3. We don't have access to MT events on the client side in Maverick.

I think I've been conflating the third issue with general gesture
recognition and propagation logic, to the point that, looking back,
it's hard to remember exactly why point one alone requires recognition
to be server side. I know we have to do gestures through X in Maverick
due to the lack of MT events, but maybe that's been coloring my view of
how it should be done in the future, when MT events are available?

It's true that the logic behind point one may be perfectly fine, but
doing recognition through X inserts a lot of extra code into the
server. If we are content with touches being split up into window
regions before recognition occurs, then we may be able to avoid the
need for the X Gesture extension completely. The window manager use
case could be served through input grabbing and event replaying.

However, what's the best way to resolve point 2?

We have not yet begun development for the next Ubuntu cycle, but we
will be tackling it shortly. We are open to trying any approach that
seems reasonable, whether it's client-side or server-side recognition.
At this point I'm on the fence, weighing the amount of work required to
implement the X Gesture extension against the potential latency of
performing gesture recognition twice.

Thanks,

-- Chase


