New approach to multitouch using DIDs and bitmasked events

Chase Douglas chase.douglas at canonical.com
Fri Aug 27 07:59:47 PDT 2010


Hi Daniel,

I've tried to grok both Peter's and your proposals, and I've written
comments below. Overall, the approaches seem similar and the differences
seem more cosmetic than architectural.

On Mon, 2010-08-23 at 23:32 +1000, Daniel Stone wrote:
> Hi,
> Sorry for the delay on this one - holidays, moving continent, work, etc.
> 
> I've been approaching this from a slightly different standpoint.  Let me
> explain ...
> 
> These were my goals/invariants:
>     * be conservative in event delivery - better to drop an event and
>       require a second press than to accidentally deliver an event to
>       two clients
>     * touches do not change targets during their lifetime, akin to the
>       implicit grab on button press
>     * touch tracking must be done on the server side, if not already
>       done by the driver
>     * have the least painful API possible for clients and drivers alike
>     * the simplest solution is probably the best
>     * keep latency (i.e. event generation -> client reception) as low as
>       possible
> 
> I feel that DIDs are ruled out by #4 and #5.  Event delivery is already
> fairly nightmarishly complex, and I don't think we want to make it any
> more complicated here.

Sounds reasonable to me.

> Server-side gesture recognition is ruled out by #6, as well as others:
> gestures aren't uniformly meaningful, and interpreting them correctly
> requires knowledge that can only be found in the client.  Similarly, I'm
> not entirely convinced that it will be possible to develop a usefully
> generic gesture engine for quite a few reasons, including but not
> limited to licensing.

I'll be addressing this more in a separate email, but I'll touch on two
issues here. One reason to make the gesture recognizer a client of the
X server is that licensing issues are avoided. The client approach also
allows for an extensible recognizer that can be swapped out as needed
to provide different gesture sets, though I think there really are only
a handful of primitive gestures worth recognizing.

(The above is just food for thought; please discuss these issues in the
gesture extension thread.)

> This also rules out anything that requires a round trip before event
> delivery can actually occur, e.g. an XEvIE-like approach where input
> events are first sent to a client which mangles the event and/or tells
> the server where to direct it, before the server later sends it out for
> real.

I understand the goal, but there's an issue here. Either we do
server-side gesture recognition, or we have to do round-trip processing
for window manager gestures. Take the following use case:

1. MT events are generated
2. Events are sent to window manager through passive grabbing
3. Gesture recognition occurs, no window manager gesture found
4. Window manager replays MT events to other clients
5. Client receives MT events
6. Client performs gesture recognition again

I don't see any way around this if there's no protocol for sending
gesture event data through X. However, I also believe the gesture
recognition overhead is small enough that this may actually work
without any noticeable delay.
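
To make the round trip concrete, here is a rough sketch of the window
manager side of steps 2-4, assuming the existing XI2 synchronous-grab
semantics (XIAllowEvents with XIReplayDevice/XIAsyncDevice) carry over
to touch sequences. That carry-over is my assumption, not part of
either proposal:

    #include <X11/Xlib.h>
    #include <X11/extensions/XInput2.h>

    /* Called once the recognizer has made a decision about a touch
     * sequence frozen by the WM's synchronous passive grab. */
    static void
    wm_finish_gesture(Display *dpy, int deviceid, Time time, Bool recognized)
    {
        if (recognized)
            /* WM gesture found: keep the events for ourselves. */
            XIAllowEvents(dpy, deviceid, XIAsyncDevice, time);
        else
            /* No WM gesture: replay the frozen events so the client
             * under the touch receives them (steps 4-6 above). */
            XIAllowEvents(dpy, deviceid, XIReplayDevice, time);
        XFlush(dpy);
    }

The added latency is then bounded by how long the recognizer sits on
the frozen sequence before replaying it.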

> The approach I took was to introduce a separate TouchClass, which reuses
> existing valuators rather than adding a new TouchAxisClass.  We already
> have valuators which (IMO) already have all the semantics we need.

Why is there a touch id in your TouchClass? I think it's because you
want to match up actual touch ids to tool types, but that means the
input driver needs to track touch ids and ensure that events are sent
with the appropriate touch id.

Peter's approach is actually extremely close to the valuators approach
you are suggesting. If you compare the AXISCLASS and the TOUCHAXISCLASS
in his proposal, they are identical. The reason you need both is
devices like the Magic Mouse, where you need AXISCLASS for relative
motion but TOUCHAXISCLASS for the multitouch surface. If you mix the
two, you won't know which classes are for touch axes and which are for
normal axes. It may be that Peter's approach could be optimized, but I
don't think the TouchClass proposed here is the right solution.
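
As an illustration of why the split matters, here's roughly how a
client might walk the classes of a Magic Mouse-like device and tell the
two kinds of axes apart. The XIQueryDevice enumeration is existing XI2;
XITouchAxisClass is a stand-in name and value for whatever the final
proposal defines, and the touch branch assumes the touch axis class
shares the valuator class layout, as in Peter's proposal:

    #include <stdio.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/XInput2.h>

    /* Hypothetical class type for touch axes; name and value are
     * illustrative only, not part of any released protocol. */
    #define XITouchAxisClass 8

    static void
    dump_axes(Display *dpy, int deviceid)
    {
        int i, ndevices;
        XIDeviceInfo *info = XIQueryDevice(dpy, deviceid, &ndevices);

        for (i = 0; i < info->num_classes; i++) {
            XIValuatorClassInfo *v =
                (XIValuatorClassInfo *) info->classes[i];

            if (v->type == XIValuatorClass)
                printf("axis %d: normal axis (e.g. relative motion)\n",
                       v->number);
            else if (v->type == XITouchAxisClass)
                printf("axis %d: touch surface axis\n", v->number);
        }

        XIFreeDeviceInfo(info);
    }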

I'm also worried about mixing tool types in one device. If I have a
multitouch + pen tablet, I think the pen should be one device and the
touches another. Say you have Inkscape open and want to draw with the
pen while manipulating the canvas (zooming, scrolling) with your
fingers. I think it would be easier for Inkscape if they were separate
input devices.

> Here's the lifecycle of a typical touch event:
>   * TouchNotify event sent to selecting client with detail TouchBegin,
>     which includes a bitmask of valuators used for the touch.
>   * All motion events as part of this touch are sent to that client
>     only.
>   * When the finger is lifted, TouchNotify event sent with TouchEnd.
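
Just to make sure I'm reading the lifecycle correctly, here's roughly
what I picture on the client side. TouchBegin/TouchEnd and the valuator
bitmask come from your draft; the structure layout, values, and helper
names are my guesses, not your wire format:

    #include <stdint.h>

    /* Mock-up of the proposed TouchNotify delivery as seen by a client;
     * all names and values here are illustrative. */
    enum { TouchBegin = 0, TouchEnd = 1 };

    typedef struct {
        int      detail;     /* TouchBegin or TouchEnd */
        uint32_t touchid;    /* server-assigned id for this touch */
        uint32_t valuators;  /* bitmask of valuators used by the touch */
    } TouchNotifyEvent;

    extern void start_stroke(uint32_t touchid, uint32_t valuator_mask);
    extern void finish_stroke(uint32_t touchid);

    static void
    handle_touch_notify(const TouchNotifyEvent *ev)
    {
        if (ev->detail == TouchBegin)
            start_stroke(ev->touchid, ev->valuators);  /* new contact */
        else if (ev->detail == TouchEnd)
            finish_stroke(ev->touchid);                /* finger lifted */
        /* Motion in between arrives as ordinary motion events, delivered
         * only to the client that received the TouchBegin. */
    }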

You're essentially forcing passive grabbing of touches. That's probably
fine in this day and age, but XInput already has the concept of passive
grabbing and it works well. Why not continue that concept for
multitouch, if only to be consistent? And who knows whether someone
will want a touch to fall through to other windows when it is dragged
outside the originating window.

> 'Selecting client' has the same meaning as it does for MotionNotify and
> co: walk back up the tree from the window immediately under the finger
> until we find a client selecting for TouchNotifies.  I don't believe
> that grabbing in response to a touch is practical: at best, you miss the
> first events that started the touch, and at worst, you miss the touch
> completely.

This is why we have passive grabs that activate in the server when the
touch begins. One could argue that active grabs are useless for
multitouch, but again, why not include them if they're already part of
the protocol and implemented for single touches?

> Eagle-eyed readers might note that this only covers absolute devices
> rather than relative.  I can't see any way we can cleanly support
> multiple foci with touchpads, nor any use case for doing so.
> Relative-mode devices thus always use the current focus of the device as
> the basis for delivery.

I think we're all in agreement here :)

> As touch events don't generate core events[0], a core grab has no effect
> on the further delivery of touches.  I haven't covered XI2 grabs in my
> first draft, but I think Peter's proposal for those looks reasonable,
> with the exception of the num_touches field.

I think the num_touches field was only a part of one of Peter's early
revisions. It's gone now.

> I've attached the protocol diff, and the driver API looks like this:
>   int touch_id = xf86TouchBegin(dev, tool);
>   if (touch_id < 0)
>     FatalError("mein leben\n");
>   xf86PostTouchEvent(dev, touch_id, x, y, touch_maj, touch_min,
>                      width_maj, width_min, orientation);
>   void xf86TouchEnd(dev, touch_id);
> 
> I think this is the cleanest approach so far, which requires minimal
> work on the toolkit side of things while still doing everything we need
> to.  Am I missing something?
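
For what it's worth, here's how I'd picture a driver using those three
calls over the life of one contact. Only xf86TouchBegin,
xf86PostTouchEvent, and xf86TouchEnd come from your proposal; the slot
bookkeeping around them is mine:

    #include "xf86Xinput.h"

    #define MAX_SLOTS 16     /* arbitrary for this sketch */

    /* Per-contact state, indexed by the hardware slot. */
    static struct {
        int  touch_id;       /* id handed back by xf86TouchBegin() */
        Bool active;
    } contacts[MAX_SLOTS];

    static void
    contact_begin(DeviceIntPtr dev, int slot, int tool)
    {
        contacts[slot].touch_id = xf86TouchBegin(dev, tool);
        contacts[slot].active = (contacts[slot].touch_id >= 0);
    }

    static void
    contact_update(DeviceIntPtr dev, int slot, int x, int y,
                   int touch_maj, int touch_min,
                   int width_maj, int width_min, int orientation)
    {
        if (contacts[slot].active)
            xf86PostTouchEvent(dev, contacts[slot].touch_id, x, y,
                               touch_maj, touch_min,
                               width_maj, width_min, orientation);
    }

    static void
    contact_end(DeviceIntPtr dev, int slot)
    {
        if (contacts[slot].active) {
            xf86TouchEnd(dev, contacts[slot].touch_id);
            contacts[slot].active = FALSE;
        }
    }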

I think the devil is in the details. Much of what you have seems similar
to Peter's spec, but Peter goes further to cover things like all the
combinations of grabs. My gut feeling is that if you started
implementing this spec and Peter started implementing his, you would
both find issues and eventually converge on extremely similar end
points.

I would be interested in seeing your thoughts on Peter's proposal as an
inline discussion of areas that you think could be improved. It's hard
to compare the two side by side without the authors comparing and
contrasting their approaches themselves :).

-- Chase
