[PATCH inputproto xi 2.1] Updates for pointer emulation and more touch device modes

Chase Douglas chase.douglas at canonical.com
Wed Mar 2 08:35:41 PST 2011

On 03/02/2011 05:58 AM, Daniel Stone wrote:
> On Tue, Feb 22, 2011 at 10:06:37AM -0500, Chase Douglas wrote:
>> --- a/XI2.h
>> +++ b/XI2.h
>> @@ -32,6 +32,7 @@
>>  #define Dont_Check                              0
>>  #endif
>>  #define XInput_2_0                              7
>> +#define XInput_2_1                              8
> Peter, what was happening with this hunk?
>> @@ -132,16 +133,16 @@
>>  /* Device event flags (common) */
>>  /* Device event flags (key events only) */
>>  #define XIKeyRepeat                             (1 << 16)
>> -/* Device event flags (pointer events only) */
>> +/* Device event flags (pointer and touch events only) */
>>  #define XIPointerEmulated                       (1 << 16)
>>  /* Device event flags (touch events only) */
>> -#define XITouchPendingEnd                       (1 << 16)
>> -/* Device event flags (touch end events only) */
>> -#define XITouchAccepted                         (1 << 17)
>> +#define XITouchPendingEnd                       (1 << 17)
> Is there any particular reason to unify the two sets of flags? I guess
> it can't really hurt as we shouldn't be using sixteen flags between
> pointer and touch, but eh.

We don't have to unify them per se, we could leave them independent.
However, we need XIPointerEmulated for both touch events and pointer
events. As an example, through Qt one can query which touch point is
associated with an emulated pointer event. To do this, we need a way to
designate that touch in the protocol. Reusing the XIPointerEmulated flag
for the associated touch seems to be a reasonable solution to me.
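
The following is a minimal C sketch of how a client might read the shared flag. It assumes the bit values from the hunk above. The constants are redefined locally so the example compiles without the X headers, and the `event_kind` enum and the helper functions are hypothetical. XIKeyRepeat and XIPointerEmulated share bit 16, so the event type has to be checked before the bit is interpreted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as proposed in the XI2.h hunk (redefined locally). */
#define XIKeyRepeat       (1 << 16) /* key events only */
#define XIPointerEmulated (1 << 16) /* pointer and touch events */
#define XITouchPendingEnd (1 << 17) /* touch events only */

/* Hypothetical event categories, for illustration only. */
enum event_kind { EV_KEY, EV_POINTER, EV_TOUCH };

/* True for an emulated pointer event and for the touch event that drives
 * the emulation. On key events bit 16 means XIKeyRepeat instead. */
static bool event_is_emulated(enum event_kind kind, uint32_t flags)
{
    if (kind == EV_KEY)
        return false;
    return (flags & XIPointerEmulated) != 0;
}

/* True if a touch event carries the pending-end flag. */
static bool touch_pending_end(enum event_kind kind, uint32_t flags)
{
    return kind == EV_TOUCH && (flags & XITouchPendingEnd) != 0;
}
```

With this layout, a toolkit such as Qt could pair an emulated pointer event with its touch by finding the touch event that also has XIPointerEmulated set.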

>> @@ -205,62 +208,145 @@ to touch the device. The init and destroy stage of this sequence are always
>>  present, while the move stage is optional. Within this document, the term
>>  "touch sequence" is used to describe the above chain of events. A client
>>  wishing to receive touch events must register for at least TouchBegin,
>> -TouchOwnership, TouchUpdate, and TouchEnd simultaneously; it may also select
>> -for TouchUpdateUnowned events if it wants to receive the full touch stream,
>> -rather than just the final state.
>> +TouchUpdate, and TouchEnd simultaneously. It may also select for
>> +TouchUpdateUnowned and TouchOwnership events if it wants to receive the full
>> +touch stream while other clients own or have active grabs involving the touch.
> I'm not particularly happy with this hunk, as it means we'll be
> delivering TouchOwnership events to clients who haven't selected for
> them.  I think it was fairly clear as it is: you must always select for
> TouchBegin, TouchOwnership, TouchUpdate and TouchEnd.  If you also want
> unowned events, you select for TouchUpdateUnowned as well.

When would we ever need to send an ownership event if the client didn't
select for it? If you don't select for ownership and update unowned, you
won't receive any events until you have become the owner of the touch.
When you receive the begin event, you already know you're the owner, so
an ownership event isn't needed.
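
To make the owned-only versus unowned distinction concrete, here is a rough C model of what a single client sees for each motion of a touch under the scheme described above. The event codes, the `client_touch_state` struct and `deliver_motion()` are hypothetical and do not correspond to XI2.h or to the server. TouchEnd is left out for brevity.

```c
#include <stdbool.h>

/* Hypothetical event codes; the real values live in XI2.h. */
enum touch_event {
    TOUCH_BEGIN,
    TOUCH_UPDATE,
    TOUCH_UPDATE_UNOWNED,
    TOUCH_OWNERSHIP
};

struct client_touch_state {
    bool selects_unowned; /* also selected TouchUpdateUnowned + TouchOwnership */
    bool begin_sent;      /* TouchBegin already delivered */
    bool was_owner;       /* client already knows it owns the touch */
};

/* Fills out[] with the events one client receives for one touch motion
 * and returns how many there are (0 to 2). */
static int deliver_motion(struct client_touch_state *c, bool is_owner,
                          enum touch_event out[2])
{
    int n = 0;
    bool fresh_begin = false;

    /* Owned-only selection: nothing at all until the client owns the touch. */
    if (!c->selects_unowned && !is_owner)
        return 0;

    if (!c->begin_sent) {
        c->begin_sent = true;
        fresh_begin = true;
        out[n++] = TOUCH_BEGIN; /* carries the current position */
    }
    if (is_owner && !c->was_owner) {
        c->was_owner = true;
        /* For owned-only clients TouchBegin already implies ownership. */
        if (c->selects_unowned)
            out[n++] = TOUCH_OWNERSHIP;
    }
    if (!fresh_begin)
        out[n++] = is_owner ? TOUCH_UPDATE : TOUCH_UPDATE_UNOWNED;
    return n;
}
```

An owned-only client never handles TouchOwnership: its first event for the sequence is TouchBegin, delivered at the moment it becomes the owner.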

>> -delivered to all clients with grabs in the window tree, as well as the client
>> -with the deepest selection.  The first client may either “accept” the touch,
>> -which claims the touch sequence and stops delivery to all other clients for
>> -the duration of the touch sequence, or “reject” the touch sequence, which
>> +delivered to all clients with grabs in the window tree, as well as potentially
>> +the client with the deepest selection.  The first client may either “accept” the
>> +touch, which claims the touch sequence and stops delivery to all other clients
>> +for the duration of the touch sequence, or “reject” the touch sequence, which
> The 'stops all other delivery' bit may have to be removed per Peter's
> comments about TouchRejectContinue/TouchBeginInert, but that was my
> fault, not yours. :)

Yeah, I've kind of left the inert/observer stuff by the side for now.

>> -next client.
>> +next client. When a client, including the initial owner, becomes the owner of a
>> +touch, it will receive a TouchOwnership event. When an owning client accepts a
>> +touch, further clients receiving unowned events will receive TouchEnd events.
> Same here.
>> +Clients selecting for touch events may select for either unowned events or only
>> +owned events. The event stream for an unowned selection is identical to a touch
> 'must select for owned events and may optionally also select for unowned
> events'?

That's a much better way to word it :).

>> +grab. When a client does not select for unowned and ownership events, it will
>> +receive a TouchBegin event when it becomes the owner of a touch stream.
>> +TouchUpdate and TouchEnd events will be received in the same manner as for touch
>> +grabs.
> I think it could be clearer to state that:
>     * clients always receive TouchBegin events immediately before they
>       start receiving any other events for that touch sequence
>     * TouchUpdateUnowned events, if selected for, will be sent while the
>       client does not own the touch sequence
>     * a TouchOwnership event will be sent when the client becomes the
>       owner of a touch stream, followed by a sequence of TouchUpdate
>       events
>     * a TouchEnd event will be sent when no further events will be sent
>       to this client for the touch sequence: when the touch has
>       physically ended, when the client has called AllowTouchEvents with
>       TouchRejectEnd, when the touch grab owner has called
>       AllowTouchEvents with TouchAccept, or the pointer grab owner has
>       called AllowEvents with Async{Pointer,Both}.

This doesn't match what I wrote above :). As I noted in an earlier
comment, we don't need to send ownership events to clients that don't
select for unowned events. This makes the client code much cleaner too,
as they will only have to handle begin, update, and end events.

> Since it's fairly complicated, I'd like to make the spec as
> straightforward to follow as possible, to reduce potential confusion.
> (Again, this is as much my fault as anyone's: I need to go through and
> reorder/reword this at some point to be a lot more clear.  Looks like I
> have a long plane flight in a couple of weeks, which would be a prime
> opportunity.)

Perhaps it would be more straightforward if we had two bulleted lists,
one for clients selecting for unowned events, and one for when they only
select for owned events.

>> +Only one client may select or grab touch events for a device on a window. As an
>> +example, selecting for AllDevices will prevent any other client from selecting
>> +on the same window.
> So, would return BadAccess?

Does it need to be stated here? It seems like this belongs in the
protocol request definition section.
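
If the one-client-per-device-per-window rule is enforced at selection time, the overlap test might look roughly like the C sketch below. XIAllDevices and XIAllMasterDevices take their XI2.h values. Treating XIAllMasterDevices as overlapping every master device is an assumption extrapolated from the XIAllDevices example, and the `is_master` callback is hypothetical. A request that hits a conflict would presumably fail with BadAccess, though the patch text does not say so.

```c
#include <stdbool.h>

#define XIAllDevices       0 /* as in XI2.h */
#define XIAllMasterDevices 1

/* Hypothetical: do two touch selections/grabs on the same window overlap?
 * is_master() reports whether a concrete device ID is a master device. */
static bool touch_selections_conflict(int dev_a, int dev_b,
                                      bool (*is_master)(int))
{
    /* Same device, or either side covers every device. */
    if (dev_a == dev_b || dev_a == XIAllDevices || dev_b == XIAllDevices)
        return true;
    /* Assumption: XIAllMasterDevices overlaps every concrete master. */
    if (dev_a == XIAllMasterDevices)
        return is_master(dev_b);
    if (dev_b == XIAllMasterDevices)
        return is_master(dev_a);
    return false;
}
```

The sketch does not model slave attachment; that case is what the hierarchy-change rule below is meant to cover.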

>> When a slave device is attached to a master device, any
>> +selections on any windows for touch events for the slave device ID will be
>> +canceled. Clients selecting for individual slave devices are suggested to select
>> +for HierarchyChanged events to be notified when this occurs.
> Hm, this is the inverse to what I was hoping, which was in order of
> least to most specific grab/selection:
> for (win = root; win; win = win->child)
>     look for a touch grab on the specific slave ID
>     look for a touch grab on the specific master ID, if any
>     look for a touch grab on XIAllMasterDevices, if master attached
>     look for a touch grab on XIAllDevices
>     look for pointer grabs in the same order as touch grabs
> for (win = child; win; win = win->parent)
>     look for selections in the same order as grabs above

I'm a little confused here :). The text you are commenting under refers
to what happens when a slave device is attached to a master device, but
your comment is referring to the order of processing of grabs. I'm going
to assume your comment has no relation to the text it is under.

Now for my second confusion. The algorithm you describe matches what
I've implemented in the server, and what I thought I had described here.
I suppose the only difference is that I allow for sending both pointer
and touch events to selecting clients on the same window. I'll address
this distinction in a later comment.
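
For reference, Daniel's search order can be written as compilable C. This is only a sketch: the listener table, `has_listener()` and `build_listener_order()` are hypothetical stand-ins for server internals. Device IDs are tried in the order listed in his pseudocode (slave, master, XIAllMasterDevices, XIAllDevices).

```c
#include <stdbool.h>
#include <stddef.h>

#define XIAllDevices       0 /* as in XI2.h */
#define XIAllMasterDevices 1

enum listener_kind { TOUCH_GRAB, POINTER_GRAB, TOUCH_SELECT, POINTER_SELECT };

/* Hypothetical record: one grab or selection on a window. */
struct listener { int window; int deviceid; enum listener_kind kind; };
struct listener_table { const struct listener *items; size_t count; };

static bool has_listener(const struct listener_table *t, int window,
                         int deviceid, enum listener_kind kind)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->items[i].window == window &&
            t->items[i].deviceid == deviceid && t->items[i].kind == kind)
            return true;
    return false;
}

/* Device IDs to try on each window; master < 0 means a floating slave. */
static size_t device_ids(int slave, int master, int ids[4])
{
    size_t n = 0;
    ids[n++] = slave;
    if (master >= 0) {
        ids[n++] = master;
        ids[n++] = XIAllMasterDevices;
    }
    ids[n++] = XIAllDevices;
    return n;
}

/* Grabs from root to child (touch before pointer on each window), then
 * selections from child back to root. path[0] is the root window. */
static size_t build_listener_order(const struct listener_table *t,
                                   const int *path, size_t depth,
                                   int slave, int master,
                                   struct listener *out, size_t max)
{
    int ids[4];
    size_t nids = device_ids(slave, master, ids);
    size_t n = 0;

    for (size_t w = 0; w < depth; w++)
        for (int k = TOUCH_GRAB; k <= POINTER_GRAB; k++)
            for (size_t i = 0; i < nids; i++)
                if (n < max && has_listener(t, path[w], ids[i], k))
                    out[n++] = (struct listener){ path[w], ids[i], k };

    for (size_t w = depth; w-- > 0;)
        for (int k = TOUCH_SELECT; k <= POINTER_SELECT; k++)
            for (size_t i = 0; i < nids; i++)
                if (n < max && has_listener(t, path[w], ids[i], k))
                    out[n++] = (struct listener){ path[w], ids[i], k };
    return n;
}
```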

>> +DirectTouch:
>> +    These devices map their input region to a subset of the screen region. Touch
>> +    events are delivered according to where the touch occurs in the mapped
>> +    screen region. An example of a DirectTouch device is a touchscreen.
> s/events are delivered/focus is determined/, since a touch event that
> starts over a child window and moves to be over the root window only
> will continue to be delivered to the child.


>> +DependentTouch:
>> +    These devices do not have a direct correlation between a touch location and
>> +    a position on the screen. Touch events are delivered according to the
>> +    location of the pointer on screen. An Example of a DependentTouch device
>> +    is a trackpad.
> s/the pointer on screen/the device's cursor/


>> +IndependentPointer:
>> +    These devices do not have any correlation between touch events and pointer
>> +    events. IndependentPointer devices are a subset of DependentTouch devices.
>> +    An example of an IndependentPointer device is a mouse with a touch surface.
> This doesn't really explain the difference between IndependentPointer
> and DependentTouch: that DependentTouch devices will generate cursor
> motion from the touch surface, whereas IndependentPointer (of which
> there is only one known device at the moment, I believe[0]) devices have
> separate relative axes which generate pointer motion.
> There's little-to-no semantic difference to clients in terms of the
> protocol, but I can imagine they'd want to handle the resulting events
> differently, so.  (But see below.)

I see your point. I'll try to remedy this.

>> +SemiMultitouch:
>> +    These devices may report touch events that correlate to the two opposite
>> +    corners of the bounding box of all touches. The number of active touch
>> +    sequences represents the number of touches on the device, and the position
>> +    of any given touch event will be equal to either of the two corners of the
>> +    bounding box. However, the physical location of the touches is unknown.
>> +    SemiMultitouch devices are a subset of DependentTouch devices. Although
>> +    DirectTouch and IndependentPointer devices may also be SemiMultitouch
>> +    devices, such devices are not allowed through this protocol.
> Hmmm.  The bounding box being based on corners of separate pointers
> seems kind of a hack to me.  I'd much rather have the touches all be
> positioned at the midpoint, with the bounding box exposed through
> separate axes.

I think the question that highlights our differences is: "Should we
attempt to handle these devices in the XI 2.1 touch protocol, or fit
them into the pointer protocol?" In Linux, it's been determined that
these devices will be handled as multitouch devices. The evdev client
sees a device with two touch points that are located at the corners of
the bounding box. The normal synaptics-style event codes for describing
the number of fingers are used to denote how many touches are active in
the bounding box.

I'm of the mindset that these devices should be handled as described in
XI 2.1. However, I could be persuaded to handle these devices by
treating them as traditional pointing devices + 5 valuators for
describing the bounding box and how many touches are active.
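
The two representations under discussion can be related with a small C sketch. It assumes that the "5 valuators" are the four bounding-box coordinates plus the active touch count, and that the midpoint Daniel suggests is simply the center of the box. All names are hypothetical.

```c
/* Hypothetical corner position reported by a semi-multitouch pad. */
struct point { double x, y; };

/* Five values: bounding box on each axis plus the active touch count. */
struct semi_mt_state {
    double min_x, min_y, max_x, max_y;
    int ntouches;
};

/* Normalize two opposite corners (in either order) into a bounding box. */
static struct semi_mt_state from_corners(struct point a, struct point b,
                                         int ntouches)
{
    struct semi_mt_state s;
    s.min_x = a.x < b.x ? a.x : b.x;
    s.max_x = a.x < b.x ? b.x : a.x;
    s.min_y = a.y < b.y ? a.y : b.y;
    s.max_y = a.y < b.y ? b.y : a.y;
    s.ntouches = ntouches;
    return s;
}

/* Position at which every touch would be reported under Daniel's
 * suggestion: the center of the bounding box. */
static struct point midpoint(const struct semi_mt_state *s)
{
    struct point p = { (s->min_x + s->max_x) / 2.0,
                       (s->min_y + s->max_y) / 2.0 };
    return p;
}
```

Both forms carry the same information; they differ in whether clients see the corners as touch positions or as axis values.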

> The last sentence also makes me slightly nervous; it seems like we want
> SemiMultitouch to actually be an independent property, whereby a device
> is Direct, Independent or Independent, and then also optionally
> semi-multitouch.  (Possibly just exposing the bounding box axes would be
> enough to qualify as semi-multitouch.)  In fact, IndependentPointer
> could similarly be a property of some DependentTouch devices as well.

I thought about this, but there are a few reasons I did it this way:

1. If you want to make it an independent property, then we should change
the mode field to a bitmask. The field is only 8 bits right now, so we
could run out of bits very quickly. However, treating the field as an
integer as it is today allows for 255 variations. We can always revisit
and add in semi-mt + independent pointer as a new mode later on.

2. Semi-mt and direct touch doesn't make sense. You don't know where
touches are, so you don't know which window to direct events to if the
bounding box spans multiple windows.

3. I believe semi-mt is a dead technology now. I've only ever seen it in
touchpads, and I don't think they'll ever expand beyond that scope. We
can always add another device mode if needed.

>> +A device is identified as only one of the device modes above at any time. For
>> +the purposes of this protocol, IndependentPointer and SemiMultitouch devices are
>> +treated the same as DependentTouch devices unless stated otherwise.
> It would be nice to either go through and clarify every one of these
> cases, or if we end up keeping these two as separate classes, introduce
> new unambiguous terminology for the set of all three classes.

A good idea. I'll try to think of a better naming scheme.
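
A rough C sketch of the integer mode field and the "treated the same as DependentTouch" rule follows. The numeric values are placeholders, since the patch does not assign them here, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values: the mode is one integer in an 8-bit field, so a
 * device has exactly one of these at a time. */
enum touch_mode {
    DirectTouch        = 1,
    DependentTouch     = 2,
    IndependentPointer = 3,
    SemiMultitouch     = 4
};

/* "IndependentPointer and SemiMultitouch devices are treated the same as
 * DependentTouch devices unless stated otherwise." */
static bool is_dependent_class(uint8_t mode)
{
    return mode == DependentTouch || mode == IndependentPointer ||
           mode == SemiMultitouch;
}

/* Only direct devices pick the target window from the touch location;
 * the dependent class follows the device's cursor. */
static bool routes_by_touch_location(uint8_t mode)
{
    return mode == DirectTouch;
}
```

A future combined mode (for example semi-mt plus independent pointer) would simply be another value, which is the flexibility point 1 argues for. A bitmask would instead spend one of the eight bits per property.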

>> +In order to prevent touch events delivered to one window while pointer events
>> +are implicitly grabbed by another, all touches from indirect devices will end
>> +when an implicit grab is activated on the slave or attached master device. New
>> +touches may begin while the device is implicitly grabbed.
> This bit makes me _nervous_.  Unfortunately we can only activate one
> pointer grab at a time, but I'd rather do something like this:
>     * populate the window set with the pseudocode described near the top
>       when the touch begins, regardless of the pointer state
>     * generate touch events as normal
>     * if ownership is passed to a pointer grab/selection, skip it if
>       a pointer grab is already active on the delivering device (the MD
>       if the selection was on the MD ID or XIAllMasterDevices, otherwise
>       the SD)
> It's unpleasant, but I don't like ending all touch events as soon as we
> start pointer emulation (which will happen a fair bit).  Also: why is
> this different for direct and indirect devices? Doesn't this completely
> kill multi-finger gestures if _any_ client (e.g. the WM) has a pointer
> grab anywhere in the stack?
> This bit will definitely require more thought.

I think you're mixing up a lot of things here :). First, we're only
talking about indirect devices where there's no pointer emulation.
Second, we're only talking about implicit grabs that are activated when
a button is pressed.

However, this does bring up a good point. What do we do when a touch
begins on an indirect device that is actively grabbed? What do we do
when a grab is activated?

I feel as though the only sound thing to do for indirect devices is to
cancel all touches when any grab is activated, and to not begin any
touch sequences while any grab is active. This is an extremely heavy
handed solution to the problem, but I can't think of anything better
that wouldn't introduce holes into the protocol. Further, there are
normally two scenarios where grabs are used:

1. When a button is pressed. For all multitouch gesture work I've seen
(and I'm unaware of any other usage of multitouch for indirect devices),
no buttons are pressed while multitouch events are being handled.

2. When doing funky things like confine-to. Hopefully pointer barriers
are a better solution for this, so we can just say we don't support MT +
pointer grabs.

Based on all this, I don't think we'll be missing that much if we go
with this approach. Our hands are tied by legacy X protocol choices, and
this isn't the only compromise we're making :).
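
The heavy-handed policy proposed above for indirect devices could be modelled as follows: end every touch when a grab activates, and refuse new touches while a grab is active. This follows the proposal in this reply rather than the patch text, which still allows new touches during an implicit grab. The struct, `MAX_TOUCHES` and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TOUCHES 10 /* hypothetical per-device limit */

/* Hypothetical per-device state for an indirect (dependent) touch device. */
struct indirect_dev {
    bool grab_active; /* any grab on the slave or its attached master */
    bool touch_active[MAX_TOUCHES];
};

/* Refuse to start a touch sequence while any grab is active. */
static bool try_begin_touch(struct indirect_dev *d, size_t slot)
{
    if (d->grab_active || slot >= MAX_TOUCHES)
        return false;
    d->touch_active[slot] = true;
    return true;
}

/* Activating a grab ends every live touch sequence. Returns how many
 * sequences would need a TouchEnd. */
static size_t activate_grab(struct indirect_dev *d)
{
    size_t ended = 0;
    d->grab_active = true;
    for (size_t i = 0; i < MAX_TOUCHES; i++)
        if (d->touch_active[i]) {
            d->touch_active[i] = false;
            ended++;
        }
    return ended;
}

static void deactivate_grab(struct indirect_dev *d)
{
    d->grab_active = false;
}
```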

>> +Many touch devices will emit pointer events as well, usually by mapping one
>> +touch sequence to pointer events. In these cases, events for both the pointer
>> +and its associated touch sequence will have the XIPointerEmulated flag set.
> I think we can move this section into pointer emulation, and make sure
> that it's clearly stated that all pointer events from touch devices will
> be emulated.

+1 on moving the text. However, the second point isn't true. An
independent pointer device does not emulate any pointer events.

>> +4.4.4 Pointer emulation for direct touch devices
>> +
>> +In order to facilitate backwards compatibility with legacy clients, direct touch
>> +devices will emulate pointer events. Pointer emulation events will only be
>> +delivered through the attached master device; no pointer events will be emulated
>> +for floating touch devices. Further, only one touch from any attached slave
>> +touch device may be emulated per master device at any time.
> Indirect devices won't do pointer emulation? How about touchpads?

I think this is a semantics issue that should be addressed. Direct touch
devices perform pointer emulation in a specific manner as outlined here.
Indirect devices have pointer emulation of sorts, but there's nothing
special about it.
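
The pointer emulation rules in the hunk above (direct devices only, attached slaves only, one emulating touch per master) reduce to a small amount of per-master state. Here is a minimal C sketch with hypothetical names, where a NULL master stands for a floating slave.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_TOUCH (-1)

/* Hypothetical per-master state: which touch drives the emulated pointer. */
struct master_emulation { int emulating_touch; };

/* Returns true if this touch becomes the emulating touch for its master
 * (its events would then carry XIPointerEmulated). */
static bool maybe_start_emulation(struct master_emulation *m,
                                  bool device_is_direct, int touchid)
{
    if (m == NULL || !device_is_direct)
        return false; /* floating slave or indirect device */
    if (m->emulating_touch != NO_TOUCH)
        return false; /* another touch already emulates for this master */
    m->emulating_touch = touchid;
    return true;
}

/* Free the master's emulation slot when the emulating touch ends. */
static void end_touch(struct master_emulation *m, int touchid)
{
    if (m != NULL && m->emulating_touch == touchid)
        m->emulating_touch = NO_TOUCH;
}
```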

>> +A touch event stream must be delivered to clients in a mutually exclusive
>> +fashion. This extends to emulated pointer events. For the purposes of
>> +exclusivity, emulated pointer events between an emulated button press and
>> +button release are considered. An emulated button press event is considered
>> +exclusively delivered once it has been delivered through an event selection, an
>> +asynchronous pointer grab, or it and a further event are delivered through a
>> +synchronous pointer grab.
> 'in a mutually exclusive fashion': could you elaborate?

I thought the rest of the paragraph was the elaboration you are looking
for. What do you feel is missing?
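
The exclusivity condition in that paragraph can be stated as a predicate. In this C sketch the `delivery_path` enum and the count of events delivered after the press are hypothetical. Reading "it and a further event are delivered" as "the synchronous grabber did not replay the press" is an interpretation.

```c
#include <stdbool.h>

/* Hypothetical: how an emulated ButtonPress reached a client. */
enum delivery_path { VIA_SELECTION, VIA_ASYNC_GRAB, VIA_SYNC_GRAB };

static bool press_exclusively_delivered(enum delivery_path path,
                                        int events_delivered_after_press)
{
    switch (path) {
    case VIA_SELECTION:
    case VIA_ASYNC_GRAB:
        return true; /* delivery alone makes it exclusive */
    case VIA_SYNC_GRAB:
        /* The grabber could still replay the press; only once a further
         * event has been delivered is the press committed to it. */
        return events_delivered_after_press >= 1;
    }
    return false;
}
```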

>> +Touch and pointer grabs are also mutually exclusive. For a given window, any
>> +touch grab is activated first. If the touch grab is rejected, the pointer grab
>> +is activated. If an emulated button press event is exclusively delivered to the
>> +grabbing client as outlined above, the touch sequence is ended for all clients
>> +still listening for unowned events. Otherwise, when the pointer stream is
>> +replayed the next window in the window set is checked for touch grabs.
> Buh.  If we're going to do this, we might as well allow multiple touch
> selections on the same window (e.g. if there are grabs on both the slave
> ID and XIAllDevices, deliver first to the slave grab, then to
> XIAllDevices).  Not that that's necessarily a bad idea, mind, but I'd
> like some consistency between touch and pointer here: either one grab
> per window, or multiple.

It is my understanding that only one client may grab a device per
window, which also means one client can't grab XIAllDevices while
another grabs a specific device.

The only other point here is whether one client can grab the master
device while another client grabs the slave device. However, when a
slave device is grabbed it is detached from the master device. So I
think the point is moot.
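
The per-window order in the quoted hunk (touch grab first, then the pointer grab on the same window, then the next window) can be sketched in C as below. How each grabbing client responds is passed in as plain booleans, which is a simplification, and every name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the grabs on one window of the window set. */
struct window_grabs {
    bool has_touch_grab;
    bool touch_grab_accepts; /* false: the touch grab client rejects */
    bool has_pointer_grab;
    bool pointer_grab_keeps; /* emulated press exclusively delivered */
};

enum claim { CLAIMED_BY_TOUCH, CLAIMED_BY_POINTER, UNCLAIMED };

/* Walk the window set from root to child. */
static enum claim walk_grabs(const struct window_grabs *wins, size_t n,
                             size_t *where)
{
    for (size_t i = 0; i < n; i++) {
        const struct window_grabs *w = &wins[i];
        if (w->has_touch_grab && w->touch_grab_accepts) {
            *where = i;
            return CLAIMED_BY_TOUCH;
        }
        /* No touch grab, or it rejected: try the pointer grab here. */
        if (w->has_pointer_grab && w->pointer_grab_keeps) {
            *where = i;
            /* Clients still listening for unowned events get TouchEnd. */
            return CLAIMED_BY_POINTER;
        }
        /* Pointer stream replayed (or no grab): check the next window. */
    }
    return UNCLAIMED; /* fall back to event selections */
}
```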

>> +If the touch sequence is not exclusively delivered to any client through a grab,
>> +the touch and emulated pointer events may be delivered to clients selecting for
>> +the events. Event propagation for the touch sequence ends at the first client
>> +selecting for touch and/or pointer events. Note that a client may receive both
>> +touch and emulated pointer events for the same touch sequence through event
>> +selection.
> Oh? So if someone has selected for both pointer and touch events on the
> same window, they receive both the touch events and the emulated pointer
> stream? How about if different clients select on the window? How does
> that work given that clients with selections cannot currently assert or
> reject ownership? Surely both the touch and pointer selections will then
> think they're the owner ... so either we're pointlessly delivering both
> the touch events and the emulated pointer events to the same client, or
> two clients think they're the owner of the touch stream.  Either way,
> it's bad news.

The X protocol has always had this property that if you select for
pointer events, you can't assume exclusivity of event delivery. This is
in contrast to pointer grabs, where you do have exclusivity.

There's nothing that prevents one client from selecting for touches
while another client selects for pointer events on the same window.
However, there is a clear distinction: the pointer selecting client
knows that it may not be the only receiver of events, while the touch
selecting client knows it has exclusive right to the touch events.

Also, delivering an emulated pointer and its associated touch event
isn't pointless. It's how Windows handles things today, so toolkits like
Qt are set up to deal with this situation. One could argue that Qt
could/should be handling things differently for XI 2.1, but I don't have
a good argument why we should force them to.
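
The selection rule can be modelled as follows: propagation stops at the first window with a touch or pointer selection, and both kinds of selector on that window may be served. This C sketch assumes at most one touch selector and one pointer selector per window, and uses hypothetical client IDs where -1 means no selection.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-window selection summary. */
struct window_selections {
    int touch_client;   /* client selecting touch events, or -1 */
    int pointer_client; /* client selecting pointer events, or -1 */
};

/* path[0] is the root, path[depth-1] the child under the touch. Returns
 * the index of the first window (walking child to root) with any touch or
 * pointer selection, or -1 if nobody selected. */
static long first_selecting_window(const struct window_selections *path,
                                   size_t depth)
{
    for (size_t w = depth; w-- > 0;)
        if (path[w].touch_client >= 0 || path[w].pointer_client >= 0)
            return (long)w;
    return -1;
}

/* The case Daniel asks about: one client gets both the touch stream and
 * the emulated pointer stream for the same touch sequence. */
static bool same_client_gets_both(const struct window_selections *w)
{
    return w->touch_client >= 0 && w->touch_client == w->pointer_client;
}
```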

>> @@ -866,6 +949,9 @@ are required to be 0.
>>      master
>>          The new master device to attach this slave device to.
>> +    If any clients are selecting for touch events from the slave device, their
>> +    selection will be canceled.
> Does that mean the selection will be removed completely, and the
> selection will no longer be present if the SD is removed, and all
> clients are required to re-select every time the hierarchy changes, or?

If the SD is removed, then all event selections are already canceled,
aren't they? If not, that seems like a broken protocol. Device IDs are
reused, so you might end up selecting for events from a different device
than you meant to.

Clients are only required to re-select when the specific slave device
they care about is attached, not on every hierarchy change.

> I'd prefer to just remove this bit completely.

Got any other suggestions? This is due to the fact that only one client
may select for touch events on a window from a device at a time. When
you attach, this rule could be broken unless you do something about it.
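
Cancelling the slave's touch selections on attach amounts to something like the C sketch below. The `touch_sel` record and the function are hypothetical. In practice the affected clients would find out through HierarchyChanged, as the patch suggests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: one client's touch selection on one window. */
struct touch_sel { int client; int window; int deviceid; bool active; };

/* On attach, drop every touch selection made on the slave's own ID so it
 * cannot collide with selections on the master or XIAllMasterDevices.
 * Returns how many selections were cancelled. */
static size_t cancel_slave_touch_selections(struct touch_sel *sels, size_t n,
                                            int slave_id)
{
    size_t cancelled = 0;
    for (size_t i = 0; i < n; i++)
        if (sels[i].active && sels[i].deviceid == slave_id) {
            sels[i].active = false;
            cancelled++;
        }
    return cancelled;
}
```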

>> @@ -1538,9 +1624,9 @@ are required to be 0.
>>      sequence to direct further delivery.
>>      deviceid
>> -        The grabbed device ID.
>> +        The slave device ID for a grabbed touch sequence.
>>      touchid
>> -        The ID of the currently-grabbed touch sequence.
>> +        The ID of the touch sequence to modify.
> Good catches, thanks.
> The rest looks fairly solid to me, although I'm worried enough about the
> above - and particularly how we'll handle delivery/pointer emulation
> when a pointer grab is already active on the device - that I really
> don't want to cut an RC now.  I don't think we can really commit to
> semantics for a lot of this until we've seen a working implementation
> with a full stack; at the moment, we don't have one upstream, and
> Ubuntu's seems to be in enough flux that I don't think it's settled down
> enough to be able to say that the semantics are necessarily what we
> want.

There's no need for an rc per se, I just thought some sort of upstream
release would be helpful, even if it's just called an "alpha". We're
getting by without an official upstream release of any sort, so this
isn't a huge deal.

As for our stack, it's pretty settled for most uses at this point.
Remember that most clients will just be selecting for begin, update,
end. For example, we have Qt with multitouch in Ubuntu, and it only
selects for those three events. Direct touch devices in particular work
very well. The comments above highlight issues with indirect devices,
but they are corner cases that don't really come up much in usage. If we
discard touches during active indirect device pointer grabs, I think
we'll cover 99% of the use cases, and it's a pretty simple change.

As for what happens when using a direct device during an active pointer
grab, we essentially skip all grabs above the pointer grab window and
continue from there. This should be noted in the spec somewhere, but in
practice it works.
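
The behavior described above for direct devices during an active pointer grab can be sketched as choosing where grab processing starts in the window set. This is a hypothetical C helper. The thread does not say what happens when the grab window is not in the set, so the sketch simply skips nothing in that case.

```c
#include <stddef.h>

/* path[0] is the root, path[depth-1] the child under the touch. Returns
 * the index at which grab processing starts: grabs on windows above the
 * active pointer grab window are skipped. Assumption: if the grab window
 * is not in the set, nothing is skipped. */
static size_t first_grab_index(const int *path, size_t depth, int grab_window)
{
    for (size_t i = 0; i < depth; i++)
        if (path[i] == grab_window)
            return i;
    return 0;
}
```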

To give an idea of the breadth of testing we've got so far, here's one
thing we've been doing:

* Compiz plugin with a touch grab on the root window that always rejects
* Compiz passive grab on non-focused windows
* Qt fingerpaint application with touch selections
* Qt fingerpaint application that actively grabs when you click on a drop-down menu

The active grab for the drop down menu works when you touch on the menu
title, drag to the menu selection you want, and release. It also works
when you tap it, a grab is placed on the pointer, and then you tap again
on the menu selection you want or anywhere else on screen.

I know the protocol document is in flux, but in reality I think we are
very close to a fully working implementation.

> Sorry this has taken so long, but some of it was non-trivial.  I'll try
> to be more responsive from now on and let you know when I'm back working
> on the implementation, but that's probably 2-3 weeks away at the moment.
> Cheers,
> Daniel
> [0]: The Logitech Air mouse would qualify, but it seems to do its
>      gesture recognition (for scroll events only) in hardware, so.
>      Also, it's crap.  I bought it to make it work under Linux, turns
>      out it was actually perfectly HID-compliant and worked out of the
>      box, but was basically unusable and pointless.  $100 down the
>      drain.  But I digress.

MS has two touch surface mice, one already available. Another company
I've never heard of before has a similar mouse, but I can't remember any
other details.


-- Chase
