[PATCH inputproto xi 2.1] Updates for pointer emulation and more touch device modes

Tue Mar 8 07:24:42 PST 2011

On 03/08/2011 12:41 AM, Peter Hutterer wrote:
> On Wed, Mar 02, 2011 at 11:35:41AM -0500, Chase Douglas wrote:
>> On 03/02/2011 05:58 AM, Daniel Stone wrote:
>>> On Tue, Feb 22, 2011 at 10:06:37AM -0500, Chase Douglas wrote:
>>>> @@ -132,16 +133,16 @@
>>>>  /* Device event flags (common) */
>>>>  /* Device event flags (key events only) */
>>>>  #define XIKeyRepeat                             (1 << 16)
>>>> -/* Device event flags (pointer events only) */
>>>> +/* Device event flags (pointer and touch events only) */
>>>>  #define XIPointerEmulated                       (1 << 16)
>>>>  /* Device event flags (touch events only) */
>>>> -#define XITouchPendingEnd                       (1 << 16)
>>>> -/* Device event flags (touch end events only) */
>>>> -#define XITouchAccepted                         (1 << 17)
>>>> +#define XITouchPendingEnd                       (1 << 17)
>>>
>>> Is there any particular reason to unify the two sets of flags? I guess
>>> it can't really hurt as we shouldn't be using sixteen flags between
>>> pointer and touch, but eh.
>>
>> We don't have to unify them per se, we could leave them independent.
>> However, we need XIPointerEmulated for both touch events and pointer
>> events. As an example, through Qt one can query which touch point is
>> associated with an emulated pointer event. To do this, we need a way to
>> designate that touch in the protocol. Reusing the XIPointerEmulated flag
>> for the associated touch seems to be a reasonable solution to me.
> 
> I'm not sure how well that works in practice given that pointer events may
> be held up by grabs, whereas touch events are delivered immediately. so even
> if you have both pointer and touch event flagged, you may not get one event
> until much later.

True, this may not actually be effective for matching up a pointer event
to a touch sequence. However, it's still useful in that it tells us if a
pointer event is emulated and whether a touch sequence has generated
emulated pointer events.

A more precise solution would be to give the touch id of the emulated
touch sequence in the pointer events, but there don't appear to be any
fields available for this. Further, it still won't help with the fact that
you may receive the emulated pointer events without ever seeing the
touch sequence, in the case of an indirect device. I think this may be
the best we can do.
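
As a rough illustration of how a client could read the shared flag, here
is a minimal sketch. The two flag values are the ones from the hunk
quoted above; struct sketch_event and the helper functions are
hypothetical stand-ins and are not part of the protocol headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the patch hunk quoted above. */
#define XIPointerEmulated  (1 << 16)  /* pointer and touch events */
#define XITouchPendingEnd  (1 << 17)  /* touch events only */

/* Hypothetical, simplified stand-in for a device event's flags field. */
struct sketch_event {
    bool     is_touch;  /* true for touch events, false for pointer events */
    uint32_t flags;
};

/* True if a pointer event was generated by pointer emulation. */
static bool pointer_is_emulated(const struct sketch_event *ev)
{
    return !ev->is_touch && (ev->flags & XIPointerEmulated) != 0;
}

/* True if a touch event belongs to a sequence that has generated
 * emulated pointer events. */
static bool touch_emulates_pointer(const struct sketch_event *ev)
{
    return ev->is_touch && (ev->flags & XIPointerEmulated) != 0;
}

/* True if a touch event carries the pending-end flag. */
static bool touch_is_pending_end(const struct sketch_event *ev)
{
    return ev->is_touch && (ev->flags & XITouchPendingEnd) != 0;
}
```

Because the two flags occupy different bits, a touch event can carry
both at once without ambiguity.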

>>>> @@ -205,62 +208,145 @@ to touch the device. The init and destroy stage of this sequence are always
>>>>  present, while the move stage is optional. Within this document, the term
>>>>  "touch sequence" is used to describe the above chain of events. A client
>>>>  wishing to receive touch events must register for at least TouchBegin,
>>>> -TouchOwnership, TouchUpdate, and TouchEnd simultaneously; it may also select
>>>> -for TouchUpdateUnowned events if it wants to receive the full touch stream,
>>>> -rather than just the final state.
>>>> +TouchUpdate, and TouchEnd simultaneously. It may also select for
>>>> +TouchUpdateUnowned and TouchOwnership events if it wants to receive the full
>>>> +touch stream while other clients own or have active grabs involving the touch.
>>>
>>> I'm not particularly happy with this hunk, as it means we'll be
>>> delivering TouchOwnership events to clients who haven't selected for
>>> them.  I think it was fairly clear as it is: you must always select for
>>> TouchBegin, TouchOwnership, TouchUpdate and TouchEnd.  If you also want
>>> unowned events, you select for TouchUpdateUnowned as well.
>>
>> When would we ever need to send an ownership event if the client didn't
>> select for it? If you don't select for ownership and update unowned, you
>> won't receive any events until you have become the owner of the touch.
>> When you receive the begin event, you already know you're the owner, so
>> an ownership event isn't needed.
> 
> daniel's approach requires that the touchbegin is sent immediately to all
> clients, the ownership when a client receives the ownership. your approach
> holds the touch begin until the client becomes the owner, thus being
> more-or-less the equivalent of the ownership event in daniel's approach.

Correct, in the case of selecting only for owned events.

>>>> +grab. When a client does not select for unowned and ownership events, it will
>>>> +receive a TouchBegin event when it becomes the owner of a touch stream.
>>>> +TouchUpdate and TouchEnd events will be received in the same manner as for touch
>>>> +grabs.
>>>
>>> I think it could be clearer to state that:
>>>     * clients always receive TouchBegin events immediately before they
>>>       start receiving any other events for that touch sequence
>>>     * TouchUpdateUnowned events, if selected for, will be sent while the
>>>       client does not own the touch sequence
>>>     * a TouchOwnership event will be sent when the client becomes the
>>>       owner of a touch stream, followed by a sequence of TouchUpdate
>>>       events
>>>     * a TouchEnd event will be sent when no further events will be sent
>>>       to this client for the touch sequence: when the touch has
>>>       physically ended, when the client has called AllowTouchEvents with
>>>       TouchRejectEnd, when the touch grab owner has called
>>>       AllowTouchEvents with TouchAccept, or the pointer grab owner has
>>>       called AllowEvents with Async{Pointer,Both}.
>>
>> This doesn't match what I wrote above :). As I noted in an earlier
>> comment, we don't need to send ownership events to clients that don't
>> select for unowned events. This makes the client code much cleaner too,
>> as they will only have to handle begin, update, and end events.
> 
> the danger I see in your spec however is that there is no clear mapping
> between the actual touch begin and the one sent to the client. does the
> TouchBegin still contain the original coordinates or the current ones? what
> about update events that happened between the physical begin and the
> TouchBegin? are they buffered and re-sent, or just dropped or compressed?
> you mentioned dropping with your ring buffer, but that's an implementation
> detail not explained elsewhere.
> 
> does the TouchBegin have the same timestamp as the actual touch begin or the
> timestamp of when sent to the client?
> for delayed touches (because a grabbing client takes a while), the
> time between TouchBegin and TouchOwnership can be a worthy piece of
> information that is otherwise not available. The mere fact that touch is
> currently used can be interesting to a client, even if it never receives the
> touch event.

This does need to be stated in the spec, I just forgot about it :).

First, I believe my approach is better than using ownership events when
the client only selects for owned events. It's not clear to me which of
the two sequences below Daniel is proposing:

1. Touch physically initiates
2. TouchBegin sent to client
3. All grabbing clients reject/replay touch
4. TouchOwnership sent to client
5. TouchUpdates sent to client
6. TouchEnd sent to client

1. Touch physically initiates
2. All grabbing clients reject/replay touch
3. TouchBegin sent to client
4. TouchOwnership sent to client
5. TouchUpdates sent to client
6. TouchEnd sent to client

In the first sequence of events, we are needlessly waking up the
selecting client at the beginning of each touch if the touch sequence is
grabbed and handled above. We would also be waking up the client when
the TouchEnd event is sent.

In the second sequence of events, the ownership event is superfluous.
There's no extra information to be gleaned from it. If I were writing a
client, I'd select for ownership just because I was forced to, and then
discard the ownership events. That doesn't seem like a good API to me :).

There's something to be said for keeping the same semantics throughout
the api, but this only holds when dealing with the same context. I don't
believe using touch ownership events when selecting only for owned
events fits the context.

In my implementation, the touch begin event is saved off (separate from
the ring buffer) so we can replay it when the selecting client receives
ownership. Touch update events are saved into the ring buffer as they
are generated. When the ring buffer overruns, the oldest touch update
event is overwritten by the newest event. Thus, the client will receive
the touch begin event with the correct begin coordinates, then the first
touch update event may jerk the touch to a far away location if the ring
buffer overruns, and then the last N update events will be smooth. One
could say that the overrun events at the beginning of the touch sequence
are motion compressed together.
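
A minimal sketch of the buffering described above, not the actual
implementation: the begin event is kept outside the ring, and an overrun
overwrites the oldest update. All names and the RING_SIZE value are
hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4  /* hypothetical; kept tiny so overruns are easy to see */

struct touch_pos { double x, y; };

struct touch_history {
    struct touch_pos begin;            /* saved separately, never overwritten */
    bool             have_begin;
    struct touch_pos ring[RING_SIZE];  /* most recent update events */
    size_t           head;             /* index of the oldest stored update */
    size_t           count;            /* number of stored updates */
};

/* Record the touch begin event and reset the update ring. */
static void history_begin(struct touch_history *h, struct touch_pos p)
{
    h->begin = p;
    h->have_begin = true;
    h->head = 0;
    h->count = 0;
}

/* Record a touch update event as it is generated. */
static void history_update(struct touch_history *h, struct touch_pos p)
{
    if (h->count < RING_SIZE) {
        h->ring[(h->head + h->count) % RING_SIZE] = p;
        h->count++;
    } else {
        /* Overrun: the newest event overwrites the oldest one. */
        h->ring[h->head] = p;
        h->head = (h->head + 1) % RING_SIZE;
    }
}

/* Replay for a client that has just received ownership: the begin event
 * first, then the surviving updates, oldest first. `out` must hold at
 * least RING_SIZE + 1 entries. Returns the number of events written. */
static size_t history_replay(const struct touch_history *h,
                             struct touch_pos *out)
{
    size_t n = 0;

    if (!h->have_begin)
        return 0;
    out[n++] = h->begin;
    for (size_t i = 0; i < h->count; i++)
        out[n++] = h->ring[(h->head + i) % RING_SIZE];
    return n;
}
```

With six updates and a ring of four, the replay is the begin event
followed by updates three through six, which is the jump from the begin
coordinates to a later position described above.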

My implementation sets the timestamp of the touch events as they are
sent to the client, so the timestamp of replayed events will not match
the timestamp of the original events as sent to the grabbing clients. I
don't see this as a problem because X timestamps just don't work for
multitouch events. Henrik Rydberg implemented a Kalman filter for
velocity estimation and compensation in utouch-frame, a library for
extracting touch events into frames for easier consumption by the
client. The library can work on top of mtdev or XI 2.1. When mtdev is
used, the evdev timestamps are used and the filter works well. When XI
2.1 is used we have to disable the filter because the X timestamps are
so wildly inaccurate. The correct solution, imo, is to add a valuator
axis to the devices whose value represents "device" time. On Linux, this
would be set to the timestamps from evdev. The valuator values of the
device events are copied into the ring buffer, so when they are replayed
the values would be representative of the original events.
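
To illustrate why a device-time valuator would survive replay, here is a
small sketch. The struct layout, the millisecond units, and the function
names are all assumptions made for the example; no such valuator exists
in the protocol today.

```c
#include <stdint.h>

/* Hypothetical event record. server_time_ms is stamped when the event is
 * sent to the client; device_time_ms is the proposed valuator carrying
 * the device's own clock (e.g. the evdev timestamp on Linux). */
struct timed_touch {
    double   x;
    uint32_t server_time_ms;
    double   device_time_ms;
};

/* Replaying an event copies its valuators unchanged but restamps the
 * server time with the time of delivery. */
static struct timed_touch replay_event(struct timed_touch ev, uint32_t now_ms)
{
    ev.server_time_ms = now_ms;
    return ev;
}

/* Velocity in position units per millisecond, computed from device time.
 * Returns 0 when the two samples are not ordered in time. */
static double touch_velocity(const struct timed_touch *a,
                             const struct timed_touch *b)
{
    double dt = b->device_time_ms - a->device_time_ms;

    return dt > 0.0 ? (b->x - a->x) / dt : 0.0;
}
```

Two events replayed back to back end up with the same server timestamp,
yet the device-time difference still yields the original velocity.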

As for clients wanting information on whether a touch event has ever
been owned by a grabbing client or the time difference between the
original touch begin event and when the client receives ownership, I
would say such clients should subscribe to unowned events as well. We
can't cater to every possible combination of use cases separately, and
such a use case seems specialized enough that the client can be expected
to jump through the extra hoops of unowned event handling to do so.

>>>> +SemiMultitouch:
>>>> +    These devices may report touch events that correlate to the two opposite
>>>> +    corners of the bounding box of all touches. The number of active touch
>>>> +    sequences represents the number of touches on the device, and the position
>>>> +    of any given touch event will be equal to either of the two corners of the
>>>> +    bounding box. However, the physical location of the touches is unknown.
>>>> +    SemiMultitouch devices are a subset of DependentTouch devices. Although
>>>> +    DirectTouch and IndependentPointer devices may also be SemiMultitouch
>>>> +    devices, such devices are not allowed through this protocol.
>>>
>>> Hmmm.  The bounding box being based on corners of separate pointers
>>> seems kind of a hack to me.  I'd much rather have the touches all be
>>> positioned at the midpoint, with the bounding box exposed through
>>> separate axes.
>>
>> I think the question that highlights our differences is: "Should we
>> attempt to handle these devices in the XI 2.1 touch protocol, or fit
>> them into the pointer protocol?" In Linux, it's been determined that
>> these devices will be handled as multitouch devices. The evdev client
>> sees a device with two touch points that are located at the corners of
>> the bounding box. The normal synaptics-style event codes for describing
>> the number of fingers are used to denote how many touches are active in
>> the bounding box.
>>
>> I'm of the mindset that these devices should be handled as described in
>> XI 2.1. However, I could be persuaded to handle these devices by
>> treating them as traditional pointing devices + 5 valuators for
>> describing the bounding box and how many touches are active.
>>
>>> The last sentence also makes me slightly nervous; it seems like we want
>>> SemiMultitouch to actually be an independent property, whereby a device
>>> is Direct, Dependent or Independent, and then also optionally
>>> semi-multitouch.  (Possibly just exposing the bounding box axes would be
>>> enough to qualify as semi-multitouch.)  In fact, IndependentPointer
>>> could similarly be a property of some DependentTouch devices as well.
>>
>> I thought about this, but there are a few reasons I did it this way:
>>
>> 1. If you want to make it an independent property, then we should change
>> the mode field to a bitmask. The field is only 8 bits right now, so we
>> could run out of bits very quickly. However, treating the field as an
>> integer as it is today allows for 255 variations. We can always revisit
>> and add in semi-mt + independent pointer as a new mode later on.
>>
>> 2. Semi-mt and direct touch doesn't make sense. You don't know where
>> touches are, so you don't know which window to direct events to if the
>> bounding box spans multiple windows.
>>
>> 3. I believe semi-mt is a dead technology now. I've only ever seen it in
>> touchpads, and I don't think they'll ever expand beyond that scope. We
>> can always add another device mode if needed.

I'm going to assume by the lack of comment here that you're satisfied
with this mode?

>>>> +In order to prevent touch events delivered to one window while pointer events
>>>> +are implicitly grabbed by another, all touches from indirect devices will end
>>>> +when an implicit grab is activated on the slave or attached master device. New
>>>> +touches may begin while the device is implicitly grabbed.
>>>
>>> This bit makes me _nervous_.  Unfortunately we can only activate one
>>> pointer grab at a time, but I'd rather do something like this:
>>>     * populate the window set with the pseudocode described near the top
>>>       when the touch begins, regardless of the pointer state
>>>     * generate touch events as normal
>>>     * if ownership is passed to a pointer grab/selection, skip it if
>>>       a pointer grab is already active on the delivering device (the MD
>>>       if the selection was on the MD ID or XIAllMasterDevices, otherwise
>>>       the SD)
>>>
>>> It's unpleasant, but I don't like ending all touch events as soon as we
>>> start pointer emulation (which will happen a fair bit).  Also: why is
>>> this different for direct and indirect devices? Doesn't this completely
>>> kill multi-finger gestures if _any_ client (e.g. the WM) has a pointer
>>> grab anywhere in the stack?
>>>
>>> This bit will definitely require more thought.
>>
>> I think you're mixing up a lot of things here :). First, we're only
>> talking about indirect devices where there's no pointer emulation.
>> Second, we're only talking about implicit grabs that are activated when
>> a button is pressed.
> 
> this needs to be specified then. AIUI, sending TouchBegin events activates
> implicit grabs too.

I didn't really like this idea, so I tried to come up with a better one.
It's at the end of one of my other emails. Essentially, don't cancel
touches, just don't send them to clients when the cursor has left the
touch selecting/grabbing window.

>> However, this does bring up a good point. What do we do when a touch
>> begins on an indirect device that is actively grabbed? What do we do
>> when a grab is activated?
>>
>> I feel as though the only sound thing to do for indirect devices is to
>> cancel all touches when any grab is activated, and to not begin any
>> touch sequences while any grab is active. This is an extremely heavy
>> handed solution to the problem, but I can't think of anything better
>> that wouldn't introduce holes into the protocol. Further, there are
>> normally two scenarios where grabs are used:
>>
>> 1. When a button is pressed. For all multitouch gesture work I've seen
>> (and I'm unaware of any other usage of multitouch for indirect devices),
>> no buttons are pressed while multitouch events are being handled.
> 
> tapping and scrolling both send button events that will likely be grabbed,
> even if temporarily only. that's usually on the MD though.

Tapping is just like any other button click, but I hadn't thought of
scroll events. This is just another reason to hate scroll as button
events :).

>> 2. When doing funky things like confine-to. Hopefully pointer barriers
>> are a better solution for this, so we can just say we don't support MT +
>> pointer grabs.
> 
> hoping that confine_to just disappears is not a good plan of action,
> regardless of pointer barriers.

I hope this is better resolved with the new proposal. It wouldn't
require any extra handling in the event of a confine_to grab.

>>>> +4.4.4 Pointer emulation for direct touch devices
>>>> +
>>>> +In order to facilitate backwards compatibility with legacy clients, direct touch
>>>> +devices will emulate pointer events. Pointer emulation events will only be
>>>> +delivered through the attached master device; no pointer events will be emulated
>>>> +for floating touch devices. Further, only one touch from any attached slave
>>>> +touch device may be emulated per master device at any time.
>>>
>>> Indirect devices won't do pointer emulation? How about touchpads?
>>
>> I think this is a semantics issue that should be addressed. Direct touch
>> devices perform pointer emulation in a specific manner as outlined here.
>> Indirect devices have pointer emulation of sorts, but there's nothing
>> special about it.
> 
> Then this needs to be stated in the spec. 
> "Independent touch devices do not feature pointer emulation, the device is
> expected to provide x and y coordinates through conventional axes."

Ok

>>>> +Touch and pointer grabs are also mutually exclusive. For a given window, any
>>>> +touch grab is activated first. If the touch grab is rejected, the pointer grab
>>>> +is activated. If an emulated button press event is exclusively delivered to the
>>>> +grabbing client as outlined above, the touch sequence is ended for all clients
>>>> +still listening for unowned events. Otherwise, when the pointer stream is
>>>> +replayed the next window in the window set is checked for touch grabs.
>>>
>>> Buh.  If we're going to do this, we might as well allow multiple touch
>>> selections on the same window (e.g. if there are grabs on both the slave
>>> ID and XIAllDevices, deliver first to the slave grab, then to
>>> XIAllDevices).  Not that that's necessarily a bad idea, mind, but I'd
>>> like some consistency between touch and pointer here: either one grab
>>> per window, or multiple.
>>
>> It is my understanding that only one client may grab a device per
>> window, which also means one client can't grab XIAllDevices while
>> another grabs a specific device.
> 
> fwiw, XIGrabDevice(XIAllDevices) will always fail with BadDevice.
> for passive grabs, the above is correct.

Good to know.

>>>> +If the touch sequence is not exclusively delivered to any client through a grab,
>>>> +the touch and emulated pointer events may be delivered to clients selecting for
>>>> +the events. Event propagation for the touch sequence ends at the first client
>>>> +selecting for touch and/or pointer events. Note that a client may receive both
>>>> +touch and emulated pointer events for the same touch sequence through event
>>>> +selection.
>>>
>>> Oh? So if someone has selected for both pointer and touch events on the
>>> same window, they receive both the touch events and the emulated pointer
>>> stream? How about if different clients select on the window? How does
>>> that work given that clients with selections cannot currently assert or
>>> reject ownership? Surely both the touch and pointer selections will then
>>> think they're the owner ... so either we're pointlessly delivering both
>>> the touch events and the emulated pointer events to the same client, or
>>> two clients think they're the owner of the touch stream.  Either way,
>>> it's bad news.
>>
>> The X protocol has always had this property that if you select for
>> pointer events, you can't assume exclusivity of event delivery. This is
>> in contrast to pointer grabs, where you do have exclusivity.
> 
> this only applies to Motion and Release events, not to Press events though.
> any client that selects for ButtonPress events expects exclusivity since it
> triggers an implicit passive grab. 
> assume we have two clients selecting for pointer and touch events
> respectively.  if we always deliver touch events first, I don't know how we
> can emulate pointer events to two clients since the device is already
> grabbed by then.

Good point.

> this pretty much comes down to two things:
> - we should specify that only one client may select for touch events on a
>   given window, just like for button press (I _think_ we may have this in
>   the protocol already)

That's the intention at least, even if it's not 100% clear yet :).

(To be precise, one client may select for touch events per physical
touch device per window.)

> - we need to decide if pointer emulation happens if the client selects for
>   pointer + touch events or if we trust the client to handle this situation
>
>> There's nothing that prevents one client from selecting for touches
>> while another client selects for pointer events on the same window.
>> However, there is a clear distinction: the pointer selecting client
>> knows that it may not be the only receiver of events, while the touch
>> selecting client knows it has exclusive right to the touch events.
>>
>> Also, delivering an emulated pointer and its associated touch event
>> isn't pointless. It's how Windows handles things today, so toolkits like
>> Qt are set up to deal with this situation. One could argue that Qt
>> could/should be handling things differently for XI 2.1, but I don't have
>> a good argument why we should force them to.
> 
> what do they do with the emulated pointer event? do they process it or
> discard it anyway?

It all depends on the widget that events propagate to. My understanding
is that widgets in Qt select for touch and pointer events independently,
just as in X. The widget will receive both types of events if it
subscribes to both. If a widget and its parents don't handle an event,
the event is discarded.

I'm hoping Denis will correct me if I'm mistaken :).

>>>> @@ -866,6 +949,9 @@ are required to be 0.
>>>>      master
>>>>          The new master device to attach this slave device to.
>>>>  
>>>> +    If any clients are selecting for touch events from the slave device, their
>>>> +    selection will be canceled.
>>>
>>> Does that mean the selection will be removed completely, and the
>>> selection will no longer be present if the SD is removed, and all
>>> clients are required to re-select every time the hierarchy changes, or?
>>
>> If the SD is removed, then all event selections are already canceled
>> aren't they? If not, that seems like a broken protocol. Device IDs are
>> reused, so you might end up selecting for events from a different device
>> than you meant to.
>>
>> Clients are only required to re-select when the specific slave device
>> they care about is attached, not on every hierarchy change.
> 
> I guess daniel meant s/removed/reattached/, not as in "unplugged". But you
> answered the question, a client registering for touch events must re-select
> for touch events on every hierarchy change that affects the SD (including
> the race conditions this implies).
> 
> What is the reason for this again? If we already require clients to track
> the SDs, can we assume that they want the events from the device as
> selected, even if reattached?

We enforce one touch client selection per physical device per window at
selection request time. Let's say on the same window you have client A
selecting on detached slave device S, and client B selecting on
XIAllMasterDevices. When you attach device S to a master device, you now
have two competing selections. Do you send touch events to client A or
client B? I feel that client B has priority and client A's selection
should be canceled. If you inverted the priority, you would break X
core and XI 1.x clients by removing their selections without them knowing.
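
A minimal sketch of that rule for a single window, following the wording
in the patch (selections made directly on the slave are canceled when it
is attached). The struct, the MASTER_SELECTION marker, and the function
name are hypothetical and do not reflect server internals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for a selection made on XIAllMasterDevices. */
#define MASTER_SELECTION (-1)

/* One touch selection on a window. */
struct touch_selection {
    int  client;    /* hypothetical client id */
    int  deviceid;  /* slave device id, or MASTER_SELECTION */
    bool active;
};

/* Called when slave device `sd` is attached to a master device. Cancels
 * every active selection made directly on `sd`, so selections made on
 * master devices keep priority. Returns the number of selections
 * canceled. */
static size_t cancel_slave_selections(struct touch_selection *sel,
                                      size_t n, int sd)
{
    size_t canceled = 0;

    for (size_t i = 0; i < n; i++) {
        if (sel[i].active && sel[i].deviceid == sd) {
            sel[i].active = false;
            canceled++;
        }
    }
    return canceled;
}
```

In the example above, client A's selection on S is dropped at attach
time while client B's selection on XIAllMasterDevices is left alone.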

Thanks,

-- Chase