[PATCH inputproto xi 2.1] Updates for pointer emulation and more touch device modes

Peter Hutterer peter.hutterer at who-t.net
Tue Mar 8 20:59:21 PST 2011


On Tue, Mar 08, 2011 at 10:24:42AM -0500, Chase Douglas wrote:
> On 03/08/2011 12:41 AM, Peter Hutterer wrote:
> > On Wed, Mar 02, 2011 at 11:35:41AM -0500, Chase Douglas wrote:
> >> On 03/02/2011 05:58 AM, Daniel Stone wrote:
> >>> On Tue, Feb 22, 2011 at 10:06:37AM -0500, Chase Douglas wrote:
> >>>> @@ -132,16 +133,16 @@
> >>>>  /* Device event flags (common) */
> >>>>  /* Device event flags (key events only) */
> >>>>  #define XIKeyRepeat                             (1 << 16)
> >>>> -/* Device event flags (pointer events only) */
> >>>> +/* Device event flags (pointer and touch events only) */
> >>>>  #define XIPointerEmulated                       (1 << 16)
> >>>>  /* Device event flags (touch events only) */
> >>>> -#define XITouchPendingEnd                       (1 << 16)
> >>>> -/* Device event flags (touch end events only) */
> >>>> -#define XITouchAccepted                         (1 << 17)
> >>>> +#define XITouchPendingEnd                       (1 << 17)
> >>>
> >>> Is there any particular reason to unify the two sets of flags? I guess
> >>> it can't really hurt as we shouldn't be using sixteen flags between
> >>> pointer and touch, but eh.
> >>
> >> We don't have to unify them per se, we could leave them independent.
> >> However, we need XIPointerEmulated for both touch events and pointer
> >> events. As an example, through Qt one can query which touch point is
> >> associated with an emulated pointer event. To do this, we need a way to
> >> designate that touch in the protocol. Reusing the XIPointerEmulated flag
> >> for the associated touch seems to be a reasonable solution to me.
> > 
> > I'm not sure how well that works in practice given that pointer events may
> > be held up by grabs, whereas touch events are delivered immediately. so even
> > if you have both pointer and touch event flagged, you may not get one event
> > until much later.
> 
> True, this may not actually be effective for matching up a pointer event
> to a touch sequence. However, it's still useful in that it tells us if a
> pointer event is emulated and whether a touch sequence has generated
> emulated pointer events.
> 
> A more precise solution would be to give the touch id of the emulated
> touch sequence in the pointer events, but there don't appear to be any
> fields available for this. Further, it still won't help with the fact that
> you may receive the emulated pointer events without ever seeing the
> touch sequence, in the case of an indirect device. I think this may be
> the best we can do.

device events are supposedly extensible (unless I screwed up) so adding a
uint32_t for the touch ID would be easy enough. whether it's worth it is
another question given that pointer emulation has different time delivery
semantics. I don't know what the answer to this is.

anyway, adding this to the protocol and the server is easy. turns out I'm
better at designing the protocol than writing libXi interfaces; those are
painful to extend.
 
> >>>> @@ -205,62 +208,145 @@ to touch the device. The init and destroy stage of this sequence are always
> >>>>  present, while the move stage is optional. Within this document, the term
> >>>>  "touch sequence" is used to describe the above chain of events. A client
> >>>>  wishing to receive touch events must register for at least TouchBegin,
> >>>> -TouchOwnership, TouchUpdate, and TouchEnd simultaneously; it may also select
> >>>> -for TouchUpdateUnowned events if it wants to receive the full touch stream,
> >>>> -rather than just the final state.
> >>>> +TouchUpdate, and TouchEnd simultaneously. It may also select for
> >>>> +TouchUpdateUnowned and TouchOwnership events if it wants to receive the full
> >>>> +touch stream while other clients own or have active grabs involving the touch.
> >>>
> >>> I'm not particularly happy with this hunk, as it means we'll be
> >>> delivering TouchOwnership events to clients who haven't selected for
> >>> them.  I think it was fairly clear as it is: you must always select for
> >>> TouchBegin, TouchOwnership, TouchUpdate and TouchEnd.  If you also want
> >>> unowned events, you select for TouchUpdateUnowned as well.
> >>
> >> When would we ever need to send an ownership event if the client didn't
> >> select for it? If you don't select for ownership and update unowned, you
> >> won't receive any events until you have become the owner of the touch.
> >> When you receive the begin event, you already know you're the owner, so
> >> an ownership event isn't needed.
> > 
> > daniel's approach requires that the touchbegin is sent immediately to all
> > clients, and the ownership event when the client becomes the owner. your
> > approach holds the touch begin until the client becomes the owner, thus
> > being more-or-less the equivalent of the ownership event in daniel's
> > approach.
> 
> Correct, in the case of selecting only for owned events.
> 
> >>>> +grab. When a client does not select for unowned and ownership events, it will
> >>>> +receive a TouchBegin event when it becomes the owner of a touch stream.
> >>>> +TouchUpdate and TouchEnd events will be received in the same manner as for touch
> >>>> +grabs.
> >>>
> >>> I think it could be clearer to state that:
> >>>     * clients always receive TouchBegin events immediately before they
> >>>       start receiving any other events for that touch sequence
> >>>     * TouchUpdateUnowned events, if selected for, will be sent while the
> >>>       client does not own the touch sequence
> >>>     * a TouchOwnership event will be sent when the client becomes the
> >>>       owner of a touch stream, followed by a sequence of TouchUpdate
> >>>       events
> >>>     * a TouchEnd event will be sent when no further events will be sent
> >>>       to this client for the touch sequence: when the touch has
> >>>       physically ended, when the client has called AllowTouchEvents with
> >>>       TouchRejectEnd, when the touch grab owner has called
> >>>       AllowTouchEvents with TouchAccept, or the pointer grab owner has
> >>>       called AllowEvents with Async{Pointer,Both}.
> >>
> >> This doesn't match what I wrote above :). As I noted in an earlier
> >> comment, we don't need to send ownership events to clients that don't
> >> select for unowned events. This makes the client code much cleaner too,
> >> as they will only have to handle begin, update, and end events.
> > 
> > the danger I see in your spec however is that there is no clear mapping
> > between the actual touch begin and the one sent to the client. does the
> > TouchBegin still contain the original coordinates or the current ones? what
> > about update events that happened between the physical begin and the
> > TouchBegin. are they buffered and re-sent or just dropped or compressed. 
> > you mentioned dropping with your ring buffer, but that's an implementation
> > detail not explained elsewhere.
> > 
> > does the TouchBegin have the same timestamp as the actual touch begin or the
> > timestamp of when sent to the client?
> > for delayed touches (because a grabbing client takes a while), the
> > time between TouchBegin and TouchOwnership can be a worthy piece of
> > information that is otherwise not available. The mere fact that touch is
> > currently used can be interesting to a client, even if it never receives the
> > touch event.
> 
> This does need to be stated in the spec, I just forgot about it :).
> 
> First, I believe my approach is better than using ownership events when
> the client only selects for owned events. It's not clear to me which of
> the two sequences below Daniel is proposing:
> 
> 1. Touch physically initiates
> 2. TouchBegin sent to client
> 3. All grabbing clients reject/replay touch
> 4. TouchOwnership sent to client
> 5. TouchUpdates sent to client
> 6. TouchEnd sent to client

 ^^ this one

> 1. Touch physically initiates
> 2. All grabbing clients reject/replay touch
> 3. TouchBegin sent to client
> 4. TouchOwnership sent to client
> 5. TouchUpdates sent to client
> 6. TouchEnd sent to client
> 
> In the first sequence of events, we are needlessly waking up the
> selecting client at the beginning of each touch if the touch sequence is
> grabbed and handled above. We would also be waking up the client when
> the TouchEnd event is sent.
> 
> In the second sequence of events, the ownership event is superfluous.
> There's no extra information to be gleaned from it. If I were writing a
> client, I'd select for ownership just because I was forced to, and then
> discard the ownership events. That doesn't seem like a good API to me :).
> 
> There's something to be said for keeping the same semantics throughout
> the api, but this only holds when dealing with the same context. I don't
> believe using touch ownership events when selecting only for owned
> events fits the context.

well, there are a number of benefits to sending the touch begin immediately. it
allows e.g. a UI to reconfigure itself for touch-interfaces even before it
gets to handle the actual touch (something that's easy to undo).
plus the symmetry in the API.

> In my implementation, the touch begin event is saved off (separate from
> the ring buffer) so we can replay it when the selecting client receives
> ownership. Touch update events are saved into the ring buffer as they
> are generated. When the ring buffer overruns, the oldest touch update
> event is overwritten by the newest event. Thus, the client will receive
> the touch begin event with the correct begin coordinates, then the first
> touch update event may jerk the touch to a far away location if the ring
> buffer overruns, and then the last N update events will be smooth. One
> could say that the overran events at the beginning of the touch sequence
> are motion compressed together.

this would be similar to motion events except that we don't provide the
motion history. again, needs to be added to the spec.

> My implementation sets the timestamp of the touch events as they are
> sent to the client, so the timestamp of replayed events will not match
> the timestamp of the original events as sent to the grabbing clients. I
> don't see this as a problem because X timestamps just don't work for
> multitouch events. Henrik Rydberg implemented a Kalman filter for
> velocity estimation and compensation in utouch-frame, a library for
> extracting touch events into frames for easier consumption by the
> client. The library can work on top of mtdev or XI 2.1. When mtdev is
> used, the evdev timestamps are used and the filter works well. When XI
> 2.1 is used we have to disable the filter because the X timestamps are
> so wildly inaccurate. The correct solution, imo, is to add a valuator
> axis to the devices whose value represents "device" time. On Linux, this
> would be set to the timestamps from evdev. The valuator values of the
> device events are copied into the ring buffer, so when they are replayed
> the values would be representative of the original events.

valuators are _not_ fields we can dump random values in just because we
can't fix them elsewhere. especially for this, we already have an event
time. if that is "wildly inaccurate" then it's most likely a bug. what's
the cause for the inaccuracy?

> As for clients wanting information on whether a touch event has ever
> been owned by a grabbing client or the time difference between the
> original touch begin event and when the client receives ownership, I
> would say such clients should subscribe to unowned events as well. We
> can't cater to every possible combination of use cases separately, and
> such a use case seems specialized enough that the client can be expected
> to jump through the extra hoops of unowned event handling to do so.
> 
> >>>> +SemiMultitouch:
> >>>> +    These devices may report touch events that correlate to the two opposite
> >>>> +    corners of the bounding box of all touches. The number of active touch
> >>>> +    sequences represents the number of touches on the device, and the position
> >>>> +    of any given touch event will be equal to either of the two corners of the
> >>>> +    bounding box. However, the physical location of the touches is unknown.
> >>>> +    SemiMultitouch devices are a subset of DependentTouch devices. Although
> >>>> +    DirectTouch and IndependentPointer devices may also be SemiMultitouch
> >>>> +    devices, such devices are not allowed through this protocol.
> >>>
> >>> Hmmm.  The bounding box being based on corners of separate pointers
> >>> seems kind of a hack to me.  I'd much rather have the touches all be
> >>> positioned at the midpoint, with the bounding box exposed through
> >>> separate axes.
> >>
> >> I think the question that highlights our differences is: "Should we
> >> attempt to handle these devices in the XI 2.1 touch protocol, or fit
> >> them into the pointer protocol?" In Linux, it's been determined that
> >> these devices will be handled as multitouch devices. The evdev client
> >> sees a device with two touch points that are located at the corners of
> >> the bounding box. The normal synaptics-style event codes for describing
> >> the number of fingers are used to denote how many touches are active in
> >> the bounding box.
> >>
> >> I'm of the mindset that these devices should be handled as described in
> >> XI 2.1. However, I could be persuaded to handle these devices by
> >> treating them as traditional pointing devices + 5 valuators for
> >> describing the bounding box and how many touches are active.
> >>
> >>> The last sentence also makes me slightly nervous; it seems like we want
> >>> SemiMultitouch to actually be an independent property, whereby a device
> >>> is Direct, Dependent or Independent, and then also optionally
> >>> semi-multitouch.  (Possibly just exposing the bounding box axes would be
> >>> enough to qualify as semi-multitouch.)  In fact, IndependentPointer
> >>> could similarly be a property of some DependentTouch devices as well.
> >>
> >> I thought about this, but there's a few reasons I did it this way:
> >>
> >> 1. If you want to make it an independent property, then we should change
> >> the mode field to a bitmask. The field is only 8 bits right now, so we
> >> could run out of bits very quickly. However, treating the field as an
> >> integer as it is today allows for 255 variations. We can always revisit
> >> and add in semi-mt + independent pointer as a new mode later on.
> >>
> >> 2. Semi-mt and direct touch doesn't make sense. You don't know where
> >> touches are, so you don't know which window to direct events to if the
> >> bounding box spans multiple windows.
> >>
> >> 3. I believe semi-mt is a dead technology now. I've only ever seen it in
> >> touchpads, and I don't think they'll ever expand beyond that scope. We
> >> can always add another device mode if needed.
> 
> I'm going to assume by the lack of comment here that you're satisfied
> with this mode?

tbh, I don't know yet. I obviously can't make these devices go away but I'm
not sure on the handling for them yet.

> >>>> +In order to prevent touch events delivered to one window while pointer events
> >>>> +are implicitly grabbed by another, all touches from indirect devices will end
> >>>> +when an implicit grab is activated on the slave or attached master device. New
> >>>> +touches may begin while the device is implicitly grabbed.
> >>>
> >>> This bit makes me _nervous_.  Unfortunately we can only activate one
> >>> pointer grab at a time, but I'd rather do something like this:
> >>>     * populate the window set with the pseudocode described near the top
> >>>       when the touch begins, regardless of the pointer state
> >>>     * generate touch events as normal
> >>>     * if ownership is passed to a pointer grab/selection, skip it if
> >>>       a pointer grab is already active on the delivering device (the MD
> >>>       if the selection was on the MD ID or XIAllMasterDevices, otherwise
> >>>       the SD)
> >>>
> >>> It's unpleasant, but I don't like ending all touch events as soon as we
> >>> start pointer emulation (which will happen a fair bit).  Also: why is
> >>> this different for direct and indirect devices? Doesn't this completely
> >>> kill multi-finger gestures if _any_ client (e.g. the WM) has a pointer
> >>> grab anywhere in the stack?
> >>>
> >>> This bit will definitely require more thought.
> >>
> >> I think you're mixing up a lot of things here :). First, we're only
> >> talking about indirect devices where there's no pointer emulation.
> >> Second, we're only talking about implicit grabs that are activated when
> >> a button is pressed.
> > 
> > this needs to be specified then. AIUI, sending TouchBegin events activates
> > implicit grabs too.
> 
> I didn't really like this idea, so I tried to come up with a better one.
> It's at the end of one of my other emails. Essentially, don't cancel
> touches, just don't send them to clients when the cursor has left the
> touch selecting/grabbing window.
> 
> >> However, this does bring up a good point. What do we do when a touch
> >> begins on an indirect device that is actively grabbed. What do we do
> >> when a grab is activated?
> >>
> >> I feel as though the only sound thing to do for indirect devices is to
> >> cancel all touches when any grab is activated, and to not begin any
> >> touch sequences while any grab is active. This is an extremely heavy
> >> handed solution to the problem, but I can't think of anything better
> >> that wouldn't introduce holes into the protocol. Further, there are
> >> normally two scenarios where grabs are used:
> >>
> >> 1. When a button is pressed. For all multitouch gesture work I've seen
> >> (and I'm unaware of any other usage of multitouch for indirect devices),
> >> no buttons are pressed while multitouch events are being handled.
> > 
> > tapping and scrolling both send button events that will likely be grabbed,
> > even if temporarily only. that's usually on the MD though.
> 
> Tapping is just like any other button click, but I hadn't thought of
> scroll events. This is just another reason to hate scroll as button
> events :).

they won't go away anytime soon though, so we need to cater for them.

> >> 2. When doing funky things like confine-to. Hopefully pointer barriers
> >> are a better solution for this, so we can just say we don't support MT +
> >> pointer grabs.
> > 
> > hoping that confine_to just disappears is not a good plan of action,
> > regardless of pointer barriers.
> 
> I hope this is better resolved with the new proposal. It wouldn't
> require any extra handling in the event of a confine_to grab.

I'm getting confused by protocol specs stacked on top of each other and then
picked to bits. What is your current upstream for inputproto? I need to read
the document in full again.
 
> > this pretty much comes down to two things:
> > - we should specify that only one client may select for touch events on a
> >   given window, just like for button press (I _think_ we may have this in
> >   the protocol already)
> 
> That's the intention at least, even if it's not 100% clear yet :).
> 
> (To be precise, one client may select for touch events per physical
> touch device per window.)
> 
> > - we need to decide if pointer emulation happens if the client selects for
> >   pointer + touch events or if we trust the client to handle this situation
> >
> >> There's nothing that prevents one client from selecting for touches
> >> while another client selects for pointer events on the same window.
> >> However, there is a clear distinction: the pointer selecting client
> >> knows that it may not be the only receiver of events, while the touch
> >> selecting client knows it has exclusive right to the touch events.
> >>
> >> Also, delivering an emulated pointer and its associated touch event
> >> isn't pointless. It's how Windows handles things today, so toolkits like
> >> Qt are set up to deal with this situation. One could argue that Qt
> >> could/should be handling things differently for XI 2.1, but I don't have
> >> a good argument why we should force them to.
> > 
> > what do they do with the emulated pointer event? do they process it or
> > discard it anyway?
> 
> It all depends on the widget that events propagate to. My understanding
> is that widgets in Qt select for touch and pointer events independently,
> just as in X. The widget will receive both types of events if it
> subscribes to both. If a widget and its parents don't handle an event,
> the event is discarded.
> 
> I'm hoping Denis will correct me if I'm mistaken :).
> 
> >>>> @@ -866,6 +949,9 @@ are required to be 0.
> >>>>      master
> >>>>          The new master device to attach this slave device to.
> >>>>  
> >>>> +    If any clients are selecting for touch events from the slave device, their
> >>>> +    selection will be canceled.
> >>>
> >>> Does that mean the selection will be removed completely, and the
> >>> selection will no longer be present if the SD is removed, and all
> >>> clients are required to re-select every time the hierachy changes, or?
> >>
> >> If the SD is removed, then all event selections are already canceled
> >> aren't they? If not, that seems like a broken protocol. Device IDs are
> >> reused, so you might end up selecting for events from a different device
> >> than you meant to.
> >>
> >> Clients only are required to re-select when the specific slave device
> >> they care about is attached, not on every hierarchy change.
> > 
> > I guess daniel meant s/removed/reattached/, not as in "unplugged". But you
> > answered the question, a client registering for touch events must re-select
> > for touch events on every hierarchy change that affects the SD (including
> > the race conditions this implies).
> > 
> > What is the reason for this again? If we already require clients to track
> > the SDs, can we assume that they want the events from the device as
> > selected, even if reattached?
> 
> We enforce one touch client selection per physical device per window at
> selection request time. Let's say on the same window you have client A
> selecting on detached slave device S, and client B selecting on
> XIAllMasterDevices. When you attach device S to a master device, you now
> have two competing selections. Do you send touch events to client A or
> client B? I feel that client B has priority and client A's selection
> should be cancelled. If you inverted the priority, you would break X
> core and XI 1.x clients by removing their selections without them knowing.

can you even select for XIAllMasterDevices for touch events? master devices
don't send touch events so you can't really select for them. Not sure how
that situation would then happen.

if you can, I need an extra blurb describing the semantics of
XIAllMasterDevices on XISelectEvents.

Cheers,
  Peter

