[RFC] Multitouch support, step one

Tue Mar 16 22:35:30 PDT 2010

On Tue, Mar 16, 2010 at 02:42:15PM +0100, Henrik Rydberg wrote:
> Peter Hutterer wrote:
> > On Mon, Mar 15, 2010 at 03:41:24PM +0100, Henrik Rydberg wrote:
> >>> Preamble:
> >>> Multi-touch as defined in this proposal is limited to single input-point
> >>> multi-touch. This is suitable for indirect touch devices (e.g. touchpads)
> >>> and partially suited for direct touch devices provided a touch is equivalent
> >>> to a single-gesture single-application input.
> >> User-space applications need tools to *use* MT devices, not route raw data from
> >> the devices to the application. The latter is not much more complicated than
> >> opening a file, and everyone can do that already. Thus, unless there is a model
> >> for how MT devices work and interact with other MT devices, I see little point
> >> in having an X protocol at all.
> > 
> > The main reason is that applications, for better or worse, use X as their
> > input source. Our job is to get the data to the right client, without too
> > much processing going on. For clients to go around the server by opening the
> > kernel device files directly will cause issues in the long run, especially
> > when you have multiple applications running.
> 
> Thank you for addressing my concerns. The details you describe below form a
> logical and complete proposal, which is agreeable in its own right.
> 
> However, I must insist on continuing this discussion, because my gut
> tells me we are moving slightly in the wrong direction. Here are the main reasons:
> 
> 1. User space wants details, but also consistent behavior for all devices
> supporting multitouch.
> 
> 2. The kernel interface is bandwidth-consuming by necessity, but there is no
> need for the X protocol to be.
> 
> 3. Support for multitouch in X does exist already, so there is no need to start
> from zero when discussing it (http://bitmath.org/code/multitouch/).
> 
> 4. The hard limit of 256 guarantees something new will have to be done for
> multi-user multitouch, in essence pushing the problem forward.
> 
> 5. Squeezing MT into the valuator concept is generally crippling, since it does
> not map very well to the underlying contact concept.
> 
> What follows is a longer version of these five points, and below that is a
> proposal for how I believe it should be done.
> 
> ---
> 
> 1. Consistent behavior for all devices
> 
> The hardware stack supporting multitouch is diverse, and several different
> mechanisms and abstraction levels exist. The tracking ID is a good example. It
> may or may not be present in the driver output, and it may work poorly even if
> it exists. Thus, in order to support hardware consistently, there must be a
> middle layer outside of the kernel, parsing the driver data and patching it up
> to produce the same level of detail for all devices. This task can be quite
> complicated and uses some cpu, so having it in one place is imperative. Luckily,
> there exists such a solution in the multitouch X driver (see link above). This
> code can either be broken out as a standalone module or be placed in the X core.
> If there is a license issue, it can be resolved for the benefit of the X
> community. In the text below, this piece of code will be referred to as the
> contact driver.

I think this is where we mostly agree, maybe not in wording but in spirit.
I want the data to come out of the protocol in a generic fashion, or at
least generalised up to a point that clients can work with it.
IMO dealing with touchpoint (or "contact") width and height can be left to
the clients. What the driver should do, however, is assign tracking IDs. I
don't think this approach would be useful without being able to track
touchpoints.
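
To illustrate what assigning tracking IDs in the driver could involve when
the hardware does not supply them, here is a minimal sketch in Python. It
uses greedy nearest-neighbour matching between consecutive frames, which is
only one possible technique. The function name, the dict-per-frame
representation and the max_dist threshold are assumptions made up for this
example, not existing evdev or server code.

```python
import itertools
import math


def assign_tracking_ids(previous, points, next_id, max_dist=100.0):
    """Greedy nearest-neighbour matching of new touchpoints to known IDs.

    previous: dict {tracking_id: (x, y)} from the last frame
    points:   list of (x, y) reported in the current frame
    next_id:  iterator that yields fresh tracking IDs
    max_dist: hypothetical threshold; anything farther is a new touch
    Returns a dict {tracking_id: (x, y)} for the current frame.
    """
    # All (distance, old id, new point index) candidates, closest first.
    pairs = sorted(
        (math.dist(pos, pt), tid, i)
        for tid, pos in previous.items()
        for i, pt in enumerate(points)
    )
    current, used_ids, used_points = {}, set(), set()
    for d, tid, i in pairs:
        if d > max_dist:
            break  # remaining candidates are even farther apart
        if tid in used_ids or i in used_points:
            continue
        current[tid] = points[i]
        used_ids.add(tid)
        used_points.add(i)
    # Anything left unmatched is treated as a new touchpoint.
    for i, pt in enumerate(points):
        if i not in used_points:
            current[next(next_id)] = pt
    return current


ids = itertools.count(1)
frame1 = assign_tracking_ids({}, [(10, 10), (500, 20)], ids)
# The device reports the same two fingers in a different order.
frame2 = assign_tracking_ids(frame1, [(505, 60), (12, 55)], ids)
```

Here the two touchpoints keep their IDs across frames even though the
reporting order changed. An old ID with no match in the new frame simply
drops out of the result, which is how a lifted finger would show up.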

> 2. Bandwidth reduction should be made as early as possible
> 
> The MT events from the kernel are non-filtered, bypassing the normal input
> filtering by necessity. Duplicating this behavior further into the food chain
> would be a mistake. After parsing the MT stream in the contact driver, the event
> stream can be filtered substantially, thereby restoring bandwidth usage to
> something more similar to non-mt devices.

I don't understand what you mean by filtering here. Can you detail this?

> 3. The contact driver produces the more digested contact events
> 
> The contact driver takes the flora of driver MT events and produces a consistent
> stream of contact events. The contact event stream is less bandwidth-consuming,
> and follows the init-move-destroy concept we discussed last summer, if you
> recall. We are still talking about a low-level stream, there are no gesture or
> other high-level derivatives. Just a consistent stream of data.

Same as above. You've worked more with the kernel's multitouch interface
than I have. Can you give us an example of what the data from the kernel
would look like and how it would be "digested"?

> 4. ABI, memory and cpu burden for nothing
> 
> Although the current hard limit on valuators most likely can be programmed
> away, it just feels wrong to burden all other applications with the additional
> memory and cpu usage implied by raising a comfortable limit to something much
> higher, only to satisfy the request of a completely different interface, which
> strains the existing concept to the limit of breakage.
> 
> 5. Use appropriate data structures to solve the problem
> 
> By defining a handful of contact api functions, operating on simple structures,
> the whole problem of forward compatibility with multi-user multitouch can be
> solved in one go, without changing a single bit in the existing interfaces. Yes,
> it means a new interface, but the functionality is new, so this is the way it
> should be.
> 
> ---
> 
> X Multitouch Support Proposal
> -----------------------------
> 
> Introduction
> ------------
> 
> Back in summer 2009, when this was discussed informally between some in the
> present party, the general structure that emerged was a split into a low level
> protocol and a gesture library, here citing two of the formulations:
> 
> > Henrik Rydberg:
> > X multipointer via init-move-destroy to X gesture driver
> > X gesture driver via enhanced X events to X application
> >
> > Peter Hutterer:
> > X server -> protocol -> application
> >                           |->  x gesture library
> 
> The change I am proposing today simply means inserting the contact driver
> discussed above before or in conjunction with the X server in Peter's chain:
> 
> kernel -> [contact driver | X server] -> protocol -> application
>                                                       |->  x gesture library
> 
> The details of the protocol in this chain depend on the output of the contact
> driver, which is only slightly higher level than the kernel events.
> 
> The Contact Driver
> ------------------
> 
> The general structure of the MT events is that of contacts appearing, changing
> and disappearing. Because of the diversity of capabilities of the drivers, this
> structure is quite relaxed in the kernel stream, to the point that it requires
> work to fully impose this structure further down the stream. That is the job of
> the Contact Driver. It translates the relaxed kernel MT events into a steady
> stream of contact events, containing the same level of information for all
> drivers. The contact events follow the same logic as the MT events, but because
> all data is present, the init-move-destroy mechanism can be employed fully. Here
> is an example of what a two-finger scroll would look like:
> 
> init id = 588, x = -234, y = 42
> init id = 123, x = 933, y = 3
> sync
> move id = 588, x = -211, y = 529
> move id = 123, x = 863, y = 732
> sync
> destroy id = 588
> destroy id = 123
> sync

This is exactly what I had in mind to handle MT in the evdev driver. The
driver handles this, then forwards it on to the client. The only difference
I see so far is that the init/destroy is implied by the presence of the
valuators. Is this different to what you are proposing? If so, can you
provide more detail?
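
A minimal sketch of how the implicit and explicit forms relate, in Python
for brevity. It assumes each frame is a dict of tracking ID to position,
where an ID being present stands in for "the valuators are present". The
function name and the tuple layout of the events are made up for
illustration and are not part of evdev or any protocol draft.

```python
def contact_events(prev, curr):
    """Derive init/move/destroy events from two consecutive frames.

    prev, curr: dict {tracking_id: (x, y)}
    Returns a list of event tuples, terminated by ("sync",) if non-empty.
    """
    events = []
    for tid, (x, y) in curr.items():
        if tid not in prev:
            events.append(("init", tid, x, y))   # newly present
        elif prev[tid] != (x, y):
            events.append(("move", tid, x, y))   # present and changed
    for tid in prev:
        if tid not in curr:
            events.append(("destroy", tid))      # no longer present
    if events:
        events.append(("sync",))
    return events


# The two-finger scroll from the quoted example, as per-frame snapshots.
frames = [
    {588: (-234, 42), 123: (933, 3)},
    {588: (-211, 529), 123: (863, 732)},
    {},
]
stream = []
prev = {}
for frame in frames:
    stream.extend(contact_events(prev, frame))
    prev = frame
```

Under these assumptions, the snapshots expand to the same
init/move/destroy sequence as in the quoted example, so the explicit
events carry no information that presence alone does not.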

> The X Protocol
> --------------
> 
> The details here are well beyond my expertise, but I am suggesting a contact
> interface implementation based on the contact driver event structure. I cannot
> imagine this is harder than using the XI event structure, but should be a lot
> less of a headache for everyone.

Core protocol compatibility is always the killer. Any addition to the event
stream needs to be handled so that potential core emulation is compatible with
the protocol spec - while not hindering the new event stream. This is
essentially what killed off every attempt at real multitouch so far.

Cheers,
  Peter