input thread [was Re: [PATCH v2] input: constify valuators passed in by input drivers.]

Fernando Carrijo fcarrijo.lists at gmail.com
Fri Aug 13 04:51:42 PDT 2010


Adam Jackson <ajax at nwnk.net> wrote:

> MTX was an experiment in X11R6 to do a multithreaded X server.  It sure
> does make software rendering faster if you do that, but given the modern
> architecture that's a bit like putting twin turbos in your Honda Civic.
> You get a fast Civic, but what you were really hoping for was more like
> an Audi R8.
> 
> The design docs were shipped in the R6 source but fell away once it
> became clear that MTX was a dead end.  I've got PDF versions of them up
> at:
> 
> http://people.freedesktop.org/~ajax/mtx/

Thanks a lot, Ajax. What an invaluable favor!

> The documentation itself is remarkably good.  The design it documents...
> well.

I'll keep your advice in mind while I read the specifications.

> When thinking about threading a display server the first thing is to
> figure out what problem you're hoping to solve by doing so.  All the
> evidence indicates that humans are really bad at thinking about
> concurrency models; if you're introducing that kind of complexity it had
> better be worth it.
> 
> Replacing the SIGIO pseudothread is probably worth it, because it can
> solve real latency issues, and because having the malloc hand tied
> behind your back means you have to do some pretty unnatural contortions
> sometimes.  In that sense the result may even be easier to understand.

Sure. We had no problem understanding or coding this requirement.

It's probably a bit soon to talk about it, but besides the pipe-based
answer to the problem of reducing input latency, we coded yet another
solution, this time anchored in the fact that WaitForSomething has
too many responsibilities.

So we devised an input thread entirely responsible for monitoring and
reading the input devices, and as a result obtained an implementation
of WaitForSomething which is agnostic about the existence of input
devices, blocking only on the fd_set of clients.
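
Roughly, the loop in that thread looks like the sketch below; the
identifiers are placeholders for illustration (they are not the names
used in the patches), and the real code handles device hotplug and
locking in more detail:

  #include <stddef.h>
  #include <sys/select.h>

  /* Placeholder declarations for the sketch only. */
  extern fd_set input_fds;               /* fds of the open input devices */
  extern int max_input_fd;               /* highest fd in input_fds */
  extern void ReadDeviceEvents(int fd);  /* read + enqueue via mieqEnqueue */
  extern void WakeupMainThread(void);    /* e.g. write to a wakeup pipe */

  static void *
  InputThreadLoop(void *arg)
  {
      (void) arg;

      for (;;) {
          /* Block on the input device fds only; the client fds stay
           * with WaitForSomething. */
          fd_set readable = input_fds;

          if (select(max_input_fd + 1, &readable, NULL, NULL, NULL) <= 0)
              continue;

          for (int fd = 0; fd <= max_input_fd; fd++)
              if (FD_ISSET(fd, &readable))
                  ReadDeviceEvents(fd);

          WakeupMainThread();
      }
      return NULL;
  }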

And since both threads could possibly process events concurrently, we
had to serialize access to ProcessInputEvents in order to maintain
the invariant which says that, for lack of a less math-inclined
expression, the event queue in X constitutes a totally ordered set.
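
In code, that serialization amounts to something like the following
(pie_mutex and the wrapper name are invented for this sketch; the
patches may structure it differently):

  #include <pthread.h>

  extern void ProcessInputEvents(void);  /* existing DIX entry point */

  /* A single mutex keeps event processing ordered no matter which
   * thread drains the queue. */
  static pthread_mutex_t pie_mutex = PTHREAD_MUTEX_INITIALIZER;

  void
  SerializedProcessInputEvents(void)
  {
      pthread_mutex_lock(&pie_mutex);
      ProcessInputEvents();
      pthread_mutex_unlock(&pie_mutex);
  }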

This works - to my surprise, though maybe not to others' - but, for
reasons beyond my comprehension, the cursor damage code shows some
artifacts.

The patches, based on Tiago's input thread tree, can be found here:

  http://people.freedesktop.org/~fcarrijo/patches/

I decided to mention this tentative implementation mostly because it
might contain serious fallacies which are obvious to other people's
eyes. And if it does, I can try to fix them, or forget about it once
and for all.

> Parallelising at the per-GPU level with threads probably makes sense,
> because the primary performance problem is keeping the GPU busy.  In
> high-GPU-count applications, the amount of time you spend rotating among
> GPUs with a single thread means you lose real performance because you
> can't keep the pipes busy.  And, due to the way Xinerama is implemented,
> each ScreenRec pretty much has its own copy of the complete state of
> every protocol object, so you can just barge ahead on each pipe in
> parallel and expect it to work without any real interdependencies.
> (Less true for input objects, but let's handwave that away for a
> moment.)

I understand the case for GPU-level parallelism, but I know almost
nothing about how best to approach the problem from this perspective.
I really would like to have deeper knowledge about it, though. Maybe
I should start getting acquainted with GPU specs?

> Parallelising among _clients_ is not likely to be a win, because there's
> no evidence that our performance problem is in marshaling the protocol
> around.

So may I suppose that doing this for the sole purpose of guaranteeing
fairness among clients isn't worth it either? I ask because we all
hear people complaining about selfish clients eating their machine
cycles, but a reasonable solution seldom pops up. At least not one
that I know of.

And to be honest, it seems to me that the precautions taken by the
smart scheduler to penalize hungry clients aren't enough in certain
circumstances. Firefox's abusive usage of XPutImage comes to my mind,
even though I'm completely unaware of the reasons that make this
operation so expensive.

