input thread [was Re: [PATCH v2] input: constify valuators passed in by input drivers.]

Thu Aug 12 11:29:35 PDT 2010

On Thu, 2010-08-12 at 12:48 -0300, Fernando Carrijo wrote:
> Adam Jackson <ajax at nwnk.net> wrote:
> 
> > I do eventually want to see each ScreenRec factored out to its own
> > thread.  Right now in Xinerama rendering is serialized across all
> > screens, which is terrible.
> > 
> > If you did that, you'd have one thread for dispatch and core object
> > model, one for input, and one per GPU.  I don't think there's much value
> > in going beyond that for machines with reasonable GPUs.  In particular I
> > recommend avoiding the mistake MTX made of doing one thread per client;
> > once you do that you start needing a lock around every protocol-visible
> > object and the complexity doesn't actually win you any performance.
> 
> These are exactly the kinds of insights that exhilarate me: although there
> is plenty of documentation out there about parallel programming, we hardly
> ever find anything on the subject applied to windowing systems, let alone
> to the X server itself. The article that describes MTX, for instance, is
> one I have never been able to find on the web. If you happen to have it
> archived somewhere and don't mind sharing, that would make my day.

MTX was an experiment in X11R6 to build a multithreaded X server.  It
sure does make software rendering faster, but given the modern
architecture, where the real work is keeping the GPU busy, that's a bit
like putting twin turbos in your Honda Civic.  You get a fast Civic, but
what you were really hoping for was more like an Audi R8.

The design docs were shipped in the R6 source but fell away once it
became clear that MTX was a dead end.  I've got PDF versions of them up
at:

http://people.freedesktop.org/~ajax/mtx/

The documentation itself is remarkably good.  The design it documents...
well.

---

When thinking about threading a display server, the first thing to do is
figure out what problem you're hoping to solve by doing so.  All the
evidence indicates that humans are really bad at thinking about
concurrency models; if you're introducing that kind of complexity it had
better be worth it.

Replacing the SIGIO pseudothread is probably worth it, because it can
solve real latency issues, and because working with the malloc hand tied
behind your back (a signal handler can't safely allocate memory) means
you have to do some pretty unnatural contortions sometimes.  In that
sense the result may even be easier to understand.
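
To make the SIGIO point concrete, here is a minimal sketch of the shape
an input thread could take.  Everything in it is an assumption for
illustration: struct event, struct event_queue, input_thread(), drain()
and run_input_demo() are hypothetical names, not the server's real input
code, and a pipe stands in for a device fd.  The thread blocks in read()
and is free to call malloc(), which a signal handler could not do
safely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical event record; not the X server's real event type. */
struct event {
    struct event *next;
    int value;
};

/* Hypothetical queue shared between the input thread and dispatch. */
struct event_queue {
    pthread_mutex_t lock;
    struct event *head, *tail;
    int fd;                     /* "device" fd the input thread reads */
};

/* Input thread: blocks in read() instead of being interrupted by SIGIO. */
static void *input_thread(void *arg)
{
    struct event_queue *q = arg;
    int value;

    while (read(q->fd, &value, sizeof value) == (ssize_t)sizeof value) {
        /* Allocation is fine in a thread; malloc() is not
         * async-signal-safe, so a SIGIO handler could not do this. */
        struct event *ev = malloc(sizeof *ev);
        if (!ev)
            break;
        ev->next = NULL;
        ev->value = value;

        pthread_mutex_lock(&q->lock);
        if (q->tail)
            q->tail->next = ev;
        else
            q->head = ev;
        q->tail = ev;
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

/* Dispatch side: detach the whole list under the lock, then process
 * it unlocked.  Returns the sum of the event values. */
static int drain(struct event_queue *q)
{
    int sum = 0;

    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    struct event *ev = q->head;
    q->head = q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    while (ev) {
        struct event *next = ev->next;
        sum += ev->value;
        free(ev);
        ev = next;
    }
    return sum;
}

/* Demo: push three fake events (1, 2, 3) through a pipe. */
int run_input_demo(void)
{
    int fds[2];
    struct event_queue q = { .head = NULL, .tail = NULL };
    pthread_t tid;

    if (pipe(fds) != 0)
        return -1;
    pthread_mutex_init(&q.lock, NULL);
    q.fd = fds[0];
    if (pthread_create(&tid, NULL, input_thread, &q) != 0)
        return -1;

    for (int v = 1; v <= 3; v++)
        if (write(fds[1], &v, sizeof v) != (ssize_t)sizeof v)
            return -1;
    close(fds[1]);              /* EOF ends the input thread's loop */
    pthread_join(tid, NULL);

    int sum = drain(&q);
    close(fds[0]);
    pthread_mutex_destroy(&q.lock);
    return sum;
}
```

The only shared state is the queue, so one lock covers it.  A real
server would also need to wake the dispatch loop when events arrive,
which this sketch leaves out.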

Parallelising at the per-GPU level with threads probably makes sense,
because the primary performance problem is keeping the GPU busy.  In
high-GPU-count applications, the amount of time you spend rotating among
GPUs with a single thread means you lose real performance because you
can't keep the pipes busy.  And, due to the way Xinerama is implemented,
each ScreenRec pretty much has its own copy of the complete state of
every protocol object, so you can just barge ahead on each pipe in
parallel and expect it to work without any real interdependencies.
(Less true for input objects, but let's handwave that away for a
moment.)
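
As a rough illustration of why the per-screen case is the easy one,
here is a sketch with one thread per screen.  The names (struct screen,
screen_thread(), run_screens_demo(), NUM_SCREENS) are hypothetical, and
the "rendering" is just adding up numbers.  The point is only that when
each thread owns its own copy of the state and the request list is
read-only, no locking is needed.

```c
#include <pthread.h>

#define NUM_SCREENS 4

/* Hypothetical per-screen state, standing in for a ScreenRec and its
 * private copy of the protocol objects. */
struct screen {
    int id;
    long pixels_drawn;          /* written only by this screen's thread */
    int nrequests;
    const int *requests;        /* shared, read-only request list */
};

/* One thread per screen: replay every request against private state. */
static void *screen_thread(void *arg)
{
    struct screen *s = arg;

    for (int i = 0; i < s->nrequests; i++)
        s->pixels_drawn += s->requests[i];  /* pretend to render */
    return NULL;
}

/* Demo: replay the same three requests on every screen in parallel
 * and return the total "pixels" drawn, or -1 on thread failure. */
long run_screens_demo(void)
{
    static const int requests[] = { 10, 20, 30 };
    struct screen screens[NUM_SCREENS];
    pthread_t tids[NUM_SCREENS];
    int started = 0;
    long total = 0;

    for (int i = 0; i < NUM_SCREENS; i++) {
        screens[i].id = i;
        screens[i].pixels_drawn = 0;
        screens[i].nrequests = 3;
        screens[i].requests = requests;
        if (pthread_create(&tids[i], NULL, screen_thread, &screens[i]) != 0)
            break;
        started++;
    }

    /* Joining is the only synchronisation point in the whole demo. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += screens[i].pixels_drawn;
    }
    return started == NUM_SCREENS ? total : -1;
}
```

Input objects are exactly where this picture breaks down, since they
are not duplicated per screen in the same way; the sketch handwaves
that too.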

Parallelising among _clients_ is not likely to be a win, because there's
no evidence that our performance problem is in marshaling the protocol
around.

- ajax