input thread [was Re: [PATCH v2] input: constify valuators passed in by input drivers.]

Aaron Plattner aplattner at nvidia.com
Fri Aug 13 09:14:35 PDT 2010


On Fri, Aug 13, 2010 at 09:02:34AM -0700, Adam Jackson wrote:
> On Fri, 2010-08-13 at 08:53 -0700, Aaron Plattner wrote:
> > On Fri, Aug 13, 2010 at 08:09:24AM -0700, Adam Jackson wrote:
> > > ShmPutImage is a bit better in that it elides the socket copies, but
> > > that last memcpy or DMA still has to fire, and it still completes
> > > synchronously; the server won't advance to the next request until it's
> > > done.
> > 
> > I know it's nitpicking, but this is false at least for our driver.
> > {Shm,}PutImage is pipelined and you can have a potentially large number of
> > them in flight at a time.
> 
> I assume you do this by hooking out the dispatch for ShmPutImage, since
> otherwise you're racing with sending the completion event.  (Or by
> memcpying the shm segment aside I guess, but that seems like losing.)

No, we memcpy the shm segment aside, and that isn't losing, precisely
because it avoids a CPU/GPU sync.  Also, configuring the DMA engine to
transfer from arbitrary client memory is often slower than just memcpying
the data somewhere else.
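
To sketch the copy-aside idea (hypothetical structures, not our actual
driver code): the request's shm pixels are copied into a driver-owned
staging slot so the request can complete immediately, and the GPU reads
the copy later behind a fence, so the CPU never waits on it:

    /* A minimal sketch of the copy-aside pattern; hypothetical
     * structures, not actual driver code. */
    #include <stdlib.h>
    #include <string.h>

    struct staging_slot {
        void  *data;   /* driver-owned copy of the client's pixels */
        size_t size;   /* current allocation size */
        int    busy;   /* still referenced by an in-flight GPU blit */
    };

    /* Copy the shm data aside and queue it for the GPU.  Returns the
     * slot to blit from, or NULL if every slot is still in flight. */
    static struct staging_slot *
    stage_put_image(struct staging_slot *ring, int nslots,
                    const void *shm_pixels, size_t size)
    {
        for (int i = 0; i < nslots; i++) {
            if (ring[i].busy)
                continue;
            if (ring[i].size < size) {
                free(ring[i].data);
                ring[i].data = malloc(size);
                ring[i].size = ring[i].data ? size : 0;
            }
            if (!ring[i].data)
                return NULL;
            memcpy(ring[i].data, shm_pixels, size);
            ring[i].busy = 1;  /* cleared by a GPU fence on completion */
            return &ring[i];
        }
        return NULL;  /* all slots in flight: sync or take a blocking path */
    }

The point is that the copy is cheap and unconditional, whereas a DMA
from arbitrary user pages means pinning/mapping work and possibly a
stall.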

> > > And of course {Shm,}GetImage have all the same problems.
> > 
> > GetImage obviously does have to wait until the DMA is complete, so it's
> > worse than PutImage.
> 
> Enh.  We have a protocol buffer allocated for the DMA to get into; we
> could start the DMA, sleep the client, and then finish the protocol
> reply when it completes.  Which isn't really different from the
> PutImage case.

See above comment about reconfiguring the DMA engine being slow.
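
For concreteness, the flow Adam describes would look roughly like this
(a minimal sketch with invented helper names, nothing like the real dix
or driver API; the stand-in "DMA" just fills the buffer and fires the
callback inline, where a real engine would complete asynchronously):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef void (*dma_done_fn)(void *closure);

    struct pending_get_image {
        int    client_id;     /* the client parked while the DMA runs */
        char  *reply_buf;     /* protocol buffer the engine writes into */
        size_t reply_len;
    };

    /* Stand-in for programming a real DMA engine: fill the buffer and
     * fire the completion callback. */
    static void
    start_dma(char *dst, size_t len, dma_done_fn done, void *closure)
    {
        memset(dst, 0xab, len);   /* pretend the GPU wrote pixels here */
        done(closure);
    }

    /* Completion: finish the GetImage reply and wake the client. */
    static void get_image_done(void *closure)
    {
        struct pending_get_image *p = closure;
        printf("client %d: %zu-byte reply ready, waking client\n",
               p->client_id, p->reply_len);
        free(p->reply_buf);
        free(p);
    }

    int main(void)
    {
        struct pending_get_image *p = malloc(sizeof(*p));
        p->client_id = 42;
        p->reply_len = 64 * 1024;              /* one protocol buffer */
        p->reply_buf = malloc(p->reply_len);

        /* "Sleep" the client (stop dispatching its requests), start
         * the DMA, and let the callback complete the request later. */
        start_dma(p->reply_buf, p->reply_len, get_image_done, p);
        return 0;
    }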

> GetImage performance is a lost cause though: the protocol buffer is
> only about 64k, so anyone seriously doing screen scrapes for
> performance should be using ShmGetImage.

Agreed, though any GetImage is losing.
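
For anyone following along, the ShmGetImage path Adam means looks
roughly like this on the client side; a minimal sketch using the
MIT-SHM extension (error handling elided, the 640x480 geometry is made
up, link with -lX11 -lXext):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/XShm.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (!dpy)
            return 1;

        int scr = DefaultScreen(dpy);
        XShmSegmentInfo shminfo;

        /* Create an XImage backed by a SysV shared memory segment. */
        XImage *img = XShmCreateImage(dpy, DefaultVisual(dpy, scr),
                                      DefaultDepth(dpy, scr), ZPixmap,
                                      NULL, &shminfo, 640, 480);
        shminfo.shmid = shmget(IPC_PRIVATE,
                               img->bytes_per_line * img->height,
                               IPC_CREAT | 0600);
        shminfo.shmaddr = img->data = shmat(shminfo.shmid, NULL, 0);
        shminfo.readOnly = False;
        XShmAttach(dpy, &shminfo);  /* let the server map the segment */

        /* The server writes pixels straight into the shared segment:
         * no 64k protocol-buffer chunking, no extra socket copies.
         * XShmGetImage waits for the reply, so the data is valid on
         * return. */
        XShmGetImage(dpy, RootWindow(dpy, scr), img, 0, 0, AllPlanes);

        printf("pixel at (0,0): 0x%08lx\n", XGetPixel(img, 0, 0));

        XShmDetach(dpy, &shminfo);
        XDestroyImage(img);
        shmdt(shminfo.shmaddr);
        shmctl(shminfo.shmid, IPC_RMID, NULL);
        XCloseDisplay(dpy);
        return 0;
    }

Note it still round-trips per grab (and the server still has to sync
with the GPU before it can read the pixels back), which is why even the
shm path can't make GetImage fast.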

