Dispatching and scheduling--basic questions

Daniel Stone daniel at fooishbar.org
Tue Sep 16 12:56:05 PDT 2008

On Tue, Sep 16, 2008 at 03:01:55PM -0400, Adam Jackson wrote:
> On Tue, 2008-09-16 at 20:03 +0300, Daniel Stone wrote:
> > Hmm.  Even enforcing fairness between clients? If you have a hostile
> > client, you've already lost, but we have a lot of crap clients already
> > (hello Gecko), so.  It would also presumably drop the mean/mode
> > latencies while having pretty much no impact on the others: if you have
> > one thread waiting on a GetImage and thus migration back to system
> > memory, your other clients can still push their trivial rendering to the
> > GPU and go back to sleeping.
> > 
> > I will admit that this paragraph has had no prior thought, and could
> > probably be swiftly proven wrong.  YMMV.
> I could believe a fairness argument here, but I'd like to see better
> numbers first on how often clients block on the server, and what they're
> waiting for when they do.
> Project for anyone reading this thread: instrument the scheduler such
> that when it punishes a client, it records both the last thing that
> client was doing, and the number of clients now in the wait queue.  Dump
> to log, run a desktop for a few days, then go do statistics.

Yeah, it's pretty pointless debating this part further with zero data,
especially as the scheduler isn't something I've yet looked at.

> > Not really.  We're getting to the point of seeing multicore in consumer
> > products, but the GPUs there are still too power-hungry to want to base
> > a Render implementation on.  Of course, we're still pretty much in the
> > first iteration of the current generation of those GPUs, so hopefully
> > they can push the power envelope quite aggressively lower, but for a
> > couple of years at least, we'll have multicore + effectively GPU-less,
> > in platforms where latency is absolutely unacceptable.
> ARM, you're so weird.

Turns out people buying tablets/phones want more than 2 hours use time
and a few days' standby time.  Go figure.

> Well, okay, there's at least two tactics you could use here.  We could
> either go to aggressive threading like in MTX, but that's not a small
> project and I think the ping-pong latency from bouncing the locks around
> will offset any speed win from parallelising rendering.  You can
> mitigate some of that by trying to keep clients pinned to threads and
> hope the kernel pins threads to cores, but atoms and root window
> properties and cliplist manipulation will still knock all your locks
> around... so you might improve fairness, but at the cost of best-case
> latency.

I suspect that on current Intel/AMD hardware, the lock cost is virtually
zero unless you have two threads noisily contending; even then, the
worst case is that individual requests take twice as long to return as
before while the overall runtime stays mostly unchanged, and you can
mitigate that by being smart about your threading.  On ARM-type
hardware, with our puny caches, you'd have to be smart about keeping
your locks on the same cacheline or two, since dcache misses come often
and hurt a lot.

> Or, we keep some long-lived rendering threads in pixman, and chunk
> rendering up at the last instant.  I still contend that software
> rendering is the only part of the server's life that should legitimately
> take significant time.  If we're going to thread to solve that problem,
> then keep the complexity there, not up in dispatch.

Mm, then you have a server with a very good overall best case, but still
a pretty terrible overall worst case.  What happens when an XGetImage
requires a complete GPU sync (forget software rendering for a moment),
which takes a while, then a copy? Bonus points if you have to stall to
clean its MMU, too.  Then you memcpy it into SHM and get that out to
the client, but in the meantime, all your other clients waiting for
trivial requests are doing just that: waiting.

This is a bit pathological, I know, but I suspect we'll actually start
really noticing this when we have threaded input event delivery, as
well as generation, and we're all running competent EXA + DRI2 + GEM/TTM
setups.  At that point[0], the server blocking for 100ms will not be
cool, even if your cursor does still move.

> Still, I'm kind of dismayed the GPU needs that much power.  All we need
> is one texture unit.  I have to imagine the penalty for doing it in
> software outweighs the additional idle current from a braindead alpha
> blender...

Indeed, you want to be smart about what you offload to the GPU.  Luckily
they're[1] pretty quick to come out of power-off (suspend/resume is
something we can do virtually instantaneously, and at least hundreds of
times per second), so there's bugger-all penalty for being pessimistic
about powering it down, but you don't always win at performance.


[0]: Assuming apps and toolkits aren't just crap.
[1]: Some of them.