Slow XQueryPointer?

Adam Jackson ajax at redhat.com
Thu Jan 26 08:58:24 PST 2012


On Thu, 2012-01-26 at 17:51 +0200, Lauri Kasanen wrote:

> For this particular call, my app called XQueryPointer at
> 1327592221.398620 and reported the call took 12177 usec. The relevant
> portion of the strace is attached.
> 
> I'm not fully confident in my ability to grok strace, but to me it
> looks like the latter case (excess work?). There is a select call soon
> after the call, and there's a write about 12ms later.

Okay, I think I know what this is.

Right after your request there's:

1327592221.398755 select(256, [1 3 6 8 10 11 12 13 14 15 16 17 18 19 20 21 22], NULL, NULL, {7, 519000}) = 2 (in [20 22], left {7, 518990})
1327592221.398868 setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
1327592221.398944 read(20, "5\30\4\0\206_q\1\2\1\0\0\325\1\27\0007\0\4\0\207_q\1\206_q\1\0\0\0\0"..., 4096) = 3452
1327592221.399071 ioctl(8, 0xc020645d, 0x7fff61bdc420) = 0
1327592221.399189 ioctl(8, 0x40086409, 0x7fff61bdc440) = 0
1327592221.399279 ioctl(8, 0xc0086464, 0x7fff61bdc210) = 0
1327592221.400355 ioctl(8, 0xc0206466, 0x1f81a38) = 0
1327592221.401180 ioctl(8, 0xc008646a, 0x7fff61bdc460) = 0
1327592221.401255 ioctl(8, 0xc0086464, 0x7fff61bdc170) = 0
1327592221.401604 read(20, 0x2c06920, 4096) = -1 EAGAIN (Resource temporarily unavailable)
1327592221.401680 ioctl(8, 0xc0206466, 0x1f81a38) = 0
1327592221.401954 ioctl(8, 0xc008646a, 0x7fff61bdc880) = 0
1327592221.402024 writev(20, [{"\16\0\r\304\206_q\1\0\0>\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3808}], 1) = 3808
1327592221.402446 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0

So this is showing a wakeup to handle a request on fd 20, but it
completes in under 4ms (and it isn't your request anyway; "5" is 53
decimal, which is CreatePixmap).  Much, much later we finally get around
to handling your request:

1327592221.410274 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
1327592221.410366 select(256, [1 3 6 8 10 11 12 13 14 15 16 17 18 19 20 21 22], NULL, NULL, {7, 507000}) = 1 (in [22], left {7, 506978})
1327592221.410492 setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
1327592221.410568 read(22, "&\0\2\0\2\1\0\0", 4120) = 8
1327592221.410651 read(22, 0x2c05900, 4120) = -1 EAGAIN (Resource temporarily unavailable)
1327592221.410721 writev(22, [{"\1\1'\1\0\0\0\0\2\1\0\0\211_`\0\330\1\243\0\330\1\243\0\0\0\0\0\0\0\0\0", 32}], 1) = 32
1327592221.410858 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
1327592221.410936 select(256, [1 3 6 8 10 11 12 13 14 15 16 17 18 19 20 21 22], NULL, NULL, {7, 507000}) = 1 (in [22], left {7, 506988})
1327592221.411051 setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0

And this is definitely you: "&" is 38 decimal, which is QueryPointer.  So:

writev(22) returns:     1327592221.410721
Your printed timestamp: 1327592221.398620
----------------------- -----------------
                                 0.012101

That, plus another context switch back to your process, accounts for the
12177us you're seeing.  So it's not that the server has excess work;
it's just ignoring you.
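
In case anyone wants to double-check that kind of decoding themselves:
the first byte of an X request is its major opcode, and strace prints
printable bytes as their ASCII characters, so a throwaway snippet like
this (purely illustrative, it just compares the characters against the
protocol constants) is enough:

/* Print the byte values of the characters seen in the strace output
 * next to the request opcodes defined in Xproto.h. */
#include <stdio.h>
#include <X11/Xproto.h>

int main(void)
{
    printf("'5' = %d, X_CreatePixmap = %d\n", '5', X_CreatePixmap);
    printf("'&' = %d, X_QueryPointer = %d\n", '&', X_QueryPointer);
    return 0;
}

(That should build with just the proto headers installed, no libX11
needed.)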

Now the question, of course, is why we'd do such a thing, and the answer
is to try to guarantee fairness among clients.  The way we do this is by
trying to time how long we're handling requests for a given client
(that'd be the setitimer calls in the trace), and switching to a new
client when the current one exhausts its timeslice.  And now we see the
problem:

#define SMART_SCHEDULE_DEFAULT_INTERVAL 20          /* ms */
#define SMART_SCHEDULE_MAX_SLICE        200         /* ms */
        
Bool SmartScheduleDisable = FALSE; 
long SmartScheduleSlice = SMART_SCHEDULE_DEFAULT_INTERVAL;
long SmartScheduleInterval = SMART_SCHEDULE_DEFAULT_INTERVAL;
long SmartScheduleMaxSlice = SMART_SCHEDULE_MAX_SLICE;

20ms at a time for a single client is clearly insane.
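
To make the mechanism concrete, those setitimer calls boil down to
something like the following toy (just an illustration, not the server's
actual Dispatch() code): arm a timer when we start servicing a client,
and stop dispatching for that client once the SIGALRM handler flips a
flag.

/* Toy illustration of the timeslice mechanism visible in the trace.
 * Not real server code: the "requests" here are just a counter. */
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

static volatile sig_atomic_t slice_expired;

static void slice_alarm(int sig)
{
    (void) sig;
    slice_expired = 1;
}

int main(void)
{
    /* 20ms, i.e. SMART_SCHEDULE_DEFAULT_INTERVAL */
    struct itimerval slice = { { 0, 20000 }, { 0, 20000 } };
    struct itimerval off   = { { 0, 0 }, { 0, 0 } };
    unsigned long fake_requests = 0;

    signal(SIGALRM, slice_alarm);

    /* "start servicing this client": the setitimer({20ms}) in the trace */
    setitimer(ITIMER_REAL, &slice, NULL);

    while (!slice_expired)
        fake_requests++;        /* stand-in for dispatching one request */

    /* "done with this client": the setitimer({0,0}) in the trace */
    setitimer(ITIMER_REAL, &off, NULL);

    printf("one 20ms slice covered %lu fake requests\n", fake_requests);
    return 0;
}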

You can tune this on the command line with the -schedInterval <ms>
argument; it would be interesting to see how things fare with it closer
to (say) 5ms.  I suspect we need to tune that default down to below 16ms
at least; allowing a single client to monopolize the server for more
than a frame is not acceptable.
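
(For experimenting, that just means starting the server with something
along the lines of "X -schedInterval 5", or the equivalent entry in your
display manager's server arguments.)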

I think a more complete solution would involve changing the main loop's
logic slightly: if we get to the end of requests for a single client
within a single timeslice, start pulling requests from any other clients
that have pending requests (in priority order) until the timeslice
expires (at which point we'd loop back around to select and start
again).  This would punish absolute throughput for any one client when
contended, but I'm willing to take that if it means we stop dropping
frames.
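
In toy form, the shape of that would be roughly the following (fake
counter "clients" and made-up helpers, nothing like the real data
structures, just to illustrate the loop):

/* Toy model of the proposed change, not real server code: within one
 * timeslice, once the current client runs dry, keep dispatching for
 * whichever other client is ready (highest priority first) instead of
 * going straight back to select(). */
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

struct fake_client {
    const char *name;
    int priority;               /* higher runs first */
    long pending;               /* requests left in its input buffer */
    long dispatched;
};

static volatile sig_atomic_t slice_expired;

static void slice_alarm(int sig)
{
    (void) sig;
    slice_expired = 1;
}

/* pick the ready client with the highest priority, or NULL if none */
static struct fake_client *next_ready(struct fake_client *c, int n)
{
    struct fake_client *best = NULL;
    for (int i = 0; i < n; i++)
        if (c[i].pending > 0 && (!best || c[i].priority > best->priority))
            best = &c[i];
    return best;
}

int main(void)
{
    struct fake_client clients[] = {
        { "pixmap-happy client", 0, 1000000000L, 0 },
        { "QueryPointer client", 1, 1, 0 },
    };
    struct itimerval slice = { { 0, 5000 }, { 0, 5000 } };  /* 5ms, say */
    struct itimerval off   = { { 0, 0 }, { 0, 0 } };

    signal(SIGALRM, slice_alarm);
    setitimer(ITIMER_REAL, &slice, NULL);

    /* drain whoever is ready, in priority order, until the slice runs out */
    while (!slice_expired) {
        struct fake_client *c = next_ready(clients, 2);
        if (!c)
            break;              /* nothing pending: back to select() */
        c->pending--;           /* stand-in for dispatching one request */
        c->dispatched++;
    }

    setitimer(ITIMER_REAL, &off, NULL);

    for (int i = 0; i < 2; i++)
        printf("%s: dispatched %ld, still pending %ld\n",
               clients[i].name, clients[i].dispatched, clients[i].pending);
    return 0;
}

(In the toy, the one-request higher-priority client gets dispatched
immediately instead of waiting behind the busy one, which is the whole
point.)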

- ajax