[PATCH libX11 0/2] Xlib 32-bit request sequence wrap bug(fix)
Jonas Petersen
jnsptrsn1 at gmail.com
Mon Oct 28 23:36:29 CET 2013
Hi,
Here's two patches. The first one fixes a 32-bit sequence wrap bug. The
second patch only adds a comment to another relevant statement.
The patches contain some details. Here is the whole story for who might be
interested:
Xlib (libx11) will crash an application with a "Fatal IO error 11 (Resource
temporarily unavailable)" after 4 294 967 296 requests to the server. That is
when the Xlib internal 32-bit sequence wraps.
Most applications probably will hardly reach this number, but if they do,
they have a chance to die a mysterious death. For example the application I'm
working on did always crash after about 20 hours when I started to do some
stress testing. It does some intensive drawing through Xlib using gktmm2,
pixmaps and gc drawing at 40 frames per second in full hd resolution (on
Ubuntu). Some optimizations did extend the grace to about 35 hours but it
would still crash.
What then followed was some frustrating weeks of digging and debugging to
realize that it's not in my application, nor in gtkmm, gtk or glib but that
it's this little bug in Xlib which exists since 2006-10-06 apparently.
It took a while to turn out that the number 0x100000000 (2^32) has some
relevance. (Much) later it turned out it can be reproduced with Xlib only,
using this code for example:
while(1) {
XDrawPoint(display, drawable, gc, x, y);
XFlush(display);
}
It might take one or two hours, but when it reaches the 4294 million it will
explode into a "Fatal IO error 11".
What I then learned is that even though Xlib uses internal 32bit sequence
numbers they get (smartly) widened to 64bit in the process so that the 32bit
sequence may wrap without any disruption in the widened 64bit sequence.
Obviously there must be something wrong with that.
The Fatal IO error is issued in _XReply() when it's not getting a reply where
there should be one, but the cause is earlier in _XSend() in the moment when
the Xlib 32-bit sequence number wraps.
The problem is that when it wraps to 0, the value of 'last_flushed' will
still be at the upper boundary (e.g. 0xffffffff). There is two locations in
_XSend() (xcb_io.c) that fail in this state because they rely on those values
being sequential all the time, the first location is:
requests = dpy->request - dpy->xcb->last_flushed;
I case of request = 0x0 and last_flushed = 0xffffffff it will assign
0xffffffff00000001 to 'requests' and then to XCB as a number (amount) of
requests. This is the main killer.
The second location is this:
for(sequence = dpy->xcb->last_flushed + 1; sequence <= dpy->request; \
++sequence)
I case of request = 0x0 (less than last_flushed) there is no chance to enter
the loop ever and as a result some requests are ignored.
The solution is to "unwrap" dpy->request at these two locations and thus
retain the sequence related to last_flushed.
uint64_t unwrapped_request = ((uint64_t)(dpy->request < \
dpy->xcb->last_flushed) << 32) + dpy->request;
It creates a temporary 64-bit request number which has bit 8 set if 'request'
is less than 'last_flushed'. It is then used in the two locations instead of
dpy->request.
I'm not sure if it might be more efficient to use that statement inplace,
instead of using a variable.
There is another line in require_socket() that worried me at first:
dpy->xcb->last_flushed = dpy->request = sent;
That's a 64-bit, 32-bit, 64-bit assignment. It will truncate 'sent' to 32-bit
when assinging it to 'request' and then also assign the truncated value to
the (64-bit) 'last_flushed'.
But it seems inteded. I have added a note explaining that for the next poor
soul debugging sequence issues... :-)
- Jonas
Jonas Petersen (2):
xcb_io: Fix Xlib 32-bit request number wrapping
xcb_io: Add comment explaining a mixed type double assignment
src/xcb_io.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
--
1.7.10.4
More information about the xorg-devel
mailing list