[PATCH libX11 0/2] Xlib 32-bit request sequence wrap bug(fix)

Mon Oct 28 23:36:29 CET 2013

Hi,

Here's two patches. The first one fixes a 32-bit sequence wrap bug. The 
second patch only adds a comment to another relevant statement.

The patches contain some details. Here is the whole story for who might be 
interested:

Xlib (libx11) will crash an application with a "Fatal IO error 11 (Resource 
temporarily unavailable)" after 4 294 967 296 requests to the server. That is 
when the Xlib internal 32-bit sequence wraps.

Most applications probably will hardly reach this number, but if they do, 
they have a chance to die a mysterious death. For example the application I'm 
working on did always crash after about 20 hours when I started to do some 
stress testing. It does some intensive drawing through Xlib using gktmm2, 
pixmaps and gc drawing at 40 frames per second in full hd resolution (on 
Ubuntu). Some optimizations did extend the grace to about 35 hours but it 
would still crash.

What then followed was some frustrating weeks of digging and debugging to 
realize that it's not in my application, nor in gtkmm, gtk or glib but that 
it's this little bug in Xlib which exists since 2006-10-06 apparently.

It took a while to turn out that the number 0x100000000 (2^32) has some 
relevance. (Much) later it turned out it can be reproduced with Xlib only, 
using this code for example:

  while(1) {
      XDrawPoint(display, drawable, gc, x, y);
      XFlush(display);
  }

It might take one or two hours, but when it reaches the 4294 million it will 
explode into a "Fatal IO error 11".

What I then learned is that even though Xlib uses internal 32bit sequence 
numbers they get (smartly) widened to 64bit in the process so that the 32bit 
sequence may wrap without any disruption in the widened 64bit sequence. 
Obviously there must be something wrong with that.

The Fatal IO error is issued in _XReply() when it's not getting a reply where 
there should be one, but the cause is earlier in _XSend() in the moment when 
the Xlib 32-bit sequence number wraps.

The problem is that when it wraps to 0, the value of 'last_flushed' will 
still be at the upper boundary (e.g. 0xffffffff). There is two locations in 
_XSend() (xcb_io.c) that fail in this state because they rely on those values 
being sequential all the time, the first location is:

  requests = dpy->request - dpy->xcb->last_flushed;

I case of request = 0x0 and last_flushed = 0xffffffff it will assign 
0xffffffff00000001 to 'requests' and then to XCB as a number (amount) of 
requests. This is the main killer.

The second location is this:

  for(sequence = dpy->xcb->last_flushed + 1; sequence <= dpy->request; \
    ++sequence)

I case of request = 0x0 (less than last_flushed) there is no chance to enter 
the loop ever and as a result some requests are ignored.

The solution is to "unwrap" dpy->request at these two locations and thus 
retain the sequence related to last_flushed.

  uint64_t unwrapped_request = ((uint64_t)(dpy->request < \
    dpy->xcb->last_flushed) << 32) + dpy->request;

It creates a temporary 64-bit request number which has bit 8 set if 'request' 
is less than 'last_flushed'. It is then used in the two locations instead of 
dpy->request.

I'm not sure if it might be more efficient to use that statement inplace, 
instead of using a variable.

There is another line in require_socket() that worried me at first:

  dpy->xcb->last_flushed = dpy->request = sent;

That's a 64-bit, 32-bit, 64-bit assignment. It will truncate 'sent' to 32-bit 
when assinging it to 'request' and then also assign the truncated value to 
the (64-bit) 'last_flushed'.
But it seems inteded. I have added a note explaining that for the next poor 
soul debugging sequence issues... :-)

- Jonas

Jonas Petersen (2):
  xcb_io: Fix Xlib 32-bit request number wrapping
  xcb_io: Add comment explaining a mixed type double assignment

 src/xcb_io.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

-- 
1.7.10.4