weird Xwayland and compositor deadlock issue [WAS: [PATCH xserver v2] xwayland: handle EAGAIN and EINTR gracefully]

Wed Sep 14 07:56:15 UTC 2016

On Tue, 13 Sep 2016 12:04:14 -0400 (EDT)
Olivier Fourdan <ofourdan at redhat.com> wrote:

> Hi Pekka,
> 
> ----- Original Message -----
> > Hi Olivier,
> > 
> > I don't have any solution for you. The interactions between the Wayland
> > compositor and Xwayland are known to be very easily deadlockable IIRC. I
> > believe the only thing you can do is ensure no such case can ever
> > occur, which is very painful. That is, never do a blocking roundtrip at
> > least from one side.
> > 
> > Have the recent modifications caused a significant increase of Wayland
> > requests from Xwayland? If Xwayland needs to send an amount of data
> > bigger than bufferable, *any* blocking roundtrip via X11 from the
> > Wayland compositor is prone to deadlock. It will be waiting for a reply
> > via X11, while Xwayland is blocked on flushing, since the Wayland
> > compositor is not consuming requests.
> > 
> > It can also trivially happen if both sides do a blocking roundtrip at
> > the same time. Or just a wait for an event.
> > 
> > Either server needs to be able to return to its main loop to process the
> > protocol stream it is the server for. Preferably both, I think.  
> 
> Unfortunately, any XSync (like, for example, called in
> gdk_error_trap_pop() in gdk) will issue a blocking roundtrip, and
> window managers tend to do that quite a lot (some more than others)
> so I don't think we can easily change that in window managers to
> avoid blocking roundtrips on the X11 side.
> 
> > You could check how Weston's XWM works. I highly suspect that after
> > Xwayland launch it avoids doing any blocking roundtrips via X11.  
> 
> Yet sometimes some X calls are blocking, e.g. XShapeGetRectangles()
> or even XGetWindowAttributes() which is invoked by mutter each time
> a new window is mapped. mutter still uses Xlib and not xcb.
> 
> > I'd assume Xwayland also tries to avoid blocking on Wayland events,
> > but if nothing else, I believe Mesa via GLAMOR may block on
> > wl_buffer.release events... or maybe not if GLAMOR is smart with its
> > throttling. Anyway, since your flush is hitting EAGAIN, that doesn't
> > seem to be the cause.
> > 
> > I wonder if making wl_display_flush() block immediately like in your
> > patch could be replaced by adding the wl_display fd to the main poll
> > loop, so that it would get flushed ASAP but still service X11
> > requests in the meantime? It does run the risk of overflowing the
> > Wayland send buffer in Xwayland. Any way to prioritize the Wayland
> > compositor's X11 connection in Xwayland?  
> 
> If I don't make EAGAIN a FatalError() and wait for the Wayland
> display file descriptor to become writable again, Xwayland eventually
> dies with another error "(EE) request could not be marshaled: can't
> send file descriptor" from libwayland directly (in
> copy_fds_to_connection()).

Hi,

Summarizing the #wayland IRC discussion between Olivier and Daniel: the
proper solution is indeed to never do blocking X11 roundtrips from the
Wayland compositor, but for practical reasons that might not be possible.
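
To illustrate the non-blocking alternative raised in the quoted text (keep
the outgoing connection in the poll loop instead of blocking on flush), here
is a minimal sketch using plain POSIX sockets. It is not Xwayland or
libwayland code: the helpers set_nonblocking(), try_flush() and
wait_for_work() and the READY_* flags are hypothetical names, out_fd merely
stands in for the Wayland display fd and in_fd for an X11 client connection.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result flags for wait_for_work(). */
enum { READY_WRITE = 1, READY_READ = 2 };

/* Hypothetical helper: switch fd to non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: try to write pending data without ever blocking.
 * Returns the number of bytes written, 0 if the socket buffer is full
 * (EAGAIN), or -1 on any other error. EINTR is simply retried.
 */
ssize_t try_flush(int out_fd, const char *buf, size_t len)
{
    assert(len > 0); /* a return value of 0 must always mean "buffer full" */
    for (;;) {
        ssize_t n = write(out_fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;
        return -1;
    }
}

/*
 * One step of a hypothetical main loop: sleep until out_fd can take more
 * data (only if want_write is set) or in_fd has a request to read.
 * Returns a mask of READY_* flags, 0 on timeout, or -1 on error.
 */
int wait_for_work(int out_fd, int want_write, int in_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = out_fd, .events = want_write ? POLLOUT : 0 },
        { .fd = in_fd,  .events = POLLIN },
    };
    int r;

    do {
        r = poll(fds, 2, timeout_ms);
    } while (r < 0 && errno == EINTR);
    if (r < 0)
        return -1;

    int ready = 0;
    if (fds[0].revents & POLLOUT)
        ready |= READY_WRITE;
    if (fds[1].revents & POLLIN)
        ready |= READY_READ;
    return ready;
}
```

When try_flush() returns 0, the caller would keep the unsent data queued and
go back to wait_for_work() with want_write set, so that requests arriving on
in_fd are still serviced while the peer is not reading. As noted in the
quoted text, the queued data can still grow without bound, so this only
avoids the blocking, not the buffering problem.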

The IRC log starts here:
https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=wayland&highlight_names=&date=2016-09-13#t-1402

Thanks,
pq