weird Xwayland and compositor deadlock issue [WAS: [PATCH xserver v2] xwayland: handle EAGAIN and EINTR gracefully]

Adam Jackson ajax at nwnk.net
Tue Sep 13 21:38:05 UTC 2016


On Tue, 2016-09-13 at 06:13 -0400, Olivier Fourdan wrote:

> I tried to poll() the Wayland fd with a timeout prior to
> wl_display_flush(), so that we only call wl_display_flush() when the
> fd is writable, to see if that would help unblock mutter waiting for
> its PropertyNotify event. That did not work: the Wayland fd still
> remains in EAGAIN forever and gnome-shell/mutter remains stuck
> waiting for the PropertyNotify event...
> 
> I am a bit puzzled: why is gnome-shell/mutter/xcb waiting for the
> PropertyNotify? Where has that event gone?

If I had to guess: it hasn't gone anywhere, because it hasn't been
generated yet. The request that would generate it is still queued in
xserver, which hasn't processed it because it's stuck trying to flush
the wayland socket, and mutter isn't draining that socket because it's
blocked waiting for the event. You see where this is going.
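
To make the stall concrete, here is a minimal sketch (plain POSIX, not
xserver or libwayland code) of why polling before the flush can't help
on its own. The helper names set_nonblocking, fill_until_eagain and
writable_within are made up for illustration, and a socketpair stands
in for the wayland connection. Once the peer stops reading, the fd is
never reported writable, no matter how long you wait; it only becomes
writable again after the peer drains it.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Write filler until the kernel refuses more data.
 * Returns the number of bytes accepted, or -1 on a real error. */
static long fill_until_eagain(int fd)
{
    char chunk[4096] = {0};
    long total = 0;

    for (;;) {
        ssize_t w = write(fd, chunk, sizeof chunk);
        if (w > 0) {
            total += w;
            continue;
        }
        if (w < 0 && errno == EINTR)
            continue;
        if (w < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;   /* socket buffer is full */
        return -1;
    }
}

/* Returns 1 if fd is writable within timeout_ms, 0 on timeout,
 * -1 on error. */
static int writable_within(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int r;

    do {
        r = poll(&pfd, 1, timeout_ms);
    } while (r < 0 && errno == EINTR);

    if (r < 0)
        return -1;
    return (r > 0 && (pfd.revents & POLLOUT)) ? 1 : 0;
}
```

With one end filled by fill_until_eagain() and nobody reading the
other end, writable_within() just times out. The only thing that
changes the answer is the peer reading, which is exactly what a
compositor blocked on an X event is not doing.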

The way that mutter tries to be both wayland server and X wm is sort of
fundamentally broken. When I complained to Owen about this, his opinion
was that wl_display_flush should just allocate and queue if writes
would return EAGAIN. My personal opinion is that mallocing your way out
of a deadlock is not, in fact, a fix. But it would probably work well
enough, and we could probably implement it entirely inside xserver if
libwayland-client didn't want to implement that feature (and I wouldn't
really blame them if they didn't).
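
For what it's worth, the allocate-and-queue idea could look roughly
like the sketch below. This is an assumption-laden illustration in
plain POSIX, not libwayland-client's API and not a proposed xserver
patch: struct pending_buf, pending_append, pending_drain and
send_or_queue are hypothetical names, and the fd is assumed to be
non-blocking already. Whatever the socket refuses with EAGAIN is
copied into a growable buffer and retried later, so the caller never
blocks (and memory use is unbounded if the peer never reads).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical overflow queue: bytes the socket would not take yet. */
struct pending_buf {
    char  *data;
    size_t len;
};

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Append n bytes to the queue; -1 if the allocation fails. */
static int pending_append(struct pending_buf *p, const char *buf, size_t n)
{
    char *grown = realloc(p->data, p->len + n);
    if (!grown)
        return -1;
    memcpy(grown + p->len, buf, n);
    p->data = grown;
    p->len += n;
    return 0;
}

/* Write as much of the queue as the socket accepts right now.
 * Returns 0 (possibly with bytes still queued) or -1 on a real error. */
static int pending_drain(int fd, struct pending_buf *p)
{
    size_t off = 0;

    while (off < p->len) {
        ssize_t w = write(fd, p->data + off, p->len - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;
            return -1;
        }
        off += (size_t)w;
    }
    if (off > 0) {
        memmove(p->data, p->data + off, p->len - off);
        p->len -= off;
    }
    return 0;
}

/* Send n bytes without ever blocking: older queued bytes go first to
 * preserve ordering, and anything the socket refuses is queued. */
static int send_or_queue(int fd, struct pending_buf *p,
                         const char *buf, size_t n)
{
    if (pending_drain(fd, p) < 0)
        return -1;
    if (p->len == 0) {
        while (n > 0) {
            ssize_t w = write(fd, buf, n);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                if (errno == EAGAIN || errno == EWOULDBLOCK)
                    break;
                return -1;
            }
            buf += w;
            n -= (size_t)w;
        }
    }
    return n > 0 ? pending_append(p, buf, n) : 0;
}
```

The caller would call pending_drain() again whenever the fd polls
writable. Note that this only moves the backlog from the kernel
socket buffer into the heap; it does nothing to make the other side
read any faster.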

Short of that, we have to consider xserver as a potentially very
aggressive wayland application, and keep it from generating so much
wayland protocol that we drown the compositor. A giant-sledgehammer
technique would be to set dispatchException |= DE_PRIORITYCHANGE in
wakeup_handler, which would force us into the block handler between
every X request (and thus flush out whatever just happened), so
_hopefully_ that would cap the amount of wl protocol we generate. It
would also force us to poll between every request, which should be a
hint to the scheduler to hand off our timeslice. It would /completely
suck/ for performance, but.
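
A toy model of the intended effect, in case it helps. This is NOT
xserver's Dispatch(): struct toy_server and the toy_* functions are
invented for illustration, the DE_PRIORITYCHANGE value here is a local
stand-in, and the loop is simplified to "process requests until none
are left or the exception flag is set". It only shows the trade-off
described above: with the flag set from the wakeup handler, at most
one request's worth of output piles up between flushes, at the price
of one pass through the block handler per request.

```c
#include <stddef.h>

/* Local stand-in for the xserver flag; the value is an assumption. */
#define DE_PRIORITYCHANGE 4

struct toy_server {
    int dispatch_exception; /* models dispatchException */
    int queued_requests;    /* X requests waiting to be processed */
    int unflushed;          /* wl messages generated since last flush */
    int max_unflushed;      /* high-water mark of unflushed */
    int passes;             /* trips through block/wakeup handlers */
    int force_yield;        /* 1 = apply the sledgehammer */
};

/* Models the block handler: flush whatever was generated so far. */
static void toy_block_handler(struct toy_server *s)
{
    s->unflushed = 0;
}

/* Models the wakeup handler, optionally raising the exception flag. */
static void toy_wakeup_handler(struct toy_server *s)
{
    if (s->force_yield)
        s->dispatch_exception |= DE_PRIORITYCHANGE;
}

/* Simplified dispatch loop: each request yields one wl message. */
static void toy_dispatch(struct toy_server *s)
{
    while (s->queued_requests > 0) {
        toy_block_handler(s);
        toy_wakeup_handler(s);
        s->passes++;

        do {
            s->queued_requests--;
            s->unflushed++;
            if (s->unflushed > s->max_unflushed)
                s->max_unflushed = s->unflushed;
        } while (s->queued_requests > 0 && !s->dispatch_exception);

        /* The flag is one-shot: clear it before the next pass. */
        s->dispatch_exception &= ~DE_PRIORITYCHANGE;
    }
    toy_block_handler(s); /* final flush */
}
```

Running toy_dispatch() on 100 queued requests with force_yield = 0
handles them all in a single pass with 100 unflushed messages at the
peak; with force_yield = 1 the peak is 1, but it takes 100 passes.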

- ajax

