weird Xwayland and compositor deadlock issue [WAS: [PATCH xserver v2] xwayland: handle EAGAIN and EINTR gracefully]

Olivier Fourdan ofourdan at redhat.com
Tue Sep 13 10:13:16 UTC 2016


Hi all,

----- Original Message -----
> wl_display_flush() can fail with EAGAIN and Xwayland would make this a
> fatal error.
> 
> Handle the usual EAGAIN and EINTR gracefully so that Xwayland doesn't
> die for so little.

Right, I am running out of ideas...

So the approach of using poll() to wait for the Wayland file descriptor to become writable again apparently leads straight to a deadlock.
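
For reference, here is a minimal sketch of that poll()-based approach, written against a plain non-blocking fd rather than the real Wayland connection. The helper names wait_writable() and write_retry() are made up for illustration and are not taken from the Xwayland patch; in Xwayland the write would presumably be wl_display_flush() on the fd returned by wl_display_get_fd().

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is writable.
 * Returns 1 if writable, 0 on timeout, -1 on error.
 * Note: after EINTR the poll() is restarted with the full timeout. */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret <= 0)
        return ret;
    /* POLLERR/POLLHUP without POLLOUT is treated as an error. */
    return (pfd.revents & POLLOUT) ? 1 : -1;
}

/* Hypothetical helper: write len bytes, treating EINTR and EAGAIN as
 * retryable instead of fatal. Returns 0 on success, -1 on a real
 * error or when the fd does not become writable within timeout_ms. */
int write_retry(int fd, const char *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL);

    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n >= 0) {
            buf += n;
            len -= (size_t) n;
            continue;
        }
        if (errno == EINTR)
            continue;               /* interrupted, just try again */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* genuine error */
        if (wait_writable(fd, timeout_ms) != 1)
            return -1;              /* timeout or poll() error */
    }
    return 0;
}
```

The timeout matters here: with an infinite timeout this loop blocks for as long as the other end does not read.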

The reason is that the compositor (gnome-shell/mutter) is itself blocked waiting for data on the X file descriptor:

Backtrace of gnome-shell while we hit the EAGAIN case on the Wayland fd on the Xwayland side:

#0  0x00007f86d1cd400d in poll () at /lib64/libc.so.6
#1  0x00007f86d1537d10 in _xcb_conn_wait () at /lib64/libxcb.so.1
#2  0x00007f86d1539aa9 in xcb_wait_for_event () at /lib64/libxcb.so.1
#3  0x00007f86d21fe03b in _XReadEvents (dpy=dpy@entry=0x55f956633000) at xcb_io.c:401
#4  0x00007f86d21e562e in XIfEvent (dpy=0x55f956633000, event=0x7ffe30c28eb0, predicate=<find_timestamp_predicate>, arg=0x55f956761100)
    at IfEvent.c:68
#5  0x00007f86d8031ddb in meta_display_get_current_time_roundtrip () at /lib64/libmutter.so.0
#6  0x00007f86d805ac49 in handle_other_xevent () at /lib64/libmutter.so.0
#7  0x00007f86d805b95b in xevent_filter () at /lib64/libmutter.so.0
#8  0x00007f86d73b98f1 in gdk_event_apply_filters () at /lib64/libgdk-3.so.0
#9  0x00007f86d73b9cf2 in _gdk_x11_display_queue_events () at /lib64/libgdk-3.so.0
#10 0x00007f86d7380f19 in gdk_display_get_event () at /lib64/libgdk-3.so.0
#11 0x00007f86d73b9962 in gdk_event_source_dispatch () at /lib64/libgdk-3.so.0
#12 0x00007f86d37d0f22 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#13 0x00007f86d37d12a0 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#14 0x00007f86d37d15c2 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#15 0x00007f86d803c00c in meta_run () at /lib64/libmutter.so.0
#16 0x000055f953220657 in main ()

i.e. gnome-shell is stuck in meta_display_get_current_time_roundtrip():

  https://git.gnome.org/browse/mutter/tree/src/core/display.c#n1300

At the same time, Xwayland is trying to write to the Wayland file descriptor with wl_display_flush() and gets EAGAIN in block_handler():

  https://cgit.freedesktop.org/xorg/xserver/tree/hw/xwayland/xwayland.c?h=server-1.18-branch#n483

I tried to poll() the Wayland fd with a timeout before calling wl_display_flush(), so that wl_display_flush() is only called when the fd is writable, to see if that would help unblock mutter waiting for its PropertyNotify event. That did not work: the Wayland fd keeps returning EAGAIN forever and gnome-shell/mutter remains stuck waiting for the PropertyNotify event.
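
To illustrate why polling cannot help while the other end is not reading, here is a toy reproduction using a socketpair() in place of the Wayland socket. It is only a sketch under that assumption: the helper names are hypothetical and nothing here is Xwayland or mutter code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put fd in non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Write until the kernel buffer is full.
 * Returns 0 once write() reports EAGAIN, -1 on any other error. */
int fill_until_eagain(int fd)
{
    char chunk[1024];

    assert(fd >= 0);
    memset(chunk, 0, sizeof chunk);
    for (;;) {
        if (write(fd, chunk, sizeof chunk) >= 0)
            continue;
        if (errno == EINTR)
            continue;
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    }
}

/* Read and discard everything currently queued on a non-blocking fd.
 * Returns 0 when the queue is empty or the peer closed, -1 on error. */
int drain(int fd)
{
    char chunk[1024];

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);

        if (n > 0)
            continue;
        if (n == 0)
            return 0;               /* peer closed */
        if (errno == EINTR)
            continue;
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    }
}

/* poll() fd for POLLOUT. Returns poll()'s result: 1 if an event is
 * reported, 0 on timeout, -1 on error. */
int poll_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);
    return ret;
}
```

With one end filled until EAGAIN, poll_writable() on that end should time out for as long as nobody reads from the other end, and only report the fd as writable once drain() has emptied the queue. That would be consistent with the behaviour described above if the compositor is not reading the Wayland socket while it sits in XIfEvent().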

I am a bit puzzled: why is gnome-shell/mutter/xcb still waiting for the PropertyNotify? Where has that event gone?

Any ideas?

Thanks

Olivier

