weird Xwayland and compositor deadlock issue [WAS: [PATCH xserver v2] xwayland: handle EAGAIN and EINTR gracefully]

Pekka Paalanen ppaalanen at gmail.com
Mon Sep 19 08:54:55 UTC 2016


On Fri, 16 Sep 2016 22:21:42 -0700
"Jasper St. Pierre" <jstpierre at mecheye.net> wrote:

> Hi,
> 
> Based on my reading of the spec, writing an ICCCM-compliant WM *requires*
> blocking, since the behavior of an UnmapNotify depends on the attributes of
> a window. We cannot process any X11 events while we are retrieving the
> attributes of a mapped window inside MapRequest.

Hi,

but the blocking is limited only to the X11 connection, right?

One has to be able to handle X11 vs. Wayland concurrency in any case,
so in theory one should be able to keep on processing Wayland
connections while keeping the X11 connection "blocked" until the
expected event.
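
A minimal sketch of that idea, assuming plain poll() on two file
descriptors: x11_fd stands in for the compositor's X11 connection to
Xwayland, wl_fd for a Wayland client socket. The helper name
wait_for_x11_reply() and the read-and-count "servicing" are hypothetical
placeholders, not Xlib, xcb, or libwayland API.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: wait until the awaited X11 reply is readable on
 * x11_fd while still servicing Wayland traffic that arrives on wl_fd.
 * Returns 0 once x11_fd is readable, -1 on error. */
int
wait_for_x11_reply(int x11_fd, int wl_fd, int *wl_serviced)
{
    struct pollfd fds[2] = {
        { .fd = wl_fd,  .events = POLLIN },
        { .fd = x11_fd, .events = POLLIN },
    };
    char buf[256];

    assert(wl_serviced != NULL);

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* Wayland side first: reading stands in for dispatching
         * client requests. */
        if (fds[0].revents & POLLIN) {
            if (read(wl_fd, buf, sizeof buf) < 0 && errno != EINTR)
                return -1;
            (*wl_serviced)++;
        }

        /* The X11 "block" ends only when the expected reply shows up. */
        if (fds[1].revents & POLLIN)
            return 0;

        if ((fds[0].revents | fds[1].revents) &
            (POLLERR | POLLHUP | POLLNVAL))
            return -1;
    }
}
```

The X11 side stays logically blocked, but the process itself never
sleeps on that one connection alone.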

I suppose this idea is moot, however, since implementing it is not
practical, is it? Far too many simple statement sequences in existing
code would need to be converted into state machines so that they can
return to the main loop while waiting for X11?
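
To illustrate what such a conversion involves, here is a hedged sketch
of one blocking sequence (ask for window attributes, wait for the reply,
then act) rewritten as a resumable state machine. All names here
(map_request, map_request_step(), the event enum) are hypothetical and
do not come from mutter, Weston, or Xlib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mr_state { MR_IDLE, MR_WAIT_ATTRS, MR_DONE };
enum mr_event { EV_MAP_REQUEST, EV_ATTRS_REPLY };

struct map_request {
    enum mr_state state;
    int attrs;      /* stand-in for the attribute reply payload */
    bool managed;   /* set once the window has been processed */
};

/* Advance the state machine by one event. Never waits: the caller
 * returns to its main loop between calls. Returns true if the event
 * was consumed in the current state. */
bool
map_request_step(struct map_request *mr, enum mr_event ev, int attrs)
{
    assert(mr != NULL);

    switch (mr->state) {
    case MR_IDLE:
        if (ev != EV_MAP_REQUEST)
            return false;
        /* A real WM would queue the attribute request here
         * without waiting for the reply. */
        mr->state = MR_WAIT_ATTRS;
        return true;
    case MR_WAIT_ATTRS:
        if (ev != EV_ATTRS_REPLY)
            return false;
        mr->attrs = attrs;
        mr->managed = true;
        mr->state = MR_DONE;
        return true;
    case MR_DONE:
        return false;
    }
    return false;
}
```

Every "call, then use the result" pair in existing code would need this
kind of split, which is what makes the approach so invasive.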


Thanks,
pq

> If we want to modify the X11 protocol to provide non-blocking events to
> provide e.g. attributes in MapRequest, values in PropertyNotify, and shapes
> in ShapeNotify (the three major cases of required blocking right now), I'd
> be for it.
> 
> Focus management is extremely complex and subtle. Reading back on the
> history:
> 
> https://bugzilla.gnome.org/show_bug.cgi?id=701017
> https://bugzilla.gnome.org/show_bug.cgi?id=720558
> 
> The first patch was overly complex -- the XChangeProperty to bump the
> serial could have simply been a XNoOp to bump the serial while under server
> grab. :) We could even make that cleanup now. But it would be a minor
> simplification.
> 
> Daniel suggested that timestamps *should* be on the same timebase.
> Currently, they are not. X11 server timestamps are
> CLOCK_MONOTONIC_COARSE-based and are calculated at delivery time, evdev
> timestamps are CLOCK_MONOTONIC-based and are calculated at input time. This
> is why there are several focus management bugs that happen when you replace
> meta_display_get_current_time_roundtrip() with a clock_gettime().
> 
> We need to fix this, otherwise we can never properly synchronize X11 event
> streams and Wayland event streams. But Xorg calls GetCurrentTimeMillis()
> literally everywhere and compares against that instead of using evdev's own
> timestamps, and I doubt we can fix that without breaking multiple, multiple
> clients.
> 
> The only thing I can think of for that is, again, the Wayland-in-X11
> solution: an X11 extension that delivers the timestamp with every response
> and event from the server so we don't block on a PropertyChange for it.
> 
> 
> On Wed, Sep 14, 2016 at 12:56 AM, Pekka Paalanen <ppaalanen at gmail.com>
> wrote:
> 
> > On Tue, 13 Sep 2016 12:04:14 -0400 (EDT)
> > Olivier Fourdan <ofourdan at redhat.com> wrote:
> >  
> > > Hi Pekka,
> > >
> > > ----- Original Message -----  
> > > > Hi Olivier,
> > > >
> > > > I don't have any solution for you. The interactions between the Wayland
> > > > compositor and Xwayland are known to be very easily deadlockable IIRC. I
> > > > believe the only thing you can do is ensure no such case can ever
> > > > occur, which is very painful. That is, never do a blocking roundtrip at
> > > > least from one side.
> > > >
> > > > Have the recent modifications caused a significant increase of Wayland
> > > > requests from Xwayland? If Xwayland needs to send an amount of data
> > > > bigger than bufferable, *any* blocking roundtrip via X11 from the
> > > > Wayland compositor is prone to deadlock. It will be waiting for a reply
> > > > via X11, while Xwayland is blocked on flushing, since the Wayland
> > > > compositor is not consuming requests.
> > > >
> > > > It can also trivially happen if both sides do a blocking roundtrip at
> > > > the same time. Or just a wait for an event.
> > > >
> > > > Either server needs to be able to return to its main loop to process the
> > > > protocol stream it is the server for. Preferably both, I think.
> > >
> > > Unfortunately, any XSync (like, for example, called in
> > > gdk_error_trap_pop() in gdk) will issue a blocking roundtrip, and
> > > window managers tend to do that quite a lot (some more than others)
> > > so I don't think we can easily change that in window managers to
> > > avoid blocking roundtrips on the X11 side.
> > >  
> > > > You could check how Weston's XWM works. I highly suspect that after
> > > > Xwayland launch it avoids doing any blocking roundtrips via X11.  
> > >
> > > Yet sometimes some X calls are blocking, e.g. XShapeGetRectangles()
> > > or even XGetWindowAttributes() which is invoked by mutter each time
> > > a new window is mapped. mutter still uses Xlib and not xcb.
> > >  
> > > > I'd assume Xwayland also tries to avoid blocking on Wayland events,
> > > > but if nothing else, I believe Mesa via GLAMOR may block on
> > > > wl_buffer.release events... or maybe not if GLAMOR is smart with its
> > > > throttling. Anyway, since your flush is hitting EAGAIN, that doesn't
> > > > seem to be the cause.
> > > >
> > > > I wonder if making wl_display_flush() block immediately like in your
> > > > patch could be replaced by adding the wl_display fd to the main poll
> > > > loop, so that it would get flushed ASAP but still service X11
> > > > requests in the mean time? It does run the risk of overflowing the
> > > > Wayland send buffer in Xwayland. Any way to prioritize the Wayland
> > > > compositor's X11 connection in Xwayland?  
> > >
> > > If I don't make EAGAIN a FatalError() and wait for the Wayland
> > > display file descriptor to become writable again, Xwayland eventually
> > > dies with another error "(EE) request could not be marshaled: can't
> > > send file descriptor" from libwayland directly (in
> > > copy_fds_to_connection()).  
> >
> > Hi,
> >
> > summarizing from #wayland irc between Olivier and Daniel: the proper
> > solution is indeed to never do blocking X11 roundtrips from the Wayland
> > compositor, but for practical reasons that might not be possible.
> >
> > The irc log starts here:
> > https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=wayland&highlight_names=&date=2016-09-13#t-1402
> >
> >
> > Thanks,
> > pq
> >
> > _______________________________________________
> > xorg-devel at lists.x.org: X.Org development
> > Archives: http://lists.x.org/archives/xorg-devel
> > Info: https://lists.x.org/mailman/listinfo/xorg-devel
> >  
> 
> 
> 
