Feedback needed on X wrongly reassigning same display when using '-displayfd'
Olivier Certner
olce.freedesktop at certner.fr
Thu Jul 8 10:41:59 UTC 2021
Hello,
First of all, a bit of context. The main problem I'm trying to solve is Xorg
reassigning an in-use display repeatedly when launched with '-displayfd' and
no explicit display, such as SDDM does. I'm experiencing this with Xorg
launched by SDDM on FreeBSD.
In this situation, the display selection logic in 'os/
connection.c:CreateWellKnownSockets()' finally calls into
'MakeAllCOTSServerListeners' in libxtrans. This code tries to create endpoints
for each transport available, and in particular UNIX sockets. To this end, it
calls 'CreateListener' for each transport, and in particular
'SocketUNIXCreateListener', which unconditionally unlinks an existing socket,
causing the problem. This unlink indeed always works when the current process'
user is the same as that of the process that created the socket (or in any
case if running as root and not otherwise restricted), even if the latter
process still exists.
The server lock file mechanism cannot currently prevent this problem since it
is disabled as soon as the '-displayfd' argument is passed.
The problem is currently not apparent on Linux where there is support for
"abstract" sockets, with automatic cleaning up of names of unused sockets (so
there is no need to call unlink(2)). However, using them in libxtrans seems to
be a security issue (see MR !7). If support for them is in the end removed,
then the above problem will occur on Linux as well (again).
I have reported months ago several issues stemming from or only vaguely
related to this problem. The most directly relevant is libxtrans' issue #4:
"`-displayfd` always select display :0 even if in use when Xlaunched as root".
The associated MR is !8, which just removes the call to unlink(2). I think
this change should not have any user-noticeable side-effect in most cases (the
server destroys the UNIX socket files at normal or abnormal termination
(AbortServer) through CloseWellKnownConnections; socket files should also not
survive a system crash, /tmp being generally cleaned up properly at boot; some
sockets would stay "stale" only in case of, e.g., a SIGKILL, but even in this
case, the automatic display selection would allow a new server to start,
albeit with a different display). This is the solution I was initially
favoring.
Some more complex fix was proposed 2 years ago, in libxtrans' MR !3.
Essentially, this MR introduces a lock file in this case, which has one
advantage: To be able to detect whether there is indeed another process
running using the socket, and thus enabling to break an existing stale socket.
I do not see any drawbacks in doing so (contrary to what happens with the
xserver's lock file, see comments on issue #1138 below). However, if this MR
is to be considered for inclusion, then I'll comment on it with some changes,
because in the currently proposed code there is still a race between lock
file's creation and locking (it can be bridged on *BSD; on Linux I don't think
it can a priori, to be checked more thoroughly).
There is another existing mechanism that could be leveraged to fix display re-
use: The server's lock file. As said above, no server lock is currently
established if '-displayfd' is passed. In xserver's issue #1139 ("os/utils.c:
Create lock file on explicit display, even on '-displayfd'") and the
accompanying MR !614, I'm merely proposing that the lock file is created if an
explicit display is passed, even if '-displayfd' is also, just for coherency
with the "raw" explicit display case (i.e., no '-displayfd'). As hinted for in
the MR, this could be extended to cover also automatic display selection. I
haven't worked on this alternative since then, because I would like feedback
before investing time in doing so (not sure the changes will be trivial this
time; and it seems best to dispense with the server's lock in the end).
I have two main questions:
1. How the problem should be fixed? Is libxtrans' #4 alone acceptable to you,
or you think having a lock mechanism is really better?
2. Should the problem be fixed in libxtrans (I tend to think so) or in
xserver? And, as a bonus question, if the answer is libxtrans, shouldn't the
xserver's lock file simply be removed altogether?
For the record, I proposed in xserver's issue #1138 (accompanying MR !612)
simplifications of 'LockServer()', along with general hardening of the
handling of abnormal situations. The current mechanism, contrary to the
proposal for libxtrans in MR !3 (with appropriate fix), is inherently racy
because the PID file is linked into place and never locked. Two servers
starting at approximately the same time can concurrently unlink and establish
a new one, resulting in both believing they own it. MR !612 proposes to stop
trying to break an existing lock file and instead simply report to the user as
much as possible what may be going on. Doing so allows to simplify the code by
removing the in-place link phase. Now, all this may be obsolete depending on
the answers to the main questions above.
I'm hopefully waiting for your thoughts and comments. I'll rebase and update
the existing MRs and/or propose new ones as appropriate depending on your
feedback.
Thanks and regards.
--
Olivier Certner
More information about the xorg-devel
mailing list