Feedback needed on X wrongly reassigning same display when using '-displayfd'

Thu Jul 8 10:41:59 UTC 2021

Hello,

First of all, a bit of context. The main problem I'm trying to solve is Xorg 
reassigning an in-use display repeatedly when launched with '-displayfd' and 
no explicit display, such as SDDM does. I'm experiencing this with Xorg 
launched by SDDM on FreeBSD.

In this situation, the display selection logic in 'os/
connection.c:CreateWellKnownSockets()' finally calls into 
'MakeAllCOTSServerListeners' in libxtrans. This code tries to create endpoints 
for each transport available, and in particular UNIX sockets. To this end, it 
calls 'CreateListener' for each transport, and in particular 
'SocketUNIXCreateListener', which unconditionally unlinks an existing socket, 
causing the problem. This unlink indeed always works when the current process' 
user is the same as that of the process that created the socket (or in any 
case if running as root and not otherwise restricted), even if the latter 
process still exists.

The server lock file mechanism cannot currently prevent this problem since it 
is disabled as soon as the '-displayfd' argument is passed.

The problem is currently not apparent on Linux where there is support for 
"abstract" sockets, with automatic cleaning up of names of unused sockets (so 
there is no need to call unlink(2)). However, using them in libxtrans seems to 
be a security issue (see MR !7). If support for them is in the end removed, 
then the above problem will occur on Linux as well (again).

I have reported months ago several issues stemming from or only vaguely 
related to this problem. The most directly relevant is libxtrans' issue #4: 
"`-displayfd` always select display :0 even if in use when Xlaunched as root". 
The associated MR is !8, which just removes the call to unlink(2). I think 
this change should not have any user-noticeable side-effect in most cases (the 
server destroys the UNIX socket files at normal or abnormal termination 
(AbortServer) through CloseWellKnownConnections; socket files should also not 
survive a system crash, /tmp being generally cleaned up properly at boot; some 
sockets would stay "stale" only in case of, e.g., a SIGKILL, but even in this 
case, the automatic display selection would allow a new server to start, 
albeit with a different display). This is the solution I was initially 
favoring.

Some more complex fix was proposed 2 years ago, in libxtrans' MR !3. 
Essentially, this MR introduces a lock file in this case, which has one 
advantage: To be able to detect whether there is indeed another process 
running using the socket, and thus enabling to break an existing stale socket. 
I do not see any drawbacks in doing so (contrary to what happens with the 
xserver's lock file, see comments on issue #1138 below). However, if this MR 
is to be considered for inclusion, then I'll comment on it with some changes, 
because in the currently proposed code there is still a race between lock 
file's creation and locking (it can be bridged on *BSD; on Linux I don't think 
it can a priori, to be checked more thoroughly).

There is another existing mechanism that could be leveraged to fix display re-
use: The server's lock file. As said above, no server lock is currently 
established if '-displayfd' is passed. In xserver's issue #1139 ("os/utils.c: 
Create lock file on explicit display, even on '-displayfd'") and the 
accompanying MR !614, I'm merely proposing that the lock file is created if an 
explicit display is passed, even if '-displayfd' is also, just for coherency 
with the "raw" explicit display case (i.e., no '-displayfd'). As hinted for in 
the MR, this could be extended to cover also automatic display selection. I 
haven't worked on this alternative since then, because I would like feedback 
before investing time in doing so (not sure the changes will be trivial this 
time; and it seems best to dispense with the server's lock in the end).

I have two main questions:
1. How the problem should be fixed? Is libxtrans' #4 alone acceptable to you, 
or you think having a lock mechanism is really better?
2. Should the problem be fixed in libxtrans (I tend to think so) or in 
xserver? And, as a bonus question, if the answer is libxtrans, shouldn't the 
xserver's lock file simply be removed altogether?

For the record, I proposed in xserver's issue #1138 (accompanying MR !612) 
simplifications of 'LockServer()', along with general hardening of the 
handling of abnormal situations. The current mechanism, contrary to the 
proposal for libxtrans in MR !3 (with appropriate fix), is inherently racy 
because the PID file is linked into place and never locked. Two servers 
starting at approximately the same time can concurrently unlink and establish 
a new one, resulting in both believing they own it. MR !612 proposes to stop 
trying to break an existing lock file and instead simply report to the user as 
much as possible what may be going on. Doing so allows to simplify the code by 
removing the in-place link phase. Now, all this may be obsolete depending on 
the answers to the main questions above.

I'm hopefully waiting for your thoughts and comments. I'll rebase and update 
the existing MRs and/or propose new ones as appropriate depending on your 
feedback.

Thanks and regards.

-- 
Olivier Certner