xf86OpenConsole: VT_WAITACTIVE failed: Interrupted system call

Mon Oct 8 12:50:49 PDT 2007

On Mon, 2007-10-08 at 10:30 -0700, Linus Torvalds wrote:
> 
> On Sun, 7 Oct 2007, Keith Packard wrote:

> I've dug some more, and my traces never show any "VT_RELDISP" happening, 
> which is what I would expect if the old server was involved in this whole 
> signalling thing and said "it's ok to switch now, I'm releasing my VT 
> usage".

On exit, the server does:

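	ioctl (console_fd, VT_GETSTATE, &vts);		/* query current VT state */
	ioctl (console_fd, KDSETMODE, KD_TEXT);		/* console back to text mode */
	ioctl (console_fd, VT_GETMODE, &VT);		/* read the current switch mode */
	VT.mode = VT_AUTO;				/* let the kernel handle switches */
	ioctl (console_fd, VT_SETMODE, &VT);
	ioctl (console_fd, VT_ACTIVATE, activeVT);	/* request a switch to activeVT */
	ioctl (console_fd, VT_WAITACTIVE, activeVT);	/* block until activeVT is active */
	close (console_fd);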

Why it uses WAITACTIVE is beyond me, but there you go.
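
For the error in the subject line, one possible mitigation is to retry
VT_WAITACTIVE when a signal interrupts it. This is only a minimal sketch,
not what the server does today; wait_active_retry and max_retries are
hypothetical names. It papers over EINTR and does nothing about the order
in which the two servers switch consoles.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/vt.h>

/*
 * Hypothetical helper (not from the X server source): call VT_WAITACTIVE
 * again when it fails with EINTR, up to max_retries extra attempts.
 * Returns 0 once the ioctl succeeds, -1 on any other error or when the
 * retry budget is exhausted (errno is left as set by ioctl).
 */
static int wait_active_retry(int console_fd, int vt, int max_retries)
{
    assert(max_retries >= 0);
    for (int tries = 0; ; tries++) {
        if (ioctl(console_fd, VT_WAITACTIVE, vt) == 0)
            return 0;
        if (errno != EINTR || tries >= max_retries)
            return -1;
    }
}
```

A caller would use it in place of the bare ioctl, for example
wait_active_retry (console_fd, activeVT, 5).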

> However, I'd also expect the Fedora startup scripts to not even *start*up* 
> the new X server until the old one has exited. 

I'll bet it kills one and starts the new one without waiting.

> I'm wondering if perhaps there is some forking going on, where the X 
> server has forked off another thing, and the original process exits before 
> the VT has been fully released - causing the startup scripts to start the 
> next X server even before the consoles are all sorted out?

Nope. The X server doesn't fork at startup (it does fork/exec a helper
program for XKB data, but the original process is the one which does all
of the X server work).

> I've got some more traces of a failing situation. The original X server 
> was pid 1126, and the first X server startup looks like this:
> 
> 	set_console (1126): want_console=7
> 	console_callback 1 (6): want_console=7
> 	6: change_console
> 	6 complete_change_console 1
> 	console_callback 2 (6): want_console=-1
> 
> (The "6" above is the pid of the kernel event daemon - which does the 
> actual switch itself for the simple reason that some switch requesters 
> cannot do the switch on their own - namely the keyboard interrupt 
> handler).
> 
> However, the second (failing) X server (pid 1993) startup ends up having 
> this trace:
> 
> 	1993 reset_vc
> 	set_console (1993): want_console=6
> 	console_callback 1 (6): want_console=6
> 	6: change_console
> 	console_callback 2 (6): want_console=-1
> 	set_console (1126): want_console=0
> 	console_callback 1 (6): want_console=0
> 	6: change_console
> 	6 complete_change_console
> 	console_callback 2 (6): want_console=-1
> 
> and here we see two things:
>  - the second X invocation didn't use console 7, apparently because it was
>    still in use
>  - it actually *did* switch to console 6, but look who came in and did a 
>    "want_console=0" _afterwards_! Yeah, our old buddy pid 1126 - the OLD X 
>    server!

As I guessed, the old server switches back while the new server is
starting up. Because the old server isn't looking for VT switch requests
during shutdown, the window here is fairly long.

> So I think this does show:
> 
>  - it doesn't look like a kernel race. I think we do serialize things 
>    sufficiently in the kernel, but we cannot protect ourselves from 
>    processes then doing things in the wrong order.

Unless we believe that a pending VT_ACTIVATE/VT_WAITACTIVE should
somehow magically keep a subsequent VT_ACTIVATE from switching to the
'wrong' console. This doesn't seem practical though.

>  - it doesn't *always* fail. The exact same kernel sometimes gets this 
>    trace for the second startup:
> 
> 	set_console (1124): want_console=0
> 	console_callback 1 (6): want_console=0
> 	6: change_console
> 	6 complete_change_console
> 	console_callback 2 (6): want_console=-1
> 	1994 reset_vc
> 	set_console (1994): want_console=6
> 	console_callback 1 (6): want_console=6
> 	6: change_console
> 	6 complete_change_console
> 	console_callback 2 (6): want_console=-1
> 
>    ie here the old server (1124 - don't ask me why the pid's sometimes 
>    change) got in earlier and did its want_console=0 before the new X 
>    server started up.

Yup, if the old server exits before the new server attempts to flip
consoles, things will 'just work'.

> In other words, I think the problem is somehow at *exit* time. The old 
> server does its "I want to go back to the original console" (reasonable), 
> but it does so after the new X server has already started up, and if the 
> new X server is fast enough, it will have already picked its console and 
> tried to switch to it - so the exit of the old one screws it up (because 
> it did actually switch to it, but then switched back!)

So, do we just have the server poll at startup time instead of using
VT_WAITACTIVE? Alternatively, could the *old* server detect that the new
server had requested a VT switch and flip to that one instead of the
original?
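
Polling at startup could look roughly like the sketch below. It is an
illustration under assumptions, not a tested patch: active_vt,
poll_vt_active, max_polls and delay_ms are hypothetical names, and
re-issuing VT_ACTIVATE on every miss is my own guess at how to recover
when another process switches away in between. It still cannot help if
the old server switches back after the poll has already succeeded.

```c
#include <assert.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/vt.h>

/* Returns the number of the active VT, or -1 if VT_GETSTATE fails. */
static int active_vt(int console_fd)
{
    struct vt_stat vts;

    if (ioctl(console_fd, VT_GETSTATE, &vts) < 0)
        return -1;
    return vts.v_active;
}

/*
 * Hypothetical stand-in for VT_WAITACTIVE: poll VT_GETSTATE until vt is
 * the active console. Returns 0 once vt is active, -1 on an ioctl error
 * or after max_polls unsuccessful checks.
 */
static int poll_vt_active(int console_fd, int vt, int max_polls, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    assert(max_polls > 0 && delay_ms >= 0);
    for (int i = 0; i < max_polls; i++) {
        int cur = active_vt(console_fd);

        if (cur < 0)
            return -1;
        if (cur == vt)
            return 0;
        /* Assumption: asking again is harmless and undoes a late switch. */
        if (ioctl(console_fd, VT_ACTIVATE, vt) < 0)
            return -1;
        nanosleep(&delay, NULL);
    }
    return -1;
}
```

Unlike a blocking VT_WAITACTIVE, this is not affected by a signal arriving
mid-wait, at the cost of a bounded busy loop.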

VT_WAITACTIVE seems a lot less useful because of this behaviour.
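
The other option, having the old server honour a pending switch request
on its way out, might be sketched as follows. This assumes the server is
still in VT_PROCESS mode and that on_release_signal has been installed as
the handler for the release signal named in its struct vt_mode; all of
the names here (switch_requested, on_release_signal, choose_exit_action,
leave_vt) are hypothetical. The idea is to answer with VT_RELDISP instead
of activating the original VT when someone has already asked for a
switch.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <linux/vt.h>

/* Set from the release-signal handler when the kernel asks us to let go. */
static volatile sig_atomic_t switch_requested;

static void on_release_signal(int sig)
{
    (void)sig;
    switch_requested = 1;
}

enum exit_action { RESTORE_ORIGINAL, RELEASE_TO_REQUESTER };

/* Pure decision: what should the exiting server do with its VT? */
static enum exit_action choose_exit_action(int pending)
{
    return pending ? RELEASE_TO_REQUESTER : RESTORE_ORIGINAL;
}

/*
 * Must run before the mode is reset to VT_AUTO, since VT_RELDISP is only
 * meaningful while the VT is in VT_PROCESS mode.
 */
static int leave_vt(int console_fd, int original_vt)
{
    if (choose_exit_action(switch_requested) == RELEASE_TO_REQUESTER)
        return ioctl(console_fd, VT_RELDISP, 1);   /* grant pending switch */
    return ioctl(console_fd, VT_ACTIVATE, original_vt);
}
```

There is still a window between checking the flag and issuing the ioctl,
so this narrows the race rather than removing it.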

-- 
keith.packard at intel.com