Bug in module loading system on x86-64?

Kendall Bennett KendallB at scitechsoft.com
Fri Oct 1 11:46:32 PDT 2004


Hi All,

We accidentally found what appears to be a strange bug in the module 
loading system for the X server that might still be present in the latest 
versions (we were building with R6.7.0 at the time). Normally this 
problem would never surface in practice, but someone who knows more about 
the module loading system may want to look into the problem.

Anyway, here is what happens. We had a minor bug in a piece of code in 
our X module such that when we ported it from 32-bit to 64-bit on our 
AMD64 machine, our code incorrectly read a value from disk as 64-bit when 
it was supposed to be 32-bit. Then it tried to malloc a *huge* amount of 
memory (20G or more I think). This naturally failed because the machine 
only has 512M of physical memory in it and the swap file is only about 
768M in size. Our code happily accepted the failure condition and just 
failed to access our monitor database (a non-fatal condition), but then 
the X server crashed badly when trying to load specific modules. What 
happened is that prior to the huge malloc() call, all modules that were 
being loaded (in fact all xalloc() calls) allocated memory very low in 
the address space. Once the huge malloc() call was attempted (but 
failed), glibc started allocating all new blocks way up in the virtual 
memory space for some reason (way above the 4G boundary).
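
To illustrate the kind of porting bug described above, here is a minimal 
sketch. It is not the actual SciTech code: the record layout and the names 
read_size_buggy and read_size_fixed are hypothetical, and it assumes the 
size field is a 32-bit value stored in the machine's own byte order, 
followed directly by another 32-bit field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: a 32-bit size field followed by more data. */

/* Buggy pattern: reads 8 bytes for a field that is only 4 bytes wide,
 * so the neighbouring field ends up in the other half of the result.
 * This is what happens when the field is read into a `long`, which is
 * 64 bits wide on x86-64 Linux. */
uint64_t read_size_buggy(const unsigned char *rec)
{
    uint64_t v;
    assert(rec != NULL);
    memcpy(&v, rec, sizeof v);   /* copies 8 bytes: 4 too many */
    return v;
}

/* Fixed pattern: read exactly 32 bits, then widen the value. */
uint64_t read_size_fixed(const unsigned char *rec)
{
    uint32_t v;
    assert(rec != NULL);
    memcpy(&v, rec, sizeof v);   /* copies exactly 4 bytes */
    return v;
}
```

With a record whose first two 32-bit fields are 1024 and 5, 
read_size_fixed returns 1024, while on a little-endian machine such as 
AMD64 read_size_buggy returns 5 * 2^32 + 1024, which is a bit over 20G 
when passed on to malloc().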

Now normally this should not be a problem, but it caused major issues 
with the module loader, specifically when resolving symbols. When we 
tried to load the DDC module (or, with the DDC module disabled, the 
framebuffer module), none of the symbols were resolved correctly, and as 
soon as we tried to call any of the functions within the module the X 
server crashed (the pointers were basically bogus).

Once we fixed the huge malloc problem in our code, the X server started 
working as per normal. Normally the X server is never going to allocate 
that much memory to cause this condition, so this is not really a serious 
issue for production code. But it is still interesting and someone may 
want to look into it. We would look into it ourselves, but we don't have 
time at the moment, so I figured I should at least mention it on this 
list (maybe it can get added to Bugzilla?).

I am sure you could reproduce the problem with any existing driver module 
just by attempting to allocate a huge amount of memory prior to loading 
any modules, and then the next module symbol you try to call will crash.
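
The first half of that reproduction could be probed with something like 
the following sketch. The names above_4g, probe_result and 
probe_heap_placement are hypothetical. The code only reports where 
malloc() places small blocks before and after a huge request; it does not 
touch the X server module loader. Whether the huge request fails, and 
where later blocks land, depends on the glibc version and on the 
machine's memory and overcommit settings, so no particular output is 
implied.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if the address lies at or above the 4G boundary. */
int above_4g(uint64_t addr)
{
    return addr >= (UINT64_C(1) << 32);
}

/* Hypothetical result record for the probe below. */
struct probe_result {
    int huge_failed;   /* 1 if the huge malloc() returned NULL */
    int before_high;   /* 1 if the block allocated before it is above 4G */
    int after_high;    /* 1 if the block allocated after it is above 4G */
};

/* Allocate a small block, attempt a huge allocation, then allocate
 * another small block and record where each small block was placed. */
struct probe_result probe_heap_placement(size_t huge_bytes)
{
    struct probe_result r;
    void *before = malloc(64);
    void *huge   = malloc(huge_bytes);   /* expected to fail when huge */
    void *after  = malloc(64);

    assert(before != NULL && after != NULL);

    r.huge_failed = (huge == NULL);
    r.before_high = above_4g((uint64_t)(uintptr_t)before);
    r.after_high  = above_4g((uint64_t)(uintptr_t)after);

    free(after);
    free(huge);    /* free(NULL) is a no-op */
    free(before);
    return r;
}
```

On a 64-bit machine, calling probe_heap_placement((size_t)20 << 30) would 
mimic the 20G request from this report. If huge_failed and after_high are 
both set while before_high is not, the allocator has moved above the 4G 
boundary in the way described, and the next step would be to load a 
module in that state.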

Regards,

---
Kendall Bennett
Chief Executive Officer
SciTech Software, Inc.
Phone: (530) 894 8400
http://www.scitechsoft.com

~ SciTech SNAP - The future of device driver technology! ~




