PCI Subsystem Rework for X.org 7.1 Proposal

Egbert Eich eich at suse.de
Fri Feb 3 11:28:18 PST 2006


Jesse Barnes writes:
 > [Content taken from http://wiki.x.org/wiki/PciReworkProposal and replied 
 > to via email to get a bit more exposure/comment.]
 > 
 > > It is fairly common knowledge in the X.org developer community that
 > > the PCI handling code in X.org 7.0 (and earlier) is a big, ugly mess.
 > > The code is complex, understood by very few developers, and does
 > > things that only the kernel should do.
 > 
 > And buggy.  It can easily hose your PCI bus configuration if your 
 > configuration wasn't foreseen by the various developers who have touched 
 > the X PCI stuff in the past (e.g. it shouldn't ever overwrite BAR values 
 > unless the OS has *no* PCI support).

Some of the bugginess comes from the fact that we try to determine the topology
of the PCI system, as this is required for VGA routing. When the code
was designed we assumed that every PCI bus has a bridge that's visible
in the config space. Later on we learned that host-to-PCI (H2PCI) bridges
aren't always visible - although, oddly, they were on i386 systems.
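
Roughly, the topology scan has to classify devices by their config space
header type; a minimal sketch (assuming the Linux sysfs layout, the helper
name is made up) could look like this:

/* Sketch: classify a device by its PCI config-space header type.
 * Assumes the Linux sysfs layout /sys/bus/pci/devices/<D:B:S.F>/config;
 * the helper name is hypothetical, not part of any existing X.org API. */
#include <stdio.h>
#include <stdint.h>

#define PCI_HEADER_TYPE_OFFSET 0x0e

/* Returns 1 for a PCI-to-PCI bridge, 0 for an ordinary device, -1 on error. */
static int is_pci_to_pci_bridge(const char *slot_name)
{
    char path[256];
    uint8_t header_type;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/bus/pci/devices/%s/config", slot_name);
    f = fopen(path, "rb");
    if (!f)
        return -1;
    if (fseek(f, PCI_HEADER_TYPE_OFFSET, SEEK_SET) != 0 ||
        fread(&header_type, 1, 1, f) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    /* Bit 7 is the multi-function flag; bits 0-6 encode the layout:
     * 0x00 = normal device, 0x01 = PCI-to-PCI bridge, 0x02 = CardBus. */
    return (header_type & 0x7f) == 0x01;
}

int main(void)
{
    printf("bridge: %d\n", is_pci_to_pci_bridge("0000:00:01.0"));
    return 0;
}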

Another problem arises from the fact that we still use the old
separation code that was designed for direct config space access.
This code should only be used in a backend, where necessary.

 > 
 > > In fact, most of the existing 
 > > code originates from a time before most kernels implemented the
 > > required functionality. Since that time, kernels have greatly expanded
 > > the functionality provided to user space for probing and accessing PCI
 > > devices. The PCI bus has also changed (e.g., multiple PCI domains,
 > > complex PCI-PCI bridges, AGP, and PCI-Express). This has required that

Yes, PCI domains were added later - and the way this was done was not
really clean.
I don't know if we should do away with the old-fashioned access method
completely. I think this code should be shoved into a backend and only
pulled out when nothing else works.
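
What I have in mind is little more than a table of backends that gets
probed in order; a rough sketch (all names are made up, only the structure
is meant to be illustrative):

/* Hypothetical backend table for config-space access.  All names here are
 * illustrative; nothing is existing X.org or libpci API. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

struct pci_access_backend {
    const char *name;
    int (*probe)(void);          /* can this method work on this system? */
    int (*read_config)(int domain, int bus, int dev, int func,
                       uint32_t offset, void *buf, size_t len);
};

/* Stub "sysfs" backend: usable if the sysfs PCI tree exists. */
static int sysfs_probe(void) { return access("/sys/bus/pci/devices", R_OK) == 0; }
static int sysfs_read(int d, int b, int dv, int f,
                      uint32_t off, void *buf, size_t len)
{ (void)d; (void)b; (void)dv; (void)f; (void)off; (void)buf; (void)len; return -1; }

/* Stub "direct" backend: in a real implementation this would hold the
 * legacy config-space poking code; it should only ever be the last resort. */
static int direct_probe(void) { return 1; }
static int direct_read(int d, int b, int dv, int f,
                       uint32_t off, void *buf, size_t len)
{ (void)d; (void)b; (void)dv; (void)f; (void)off; (void)buf; (void)len; return -1; }

static const struct pci_access_backend backends[] = {
    { "sysfs",  sysfs_probe,  sysfs_read  },   /* preferred   */
    { "direct", direct_probe, direct_read },   /* last resort */
};

/* Pick the first backend whose probe succeeds. */
static const struct pci_access_backend *pci_select_backend(void)
{
    for (size_t i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
        if (backends[i].probe())
            return &backends[i];
    return NULL;
}

int main(void)
{
    const struct pci_access_backend *be = pci_select_backend();
    printf("selected backend: %s\n", be ? be->name : "none");
    return 0;
}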

 > > the code to support probing and accessing PCI devices has also needed
 > > to change. Unfortuantely these changes tend to be platform specific.
 > > Certain features, such as multiple domains, are only supported on
 > > certain platforms by X.org. These features tend to be supported
 > > universally by the kernels on those platforms. Rather than duplicating
 > > the efforts of kernel developers, X.org needs to use the interfaces
 > > provided by the kernel as much as possible. It is currently unclear as
 > > to whether X.org still needs to support any platforms with kernels
 > > that do not export this functionality to user mode.

Domain support can only be implemented with kernel support.
All this is *very* platform dependent.
Apart from this, I would move the code that provides some limited
PCI functionality without kernel support into a backend.
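
On a Linux with sysfs, domain-aware enumeration is already trivial; just as
an illustration (this shows the Linux case only, not the proposed library
interface):

/* Sketch: enumerate PCI devices, including their domain, from Linux sysfs.
 * This only works where the kernel exports /sys/bus/pci/devices. */
#include <stdio.h>
#include <dirent.h>

int main(void)
{
    DIR *dir = opendir("/sys/bus/pci/devices");
    struct dirent *de;
    unsigned int domain, bus, slot, func;

    if (!dir) {
        perror("opendir");
        return 1;
    }
    while ((de = readdir(dir)) != NULL) {
        /* Entries are named like "0000:01:00.0": domain:bus:slot.function. */
        if (sscanf(de->d_name, "%x:%x:%x.%x",
                   &domain, &bus, &slot, &func) == 4)
            printf("domain %04x bus %02x slot %02x func %x\n",
                   domain, bus, slot, func);
    }
    closedir(dir);
    return 0;
}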

 > 
 > That's a good question.  Are there any OSes with active X maintainers 
 > that *don't* have a good PCI API?  And if so, this theoretical PCI 
 > interface you're implementing could be ported with an appropriate back 
 > end.

Well, there are systems still running old kernels.
This may be a niche situation, but killing it off right from the beginning
may be a little harsh.

 > 
 > > Required Functionality
 > >  There are seven broad pieces of functionality that X.org needs to
 > > work with devices on the PCI bus. These are:
 > >  Get a list of all devices matching some criteria.
 > >  Reading the device's expansion ROM. 
 > >  Accessing the device's IO ports.
 > >  Accessing the device's memory regions (BARs).

Only recent kernels have support for this through their PCI interface.
I assume we can wrap this and add our own interface when kernels
don't have it.
And if you look at older Alphas (pre-EV6), the situation is really grim.
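
Where the kernel is recent enough, a BAR can simply be mapped through the
resourceN files in sysfs; a sketch of what such a wrapper would build on
(Linux-specific, older kernels would need the fallback backend):

/* Sketch: map BAR0 of a device through the Linux sysfs "resource0" file.
 * This relies on a kernel recent enough to export mappable resourceN files;
 * it is not a portable interface, just an illustration of the Linux case. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:01:00.0/resource0";
    struct stat st;
    void *base;
    int fd = open(path, O_RDWR);

    if (fd < 0 || fstat(fd, &st) < 0) {
        perror(path);
        return 1;
    }
    /* The file's size is the size of the BAR; mapping it gives access to
     * the device memory, uncacheable by default. */
    base = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }
    printf("BAR0 mapped at %p, size %lld bytes\n",
           base, (long long)st.st_size);
    munmap(base, st.st_size);
    close(fd);
    return 0;
}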

 > >  Reading the device's capabilities (i.e., determine if the device is
 > >   AGP).

Reading (and even writing) PCI config space beyond the device's capabilities
is essential.
BIOS POST code requires it. The int10 code needs to catch config space
accesses (it does so already, if I recall correctly) and go through a function
that accesses config space.
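
Something like a single choke point for config space accesses; sketched here
on top of the Linux sysfs config file (the function name is made up, and note
that unprivileged reads only cover the first 64 bytes):

/* Sketch: one function all config-space reads go through, here backed by
 * the Linux sysfs "config" file.  The int10/POST code would call something
 * like this instead of touching config space directly. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

static int pci_read_config(const char *slot_name, uint32_t offset,
                           void *buf, size_t len)
{
    char path[256];
    int fd;
    ssize_t n;

    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config", slot_name);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* Without root privileges Linux only lets us read the first 64 bytes;
     * full config space (and any write) needs more rights. */
    n = pread(fd, buf, len, offset);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

int main(void)
{
    uint32_t id = 0;
    if (pci_read_config("0000:01:00.0", 0, &id, sizeof(id)) == 0)
        printf("vendor 0x%04x device 0x%04x\n", id & 0xffff, id >> 16);
    return 0;
}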


 > >  Power management. 
 > >  VGA arbitration.

 > 
 > I guess memory management really is a separate issue, but it depends on 
 > whether the card can do AGP, how it accesses system memory, etc.

For memory management you just need to know an address region and
some properties that describe what this region is for. This can
sit on top of the PCI subsystem.
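
Just to illustrate, such a descriptor could be as simple as this (purely
hypothetical names, nothing here is proposed API):

/* Hypothetical descriptor a memory manager could use on top of the PCI
 * subsystem: just an address range plus flags describing its use. */
#include <stdint.h>

typedef uint64_t pciaddr_t;

enum mem_region_use {
    MEM_REGION_FRAMEBUFFER,
    MEM_REGION_MMIO,
    MEM_REGION_GART       /* AGP/system memory aperture */
};

struct mem_region {
    pciaddr_t base;            /* CPU-visible base address      */
    pciaddr_t size;            /* length of the region in bytes */
    enum mem_region_use use;   /* what the region is used for   */
    unsigned cacheable:1;      /* may be mapped cached          */
    unsigned write_combine:1;  /* benefits from write combining */
};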

 > 
 > One thing that X has done poorly in the past (and at least some kernels 
 > haven't helped much with their poor interfaces) is map memory with the 
 > appropriate attributes.  The current sysfs mapping interface in Linux 
 > assumes the user wants to do an uncacheable mapping, but in the case of 
 > frame buffers and some other types of memory, you probably want to map 
 > it with write combining or caching somehow.  I'm not sure if this should 
 > happen automatically in some cases (e.g. a special pci_map_framebuffer 
 > or pci_map_cacheable call) or if the mapping routine should take an 
 > arbitrary flags argument (which could be abused).

We do flagging at the moment, which gives us some more flexibility.
Any automatic system will fail to do what you want at some point, and
then people start to kludge around it.
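
A flag-based mapping call could look roughly like this (the flag names are
made up, and actually getting write combining would still require MTRR/PAT
support underneath; plain mmap() of a sysfs resource file does not do it):

/* Hypothetical flag-driven mapping call; the flags are illustrative only. */
#include <stddef.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define PCI_MAP_READABLE        0x1
#define PCI_MAP_WRITABLE        0x2
#define PCI_MAP_WRITE_COMBINE   0x4   /* hint only in this sketch */
#define PCI_MAP_CACHEABLE       0x8   /* hint only in this sketch */

static void *pci_map_range(const char *resource_path, size_t size,
                           unsigned flags)
{
    int prot = 0;
    int fd;
    void *p;

    if (flags & PCI_MAP_READABLE) prot |= PROT_READ;
    if (flags & PCI_MAP_WRITABLE) prot |= PROT_WRITE;

    fd = open(resource_path, (flags & PCI_MAP_WRITABLE) ? O_RDWR : O_RDONLY);
    if (fd < 0)
        return NULL;
    /* The caching/write-combining flags are only recorded by the caller in
     * this sketch; the kernel interface would have to honour them. */
    p = mmap(NULL, size, prot, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

int main(void)
{
    void *fb = pci_map_range("/sys/bus/pci/devices/0000:01:00.0/resource0",
                             4096, PCI_MAP_READABLE | PCI_MAP_WRITABLE |
                                   PCI_MAP_WRITE_COMBINE);
    if (fb)
        munmap(fb, 4096);
    return 0;
}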

 > 
 > > In the best case scenario, 
 > > nearly all of this functionality is trivially provided by Linux's
 > > sysfs interface. A certain amount of this functionality is also
 > > provided by the libpci.a library in the pciutils package. The missing
 > > functionality and license issues (libpci.a is GPL) prevent use of
 > > libpci.a from being a viable option.

For a reasonably modern Linux the situation is really simple.

 > 
 > The big missing pieces seem to be:
 >   o VGA arbitration - we (the kernel folks) really need to tackle this
 >     properly

This is a big problem. I can see why one wants it in the kernel;
still, there are arguments why the graphics subsystem should be
heavily involved.

 >   o mapping memory with different attributes - benh and I discussed some
 >     ideas awhile back, but no code ever came of it (did we settle on
 >     thinking that an madvise(2) addition was best?)
 > 
 > > Proposed Implementation 
 > >  The current proposal is to implement a new library that implements
 > > the required functionality in a generic way. The existing code for
 > > accessing PCI devices would then be removed, in its entirety, from the
 > > X-server and the drivers. At this time the X-server and drivers would
 > > be ported to the new interface. The remainder of this section
 > > describes the interface to the proposed new library. The interface
 > > consists of two primary data types. These structures are intentionally
 > > similar to structures found in libpci.a, but there are some
 > > significant differences.
 > 
 > You might also check out the way Linux does it internally.  See 
 > Documentation/pci.txt in the Linux kernel source tree for some 
 > description of that API (Linux Device Drivers, 3rd Edition also has good 
 > info about it).  It's been through a few changes over the years, but 
 > seems to have settled down into a mature, usable interface (in 
 > particular it's moved from an iterator style interface to a more opaque 
 > register-my-driver-it-handles-these-cards type interface, which we may 
 > want to emulate).

The problem here is that we may need to get an overview of the PCI device
topology - not just a single device. From the driver perspective this is different:
there you only want to register a single device - although this view may
prove to be too simplistic, as some devices appear under two IDs.


 > 
 > > pci_system is the analog of pci_access. However, pci_system is a
 > > completely opaque data type. 
 > 
 > > pci_device is the analog of pci_dev.
 > 
 > > pci_system 
 > >  Access to the PCI system is obtained by calling pci_system_create.
 > 
 > Random naming thoughts: pci_system_init instead?  Or just pci_init, 
 > though that assumes that the 'pci' namespace is where we want these 
 > calls; anyway somehow pci_system seems redundant...
 > 
 > > This function returns a pointer to a fully initialized pci_system
 > > structure. This structure can then be used to query lists of PCI
 > > devices found in the system. This structure is then eventually
 > > destroyed by calling pci_system_destroy. ISSUE: When the pci_system
 > > structure is destroyed, should all of the other data structures (e.g.,
 > > pci_device, pci_agp_info, etc.) also be destroyed? This seems like the
 > > logical choice.

What about hotplug? If we destroy the entire structure, pieces that
hold pointers to some part of it may barf.

 > 
 > Yeah, that seems about right, especially if pci_system_create is 
 > allocating memory for all of them.  It should clean everything up on 
 > pci_system_destroy() or pci_system_cleanup().
 > 
 > > struct pci_device_iterator * pci_device_iterate(struct pci_system
 > > *pci_sys, const char *regex ); 
 > >
 > > The set of devices to be iterated is selected by the regular
 > > expression passed to pci_device_iterate.
 > 
 > > ISSUE: This interface seems fairly complex. Is it the right choice? It
 > > is easy to specify and easy to implement. One disadvantage is that
 > > it's impossible to know in advance whether or not a particular
 > > expression needs the extended device information (i.e., anything other
 > > than the device's "domain:bus:slot.function" string).

Why do we need regexps? All values are numeric. If we can find a way
to specify a wildcard we don't have to go through regexps.
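
Something along these lines would do (the struct and the PCI_MATCH_ANY
sentinel are made up here, similar in spirit to the kernel's PCI_ANY_ID
matching):

/* Sketch of numeric matching with a wildcard sentinel instead of regexps. */
#include <stdio.h>
#include <stdint.h>

#define PCI_MATCH_ANY 0xffffffffu

struct pci_id_match {
    uint32_t vendor_id;
    uint32_t device_id;
    uint32_t subvendor_id;
    uint32_t subdevice_id;
};

struct pci_device {
    uint16_t vendor_id, device_id;
    uint16_t subvendor_id, subdevice_id;
};

static int pci_device_matches(const struct pci_device *dev,
                              const struct pci_id_match *m)
{
    return (m->vendor_id    == PCI_MATCH_ANY || m->vendor_id    == dev->vendor_id)    &&
           (m->device_id    == PCI_MATCH_ANY || m->device_id    == dev->device_id)    &&
           (m->subvendor_id == PCI_MATCH_ANY || m->subvendor_id == dev->subvendor_id) &&
           (m->subdevice_id == PCI_MATCH_ANY || m->subdevice_id == dev->subdevice_id);
}

int main(void)
{
    struct pci_device dev = { 0x1002, 0x5460, 0x1002, 0x5460 };
    /* Match any device from vendor 0x1002, regardless of device/subsystem. */
    struct pci_id_match any_ati = { 0x1002, PCI_MATCH_ANY,
                                    PCI_MATCH_ANY, PCI_MATCH_ANY };
    printf("matches: %d\n", pci_device_matches(&dev, &any_ati));
    return 0;
}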


 > 
 > For some reason, using a regex here scares me a little, but maybe that's 
 > just because I'm not used to it.  Anyway, check out the Linux docs and 
 > see if you prefer that interface, I personally find it easier to read 
 > (things like PCI_ANY_ID for various subfields when doing matching seems 
 > a little more straightforward than a regex).

Yes.

 > 
 > > ISSUE: The only thing not included 
 > > in the magic extended bus ID string is the device's interrupt. Should
 > > it be possible to match based on that? X doesn't currently require
 > > that.

Hm, X doesn't really use the interrupt. But is the question 'which device
uses interrupt X?' one that is expected?
 > 
 > Doesn't seem like that would be necessary, generally you just want to 
 > match on vendor and device IDs, don't you?  The other information is 
 > needed later and can be had from the pci_device structure returned to 
 > you.

Yeah.
 > 
 > > The pci_device structure contains all of the expected fields and is
 > > very similar to libpci.a's pci_dev structure. Some fields that are
 > > important to X (e.g., subvendor_id) have been added, and some fields
 > > that are unnecessary (e.g., rom_base_addr) have been removed.
 > 
 > > struct pci_mem_region {
 > >     void * memory;
 > >     pciaddr_t bus_addr;
 > >     pciaddr_t base_addr;
 > >     pciaddr_t size;
 > > };
 > 
 > Hm, I think the bus_addr and base_addr should be hidden from the 
 > interface, maybe in a void *platform_data structure or something, since 
 > they're generally only used by low level platform code (or am I wrong 
 > here, are there cases where the driver needs this info?).  I've seen way 
 > too many people mistake the bus address for something else.  Which 
 > reminds me, sparse annotation of the various address space pointers in X 
 > might be a good idea at some point (ala the Linux kernel's __user 
 > attributes)...

Currently the bases and sizes are used in common code. 



Cheers,
	Egbert.


