PCI Subsystem Rework for X.org 7.1 Proposal

Jesse Barnes jbarnes at virtuousgeek.org
Thu Feb 2 16:31:13 PST 2006


[Content taken from http://wiki.x.org/wiki/PciReworkProposal and replied 
to via email to get a bit more exposure/comment.]

> It is fairly common knowledge in the X.org developer community that
> the PCI handling code in X.org 7.0 (and earlier) is a big, ugly mess.
> The code is complex, understood by very few developers, and does
> things that only the kernel should do.

And buggy.  It can easily hose your PCI bus configuration if your 
configuration wasn't forseen by the various developers who have touched 
the X PCI stuff in the past (e.g. it shouldn't ever overwrite BAR values 
unless the OS has *no* PCI support).

> In fact, most of the existing 
> code originates from a time before most kernels implemented the
> required functionality. Since that time, kernels have greatly expanded
> the functionality provided to user space for probing and accessing PCI
> devices. The PCI bus has also changed (e.g., multiple PCI domains,
> complex PCI-PCI bridges, AGP, and PCI-Express). This has required that
> the code to support probing and accessing PCI devices has also needed
> to change. Unfortuantely these changes tend to be platform specific.
> Certain features, such as multiple domains, are only supported on
> certain platforms by X.org. These features tend to be supported
> universally by the kernels on those platforms. Rather than duplicating
> the efforts of kernel developers, X.org needs to use the interfaces
> provided by the kernel as much as possible. It is currently unclear as
> to whether X.org still needs to support any platforms with kernels
> that do not export this functionality to user mode.

That's a good question.  Are there any OSes with active X maintainers 
that *don't* have a good PCI API?  And if so, this theoretical PCI 
interface you're implementing could be ported with an appropriate back 
end.

> Required Functionalty
>  There are seven broad pieces of functionality that X.org needs to
> work with devices on the PCI bus. These are:
>  Get a list of all devices matching some criteria.
>  Reading the device's expansion ROM. 
>  Accessing the device's IO ports.
>  Accessing the device's memory regions (BARs).
>  Reading the device's capabilities (i.e., determine if the device is
>   AGP).
>  Power management. 
>  VGA arbitration.

I guess memory management really is a separate issue, but it depends on 
whether the card can do AGP, how it accesses system memory, etc.

One thing that X has done poorly in the past (and at least some kernels 
haven't helped much with their poor interfaces) is map memory with the 
appropriate attributes.  The current sysfs mapping interface in Linux 
assumes the user wants to do an uncacheable mapping, but in the case of 
frame buffers and some other types of memory, you probably want to map 
it with write combining or caching somehow.  I'm not sure if this should 
happen automatically in some cases (e.g. a special pci_map_framebuffer 
or pci_map_cacheable call) or if the mapping routine should take an 
arbitrary flags argument (which could be abused).

> In the best case scenario, 
> nearly all of this functionality is trivially provided by Linux's
> sysfs interface. A certain amount of this functionality is also
> provided by the libpci.a library in the pciutils package. The missing
> functionality and license issues (libpci.a is GPL) prevent use of
> libpci.a from being a viable option.

The big missing pieces seem to be:
  o VGA arbitration - we (the kernel folks) really need to tackle this
    properly
  o mapping memory with different attributes - benh and I discussed some
    ideas awhile back, but no code ever came of it (did we settle on
    thinking that an madvise(2) addition was best?)

> Proposed Implementation 
>  The current proposal is to implement a new library that implements
> the require functionality in a generic way. The existing code for
> accessing PCI devices would then be removed, in its entirety, from the
> X-server and the drivers. At this time the X-server and drivers would
> be ported to the new interface. The remainder of this section
> describes the interface to the proposed new library. The interface
> consists of two primary data types. These structures are intentionally
> similar to structures found in libpci.a, but there are some
> significant differences.

You might also check out the way Linux does it internally.  See 
Documentation/pci.txt in the Linux kernel source tree for some 
description of that API (Linux Device Drivers, 3rd Edition also has good 
info about it).  It's been through a few changes over the years, but 
seems to have settled down into a mature, usable interface (in 
particular it's moved from an iterator style interface to a more opaque 
register-my-driver-it-handles-these-cards type interface, which we may 
want to emulate).

> pci_system is the analog of pci_access. However, pci_system is a
> completely opaque data type. 

> pci_device is the analog of pci_dev.

> pci_system 
>  Access to the PCI system is obtained by calling pci_system_create.

Random naming thoughts: pci_system_init instead?  Or just pci_init, 
though that assumes that the 'pci' namespace is where we want these 
calls; anyway somehow pci_system seems redundant...

> This function returns a pointer to a fully initialized pci_system
> structure. This structure can then be used to query lists of PCI
> devices found in the system. This structure is then eventually
> destroyed by calling pci_system_destroy. ISSUE: When the pci_system
> structure is destroyed, should all of the other data structures (e.g.,
> pci_device, pci_agp_info, etc.) also be destroyed? This seems like the
> logical choice.

Yeah, that seems about right, especially if pci_system_create is 
allocating memory for all of them.  It should clean everything up on 
pci_system_destroy() or pci_system_cleanup().

> struct pci_device_iterator * pci_device_iterate(struct pci_system
> *pci_sys, const char *regex ); 
>
> The set of devices to be iterated is selected by the regular
> expression passed to pci_device_iterate.

> ISSUE: This interface seems fairly complex. Is it the right chioce? It
> is easly to specify and easy to implement. One disadvantage is that
> it's impossible to know in advance whether or not a particular 
> expression needs the extended device information (i.e., anything other
> than the device's domain:bus:slot.function" string).

For some reason, using a regex here scares me a little, but maybe that's 
just because I'm not used to it.  Anyway, check out the Linux docs and 
see if you prefer that interface, I personally find it easier to read 
(things like PCI_ANY_ID for various subfields when doing matching seems 
a little more straightforward than a regex).

> ISSUE: The only thing not included 
> in the magic extended bus ID string is the device's interrupt. Should
> it be possible to match based on that? X doesn't currently require
> that.

Doesn't seem like that would be necessary, generally you just want to 
match on vendor and device IDs, don't you?  The other information is 
needed later and can be had from the pci_device structure returned to 
you.

> The pci_device structure contains all of the expected fields and is
> very similar to libpci.a's pci_dev structure. Some fields that are
> important to X (e.g., subvendor_id) have been added, and some fields
> that are unnecessary (e.g., rom_base_addr) have been removed.

> struct pci_mem_region {
>     void * memory;
>     pciaddr_t bus_addr;
>     pciaddr_t base_addr;
>     pciaddr_t size;
> };

Hm, I think the bus_addr and base_addr should be hidden from the 
interface, maybe in a void *platform_data structure or something, since 
they're generally only used by low level platform code (or am I wrong 
here, are there cases where the driver needs this info?).  I've seen way 
too many people mistake the bus address for something else.  Which 
reminds me, sparse annotation of the various address space pointers in X 
might be a good idea at some point (ala the Linux kernel's __user 
attributes)...

> struct pci_device {
>     uint16_t    domain;
>     uint8_t     bus;
>     uint8_t     dev;
>     uint8_t     func;
>
>     uint16_t    vendor_id;
>     uint16_t    device_id;
>     uint16_t    subvendor_id;
>     uint16_t    subdevice_id;
>
>     uint16_t    device_class;

Does this have to account for the base class, the sub class and the 
programming interface (3 bytes)?  I'd have to dig out my PCI spec...

>     struct pci_mem_region regions[6];
>
>     pciaddr_t   rom_size;

Should the rom_size really be an 'int' or uint32_t instead?  Maybe this 
is just a cut & paste error from when it used to be rom_addr?

>     int irq;
>
>     void * user_data;

Maybe this should be called 'driver_private' so that drivers know it 
belongs to just them?  Also, it might be necessary to have a 
'platform_data' for the low level code to track things with...

> };
>
>  Once a pointer to a device has been obtained, memory regions of the
> device can be mapped via pci_device_map_region. Mapped regions can be
> unmapped with pci_device_unmap_region. Once a region is mapped, it can
> be accessed via the pci_mem_region::memory pointer. int
> pci_device_map_region( struct pci_device * dev, unsigned region, int
> write_enable );
>
> int pci_device_unmap_region( struct pci_device * dev, unsigned region
> );
>
> ISSUE: Should special routines for reading / writing MMIO regions ala
> xf86WriteMmio8 be added?

Not unless we find a platform that requires that, IMO.  Memory mapped 
should mean memory mapped.

Also, see above question about how to handle mapping with particular 
attributes.

> The device's expansion ROM is treated 
> specially. Rather than mapping the ROM and reading it, a special
> function, pci_device_read_rom, is provided. The supplied buffer must
> be at least pci_device::rom_size bytes.
> int pci_device_read_rom(struct pci_device * dev, void * buffer );

This looks good, ROMs are such a pain.

>  Device configuration and capability data can be accessed via
> traditional, libpci.a style read and write routines.
> int pci_device_read_u8   ( struct pci_device * dev, unsigned offset, 
uint8_t * val );
> int pci_device_read_u16  ( struct pci_device * dev, unsigned offset, 
uint16_t * val );
> int pci_device_read_u32  ( struct pci_device * dev, unsigned offset, 
uint32_t * val );
> int pci_device_read_block( struct pci_device * dev, unsigned offset, 
void * val, unsigned length );
> int pci_device_write_u8   ( struct  pci_device * dev, unsigned offset, 
uint8_t val );
> int pci_device_write_u16  ( struct pci_device * dev, unsigned offset,
> uint16_t val );
> int pci_device_write_u32  ( struct pci_device * dev, unsigned offset, 
uint32_t val );
> int pci_device_write_block( struct pci_device * dev, unsigned offset, 
const void * val, unsigned length );

Might want to call these pci_device_cfg_* or something so that people 
don't confuse them for general MMIO accessors like you describe above.

>  In addition, specific routines and data types exist for common
> capabilities that are important to X. The pci_device_get_agp_info
> function parses the device's configuration header and returns a fully
> popluated pci_agp_info structure. If the device does not have an AGP
> capability entry, NULL is returned.

I know very little about AGP, so I can't really comment here.

>  In the future, similar routines may be added for other common device
> capabilities (e.g., power management, PCI-Express, etc.).

It seems like the kernel may be doing power management of various devices 
as well, so should there be some sort of callback mechanism?  This could 
get tricky if the kernel suspends a device without telling X about it...

Anyway, the proposal looks great; I'm really glad someone is tackling 
this area.

Thanks,
Jesse



More information about the xorg mailing list