Performance improvement to vga arbitration

Wed Jun 9 23:02:43 PDT 2010

Proposal for improving vgaarb arbitration method

It appears that once a session is up, drivers in most cases perform only
non-legacy accesses. Non-legacy accesses do not need to block each other;
blocking arbitration is needed mostly for session initialization and exit.
To improve performance, we need to treat legacy and non-legacy accesses
differently, and allow non-legacy accesses from different devices to proceed
concurrently without blocking each other. Non-legacy access is assumed to be
the default for operating functions after initialization. Where legacy
accesses are necessary for some of them, drivers can redefine this on a
per-function-group basis.

Here are some details:

(1) New lock for non-legacy access

  Define another lock, vgadev->locks2 (locks2), for non-legacy access locking,
  in addition to vgadev->locks (locks1), which is currently used for legacy
  access locking.

  Non-legacy access requests from a device that does not have legacy access
  decoding ability should always be honored without acquiring a lock.
  Non-legacy access requests from a device that has legacy access decoding
  ability need to acquire locks2 before proceeding.

  A request for locks2 is blocked only when some other device already holds
  locks1 (on the same resources). A request for locks1 is blocked when some
  other device already holds locks1 or locks2 (on the same resources). This
  means a request for locks2 should not be blocked just because some other
  device already holds locks2 (on the same resources).
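
  The blocking rule above can be sketched as a small predicate. This is only
  an illustration under assumptions, not the implemented patch: the struct
  vga_dev_sketch, the enum and both function names are hypothetical, and
  "same resources" is taken to mean overlapping resource bits. The bit values
  mirror the kernel header but are redefined locally.

```c
#include <stdbool.h>

/* Resource bits. Values mirror the Linux vgaarb header but are redefined
 * here so that the sketch stands alone. */
#define VGA_RSRC_LEGACY_IO   0x01u
#define VGA_RSRC_LEGACY_MEM  0x02u
#define VGA_RSRC_NORMAL_IO   0x04u
#define VGA_RSRC_NORMAL_MEM  0x08u
#define VGA_RSRC_LEGACY_MASK (VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM)

/* Hypothetical, stripped-down stand-in for struct vga_device. */
struct vga_dev_sketch {
    unsigned int decodes; /* decoding ability of the device */
    unsigned int locks1;  /* resources held under the legacy lock */
    unsigned int locks2;  /* resources held under the non-legacy lock */
};

enum vga_lock_kind { VGA_LOCK_LEGACY, VGA_LOCK_NONLEGACY };

/* A device without legacy decoding ability never needs locks2. */
static bool vga_needs_locks2(const struct vga_dev_sketch *dev)
{
    return (dev->decodes & VGA_RSRC_LEGACY_MASK) != 0;
}

/* Does `other` block a request of the given kind on `rsrc`?
 * locks1 held by another device blocks both kinds of request;
 * locks2 held by another device blocks only a locks1 request. */
static bool vga_blocked_by(const struct vga_dev_sketch *other,
                           enum vga_lock_kind kind, unsigned int rsrc)
{
    if (other->locks1 & rsrc)
        return true;
    return kind == VGA_LOCK_LEGACY && (other->locks2 & rsrc) != 0;
}
```

  In effect locks2 behaves like a shared lock and locks1 like an exclusive
  one: two locks2 holders never block each other, while any holder blocks a
  locks1 request on overlapping resources.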

  Currently we have four defines for resource requests:

        VGA_RSRC_LEGACY_IO
        VGA_RSRC_LEGACY_MEM
        VGA_RSRC_NORMAL_IO
        VGA_RSRC_NORMAL_MEM

  but only two strings for them, "io" and "mem". Add "IO" and "MEM" for
  non-legacy accesses.
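
  As an illustration of the four-string naming, a mask could be rendered as
  follows. This is a minimal sketch: the function rsrc_to_str() is
  hypothetical, and joining names with "+" and printing "none" for an empty
  mask are assumptions made here, not part of the proposal.

```c
#include <stdio.h>
#include <string.h>

/* Resource bits. Values mirror the Linux vgaarb header but are redefined
 * here so that the sketch stands alone. */
#define VGA_RSRC_LEGACY_IO   0x01u
#define VGA_RSRC_LEGACY_MEM  0x02u
#define VGA_RSRC_NORMAL_IO   0x04u
#define VGA_RSRC_NORMAL_MEM  0x08u

/* Hypothetical helper: write the names of all bits set in `rsrc` into
 * `buf`, lowercase for legacy and uppercase for non-legacy resources. */
static const char *rsrc_to_str(unsigned int rsrc, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } names[] = {
        { VGA_RSRC_LEGACY_IO,  "io"  },
        { VGA_RSRC_LEGACY_MEM, "mem" },
        { VGA_RSRC_NORMAL_IO,  "IO"  },
        { VGA_RSRC_NORMAL_MEM, "MEM" },
    };
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (!(rsrc & names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "+" : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated; stop appending */
        used += (size_t)n;
    }
    if (used == 0)
        snprintf(buf, len, "none");
    return buf;
}
```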

(2) Function group based resource request

  We need to distinguish between decoding ability and decoding request
  (resource request). Decoding ability is still maintained in struct
  vga_device of the kernel driver with

        unsigned int decodes;

  and a userland copy in dev->vgaarb_rsrc.

  Currently the whole lock/unlock mechanism uses resource requests from
  dev->vgaarb_rsrc, which is actually the decoding ability. In the new design,
  however, this is only the case for xf86VGAarbiterLock() and
  xf86VGAarbiterUnlock(), which run during session initialization and exit.
  During normal operation, the resource request is determined by a resource
  mask associated with each function.

  Wrapping functions are grouped into MAX_VGAARB_OPS_MASK groups, with a
  resource mask assigned to each group. The default mask is
  VGA_RSRC_NORMAL_IO|VGA_RSRC_NORMAL_MEM, meaning non-legacy access, but
  drivers can redefine any of them. In the extreme case, if a driver
  redefines all masks to

  VGA_RSRC_NORMAL_IO|VGA_RSRC_NORMAL_MEM|
  VGA_RSRC_LEGACY_IO|VGA_RSRC_LEGACY_MEM

  we are back to the old arbitration algorithm.
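
  The per-group masks could look roughly like this. It is a sketch under
  assumptions: the value 8 for MAX_VGAARB_OPS_MASK, the struct
  vgaarb_ops_masks and all four function names are hypothetical and chosen
  only for illustration.

```c
#include <stddef.h>

/* Resource bits. Values mirror the Linux vgaarb header but are redefined
 * here so that the sketch stands alone. */
#define VGA_RSRC_LEGACY_IO   0x01u
#define VGA_RSRC_LEGACY_MEM  0x02u
#define VGA_RSRC_NORMAL_IO   0x04u
#define VGA_RSRC_NORMAL_MEM  0x08u
#define VGA_RSRC_LEGACY_MASK (VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM)

/* Assumed value; the proposal does not say how many groups there are. */
#define MAX_VGAARB_OPS_MASK 8

/* Hypothetical per-driver table: one resource mask per function group. */
struct vgaarb_ops_masks {
    unsigned int mask[MAX_VGAARB_OPS_MASK];
};

/* Every group defaults to non-legacy access. */
static void vgaarb_ops_masks_init(struct vgaarb_ops_masks *m)
{
    for (size_t i = 0; i < MAX_VGAARB_OPS_MASK; i++)
        m->mask[i] = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM;
}

/* A driver overrides one group; returns -1 for an invalid group index. */
static int vgaarb_ops_mask_set(struct vgaarb_ops_masks *m, size_t group,
                               unsigned int rsrc)
{
    if (group >= MAX_VGAARB_OPS_MASK)
        return -1;
    m->mask[group] = rsrc;
    return 0;
}

/* Resource request used when a function of `group` is wrapped. */
static unsigned int vgaarb_ops_mask_get(const struct vgaarb_ops_masks *m,
                                        size_t group)
{
    return group < MAX_VGAARB_OPS_MASK ? m->mask[group] : 0u;
}

/* True if functions in `group` must take the blocking legacy path. */
static int vgaarb_ops_needs_legacy(const struct vgaarb_ops_masks *m,
                                   size_t group)
{
    return (vgaarb_ops_mask_get(m, group) & VGA_RSRC_LEGACY_MASK) != 0;
}
```

  A driver that leaves the table at its defaults never takes the legacy
  path during normal operation; one that sets every entry to all four bits
  gets the old behavior back.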

(3) Other changes

  * pci_device_vgaarb_set_target() is called heavily. Currently each call
    involves two syscalls. These can be skipped if the device in question is
    the same as in the previous call (recorded in pci_sys->vga_target). This
    contributes significantly to the performance improvement.

  * OpenConsole()/CloseConsole() need to be protected by lock and unlock, as
    they may perform vga register accesses. Further,
    OpenConsole()/CloseConsole() are run only on a session with the primary
    device.
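
  The target caching in the first item can be sketched as below. The struct
  pci_sys_sketch, the validity flag, the call counter and both function
  names are hypothetical; arbiter_select() merely stands in for the two
  syscalls mentioned above and does not talk to the real arbiter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pci_sys state; only vga_target is taken
 * from the proposal, the other fields exist for this sketch. */
struct pci_sys_sketch {
    const void *vga_target;      /* device selected by the previous call */
    bool vga_target_valid;       /* false until a target has been set */
    unsigned long arbiter_calls; /* counts simulated syscalls */
};

/* Placeholder for the expensive path: selecting the target in the
 * arbiter, which the proposal says costs two syscalls. */
static int arbiter_select(struct pci_sys_sketch *sys, const void *dev)
{
    (void)dev;
    sys->arbiter_calls += 2;
    return 0;
}

/* Skip the expensive path when the target is unchanged. */
static int set_target_cached(struct pci_sys_sketch *sys, const void *dev)
{
    if (sys->vga_target_valid && sys->vga_target == dev)
        return 0;

    int ret = arbiter_select(sys, dev);
    if (ret == 0) {
        sys->vga_target = dev;
        sys->vga_target_valid = true;
    } else {
        sys->vga_target_valid = false; /* force a retry next time */
    }
    return ret;
}
```

  The cache is only sound as long as nothing else changes the arbiter
  target behind its back; any such path would have to clear the validity
  flag.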

I am posting the design idea for comments.

(This has been implemented and tested on both Linux and Solaris systems.)

-Henry