Performance improvement to vga arbitration

Tue Jun 15 18:45:19 PDT 2010

On 06/14/10 16:18, Dave Airlie wrote:
> On Thu, Jun 10, 2010 at 4:02 PM, Henry Zhao<henry.zhao at oracle.com>  wrote:
>    
>> Proposal for improving vgaarb arbitration method
>>
>> It appears  that after session is up,  in most cases,  drivers only do
>> non-legacy accesses.
>> Non-legacy accesses do not need to block each other. Blocking arbitration is
>> needed
>> mostly for session initialization and exiting. To improve performance, we
>> need to treat
>> differently to legacy and non-legacy accesses, and allow non-legacy accesses
>> to proceed
>> concurrently among devices without blocking each other. Non-legacy accesses
>> is assumed
>> to be the default for operating functions after initialization. In case
>> legacy accesses are
>> necessary for some of them, drivers can redefine them per function group
>> bases.
>> Here are some details:
>>
>> (1) New lock for non-legacy access
>>
>>   Define another lock, vgadev->locks2 (locks2), for non-legacy access locking
>>   in addition to vgadev->locks (locks1), currently used for legacy access
>>   locking.
>>
>>   Non-legacy access requests from a device that does not have legacy access
>>   decoding ability should always be honored without a need of acquiring a
>> lock.
>>   Non-legacy access requests from a device that has legacy access decoding
>>   ability needs to acquire locks2 before proceeding.
>>
>>   Request for locks2 is blocked only when some other device already has
>> locks1
>>   (on the same resources).  Request for locks1 is blocked when some other
>> device
>>   already has locks1 or locks2 (on the same resource). This means request for
>>   locks2 should not be blocked just because some other device already has
>> locks2
>>   (on the same resources).
>>      
> Originally I think Ben H had something like this, and I think avoiding
> deadlock was really really impossible, so I gave up.
>
> The thing is you have to guarantee that deadlock can't occur when you
> combine any set of n userspace drivers, I think the original design
> went a step further though and allowed separate io and mem requests
> and that might have caused the deadlock situations rather than the
> legacy/non-legacy issues.
>
> So I'd like to make sure all the possible locking/unlocking combos are
> thought out before going ahead with anything.
>
>    
>>   but only two strings for them, "io" and "mem". Add "IO" and "MEM" for non-
>>   legacy accesses.
>>      
> You'll notice io+mem is only interpreted in the kernel, separate
> io/mem was a deadlock fest, esp when you also consider non-X apps
> running at the same time as X. X since it is single threaded can't
> really deadlock, but if X has locked the non-legacy io+mem, and
> vbetool locks the legacy io+mem, and waits for the non-legacy lock, it
> starts to get really messy. As long as you've considered this in the
> design I'm fine with it.
>    

Thank you for pointing out potential deadlock problem.  In this
context deadlock could happen when a device is holding a lock,
while requesting another.  These are possible scenarios:

(1) If a device is making both legacy (locks) and non-legacy
     (locks2) requests in one call (such as io+mem+IO+MEM), the
     code in _tryget() guarantees either both or none are granted.
     Safe.

(2) If a device is holding a legacy lock while making request for
     another lock (whether it's legacy or non-legacy) in a subsequent
     call, since legacy lock is exclusive and no other devices can get
     any lock when a legacy lock is being held, the new lock can be
     granted without waiting.  Safe.

(3) A device is holding a non-legacy lock while making request
     for another lock in subsequent call.
     (a) It is safe if the new lock requested is non-legacy lock: when
         a non-legacy lock is being held, no other devices can get any
         legacy lock, therefore the new lock will not be in conflict with
         any locks on other devices (the only locks other devices may
         have are non-legacy, which certainly are not in conflict with
         the one being requested).
     (b) It is unsafe if the new lock requested is a legacy lock:  if two
         devices are both holding non-legacy lock while requesting
         legacy lock, a deadlock occurs: both devices are waiting for
         the non-legacy lock from other device to be removed, which
         never happens.  To solve this problem:

         * Add the resources represented in the non-legacy lock
             (currently being held) to the requesting resource of the
             new lock;
         * Clear non-legacy lock.

         This means the new request will include both non-legacy and
         legacy locks, like the situation in (1).

(4) However neither the current code nor the proposed code allows
     locking on individual attributes. If device "a" is holding lock for
     "io" while requesting a lock for "mem"; device "b" is holding lock
     for "mem" while requesting a lock for "io", deadlock occurs.
     So far I don't see any such requirement.  We certainly could use
     the method similar to what described in (3)/(b) to make this
     possible if there is a need.

The patch for in-kernel driver with (3)/(b) fix is attached.

>    
>> (2) Function group based resource request
>>
>>   Need to distinguish between decoding ability and decoding request (resource
>>   request). Decoding ability is still maintained in struct vga_device of
>> kernel
>>   driver with
>>
>>        unsigned int decodes;
>>
>>   and a userland copy in dev->vgaarb_rsrc.
>>
>>   Currently all lock/unlocking mechanism uses resource requests from
>>   dev->vgaarb_rsrc, which is actually decoding ability. In new design
>> however,
>>   this is only the case for xf86VGAarbiterLock() and xf86VGAarbiterUnlock(),
>> run
>>   during session initialization and exiting. During normal run, resource
>> request
>>   is determined by a resource mask associated with each function.
>>
>>   Wrapping function are grouped into MAX_VGAARB_OPS_MASK number of
>>   groups with resource masks assigned to each of them. The default setting of
>> mask is
>>   VGA_RSRC_NORMAL_IO|VGA_RSRC_NORMAL_MEM, meaning non-legacy
>>   access, but drivers can redefine any of them. In an extreme if a driver
>> redefines all
>>   masks to
>>       VGA_RSRC_NORMAL_IO|VGA_RSRC_NORMAL_MEM|
>>   VGA_RSRC_LEGACY_IO|VGA_RSRC_LEGACY_MEM
>>
>>   we are returning to old arbitration algorithm.
>>
>> (3) Other changes
>>
>>   * pci_device_vgaarb_set_target() is heavily called. Currently it involves
>> two
>>    syscalls.  These calls can be saved if the device in question is the same
>> as
>>    in the previous call (recorded in pci_sys->vga_target). This contributes
>>    to major performance improvement.
>>
>>   * OpenConsole()/CloseConsole() need to be protected by lock and unlock as
>> they
>>    may have vga register accesses. Further, OpenConsole()/CloseConsole() is
>> run
>>    only on a session with primary device.
>>
>>      
> In-kernel I'd really like to make some more changes, that I've been
> hacking on but really don't have time to fix.
>
> The main one is to use bridge control when possible to switch stuff
> on/off, the other was some sort of callback for nvidia (though maybe
> they can make a patch).
>
> The callback would be called instead of the current forcing
> enable/disable, and the driver in-kernel would be responsible for
> doing that, it would allow more flexibility in the non-kms world
> (which I don't care about enough to do much more).
>
> [attached initial patch - kernel patch not X.org, also probably broken.
>
> Dave.
>    

I think with the attached patch, my proposed code can be futher
optimized:

If vgadev->bridge_has_one_vga is true, the pci decoding bits on the
device are alway set (never turned off), so the device can always
do non-legacy accesses.  That is, when requesting  non-legacy access
such device does not even need to get a lock (locks2) - it can bypass
arbitration (only want to make sure VGAEnable bit of the bus is off) !!
This also means, if we have two devices sitting on two different buses,
during normal run ("graphics mode"), in most cases no arbitration is
really needed - _tryget() returns immediately.  Arbitration is needed
when one of the devices exits (including VT switch), or a new device
starts.

Suggest to add an entry in vga_arb_read(), to print also VGAEnable
bit of the bus.

-Henry

>
> _______________________________________________
> xorg-devel at lists.x.org: X.Org development
> Archives: http://lists.x.org/archives/xorg-devel
> Info: http://lists.x.org/mailman/listinfo/xorg-devel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vgaarb.patch
Type: text/x-diff
Size: 18072 bytes
Desc: not available
URL: <http://lists.x.org/archives/xorg-devel/attachments/20100615/d002a13b/attachment-0001.patch>