DMA scheduling

Keith Whitwell keith at tungstengraphics.com
Thu Mar 16 07:52:09 PST 2006



I've been thinking a little bit about DMA scheduling for the graphics 
hardware.

Currently we have a situation where any 3d app which happens to be
able to grab the DRI lock can enqueue as many commands on the hardware
dma queue as it sees fit.  The Linux scheduler is the only arbiter
between competing 3D clients, and it has no information regarding the
GPU usage of these clients.

Even if it did, there are benefits to be reaped from keeping the 3d
DMA streams separate and explicitly scheduling the dma rather than
allowing clients to inject it in arbitrary quantities and orders.

Why do we want a GPU scheduler?

1) Fairness.  We can currently have situations where one 3d
    application manages to dominate the GPU while a second app in
    another window is locked out entirely.

2) Interactivity.  It is quite possible to have one application which
    does so little rendering per frame that it can run at 3000fps,
    while another, e.g. a video-based application, does a lot more and
    can just about keep up a 30fps framerate.  Consider a situation
    where both applications are running at once.  Simple fairness
    criteria would have them running at 1500fps and 15fps respectively
    - but it seems that fairness isn't what is required here.  It
    would be preferable to give the slower application a greater
    percentage of the GPU, so that it manages e.g. 27fps, while the
    other is scaled down to "only" 300fps or so.

    Note that we currently don't even have the "fair" situation...

3) Resource management.  Imagine two applications each of which has a
    texture working set of 90% of the available video ram.  Even a
    smart replacement algorithm will end up thrashing if the
    applications are able to rapidly grab the DRI lock from each other
    and enqueue buffer loads and rendering.  A scheduler could
    recognize the thrashing and improve matters by giving each app a
    longer timeslice on the GPU to minimize texture reloads.

There are probably other reasons as well, but these are the ones that
spring to mind.  A scheduler should help to provide a graphics
environment that degrades gracefully under load, rather than the
current situation where apps stutter in and out of life, where one app
sits dead while another renders frantically.



Scheduling is a kernel activity
-------------------------------

First of all, and I think this is currently the most relevant bit: a
scheduler is going to be a largely kernel-space entity.  That is,
processes will package up dma command buffers with all the information
needed to fire them, and hand them off (somehow) to a scheduler which
will live in the kernel.  I think this is likely to be the only
sensible place for a scheduler to live.

Scheduling and memory management
--------------------------------

What I see processes handing to the scheduler is something like a
struct of:

     bmBufferList *bufferlist;
     bmBuffer dmaCommandBuffer;
     bmFixupList *dmaFixups;
     bool  apply_cliprects;
     DrawableID cliprectDrawable;

This is basically the information we pass to the memory manager now.
The fixup list is the set of relocations which must be applied to the
command buffer once the memory manager has loaded all the referenced
buffers into locations in vram.
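
As a rough sketch of what I mean (the type names here are invented for
illustration, not the actual bufmgr interface), a fixup could be as
simple as recording where in the command buffer a buffer-relative
offset was emitted, so that the kernel can patch it in once the final
placements are known:

    /* Hypothetical: one relocation entry per buffer reference emitted
     * into the command buffer by the client.
     */
    struct bmFixup {
        unsigned buffer;   /* index into the submitted buffer list */
        unsigned offset;   /* dword offset within dmaCommandBuffer */
        unsigned delta;    /* byte offset added to the buffer's base */
    };

    /* Run by the kernel after validateBuffers() has chosen a final
     * vram location (base address) for every buffer in the list.
     */
    static void applyFixups(unsigned *cmd,
                            const struct bmFixup *fixups, int nr,
                            const unsigned *buffer_base)
    {
        int i;
        for (i = 0; i < nr; i++)
            cmd[fixups[i].offset] = buffer_base[fixups[i].buffer]
                                    + fixups[i].delta;
    }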

The departure from the current code is that we are now asking for this
to be done at some point in the future.  The current code actually
pulls in the buffers before physically enqueuing the command buffer on
the hardware.

Deferring this work will allow the scheduler to optimize the usage
pattern of buffers, to reduce thrashing and to attempt to divide GPU
resources (time, bandwidth, framerate, etc) fairly between competing
3d applications.

The scheduler will at some point in the future do the equivalent of:

  - choose a particular dma command buffer for execution.
  - effectively:
     LOCK_HARDWARE
     validateBuffers()
     applyFixups()
     retrieveClipRects()
     foreach cliprect {
         set cliprect;
         fire command buffer;
     }
     UNLOCK_HARDWARE

At this point, note that validateBuffers() is primarily used within
kernelspace.  This may mean that optimizations aimed at improving the
userspace behaviour of this call may not be important in the longer
term.
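
To make the "choose a particular dma command buffer" step a bit more
concrete, here is one hypothetical shape for it - none of these names
exist in the drm today, it is just a sketch - with one queue per
context, and the context that has consumed the least GPU time recently
picked first, which would also go some way towards the interactivity
goal above:

    /* Hypothetical per-context queue; not existing drm code. */
    struct sched_queue {
        struct sched_queue *next;
        struct dma_submit  *head;    /* pending command buffers */
        unsigned long       gpu_ns;  /* GPU time consumed recently */
    };

    /* Pick the non-empty queue that has used the least GPU time,
     * which naturally favours lightweight, interactive clients.
     */
    static struct sched_queue *pick_queue(struct sched_queue *queues)
    {
        struct sched_queue *q, *best = NULL;

        for (q = queues; q; q = q->next) {
            if (!q->head)
                continue;
            if (!best || q->gpu_ns < best->gpu_ns)
                best = q;
        }
        return best;
    }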

That aside, what is missing before we can implement the scheduler?  I
see only a couple of (small?) items.

1) retrieveClipRects()

- The way that other driver architectures have done this is to create
   a regular shared memory region that the X server and kernel module
   can access which holds the cliprects of all active drawables.  The
   memory region doesn't have to be pinned or anything special, just
   readable and understandable by both parties.  Access is probably
   protected by the DRI lock.
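
Just to illustrate the idea, the shared region could be as simple as
the following (layout invented here, not an existing SAREA
definition): a small table indexed by drawable, written by the X
server whenever window geometry changes and read by the kernel at fire
time, both under the DRI lock:

    #define SHARED_MAX_DRAWABLES  256
    #define SHARED_MAX_CLIPRECTS  32

    struct shared_cliprect {
        short x1, y1, x2, y2;
    };

    struct shared_drawable {
        unsigned stamp;       /* bumped on every cliprect change */
        unsigned num_rects;
        struct shared_cliprect rects[SHARED_MAX_CLIPRECTS];
    };

    /* One of these lives in the shared memory region.  The X server
     * writes it; the kernel reads it in retrieveClipRects().
     */
    struct shared_cliprect_region {
        struct shared_drawable drawables[SHARED_MAX_DRAWABLES];
    };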

2) 2D blits.

- The regular mechanism of setting a cliprect and firing a command
   buffer works for most hardware we know about, but only for 3d
   commands.  2d commands tend not to be affected by the 3d state used
   for the cliprect.

- The i830 may or may not have a way to set a cliprect which affects
   2d state, but that's probably not helpful for solving the general
   problem.

If you look at the i915 driver, there are quite a few places where we
lock hardware and then use the cliprect list to emit 2d blits, e.g.
for screen clearing, swapBuffers, copyPixels, etc.

This could work with the scheduler if the application cleared out all
previous dma from the scheduler queue before being allowed to emit
those hard-coded blits.  The blits themselves would have to go to
hardware immediately, before UNLOCK_HARDWARE, and not be subject to
scheduling, otherwise the X server might change the cliprects before
they are scheduled.

While the above would work, and would certainly be fine as a first
step, it seems to reduce the utility of the scheduler as clients can
still skew the behaviour of the dma stream significantly just by
issuing lots of blits.

I'm considering a system where operations that don't respect the
standard method of setting a cliprect are passed to the scheduler as
special tokens.  These tokens are scheduled as usual, the same as dma
command buffers, but when it comes time to fire them, are passed to
the hardware's drm component to be turned into real dma commands.  At
the moment, I think the two tokens would be "copy-blit" and
"fill-blit", and that is pretty much all the 3d drivers need.

3) The X server.

Is the X server command stream scheduled?  I would like to think it
was, but see the above.  The X server would want more and more varied
control over the 2d and video hardware and command streams.  For Xgl,
it is a lot easier to see how this would work.  What about regular X
servers?

It has been pointed out that you can divide X server drawing into two
components:

	1) Drawing on behalf of clients.  This includes 2D xlib
	   drawing as well as 3D commands arising from indirect GLX
	   clients.

	2) Drawing as a result of window management operations, such
	   as mapping, unmapping and moving windows.

From the point of view of the scheduler, it may be advantageous to
treat these separately.  The drawing commands from (1) can effectively
be scheduled normally, maybe even as multiple streams, one per
client/context.

The window-management drawing operations are associated with changes
to cliprect lists, and these may benefit from being scheduled
differently.  They may also be subject to different constraints based
on how easy or difficult it is to propagate the cliprect changes to
other queues.  If cliprect changes cannot be propagated, it will be
necessary to drain the other queues before executing the
window-management drawing.
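
One possible way to express that constraint - again just a sketch, not
a design - would be to enqueue the window-management drawing behind a
barrier which only becomes runnable once every queued command buffer
referencing the affected drawable has been fired against the old
cliprects:

    /* Hypothetical cliprect-change barrier.  Window-management drawing
     * queued behind it stays blocked until all command buffers that
     * were submitted against the old cliprects have been fired.
     */
    struct sched_barrier {
        unsigned drawable_id;
        unsigned pending_refs;  /* command buffers still queued
                                 * against drawable_id */
    };

    static int barrier_ready(const struct sched_barrier *b)
    {
        return b->pending_refs == 0;
    }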

Keith


