[PATCH 2/3] present: Extend compositor optimization to presents > 1 vblank into future.

Mario Kleiner mario.kleiner.de at gmail.com
Sun Feb 8 10:32:00 PST 2015


Same principle as keithp's original patch 3/3, extended to handle
target_msc > crtc_msc + 1.

1. If a non-fullscreen window is redirected, schedule its vblank
   trigger event for 1 frame before the target vblank, so the offscreen
   copy and posting of damage happen in the frame before, through
   the regular present_execute() path, giving the compositor a
   chance to do its thing ahead of the target vblank.

2. After present_copy() has posted the damage, it queues a new vblank
   event for 1 vblank into the future, which will call present_execute()
   again, this time just for sending out the PresentCompleteNotify
   at the estimated vblank of true present completion.

   If queueing that event fails, it triggers immediate completion
   instead to avoid hangs, just as it does for unredirected copy
   presents.
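
As a rough illustration of the two steps above (not part of the patch),
the resulting schedule can be modeled as below. The names
struct present_plan, plan_present() and the requeue_ok flag are
hypothetical and exist only in this sketch. The model assumes the copy
event fires exactly at the vblank it was queued for, and that
target_msc is at least 1 for redirected windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of when each phase of a copy present fires. */
struct present_plan {
    uint64_t copy_msc;     /* vblank whose event triggers copy + damage */
    uint64_t complete_msc; /* vblank at which PresentCompleteNotify is sent */
};

/*
 * Model of the schedule described above.
 *
 * redirected: the window pixmap differs from the screen pixmap.
 * requeue_ok: queueing the deferred completion event succeeded.
 */
static struct present_plan
plan_present(uint64_t target_msc, bool redirected, bool requeue_ok)
{
    struct present_plan plan;

    if (!redirected) {
        /* Non-composited copy: copy and complete at the target vblank. */
        plan.copy_msc = target_msc;
        plan.complete_msc = target_msc;
        return plan;
    }

    /* Assumption of this sketch: avoid underflow of the unsigned msc. */
    assert(target_msc > 0);

    /* Step 1: do the copy one frame early so the compositor can react. */
    plan.copy_msc = target_msc - 1;

    /* Step 2: defer completion by one vblank, or complete immediately
     * if the deferred event could not be queued. */
    plan.complete_msc = requeue_ok ? plan.copy_msc + 1 : plan.copy_msc;

    return plan;
}
```

For example, plan_present(100, true, true) yields a copy at msc 99 and
completion at msc 100, while a failed requeue completes at msc 99.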

Tested on an Intel HD-4000 IvyBridge mobile GPU with hardware timing
measurement equipment, under KDE's KWin compositor and GNOME-3.
Non-composited window copy swaps (= compositor off "classic"
mode) and non-composited or unredirected fullscreen windows
(= kms-pageflip in use) were verified to suffer no harm to timing.
For the interesting case of redirected windows under a compositor,
results are mixed:

1. KDE/KWin: Mostly correct timestamps (about 99% mark the correct
   vblank of true swap completion) on an idle desktop. Things go
   downhill quickly if some additional load is present, e.g., a
   single instance of glxgears causes at least 1 frame of lag and
   all timestamps to be 1 - 2 frames too early relative to reality.

2. GNOME-3: Mostly correct timestamps for target_msc at least 2
   vblanks into the future (= 30 fps animation on a 60 Hz panel),
   but almost always wrong (1 frame too early) and somewhat
   jittery for 60 fps on a 60 Hz panel. This is on an otherwise idle
   desktop.

   When an instance of glxgears is running, timing *improves* to
   almost always correct timestamps for 60 fps, 30 fps, and 20 fps,
   so somehow the extra load helps (GPU power management upclocking,
   or glxgears somehow kicking the compositor at the right moment?).

Signed-off-by: Mario Kleiner <mario.kleiner.de at gmail.com>
Tested-by: Mario Kleiner <mario.kleiner.de at gmail.com>
---
 present/present.c | 90 +++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 68 insertions(+), 22 deletions(-)

diff --git a/present/present.c b/present/present.c
index 2e7662a..b75017b 100644
--- a/present/present.c
+++ b/present/present.c
@@ -587,13 +587,16 @@ present_vblank_idle(present_vblank_ptr vblank)
 
 /*
  * Execute the present copy operation if it hasn't been done yet, then mark the
- * related objects as completed
+ * related objects as completed. Schedule a deferred completion for composited
+ * windows.
  */
-static void
+static Bool
 present_copy(present_vblank_ptr vblank)
 {
     if (!vblank->copied) {
-        WindowPtr       window = vblank->window;
+        WindowPtr                   window = vblank->window;
+        ScreenPtr                   screen = vblank->screen;
+        present_window_priv_ptr     window_priv = present_get_window_priv(window, TRUE);
 
         vblank->copied = TRUE;
 
@@ -606,7 +609,46 @@ present_copy(present_vblank_ptr vblank)
         present_flush(window);
 
         present_vblank_idle(vblank);
+
+        /* Was this a copy into a composited window? If so, queue a present
+         * completion 1 vblank into the future, as our best guess as to when
+         * the compositor will actually complete the presentation.
+         *
+         * Otherwise fallthrough to immediate completion. That's the right
+         * thing to do for a non-composited copy, and the best we can do for
+         * now in case this is called as an unredirected copy from a fallback
+         * caused by a failed PresentCompleteModeFlip.
+         */
+        if (screen->GetWindowPixmap(window) != screen->GetScreenPixmap(screen)) {
+            int         ret;
+            uint64_t    ust = 0, crtc_msc = 0;
+
+            /* Re-add vblank to exec queue under same event_id */
+            xorg_list_add(&vblank->event_queue, &present_exec_queue);
+            xorg_list_append(&vblank->window_list, &window_priv->vblank);
+
+            /* Queue one vblank from now into the future */
+            ret = present_get_ust_msc(screen, vblank->crtc, &ust, &crtc_msc);
+
+            if (ret == Success)
+                ret = present_queue_vblank(screen, vblank->crtc,
+                                           vblank->event_id, crtc_msc + 1);
+
+            /* On success, tell caller to not complete the present. Our deferred
+             * vblank event will do that by triggering present_execute() again.
+             */
+            if (ret == Success)
+                return TRUE;
+
+            xorg_list_del(&vblank->event_queue);
+            xorg_list_del(&vblank->window_list);
+
+            DebugPresent(("present_queue_vblank for deferred completion failed\n"));
+        }
     }
+
+    /* Signal need for an immediate present completion */
+    return FALSE;
 }
 
 /*
@@ -708,7 +750,14 @@ present_execute(present_vblank_ptr vblank, uint64_t ust, uint64_t crtc_msc)
             if (window == screen_priv->flip_window)
                 present_unflip(screen);
         }
-        present_copy(vblank);
+
+        /* Only send present complete notify and destroy the vblank object now,
+         * if present_copy() has not queued a new vblank event for deferred
+         * completion. Otherwise leave the completion and cleanup to that
+         * deferred completion.
+         */
+        if (present_copy(vblank))
+            return;
     }
 
     present_vblank_notify(vblank, ust, crtc_msc);
@@ -880,6 +929,20 @@ present_pixmap(WindowPtr window,
             goto no_mem;
     }
 
+    /* If the window is composited and a copy is used to present the
+     * pixmap into it, queue the copy-present one frame earlier, so
+     * the offscreen copy and posting of damage can be done early enough
+     * for the compositor to get the final result onto the display for the
+     * true target vblank.
+     */
+    if (pixmap && window && vblank->mode == PresentCompleteModeCopy &&
+        screen->GetWindowPixmap(window) != screen->GetScreenPixmap(screen))
+    {
+        DebugPresent(("\tC %p %8lld: %08lx -> %08lx\n", vblank, crtc_msc,
+                      vblank->pixmap->drawable.id, vblank->window->drawable.id));
+        target_msc--;
+    }
+
     if (pixmap)
         DebugPresent(("q %lld %p %8lld: %08lx -> %08lx (crtc %p) flip %d vsync %d serial %d\n",
                       vblank->event_id, vblank, target_msc,
@@ -890,25 +953,8 @@ present_pixmap(WindowPtr window,
     vblank->queued = TRUE;
     if ((pixmap && target_msc >= crtc_msc) || (!pixmap && target_msc > crtc_msc)) {
         ret = present_queue_vblank(screen, target_crtc, vblank->event_id, target_msc);
-        if (ret == Success) {
-            /* If the window is composited, and the contents are
-             * destined for the next frame, just do the copy, sending
-             * damage along to the compositor.
-             *
-             * Leave the vblank around to send the completion event at
-             * vblank time
-             */
-            if (pixmap && window && vblank->mode == PresentCompleteModeCopy &&
-                (target_msc - crtc_msc) <= 1 &&
-                screen->GetWindowPixmap(window) != screen->GetScreenPixmap(screen))
-            {
-                DebugPresent(("\tC %p %8lld: %08lx -> %08lx\n", vblank, crtc_msc,
-                             vblank->pixmap->drawable.id, vblank->window->drawable.id));
-                present_copy(vblank);
-            }
-
+        if (ret == Success)
             return Success;
-        }
 
         DebugPresent(("present_queue_vblank failed\n"));
     }
-- 
2.1.0


