Hi there!<div><br></div><div>I&#39;m trying to optimise shadow copies from the landscape-oriented 16-bit shadowfb to the portrait 16-bit fb proper. This is done in shadowUpdateRotate16_270YX, which is located in miext/shadow/shrotpackYX.h. My optimisation targets PXA27x and XScale3 ARM processors only, since it uses iwMMXt assembly. I guess it could be ported to x86 MMX fairly easily too.</div>

<div><br></div><div>As I understand it, the current implementation copies a single pixel at a time, like this:</div><div><br></div><div><div>*win = *sha++;</div><div>win += WINSTEPX(winStride);</div></div><div><br></div><div>

This also means that we&#39;re stepping over entire cache lines on every write, since each pixel of a single shadowfb line has to be copied to a different line of the fb proper.</div><div>My patch instead copies 4x4 pixel blocks, pre-rotated to portrait orientation. Basically, it takes 4 lines of shadowfb and divides them into 4x4 blocks.</div>

<div>Each block is then rotated and copied to the fb proper. This way, instead of writing a single pixel per fb proper line, it writes four. The rotation is done in iwMMXt assembly and takes about 0.9 instructions per pixel (assuming the 4x4 block is already in iwMMXt registers).</div>
<div>Using 4x4 blocks requires the width and height of the rectangle being copied to be multiples of 4 pixels. If they are not, the patch first trims the rectangle to that alignment, copying the leftover columns and rows one pixel at a time.</div>
<div>It doesn&#39;t really work, and I can&#39;t find the reason why. The initial Xfbdev screen looks fine, but as soon as I start moving the pointer or windows, all I get is garbage.</div>
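<div><br></div><div>For comparison, the block copy described above can be written as a plain C reference. This is only a sketch under assumptions, not code from the patch: the function names, the sha_height parameter (standing in for pScreen-&gt;height) and the 8x8 test pattern are made up for illustration, strides are counted in 16-bit pixels, and the 270-degree case is assumed to map shadow pixel (x, y) to fb index x * winStride + (height - 1 - y), which is what the winLine computation in the patch suggests.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 4 /* pixels per block edge, like PIXELS_PER_BLOCK in the patch */

/* One pixel at a time: shadow (x, y) -> win[x * win_stride + (sha_height - 1 - y)]. */
static void rotate270_per_pixel(const uint16_t *sha, size_t sha_stride,
                                uint16_t *win, size_t win_stride,
                                size_t sha_height,
                                size_t x, size_t y, size_t w, size_t h)
{
    for (size_t r = 0; r < h; r++)
        for (size_t c = 0; c < w; c++)
            win[(x + c) * win_stride + (sha_height - 1 - (y + r))] =
                sha[(y + r) * sha_stride + (x + c)];
}

/* Same mapping, walking 4x4 blocks so each fb line receives 4 adjacent pixels. */
static void rotate270_blocks(const uint16_t *sha, size_t sha_stride,
                             uint16_t *win, size_t win_stride,
                             size_t sha_height,
                             size_t x, size_t y, size_t w, size_t h)
{
    assert(w % BLOCK == 0 && h % BLOCK == 0); /* caller handles the leftovers */
    for (size_t by = 0; by < h; by += BLOCK) {
        for (size_t bx = 0; bx < w; bx += BLOCK) {
            const uint16_t *src = sha + (y + by) * sha_stride + (x + bx);
            uint16_t *dst = win + (x + bx) * win_stride
                                + (sha_height - BLOCK - (y + by));
            for (size_t c = 0; c < BLOCK; c++)      /* one fb line per source column */
                for (size_t r = 0; r < BLOCK; r++)  /* source rows end up reversed */
                    dst[c * win_stride + (BLOCK - 1 - r)] = src[r * sha_stride + c];
        }
    }
}

/* Returns 1 if both routines produce the same framebuffer for an 8x8 pattern. */
static int rotate270_selfcheck(void)
{
    enum { W = 8, H = 8 };
    const size_t n = (size_t)W * H;
    uint16_t sha[W * H], a[W * H], b[W * H];

    for (size_t i = 0; i < n; i++) {
        sha[i] = (uint16_t)i;   /* every pixel gets a unique value */
        a[i] = b[i] = 0xFFFF;   /* marks pixels that were never written */
    }
    rotate270_per_pixel(sha, W, a, H, H, 0, 0, W, H);
    rotate270_blocks(sha, W, b, H, H, 0, 0, W, H);
    return memcmp(a, b, sizeof a) == 0;
}
```

<div>A reference like this can be run over a numbered test pattern and its output compared with what the assembly routine writes for the same input, which narrows down whether a problem is in the block rotation itself or in the surrounding address arithmetic.</div>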

<div>The patch was tested on kdrive 1.3.0.0 running in QEMU and on a Zaurus C-1000.</div><div><br></div><div>If anyone has any suggestions, please let me know.</div><div><br></div><div><div>diff --git a/miext/shadow/Makefile.am b/miext/shadow/Makefile.am</div>

<div>index a73d0ec..bab1045 100644</div><div>--- a/miext/shadow/Makefile.am</div><div>+++ b/miext/shadow/Makefile.am</div><div>@@ -1,5 +1,7 @@</div><div> noinst_LTLIBRARIES = libshadow.la</div>

<div> </div><div>+</div><div>+</div><div> AM_CFLAGS = $(DIX_CFLAGS)</div><div> </div><div> INCLUDES = -I$(top_srcdir)/hw/xfree86/os-support</div><div>@@ -31,4 +33,5 @@ libshadow_la_SOURCES =<span style="white-space:pre-wrap">                </span>\</div>

<div> <span style="white-space:pre-wrap">        </span>shrot8pack.c<span style="white-space:pre-wrap">                </span>\</div><div> <span style="white-space:pre-wrap">        </span>shrotate.c<span style="white-space:pre-wrap">                </span>\</div>

<div> <span style="white-space:pre-wrap">        </span>shrotpack.h<span style="white-space:pre-wrap">                </span>\</div><div>-<span style="white-space:pre-wrap">        </span>shrotpackYX.h</div>
<div>+<span style="white-space:pre-wrap">        </span>shrotpackYX.h<span style="white-space:pre-wrap">                </span>\</div><div>+<span style="white-space:pre-wrap">        </span>iwmmxt_rotate_copy.s</div>
<div>diff --git a/miext/shadow/iwmmxt_rotate_copy.s b/miext/shadow/iwmmxt_rotate_copy.s</div><div>new file mode 100644</div><div>index 0000000..f6dac05</div><div>--- /dev/null</div><div>+++ b/miext/shadow/iwmmxt_rotate_copy.s</div>

<div>@@ -0,0 +1,60 @@</div><div>+@ r0 - shadowfb line</div><div>+@ r1 - fb proper line</div><div>+@ r2 - shadowfb stride</div><div>+@ r3 - width</div><div>+<span style="white-space:pre-wrap">        </span>.code 32</div>
<div>+<span style="white-space:pre-wrap">        </span>.arch iwmmxt</div><div>+<span style="white-space:pre-wrap">        </span>.cpu iwmmxt</div><div>+<span style="white-space:pre-wrap">        </span>.text</div>
<div>+<span style="white-space:pre-wrap">        </span>.global iwmmxt_rotate_copy</div><div>+iwmmxt_rotate_copy:</div><div>+<span style="white-space:pre-wrap">        </span>mov r4, r0<span style="white-space:pre-wrap">        </span>@ save shaBase</div>

<div>+<span style="white-space:pre-wrap">        </span>mov r6, #16<span style="white-space:pre-wrap">        </span>@ used for 2 pixel bitwise rotation</div><div>+<span style="white-space:pre-wrap">        </span>mov r7, #32<span style="white-space:pre-wrap">        </span>@ used for 4 pixel bitwise rotation</div>

<div>+<span style="white-space:pre-wrap">        </span>tmcr wcgr0, r6</div><div>+<span style="white-space:pre-wrap">        </span>tmcr wcgr1, r7</div><div>+iwmmxt_rotate_copy_main_loop:</div><div>+<span style="white-space:pre-wrap">        </span>wldrd wr0,[r0]</div>

<div>+<span style="white-space:pre-wrap">        </span>add r0, r0, r2</div><div>+<span style="white-space:pre-wrap">        </span>wldrd wr1,[r0]</div><div>+<span style="white-space:pre-wrap">        </span>add r0, r0, r2</div>
<div>+<span style="white-space:pre-wrap">        </span>wldrd wr2,[r0]</div><div>+<span style="white-space:pre-wrap">        </span>add r0, r0, r2</div><div>+<span style="white-space:pre-wrap">        </span>wldrd wr3,[r0]</div>
<div>+<span style="white-space:pre-wrap">        </span>@ Block transpose</div><div>+<span style="white-space:pre-wrap">        </span>wunpckihh wr4, wr0, wr1</div><div>+<span style="white-space:pre-wrap">        </span>wunpckihh wr5, wr2, wr3</div>

<div>+<span style="white-space:pre-wrap">        </span>wunpckilh wr6, wr0, wr1</div><div>+<span style="white-space:pre-wrap">        </span>wunpckilh wr7, wr2, wr3</div><div>+<span style="white-space:pre-wrap">        </span>wunpckihw wr0, wr4, wr5</div>

<div>+<span style="white-space:pre-wrap">        </span>wunpckilw wr1, wr4, wr5</div><div>+<span style="white-space:pre-wrap">        </span>wunpckihw wr2, wr6, wr7</div><div>+<span style="white-space:pre-wrap">        </span>wunpckilw wr3, wr6, wr7</div>

<div>+<span style="white-space:pre-wrap">        </span>@Block rotate to portrait and write to fb proper</div><div>+<span style="white-space:pre-wrap">        </span>wrorwg wr3, wr3, wcgr0</div><div>
+<span style="white-space:pre-wrap">        </span>wrordg wr3, wr3, wcgr1</div><div>+<span style="white-space:pre-wrap">        </span>wstrd wr3,[r1] </div><div>+<span style="white-space:pre-wrap">        </span>wrorwg wr2, wr2, wcgr0</div>
<div>+<span style="white-space:pre-wrap">        </span>wrordg wr2, wr2, wcgr1</div><div>+<span style="white-space:pre-wrap">        </span>add r1, #960</div><div>+<span style="white-space:pre-wrap">        </span>wstrd wr2,[r1]</div>
<div>+<span style="white-space:pre-wrap">        </span>wrorwg wr1, wr1, wcgr0</div><div>+<span style="white-space:pre-wrap">        </span>wrordg wr1, wr1, wcgr1</div><div>+<span style="white-space:pre-wrap">        </span>add r1, #960</div>

<div>+<span style="white-space:pre-wrap">        </span>wstrd wr1,[r1]</div><div>+<span style="white-space:pre-wrap">        </span>wrorwg wr0, wr0, wcgr0</div><div>+<span style="white-space:pre-wrap">        </span>wrordg wr0, wr0, wcgr1</div>

<div>+<span style="white-space:pre-wrap">        </span>add r1, #960</div><div>+<span style="white-space:pre-wrap">        </span>wstrd wr0,[r1]</div><div>+<span style="white-space:pre-wrap">        </span>add r1, #960</div>
<div>+<span style="white-space:pre-wrap">        </span>subs r3, r3, #4 @ decrement width</div><div>+<span style="white-space:pre-wrap">        </span>beq exit</div><div>+<span style="white-space:pre-wrap">        </span>bmi exit</div>
<div>+<span style="white-space:pre-wrap">        </span>add r4, r4, #8<span style="white-space:pre-wrap">        </span>@ update shaBase</div><div>+<span style="white-space:pre-wrap">        </span>mov r0, r4</div>
<div>+<span style="white-space:pre-wrap">        </span>b iwmmxt_rotate_copy_main_loop</div><div>+<span style="white-space:pre-wrap">        </span></div><div>+exit:</div><div>+<span style="white-space:pre-wrap">        </span>mov pc, lr</div>

<div>+<span style="white-space:pre-wrap">        </span>.end</div><div>+<span style="white-space:pre-wrap">        </span></div><div>diff --git a/miext/shadow/shrotpackYX.h b/miext/shadow/shrotpackYX.h</div>
<div>index f51da2f..5613233 100644</div><div>--- a/miext/shadow/shrotpackYX.h</div><div>+++ b/miext/shadow/shrotpackYX.h</div><div>@@ -33,6 +33,7 @@</div><div> #include    &quot;shadow.h&quot;</div><div> #include    &quot;fb.h&quot;</div>

<div> </div><div>+#define PIXELS_PER_BLOCK 4</div><div> #if ROTATE == 270</div><div> </div><div> #define WINSTEPX(stride)    (stride)</div><div>@@ -58,6 +59,9 @@</div><div> void</div><div> FUNC (ScreenPtr<span style="white-space:pre-wrap">        </span>    pScreen,</div>

<div>       shadowBufPtr  pBuf);</div><div>+      </div><div>+extern inline void iwmmxt_rotate_copy (Data *shadowfb, Data *fb, </div><div>+<span style="white-space:pre-wrap">                                        </span>FbStride stride, signed int nr_lines); </div>

<div> </div><div> void</div><div> FUNC (ScreenPtr<span style="white-space:pre-wrap">        </span>    pScreen,</div><div>@@ -73,6 +77,7 @@ FUNC (ScreenPtr<span style="white-space:pre-wrap">        </span>    pScreen,</div>
<div>     int<span style="white-space:pre-wrap">                </span>shaBpp;</div><div>     int<span style="white-space:pre-wrap">                </span>shaXoff, shaYoff;   /* XXX assumed to be zero */</div><div>
     int<span style="white-space:pre-wrap">                </span>x, y, w, h;</div><div>+    int h_temp, w_prologue, h_prologue;</div><div>     Data<span style="white-space:pre-wrap">        </span>*winBase, *win, *winLine;</div>
<div>     CARD32<span style="white-space:pre-wrap">        </span>winSize;</div><div> </div><div>@@ -87,76 +92,79 @@ FUNC (ScreenPtr<span style="white-space:pre-wrap">        </span>    pScreen,</div>
<div> <span style="white-space:pre-wrap">                                        </span>  SHADOW_WINDOW_WRITE,</div><div> <span style="white-space:pre-wrap">                                        </span>  &amp;winSize, pBuf-&gt;closure) - winBase;</div><div>
 </div><div>-    while (nbox--)</div><div>-    {</div><div>-        x = pbox-&gt;x1;</div><div>-        y = pbox-&gt;y1;</div><div>-        w = (pbox-&gt;x2 - pbox-&gt;x1);</div><div>-        h = pbox-&gt;y2 - pbox-&gt;y1;</div>

<div>+/*the width and height of the rectangle</div><div>+ * should be modulo 4 = 0 aligned for the</div><div>+ * iwmmxt_rotate_copy. If not, reduce the</div><div>+ * rectangle with per pixel copy for width</div><div>+ * and height.</div>

<div>+ */</div><div>+while (nbox--) {</div><div>+<span style="white-space:pre-wrap">        </span>x = pbox-&gt;x1;</div><div>+<span style="white-space:pre-wrap">        </span>y = pbox-&gt;y1;</div>
<div>+<span style="white-space:pre-wrap">        </span>w = (pbox-&gt;x2 - pbox-&gt;x1);</div><div>+<span style="white-space:pre-wrap">        </span>h = pbox-&gt;y2 - pbox-&gt;y1;</div><div>+</div><div>
+<span style="white-space:pre-wrap">        </span>w_prologue = w % PIXELS_PER_BLOCK;</div><div>+</div><div>+<span style="white-space:pre-wrap">        </span>if (w_prologue) {</div><div>+<span style="white-space:pre-wrap">                </span>shaLine = shaBase + (y * shaStride) + x;</div>

<div>+<span style="white-space:pre-wrap">                </span>winLine = winBase + WINSTART(x, y);</div><div>+<span style="white-space:pre-wrap">                </span>h_temp = h;</div><div>+<span style="white-space:pre-wrap">                </span>while (h_temp--)</div>

<div>+<span style="white-space:pre-wrap">                </span>{</div><div>+<span style="white-space:pre-wrap">                        </span>sha = shaLine;</div><div>+<span style="white-space:pre-wrap">                        </span>win = winLine;</div>
<div>+<span style="white-space:pre-wrap">                        </span>while (sha &lt; (shaLine + w_prologue))</div><div>+<span style="white-space:pre-wrap">                        </span>{</div><div>+<span style="white-space:pre-wrap">                                </span>*win = *sha++;</div>

<div>+<span style="white-space:pre-wrap">                                </span>win += WINSTEPX(winStride);</div><div>+</div><div>+<span style="white-space:pre-wrap">                        </span> }</div><div>+<span style="white-space:pre-wrap">                        </span>shaLine += shaStride;</div>

<div>+<span style="white-space:pre-wrap">                        </span>winLine += WINSTEPY();</div><div>+<span style="white-space:pre-wrap">                </span>}</div><div>+<span style="white-space:pre-wrap">                </span>w -= w_prologue;</div>
<div>+<span style="white-space:pre-wrap">                </span>x += w_prologue;</div><div>+<span style="white-space:pre-wrap">        </span>}</div><div>+<span style="white-space:pre-wrap">        </span>h_prologue = h % PIXELS_PER_BLOCK;</div>
<div>+</div><div>+<span style="white-space:pre-wrap">        </span>if (h_prologue) {</div><div>+<span style="white-space:pre-wrap">                </span>shaLine = shaBase + (y * shaStride) + x;</div><div>
+<span style="white-space:pre-wrap">                </span>winLine = winBase + WINSTART(x, y);</div><div>+<span style="white-space:pre-wrap">                </span>h_temp = h_prologue;</div><div>+<span style="white-space:pre-wrap">                </span>while (h_temp--)</div>

<div>+<span style="white-space:pre-wrap">                </span>{</div><div>+<span style="white-space:pre-wrap">                        </span>sha = shaLine;</div><div>+<span style="white-space:pre-wrap">                        </span>win = winLine;</div>
<div>+<span style="white-space:pre-wrap">                                </span>while (sha &lt; (shaLine + w))</div><div>+<span style="white-space:pre-wrap">                                </span>{</div><div>+<span style="white-space:pre-wrap">                                </span>    *win = *sha++;</div>

<div>+<span style="white-space:pre-wrap">                                        </span>win += WINSTEPX(winStride);</div><div>+<span style="white-space:pre-wrap">                                </span>}</div><div>+<span style="white-space:pre-wrap">                        </span>shaLine += shaStride;</div>

<div>+<span style="white-space:pre-wrap">                        </span>winLine += WINSTEPY();</div><div>+<span style="white-space:pre-wrap">                </span>}</div><div>+<span style="white-space:pre-wrap">        </span>h -= h_prologue;</div>
<div>+<span style="white-space:pre-wrap">        </span>y += h_prologue;</div><div>+<span style="white-space:pre-wrap">        </span>}</div><div> </div><div> <span style="white-space:pre-wrap">        </span>shaLine = shaBase + (y * shaStride) + x;</div>

<div>-#ifdef PREFETCH</div><div>-<span style="white-space:pre-wrap">        </span>__builtin_prefetch (shaLine);</div><div>-#endif</div><div>-<span style="white-space:pre-wrap">        </span>winLine = winBase + WINSTART(x, y);</div>
<div>+<span style="white-space:pre-wrap">        </span>winLine = winBase + (((pScreen-&gt;height - PIXELS_PER_BLOCK) - y) + (x * winStride));</div><div> </div><div>-        while (h--)</div><div>-        {</div>
<div>-<span style="white-space:pre-wrap">        </span>    sha = shaLine;</div><div>-<span style="white-space:pre-wrap">        </span>    win = winLine;</div><div>+<span style="white-space:pre-wrap">        </span>while (h &gt; 0)</div>
<div>+<span style="white-space:pre-wrap">        </span>{</div><div> </div><div>-            while (sha &lt; (shaLine + w - 16))</div><div>-            {</div><div>-#ifdef PREFETCH</div><div>-<span style="white-space:pre-wrap">                </span>__builtin_prefetch (sha + shaStride);</div>

<div>-#endif</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div>

<div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div>

<div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div>

<div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div>

<div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div>

<div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div>

<div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div>

<div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div>

<div>-</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div>

<div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div>

<div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div><div>-            }</div><div>-</div>
<div>-            while (sha &lt; (shaLine + w))</div><div>-            {</div><div>-<span style="white-space:pre-wrap">                </span>*win = *sha++;</div><div>-<span style="white-space:pre-wrap">                </span>win += WINSTEPX(winStride);</div>

<div>-            }</div><div>-</div><div>-<span style="white-space:pre-wrap">        </span>    y++;</div><div>-<span style="white-space:pre-wrap">        </span>    shaLine += shaStride;</div><div>
-<span style="white-space:pre-wrap">        </span>    winLine += WINSTEPY();</div><div>-        }</div><div>-        pbox++;</div><div>-    } /*  nbox */</div><div>+<span style="white-space:pre-wrap">                </span>sha = shaLine;</div>

<div>+<span style="white-space:pre-wrap">                </span>win = winLine;</div><div>+<span style="white-space:pre-wrap">                </span>/*rotate and copy 4x4 pixel blocks. */</div><div>+<span style="white-space:pre-wrap">                </span>iwmmxt_rotate_copy(sha, win, (shaStride * 2), w);</div>

<div>+</div><div>+<span style="white-space:pre-wrap">                </span>shaLine += (shaStride * PIXELS_PER_BLOCK);</div><div>+<span style="white-space:pre-wrap">                </span>winLine -= PIXELS_PER_BLOCK;</div>
<div>+<span style="white-space:pre-wrap">                </span>h-= PIXELS_PER_BLOCK;</div><div>+</div><div>+<span style="white-space:pre-wrap">        </span>}</div><div>+<span style="white-space:pre-wrap">        </span>}</div>
<div>+pbox++;</div><div> }</div><div>+</div></div><div><br></div><div><br></div>
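<div>The alignment handling in the C part of the patch above (peel off w % 4 columns, then h % 4 rows, then hand the remainder to the block routine) can be summarised in a small standalone sketch. The Box, Rect and Split types and both function names are hypothetical stand-ins for illustration; they are not part of the X server sources.</div>

```c
#include <assert.h>

#define BLOCK 4 /* like PIXELS_PER_BLOCK in the patch */

typedef struct { int x1, y1, x2, y2; } Box;   /* hypothetical stand-in for a damage box */
typedef struct { int x, y, w, h; } Rect;
typedef struct {
    Rect cols;  /* leftover columns, copied per pixel over the full height */
    Rect rows;  /* leftover rows, copied per pixel over the reduced width */
    Rect core;  /* remainder whose width and height are multiples of BLOCK */
} Split;

/* Split one box in the same order the patch does: columns first, then rows. */
static Split split_box(const Box *b)
{
    int x = b->x1, y = b->y1;
    int w = b->x2 - b->x1, h = b->y2 - b->y1;
    int w_rem = w % BLOCK, h_rem = h % BLOCK;
    Split s;

    s.cols = (Rect){ x, y, w_rem, h };
    x += w_rem;
    w -= w_rem;

    s.rows = (Rect){ x, y, w, h_rem };
    y += h_rem;
    h -= h_rem;

    s.core = (Rect){ x, y, w, h };
    return s;
}

/* Walk a box list and count the pixels covered by the three parts of each box. */
static long covered_pixels(const Box *pbox, int nbox)
{
    long total = 0;

    while (nbox--) {
        Split s = split_box(pbox);

        assert(s.core.w % BLOCK == 0 && s.core.h % BLOCK == 0);
        total += (long)s.cols.w * s.cols.h
               + (long)s.rows.w * s.rows.h
               + (long)s.core.w * s.core.h;
        pbox++; /* advance inside the loop so every box is visited once */
    }
    return total;
}
```

<div>For a box from (3, 5) to (13, 12) this yields a 2x7 column strip, an 8x3 row strip and an 8x4 core starting at x = 5, y = 8. The core&#39;s width and height are multiples of 4, but its origin need not be; whether that matters depends on the alignment requirements of the block routine&#39;s 64-bit loads and stores.</div>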