[Mesa-dev] [PATCH 3/3] i965/fs: Combine tex/fb_write operations (opt)
Pohjolainen, Topi
topi.pohjolainen at intel.com
Sun Apr 12 07:57:00 PDT 2015
On Sun, Apr 12, 2015 at 10:02:03AM +0300, Pohjolainen, Topi wrote:
> On Fri, Apr 10, 2015 at 12:52:04PM -0700, Ben Widawsky wrote:
> > Certain platforms support the ability to sample from a texture, and write it out
> > to the file RT - thus saving a costly send instruction (note that this is a
> > potential win if one wanted to backport to a tag that didn't have the patch
> > from Topi which removed excess MOVs from LOAD_PAYLOAD - 97caf5fa04dbd2).
> >
> > v2: Modify the algorithm. Instead of iterating in reverse through blocks and
> > insts, only examine the last block/inst, since it is the only thing which can
> > benefit. Rebased on top of Ken's patch modifying is_last_send.
> >
> > v3: Rebased over almost 2 months, and incorporated feedback from Matt:
> > - Some comment typo fixes and rewordings.
> > - Whitespace fixes.
> > - Move the optimization pass outside of the optimize loop.
> >
> > v4: Some cosmetic changes requested by Ken. These changes ensured that the
> > optimization function always returned true when an optimization occurred, and
> > false when one did not. This behavior did not exist with the original patch. As
> > a result, having the separate helper function which Matt did not like no longer
> > made sense, and so now I believe everyone should be happy.
> >
> > Braswell data:
> > Benchmark (n=20)    %diff
> > *OglBatch5          -1.4
> > *OglBatch7          -1.79
> > OglFillTexMulti      5.57
> > OglFillTexSingle     1.16
> > OglShMapPcf          0.05
> > OglTexFilterAniso    3.01
> > OglTexFilterTri      1.94
> >
> > SKL data:
> > NONE COLLECTED
> >
> > No piglit regressions:
> > (http://otc-gfxtest-01.jf.intel.com:8080/view/dev/job/bwidawsk/112/)
> >
> > [*] I believe my measurements are incorrect for Batch5-7. If I add this new
> > optimization but never emit the new instruction, I see similar results.
>
> I'm seeing a ~7% decrease (with 95% confidence) in OglBatch6/7 when I'm
> launching resolve clears with the lightweight mechanism provided by blorp.
> This may be totally unrelated, but let's see if I get any smarter.
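The optimization in the quoted patch description can be sketched on a toy IR. This is a minimal sketch, not the actual i965 compiler code: `Inst`, `combine_tex_fb_write`, and the `has_sampler_eot` flag (standing in for "certain platforms support" this) are hypothetical names. Following v2, it only examines the last block; following v4, it returns True exactly when it changed the program.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Inst:
    # Hypothetical toy IR instruction (not Mesa's fs_inst).
    opcode: str
    dst: Optional[str] = None
    srcs: List[str] = field(default_factory=list)
    eot: bool = False  # end-of-thread: this send terminates the shader thread


def combine_tex_fb_write(blocks: List[List[Inst]], has_sampler_eot: bool) -> bool:
    """Fold a trailing TEX + EOT FB_WRITE pair into one TEX that ends the thread.

    Returns True only if the program was changed.
    """
    if not has_sampler_eot or not blocks or len(blocks[-1]) < 2:
        return False
    last_block = blocks[-1]
    tex, fb_write = last_block[-2], last_block[-1]
    # Only the final render-target write of the program can benefit.
    if fb_write.opcode != "FB_WRITE" or not fb_write.eot:
        return False
    # The write must consume exactly the texture result and nothing else.
    if tex.opcode != "TEX" or tex.dst is None or fb_write.srcs != [tex.dst]:
        return False
    # The sampler result now goes straight to the render target, so the
    # separate FB_WRITE send disappears.
    tex.eot = True
    tex.dst = None
    last_block.pop()
    return True


# Usage: a one-block program whose last two instructions are TEX then FB_WRITE.
program = [[Inst("TEX", dst="r10", srcs=["r2"]),
            Inst("FB_WRITE", srcs=["r10"], eot=True)]]
changed = combine_tex_fb_write(program, has_sampler_eot=True)
print(changed, [(i.opcode, i.eot) for i in program[0]])  # True [('TEX', True)]
```

The toy model ignores payload layout and other hardware constraints that a real compiler pass would have to check.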
I let OglBatch6 run for some time (160 rounds each for before and after), and I get:
x /mnt/before
+ /mnt/after
[ministat histogram of the fps samples: x = /mnt/before, + = /mnt/after]
    N           Min           Max        Median           Avg        Stddev
x 160       102.365       122.348       113.472     113.21107     3.6714446
+ 160       93.4825       121.597       110.289     110.03581     4.3771895
Difference at 95.0% confidence
        -3.17526 +/- 0.885251
        -2.80473% +/- 0.781947%
        (Student's t, pooled s = 4.03976)
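The ministat summary above can be re-derived from the per-run statistics in the table. This is a minimal sketch assuming a pooled-variance two-sample Student's t interval with a critical value of 1.96 (the large-sample value; the exact two-sided 95% t value for 318 degrees of freedom is about 1.97, which would widen the interval slightly). `ministat_difference` is a hypothetical helper, not part of ministat.

```python
import math


def ministat_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit=1.96):
    """Difference of means with a pooled-variance confidence interval.

    Returns (diff, half_width, diff_pct, half_width_pct, pooled_sd).
    """
    # Pooled standard deviation of the two samples.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Standard error of the difference between the two means.
    std_err = pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    diff = mean2 - mean1
    half_width = t_crit * std_err
    # Percentages are relative to the first ("before") mean.
    return diff, half_width, 100.0 * diff / mean1, 100.0 * half_width / mean1, pooled_sd


diff, hw, pct, hw_pct, s = ministat_difference(
    160, 113.21107, 3.6714446,   # x: /mnt/before
    160, 110.03581, 4.3771895)   # +: /mnt/after
print(f"{diff:.5f} +/- {hw:.6f}")        # -3.17526 +/- 0.885251
print(f"{pct:.5f}% +/- {hw_pct:.6f}%")   # -2.80473% +/- 0.781947%
print(f"pooled s = {s:.5f}")             # pooled s = 4.03976
```

With equal sample sizes the pooled variance is simply the mean of the two variances, so the wider spread of the "after" runs directly inflates the interval.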
I'm not sure one can really conclude much from this. I would almost claim
that my changes just introduce more fluctuation in the fps numbers and nothing
else.
I examined what callgrind tells me. Master and meta-blorp both rendered the
same number of frames, while the latter does slightly less CPU work to achieve
this. The latter also submits slightly less work to the GPU, since clears are
executed without the vertex shader stage. Hence I can't really explain why it
should be any slower.
So if I were you I probably wouldn't worry too much about your results.