[PATCH] Ensure blitter quiescence before reading pixels from the framebuffer

Tue Jul 31 06:07:48 PDT 2007

Michel Dänzer wrote:
> On Mon, 2007-07-30 at 16:39 +0200, Bernardo Innocenti wrote:
>> Michel Dänzer wrote:
>>
>> I depend on XAA because EXA is still unusably slow on all hardware
>> I ever tried it with.  I wonder if there's someone with a different experience.
> 
> I wouldn't bother spending effort on EXA if it didn't work better for
> me. Probably you're not using a composited desktop?

I must correct myself: with the latest code from git, EXA is rather
usable on r300, with or without a compositing manager running.

On an r200 laptop with an older CPU, it's still too slow compared to
XAA, but much better than I remembered.

>> That's not really a fix, rather a workaround: we still upload the pixmap to the
>> framebuffer, and we still allocate and initialize it in memory, which is also
>> unfortunate.
> 
> Probably, but does it incur a measurable penalty? The CPU is supposed to
> be ahead of the GPU anyway.

On the OLPC, that may not be the case: we have a very weak CPU along with a
somewhat better blitter.  It's probably the same with most embedded devices.

Besides, the pattern I've seen with GTK applications is dominated by very
small drawing operations where the blitter is not even worth using.

This is especially true for the OLPC theme and other themes with round-edged
widgets: the arcs generate lots of small trapezoids of height 1 or 2.

>> I always wanted to run oprofile on one of Cairo's benchmark to see how the
>> overhead is distributed.  But Carl Worth already provided plenty of proof.
> 
> Can you be more specific? I've been following Carl's posts about i965
> vs. EXA, but I don't remember reading anything about this particular
> path having been identified as a bottleneck yet.

I can't read Carl's article because his site appears to be down at the moment.
If I remember correctly, over half of the time was spent outside the
Intel driver.

And, from what I've seen, all antialiased primitives require drawing into an a8
off-screen bitmap (unaccelerated) and then compositing a source bitmap through
the a8 mask.  For solid fills, the source is always a repeated 1x1 bitmap.
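
To make the a8 path concrete, here is a minimal sketch in C of what the final
step amounts to for a solid fill: a single premultiplied ARGB32 color (standing
in for the repeated 1x1 source) is composited through an a8 coverage mask onto
the destination with the OVER operator.  This is not the actual X server, EXA
or pixman code; the function names and the stride convention (counted in
elements, not bytes) are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* (src IN mask) OVER dst for one premultiplied ARGB32 pixel. */
static uint32_t over_in_pixel(uint32_t src, uint8_t m, uint32_t dst)
{
    uint32_t out = 0;
    /* Effective source alpha after applying the mask coverage. */
    uint8_t sa = mul_un8((uint8_t)(src >> 24), m);

    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = mul_un8((uint8_t)(src >> shift), m);
        uint8_t d = (uint8_t)(dst >> shift);
        uint32_t c = (uint32_t)s + mul_un8(d, (uint8_t)(255 - sa));
        if (c > 255)
            c = 255;            /* guard against rounding overshoot */
        out |= c << shift;
    }
    return out;
}

/*
 * Hypothetical helper: composite a solid color through an a8 mask.
 * Strides are in elements (bytes for the mask, pixels for dst).
 */
static void composite_solid_through_a8(uint32_t solid,
                                       const uint8_t *mask, size_t mask_stride,
                                       uint32_t *dst, size_t dst_stride,
                                       size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        const uint8_t *m = mask + y * mask_stride;
        uint32_t *d = dst + y * dst_stride;

        for (size_t x = 0; x < width; x++)
            d[x] = over_in_pixel(solid, m[x], d[x]);
    }
}
```

For a trapezoid of height 1 or 2, the loop above touches only a handful of
pixels, so the fixed cost of allocating the mask, uploading it and setting up
the blitter can easily outweigh the per-pixel work.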

Along with the rendering of glyphs, these small but frequent operations are
likely to dominate rendering time for the typical desktop.

> I guess it's just not feasible to accurately estimate performance from
> code inspection. It needs to be measured.

I wanted to do it at some point, but running oprofile on slow hardware is
quite painful.  And even then, you need to do some guessing when you interpret
the results.

For instance, I expect to see a lot of time spent in the driver, but mostly
because EXA is asking it to do spurious uploads of small bitmaps.

-- 
   // Bernardo Innocenti
 \X/  http://www.codewiz.org/