[cairo] [RFC] Pixman & compositing with overlapping source and destination pixel data

Siarhei Siamashka siarhei.siamashka at gmail.com
Sun Oct 25 17:57:09 PDT 2009


On Friday 23 October 2009, Koen Kooi wrote:
> > I'm not sure about pixman_gc_t since most of the needed operations are just
> > simple copies. What about starting with just introducing a variant
> > of 'pixman_blt' which is overlapping aware?
> >
> > I created a work-in-progress branch with 'pixman_blt' function (generic C
> > implementation for now) extended to support overlapped source/destination
> > case. A simple test program is also included:
> > http://cgit.freedesktop.org/~siamashka/pixman/log/?h=overlapped-blt
 
First, this branch is outdated. There is a new branch with the final code :)
http://cgit.freedesktop.org/~siamashka/pixman/log/?h=overlapped-blt-v2

> Would using said branch give me 'magically' a performance boost (e.g. 
> not make firefox unusably slow as it is now on an 600MHz cortex a8) or 
> would I need to patch other libs (e.g. xrender) as well?

Not really, it's just a small extension of pixman functionality. Currently
the handling of overlapped blt operations (for software rendering) is done
in xorg-server. As it is the responsibility of pixman to provide CPU-specific
SIMD optimizations (NEON for ARM Cortex-A8), it would be quite natural to
move this work to pixman. So the next steps are to add NEON optimizations
to pixman_blt and make sure that xserver takes advantage of these
optimizations for the overlapped blit too.
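
To illustrate what "overlapping aware" means here, below is a minimal sketch
in plain C. It is not the code from the branch above and not the pixman API:
the function name, its parameters and the 32bpp-only restriction are
assumptions made for this example. The idea is to pick the row order from the
vertical direction of the move and let memmove deal with overlap inside a row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of pixman): copy a width x height rectangle
 * of 32bpp pixels from (src_x, src_y) to (dst_x, dst_y) within one buffer.
 * 'stride' is the row pitch in pixels (uint32_t units), not in bytes.
 * The caller must ensure that both rectangles fit inside the buffer.
 */
static void
overlapped_blt32 (uint32_t *bits, int stride,
                  int src_x, int src_y,
                  int dst_x, int dst_y,
                  int width, int height)
{
    size_t row_bytes = (size_t) width * sizeof (uint32_t);
    int y;

    assert (bits != NULL);
    assert (width >= 0 && height >= 0 && stride >= width);

    if (dst_y <= src_y)
    {
        /* Moving up (or sideways): go top to bottom, so every source row
         * is read before a later iteration can overwrite it. */
        for (y = 0; y < height; y++)
        {
            memmove (bits + (size_t) (dst_y + y) * stride + dst_x,
                     bits + (size_t) (src_y + y) * stride + src_x,
                     row_bytes);
        }
    }
    else
    {
        /* Moving down: go bottom to top for the same reason. */
        for (y = height - 1; y >= 0; y--)
        {
            memmove (bits + (size_t) (dst_y + y) * stride + dst_x,
                     bits + (size_t) (src_y + y) * stride + src_x,
                     row_bytes);
        }
    }
}
```

memmove already handles a horizontal overlap within a single row, so only the
row order needs special care. A NEON version would presumably replace the
per-row copy with forward and backward SIMD loops, but that part is not shown.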

As for improving scrolling performance (assuming a standard fbdev driver),
the most important thing is to improve framebuffer memory performance. Right
now framebuffer memory is mapped as noncached writecombine on OMAP3. Enabling
a write-through cache for it (with a simple kernel patch) speeds up scrolling
and moving windows by a factor of 4x-5x (unless a shadow framebuffer is
used, which is also not good for performance). This works fine if nothing
but the CPU can modify framebuffer memory. But if a GPU or DSP can also access
framebuffer memory, or a compositing manager is used, everything gets more
complicated. Cache invalidate operations will have to be inserted in
appropriate places to keep memory coherent and give all the units a uniform
view of its contents. If the default write-back cache is used instead of
write-through, cache flush operations are needed too.

Unpatched firefox is also quite slow for another reason - it always tries
to work with 32bpp data internally, no matter what color depth is used
for the desktop.
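
To give an idea of the extra work this implies, here is a rough sketch of
the per-pixel conversion needed whenever 32bpp data has to end up on a
16bpp desktop. It assumes x8r8g8b8 source pixels and an r5g6b5 destination;
the function names are made up for this example and this is not the code
firefox or pixman actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack one x8r8g8b8 pixel into r5g6b5 by keeping
 * the top 5/6/5 bits of the red/green/blue channels. */
static uint16_t
convert_8888_to_0565 (uint32_t s)
{
    return (uint16_t) (((s >> 8) & 0xF800) |   /* red:   bits 19-23 -> 11-15 */
                       ((s >> 5) & 0x07E0) |   /* green: bits 10-15 -> 5-10  */
                       ((s >> 3) & 0x001F));   /* blue:  bits 3-7   -> 0-4   */
}

/* Hypothetical helper: convert one scanline of n pixels. */
static void
convert_line_8888_to_0565 (uint16_t *dst, const uint32_t *src, size_t n)
{
    size_t i;

    assert (n == 0 || (dst != NULL && src != NULL));

    for (i = 0; i < n; i++)
        dst[i] = convert_8888_to_0565 (src[i]);
}
```

Every pixel pushed to the screen goes through something like this, on top of
the 32bpp data taking twice the memory bandwidth of 16bpp data.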

-- 
Best regards,
Siarhei Siamashka

