[cairo] New ARMv7-A (NEON) optimisations for Pixman

Wed May 6 11:25:59 PDT 2009

On Wed, May 06, 2009 at 07:56:29AM +0000, Jonathan Morton wrote:
> Hi,
> 
> At Movial we've been developing some optimisations for Pixman based on
> a customer's hardware.  The optimisations are generally applicable to
> ARMv7-A processors with the NEON coprocessor enabled (presently
> Cortex-A8/9 and Snapdragon) and an RGB565 framebuffer.
> 
> It appears that Soren Sandmann is the active developer for Pixman at the
> moment, and thus in the best position to integrate these improvements.
> We'd welcome his input.
> 
> We've tried to implement the optimisations in the same sort of way as
> existing Pixman code, to minimise integration problems - the goal having
> always been to contribute these optimisations upstream when they are
> ready.  We have a series of patches against 0.15.2, starting with a
> framework for NEON support (based on Ian Rickard's work), then
> successively adding code paths.
> 
> However, we have also noticed that a major refactoring effort is under
> way, so our code might need to be rearranged to match the new layout.
> (For example, it looks like there's explicit support for NEON code there
> already.)  Apparently there is some other NEON code floating around, so
> we might have to do some coordination to avoid too much duplication of
> effort.  For the moment we have to consider 0.15.2 as the base version.

It would be best if you could base the patches off of the current git
tree.

> Unfortunately we have not had time to include intrinsic versions of the
> blitters, so the optimisations will only work on GCC.  The build
> shouldn't break on armcc, as we added a specific autoconf test for
> gcc-inline-asm support (cleaner than #ifdef magic, we think), though we
> don't have a convenient way of testing this directly against armcc.  The
> conversion to intrinsics should not be very difficult for an interested
> party to perform.
> 
> The optimisations cover straight fills, blended fills, straight copies,
> straight blits, format-converting blits (from xRGB8), ARGB8 compositing,
> and glyph (A8 * solid ARGB) rendering.  We consider these operations to
> be the most common ones in practical applications.
> 
> We've seen worthwhile performance improvements on the target hardware.
> In some typical cases, such as for glyph rendering, the bottleneck has
> been shifted from the blitter to the X server's overheads.  In other
> cases, we are close to saturating the available memory bandwidth.  We
> suspect that having the CPU and bus active for a shorter length of time
> should also save power, which is usually important on ARM-based devices.
> 
> The first couple of patches are available essentially immediately, to
> get the ball rolling.  The remaining patches in the series depend on our
> customer's approval, which will take time but not much effort.  Of
> course knowing exactly where to send the patches would be helpful.  :-)

I think the cairo list is probably the best place to send the patches.
I look forward to seeing them :)

-Jeff