[cairo] New ARMv7-A (NEON) optimisations for Pixman
Jeff Muizelaar
jeff at infidigm.net
Wed May 6 11:25:59 PDT 2009
On Wed, May 06, 2009 at 07:56:29AM +0000, Jonathan Morton wrote:
> Hi,
>
> At Movial we've been developing some optimisations for Pixman based on
> customer's hardware. The optimisations are generally applicable to
> ARMv7-A processors with the NEON coprocessor enabled (presently
> Cortex-A8/9 and Snapdragon) and an RGB565 framebuffer.
>
> It appears that Soren Sandmann is the active developer for Pixman at the
> moment, and thus in the best position to integrate these improvements.
> We'd welcome his input.
>
> We've tried to implement the optimisations in the same sort of way as
> existing Pixman code, to minimise integration problems - the goal having
> always been to contribute these optimisations upstream when they are
> ready. We have a series of patches against 0.15.2, starting with a
> framework for NEON support (based on Ian Rickard's work), then
> successively adding code paths.
>
> However we do also notice that there is a major refactoring effort going
> on, and so our code might need to be rearranged to match the new layout.
> (For example, it looks like there's explicit support for NEON code there
> already.) Apparently there is some other NEON code floating around, so
> we might have to do some coordination to avoid too much duplication of
> effort. For the moment we have to consider 0.15.2 as the base version.

It would be best if you could base the patches on the current git
tree.

> Unfortunately we have not had time to include intrinsic versions of the
> blitters, so the optimisations will only work on GCC. The build
> shouldn't break on armcc, as we added a specific autoconf test for
> gcc-inline-asm support (cleaner than #ifdef magic, we think), though we
> don't have a convenient way of testing this directly against armcc. The
> conversion to intrinsics should not be very difficult for an interested
> party to perform.
>
> The optimisations cover straight fills, blended fills, straight copies,
> straight blits, format-converting blits (from xRGB8), ARGB8 compositing,
> and glyph (A8 * solid ARGB) rendering. We consider these operations to
> be the most common ones in practical applications.
>
> We've seen worthwhile performance improvements on the target hardware.
> In some typical cases, such as for glyph rendering, the bottleneck has
> been shifted from the blitter to the X server's overheads. In other
> cases, we are close to saturating the available memory bandwidth. We
> suspect that having the CPU and bus active for a shorter length of time
> should also save power, which is usually important on ARM-based devices.
>
> The first couple of patches are available essentially immediately, to
> get the ball rolling. The remaining patches in the series depend on our
> customer's approval, which will take time but not much effort. Of
> course knowing exactly where to send the patches would be helpful. :-)

I think the cairo list is probably the best place to send the
patches. I look forward to seeing them :)

-Jeff