[PATCH xf86-video-glint] No need for byteswapping in YV12 decoding on BE machines
Matt Turner
mattst88 at gmail.com
Mon Feb 28 12:06:57 PST 2011
On Sat, Dec 11, 2010 at 10:24 PM, Mark Kettenis <mark.kettenis at xs4all.nl> wrote:
>> Date: Sat, 11 Dec 2010 22:52:25 +0100 (CET)
>> From: Mark Kettenis <mark.kettenis at xs4all.nl>
>>
>> > From: Alan Hourihane <alanh at fairlite.co.uk>
>> > Date: Thu, 09 Dec 2010 13:00:54 +0000
>> >
>> > On Thu, 2010-12-09 at 13:41 +0100, Mark Kettenis wrote:
>> > > > Might want to check the hardware manuals on this, but there might be a
>> > > > byteswap bit someplace that does it in hardware. The PGX32 may do this
>> > > > at BIOS init time, whereas a PC version may not. There is a bit for this
>> > > > on the PM3, so there maybe something on PM2.
>> > >
>> > > Well, obviously the PGX32 doesn't have a BIOS, but the Open Firmware
>> > > Forth code on the card might indeed do something like that. The
>> > > OpenBSD kernel driver for this card also twiddles some of the
>> > > endianness bits for the framebuffer windows to get things into the
>> > > state expected by the glint driver. The Linux framebuffer driver does
>> > > something similar, and I'm pretty sure NetBSD has something like that
>> > > as well.
>> > >
>> > > I don't have access to hardware documentation. Documentation was only
>> > > ever available under NDA isn't it? Do you expect there is a seperate
>> > > bit to set the ednianness used by the YUV transformation hardware?
>> >
>> > Possibly. I can't remember. But it's worth checking because the code
>> > wouldn't have been added without good reason in the first place.
>>
>> Did some further digging, and I think I've figured out what's going on
>> here.
>>
>> The way the code works is that the YV12 data is uploaded to the
>> texture buffer which is then translated to RGB and copied to the
>> framebuffer by the hardware. Uploading the YV12 data to the texture
>> buffer is done through the same aperture that maps the framebuffer.
>> Therefore writes to the texture buffer undergo the same byte twisting
>> as writes to the framebuffer. This has some interesting consequences.
>>
>> My PGX32 is a Permedia 2v and, by default, runs in 32bpp mode. So
>> pixels are 32 bits wide and therefore the glint driver configures the
>> aperture to do byte swapping, such that they end up as little-endian
>> RGBA values in the framebuffer. Since the YV12 is handled as 32-bit
>> wide units as well, they undergo the proper big-endian to
>> little-endian transformation as well, and there is no need for
>> CopyYV12() to do anything special. That means CopyYV12LE() is doing
>> the right thing.
>>
>> However, if I tell X to run in 16bpp mode, the situation changes.
>> Pixels are 16 bits wide and therefore the glint driver configures the
>> aperture to do half-word swapping. But since the YV12 data is handled
>> as 32-bit wide units, it now ends up being mangled when written to the
>> texture buffer. Unsurprisingly this results in weird looking video as
>> well. Note that the CopyYV12BE() code doesn't fix this.
>>
>> Running X in 8bpp mode is also still possible. In that case, the
>> aperture is configured not to do any byte swapping. In this case the
>> CopyYV12BE() code would do the right thing. But to be honest, given
>> the limited color space that 8bpp mode provides, it is hard to tell if
>> it does.
>>
>> So I guess some sort of byteswapping version of CopyYV12() will be
>> necessary to fix the 16bpp. Alternatively we could use the second
>> aperture and set it up to always do byteswapping and use that to
>> upload the YV12 data to the texture buffer.
>
> The diff below (on top of my previous diff) does the the trick. Not
> sure if the 8bpp and 24bpp modes are worth supporting. Video playback
> on 8bpp looks absolutey afwul, and 24bpp is completely broken (crashes
> the X server).
>
> I'll fight with git and submit a complete diff if folks think this is a
> sensible approach.
>
>
> Index: src/pm2_video.c
> ===================================================================
> RCS file: /cvs/xenocara/driver/xf86-video-glint/src/pm2_video.c,v
> retrieving revision 1.3
> diff -u -p -r1.3 pm2_video.c
> --- src/pm2_video.c 5 Dec 2010 20:25:26 -0000 1.3
> +++ src/pm2_video.c 11 Dec 2010 22:12:09 -0000
> @@ -1722,6 +1722,56 @@ CopyYV12(CARD8 *Y, CARD32 *dst, int widt
> }
> }
>
> +#if X_BYTE_ORDER == X_BIG_ENDIAN
> +
> +static void
> +CopyYV12_16(CARD8 *Y, CARD32 *dst, int width, int height, int pitch)
> +{
> + int Y_size = width * height;
> + CARD8 *V = Y + Y_size;
> + CARD8 *U = V + (Y_size >> 2);
> + int pad = (pitch >> 2) - (width >> 1);
> + int x;
> +
> + width >>= 1;
> +
> + for (height >>= 1; height > 0; height--) {
> + for (x = 0; x < width; Y += 2, x++)
> + *dst++ = Y[1] + (V[x] << 8) + (Y[0] << 16) + (U[x] << 24);
> + dst += pad;
> + for (x = 0; x < width; Y += 2, x++)
> + *dst++ = Y[1] + (V[x] << 8) + (Y[0] << 16) + (U[x] << 24);
> + dst += pad;
> + U += width;
> + V += width;
> + }
> +}
> +
> +static void
> +CopyYV12_8(CARD8 *Y, CARD32 *dst, int width, int height, int pitch)
> +{
> + int Y_size = width * height;
> + CARD8 *V = Y + Y_size;
> + CARD8 *U = V + (Y_size >> 2);
> + int pad = (pitch >> 2) - (width >> 1);
> + int x;
> +
> + width >>= 1;
> +
> + for (height >>= 1; height > 0; height--) {
> + for (x = 0; x < width; Y += 2, x++)
> + *dst++ = V[x] + (Y[1] << 8) + (U[x] << 16) + (Y[0] << 24);
> + dst += pad;
> + for (x = 0; x < width; Y += 2, x++)
> + *dst++ = V[x] + (Y[1] << 8) + (U[x] << 16) + (Y[0] << 24);
> + dst += pad;
> + U += width;
> + V += width;
> + }
> +}
> +
> +#endif
> +
> static void
> CopyFlat(CARD8 *src, CARD8 *dst, int width, int height, int pitch)
> {
> @@ -1818,8 +1868,24 @@ Permedia2PutImage(ScrnInfoPtr pScrn,
>
> switch (id) {
> case LE4CC('Y','V','1','2'):
> - CopyYV12(buf, (CARD32 *)((CARD8 *) pGlint->FbBase + pPPriv->BufferBase[0]),
> - width, height, pPPriv->BufferStride);
> + switch(pGlint->HwBpp) {
> +#if X_BYTE_ORDER == X_BIG_ENDIAN
> + case 8:
> + case 24:
> + CopyYV12_8(buf, (CARD32 *)((CARD8 *) pGlint->FbBase + pPPriv->BufferBase[0]),
> + width, height, pPPriv->BufferStride);
> + break;
> + case 15:
> + case 16:
> + CopyYV12_16(buf, (CARD32 *)((CARD8 *) pGlint->FbBase + pPPriv->BufferBase[0]),
> + width, height, pPPriv->BufferStride);
> + break;
> +#endif
> + default:
> + CopyYV12(buf, (CARD32 *)((CARD8 *) pGlint->FbBase + pPPriv->BufferBase[0]),
> + width, height, pPPriv->BufferStride);
> + break;
> + }
> PutYUV(pPPriv, pPPriv->BufferBase[0], FORMAT_YUYV, 1, 0);
> break;
It'd be cool to get this in the next released version, so can anyone
give this a review? I don't know enough about it to do it.
Thanks!
Matt
More information about the xorg-devel
mailing list