[PATCH xf86-video-glint] No need for byteswapping in YV12 decoding on BE machines

Mark Kettenis mark.kettenis at xs4all.nl
Mon Feb 28 15:53:13 PST 2011


> From: Alan Hourihane <alanh at fairlite.co.uk>
> Date: Mon, 28 Feb 2011 22:32:49 +0000
> 
> On Mon, 2011-02-28 at 20:06 +0000, Matt Turner wrote:
> > On Sat, Dec 11, 2010 at 10:24 PM, Mark Kettenis <mark.kettenis at xs4all.nl> wrote:
> > >> Date: Sat, 11 Dec 2010 22:52:25 +0100 (CET)
> > >> From: Mark Kettenis <mark.kettenis at xs4all.nl>
> > >>
> > >> > From: Alan Hourihane <alanh at fairlite.co.uk>
> > >> > Date: Thu, 09 Dec 2010 13:00:54 +0000
> > >> >
> > >> > On Thu, 2010-12-09 at 13:41 +0100, Mark Kettenis wrote:
> > >> > > > Might want to check the hardware manuals on this, but there might be a
> > >> > > > byteswap bit someplace that does it in hardware. The PGX32 may do this
> > >> > > > at BIOS init time, whereas a PC version may not. There is a bit for this
> > >> > > > on the PM3, so there maybe something on PM2.
> > >> > >
> > >> > > Well, obviously the PGX32 doesn't have a BIOS, but the Open Firmware
> > >> > > Forth code on the card might indeed do something like that.  The
> > >> > > OpenBSD kernel driver for this card also twiddles some of the
> > >> > > endianness bits for the framebuffer windows to get things into the
> > >> > > state expected by the glint driver.  The Linux framebuffer driver does
> > >> > > something similar, and I'm pretty sure NetBSD has something like that
> > >> > > as well.
> > >> > >
> > >> > > I don't have access to hardware documentation.  Documentation was only
> > >> > > ever available under NDA isn't it?  Do you expect there is a seperate
> > >> > > bit to set the ednianness used by the YUV transformation hardware?
> > >> >
> > >> > Possibly. I can't remember. But it's worth checking because the code
> > >> > wouldn't have been added without good reason in the first place.
> > >>
> > >> Did some further digging, and I think I've figured out what's going on
> > >> here.
> > >>
> > >> The way the code works is that the YV12 data is uploaded to the
> > >> texture buffer which is then translated to RGB and copied to the
> > >> framebuffer by the hardware.  Uploading the YV12 data to the texture
> > >> buffer is done through the same aperture that maps the framebuffer.
> > >> Therefore writes to the texture buffer undergo the same byte twisting
> > >> as writes to the framebuffer.  This has some interesting consequences.
> > >>
> > >> My PGX32 is a Permedia 2v and, by default, runs in 32bpp mode.  So
> > >> pixels are 32 bits wide and therefore the glint driver configures the
> > >> aperture to do byte swapping, such that they end up as little-endian
> > >> RGBA values in the framebuffer.  Since the YV12 is handled as 32-bit
> > >> wide units as well, they undergo the proper big-endian to
> > >> little-endian transformation as well, and there is no need for
> > >> CopyYV12() to do anything special.  That means CopyYV12LE() is doing
> > >> the right thing.
> > >>
> > >> However, if I tell X to run in 16bpp mode, the situation changes.
> > >> Pixels are 16 bits wide and therefore the glint driver configures the
> > >> aperture to do half-word swapping.  But since the YV12 data is handled
> > >> as 32-bit wide units, it now ends up being mangled when written to the
> > >> texture buffer.  Unsurprisingly this results in weird looking video as
> > >> well.  Note that the CopyYV12BE() code doesn't fix this.
> > >>
> > >> Running X in 8bpp mode is also still possible.  In that case, the
> > >> aperture is configured not to do any byte swapping.  In this case the
> > >> CopyYV12BE() code would do the right thing.  But to be honest, given
> > >> the limited color space that 8bpp mode provides, it is hard to tell if
> > >> it does.
> > >>
> > >> So I guess some sort of byteswapping version of CopyYV12() will be
> > >> necessary to fix the 16bpp.  Alternatively we could use the second
> > >> aperture and set it up to always do byteswapping and use that to
> > >> upload the YV12 data to the texture buffer.
> > >
> > > The diff below (on top of my previous diff) does the the trick.  Not
> > > sure if the 8bpp and 24bpp modes are worth supporting.  Video playback
> > > on 8bpp looks absolutey afwul, and 24bpp is completely broken (crashes
> > > the X server).
> > >
> > > I'll fight with git and submit a complete diff if folks think this is a
> > > sensible approach.
> > >
> > >
> > > Index: src/pm2_video.c
> > > ===================================================================
> > > RCS file: /cvs/xenocara/driver/xf86-video-glint/src/pm2_video.c,v
> > > retrieving revision 1.3
> > > diff -u -p -r1.3 pm2_video.c
> > > --- src/pm2_video.c     5 Dec 2010 20:25:26 -0000       1.3
> > > +++ src/pm2_video.c     11 Dec 2010 22:12:09 -0000
> > > @@ -1722,6 +1722,56 @@ CopyYV12(CARD8 *Y, CARD32 *dst, int widt
> > >     }
> > >  }
> > >
> > > +#if X_BYTE_ORDER == X_BIG_ENDIAN
> > > +
> > > +static void
> > > +CopyYV12_16(CARD8 *Y, CARD32 *dst, int width, int height, int pitch)
> > > +{
> > > +    int Y_size = width * height;
> > > +    CARD8 *V = Y + Y_size;
> > > +    CARD8 *U = V + (Y_size >> 2);
> > > +    int pad = (pitch >> 2) - (width >> 1);
> > > +    int x;
> > > +
> > > +    width >>= 1;
> > > +
> > > +    for (height >>= 1; height > 0; height--) {
> > > +       for (x = 0; x < width; Y += 2, x++)
> > > +           *dst++ = Y[1] + (V[x] << 8) + (Y[0] << 16) + (U[x] << 24);
> > > +       dst += pad;
> > > +       for (x = 0; x < width; Y += 2, x++)
> > > +           *dst++ = Y[1] + (V[x] << 8) + (Y[0] << 16) + (U[x] << 24);
> > > +       dst += pad;
> > > +       U += width;
> > > +       V += width;
> > > +    }
> > > +}
> > > +
> > > +static void
> > > +CopyYV12_8(CARD8 *Y, CARD32 *dst, int width, int height, int pitch)
> > > +{
> > > +    int Y_size = width * height;
> > > +    CARD8 *V = Y + Y_size;
> > > +    CARD8 *U = V + (Y_size >> 2);
> > > +    int pad = (pitch >> 2) - (width >> 1);
> > > +    int x;
> > > +
> > > +    width >>= 1;
> > > +
> > > +    for (height >>= 1; height > 0; height--) {
> > > +       for (x = 0; x < width; Y += 2, x++)
> > > +           *dst++ = V[x] + (Y[1] << 8) + (U[x] << 16) + (Y[0] << 24);
> > > +       dst += pad;
> > > +       for (x = 0; x < width; Y += 2, x++)
> > > +           *dst++ = V[x] + (Y[1] << 8) + (U[x] << 16) + (Y[0] << 24);
> > > +       dst += pad;
> > > +       U += width;
> > > +       V += width;
> > > +    }
> > > +}
> > > +
> > > +#endif
> > > +
> > >  static void
> > >  CopyFlat(CARD8 *src, CARD8 *dst, int width, int height, int pitch)
> > >  {
> > > @@ -1818,8 +1868,24 @@ Permedia2PutImage(ScrnInfoPtr pScrn,
> > >
> > >     switch (id) {
> > >     case LE4CC('Y','V','1','2'):
> > > -       CopyYV12(buf, (CARD32 *)((CARD8 *) pGlint->FbBase + pPPriv->BufferBase[0]),
> > > -           width, height, pPPriv->BufferStride);
> > > +       switch(pGlint->HwBpp) {
> > > +#if X_BYTE_ORDER == X_BIG_ENDIAN
> > > +       case 8:
> > > +       case 24:
> > > +           CopyYV12_8(buf, (CARD32 *)((CARD8 *) pGlint->FbBase + pPPriv->BufferBase[0]),
> > > +               width, height, pPPriv->BufferStride);
> > > +           break;
> > > +       case 15:
> > > +       case 16:
> > > +           CopyYV12_16(buf, (CARD32 *)((CARD8 *) pGlint->FbBase + pPPriv->BufferBase[0]),
> > > +               width, height, pPPriv->BufferStride);
> > > +           break;
> > > +#endif
> > > +       default:
> > > +           CopyYV12(buf, (CARD32 *)((CARD8 *) pGlint->FbBase + pPPriv->BufferBase[0]),
> > > +               width, height, pPPriv->BufferStride);
> > > +           break;
> > > +       }
> > >        PutYUV(pPPriv, pPPriv->BufferBase[0], FORMAT_YUYV, 1, 0);
> > >        break;
> > 
> > It'd be cool to get this in the next released version, so can anyone
> > give this a review? I don't know enough about it to do it.
> 
> Looks good.

Thanks Matt, for reminding me.  I created a proper git diff:

http://lists.x.org/archives/xorg-devel/2011-February/019760.html

Alan, I should be able to commit that on wednesday if you're still ok
with that.

Cheers,

Mark


More information about the xorg-devel mailing list