[PATCH 4/7] dix: Repack ClientRec

Adam Jackson ajax at redhat.com
Mon Sep 24 07:00:06 PDT 2012


On Sat, 2012-09-22 at 09:07 +0200, Keith Packard wrote:
> Adam Jackson <ajax at redhat.com> writes:
> 
> > Pick smaller types where possible, including bitfielding some Bools and
> > small enums, then shuffle the result to be hole-free.  192 -> 128 bytes
> > on LP64, 144 -> 96 bytes on ILP32.
> 
> One thing that would make this easier to check for 'optimal' packing
> would be to simply start with the largest sized objects and work down to
> the smallest ones. Otherwise, I'm sitting here counting bits. Or would
> that be less efficient at run time?

That's true, as far as it goes, but I find it less tedious to just ask
what the answer is:

hate:~/xserver% pahole -C _Client hw/vfb/Xvfb
struct _Client {
	pointer                    requestBuffer;        /*     0     8 */
	pointer                    osPrivate;            /*     8     8 */
	Mask                       clientAsMask;         /*    16     4 */
	short int                  index;                /*    20     2 */
	unsigned char              majorOp;              /*    22     1 */
	unsigned char              minorOp;              /*    23     1 */
	int                        swapped:1;            /*    24:31  4 */
	int                        local:1;              /*    24:30  4 */
	int                        big_requests:1;       /*    24:29  4 */
	int                        clientGone:1;         /*    24:28  4 */
	int                        closeDownMode:2;      /*    24:26  4 */
	int                        clientState:2;        /*    24:24  4 */

	/* Bitfield combined with next fields */

	char                       smart_priority;       /*    25     1 */
	short int                  noClientException;    /*    26     2 */
	int                        priority;             /*    28     4 */
	ReplySwapPtr               pSwapReplyFunc;       /*    32     8 */
	XID                        errorValue;           /*    40     4 */
	int                        sequence;             /*    44     4 */
	int                        ignoreCount;          /*    48     4 */
	int                        numSaved;             /*    52     4 */
	SaveSetElt *               saveSet;              /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	int ()(ClientPtr) * *      requestVector;        /*    64     8 */
	CARD32                     req_len;              /*    72     4 */
	unsigned int               replyBytesRemaining;  /*    76     4 */
	PrivateRec *               devPrivates;          /*    80     8 */
	short unsigned int         xkbClientFlags;       /*    88     2 */
	short unsigned int         mapNotifyMask;        /*    90     2 */
	short unsigned int         newKeyboardNotifyMask; /*    92     2 */
	short unsigned int         vMajor;               /*    94     2 */
	short unsigned int         vMinor;               /*    96     2 */
	KeyCode                    minKC;                /*    98     1 */
	KeyCode                    maxKC;                /*    99     1 */
	int                        smart_start_tick;     /*   100     4 */
	int                        smart_stop_tick;      /*   104     4 */
	int                        smart_check_tick;     /*   108     4 */
	DeviceIntPtr               clientPtr;            /*   112     8 */
	ClientIdPtr                clientIds;            /*   120     8 */
	/* --- cacheline 2 boundary (128 bytes) --- */

	/* size: 128, cachelines: 2, members: 37 */
};

Not the best-documented set of tools in the world, but very handy.

As far as efficiency, I suspect cacheline fill cost would dominate over
the cost of computing offsets.  So if you really wanted to ricer tune
this, try a multi-client benchmark with the server under cachegrind,
figure out the histogram of field access, put the most-frequently used
member as the first element so its address constant-folds with that of
the struct itself, and then try to cram as many frequently-accessed
fields into the first cacheline as you can.

Having done that I'm not sure you'd see a statistically significant win
even in x11perf -noop.

- ajax
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://lists.x.org/archives/xorg-devel/attachments/20120924/dbd9f7b0/attachment.pgp>


More information about the xorg-devel mailing list