[PATCH 4/7] dix: Repack ClientRec
Adam Jackson
ajax at redhat.com
Mon Sep 24 07:00:06 PDT 2012
On Sat, 2012-09-22 at 09:07 +0200, Keith Packard wrote:
> Adam Jackson <ajax at redhat.com> writes:
>
> > Pick smaller types where possible, including bitfielding some Bools and
> > small enums, then shuffle the result to be hole-free. 192 -> 128 bytes
> > on LP64, 144 -> 96 bytes on ILP32.
>
> One thing that would make this easier to check for 'optimal' packing
> would be to simply start with the largest sized objects and work down to
> the smallest ones. Otherwise, I'm sitting here counting bits. Or would
> that be less efficient at run time?
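For concreteness, the repacking the quoted changelog describes boils
down to something like the toy sketch below -- made-up structs and
field names for illustration, not the actual ClientRec members:

/* "Before": full-width Bools and enums interleaved with pointers
 * leave 4-byte holes on LP64. */
typedef void *pointer;          /* stand-ins for the X typedefs */
typedef int Bool;

typedef struct {
    Bool    swapped;            /* offset  0, size 4, then a 4-byte hole */
    pointer osPrivate;          /* offset  8, size 8 */
    Bool    local;              /* offset 16, size 4, another hole */
    pointer requestBuffer;      /* offset 24, size 8 */
    int     closeDownMode;      /* offset 32, size 4, plus tail padding */
} BeforeRec;                    /* 40 bytes on LP64 */

/* "After": largest members first, the Bools and the small enum
 * squeezed into one bitfield word.  No holes left to find. */
typedef struct {
    pointer  osPrivate;         /* offset  0, size 8 */
    pointer  requestBuffer;     /* offset  8, size 8 */
    unsigned swapped:1;         /* all three share one word at offset 16 */
    unsigned local:1;
    unsigned closeDownMode:2;
} AfterRec;                     /* 24 bytes on LP64 */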
That's true, as far as it goes, but I find it less tedious to just ask
what the answer is:
hate:~/xserver% pahole -C _Client hw/vfb/Xvfb
struct _Client {
        pointer               requestBuffer;          /*     0     8 */
        pointer               osPrivate;              /*     8     8 */
        Mask                  clientAsMask;           /*    16     4 */
        short int             index;                  /*    20     2 */
        unsigned char         majorOp;                /*    22     1 */
        unsigned char         minorOp;                /*    23     1 */
        int                   swapped:1;              /* 24:31     4 */
        int                   local:1;                /* 24:30     4 */
        int                   big_requests:1;         /* 24:29     4 */
        int                   clientGone:1;           /* 24:28     4 */
        int                   closeDownMode:2;        /* 24:26     4 */
        int                   clientState:2;          /* 24:24     4 */

        /* Bitfield combined with next fields */

        char                  smart_priority;         /*    25     1 */
        short int             noClientException;      /*    26     2 */
        int                   priority;               /*    28     4 */
        ReplySwapPtr          pSwapReplyFunc;         /*    32     8 */
        XID                   errorValue;             /*    40     4 */
        int                   sequence;               /*    44     4 */
        int                   ignoreCount;            /*    48     4 */
        int                   numSaved;               /*    52     4 */
        SaveSetElt *          saveSet;                /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        int ()(ClientPtr) * * requestVector;          /*    64     8 */
        CARD32                req_len;                /*    72     4 */
        unsigned int          replyBytesRemaining;    /*    76     4 */
        PrivateRec *          devPrivates;            /*    80     8 */
        short unsigned int    xkbClientFlags;         /*    88     2 */
        short unsigned int    mapNotifyMask;          /*    90     2 */
        short unsigned int    newKeyboardNotifyMask;  /*    92     2 */
        short unsigned int    vMajor;                 /*    94     2 */
        short unsigned int    vMinor;                 /*    96     2 */
        KeyCode               minKC;                  /*    98     1 */
        KeyCode               maxKC;                  /*    99     1 */
        int                   smart_start_tick;       /*   100     4 */
        int                   smart_stop_tick;        /*   104     4 */
        int                   smart_check_tick;       /*   108     4 */
        DeviceIntPtr          clientPtr;              /*   112     8 */
        ClientIdPtr           clientIds;              /*   120     8 */
        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 37 */
};
Not the best-documented set of tools in the world, but very handy.
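If pahole isn't handy, the poor man's version of the same check is to
print the numbers yourself.  A minimal sketch, again with a toy struct
rather than ClientRec:

/* Print sizeof and a few offsetof values and compare against what you
 * expected; offsetof can't be applied to the bitfield members, but the
 * members after them tell you how much room the bitfields really took. */
#include <stddef.h>
#include <stdio.h>

typedef void *pointer;

struct toy {
    pointer  buf;               /* 8 bytes on LP64 */
    unsigned swapped:1;         /* bitfields share one word... */
    unsigned closeDownMode:2;
    short    index;             /* ...this may or may not pack in after them */
    int      priority;
};

int main(void)
{
    printf("sizeof(struct toy)      = %zu\n", sizeof(struct toy));
    printf("offsetof(toy, index)    = %zu\n", offsetof(struct toy, index));
    printf("offsetof(toy, priority) = %zu\n", offsetof(struct toy, priority));
    return 0;
}

Not as informative as pahole's hole annotations, but it beats counting
bits by hand.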
As far as efficiency goes, I suspect cacheline fill cost would dominate
the cost of computing offsets. So if you really wanted to ricer-tune
this: run a multi-client benchmark with the server under cachegrind,
work out the histogram of field accesses, put the most frequently used
member first so its address constant-folds with that of the struct
itself, and then cram as many frequently-accessed fields into the first
cacheline as you can.

Having done all that, I'm not sure you'd see a statistically
significant win even in x11perf -noop.
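For what it's worth, here is the constant-folding point in miniature,
with made-up field names (which ClientRec fields are actually hot is
exactly what the cachegrind histogram would have to tell you):

/* The first member's address is the struct's address, so reading it is
 * a plain load through the pointer; any later member needs a
 * displacement, which folds into the addressing mode essentially for
 * free on current CPUs.  The cost that actually varies is whether the
 * access touches a second cacheline. */
typedef struct {
    void *hot_field;            /* offset 0, first 64-byte line */
    int   filler[28];           /* 112 bytes pushing the rest past 64 */
    int   cold_field;           /* offset 120, second 64-byte line */
} ToyRec;

/* typically mov (%rdi),%rax on x86-64 */
void *get_hot(const ToyRec *r)  { return r->hot_field; }

/* typically mov 0x78(%rdi),%eax: same instruction cost, but a
 * different cacheline */
int get_cold(const ToyRec *r)   { return r->cold_field; }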
- ajax