libXi regression (was Re: XQuartz tablet mouse events are at the wrong location and other Xinput headaches)
peter.hutterer at who-t.net
Wed Mar 14 20:08:12 PDT 2012
On Wed, Mar 14, 2012 at 06:58:43PM -0700, Jeremy Huddleston wrote:
> So looking through the original patch is making my head spin... here are a couple things that stood out to me:
> 1) This is odd. It looks like the original code should never have worked because of the sizeof(XIButtonClass). My issue does not run through this block, but it's still a head scratcher.
> +static void
> +sizeXIButtonClassType(int num_buttons, int* structure, int* state, int* atoms)
> + *structure = pad_to_double(sizeof(XIButtonClassInfo));
> - bout = next_block(&ptr, sizeof(XIButtonClass));
> + sizeXIButtonClassType(bin->num_buttons, &struct_size,
> + &state_size, &labels_size);
> + bout = next_block(&ptr, struct_size);
> 2) sizeDeviceClassType(XIButtonClass) is now longer
> sizeDeviceClassType(int type, int num_elements)
> int l = 0;
> + int extra1 = 0;
> + int extra2 = 0;
> case XIButtonClass:
> - l = sizeof(XIButtonClassInfo);
> - l += num_elements * sizeof(Atom);
> - /* Force mask alignment with longs to avoid
> - * unaligned access when accessing the atoms. */
> - l += ((((num_elements + 7)/8) + 3)/4) * sizeof(Atom);
> + sizeXIButtonClassType(num_elements, &l, &extra1, &extra2);
> + l += extra1 + extra2;
> ^^^ Why is extra1 (the size of the state mask) now being added here? It
> wasn't before. Was this just a latent bug that was fixed with this change
> rather than as a separate commit?
there are a few other issues here. first, the sizeof(Atom) appears wrong, or
at least excessive (2d638fc37b0dbf28e5c826f74f68ada83a8c3e2b).
this should the oirginal code + whatever is left to get to multiple of
sizeof(long). Not a huge deal though, it's just a bit for bloat.
the code originally did sizeof(struct) + length of mask + size of atom
array. it still does so now, though the labels length is too long in
sizeXIButtonClassType (the labels += line can be skipped)
> 3) copy_classes seems to be the culprit. With the new changes, we're passing larger values to next_block
> - cls_lib = next_block(&ptr_lib, sizeof(XIButtonClassInfo));
> cls_wire = (xXIButtonInfo*)any_wire;
> + sizeXIButtonClassType(cls_wire->num_buttons,
> + &struct_size, &state_size,
> + &labels_size);
> + cls_lib = next_block(&ptr_lib, struct_size);
> - cls_lib->state.mask = next_block(&ptr_lib, size * sizeof(Atom));
> + cls_lib->state.mask_len = state_size;
> + cls_lib->state.mask = next_block(&ptr_lib, state_size);
> - cls_lib->labels = next_block(&ptr_lib, cls_lib->num_buttons * sizeof(Atom));
> + cls_lib->labels = next_block(&ptr_lib, labels_size);
> A comparison of old steps and new steps:
> ~ $ ./a.out
> sizeof(XIButtonClassInfo): 40, struct_size: 40
> size * sizeof(Atom): 8, state_size: 8
> old_mask_len: 4, state_size: 8
> num_buttons * sizeof(Atom): 56, labels_size: 64
> So the issue seems to be that the new "cls_lib->state.mask = next_block(&ptr_lib, state_size)" steps over our first button.
next_block (not the best-named function) just allocates a new block of data.
as long as the original malloc was correctly sized, next_block doesn't
overwrite data. as you saw from the patch, the culprit was the assignment of
state_size to mask_len and then re-using mask_len/state_size to copy data,
when state_size may have been aligned and thus larger than mask_len.
> On Mar 14, 2012, at 6:00 PM, Jeremy Huddleston <jeremyhu at apple.com> wrote:
> > Reverting c1a5a70b51f12dedf354102217c7cd4247ed3a4b from libXi fixed the issue for me. I haven't looked into why, but the patch's changes are certainly related:
> > good: libXi-184.108.40.206
> > bad: libXi-1.6.0
> > $ git bisect log
> > # bad: [70b730b0548ca9e408f14f2576b972beb32a0ad0] libXi 1.6.0
> > # good: [34964b05c16161de65709d60799b9ad97ce56296] libXi 220.127.116.11
> > git bisect start 'libXi-1.6.0' 'libXi-18.104.22.168'
> > # bad: [1b9f0394c3d4d3833f8560ae8170a4d5842419ab] Fix XIScrollClass increment value on 32-bit machines
> > git bisect bad 1b9f0394c3d4d3833f8560ae8170a4d5842419ab
> > # bad: [c1a5a70b51f12dedf354102217c7cd4247ed3a4b] Fix bus error on MIPS N32 for bug #38331.
> > git bisect bad c1a5a70b51f12dedf354102217c7cd4247ed3a4b
> > c1a5a70b51f12dedf354102217c7cd4247ed3a4b is the first bad commit
> > commit c1a5a70b51f12dedf354102217c7cd4247ed3a4b
> > Author: Michał Masłowski <mtjm at mtjm.eu>
> > Date: Tue Feb 21 20:54:40 2012 +0100
> > Fix bus error on MIPS N32 for bug #38331.
> > XIValuatorClassInfo and XIScrollClassInfo might have an address
> > of 4 bytes modulo 8, while they contain doubles which need 8 byte
> > alignment. This is fixed by adding extra padding after each structure
> > or array in sizeDeviceClassType and adding helper functions to
> > determine sizes and padding only in one place.
> > Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=38331
> > Signed-off-by: Michał Masłowski <mtjm at mtjm.eu>
> > Signed-off-by: Peter Hutterer <peter.hutterer at who-t.net>
> > :040000 040000 6b49f07dcdd66ae31add8e9b543e6d6a0586231c a389cffc1796584cc7e90ea296b57f7ecf2224d9 M src
> > On Mar 14, 2012, at 5:06 PM, Peter Hutterer <peter.hutterer at who-t.net> wrote:
> >>> (gdb) print b->num_buttons
> >>> $24 = 7
> >>> (gdb) print b->labels[-1]
> >>> $25 = 498216206336
> >>> Based on xlsatoms, I'd expect 116-122 as the button atoms:
> >>> 116 Button Left
> >>> 117 Button Middle
> >>> 118 Button Right
> >>> 119 Button Wheel Up
> >>> 120 Button Wheel Down
> >>> 121 Button Horiz Wheel Left
> >>> 122 Button Horiz Wheel Right
> >>> It works when the client is i386/Linux or i386/darwin and fails when x86_64/darwin, ppc/Linux, or ppc64/Linux ... is this possibly an inputproto issue?
> >> libXi or xinput, I suspect. curious though, since it works fine here in
> >> x86_64. Can you bisect libXi from 22.214.171.124 to 1.6.0, I wonder if we
> >> introduced a bug here with the alignments.
> >> 2d638fc37b0dbf28e5c826f74f68ada83a8c3e2b is another possible candidate to
> >> revert and test.
> >> put a breakpoint in libXi's copy_classes on the XIButtonClass case and see
> >> what comes down the wire and where the offset is introduced. if it's already
> >> wrong on the wire, then the server has a bug here.
More information about the xorg-devel