Old O'Reilly X books

Pat Kane pekane52 at gmail.com
Thu Aug 26 11:57:27 PDT 2010


The XTS test suite has a bit count routine
<http://cgit.freedesktop.org/xorg/test/xts/tree/xts5/src/lib/bitcount.c>
that has a nice comment to explain what it is doing, could
you bench mark it?


Pat
---

Here is the code:

/*
 * Explanation:
 * First we add 32 1-bit fields to get 16 2-bit fields.
 * Each 2-bit field is one of 00, 01, or 10 (binary).
 * We then add all the two-bit fields to get 8 4-bit fields.
 * These are all one of 0000, 0001, 0010, 0011, or 0100.
 *
 * Now we can do something different, becuase for the first
 * time the value in each k-bit field (k now being 4) is small
 * enough that adding two k-bit fields results in a value that
 * still fits in the k-bit field.  The result is four 4-bit
 * fields containing one of {0000,0001,...,0111,1000} and four
 * more 4-bit fields containing junk (sums that are uninteresting).
 * Pictorially:
 *          n = 0aaa0bbb0ccc0ddd0eee0fff0ggg0hhh
 *       n>>4 = 00000aaa0bbb0ccc0ddd0eee0fff0ggg
 *        sum = 0aaaWWWWiiiiXXXXjjjjYYYYkkkkZZZZ
 * where W, X, Y, and Z are the interesting sums (each at most 1000,
 * or 8 decimal).  Masking with 0x0f0f0f0f extracts these.
 *
 * Now we can change tactics yet again, because now we have:
 *          n = 0000WWWW0000XXXX0000YYYY0000ZZZZ
 *       n>>8 = 000000000000WWWW0000XXXX0000YYYY
 * so     sum = 0000WWWW000ppppp000qqqqq000rrrrr
 * where p and r are the interesting sums (and each is at most
 * 10000, or 16 decimal).  The sum `q' is junk, like i, j, and
 * k above; but it is not necessarry to discard it this time.
 * One more fold, this time by sixteen bits, gives
 *          n = 0000WWWW000ppppp000qqqqq000rrrrr
 *      n>>16 = 00000000000000000000WWWW000ppppp
 * so     sum = 0000WWWW000ppppp000sssss00tttttt
 * where s is at most 11000 and t is it most 100000 (32 decimal).
 *
 * Now we have t = r+p = (Z+Y)+(X+W) = ((h+g)+(f+e))+((d+c)+(b+a)),
 * or in other words, t is the number of bits set in the original
 * 32-bit longword.  So all we have to do is return the low byte
 * (or low 6 bits, but `low byte' is typically just as easy if not
 * easier).
 *
 * This technique is also applicable to 64 and 128 bit words, but
 * 256 bit or larger word sizes require at least one more masking
 * step.
 *
 * Author: Gene Olsen
 */
unsigned int
bitcount(n)
        register unsigned long n;
{

        n = (n & 0x55555555) + ((n >> 1) & 0x55555555);
        n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
        n = (n + (n >> 4)) & 0x0f0f0f0f;
        n += n >> 8;
        n += n >> 16;
        return ((unsigned int) (n & 0xff));
}



On Thu, Aug 26, 2010 at 10:11 AM, Aaron Plattner <aplattner at nvidia.com> wrote:
> On Thu, Aug 26, 2010 at 05:43:53AM -0700, Pat Kane wrote:
>> BTW, while trying to grep my dead trees I noticed that the current X server
>> contains some HAKMEM code, in ./mi/micmap.c I see this hack:
>>     ...
>>     count = (visuals >> 1) & 033333333333;
>>     count = visuals - count - ((count >> 1) & 033333333333);
>>     count = (((count + (count >> 3)) & 030707070707) % 077);  /* HAKMEM 169 */
>>     ...
>
> Hah, awesome.
>
> Apparently that code sucked in 1995 and still sucks today:
> http://compilers.iecc.com/comparch/article/95-07-080
>
> I increased the loop count by two orders of magnitude:
>
> NAIVE:    5.616 sec
> HAKMEM:   7.541 sec
> HAKMEM_P: 8.381 sec
>


More information about the xorg-devel mailing list