Old O'Reilly X books
Pat Kane
pekane52 at gmail.com
Thu Aug 26 11:57:27 PDT 2010
The XTS test suite has a bit count routine
<http://cgit.freedesktop.org/xorg/test/xts/tree/xts5/src/lib/bitcount.c>
that has a nice comment to explain what it is doing, could
you bench mark it?
Pat
---
Here is the code:
/*
* Explanation:
* First we add 32 1-bit fields to get 16 2-bit fields.
* Each 2-bit field is one of 00, 01, or 10 (binary).
* We then add all the two-bit fields to get 8 4-bit fields.
* These are all one of 0000, 0001, 0010, 0011, or 0100.
*
* Now we can do something different, becuase for the first
* time the value in each k-bit field (k now being 4) is small
* enough that adding two k-bit fields results in a value that
* still fits in the k-bit field. The result is four 4-bit
* fields containing one of {0000,0001,...,0111,1000} and four
* more 4-bit fields containing junk (sums that are uninteresting).
* Pictorially:
* n = 0aaa0bbb0ccc0ddd0eee0fff0ggg0hhh
* n>>4 = 00000aaa0bbb0ccc0ddd0eee0fff0ggg
* sum = 0aaaWWWWiiiiXXXXjjjjYYYYkkkkZZZZ
* where W, X, Y, and Z are the interesting sums (each at most 1000,
* or 8 decimal). Masking with 0x0f0f0f0f extracts these.
*
* Now we can change tactics yet again, because now we have:
* n = 0000WWWW0000XXXX0000YYYY0000ZZZZ
* n>>8 = 000000000000WWWW0000XXXX0000YYYY
* so sum = 0000WWWW000ppppp000qqqqq000rrrrr
* where p and r are the interesting sums (and each is at most
* 10000, or 16 decimal). The sum `q' is junk, like i, j, and
* k above; but it is not necessarry to discard it this time.
* One more fold, this time by sixteen bits, gives
* n = 0000WWWW000ppppp000qqqqq000rrrrr
* n>>16 = 00000000000000000000WWWW000ppppp
* so sum = 0000WWWW000ppppp000sssss00tttttt
* where s is at most 11000 and t is it most 100000 (32 decimal).
*
* Now we have t = r+p = (Z+Y)+(X+W) = ((h+g)+(f+e))+((d+c)+(b+a)),
* or in other words, t is the number of bits set in the original
* 32-bit longword. So all we have to do is return the low byte
* (or low 6 bits, but `low byte' is typically just as easy if not
* easier).
*
* This technique is also applicable to 64 and 128 bit words, but
* 256 bit or larger word sizes require at least one more masking
* step.
*
* Author: Gene Olsen
*/
unsigned int
bitcount(n)
register unsigned long n;
{
n = (n & 0x55555555) + ((n >> 1) & 0x55555555);
n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
n = (n + (n >> 4)) & 0x0f0f0f0f;
n += n >> 8;
n += n >> 16;
return ((unsigned int) (n & 0xff));
}
On Thu, Aug 26, 2010 at 10:11 AM, Aaron Plattner <aplattner at nvidia.com> wrote:
> On Thu, Aug 26, 2010 at 05:43:53AM -0700, Pat Kane wrote:
>> BTW, while trying to grep my dead trees I noticed that the current X server
>> contains some HAKMEM code, in ./mi/micmap.c I see this hack:
>> ...
>> count = (visuals >> 1) & 033333333333;
>> count = visuals - count - ((count >> 1) & 033333333333);
>> count = (((count + (count >> 3)) & 030707070707) % 077); /* HAKMEM 169 */
>> ...
>
> Hah, awesome.
>
> Apparently that code sucked in 1995 and still sucks today:
> http://compilers.iecc.com/comparch/article/95-07-080
>
> I increased the loop count by two orders of magnitude:
>
> NAIVE: 5.616 sec
> HAKMEM: 7.541 sec
> HAKMEM_P: 8.381 sec
>
More information about the xorg-devel
mailing list