[RFC][PATCH] Make GetXIDRange O(1) instead of O(N^2)

Roberto Ragusa mail at robertoragusa.it
Wed May 29 14:10:57 PDT 2013


On 05/28/2013 08:58 PM, Keith Packard wrote:
> Roberto Ragusa <mail at robertoragusa.it> writes:
> 
>> This patch, in addition to raising the max hash size from 2^11 to 2^16 (2^11 was decided in 1994, or maybe even 1987), adds explicit tracking of free ids. The data structure I've chosen is an rb-tree of free extents.
> 
> I'm not excited at the notion of adding another data structure solely to speed this fairly uncommon operation.
> 
> Given that we can extract the necessary information from the existing data structure, here are a couple of brief ideas:
> 
> 1) Cache just a few free XID ranges; when XIDs are freed, you can extend or merge them together. This would make the operations cheaper at least, and might cover the bulk of the performance problems you found.

This would only delay the issue until the moment there is nothing cached
anymore; at that point you have to solve the exact same "find free id" problem.
I hate it when performance is good at first and then mysteriously degrades;
this entire investigation started when I noticed my session was snappy
for a few days and then degenerated to slowish and finally unusable.
I DO keep X sessions open for months on my laptop.

> 2) Delay constructing the new data structure until the application actually runs out of IDs. Once that happens, you could turn on the free ID tracking. I can imagine building this incrementally by finding two free ranges each time the client asks for one and adding them both to the new data structure until you fail to find any more.

But finding still remains a problem. What are you really improving
with respect to the current situation, which is "find one range and return it"?
The fundamental problem is that the hash structure is completely
anti-range; it scatters consecutive XIDs across the buckets (in fact, I briefly
considered not using the last bits of the IDs in the hash calculation).
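
To make "anti-range" concrete: with a bucket hash keyed on the XID, the only
way to know whether an id is free is to probe its bucket, so checking a
candidate range costs one probe per id. A minimal sketch (LookupIdInHash is a
placeholder, not the actual resource.c code):

#include <stdint.h>

typedef uint32_t XID;            /* stand-in for the server's XID type */

int LookupIdInHash(XID id);      /* placeholder: is this id in use?    */

/* No ordering to exploit: testing whether [lo, hi] is entirely free
 * needs one hash probe per id, which is what makes the current
 * GetXIDRange scan so expensive. */
static int
RangeIsFree(XID lo, XID hi)
{
    XID id;

    for (id = lo; id <= hi; id++)
        if (LookupIdInHash(id))
            return 0;
    return 1;
}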

> 3) Create an entirely new data structure for the whole resource DB that would make finding free extents less expensive. I'd suggest a skip-list as that makes in-order traversal really cheap (to find a free extent) while also reducing the average per-node cost down close to one pointer.

This is something I tried to explore too.
Fast lookup is essential, I think, and the current hash is almost unbeatable there.
The problem is very similar to free-space tracking in filesystems.
The options I considered were: a bitmap (fast insert/delete, bad search,
huge memory usage!) or a tree. I avoided the extra complexity of a btree and
settled on an rbtree whose nodes are extents (as ranges are exactly what
we are interested in). A btree of extents could be more memory-efficient,
as the payload of a node is really small (an extent is just 2 ints).
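
To be concrete about the payload (field names here are illustrative, not
necessarily what the patch uses), each node stores one free extent as a
closed interval, ordered by its start, plus the usual rb-tree plumbing:

#include <stdint.h>

/* Sketch of a free-extent node. */
typedef struct FreeExtent {
    uint32_t start;              /* first free XID of the extent    */
    uint32_t end;                /* last free XID (closed interval) */
    struct FreeExtent *left;     /* usual rb-tree pointers          */
    struct FreeExtent *right;
    struct FreeExtent *parent;
    int color;                   /* red/black                       */
} FreeExtent;

Depending on how the rb-tree implementation packs the colour and the parent
pointer this comes to a few tens of bytes per node (the memory figures below
assume 36 bytes per node).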

> In any case, what we really need is credible performance data to use to compare the old and new implementation. I think we need data that shows:
> 
> 1) 'real' application performance. The cairo performance stuff using Render is probably a good way to measure this.

I do not have an application to stress-test this.
Please point me to any interesting tools; I want to stress-test the
correctness of the implementation too.

> 2) Memory usage. How much additional memory does this use, and, ideally, how much additional memory is getting traversed when manipulating the resource data base. Measuring the X server memory usage in a couple of situations with and without the changes should suffice.

I have my normal desktop running here on the modified code, so I can show you
how big the structure gets in some significant cases.
(two days of uptime, 94000 seconds of operation, because I suspend during the night)

----------

This is a "normal" client, gkrellm running for two days.
(xrestop line, _complete_ rbtree, rbtree statistics)

4000000     4   35    0  448   50     1576K      2K   1578K  3036 gkrellm

[ 93708.434] (II)                     <44000005:7fffffff>
[ 93708.434] (II)                 041a6da4:43ffffff
[ 93708.434] (II)                     <04196abf:041a6da2>
[ 93708.434] (II)             04196aba:04196abd
[ 93708.434] (II)                     <04196aa8:04196aa9>
[ 93708.434] (II)                 040127db:04196aa1
[ 93708.434] (II)         <040002c3:040127d9>
[ 93708.434] (II)                 040002b3:040002b4
[ 93708.434] (II)             040002a7:040002ac
[ 93708.434] (II)                     <04000297:04000298>
[ 93708.434] (II)                 0400024d:0400028e
[ 93708.434] (II)     0400024a:0400024b
[ 93708.434] (II)                 04000242:04000243
[ 93708.434] (II)             0400021d:0400021e
[ 93708.434] (II)                 04000207:0400020a
[ 93708.434] (II)         <040001f3:040001f4>
[ 93708.434] (II)                 040001dc:040001df
[ 93708.434] (II)             040001c4:040001c7
[ 93708.434] (II)                 040001bb:040001bb
[ 93708.434] (II) 040001b0:040001b9
[ 93708.434] (II)                 040001ac:040001ac
[ 93708.434] (II)             04000194:0400019b
[ 93708.434] (II)                 04000187:0400018b
[ 93708.434] (II)         04000184:04000185
[ 93708.434] (II)                 04000180:04000180
[ 93708.434] (II)                     <0400017a:0400017b>
[ 93708.434] (II)             04000168:0400016f
[ 93708.434] (II)                 04000148:0400014f
[ 93708.434] (II)     <04000128:0400012f>
[ 93708.434] (II)                 04000108:0400010f
[ 93708.434] (II)             040000e8:040000ef
[ 93708.434] (II)                 040000c8:040000cf
[ 93708.434] (II)         040000a8:040000af
[ 93708.434] (II)                 04000088:0400008f
[ 93708.434] (II)             04000069:0400006f
[ 93708.434] (II)                     04000064:04000064
[ 93708.434] (II)                         <04000051:04000052>
[ 93708.434] (II)                 <04000028:04000028>
[ 93708.434] (II)                     00000000:03ffffff
[ 93708.434] (II)   node count:39
[ 93708.434] (II)      inserts:1730244
[ 93708.434] (II)      deletes:1730205
[ 93708.434] (II)      lookups:5194957
[ 93708.434] (II) lookup_evals:43623819
[ 93708.434] (II)         mark_used:1732009
[ 93708.434] (II)    mark_used_omit:2
[ 93708.434] (II)       mark_unused:1731473
[ 93708.434] (II) maybe_mark_unused:1731473

As you can see, 1.7M inserts/deletes. Each delete makes 3 lookups (to
detect merging conditions), so 5M lookups. About 43M nodes were explored in
total, as the height of the tree is <10, because we only have 39 extents.
The memory used is thus 39*sizeof(rbtree node), which is 39*36 ~= 1.4KiB
on a 64-bit arch.
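
For reference, the free path looks roughly like this (a sketch reusing the
FreeExtent node from above; rb_find, rb_insert and rb_delete stand in for the
real tree primitives). The three lookups per delete would then be the id
itself plus its two neighbours:

/* rb_find(x) is assumed to return the extent containing x, or NULL. */
FreeExtent *rb_find(uint32_t id);
void rb_insert(uint32_t start, uint32_t end);
void rb_delete(FreeExtent *node);

static void
MarkUnused(uint32_t id)
{
    if (rb_find(id))                     /* already free: nothing to do    */
        return;

    FreeExtent *below = rb_find(id - 1); /* extent ending just below id?   */
    FreeExtent *above = rb_find(id + 1); /* extent starting just above id? */

    if (below && above) {                /* id bridges two extents: merge  */
        below->end = above->end;
        rb_delete(above);
    } else if (below) {
        below->end = id;                 /* grow the lower neighbour       */
    } else if (above) {
        above->start = id;               /* grow the upper neighbour       */
    } else {
        rb_insert(id, id);               /* isolated id: new 1-wide extent */
    }
}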

----------

This is a HEAVY client, firefox running for two days with 625 fully loaded
tabs (yes, 625).

(xrestop line, _partial_ rbtree, rbtree statistics)

6a00000   102   54    1 1210 1887    85194K     48K  85243K  8779 Schrödinger's �~_~X� and outside-the-box naming [LWN.net] - Mozilla Firefox

...
[ 94023.210] (II)                                         06a13524:06a13556
[ 94023.210] (II)                                     06a13517:06a13522
[ 94023.210] (II)                                             <06a134f5:06a13514>
[ 94023.210] (II)                                         06a134d1:06a134f3
[ 94023.210] (II)             06a1349d:06a134ce
[ 94023.210] (II)                                         <06a13441:06a1349b>
[ 94023.210] (II)                                     06a1341d:06a1343f
[ 94023.210] (II)                                         <06a13417:06a13418>
[ 94023.210] (II)                                 06a133f6:06a13415
[ 94023.210] (II)                                         06a133e3:06a133f4
[ 94023.210] (II)                                     <06a1332f:06a133e0>
[ 94023.210] (II)                                         06a1331c:06a1332d
[ 94023.210] (II)                             06a132cd:06a13319
...
[ 94023.211] (II)   node count:1429
[ 94023.211] (II)      inserts:13679868
[ 94023.211] (II)      deletes:13678439
[ 94023.211] (II)      lookups:89129915
[ 94023.211] (II) lookup_evals:1182562960
[ 94023.211] (II)         mark_used:29572199
[ 94023.211] (II)    mark_used_omit:60
[ 94023.211] (II)       mark_unused:29568982
[ 94023.211] (II) maybe_mark_unused:29569005

With 13M inserts/deletes here, we end up with 1429 nodes, so about 50KiB of
memory for the structure.
The average height of the tree appears to be around 13.

----------

Finally, here is the client that triggered this whole investigation: the
plasma-desktop systray, which is being spammed by psi with flashing icons
(either psi or plasma-desktop is leaking pixmaps).

(xrestop line, _partial_ rbtree, rbtree statistics)
            pixmaps------------+
                               v
1600000    65   62    1   90 211194    10738K   4953K  15692K  3266 plasma-desktop

[ 94684.741] (II)                                                                                                 4160001d:4160001f
[ 94684.741] (II)                                                                                             4160001b:4160001b
[ 94684.741] (II)                                                                                                 41600016:41600016
[ 94684.741] (II)                                                                                 41600006:41600006
[ 94684.741] (II)                                                                                                     017ffff9:415fffff
[ 94684.741] (II)                                                                                                     <017fffe9:017ffff7>
[ 94684.741] (II)                                                                                                     017fffd9:017fffe7
[ 94684.741] (II)                                                                                                     017fffc9:017fffd7
[ 94684.741] (II)                                                                                                     017fffb9:017fffc7
[ 94684.741] (II)                                                                                                     <017fffa9:017fffb7>
...
[ 94685.003] (II)                                                             01740695:017406a3
[ 94685.003] (II)                                                                 01740685:01740693
[ 94685.003] (II)                                             01740675:01740683
[ 94685.003] (II)                                                                 01740665:01740673
[ 94685.003] (II)                                                             01740655:01740663
[ 94685.003] (II)                                                                 01740645:01740653
[ 94685.003] (II)                                                         01740635:01740643
[ 94685.003] (II)                                                                 01740625:01740633
[ 94685.003] (II)                                                             01740615:01740623
[ 94685.003] (II)                                                                 01740605:01740613
[ 94685.003] (II)                                                     017405f5:01740603
[ 94685.003] (II)                                                                 017405e5:017405f3
[ 94685.003] (II)                                                             017405d5:017405e3
[ 94685.003] (II)                                                                 017405c5:017405d3
[ 94685.003] (II)                                                         017405b5:017405c3
...
[ 94685.727] (II)                                                                 0160005c:01600060
[ 94685.727] (II)                                                         01600054:01600058
[ 94685.727] (II)                                                                 0160004c:01600050
[ 94685.727] (II)                                                             01600044:01600048
[ 94685.727] (II)                                                                 0160003c:01600040
[ 94685.727] (II)                                                     01600034:01600038
[ 94685.727] (II)                                                                 01600026:0160002d
[ 94685.727] (II)                                                             01600021:01600021
[ 94685.727] (II)                                                                 0160001f:0160001f
[ 94685.727] (II)                                                         01600011:01600019
[ 94685.727] (II)                                                                 0160000f:0160000f
[ 94685.727] (II)                                                             01600005:01600007
[ 94685.727] (II)                                                                 00000000:015fffff


[ 94685.727] (II)   node count:185743
[ 94685.727] (II)      inserts:1480748
[ 94685.727] (II)      deletes:1295005
[ 94685.727] (II)      lookups:9906008
[ 94685.727] (II) lookup_evals:267201246
[ 94685.727] (II)         mark_used:3352784
[ 94685.727] (II)    mark_used_omit:456
[ 94685.727] (II)       mark_unused:3143170
[ 94685.727] (II) maybe_mark_unused:3143620

We have a pathological case here, with 185,000 nodes (that is about 6.5MiB).
The tree height is about 26.
The available XIDs are 0x01600000-0x017fffff.
As you can see from the central part of the tree structure, there is a leak:
for every 16 allocated XIDs, only 15 are freed.
The client kept incrementing its XIDs, leaving a lot of holes behind across
the whole range; then the 0x017fffff limit was reached and allocation from
the free XID ranges is now happening.
The log says, for example:

[ 94683.640] (II) find min=01600000 max=017fffff
[ 94683.640] (II) looking up=01600000:017fffff (over)
[ 94683.640] (II) find returned 016bc194:016bc19a
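
With the extent tree, answering such a request is just one ordered lookup plus
a clamp; roughly (again a sketch, not the patch itself; rb_first_ending_at_or_after
is a placeholder for "leftmost extent whose end is >= min"):

FreeExtent *rb_first_ending_at_or_after(uint32_t min);

/* Find a free range inside [min, max] by walking to the first extent
 * that can overlap it and clamping, instead of probing the hash for
 * every candidate id. */
static int
FindFreeRange(uint32_t min, uint32_t max,
              uint32_t *range_start, uint32_t *range_end)
{
    FreeExtent *e = rb_first_ending_at_or_after(min);    /* O(tree height) */

    if (!e || e->start > max)
        return 0;                        /* no free ids left in [min, max] */

    *range_start = (e->start < min) ? min : e->start;
    *range_end   = (e->end   > max) ? max : e->end;
    return 1;
}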

> 3) Performance of applications running out of XIDs to know whether the change actually fixes the problem. Have you already constructed some way of measuring this?

This is easy.
Let psi flash for a few hours. When the number of pixmaps reaches 200,000,
the unpatched X server is so sluggish as to be unusable. The patched server
works well even with 800,000 pixmaps (I didn't happen to have more).


> However, while the protocol requires that IDs be unique across all resource types, the server implementation does not. This allows the X server to associate other data with specific IDs and have that freed automatically. Looking up IDs always involves the type or class of the ID, so code can be assured of getting the right data structure back.

Then I suppose it happens at the server level.
I just see the calls to AddResource and FreeResource, so I was probably wrong
in implying this is caused by clients.

Have a look at mark_used_omit, mark_unused and maybe_mark_unused in the
statistics I pasted above.

Thanks for your comments.
-- 
   Roberto Ragusa    mail at robertoragusa.it


