Re: Brain dump: btree collapsing - Mailing list pgsql-hackers
From | Curtis Faith |
---|---|
Subject | Re: Brain dump: btree collapsing |
Date | |
Msg-id | 001401c2d389$b6db2270$a200a8c0@curtislaptop Whole thread Raw |
In response to | Re: Brain dump: btree collapsing (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Brain dump: btree collapsing
|
List | pgsql-hackers |
tom lane wrote: > Got any evidence to back that up? I was relying on > > [Johnson89] Johnson, T. and Shasha, D. Utilization of > B-trees with Inserts, Deletes and Modifies ACM Symp. on > PODS, 235-246, 1989. > > which provides a ton of math and simulations leading up to > the conclusion that collapsing btree pages before they are > fully empty doesn't really gain anything meaningful in terms > of storage utilization, in most scenarios. Well, I don't have that specific paper handy but I do have [JS93] Theodore Johnson , Dennis Shasha, "B-trees with inserts and deletes: why free-at-empty is better than merge-at-half" which appears to be their later thinking on the same subject. Note the following: "A merge-at-half B-tree will always have a space utilization of at least 50%. When all operations are modify operations, or when the number of insert operations is the same as the number of delete operations, then the utilization will be about 60%. In contrast, a free-at-empty B-tree has a 0% lower bound on its space utilization, and will have about 39% utilization on a pure-modify instruction mix. However, the space utilization of a free-at-empty B-tree remains high if there are just a few more insert operations than delete operations. Thus, merge-at-half usually buys little in terms of space utilization. In Figure 6, we showed that the restructuring rate of a merge-at-half B-tree is significantly larger than the restructuring rate of a free-at-empty B-tree for all values of q * :1. For many concurrent B-tree algorithms used in practice [4, 13], restructuring causes a serialization bottleneck. Thus, one simple but important way to increase concurrency in B-tree operations is to reduce the probability of restructuring. Since merge-at-half buys little space utilization but is expensive in terms of restructuring, we recommend that B-trees (especially concurrently accessed ones) use free-at-empty." I don't dispute their conclusions in that context and under the circumstances they outline of random distribution of deletion and insertion values for the index keys. However, as [Jannink]: "Implementing Deletion in B+-trees. SIGMOD RECORD, v.24, n.1, p.33-38, 1995" points out that assumption doesn't hold under other possibly common circumstances, specifically circumstances where the deletes are taking place in significant sections of the index at a much higher rate than inserts. Consider from [Jannink95]: "There has been some research on the acceptability of relaxing the constraint of minimum node size to reduce the number of so-called unsafe tree operations, i.e., those which contain node splitting and merging [ZH89]. The debate has culminated in analysis of a weaker form of the deletion algorithm which we call lazy deletion, that imposes no constraints on the number of entries left in the nodes, allowing them to empty completely before simply removing them. According to [GR93], most database system implementations of B+-trees have adopted this approach. Its most effective use is when it is vital to allow concurrent access to the tree [JS93b], and excessive splitting and merging of nodes would restrict concurrency. [JS89] derives some analytic solutions calculating memory utilization for B+-trees under a mix of insertions and lazy deletions, based on previous research which considered insertions only [BY89]. The simulations in [JS89] support its analysis to show that in typical situations, where deletions don't outnumber insertions in the mix of operations, the tree nodes will contain acceptable percentages of entries. One of the work's assumptions [JS93a] is that the keys and tree operations are chosen uniformly from a random distribution. This assumption is unreasonable in certain realistic situations such as one described below. Allowing interior nodes with only a single pointer to exist in a B+-tree creates the possibility for arbitrarily unbalanced trees of any height, which are virtually empty, and in which access times have degenerated from the logarithmic bound B+-trees are meant to guarantee to a worst case unbounded access time. Since nodes are not removed until they are completely empty, the lazy deletion algorithm does not regulate tree height effectively." Jannink then illustrates an example where an index is created based on a timestamp where the basic assumption of Johnson and Sasha does not hold since it is not a random distribution but a monotonically increasing value. His example is an extreme one but I believe there are many instances where a timestamp, sequence or some other monotonically increasing value is used in an index and where deletes are taking place much more frequently for largely older values. Since sequences are often used as foreign keys a significant number of indexes fit into this category. Consider a web site that tracked users and that deleted inactive accounts. There are many real-world scenarios where the number of inactive accounts is very high as a percentage of all accounts. Consider an example where 50% of the accounts are inactive and deleted after 90 days of disuse and 90% are inactive after 180 days. If the Account is implemented with an ID based on a sequence, any tables that referenced the inactive account's sequence based ID via foreign keys would have index entries that would be sparsely populated for the older entries but might not have a high percentage of completely empty index pages. The older, but still active accounts would keep the index pages from being completely empty in many cases. Further, Johnson and Sasha make the claim that "free-at-empty is better" in the specific context of restructuring causing a serialization bottleneck. I don't think this applies to the specific case I recommended where the parent is the same for both leaves, especially during a VACUUM process where presumably we can optimize for concurrency rather than the speed of VACUUM itself. - Curtis
pgsql-hackers by date: