Thread: Performance of PostgreSQL B+-tree algorithm

Performance of PostgreSQL B+-tree algorithm

From
Kyle Lanclos
Date:
I spent some time last week staring at the code for the PostgreSQL
B+-tree implementation. What I hoped to find, and was not immediately
able to determine, was the Knuth order for the PostgreSQL B+-tree
implementation. It is entirely possible that I simply got lost in the
wrong C file.

My goal is to make an informed assertion about the performance of
a PostgreSQL B+-tree index as the quantity of records in our database
grows more or less unbounded.

To use a common reference, wikipedia states the following:

    Bayer & McCreight (1972), Comer (1979), and
    others define the order of B-tree as the
    minimum number of keys in a non-root node.
    Folk & Zoellick (1992) points out that terminology
    is ambiguous because the maximum number of keys
    is not clear. An order 3 B-tree might hold a
    maximum of 6 keys or a maximum of 7 keys.
    (Knuth 1998, p. 483) avoids the problem by defining
    the order to be maximum number of children (which
    is one more than the maximum number of keys).

    http://en.wikipedia.org/wiki/B-tree

I would be happy to refer to an academic publication if it contains a
clear analysis of the PostgreSQL B+-tree implementation.

Thanks much,

--Kyle

Re: Performance of PostgreSQL B+-tree algorithm

From
Tom Lane
Date:
Kyle Lanclos <lanclos@ucolick.org> writes:
> I spent some time last week staring at the code for the PostgreSQL
> B+-tree implementation. What I hoped to find, and was not immediately
> able to determine, was the Knuth order for the PostgreSQL B+-tree
> implementation. It is entirely possible that I simply got lost in the
> wrong C file.

> My goal is to make an informed assertion about the performance of
> a PostgreSQL B+-tree index as the quantity of records in our database
> grows more or less unbounded.

> To use a common reference, wikipedia states the following:

>     Bayer & McCreight (1972), Comer (1979), and
>     others define the order of B-tree as the
>     minimum number of keys in a non-root node.
>     Folk & Zoellick (1992) points out that terminology
>     is ambiguous because the maximum number of keys
>     is not clear. An order 3 B-tree might hold a
>     maximum of 6 keys or a maximum of 7 keys.
>     (Knuth 1998, p. 483) avoids the problem by defining
>     the order to be maximum number of children (which
>     is one more than the maximum number of keys).

Well, that would depend on the data type being indexed, which you did
not specify; and if it's a variable-length type then it's really hard to
give a concrete answer.  For integer or int8 keys the answer is
typically about 400, though, depending on whether you're talking about
a 32- or 64-bit platform.  Basically it's 4 bytes for line pointer,
plus 8 bytes for index tuple header, plus maxalign'ed size of the index
key, divided into page size (less a couple dozen bytes for page header).
You could increase the result by building with a page size of more than
the default 8K, though I've seen no recent experiments suggesting that
doing so is likely to be a win.

            regards, tom lane

Re: Performance of PostgreSQL B+-tree algorithm

From
Kyle Lanclos
Date:
Tom Lane wrote:
> Well, that would depend on the data type being indexed, which you did
> not specify; and if it's a variable-length type then it's really hard to
> give a concrete answer.

Thanks for the quick reply; I did not appreciate that the Knuth order
would vary according to the data being indexed.

In my specific case, I have an index on (text, double). There are individual
indexes on (text) and (double) that are of some interest, but the main
interest is the two-column index.

The text column in question typically does not contain values longer than
ten characters.

> Basically it's 4 bytes for line pointer, plus 8 bytes for index tuple
> header, plus maxalign'ed size of the index key, divided into page size
> (less a couple dozen bytes for page header).

So, it is the size of the index key that varies depending on the column
type?

> You could increase the result by building with a page size of more than
> the default 8K, though I've seen no recent experiments suggesting that
> doing so is likely to be a win.

I'm thinking it would have to be a very large increase in page size for
it to have an impact. I'm guessing you would also pay a fixed cost
(log (Knuth order)) to traverse a leaf node once you get there. One can
probably produce graphs that show how many records one needs in a database
table before the page size increase starts to make sense.

--Kyle

Re: Performance of PostgreSQL B+-tree algorithm

From
Tom Lane
Date:
Kyle Lanclos <lanclos@ucolick.org> writes:
> Tom Lane wrote:
>> Well, that would depend on the data type being indexed, which you did
>> not specify; and if it's a variable-length type then it's really hard to
>> give a concrete answer.

> In my specific case, I have an index on (text, double). There are individual
> indexes on (text) and (double) that are of some interest, but the main
> interest is the two-column index.
> The text column in question typically does not contain values longer than
> ten characters.

>> Basically it's 4 bytes for line pointer, plus 8 bytes for index tuple
>> header, plus maxalign'ed size of the index key, divided into page size
>> (less a couple dozen bytes for page header).

> So, it is the size of the index key that varies depending on the column
> type?

Yeah.  You could probably safely assume that the text column occupies at
most 16 bytes (and because of the alignment requirement for the double,
it's unlikely to be much less either).  So that gives 4+8+16+8 = 36
bytes per index entry for this case, so you could expect to fit at least
220 or so entries per index page.

BTW, I'm unsure that that's a representative number in practice.  The
traditional wisdom for btree indexes on changing data is that the fill
factor averages only around 2/3rds, which would mean you'd really find
maybe 150 or so entries on a typical index page.  Not sure if the model
you're planning to use accounts for index slack space separately or not.

            regards, tom lane