Re: Performance on inserts - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Performance on inserts
Date
Msg-id 9612.967244422@sss.pgh.pa.us
Whole thread Raw
In response to Re: Performance on inserts  (Alfred Perlstein <bright@wintelcom.net>)
Responses Re: Performance on inserts  (Jules Bean <jules@jellybean.co.uk>)
List pgsql-hackers
Alfred Perlstein <bright@wintelcom.net> writes:
> A slightly incorrect startscan point is better than starting
> from the beginning every time.

Not for lookups, it isn't ;-).

I did think about making the btree-descending code act differently
for inserts than lookups, but AFAICS that wouldn't buy anything for
this problem, since until you get down to the leaf level you can't
see where there's free space anyway.  You'd simply be exchanging the
problem of missing free space that's too far to the right for the
problem of missing free space that's to the left of where you chose
to start looking.

The algorithm does seem to work quite nicely just the way I described
it, although it turns out I was off base about a good probability
setting.  I find that something up around 0.99 seems to be good.
Using the same (perhaps overly simplistic) test case:

# tuples inserted        6.5        current+random hack @ 0.99        Time    index size    Time    index size
1536            <1sec    90112        <1sec    106496
3072            1.56    163840        <1sec    188416
6144            3.70    286720        1.40    376832
12288            9.73    532480        2.65    688128
24576            93.26    1024000        5.22    1368064
49152            363.23    2007040        10.34    2727936
98304                        22.07    5545984
196608                        45.60    11141120
393216                        92.53    22290432

I tried probabilities from 0.67 to 0.999 and found that runtimes didn't
vary a whole lot (though this is near the minimum), while index size
consistently got larger as the probability of moving right decreased.
The runtime is nicely linear throughout the range.

The index size increase might look like a bit of a jump, but actually
this is right where we want it to be.  The old code was effectively
packing each page as full as it could be under these conditions.  That's
not what you want for a btree.  Steady-state load factor for btrees is
usually quoted as somewhere around 70%, and this method manages to
approximate that pretty well with a move-right probability of 0.99 or
a bit less.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Vince Vielhaber
Date:
Subject: Pure ODBMS (fwd)
Next
From: Thomas Lockhart
Date:
Subject: Re: Performance on inserts