Home > mailing lists

Re: Constant time insertion into highly non-unique indexes - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Constant time insertion into highly non-unique indexes
Date	April 14, 2005 14:35:51
Msg-id	372.1113489340@sss.pgh.pa.us Whole thread Raw
In response to	Constant time insertion into highly non-unique indexes (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: Constant time insertion into highly non-unique
List	pgsql-hackers

Tree view

Simon Riggs <simon@2ndquadrant.com> writes:
> Recent discussions on PERFORM have made me look into some aspects of 
> B-tree index code, especially with regard to bulk loading high volumes
> of data.

Have you read the archived discussions that led up to the current
algorithm?  I don't think it's nearly as bad as you believe.  In
particular, I think you missed the point that the move-or-split
decision is *random* and is therefore made differently by each inserter.
Therefore the probability that the earlier pages get split rises rapidly
as more and more insertions are made --- and it only takes one split
decision to take the pressure off.

It's entirely possible that the 0.99 figure needs some fine tuning.
IIRC, the experiments we did to choose that number were done with
pretty simple test indexes --- probably just int4.  Thinking about
the behavior, it seems plausible that the figure needs to drop as
the number of entries per page drops ... but we have not tested that.
In an int4 index on Intel-ish hardware (MAXALIGN 4), you can fit about
500 entries per page.  So consider a case where the first 2 pages for a
given value are full and the third is half full.  To fill the third
completely will require 250 insertions, by which time there is a very
good chance (more than 90% if I did the math right) that someone will
have decided to split rather than move right at the second page.  After
that the second page fills, and then we are back to the original state
(because the new third page will be half full).  So I claim that in fact
the behavior *is* constant time: most insertions will succeed on either
the second or third page, indefinitely.  However, obviously if there are
only a few values per page, you would get much worse behavior.  (OTOH,
the wider the index values, the lower the probability of exact
duplicates anyway, I'd think, so we may be wasting our time to worry
about the behavior there.)
        regards, tom lane

pgsql-hackers by date:

From: Simon Riggs
Date: 14 April 2005, 12:54:03
Subject: Constant time insertion into highly non-unique indexes

From: Simon Riggs
Date: 14 April 2005, 15:08:15
Subject: Re: Constant time insertion into highly non-unique

Re: Constant time insertion into highly non-unique indexes - Mailing list pgsql-hackers

Previous

Next