Re: Performance on inserts - Mailing list pgsql-hackers

From Alfred Perlstein
Subject Re: Performance on inserts
Date
Msg-id 20000825085219.V1209@fw.wintelcom.net
In response to Re: Performance on inserts  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Performance on inserts
List pgsql-hackers
* Tom Lane <tgl@sss.pgh.pa.us> [000825 08:25] wrote:
> 
> In other words, we do a linear scan over all the pages containing equal
> key values, in the hope of finding one where we can shoehorn in the new
> entry.  If the keys are roughly equal-sized, that hope is usually vain,
> and we end up splitting the rightmost page that contains any keys
> equal to the target key.  Subsequent inserts of the same key value fill
> first the left-half split page, then the right-half, then split the
> right-half page again and repeat --- but for each insert we first had
> to scan over all the preceding pages containing equal keys.  That means
> inserting N equal keys takes O(N^2) time, for large enough N.
> 
> After thinking about this for a while, I propose a probabilistic
> solution.  Suppose that we add to the above loop a random-number check
> that causes us to fall out of the loop with probability 1/3 or so;
> that is, if the current page is full of the target key, then we move
> right to check the next page with probability 2/3, but with probability
> 1/3 we give up and split the current page to insert the new key.
> 
> With this approach we'd hardly ever move more than, say, 10 pages before
> splitting, so the time to insert a new key is bounded no matter how many
> duplicates it has.
> 
> The reason for using a random decision is that if we always stopped
> after scanning a fixed number of pages, then the space freed up by the
> previous decision to split would always be just out of reach, and we'd
> end up with lots of half-full pages of the same key.  With a random
> decision we will visit and fill both the left and right halves of a
> previously split page.
> 
> Comments?  Is there a better way?  What's the best probability to use?
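
To make sure I follow, here's a toy standalone simulation of the loop
you describe (the page array, constants, and split logic are all made
up for illustration; it's just the probabilistic stop, not the real
nbtree code):

#include <stdio.h>
#include <stdlib.h>

#define MAXPAGES 100000
#define PAGECAP  100            /* entries per page */

static int fill[MAXPAGES];      /* entries currently on each page */
static int npages = 1;

/* Split page p in half, shifting the following pages right. */
static void
split_page(int p)
{
    int i;
    int left = fill[p] / 2;

    for (i = npages; i > p + 1; i--)
        fill[i] = fill[i - 1];
    npages++;
    fill[p + 1] = fill[p] - left;
    fill[p] = left;
}

int
main(void)
{
    int  ninserts = 1000000;    /* equal-keyed entries to insert */
    long scanned = 0;
    int  n, p;

    for (n = 0; n < ninserts; n++)
    {
        for (p = 0; fill[p] >= PAGECAP; )
        {
            /* Page is full: split here with probability 1/3, or
             * unconditionally if it's the last page for this key;
             * otherwise move right and keep looking. */
            if (p == npages - 1 || rand() % 3 == 0)
            {
                split_page(p);
                break;
            }
            p++;
            scanned++;
        }
        fill[p]++;
    }
    printf("%d pages, %.1f%% average fill, %.2f full pages scanned per insert\n",
           npages, 100.0 * ninserts / ((double) npages * PAGECAP),
           (double) scanned / ninserts);
    return 0;
}

With a stop probability of 1/3 the walk length is geometric: on
average you visit 3 full pages before splitting, and the chance of
walking more than 10 is (2/3)^10, a bit under 2%.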


I'm not sure if it's possible, but somehow storing the last place one
'gave up' and decided to split the page could offer a useful starting
point for the next insert; something like the sketch below.  Sort of
amortizing the split work amongst multiple requests.  It also looks
like your algorithm might cause problems because it more or less
gives up after 10 pages or so?
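
The kind of thing I mean is a small hint table like this (completely
hypothetical, the names are made up, and the hard part of locking and
invalidating hints when pages move is hand-waved away):

/* Hypothetical per-key "last split" hint cache: remember where the
 * previous insert of a key gave up and split, and start the next
 * scan there instead of at the first page of the key range. */
typedef unsigned int BlockNumber;

typedef struct
{
    unsigned int keyhash;    /* hash of the index key */
    BlockNumber  lastsplit;  /* page we split for this key last time */
} SplitHint;

#define NHINTS 1024
static SplitHint hints[NHINTS];

static BlockNumber
hint_start(unsigned int keyhash, BlockNumber firstpage)
{
    SplitHint *h = &hints[keyhash % NHINTS];

    return (h->keyhash == keyhash) ? h->lastsplit : firstpage;
}

static void
hint_remember(unsigned int keyhash, BlockNumber splitpage)
{
    SplitHint *h = &hints[keyhash % NHINTS];

    h->keyhash = keyhash;
    h->lastsplit = splitpage;
}

Of course a stale hint would have to be detected and fall back to the
normal scan, which may be why it isn't straightforward.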
hope this helps,
-Alfred

