Corner cases with GiST n-way splits - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Corner cases with GiST n-way splits
Date
Msg-id 4FABF771.9050304@enterprisedb.com
Whole thread Raw
Responses Re: Corner cases with GiST n-way splits
List pgsql-hackers
GiST page splitting has the peculiarity that it sometimes needs to split 
a single page into more than two pages. It happens rarely in practice, 
but it possible (*). With a bad picksplit function, it happens more often.

While testing with a custom gist opclass with truly evil helper 
functions, I found two corner cases with the current implementation when 
a page is split into many halves:

1. If a page is split into more than 100 pages, you run into the same 
limit of 100 simultaneous lwlocks that Tom Forbes reported with a 
pathological intarray index. This time it's not because we hold locks on 
many different levels, but because of a single split.

2. When the root page is split, there is no parent page to update, so we 
just create a new root page with the downlinks. However, when you split 
a page into a lot of siblings, it's possible that all the downlinks 
don't fit on a single page. The code is prepared for that situation. You 
get an error, when it tries to add more downlinks on a single page than 
fit there.

I'm not sure what to do about these. Neither issue is something you'd 
actually bump into in an index that's doing something useful; there's 
been no user complaints about these.


(*) What follows is an explanation of how a page can be split into more 
than two halves, to help you (and me!) understand this:

In a very pathological case, it's possible for a single insertion to 
cause a page to be split into hundreds of pages. Imagine that you have a 
page full of very small tuples (let's imagine that a page can hold 8 
letters, and ignore all tuple overhead for now):

A B C D E F G H

Now you insert one large tuple on the page, DDDD. picksplit algorithm 
can choose to split this as:

A - B C D E F G H DDDD

The right side is still too large to on a single page, so it's 
iteratively split again:

A - B - C D E F G H DDDD

And again:

A - B - C - D E F G H DDDD

And again:

A - B - C - D - E F G H DDDD

In this example, the page was split into 5 halves, but in reality a page 
can hold many more tuples, and the difference between a small and a 
large tuple can be much greater, so you can end up with many more 
siblings in one split.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Draft release notes complete
Next
From: Bruce Momjian
Date:
Subject: Re: Draft release notes complete