Re: GIN improvements part 1: additional information - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: GIN improvements part 1: additional information |
Date | |
Msg-id | 524DBAEA.9080908@vmware.com |
In response to | Re: GIN improvements part 1: additional information (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: GIN improvements part 1: additional information |
List | pgsql-hackers |
On 23.09.2013 18:35, Bruce Momjian wrote:
> On Sun, Sep 15, 2013 at 01:14:45PM +0400, Alexander Korotkov wrote:
>> On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<hlinnakangas@vmware.com>
>> wrote:
>>
>> There's a few open questions:
>>
>> 1. How are we going to handle pg_upgrade? It would be nice to be able to
>> read the old page format, or convert on-the-fly. OTOH, if it gets too
>> complicated, might not be worth it. The indexes are much smaller with the
>> patch, so anyone using GIN probably wants to rebuild them anyway, sooner or
>> later. Still, I'd like to give it a shot.
>
> We have broken pg_upgrade index compatibility in the past.
> Specifically, hash and GIN index binary format changed from PG 8.3 to
> 8.4. I handled it by invalidating the indexes and providing a
> post-upgrade script to REINDEX all the changed indexes. The user
> message is:
>
> Your installation contains hash and/or GIN indexes. These indexes have
> different internal formats between your old and new clusters, so they
> must be reindexed with the REINDEX command. The file:
>
> ...
>
> when executed by psql by the database superuser will recreate all invalid
> indexes; until then, none of these indexes will be used.
>
> It would be very easy to do this from a pg_upgrade perspective.
> However, I know there has been complaints from others about making
> pg_upgrade more restrictive.
>
> In this specific case, even if you write code to read the old file
> format, we might want to create the REINDEX script to allow _optional_
> reindexing to shrink the index files.
>
> If we do require the REINDEX, --check will clearly warn the user that
> this will be required.

It seems we've all but decided that we'll require reindexing GIN indexes
in 9.4. Let's take the opportunity to change some other annoyances with
the current GIN on-disk format:

1. There's no explicit "page id" field in the opaque struct, like there
is in other index types. Such a field is for the benefit of debugging
tools like pg_filedump. We've managed to tell GIN pages apart from other
index types by the fact that the special size of GIN pages is 8 and it's
not using all the high-order bits in the last byte on the page. But an
explicit page id field would be nice, so let's add that.

2. I'd like to change the way "incomplete splits" are handled.
Currently, WAL recovery keeps track of incomplete splits, and fixes any
that remain at the end of recovery. That concept is slightly broken;
it's not guaranteed that after you've split a leaf page, for example,
you will succeed in inserting the downlink into its parent. You might
e.g. run out of disk space. To fix that, I'd like to add a flag to the
page header to indicate whether the split has been completed, i.e.
whether the page's downlink has been inserted into the parent, and fix
incomplete splits lazily on the next insert. I did a similar change to
GiST back in 9.1. (Strictly speaking this doesn't require changing the
on-disk format, though.)

3. I noticed that the GIN b-trees, the main key entry tree and the
posting trees, use a slightly different arrangement of the downlink than
our regular nbtree code does. In nbtree, the downlink for a page is the
*low* key of that page, i.e. if the downlink is 10, all the items on
that child page must be >= 10. But in GIN, we store the *high* key in
the downlink, i.e. all the items on the child page must be <= 10. That
makes inserting new downlinks at a page split slightly more complicated.
For example, when splitting a page containing keys between 1-10 into 1-5
and 5-10, you need to insert a new downlink with key 10 for the new
right page, and also update the existing downlink to 5. The nbtree code
doesn't require updating existing entries.

Anything else?

- Heikki
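
To make the downlink difference in point 3 concrete, here is a minimal Python sketch. It is not PostgreSQL code: the parent page is modeled as a plain list of (key, child) tuples, and the function names, page labels and example keys are hypothetical. It only shows that a split under the low-key convention adds one parent entry, while a split under the high-key convention adds one entry and also rewrites an existing one.

```python
def split_low_key(parent, pos, right_low_key, right_page):
    """Low-key convention (as described for nbtree): each downlink holds
    the lowest key of its child. A split only inserts a new downlink for
    the new right page; the existing entry is left untouched."""
    new_parent = list(parent)
    new_parent.insert(pos + 1, (right_low_key, right_page))
    return new_parent


def split_high_key(parent, pos, left_high_key, right_page):
    """High-key convention (as described for GIN): each downlink holds
    the highest key of its child. A split must rewrite the existing
    downlink with the left page's new high key, and insert a downlink
    carrying the old high key for the new right page."""
    new_parent = list(parent)
    old_high_key, left_page = new_parent[pos]
    new_parent[pos] = (left_high_key, left_page)            # update existing entry
    new_parent.insert(pos + 1, (old_high_key, right_page))  # insert new entry
    return new_parent


# Hypothetical parent pages; child "A" holds keys 1..10 before the split.
low_parent = [(1, "A"), (11, "B")]
high_parent = [(10, "A"), (20, "B")]

low_after = split_low_key(low_parent, 0, 6, "A2")
high_after = split_high_key(high_parent, 0, 5, "A2")
```

In the low-key case the entry for "A" is identical before and after the split; in the high-key case its key changes from 10 to 5, which is the extra parent update described above.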