Re: Minmax indexes - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Minmax indexes
Msg-id 20140615023404.GY18688@eldon.alvh.no-ip.org
In response to Re: Minmax indexes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Minmax indexes  (Robert Haas <robertmhaas@gmail.com>)
Re: Minmax indexes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
Robert Haas wrote:
> On Wed, Sep 25, 2013 at 4:34 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Here's an updated version of this patch, with fixes to all the bugs
> > reported so far.  Thanks to Thom Brown, Jaime Casanova, Erik Rijkers and
> > Amit Kapila for the reports.
>
> I'm not very happy with the use of a separate relation fork for
> storing this data.

Here's a new version of this patch.  Now the revmap is not stored in a
separate fork, but together with all the regular data, as explained
elsewhere in the thread.

I added a few pageinspect functions that let you explore the data in the
index.  You can start by reading the metapage, which gives the block
numbers of the revmap array pages; the revmap array pages in turn point
to the regular revmap pages, which contain the TIDs of the index
entries.  None of these pageinspect functions are documented yet, but
using them is as easy as

  -- type of each page in index 'ti' (assumes the default 8 kB block size)
  with idxname as (select 'ti'::text as idxname)
  select *
    from idxname,
         generate_series(0, pg_relation_size(idxname) / 8192 - 1) i,
         minmax_page_type(get_raw_page(idxname, i::int));

  select *        -- data in metapage
    from minmax_metapage_info(get_raw_page('ti', 0));

  select *        -- data in revmap array pages
    from minmax_revmap_array_data(get_raw_page('ti', 6));

  select logblk, unnest(pages)    -- data in regular revmap pages
    from minmax_revmap_data(get_raw_page('ti', 15));

  select *        -- data in regular index pages
    from minmax_page_items(get_raw_page('ti', 2), 'ti'::regclass);

Note that in this last case you need to pass the OID of the index (here
'ti'::regclass) as the second parameter, so that the function can build
a tuple descriptor for decoding the min/max data.
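To make the navigation path concrete, here is a toy Python model of the
lookup chain (metapage, then revmap array page, then regular revmap page,
then the TID of the index tuple).  It is a sketch under assumptions, not
the patch's code: the per-page capacities (tids_per_revmap_page,
revmap_pages_per_array_page) and the helper find_summary_tid are
hypothetical and deliberately tiny, and block numbers 6, 15 and 2 merely
echo the queries above.

```python
from typing import NamedTuple, Optional


class TID(NamedTuple):
    block: int   # index block holding the summary tuple
    offset: int  # item number within that block (1-based)


def find_summary_tid(heap_blk: int, metapage: dict, pages: dict,
                     pages_per_range: int = 128,
                     tids_per_revmap_page: int = 4,
                     revmap_pages_per_array_page: int = 2) -> Optional[TID]:
    """Hypothetical lookup: heap block -> TID of its range's index tuple.

    metapage["array_pages"] lists the block numbers of revmap array pages;
    pages maps a block number to its contents: a list of revmap page block
    numbers for an array page, a list of TIDs (or None) for a revmap page.
    """
    # Page range that the heap block belongs to.
    range_no = heap_blk // pages_per_range
    # Logical revmap page holding that range's TID, and the slot within it.
    logblk, slot = divmod(range_no, tids_per_revmap_page)
    # Array page that lists that logical revmap page, and the entry index.
    array_no, idx = divmod(logblk, revmap_pages_per_array_page)

    if array_no >= len(metapage["array_pages"]):
        return None                      # no such array page yet
    array_page = pages[metapage["array_pages"][array_no]]
    if idx >= len(array_page):
        return None                      # revmap page not allocated yet
    revmap_page = pages[array_page[idx]]
    if slot >= len(revmap_page):
        return None                      # range not summarized
    return revmap_page[slot]             # may itself be None


# Usage: one array page (block 6) pointing at one revmap page (block 15).
metapage = {"array_pages": [6]}
pages = {6: [15], 15: [TID(2, 1), TID(2, 2), None, TID(2, 3)]}
print(find_summary_tid(130, metapage, pages))  # prints TID(block=2, offset=2)
```

The two-level indirection means the metapage only has to record a short
list of array pages, while the revmap itself can grow as the heap grows.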

I have followed Amit's suggestion to overwrite the existing index tuple
when a new heap tuple is inserted, instead of creating a separate index
tuple.  This avoids a lot of index bloat.  It required a new entry
point in bufpage.c, PageOverwriteItemData().  bufpage.c also has a new
function, PageIndexDeleteNoCompact(), which is similar in spirit to
PageIndexMultiDelete() except that item pointers do not change.  This is
necessary because the revmap stores item pointers, and such references
would break if items in index pages were renumbered.
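Why stable item numbers matter can be shown with a toy model of an index
page's line-pointer array.  This is an illustrative Python sketch, not the
bufpage.c code: ToyPage and its methods are hypothetical stand-ins for
PageOverwriteItemData, PageIndexMultiDelete and PageIndexDeleteNoCompact,
and each summary is modeled as a plain (min, max) pair.

```python
class ToyPage:
    """Hypothetical model of an index page: item numbers are 1-based and a
    None slot stands for an unused line pointer."""

    def __init__(self, items):
        self.slots = list(items)

    def get(self, offnum):
        return self.slots[offnum - 1]

    def overwrite(self, offnum, data):
        # In-place update: same item number, new payload (size limits that
        # a real page would impose are ignored in this model).
        self.slots[offnum - 1] = data

    def multi_delete(self, offnums):
        # Compacting delete: surviving items are renumbered.
        drop = set(offnums)
        self.slots = [d for i, d in enumerate(self.slots, start=1)
                      if i not in drop]

    def delete_no_compact(self, offnums):
        # Surviving items keep their numbers; deleted slots become unused.
        for offnum in offnums:
            self.slots[offnum - 1] = None


def widen(summary, value):
    """Extend a (min, max) summary to cover a newly inserted heap value."""
    lo, hi = summary
    return (min(lo, value), max(hi, value))


# The revmap remembers item numbers: range 0 -> item 1, ..., range 2 -> item 3.
page = ToyPage([(1, 10), (11, 20), (21, 30)])
revmap = {0: 1, 1: 2, 2: 3}

# Inserting heap value 35 into range 2 rewrites that range's tuple in place.
page.overwrite(revmap[2], widen(page.get(revmap[2]), 35))

page.delete_no_compact([2])
print(page.get(revmap[2]))  # prints (21, 35): the reference is still valid
```

Had multi_delete([2]) been used instead, the page would shrink to two
slots and page.get(revmap[2]) would raise IndexError; on a page with more
items it would silently return a different range's tuple.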

I have also added a reloption for the size of each page range, so you
can do
  create index ti on t using minmax (a) with (pages_per_range = 2);
The default is 128 pages per range, and I have set an arbitrary maximum
of 131072 (the number of pages in a 1GB segment with the default 8 kB
block size).  There doesn't seem to be much point in larger page ranges;
intuitively I think a page range should be roughly the size of kernel
readahead, but I haven't tested this.
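As a rough worked example of what pages_per_range means for coverage,
the following sketch computes which range a heap block falls into and how
much heap a single summary tuple spans.  It assumes the default 8 kB block
size (BLCKSZ is a compile-time option), and the helper names are
hypothetical.

```python
BLCKSZ = 8192  # assumed default PostgreSQL block size, in bytes


def range_number(heap_blk: int, pages_per_range: int = 128) -> int:
    """Index of the page range that contains the given heap block."""
    return heap_blk // pages_per_range


def range_span_bytes(pages_per_range: int = 128, blcksz: int = BLCKSZ) -> int:
    """Amount of heap summarized by a single index tuple."""
    return pages_per_range * blcksz


def ranges_for_table(nblocks: int, pages_per_range: int = 128) -> int:
    """Number of page ranges needed to cover a heap of nblocks pages."""
    return -(-nblocks // pages_per_range)  # ceiling division


print(range_span_bytes())                  # 1048576: 1 MB per range by default
print(range_span_bytes(131072))            # 1073741824: the 1 GB maximum
print(range_number(5, pages_per_range=2))  # 2
```

Smaller ranges make the summaries more selective at the cost of more
index tuples; with pages_per_range = 2 a 1 GB table needs 65536 ranges
instead of 1024.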


I didn't want to rebase past 0ef0b6784 in a hurry.  I only know this
applies cleanly on top of fe7337f2dc, so please use that if you want to
play with it.  I will post a rebased version shortly.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
