Re: Maintaining cluster order on insert - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: Maintaining cluster order on insert
Date
Msg-id 200705181354.l4IDsxp16082@momjian.us
Whole thread Raw
In response to Re: Maintaining cluster order on insert  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-patches
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Heikki Linnakangas wrote:
> Jaime Casanova wrote:
> > On 5/16/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> >> Jim C. Nasby wrote:
> >> > What about adding the ability to ask the FSM for a page that's near a
> >> > given page? That way if you did have to go to the FSM you could at
> >> least
> >> > try and insert close to the page you originally wanted.
> >>
> >> Yeah, there's always room for improvement. I made the patch when I was
> >> working on clustered indexes, and was mostly concerned about getting
> >> inserts to the same page as other tuples with similar values so that the
> >> clustered index stays clustered.
> >>
> >
> > the patch doesn't apply in cvs... you'll need to update it...
>
> Oh, here you are.
>
> The implementation has changed a bit since August. I thought I had
> submitted an updated version in the winter but couldn't find it. Anyway,
> I updated and dusted off the source tree, tidied up the comments a
> little bit, and fixed some inconsistencies in pg_proc entries that made
> opr_sanity to fail.
>
> The beef of the patch is two new optional indexam API functions:
> amprepareinsert and amfinishinsert. amprepareinsert is called before
> inserting the heap tuple. It descends the tree and finds and pins the
> right leaf page to insert to, and returns a suggestion on where the heap
> tuple should be inserted. amfinishinsert is called after inserting the
> heap tuple to actually insert the index tuple. Documentation for these
> functions need to be added indexam.sgml, I noticed that that's not done yet.
>
> The cluster_inserts GUC option that you can use to enable/disable the
> feature should be removed before committing.
>
> The performance characteristics of this patch hasn't been thoroughly
> discussed yet. The reason why you want to cluster your tables is to
> speed up SELECTs that return a bunch of tuples with similar values, for
> example range queries. The reason for keeping them clustered on inserts
> is to reduce the need to run CLUSTER as often.
>
> It doesn't come without a cost, however. In the worst case, there never
> is room for new inserts on pages, and each insert needs to do one extra
> I/O to fetch the optimal heap page where the insert should go, see that
> there's no room, and then insert somewhere else. Using a non-zero
> fillfactor helps, but even when there is room on the page, it's often
> cheaper to just append to the end of the table and running CLUSTER at
> night for example, than do random access to insert to the "right" pages
> in the heap.
>
> So, should we have a WITH-option on the table to enable/disable this
> feature, and what would be the default?
>
> --
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com


>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

pgsql-patches by date:

Previous
From: "Pavel Stehule"
Date:
Subject: Re: Updateable cursors patch
Next
From: Tom Lane
Date:
Subject: Re: CREATE TABLE LIKE INCLUDING INDEXES support