Re: Maintaining cluster order on insert - Mailing list pgsql-patches

From Heikki Linnakangas
Subject Re: Maintaining cluster order on insert
Date
Msg-id 464F410C.8090900@enterprisedb.com
Whole thread Raw
In response to Re: Maintaining cluster order on insert  ("Jaime Casanova" <systemguards@gmail.com>)
Responses Re: Maintaining cluster order on insert  (Bruce Momjian <bruce@momjian.us>)
Re: Maintaining cluster order on insert  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Re: Maintaining cluster order on insert  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
Jaime Casanova wrote:
> On 5/18/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> Jaime Casanova wrote:
>> >
>> > the patch doesn't apply in cvs... you'll need to update it...
>>
>> Oh, here you are.
>>
>> The implementation has changed a bit since August. I thought I had
>> submitted an updated version in the winter but couldn't find it. Anyway,
>> I updated and dusted off the source tree, tidied up the comments a
>> little bit, and fixed some inconsistencies in pg_proc entries that made
>> opr_sanity to fail.
>>
>
> this one doesn't apply either... there are problems with nbtinsert.c and
> pg_am.h

Ah, sorry about that. For some reason my source tree was checked out
from the 8.2 branch, instead of CVS HEAD.

Here you are. Thanks for looking at this!

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: doc/src/sgml/catalogs.sgml
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/catalogs.sgml,v
retrieving revision 2.152
diff -c -r2.152 catalogs.sgml
*** doc/src/sgml/catalogs.sgml    15 May 2007 19:13:54 -0000    2.152
--- doc/src/sgml/catalogs.sgml    19 May 2007 16:23:49 -0000
***************
*** 517,522 ****
--- 517,536 ----
        <entry>Function to parse and validate <structfield>reloptions</> for an index</entry>
       </row>

+      <row>
+       <entry><structfield>amprepareinsert</structfield></entry>
+       <entry><type>regproc</type></entry>
+       <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+       <entry>Performs the 1st phase of a two phase index insert, returning a suggestion of where in the heap to put a
newtuple</entry> 
+      </row>
+
+      <row>
+       <entry><structfield>amfinishinsert</structfield></entry>
+       <entry><type>regproc</type></entry>
+       <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+       <entry>Finishes an index insert started with amprepareinsert</entry>
+      </row>
+
      </tbody>
     </tgroup>
    </table>
Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.232
diff -c -r1.232 heapam.c
*** src/backend/access/heap/heapam.c    8 Apr 2007 01:26:27 -0000    1.232
--- src/backend/access/heap/heapam.c    19 May 2007 16:45:14 -0000
***************
*** 1368,1373 ****
--- 1368,1377 ----
   * Note that use_wal and use_fsm will be applied when inserting into the
   * heap's TOAST table, too, if the tuple requires any out-of-line data.
   *
+  * If suggested_blk is a valid block number, the tuple will be inserted to
+  * that block if there's enough room. If it's full, a block will be chosen
+  * as if suggested_blk was not set.
+  *
   * The return value is the OID assigned to the tuple (either here or by the
   * caller), or InvalidOid if no OID.  The header fields of *tup are updated
   * to match the stored tuple; in particular tup->t_self receives the actual
***************
*** 1376,1382 ****
   */
  Oid
  heap_insert(Relation relation, HeapTuple tup, CommandId cid,
!             bool use_wal, bool use_fsm)
  {
      TransactionId xid = GetCurrentTransactionId();
      HeapTuple    heaptup;
--- 1380,1386 ----
   */
  Oid
  heap_insert(Relation relation, HeapTuple tup, CommandId cid,
!             bool use_wal, bool use_fsm, BlockNumber suggested_blk)
  {
      TransactionId xid = GetCurrentTransactionId();
      HeapTuple    heaptup;
***************
*** 1432,1440 ****
      else
          heaptup = tup;

!     /* Find buffer to insert this tuple into */
!     buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
!                                        InvalidBuffer, use_fsm);

      /* NO EREPORT(ERROR) from here till changes are logged */
      START_CRIT_SECTION();
--- 1436,1478 ----
      else
          heaptup = tup;

!     /* Find buffer to insert this tuple into. Try the suggested block first
!      * if caller gave one.
!      */
!     if (suggested_blk != InvalidBlockNumber)
!     {
!         Buffer suggested_buf;
!         Page pageHeader;
!         Size pageFreeSpace;
!
!         suggested_buf = ReadBuffer(relation, suggested_blk);
!         pageHeader = (Page) BufferGetPage(suggested_buf);
!
!         LockBuffer(suggested_buf, BUFFER_LOCK_EXCLUSIVE);
!
!         /* Don't subtract fillfactor from the free space. That space is
!          * reserved exactly for situations like this; keeping updated and
!          * inserted tuples close to other tuples with similar values.
!          */
!         pageFreeSpace = PageGetFreeSpace(pageHeader);
!
!         if (heaptup->t_len <= pageFreeSpace)
!             buffer = suggested_buf;
!         else
!         {
!             /* Page was full. Release lock and pin and get another block
!              * as if suggested_blk was not given.
!              */
!             LockBuffer(suggested_buf, BUFFER_LOCK_UNLOCK);
!             ReleaseBuffer(suggested_buf);
!
!             buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
!                                                InvalidBuffer, use_fsm);
!         }
!     }
!     else
!         buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
!                                            InvalidBuffer, use_fsm);

      /* NO EREPORT(ERROR) from here till changes are logged */
      START_CRIT_SECTION();
***************
*** 1544,1550 ****
  Oid
  simple_heap_insert(Relation relation, HeapTuple tup)
  {
!     return heap_insert(relation, tup, GetCurrentCommandId(), true, true);
  }

  /*
--- 1582,1589 ----
  Oid
  simple_heap_insert(Relation relation, HeapTuple tup)
  {
!     return heap_insert(relation, tup, GetCurrentCommandId(), true,
!                        true, InvalidBlockNumber);
  }

  /*
Index: src/backend/access/heap/tuptoaster.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/tuptoaster.c,v
retrieving revision 1.74
diff -c -r1.74 tuptoaster.c
*** src/backend/access/heap/tuptoaster.c    6 Apr 2007 04:21:41 -0000    1.74
--- src/backend/access/heap/tuptoaster.c    19 May 2007 16:45:39 -0000
***************
*** 1146,1152 ****
          if (!HeapTupleIsValid(toasttup))
              elog(ERROR, "failed to build TOAST tuple");

!         heap_insert(toastrel, toasttup, mycid, use_wal, use_fsm);

          /*
           * Create the index entry.    We cheat a little here by not using
--- 1146,1153 ----
          if (!HeapTupleIsValid(toasttup))
              elog(ERROR, "failed to build TOAST tuple");

!         heap_insert(toastrel, toasttup, mycid, use_wal, use_fsm,
!                     InvalidBlockNumber);

          /*
           * Create the index entry.    We cheat a little here by not using
Index: src/backend/access/index/indexam.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/index/indexam.c,v
retrieving revision 1.97
diff -c -r1.97 indexam.c
*** src/backend/access/index/indexam.c    5 Jan 2007 22:19:23 -0000    1.97
--- src/backend/access/index/indexam.c    19 May 2007 16:23:57 -0000
***************
*** 18,23 ****
--- 18,25 ----
   *        index_rescan    - restart a scan of an index
   *        index_endscan    - end a scan
   *        index_insert    - insert an index tuple into a relation
+  *        index_prepareinsert    - get desired insert location for a heap tuple
+  *        index_finishinsert    - insert a previously prepared index tuple
   *        index_markpos    - mark a scan position
   *        index_restrpos    - restore a scan position
   *        index_getnext    - get the next tuple from a scan
***************
*** 202,207 ****
--- 204,269 ----
                                        BoolGetDatum(check_uniqueness)));
  }

+ /* ----------------
+  *        index_prepareinsert - get desired insert location for a heap tuple
+  *
+  * The returned BlockNumber is the *heap* page that is the best place
+  * to insert the given tuple to, according to the index am. The best
+  * place is one that maintains the cluster order.
+  *
+  * opaque should be passed to a later index_finishinsert to finish the
+  * insert.
+  * ----------------
+  */
+ BlockNumber
+ index_prepareinsert(Relation indexRelation,
+                     Datum *values,
+                     bool *isnull,
+                     Relation heapRelation,
+                     bool check_uniqueness,
+                     void **opaque)
+ {
+     FmgrInfo   *procedure;
+
+     RELATION_CHECKS;
+     GET_REL_PROCEDURE(amprepareinsert);
+
+     /*
+      * have the am's prepareinsert proc do all the work.
+      */
+     return DatumGetUInt32(FunctionCall6(procedure,
+                                         PointerGetDatum(indexRelation),
+                                         PointerGetDatum(values),
+                                         PointerGetDatum(isnull),
+                                         PointerGetDatum(heapRelation),
+                                         BoolGetDatum(check_uniqueness),
+                                         PointerGetDatum(opaque)));
+ }
+
+ /* ----------------
+  *        index_finishinsert - insert a previously prepared index tuple
+  *
+  * Finishes an insert operation initiated by an earlier call to
+  * index_prepareinsert.
+  * ----------------
+  */
+ bool
+ index_finishinsert(Relation indexRelation,
+                    ItemPointer heap_t_ctid, void *opaque)
+ {
+     FmgrInfo   *procedure;
+
+     RELATION_CHECKS;
+     GET_REL_PROCEDURE(amfinishinsert);
+
+     /*
+      * have the am's finishinsert proc do all the work.
+      */
+     return DatumGetBool(FunctionCall2(procedure,
+                                       PointerGetDatum(heap_t_ctid),
+                                       PointerGetDatum(opaque)));
+ }
+
  /*
   * index_beginscan - start a scan of an index with amgettuple
   *
Index: src/backend/access/nbtree/nbtinsert.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtinsert.c,v
retrieving revision 1.156
diff -c -r1.156 nbtinsert.c
*** src/backend/access/nbtree/nbtinsert.c    11 Apr 2007 20:47:37 -0000    1.156
--- src/backend/access/nbtree/nbtinsert.c    19 May 2007 18:12:25 -0000
***************
*** 96,114 ****
      /* we need an insertion scan key to do our search, so build one */
      itup_scankey = _bt_mkscankey(rel, itup);

- top:
      /* find the first page containing this key */
      stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);

      offset = InvalidOffsetNumber;

!     /* trade in our read lock for a write lock */
      LockBuffer(buf, BUFFER_LOCK_UNLOCK);
      LockBuffer(buf, BT_WRITE);

      /*
       * If the page was split between the time that we surrendered our read
!      * lock and acquired our write lock, then this page may no longer be the
       * right place for the key we want to insert.  In this case, we need to
       * move right in the tree.    See Lehman and Yao for an excruciatingly
       * precise description.
--- 96,224 ----
      /* we need an insertion scan key to do our search, so build one */
      itup_scankey = _bt_mkscankey(rel, itup);

      /* find the first page containing this key */
      stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);

      offset = InvalidOffsetNumber;

!     /* release our read lock. _bt_finishinsert will relock the page in
!      * exclusive mode.
!      */
      LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+     _bt_finishinsert(rel, heapRel, index_is_unique, itup,
+                      itup_scankey, stack, buf);
+ }
+
+ /*
+  *    _bt_prepareinsert() -- Find the insert location for a new tuple
+  *
+  * Descends the tree and finds the location for a new index tuple.
+  * As a hint to the executor, returns the heap block number the previous
+  * index tuple at that location points to. By inserting the heap tuple
+  * to that block, the heap will stay better clustered than by inserting
+  * to a random block.
+  *
+  * The leaf page is pinned and a reference to it, among other information
+  * needed to finish the insert, is stored in opaquePtr.
+  */
+ BlockNumber
+ _bt_prepareinsert(Relation rel, IndexTuple itup, bool index_is_unique,
+                   Relation heapRel, BTInsertInfo *opaquePtr)
+ {
+     int            natts = rel->rd_rel->relnatts;
+     OffsetNumber offset;
+     Page        page;
+     BTPageOpaque opaque;
+
+     ScanKey        itup_scankey;
+     BTStack        stack;
+     Buffer        buf;
+     BlockNumber suggestion = InvalidBlockNumber;
+     BTInsertInfo insert_opaque;
+
+     /* we need an insertion scan key to do our search, so build one */
+     itup_scankey = _bt_mkscankey(rel, itup);
+
+     /* find the first page containing this key */
+     stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_READ);
+     if(!BufferIsValid(buf))
+     {
+         /* The index was completely empty. No suggestion then. */
+         *opaquePtr = NULL;
+         return InvalidBlockNumber;
+     }
+
+     page = BufferGetPage(buf);
+     opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+     /* Find the location in the page where the new index tuple would go to. */
+
+     offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
+     if (offset > PageGetMaxOffsetNumber(page))
+     {
+         /* _bt_binsrch returned pointer to end-of-page. It means that
+          * there was no equal items on the page, and the new item should
+          * be inserted as the last tuple of the page. There could be equal
+          * items on the next page, however.
+          *
+          * At the moment, we just ignore the potential equal items on the
+          * right, and pretend there isn't any. We could instead walk right
+          * to the next page to check that, but let's keep it simple for now.
+          */
+         offset = OffsetNumberPrev(offset);
+     }
+     if(offset < P_FIRSTDATAKEY(opaque))
+     {
+         /* We landed on an empty page. We could step left or right until
+          * we find some items, but let's keep it simple for now.
+          */
+     } else {
+         /* We're now positioned at the index tuple that we're interested in. */
+         ItemId iid = PageGetItemId(page, offset);
+         IndexTuple curitup = (IndexTuple) PageGetItem(page, iid);
+
+         suggestion = ItemPointerGetBlockNumber(&curitup->t_tid);
+     }
+
+     /* Release the read lock. _bt_finishinsert will later reacquire it in
+      * exclusive mode. Keeping the buffer locked would be deadlock-prone
+      * as well; who knows what the caller is going to do, and what pages to
+      * lock, before calling finishinsert.
+      */
+     LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+     /* Return a struct with all information needed to finish this insert. */
+     insert_opaque = *opaquePtr = palloc(sizeof(struct BTInsertInfoData));
+     insert_opaque->rel = rel;
+     insert_opaque->heapRel = heapRel;
+     insert_opaque->index_is_unique = index_is_unique;
+     insert_opaque->itup = itup;
+     insert_opaque->itup_scankey = itup_scankey;
+     insert_opaque->stack = stack;
+     insert_opaque->buf = buf;
+
+     return suggestion;
+ }
+
+ /*
+  *    _bt_finishinsert() -- Finish an insert prepared with prepareinsert
+  */
+ void
+ _bt_finishinsert(Relation rel, Relation heapRel, bool index_is_unique,
+                  IndexTuple itup, ScanKey itup_scankey,
+                  BTStack stack, Buffer buf)
+ {
+     int            natts = rel->rd_rel->relnatts;
+     OffsetNumber offset = InvalidOffsetNumber;
+
      LockBuffer(buf, BT_WRITE);

+ top:
+
      /*
       * If the page was split between the time that we surrendered our read
!      * lock in _bt_prepareinsert or _bt_doinsert, and acquired our write lock, then this page may no longer be the
       * right place for the key we want to insert.  In this case, we need to
       * move right in the tree.    See Lehman and Yao for an excruciatingly
       * precise description.
***************
*** 146,151 ****
--- 256,269 ----
              XactLockTableWait(xwait);
              /* start over... */
              _bt_freestack(stack);
+
+             /* find the first page containing this key */
+             stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
+
+             /* trade in our read lock for a write lock */
+             LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+             LockBuffer(buf, BT_WRITE);
+
              goto top;
          }
      }
***************
*** 157,162 ****
--- 275,281 ----
      /* be tidy */
      _bt_freestack(stack);
      _bt_freeskey(itup_scankey);
+     pfree(itup);
  }

  /*
Index: src/backend/access/nbtree/nbtree.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtree.c,v
retrieving revision 1.154
diff -c -r1.154 nbtree.c
*** src/backend/access/nbtree/nbtree.c    5 Jan 2007 22:19:23 -0000    1.154
--- src/backend/access/nbtree/nbtree.c    19 May 2007 16:23:58 -0000
***************
*** 223,229 ****

      _bt_doinsert(rel, itup, checkUnique, heapRel);

!     pfree(itup);

      PG_RETURN_BOOL(true);
  }
--- 223,278 ----

      _bt_doinsert(rel, itup, checkUnique, heapRel);

!     PG_RETURN_BOOL(true);
! }
!
! /*
!  *    btprepareinsert() -- find the best place in the heap to put a new tuple.
!  *
!  *        This uses the same logic as btinsert to find the place where the index
!  *        tuple would go if this was a btinsert call.
!  */
! Datum
! btprepareinsert(PG_FUNCTION_ARGS)
! {
!     Relation    rel = (Relation) PG_GETARG_POINTER(0);
!     Datum       *values = (Datum *) PG_GETARG_POINTER(1);
!     bool       *isnull = (bool *) PG_GETARG_POINTER(2);
!     Relation    heapRel = (Relation) PG_GETARG_POINTER(3);
!     bool        checkUnique = PG_GETARG_BOOL(4);
!     void      **opaquePtr = (void **) PG_GETARG_POINTER(5);
!     IndexTuple    itup;
!     BlockNumber suggestion;
!
!     /* generate an index tuple */
!     itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
!
!     suggestion =_bt_prepareinsert(rel, itup, checkUnique, heapRel,
!                                   (BTInsertInfo *) opaquePtr);
!
!     PG_RETURN_UINT32(suggestion);
! }
!
! /*
!  *    btfinishinsert() -- finish insert
!  */
! Datum
! btfinishinsert(PG_FUNCTION_ARGS)
! {
!     ItemPointer ht_ctid = (ItemPointer) PG_GETARG_POINTER(0);
!     BTInsertInfo opaque = (void *) PG_GETARG_POINTER(1);
!
!     opaque->itup->t_tid = *ht_ctid;
!
!     _bt_finishinsert(opaque->rel,
!                      opaque->heapRel,
!                      opaque->index_is_unique,
!                      opaque->itup,
!                      opaque->itup_scankey,
!                      opaque->stack,
!                      opaque->buf);
!
!     pfree(opaque);

      PG_RETURN_BOOL(true);
  }
Index: src/backend/commands/copy.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.283
diff -c -r1.283 copy.c
*** src/backend/commands/copy.c    27 Apr 2007 22:05:46 -0000    1.283
--- src/backend/commands/copy.c    19 May 2007 17:14:53 -0000
***************
*** 2109,2115 ****
                  ExecConstraints(resultRelInfo, slot, estate);

              /* OK, store the tuple and create index entries for it */
!             heap_insert(cstate->rel, tuple, mycid, use_wal, use_fsm);

              if (resultRelInfo->ri_NumIndices > 0)
                  ExecInsertIndexTuples(slot, &(tuple->t_self), estate, false);
--- 2109,2116 ----
                  ExecConstraints(resultRelInfo, slot, estate);

              /* OK, store the tuple and create index entries for it */
!             heap_insert(cstate->rel, tuple, mycid, use_wal, use_fsm,
!                         InvalidBlockNumber);

              if (resultRelInfo->ri_NumIndices > 0)
                  ExecInsertIndexTuples(slot, &(tuple->t_self), estate, false);
Index: src/backend/executor/execMain.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.293
diff -c -r1.293 execMain.c
*** src/backend/executor/execMain.c    27 Apr 2007 22:05:47 -0000    1.293
--- src/backend/executor/execMain.c    19 May 2007 16:24:01 -0000
***************
*** 53,58 ****
--- 53,59 ----
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"

+ bool cluster_inserts = true; /* GUC */

  typedef struct evalPlanQual
  {
***************
*** 869,876 ****
--- 870,879 ----
      resultRelInfo->ri_RangeTableIndex = resultRelationIndex;
      resultRelInfo->ri_RelationDesc = resultRelationDesc;
      resultRelInfo->ri_NumIndices = 0;
+     resultRelInfo->ri_ClusterIndex = -1;
      resultRelInfo->ri_IndexRelationDescs = NULL;
      resultRelInfo->ri_IndexRelationInfo = NULL;
+     resultRelInfo->ri_PreparedInsertOpaque = NULL;
      /* make a copy so as not to depend on relcache info not changing... */
      resultRelInfo->ri_TrigDesc = CopyTriggerDesc(resultRelationDesc->trigdesc);
      if (resultRelInfo->ri_TrigDesc)
***************
*** 1353,1358 ****
--- 1356,1362 ----
      ResultRelInfo *resultRelInfo;
      Relation    resultRelationDesc;
      Oid            newId;
+     BlockNumber suggestedBlock;

      /*
       * get the heap tuple out of the tuple table slot, making sure we have a
***************
*** 1401,1406 ****
--- 1405,1417 ----
      if (resultRelationDesc->rd_att->constr)
          ExecConstraints(resultRelInfo, slot, estate);

+     /* Ask the index am of the clustered index for the
+      * best place to put it */
+     if(cluster_inserts)
+         suggestedBlock = ExecPrepareIndexInsert(slot, estate);
+     else
+         suggestedBlock = InvalidBlockNumber;
+
      /*
       * insert the tuple
       *
***************
*** 1409,1415 ****
       */
      newId = heap_insert(resultRelationDesc, tuple,
                          estate->es_snapshot->curcid,
!                         true, true);

      IncrAppended();
      (estate->es_processed)++;
--- 1420,1426 ----
       */
      newId = heap_insert(resultRelationDesc, tuple,
                          estate->es_snapshot->curcid,
!                         true, true, suggestedBlock);

      IncrAppended();
      (estate->es_processed)++;
***************
*** 2600,2606 ****
                  tuple,
                  estate->es_snapshot->curcid,
                  estate->es_into_relation_use_wal,
!                 false);            /* never any point in using FSM */

      /* We know this is a newly created relation, so there are no indexes */

--- 2611,2618 ----
                  tuple,
                  estate->es_snapshot->curcid,
                  estate->es_into_relation_use_wal,
!                 false,             /* never any point in using FSM */
!                 InvalidBlockNumber);

      /* We know this is a newly created relation, so there are no indexes */

Index: src/backend/executor/execUtils.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/executor/execUtils.c,v
retrieving revision 1.147
diff -c -r1.147 execUtils.c
*** src/backend/executor/execUtils.c    27 Feb 2007 01:11:25 -0000    1.147
--- src/backend/executor/execUtils.c    19 May 2007 18:22:33 -0000
***************
*** 31,36 ****
--- 31,37 ----
   *        ExecOpenIndices            \
   *        ExecCloseIndices         | referenced by InitPlan, EndPlan,
   *        ExecInsertIndexTuples    /  ExecInsert, ExecUpdate
+  *        ExecPrepareIndexInsert    Referenced by ExecInsert
   *
   *        RegisterExprContextCallback    Register function shutdown callback
   *        UnregisterExprContextCallback  Deregister function shutdown callback
***************
*** 902,907 ****
--- 903,909 ----
      IndexInfo **indexInfoArray;

      resultRelInfo->ri_NumIndices = 0;
+     resultRelInfo->ri_ClusterIndex = -1;

      /* fast path if no indexes */
      if (!RelationGetForm(resultRelation)->relhasindex)
***************
*** 941,946 ****
--- 943,953 ----
          /* extract index key information from the index's pg_index info */
          ii = BuildIndexInfo(indexDesc);

+         /* Remember which index is the clustered one.
+          * It's used to call the suggestblock-method on inserts */
+         if(indexDesc->rd_index->indisclustered)
+             resultRelInfo->ri_ClusterIndex = i;
+
          relationDescs[i] = indexDesc;
          indexInfoArray[i] = ii;
          i++;
***************
*** 1007,1012 ****
--- 1014,1021 ----
      ExprContext *econtext;
      Datum        values[INDEX_MAX_KEYS];
      bool        isnull[INDEX_MAX_KEYS];
+     int            clusterIndex;
+     bool        preparedInsert;

      /*
       * Get information from the result relation info structure.
***************
*** 1016,1021 ****
--- 1025,1049 ----
      relationDescs = resultRelInfo->ri_IndexRelationDescs;
      indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
      heapRelation = resultRelInfo->ri_RelationDesc;
+     clusterIndex = resultRelInfo->ri_ClusterIndex;
+     preparedInsert = resultRelInfo->ri_PreparedInsertOpaque != NULL;
+
+     /*
+      * If the insert to the clustering index was already prepared,
+      * finish it.
+      */
+     if (preparedInsert)
+     {
+         index_finishinsert(relationDescs[clusterIndex],
+                            tupleid,
+                            resultRelInfo->ri_PreparedInsertOpaque);
+         resultRelInfo->ri_PreparedInsertOpaque = NULL;
+
+         /*
+          * keep track of index inserts for debugging
+          */
+         IncrIndexInserted();
+     }

      /*
       * We will use the EState's per-tuple context for evaluating predicates
***************
*** 1036,1041 ****
--- 1064,1072 ----
          if (relationDescs[i] == NULL)
              continue;

+         if (preparedInsert && i == clusterIndex)
+             continue; /* insert to clustered index was already handled above */
+
          indexInfo = indexInfoArray[i];

          /* Check for partial index */
***************
*** 1090,1095 ****
--- 1121,1196 ----
      }
  }

+ /* ----------------------------------------------------------------
+  *        ExecPrepareIndexInsert
+  *
+  *        This routine asks the index am where a new heap tuple
+  *        should be placed.
+  * ----------------------------------------------------------------
+  */
+ BlockNumber
+ ExecPrepareIndexInsert(TupleTableSlot *slot,
+                        EState *estate)
+ {
+     ResultRelInfo *resultRelInfo;
+     int            clusterIndex;
+     Relation    relationDesc;
+     Relation    heapRelation;
+     ExprContext *econtext;
+     Datum        values[INDEX_MAX_KEYS];
+     bool        isnull[INDEX_MAX_KEYS];
+     IndexInfo  *indexInfo;
+
+     /*
+      * Get information from the result relation info structure.
+      */
+     resultRelInfo = estate->es_result_relation_info;
+     clusterIndex = resultRelInfo->ri_ClusterIndex;
+
+     if (clusterIndex == -1)
+         return InvalidBlockNumber; /* there was no clustered index */
+
+     heapRelation = resultRelInfo->ri_RelationDesc;
+     relationDesc = resultRelInfo->ri_IndexRelationDescs[clusterIndex];
+     indexInfo = resultRelInfo->ri_IndexRelationInfo[clusterIndex];
+
+     if (!OidIsValid(relationDesc->rd_am->amprepareinsert))
+         return InvalidBlockNumber; /* the indexam doesn't support the
+                                     * two-phase insert API */
+
+     /* You can't cluster on a partial index */
+     Assert(indexInfo->ii_Predicate == NIL);
+
+     /*
+      * We will use the EState's per-tuple context for evaluating
+      * index expressions (creating it if it's not already there).
+      */
+     econtext = GetPerTupleExprContext(estate);
+
+     /* Arrange for econtext's scan tuple to be the tuple under test */
+     econtext->ecxt_scantuple = slot;
+
+     /*
+      * FormIndexDatum fills in its values and isnull parameters with the
+      * appropriate values for the column(s) of the index.
+      */
+     FormIndexDatum(indexInfo,
+                    slot,
+                    estate,
+                    values,
+                    isnull);
+
+     /*
+      * The index AM does the rest.
+      */
+     return index_prepareinsert(relationDesc,    /* index relation */
+                  values,    /* array of index Datums */
+                  isnull,    /* null flags */
+                  heapRelation,
+                  relationDesc->rd_index->indisunique,
+                  &resultRelInfo->ri_PreparedInsertOpaque);
+ }
+
  /*
   * UpdateChangedParamSet
   *        Add changed parameters to a plan node's chgParam set
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.391
diff -c -r1.391 guc.c
*** src/backend/utils/misc/guc.c    8 May 2007 16:33:51 -0000    1.391
--- src/backend/utils/misc/guc.c    19 May 2007 16:24:17 -0000
***************
*** 99,104 ****
--- 99,105 ----
  #define MS_PER_D (1000 * 60 * 60 * 24)

  /* XXX these should appear in other modules' header files */
+ extern bool cluster_inserts;
  extern bool Log_disconnections;
  extern int    CommitDelay;
  extern int    CommitSiblings;
***************
*** 427,432 ****
--- 428,441 ----
  static struct config_bool ConfigureNamesBool[] =
  {
      {
+         {"cluster_inserts", PGC_USERSET, DEVELOPER_OPTIONS,
+             gettext_noop("Tries to maintain cluster order on inserts."),
+             NULL
+         },
+         &cluster_inserts,
+         true, NULL, NULL
+     },
+     {
          {"enable_seqscan", PGC_USERSET, QUERY_TUNING_METHOD,
              gettext_noop("Enables the planner's use of sequential-scan plans."),
              NULL
Index: src/include/access/genam.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/genam.h,v
retrieving revision 1.66
diff -c -r1.66 genam.h
*** src/include/access/genam.h    5 Jan 2007 22:19:50 -0000    1.66
--- src/include/access/genam.h    19 May 2007 16:24:26 -0000
***************
*** 93,98 ****
--- 93,106 ----
               ItemPointer heap_t_ctid,
               Relation heapRelation,
               bool check_uniqueness);
+ extern BlockNumber index_prepareinsert(Relation indexRelation,
+              Datum *values, bool *isnull,
+              Relation heapRelation,
+              bool check_uniqueness,
+              void **opauqe);
+ extern bool index_finishinsert(Relation indexRelation,
+              ItemPointer heap_t_ctid,
+              void *opaque);

  extern IndexScanDesc index_beginscan(Relation heapRelation,
                  Relation indexRelation,
Index: src/include/access/heapam.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/heapam.h,v
retrieving revision 1.123
diff -c -r1.123 heapam.h
*** src/include/access/heapam.h    8 Apr 2007 01:26:33 -0000    1.123
--- src/include/access/heapam.h    19 May 2007 16:24:26 -0000
***************
*** 157,163 ****
  extern void setLastTid(const ItemPointer tid);

  extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
!             bool use_wal, bool use_fsm);
  extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
              ItemPointer ctid, TransactionId *update_xmax,
              CommandId cid, Snapshot crosscheck, bool wait);
--- 157,163 ----
  extern void setLastTid(const ItemPointer tid);

  extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
!             bool use_wal, bool use_fsm, BlockNumber suggestedblk);
  extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
              ItemPointer ctid, TransactionId *update_xmax,
              CommandId cid, Snapshot crosscheck, bool wait);
Index: src/include/access/nbtree.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/nbtree.h,v
retrieving revision 1.113
diff -c -r1.113 nbtree.h
*** src/include/access/nbtree.h    11 Apr 2007 20:47:38 -0000    1.113
--- src/include/access/nbtree.h    19 May 2007 16:24:26 -0000
***************
*** 508,517 ****
--- 508,540 ----
  extern Datum btbulkdelete(PG_FUNCTION_ARGS);
  extern Datum btvacuumcleanup(PG_FUNCTION_ARGS);
  extern Datum btoptions(PG_FUNCTION_ARGS);
+ extern Datum btprepareinsert(PG_FUNCTION_ARGS);
+ extern Datum btfinishinsert(PG_FUNCTION_ARGS);
+
+ /* Filled in by _bt_prepareinsert */
+ typedef struct BTInsertInfoData
+ {
+     Relation rel;
+     Relation heapRel;
+     bool index_is_unique;
+     IndexTuple itup;
+     ScanKey itup_scankey;
+     Buffer buf; /* pinned, not locked */
+     BTStack stack;
+ } BTInsertInfoData;
+
+ typedef BTInsertInfoData *BTInsertInfo;

  /*
   * prototypes for functions in nbtinsert.c
   */
+ extern BlockNumber _bt_prepareinsert(Relation rel, IndexTuple itup,
+                                      bool index_is_unique, Relation heapRel,
+                                      BTInsertInfo *opaquePtr);
+ extern void _bt_finishinsert(Relation rel, Relation heapRel,
+                              bool check_uniqueness,
+                              IndexTuple itup, ScanKey itup_scankey,
+                              BTStack stack, Buffer buf);
  extern void _bt_doinsert(Relation rel, IndexTuple itup,
               bool index_is_unique, Relation heapRel);
  extern Buffer _bt_getstackbuf(Relation rel, BTStack stack, int access);
Index: src/include/catalog/pg_am.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/catalog/pg_am.h,v
retrieving revision 1.51
diff -c -r1.51 pg_am.h
*** src/include/catalog/pg_am.h    6 Apr 2007 22:33:43 -0000    1.51
--- src/include/catalog/pg_am.h    19 May 2007 16:42:48 -0000
***************
*** 66,71 ****
--- 66,73 ----
      regproc        amvacuumcleanup;    /* post-VACUUM cleanup function */
      regproc        amcostestimate; /* estimate cost of an indexscan */
      regproc        amoptions;        /* parse AM-specific parameters */
+     regproc        amprepareinsert;    /* get desired insert location on heap */
+     regproc        amfinishinsert;    /* finish a prepared insert operation */
  } FormData_pg_am;

  /* ----------------
***************
*** 79,85 ****
   *        compiler constants for pg_am
   * ----------------
   */
! #define Natts_pg_am                        24
  #define Anum_pg_am_amname                1
  #define Anum_pg_am_amstrategies            2
  #define Anum_pg_am_amsupport            3
--- 81,87 ----
   *        compiler constants for pg_am
   * ----------------
   */
! #define Natts_pg_am                        26
  #define Anum_pg_am_amname                1
  #define Anum_pg_am_amstrategies            2
  #define Anum_pg_am_amsupport            3
***************
*** 104,125 ****
  #define Anum_pg_am_amvacuumcleanup        22
  #define Anum_pg_am_amcostestimate        23
  #define Anum_pg_am_amoptions            24

  /* ----------------
   *        initial contents of pg_am
   * ----------------
   */

! DATA(insert OID = 403 (  btree    5 1 t t t t t t f t btinsert btbeginscan btgettuple btgetmulti btrescan btendscan
btmarkposbtrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions )); 
  DESCR("b-tree index access method");
  #define BTREE_AM_OID 403
! DATA(insert OID = 405 (  hash    1 1 f f f f f f f f hashinsert hashbeginscan hashgettuple hashgetmulti hashrescan
hashendscanhashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions )); 
  DESCR("hash index access method");
  #define HASH_AM_OID 405
! DATA(insert OID = 783 (  gist    0 7 f f t t t t t t gistinsert gistbeginscan gistgettuple gistgetmulti gistrescan
gistendscangistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions )); 
  DESCR("GiST index access method");
  #define GIST_AM_OID 783
! DATA(insert OID = 2742 (  gin    0 4 f f f f f f t f gininsert ginbeginscan gingettuple gingetmulti ginrescan
ginendscanginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions )); 
  DESCR("GIN index access method");
  #define GIN_AM_OID 2742

--- 106,129 ----
  #define Anum_pg_am_amvacuumcleanup        22
  #define Anum_pg_am_amcostestimate        23
  #define Anum_pg_am_amoptions            24
+ #define Anum_pg_am_amprepareinsert        25
+ #define Anum_pg_am_amfinishinsert        26

  /* ----------------
   *        initial contents of pg_am
   * ----------------
   */

! DATA(insert OID = 403 (  btree    5 1 t t t t t t f t btinsert btbeginscan btgettuple btgetmulti btrescan btendscan
btmarkposbtrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions btprepareinsert btfinishinsert)); 
  DESCR("b-tree index access method");
  #define BTREE_AM_OID 403
! DATA(insert OID = 405 (  hash    1 1 f f f f f f f f hashinsert hashbeginscan hashgettuple hashgetmulti hashrescan
hashendscanhashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions - -)); 
  DESCR("hash index access method");
  #define HASH_AM_OID 405
! DATA(insert OID = 783 (  gist    0 7 f f t t t t t t gistinsert gistbeginscan gistgettuple gistgetmulti gistrescan
gistendscangistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions - -)); 
  DESCR("GiST index access method");
  #define GIST_AM_OID 783
! DATA(insert OID = 2742 (  gin    0 4 f f f f f f t f gininsert ginbeginscan gingettuple gingetmulti ginrescan
ginendscanginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions - -)); 
  DESCR("GIN index access method");
  #define GIN_AM_OID 2742

Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.455
diff -c -r1.455 pg_proc.h
*** src/include/catalog/pg_proc.h    8 May 2007 18:56:48 -0000    1.455
--- src/include/catalog/pg_proc.h    19 May 2007 17:20:23 -0000
***************
*** 688,693 ****
--- 688,697 ----
  DESCR("btree(internal)");
  DATA(insert OID = 2785 (  btoptions           PGNSP PGUID 12 1 0 f f t f s 2 17 "1009 16" _null_ _null_ _null_
btoptions- _null_ )); 
  DESCR("btree(internal)");
+ DATA(insert OID = 5433 (  btprepareinsert   PGNSP PGUID 12 1 0 f f t f v 6 23 "2281 2281 2281 2281 2281 2281" _null_
_null__null_    btprepareinsert - _null_ )); 
+ DESCR("btree(internal)");
+ DATA(insert OID = 5430 (  btfinishinsert   PGNSP PGUID 12 1 0 f f t f v 2 16 "2281 2281" _null_ _null_ _null_
btfinishinsert- _null_ )); 
+ DESCR("btree(internal)");

  DATA(insert OID = 339 (  poly_same           PGNSP PGUID 12 1 0 f f t f i 2 16 "604 604" _null_ _null_ _null_
poly_same- _null_ )); 
  DESCR("same as?");
Index: src/include/executor/executor.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/executor/executor.h,v
retrieving revision 1.139
diff -c -r1.139 executor.h
*** src/include/executor/executor.h    27 Feb 2007 01:11:25 -0000    1.139
--- src/include/executor/executor.h    19 May 2007 16:24:27 -0000
***************
*** 276,281 ****
--- 276,282 ----
  extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
  extern void ExecInsertIndexTuples(TupleTableSlot *slot, ItemPointer tupleid,
                        EState *estate, bool is_vacuum);
+ extern BlockNumber ExecPrepareIndexInsert(TupleTableSlot *slot, EState *estate);

  extern void RegisterExprContextCallback(ExprContext *econtext,
                              ExprContextCallbackFunction function,
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/nodes/execnodes.h,v
retrieving revision 1.174
diff -c -r1.174 execnodes.h
*** src/include/nodes/execnodes.h    17 May 2007 19:35:08 -0000    1.174
--- src/include/nodes/execnodes.h    19 May 2007 16:24:27 -0000
***************
*** 264,269 ****
--- 264,271 ----
   *        NumIndices                # of indices existing on result relation
   *        IndexRelationDescs        array of relation descriptors for indices
   *        IndexRelationInfo        array of key/attr info for indices
+  *        ClusterIndex            index to the IndexRelationInfo array of the
+  *                                clustered index, or -1 if there's none
   *        TrigDesc                triggers to be fired, if any
   *        TrigFunctions            cached lookup info for trigger functions
   *        TrigInstrument            optional runtime measurements for triggers
***************
*** 280,291 ****
--- 282,296 ----
      int            ri_NumIndices;
      RelationPtr ri_IndexRelationDescs;
      IndexInfo **ri_IndexRelationInfo;
+     int         ri_ClusterIndex;
      TriggerDesc *ri_TrigDesc;
      FmgrInfo   *ri_TrigFunctions;
      struct Instrumentation *ri_TrigInstrument;
      List      **ri_ConstraintExprs;
      JunkFilter *ri_junkFilter;
      ProjectionInfo *ri_projectReturning;
+
+     void        *ri_PreparedInsertOpaque;
  } ResultRelInfo;

  /* ----------------
Index: src/include/utils/rel.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/utils/rel.h,v
retrieving revision 1.100
diff -c -r1.100 rel.h
*** src/include/utils/rel.h    29 Mar 2007 00:15:39 -0000    1.100
--- src/include/utils/rel.h    19 May 2007 16:24:29 -0000
***************
*** 117,122 ****
--- 117,124 ----
      FmgrInfo    amvacuumcleanup;
      FmgrInfo    amcostestimate;
      FmgrInfo    amoptions;
+     FmgrInfo    amprepareinsert;
+     FmgrInfo    amfinishinsert;
  } RelationAmInfo;



pgsql-patches by date:

Previous
From: Zoltan Boszormenyi
Date:
Subject: Re: Standard compliant DEFAULT clause
Next
From: Bruce Momjian
Date:
Subject: Re: Maintaining cluster order on insert