Maintaining cluster order on insert - Mailing list pgsql-patches

From Heikki Linnakangas
Subject Maintaining cluster order on insert
Date
Msg-id 44DA31B1.3090700@enterprisedb.com
List pgsql-patches
While thinking about index-organized-tables and similar ideas, it
occurred to me that there's some low-hanging fruit: maintaining cluster
order on inserts by trying to place new heap tuples close to other
similar tuples. That involves asking the index am where on the heap the
new tuple should go, and trying to insert it there before using the FSM.
Using the new fillfactor parameter makes it more likely that there's
room on the page. We don't worry about the order within the page.
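
To illustrate the placement order described above, here is a toy model in C. It is only a sketch under stated assumptions, not PostgreSQL code: ToyHeap, toy_choose_block and INVALID_BLOCK are made-up names, the per-page free space is a plain array, and a linear scan stands in for the FSM lookup. It tries the suggested page first (ignoring the fillfactor reserve, as the patch comments describe), then the cached target page, then the FSM stand-in, and finally reports that the relation would have to be extended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_BLOCK UINT32_MAX    /* stand-in for InvalidBlockNumber */
#define NBLOCKS 4                   /* size of the toy heap (assumption) */

/* Hypothetical toy heap: only tracks free bytes per page. */
typedef struct ToyHeap
{
    size_t      free_space[NBLOCKS];    /* free bytes on each page */
    uint32_t    cached_target;          /* page of the previous insert */
    size_t      reserve;                /* bytes kept free by fillfactor */
} ToyHeap;

/*
 * Choose a page for a tuple of 'len' bytes. Returns INVALID_BLOCK when
 * no existing page qualifies, i.e. the relation would be extended.
 */
uint32_t
toy_choose_block(const ToyHeap *heap, size_t len, uint32_t suggested)
{
    uint32_t    blk;

    assert(heap != NULL && len > 0);

    /* 1. Suggested page: used whenever the tuple fits, reserve ignored. */
    if (suggested != INVALID_BLOCK && suggested < NBLOCKS &&
        len <= heap->free_space[suggested])
        return suggested;

    /* 2. Cached target page: must leave the fillfactor reserve intact. */
    blk = heap->cached_target;
    if (blk != INVALID_BLOCK && blk < NBLOCKS &&
        len + heap->reserve <= heap->free_space[blk])
        return blk;

    /* 3. Stand-in for the FSM: first page with enough unreserved room. */
    for (blk = 0; blk < NBLOCKS; blk++)
    {
        if (len + heap->reserve <= heap->free_space[blk])
            return blk;
    }

    /* 4. Nothing fits: the caller would extend the relation. */
    return INVALID_BLOCK;
}
```

In this model, a 30-byte tuple with a suggestion pointing at a page that has 40 free bytes lands on that page even when the reserve is 50 bytes, which is the case the fillfactor space is meant for.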

The API I'm thinking of introduces a new optional index am function,
amsuggestblock (suggestions for a better name are welcome). It gets the
same parameters as aminsert, and returns the heap block number that
would be the optimal place to put the new tuple. It would be called from
ExecInsert before inserting the heap tuple, and the suggestion is passed
on to heap_insert and RelationGetBufferForTuple.

I wrote a little patch to implement this for btree, attached.
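
As a rough illustration of what the btree lookup boils down to, here is a self-contained sketch in C. It is not the patch itself: ToyIndexEntry, toy_suggest_block and INVALID_BLOCK are hypothetical names, and a sorted array stands in for a single btree leaf page. It finds the first entry whose key is not less than the new key, steps back one entry if that position is past the end, and returns the heap block that entry points to. An empty index yields no suggestion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_BLOCK UINT32_MAX    /* stand-in for InvalidBlockNumber */

/* Hypothetical index entry: a key plus the heap block of its tuple. */
typedef struct ToyIndexEntry
{
    int         key;
    uint32_t    heap_block;
} ToyIndexEntry;

/*
 * Suggest a heap block for a new tuple with the given key.
 * 'entries' must be sorted by key in ascending order.
 */
uint32_t
toy_suggest_block(const ToyIndexEntry *entries, size_t n, int key)
{
    size_t      lo = 0;
    size_t      hi = n;

    assert(entries != NULL || n == 0);

    /* Empty index: no suggestion. */
    if (n == 0)
        return INVALID_BLOCK;

    /* Binary search for the first entry with entry.key >= key. */
    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (entries[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* Past the end: fall back to the last entry. */
    if (lo == n)
        lo = n - 1;

    return entries[lo].heap_block;
}
```

The caller treats the result purely as a hint; if the suggested page has no room, the insert goes wherever it would have gone without the suggestion.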

This could be optimized by changing the existing aminsert API, because
as it is, an insert will have to descend the btree twice: once in
amsuggestblock and then again in aminsert. amsuggestblock could keep the right
index page pinned so aminsert could locate it more quickly. But I wanted to
keep this simple for now. Another improvement might be to allow
amsuggestblock to return a list of suggestions, but that makes it more
expensive to insert if there isn't room in the suggested pages, since
heap_insert will have to try them all before giving up.

Comments regarding the general idea or the patch? There should probably
be an index option to turn the feature on and off. You'll want to turn it
off when you first load a table, and turn it on after CLUSTER to keep it
clustered.

Since there's been discussion on keeping the TODO list more up-to-date,
I hereby officially claim the "Automatically maintain clustering on a
table" TODO item :). Feel free to bombard me with requests for status
reports. And just to be clear, I'm not trying to sneak this into 8.2
anymore; this is 8.3 stuff.

I won't be implementing the background daemon described in the TODO item,
since that would essentially be an online version of CLUSTER. That sure
would be nice, but that's a different story.

- Heikki

Index: doc/src/sgml/catalogs.sgml
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/catalogs.sgml,v
retrieving revision 2.129
diff -c -r2.129 catalogs.sgml
*** doc/src/sgml/catalogs.sgml    31 Jul 2006 20:08:55 -0000    2.129
--- doc/src/sgml/catalogs.sgml    8 Aug 2006 16:17:21 -0000
***************
*** 499,504 ****
--- 499,511 ----
        <entry>Function to parse and validate reloptions for an index</entry>
       </row>

+      <row>
+       <entry><structfield>amsuggestblock</structfield></entry>
+       <entry><type>regproc</type></entry>
+       <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+       <entry>Get the best place in the heap to put a new tuple</entry>
+      </row>
+
      </tbody>
     </tgroup>
    </table>
Index: doc/src/sgml/indexam.sgml
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/indexam.sgml,v
retrieving revision 2.16
diff -c -r2.16 indexam.sgml
*** doc/src/sgml/indexam.sgml    31 Jul 2006 20:08:59 -0000    2.16
--- doc/src/sgml/indexam.sgml    8 Aug 2006 17:15:25 -0000
***************
*** 391,396 ****
--- 391,414 ----
     <function>amoptions</> to test validity of options settings.
    </para>

+   <para>
+ <programlisting>
+ BlockNumber
+ amsuggestblock (Relation indexRelation,
+                 Datum *values,
+                 bool *isnull,
+                 Relation heapRelation);
+ </programlisting>
+    Gets the optimal place in the heap for a new tuple. The parameters
+    correspond to the parameters for <literal>aminsert</literal>.
+    This function is called on the clustered index before a new tuple
+    is inserted to the heap, and it should choose the optimal insertion
+    target page on the heap in such manner that the heap stays as close
+    as possible to the index order.
+    <literal>amsuggestblock</literal> can return InvalidBlockNumber if
+    the index am doesn't have a suggestion.
+   </para>
+
   </sect1>

   <sect1 id="index-scanning">
Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.218
diff -c -r1.218 heapam.c
*** src/backend/access/heap/heapam.c    31 Jul 2006 20:08:59 -0000    1.218
--- src/backend/access/heap/heapam.c    8 Aug 2006 16:17:21 -0000
***************
*** 1325,1330 ****
--- 1325,1335 ----
   * use_fsm is passed directly to RelationGetBufferForTuple, which see for
   * more info.
   *
+  * suggested_blk can be set by the caller to hint heap_insert which
+  * block would be the best place to put the new tuple in. heap_insert can
+  * ignore the suggestion, if there's not enough room on that block.
+  * InvalidBlockNumber means no preference.
+  *
   * The return value is the OID assigned to the tuple (either here or by the
   * caller), or InvalidOid if no OID.  The header fields of *tup are updated
   * to match the stored tuple; in particular tup->t_self receives the actual
***************
*** 1333,1339 ****
   */
  Oid
  heap_insert(Relation relation, HeapTuple tup, CommandId cid,
!             bool use_wal, bool use_fsm)
  {
      TransactionId xid = GetCurrentTransactionId();
      HeapTuple    heaptup;
--- 1338,1344 ----
   */
  Oid
  heap_insert(Relation relation, HeapTuple tup, CommandId cid,
!             bool use_wal, bool use_fsm, BlockNumber suggested_blk)
  {
      TransactionId xid = GetCurrentTransactionId();
      HeapTuple    heaptup;
***************
*** 1386,1392 ****

      /* Find buffer to insert this tuple into */
      buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
!                                        InvalidBuffer, use_fsm);

      /* NO EREPORT(ERROR) from here till changes are logged */
      START_CRIT_SECTION();
--- 1391,1397 ----

      /* Find buffer to insert this tuple into */
      buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
!                                        InvalidBuffer, use_fsm, suggested_blk);

      /* NO EREPORT(ERROR) from here till changes are logged */
      START_CRIT_SECTION();
***************
*** 1494,1500 ****
  Oid
  simple_heap_insert(Relation relation, HeapTuple tup)
  {
!     return heap_insert(relation, tup, GetCurrentCommandId(), true, true);
  }

  /*
--- 1499,1506 ----
  Oid
  simple_heap_insert(Relation relation, HeapTuple tup)
  {
!     return heap_insert(relation, tup, GetCurrentCommandId(), true,
!                        true, InvalidBlockNumber);
  }

  /*
***************
*** 2079,2085 ****
          {
              /* Assume there's no chance to put heaptup on same page. */
              newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
!                                                buffer, true);
          }
          else
          {
--- 2085,2092 ----
          {
              /* Assume there's no chance to put heaptup on same page. */
              newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
!                                                buffer, true,
!                                                InvalidBlockNumber);
          }
          else
          {
***************
*** 2096,2102 ****
                   */
                  LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
                  newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
!                                                    buffer, true);
              }
              else
              {
--- 2103,2110 ----
                   */
                  LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
                  newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
!                                                    buffer, true,
!                                                    InvalidBlockNumber);
              }
              else
              {
Index: src/backend/access/heap/hio.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/hio.c,v
retrieving revision 1.63
diff -c -r1.63 hio.c
*** src/backend/access/heap/hio.c    3 Jul 2006 22:45:37 -0000    1.63
--- src/backend/access/heap/hio.c    9 Aug 2006 18:03:01 -0000
***************
*** 93,98 ****
--- 93,100 ----
   *    any committed data of other transactions.  (See heap_insert's comments
   *    for additional constraints needed for safe usage of this behavior.)
   *
+  *    If the caller has a suggestion, it's passed in suggestedBlock.
+  *
   *    We always try to avoid filling existing pages further than the fillfactor.
   *    This is OK since this routine is not consulted when updating a tuple and
   *    keeping it on the same page, which is the scenario fillfactor is meant
***************
*** 103,109 ****
   */
  Buffer
  RelationGetBufferForTuple(Relation relation, Size len,
!                           Buffer otherBuffer, bool use_fsm)
  {
      Buffer        buffer = InvalidBuffer;
      Page        pageHeader;
--- 105,112 ----
   */
  Buffer
  RelationGetBufferForTuple(Relation relation, Size len,
!                           Buffer otherBuffer, bool use_fsm,
!                           BlockNumber suggestedBlock)
  {
      Buffer        buffer = InvalidBuffer;
      Page        pageHeader;
***************
*** 135,142 ****
          otherBlock = InvalidBlockNumber;        /* just to keep compiler quiet */

      /*
!      * We first try to put the tuple on the same page we last inserted a tuple
!      * on, as cached in the relcache entry.  If that doesn't work, we ask the
       * shared Free Space Map to locate a suitable page.  Since the FSM's info
       * might be out of date, we have to be prepared to loop around and retry
       * multiple times.    (To insure this isn't an infinite loop, we must update
--- 138,147 ----
          otherBlock = InvalidBlockNumber;        /* just to keep compiler quiet */

      /*
!      * We first try to put the tuple on the page suggested by the caller, if
!      * any. Then we try to put the tuple on the same page we last inserted a
!      * tuple on, as cached in the relcache entry. If that doesn't work, we
!      * ask the
       * shared Free Space Map to locate a suitable page.  Since the FSM's info
       * might be out of date, we have to be prepared to loop around and retry
       * multiple times.    (To insure this isn't an infinite loop, we must update
***************
*** 144,152 ****
       * not to be suitable.)  If the FSM has no record of a page with enough
       * free space, we give up and extend the relation.
       *
!      * When use_fsm is false, we either put the tuple onto the existing target
!      * page or extend the relation.
       */
      if (len + saveFreeSpace <= MaxTupleSize)
          targetBlock = relation->rd_targblock;
      else
--- 149,167 ----
       * not to be suitable.)  If the FSM has no record of a page with enough
       * free space, we give up and extend the relation.
       *
!      * When use_fsm is false, we skip the fsm lookup if neither the suggested
!      * nor the cached last insertion page has enough room, and extend the
!      * relation.
!      *
!      * The fillfactor is taken into account when calculating the free space
!      * on the cached target block, and when using the FSM. The suggested page
!      * is used whenever there's enough room in it, regardless of the fillfactor,
!      * because that's exactly the purpose the space is reserved for in the
!      * first place.
       */
+     if (suggestedBlock != InvalidBlockNumber)
+         targetBlock = suggestedBlock;
+     else
      if (len + saveFreeSpace <= MaxTupleSize)
          targetBlock = relation->rd_targblock;
      else
***************
*** 219,224 ****
--- 234,244 ----
           */
          pageHeader = (Page) BufferGetPage(buffer);
          pageFreeSpace = PageGetFreeSpace(pageHeader);
+
+         /* If we're trying the suggested block, don't care about fillfactor */
+         if (targetBlock == suggestedBlock && len <= pageFreeSpace)
+             return buffer;
+
          if (len + saveFreeSpace <= pageFreeSpace)
          {
              /* use this page as future insert target, too */
***************
*** 241,246 ****
--- 261,275 ----
              ReleaseBuffer(buffer);
          }

+         /* If we just tried the suggested block, try the cached target
+          * block next, before consulting the FSM. */
+         if(suggestedBlock == targetBlock)
+         {
+             targetBlock = relation->rd_targblock;
+             suggestedBlock = InvalidBlockNumber;
+             continue;
+         }
+
          /* Without FSM, always fall out of the loop and extend */
          if (!use_fsm)
              break;
Index: src/backend/access/index/genam.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/index/genam.c,v
retrieving revision 1.58
diff -c -r1.58 genam.c
*** src/backend/access/index/genam.c    31 Jul 2006 20:08:59 -0000    1.58
--- src/backend/access/index/genam.c    8 Aug 2006 16:17:21 -0000
***************
*** 259,261 ****
--- 259,275 ----

      pfree(sysscan);
  }
+
+ /*
+  * This is a dummy implementation of amsuggestblock, to be used for index
+  * access methods that don't or can't support it. It just returns
+  * InvalidBlockNumber, which means "no preference".
+  *
+  * This is probably not the best place for this function, but it doesn't
+  * fit naturally anywhere else either.
+  */
+ Datum
+ dummysuggestblock(PG_FUNCTION_ARGS)
+ {
+     PG_RETURN_UINT32(InvalidBlockNumber);
+ }
Index: src/backend/access/index/indexam.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/index/indexam.c,v
retrieving revision 1.94
diff -c -r1.94 indexam.c
*** src/backend/access/index/indexam.c    31 Jul 2006 20:08:59 -0000    1.94
--- src/backend/access/index/indexam.c    8 Aug 2006 16:17:21 -0000
***************
*** 18,23 ****
--- 18,24 ----
   *        index_rescan    - restart a scan of an index
   *        index_endscan    - end a scan
   *        index_insert    - insert an index tuple into a relation
+  *        index_suggestblock    - get desired insert location for a heap tuple
   *        index_markpos    - mark a scan position
   *        index_restrpos    - restore a scan position
   *        index_getnext    - get the next tuple from a scan
***************
*** 202,207 ****
--- 203,237 ----
                                        BoolGetDatum(check_uniqueness)));
  }

+ /* ----------------
+  *        index_suggestblock - get desired insert location for a heap tuple
+  *
+  * The returned BlockNumber is the *heap* page that is the best place
+  * to insert the given tuple to, according to the index am. The best
+  * place is usually one that maintains the cluster order.
+  * ----------------
+  */
+ BlockNumber
+ index_suggestblock(Relation indexRelation,
+                    Datum *values,
+                    bool *isnull,
+                    Relation heapRelation)
+ {
+     FmgrInfo   *procedure;
+
+     RELATION_CHECKS;
+     GET_REL_PROCEDURE(amsuggestblock);
+
+     /*
+      * have the am's suggestblock proc do all the work.
+      */
+     return DatumGetUInt32(FunctionCall4(procedure,
+                                       PointerGetDatum(indexRelation),
+                                       PointerGetDatum(values),
+                                       PointerGetDatum(isnull),
+                                       PointerGetDatum(heapRelation)));
+ }
+
  /*
   * index_beginscan - start a scan of an index with amgettuple
   *
Index: src/backend/access/nbtree/nbtinsert.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtinsert.c,v
retrieving revision 1.142
diff -c -r1.142 nbtinsert.c
*** src/backend/access/nbtree/nbtinsert.c    25 Jul 2006 19:13:00 -0000    1.142
--- src/backend/access/nbtree/nbtinsert.c    9 Aug 2006 17:51:33 -0000
***************
*** 146,151 ****
--- 146,221 ----
  }

  /*
+  *    _bt_suggestblock() -- Find the heap block of the closest index tuple.
+  *
+  * The logic to find the target should match _bt_doinsert, otherwise
+  * we'll be making bad suggestions.
+  */
+ BlockNumber
+ _bt_suggestblock(Relation rel, IndexTuple itup, Relation heapRel)
+ {
+     int            natts = rel->rd_rel->relnatts;
+     OffsetNumber offset;
+     Page        page;
+     BTPageOpaque opaque;
+
+     ScanKey        itup_scankey;
+     BTStack        stack;
+     Buffer        buf;
+     IndexTuple    curitup;
+     BlockNumber suggestion = InvalidBlockNumber;
+
+     /* we need an insertion scan key to do our search, so build one */
+     itup_scankey = _bt_mkscankey(rel, itup);
+
+     /* find the first page containing this key */
+     stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_READ);
+     if(!BufferIsValid(buf))
+     {
+         /* The index was completely empty. No suggestion then. */
+         return InvalidBlockNumber;
+     }
+     /* we don't need the stack, so free it right away */
+     _bt_freestack(stack);
+
+     page = BufferGetPage(buf);
+     opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+     /* Find the location in the page where the new index tuple would go to. */
+
+     offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
+     if (offset > PageGetMaxOffsetNumber(page))
+     {
+         /* _bt_binsrch returned pointer to end-of-page. It means that
+          * there were no equal items on the page, and the new item should
+          * be inserted as the last tuple of the page. There could be equal
+          * items on the next page, however.
+          *
+          * At the moment, we just ignore the potential equal items on the
+          * right, and pretend there aren't any. We could instead walk right
+          * to the next page to check that, but let's keep it simple for now.
+          */
+         offset = OffsetNumberPrev(offset);
+     }
+     if(offset < P_FIRSTDATAKEY(opaque))
+     {
+         /* We landed on an empty page. We could step left or right until
+          * we find some items, but let's keep it simple for now.
+          */
+     } else {
+         /* We're now positioned at the index tuple that we're interested in. */
+
+         curitup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
+         suggestion = ItemPointerGetBlockNumber(&curitup->t_tid);
+     }
+
+     _bt_relbuf(rel, buf);
+     _bt_freeskey(itup_scankey);
+
+     return suggestion;
+ }
+
+ /*
   *    _bt_check_unique() -- Check for violation of unique index constraint
   *
   * Returns InvalidTransactionId if there is no conflict, else an xact ID
Index: src/backend/access/nbtree/nbtree.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtree.c,v
retrieving revision 1.149
diff -c -r1.149 nbtree.c
*** src/backend/access/nbtree/nbtree.c    10 May 2006 23:18:39 -0000    1.149
--- src/backend/access/nbtree/nbtree.c    9 Aug 2006 18:04:02 -0000
***************
*** 228,233 ****
--- 228,265 ----
  }

  /*
+  *    btsuggestblock() -- find the best place in the heap to put a new tuple.
+  *
+  *        This uses the same logic as btinsert to find the place where the index
+  *        tuple would go if this was a btinsert call.
+  *
+  *        There's room for improvement here. An insert operation will descend
+  *        the tree twice, first by btsuggestblock, then by btinsert. Things
+  *        might have changed in between, so that the heap tuple is actually
+  *        not inserted in the optimal page, but since this is just an
+  *        optimization, it's ok if it happens sometimes.
+  */
+ Datum
+ btsuggestblock(PG_FUNCTION_ARGS)
+ {
+     Relation    rel = (Relation) PG_GETARG_POINTER(0);
+     Datum       *values = (Datum *) PG_GETARG_POINTER(1);
+     bool       *isnull = (bool *) PG_GETARG_POINTER(2);
+     Relation    heapRel = (Relation) PG_GETARG_POINTER(3);
+     IndexTuple    itup;
+     BlockNumber suggestion;
+
+     /* generate an index tuple */
+     itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
+
+     suggestion =_bt_suggestblock(rel, itup, heapRel);
+
+     pfree(itup);
+
+     PG_RETURN_UINT32(suggestion);
+ }
+
+ /*
   *    btgettuple() -- Get the next tuple in the scan.
   */
  Datum
Index: src/backend/executor/execMain.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.277
diff -c -r1.277 execMain.c
*** src/backend/executor/execMain.c    31 Jul 2006 01:16:37 -0000    1.277
--- src/backend/executor/execMain.c    8 Aug 2006 16:17:21 -0000
***************
*** 892,897 ****
--- 892,898 ----
      resultRelInfo->ri_RangeTableIndex = resultRelationIndex;
      resultRelInfo->ri_RelationDesc = resultRelationDesc;
      resultRelInfo->ri_NumIndices = 0;
+     resultRelInfo->ri_ClusterIndex = -1;
      resultRelInfo->ri_IndexRelationDescs = NULL;
      resultRelInfo->ri_IndexRelationInfo = NULL;
      /* make a copy so as not to depend on relcache info not changing... */
***************
*** 1388,1394 ****
          heap_insert(estate->es_into_relation_descriptor, tuple,
                      estate->es_snapshot->curcid,
                      estate->es_into_relation_use_wal,
!                     false);        /* never any point in using FSM */
          /* we know there are no indexes to update */
          heap_freetuple(tuple);
          IncrAppended();
--- 1389,1396 ----
          heap_insert(estate->es_into_relation_descriptor, tuple,
                      estate->es_snapshot->curcid,
                      estate->es_into_relation_use_wal,
!                     false, /* never any point in using FSM */
!                     InvalidBlockNumber);
          /* we know there are no indexes to update */
          heap_freetuple(tuple);
          IncrAppended();
***************
*** 1419,1424 ****
--- 1421,1427 ----
      ResultRelInfo *resultRelInfo;
      Relation    resultRelationDesc;
      Oid            newId;
+     BlockNumber suggestedBlock;

      /*
       * get the heap tuple out of the tuple table slot, making sure we have a
***************
*** 1467,1472 ****
--- 1470,1479 ----
      if (resultRelationDesc->rd_att->constr)
          ExecConstraints(resultRelInfo, slot, estate);

+     /* Ask the index am of the clustered index for the
+      * best place to put it */
+     suggestedBlock = ExecSuggestBlock(slot, estate);
+
      /*
       * insert the tuple
       *
***************
*** 1475,1481 ****
       */
      newId = heap_insert(resultRelationDesc, tuple,
                          estate->es_snapshot->curcid,
!                         true, true);

      IncrAppended();
      (estate->es_processed)++;
--- 1482,1488 ----
       */
      newId = heap_insert(resultRelationDesc, tuple,
                          estate->es_snapshot->curcid,
!                         true, true, suggestedBlock);

      IncrAppended();
      (estate->es_processed)++;
Index: src/backend/executor/execUtils.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/executor/execUtils.c,v
retrieving revision 1.139
diff -c -r1.139 execUtils.c
*** src/backend/executor/execUtils.c    4 Aug 2006 21:33:36 -0000    1.139
--- src/backend/executor/execUtils.c    9 Aug 2006 18:05:05 -0000
***************
*** 31,36 ****
--- 31,37 ----
   *        ExecOpenIndices            \
   *        ExecCloseIndices         | referenced by InitPlan, EndPlan,
   *        ExecInsertIndexTuples    /  ExecInsert, ExecUpdate
+  *        ExecSuggestBlock        Referenced by ExecInsert
   *
   *        RegisterExprContextCallback    Register function shutdown callback
   *        UnregisterExprContextCallback  Deregister function shutdown callback
***************
*** 874,879 ****
--- 875,881 ----
      IndexInfo **indexInfoArray;

      resultRelInfo->ri_NumIndices = 0;
+     resultRelInfo->ri_ClusterIndex = -1;

      /* fast path if no indexes */
      if (!RelationGetForm(resultRelation)->relhasindex)
***************
*** 913,918 ****
--- 915,925 ----
          /* extract index key information from the index's pg_index info */
          ii = BuildIndexInfo(indexDesc);

+         /* Remember which index is the clustered one.
+          * It's used to call the suggestblock-method on inserts */
+         if(indexDesc->rd_index->indisclustered)
+             resultRelInfo->ri_ClusterIndex = i;
+
          relationDescs[i] = indexDesc;
          indexInfoArray[i] = ii;
          i++;
***************
*** 1062,1067 ****
--- 1069,1137 ----
      }
  }

+ /* ----------------------------------------------------------------
+  *        ExecSuggestBlock
+  *
+  *        This routine asks the index am where a new heap tuple
+  *        should be placed.
+  * ----------------------------------------------------------------
+  */
+ BlockNumber
+ ExecSuggestBlock(TupleTableSlot *slot,
+                  EState *estate)
+ {
+     ResultRelInfo *resultRelInfo;
+     int            i;
+     Relation    relationDesc;
+     Relation    heapRelation;
+     ExprContext *econtext;
+     Datum        values[INDEX_MAX_KEYS];
+     bool        isnull[INDEX_MAX_KEYS];
+     IndexInfo  *indexInfo;
+
+     /*
+      * Get information from the result relation info structure.
+      */
+     resultRelInfo = estate->es_result_relation_info;
+     i = resultRelInfo->ri_ClusterIndex;
+     if(i == -1)
+         return InvalidBlockNumber; /* there was no clustered index */
+
+     heapRelation = resultRelInfo->ri_RelationDesc;
+     relationDesc = resultRelInfo->ri_IndexRelationDescs[i];
+     indexInfo = resultRelInfo->ri_IndexRelationInfo[i];
+
+     /* You can't cluster on a partial index */
+     Assert(indexInfo->ii_Predicate == NIL);
+
+     /*
+      * We will use the EState's per-tuple context for evaluating
+      * index expressions (creating it if it's not already there).
+      */
+     econtext = GetPerTupleExprContext(estate);
+
+     /* Arrange for econtext's scan tuple to be the tuple under test */
+     econtext->ecxt_scantuple = slot;
+
+     /*
+      * FormIndexDatum fills in its values and isnull parameters with the
+      * appropriate values for the column(s) of the index.
+      */
+     FormIndexDatum(indexInfo,
+                    slot,
+                    estate,
+                    values,
+                    isnull);
+
+     /*
+      * The index AM does the rest.
+      */
+     return index_suggestblock(relationDesc,    /* index relation */
+                  values,    /* array of index Datums */
+                  isnull,    /* null flags */
+                  heapRelation);
+ }
+
  /*
   * UpdateChangedParamSet
   *        Add changed parameters to a plan node's chgParam set
Index: src/include/access/genam.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/genam.h,v
retrieving revision 1.65
diff -c -r1.65 genam.h
*** src/include/access/genam.h    31 Jul 2006 20:09:05 -0000    1.65
--- src/include/access/genam.h    9 Aug 2006 17:53:44 -0000
***************
*** 93,98 ****
--- 93,101 ----
               ItemPointer heap_t_ctid,
               Relation heapRelation,
               bool check_uniqueness);
+ extern BlockNumber index_suggestblock(Relation indexRelation,
+              Datum *values, bool *isnull,
+              Relation heapRelation);

  extern IndexScanDesc index_beginscan(Relation heapRelation,
                  Relation indexRelation,
***************
*** 123,128 ****
--- 126,133 ----
  extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
                    uint16 procnum);

+ extern Datum dummysuggestblock(PG_FUNCTION_ARGS);
+
  /*
   * index access method support routines (in genam.c)
   */
Index: src/include/access/heapam.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/heapam.h,v
retrieving revision 1.114
diff -c -r1.114 heapam.h
*** src/include/access/heapam.h    3 Jul 2006 22:45:39 -0000    1.114
--- src/include/access/heapam.h    8 Aug 2006 16:17:21 -0000
***************
*** 156,162 ****
  extern void setLastTid(const ItemPointer tid);

  extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
!             bool use_wal, bool use_fsm);
  extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
              ItemPointer ctid, TransactionId *update_xmax,
              CommandId cid, Snapshot crosscheck, bool wait);
--- 156,162 ----
  extern void setLastTid(const ItemPointer tid);

  extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
!             bool use_wal, bool use_fsm, BlockNumber suggestedblk);
  extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
              ItemPointer ctid, TransactionId *update_xmax,
              CommandId cid, Snapshot crosscheck, bool wait);
Index: src/include/access/hio.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/hio.h,v
retrieving revision 1.32
diff -c -r1.32 hio.h
*** src/include/access/hio.h    13 Jul 2006 17:47:01 -0000    1.32
--- src/include/access/hio.h    8 Aug 2006 16:17:21 -0000
***************
*** 21,26 ****
  extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
                       HeapTuple tuple);
  extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
!                     Buffer otherBuffer, bool use_fsm);

  #endif   /* HIO_H */
--- 21,26 ----
  extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
                       HeapTuple tuple);
  extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
!                     Buffer otherBuffer, bool use_fsm, BlockNumber suggestedblk);

  #endif   /* HIO_H */
Index: src/include/access/nbtree.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/nbtree.h,v
retrieving revision 1.103
diff -c -r1.103 nbtree.h
*** src/include/access/nbtree.h    7 Aug 2006 16:57:57 -0000    1.103
--- src/include/access/nbtree.h    8 Aug 2006 16:17:21 -0000
***************
*** 467,472 ****
--- 467,473 ----
  extern Datum btbulkdelete(PG_FUNCTION_ARGS);
  extern Datum btvacuumcleanup(PG_FUNCTION_ARGS);
  extern Datum btoptions(PG_FUNCTION_ARGS);
+ extern Datum btsuggestblock(PG_FUNCTION_ARGS);

  /*
   * prototypes for functions in nbtinsert.c
***************
*** 476,481 ****
--- 477,484 ----
  extern Buffer _bt_getstackbuf(Relation rel, BTStack stack, int access);
  extern void _bt_insert_parent(Relation rel, Buffer buf, Buffer rbuf,
                    BTStack stack, bool is_root, bool is_only);
+ extern BlockNumber _bt_suggestblock(Relation rel, IndexTuple itup,
+              Relation heapRel);

  /*
   * prototypes for functions in nbtpage.c
Index: src/include/catalog/pg_am.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/catalog/pg_am.h,v
retrieving revision 1.46
diff -c -r1.46 pg_am.h
*** src/include/catalog/pg_am.h    31 Jul 2006 20:09:05 -0000    1.46
--- src/include/catalog/pg_am.h    8 Aug 2006 16:17:21 -0000
***************
*** 65,70 ****
--- 65,71 ----
      regproc        amvacuumcleanup;    /* post-VACUUM cleanup function */
      regproc        amcostestimate; /* estimate cost of an indexscan */
      regproc        amoptions;        /* parse AM-specific parameters */
+     regproc        amsuggestblock;    /* suggest a heap block for a new tuple */
  } FormData_pg_am;

  /* ----------------
***************
*** 78,84 ****
   *        compiler constants for pg_am
   * ----------------
   */
! #define Natts_pg_am                        23
  #define Anum_pg_am_amname                1
  #define Anum_pg_am_amstrategies            2
  #define Anum_pg_am_amsupport            3
--- 79,85 ----
   *        compiler constants for pg_am
   * ----------------
   */
! #define Natts_pg_am                        24
  #define Anum_pg_am_amname                1
  #define Anum_pg_am_amstrategies            2
  #define Anum_pg_am_amsupport            3
***************
*** 102,123 ****
  #define Anum_pg_am_amvacuumcleanup        21
  #define Anum_pg_am_amcostestimate        22
  #define Anum_pg_am_amoptions            23

  /* ----------------
   *        initial contents of pg_am
   * ----------------
   */

! DATA(insert OID = 403 (  btree    5 1 1 t t t t f t btinsert btbeginscan btgettuple btgetmulti btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions ));
  DESCR("b-tree index access method");
  #define BTREE_AM_OID 403
! DATA(insert OID = 405 (  hash    1 1 0 f f f f f f hashinsert hashbeginscan hashgettuple hashgetmulti hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
  DESCR("hash index access method");
  #define HASH_AM_OID 405
! DATA(insert OID = 783 (  gist    100 7 0 f t t t t t gistinsert gistbeginscan gistgettuple gistgetmulti gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
  DESCR("GiST index access method");
  #define GIST_AM_OID 783
! DATA(insert OID = 2742 (  gin    100 4 0 f f f f t f gininsert ginbeginscan gingettuple gingetmulti ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
  DESCR("GIN index access method");
  #define GIN_AM_OID 2742

--- 103,125 ----
  #define Anum_pg_am_amvacuumcleanup        21
  #define Anum_pg_am_amcostestimate        22
  #define Anum_pg_am_amoptions            23
+ #define Anum_pg_am_amsuggestblock        24

  /* ----------------
   *        initial contents of pg_am
   * ----------------
   */

! DATA(insert OID = 403 (  btree    5 1 1 t t t t f t btinsert btbeginscan btgettuple btgetmulti btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions btsuggestblock));
  DESCR("b-tree index access method");
  #define BTREE_AM_OID 403
! DATA(insert OID = 405 (  hash    1 1 0 f f f f f f hashinsert hashbeginscan hashgettuple hashgetmulti hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions dummysuggestblock));
  DESCR("hash index access method");
  #define HASH_AM_OID 405
! DATA(insert OID = 783 (  gist    100 7 0 f t t t t t gistinsert gistbeginscan gistgettuple gistgetmulti gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions dummysuggestblock));
  DESCR("GiST index access method");
  #define GIST_AM_OID 783
! DATA(insert OID = 2742 (  gin    100 4 0 f f f f t f gininsert ginbeginscan gingettuple gingetmulti ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions dummysuggestblock ));
  DESCR("GIN index access method");
  #define GIN_AM_OID 2742

Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.420
diff -c -r1.420 pg_proc.h
*** src/include/catalog/pg_proc.h    6 Aug 2006 03:53:44 -0000    1.420
--- src/include/catalog/pg_proc.h    9 Aug 2006 18:06:44 -0000
***************
*** 682,687 ****
--- 682,689 ----
  DESCR("btree(internal)");
  DATA(insert OID = 2785 (  btoptions           PGNSP PGUID 12 f f t f s 2 17 "1009 16" _null_ _null_ _null_  btoptions - _null_ ));
  DESCR("btree(internal)");
+ DATA(insert OID = 2852 (  btsuggestblock   PGNSP PGUID 12 f f t f v 4 23 "2281 2281 2281 2281" _null_ _null_ _null_ btsuggestblock - _null_ ));
+ DESCR("btree(internal)");

  DATA(insert OID = 339 (  poly_same           PGNSP PGUID 12 f f t f i 2 16 "604 604" _null_ _null_ _null_ poly_same - _null_));
  DESCR("same as?");
***************
*** 3936,3941 ****
--- 3938,3946 ----
  DATA(insert OID = 2749 (  arraycontained       PGNSP PGUID 12 f f t f i 2 16 "2277 2277" _null_ _null_ _null_ arraycontained - _null_ ));
  DESCR("anyarray contained");

+ DATA(insert OID = 2853 (  dummysuggestblock   PGNSP PGUID 12 f f t f v 4 23 "2281 2281 2281 2281" _null_ _null_ _null_   dummysuggestblock - _null_ ));
+ DESCR("dummy amsuggestblock implementation (internal)");
+
  /*
   * Symbolic values for provolatile column: these indicate whether the result
   * of a function is dependent *only* on the values of its explicit arguments,
Index: src/include/executor/executor.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/executor/executor.h,v
retrieving revision 1.128
diff -c -r1.128 executor.h
*** src/include/executor/executor.h    4 Aug 2006 21:33:36 -0000    1.128
--- src/include/executor/executor.h    8 Aug 2006 16:17:21 -0000
***************
*** 271,276 ****
--- 271,277 ----
  extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
  extern void ExecInsertIndexTuples(TupleTableSlot *slot, ItemPointer tupleid,
                        EState *estate, bool is_vacuum);
+ extern BlockNumber ExecSuggestBlock(TupleTableSlot *slot, EState *estate);

  extern void RegisterExprContextCallback(ExprContext *econtext,
                              ExprContextCallbackFunction function,
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/nodes/execnodes.h,v
retrieving revision 1.158
diff -c -r1.158 execnodes.h
*** src/include/nodes/execnodes.h    4 Aug 2006 21:33:36 -0000    1.158
--- src/include/nodes/execnodes.h    8 Aug 2006 16:17:21 -0000
***************
*** 257,262 ****
--- 257,264 ----
   *        NumIndices                # of indices existing on result relation
   *        IndexRelationDescs        array of relation descriptors for indices
   *        IndexRelationInfo        array of key/attr info for indices
+  *        ClusterIndex            index into the IndexRelationInfo array for the
+  *                                clustered index, or -1 if there is none
   *        TrigDesc                triggers to be fired, if any
   *        TrigFunctions            cached lookup info for trigger functions
   *        TrigInstrument            optional runtime measurements for triggers
***************
*** 272,277 ****
--- 274,280 ----
      int            ri_NumIndices;
      RelationPtr ri_IndexRelationDescs;
      IndexInfo **ri_IndexRelationInfo;
+     int         ri_ClusterIndex;
      TriggerDesc *ri_TrigDesc;
      FmgrInfo   *ri_TrigFunctions;
      struct Instrumentation *ri_TrigInstrument;
Index: src/include/utils/rel.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/utils/rel.h,v
retrieving revision 1.91
diff -c -r1.91 rel.h
*** src/include/utils/rel.h    3 Jul 2006 22:45:41 -0000    1.91
--- src/include/utils/rel.h    8 Aug 2006 16:17:21 -0000
***************
*** 116,121 ****
--- 116,122 ----
      FmgrInfo    amvacuumcleanup;
      FmgrInfo    amcostestimate;
      FmgrInfo    amoptions;
+     FmgrInfo    amsuggestblock;
  } RelationAmInfo;
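
For readers who want the gist of the new suggestedblk argument to RelationGetBufferForTuple without reading the whole patch, here is a rough standalone sketch of the intended lookup order: try the block suggested by the index AM first, then fall back to a free-space search, then extend. This is not code from the patch. toy_choose_block and the free_space array are hypothetical stand-ins for the real buffer and FSM machinery, and using InvalidBlockNumber to mean "no suggestion" is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Same representation PostgreSQL uses for heap block numbers. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Toy model (hypothetical, not from the patch): free_space[i] is the
 * number of free bytes on heap page i.  Returns the block to insert
 * into, or nblocks if the relation would have to be extended.
 */
static BlockNumber
toy_choose_block(const size_t *free_space, BlockNumber nblocks,
                 size_t len, BlockNumber suggestedblk)
{
    BlockNumber blk;

    /* 1. Prefer the block suggested by the index AM, if it has room. */
    if (suggestedblk != InvalidBlockNumber &&
        suggestedblk < nblocks &&
        free_space[suggestedblk] >= len)
        return suggestedblk;

    /* 2. Otherwise search for any page with room (stand-in for the FSM). */
    for (blk = 0; blk < nblocks; blk++)
    {
        if (free_space[blk] >= len)
            return blk;
    }

    /* 3. No page has room: the caller would extend the relation. */
    return nblocks;
}
```

With free space of {0, 100, 50, 200} bytes on four pages, a 40-byte tuple with suggestion 2 goes to block 2, while an 80-byte tuple with the same suggestion falls through to block 1, so cluster order is only maintained on a best-effort basis.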


