Re: Maintaining cluster order on insert - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: Maintaining cluster order on insert
Date
Msg-id 200804111938.m3BJcUj02170@momjian.us
In response to Maintaining cluster order on insert  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-patches
This idea has been rejected due to poor performance results reported
later in the thread.

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> While thinking about index-organized-tables and similar ideas, it
> occurred to me that there's some low-hanging-fruit: maintaining cluster
> order on inserts by trying to place new heap tuples close to other
> similar tuples. That involves asking the index am where on the heap the
> new tuple should go, and trying to insert it there before using the FSM.
> Using the new fillfactor parameter makes it more likely that there's
> room on the page. We don't worry about the order within the page.
>
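To make the placement order described above concrete, here is a minimal standalone C sketch. It is a toy model, not PostgreSQL code: `choose_block`, `free_space`, `NO_BLOCK`, `NPAGES` and the `reserve` parameter are hypothetical names, and a linear scan stands in for the FSM. It only illustrates the idea that the suggested page is tried first and may dip into the space fillfactor keeps free, while the other candidates respect that reserve.

```c
#include <assert.h>

#define NO_BLOCK (-1)   /* hypothetical stand-in for InvalidBlockNumber */
#define NPAGES   4      /* size of the toy heap */

/* Free bytes on each toy heap page (made-up numbers). */
int free_space[NPAGES] = {100, 900, 40, 2000};

/*
 * Pick a page for a tuple of `len` bytes.
 * `reserve` models the space fillfactor keeps free on each page.
 * Order tried: suggested page (reserve ignored), cached target page,
 * a linear scan standing in for the FSM, and finally "extend",
 * which is reported as NPAGES.
 */
int
choose_block(int len, int reserve, int suggested, int cached)
{
    int i;

    assert(len >= 0 && reserve >= 0);
    assert(suggested < NPAGES && cached < NPAGES);

    /* 1. the caller's suggestion may use the reserved space */
    if (suggested != NO_BLOCK && len <= free_space[suggested])
        return suggested;

    /* 2. the page we last inserted into, honouring the reserve */
    if (cached != NO_BLOCK && len + reserve <= free_space[cached])
        return cached;

    /* 3. any page with enough room, honouring the reserve */
    for (i = 0; i < NPAGES; i++)
        if (len + reserve <= free_space[i])
            return i;

    /* 4. nothing fits: the relation would have to be extended */
    return NPAGES;
}
```

With these numbers, `choose_block(80, 50, 0, 1)` returns 0 because the 80-byte tuple fits on the suggested page once the reserve is ignored, while `choose_block(80, 50, 2, 1)` falls back to the cached page 1 because page 2 has only 40 bytes free.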
> The API I'm thinking of introduces a new optional index am function,
> amsuggestblock (suggestions for a better name are welcome). It gets the
> same parameters as aminsert, and returns the heap block number that
> would be the optimal place to put the new tuple. It'd be called from
> ExecInsert before inserting the heap tuple, and the suggestion is passed
> on to heap_insert and RelationGetBufferForTuple.
>
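The contract described above (take the index key of the new row, return the heap block of its nearest index neighbour, or "no suggestion") can be sketched without any PostgreSQL internals. The following is an illustrative toy only: `IndexEntry`, `suggest_block` and `NO_BLOCK` are hypothetical names, and a sorted array stands in for a btree leaf level. It assumes the array is sorted ascending by key.

```c
#include <assert.h>
#include <stddef.h>

#define NO_BLOCK (-1)   /* hypothetical stand-in for InvalidBlockNumber */

/* One toy index entry: a key and the heap block holding that row. */
typedef struct
{
    int key;
    int heap_block;
} IndexEntry;

/*
 * Return the heap block of the entry at the position where `key` would be
 * inserted into `entries` (sorted ascending by key). If the key sorts after
 * every entry, step back to the last one. An empty index has no suggestion.
 */
int
suggest_block(const IndexEntry *entries, size_t n, int key)
{
    size_t lo = 0;
    size_t hi = n;

    assert(entries != NULL || n == 0);

    if (n == 0)
        return NO_BLOCK;

    /* binary search for the first entry whose key is >= the new key */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (entries[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* past the end: use the last entry instead */
    if (lo == n)
        lo = n - 1;

    return entries[lo].heap_block;
}
```

A caller would pass the returned block as a hint to the heap insertion routine and treat `NO_BLOCK` as "no preference", which is the role InvalidBlockNumber plays in the proposal.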
> I wrote a little patch to implement this for btree, attached.
>
> This could be optimized by changing the existing aminsert API, because
> as it is, an insert will have to descend the btree twice. Once in
> amsuggestblock and then in aminsert. amsuggestblock could keep the right
> index page pinned so aminsert could locate it quicker. But I wanted to
> keep this simple for now. Another improvement might be to allow
> amsuggestblock to return a list of suggestions, but that makes it more
> expensive to insert if there isn't room in the suggested pages, since
> heap_insert will have to try them all before giving up.
>
> Comments regarding the general idea or the patch? There should probably
> be an index option to turn the feature on and off. You'll want to turn it
> off when you first load a table, and turn it on after CLUSTER to keep it
> clustered.
>
> Since there's been discussion on keeping the TODO list more up-to-date,
> I hereby officially claim the "Automatically maintain clustering on a
> table" TODO item :). Feel free to bombard me with requests for status
> reports. And just to be clear, I'm not trying to sneak this into 8.2
> anymore, this is 8.3 stuff.
>
> I won't be implementing a background daemon described on the TODO item,
> since that would essentially be an online version of CLUSTER. Which sure
> would be nice, but that's a different story.
>
> - Heikki
>

[ text/x-patch is unsupported, treating like TEXT/PLAIN ]

> Index: doc/src/sgml/catalogs.sgml
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/catalogs.sgml,v
> retrieving revision 2.129
> diff -c -r2.129 catalogs.sgml
> *** doc/src/sgml/catalogs.sgml    31 Jul 2006 20:08:55 -0000    2.129
> --- doc/src/sgml/catalogs.sgml    8 Aug 2006 16:17:21 -0000
> ***************
> *** 499,504 ****
> --- 499,511 ----
>         <entry>Function to parse and validate reloptions for an index</entry>
>        </row>
>
> +      <row>
> +       <entry><structfield>amsuggestblock</structfield></entry>
> +       <entry><type>regproc</type></entry>
> +       <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
> +       <entry>Get the best place in the heap to put a new tuple</entry>
> +      </row>
> +
>       </tbody>
>      </tgroup>
>     </table>
> Index: doc/src/sgml/indexam.sgml
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/indexam.sgml,v
> retrieving revision 2.16
> diff -c -r2.16 indexam.sgml
> *** doc/src/sgml/indexam.sgml    31 Jul 2006 20:08:59 -0000    2.16
> --- doc/src/sgml/indexam.sgml    8 Aug 2006 17:15:25 -0000
> ***************
> *** 391,396 ****
> --- 391,414 ----
>      <function>amoptions</> to test validity of options settings.
>     </para>
>
> +   <para>
> + <programlisting>
> + BlockNumber
> + amsuggestblock (Relation indexRelation,
> +                 Datum *values,
> +                 bool *isnull,
> +                 Relation heapRelation);
> + </programlisting>
> +    Gets the optimal place in the heap for a new tuple. The parameters
> +    correspond to the parameters for <literal>aminsert</literal>.
> +    This function is called on the clustered index before a new tuple
> +    is inserted to the heap, and it should choose the optimal insertion
> +    target page on the heap in such manner that the heap stays as close
> +    as possible to the index order.
> +    <literal>amsuggestblock</literal> can return InvalidBlockNumber if
> +    the index am doesn't have a suggestion.
> +   </para>
> +
>    </sect1>
>
>    <sect1 id="index-scanning">
> Index: src/backend/access/heap/heapam.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
> retrieving revision 1.218
> diff -c -r1.218 heapam.c
> *** src/backend/access/heap/heapam.c    31 Jul 2006 20:08:59 -0000    1.218
> --- src/backend/access/heap/heapam.c    8 Aug 2006 16:17:21 -0000
> ***************
> *** 1325,1330 ****
> --- 1325,1335 ----
>    * use_fsm is passed directly to RelationGetBufferForTuple, which see for
>    * more info.
>    *
> +  * suggested_blk can be set by the caller to hint heap_insert which
> +  * block would be the best place to put the new tuple in. heap_insert can
> +  * ignore the suggestion, if there's not enough room on that block.
> +  * InvalidBlockNumber means no preference.
> +  *
>    * The return value is the OID assigned to the tuple (either here or by the
>    * caller), or InvalidOid if no OID.  The header fields of *tup are updated
>    * to match the stored tuple; in particular tup->t_self receives the actual
> ***************
> *** 1333,1339 ****
>    */
>   Oid
>   heap_insert(Relation relation, HeapTuple tup, CommandId cid,
> !             bool use_wal, bool use_fsm)
>   {
>       TransactionId xid = GetCurrentTransactionId();
>       HeapTuple    heaptup;
> --- 1338,1344 ----
>    */
>   Oid
>   heap_insert(Relation relation, HeapTuple tup, CommandId cid,
> !             bool use_wal, bool use_fsm, BlockNumber suggested_blk)
>   {
>       TransactionId xid = GetCurrentTransactionId();
>       HeapTuple    heaptup;
> ***************
> *** 1386,1392 ****
>
>       /* Find buffer to insert this tuple into */
>       buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
> !                                        InvalidBuffer, use_fsm);
>
>       /* NO EREPORT(ERROR) from here till changes are logged */
>       START_CRIT_SECTION();
> --- 1391,1397 ----
>
>       /* Find buffer to insert this tuple into */
>       buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
> !                                        InvalidBuffer, use_fsm, suggested_blk);
>
>       /* NO EREPORT(ERROR) from here till changes are logged */
>       START_CRIT_SECTION();
> ***************
> *** 1494,1500 ****
>   Oid
>   simple_heap_insert(Relation relation, HeapTuple tup)
>   {
> !     return heap_insert(relation, tup, GetCurrentCommandId(), true, true);
>   }
>
>   /*
> --- 1499,1506 ----
>   Oid
>   simple_heap_insert(Relation relation, HeapTuple tup)
>   {
> !     return heap_insert(relation, tup, GetCurrentCommandId(), true,
> !                        true, InvalidBlockNumber);
>   }
>
>   /*
> ***************
> *** 2079,2085 ****
>           {
>               /* Assume there's no chance to put heaptup on same page. */
>               newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
> !                                                buffer, true);
>           }
>           else
>           {
> --- 2085,2092 ----
>           {
>               /* Assume there's no chance to put heaptup on same page. */
>               newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
> !                                                buffer, true,
> !                                                InvalidBlockNumber);
>           }
>           else
>           {
> ***************
> *** 2096,2102 ****
>                    */
>                   LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>                   newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
> !                                                    buffer, true);
>               }
>               else
>               {
> --- 2103,2110 ----
>                    */
>                   LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>                   newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
> !                                                    buffer, true,
> !                                                    InvalidBlockNumber);
>               }
>               else
>               {
> Index: src/backend/access/heap/hio.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/hio.c,v
> retrieving revision 1.63
> diff -c -r1.63 hio.c
> *** src/backend/access/heap/hio.c    3 Jul 2006 22:45:37 -0000    1.63
> --- src/backend/access/heap/hio.c    9 Aug 2006 18:03:01 -0000
> ***************
> *** 93,98 ****
> --- 93,100 ----
>    *    any committed data of other transactions.  (See heap_insert's comments
>    *    for additional constraints needed for safe usage of this behavior.)
>    *
> +  *    If the caller has a suggestion, it's passed in suggestedBlock.
> +  *
>    *    We always try to avoid filling existing pages further than the fillfactor.
>    *    This is OK since this routine is not consulted when updating a tuple and
>    *    keeping it on the same page, which is the scenario fillfactor is meant
> ***************
> *** 103,109 ****
>    */
>   Buffer
>   RelationGetBufferForTuple(Relation relation, Size len,
> !                           Buffer otherBuffer, bool use_fsm)
>   {
>       Buffer        buffer = InvalidBuffer;
>       Page        pageHeader;
> --- 105,112 ----
>    */
>   Buffer
>   RelationGetBufferForTuple(Relation relation, Size len,
> !                           Buffer otherBuffer, bool use_fsm,
> !                           BlockNumber suggestedBlock)
>   {
>       Buffer        buffer = InvalidBuffer;
>       Page        pageHeader;
> ***************
> *** 135,142 ****
>           otherBlock = InvalidBlockNumber;        /* just to keep compiler quiet */
>
>       /*
> !      * We first try to put the tuple on the same page we last inserted a tuple
> !      * on, as cached in the relcache entry.  If that doesn't work, we ask the
>        * shared Free Space Map to locate a suitable page.  Since the FSM's info
>        * might be out of date, we have to be prepared to loop around and retry
>        * multiple times.    (To insure this isn't an infinite loop, we must update
> --- 138,147 ----
>           otherBlock = InvalidBlockNumber;        /* just to keep compiler quiet */
>
>       /*
> !      * We first try to put the tuple on the page suggested by the caller, if
> !      * any. Then we try to put the tuple on the same page we last inserted a
> !      * tuple on, as cached in the relcache entry. If that doesn't work, we
> !      * ask the
>        * shared Free Space Map to locate a suitable page.  Since the FSM's info
>        * might be out of date, we have to be prepared to loop around and retry
>        * multiple times.    (To insure this isn't an infinite loop, we must update
> ***************
> *** 144,152 ****
>        * not to be suitable.)  If the FSM has no record of a page with enough
>        * free space, we give up and extend the relation.
>        *
> !      * When use_fsm is false, we either put the tuple onto the existing target
> !      * page or extend the relation.
>        */
>       if (len + saveFreeSpace <= MaxTupleSize)
>           targetBlock = relation->rd_targblock;
>       else
> --- 149,167 ----
>        * not to be suitable.)  If the FSM has no record of a page with enough
>        * free space, we give up and extend the relation.
>        *
> !      * When use_fsm is false, we skip the fsm lookup if neither the suggested
> !      * nor the cached last insertion page has enough room, and extend the
> !      * relation.
> !      *
> !      * The fillfactor is taken into account when calculating the free space
> !      * on the cached target block, and when using the FSM. The suggested page
> !      * is used whenever there's enough room in it, regardless of the fillfactor,
> !      * because that's exactly the purpose the space is reserved for in the
> !      * first place.
>        */
> +     if (suggestedBlock != InvalidBlockNumber)
> +         targetBlock = suggestedBlock;
> +     else
>       if (len + saveFreeSpace <= MaxTupleSize)
>           targetBlock = relation->rd_targblock;
>       else
> ***************
> *** 219,224 ****
> --- 234,244 ----
>            */
>           pageHeader = (Page) BufferGetPage(buffer);
>           pageFreeSpace = PageGetFreeSpace(pageHeader);
> +
> +         /* If we're trying the suggested block, don't care about fillfactor */
> +         if (targetBlock == suggestedBlock && len <= pageFreeSpace)
> +             return buffer;
> +
>           if (len + saveFreeSpace <= pageFreeSpace)
>           {
>               /* use this page as future insert target, too */
> ***************
> *** 241,246 ****
> --- 261,275 ----
>               ReleaseBuffer(buffer);
>           }
>
> +         /* If we just tried the suggested block, try the cached target
> +          * block next, before consulting the FSM. */
> +         if(suggestedBlock == targetBlock)
> +         {
> +             targetBlock = relation->rd_targblock;
> +             suggestedBlock = InvalidBlockNumber;
> +             continue;
> +         }
> +
>           /* Without FSM, always fall out of the loop and extend */
>           if (!use_fsm)
>               break;
> Index: src/backend/access/index/genam.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/index/genam.c,v
> retrieving revision 1.58
> diff -c -r1.58 genam.c
> *** src/backend/access/index/genam.c    31 Jul 2006 20:08:59 -0000    1.58
> --- src/backend/access/index/genam.c    8 Aug 2006 16:17:21 -0000
> ***************
> *** 259,261 ****
> --- 259,275 ----
>
>       pfree(sysscan);
>   }
> +
> + /*
> +  * This is a dummy implementation of amsuggestblock, to be used for index
> +  * access methods that don't or can't support it. It just returns
> +  * InvalidBlockNumber, which means "no preference".
> +  *
> +  * This is probably not the best place for this function, but it doesn't
> +  * fit naturally anywhere else either.
> +  */
> + Datum
> + dummysuggestblock(PG_FUNCTION_ARGS)
> + {
> +     PG_RETURN_UINT32(InvalidBlockNumber);
> + }
> Index: src/backend/access/index/indexam.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/index/indexam.c,v
> retrieving revision 1.94
> diff -c -r1.94 indexam.c
> *** src/backend/access/index/indexam.c    31 Jul 2006 20:08:59 -0000    1.94
> --- src/backend/access/index/indexam.c    8 Aug 2006 16:17:21 -0000
> ***************
> *** 18,23 ****
> --- 18,24 ----
>    *        index_rescan    - restart a scan of an index
>    *        index_endscan    - end a scan
>    *        index_insert    - insert an index tuple into a relation
> +  *        index_suggestblock    - get desired insert location for a heap tuple
>    *        index_markpos    - mark a scan position
>    *        index_restrpos    - restore a scan position
>    *        index_getnext    - get the next tuple from a scan
> ***************
> *** 202,207 ****
> --- 203,237 ----
>                                         BoolGetDatum(check_uniqueness)));
>   }
>
> + /* ----------------
> +  *        index_suggestblock - get desired insert location for a heap tuple
> +  *
> +  * The returned BlockNumber is the *heap* page that is the best place
> +  * to insert the given tuple to, according to the index am. The best
> +  * place is usually one that maintains the cluster order.
> +  * ----------------
> +  */
> + BlockNumber
> + index_suggestblock(Relation indexRelation,
> +                    Datum *values,
> +                    bool *isnull,
> +                    Relation heapRelation)
> + {
> +     FmgrInfo   *procedure;
> +
> +     RELATION_CHECKS;
> +     GET_REL_PROCEDURE(amsuggestblock);
> +
> +     /*
> +      * have the am's suggestblock proc do all the work.
> +      */
> +     return DatumGetUInt32(FunctionCall4(procedure,
> +                                       PointerGetDatum(indexRelation),
> +                                       PointerGetDatum(values),
> +                                       PointerGetDatum(isnull),
> +                                       PointerGetDatum(heapRelation)));
> + }
> +
>   /*
>    * index_beginscan - start a scan of an index with amgettuple
>    *
> Index: src/backend/access/nbtree/nbtinsert.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtinsert.c,v
> retrieving revision 1.142
> diff -c -r1.142 nbtinsert.c
> *** src/backend/access/nbtree/nbtinsert.c    25 Jul 2006 19:13:00 -0000    1.142
> --- src/backend/access/nbtree/nbtinsert.c    9 Aug 2006 17:51:33 -0000
> ***************
> *** 146,151 ****
> --- 146,221 ----
>   }
>
>   /*
> +  *    _bt_suggestblock() -- Find the heap block of the closest index tuple.
> +  *
> +  * The logic to find the target should match _bt_doinsert, otherwise
> +  * we'll be making bad suggestions.
> +  */
> + BlockNumber
> + _bt_suggestblock(Relation rel, IndexTuple itup, Relation heapRel)
> + {
> +     int            natts = rel->rd_rel->relnatts;
> +     OffsetNumber offset;
> +     Page        page;
> +     BTPageOpaque opaque;
> +
> +     ScanKey        itup_scankey;
> +     BTStack        stack;
> +     Buffer        buf;
> +     IndexTuple    curitup;
> +     BlockNumber suggestion = InvalidBlockNumber;
> +
> +     /* we need an insertion scan key to do our search, so build one */
> +     itup_scankey = _bt_mkscankey(rel, itup);
> +
> +     /* find the first page containing this key */
> +     stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_READ);
> +     if(!BufferIsValid(buf))
> +     {
> +         /* The index was completely empty. No suggestion then. */
> +         return InvalidBlockNumber;
> +     }
> +     /* we don't need the stack, so free it right away */
> +     _bt_freestack(stack);
> +
> +     page = BufferGetPage(buf);
> +     opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> +
> +     /* Find the location in the page where the new index tuple would go to. */
> +
> +     offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
> +     if (offset > PageGetMaxOffsetNumber(page))
> +     {
> +         /* _bt_binsrch returned pointer to end-of-page. It means that
> +          * there were no equal items on the page, and the new item should
> +          * be inserted as the last tuple of the page. There could be equal
> +          * items on the next page, however.
> +          *
> +          * At the moment, we just ignore the potential equal items on the
> +          * right, and pretend there aren't any. We could instead walk right
> +          * to the next page to check that, but let's keep it simple for now.
> +          */
> +         offset = OffsetNumberPrev(offset);
> +     }
> +     if(offset < P_FIRSTDATAKEY(opaque))
> +     {
> +         /* We landed on an empty page. We could step left or right until
> +          * we find some items, but let's keep it simple for now.
> +          */
> +     } else {
> +         /* We're now positioned at the index tuple that we're interested in. */
> +
> +         curitup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
> +         suggestion = ItemPointerGetBlockNumber(&curitup->t_tid);
> +     }
> +
> +     _bt_relbuf(rel, buf);
> +     _bt_freeskey(itup_scankey);
> +
> +     return suggestion;
> + }
> +
> + /*
>    *    _bt_check_unique() -- Check for violation of unique index constraint
>    *
>    * Returns InvalidTransactionId if there is no conflict, else an xact ID
> Index: src/backend/access/nbtree/nbtree.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtree.c,v
> retrieving revision 1.149
> diff -c -r1.149 nbtree.c
> *** src/backend/access/nbtree/nbtree.c    10 May 2006 23:18:39 -0000    1.149
> --- src/backend/access/nbtree/nbtree.c    9 Aug 2006 18:04:02 -0000
> ***************
> *** 228,233 ****
> --- 228,265 ----
>   }
>
>   /*
> +  *    btsuggestblock() -- find the best place in the heap to put a new tuple.
> +  *
> +  *        This uses the same logic as btinsert to find the place where the index
> +  *        tuple would go if this was a btinsert call.
> +  *
> +  *        There's room for improvement here. An insert operation will descend
> +  *        the tree twice, first by btsuggestblock, then by btinsert. Things
> +  *        might have changed in between, so that the heap tuple is actually
> +  *        not inserted in the optimal page, but since this is just an
> +  *        optimization, it's ok if it happens sometimes.
> +  */
> + Datum
> + btsuggestblock(PG_FUNCTION_ARGS)
> + {
> +     Relation    rel = (Relation) PG_GETARG_POINTER(0);
> +     Datum       *values = (Datum *) PG_GETARG_POINTER(1);
> +     bool       *isnull = (bool *) PG_GETARG_POINTER(2);
> +     Relation    heapRel = (Relation) PG_GETARG_POINTER(3);
> +     IndexTuple    itup;
> +     BlockNumber suggestion;
> +
> +     /* generate an index tuple */
> +     itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
> +
> +     suggestion =_bt_suggestblock(rel, itup, heapRel);
> +
> +     pfree(itup);
> +
> +     PG_RETURN_UINT32(suggestion);
> + }
> +
> + /*
>    *    btgettuple() -- Get the next tuple in the scan.
>    */
>   Datum
> Index: src/backend/executor/execMain.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/executor/execMain.c,v
> retrieving revision 1.277
> diff -c -r1.277 execMain.c
> *** src/backend/executor/execMain.c    31 Jul 2006 01:16:37 -0000    1.277
> --- src/backend/executor/execMain.c    8 Aug 2006 16:17:21 -0000
> ***************
> *** 892,897 ****
> --- 892,898 ----
>       resultRelInfo->ri_RangeTableIndex = resultRelationIndex;
>       resultRelInfo->ri_RelationDesc = resultRelationDesc;
>       resultRelInfo->ri_NumIndices = 0;
> +     resultRelInfo->ri_ClusterIndex = -1;
>       resultRelInfo->ri_IndexRelationDescs = NULL;
>       resultRelInfo->ri_IndexRelationInfo = NULL;
>       /* make a copy so as not to depend on relcache info not changing... */
> ***************
> *** 1388,1394 ****
>           heap_insert(estate->es_into_relation_descriptor, tuple,
>                       estate->es_snapshot->curcid,
>                       estate->es_into_relation_use_wal,
> !                     false);        /* never any point in using FSM */
>           /* we know there are no indexes to update */
>           heap_freetuple(tuple);
>           IncrAppended();
> --- 1389,1396 ----
>           heap_insert(estate->es_into_relation_descriptor, tuple,
>                       estate->es_snapshot->curcid,
>                       estate->es_into_relation_use_wal,
> !                     false, /* never any point in using FSM */
> !                     InvalidBlockNumber);
>           /* we know there are no indexes to update */
>           heap_freetuple(tuple);
>           IncrAppended();
> ***************
> *** 1419,1424 ****
> --- 1421,1427 ----
>       ResultRelInfo *resultRelInfo;
>       Relation    resultRelationDesc;
>       Oid            newId;
> +     BlockNumber suggestedBlock;
>
>       /*
>        * get the heap tuple out of the tuple table slot, making sure we have a
> ***************
> *** 1467,1472 ****
> --- 1470,1479 ----
>       if (resultRelationDesc->rd_att->constr)
>           ExecConstraints(resultRelInfo, slot, estate);
>
> +     /* Ask the index am of the clustered index for the
> +      * best place to put it */
> +     suggestedBlock = ExecSuggestBlock(slot, estate);
> +
>       /*
>        * insert the tuple
>        *
> ***************
> *** 1475,1481 ****
>        */
>       newId = heap_insert(resultRelationDesc, tuple,
>                           estate->es_snapshot->curcid,
> !                         true, true);
>
>       IncrAppended();
>       (estate->es_processed)++;
> --- 1482,1488 ----
>        */
>       newId = heap_insert(resultRelationDesc, tuple,
>                           estate->es_snapshot->curcid,
> !                         true, true, suggestedBlock);
>
>       IncrAppended();
>       (estate->es_processed)++;
> Index: src/backend/executor/execUtils.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/executor/execUtils.c,v
> retrieving revision 1.139
> diff -c -r1.139 execUtils.c
> *** src/backend/executor/execUtils.c    4 Aug 2006 21:33:36 -0000    1.139
> --- src/backend/executor/execUtils.c    9 Aug 2006 18:05:05 -0000
> ***************
> *** 31,36 ****
> --- 31,37 ----
>    *        ExecOpenIndices            \
>    *        ExecCloseIndices         | referenced by InitPlan, EndPlan,
>    *        ExecInsertIndexTuples    /  ExecInsert, ExecUpdate
> +  *        ExecSuggestBlock        Referenced by ExecInsert
>    *
>    *        RegisterExprContextCallback    Register function shutdown callback
>    *        UnregisterExprContextCallback  Deregister function shutdown callback
> ***************
> *** 874,879 ****
> --- 875,881 ----
>       IndexInfo **indexInfoArray;
>
>       resultRelInfo->ri_NumIndices = 0;
> +     resultRelInfo->ri_ClusterIndex = -1;
>
>       /* fast path if no indexes */
>       if (!RelationGetForm(resultRelation)->relhasindex)
> ***************
> *** 913,918 ****
> --- 915,925 ----
>           /* extract index key information from the index's pg_index info */
>           ii = BuildIndexInfo(indexDesc);
>
> +         /* Remember which index is the clustered one.
> +          * It's used to call the suggestblock-method on inserts */
> +         if(indexDesc->rd_index->indisclustered)
> +             resultRelInfo->ri_ClusterIndex = i;
> +
>           relationDescs[i] = indexDesc;
>           indexInfoArray[i] = ii;
>           i++;
> ***************
> *** 1062,1067 ****
> --- 1069,1137 ----
>       }
>   }
>
> + /* ----------------------------------------------------------------
> +  *        ExecSuggestBlock
> +  *
> +  *        This routine asks the index am where a new heap tuple
> +  *        should be placed.
> +  * ----------------------------------------------------------------
> +  */
> + BlockNumber
> + ExecSuggestBlock(TupleTableSlot *slot,
> +                  EState *estate)
> + {
> +     ResultRelInfo *resultRelInfo;
> +     int            i;
> +     Relation    relationDesc;
> +     Relation    heapRelation;
> +     ExprContext *econtext;
> +     Datum        values[INDEX_MAX_KEYS];
> +     bool        isnull[INDEX_MAX_KEYS];
> +     IndexInfo  *indexInfo;
> +
> +     /*
> +      * Get information from the result relation info structure.
> +      */
> +     resultRelInfo = estate->es_result_relation_info;
> +     i = resultRelInfo->ri_ClusterIndex;
> +     if(i == -1)
> +         return InvalidBlockNumber; /* there was no clustered index */
> +
> +     heapRelation = resultRelInfo->ri_RelationDesc;
> +     relationDesc = resultRelInfo->ri_IndexRelationDescs[i];
> +     indexInfo = resultRelInfo->ri_IndexRelationInfo[i];
> +
> +     /* You can't cluster on a partial index */
> +     Assert(indexInfo->ii_Predicate == NIL);
> +
> +     /*
> +      * We will use the EState's per-tuple context for evaluating
> +      * index expressions (creating it if it's not already there).
> +      */
> +     econtext = GetPerTupleExprContext(estate);
> +
> +     /* Arrange for econtext's scan tuple to be the tuple under test */
> +     econtext->ecxt_scantuple = slot;
> +
> +     /*
> +      * FormIndexDatum fills in its values and isnull parameters with the
> +      * appropriate values for the column(s) of the index.
> +      */
> +     FormIndexDatum(indexInfo,
> +                    slot,
> +                    estate,
> +                    values,
> +                    isnull);
> +
> +     /*
> +      * The index AM does the rest.
> +      */
> +     return index_suggestblock(relationDesc,    /* index relation */
> +                  values,    /* array of index Datums */
> +                  isnull,    /* null flags */
> +                  heapRelation);
> + }
> +
>   /*
>    * UpdateChangedParamSet
>    *        Add changed parameters to a plan node's chgParam set
> Index: src/include/access/genam.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/genam.h,v
> retrieving revision 1.65
> diff -c -r1.65 genam.h
> *** src/include/access/genam.h    31 Jul 2006 20:09:05 -0000    1.65
> --- src/include/access/genam.h    9 Aug 2006 17:53:44 -0000
> ***************
> *** 93,98 ****
> --- 93,101 ----
>                ItemPointer heap_t_ctid,
>                Relation heapRelation,
>                bool check_uniqueness);
> + extern BlockNumber index_suggestblock(Relation indexRelation,
> +              Datum *values, bool *isnull,
> +              Relation heapRelation);
>
>   extern IndexScanDesc index_beginscan(Relation heapRelation,
>                   Relation indexRelation,
> ***************
> *** 123,128 ****
> --- 126,133 ----
>   extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
>                     uint16 procnum);
>
> + extern Datum dummysuggestblock(PG_FUNCTION_ARGS);
> +
>   /*
>    * index access method support routines (in genam.c)
>    */
> Index: src/include/access/heapam.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/heapam.h,v
> retrieving revision 1.114
> diff -c -r1.114 heapam.h
> *** src/include/access/heapam.h    3 Jul 2006 22:45:39 -0000    1.114
> --- src/include/access/heapam.h    8 Aug 2006 16:17:21 -0000
> ***************
> *** 156,162 ****
>   extern void setLastTid(const ItemPointer tid);
>
>   extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
> !             bool use_wal, bool use_fsm);
>   extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
>               ItemPointer ctid, TransactionId *update_xmax,
>               CommandId cid, Snapshot crosscheck, bool wait);
> --- 156,162 ----
>   extern void setLastTid(const ItemPointer tid);
>
>   extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
> !             bool use_wal, bool use_fsm, BlockNumber suggestedblk);
>   extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
>               ItemPointer ctid, TransactionId *update_xmax,
>               CommandId cid, Snapshot crosscheck, bool wait);
> Index: src/include/access/hio.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/hio.h,v
> retrieving revision 1.32
> diff -c -r1.32 hio.h
> *** src/include/access/hio.h    13 Jul 2006 17:47:01 -0000    1.32
> --- src/include/access/hio.h    8 Aug 2006 16:17:21 -0000
> ***************
> *** 21,26 ****
>   extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
>                        HeapTuple tuple);
>   extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
> !                     Buffer otherBuffer, bool use_fsm);
>
>   #endif   /* HIO_H */
> --- 21,26 ----
>   extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
>                        HeapTuple tuple);
>   extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
> !                     Buffer otherBuffer, bool use_fsm, BlockNumber suggestedblk);
>
>   #endif   /* HIO_H */
> Index: src/include/access/nbtree.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/nbtree.h,v
> retrieving revision 1.103
> diff -c -r1.103 nbtree.h
> *** src/include/access/nbtree.h    7 Aug 2006 16:57:57 -0000    1.103
> --- src/include/access/nbtree.h    8 Aug 2006 16:17:21 -0000
> ***************
> *** 467,472 ****
> --- 467,473 ----
>   extern Datum btbulkdelete(PG_FUNCTION_ARGS);
>   extern Datum btvacuumcleanup(PG_FUNCTION_ARGS);
>   extern Datum btoptions(PG_FUNCTION_ARGS);
> + extern Datum btsuggestblock(PG_FUNCTION_ARGS);
>
>   /*
>    * prototypes for functions in nbtinsert.c
> ***************
> *** 476,481 ****
> --- 477,484 ----
>   extern Buffer _bt_getstackbuf(Relation rel, BTStack stack, int access);
>   extern void _bt_insert_parent(Relation rel, Buffer buf, Buffer rbuf,
>                     BTStack stack, bool is_root, bool is_only);
> + extern BlockNumber _bt_suggestblock(Relation rel, IndexTuple itup,
> +              Relation heapRel);
>
>   /*
>    * prototypes for functions in nbtpage.c
> Index: src/include/catalog/pg_am.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/catalog/pg_am.h,v
> retrieving revision 1.46
> diff -c -r1.46 pg_am.h
> *** src/include/catalog/pg_am.h    31 Jul 2006 20:09:05 -0000    1.46
> --- src/include/catalog/pg_am.h    8 Aug 2006 16:17:21 -0000
> ***************
> *** 65,70 ****
> --- 65,71 ----
>       regproc        amvacuumcleanup;    /* post-VACUUM cleanup function */
>       regproc        amcostestimate; /* estimate cost of an indexscan */
>       regproc        amoptions;        /* parse AM-specific parameters */
> +     regproc        amsuggestblock;    /* suggest a block where to put heap tuple */
>   } FormData_pg_am;
>
>   /* ----------------
> ***************
> *** 78,84 ****
>    *        compiler constants for pg_am
>    * ----------------
>    */
> ! #define Natts_pg_am                        23
>   #define Anum_pg_am_amname                1
>   #define Anum_pg_am_amstrategies            2
>   #define Anum_pg_am_amsupport            3
> --- 79,85 ----
>    *        compiler constants for pg_am
>    * ----------------
>    */
> ! #define Natts_pg_am                        24
>   #define Anum_pg_am_amname                1
>   #define Anum_pg_am_amstrategies            2
>   #define Anum_pg_am_amsupport            3
> ***************
> *** 102,123 ****
>   #define Anum_pg_am_amvacuumcleanup        21
>   #define Anum_pg_am_amcostestimate        22
>   #define Anum_pg_am_amoptions            23
>
>   /* ----------------
>    *        initial contents of pg_am
>    * ----------------
>    */
>
> ! DATA(insert OID = 403 (  btree    5 1 1 t t t t f t btinsert btbeginscan btgettuple btgetmulti btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions ));
>   DESCR("b-tree index access method");
>   #define BTREE_AM_OID 403
> ! DATA(insert OID = 405 (  hash    1 1 0 f f f f f f hashinsert hashbeginscan hashgettuple hashgetmulti hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
>   DESCR("hash index access method");
>   #define HASH_AM_OID 405
> ! DATA(insert OID = 783 (  gist    100 7 0 f t t t t t gistinsert gistbeginscan gistgettuple gistgetmulti gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
>   DESCR("GiST index access method");
>   #define GIST_AM_OID 783
> ! DATA(insert OID = 2742 (  gin    100 4 0 f f f f t f gininsert ginbeginscan gingettuple gingetmulti ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
>   DESCR("GIN index access method");
>   #define GIN_AM_OID 2742
>
> --- 103,125 ----
>   #define Anum_pg_am_amvacuumcleanup        21
>   #define Anum_pg_am_amcostestimate        22
>   #define Anum_pg_am_amoptions            23
> + #define Anum_pg_am_amsuggestblock        24
>
>   /* ----------------
>    *        initial contents of pg_am
>    * ----------------
>    */
>
> ! DATA(insert OID = 403 (  btree    5 1 1 t t t t f t btinsert btbeginscan btgettuple btgetmulti btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions btsuggestblock));
>   DESCR("b-tree index access method");
>   #define BTREE_AM_OID 403
> ! DATA(insert OID = 405 (  hash    1 1 0 f f f f f f hashinsert hashbeginscan hashgettuple hashgetmulti hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions dummysuggestblock));
>   DESCR("hash index access method");
>   #define HASH_AM_OID 405
> ! DATA(insert OID = 783 (  gist    100 7 0 f t t t t t gistinsert gistbeginscan gistgettuple gistgetmulti gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions dummysuggestblock));
>   DESCR("GiST index access method");
>   #define GIST_AM_OID 783
> ! DATA(insert OID = 2742 (  gin    100 4 0 f f f f t f gininsert ginbeginscan gingettuple gingetmulti ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions dummysuggestblock ));
>   DESCR("GIN index access method");
>   #define GIN_AM_OID 2742
>
> Index: src/include/catalog/pg_proc.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/catalog/pg_proc.h,v
> retrieving revision 1.420
> diff -c -r1.420 pg_proc.h
> *** src/include/catalog/pg_proc.h    6 Aug 2006 03:53:44 -0000    1.420
> --- src/include/catalog/pg_proc.h    9 Aug 2006 18:06:44 -0000
> ***************
> *** 682,687 ****
> --- 682,689 ----
>   DESCR("btree(internal)");
>   DATA(insert OID = 2785 (  btoptions           PGNSP PGUID 12 f f t f s 2 17 "1009 16" _null_ _null_ _null_ btoptions - _null_ ));
>   DESCR("btree(internal)");
> + DATA(insert OID = 2852 (  btsuggestblock   PGNSP PGUID 12 f f t f v 4 23 "2281 2281 2281 2281" _null_ _null_ _null_   btsuggestblock - _null_ ));
> + DESCR("btree(internal)");
>
>   DATA(insert OID = 339 (  poly_same           PGNSP PGUID 12 f f t f i 2 16 "604 604" _null_ _null_ _null_ poly_same - _null_ ));
>   DESCR("same as?");
> ***************
> *** 3936,3941 ****
> --- 3938,3946 ----
>   DATA(insert OID = 2749 (  arraycontained       PGNSP PGUID 12 f f t f i 2 16 "2277 2277" _null_ _null_ _null_ arraycontained - _null_ ));
>   DESCR("anyarray contained");
>
> + DATA(insert OID = 2853 (  dummysuggestblock   PGNSP PGUID 12 f f t f v 4 23 "2281 2281 2281 2281" _null_ _null_ _null_   dummysuggestblock - _null_ ));
> + DESCR("dummy amsuggestblock implementation (internal)");
> +
>   /*
>    * Symbolic values for provolatile column: these indicate whether the result
>    * of a function is dependent *only* on the values of its explicit arguments,
> Index: src/include/executor/executor.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/executor/executor.h,v
> retrieving revision 1.128
> diff -c -r1.128 executor.h
> *** src/include/executor/executor.h    4 Aug 2006 21:33:36 -0000    1.128
> --- src/include/executor/executor.h    8 Aug 2006 16:17:21 -0000
> ***************
> *** 271,276 ****
> --- 271,277 ----
>   extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
>   extern void ExecInsertIndexTuples(TupleTableSlot *slot, ItemPointer tupleid,
>                         EState *estate, bool is_vacuum);
> + extern BlockNumber ExecSuggestBlock(TupleTableSlot *slot, EState *estate);
>
>   extern void RegisterExprContextCallback(ExprContext *econtext,
>                               ExprContextCallbackFunction function,
> Index: src/include/nodes/execnodes.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/nodes/execnodes.h,v
> retrieving revision 1.158
> diff -c -r1.158 execnodes.h
> *** src/include/nodes/execnodes.h    4 Aug 2006 21:33:36 -0000    1.158
> --- src/include/nodes/execnodes.h    8 Aug 2006 16:17:21 -0000
> ***************
> *** 257,262 ****
> --- 257,264 ----
>    *        NumIndices                # of indices existing on result relation
>    *        IndexRelationDescs        array of relation descriptors for indices
>    *        IndexRelationInfo        array of key/attr info for indices
> +  *        ClusterIndex            index to the IndexRelationInfo array of the
> +  *                                clustered index, or -1 if there's none
>    *        TrigDesc                triggers to be fired, if any
>    *        TrigFunctions            cached lookup info for trigger functions
>    *        TrigInstrument            optional runtime measurements for triggers
> ***************
> *** 272,277 ****
> --- 274,280 ----
>       int            ri_NumIndices;
>       RelationPtr ri_IndexRelationDescs;
>       IndexInfo **ri_IndexRelationInfo;
> +     int         ri_ClusterIndex;
>       TriggerDesc *ri_TrigDesc;
>       FmgrInfo   *ri_TrigFunctions;
>       struct Instrumentation *ri_TrigInstrument;
> Index: src/include/utils/rel.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/utils/rel.h,v
> retrieving revision 1.91
> diff -c -r1.91 rel.h
> *** src/include/utils/rel.h    3 Jul 2006 22:45:41 -0000    1.91
> --- src/include/utils/rel.h    8 Aug 2006 16:17:21 -0000
> ***************
> *** 116,121 ****
> --- 116,122 ----
>       FmgrInfo    amvacuumcleanup;
>       FmgrInfo    amcostestimate;
>       FmgrInfo    amoptions;
> +     FmgrInfo    amsuggestblock;
>   } RelationAmInfo;
>
>

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
