Thread: CLUSTER patch

From: Alvaro Herrera
Hello:

I now understand a bit more about why I'm failing to build a working
CLUSTER patch.  It doesn't help me fix it, however.

The way the cluster is done is still the same: first cluster the heap
and swap relfilenodes, then drop the old heap; then rebuild each index,
swap relfilenodes with the old index and drop the new one.

If I don't use the new relfilenode for anything after building it,
everything works OK.  But I can't attach the old filenode to the new
heap; it has to stay around.  I can, however, attach the new filenode to
the old heap.  But as soon as I try to do anything to it (the new,
clustered filenode) before transaction commit (building an index, say),
the local buffer manager fails an assertion (actually,
RelationNodeCacheGetRelation returns 0 for the given rnode), and the
transaction aborts.

Thus, if I comment out the lines that attach the old rnode to the new
heap and the lines that create the new indexes, the cluster works, and I
now have two tables with the same rnode, one of them named
"temp_<oid-of-the-other>".  Obviously, dropping either of them renders
the other useless (no disk storage).


What I still don't understand is why the RelationNodeCache doesn't
return the buffer I'm looking for.  I added some debugging fprintf's to
the bufmgr code, and the relation _is_ entered (!?) during the cluster
transaction.  Maybe somewhere the local buffer manager drops its
knowledge of the relation, or that knowledge is keyed on (OID + rnode)
info, which changed because the rnode changed.  I don't really
understand the relcache business anyway.


In any case, I'm attaching the old version of the patch, which rebuilds
the indexes without the filenode-swapping business.  The issues raised
about the earlier patch have been corrected.  I think that if nothing
turns up to fix the problems with the newer approach, and there are no
other objections, this one should be applied, since it improves on the
current version anyway.

I'm leaving for winter vacation on Monday and probably won't be back
for two weeks.

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"La vida es para el que se aventura"


Re: CLUSTER patch

From: Bruce Momjian
I am working on the same thing myself.  Attached is a version of the
patch that applies to current CVS.  I worked with
RelationClearRelation() to clear the cache for the relations.

It now reports "Relation still open" during index_drop(), so something
still isn't right.  I think the whole problem stems from the fact that
the relation cache caches by relnode, so when we change it, the cache
gets all confused.  The RelationClearRelation() change handles this in a
very crude way, but it seems to be in the right direction for a
solution.

---------------------------------------------------------------------------

Alvaro Herrera wrote:
> Hello:
>
> I now understand a bit more about why I'm failing to build a working
> CLUSTER patch.  It doesn't help me fix it, however.
>
> The way the cluster is done is still the same: first cluster the heap
> and swap relfilenodes, then drop the old heap; then rebuild each index,
> swap relfilenodes with the old index and drop the new one.
>
> If I don't use the new relfilenode for anything after building it,
> everything works OK.  But I can't attach the old filenode to the new
> heap; it has to stay around.  I can, however, attach the new filenode to
> the old heap.  But as soon as I try to do anything to it (the new,
> clustered filenode) before transaction commit (building an index, say),
> the local buffer manager fails an assertion (actually,
> RelationNodeCacheGetRelation returns 0 for the given rnode), and the
> transaction aborts.
>
> Thus, if I comment out the lines that attach the old rnode to the new
> heap and the lines that create the new indexes, the cluster works, and I
> now have two tables with the same rnode, one of them named
> "temp_<oid-of-the-other>".  Obviously, dropping either of them renders
> the other useless (no disk storage).
>
>
> What I still don't understand is why the RelationNodeCache doesn't
> return the buffer I'm looking for.  I added some debugging fprintf's to
> the bufmgr code, and the relation _is_ entered (!?) during the cluster
> transaction.  Maybe somewhere the local buffer manager drops its
> knowledge of the relation, or that knowledge is keyed on (OID + rnode)
> info, which changed because the rnode changed.  I don't really
> understand the relcache business anyway.
>
>
> In any case, I'm attaching the old version of the patch, which rebuilds
> the indexes without the filenode-swapping business.  The issues raised
> about the earlier patch have been corrected.  I think that if nothing
> turns up to fix the problems with the newer approach, and there are no
> other objections, this one should be applied, since it improves on the
> current version anyway.
>
> I'm leaving for winter vacation on Monday and probably won't be back
> for two weeks.
>
> --
> Alvaro Herrera (<alvherre[a]atentus.com>)
> "La vida es para el que se aventura"

[ Attachment, skipping... ]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.83
diff -c -r1.83 cluster.c
*** src/backend/commands/cluster.c    12 Jul 2002 18:43:15 -0000    1.83
--- src/backend/commands/cluster.c    14 Jul 2002 06:38:44 -0000
***************
*** 15,21 ****
   *
   *
   * IDENTIFICATION
!  *      $Header: /cvsroot/pgsql/src/backend/commands/cluster.c,v 1.83 2002/07/12 18:43:15 tgl Exp $
   *
   *-------------------------------------------------------------------------
   */
--- 15,21 ----
   *
   *
   * IDENTIFICATION
!  *      $Header: /cvsroot/pgsql/src/backend/commands/cluster.c,v 1.82 2002/06/20 20:29:26 momjian Exp $
   *
   *-------------------------------------------------------------------------
   */
***************
*** 24,74 ****

  #include "access/genam.h"
  #include "access/heapam.h"
- #include "catalog/dependency.h"
  #include "catalog/heap.h"
  #include "catalog/index.h"
  #include "catalog/pg_index.h"
  #include "catalog/pg_proc.h"
  #include "commands/cluster.h"
  #include "commands/tablecmds.h"
  #include "miscadmin.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"


  static Oid    copy_heap(Oid OIDOldHeap, const char *NewName);
- static Oid    copy_index(Oid OIDOldIndex, Oid OIDNewHeap,
-                        const char *NewIndexName);
  static void rebuildheap(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex);

  /*
   * cluster
   *
   * STILL TO DO:
!  *     Create a list of all the other indexes on this relation. Because
!  *     the cluster will wreck all the tids, I'll need to destroy bogus
!  *     indexes. The user will have to re-create them. Not nice, but
!  *     I'm not a nice guy. The alternative is to try some kind of post
!  *     destroy re-build. This may be possible. I'll check out what the
!  *     index create functiond want in the way of paramaters. On the other
!  *     hand, re-creating n indexes may blow out the space.
   */
  void
  cluster(RangeVar *oldrelation, char *oldindexname)
  {
      Oid            OIDOldHeap,
                  OIDOldIndex,
!                 OIDNewHeap,
!                 OIDNewIndex;
      Relation    OldHeap,
                  OldIndex;
      char        NewHeapName[NAMEDATALEN];
!     char        NewIndexName[NAMEDATALEN];
      ObjectAddress object;

      /*
!      * We grab exclusive access to the target rel and index for the
       * duration of the transaction.
       */
      OldHeap = heap_openrv(oldrelation, AccessExclusiveLock);
--- 24,100 ----

  #include "access/genam.h"
  #include "access/heapam.h"
  #include "catalog/heap.h"
+ #include "catalog/dependency.h"
  #include "catalog/index.h"
+ #include "catalog/indexing.h"
+ #include "catalog/catname.h"
  #include "catalog/pg_index.h"
  #include "catalog/pg_proc.h"
  #include "commands/cluster.h"
  #include "commands/tablecmds.h"
  #include "miscadmin.h"
  #include "utils/builtins.h"
+ #include "utils/fmgroids.h"
  #include "utils/lsyscache.h"
+ #include "utils/relcache.h"
  #include "utils/syscache.h"

+ /*
+  * We need one of these structs for each index in the relation to be
+  * clustered.  It's basically the data needed by index_create() so
+  * we can recreate the indexes after destroying the old heap.
+  */
+ typedef struct
+ {
+     char       *indexName;
+     IndexInfo  *indexInfo;
+     Oid            accessMethodOID;
+     Oid           *classOID;
+     Oid            indexOID;
+     bool        isPrimary;
+ } IndexAttrs;

  static Oid    copy_heap(Oid OIDOldHeap, const char *NewName);
  static void rebuildheap(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex);
+ List *get_indexattr_list (Oid OIDOldHeap);
+ void recreate_indexattr(Oid OIDOldHeap, List *indexes);
+ void swap_relfilenodes(Oid r1, Oid r2);
+
+ void
+ RelationClearRelation(Relation relation, bool rebuildIt);

  /*
   * cluster
   *
   * STILL TO DO:
!  *   Keep foreign keys, permissions and inheritance of the clustered table.
!  *
!  *   We need to look at making use of the ability to write a new version of a
!  *   table (or index) under a new relfilenode value, without changing the
!  *   table's OID.
   */
  void
  cluster(RangeVar *oldrelation, char *oldindexname)
  {
      Oid            OIDOldHeap,
                  OIDOldIndex,
!                 OIDNewHeap;
      Relation    OldHeap,
                  OldIndex;
      char        NewHeapName[NAMEDATALEN];
!     List       *indexes;
      ObjectAddress object;

+     /* The games with filenodes may not be rollbackable, so
+      * disallow running this inside a transaction block.
+      * This may be a false assumption though.
+      */
+     if (IsTransactionBlock())
+         elog (ERROR, "CLUSTER: may not be called inside a transaction block");
+
      /*
!      * We grab exclusive access to the target relation for the
       * duration of the transaction.
       */
      OldHeap = heap_openrv(oldrelation, AccessExclusiveLock);
***************
*** 96,143 ****
      heap_close(OldHeap, NoLock);
      index_close(OldIndex);

!     /*
!      * Create the new heap with a temporary name.
!      */
!     snprintf(NewHeapName, NAMEDATALEN, "temp_%u", OIDOldHeap);

      OIDNewHeap = copy_heap(OIDOldHeap, NewHeapName);

      /* We do not need CommandCounterIncrement() because copy_heap did it. */

!     /*
!      * Copy the heap data into the new table in the desired order.
!      */
      rebuildheap(OIDNewHeap, OIDOldHeap, OIDOldIndex);

      /* To make the new heap's data visible. */
      CommandCounterIncrement();

-     /* Create new index over the tuples of the new heap. */
-     snprintf(NewIndexName, NAMEDATALEN, "temp_%u", OIDOldIndex);

!     OIDNewIndex = copy_index(OIDOldIndex, OIDNewHeap, NewIndexName);

!     CommandCounterIncrement();

!     /* Destroy old heap (along with its index) and rename new. */
      object.classId = RelOid_pg_class;
!     object.objectId = OIDOldHeap;
      object.objectSubId = 0;

      /* XXX better to use DROP_CASCADE here? */
      performDeletion(&object, DROP_RESTRICT);
-
      /* performDeletion does CommandCounterIncrement at end */
-
-     renamerel(OIDNewHeap, oldrelation->relname);
-
-     /* This one might be unnecessary, but let's be safe. */
-     CommandCounterIncrement();
-
-     renamerel(OIDNewIndex, oldindexname);
  }

  static Oid
  copy_heap(Oid OIDOldHeap, const char *NewName)
  {
--- 122,165 ----
      heap_close(OldHeap, NoLock);
      index_close(OldIndex);

!     /* Save the information of all indexes on the relation. */
!     indexes = get_indexattr_list(OIDOldHeap);

+     /* Create the new heap with a temporary name. */
+     snprintf(NewHeapName, NAMEDATALEN, "temp_%u", OIDOldHeap);
      OIDNewHeap = copy_heap(OIDOldHeap, NewHeapName);

      /* We do not need CommandCounterIncrement() because copy_heap did it. */

!     /* Copy the heap data into the new table in the desired order. */
      rebuildheap(OIDNewHeap, OIDOldHeap, OIDOldIndex);

      /* To make the new heap's data visible. */
      CommandCounterIncrement();


!     /* Swap the relfilenodes of the old and new heaps. */
!     swap_relfilenodes(OIDNewHeap, OIDOldHeap);

!     /* At this point, Old points to the new file, and new points to old */
!
!     /* Recreate the indexes on the relation.  We do not need
!      * CommandCounterIncrement() because recreate_indexattr does it.
!      */
!     recreate_indexattr(OIDOldHeap, indexes);

!     /* Destroy the new heap, carrying the old filenode along. */
      object.classId = RelOid_pg_class;
!     object.objectId = OIDNewHeap;
      object.objectSubId = 0;

      /* XXX better to use DROP_CASCADE here? */
      performDeletion(&object, DROP_RESTRICT);
      /* performDeletion does CommandCounterIncrement at end */
  }

+ /* Create and initialize the new heap
+  */
  static Oid
  copy_heap(Oid OIDOldHeap, const char *NewName)
  {
***************
*** 181,223 ****
      return OIDNewHeap;
  }

! static Oid
! copy_index(Oid OIDOldIndex, Oid OIDNewHeap, const char *NewIndexName)
! {
!     Oid            OIDNewIndex;
!     Relation    OldIndex,
!                 NewHeap;
!     IndexInfo  *indexInfo;
!
!     NewHeap = heap_open(OIDNewHeap, AccessExclusiveLock);
!     OldIndex = index_open(OIDOldIndex);
!
!     /*
!      * Create a new index like the old one.  To do this I get the info
!      * from pg_index, and add a new index with a temporary name (that will
!      * be changed later).
!      */
!     indexInfo = BuildIndexInfo(OldIndex->rd_index);
!
!     OIDNewIndex = index_create(OIDNewHeap,
!                                NewIndexName,
!                                indexInfo,
!                                OldIndex->rd_rel->relam,
!                                OldIndex->rd_index->indclass,
!                                OldIndex->rd_index->indisprimary,
!                                false, /* XXX losing constraint status */
!                                allowSystemTableMods);
!
!     setRelhasindex(OIDNewHeap, true,
!                    OldIndex->rd_index->indisprimary, InvalidOid);
!
!     index_close(OldIndex);
!     heap_close(NewHeap, NoLock);
!
!     return OIDNewIndex;
! }
!
!
  static void
  rebuildheap(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex)
  {
--- 203,210 ----
      return OIDNewHeap;
  }

! /* Load the data into the new heap, clustered.
!  */
  static void
  rebuildheap(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex)
  {
***************
*** 260,263 ****
--- 247,415 ----
      index_close(LocalOldIndex);
      heap_close(LocalOldHeap, NoLock);
      heap_close(LocalNewHeap, NoLock);
+ }
+
+ /* Get the necessary info about the indexes in the relation and
+  * return a List of IndexAttrs.
+  */
+ List *
+ get_indexattr_list (Oid OIDOldHeap)
+ {
+     ScanKeyData    entry;
+     HeapScanDesc scan;
+     Relation indexRelation;
+     HeapTuple indexTuple;
+     List *indexes = NIL;
+     IndexAttrs *attrs;
+     HeapTuple tuple;
+     Form_pg_index index;
+
+     /* Grab the index tuples by looking into RelationRelationName
+      * by the OID of the old heap.
+      */
+     indexRelation = heap_openr(IndexRelationName, AccessShareLock);
+     ScanKeyEntryInitialize(&entry, 0, Anum_pg_index_indrelid,
+             F_OIDEQ, ObjectIdGetDatum(OIDOldHeap));
+     scan = heap_beginscan(indexRelation, SnapshotNow, 1, &entry);
+     while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+     {
+         index = (Form_pg_index) GETSTRUCT(indexTuple);
+
+         attrs = (IndexAttrs *) palloc(sizeof(IndexAttrs));
+         attrs->indexInfo = BuildIndexInfo(index);
+         attrs->isPrimary = index->indisprimary;
+         attrs->indexOID = index->indexrelid;
+
+         /* The opclasses are copied verbatim from the original indexes.
+         */
+         attrs->classOID = (Oid *)palloc(sizeof(Oid) *
+                 attrs->indexInfo->ii_NumIndexAttrs);
+         memcpy(attrs->classOID, index->indclass,
+                 sizeof(Oid) * attrs->indexInfo->ii_NumIndexAttrs);
+
+         /* Name and access method of each index come from
+          * RelationRelationName.
+          */
+         tuple = SearchSysCache(RELOID,
+                 ObjectIdGetDatum(attrs->indexOID),
+                 0, 0, 0);
+         if (!HeapTupleIsValid(tuple))
+             elog(ERROR, "CLUSTER: cannot find index %u", attrs->indexOID);
+         attrs->indexName = pstrdup(NameStr(((Form_pg_class) GETSTRUCT(tuple))->relname));
+         attrs->accessMethodOID = ((Form_pg_class) GETSTRUCT(tuple))->relam;
+         ReleaseSysCache(tuple);
+
+         /* Cons the gathered data into the list.  We do not care about
+          * ordering, and this is more efficient than append.
+          */
+         indexes=lcons((void *)attrs, indexes);
+     }
+     heap_endscan(scan);
+     heap_close(indexRelation, NoLock);
+     return indexes;
+ }
+
+ /* Create new indexes and swap the filenodes with old indexes.  Then drop
+  * the new index (carrying the old heap along).
+  */
+ void
+ recreate_indexattr(Oid OIDOldHeap, List *indexes)
+ {
+     IndexAttrs *attrs;
+     List        *elem;
+     Oid            newIndexOID;
+     char        newIndexName[NAMEDATALEN];
+
+     foreach (elem, indexes)
+     {
+         attrs=(IndexAttrs *) lfirst(elem);
+
+         /* Create the new index under a temporary name */
+         snprintf(newIndexName, NAMEDATALEN, "temp_%u", attrs->indexOID);
+         newIndexOID = index_create(OIDOldHeap, newIndexName,
+                                    attrs->indexInfo, attrs->accessMethodOID,
+                                    attrs->classOID, attrs->isPrimary,
+                                    false, /* XXX losing constraint status */
+                                    allowSystemTableMods);
+         CommandCounterIncrement();
+
+         /* Swap the filenodes. */
+         swap_relfilenodes(newIndexOID, attrs->indexOID);
+         setRelhasindex(OIDOldHeap, true, attrs->isPrimary, InvalidOid);
+
+         /* I'm not sure this one is needed, but let's be safe. */
+         CommandCounterIncrement();
+
+         /* Drop the new index, carrying the old filenode along. */
+         index_drop(newIndexOID);
+         CommandCounterIncrement();
+
+         pfree(attrs->classOID);
+         pfree(attrs);
+     }
+     freeList(indexes);
+ }
+
+ /* Swap the relfilenodes for two given relations.
+  */
+ void
+ swap_relfilenodes(Oid r1, Oid r2)
+ {
+     /* I can probably keep RelationRelationName open in the main
+      * function and pass the Relation around so I don't have to open
+      * it every time.
+      */
+     Relation    relRelation,
+                 irels[Num_pg_class_indices];
+     HeapTuple    reltup[2];
+     Oid            tempRFNode;
+
+     RelationClearRelation(RelationIdGetRelation(r1), false);
+ //       RelationClearRelation(RelationIdGetRelation(r2), false);
+
+     CommandCounterIncrement();
+     /* We need both RelationRelationName tuples.  */
+     relRelation = heap_openr(RelationRelationName, RowExclusiveLock);
+
+     reltup[0] = SearchSysCacheCopy(RELOID,
+                                    ObjectIdGetDatum(r1),
+                                    0, 0, 0);
+     if (!HeapTupleIsValid(reltup[0]))
+         elog(ERROR, "CLUSTER: Cannot find tuple for relation %u", r1);
+     reltup[1] = SearchSysCacheCopy(RELOID,
+                                    ObjectIdGetDatum(r2),
+                                    0, 0, 0);
+     if (!HeapTupleIsValid(reltup[1]))
+         elog(ERROR, "CLUSTER: Cannot find tuple for relation %u", r2);
+
+     /* Swap the filenodes */
+     tempRFNode = ((Form_pg_class) GETSTRUCT(reltup[0]))->relfilenode;
+     ((Form_pg_class) GETSTRUCT(reltup[0]))->relfilenode =
+         ((Form_pg_class) GETSTRUCT(reltup[1]))->relfilenode;
+     ((Form_pg_class) GETSTRUCT(reltup[1]))->relfilenode = tempRFNode;
+
+     /* Update the RelationRelationName tuples */
+     simple_heap_update(relRelation, &reltup[0]->t_self, reltup[0]);
+     simple_heap_update(relRelation, &reltup[1]->t_self, reltup[1]);
+
+     /* Keep system catalogs current */
+     CatalogOpenIndices(Num_pg_class_indices, Name_pg_class_indices, irels);
+     CatalogIndexInsert(irels, Num_pg_class_indices, relRelation, reltup[0]);
+     CatalogIndexInsert(irels, Num_pg_class_indices, relRelation, reltup[1]);
+     CatalogCloseIndices(Num_pg_class_indices, irels);
+
+     heap_close(relRelation, RowExclusiveLock);
+     heap_freetuple(reltup[0]);
+     heap_freetuple(reltup[1]);
+
+     CommandCounterIncrement();
+
+     // bjm
+ //    RelationIdInvalidateRelationCacheByRelationId(r1);
+ //    RelationIdInvalidateRelationCacheByRelationId(r2);
+
+     RelationClearRelation(RelationIdGetRelation(r1), true);
+ //       RelationClearRelation(RelationIdGetRelation(r2), true);
+
+     CommandCounterIncrement();
  }
Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.166
diff -c -r1.166 relcache.c
*** src/backend/utils/cache/relcache.c    12 Jul 2002 18:43:18 -0000    1.166
--- src/backend/utils/cache/relcache.c    14 Jul 2002 06:38:55 -0000
***************
*** 238,249 ****
                                             (void *)&(RELATION->rd_id), \
                                             HASH_REMOVE, NULL); \
      if (idhentry == NULL) \
!         elog(WARNING, "trying to delete a reldesc that does not exist."); \
      nodentry = (RelNodeCacheEnt*)hash_search(RelationNodeCache, \
                                             (void *)&(RELATION->rd_node), \
                                             HASH_REMOVE, NULL); \
      if (nodentry == NULL) \
!         elog(WARNING, "trying to delete a reldesc that does not exist."); \
      if (IsSystemNamespace(RelationGetNamespace(RELATION))) \
      { \
          char *relname = RelationGetRelationName(RELATION); \
--- 238,249 ----
                                             (void *)&(RELATION->rd_id), \
                                             HASH_REMOVE, NULL); \
      if (idhentry == NULL) \
!         elog(WARNING, "trying to delete a reldesc (rd_id) that does not exist."); \
      nodentry = (RelNodeCacheEnt*)hash_search(RelationNodeCache, \
                                             (void *)&(RELATION->rd_node), \
                                             HASH_REMOVE, NULL); \
      if (nodentry == NULL) \
!         elog(WARNING, "trying to delete a reldesc (rd_node) that does not exist."); \
      if (IsSystemNamespace(RelationGetNamespace(RELATION))) \
      { \
          char *relname = RelationGetRelationName(RELATION); \
***************
*** 252,258 ****
                                                     relname, \
                                                     HASH_REMOVE, NULL); \
          if (namehentry == NULL) \
!             elog(WARNING, "trying to delete a reldesc that does not exist."); \
      } \
  } while(0)

--- 252,258 ----
                                                     relname, \
                                                     HASH_REMOVE, NULL); \
          if (namehentry == NULL) \
!             elog(WARNING, "trying to delete a reldesc (relname) that does not exist."); \
      } \
  } while(0)

***************
*** 276,282 ****

  /* non-export function prototypes */

! static void RelationClearRelation(Relation relation, bool rebuildIt);

  #ifdef    ENABLE_REINDEX_NAILED_RELATIONS
  static void RelationReloadClassinfo(Relation relation);
--- 276,282 ----

  /* non-export function prototypes */

! //static void RelationClearRelation(Relation relation, bool rebuildIt);

  #ifdef    ENABLE_REINDEX_NAILED_RELATIONS
  static void RelationReloadClassinfo(Relation relation);
***************
*** 683,689 ****
       */
      rewrite_desc = heap_openr(RewriteRelationName, AccessShareLock);
      rewrite_tupdesc = RelationGetDescr(rewrite_desc);
!     rewrite_scan = systable_beginscan(rewrite_desc,
                                        RewriteRelRulenameIndex,
                                        true, SnapshotNow,
                                        1, &key);
--- 683,689 ----
       */
      rewrite_desc = heap_openr(RewriteRelationName, AccessShareLock);
      rewrite_tupdesc = RelationGetDescr(rewrite_desc);
!     rewrite_scan = systable_beginscan(rewrite_desc,
                                        RewriteRelRulenameIndex,
                                        true, SnapshotNow,
                                        1, &key);
***************
*** 1651,1657 ****
   *     (one with refcount > 0).  However, this routine just does whichever
   *     it's told to do; callers must determine which they want.
   */
! static void
  RelationClearRelation(Relation relation, bool rebuildIt)
  {
      MemoryContext oldcxt;
--- 1651,1658 ----
   *     (one with refcount > 0).  However, this routine just does whichever
   *     it's told to do; callers must determine which they want.
   */
! //bjm
! void
  RelationClearRelation(Relation relation, bool rebuildIt)
  {
      MemoryContext oldcxt;

Re: CLUSTER patch

From: Tom Lane
Alvaro Herrera <alvherre@atentus.com> writes:
> The way the cluster is done is still the same: first cluster the heap
> and swap relfilenodes, then drop the old heap; then rebuild each index,
> swap relfilenodes with the old index and drop the new one.

I do not see anything in this patch that touches relfilenode.  Perhaps
the patch is incomplete?

> But as soon as I try to do anything to it (the new,
> clustered filenode) before transaction commit (building an index, say),
> the local buffer manager fails an assertion (actually,
> RelationNodeCacheGetRelation returns 0 for the given rnode), and the
> transaction aborts.

Hmm.  If you do the swap in the correct way (viz, update the relation's
pg_class entry and then CommandCounterIncrement) I'd expect the relcache
to respond correctly.  This does involve re-indexing the relcache entry
under a new relfilenode value, but that's not significantly different
from the case of renaming a relation.

            regards, tom lane

Re: CLUSTER patch

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> +     CommandCounterIncrement();
> +
> +     // bjm
> + //    RelationIdInvalidateRelationCacheByRelationId(r1);
> + //    RelationIdInvalidateRelationCacheByRelationId(r2);
> +
> +     RelationClearRelation(RelationIdGetRelation(r1), true);
> + //       RelationClearRelation(RelationIdGetRelation(r2), true);
> +
> +     CommandCounterIncrement();

Surely the above is neither necessary nor appropriate.  The relcache
should automatically rebuild its entries for these relations at
CommandCounterIncrement.  In any case I do not care for exporting
internal relcache routines to make CLUSTER work ...

            regards, tom lane

Re: CLUSTER patch

From: Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > +     CommandCounterIncrement();
> > +
> > +     // bjm
> > + //    RelationIdInvalidateRelationCacheByRelationId(r1);
> > + //    RelationIdInvalidateRelationCacheByRelationId(r2);
> > +
> > +     RelationClearRelation(RelationIdGetRelation(r1), true);
> > + //       RelationClearRelation(RelationIdGetRelation(r2), true);
> > +
> > +     CommandCounterIncrement();
>
> Surely the above is neither necessary nor appropriate.  The relcache
> should automatically rebuild its entries for these relations at

New patch attached.  Something like this is required, or
heap_drop/index_drop will fail because they can't find the relation
cache entries for the relation.  Maybe the trick is to properly
invalidate the relation caches when pg_class is updated.  This is
particularly significant for thisxactonly relations.  I don't think we
need to handle OID changes in pg_class, but relfilenode changes are not
propagated to the cache, causing problems down the road.  The attached
patch actually does work, with the following warnings:

    test=> cluster i on test;
    WARNING:  trying to delete a reldesc (rd_node) that does not exist.
    WARNING:  trying to delete a reldesc (rd_node) that does not exist.
    CLUSTER

My guess is that the proper fix belongs elsewhere.  I looked through
relcache.c and didn't see any case where a relfilenode could be
invalidated _without_ the old entry being removed first; but of course,
the old entry is hashed under a different value than the new one.  My
guess is that the real work needs to be done somewhere in there.

My cache-forget calls work because they force new cache entries to
match pg_class, so the subsequent commands can succeed.

> CommandCounterIncrement.  In any case I do not care for exporting
> internal relcache routines to make CLUSTER work ...

I need to get cluster working before I can worry about style.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.83
diff -c -r1.83 cluster.c
*** src/backend/commands/cluster.c    12 Jul 2002 18:43:15 -0000    1.83
--- src/backend/commands/cluster.c    14 Jul 2002 17:03:37 -0000
***************
*** 15,21 ****
   *
   *
   * IDENTIFICATION
!  *      $Header: /cvsroot/pgsql/src/backend/commands/cluster.c,v 1.83 2002/07/12 18:43:15 tgl Exp $
   *
   *-------------------------------------------------------------------------
   */
--- 15,21 ----
   *
   *
   * IDENTIFICATION
!  *      $Header: /cvsroot/pgsql/src/backend/commands/cluster.c,v 1.82 2002/06/20 20:29:26 momjian Exp $
   *
   *-------------------------------------------------------------------------
   */
***************
*** 24,74 ****

  #include "access/genam.h"
  #include "access/heapam.h"
- #include "catalog/dependency.h"
  #include "catalog/heap.h"
  #include "catalog/index.h"
  #include "catalog/pg_index.h"
  #include "catalog/pg_proc.h"
  #include "commands/cluster.h"
  #include "commands/tablecmds.h"
  #include "miscadmin.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"


  static Oid    copy_heap(Oid OIDOldHeap, const char *NewName);
- static Oid    copy_index(Oid OIDOldIndex, Oid OIDNewHeap,
-                        const char *NewIndexName);
  static void rebuildheap(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex);

  /*
   * cluster
   *
   * STILL TO DO:
!  *     Create a list of all the other indexes on this relation. Because
!  *     the cluster will wreck all the tids, I'll need to destroy bogus
!  *     indexes. The user will have to re-create them. Not nice, but
!  *     I'm not a nice guy. The alternative is to try some kind of post
!  *     destroy re-build. This may be possible. I'll check out what the
!  *     index create functiond want in the way of paramaters. On the other
!  *     hand, re-creating n indexes may blow out the space.
   */
  void
  cluster(RangeVar *oldrelation, char *oldindexname)
  {
      Oid            OIDOldHeap,
                  OIDOldIndex,
!                 OIDNewHeap,
!                 OIDNewIndex;
      Relation    OldHeap,
                  OldIndex;
      char        NewHeapName[NAMEDATALEN];
!     char        NewIndexName[NAMEDATALEN];
      ObjectAddress object;

      /*
!      * We grab exclusive access to the target rel and index for the
       * duration of the transaction.
       */
      OldHeap = heap_openrv(oldrelation, AccessExclusiveLock);
--- 24,99 ----

  #include "access/genam.h"
  #include "access/heapam.h"
  #include "catalog/heap.h"
+ #include "catalog/dependency.h"
  #include "catalog/index.h"
+ #include "catalog/indexing.h"
+ #include "catalog/catname.h"
  #include "catalog/pg_index.h"
  #include "catalog/pg_proc.h"
  #include "commands/cluster.h"
  #include "commands/tablecmds.h"
  #include "miscadmin.h"
  #include "utils/builtins.h"
+ #include "utils/fmgroids.h"
  #include "utils/lsyscache.h"
+ #include "utils/relcache.h"
  #include "utils/syscache.h"

+ /*
+  * We need one of these structs for each index in the relation to be
+  * clustered.  It's basically the data needed by index_create() so
+  * we can recreate the indexes after destroying the old heap.
+  */
+ typedef struct
+ {
+     char       *indexName;
+     IndexInfo  *indexInfo;
+     Oid            accessMethodOID;
+     Oid           *classOID;
+     Oid            indexOID;
+     bool        isPrimary;
+ } IndexAttrs;

  static Oid    copy_heap(Oid OIDOldHeap, const char *NewName);
  static void rebuildheap(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex);
+ List *get_indexattr_list (Oid OIDOldHeap);
+ void recreate_indexattr(Oid OIDOldHeap, List *indexes);
+ void swap_relfilenodes(Oid r1, Oid r2);
+
+ void RelationFlushRelation(Relation relation);

  /*
   * cluster
   *
   * STILL TO DO:
!  *   Keep foreign keys, permissions and inheritance of the clustered table.
!  *
!  *   We need to look at making use of the ability to write a new version of a
!  *   table (or index) under a new relfilenode value, without changing the
!  *   table's OID.
   */
  void
  cluster(RangeVar *oldrelation, char *oldindexname)
  {
      Oid            OIDOldHeap,
                  OIDOldIndex,
!                 OIDNewHeap;
      Relation    OldHeap,
                  OldIndex;
      char        NewHeapName[NAMEDATALEN];
!     List       *indexes;
      ObjectAddress object;

+     /* The games with filenodes may not be rollbackable, so
+      * disallow running this inside a transaction block.
+      * This may be a false assumption though.
+      */
+     if (IsTransactionBlock())
+         elog (ERROR, "CLUSTER: may not be called inside a transaction block");
+
      /*
!      * We grab exclusive access to the target relation for the
       * duration of the transaction.
       */
      OldHeap = heap_openrv(oldrelation, AccessExclusiveLock);
***************
*** 96,143 ****
      heap_close(OldHeap, NoLock);
      index_close(OldIndex);

!     /*
!      * Create the new heap with a temporary name.
!      */
!     snprintf(NewHeapName, NAMEDATALEN, "temp_%u", OIDOldHeap);

      OIDNewHeap = copy_heap(OIDOldHeap, NewHeapName);

      /* We do not need CommandCounterIncrement() because copy_heap did it. */

!     /*
!      * Copy the heap data into the new table in the desired order.
!      */
      rebuildheap(OIDNewHeap, OIDOldHeap, OIDOldIndex);

      /* To make the new heap's data visible. */
      CommandCounterIncrement();

-     /* Create new index over the tuples of the new heap. */
-     snprintf(NewIndexName, NAMEDATALEN, "temp_%u", OIDOldIndex);

!     OIDNewIndex = copy_index(OIDOldIndex, OIDNewHeap, NewIndexName);

!     CommandCounterIncrement();

!     /* Destroy old heap (along with its index) and rename new. */
      object.classId = RelOid_pg_class;
!     object.objectId = OIDOldHeap;
      object.objectSubId = 0;

      /* XXX better to use DROP_CASCADE here? */
      performDeletion(&object, DROP_RESTRICT);
-
      /* performDeletion does CommandCounterIncrement at end */
-
-     renamerel(OIDNewHeap, oldrelation->relname);
-
-     /* This one might be unnecessary, but let's be safe. */
-     CommandCounterIncrement();
-
-     renamerel(OIDNewIndex, oldindexname);
  }

  static Oid
  copy_heap(Oid OIDOldHeap, const char *NewName)
  {
--- 121,164 ----
      heap_close(OldHeap, NoLock);
      index_close(OldIndex);

!     /* Save the information of all indexes on the relation. */
!     indexes = get_indexattr_list(OIDOldHeap);

+     /* Create the new heap with a temporary name. */
+     snprintf(NewHeapName, NAMEDATALEN, "temp_%u", OIDOldHeap);
      OIDNewHeap = copy_heap(OIDOldHeap, NewHeapName);

      /* We do not need CommandCounterIncrement() because copy_heap did it. */

!     /* Copy the heap data into the new table in the desired order. */
      rebuildheap(OIDNewHeap, OIDOldHeap, OIDOldIndex);

      /* To make the new heap's data visible. */
      CommandCounterIncrement();


!     /* Swap the relfilenodes of the old and new heaps. */
!     swap_relfilenodes(OIDNewHeap, OIDOldHeap);

!     /* At this point, the old heap points to the new file, and the new heap to the old one */
!
!     /* Recreate the indexes on the relation.  We do not need
!      * CommandCounterIncrement() because recreate_indexattr does it.
!      */
!     recreate_indexattr(OIDOldHeap, indexes);

!     /* Destroy the new heap, carrying the old filenode along. */
      object.classId = RelOid_pg_class;
!     object.objectId = OIDNewHeap;
      object.objectSubId = 0;

      /* XXX better to use DROP_CASCADE here? */
      performDeletion(&object, DROP_RESTRICT);
      /* performDeletion does CommandCounterIncrement at end */
  }

+ /* Create and initialize the new heap
+  */
  static Oid
  copy_heap(Oid OIDOldHeap, const char *NewName)
  {
***************
*** 181,223 ****
      return OIDNewHeap;
  }

! static Oid
! copy_index(Oid OIDOldIndex, Oid OIDNewHeap, const char *NewIndexName)
! {
!     Oid            OIDNewIndex;
!     Relation    OldIndex,
!                 NewHeap;
!     IndexInfo  *indexInfo;
!
!     NewHeap = heap_open(OIDNewHeap, AccessExclusiveLock);
!     OldIndex = index_open(OIDOldIndex);
!
!     /*
!      * Create a new index like the old one.  To do this I get the info
!      * from pg_index, and add a new index with a temporary name (that will
!      * be changed later).
!      */
!     indexInfo = BuildIndexInfo(OldIndex->rd_index);
!
!     OIDNewIndex = index_create(OIDNewHeap,
!                                NewIndexName,
!                                indexInfo,
!                                OldIndex->rd_rel->relam,
!                                OldIndex->rd_index->indclass,
!                                OldIndex->rd_index->indisprimary,
!                                false, /* XXX losing constraint status */
!                                allowSystemTableMods);
!
!     setRelhasindex(OIDNewHeap, true,
!                    OldIndex->rd_index->indisprimary, InvalidOid);
!
!     index_close(OldIndex);
!     heap_close(NewHeap, NoLock);
!
!     return OIDNewIndex;
! }
!
!
  static void
  rebuildheap(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex)
  {
--- 202,209 ----
      return OIDNewHeap;
  }

! /* Load the data into the new heap, clustered.
!  */
  static void
  rebuildheap(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex)
  {
***************
*** 260,263 ****
--- 246,418 ----
      index_close(LocalOldIndex);
      heap_close(LocalOldHeap, NoLock);
      heap_close(LocalNewHeap, NoLock);
+ }
+
+ /* Get the necessary info about the indexes in the relation and
+  * return a List of IndexAttrs.
+  */
+ List *
+ get_indexattr_list (Oid OIDOldHeap)
+ {
+     ScanKeyData    entry;
+     HeapScanDesc scan;
+     Relation indexRelation;
+     HeapTuple indexTuple;
+     List *indexes = NIL;
+     IndexAttrs *attrs;
+     HeapTuple tuple;
+     Form_pg_index index;
+
+     /* Grab the index tuples by scanning IndexRelationName (pg_index)
+      * for entries whose indrelid matches the OID of the old heap.
+      */
+     indexRelation = heap_openr(IndexRelationName, AccessShareLock);
+     ScanKeyEntryInitialize(&entry, 0, Anum_pg_index_indrelid,
+             F_OIDEQ, ObjectIdGetDatum(OIDOldHeap));
+     scan = heap_beginscan(indexRelation, SnapshotNow, 1, &entry);
+     while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+     {
+         index = (Form_pg_index) GETSTRUCT(indexTuple);
+
+         attrs = (IndexAttrs *) palloc(sizeof(IndexAttrs));
+         attrs->indexInfo = BuildIndexInfo(index);
+         attrs->isPrimary = index->indisprimary;
+         attrs->indexOID = index->indexrelid;
+
+         /* The opclasses are copied verbatim from the original indexes.
+         */
+         attrs->classOID = (Oid *)palloc(sizeof(Oid) *
+                 attrs->indexInfo->ii_NumIndexAttrs);
+         memcpy(attrs->classOID, index->indclass,
+                 sizeof(Oid) * attrs->indexInfo->ii_NumIndexAttrs);
+
+         /* Name and access method of each index come from
+          * RelationRelationName.
+          */
+         tuple = SearchSysCache(RELOID,
+                 ObjectIdGetDatum(attrs->indexOID),
+                 0, 0, 0);
+         if (!HeapTupleIsValid(tuple))
+             elog(ERROR, "CLUSTER: cannot find index %u", attrs->indexOID);
+         attrs->indexName = pstrdup(NameStr(((Form_pg_class) GETSTRUCT(tuple))->relname));
+         attrs->accessMethodOID = ((Form_pg_class) GETSTRUCT(tuple))->relam;
+         ReleaseSysCache(tuple);
+
+         /* Cons the gathered data into the list.  We do not care about
+          * ordering, and this is more efficient than append.
+          */
+         indexes=lcons((void *)attrs, indexes);
+     }
+     heap_endscan(scan);
+     heap_close(indexRelation, NoLock);
+     return indexes;
+ }
+
+ /* Create new indexes and swap the filenodes with old indexes.  Then drop
+  * the new index (carrying the old heap along).
+  */
+ void
+ recreate_indexattr(Oid OIDOldHeap, List *indexes)
+ {
+     IndexAttrs *attrs;
+     List        *elem;
+     Oid            newIndexOID;
+     char        newIndexName[NAMEDATALEN];
+
+     foreach (elem, indexes)
+     {
+         attrs=(IndexAttrs *) lfirst(elem);
+
+         /* Create the new index under a temporary name */
+         snprintf(newIndexName, NAMEDATALEN, "temp_%u", attrs->indexOID);
+         newIndexOID = index_create(OIDOldHeap, newIndexName,
+                                    attrs->indexInfo, attrs->accessMethodOID,
+                                    attrs->classOID, attrs->isPrimary,
+                                    false, /* XXX losing constraint status */
+                                    allowSystemTableMods);
+         CommandCounterIncrement();
+
+         /* Swap the filenodes. */
+         swap_relfilenodes(newIndexOID, attrs->indexOID);
+         setRelhasindex(OIDOldHeap, true, attrs->isPrimary, InvalidOid);
+
+         /* I'm not sure this one is needed, but let's be safe. */
+         CommandCounterIncrement();
+
+         /* Drop the new index, carrying the old filenode along. */
+         index_drop(newIndexOID);
+         CommandCounterIncrement();
+
+         pfree(attrs->classOID);
+         pfree(attrs);
+     }
+     freeList(indexes);
+ }
+
+ /* Swap the relfilenodes for two given relations.
+  */
+ void
+ swap_relfilenodes(Oid r1, Oid r2)
+ {
+     /* I can probably keep RelationRelationName open in the main
+      * function and pass the Relation around so I don't have to open
+      * it every time.
+      */
+     Relation    relRelation,
+                 rel1, rel2,
+                 irels[Num_pg_class_indices];
+     HeapTuple    reltup[2];
+     Oid            tempRFNode;
+
+     rel1 = RelationIdGetRelation(r1);
+     rel2 = RelationIdGetRelation(r2);
+     RelationFlushRelation(rel1);
+     RelationFlushRelation(rel2);
+     RelationDecrementReferenceCount(rel1);
+     RelationDecrementReferenceCount(rel2);
+
+     CommandCounterIncrement();
+     /* We need both RelationRelationName tuples.  */
+     relRelation = heap_openr(RelationRelationName, RowExclusiveLock);
+
+     reltup[0] = SearchSysCacheCopy(RELOID,
+                                    ObjectIdGetDatum(r1),
+                                    0, 0, 0);
+     if (!HeapTupleIsValid(reltup[0]))
+         elog(ERROR, "CLUSTER: Cannot find tuple for relation %u", r1);
+     reltup[1] = SearchSysCacheCopy(RELOID,
+                                    ObjectIdGetDatum(r2),
+                                    0, 0, 0);
+     if (!HeapTupleIsValid(reltup[1]))
+         elog(ERROR, "CLUSTER: Cannot find tuple for relation %u", r2);
+
+     /* Swap the filenodes */
+     tempRFNode = ((Form_pg_class) GETSTRUCT(reltup[0]))->relfilenode;
+     ((Form_pg_class) GETSTRUCT(reltup[0]))->relfilenode =
+         ((Form_pg_class) GETSTRUCT(reltup[1]))->relfilenode;
+     ((Form_pg_class) GETSTRUCT(reltup[1]))->relfilenode = tempRFNode;
+
+     /* Update the RelationRelationName tuples */
+     simple_heap_update(relRelation, &reltup[0]->t_self, reltup[0]);
+     simple_heap_update(relRelation, &reltup[1]->t_self, reltup[1]);
+
+     /* Keep system catalogs current */
+     CatalogOpenIndices(Num_pg_class_indices, Name_pg_class_indices, irels);
+     CatalogIndexInsert(irels, Num_pg_class_indices, relRelation, reltup[0]);
+     CatalogIndexInsert(irels, Num_pg_class_indices, relRelation, reltup[1]);
+     CatalogCloseIndices(Num_pg_class_indices, irels);
+
+     heap_close(relRelation, RowExclusiveLock);
+     heap_freetuple(reltup[0]);
+     heap_freetuple(reltup[1]);
+
+     CommandCounterIncrement();
+
+     rel1 = RelationIdGetRelation(r1);
+     rel2 = RelationIdGetRelation(r2);
+     RelationFlushRelation(rel1);
+     RelationFlushRelation(rel2);
+     RelationDecrementReferenceCount(rel1);
+     RelationDecrementReferenceCount(rel2);
+     CommandCounterIncrement();
  }
Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.166
diff -c -r1.166 relcache.c
*** src/backend/utils/cache/relcache.c    12 Jul 2002 18:43:18 -0000    1.166
--- src/backend/utils/cache/relcache.c    14 Jul 2002 17:03:46 -0000
***************
*** 238,249 ****
                                             (void *)&(RELATION->rd_id), \
                                             HASH_REMOVE, NULL); \
      if (idhentry == NULL) \
!         elog(WARNING, "trying to delete a reldesc that does not exist."); \
      nodentry = (RelNodeCacheEnt*)hash_search(RelationNodeCache, \
                                             (void *)&(RELATION->rd_node), \
                                             HASH_REMOVE, NULL); \
      if (nodentry == NULL) \
!         elog(WARNING, "trying to delete a reldesc that does not exist."); \
      if (IsSystemNamespace(RelationGetNamespace(RELATION))) \
      { \
          char *relname = RelationGetRelationName(RELATION); \
--- 238,249 ----
                                             (void *)&(RELATION->rd_id), \
                                             HASH_REMOVE, NULL); \
      if (idhentry == NULL) \
!         elog(WARNING, "trying to delete a reldesc (rd_id) that does not exist."); \
      nodentry = (RelNodeCacheEnt*)hash_search(RelationNodeCache, \
                                             (void *)&(RELATION->rd_node), \
                                             HASH_REMOVE, NULL); \
      if (nodentry == NULL) \
!         elog(WARNING, "trying to delete a reldesc (rd_node) that does not exist."); \
      if (IsSystemNamespace(RelationGetNamespace(RELATION))) \
      { \
          char *relname = RelationGetRelationName(RELATION); \
***************
*** 252,258 ****
                                                     relname, \
                                                     HASH_REMOVE, NULL); \
          if (namehentry == NULL) \
!             elog(WARNING, "trying to delete a reldesc that does not exist."); \
      } \
  } while(0)

--- 252,258 ----
                                                     relname, \
                                                     HASH_REMOVE, NULL); \
          if (namehentry == NULL) \
!             elog(WARNING, "trying to delete a reldesc (relname) that does not exist."); \
      } \
  } while(0)

***************
*** 281,287 ****
  #ifdef    ENABLE_REINDEX_NAILED_RELATIONS
  static void RelationReloadClassinfo(Relation relation);
  #endif   /* ENABLE_REINDEX_NAILED_RELATIONS */
! static void RelationFlushRelation(Relation relation);
  static Relation RelationSysNameCacheGetRelation(const char *relationName);
  static bool load_relcache_init_file(void);
  static void write_relcache_init_file(void);
--- 281,287 ----
  #ifdef    ENABLE_REINDEX_NAILED_RELATIONS
  static void RelationReloadClassinfo(Relation relation);
  #endif   /* ENABLE_REINDEX_NAILED_RELATIONS */
! //static void RelationFlushRelation(Relation relation);
  static Relation RelationSysNameCacheGetRelation(const char *relationName);
  static bool load_relcache_init_file(void);
  static void write_relcache_init_file(void);
***************
*** 683,689 ****
       */
      rewrite_desc = heap_openr(RewriteRelationName, AccessShareLock);
      rewrite_tupdesc = RelationGetDescr(rewrite_desc);
!     rewrite_scan = systable_beginscan(rewrite_desc,
                                        RewriteRelRulenameIndex,
                                        true, SnapshotNow,
                                        1, &key);
--- 683,689 ----
       */
      rewrite_desc = heap_openr(RewriteRelationName, AccessShareLock);
      rewrite_tupdesc = RelationGetDescr(rewrite_desc);
!     rewrite_scan = systable_beginscan(rewrite_desc,
                                        RewriteRelRulenameIndex,
                                        true, SnapshotNow,
                                        1, &key);
***************
*** 1651,1657 ****
   *     (one with refcount > 0).  However, this routine just does whichever
   *     it's told to do; callers must determine which they want.
   */
! static void
  RelationClearRelation(Relation relation, bool rebuildIt)
  {
      MemoryContext oldcxt;
--- 1651,1658 ----
   *     (one with refcount > 0).  However, this routine just does whichever
   *     it's told to do; callers must determine which they want.
   */
! //bjm
! void
  RelationClearRelation(Relation relation, bool rebuildIt)
  {
      MemoryContext oldcxt;
***************
*** 1803,1809 ****
   *
   *     Rebuild the relation if it is open (refcount > 0), else blow it away.
   */
! static void
  RelationFlushRelation(Relation relation)
  {
      bool        rebuildIt;
--- 1804,1810 ----
   *
   *     Rebuild the relation if it is open (refcount > 0), else blow it away.
   */
! void
  RelationFlushRelation(Relation relation)
  {
      bool        rebuildIt;

Re: CLUSTER patch

From
Alvaro Herrera
Date:
Tom Lane dijo:

> Alvaro Herrera <alvherre@atentus.com> writes:
> > The way the cluster is done is still the same: first cluster the heap
> > and swap relfilenodes, then drop old heap; then rebuild each index, swap
> > relfilenodes with old index and drop new.
>
> I do not see anything in this patch that touches relfilenode.  Perhaps
> the patch is incomplete?

No, this is the old version, corrected according to your comments, for
inclusion in case the new version doesn't make it.

Perhaps you missed the patch I posted some days ago that did swap
relfilenodes (there was even a function named swap_relfilenode, so it
wasn't easy to miss).  It's archived in
http://archives.postgresql.org/pgsql-patches/2002-07/msg00079.php.  The
code in question is this:

    tempRFNode = ((Form_pg_class) GETSTRUCT(reltup[0]))->relfilenode;
    ((Form_pg_class) GETSTRUCT(reltup[0]))->relfilenode =
        ((Form_pg_class) GETSTRUCT(reltup[1]))->relfilenode;
    ((Form_pg_class) GETSTRUCT(reltup[1]))->relfilenode = tempRFNode;

    /* Update the RelationRelationName tuples */
    simple_heap_update(relRelation, &reltup[1]->t_self, reltup[1]);
    simple_heap_update(relRelation, &reltup[0]->t_self, reltup[0]);

Now, if I do CommandCounterIncrement() right after this, I get "Relation
deleted while still in use".  So I add

    CatalogOpenIndices(Num_pg_class_indices, Name_pg_class_indices, irels);
    CatalogIndexInsert(irels, Num_pg_class_indices, relRelation, reltup[1]);
    CatalogIndexInsert(irels, Num_pg_class_indices, relRelation, reltup[0]);
    CatalogCloseIndices(Num_pg_class_indices, irels);

and only then the CommandCounterIncrement.  But the server says
TRAP: Failed Assertion("!(bufrel != ((void *)0)):", File: "localbuf.c",
Line: 242)

(I think it's line 241 in pristine source).  I tried doing
CatalogIndexInsert and CommandCounterIncrement after each
simple_heap_update, but the result is the same.  This assertion is
failed at transaction commit, when LocalBufferSync is called.

> > But as soon as I try to do anything to it (the new,
> > clustered filenode) before transaction commit (building an index, say),
> > the local buffer manager fails an assertion (actually,
> > RelationNodeCacheGetRelation returns 0 for the given rnode), and the
> > transaction aborts.
>
> Hmm.  If you do the swap in the correct way (viz, update the relation's
> pg_class entry and then CommandCounterIncrement) I'd expect the relcache
> to respond correctly.  This does involve re-indexing the relcache entry
> under a new relfilenode value, but that's not significantly different
> from the case of renaming a relation.

That's what I think I'm doing, but I don't know what the relcache is
doing; something other than what I expect, I presume.  I tried using
RelationForgetRelation on both relations (following Bruce's idea of
invalidating the relcache entries), but I get lost in the relcache, so
I'm now in the dark.

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"The ability to monopolize a planet is insignificant
next to the power of the source"



Re: CLUSTER patch

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> New patch attached.  Something like this is required or
> heap_drop/index_drop will fail because it can't find the relation cache
> entries for the relation.  Maybe the trick is to properly invalidate the
> relation caches when pg_class is updated.

They should be updated *automatically* --- otherwise CLUSTER is hardly
the only thing that will fail.

> This is particularly
> significant for thisxactonly relations.

Yes.  After thinking awhile I realize that the real problem is that we
are trying to swap between an existing relation (!rd_myxactonly) and
a new relation (rd_myxactonly).  Buffers for one live in the main
buffer pool, for the other in the local buffer pool.  There's also the
little matter of the local state inside relcache.c.  While the update
to pg_class should make the right things happen to relfilenode, it
doesn't do anything to cause a change in rd_myxactonly status.

Not sure what to do about this.  Will think more.

            regards, tom lane

Re: CLUSTER patch

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Tom Lane dijo:
>
> > Alvaro Herrera <alvherre@atentus.com> writes:
> > > The way the cluster is done is still the same: first cluster the heap
> > > and swap relfilenodes, then drop old heap; then rebuild each index, swap
> > > relfilenodes with old index and drop new.
> >
> > I do not see anything in this patch that touches relfilenode.  Perhaps
> > the patch is incomplete?
>
> No, this is the old version, corrected according to your comments, for
> inclusion in case the new version doesn't make it.

I sent a new version of your patch out that got farther, and it does
have those swap lines.

> Now, if I do CommandCounterIncrement() right after this, I get "Relation
> deleted while still in use".  So I add
>
>     CatalogOpenIndices(Num_pg_class_indices, Name_pg_class_indices, irels);
>     CatalogIndexInsert(irels, Num_pg_class_indices, relRelation, reltup[1]);
>     CatalogIndexInsert(irels, Num_pg_class_indices, relRelation, reltup[0]);
>     CatalogCloseIndices(Num_pg_class_indices, irels);

Yes, that is required.

> and only then the CommandCounterIncrement.  But the server says
> TRAP: Failed Assertion("!(bufrel != ((void *)0)):", File: "localbuf.c",
> Line: 242)

Yes, the problem here is that dirty buffers can't look up the proper
relation to figure out how to complete the transaction.  Reloading the
cache after the swap seems to fix it, but the invalidation itself throws
WARNINGs because the relfilenodes of the old and new relation
structures don't match, and when it tries to delete the first one, it
throws the warning.

> (I think it's line 241 in pristine source).  I tried doing
> CatalogIndexInsert and CommandCounterIncrement after each
> simple_heap_update, but the result is the same.  This assertion is
> failed at transaction commit, when LocalBufferSync is called.

Yep, I think everything now points to changes needed in relcache.c to
handle the new condition of a relation swapping its relfilenode.  I looked
at the REINDEX code and it doesn't seem to have this problem.  Not sure
why.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: CLUSTER patch

From
Bruce Momjian
Date:
Tom Lane wrote:
> > This is particularly
> > significant for thisxactonly relations.
>
> Yes.  After thinking awhile I realize that the real problem is that we
> are trying to swap between an existing relation (!rd_myxactonly) and
> a new relation (rd_myxactonly).  Buffers for one live in the main
> buffer pool, for the other in the local buffer pool.  There's also the
> little matter of the local state inside relcache.c.  While the update
> to pg_class should make the right things happen to relfilenode, it
> doesn't do anything to cause a change in rd_myxactonly status.

Yes, I had one intermediate patch where I was calling
RelationClearRelation directly and not using the reload flag computed by
RelationForgetRelation.  Then I found RelationFlushRelation and that
seemed to work better.

This case is actually more complicated because after the swap, pg_class
still says thisxactonly but the relfilenode points to non-thisxactonly
buffers, and vice versa.  One of the key things is that thisxactonly
relations can't be forgotten because there are local buffers associated
with the relation, or something like that.


Re: CLUSTER patch

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Yep, I think everything now points to changes needed in relcache.c to
> handle the new condition of a relation swapping its relfilenode.

I still don't believe that any changes are needed in relcache.c.
In particular, after thinking more I see that we do not need to change
the myxactonly status of either relation: the temp one is still temp,
the original one is still not.

What we do need to do is to flush all buffers for the two rels from the
local and global buffer caches respectively, since after the swap they
should get buffered in different places.  FlushRelationBuffers looks
like it will work for that.

            regards, tom lane

Re: CLUSTER patch

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> This case is actually more complicated because after the swap, pg_class
> still says thisxactonly but the relfilenode points to non-thisxactonly
> buffers, and vice versa.

pg_class doesn't say anything (if it did, we could make relcache.c
track it).

But yes, the problem is that myxactonly has to line up with where the
buffers are stored.  As I mentioned, I think it would work to flush out
all buffers for both rels before doing the swap.

            regards, tom lane

Re: CLUSTER patch

From
Alvaro Herrera
Date:
Tom Lane dijo:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > This case is actually more complicated because after the swap, pg_class
> > still says thisxactonly but the relfilenode points to non-thisxactonly
> > buffers, and vice versa.
>
> pg_class doesn't say anything (if it did, we could make relcache.c
> track it).
>
> But yes, the problem is that myxactonly has to line up with where the
> buffers are stored.  As I mentioned, I think it would work to flush out
> all buffers for both rels before doing the swap.

It does indeed work.  Actually, I tried to do that before, but the fact
that FlushRelationBuffers requires a firstDelBlock made me turn around.

I attach a new patch against current cvs HEAD.  It works as far as I've
tested it, but I may be missing some cases.  I'll post a documentation
patch later if this gets applied.

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"Some men are heterosexual, and some are bisexual, and some
men don't think about sex at all... they become lawyers" (Woody Allen)


Re: CLUSTER patch

From
Tom Lane
Date:
Alvaro Herrera <alvherre@atentus.com> writes:
> It does indeed work.  Actually, I tried to do that before, but the fact
> that FlushRelationBuffers requires a firstDelBlock made me turn around.

firstDelBlock should be 0, not length-of-relation; as given, this code
fails to get rid of the buffer entries!

            regards, tom lane

Re: CLUSTER patch

From
Alvaro Herrera
Date:
Tom Lane dijo:

> Alvaro Herrera <alvherre@atentus.com> writes:
> > It does indeed work.  Actually, I tried to do that before, but the fact
> > that FlushRelationBuffers requires a firstDelBlock made me turn around.
>
> firstDelBlock should be 0, not length-of-relation; as given, this code
> fails to get rid of the buffer entries!

Oh, I completely failed to understand the purpose of firstDelBlock, then.

Anyway, there's still a big problem with this patch: the pg_depend
information gets messed up after CLUSTER, so a clustered table cannot be
dropped.  I don't know why this is.

As I said yesterday, I'm going on vacation tomorrow, and probably will
not fix this issue.  I can look into it when I'm back, in two weeks.

I attach a little doc patch.

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"La gente vulgar solo piensa en pasar el tiempo;
el que tiene talento, en aprovecharlo"


Re: CLUSTER patch

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Tom Lane dijo:
>
> > Alvaro Herrera <alvherre@atentus.com> writes:
> > > It does indeed work.  Actually, I tried to do that before, but the fact
> > > that FlushRelationBuffers requires a firstDelBlock made me turn around.
> >
> > firstDelBlock should be 0, not length-of-relation; as given, this code
> > fails to get rid of the buffer entries!
>
> Oh, I completely failed to understand the purpose of firstDelBlock, then.
>
> Anyway, there's still a big problem with this patch: the pg_depend
> information gets messed up after CLUSTER, so a clustered table cannot be
> dropped.  I don't know why this is.

I actually backed out Tom's recent change to cluster.c (attached),
applied your patch, then manually applied Tom's patch to cluster.c so I
had a working version of your patch with the new dependencies.  Seemed
to work fine.  I will do the same when I apply your version.

> As I said yesterday, I'm going on vacation tomorrow, and probably will
> not fix this issue.  I can look into it when I'm back, in two weeks.

I will take care of applying your patch and the doc part too.  It will
be all done when you return.  Thanks for your work and have a nice
vacation.

Index: cluster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.82
retrieving revision 1.83
diff -c -r1.82 -r1.83
*** cluster.c    20 Jun 2002 20:29:26 -0000    1.82
--- cluster.c    12 Jul 2002 18:43:15 -0000    1.83
***************
*** 15,21 ****
   *
   *
   * IDENTIFICATION
!  *      $Header: /cvsroot/pgsql/src/backend/commands/cluster.c,v 1.82 2002/06/20 20:29:26 momjian Exp $
   *
   *-------------------------------------------------------------------------
   */
--- 15,21 ----
   *
   *
   * IDENTIFICATION
!  *      $Header: /cvsroot/pgsql/src/backend/commands/cluster.c,v 1.83 2002/07/12 18:43:15 tgl Exp $
   *
   *-------------------------------------------------------------------------
   */
***************
*** 24,29 ****
--- 24,30 ----

  #include "access/genam.h"
  #include "access/heapam.h"
+ #include "catalog/dependency.h"
  #include "catalog/heap.h"
  #include "catalog/index.h"
  #include "catalog/pg_index.h"
***************
*** 64,69 ****
--- 65,71 ----
                  OldIndex;
      char        NewHeapName[NAMEDATALEN];
      char        NewIndexName[NAMEDATALEN];
+     ObjectAddress object;

      /*
       * We grab exclusive access to the target rel and index for the
***************
*** 119,127 ****
      CommandCounterIncrement();

      /* Destroy old heap (along with its index) and rename new. */
!     heap_drop_with_catalog(OIDOldHeap, allowSystemTableMods);

!     CommandCounterIncrement();

      renamerel(OIDNewHeap, oldrelation->relname);

--- 121,134 ----
      CommandCounterIncrement();

      /* Destroy old heap (along with its index) and rename new. */
!     object.classId = RelOid_pg_class;
!     object.objectId = OIDOldHeap;
!     object.objectSubId = 0;

!     /* XXX better to use DROP_CASCADE here? */
!     performDeletion(&object, DROP_RESTRICT);
!
!     /* performDeletion does CommandCounterIncrement at end */

      renamerel(OIDNewHeap, oldrelation->relname);

***************
*** 198,203 ****
--- 205,211 ----
                                 OldIndex->rd_rel->relam,
                                 OldIndex->rd_index->indclass,
                                 OldIndex->rd_index->indisprimary,
+                                false, /* XXX losing constraint status */
                                 allowSystemTableMods);

      setRelhasindex(OIDNewHeap, true,

Re: CLUSTER patch

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Anyway, there's still a big problem with this patch: the pg_depend
>> information gets messed up after CLUSTER, so a clustered table cannot be
>> dropped.  I don't know why this is.

> I actually backed out Tom's recent change to cluster.c (attached),
> applied your patch, then manually applied Tom's patch to cluster.c so I
> had a working version of your patch with the new dependencies.  Seemed
> to work fine.

The changes I made to cluster.c may or may not be correct in the
context of a redone CLUSTER implementation; it'll need to be looked at.

            regards, tom lane

Re: CLUSTER patch

From
Bruce Momjian
Date:
Tom Lane wrote:
> What we do need to do is to flush all buffers for the two rels from the
> local and global buffer caches respectively, since after the swap they
> should get buffered in different places.  FlushRelationBuffers looks
> like it will work for that.

Sure, if you want to take the easy solution.  ;-)

Good idea, just flush the buffers so we don't have to track the change.

Thanks for the assistance, Tom.  Getting a cluster that doesn't have
embarrassing shortcomings is a major improvement for 7.3.
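Tom's point about flushing before the swap can be pictured with a toy model (this is illustrative C, not backend code; every name here, such as `toy_flush` and `toy_swap_relfilenodes`, is invented for the sketch): a cache keyed by relfilenode goes stale the instant two relations trade filenodes, unless both relations' entries are flushed first, in the spirit of FlushRelationBuffers.

```c
#include <assert.h>

/* Toy "buffer cache" keyed by relfilenode -- vastly simplified. */
#define CACHE_SLOTS 8

typedef struct { int relfilenode; int valid; } ToySlot;
static ToySlot cache[CACHE_SLOTS];

static void toy_cache_add(int rnode) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (!cache[i].valid) { cache[i].relfilenode = rnode; cache[i].valid = 1; return; }
}

static int toy_cache_holds(int rnode) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].relfilenode == rnode) return 1;
    return 0;
}

/* FlushRelationBuffers-style: discard every cached entry of one filenode. */
static void toy_flush(int rnode) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].relfilenode == rnode) cache[i].valid = 0;
}

typedef struct { int oid; int relfilenode; } ToyRel;

/* The CLUSTER trick: flush both rels, then swap filenodes, so no cached
 * entry is left filed under a filenode that now belongs to the other rel. */
static void toy_swap_relfilenodes(ToyRel *a, ToyRel *b) {
    toy_flush(a->relfilenode);
    toy_flush(b->relfilenode);
    int tmp = a->relfilenode;
    a->relfilenode = b->relfilenode;
    b->relfilenode = tmp;
}
```

If the two `toy_flush` calls were skipped, `toy_cache_holds` would still report entries under the old filenodes after the swap, which is the stale-buffer situation the thread describes.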

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: CLUSTER patch

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> Anyway, there's still a big problem with this patch: the pg_depend
> >> information gets messed up after CLUSTER, so a clustered table cannot be
> > >> dropped.  I don't know why this is.
>
> > I actually backed out Tom's recent change to cluster.c (attached),
> > applied your patch, then manually applied Tom's patch to cluster.c so I
> > had a working version of your patch with the new dependencies.  Seemed
> > to work fine.
>
> The changes I made to cluster.c may or may not be correct in the
> context of a redone CLUSTER implementation; it'll need to be looked at.

Tom, you are probably right, because the table being dropped doesn't have
anything associated with it, except that it has the same tuple
descriptor as the base table.  I wonder whether that is going to create
things like defaults that need to be cascade-deleted:

    OIDNewHeap = heap_create_with_catalog(NewName,
                  RelationGetNamespace(OldHeap),
                  tupdesc,
                  OldHeap->rd_rel->relkind,
                  OldHeap->rd_rel->relisshared,
                  OldHeap->rd_rel->relhasoids,
                  allowSystemTableMods);

If you can tell me which drop type is correct, I can apply the cluster patch.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: CLUSTER patch

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------


Alvaro Herrera wrote:
> Tom Lane dijo:
>
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > This case is actually more complicated because after the swap, pg_class
> > > still says thisxactonly but the relfilenode points to non-thisxactonly
> > > buffers, and vice versa.
> >
> > pg_class doesn't say anything (if it did, we could make relcache.c
> > track it).
> >
> > But yes, the problem is that myxactonly has to line up with where the
> > buffers are stored.  As I mentioned, I think it would work to flush out
> > all buffers for both rels before doing the swap.
>
> It does indeed work.  Actually, I tried to do that before, but the fact
> that RelationFlushBuffers requires a firstDelBlock made me turn around.
>
> I attach a new patch against current cvs HEAD.  It works as far as I've
> tested it, but I may be missing some cases.  I'll post a documentation
> patch later if this gets applied.
>
> --
> Alvaro Herrera (<alvherre[a]atentus.com>)
> "Some men are heterosexual, and some are bisexual, and some
> men don't think about sex at all... they become lawyers" (Woody Allen)

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: CLUSTER patch

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------


Alvaro Herrera wrote:
> Tom Lane dijo:
>
> > Alvaro Herrera <alvherre@atentus.com> writes:
> > > It does indeed work.  Actually, I tried to do that before, but the fact
> > > that RelationFlushBuffers requires a firstDelBlock made me turn around.
> >
> > firstDelBlock should be 0, not length-of-relation; as given, this code
> > fails to get rid of the buffer entries!
>
> Oh, I didn't fully understand the purpose of firstDelBlock, then.
>
> Anyway, there's still a big problem with this patch: the pg_depend
> information gets messed up after CLUSTER, so a clustered table cannot be
> dropped.  I don't know why this is.
>
> As I said yesterday, I'm going on vacation tomorrow, and probably will
> not fix this issue.  I can look into it when I'm back, on two weeks.
>
> I attach a little doc patch.
>
> --
> Alvaro Herrera (<alvherre[a]atentus.com>)
> "La gente vulgar solo piensa en pasar el tiempo;
> el que tiene talento, en aprovecharlo"

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: CLUSTER patch

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I will try to apply it within the next 48 hours.

Given that this patch is specifically stated not to work:

>> Anyway, there's still a big problem with this patch: the pg_depend
>> information gets messed up after CLUSTER, so a clustered table cannot be
>> dropped.  I don't know why this is.

I object to it being applied now.

            regards, tom lane

Re: CLUSTER patch

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I will try to apply it within the next 48 hours.
>
> Given that this patch is specifically stated not to work:
>
> >> Anyway, there's still a big problem with this patch: the pg_depend
> >> information gets messed up after CLUSTER, so a clustered table cannot be
> >> dropped.  I don't know why this is.

Wow, that is strange since it has the same oid as before.

> I object to it being applied now.

Oh, I didn't see that part of his message; I will remove it from the
queue.  I was waiting to hear whether the proper way to drop the temp
table was via an object drop or dropping it directly.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: CLUSTER patch

From
Bruce Momjian
Date:
Doc patch mentions problem with dependency after CLUSTER.  Patch removed
from queue.

---------------------------------------------------------------------------

Alvaro Herrera wrote:
> Tom Lane dijo:
>
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > This case is actually more complicated because after the swap, pg_class
> > > still says thisxactonly but the relfilenode points to non-thisxactonly
> > > buffers, and vice versa.
> >
> > pg_class doesn't say anything (if it did, we could make relcache.c
> > track it).
> >
> > But yes, the problem is that myxactonly has to line up with where the
> > buffers are stored.  As I mentioned, I think it would work to flush out
> > all buffers for both rels before doing the swap.
>
> It does indeed work.  Actually, I tried to do that before, but the fact
> that RelationFlushBuffers requires a firstDelBlock made me turn around.
>
> I attach a new patch against current cvs HEAD.  It works as far as I've
> tested it, but I may be missing some cases.  I'll post a documentation
> patch later if this gets applied.
>
> --
> Alvaro Herrera (<alvherre[a]atentus.com>)
> "Some men are heterosexual, and some are bisexual, and some
> men don't think about sex at all... they become lawyers" (Woody Allen)

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: CLUSTER patch

From
Alvaro Herrera
Date:
Bruce Momjian dijo:

> Doc patch mentions problem with dependency after CLUSTER.  Patch removed
> from queue.

It turns out I was using index_drop() rather than performDeletion().  I
have corrected it.  Please review this version.  Everything works here
as expected.
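The difference between the two drop paths can be pictured with a toy
dependency table (again illustrative C only, with invented names like
`toy_perform_deletion`, not backend code): an index_drop()-style direct
drop removes just the object, leaving its pg_depend-style rows behind to
confuse a later DROP TABLE, while a performDeletion()-style drop also
cleans out every dependency row that mentions the object.

```c
#include <assert.h>

/* Toy model of pg_depend: (dependent, referenced) object pairs. */
#define MAX_DEPS 16

typedef struct { int dependent; int referenced; int valid; } ToyDep;
static ToyDep deps[MAX_DEPS];

static void toy_record_dependency(int dependent, int referenced) {
    for (int i = 0; i < MAX_DEPS; i++)
        if (!deps[i].valid) {
            deps[i].dependent = dependent;
            deps[i].referenced = referenced;
            deps[i].valid = 1;
            return;
        }
}

/* Count dependency rows that still mention a given object. */
static int toy_dangling_entries(int oid) {
    int n = 0;
    for (int i = 0; i < MAX_DEPS; i++)
        if (deps[i].valid &&
            (deps[i].dependent == oid || deps[i].referenced == oid))
            n++;
    return n;
}

/* index_drop-style: removes only the object itself; rows that mention
 * it survive as stale entries. */
static void toy_drop_direct(int oid) { (void) oid; /* deps untouched */ }

/* performDeletion-style: removes the object together with every
 * dependency row that mentions it. */
static void toy_perform_deletion(int oid) {
    for (int i = 0; i < MAX_DEPS; i++)
        if (deps[i].valid &&
            (deps[i].dependent == oid || deps[i].referenced == oid))
            deps[i].valid = 0;
}
```

In this model, dropping an object with `toy_drop_direct` leaves a
dangling row behind, while `toy_perform_deletion` leaves none, which
matches the symptom reported earlier in the thread: after CLUSTER the
leftover rows made the clustered table undroppable.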

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"La Primavera ha venido. Nadie sabe como ha sido" (A. Machado)

Attachment