Thread: refactor index build

refactor index build

From
Neil Conway
Date:
This patch refactors away some duplicated code in the index AM build
methods: they all invoke UpdateStats() since they have computed the
number of heap tuples, so I created a function in catalog/index.c that
each AM calls. This is per earlier discussion (it was included in the
GiST patches I submitted a while back, but I'm breaking those patches up
into smaller chunks now).

Barring any objections, I'll apply this later today or tomorrow.

-Neil
Index: src/backend/access/gist/gist.c
===================================================================
RCS file: /var/lib/cvs/pgsql/src/backend/access/gist/gist.c,v
retrieving revision 1.113
diff -c -r1.113 gist.c
*** src/backend/access/gist/gist.c    21 Mar 2005 01:23:56 -0000    1.113
--- src/backend/access/gist/gist.c    11 May 2005 02:27:22 -0000
***************
*** 184,210 ****

      /* okay, all heap tuples are indexed */

!     /*
!      * Since we just counted the tuples in the heap, we update its stats
!      * in pg_class to guarantee that the planner takes advantage of the
!      * index we just created.  But, only update statistics during normal
!      * index definitions, not for indices on system catalogs created
!      * during bootstrap processing.  We must close the relations before
!      * updating statistics to guarantee that the relcache entries are
!      * flushed when we increment the command counter in UpdateStats(). But
!      * we do not release any locks on the relations; those will be held
!      * until end of transaction.
!      */
!     if (IsNormalProcessingMode())
!     {
!         Oid            hrelid = RelationGetRelid(heap);
!         Oid            irelid = RelationGetRelid(index);
!
!         heap_close(heap, NoLock);
!         index_close(index);
!         UpdateStats(hrelid, reltuples);
!         UpdateStats(irelid, buildstate.indtuples);
!     }

      freeGISTstate(&buildstate.giststate);
  #ifdef GISTDEBUG
--- 184,191 ----

      /* okay, all heap tuples are indexed */

!     /* since we just counted the # of tuples, may as well update stats */
!     IndexCloseAndUpdateStats(heap, reltuples, index, buildstate.indtuples);

      freeGISTstate(&buildstate.giststate);
  #ifdef GISTDEBUG
Index: src/backend/access/hash/hash.c
===================================================================
RCS file: /var/lib/cvs/pgsql/src/backend/access/hash/hash.c,v
retrieving revision 1.78
diff -c -r1.78 hash.c
*** src/backend/access/hash/hash.c    27 Mar 2005 23:52:57 -0000    1.78
--- src/backend/access/hash/hash.c    11 May 2005 02:27:30 -0000
***************
*** 72,98 ****
      reltuples = IndexBuildHeapScan(heap, index, indexInfo,
                                  hashbuildCallback, (void *) &buildstate);

!     /*
!      * Since we just counted the tuples in the heap, we update its stats
!      * in pg_class to guarantee that the planner takes advantage of the
!      * index we just created.  But, only update statistics during normal
!      * index definitions, not for indices on system catalogs created
!      * during bootstrap processing.  We must close the relations before
!      * updating statistics to guarantee that the relcache entries are
!      * flushed when we increment the command counter in UpdateStats(). But
!      * we do not release any locks on the relations; those will be held
!      * until end of transaction.
!      */
!     if (IsNormalProcessingMode())
!     {
!         Oid            hrelid = RelationGetRelid(heap);
!         Oid            irelid = RelationGetRelid(index);
!
!         heap_close(heap, NoLock);
!         index_close(index);
!         UpdateStats(hrelid, reltuples);
!         UpdateStats(irelid, buildstate.indtuples);
!     }

      PG_RETURN_VOID();
  }
--- 72,79 ----
      reltuples = IndexBuildHeapScan(heap, index, indexInfo,
                                  hashbuildCallback, (void *) &buildstate);

!     /* since we just counted the # of tuples, may as well update stats */
!     IndexCloseAndUpdateStats(heap, reltuples, index, buildstate.indtuples);

      PG_RETURN_VOID();
  }
Index: src/backend/access/nbtree/nbtree.c
===================================================================
RCS file: /var/lib/cvs/pgsql/src/backend/access/nbtree/nbtree.c,v
retrieving revision 1.129
diff -c -r1.129 nbtree.c
*** src/backend/access/nbtree/nbtree.c    7 May 2005 21:32:23 -0000    1.129
--- src/backend/access/nbtree/nbtree.c    11 May 2005 02:27:40 -0000
***************
*** 148,174 ****
      }
  #endif   /* BTREE_BUILD_STATS */

!     /*
!      * Since we just counted the tuples in the heap, we update its stats
!      * in pg_class to guarantee that the planner takes advantage of the
!      * index we just created.  But, only update statistics during normal
!      * index definitions, not for indices on system catalogs created
!      * during bootstrap processing.  We must close the relations before
!      * updating statistics to guarantee that the relcache entries are
!      * flushed when we increment the command counter in UpdateStats(). But
!      * we do not release any locks on the relations; those will be held
!      * until end of transaction.
!      */
!     if (IsNormalProcessingMode())
!     {
!         Oid            hrelid = RelationGetRelid(heap);
!         Oid            irelid = RelationGetRelid(index);
!
!         heap_close(heap, NoLock);
!         index_close(index);
!         UpdateStats(hrelid, reltuples);
!         UpdateStats(irelid, buildstate.indtuples);
!     }

      PG_RETURN_VOID();
  }
--- 148,155 ----
      }
  #endif   /* BTREE_BUILD_STATS */

!     /* since we just counted the # of tuples, may as well update stats */
!     IndexCloseAndUpdateStats(heap, reltuples, index, buildstate.indtuples);

      PG_RETURN_VOID();
  }
Index: src/backend/access/rtree/rtree.c
===================================================================
RCS file: /var/lib/cvs/pgsql/src/backend/access/rtree/rtree.c,v
retrieving revision 1.88
diff -c -r1.88 rtree.c
*** src/backend/access/rtree/rtree.c    21 Mar 2005 01:24:00 -0000    1.88
--- src/backend/access/rtree/rtree.c    11 May 2005 02:27:47 -0000
***************
*** 142,168 ****

      /* okay, all heap tuples are indexed */

!     /*
!      * Since we just counted the tuples in the heap, we update its stats
!      * in pg_class to guarantee that the planner takes advantage of the
!      * index we just created.  But, only update statistics during normal
!      * index definitions, not for indices on system catalogs created
!      * during bootstrap processing.  We must close the relations before
!      * updating statistics to guarantee that the relcache entries are
!      * flushed when we increment the command counter in UpdateStats(). But
!      * we do not release any locks on the relations; those will be held
!      * until end of transaction.
!      */
!     if (IsNormalProcessingMode())
!     {
!         Oid            hrelid = RelationGetRelid(heap);
!         Oid            irelid = RelationGetRelid(index);
!
!         heap_close(heap, NoLock);
!         index_close(index);
!         UpdateStats(hrelid, reltuples);
!         UpdateStats(irelid, buildstate.indtuples);
!     }

      PG_RETURN_VOID();
  }
--- 142,149 ----

      /* okay, all heap tuples are indexed */

!     /* since we just counted the # of tuples, may as well update stats */
!     IndexCloseAndUpdateStats(heap, reltuples, index, buildstate.indtuples);

      PG_RETURN_VOID();
  }
Index: src/backend/catalog/index.c
===================================================================
RCS file: /var/lib/cvs/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.254
diff -c -r1.254 index.c
*** src/backend/catalog/index.c    6 May 2005 17:24:52 -0000    1.254
--- src/backend/catalog/index.c    11 May 2005 02:09:55 -0000
***************
*** 62,67 ****
--- 62,68 ----
                      Oid *classOids,
                      bool primary);
  static Oid    IndexGetRelation(Oid indexId);
+ static void UpdateStats(Oid relid, double reltuples);


  /*
***************
*** 1149,1154 ****
--- 1150,1185 ----
  }


+ /*
+  * This is invoked by the various index AMs once they have finished
+  * constructing an index. Constructing an index involves counting the
+  * number of tuples in both the relation and the index, so we take
+  * advantage of the opportunity to update pg_class to ensure that the
+  * planner takes advantage of the index we just created.  But, only
+  * update statistics during normal index definitions, not for indices
+  * on system catalogs created during bootstrap processing.  We must
+  * close the relations before updating statistics to guarantee that
+  * the relcache entries are flushed when we increment the command
+  * counter in UpdateStats(). But we do not release any locks on the
+  * relations; those will be held until end of transaction.
+  */
+ void
+ IndexCloseAndUpdateStats(Relation heap, double heapTuples,
+                          Relation index, double indexTuples)
+ {
+     Oid        hrelid = RelationGetRelid(heap);
+     Oid        irelid = RelationGetRelid(index);
+
+     if (!IsNormalProcessingMode())
+         return;
+
+     heap_close(heap, NoLock);
+     index_close(index);
+     UpdateStats(hrelid, heapTuples);
+     UpdateStats(irelid, indexTuples);
+ }
+
+
  /* ----------------
   *        UpdateStats
   *
***************
*** 1157,1163 ****
   * in the context of VACUUM, only CREATE INDEX.
   * ----------------
   */
! void
  UpdateStats(Oid relid, double reltuples)
  {
      Relation    whichRel;
--- 1188,1194 ----
   * in the context of VACUUM, only CREATE INDEX.
   * ----------------
   */
! static void
  UpdateStats(Oid relid, double reltuples)
  {
      Relation    whichRel;
Index: src/include/catalog/index.h
===================================================================
RCS file: /var/lib/cvs/pgsql/src/include/catalog/index.h,v
retrieving revision 1.62
diff -c -r1.62 index.h
*** src/include/catalog/index.h    14 Apr 2005 01:38:20 -0000    1.62
--- src/include/catalog/index.h    11 May 2005 02:24:28 -0000
***************
*** 52,58 ****
                 Datum *values,
                 bool *isnull);

! extern void UpdateStats(Oid relid, double reltuples);

  extern void setRelhasindex(Oid relid, bool hasindex,
                 bool isprimary, Oid reltoastidxid);
--- 52,59 ----
                 Datum *values,
                 bool *isnull);

! extern void IndexCloseAndUpdateStats(Relation heap, double heapTuples,
!                                      Relation index, double indexTuples);

  extern void setRelhasindex(Oid relid, bool hasindex,
                 bool isprimary, Oid reltoastidxid);

Re: refactor index build

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> This patch refactors away some duplicated code in the index AM build
> methods: they all invoke UpdateStats() since they have computed the
> number of heap tuples, so I created a function in catalog/index.c that
> each AM calls. This is per earlier discussion

No objection here, although it seems like only a minor issue.

The part of index build that's always gotten my goat is the business
about bootstrap mode opening and closing stuff in different places than
normal mode (see the XXX near the bottom of index_create).  If you're
feeling like improving the code cleanliness in this area, that would be
a great little exercise.

(For that matter, I suspect the bootstrap-time distinction between index
define and index build is not needed any more at all; we define them all
in a row at the end of bootstrap, so why not fill them in when defined
and make it exactly like the normal CREATE INDEX path?)

            regards, tom lane