Thread: GiST header cleanup

GiST header cleanup

From
Neil Conway
Date:
This patch moves GiST implementation details from gist.h into a new
header file, gist_private.h. gist.h should only contain APIs that are
exposed to clients writing GiST extensions -- where possible we should
avoid backward-incompatible changes to those APIs, so it makes sense to
keep that API in a separate file.

Other related changes:

- pruned down the list of unnecessary includes in gist.h; as a result I
had to add a missing #include <float.h> to contrib/cube/cube.c

- remove isAttByVal(), which is no longer used

- remove declaration of _gistdump(), which is never defined

I noticed that GISTNStrategies is defined, but never used; instead there
is a literal "100" in include/catalog/pg_am.h. Does anyone see a reason
to keep GISTNStrategies around? Alternatively, should pg_am.h include
gist.h and reference GISTNStrategies instead of using "100"?

All of contrib/ continues to compile without warnings with this patch; I
haven't tried externally maintained GiST extensions, but they may need a
bit of #include tweaking.

Barring any objections I'll apply this later today or tomorrow.

-Neil
Index: contrib/cube/cube.c
===================================================================
RCS file: /var/lib/cvs/pgsql/contrib/cube/cube.c,v
retrieving revision 1.18
diff -c -r1.18 cube.c
*** contrib/cube/cube.c    21 Oct 2004 19:28:33 -0000    1.18
--- contrib/cube/cube.c    17 May 2005 01:54:22 -0000
***************
*** 6,11 ****
--- 6,12 ----

  #include "postgres.h"

+ #include <float.h>
  #include <math.h>

  #include "access/gist.h"
Index: src/backend/access/gist/gist.c
===================================================================
RCS file: /var/lib/cvs/pgsql/src/backend/access/gist/gist.c,v
retrieving revision 1.116
diff -c -r1.116 gist.c
*** src/backend/access/gist/gist.c    17 May 2005 00:59:30 -0000    1.116
--- src/backend/access/gist/gist.c    17 May 2005 01:49:26 -0000
***************
*** 15,21 ****
  #include "postgres.h"

  #include "access/genam.h"
! #include "access/gist.h"
  #include "access/gistscan.h"
  #include "access/heapam.h"
  #include "catalog/index.h"
--- 15,21 ----
  #include "postgres.h"

  #include "access/genam.h"
! #include "access/gist_private.h"
  #include "access/gistscan.h"
  #include "access/heapam.h"
  #include "catalog/index.h"
***************
*** 26,35 ****

  #undef GIST_PAGEADDITEM

! #define ATTSIZE(datum, TupDesc, i, isnull) \
      ( \
          (isnull) ? 0 : \
!             att_addlength(0, (TupDesc)->attrs[(i)-1]->attlen, (datum)) \
      )

  /* result's status */
--- 26,35 ----

  #undef GIST_PAGEADDITEM

! #define ATTSIZE(datum, tupdesc, i, isnull) \
      ( \
          (isnull) ? 0 : \
!             att_addlength(0, (tupdesc)->attrs[(i)-1]->attlen, (datum)) \
      )

  /* result's status */
Index: src/backend/access/gist/gistget.c
===================================================================
RCS file: /var/lib/cvs/pgsql/src/backend/access/gist/gistget.c,v
retrieving revision 1.46
diff -c -r1.46 gistget.c
*** src/backend/access/gist/gistget.c    17 May 2005 00:59:30 -0000    1.46
--- src/backend/access/gist/gistget.c    17 May 2005 01:28:51 -0000
***************
*** 14,20 ****
   */
  #include "postgres.h"

! #include "access/gist.h"
  #include "executor/execdebug.h"
  #include "utils/memutils.h"

--- 14,21 ----
   */
  #include "postgres.h"

! #include "access/gist_private.h"
! #include "access/itup.h"
  #include "executor/execdebug.h"
  #include "utils/memutils.h"

Index: src/backend/access/gist/gistscan.c
===================================================================
RCS file: /var/lib/cvs/pgsql/src/backend/access/gist/gistscan.c,v
retrieving revision 1.57
diff -c -r1.57 gistscan.c
*** src/backend/access/gist/gistscan.c    17 May 2005 00:59:30 -0000    1.57
--- src/backend/access/gist/gistscan.c    17 May 2005 01:16:46 -0000
***************
*** 15,21 ****
  #include "postgres.h"

  #include "access/genam.h"
! #include "access/gist.h"
  #include "access/gistscan.h"
  #include "utils/memutils.h"
  #include "utils/resowner.h"
--- 15,21 ----
  #include "postgres.h"

  #include "access/genam.h"
! #include "access/gist_private.h"
  #include "access/gistscan.h"
  #include "utils/memutils.h"
  #include "utils/resowner.h"
Index: src/backend/access/transam/rmgr.c
===================================================================
RCS file: /var/lib/cvs/pgsql/src/backend/access/transam/rmgr.c,v
retrieving revision 1.16
diff -c -r1.16 rmgr.c
*** src/backend/access/transam/rmgr.c    29 Aug 2004 21:08:47 -0000    1.16
--- src/backend/access/transam/rmgr.c    17 May 2005 01:19:48 -0000
***************
*** 8,14 ****
  #include "postgres.h"

  #include "access/clog.h"
! #include "access/gist.h"
  #include "access/hash.h"
  #include "access/heapam.h"
  #include "access/nbtree.h"
--- 8,14 ----
  #include "postgres.h"

  #include "access/clog.h"
! #include "access/gist_private.h"
  #include "access/hash.h"
  #include "access/heapam.h"
  #include "access/nbtree.h"
Index: src/include/access/gist.h
===================================================================
RCS file: /var/lib/cvs/pgsql/src/include/access/gist.h,v
retrieving revision 1.45
diff -c -r1.45 gist.h
*** src/include/access/gist.h    17 May 2005 00:59:30 -0000    1.45
--- src/include/access/gist.h    17 May 2005 01:50:26 -0000
***************
*** 1,7 ****
  /*-------------------------------------------------------------------------
   *
   * gist.h
!  *      common declarations for the GiST access method code.
   *
   *
   * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
--- 1,9 ----
  /*-------------------------------------------------------------------------
   *
   * gist.h
!  *      The public API for GiST indexes. This API is exposed to
!  *      individuals implementing GiST indexes, so backward-incompatible
!  *      changes should be made with care.
   *
   *
   * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
***************
*** 14,45 ****
  #ifndef GIST_H
  #define GIST_H

! #include "access/itup.h"
! #include "access/relscan.h"
! #include "access/sdir.h"
! #include "access/xlog.h"

  /*
!  * You can have as many strategies as you please in GiSTs,
!  * as long as your consistent method can handle them.
!  * The system doesn't really care what they are.
   */
  #define GISTNStrategies                    100

  /*
-  * amproc indexes for GiST indexes.
-  */
- #define GIST_CONSISTENT_PROC            1
- #define GIST_UNION_PROC                    2
- #define GIST_COMPRESS_PROC                3
- #define GIST_DECOMPRESS_PROC            4
- #define GIST_PENALTY_PROC                5
- #define GIST_PICKSPLIT_PROC                6
- #define GIST_EQUAL_PROC                    7
- #define GISTNProcs                        7
-
-
- /*
   * Page opaque data in a GiST index page.
   */
  #define F_LEAF            (1 << 0)
--- 16,33 ----
  #ifndef GIST_H
  #define GIST_H

! #include "storage/bufpage.h"
! #include "storage/off.h"
! #include "utils/rel.h"

  /*
!  * You can have as many strategies as you please in GiSTs, as long as
!  * your consistent method can handle them.  The system doesn't really
!  * care what they are.
   */
  #define GISTNStrategies                    100

  /*
   * Page opaque data in a GiST index page.
   */
  #define F_LEAF            (1 << 0)
***************
*** 51,131 ****

  typedef GISTPageOpaqueData *GISTPageOpaque;

- #define GIST_LEAF(entry) (((GISTPageOpaque) PageGetSpecialPointer((entry)->page))->flags & F_LEAF)
-
- /*
-  *    When we descend a tree, we keep a stack of parent pointers. This
-  *    allows us to follow a chain of internal node points until we reach
-  *    a leaf node, and then back up the stack to re-examine the internal
-  *    nodes.
-  *
-  * 'parent' is the previous stack entry -- i.e. the node we arrived
-  * from. 'block' is the node's block number. 'offset' is the offset in
-  * the node's page that we stopped at (i.e. we followed the child
-  * pointer located at the specified offset).
-  */
- typedef struct GISTSTACK
- {
-     struct GISTSTACK *parent;
-     OffsetNumber offset;
-     BlockNumber block;
- } GISTSTACK;
-
- typedef struct GISTSTATE
- {
-     FmgrInfo    consistentFn[INDEX_MAX_KEYS];
-     FmgrInfo    unionFn[INDEX_MAX_KEYS];
-     FmgrInfo    compressFn[INDEX_MAX_KEYS];
-     FmgrInfo    decompressFn[INDEX_MAX_KEYS];
-     FmgrInfo    penaltyFn[INDEX_MAX_KEYS];
-     FmgrInfo    picksplitFn[INDEX_MAX_KEYS];
-     FmgrInfo    equalFn[INDEX_MAX_KEYS];
-
-     TupleDesc    tupdesc;
- } GISTSTATE;
-
- #define isAttByVal( gs, anum ) (gs)->tupdesc->attrs[anum]->attbyval
-
- /*
-  *    When we're doing a scan, we need to keep track of the parent stack
-  *    for the marked and current items.
-  */
- typedef struct GISTScanOpaqueData
- {
-     GISTSTACK            *stack;
-     GISTSTACK            *markstk;
-     uint16                 flags;
-     GISTSTATE            *giststate;
-     MemoryContext         tempCxt;
-     Buffer                 curbuf;
-     Buffer                 markbuf;
- } GISTScanOpaqueData;
-
- typedef GISTScanOpaqueData *GISTScanOpaque;
-
- /*
-  *    When we're doing a scan and updating a tree at the same time, the
-  *    updates may affect the scan.  We use the flags entry of the scan's
-  *    opaque space to record our actual position in response to updates
-  *    that we can't handle simply by adjusting pointers.
-  */
- #define GS_CURBEFORE    ((uint16) (1 << 0))
- #define GS_MRKBEFORE    ((uint16) (1 << 1))
-
- /* root page of a gist index */
- #define GIST_ROOT_BLKNO                0
-
- /*
-  *    When we update a relation on which we're doing a scan, we need to
-  *    check the scan and fix it if the update affected any of the pages it
-  *    touches.  Otherwise, we can miss records that we should see.  The only
-  *    times we need to do this are for deletions and splits.    See the code in
-  *    gistscan.c for how the scan is fixed. These two constants tell us what sort
-  *    of operation changed the index.
-  */
- #define GISTOP_DEL        0
- #define GISTOP_SPLIT    1
-
  /*
   * This is the Split Vector to be returned by the PickSplit method.
   */
--- 39,44 ----
***************
*** 153,162 ****
  } GIST_SPLITVEC;

  /*
!  * An entry on a GiST node.  Contains the key, as well as
!  * its own location (rel,page,offset) which can supply the matching
!  * pointer.  The size of the key is in bytes, and leafkey is a flag to
!  * tell us if the entry is in a leaf node.
   */
  typedef struct GISTENTRY
  {
--- 66,75 ----
  } GIST_SPLITVEC;

  /*
!  * An entry on a GiST node.  Contains the key, as well as its own
!  * location (rel,page,offset) which can supply the matching pointer.
!  * The size of the key is in bytes, and leafkey is a flag to tell us
!  * if the entry is in a leaf node.
   */
  typedef struct GISTENTRY
  {
***************
*** 168,186 ****
      bool        leafkey;
  } GISTENTRY;


  /*
!  * Vector of GISTENTRY struct's, user-defined
!  * methods union andpick split takes it as one of args
   */
-
  typedef struct
  {
      int32        n;                /* number of elements */
      GISTENTRY    vector[1];
  } GistEntryVector;

! #define GEVHDRSZ    ( offsetof(GistEntryVector, vector[0]) )

  /*
   * macro to initialize a GISTENTRY
--- 81,99 ----
      bool        leafkey;
  } GISTENTRY;

+ #define GIST_LEAF(entry) (((GISTPageOpaque) PageGetSpecialPointer((entry)->page))->flags & F_LEAF)

  /*
!  * Vector of GISTENTRY structs; user-defined methods union and pick
!  * split takes it as one of their arguments
   */
  typedef struct
  {
      int32        n;                /* number of elements */
      GISTENTRY    vector[1];
  } GistEntryVector;

! #define GEVHDRSZ    (offsetof(GistEntryVector, vector[0]))

  /*
   * macro to initialize a GISTENTRY
***************
*** 189,212 ****
      do { (e).key = (k); (e).rel = (r); (e).page = (pg); \
           (e).offset = (o); (e).bytes = (b); (e).leafkey = (l); } while (0)

- /* gist.c */
- extern Datum gistbuild(PG_FUNCTION_ARGS);
- extern Datum gistinsert(PG_FUNCTION_ARGS);
- extern Datum gistbulkdelete(PG_FUNCTION_ARGS);
- extern void _gistdump(Relation r);
- extern void initGISTstate(GISTSTATE *giststate, Relation index);
- extern void freeGISTstate(GISTSTATE *giststate);
- extern void gistdentryinit(GISTSTATE *giststate, int nkey, GISTENTRY *e,
-                Datum k, Relation r, Page pg, OffsetNumber o,
-                int b, bool l, bool isNull);
-
- extern void gist_redo(XLogRecPtr lsn, XLogRecord *record);
- extern void gist_undo(XLogRecPtr lsn, XLogRecord *record);
- extern void gist_desc(char *buf, uint8 xl_info, char *rec);
- extern MemoryContext createTempGistContext(void);
-
- /* gistget.c */
- extern Datum gistgettuple(PG_FUNCTION_ARGS);
- extern Datum gistgetmulti(PG_FUNCTION_ARGS);
-
  #endif   /* GIST_H */
--- 102,105 ----
Index: src/include/access/gist_private.h
===================================================================
RCS file: src/include/access/gist_private.h
diff -N src/include/access/gist_private.h
*** /dev/null    1 Jan 1970 00:00:00 -0000
--- src/include/access/gist_private.h    17 May 2005 01:50:09 -0000
***************
*** 0 ****
--- 1,122 ----
+ /*-------------------------------------------------------------------------
+  *
+  * gist_private.h
+  *      private declarations for GiST -- declarations related to the
+  *      internal implementation of GiST, not the public API
+  *
+  * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * $PostgreSQL$
+  *
+  *-------------------------------------------------------------------------
+  */
+ #ifndef GIST_PRIVATE_H
+ #define GIST_PRIVATE_H
+
+ #include "access/gist.h"
+ #include "access/xlog.h"
+ #include "fmgr.h"
+
+ /*
+  * amproc indexes for GiST indexes.
+  */
+ #define GIST_CONSISTENT_PROC            1
+ #define GIST_UNION_PROC                    2
+ #define GIST_COMPRESS_PROC                3
+ #define GIST_DECOMPRESS_PROC            4
+ #define GIST_PENALTY_PROC                5
+ #define GIST_PICKSPLIT_PROC                6
+ #define GIST_EQUAL_PROC                    7
+ #define GISTNProcs                        7
+
+ /*
+  * When we descend a tree, we keep a stack of parent pointers. This
+  * allows us to follow a chain of internal node points until we reach
+  * a leaf node, and then back up the stack to re-examine the internal
+  * nodes.
+  *
+  * 'parent' is the previous stack entry -- i.e. the node we arrived
+  * from. 'block' is the node's block number. 'offset' is the offset in
+  * the node's page that we stopped at (i.e. we followed the child
+  * pointer located at the specified offset).
+  */
+ typedef struct GISTSTACK
+ {
+     struct GISTSTACK *parent;
+     OffsetNumber offset;
+     BlockNumber block;
+ } GISTSTACK;
+
+ typedef struct GISTSTATE
+ {
+     FmgrInfo    consistentFn[INDEX_MAX_KEYS];
+     FmgrInfo    unionFn[INDEX_MAX_KEYS];
+     FmgrInfo    compressFn[INDEX_MAX_KEYS];
+     FmgrInfo    decompressFn[INDEX_MAX_KEYS];
+     FmgrInfo    penaltyFn[INDEX_MAX_KEYS];
+     FmgrInfo    picksplitFn[INDEX_MAX_KEYS];
+     FmgrInfo    equalFn[INDEX_MAX_KEYS];
+
+     TupleDesc    tupdesc;
+ } GISTSTATE;
+
+ /*
+  *    When we're doing a scan, we need to keep track of the parent stack
+  *    for the marked and current items.
+  */
+ typedef struct GISTScanOpaqueData
+ {
+     GISTSTACK            *stack;
+     GISTSTACK            *markstk;
+     uint16                 flags;
+     GISTSTATE            *giststate;
+     MemoryContext         tempCxt;
+     Buffer                 curbuf;
+     Buffer                 markbuf;
+ } GISTScanOpaqueData;
+
+ typedef GISTScanOpaqueData *GISTScanOpaque;
+
+ /*
+  * When we're doing a scan and updating a tree at the same time, the
+  * updates may affect the scan.  We use the flags entry of the scan's
+  * opaque space to record our actual position in response to updates
+  * that we can't handle simply by adjusting pointers.
+  */
+ #define GS_CURBEFORE    ((uint16) (1 << 0))
+ #define GS_MRKBEFORE    ((uint16) (1 << 1))
+
+ /* root page of a gist index */
+ #define GIST_ROOT_BLKNO                0
+
+ /*
+  * When we update a relation on which we're doing a scan, we need to
+  * check the scan and fix it if the update affected any of the pages
+  * it touches.  Otherwise, we can miss records that we should see.
+  * The only times we need to do this are for deletions and splits. See
+  * the code in gistscan.c for how the scan is fixed. These two
+  * constants tell us what sort of operation changed the index.
+  */
+ #define GISTOP_DEL        0
+ #define GISTOP_SPLIT    1
+
+ /* gist.c */
+ extern Datum gistbuild(PG_FUNCTION_ARGS);
+ extern Datum gistinsert(PG_FUNCTION_ARGS);
+ extern Datum gistbulkdelete(PG_FUNCTION_ARGS);
+ extern MemoryContext createTempGistContext(void);
+ extern void initGISTstate(GISTSTATE *giststate, Relation index);
+ extern void freeGISTstate(GISTSTATE *giststate);
+ extern void gistdentryinit(GISTSTATE *giststate, int nkey, GISTENTRY *e,
+                Datum k, Relation r, Page pg, OffsetNumber o,
+                int b, bool l, bool isNull);
+ extern void gist_redo(XLogRecPtr lsn, XLogRecord *record);
+ extern void gist_undo(XLogRecPtr lsn, XLogRecord *record);
+ extern void gist_desc(char *buf, uint8 xl_info, char *rec);
+
+ /* gistget.c */
+ extern Datum gistgettuple(PG_FUNCTION_ARGS);
+ extern Datum gistgetmulti(PG_FUNCTION_ARGS);
+
+ #endif    /* GIST_PRIVATE_H */

Re: GiST header cleanup

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> This patch moves GiST implementation details from gist.h into a new
> header file, gist_private.h.

Sounds reasonable.  The other index AMs could possibly benefit from the
same thing --- in particular I believe nbtree.h is included by quite a
lot of places that really only need to know btree strategy and support
procedure numbers.

One objection: I think the GiST amproc numbers (GIST_CONSISTENT_PROC
and friends) *are* part of the API and should be in the public header,
even if they happen not to be used by any C code at the moment.  They
are certainly well known at the SQL level, and the btree precedent
suggests people will want them at the C level too.

> I noticed that GISTNStrategies is defined, but never used; instead there
> is a literal "100" in include/catalog/pg_am.h. Does anyone see a reason
> to keep GISTNStrategies around? Alternatively, should pg_am.h include
> gist.h and reference GISTNStrategies instead of using "100"?

GISTNStrategies seems inherently bogus, since there's no essential limit
on the number of strategies in a gist index.  I'd get rid of it.

The "100" in pg_am.h is pretty nasty too, because it is on the one hand
theoretically insufficient and on the other hand in practice way too
much.  (This at least leads to wasted space in relcache entries for gist
indexes, and probably defeats error checks to some extent as well.)
It might be interesting someday to think about a way to let this number
be specified per-opclass instead of per-AM, for those AMs where such a
thing makes sense (only GiST at the moment).  I'm not terribly excited
about it though... for now I'm willing to live with "100".  Anyone who
needs more can poke their local copy of pg_am.

            regards, tom lane

Re: GiST header cleanup

From
Neil Conway
Date:
Patch applied.

Tom Lane wrote:
> One objection: I think the GiST amproc numbers (GIST_CONSISTENT_PROC
> and friends) *are* part of the API and should be in the public header,
> even if they happen not to be used by any C code at the moment.

Ok, I've moved these back to gist.h

> GISTNStrategies seems inherently bogus, since there's no essential limit
> on the number of strategies in a gist index.  I'd get rid of it.

Done.

> The "100" in pg_am.h is pretty nasty too, because it is on the one hand
> theoretically insufficient and on the other hand in practice way too
> much.

Yeah, I agree this is pretty ugly, but I'm not planning to fix it any
time soon, either...

-Neil