Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
Date
Msg-id 1073314.1607983297@sss.pgh.pa.us
Whole thread Raw
In response to Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
List pgsql-hackers
I wrote:
> There are a couple of other API oddities that maybe we should think
> about while we're here:

> * Should we just have a blanket insistence that all callers supply
> HASH_ELEM?  The default sizes that dynahash.c uses without that are
> undocumented and basically useless.  We're already asserting that
> in the HASH_BLOBS path, which is the majority use-case, and this
> patch now asserts it for HASH_STRINGS too.

Here's a follow-up patch for that part, which also tries to respond
a bit to Heikki's complaint about skimpy documentation.  While at it,
I const-ified the HASHCTL argument, since there's no need for
hash_create to modify that.

            regards, tom lane

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 07cae638df..49f21b77bb 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -317,11 +317,20 @@ string_compare(const char *key1, const char *key2, Size keysize)
  *    *info: additional table parameters, as indicated by flags
  *    flags: bitmask indicating which parameters to take from *info
  *
- * The flags value must include exactly one of HASH_STRINGS, HASH_BLOBS,
+ * The flags value *must* include HASH_ELEM.  (Formerly, this was nominally
+ * optional, but the default keysize and entrysize values were useless.)
+ * The flags value must also include exactly one of HASH_STRINGS, HASH_BLOBS,
  * or HASH_FUNCTION, to define the key hashing semantics (C strings,
  * binary blobs, or custom, respectively).  Callers specifying a custom
  * hash function will likely also want to use HASH_COMPARE, and perhaps
  * also HASH_KEYCOPY, to control key comparison and copying.
+ * Another often-used flag is HASH_CONTEXT, to allocate the hash table
+ * under info->hcxt rather than under TopMemoryContext; the default
+ * behavior is only suitable for session-lifespan hash tables.
+ * Other flags bits are special-purpose and seldom used.
+ *
+ * Fields in *info are read only when the associated flags bit is set.
+ * It is not necessary to initialize other fields of *info.
  *
  * Note: for a shared-memory hashtable, nelem needs to be a pretty good
  * estimate, since we can't expand the table on the fly.  But an unshared
@@ -330,11 +339,19 @@ string_compare(const char *key1, const char *key2, Size keysize)
  * large nelem will penalize hash_seq_search speed without buying much.
  */
 HTAB *
-hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
+hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
 {
     HTAB       *hashp;
     HASHHDR    *hctl;

+    /*
+     * Hash tables now allocate space for key and data, but you have to say
+     * how much space to allocate.
+     */
+    Assert(flags & HASH_ELEM);
+    Assert(info->keysize > 0);
+    Assert(info->entrysize >= info->keysize);
+
     /*
      * For shared hash tables, we have a local hash header (HTAB struct) that
      * we allocate in TopMemoryContext; all else is in shared memory.
@@ -385,7 +402,6 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
     {
         Assert(!(flags & HASH_STRINGS));
         /* We can optimize hashing for common key sizes */
-        Assert(flags & HASH_ELEM);
         if (info->keysize == sizeof(uint32))
             hashp->hash = uint32_hash;
         else
@@ -402,7 +418,6 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
          * it's almost certainly an integer or pointer not a string.
          */
         Assert(flags & HASH_STRINGS);
-        Assert(flags & HASH_ELEM);
         Assert(info->keysize > 8);

         hashp->hash = string_hash;
@@ -529,16 +544,9 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
         hctl->dsize = info->dsize;
     }

-    /*
-     * hash table now allocates space for key and data but you have to say how
-     * much space to allocate
-     */
-    if (flags & HASH_ELEM)
-    {
-        Assert(info->entrysize >= info->keysize);
-        hctl->keysize = info->keysize;
-        hctl->entrysize = info->entrysize;
-    }
+    /* remember the entry sizes, too */
+    hctl->keysize = info->keysize;
+    hctl->entrysize = info->entrysize;

     /* make local copies of heavily-used constant fields */
     hashp->keysize = hctl->keysize;
@@ -617,10 +625,6 @@ hdefault(HTAB *hashp)
     hctl->dsize = DEF_DIRSIZE;
     hctl->nsegs = 0;

-    /* rather pointless defaults for key & entry size */
-    hctl->keysize = sizeof(char *);
-    hctl->entrysize = 2 * sizeof(char *);
-
     hctl->num_partitions = 0;    /* not partitioned */

     /* table has no fixed maximum size */
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index 666ad33567..c3daaae92b 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -124,7 +124,7 @@ typedef struct
  * one of these.
  */
 extern HTAB *hash_create(const char *tabname, long nelem,
-                         HASHCTL *info, int flags);
+                         const HASHCTL *info, int flags);
 extern void hash_destroy(HTAB *hashp);
 extern void hash_stats(const char *where, HTAB *hashp);
 extern void *hash_search(HTAB *hashp, const void *keyPtr, HASHACTION action,

pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: A few new options for CHECKPOINT
Next
From: Joshua Drake
Date:
Subject: Re: Optimizing the documentation