Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
Date
Msg-id 1013756.1607972343@sss.pgh.pa.us
Whole thread Raw
In response to Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)  (Noah Misch <noah@leadboat.com>)
Responses Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
List pgsql-hackers
Noah Misch <noah@leadboat.com> writes:
> On Sun, Dec 13, 2020 at 11:49:31AM -0500, Tom Lane wrote:
>> A quick count of grep hits suggest that the large majority of
>> existing hash_create() calls use HASH_BLOBS, and there might be
>> only order-of-ten calls that would need to be touched if we
>> required an explicit HASH_STRING flag.  So option #2 is seeming
>> kind of attractive.  Maybe that together with an assertion that
>> string keys have to exceed 8 or 16 bytes would be enough protection.

> Agreed.  I expect (2) gives most of the benefit.  Requiring 8-byte capacity
> should be harmless, and most architectures can zero 8 bytes in one
> instruction.  Requiring more bytes trades specificity for sensitivity.

Attached is a proposed patch that requires HASH_STRINGS to be stated
explicitly (in the event, there are 13 callers needing that) and insists
on keysize > 8 for string keys.  In examining the now-easily-visible uses
of string keys, almost all of them are using NAMEDATALEN-sized keys, or
in a few places larger values.  Only two are smaller:

1. ShmemIndex uses SHMEM_INDEX_KEYSIZE, which is only set to 48.

2. ResetUnloggedRelationsInDbspaceDir is using OIDCHARS + 1, because
it stores relfilenode OIDs as strings.  That seems pretty damfool
to me, so I'm inclined to change it to store binary OIDs instead;
those'd be a third the size (or probably a quarter the size after
alignment padding) and likely faster to hash or compare.  But I
didn't do that here, since it's still more than 8.  (I did whack
it upside the head to the extent of not storing its temporary
hash table in CacheMemoryContext.)

So it seems to me that insisting on keysize > 8 is fine.

There are a couple of other API oddities that maybe we should think
about while we're here:

* Should we just have a blanket insistence that all callers supply
HASH_ELEM?  The default sizes that dynahash.c uses without that are
undocumented and basically useless.  We're already asserting that
in the HASH_BLOBS path, which is the majority use-case, and this
patch now asserts it for HASH_STRINGS too.

* The coding convention that the HASHCTL argument struct should be
pre-zeroed seems to have been ignored at a lot of call sites.
I added a memset call to a couple of callers that I was touching
in this patch, but I'm having second thoughts about that.  Maybe
we should just rip out all those memsets as pointless, since there's
basically no case where you'd use the memset to fill a field that
you meant to pass as zero.  The fact that hash_create() doesn't
read fields it's not told to by a flag means we should not need
the memsets to avoid uninitialized-memory reads.

            regards, tom lane

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 2dc9e44ae6..8b17fb06eb 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2604,10 +2604,12 @@ createConnHash(void)
 {
     HASHCTL        ctl;

+    memset(&ctl, 0, sizeof(ctl));
     ctl.keysize = NAMEDATALEN;
     ctl.entrysize = sizeof(remoteConnHashEnt);

-    return hash_create("Remote Con hash", NUMCONN, &ctl, HASH_ELEM);
+    return hash_create("Remote Con hash", NUMCONN, &ctl,
+                       HASH_ELEM | HASH_STRINGS);
 }

 static void
diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c
index 85986ec24a..ec7819ca77 100644
--- a/contrib/tablefunc/tablefunc.c
+++ b/contrib/tablefunc/tablefunc.c
@@ -726,7 +726,7 @@ load_categories_hash(char *cats_sql, MemoryContext per_query_ctx)
     crosstab_hash = hash_create("crosstab hash",
                                 INIT_CATS,
                                 &ctl,
-                                HASH_ELEM | HASH_CONTEXT);
+                                HASH_ELEM | HASH_STRINGS | HASH_CONTEXT);

     /* Connect to SPI manager */
     if ((ret = SPI_connect()) < 0)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 4b18be5b27..5ba7c2eb3c 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -414,7 +414,7 @@ InitQueryHashTable(void)
     prepared_queries = hash_create("Prepared Queries",
                                    32,
                                    &hash_ctl,
-                                   HASH_ELEM);
+                                   HASH_ELEM | HASH_STRINGS);
 }

 /*
diff --git a/src/backend/nodes/extensible.c b/src/backend/nodes/extensible.c
index ab04459c55..2fe89fd361 100644
--- a/src/backend/nodes/extensible.c
+++ b/src/backend/nodes/extensible.c
@@ -51,7 +51,8 @@ RegisterExtensibleNodeEntry(HTAB **p_htable, const char *htable_label,
         ctl.keysize = EXTNODENAME_MAX_LEN;
         ctl.entrysize = sizeof(ExtensibleNodeEntry);

-        *p_htable = hash_create(htable_label, 100, &ctl, HASH_ELEM);
+        *p_htable = hash_create(htable_label, 100, &ctl,
+                                HASH_ELEM | HASH_STRINGS);
     }

     if (strlen(extnodename) >= EXTNODENAME_MAX_LEN)
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 0c2094f766..f21ab67ae4 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -175,7 +175,9 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
         memset(&ctl, 0, sizeof(ctl));
         ctl.keysize = sizeof(unlogged_relation_entry);
         ctl.entrysize = sizeof(unlogged_relation_entry);
-        hash = hash_create("unlogged hash", 32, &ctl, HASH_ELEM);
+        ctl.hcxt = CurrentMemoryContext;
+        hash = hash_create("unlogged hash", 32, &ctl,
+                           HASH_ELEM | HASH_STRINGS | HASH_CONTEXT);

         /* Scan the directory. */
         dbspace_dir = AllocateDir(dbspacedirname);
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index 97716f6aef..0afd87e075 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -292,7 +292,6 @@ void
 InitShmemIndex(void)
 {
     HASHCTL        info;
-    int            hash_flags;

     /*
      * Create the shared memory shmem index.
@@ -302,13 +301,14 @@ InitShmemIndex(void)
      * initializing the ShmemIndex itself.  The special "ShmemIndex" hash
      * table name will tell ShmemInitStruct to fake it.
      */
+    memset(&info, 0, sizeof(info));
     info.keysize = SHMEM_INDEX_KEYSIZE;
     info.entrysize = sizeof(ShmemIndexEnt);
-    hash_flags = HASH_ELEM;

     ShmemIndex = ShmemInitHash("ShmemIndex",
                                SHMEM_INDEX_SIZE, SHMEM_INDEX_SIZE,
-                               &info, hash_flags);
+                               &info,
+                               HASH_ELEM | HASH_STRINGS);
 }

 /*
@@ -329,6 +329,10 @@ InitShmemIndex(void)
  * whose maximum size is certain, this should be equal to max_size; that
  * ensures that no run-time out-of-shared-memory failures can occur.
  *
+ * *infoP and hash_flags should specify at least the entry sizes and key
+ * comparison semantics (see hash_create()).  Flag bits specific to
+ * shared-memory hash tables are added here.
+ *
  * Note: before Postgres 9.0, this function returned NULL for some failure
  * cases.  Now, it always throws error instead, so callers need not check
  * for NULL.
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 12557ce3af..be0a45b55e 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -3446,7 +3446,7 @@ get_json_object_as_hash(char *json, int len, const char *funcname)
     tab = hash_create("json object hashtable",
                       100,
                       &ctl,
-                      HASH_ELEM | HASH_CONTEXT);
+                      HASH_ELEM | HASH_STRINGS | HASH_CONTEXT);

     state = palloc0(sizeof(JHashState));
     sem = palloc0(sizeof(JsonSemAction));
@@ -3838,7 +3838,7 @@ populate_recordset_object_start(void *state)
     _state->json_hash = hash_create("json object hashtable",
                                     100,
                                     &ctl,
-                                    HASH_ELEM | HASH_CONTEXT);
+                                    HASH_ELEM | HASH_STRINGS | HASH_CONTEXT);
 }

 static void
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index ad582f99a5..87a3154c1a 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3471,7 +3471,7 @@ set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
     names_hash = hash_create("set_rtable_names names",
                              list_length(dpns->rtable),
                              &hash_ctl,
-                             HASH_ELEM | HASH_CONTEXT);
+                             HASH_ELEM | HASH_STRINGS | HASH_CONTEXT);
     /* Preload the hash table with names appearing in parent_namespaces */
     foreach(lc, parent_namespaces)
     {
diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index bd779fdaf7..e83e30defe 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -686,7 +686,7 @@ find_rendezvous_variable(const char *varName)
         rendezvousHash = hash_create("Rendezvous variable hash",
                                      16,
                                      &ctl,
-                                     HASH_ELEM);
+                                     HASH_ELEM | HASH_STRINGS);
     }

     /* Find or create the hashtable entry for this varName */
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index d14d875c93..07cae638df 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -30,11 +30,12 @@
  * dynahash.c provides support for these types of lookup keys:
  *
  * 1. Null-terminated C strings (truncated if necessary to fit in keysize),
- * compared as though by strcmp().  This is the default behavior.
+ * compared as though by strcmp().  This is selected by specifying the
+ * HASH_STRINGS flag to hash_create.
  *
  * 2. Arbitrary binary data of size keysize, compared as though by memcmp().
  * (Caller must ensure there are no undefined padding bits in the keys!)
- * This is selected by specifying HASH_BLOBS flag to hash_create.
+ * This is selected by specifying the HASH_BLOBS flag to hash_create.
  *
  * 3. More complex key behavior can be selected by specifying user-supplied
  * hashing, comparison, and/or key-copying functions.  At least a hashing
@@ -47,8 +48,8 @@
  *   locks.
  * - Shared memory hashes are allocated in a fixed size area at startup and
  *   are discoverable by name from other processes.
- * - Because entries don't need to be moved in the case of hash conflicts, has
- *   better performance for large entries
+ * - Because entries don't need to be moved in the case of hash conflicts,
+ *   dynahash has better performance for large entries.
  * - Guarantees stable pointers to entries.
  *
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
@@ -316,6 +317,12 @@ string_compare(const char *key1, const char *key2, Size keysize)
  *    *info: additional table parameters, as indicated by flags
  *    flags: bitmask indicating which parameters to take from *info
  *
+ * The flags value must include exactly one of HASH_STRINGS, HASH_BLOBS,
+ * or HASH_FUNCTION, to define the key hashing semantics (C strings,
+ * binary blobs, or custom, respectively).  Callers specifying a custom
+ * hash function will likely also want to use HASH_COMPARE, and perhaps
+ * also HASH_KEYCOPY, to control key comparison and copying.
+ *
  * Note: for a shared-memory hashtable, nelem needs to be a pretty good
  * estimate, since we can't expand the table on the fly.  But an unshared
  * hashtable can be expanded on-the-fly, so it's better for nelem to be
@@ -370,9 +377,13 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
      * Select the appropriate hash function (see comments at head of file).
      */
     if (flags & HASH_FUNCTION)
+    {
+        Assert(!(flags & (HASH_BLOBS | HASH_STRINGS)));
         hashp->hash = info->hash;
+    }
     else if (flags & HASH_BLOBS)
     {
+        Assert(!(flags & HASH_STRINGS));
         /* We can optimize hashing for common key sizes */
         Assert(flags & HASH_ELEM);
         if (info->keysize == sizeof(uint32))
@@ -381,17 +392,30 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
             hashp->hash = tag_hash;
     }
     else
-        hashp->hash = string_hash;    /* default hash function */
+    {
+        /*
+         * string_hash used to be considered the default hash method, and in a
+         * non-assert build it effectively still is.  But we now consider it
+         * an assertion error to not say HASH_STRINGS explicitly.  To help
+         * catch mistaken usage of HASH_STRINGS, we also insist on a
+         * reasonably long string length: if the keysize is only 4 or 8 bytes,
+         * it's almost certainly an integer or pointer not a string.
+         */
+        Assert(flags & HASH_STRINGS);
+        Assert(flags & HASH_ELEM);
+        Assert(info->keysize > 8);
+
+        hashp->hash = string_hash;
+    }

     /*
      * If you don't specify a match function, it defaults to string_compare if
-     * you used string_hash (either explicitly or by default) and to memcmp
-     * otherwise.
+     * you used string_hash, and to memcmp otherwise.
      *
      * Note: explicitly specifying string_hash is deprecated, because this
      * might not work for callers in loadable modules on some platforms due to
      * referencing a trampoline instead of the string_hash function proper.
-     * Just let it default, eh?
+     * Specify HASH_STRINGS instead.
      */
     if (flags & HASH_COMPARE)
         hashp->match = info->match;
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index ec6f80ee99..a382c4219b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -111,6 +111,7 @@ EnablePortalManager(void)
                                              "TopPortalContext",
                                              ALLOCSET_DEFAULT_SIZES);

+    memset(&ctl, 0, sizeof(ctl));
     ctl.keysize = MAX_PORTALNAME_LEN;
     ctl.entrysize = sizeof(PortalHashEnt);

@@ -119,7 +120,7 @@ EnablePortalManager(void)
      * create, initially
      */
     PortalHashTable = hash_create("Portal hash", PORTALS_PER_USER,
-                                  &ctl, HASH_ELEM);
+                                  &ctl, HASH_ELEM | HASH_STRINGS);
 }

 /*
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index bebf89b3c4..666ad33567 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -82,7 +82,8 @@ typedef struct HASHCTL
 #define HASH_PARTITION    0x0001    /* Hashtable is used w/partitioned locking */
 #define HASH_SEGMENT    0x0002    /* Set segment size */
 #define HASH_DIRSIZE    0x0004    /* Set directory size (initial and max) */
-#define HASH_ELEM        0x0010    /* Set keysize and entrysize */
+#define HASH_ELEM        0x0008    /* Set keysize and entrysize */
+#define HASH_STRINGS    0x0010    /* Select support functions for string keys */
 #define HASH_BLOBS        0x0020    /* Select support functions for binary keys */
 #define HASH_FUNCTION    0x0040    /* Set user defined hash function */
 #define HASH_COMPARE    0x0080    /* Set user defined comparison function */
@@ -119,7 +120,8 @@ typedef struct
  *
  * Note: It is deprecated for callers of hash_create to explicitly specify
  * string_hash, tag_hash, uint32_hash, or oid_hash.  Just set HASH_BLOBS or
- * not.  Use HASH_FUNCTION only when you want something other than those.
+ * HASH_STRINGS.  Use HASH_FUNCTION only when you want something other than
+ * one of these.
  */
 extern HTAB *hash_create(const char *tabname, long nelem,
                          HASHCTL *info, int flags);
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 4de756455d..60f5d66264 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -586,7 +586,7 @@ select_perl_context(bool trusted)
         interp_desc->query_hash = hash_create("PL/Perl queries",
                                               32,
                                               &hash_ctl,
-                                              HASH_ELEM);
+                                              HASH_ELEM | HASH_STRINGS);
     }

     /*
diff --git a/src/timezone/pgtz.c b/src/timezone/pgtz.c
index 3f0fb51e91..5240cab022 100644
--- a/src/timezone/pgtz.c
+++ b/src/timezone/pgtz.c
@@ -211,7 +211,7 @@ init_timezone_hashtable(void)
     timezone_cache = hash_create("Timezones",
                                  4,
                                  &hash_ctl,
-                                 HASH_ELEM);
+                                 HASH_ELEM | HASH_STRINGS);
     if (!timezone_cache)
         return false;


pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: archive status ".ready" files may be created too early
Next
From: Joshua Drake
Date:
Subject: Optimizing the documentation