Re: Protect syscache from bloating with negative cache entries - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Protect syscache from bloating with negative cache entries
Date
Msg-id 20180312.173408.162882093.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Protect syscache from bloating with negative cache entries  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Protect syscache from bloating with negative cache entries  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: Protect syscache from bloating with negative cache entries  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
At Fri, 09 Mar 2018 17:40:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20180309.174001.202113825.horiguchi.kyotaro@lab.ntt.co.jp>
> > In short, it's not really apparent to me that negative syscache entries
> > are the major problem of this kind.  I'm afraid that you're drawing very
> > large conclusions from a specific workload.  Maybe we could fix that
> > workload some other way.
> 
> The current patch doesn't consider whether an entry is negative
> or positive(?). Just clean up all entries based on time.
> 
> If relation has to have the same characterictics to syscaches, it
> might be better be on the catcache mechanism, instaed of adding
> the same pruning mechanism to dynahash..

For the moment, I added such feature to dynahash and let only
relcache use it in this patch. Hash element has different shape
in "prunable" hash and pruning is performed in a similar way
sharing the setting with syscache. This seems working fine.

It is bit uneasy that all syscaches and relcache shares the same
value of syscache_memory_target...

Something like the sttached test script causes relcache
"bloat". Server emits the following log entries in DEBUG1 message
level.

DEBUG:  removed 11240/32769 entries from hash "Relcache by OID" at character 15

# The last few words are just garbage I mentioned in another thread.

The last two patches do that (as PoC).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 705b67a79ef7e27a450083944f8d970b7eb9e619 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Tue, 26 Dec 2017 17:43:09 +0900
Subject: [PATCH 1/3] Remove entries that haven't been used for a certain time

Catcache entries can be left alone for several reasons. It is not
desirable that they eat up memory. With this patch, This adds
consideration of removal of entries that haven't been used for a
certain time before enlarging the hash array.
---
 doc/src/sgml/config.sgml                      |  38 +++++++
 src/backend/access/transam/xact.c             |   3 +
 src/backend/utils/cache/catcache.c            | 152 +++++++++++++++++++++++++-
 src/backend/utils/misc/guc.c                  |  23 ++++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/utils/catcache.h                  |  19 ++++
 6 files changed, 233 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3a8fc7d803..394e0703f8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1557,6 +1557,44 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-syscache-memory-target" xreflabel="syscache_memory_target">
+      <term><varname>syscache_memory_target</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>syscache_memory_target</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the maximum amount of memory to which syscache is expanded
+        without pruning. The value defaults to 0, indicating that pruning is
+        always considered. After exceeding this size, syscache pruning is
+        considered according to
+        <xref linkend="guc-syscache-prune-min-age"/>. If you need to keep
+        certain amount of syscache entries with intermittent usage, try
+        increase this setting.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-syscache-prune-min-age" xreflabel="syscache_prune_min_age">
+      <term><varname>syscache_prune_min_age</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>syscache_prune_min_age</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the minimum amount of unused time in seconds at which a
+        syscache entry is considered to be removed. -1 indicates that syscache
+        pruning is disabled at all. The value defaults to 600 seconds
+        (<literal>10 minutes</literal>). The syscache entries that are not
+        used for the duration can be removed to prevent syscache bloat. This
+        behavior is suppressed until the size of syscache exceeds
+        <xref linkend="guc-syscache-memory-target"/>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index dbaaf8e005..86d76917bb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -733,6 +733,9 @@ void
 SetCurrentStatementStartTimestamp(void)
 {
     stmtStartTimestamp = GetCurrentTimestamp();
+
+    /* Set this timestamp as aproximated current time */
+    SetCatCacheClock(stmtStartTimestamp);
 }
 
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 5ddbf6eab1..0236a05127 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -71,9 +71,23 @@
 #define CACHE6_elog(a,b,c,d,e,f,g)
 #endif
 
+/*
+ * GUC variable to define the minimum size of hash to cosider entry eviction.
+ * Let the name be the same with the guc variable name, not using 'catcache'.
+ */
+int syscache_memory_target = 0;
+
+/* GUC variable to define the minimum age of entries that will be cosidered
+ * to be evicted in seconds. Ditto for the name.
+ */
+int syscache_prune_min_age = 600;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
+/* Timestamp used for any operation on caches. */
+TimestampTz    catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
                        int nkeys,
                        Datum v1, Datum v2,
@@ -866,9 +880,130 @@ InitCatCache(int id,
      */
     MemoryContextSwitchTo(oldcxt);
 
+    /* initilize catcache reference clock if haven't done yet */
+    if (catcacheclock == 0)
+        catcacheclock = GetCurrentTimestamp();
+
     return cp;
 }
 
+/*
+ * CatCacheCleanupOldEntries - Remove infrequently-used entries
+ *
+ * Catcache entries can be left alone for several reasons. We remove them if
+ * they not accessed for a certain time to prevent catcache from bloating. The
+ * eviction is performed with the similar algorithm with buffer eviction using
+ * access counter. Entries that are accessed several times can live longer
+ * than those that have had no access in the same duration.
+ */
+static bool
+CatCacheCleanupOldEntries(CatCache *cp)
+{
+    int            i;
+    int            nremoved = 0;
+    size_t        hash_size;
+#ifdef CATCACHE_STATS
+    /* These variables are only for debugging purpose */
+    int            ntotal = 0;
+    /*
+     * nth element in nentries stores the number of cache entries that have
+     * lived unaccessed for corresponding multiple in ageclass of
+     * syscache_prune_min_age. The index of nremoved_entry is the value of the
+     * clock-sweep counter, which takes from 0 up to 2.
+     */
+    double        ageclass[] = {0.05, 0.1, 1.0, 2.0, 3.0, 0.0};
+    int            nentries[] = {0, 0, 0, 0, 0, 0};
+    int            nremoved_entry[3] = {0, 0, 0};
+    int            j;
+#endif
+
+    /* Return immediately if no pruning is wanted */
+    if (syscache_prune_min_age < 0)
+        return false;
+
+    /*
+     * Return without pruning if the size of the hash is below the target.
+     * Since the area for bucket array is dominant, consider only it.
+     */
+    hash_size = cp->cc_nbuckets * sizeof(dlist_head);
+    if (hash_size < (Size) syscache_memory_target * 1024L)
+        return false;
+    
+    /* Search the whole hash for entries to remove */
+    for (i = 0; i < cp->cc_nbuckets; i++)
+    {
+        dlist_mutable_iter iter;
+
+        dlist_foreach_modify(iter, &cp->cc_bucket[i])
+        {
+            CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);
+            long entry_age;
+            int us;
+
+
+            /*
+             * Calculate the duration from the time of the last access to the
+             * "current" time. Since catcacheclock is not advanced within a
+             * transaction, the entries that are accessed within the current
+             * transaction won't be pruned.
+             */
+            TimestampDifference(ct->lastaccess, catcacheclock, &entry_age, &us);
+
+#ifdef CATCACHE_STATS
+            /* count catcache entries for each age class */
+            ntotal++;
+            for (j = 0 ;
+                 ageclass[j] != 0.0 &&
+                     entry_age > syscache_prune_min_age * ageclass[j] ;
+                 j++);
+            if (ageclass[j] == 0.0) j--;
+            nentries[j]++;
+#endif
+
+            /*
+             * Try to remove entries older than syscache_prune_min_age
+             * seconds.  Entries that are not accessed after last pruning are
+             * removed in that seconds, and that has been accessed several
+             * times are removed after leaving alone for up to three times of
+             * the duration. We don't try shrink buckets since pruning
+             * effectively caps catcache expansion in the long term.
+             */
+            if (entry_age > syscache_prune_min_age)
+            {
+#ifdef CATCACHE_STATS
+                Assert (ct->naccess >= 0 && ct->naccess <= 2);
+                nremoved_entry[ct->naccess]++;
+#endif
+                if (ct->naccess > 0)
+                    ct->naccess--;
+                else
+                {
+                    if (!ct->c_list || ct->c_list->refcount == 0)
+                    {
+                        CatCacheRemoveCTup(cp, ct);
+                        nremoved++;
+                    }
+                }
+            }
+        }
+    }
+
+#ifdef CATCACHE_STATS
+    ereport(DEBUG1,
+            (errmsg ("removed %d/%d, age(-%.0fs:%d, -%.0fs:%d, *-%.0fs:%d, -%.0fs:%d, -%.0fs:%d) naccessed(0:%d, 1:%d,
2:%d)",
+                     nremoved, ntotal,
+                     ageclass[0] * syscache_prune_min_age, nentries[0],
+                     ageclass[1] * syscache_prune_min_age, nentries[1],
+                     ageclass[2] * syscache_prune_min_age, nentries[2],
+                     ageclass[3] * syscache_prune_min_age, nentries[3],
+                     ageclass[4] * syscache_prune_min_age, nentries[4],
+                     nremoved_entry[0], nremoved_entry[1], nremoved_entry[2]),
+             errhidestmt(true)));
+#endif
+
+    return nremoved > 0;
+}
+
 /*
  * Enlarge a catcache, doubling the number of buckets.
  */
@@ -1282,6 +1417,11 @@ SearchCatCacheInternal(CatCache *cache,
          */
         dlist_move_head(bucket, &ct->cache_elem);
 
+        /* Update access information for pruning */
+        if (ct->naccess < 2)
+            ct->naccess++;
+        ct->lastaccess = catcacheclock;
+
         /*
          * If it's a positive entry, bump its refcount and return it. If it's
          * negative, we can report failure to the caller.
@@ -1813,7 +1953,6 @@ ReleaseCatCacheList(CatCList *list)
         CatCacheRemoveCList(list->my_cache, list);
 }
 
-
 /*
  * CatalogCacheCreateEntry
  *        Create a new CatCTup entry, copying the given HeapTuple and other
@@ -1906,6 +2045,8 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
     ct->dead = false;
     ct->negative = negative;
     ct->hash_value = hashValue;
+    ct->naccess = 0;
+    ct->lastaccess = catcacheclock;
 
     dlist_push_head(&cache->cc_bucket[hashIndex], &ct->cache_elem);
 
@@ -1913,10 +2054,13 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
     CacheHdr->ch_ntup++;
 
     /*
-     * If the hash table has become too full, enlarge the buckets array. Quite
-     * arbitrarily, we enlarge when fill factor > 2.
+     * If the hash table has become too full, try cleanup by removing
+     * infrequently used entries to make a room for the new entry. If it
+     * failed, enlarge the bucket array instead.  Quite arbitrarily, we try
+     * this when fill factor > 2.
      */
-    if (cache->cc_ntup > cache->cc_nbuckets * 2)
+    if (cache->cc_ntup > cache->cc_nbuckets * 2 &&
+        !CatCacheCleanupOldEntries(cache))
         RehashCatCache(cache);
 
     return ct;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a4f9b3668e..5e0d18657f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -78,6 +78,7 @@
 #include "tsearch/ts_cache.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
+#include "utils/catcache.h"
 #include "utils/guc_tables.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
@@ -1972,6 +1973,28 @@ static struct config_int ConfigureNamesInt[] =
         NULL, NULL, NULL
     },
 
+    {
+        {"syscache_memory_target", PGC_USERSET, RESOURCES_MEM,
+            gettext_noop("Sets the minimum syscache size to keep."),
+            gettext_noop("Syscache is not pruned before exceeding this size."),
+            GUC_UNIT_KB
+        },
+        &syscache_memory_target,
+        0, 0, MAX_KILOBYTES,
+        NULL, NULL, NULL
+    },
+
+    {
+        {"syscache_prune_min_age", PGC_USERSET, RESOURCES_MEM,
+            gettext_noop("Sets the minimum duration of an unused syscache entry to remove."),
+            gettext_noop("Syscache entries that live unused for longer than this seconds are considered to be
removed."),
+            GUC_UNIT_S
+        },
+        &syscache_prune_min_age,
+        600, -1, INT_MAX,
+        NULL, NULL, NULL
+    },
+
     /*
      * We use the hopefully-safely-small value of 100kB as the compiled-in
      * default for max_stack_depth.  InitializeGUCOptions will increase it if
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 39272925fb..5a5729a88f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -124,6 +124,8 @@
 #work_mem = 4MB                # min 64kB
 #maintenance_work_mem = 64MB        # min 1MB
 #autovacuum_work_mem = -1        # min 1MB, or -1 to use maintenance_work_mem
+#syscache_memory_target = 0kB    # in kB. zero disables the feature
+#syscache_prune_min_age = 600s    # -1 disables the feature
 #max_stack_depth = 2MB            # min 100kB
 #dynamic_shared_memory_type = posix    # the default is the first option
                     # supported by the operating system:
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index 7b22f9c7bc..c3c4d65998 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -22,6 +22,7 @@
 
 #include "access/htup.h"
 #include "access/skey.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "utils/relcache.h"
 
@@ -119,6 +120,8 @@ typedef struct catctup
     bool        dead;            /* dead but not yet removed? */
     bool        negative;        /* negative cache entry? */
     HeapTupleData tuple;        /* tuple management header */
+    int            naccess;        /* # of access to this entry, up to 2  */
+    TimestampTz    lastaccess;        /* approx. timestamp of the last usage */
 
     /*
      * The tuple may also be a member of at most one CatCList.  (If a single
@@ -189,6 +192,22 @@ typedef struct catcacheheader
 /* this extern duplicates utils/memutils.h... */
 extern PGDLLIMPORT MemoryContext CacheMemoryContext;
 
+/* for guc.c, not PGDLLPMPORT'ed */
+extern int syscache_prune_min_age;
+extern int syscache_memory_target;
+
+/* to use as access timestamp of catcache entries */
+extern TimestampTz catcacheclock;
+
+/*
+ * SetCatCacheClock - set timestamp for catcache access record
+ */
+static inline void
+SetCatCacheClock(TimestampTz ts)
+{
+    catcacheclock = ts;
+}
+
 extern void CreateCacheMemoryContext(void);
 
 extern CatCache *InitCatCache(int id, Oid reloid, Oid indexoid,
-- 
2.16.2

From 74545dc6f52d42cf93d1353e205bb38581269c5f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Mon, 12 Mar 2018 15:52:18 +0900
Subject: [PATCH 2/3] introduce dynhash pruning

---
 src/backend/utils/hash/dynahash.c | 159 +++++++++++++++++++++++++++++++++-----
 src/include/utils/catcache.h      |  12 +++
 src/include/utils/hsearch.h       |  19 ++++-
 3 files changed, 170 insertions(+), 20 deletions(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 5281cd5410..5a8b15652a 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -88,6 +88,7 @@
 #include "access/xact.h"
 #include "storage/shmem.h"
 #include "storage/spin.h"
+#include "utils/catcache.h"
 #include "utils/dynahash.h"
 #include "utils/memutils.h"
 
@@ -184,6 +185,8 @@ struct HASHHDR
     long        ssize;            /* segment size --- must be power of 2 */
     int            sshift;            /* segment shift = log2(ssize) */
     int            nelem_alloc;    /* number of entries to allocate at once */
+    bool        prunable;        /* true if prunable */
+    HASH_PRUNE_CB    prune_cb;    /* pruning callback. see above. */
 
 #ifdef HASH_STATISTICS
 
@@ -227,16 +230,18 @@ struct HTAB
     int            sshift;            /* segment shift = log2(ssize) */
 };
 
+#define HASHELEMENT_SIZE(ctlp) MAXALIGN(ctlp->prunable ? sizeof(PRUNABLE_HASHELEMENT) : sizeof(HASHELEMENT))
+
 /*
  * Key (also entry) part of a HASHELEMENT
  */
-#define ELEMENTKEY(helem)  (((char *)(helem)) + MAXALIGN(sizeof(HASHELEMENT)))
+#define ELEMENTKEY(helem, ctlp)  (((char *)(helem)) + HASHELEMENT_SIZE(ctlp))
 
 /*
  * Obtain element pointer given pointer to key
  */
-#define ELEMENT_FROM_KEY(key)  \
-    ((HASHELEMENT *) (((char *) (key)) - MAXALIGN(sizeof(HASHELEMENT))))
+#define ELEMENT_FROM_KEY(key, ctlp)                                        \
+    ((HASHELEMENT *) (((char *) (key)) - HASHELEMENT_SIZE(ctlp)))
 
 /*
  * Fast MOD arithmetic, assuming that y is a power of 2 !
@@ -257,6 +262,7 @@ static HASHSEGMENT seg_alloc(HTAB *hashp);
 static bool element_alloc(HTAB *hashp, int nelem, int freelist_idx);
 static bool dir_realloc(HTAB *hashp);
 static bool expand_table(HTAB *hashp);
+static bool prune_entries(HTAB *hashp);
 static HASHBUCKET get_hash_entry(HTAB *hashp, int freelist_idx);
 static void hdefault(HTAB *hashp);
 static int    choose_nelem_alloc(Size entrysize);
@@ -497,6 +503,17 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
         hctl->entrysize = info->entrysize;
     }
 
+    /*
+     * hash table runs pruning
+     */
+    if (flags & HASH_PRUNABLE)
+    {
+        hctl->prunable = true;
+        hctl->prune_cb = info->prune_cb;
+    }
+    else
+        hctl->prunable = false;
+
     /* make local copies of heavily-used constant fields */
     hashp->keysize = hctl->keysize;
     hashp->ssize = hctl->ssize;
@@ -982,7 +999,7 @@ hash_search_with_hash_value(HTAB *hashp,
     while (currBucket != NULL)
     {
         if (currBucket->hashvalue == hashvalue &&
-            match(ELEMENTKEY(currBucket), keyPtr, keysize) == 0)
+            match(ELEMENTKEY(currBucket, hctl), keyPtr, keysize) == 0)
             break;
         prevBucketPtr = &(currBucket->link);
         currBucket = *prevBucketPtr;
@@ -995,6 +1012,17 @@ hash_search_with_hash_value(HTAB *hashp,
     if (foundPtr)
         *foundPtr = (bool) (currBucket != NULL);
 
+    /* Update access counter if needed */
+    if (hctl->prunable && currBucket &&
+        (action == HASH_FIND || action == HASH_ENTER))
+    {
+        PRUNABLE_HASHELEMENT *prunable_elm =
+            (PRUNABLE_HASHELEMENT *) currBucket;
+        if (prunable_elm->naccess < 2)
+            prunable_elm->naccess++;
+        prunable_elm->last_access = GetCatCacheClock();
+    }
+
     /*
      * OK, now what?
      */
@@ -1002,7 +1030,8 @@ hash_search_with_hash_value(HTAB *hashp,
     {
         case HASH_FIND:
             if (currBucket != NULL)
-                return (void *) ELEMENTKEY(currBucket);
+                return (void *) ELEMENTKEY(currBucket, hctl);
+
             return NULL;
 
         case HASH_REMOVE:
@@ -1031,7 +1060,7 @@ hash_search_with_hash_value(HTAB *hashp,
                  * element, because someone else is going to reuse it the next
                  * time something is added to the table
                  */
-                return (void *) ELEMENTKEY(currBucket);
+                return (void *) ELEMENTKEY(currBucket, hctl);
             }
             return NULL;
 
@@ -1043,7 +1072,7 @@ hash_search_with_hash_value(HTAB *hashp,
         case HASH_ENTER:
             /* Return existing element if found, else create one */
             if (currBucket != NULL)
-                return (void *) ELEMENTKEY(currBucket);
+                return (void *) ELEMENTKEY(currBucket, hctl);
 
             /* disallow inserts if frozen */
             if (hashp->frozen)
@@ -1073,8 +1102,18 @@ hash_search_with_hash_value(HTAB *hashp,
 
             /* copy key into record */
             currBucket->hashvalue = hashvalue;
-            hashp->keycopy(ELEMENTKEY(currBucket), keyPtr, keysize);
+            hashp->keycopy(ELEMENTKEY(currBucket, hctl), keyPtr, keysize);
 
+            /* set access counter */
+            if (hctl->prunable)
+            {
+                PRUNABLE_HASHELEMENT *prunable_elm =
+                    (PRUNABLE_HASHELEMENT *) currBucket;
+                if (prunable_elm->naccess < 2)
+                    prunable_elm->naccess++;
+                prunable_elm->last_access = GetCatCacheClock();
+            }
+            
             /*
              * Caller is expected to fill the data field on return.  DO NOT
              * insert any code that could possibly throw error here, as doing
@@ -1082,7 +1121,7 @@ hash_search_with_hash_value(HTAB *hashp,
              * caller's data structure.
              */
 
-            return (void *) ELEMENTKEY(currBucket);
+            return (void *) ELEMENTKEY(currBucket, hctl);
     }
 
     elog(ERROR, "unrecognized hash action code: %d", (int) action);
@@ -1114,7 +1153,7 @@ hash_update_hash_key(HTAB *hashp,
                      void *existingEntry,
                      const void *newKeyPtr)
 {
-    HASHELEMENT *existingElement = ELEMENT_FROM_KEY(existingEntry);
+    HASHELEMENT *existingElement = ELEMENT_FROM_KEY(existingEntry, hashp->hctl);
     HASHHDR    *hctl = hashp->hctl;
     uint32        newhashvalue;
     Size        keysize;
@@ -1198,7 +1237,7 @@ hash_update_hash_key(HTAB *hashp,
     while (currBucket != NULL)
     {
         if (currBucket->hashvalue == newhashvalue &&
-            match(ELEMENTKEY(currBucket), newKeyPtr, keysize) == 0)
+            match(ELEMENTKEY(currBucket, hctl), newKeyPtr, keysize) == 0)
             break;
         prevBucketPtr = &(currBucket->link);
         currBucket = *prevBucketPtr;
@@ -1232,7 +1271,7 @@ hash_update_hash_key(HTAB *hashp,
 
     /* copy new key into record */
     currBucket->hashvalue = newhashvalue;
-    hashp->keycopy(ELEMENTKEY(currBucket), newKeyPtr, keysize);
+    hashp->keycopy(ELEMENTKEY(currBucket, hctl), newKeyPtr, keysize);
 
     /* rest of record is untouched */
 
@@ -1386,8 +1425,8 @@ hash_seq_init(HASH_SEQ_STATUS *status, HTAB *hashp)
 void *
 hash_seq_search(HASH_SEQ_STATUS *status)
 {
-    HTAB       *hashp;
-    HASHHDR    *hctl;
+    HTAB       *hashp = status->hashp;
+    HASHHDR    *hctl = hashp->hctl;
     uint32        max_bucket;
     long        ssize;
     long        segment_num;
@@ -1402,15 +1441,13 @@ hash_seq_search(HASH_SEQ_STATUS *status)
         status->curEntry = curElem->link;
         if (status->curEntry == NULL)    /* end of this bucket */
             ++status->curBucket;
-        return (void *) ELEMENTKEY(curElem);
+        return (void *) ELEMENTKEY(curElem, hctl);
     }
 
     /*
      * Search for next nonempty bucket starting at curBucket.
      */
     curBucket = status->curBucket;
-    hashp = status->hashp;
-    hctl = hashp->hctl;
     ssize = hashp->ssize;
     max_bucket = hctl->max_bucket;
 
@@ -1456,7 +1493,7 @@ hash_seq_search(HASH_SEQ_STATUS *status)
     if (status->curEntry == NULL)    /* end of this bucket */
         ++curBucket;
     status->curBucket = curBucket;
-    return (void *) ELEMENTKEY(curElem);
+    return (void *) ELEMENTKEY(curElem, hctl);
 }
 
 void
@@ -1550,6 +1587,10 @@ expand_table(HTAB *hashp)
      */
     if ((uint32) new_bucket > hctl->high_mask)
     {
+        /* try pruning before expansion. return true on success */
+        if (hctl->prunable && prune_entries(hashp))
+            return true;
+
         hctl->low_mask = hctl->high_mask;
         hctl->high_mask = (uint32) new_bucket | hctl->low_mask;
     }
@@ -1592,6 +1633,86 @@ expand_table(HTAB *hashp)
     return true;
 }
 
+static bool
+prune_entries(HTAB *hashp)
+{
+    HASHHDR           *hctl = hashp->hctl;
+    HASH_SEQ_STATUS status;
+    void            *elm;
+    TimestampTz        currclock = GetCatCacheClock();
+    int                nall = 0,
+                    nremoved = 0;
+
+    Assert(hctl->prunable);
+
+    /* not called for frozen or under seqscan. see
+     * hash_search_with_hash_value. */
+    Assert(IS_PARTITIONED(hctl) ||
+        hashp->frozen ||
+        hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) <
+        hctl->ffactor ||
+        has_seq_scans(hashp));
+
+    /* This setting prevents pruning */
+    if (syscache_prune_min_age < 0)
+        return false;
+
+    /*
+     * return false immediately if this hash is small enough. We only consider
+     * bucket array size since it is the significant part of memory usage.
+     * settings is shared with syscache
+     */
+    if (hctl->dsize * sizeof(HASHBUCKET) * hashp->ssize <
+        (Size) syscache_memory_target * 1024L)
+        return false;
+
+    /*
+     * Ok, this hash can be pruned. start pruning. This function is called
+     * early enough for doing this via public API.
+     */
+    hash_seq_init(&status, hashp);
+    while ((elm = hash_seq_search(&status)) != NULL)
+    {
+        PRUNABLE_HASHELEMENT *helm =
+            (PRUNABLE_HASHELEMENT *)ELEMENT_FROM_KEY(elm, hctl);
+        long    entry_age;
+        int        us;
+
+        nall++;
+
+        TimestampDifference(helm->last_access, currclock, &entry_age, &us);
+
+        /* settings is shared with syscache */
+        if (entry_age > syscache_prune_min_age)
+        {
+            /* Wait for the next chance if this is recently used */
+            if (helm->naccess > 0)
+                helm->naccess--;
+            else
+            {
+                /* just call it if callback is provided, remove otherwise */
+                if (hctl->prune_cb)
+                {
+                    if (hctl->prune_cb(hashp, (void *)elm))
+                        nremoved++;
+                }
+                else
+                {
+                    bool found;
+                    
+                    hash_search(hashp, elm, HASH_REMOVE, &found);
+                    Assert(found);
+                    nremoved++;
+                }
+            }
+        }
+    }
+
+    elog(DEBUG1, "removed %d/%d entries from hash \"%s\"",
+         nremoved, nall, hashp->tabname);
+
+    return nremoved > 0;
+}
 
 static bool
 dir_realloc(HTAB *hashp)
@@ -1665,7 +1786,7 @@ element_alloc(HTAB *hashp, int nelem, int freelist_idx)
         return false;
 
     /* Each element has a HASHELEMENT header plus user data. */
-    elementSize = MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(hctl->entrysize);
+    elementSize = HASHELEMENT_SIZE(hctl) + MAXALIGN(hctl->entrysize);
 
     CurrentDynaHashCxt = hashp->hcxt;
     firstElement = (HASHELEMENT *) hashp->alloc(nelem * elementSize);
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index c3c4d65998..fcc680bb82 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -208,6 +208,18 @@ SetCatCacheClock(TimestampTz ts)
     catcacheclock = ts;
 }
 
+/*
+ * GetCatCacheClock - get timestamp for catcache access record
+ *
+ * This clock is basically provided for catcache usage, but dynahash has a
+ * similar pruning mechanism and wants to use the same clock.
+ */
+static inline TimestampTz
+GetCatCacheClock(void)
+{
+    return catcacheclock;
+}
+
 extern void CreateCacheMemoryContext(void);
 
 extern CatCache *InitCatCache(int id, Oid reloid, Oid indexoid,
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index 8357faac5a..df12352a46 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -13,7 +13,7 @@
  */
 #ifndef HSEARCH_H
 #define HSEARCH_H
-
+#include "datatype/timestamp.h"
 
 /*
  * Hash functions must have this signature.
@@ -47,6 +47,7 @@ typedef void *(*HashAllocFunc) (Size request);
  * HASHELEMENT is the private part of a hashtable entry.  The caller's data
  * follows the HASHELEMENT structure (on a MAXALIGN'd boundary).  The hash key
  * is expected to be at the start of the caller's hash entry data structure.
+ * If this hash is prunable, PRUNABLE_HASHELEMENT is used instead.
  */
 typedef struct HASHELEMENT
 {
@@ -54,12 +55,26 @@ typedef struct HASHELEMENT
     uint32        hashvalue;        /* hash function result for this entry */
 } HASHELEMENT;
 
+typedef struct PRUNABLE_HASHELEMENT
+{
+    struct HASHELEMENT *link;    /* link to next entry in same bucket */
+    uint32        hashvalue;        /* hash function result for this entry */
+    TimestampTz    last_access;    /* timestamp of the last usage */
+    int            naccess;        /* takes 0 to 2, counted up when used */
+} PRUNABLE_HASHELEMENT;
+
 /* Hash table header struct is an opaque type known only within dynahash.c */
 typedef struct HASHHDR HASHHDR;
 
 /* Hash table control struct is an opaque type known only within dynahash.c */
 typedef struct HTAB HTAB;
 
+/*
+ * Hash pruning callback. This is called for the entries which is about to be
+ * removed without the owner's intention.
+ */
+typedef bool (*HASH_PRUNE_CB)(HTAB *hashp, void *ent);
+
 /* Parameter data structure for hash_create */
 /* Only those fields indicated by hash_flags need be set */
 typedef struct HASHCTL
@@ -77,6 +92,7 @@ typedef struct HASHCTL
     HashAllocFunc alloc;        /* memory allocator */
     MemoryContext hcxt;            /* memory context to use for allocations */
     HASHHDR    *hctl;            /* location of header in shared mem */
+    HASH_PRUNE_CB    prune_cb;    /* pruning callback. see above. */
 } HASHCTL;
 
 /* Flags to indicate which parameters are supplied */
@@ -94,6 +110,7 @@ typedef struct HASHCTL
 #define HASH_SHARED_MEM 0x0800    /* Hashtable is in shared memory */
 #define HASH_ATTACH        0x1000    /* Do not initialize hctl */
 #define HASH_FIXED_SIZE 0x2000    /* Initial size is a hard limit */
+#define HASH_PRUNABLE    0x4000    /* pruning setting */
 
 
 /* max_dsize value to indicate expansible directory */
-- 
2.16.2

From debface28e2261b0d819c46e52942ba500143581 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Mon, 12 Mar 2018 17:31:43 +0900
Subject: [PATCH 3/3] Apply purning to relcache

---
 src/backend/utils/cache/relcache.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 9ee78f885f..f344771d57 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3503,6 +3503,29 @@ RelationSetNewRelfilenode(Relation relation, char persistence,
 
 #define INITRELCACHESIZE        400
 
+/* callback function for hash pruning */
+static bool
+relcache_prune_cb(HTAB *hashp, void *ent)
+{
+    RelIdCacheEnt  *relent = (RelIdCacheEnt *) ent;
+    Relation        relation;
+
+    /* this relation is requested to be removed.  */
+    RelationIdCacheLookup(relent->reloid, relation);
+
+    /* but cannot removed an active cache entry */
+    if (!RelationHasReferenceCountZero(relation))
+        return false;
+
+    /*
+     * Otherwise we are allowd to forget it unconditionally. see
+     * RelationForgetRelation
+     */
+    RelationClearRelation(relation, false);
+
+    return true;
+}
+
 void
 RelationCacheInitialize(void)
 {
@@ -3520,8 +3543,11 @@ RelationCacheInitialize(void)
     MemSet(&ctl, 0, sizeof(ctl));
     ctl.keysize = sizeof(Oid);
     ctl.entrysize = sizeof(RelIdCacheEnt);
+
+    /* use the same setting with syscache */
+    ctl.prune_cb = relcache_prune_cb;
     RelationIdCache = hash_create("Relcache by OID", INITRELCACHESIZE,
-                                  &ctl, HASH_ELEM | HASH_BLOBS);
+                                  &ctl, HASH_ELEM | HASH_BLOBS | HASH_PRUNABLE);
 
     /*
      * relation mapper needs to be initialized too
-- 
2.16.2


pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: GSOC 2018 proposal
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Protect syscache from bloating with negative cache entries