Re: Protect syscache from bloating with negative cache entries - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Protect syscache from bloating with negative cache entries |
Date | |
Msg-id | 20190212.203628.118792892.horiguchi.kyotaro@lab.ntt.co.jp |
In response to | RE: Protect syscache from bloating with negative cache entries ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>) |
Responses | Re: Protect syscache from bloating with negative cache entries |
List | pgsql-hackers |
At Tue, 12 Feb 2019 01:02:39 +0000, "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> wrote in <0A3221C70F24FB45833433255569204D1FB972A6@G01JPEXMBYT05>
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]
> > Reduced frequency of dlist_move_tail by taking 1ms interval between two
> > successive updates on the same entry let the degradation disappear.
> >
> > patched : 13720 tps (+2%)
>
> What do you think contributed to this performance increase? Or do you think this is just a measurement variation?
>
> Most of my previous comments also seem to apply to v13, so let me repost them below:
>
> (1)
> +/* GUC variable to define the minimum age of entries that will be cosidered to
> + /* initilize catcache reference clock if haven't done yet */
>
> cosidered -> considered
> initilize -> initialize

Fixed. I found "databsae", "temprary", "resturns", "If'force'" (missing space), "aginst", "maintan". All are fixed.

> I remember I saw some other wrong spelling and/or missing words, which I forgot (sorry).

Thank you for pointing some of them out.

> (2)
> Only the doc prefixes "sys" to the new parameter names. Other places don't have it. I think we should prefix sys, because relcache and plancache should be configurable separately because of their different usage patterns/lifecycle.

I tend to agree. They are already removed in this patchset. The names are changed to "catalog_cache_*" in the new version.

> (3)
> The doc doesn't describe the unit of syscache_memory_target. Kilobytes?

Added.

> (4)
> + hash_size = cp->cc_nbuckets * sizeof(dlist_head);
> + tupsize = sizeof(CatCTup) + MAXIMUM_ALIGNOF + dtp->t_len;
> + tupsize = sizeof(CatCTup);
>
> GetMemoryChunkSpace() should be used to include the memory context overhead. That's what the files in src/backend/utils/sort/ do.

Thanks. Done. The bucket array and the cache header are now included, but clist is still excluded. Renamed from tupsize to memusage.

> (5)
> + if (entry_age > cache_prune_min_age)
>
> ">=" instead of ">"?

I didn't think it was serious, but it is better. Fixed.

> (6)
> + if (!ct->c_list || ct->c_list->refcount == 0)
> + {
> + CatCacheRemoveCTup(cp, ct);
>
> It's better to write "ct->c_list == NULL" to follow the style in this file.
>
> "ct->refcount == 0" should also be checked prior to removing the catcache tuple, just in case the tuple hasn't been released for a long time, which might hardly happen.

Yeah, I fixed it in v12. This no longer removes an entry in use. ("if (c_list)" is used in the file.)

> (7)
> CatalogCacheCreateEntry
>
> + int tupsize = 0;
> if (ntp)
> {
> int i;
> + int tupsize;
>
> tupsize is defined twice.

The second tupsize was bogus, but the first is removed in this version. Now the memory usage of an entry is calculated as a chunk size.

> (8)
> CatalogCacheCreateEntry
>
> In the negative entry case, the memory allocated by CatCacheCopyKeys() is not counted. I'm afraid that's not negligible.

Right. Fixed.

> (9)
> The memory for CatCList is not taken into account for syscache_memory_target.

Yeah, this is intentional since CatCacheList is short-lived. Comment added.

| * Don't waste a time by counting the list in catcache memory usage,
| * since a list doesn't persist for a long time
| */
| cl = (CatCList *)
| palloc(offsetof(CatCList, members) + nmembers * sizeof(CatCTup *));

Please find attached the new version, v14, which addresses Tomas', Ideriha-san's and your comments.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From 3b24233b1891b967ccac65a4d21ed0207037578b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 7 Feb 2019 14:56:07 +0900
Subject: [PATCH 1/3] Add dlist_move_tail

We have dlist_push_head/tail and dlist_move_head but not
dlist_move_tail. Add it.
---
 src/include/lib/ilist.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index b1a5974ee4..659ab1ac87 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
     dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+    /* fast path if it's already at the tail */
+    if (head->head.prev == node)
+        return;
+
+    dlist_delete(node);
+    dlist_push_tail(head, node);
+
+    dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
--
2.16.3

From 5031833af1777c4c6a6bf8daf32b6a3f428ccd79 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Tue, 16 Oct 2018 13:04:30 +0900
Subject: [PATCH 2/3] Remove entries that haven't been used for a certain time

Catcache entries can be left alone for several reasons. It is not
desirable that they eat up memory. Before enlarging the hash array,
this patch considers removing entries that haven't been used for a
certain time. It can also put a hard limit on the number of catcache
entries.
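For illustration only, and not part of the attached patches: the age-based part of the pruning described in this commit message can be sketched as a standalone C function. The names Entry and prune_old_entries, and the plain array standing in for the LRU list, are hypothetical and chosen just for this sketch; the patch itself walks cc_lru_list with dlist_foreach_modify and removes entries with CatCacheRemoveCTup. The sketch assumes entries are ordered from least to most recently used, which is what lets the scan stop at the first entry that is still young.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a catcache entry; not the patch's CatCTup. */
typedef struct Entry
{
    long last_access;   /* seconds, on the same clock as "now" */
    int  naccess;       /* recent-access counter (the patch caps it at 2) */
    int  refcount;      /* nonzero means the entry is in use */
    bool removed;       /* set by prune_old_entries() */
} Entry;

/*
 * Scan entries ordered from least to most recently used and mark the
 * stale ones as removed.  Returns the number of entries removed.
 */
static int
prune_old_entries(Entry *lru, size_t n, long now, long min_age)
{
    int nremoved = 0;

    assert(lru != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        Entry *e = &lru[i];

        /* never evict an entry that is in use (or already gone) */
        if (e->removed || e->refcount != 0)
            continue;

        /* entries further along the list are younger still: stop here */
        if (now - e->last_access < min_age)
            break;

        if (e->naccess > 0)
            e->naccess--;       /* spend one "second chance" */
        else
        {
            e->removed = true;
            nremoved++;
        }
    }

    return nremoved;
}
```

With this rule, an entry that was accessed several times survives a couple of extra pruning passes, while an entry that was never re-accessed goes away on the first pass after it exceeds min_age.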
--- doc/src/sgml/config.sgml | 41 ++++ src/backend/tcop/postgres.c | 13 ++ src/backend/utils/cache/catcache.c | 285 +++++++++++++++++++++++++- src/backend/utils/init/globals.c | 1 + src/backend/utils/init/postinit.c | 11 + src/backend/utils/misc/guc.c | 43 ++++ src/backend/utils/misc/postgresql.conf.sample | 2 + src/include/miscadmin.h | 1 + src/include/utils/catcache.h | 49 ++++- src/include/utils/timeout.h | 1 + 10 files changed, 440 insertions(+), 7 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 07b847a8e9..71d784b6fe 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1661,6 +1661,47 @@ include_dir 'conf.d' </listitem> </varlistentry> + <varlistentry id="guc-catalog-cache-prune-min-age" xreflabel="catalog_cache_prune_min_age"> + <term><varname>catalog_cache_prune_min_age</varname> (<type>integer</type>) + <indexterm> + <primary><varname>catalog_cache_prune_min_age</varname> configuration + parameter</primary> + </indexterm> + </term> + <listitem> + <para> + + Specifies the minimum amount of unused time in seconds at which a + catalog cache entry is considered to be removed. -1 indicates that + this feature is disabled at all. The value defaults to 300 seconds + (<literal>5 minutes</literal>). The catalog cache entries that are + not used for the duration can be removed to prevent bloat. This + behavior is suppressed until the size of a catalog cache exceeds + <xref linkend="guc-catalog-cache-memory-target"/>. + </para> + </listitem> + </varlistentry> + + <varlistentry id="guc-catalog-cache-memory-target" xreflabel="catalog_cache_memory_target"> + <term><varname>catalog_cache_memory_target</varname> (<type>integer</type>) + <indexterm> + <primary><varname>syscache_memory_target</varname> configuration + parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Specifies the maximum amount of memory to which syscache is expanded + without pruning in kilobytes. 
The value defaults to 0, indicating that + pruning is always considered. After exceeding this size, catalog cache + pruning is considered according to + <xref linkend="guc-catalog-cache-prune-min-age"/>. If you need to keep + certain amount of catalog cache entries with intermittent usage, try + increase this setting. + </para> + </listitem> + </varlistentry> + <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth"> <term><varname>max_stack_depth</varname> (<type>integer</type>) <indexterm> diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 36cfd507b2..f192ee2ca6 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -71,6 +71,7 @@ #include "tcop/pquery.h" #include "tcop/tcopprot.h" #include "tcop/utility.h" +#include "utils/catcache.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/ps_status.h" @@ -2584,6 +2585,7 @@ start_xact_command(void) * not desired, the timeout has to be disabled explicitly. */ enable_statement_timeout(); + SetCatCacheClock(GetCurrentStatementStartTimestamp()); } static void @@ -3159,6 +3161,14 @@ ProcessInterrupts(void) if (ParallelMessagePending) HandleParallelMessages(); + + if (CatcacheClockTimeoutPending) + { + CatcacheClockTimeoutPending = 0; + + /* Update timetamp then set up the next timeout */ + UpdateCatCacheClock(); + } } @@ -4021,6 +4031,9 @@ PostgresMain(int argc, char *argv[], QueryCancelPending = false; /* second to avoid race condition */ stmt_timeout_active = false; + /* get sync with the timer state */ + catcache_clock_timeout_active = false; + /* Not reading from the client anymore. 
*/ DoingCommandRead = false; diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 258a1d64cc..0195e19976 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -39,6 +39,7 @@ #include "utils/rel.h" #include "utils/resowner_private.h" #include "utils/syscache.h" +#include "utils/timeout.h" /* #define CACHEDEBUG */ /* turns DEBUG elogs on */ @@ -71,9 +72,43 @@ #define CACHE6_elog(a,b,c,d,e,f,g) #endif +/* GUC variable to define the minimum age of entries that will be considered to + * be evicted in seconds. This variable is shared among various cache + * mechanisms. + */ +int catalog_cache_prune_min_age = 300; + +/* + * GUC variable to define the minimum size of hash to cosider entry eviction. + * This variable is shared among various cache mechanisms. + */ +int catalog_cache_memory_target = 0; + +/* + * GUC for limit by the number of entries. Entries are removed when the number + * of them goes above catalog_cache_entry_limit and leaving newer entries by + * the ratio specified by catalog_cache_prune_ratio. + */ +int catalog_cache_entry_limit = 0; +double catalog_cache_prune_ratio = 0.8; + +/* + * Flag to keep track of whether catcache clock timer is active. + */ +bool catcache_clock_timeout_active = false; + +/* + * Minimum interval between two success move of a cache entry in LRU list, + * in microseconds. + */ +#define MIN_LRU_UPDATE_INTERVAL 100000 /* 100ms */ + /* Cache management header --- pointer is NULL until created */ static CatCacheHeader *CacheHdr = NULL; +/* Clock used to record the last accessed time of a catcache record. 
*/ +TimestampTz catcacheclock = 0; + static inline HeapTuple SearchCatCacheInternal(CatCache *cache, int nkeys, Datum v1, Datum v2, @@ -481,6 +516,7 @@ CatCacheRemoveCTup(CatCache *cache, CatCTup *ct) /* delink from linked list */ dlist_delete(&ct->cache_elem); + dlist_delete(&ct->lru_node); /* * Free keys when we're dealing with a negative entry, normal entries just @@ -490,6 +526,7 @@ CatCacheRemoveCTup(CatCache *cache, CatCTup *ct) CatCacheFreeKeys(cache->cc_tupdesc, cache->cc_nkeys, cache->cc_keyno, ct->keys); + cache->cc_memusage -= ct->size; pfree(ct); --cache->cc_ntup; @@ -841,7 +878,13 @@ InitCatCache(int id, cp->cc_nkeys = nkeys; for (i = 0; i < nkeys; ++i) cp->cc_keyno[i] = key[i]; + cp->cc_memusage = + CacheMemoryContext->methods->get_chunk_space(CacheMemoryContext, + cp) + + CacheMemoryContext->methods->get_chunk_space(CacheMemoryContext, + cp->cc_bucket); + dlist_init(&cp->cc_lru_list); /* * new cache is initialized as far as we can go for now. print some * debugging information, if appropriate. @@ -858,9 +901,191 @@ InitCatCache(int id, */ MemoryContextSwitchTo(oldcxt); + /* initialize catcache reference clock if haven't done yet */ + if (catcacheclock == 0) + catcacheclock = GetCurrentTimestamp(); + return cp; } +/* + * helper routine for SetCatCacheClock and UpdateCatCacheClockTimer. + * + * We need to maintain the catcache clock during a long query. + */ +void +SetupCatCacheClockTimer(void) +{ + long delay; + + /* stop timer if not needed */ + if (catalog_cache_prune_min_age == 0) + { + catcache_clock_timeout_active = false; + return; + } + + /* One 10th of the variable, in milliseconds */ + delay = catalog_cache_prune_min_age * 1000/10; + + /* Lower limit is 1 second */ + if (delay < 1000) + delay = 1000; + + enable_timeout_after(CATCACHE_CLOCK_TIMEOUT, delay); + + catcache_clock_timeout_active = true; +} + +/* + * Update catcacheclock: this is intended to be called from + * CATCACHE_CLOCK_TIMEOUT. 
The interval is expected more than 1 second (see + * above), so GetCurrentTime() doesn't harm. + */ +void +UpdateCatCacheClock(void) +{ + catcacheclock = GetCurrentTimestamp(); + SetupCatCacheClockTimer(); +} + +/* + * It may take an unexpectedly long time before the next clock update when + * catalog_cache_prune_min_age gets shorter. Disabling the current timer let + * the next update happen at the expected interval. We don't necessariry + * require this for increase the age but we don't need to avoid to disable + * either. + */ +void +assign_catalog_cache_prune_min_age(int newval, void *extra) +{ + if (catcache_clock_timeout_active) + disable_timeout(CATCACHE_CLOCK_TIMEOUT, false); + + catcache_clock_timeout_active = false; +} + +/* + * CatCacheCleanupOldEntries - Remove infrequently-used entries + * + * Catcache entries can be left alone for several reasons. We remove them if + * they are not accessed for a certain time to prevent catcache from + * bloating. The eviction is performed with the similar algorithm with buffer + * eviction using access counter. Entries that are accessed several times can + * live longer than those that have had less access in the same duration. + */ +static bool +CatCacheCleanupOldEntries(CatCache *cp) +{ + int nremoved = 0; + size_t hash_size; + int nelems_before = cp->cc_ntup; + int ndelelems = 0; + bool prune_by_age = false; + bool prune_by_number = false; + dlist_mutable_iter iter; + + if (catalog_cache_prune_min_age >= 0) + { + /* prune only if the size of the hash is above the target */ + + hash_size = cp->cc_nbuckets * sizeof(dlist_head); + if (hash_size + cp->cc_memusage > + (Size) catalog_cache_memory_target * 1024L) + prune_by_age = true; + } + + if (catalog_cache_entry_limit > 0 && + nelems_before >= catalog_cache_entry_limit) + { + ndelelems = nelems_before - + (int) (catalog_cache_entry_limit * catalog_cache_prune_ratio); + + /* an arbitrary lower limit.. 
*/ + if (ndelelems < 256) + ndelelems = 256; + if (ndelelems > nelems_before) + ndelelems = nelems_before; + + prune_by_number = true; + } + + /* Return immediately if no pruning is wanted */ + if (!prune_by_age && !prune_by_number) + return false; + + /* Scan over LRU to find entries to remove */ + dlist_foreach_modify(iter, &cp->cc_lru_list) + { + CatCTup *ct = dlist_container(CatCTup, lru_node, iter.cur); + bool remove_this = false; + + /* We don't remove referenced entry */ + if (ct->refcount != 0 || + (ct->c_list && ct->c_list->refcount != 0)) + continue; + + /* check against age */ + if (prune_by_age) + { + long entry_age; + int us; + + /* + * Calculate the duration from the time of the last access to the + * "current" time. Since catcacheclock is not advanced within a + * transaction, the entries that are accessed within the current + * transaction won't be pruned. + */ + TimestampDifference(ct->lastaccess, catcacheclock, &entry_age, &us); + + if (entry_age < catalog_cache_prune_min_age) + { + /* no longer have a business with further entries, exit */ + prune_by_age = false; + break; + } + /* + * Entries that are not accessed after last pruning are removed in + * that seconds, and that has been accessed several times are + * removed after leaving alone for up to three times of the + * duration. We don't try shrink buckets since pruning effectively + * caps catcache expansion in the long term. 
+ */ + if (ct->naccess > 0) + ct->naccess--; + else + remove_this = true; + } + + /* check against entry number */ + if (prune_by_number) + { + if (nremoved < ndelelems) + remove_this = true; + else + prune_by_number = false; /* we're satisfied */ + } + + /* exit immediately if all finished */ + if (!prune_by_age && !prune_by_number) + break; + + /* do the work */ + if (remove_this) + { + CatCacheRemoveCTup(cp, ct); + nremoved++; + } + } + + if (nremoved > 0) + elog(DEBUG1, "pruning catalog cache id=%d for %s: removed %d / %d", + cp->id, cp->cc_relname, nremoved, nelems_before); + + return nremoved > 0; +} + /* * Enlarge a catcache, doubling the number of buckets. */ @@ -878,6 +1103,13 @@ RehashCatCache(CatCache *cp) newnbuckets = cp->cc_nbuckets * 2; newbucket = (dlist_head *) MemoryContextAllocZero(CacheMemoryContext, newnbuckets * sizeof(dlist_head)); + /* recalculate memory usage from the first */ + cp->cc_memusage = + CacheMemoryContext->methods->get_chunk_space(CacheMemoryContext, + cp) + + CacheMemoryContext->methods->get_chunk_space(CacheMemoryContext, + newbucket); + /* Move all entries from old hash table to new. */ for (i = 0; i < cp->cc_nbuckets; i++) { @@ -890,6 +1122,7 @@ RehashCatCache(CatCache *cp) dlist_delete(iter.cur); dlist_push_head(&newbucket[hashIndex], &ct->cache_elem); + cp->cc_memusage += ct->size; } } @@ -1274,6 +1507,21 @@ SearchCatCacheInternal(CatCache *cache, */ dlist_move_head(bucket, &ct->cache_elem); + /* Update access information for pruning */ + if (ct->naccess < 2) + ct->naccess++; + + /* + * We don't want too frequent update of + * LRU. catalog_cache_prune_min_age can be changed on-session so we + * need to maintain the LRU regardless of catalog_cache_prune_min_age. + */ + if (catcacheclock - ct->lastaccess > MIN_LRU_UPDATE_INTERVAL) + { + ct->lastaccess = catcacheclock; + dlist_move_tail(&cache->cc_lru_list, &ct->lru_node); + } + /* * If it's a positive entry, bump its refcount and return it. 
If it's * negative, we can report failure to the caller. @@ -1709,6 +1957,11 @@ SearchCatCacheList(CatCache *cache, /* Now we can build the CatCList entry. */ oldcxt = MemoryContextSwitchTo(CacheMemoryContext); nmembers = list_length(ctlist); + + /* + * Don't waste a time by counting the list in catcache memory usage, + * since it doesn't live a long life. + */ cl = (CatCList *) palloc(offsetof(CatCList, members) + nmembers * sizeof(CatCTup *)); @@ -1824,6 +2077,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, if (ntp) { int i; + int tupsize; Assert(!negative); @@ -1842,8 +2096,8 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, /* Allocate memory for CatCTup and the cached tuple in one go */ oldcxt = MemoryContextSwitchTo(CacheMemoryContext); - ct = (CatCTup *) palloc(sizeof(CatCTup) + - MAXIMUM_ALIGNOF + dtp->t_len); + tupsize = sizeof(CatCTup) + MAXIMUM_ALIGNOF + dtp->t_len; + ct = (CatCTup *) palloc(tupsize); ct->tuple.t_len = dtp->t_len; ct->tuple.t_self = dtp->t_self; ct->tuple.t_tableOid = dtp->t_tableOid; @@ -1877,7 +2131,6 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, Assert(negative); oldcxt = MemoryContextSwitchTo(CacheMemoryContext); ct = (CatCTup *) palloc(sizeof(CatCTup)); - /* * Store keys - they'll point into separately allocated memory if not * by-value. 
@@ -1898,18 +2151,38 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments, ct->dead = false; ct->negative = negative; ct->hash_value = hashValue; + ct->naccess = 0; + ct->lastaccess = catcacheclock; + dlist_push_tail(&cache->cc_lru_list, &ct->lru_node); dlist_push_head(&cache->cc_bucket[hashIndex], &ct->cache_elem); cache->cc_ntup++; CacheHdr->ch_ntup++; + ct->size = + CacheMemoryContext->methods->get_chunk_space(CacheMemoryContext, + ct); + cache->cc_memusage += ct->size; + + /* increase refcount so that this survives pruning */ + ct->refcount++; + /* - * If the hash table has become too full, enlarge the buckets array. Quite - * arbitrarily, we enlarge when fill factor > 2. + * If the hash table has become too full, try cleanup by removing + * infrequently used entries to make a room for the new entry. If it + * failed, enlarge the bucket array instead. Quite arbitrarily, we try + * this when fill factor > 2. */ - if (cache->cc_ntup > cache->cc_nbuckets * 2) + if (cache->cc_ntup > cache->cc_nbuckets * 2 && + !CatCacheCleanupOldEntries(cache)) RehashCatCache(cache); + /* we may still want to prune by entry number, check it */ + else if (catalog_cache_entry_limit > 0 && + cache->cc_ntup > catalog_cache_entry_limit) + CatCacheCleanupOldEntries(cache); + + ct->refcount--; return ct; } diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c index fd51934aaf..0e8b972a29 100644 --- a/src/backend/utils/init/globals.c +++ b/src/backend/utils/init/globals.c @@ -32,6 +32,7 @@ volatile sig_atomic_t QueryCancelPending = false; volatile sig_atomic_t ProcDiePending = false; volatile sig_atomic_t ClientConnectionLost = false; volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false; +volatile sig_atomic_t CatcacheClockTimeoutPending = false; volatile sig_atomic_t ConfigReloadPending = false; volatile uint32 InterruptHoldoffCount = 0; volatile uint32 QueryCancelHoldoffCount = 0; diff --git 
a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index a5ee209f91..9eb50e9676 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -72,6 +72,7 @@ static void ShutdownPostgres(int code, Datum arg); static void StatementTimeoutHandler(void); static void LockTimeoutHandler(void); static void IdleInTransactionSessionTimeoutHandler(void); +static void CatcacheClockTimeoutHandler(void); static bool ThereIsAtLeastOneRole(void); static void process_startup_options(Port *port, bool am_superuser); static void process_settings(Oid databaseid, Oid roleid); @@ -628,6 +629,8 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, RegisterTimeout(LOCK_TIMEOUT, LockTimeoutHandler); RegisterTimeout(IDLE_IN_TRANSACTION_SESSION_TIMEOUT, IdleInTransactionSessionTimeoutHandler); + RegisterTimeout(CATCACHE_CLOCK_TIMEOUT, + CatcacheClockTimeoutHandler); } /* @@ -1238,6 +1241,14 @@ IdleInTransactionSessionTimeoutHandler(void) SetLatch(MyLatch); } +static void +CatcacheClockTimeoutHandler(void) +{ + CatcacheClockTimeoutPending = true; + InterruptPending = true; + SetLatch(MyLatch); +} + /* * Returns true if at least one role is defined in this database cluster. 
*/ diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 41d477165c..c62d5ad8b8 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -81,6 +81,7 @@ #include "tsearch/ts_cache.h" #include "utils/builtins.h" #include "utils/bytea.h" +#include "utils/catcache.h" #include "utils/guc_tables.h" #include "utils/float.h" #include "utils/memutils.h" @@ -2205,6 +2206,38 @@ static struct config_int ConfigureNamesInt[] = NULL, NULL, NULL }, + { + {"catalog_cache_prune_min_age", PGC_USERSET, RESOURCES_MEM, + gettext_noop("Sets the minimum unused duration of cache entries before removal."), + gettext_noop("Catalog cache entries that live unused for longer than this seconds are considered to be removed."), + GUC_UNIT_S + }, + &catalog_cache_prune_min_age, + 300, -1, INT_MAX, + NULL, assign_catalog_cache_prune_min_age, NULL + }, + + { + {"catalog_cache_memory_target", PGC_USERSET, RESOURCES_MEM, + gettext_noop("Sets the minimum syscache size to keep."), + gettext_noop("Time-based cache pruning starts working after exceeding this size."), + GUC_UNIT_KB + }, + &catalog_cache_memory_target, + 0, 0, MAX_KILOBYTES, + NULL, NULL, NULL + }, + + { + {"catalog_cache_entry_limit", PGC_USERSET, RESOURCES_MEM, + gettext_noop("Sets the maximum entries of catcache."), + NULL + }, + &catalog_cache_entry_limit, + 0, 0, INT_MAX, + NULL, NULL, NULL + }, + /* * We use the hopefully-safely-small value of 100kB as the compiled-in * default for max_stack_depth. 
InitializeGUCOptions will increase it if @@ -3368,6 +3401,16 @@ static struct config_real ConfigureNamesReal[] = NULL, NULL, NULL }, + { + {"catalog_cache_prune_ratio", PGC_USERSET, RESOURCES_MEM, + gettext_noop("Reduce ratio of pruning caused by catalog_cache_entry_limit."), + NULL + }, + &catalog_cache_prune_ratio, + 0.8, 0.0, 1.0, + NULL, NULL, NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, 0.0, 0.0, 0.0, NULL, NULL, NULL diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index ad6c436f93..aeb5968e75 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -128,6 +128,8 @@ #work_mem = 4MB # min 64kB #maintenance_work_mem = 64MB # min 1MB #autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem +#catalog_cache_memory_target = 0kB # in kB +#catalog_cache_prune_min_age = 300s # -1 disables pruning #max_stack_depth = 2MB # min 100kB #shared_memory_type = mmap # the default is the first option # supported by the operating system: diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index c9e35003a5..33b800e80f 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -82,6 +82,7 @@ extern PGDLLIMPORT volatile sig_atomic_t InterruptPending; extern PGDLLIMPORT volatile sig_atomic_t QueryCancelPending; extern PGDLLIMPORT volatile sig_atomic_t ProcDiePending; extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending; +extern PGDLLIMPORT volatile sig_atomic_t CatcacheClockTimeoutPending; extern PGDLLIMPORT volatile sig_atomic_t ConfigReloadPending; extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost; diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h index 65d816a583..0425fc0786 100644 --- a/src/include/utils/catcache.h +++ b/src/include/utils/catcache.h @@ -22,6 +22,7 @@ #include "access/htup.h" #include "access/skey.h" +#include 
"datatype/timestamp.h" #include "lib/ilist.h" #include "utils/relcache.h" @@ -61,6 +62,10 @@ typedef struct catcache slist_node cc_next; /* list link */ ScanKeyData cc_skey[CATCACHE_MAXKEYS]; /* precomputed key info for heap * scans */ + dlist_head cc_lru_list; + int cc_memusage; /* memory usage of this catcache (excluding + * header part) */ + int cc_nfreeent; /* # of entries currently not referenced */ /* * Keep these at the end, so that compiling catcache.c with CATCACHE_STATS @@ -119,7 +124,10 @@ typedef struct catctup bool dead; /* dead but not yet removed? */ bool negative; /* negative cache entry? */ HeapTupleData tuple; /* tuple management header */ - + int naccess; /* # of access to this entry, up to 2 */ + TimestampTz lastaccess; /* approx. timestamp of the last usage */ + dlist_node lru_node; /* LRU node */ + int size; /* palloc'ed size off this tuple */ /* * The tuple may also be a member of at most one CatCList. (If a single * catcache is list-searched with varying numbers of keys, we may have to @@ -189,6 +197,45 @@ typedef struct catcacheheader /* this extern duplicates utils/memutils.h... */ extern PGDLLIMPORT MemoryContext CacheMemoryContext; +/* for guc.c, not PGDLLPMPORT'ed */ +extern int catalog_cache_prune_min_age; +extern int catalog_cache_memory_target; +extern int catalog_cache_entry_limit; +extern double catalog_cache_prune_ratio; + +/* to use as access timestamp of catcache entries */ +extern TimestampTz catcacheclock; + +/* + * Flag to keep track of whether catcache timestamp timer is active. + */ +extern bool catcache_clock_timeout_active; + +/* catcache prune time helper functions */ +extern void SetupCatCacheClockTimer(void); +extern void UpdateCatCacheClock(void); + +/* + * SetCatCacheClock - set timestamp for catcache access record and start + * maintenance timer if needed. We keep to update the clock even while pruning + * is disable so that we are not confused by bogus clock value. 
+ */ +static inline void +SetCatCacheClock(TimestampTz ts) +{ + catcacheclock = ts; + + if (!catcache_clock_timeout_active && catalog_cache_prune_min_age > 0) + SetupCatCacheClockTimer(); +} + +static inline TimestampTz +GetCatCacheClock(void) +{ + return catcacheclock; +} + +extern void assign_catalog_cache_prune_min_age(int newval, void *extra); extern void CreateCacheMemoryContext(void); extern CatCache *InitCatCache(int id, Oid reloid, Oid indexoid, diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h index 9244a2a7b7..b2d97b4f7b 100644 --- a/src/include/utils/timeout.h +++ b/src/include/utils/timeout.h @@ -31,6 +31,7 @@ typedef enum TimeoutId STANDBY_TIMEOUT, STANDBY_LOCK_TIMEOUT, IDLE_IN_TRANSACTION_SESSION_TIMEOUT, + CATCACHE_CLOCK_TIMEOUT, /* First user-definable timeout reason */ USER_TIMEOUT, /* Maximum number of timeout reasons */ -- 2.16.3 From ea9d43f623d093bc1276fd1d5480e5cff6097d60 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Tue, 12 Feb 2019 20:31:16 +0900 Subject: [PATCH 3/3] Syscache usage tracking feature Collects syscache usage statictics and show it using the view pg_stat_syscache. The feature is controlled by the GUC variable track_syscache_usage_interval. 
--- doc/src/sgml/config.sgml | 16 ++ src/backend/catalog/system_views.sql | 17 +++ src/backend/postmaster/pgstat.c | 201 ++++++++++++++++++++++++-- src/backend/tcop/postgres.c | 23 +++ src/backend/utils/adt/pgstatfuncs.c | 134 +++++++++++++++++ src/backend/utils/cache/catcache.c | 93 +++++++++--- src/backend/utils/cache/syscache.c | 24 +++ src/backend/utils/init/globals.c | 1 + src/backend/utils/init/postinit.c | 11 ++ src/backend/utils/misc/guc.c | 10 ++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 9 ++ src/include/miscadmin.h | 1 + src/include/pgstat.h | 6 +- src/include/utils/catcache.h | 9 +- src/include/utils/syscache.h | 19 +++ src/include/utils/timeout.h | 1 + src/test/regress/expected/rules.out | 24 ++- 18 files changed, 564 insertions(+), 36 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 71d784b6fe..2eceec1d94 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -6703,6 +6703,22 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; </listitem> </varlistentry> + <varlistentry id="guc-track-catalog-cache-usage-interval" xreflabel="track_catalog_cache_usage_interval"> + <term><varname>track_catalog_cache_usage_interval</varname> (<type>integer</type>) + <indexterm> + <primary><varname>track_catlog_cache_usage_interval</varname> + configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Specifies the interval to collect catalog cache usage statistics on + the session in milliseconds. This parameter is 0 by default, which + means disabled. Only superusers can change this setting. 
+ </para> + </listitem> + </varlistentry> + <varlistentry id="guc-track-io-timing" xreflabel="track_io_timing"> <term><varname>track_io_timing</varname> (<type>boolean</type>) <indexterm> diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 3e229c693c..f5d1aaf96f 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -906,6 +906,22 @@ CREATE VIEW pg_stat_progress_vacuum AS FROM pg_stat_get_progress_info('VACUUM') AS S LEFT JOIN pg_database D ON S.datid = D.oid; +CREATE VIEW pg_stat_syscache AS + SELECT + S.pid AS pid, + S.relid::regclass AS relname, + S.indid::regclass AS cache_name, + S.size AS size, + S.ntup AS ntuples, + S.searches AS searches, + S.hits AS hits, + S.neg_hits AS neg_hits, + S.ageclass AS ageclass, + S.last_update AS last_update + FROM pg_stat_activity A + JOIN LATERAL (SELECT A.pid, * FROM pg_get_syscache_stats(A.pid)) S + ON (A.pid = S.pid); + CREATE VIEW pg_user_mappings AS SELECT U.oid AS umid, @@ -1185,6 +1201,7 @@ GRANT EXECUTE ON FUNCTION pg_ls_waldir() TO pg_monitor; GRANT EXECUTE ON FUNCTION pg_ls_archive_statusdir() TO pg_monitor; GRANT EXECUTE ON FUNCTION pg_ls_tmpdir() TO pg_monitor; GRANT EXECUTE ON FUNCTION pg_ls_tmpdir(oid) TO pg_monitor; +GRANT EXECUTE ON FUNCTION pg_get_syscache_stats(int) TO pg_monitor; GRANT pg_read_all_settings TO pg_monitor; GRANT pg_read_all_stats TO pg_monitor; diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 81c6499251..8c4ab0aef9 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -66,6 +66,7 @@ #include "utils/ps_status.h" #include "utils/rel.h" #include "utils/snapmgr.h" +#include "utils/syscache.h" #include "utils/timestamp.h" @@ -124,6 +125,7 @@ bool pgstat_track_activities = false; bool pgstat_track_counts = false; int pgstat_track_functions = TRACK_FUNC_OFF; +int pgstat_track_syscache_usage_interval = 0; int pgstat_track_activity_query_size = 1024; 
/* ---------- @@ -236,6 +238,11 @@ typedef struct TwoPhasePgStatRecord bool t_truncated; /* was the relation truncated? */ } TwoPhasePgStatRecord; +/* bitmap symbols to specify target file types to remove */ +#define PGSTAT_REMFILE_DBSTAT 1 /* remove only database stats files */ +#define PGSTAT_REMFILE_SYSCACHE 2 /* remove only syscache stats files */ +#define PGSTAT_REMFILE_ALL 3 /* remove both types of files */ + /* * Info about current "snapshot" of stats file */ @@ -335,6 +342,7 @@ static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len); static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len); static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len); static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len); +static void pgstat_remove_syscache_statsfile(void); /* ------------------------------------------------------------ * Public functions called from postmaster follow @@ -630,10 +638,13 @@ startup_failed: } /* - * subroutine for pgstat_reset_all + * remove stats files + * + * clean up stats files in specified directory. target is one of + * PGSTAT_REMFILE_DBSTAT/SYSCACHE/ALL and restricts files to remove. */ static void -pgstat_reset_remove_files(const char *directory) +pgstat_reset_remove_files(const char *directory, int target) { DIR *dir; struct dirent *entry; @@ -644,25 +655,39 @@ pgstat_reset_remove_files(const char *directory) { int nchars; Oid tmp_oid; + int filetype = 0; /* * Skip directory entries that don't match the file names we write. * See get_dbstat_filename for the database-specific pattern. 
*/ if (strncmp(entry->d_name, "global.", 7) == 0) + { + filetype = PGSTAT_REMFILE_DBSTAT; nchars = 7; + } else { + char head[2]; + nchars = 0; - (void) sscanf(entry->d_name, "db_%u.%n", - &tmp_oid, &nchars); - if (nchars <= 0) - continue; + (void) sscanf(entry->d_name, "%c%c_%u.%n", + head, head + 1, &tmp_oid, &nchars); + /* %u allows leading whitespace, so reject that */ - if (strchr("0123456789", entry->d_name[3]) == NULL) + if (nchars < 3 || !isdigit(entry->d_name[3])) continue; + + if (strncmp(head, "db", 2) == 0) + filetype = PGSTAT_REMFILE_DBSTAT; + else if (strncmp(head, "cc", 2) == 0) + filetype = PGSTAT_REMFILE_SYSCACHE; } + /* skip if this is not a target */ + if ((filetype & target) == 0) + continue; + if (strcmp(entry->d_name + nchars, "tmp") != 0 && strcmp(entry->d_name + nchars, "stat") != 0) continue; @@ -683,8 +708,9 @@ pgstat_reset_remove_files(const char *directory) void pgstat_reset_all(void) { - pgstat_reset_remove_files(pgstat_stat_directory); - pgstat_reset_remove_files(PGSTAT_STAT_PERMANENT_DIRECTORY); + pgstat_reset_remove_files(pgstat_stat_directory, PGSTAT_REMFILE_ALL); + pgstat_reset_remove_files(PGSTAT_STAT_PERMANENT_DIRECTORY, + PGSTAT_REMFILE_ALL); } #ifdef EXEC_BACKEND @@ -2963,6 +2989,10 @@ pgstat_beshutdown_hook(int code, Datum arg) if (OidIsValid(MyDatabaseId)) pgstat_report_stat(true); + /* clear syscache statistics files and temporary settings */ + if (MyBackendId != InvalidBackendId) + pgstat_remove_syscache_statsfile(); + /* * Clear my status entry, following the protocol of bumping st_changecount * before and after. 
We use a volatile pointer here to ensure the @@ -4287,6 +4317,9 @@ PgstatCollectorMain(int argc, char *argv[]) pgStatRunningInCollector = true; pgStatDBHash = pgstat_read_statsfiles(InvalidOid, true, true); + /* Remove left-over syscache stats files */ + pgstat_reset_remove_files(pgstat_stat_directory, PGSTAT_REMFILE_SYSCACHE); + /* * Loop to process messages until we get SIGQUIT or detect ungraceful * death of our parent postmaster. @@ -6377,3 +6410,153 @@ pgstat_clip_activity(const char *raw_activity) return activity; } + +/* + * return the filename for a syscache stat file; filename is the output + * buffer, of length len. + */ +void +pgstat_get_syscachestat_filename(bool permanent, bool tempname, int backendid, + char *filename, int len) +{ + int printed; + + /* NB -- pgstat_reset_remove_files knows about the pattern this uses */ + printed = snprintf(filename, len, "%s/cc_%u.%s", + permanent ? PGSTAT_STAT_PERMANENT_DIRECTORY : + pgstat_stat_directory, + backendid, + tempname ? "tmp" : "stat"); + if (printed >= len) + elog(ERROR, "overlength pgstat path"); +} + +/* removes syscache stats files of this backend */ +static void +pgstat_remove_syscache_statsfile(void) +{ + char fname[MAXPGPATH]; + + pgstat_get_syscachestat_filename(false, false, MyBackendId, + fname, MAXPGPATH); + unlink(fname); /* don't care of the result */ +} + +/* + * pgstat_write_syscache_stats() - + * Write the syscache statistics files. + * + * If 'force' is false, this function skips writing a file and returns the + * time remaining in the current interval in milliseconds. If 'force' is true, + * writes a file regardless of the remaining time and reset the interval. 
+ */ +long +pgstat_write_syscache_stats(bool force) +{ + static TimestampTz last_report = 0; + TimestampTz now; + long elapsed; + long secs; + int usecs; + int cacheId; + FILE *fpout; + char statfile[MAXPGPATH]; + char tmpfile[MAXPGPATH]; + + /* Return if we don't want it */ + if (!force && pgstat_track_syscache_usage_interval <= 0) + { + /* disabled. remove the statistics file if any */ + if (last_report > 0) + { + last_report = 0; + pgstat_remove_syscache_statsfile(); + } + return 0; + } + + /* Check against the interval */ + now = GetCurrentTransactionStopTimestamp(); + TimestampDifference(last_report, now, &secs, &usecs); + elapsed = secs * 1000 + usecs / 1000; + + if (!force && elapsed < pgstat_track_syscache_usage_interval) + { + /* not yet the time, inform the remaining time to the caller */ + return pgstat_track_syscache_usage_interval - elapsed; + } + + /* now update the stats */ + last_report = now; + + pgstat_get_syscachestat_filename(false, true, + MyBackendId, tmpfile, MAXPGPATH); + pgstat_get_syscachestat_filename(false, false, + MyBackendId, statfile, MAXPGPATH); + + /* + * This function can be called from ProcessInterrupts(). Inhibit recursive + * interrupts to avoid recursive entry. + */ + HOLD_INTERRUPTS(); + + fpout = AllocateFile(tmpfile, PG_BINARY_W); + if (fpout == NULL) + { + ereport(LOG, + (errcode_for_file_access(), + errmsg("could not open temporary statistics file \"%s\": %m", + tmpfile))); + /* + * Failure writing this file is not critical. Just skip this time and + * tell caller to wait for the next interval. 
+ */ + RESUME_INTERRUPTS(); + return pgstat_track_syscache_usage_interval; + } + + /* write out every catcache stats */ + for (cacheId = 0 ; cacheId < SysCacheSize ; cacheId++) + { + SysCacheStats *stats; + + stats = SysCacheGetStats(cacheId); + Assert (stats); + + /* write error is checked later using ferror() */ + fputc('T', fpout); + (void)fwrite(&cacheId, sizeof(int), 1, fpout); + (void)fwrite(&last_report, sizeof(TimestampTz), 1, fpout); + (void)fwrite(stats, sizeof(*stats), 1, fpout); + } + fputc('E', fpout); + + if (ferror(fpout)) + { + ereport(LOG, + (errcode_for_file_access(), + errmsg("could not write syscache statistics file \"%s\": %m", + tmpfile))); + FreeFile(fpout); + unlink(tmpfile); + } + else if (FreeFile(fpout) < 0) + { + ereport(LOG, + (errcode_for_file_access(), + errmsg("could not close syscache statistics file \"%s\": %m", + tmpfile))); + unlink(tmpfile); + } + else if (rename(tmpfile, statfile) < 0) + { + ereport(LOG, + (errcode_for_file_access(), + errmsg("could not rename syscache statistics file \"%s\" to \"%s\": %m", + tmpfile, statfile))); + unlink(tmpfile); + } + + RESUME_INTERRUPTS(); + return 0; +} diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index f192ee2ca6..d0afee189f 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -3159,6 +3159,12 @@ ProcessInterrupts(void) } + if (IdleSyscacheStatsUpdateTimeoutPending) + { + IdleSyscacheStatsUpdateTimeoutPending = false; + pgstat_write_syscache_stats(true); + } + if (ParallelMessagePending) HandleParallelMessages(); @@ -3743,6 +3749,7 @@ PostgresMain(int argc, char *argv[], sigjmp_buf local_sigjmp_buf; volatile bool send_ready_for_query = true; bool disable_idle_in_transaction_timeout = false; + bool disable_idle_syscache_update_timeout = false; /* Initialize startup process environment if necessary. 
*/ if (!IsUnderPostmaster) @@ -4186,9 +4193,19 @@ PostgresMain(int argc, char *argv[], } else { + long timeout; + ProcessCompletedNotifies(); pgstat_report_stat(false); + timeout = pgstat_write_syscache_stats(false); + + if (timeout > 0) + { + disable_idle_syscache_update_timeout = true; + enable_timeout_after(IDLE_SYSCACHE_STATS_UPDATE_TIMEOUT, + timeout); + } set_ps_display("idle", false); pgstat_report_activity(STATE_IDLE, NULL); } @@ -4231,6 +4248,12 @@ PostgresMain(int argc, char *argv[], disable_idle_in_transaction_timeout = false; } + if (disable_idle_syscache_update_timeout) + { + disable_timeout(IDLE_SYSCACHE_STATS_UPDATE_TIMEOUT, false); + disable_idle_syscache_update_timeout = false; + } + /* * (6) check for any other interesting events that happened while we * slept. diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index b6ba856ebe..a314f431c6 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -14,6 +14,8 @@ */ #include "postgres.h" +#include <sys/stat.h> + #include "access/htup_details.h" #include "catalog/pg_authid.h" #include "catalog/pg_type.h" @@ -28,6 +30,7 @@ #include "utils/acl.h" #include "utils/builtins.h" #include "utils/inet.h" +#include "utils/syscache.h" #include "utils/timestamp.h" #define UINT32_ACCESS_ONCE(var) ((uint32)(*((volatile uint32 *)&(var)))) @@ -1899,3 +1902,134 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS) PG_RETURN_DATUM(HeapTupleGetDatum( heap_form_tuple(tupdesc, values, nulls))); } + +Datum +pgstat_get_syscache_stats(PG_FUNCTION_ARGS) +{ +#define PG_GET_SYSCACHE_SIZE 9 + int pid = PG_GETARG_INT32(0); + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + TupleDesc tupdesc; + Tuplestorestate *tupstore; + MemoryContext per_query_ctx; + MemoryContext oldcontext; + PgBackendStatus *beentry; + int beid; + char fname[MAXPGPATH]; + FILE *fpin; + char c; + + if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo)) + ereport(ERROR, + 
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("set-valued function called in context that cannot accept a set"))); + if (!(rsinfo->allowedModes & SFRM_Materialize)) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("materialize mode required, but it is not " \ + "allowed in this context"))); + + /* Build a tuple descriptor for our result type */ + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) + elog(ERROR, "return type must be a row type"); + + + per_query_ctx = rsinfo->econtext->ecxt_per_query_memory; + + oldcontext = MemoryContextSwitchTo(per_query_ctx); + tupstore = tuplestore_begin_heap(true, false, work_mem); + rsinfo->returnMode = SFRM_Materialize; + rsinfo->setResult = tupstore; + rsinfo->setDesc = tupdesc; + + MemoryContextSwitchTo(oldcontext); + + /* find beentry for given pid*/ + beentry = NULL; + for (beid = 1; + (beentry = pgstat_fetch_stat_beentry(beid)) && + beentry->st_procpid != pid ; + beid++); + + /* + * we silently return empty result on failure or insufficient privileges + */ + if (!beentry || + (!has_privs_of_role(GetUserId(), beentry->st_userid) && + !is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_ALL_STATS))) + goto no_data; + + pgstat_get_syscachestat_filename(false, false, beid, fname, MAXPGPATH); + + if ((fpin = AllocateFile(fname, PG_BINARY_R)) == NULL) + { + if (errno != ENOENT) + ereport(WARNING, + (errcode_for_file_access(), + errmsg("could not open statistics file \"%s\": %m", + fname))); + /* also return empty on no statistics file */ + goto no_data; + } + + /* read the statistics file into tuplestore */ + while ((c = fgetc(fpin)) == 'T') + { + TimestampTz last_update; + SysCacheStats stats; + int cacheid; + Datum values[PG_GET_SYSCACHE_SIZE]; + bool nulls[PG_GET_SYSCACHE_SIZE] = {0}; + Datum datums[SYSCACHE_STATS_NAGECLASSES * 2]; + bool arrnulls[SYSCACHE_STATS_NAGECLASSES * 2] = {0}; + int dims[] = {SYSCACHE_STATS_NAGECLASSES, 2}; + int lbs[] = {1, 1}; + ArrayType *arr; + int i, j; 
+ + if (fread(&cacheid, sizeof(int), 1, fpin) != 1 || + fread(&last_update, sizeof(TimestampTz), 1, fpin) != 1 || + fread(&stats, 1, sizeof(stats), fpin) != sizeof(stats)) + { + ereport(WARNING, + (errmsg("corrupted syscache statistics file \"%s\"", + fname))); + goto no_data; + } + + i = 0; + values[i++] = ObjectIdGetDatum(stats.reloid); + values[i++] = ObjectIdGetDatum(stats.indoid); + values[i++] = Int64GetDatum(stats.size); + values[i++] = Int64GetDatum(stats.ntuples); + values[i++] = Int64GetDatum(stats.nsearches); + values[i++] = Int64GetDatum(stats.nhits); + values[i++] = Int64GetDatum(stats.nneg_hits); + + for (j = 0 ; j < SYSCACHE_STATS_NAGECLASSES ; j++) + { + datums[j * 2] = Int32GetDatum((int32) stats.ageclasses[j]); + datums[j * 2 + 1] = Int32GetDatum((int32) stats.nclass_entries[j]); + } + + arr = construct_md_array(datums, arrnulls, 2, dims, lbs, + INT4OID, sizeof(int32), true, 'i'); + values[i++] = PointerGetDatum(arr); + + values[i++] = TimestampTzGetDatum(last_update); + + Assert (i == PG_GET_SYSCACHE_SIZE); + + tuplestore_putvalues(tupstore, tupdesc, values, nulls); + } + + /* check for the end of file. abandon the result if file is broken */ + if (c != 'E' || fgetc(fpin) != EOF) + tuplestore_clear(tupstore); + + FreeFile(fpin); + +no_data: + tuplestore_donestoring(tupstore); + return (Datum) 0; +} diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 0195e19976..fd84e35a6a 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -109,6 +109,10 @@ static CatCacheHeader *CacheHdr = NULL; /* Clock used to record the last accessed time of a catcache record. 
*/ TimestampTz catcacheclock = 0; +/* age classes for pruning */ +static double ageclass[SYSCACHE_STATS_NAGECLASSES] + = {0.05, 0.1, 1.0, 2.0, 3.0, 0.0}; + static inline HeapTuple SearchCatCacheInternal(CatCache *cache, int nkeys, Datum v1, Datum v2, @@ -640,9 +644,7 @@ CatCacheInvalidate(CatCache *cache, uint32 hashValue) else CatCacheRemoveCTup(cache, ct); CACHE1_elog(DEBUG2, "CatCacheInvalidate: invalidated"); -#ifdef CATCACHE_STATS cache->cc_invals++; -#endif /* could be multiple matches, so keep looking! */ } } @@ -718,9 +720,7 @@ ResetCatalogCache(CatCache *cache) } else CatCacheRemoveCTup(cache, ct); -#ifdef CATCACHE_STATS cache->cc_invals++; -#endif } } } @@ -1032,10 +1032,10 @@ CatCacheCleanupOldEntries(CatCache *cp) int us; /* - * Calculate the duration from the time of the last access to the - * "current" time. Since catcacheclock is not advanced within a - * transaction, the entries that are accessed within the current - * transaction won't be pruned. + * Calculate the duration from the time from the last access to + * the "current" time. Since catcacheclock is not advanced within + * a transaction, the entries that are accessed within the current + * transaction always get 0 as the result. 
*/ TimestampDifference(ct->lastaccess, catcacheclock, &entry_age, &us); @@ -1463,9 +1463,7 @@ SearchCatCacheInternal(CatCache *cache, if (unlikely(cache->cc_tupdesc == NULL)) CatalogCacheInitializeCache(cache); -#ifdef CATCACHE_STATS cache->cc_searches++; -#endif /* Initialize local parameter array */ arguments[0] = v1; @@ -1535,9 +1533,7 @@ SearchCatCacheInternal(CatCache *cache, CACHE3_elog(DEBUG2, "SearchCatCache(%s): found in bucket %d", cache->cc_relname, hashIndex); -#ifdef CATCACHE_STATS cache->cc_hits++; -#endif return &ct->tuple; } @@ -1546,9 +1542,7 @@ SearchCatCacheInternal(CatCache *cache, CACHE3_elog(DEBUG2, "SearchCatCache(%s): found neg entry in bucket %d", cache->cc_relname, hashIndex); -#ifdef CATCACHE_STATS cache->cc_neg_hits++; -#endif return NULL; } @@ -1676,9 +1670,7 @@ SearchCatCacheMiss(CatCache *cache, CACHE3_elog(DEBUG2, "SearchCatCache(%s): put in bucket %d", cache->cc_relname, hashIndex); -#ifdef CATCACHE_STATS cache->cc_newloads++; -#endif return &ct->tuple; } @@ -1789,9 +1781,7 @@ SearchCatCacheList(CatCache *cache, Assert(nkeys > 0 && nkeys < cache->cc_nkeys); -#ifdef CATCACHE_STATS cache->cc_lsearches++; -#endif /* Initialize local parameter array */ arguments[0] = v1; @@ -1848,9 +1838,7 @@ SearchCatCacheList(CatCache *cache, CACHE2_elog(DEBUG2, "SearchCatCacheList(%s): found list", cache->cc_relname); -#ifdef CATCACHE_STATS cache->cc_lhits++; -#endif return cl; } @@ -2373,3 +2361,68 @@ PrintCatCacheListLeakWarning(CatCList *list) list->my_cache->cc_relname, list->my_cache->id, list, list->refcount); } + +/* + * CatCacheGetStats - fill in SysCacheStats struct. + * + * This is a support routine for SysCacheGetStats, substantially fills in the + * result. The classification here is based on the same criteria to + * CatCacheCleanupOldEntries(). 
+ */ +void +CatCacheGetStats(CatCache *cache, SysCacheStats *stats) +{ + int i, j; + + Assert(ageclass[SYSCACHE_STATS_NAGECLASSES - 1] == 0.0); + + /* fill in the stats struct */ + stats->size = cache->cc_memusage; + stats->ntuples = cache->cc_ntup; + stats->nsearches = cache->cc_searches; + stats->nhits = cache->cc_hits; + stats->nneg_hits = cache->cc_neg_hits; + + /* + * catalog_cache_prune_min_age can be changed on-session, fill it every + * time + */ + for (i = 0 ; i < SYSCACHE_STATS_NAGECLASSES ; i++) + stats->ageclasses[i] = + (int) (catalog_cache_prune_min_age * ageclass[i]); + + /* + * nth element in nclass_entries stores the number of cache entries that + * have lived unaccessed for corresponding multiple in ageclass of + * catalog_cache_prune_min_age. + */ + memset(stats->nclass_entries, 0, sizeof(int) * SYSCACHE_STATS_NAGECLASSES); + + /* Scan the whole hash */ + for (i = 0; i < cache->cc_nbuckets; i++) + { + dlist_mutable_iter iter; + + dlist_foreach_modify(iter, &cache->cc_bucket[i]) + { + CatCTup *ct = dlist_container(CatCTup, cache_elem, iter.cur); + long entry_age; + int us; + + /* + * Calculate the duration from the time from the last access to + * the "current" time. Since catcacheclock is not advanced within + * a transaction, the entries that are accessed within the current + * transaction won't be pruned. 
+ */ + TimestampDifference(ct->lastaccess, catcacheclock, &entry_age, &us); + + j = 0; + while (j < SYSCACHE_STATS_NAGECLASSES - 1 && + entry_age > stats->ageclasses[j]) + j++; + + stats->nclass_entries[j]++; + } + } +} diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c index ac98c19155..7b38a06708 100644 --- a/src/backend/utils/cache/syscache.c +++ b/src/backend/utils/cache/syscache.c @@ -20,6 +20,9 @@ */ #include "postgres.h" +#include <sys/stat.h> +#include <unistd.h> + #include "access/htup_details.h" #include "access/sysattr.h" #include "catalog/indexing.h" @@ -1534,6 +1537,27 @@ RelationSupportsSysCache(Oid relid) return false; } +/* + * SysCacheGetStats - returns stats of specified syscache + * + * This routine returns the address of its local static memory. + */ +SysCacheStats * +SysCacheGetStats(int cacheId) +{ + static SysCacheStats stats; + + Assert(cacheId >=0 && cacheId < SysCacheSize); + + memset(&stats, 0, sizeof(stats)); + + stats.reloid = cacheinfo[cacheId].reloid; + stats.indoid = cacheinfo[cacheId].indoid; + + CatCacheGetStats(SysCache[cacheId], &stats); + + return &stats; +} /* * OID comparator for pg_qsort diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c index 0e8b972a29..b7c647b5e0 100644 --- a/src/backend/utils/init/globals.c +++ b/src/backend/utils/init/globals.c @@ -33,6 +33,7 @@ volatile sig_atomic_t ProcDiePending = false; volatile sig_atomic_t ClientConnectionLost = false; volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false; volatile sig_atomic_t CatcacheClockTimeoutPending = false; +volatile sig_atomic_t IdleSyscacheStatsUpdateTimeoutPending = false; volatile sig_atomic_t ConfigReloadPending = false; volatile uint32 InterruptHoldoffCount = 0; volatile uint32 QueryCancelHoldoffCount = 0; diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 9eb50e9676..2f3251e8d5 100644 --- a/src/backend/utils/init/postinit.c +++ 
b/src/backend/utils/init/postinit.c @@ -73,6 +73,7 @@ static void StatementTimeoutHandler(void); static void LockTimeoutHandler(void); static void IdleInTransactionSessionTimeoutHandler(void); static void CatcacheClockTimeoutHandler(void); +static void IdleSyscacheStatsUpdateTimeoutHandler(void); static bool ThereIsAtLeastOneRole(void); static void process_startup_options(Port *port, bool am_superuser); static void process_settings(Oid databaseid, Oid roleid); @@ -631,6 +632,8 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, IdleInTransactionSessionTimeoutHandler); RegisterTimeout(CATCACHE_CLOCK_TIMEOUT, CatcacheClockTimeoutHandler); + RegisterTimeout(IDLE_SYSCACHE_STATS_UPDATE_TIMEOUT, + IdleSyscacheStatsUpdateTimeoutHandler); } /* @@ -1249,6 +1252,14 @@ CatcacheClockTimeoutHandler(void) SetLatch(MyLatch); } +static void +IdleSyscacheStatsUpdateTimeoutHandler(void) +{ + IdleSyscacheStatsUpdateTimeoutPending = true; + InterruptPending = true; + SetLatch(MyLatch); +} + /* * Returns true if at least one role is defined in this database cluster. */ diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index c62d5ad8b8..7f1670fa5b 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -3178,6 +3178,16 @@ static struct config_int ConfigureNamesInt[] = NULL, NULL, NULL }, + { + {"track_catalog_cache_usage_interval", PGC_SUSET, STATS_COLLECTOR, + gettext_noop("Sets the interval between syscache usage collection, in milliseconds. 
Zero disables syscache usage tracking."), + NULL + }, + &pgstat_track_syscache_usage_interval, + 0, 0, INT_MAX / 2, + NULL, NULL, NULL + }, + { {"gin_pending_list_limit", PGC_USERSET, CLIENT_CONN_STATEMENT, gettext_noop("Sets the maximum size of the pending list for GIN index."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index aeb5968e75..797f52fa2a 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -556,6 +556,7 @@ #track_io_timing = off #track_functions = none # none, pl, all #track_activity_query_size = 1024 # (change requires restart) +#track_catalog_cache_usage_interval = 0 # zero disables tracking #stats_temp_directory = 'pg_stat_tmp' diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 24f99f7fc4..fc35b6be47 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -9689,6 +9689,15 @@ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o}', proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn}', prosrc => 'pg_get_replication_slots' }, +{ oid => '3425', + descr => 'syscache statistics', + proname => 'pg_get_syscache_stats', prorows => '100', proisstrict => 'f', + proretset => 't', provolatile => 'v', prorettype => 'record', + proargtypes => 'int4', + proallargtypes => '{int4,oid,oid,int8,int8,int8,int8,int8,_int4,timestamptz}', + proargmodes => '{i,o,o,o,o,o,o,o,o,o}', + proargnames => '{pid,relid,indid,size,ntup,searches,hits,neg_hits,ageclass,last_update}', + prosrc => 'pgstat_get_syscache_stats' }, { oid => '3786', descr => 'set up a logical replication slot', proname => 'pg_create_logical_replication_slot', provolatile => 'v', proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool', diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 33b800e80f..767c94a63c 100644 --- 
a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -83,6 +83,7 @@ extern PGDLLIMPORT volatile sig_atomic_t QueryCancelPending; extern PGDLLIMPORT volatile sig_atomic_t ProcDiePending; extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending; extern PGDLLIMPORT volatile sig_atomic_t CatcacheClockTimeoutPending; +extern PGDLLIMPORT volatile sig_atomic_t IdleSyscacheStatsUpdateTimeoutPending; extern PGDLLIMPORT volatile sig_atomic_t ConfigReloadPending; extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost; diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 88a75fb798..b6bfd7d644 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -1144,6 +1144,7 @@ extern bool pgstat_track_activities; extern bool pgstat_track_counts; extern int pgstat_track_functions; extern PGDLLIMPORT int pgstat_track_activity_query_size; +extern int pgstat_track_syscache_usage_interval; extern char *pgstat_stat_directory; extern char *pgstat_stat_tmpname; extern char *pgstat_stat_filename; @@ -1228,7 +1229,8 @@ extern PgStat_BackendFunctionEntry *find_funcstat_entry(Oid func_id); extern void pgstat_initstats(Relation rel); extern char *pgstat_clip_activity(const char *raw_activity); - +extern void pgstat_get_syscachestat_filename(bool permanent, + bool tempname, int backendid, char *filename, int len); /* ---------- * pgstat_report_wait_start() - * @@ -1363,5 +1365,5 @@ extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid); extern int pgstat_fetch_stat_numbackends(void); extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void); extern PgStat_GlobalStats *pgstat_fetch_global(void); - +extern long pgstat_write_syscache_stats(bool force); #endif /* PGSTAT_H */ diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h index 0425fc0786..8e477090e2 100644 --- a/src/include/utils/catcache.h +++ b/src/include/utils/catcache.h @@ -68,10 +68,8 @@ typedef struct catcache int cc_nfreeent; /* # of entries 
currently not referenced */ /* - * Keep these at the end, so that compiling catcache.c with CATCACHE_STATS - * doesn't break ABI for other modules + * Statistics entries */ -#ifdef CATCACHE_STATS long cc_searches; /* total # searches against this cache */ long cc_hits; /* # of matches against existing entry */ long cc_neg_hits; /* # of matches against negative entry */ @@ -84,7 +82,6 @@ typedef struct catcache long cc_invals; /* # of entries invalidated from cache */ long cc_lsearches; /* total # list-searches */ long cc_lhits; /* # of matches against existing lists */ -#endif } CatCache; @@ -275,4 +272,8 @@ extern void PrepareToInvalidateCacheTuple(Relation relation, extern void PrintCatCacheLeakWarning(HeapTuple tuple); extern void PrintCatCacheListLeakWarning(CatCList *list); +/* defined in syscache.h */ +typedef struct syscachestats SysCacheStats; +extern void CatCacheGetStats(CatCache *cache, SysCacheStats *syscachestats); + #endif /* CATCACHE_H */ diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h index 95ee48954e..71b399c902 100644 --- a/src/include/utils/syscache.h +++ b/src/include/utils/syscache.h @@ -112,6 +112,24 @@ enum SysCacheIdentifier #define SysCacheSize (USERMAPPINGUSERSERVER + 1) }; +#define SYSCACHE_STATS_NAGECLASSES 6 +/* Struct for catcache tracking information */ +typedef struct syscachestats +{ + Oid reloid; /* target relation */ + Oid indoid; /* index */ + size_t size; /* size of the catcache */ + int ntuples; /* number of tuples resides in the catcache */ + int nsearches; /* number of searches */ + int nhits; /* number of cache hits */ + int nneg_hits; /* number of negative cache hits */ + /* age classes in seconds */ + int ageclasses[SYSCACHE_STATS_NAGECLASSES]; + /* number of tuples fall into the corresponding age class */ + int nclass_entries[SYSCACHE_STATS_NAGECLASSES]; +} SysCacheStats; + + extern void InitCatalogCache(void); extern void InitCatalogCachePhase2(void); @@ -164,6 +182,7 @@ extern void 
SysCacheInvalidate(int cacheId, uint32 hashValue); extern bool RelationInvalidatesSnapshotsOnly(Oid relid); extern bool RelationHasSysCache(Oid relid); extern bool RelationSupportsSysCache(Oid relid); +extern SysCacheStats *SysCacheGetStats(int cacheId); /* * The use of the macros below rather than direct calls to the corresponding diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h index b2d97b4f7b..0677978923 100644 --- a/src/include/utils/timeout.h +++ b/src/include/utils/timeout.h @@ -32,6 +32,7 @@ typedef enum TimeoutId STANDBY_LOCK_TIMEOUT, IDLE_IN_TRANSACTION_SESSION_TIMEOUT, CATCACHE_CLOCK_TIMEOUT, + IDLE_SYSCACHE_STATS_UPDATE_TIMEOUT, /* First user-definable timeout reason */ USER_TIMEOUT, /* Maximum number of timeout reasons */ diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 2c8e21baa7..7bd77e9972 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1921,6 +1921,28 @@ pg_stat_sys_tables| SELECT pg_stat_all_tables.relid, pg_stat_all_tables.autoanalyze_count FROM pg_stat_all_tables WHERE ((pg_stat_all_tables.schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (pg_stat_all_tables.schemaname~ '^pg_toast'::text)); +pg_stat_syscache| SELECT s.pid, + (s.relid)::regclass AS relname, + (s.indid)::regclass AS cache_name, + s.size, + s.ntup AS ntuples, + s.searches, + s.hits, + s.neg_hits, + s.ageclass, + s.last_update + FROM (pg_stat_activity a + JOIN LATERAL ( SELECT a.pid, + pg_get_syscache_stats.relid, + pg_get_syscache_stats.indid, + pg_get_syscache_stats.size, + pg_get_syscache_stats.ntup, + pg_get_syscache_stats.searches, + pg_get_syscache_stats.hits, + pg_get_syscache_stats.neg_hits, + pg_get_syscache_stats.ageclass, + pg_get_syscache_stats.last_update + FROM pg_get_syscache_stats(a.pid) pg_get_syscache_stats(relid, indid, size, ntup, searches, hits, neg_hits, ageclass,last_update)) s ON ((a.pid = s.pid))); 
pg_stat_user_functions| SELECT p.oid AS funcid, n.nspname AS schemaname, p.proname AS funcname, @@ -2352,7 +2374,7 @@ pg_settings|pg_settings_n|CREATE RULE pg_settings_n AS ON UPDATE TO pg_catalog.pg_settings DO INSTEAD NOTHING; pg_settings|pg_settings_u|CREATE RULE pg_settings_u AS ON UPDATE TO pg_catalog.pg_settings - WHERE (new.name = old.name) DO SELECT set_config(old.name, new.setting, false) AS set_config; + WHERE (new.name = old.name) DO SELECT set_config(old.name, new.setting, false, false) AS set_config; rtest_emp|rtest_emp_del|CREATE RULE rtest_emp_del AS ON DELETE TO public.rtest_emp DO INSERT INTO rtest_emplog (ename, who, action, newsal, oldsal) VALUES (old.ename, CURRENT_USER, 'fired'::bpchar, '$0.00'::money, old.salary); -- 2.16.3
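For readers trying to interpret the ageclass column, the bucketing that CatCacheGetStats() performs in the patch above can be tried outside the backend. The following is only a rough standalone sketch, not part of the patch: the multipliers and the class count are copied from the patch, while the helper names fill_thresholds() and classify_entry_age() are made up for illustration, and the prune age is passed as a plain argument instead of being read from catalog_cache_prune_min_age.

```c
#include <assert.h>

/* mirrors SYSCACHE_STATS_NAGECLASSES in the patch */
#define NAGECLASSES 6

/* multipliers copied from the patch; the trailing 0.0 marks the catch-all class */
static const double ageclass_mult[NAGECLASSES] = {0.05, 0.1, 1.0, 2.0, 3.0, 0.0};

/*
 * Hypothetical helper: compute the per-class thresholds (in seconds) for a
 * given prune age, as the patch does from catalog_cache_prune_min_age.
 */
static void
fill_thresholds(int prune_min_age, int thresholds[NAGECLASSES])
{
	int			i;

	assert(ageclass_mult[NAGECLASSES - 1] == 0.0);

	for (i = 0; i < NAGECLASSES; i++)
		thresholds[i] = (int) (prune_min_age * ageclass_mult[i]);
}

/*
 * Hypothetical helper: return the index of the class that an entry unused
 * for entry_age seconds falls into.  The last class is never compared, so
 * it collects everything older than the largest real threshold.
 */
static int
classify_entry_age(long entry_age, const int thresholds[NAGECLASSES])
{
	int			j = 0;

	while (j < NAGECLASSES - 1 && entry_age > thresholds[j])
		j++;

	return j;
}
```

With a prune age of 300 seconds this gives thresholds of 15, 30, 300, 600 and 900 seconds plus the catch-all. An entry whose age equals a threshold stays in that threshold's class, because the comparison is a strict ">".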