Re: Enhancing Memory Context Statistics Reporting - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Enhancing Memory Context Statistics Reporting |
Date | |
Msg-id | 5bxhxniyvjyfldi7yjxcnxkl3i2ghci2grjyeclrbkfqnyhowk@dfkqzbvtbpml Whole thread Raw |
In response to | Re: Enhancing Memory Context Statistics Reporting (Daniel Gustafsson <daniel@yesql.se>) |
Responses |
Re: Enhancing Memory Context Statistics Reporting
Re: Enhancing Memory Context Statistics Reporting |
List | pgsql-hackers |
Hi, On 2025-04-07 15:41:37 +0200, Daniel Gustafsson wrote: > I think this function can be a valuable debugging aid going forward. What I am most excited about for this is to be able to measure server-wide and fleet-wide memory usage over time. Today I have actually very little idea about what memory is being used for across all connections, not to speak of a larger number of servers. > diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c > index 4f6795f7265..d3b4df27935 100644 > --- a/src/backend/postmaster/auxprocess.c > +++ b/src/backend/postmaster/auxprocess.c > @@ -84,6 +84,13 @@ AuxiliaryProcessMainCommon(void) > /* register a before-shutdown callback for LWLock cleanup */ > before_shmem_exit(ShutdownAuxiliaryProcess, 0); > > + /* > + * The before shmem exit callback frees the DSA memory occupied by the > + * latest memory context statistics that could be published by this aux > + * proc if requested. > + */ > + before_shmem_exit(AtProcExit_memstats_dsa_free, 0); > + > SetProcessingMode(NormalProcessing); > } How about putting it into BaseInit()? Or maybe just register it when its first used? > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c > index fda91ffd1ce..d3cb3f1891c 100644 > --- a/src/backend/postmaster/checkpointer.c > +++ b/src/backend/postmaster/checkpointer.c > @@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void) > /* Perform logging of memory contexts of this process */ > if (LogMemoryContextPending) > ProcessLogMemoryContextInterrupt(); > + > + /* Publish memory contexts of this process */ > + if (PublishMemoryContextPending) > + ProcessGetMemoryContextInterrupt(); > } > > /* Not this patch's responsibility, but we really desperately need to unify our interrupt handling. Manually keeping a ~dozen of functions similar, but not exactly the same, is an insane approach. > --- a/src/backend/utils/activity/wait_event_names.txt > +++ b/src/backend/utils/activity/wait_event_names.txt > @@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit." > WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication." > WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated." > XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end." > +MEM_CTX_PUBLISH "Waiting for a process to publish memory information." The memory context stuff abbreviates as cxt not ctx. There's a few more cases of that in the patch. > +const char * > +ContextTypeToString(NodeTag type) > +{ > + const char *context_type; > + > + switch (type) > + { > + case T_AllocSetContext: > + context_type = "AllocSet"; > + break; > + case T_GenerationContext: > + context_type = "Generation"; > + break; > + case T_SlabContext: > + context_type = "Slab"; > + break; > + case T_BumpContext: > + context_type = "Bump"; > + break; > + default: > + context_type = "???"; > + break; > + } > + return (context_type); Why these parens? > + * If the publishing backend does not respond before the condition variable > + * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is > + * time left within the timeout specified by the user, before giving up and > + * returning previously published statistics, if any. If no previous statistics > + * exist, return NULL. Why do we need to repeatedly wake up rather than just sleeping with the "remaining" amount of time based on the time the function was called and the time that has passed since? > + /* > + * A valid DSA pointer isn't proof that statistics are available, it can > + * be valid due to previously published stats. Somehow "valid DSA pointer" is a bit too much about the precise mechanics and not enough about what's actually happening. I'd rather say something like "Even if the proc has published statistics, they may not be due to the current request, but previously published stats." > + if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, > + MEMSTATS_WAIT_TIMEOUT, > + WAIT_EVENT_MEM_CTX_PUBLISH)) > + { > + timer += MEMSTATS_WAIT_TIMEOUT; > + > + /* > + * Wait for the timeout as defined by the user. If no updated > + * statistics are available within the allowed time then display > + * previously published statistics if there are any. If no > + * previous statistics are available then return NULL. The timer > + * is defined in milliseconds since thats what the condition > + * variable sleep uses. > + */ > + if ((timer * 1000) >= timeout) > + { I'd suggest just comparing how much time has elapsed since the timestamp you've requested earlier. > + LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE); > + /* Displaying previously published statistics if available */ > + if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)) > + break; > + else > + { > + LWLockRelease(&memCtxState[procNumber].lw_lock); > + PG_RETURN_NULL(); > + } > + } > + } > + } > +/* > + * Initialize shared memory for displaying memory context statistics > + */ > +void > +MemoryContextReportingShmemInit(void) > +{ > + bool found; > + > + memCtxArea = (MemoryContextState *) > + ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState), &found); > + > + if (!IsUnderPostmaster) > + { > + Assert(!found); I don't really understand why this uses IsUnderPostmaster? Seems like this should just use found like most (or all) the other *ShmemInit() functions do? > + LWLockInitialize(&memCtxArea->lw_lock, LWLockNewTrancheId()); I think for builtin code we just hardcode the tranches in BuiltinTrancheIds. > + memCtxState = (MemoryContextBackendState *) > + ShmemInitStruct("MemoryContextBackendState", > + ((MaxBackends + NUM_AUXILIARY_PROCS) * sizeof(MemoryContextBackendState)), > + &found); FWIW, I think it'd be mildly better if these two ShmemInitStruct()'s were combined. > static void > MemoryContextStatsInternal(MemoryContext context, int level, > int max_level, int max_children, > MemoryContextCounters *totals, > - bool print_to_stderr) > + PrintDestination print_location, int *num_contexts) > { > MemoryContext child; > int ichild; > @@ -884,10 +923,39 @@ MemoryContextStatsInternal(MemoryContext context, int level, > Assert(MemoryContextIsValid(context)); > > /* Examine the context itself */ > - context->methods->stats(context, > - MemoryContextStatsPrint, > - &level, > - totals, print_to_stderr); > + switch (print_location) > + { > + case PRINT_STATS_TO_STDERR: > + context->methods->stats(context, > + MemoryContextStatsPrint, > + &level, > + totals, true); > + break; > + > + case PRINT_STATS_TO_LOGS: > + context->methods->stats(context, > + MemoryContextStatsPrint, > + &level, > + totals, false); > + break; > + > + case PRINT_STATS_NONE: > + > + /* > + * Do not print the statistics if print_location is > + * PRINT_STATS_NONE, only compute totals. This is used in > + * reporting of memory context statistics via a sql function. Last > + * parameter is not relevant. > + */ > + context->methods->stats(context, > + NULL, > + NULL, > + totals, false); > + break; > + } > + > + /* Increment the context count for each of the recursive call */ > + *num_contexts = *num_contexts + 1; It feels a bit silly to duplicate the call to context->methods->stats three times. We've changed these parameters a bunch in the past, having more callers to fix makes that more work. Can't the switch just set up the args that are then passed to one call to context->methods->stats? > + > + /* Compute the number of stats that can fit in the defined limit */ > + max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE) > + / (MAX_MEMORY_CONTEXT_STATS_SIZE); MAX_SEGMENTS_PER_BACKEND sounds way too generic to me for something defined in memutils.h. I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is something that makes sense to use here? The header says: > +/* Maximum size (in Mb) of DSA area per process */ > +#define MAX_SEGMENTS_PER_BACKEND 1 But the name doesn't at all indicate it's in megabytes. Nor does the way it's used clearly indicate that. That seems to be completely incidental based on the current default value DSA_DEFAULT_INIT_SEGMENT_SIZE. > + /* > + * Hold the process lock to protect writes to process specific memory. Two > + * processes publishing statistics do not block each other. > + */ s/specific/process specific/ > + LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE); > + memCtxState[idx].proc_id = MyProcPid; > + > + if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer)) > + { > + /* > + * Free any previous allocations, free the name, ident and path > + * pointers before freeing the pointer that contains them. > + */ > + free_memorycontextstate_dsa(area, memCtxState[idx].total_stats, > + memCtxState[idx].memstats_dsa_pointer); > + > + dsa_free(area, memCtxState[idx].memstats_dsa_pointer); > + memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer; Both callers to free_memorycontextstate_dsa() do these lines immediately after calling free_memorycontextstate_dsa(), why not do that inside? > + for (MemoryContext c = TopMemoryContext->firstchild; c != NULL; > + c = c->nextchild) > + { > + MemoryContextCounters grand_totals; > + int num_contexts = 0; > + int level = 0; > + > + path = NIL; > + memset(&grand_totals, 0, sizeof(grand_totals)); > + > + MemoryContextStatsInternal(c, level, 100, 100, &grand_totals, > + PRINT_STATS_NONE, &num_contexts); > + > + path = compute_context_path(c, context_id_lookup); > + > + PublishMemoryContext(meminfo, ctx_id, c, path, > + grand_totals, num_contexts, area, 100); > + ctx_id = ctx_id + 1; > + } > + memCtxState[idx].total_stats = ctx_id; > + /* Notify waiting backends and return */ > + hash_destroy(context_id_lookup); > + dsa_detach(area); > + signal_memorycontext_reporting(); > + } > + > + foreach_ptr(MemoryContextData, cur, contexts) > + { > + List *path = NIL; > + > + /* > + * Figure out the transient context_id of this context and each of its > + * ancestors, to compute a path for this context. > + */ > + path = compute_context_path(cur, context_id_lookup); > + > + /* Account for saving one statistics slot for cumulative reporting */ > + if (context_id < (max_stats - 1) || stats_count <= max_stats) > + { > + /* Examine the context stats */ > + memset(&stat, 0, sizeof(stat)); > + (*cur->methods->stats) (cur, NULL, NULL, &stat, true); Hm. So here we call the callback ourselves, even though we extended MemoryContextStatsInternal() to satisfy the summary output. I guess it's tolerable, but it's not great. > + /* Copy statistics to DSA memory */ > + PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area, 100); > + } > + else > + { > + /* Examine the context stats */ > + memset(&stat, 0, sizeof(stat)); > + (*cur->methods->stats) (cur, NULL, NULL, &stat, true); But do we really do it twice in a row? The lines are exactly the same, so it seems that should just be done before the if? > + > + /* Notify waiting backends and return */ > + hash_destroy(context_id_lookup); > + dsa_detach(area); > + signal_memorycontext_reporting(); > +} > + > +/* > + * Signal all the waiting client backends after copying all the statistics. > + */ > +static void > +signal_memorycontext_reporting(void) > +{ > + memCtxState[MyProcNumber].stats_timestamp = GetCurrentTimestamp(); > + LWLockRelease(&memCtxState[MyProcNumber].lw_lock); > + ConditionVariableBroadcast(&memCtxState[MyProcNumber].memctx_cv); > +} IMO somewhat confusing to release the lock in a function named signal_memorycontext_reporting(). Why do we do that after hash_destroy()/dsa_detach()? > +static void > +compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup, > + int *stats_count, bool summary) > +{ > + foreach_ptr(MemoryContextData, cur, contexts) > + { > + MemoryContextId *entry; > + bool found; > + > + entry = (MemoryContextId *) hash_search(context_id_lookup, &cur, > + HASH_ENTER, &found); > + Assert(!found); > + > + /* context id starts with 1 */ > + entry->context_id = ++(*stats_count); Given that we don't actually do anything here relating to starting with 1, I find that comment confusing. > +static void > +PublishMemoryContext(MemoryContextStatsEntry *memctx_info, int curr_id, > + MemoryContext context, List *path, > + MemoryContextCounters stat, int num_contexts, > + dsa_area *area, int max_levels) > +{ > + const char *ident = context->ident; > + const char *name = context->name; > + int *path_list; > + > + /* > + * To be consistent with logging output, we label dynahash contexts with > + * just the hash table name as with MemoryContextStatsPrint(). > + */ > + if (context->ident && strncmp(context->name, "dynahash", 8) == 0) > + { > + name = context->ident; > + ident = NULL; > + } > + > + if (name != NULL) > + { > + int namelen = strlen(name); > + char *nameptr; > + > + if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE) > + namelen = pg_mbcliplen(name, namelen, > + MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1); > + > + memctx_info[curr_id].name = dsa_allocate0(area, namelen + 1); Given the number of references to memctx_info[curr_id] I'd put it in a local variable. Why is this a dsa_allocate0 given that we're immediately overwriting it? > + nameptr = (char *) dsa_get_address(area, memctx_info[curr_id].name); > + strlcpy(nameptr, name, namelen + 1); > + } > + else > + memctx_info[curr_id].name = InvalidDsaPointer; > + > + /* Trim and copy the identifier if it is not set to NULL */ > + if (ident != NULL) > + { > + int idlen = strlen(context->ident); > + char *identptr; > + > + /* > + * Some identifiers such as SQL query string can be very long, > + * truncate oversize identifiers. > + */ > + if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE) > + idlen = pg_mbcliplen(ident, idlen, > + MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1); > + > + memctx_info[curr_id].ident = dsa_allocate0(area, idlen + 1); > + identptr = (char *) dsa_get_address(area, memctx_info[curr_id].ident); > + strlcpy(identptr, ident, idlen + 1); Hm. First I thought we'd leak memory if this second (and subsequent) dsa_allocate failed. Then I thought we'd be ok, because the memory would be memory because it'd be reachable from memCtxState[idx].memstats_dsa_pointer. But I think it wouldn't *quite* work, because memCtxState[idx].total_stats is only set *after* we would have failed. > + /* Allocate DSA memory for storing path information */ > + if (path == NIL) > + memctx_info[curr_id].path = InvalidDsaPointer; > + else > + { > + int levels = Min(list_length(path), max_levels); > + > + memctx_info[curr_id].path_length = levels; > + memctx_info[curr_id].path = dsa_allocate0(area, levels * sizeof(int)); > + memctx_info[curr_id].levels = list_length(path); > + path_list = (int *) dsa_get_address(area, memctx_info[curr_id].path); > + > + foreach_int(i, path) > + { > + path_list[foreach_current_index(i)] = i; > + if (--levels == 0) > + break; > + } > + } > + memctx_info[curr_id].type = ContextTypeToString(context->type); I don't think this works across platforms. On windows / EXEC_BACKEND builds the location of string constants can differ across backends. And: Why do we need the string here? You can just call ContextTypeToString when reading? > +/* > + * Free the memory context statistics stored by this process > + * in DSA area. > + */ > +void > +AtProcExit_memstats_dsa_free(int code, Datum arg) > +{ FWIW, to me the fact that it does a dsa_free() is an implementation detail. It's also not the only thing this does. And, I don't think AtProcExit* really is accurate, given that it runs *before* shmem is cleaned up? I wonder if the best approach here wouldn't be to forgo the use of a before_shmem_exit() callback, but instead use on_dsm_detach(). That would require we'd not constantly detach from the dsm segment, but I don't understand why we do that in the first place? > + int idx = MyProcNumber; > + dsm_segment *dsm_seg = NULL; > + dsa_area *area = NULL; > + > + if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID) > + return; > + > + dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle); > + > + LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE); > + > + if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer)) > + { > + LWLockRelease(&memCtxState[idx].lw_lock); > + return; > + } > + > + /* If the dsm mapping could not be found, attach to the area */ > + if (dsm_seg != NULL) > + return; I don't understand what we do here with the dsm? Why do we not need cleanup if we are already attached to the dsm segment? > +/* > + * Static shared memory state representing the DSA area created for memory > + * context statistics reporting. A single DSA area is created and used by all > + * the processes, each having its specific DSA allocations for sharing memory > + * statistics, tracked by per backend static shared memory state. > + */ > +typedef struct MemoryContextState > +{ > + dsa_handle memstats_dsa_handle; > + LWLock lw_lock; > +} MemoryContextState; IMO that's too generic a name for something in a header. > +/* > + * Used for storage of transient identifiers for pg_get_backend_memory_contexts > + */ > +typedef struct MemoryContextId > +{ > + MemoryContext context; > + int context_id; > +} MemoryContextId; This too. Particularly because MemoryContextData->ident exist but is something different. > +DO $$ > +DECLARE > + launcher_pid int; > + r RECORD; > +BEGIN > + SELECT pid from pg_stat_activity where backend_type='autovacuum launcher' > + INTO launcher_pid; > + > + select type, name, ident > + from pg_get_process_memory_contexts(launcher_pid, false, 20) > + where path = '{1}' into r; > + RAISE NOTICE '%', r; > + select type, name, ident > + from pg_get_process_memory_contexts(pg_backend_pid(), false, 20) > + where path = '{1}' into r; > + RAISE NOTICE '%', r; > +END $$; I'd also test an aux process. I think the AV launcher isn't one, because it actually does "table" access of shared relations. Greetings, Andres Freund
pgsql-hackers by date: