Re: Enhancing Memory Context Statistics Reporting - Mailing list pgsql-hackers
From | Daniel Gustafsson |
---|---|
Subject | Re: Enhancing Memory Context Statistics Reporting |
Date | |
Msg-id | F23526FA-407E-484B-88BC-4CA81C7CED52@yesql.se Whole thread Raw |
In response to | Re: Enhancing Memory Context Statistics Reporting (Rahila Syed <rahilasyed90@gmail.com>) |
Responses |
Re: Enhancing Memory Context Statistics Reporting
|
List | pgsql-hackers |
> On 20 Mar 2025, at 08:39, Rahila Syed <rahilasyed90@gmail.com> wrote: Thanks for the new version, I believe this will be a welcome tool in the debugging toolbox. I took a cleanup pass over the docs with among others the below changes: * You had broken the text in paragraphs, but without <para/> tags they are rendered as a single blob of text so added that. * Removed the "(PID)" explanation as process id is used elsewhere on the same page already without explanation. * Added <productname/> markup on PostgreSQL * Added <literal/> markup on paramater values * Switched the example query output to use \x * Added a note on when pg_backend_memory_contexts is a better choice The paragraphs need some re-indenting to avoid too long lines, but I opted out of doing so here to make reviewing the changes easier. A few comments on the code (all comments are performed in 0003 attached here which also has smaller cleanups wrt indentation, code style etc): +#include <math.h> I don't think we need this, maybe it was from an earlier version of the patch? +MEM_CTX_PUBLISH "Waiting for backend to publish memory information." I wonder if this should really be "process" and not backend? + default: + context_type = "???"; + break; In ContextTypeToString() I'm having second thoughts about this, there shouldn't be any legitimate use-case of passing a nodetag this function which would fail MemoryContextIsValid(). I wonder if we aren't helping callers more by erroring out rather than silently returning an unknown? I haven't changed this but maybe we should to set the API right from the start? + * if the process has more memory contexts than that can fit in the allocated s/than that can/than what can/? + errmsg("memory context statistics privilege error")); Similar checks elsewhere in the tree mostly use "permission denied to .." so I think we should adopt that here as well. + LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE); + msecs = + TimestampDifferenceMilliseconds(curr_timestamp, + memCtxState[procNumber].stats_timestamp); Since we only want to consider the stats if they are from the current process, we can delay checking the time difference until after we've checked the pid and thus reduce the amount of time we hold the lock in the error case. + /* + * Recheck the state of the backend before sleeping on the condition + * variable + */ + proc = BackendPidGetProc(pid); Here we are really rechecking that the process is still alive, but I wonder if we should take the opportunity to ensure that the type is what we expect it to be? If the pid has moved from being a backend to an aux proc or vice versa we really don't want to go on. + ereport(WARNING, + errmsg("PID %d is not a PostgreSQL server process", + pid)); I wonder if we should differentiate between the warnings? When we hit this in the loop the errmsg is describing a slightly different case. I did leave it for now, but it's food for thought if we should perhaps reword this one. + ereport(LOG, + errmsg("Wait for %d process to publish stats timed out, trying again", + pid)); This should probably by DEBUG1, in a congested cluster it might cause a fair bit of logging which isn't really helping the user. Also, nitpick, errmsg starts with a lowercase letter. +static Size +MemCtxShmemSize(void) We don't really need this function anymore and keeping it separate we risk it going out of sync with the matching calcuation in MemCtxBackendShmemInit, so I think we should condense into one. else { + Assert(print_location == PRINT_STATS_NONE); Rather than an if-then-else and an assert we can use a switch statement without a default, this way we'll automatically get a warning if a value is missed. + ereport(LOG, + errmsg("hash table corrupted, can't construct path value")); I know you switched from elog(LOG.. to ereport(LOG.. but I still think a LOG entry stating corruption isn't helpful, it's not actionable for the user. Given that it's a case that shouldn't happen I wonder if we should downgrade it to an Assert(false) and potentially a DEBUG1? -- Daniel Gustafsson
Attachment
pgsql-hackers by date: