Thread: Re: Enhancing Memory Context Statistics Reporting

Re: Enhancing Memory Context Statistics Reporting

From

Alvaro Herrera

Date:

14 November 2024, 14:48:47

On 2024-Nov-14, Michael Paquier wrote:

> Already mentioned previously at [1] and echoing with some surrounding
> arguments, but I'd suggest to keep it simple and just remove entirely
> the part of the patch where the stats information gets spilled into
> disk.  With more than 6000-ish context information available with a
> hard limit in place, there should be plenty enough to know what's
> going on anyway.

Functionally-wise I don't necessarily agree with _removing_ the spill
code, considering that production systems with thousands of tables would
easily reach that number of contexts (each index gets its own index info
context, each regexp gets its own memcxt); and I don't think silently
omitting a fraction of people's memory situation (or erroring out if the
case is hit) is going to make us any friends.

That said, it worries me that we choose a shared memory size so large
that it becomes impractical to hit the spill-to-disk code in regression
testing.  Maybe we can choose a much smaller limit size when
USE_ASSERT_CHECKING is enabled, and use a test that hits that number?
That way, we know the code is being hit and tested, without imposing a
huge memory consumption on test machines.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)

Re: Enhancing Memory Context Statistics Reporting

From

Ashutosh Bapat

Date:

20 November 2024, 17:01:04

On Wed, Nov 20, 2024 at 2:39 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi,
>
>> To achieve both completeness and avoid writing to a file, I can consider
>> displaying the numbers for the remaining contexts as a cumulative total
>> at the end of the output.
>>
>> Something like follows:
>> ```
>> postgres=# select * from pg_get_process_memory_contexts('237244', false);
>>                  name                  |                     ident                      |   type   |     path     |
total_bytes| tot 
>> al_nblocks | free_bytes | free_chunks | used_bytes |  pid
>>
---------------------------------------+------------------------------------------------+----------+--------------+-------------+----
>> -----------+------------+-------------+------------+--------
>>  TopMemoryContext                      |                                                | AllocSet | {0}          |
    97696 | 
>>          5 |      14288 |          11 |      83408 | 237244
>>  search_path processing cache          |                                                | AllocSet | {0,1}        |
     8192 | 
>>          1 |       5328 |           7 |       2864 | 237244
>> Remaining contexts total:  23456 bytes (total_bytes) , 12345(used_bytes),  11,111(free_bytes)
>>
>> ```
>
>
> Please find attached an updated patch with this change. The file previously used to
> store spilled statistics has been removed. Instead, a cumulative total of the
> remaining/spilled context statistics is now stored in the DSM segment, which is
> displayed as follows.
>
> postgres=# select * from pg_get_process_memory_contexts('352966', false);
>              name             | ident |   type   |  path  | total_bytes | total_nblocks | free_bytes | free_chunks |
used_bytes|  pi 
> d
>
------------------------------+-------+----------+--------+-------------+---------------+------------+-------------+------------+----
> ----
>  TopMemoryContext             |       | AllocSet | {0}    |       97696 |             5 |      14288 |          11 |
   83408 | 352 
> 966
> .
> .
> .
>  MdSmgr                       |       | AllocSet | {0,18} |        8192 |             1 |       7424 |           0 |
     768 | 352 
> 966
>  Remaining Totals             |       |          |        |     1756016 |           188 |     658584 |         132 |
 1097432 | 352 
> 966
> (7129 rows)
> -----
>
> I believe this serves as a good compromise between completeness
> and avoiding the overhead of file handling. However, I am open to
> reintroducing file handling if displaying the complete statistics of the
> remaining contexts prove to be more important.
>
> All the known bugs in the patch have been fixed.
>
> In summary, one DSA  per PostgreSQL process is used to share
> the statistics of that process. A DSA is created by the first client
> backend that requests memory context statistics, and it is pinned
> for all future requests to that process.
> A handle to this DSA is shared between the client and the publishing
> process using fixed shared memory. The fixed shared memory consists
> of an array of size MaxBackends + auxiliary processes, indexed
> by procno. Each element in this array is less than 100 bytes in size.
>
> A PostgreSQL process uses a condition variable to signal a waiting client
> backend once it has finished publishing the statistics. If, for some reason,
> the signal is not sent, the waiting client backend will time out.

How does the process know that the client backend has finished reading
stats and it can be refreshed? What happens, if the next request for
memory context stats comes before first requester has consumed the
statistics it requested?

Does the shared memory get deallocated when the backend which
allocated it exits?

>
> When statistics of a local backend is requested, this function returns the following
> WARNING and exits, since this can be handled by an existing function which
> doesn't require a DSA.
>
> WARNING:  cannot return statistics for local backend
> HINT:  Use pg_get_backend_memory_contexts instead

How about using pg_get_backend_memory_contexts() for both - local as
well as other backend? Let PID argument default to NULL which would
indicate local backend, otherwise some other backend?

--
Best Wishes,
Ashutosh Bapat

Re: Enhancing Memory Context Statistics Reporting

From

Ashutosh Bapat

Date:

25 November 2024, 07:54:47

On Fri, Nov 22, 2024 at 6:33 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi,
>
>> How does the process know that the client backend has finished reading
>> stats and it can be refreshed? What happens, if the next request for
>> memory context stats comes before first requester has consumed the
>> statistics it requested?
>>
> A process that's copying its statistics does not need to know that.
> Whenever it receives a signal to copy statistics, it goes ahead and
> copies the latest statistics to the DSA after acquiring an exclusive
> lwlock.
>
> A requestor takes a lock before it starts consuming the statistics.
> If the next request comes while the first requestor is consuming the
> statistics, the publishing process will wait on lwlock to be released
> by the consuming process before it can write the statistics.
> If the next request arrives before the first requester begins consuming
> the statistics, the publishing process will acquire the lock and overwrite
> the earlier statistics with the most recent ones.
> As a result, both the first and second requesters will consume the
> updated statistics.

IIUC, the publisher and the consumer processes, both, use the same
LWLock. Publisher acquires an exclusive lock. Does consumer acquire
SHARED lock?

The publisher process might be in a transaction, processing a query or
doing something else. If it has to wait for an LWLock may affect its
performance. This will become even more visible if the client backend
is trying to diagnose a slow running query. Have we tried to measure
how long the publisher might have to wait for an LWLock while the
consumer is consuming statistics OR what is the impact of this wait?

>> >
>> > When statistics of a local backend is requested, this function returns the following
>> > WARNING and exits, since this can be handled by an existing function which
>> > doesn't require a DSA.
>> >
>> > WARNING:  cannot return statistics for local backend
>> > HINT:  Use pg_get_backend_memory_contexts instead
>>
>> How about using pg_get_backend_memory_contexts() for both - local as
>> well as other backend? Let PID argument default to NULL which would
>> indicate local backend, otherwise some other backend?
>>
> I don't see much value in combining the two, specially since with
> pg_get_process_memory_contexts() we can query both the postgres
> backend and a background process, the name pg_get_backend_memory_context()
> would be inaccurate and I am not sure whether a change to rename the
> existing function would be welcome.

Having two separate functions for the same functionality isn't a
friendly user interface.

Playing a bit with pg_terminate_backend() which is another function
dealing with backends to understand a. what does it do to its own
backend and b. which processes are considered backends.

1. pg_terminate_backend() allows to terminate the backend from which
it is fired.
#select pid, application_name, backend_type, pg_terminate_backend(pid)
from pg_stat_activity;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

2. It considers autovacuum launcher and logical replication launcher
as postgres backends but not checkpointer, background writer and
walwriter.
#select pid, application_name, backend_type, pg_terminate_backend(pid)
from pg_stat_activity where pid <> pg_backend_pid();
WARNING:  PID 644887 is not a PostgreSQL backend process
WARNING:  PID 644888 is not a PostgreSQL backend process
WARNING:  PID 644890 is not a PostgreSQL backend process
  pid   | application_name |         backend_type         | pg_terminate_backend
--------+------------------+------------------------------+----------------------
 645636 |                  | autovacuum launcher          | t
 645677 |                  | logical replication launcher | t
 644887 |                  | checkpointer                 | f
 644888 |                  | background writer            | f
 644890 |                  | walwriter                    | f
(5 rows)

In that sense you are correct that pg_get_backend_memory_context()
should not provide context information of WAL writer process for
example. But pg_get_process_memory_contexts() would be expected to
provide its own memory context information instead of redirecting to
another function through a WARNING. It could do that redirection
itself. That will also prevent the functions' output format going out
of sync.

--
Best Wishes,
Ashutosh Bapat

Re: Enhancing Memory Context Statistics Reporting

From

Tomas Vondra

Date:

27 November 2024, 19:19:56

Hi,

I took a quick look at the patch today. Overall, I think this would be
very useful, I've repeatedly needed to inspect why a backend uses so
much memory, and I ended up triggering MemoryContextStats() from gdb.
This would be more convenient / safer. So +1 to the patch intent.


A couple review comments:

1) I read through the thread, and in general I agree with the reasoning
for removing the file part - it seems perfectly fine to just dump as
much as we can fit into a buffer, and then summarize the rest. But do we
need to invent a "new" limit here? The other places logging memory
contexts do something like this:

   MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);

Which means we only print the 100 memory contexts at the top, and that's
it. Wouldn't that give us a reasonable memory limit too?


2) I see the function got renamed to pg_get_process_memory_contexts(),
bu the comment still says pg_get_remote_backend_memory_contexts().


3) I don't see any SGML docs for this new function. I was a bit unsure
what the "summary" argument is meant to do. The comment does not explain
that either.


4) I wonder if the function needs to return PID. I mean, the caller
knows which PID it is for, so it seems rather unnecessary.


5) In the "summary" mode, it might be useful to include info about how
many child contexts were aggregated. It's useful to know whether there
was 1 child or 10000 children. In the regular (non-summary) mode it'd
always be "1", probably, but maybe it'd interact with the limit in (1).
Not sure.


6) I feel a bit uneasy about the whole locking / communication scheme.
In particular, I'm worried about lockups, deadlocks, etc. So I decided
to do a trivial stress-test - just run the new function through pgbench
with many clients.

The memstats.sql script does just this:

  SELECT * FROM pg_get_process_memory_contexts(
    (SELECT pid FROM pg_stat_activity
      WHERE pid != pg_backend_pid()
      ORDER BY random() LIMIT 1)
    , false);

where the inner query just picks a PID for some other backend, and asks
for memory context stats for that.

And just run it like this on a scale 1 pgbench database:

  pgbench -n -f memstats.sql -c 10 test

And it gets stuck *immediately*. I've seen it to wait for other client
backends and auxiliary processes like autovacuum launcher.

This is absolutely idle system, there's no reason why a process would
not respond almost immediately. I wonder if e.g. autovacuum launcher may
not be handling these requests, or what if client backends can wait in a
cycle. IIRC condition variables are not covered by a deadlock detector,
so that would be an issue. But maybe I remember wrong?


7) I've also seen this error:

  pgbench: error: client 6 script 0 aborted in command 0 query 0: \
  ERROR:  can't attach the same segment more than once

I haven't investigated it, but it seems like a problem handling errors,
where we fail to detach from a segment after a timeout. I may be wrong,
but it might be related to this:

  > I opted for DSAs over DSMs to enable memory reuse by freeing
  > segments for subsequent statistics copies of the same backend,
  > without needing to recreate DSMs for each request.

I feel like this might be a premature optimization - I don't have a
clear idea how expensive it is to create DSM per request, but my
intuition is that it's cheaper than processing the contexts and
generating the info.

I'd just remove that, unless someone demonstrates it really matters. I
don't really worry about how expensive it is to process a request
(within reason, of course) - it will happen only very rarely. It's more
important to make sure there's no overhead when no one asks the backend
for memory context info, and simplicity.

Also, how expensive it is to just keep the DSA "just in case"? Imagine
someone asks for the memory context info once - isn't it a was to still
keep the DSA? I don't recall how much resources could that be.

I don't have a clear opinion on that, I'm more asking for opinions.


8) Two minutes seems pretty arbitrary, and also quite high. If a timeout
is necessary, I think it should not be hard-coded.


regards

-- 
Tomas Vondra

Re: Enhancing Memory Context Statistics Reporting

From

Tomas Vondra

Date:

29 November 2024, 03:21:30

On 11/29/24 00:23, Rahila Syed wrote:
> Hi Tomas,
> 
> Thank you for the review.
> 
> 
> 
>     1) I read through the thread, and in general I agree with the reasoning
>     for removing the file part - it seems perfectly fine to just dump as
>     much as we can fit into a buffer, and then summarize the rest. But do we
>     need to invent a "new" limit here? The other places logging memory
>     contexts do something like this:
> 
>        MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
> 
>     Which means we only print the 100 memory contexts at the top, and that's
>     it. Wouldn't that give us a reasonable memory limit too?
> 
> I think this prints more than 100 memory contexts, since 100 denotes the
> max_level
> and contexts at each level could have upto 100 children. This limit
> seems much higher than
> what I am currently storing in DSA which is approx. 7000 contexts.  I
> will verify this again.
>  

Yeah, you may be right. I don't remember what exactly that limit does.

> 
>     2) I see the function got renamed to pg_get_process_memory_contexts(),
>     bu the comment still says pg_get_remote_backend_memory_contexts().
> 
> Fixed 
> 
> 
>     3) I don't see any SGML docs for this new function. I was a bit unsure
>     what the "summary" argument is meant to do. The comment does not explain
>     that either.
> 
> Added docs. 
> Intention behind adding a summary argument is to report statistics of
> contexts at level 0 
> and 1 i.e TopMemoryContext and its immediate children. 
> 

OK

>     4) I wonder if the function needs to return PID. I mean, the caller
>     knows which PID it is for, so it seems rather unnecessary.
> 
> Perhaps it can be used to ascertain that the information indeed belongs to 
> the requested pid.
> 

I find that a bit ... suspicious. By this logic we'd include the input
parameters in every result, but we don't. So why is this case different?

>     5) In the "summary" mode, it might be useful to include info about how
>     many child contexts were aggregated. It's useful to know whether there
>     was 1 child or 10000 children. In the regular (non-summary) mode it'd
>     always be "1", probably, but maybe it'd interact with the limit in (1).
>     Not sure.
> 
> Sure,  I will add this in the next iteration. 
> 

OK

> 
>     6) I feel a bit uneasy about the whole locking / communication scheme.
>     In particular, I'm worried about lockups, deadlocks, etc. So I decided
>     to do a trivial stress-test - just run the new function through pgbench
>     with many clients.
> 
>     The memstats.sql script does just this:
> 
>       SELECT * FROM pg_get_process_memory_contexts(
>         (SELECT pid FROM pg_stat_activity
>           WHERE pid != pg_backend_pid()
>           ORDER BY random() LIMIT 1)
>         , false);
> 
>     where the inner query just picks a PID for some other backend, and asks
>     for memory context stats for that.
> 
>     And just run it like this on a scale 1 pgbench database:
> 
>       pgbench -n -f memstats.sql -c 10 test
> 
>     And it gets stuck *immediately*. I've seen it to wait for other client
>     backends and auxiliary processes like autovacuum launcher.
> 
>     This is absolutely idle system, there's no reason why a process would
>     not respond almost immediately.
> 
>  
> In my reproduction, this issue occurred because the process was terminated 
> while the requesting backend was waiting on the condition variable to be 
> signaled by it. I don’t see any solution other than having the waiting
> client 
> backend timeout using ConditionVariableTimedSleep.
> 
> In the patch, since the timeout was set to a high value, pgbench ended
> up stuck 
> waiting for the timeout to occur. The failure happens less frequently
> after I added an
> additional check for the process's existence, but it cannot be entirely 
> avoided. This is because a process can terminate after we check for its
> existence but 
> before it signals the client. In such cases, the client will not receive
> any signal.
> 

Hmmm, I see. I guess there's no way to know if a process responds to us,
but I guess it should be possible to wake up regularly and check if the
process still exists? Wouldn't that solve the case you mentioned?

>     I wonder if e.g. autovacuum launcher may
>     not be handling these requests, or what if client backends can wait in a
>     cycle.
> 
>  
> Did not see a cyclic wait in client backends due to the pgbench stress test.
>  

Not sure, but if I modify the query to only request memory contexts from
non-client processes, i.e.

  SELECT * FROM pg_get_process_memory_contexts(
    (SELECT pid FROM pg_stat_activity
      WHERE pid != pg_backend_pid()
        AND backend_type != 'client backend'
      ORDER BY random() LIMIT 1)
    , false);

then it gets stuck and reports this:

  pgbench -n -f select.sql -c 4 -T 10 test
  pgbench (18devel)
  WARNING:  Wait for 105029 process to publish stats timed out, ...

But process 105029 still very much exists, and it's the checkpointer:

  $ ps ax | grep 105029
  105029 ?        Ss     0:00 postgres: checkpointer

OTOH if I modify the script to only look at client backends, and wait
until the processes get "stuck" (i.e. waiting on the condition variable,
consuming 0% CPU), I get this:

$ pgbench -n -f select.sql -c 4 -T 10 test
pgbench (18devel)
WARNING:  Wait for 107146 process to publish stats timed out, try again
WARNING:  Wait for 107144 process to publish stats timed out, try again
WARNING:  Wait for 107147 process to publish stats timed out, try again
transaction type: select.sql
...

but when it gets 'stuck', most of the processes are still very much
running (but waiting for contexts from some other process). In the above
example I see this:

 107144 ?        Ss     0:02 postgres: user test [local] SELECT
 107145 ?        Ss     0:01 postgres: user test [local] SELECT
 107147 ?        Ss     0:02 postgres: user test [local] SELECT

So yes, 107146 seems to be gone. But why would that block getting info
from 107144 and 107147?

Maybe that's acceptable, but couldn't this be an issue with short-lived
connections, making it hard to implement the kind of automated
collection of stats that you envision. If it hits this kind of timeouts
often, it'll make it hard to reliably collect info. No?

> 
>       > I opted for DSAs over DSMs to enable memory reuse by freeing
>       > segments for subsequent statistics copies of the same backend,
>       > without needing to recreate DSMs for each request.
> 
>     I feel like this might be a premature optimization - I don't have a
>     clear idea how expensive it is to create DSM per request, but my
>     intuition is that it's cheaper than processing the contexts and
>     generating the info.
> 
>     I'd just remove that, unless someone demonstrates it really matters. I
>     don't really worry about how expensive it is to process a request
>     (within reason, of course) - it will happen only very rarely. It's more
>     important to make sure there's no overhead when no one asks the backend
>     for memory context info, and simplicity.
> 
>     Also, how expensive it is to just keep the DSA "just in case"? Imagine
>     someone asks for the memory context info once - isn't it a was to still
>     keep the DSA? I don't recall how much resources could that be.
> 
>     I don't have a clear opinion on that, I'm more asking for opinions.
> 
>   
> Imagining a tool that periodically queries the backends for statistics, 
> it would be beneficial to avoid recreating the DSAs for each call.

I think it would be nice if you backed this with some numbers. I mean,
how expensive is it to create/destroy the DSA? How does it compare to
the other stuff this function needs to do?

> Currently,  DSAs of size 1MB per process 
> (i.e., a maximum of 1MB * (MaxBackends + auxiliary processes)) 
> would be created and pinned for subsequent reporting. This size does 
> not seem excessively high, even for approx 100 backends and 
> auxiliary processes. 
> 

That seems like a pretty substantial amount of memory reserved for each
connection. IMHO the benefits would have to be pretty significant to
justify this, especially considering it's kept "forever", even if you
run the function only once per day.

> 
>     8) Two minutes seems pretty arbitrary, and also quite high. If a timeout
>     is necessary, I think it should not be hard-coded.
> 
> Not sure which is the ideal value. Changed it to 15 secs and added a
> #define as of now. 
> Something that gives enough time for the process to respond but 
> does not hold up the client for too long would be ideal. 15 secs seem to 
> be not enough for github CI tests, which fail with timeout error with
> this setting.
> 
> PFA an updated patch with the above changes.

Why not to make this a parameter of the function? With some sensible
default, but easy to override.


regards

-- 
Tomas Vondra

Re: Enhancing Memory Context Statistics Reporting

From

Tomas Vondra

Date:

04 December 2024, 00:09:11

On 12/3/24 20:09, Rahila Syed wrote:
> 
> Hi,
>  
> 
> 
> 
>     >     4) I wonder if the function needs to return PID. I mean, the
>     caller
>     >     knows which PID it is for, so it seems rather unnecessary.
>     >
>     > Perhaps it can be used to ascertain that the information indeed
>     belongs to 
>     > the requested pid.
>     >
> 
>     I find that a bit ... suspicious. By this logic we'd include the input
>     parameters in every result, but we don't. So why is this case different?
> 
>  
> This was added to address a review suggestion. I had left it in case
> anyone found it useful 
> for verification. 
> Previously, I included a check for scenarios where multiple processes
> could write to the same 
> shared memory. Now, each process has a separate shared memory space
> identified by 
> pgprocno, making it highly unlikely for the receiving process to see
> another process's memory 
> dump.
> Such a situation could theoretically occur if another process were
> mapped to the same 
> pgprocno, although I’m not sure how likely that is. That said, I’ve
> added a check in the receiver
> to ensure the PID written in the shared memory matches the PID for which
> the dump is 
> requested. 
> This guarantees that a user will never see the memory dump of another
> process.
> Given this, I’m fine with removing the pid column if it helps to make
> the output more readable.
> 

I'd just remove that. I agree it might have been useful with the single
chunk of shared memory, but I think with separate chunks it's not very
useful. And if we can end up with multiple processed getting the same
pgprocno I guess we have way bigger problems, this won't fix that.

>     >     5) In the "summary" mode, it might be useful to include info
>     about how
>     >     many child contexts were aggregated. It's useful to know
>     whether there
>     >     was 1 child or 10000 children. In the regular (non-summary)
>     mode it'd
>     >     always be "1", probably, but maybe it'd interact with the
>     limit in (1).
>     >     Not sure.
>     >
>     > Sure,  I will add this in the next iteration. 
>     >
> 
>     OK
> 
>  
> I have added this information as a column named "num_agg_contexts",
> which indicates 
> the number of contexts whose statistics have been aggregated/added for a
> particular output.
> 
> In summary mode, all the child contexts of a given level-1 context are
> aggregated, and 
> their statistics are presented as part of the parent context's
> statistics. In this case, 
> num_agg_contexts  provides the count of all child contexts under a given
> level-1 context.
> 
> In regular (non-summary) mode, this column shows a value of 1, meaning
> the statistics 
> correspond to a single context, with all context statistics displayed
> individually. In this mode
> an aggregate result is displayed if the number of contexts exceed the
> DSA size limit. In
> this case the num_agg_contexts will display the number of the remaining
> contexts.
> 

OK

>     >      
>     > In the patch, since the timeout was set to a high value, pgbench ended
>     > up stuck 
>     > waiting for the timeout to occur. The failure happens less frequently
>     > after I added an
>     > additional check for the process's existence, but it cannot be
>     entirely 
>     > avoided. This is because a process can terminate after we check
>     for its
>     > existence but 
>     > before it signals the client. In such cases, the client will not
>     receive
>     > any signal.
>     >
> 
>     Hmmm, I see. I guess there's no way to know if a process responds to us,
>     but I guess it should be possible to wake up regularly and check if the
>     process still exists? Wouldn't that solve the case you mentioned?
> 
> I have fixed it accordingly in the attached patch by waking up after
> every 5 seconds
> to check if the process exists and sleeping again if the wake-up condition
> is not satisfied.  The number of such tries is limited to 20. So, the
> total wait 
> time can be 100 seconds. I will make the re-tries configurable, inline
> with your 
> suggestion to be able to override the default waiting time.
>  

Makes sense, although 100 seconds seems a bit weird, it seems we usually
pick "natural" values like 60s, or multiples of that. But if it's
configurable, that's not a huge issue.

Could the process wake up earlier than the timeout, say if it gets EINT
signal? That'd break the "total timeout is 100 seconds", and it would be
better to check that explicitly. Not sure if this can happen, though.

One thing I'd maybe consider is starting with a short timeout, and
gradually increasing it until e.g. 5 seconds (or maybe just 1 second
would be perfectly fine, IMHO). With the current coding it means we
either get the response right away, or wait 5+ seconds. That's a big
huge jump. If we start e.g. with 10ms, and then gradually multiply it by
1.2, it means we only wait "0-20% extra" on average.

But perhaps this is very unlikely and not worth the complexity.

> 
>     >     I wonder if e.g. autovacuum launcher may
>     >     not be handling these requests, or what if client backends can
>     wait in a
>     >     cycle.
>     >
>     >  
>     > Did not see a cyclic wait in client backends due to the pgbench
>     stress test.
>     >  
> 
>     Not sure, but if I modify the query to only request memory contexts from
>     non-client processes, i.e.
> 
>       SELECT * FROM pg_get_process_memory_contexts(
>         (SELECT pid FROM pg_stat_activity
>           WHERE pid != pg_backend_pid()
>             AND backend_type != 'client backend'
>           ORDER BY random() LIMIT 1)
>         , false);
> 
>     then it gets stuck and reports this:
> 
>       pgbench -n -f select.sql -c 4 -T 10 test
>       pgbench (18devel)
>       WARNING:  Wait for 105029 process to publish stats timed out, ...
> 
>     But process 105029 still very much exists, and it's the checkpointer:
> 
> In the case of checkpointer, I also see some wait time after running the
> tests that you mentioned, but it eventually completes the request in my
> runs.
>  

OK, but why should it even wait that long? Surely the checkpointer
should be able to report memory contexts too?

> 
>       $ ps ax | grep 105029
>       105029 ?        Ss     0:00 postgres: checkpointer
> 
>     OTOH if I modify the script to only look at client backends, and wait
>     until the processes get "stuck" (i.e. waiting on the condition variable,
>     consuming 0% CPU), I get this:
> 
>     $ pgbench -n -f select.sql -c 4 -T 10 test
>     pgbench (18devel)
>     WARNING:  Wait for 107146 process to publish stats timed out, try again
>     WARNING:  Wait for 107144 process to publish stats timed out, try again
>     WARNING:  Wait for 107147 process to publish stats timed out, try again
>     transaction type: select.sql
>     ...
> 
>     but when it gets 'stuck', most of the processes are still very much
>     running (but waiting for contexts from some other process). In the above
>     example I see this:
> 
>      107144 ?        Ss     0:02 postgres: user test [local] SELECT
>      107145 ?        Ss     0:01 postgres: user test [local] SELECT
>      107147 ?        Ss     0:02 postgres: user test [local] SELECT
> 
>     So yes, 107146 seems to be gone. But why would that block getting info
>     from 107144 and 107147?
> 
> Most likely 107144 and/or 107147 must also be waiting for 107146 which is 
> gone. Something like 107144 -> 107147 -> 107146(dead) or 107144 -
>>107146(dead)
> and 107147->107146(dead).
> 

I think I forgot to mention only 107145 was waiting for 107146 (dead),
and the other processes were waiting for 107145 in some way. But yeah,
detecting the dead process would improve this, although it also shows
the issues can "spread" easily.

OTOH it's unlikely to have multiple pg_get_process_memory_contexts()
queries pointing at each other like this - monitoring will just to that
from one backend, and that's it. So not a huge issue.

> 
>     Maybe that's acceptable, but couldn't this be an issue with short-lived
>     connections, making it hard to implement the kind of automated
>     collection of stats that you envision. If it hits this kind of timeouts
>     often, it'll make it hard to reliably collect info. No?
> 
> 
> Yes, if there is a chain of waiting clients due to a process no longer
> existing, 
> the waiting time to receive information will increase. However, as long
> as a failed 
> a request caused by a non-existent process is detected promptly, the
> wait time should 
> remain manageable, allowing other waiting clients to obtain the
> requested information 
> from the existing processes.
> 
> In such cases, it might be necessary to experiment with the waiting
> times at the receiving 
> client. Making the waiting time user-configurable, as you suggested, by
> passing it as an 
> argument to the function, could help address this scenario.
> Thanks for highlighting this, I will test this some more. 
>  

I think we should try very hard to make this work well without the user
having to mess with the timeouts. These are exceptional conditions that
happen only very rarely, which makes it hard to find good values.

> 
>     >
>     >       > I opted for DSAs over DSMs to enable memory reuse by freeing
>     >       > segments for subsequent statistics copies of the same backend,
>     >       > without needing to recreate DSMs for each request.
>     >
>     >     I feel like this might be a premature optimization - I don't
>     have a
>     >     clear idea how expensive it is to create DSM per request, but my
>     >     intuition is that it's cheaper than processing the contexts and
>     >     generating the info.
>     >
>     >     I'd just remove that, unless someone demonstrates it really
>     matters. I
>     >     don't really worry about how expensive it is to process a request
>     >     (within reason, of course) - it will happen only very rarely.
>     It's more
>     >     important to make sure there's no overhead when no one asks
>     the backend
>     >     for memory context info, and simplicity.
>     >
>     >     Also, how expensive it is to just keep the DSA "just in case"?
>     Imagine
>     >     someone asks for the memory context info once - isn't it a was
>     to still
>     >     keep the DSA? I don't recall how much resources could that be.
>     >
>     >     I don't have a clear opinion on that, I'm more asking for
>     opinions.
>     >
>     >   
>     > Imagining a tool that periodically queries the backends for
>     statistics, 
>     > it would be beneficial to avoid recreating the DSAs for each call.
> 
>     I think it would be nice if you backed this with some numbers. I mean,
>     how expensive is it to create/destroy the DSA? How does it compare to
>     the other stuff this function needs to do?
> 
> After instrumenting the code with timestamps, I observed that DSA creation 
> accounts for approximately 17% to 26% of the total execution time of the
> function 
> pg_get_process_memory_contexts().
> 
>     > Currently,  DSAs of size 1MB per process 
>     > (i.e., a maximum of 1MB * (MaxBackends + auxiliary processes)) 
>     > would be created and pinned for subsequent reporting. This size does 
>     > not seem excessively high, even for approx 100 backends and 
>     > auxiliary processes. 
>     >
> 
>     That seems like a pretty substantial amount of memory reserved for each
>     connection. IMHO the benefits would have to be pretty significant to
>     justify this, especially considering it's kept "forever", even if you
>     run the function only once per day.
> 
> I can reduce the initial segment size to DSA_MIN_SEGMENT_SIZE, which is 
> 256KB per process. If needed, this could grow up to 16MB based on the
> current settings.
> 
> However, for the scenario you mentioned, it would be ideal to have a
> mechanism 
> to mark a pinned DSA (using dsa_pin()) for deletion if it is not used/
> attached within a 
> specified duration. Alternatively, I could avoid using dsa_pin()
> altogether, allowing the 
> DSA to be automatically destroyed once all processes detach from it, and
> recreate it 
> for a new request.
> 
> At the moment, I am unsure which approach is most feasible. Any
> suggestions would be
> greatly appreciated.
> 

I'm entirely unconcerned about the pg_get_process_memory_contexts()
performance, within some reasonable limits. It's something executed
every now and then - no one is going to complain it takes 10ms extra,
measure tps with this function, etc.

17-26% seems surprisingly high, but Even 256kB is too much, IMHO. I'd
just get rid of this optimization until someone complains and explains
why it's worth it.

Yes, let's make it fast, but I don't think we should optimize it at the
expense of "regular workload" ...


regards

-- 
Tomas Vondra

Re: Enhancing Memory Context Statistics Reporting

From

Amit Langote

Date:

16 December 2024, 07:44:46

Hi Rahila,

Thanks for working on this.  I've wanted something like this a number
of times to replace my current method of attaching gdb like everyone
else I suppose.

I have a question / suggestion about the interface.

+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+    int         pid = PG_GETARG_INT32(0);
+    bool        get_summary = PG_GETARG_BOOL(1);

IIUC, this always returns all memory contexts starting from
TopMemoryContext, summarizing some child contexts if memory doesn't
suffice. Would it be helpful to allow users to specify a context other
than TopMemoryContext as the root? This could be particularly useful
in cases where the information a user is looking for would otherwise
be grouped under "Remaining Totals." Alternatively, is there a way to
achieve this with the current function, perhaps by specifying a
condition in the WHERE clause?

Re: Enhancing Memory Context Statistics Reporting

From

torikoshia

Date:

16 December 2024, 15:51:53

Hi,

Thanks for updating the patch and here are some comments:

'path' column of pg_get_process_memory_contexts() begins with 0, but 
that column of pg_backend_memory_contexts view begins with 1:

   =# select path FROM pg_get_process_memory_contexts('20271', false);
   path
   -------
    {0}
    {0,1}
    {0,2}
    ..

=# select path from pg_backend_memory_contexts;
    path
   -------
    {1}
    {1,2}
    {1,3}
    ..asdf asdf

Would it be better to begin with 1 to make them consistent?


pg_log_backend_memory_contexts() does not allow non-superusers to 
execute by default since it can peek at other session information.
pg_get_process_memory_contexts() does not have this restriction, but 
wouldn't it be necessary?


When the target pid is the local backend, the HINT suggests using 
pg_get_backend_memory_contexts(), but this function is not described in 
the manual.
How about suggesting pg_backend_memory_contexts view instead?

   =# select pg_get_process_memory_contexts('27041', false);
   WARNING:  cannot return statistics for local backend
   HINT:  Use pg_get_backend_memory_contexts instead


There are no explanations about 'num_agg_contexts', but I thought the 
explanation like below would be useful.

> I have added this information as a column named "num_agg_contexts", 
> which indicates
> the number of contexts whose statistics have been aggregated/added for 
> a particular output.

git apply caused some warnings:

$ git apply 
v7-Function-to-report-memory-context-stats-of-any-backe.patch
v7-Function-to-report-memory-context-stats-of-any-backe.patch:71: space 
before tab in indent.
         Requests to return the memory contexts of the backend with the
v7-Function-to-report-memory-context-stats-of-any-backe.patch:72: space 
before tab in indent.
         specified process ID.  This function can send the request to
v7-Function-to-report-memory-context-stats-of-any-backe.patch:ldmv: 
space before tab in indent.
         both the backends and auxiliary processes. After receiving the 
memory
v7-Function-to-report-memory-context-stats-of-any-backe.patch:74: space 
before tab in indent.
         contexts from the process, it returns the result as one row per
v7-Function-to-report-memory-context-stats-of-any-backe.patch:75: space 
before tab in indent.
         context. When get_summary is true, memory contexts at level 0


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Re: Enhancing Memory Context Statistics Reporting

From

Fujii Masao

Date:

06 January, 20:04:20

On 2025/01/06 22:16, Rahila Syed wrote:
> PFA the patch with above updates.

Thanks for updating the patch! I like this feature.

I tested this feature and encountered two issues:

Issue 1: Error with pg_get_process_memory_contexts()
When I used pg_get_process_memory_contexts() on the PID of a backend process
that had just caused an error but hadn’t rolled back yet,
the following error occurred:

   Session 1 (PID=70011):
   =# begin;
   =# select 1/0;
   ERROR:  division by zero
   
   Session 2:
   =# select * from pg_get_process_memory_contexts(70011, false);
   
   Session 1 terminated with:
   ERROR:  ResourceOwnerEnlarge called after release started
   FATAL:  terminating connection because protocol synchronization was lost


Issue 2: Segmentation Fault
When I ran pg_get_process_memory_contexts() every 0.1 seconds using
\watch command while running "make -j 4 installcheck-world",
I encountered a segmentation fault:

   LOG:  client backend (PID 97975) was terminated by signal 11: Segmentation fault: 11
   DETAIL:  Failed process was running: select infinite_recurse();
   LOG:  terminating any other active server processes

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Enhancing Memory Context Statistics Reporting

From

Fujii Masao

Date:

08 January, 18:45:48

On 2025/01/08 21:03, Rahila Syed wrote:
> I have not been able to reproduce this issue. Could you please clarify which process you ran
> |pg_get_process_memory_context()| on, with the interval of 0.1?

I used the following query for testing:

=# SELECT count(*) FROM pg_stat_activity, pg_get_process_memory_contexts(pid, false) WHERE pid <> pg_backend_pid();
=# \watch 0.1

> Was it a backend process
> created by |make installcheck-world|, or some other process?

Yes, the target backends were from make installcheck-world.
No other workloads were running.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Enhancing Memory Context Statistics Reporting

From

Tomas Vondra

Date:

25 January, 01:20:50

On 1/24/25 14:47, Rahila Syed wrote:
> 
> Hi,
> 
> 
>     Just idea; as an another option, how about blocking new requests to
>     the target process (e.g., causing them to fail with an error or
>     returning NULL with a warning) if a previous request is still pending?
>     Users can simply retry the request if it fails. IMO failing quickly
>     seems preferable to getting stuck for a while in cases with concurrent
>     requests.
> 
> Thank you for the suggestion. I agree that it is better to fail 
> early and avoid waiting for a timeout in such cases. I will add a 
> "pending request" tracker for this in shared memory. This approach 
> will help prevent sending a concurrent request if a request for the
> same backend is still being processed.
> 

AFAIK these failures should be extremely rare - we're only talking about
that because the workload I used for testing is highly concurrent, i.e.
it requests memory context info extremely often. I doubt anyone sane is
going to do that in practice ...

> IMO, one downside of throwing an error in such cases is that the
> users might wonder if they need to take a corrective action, even
> though the issue is actually going to solve itself and they just
> need to retry. Therefore, issuing a warning or displaying previously
> updated statistics might be a better alternative to throwing an
> error.
> 

Wouldn't this be mostly mitigated by adding proper detail/hint to the
error message? Sure, the user can always ignore that (especially when
calling this from a script), but well ... we can only do so much.

All this makes me think about how we shared pgstat data before the shmem
approach was introduced in PG15. Until then the process signaled pgstat
collector, and the collector wrote the statistics into a file, with a
timestamp. And the process used the timestamp to decide if it's fresh
enough ... Wouldn't the same approach work here?

I imagined it would work something like this:

requesting backend:
-------------------
* set request_ts to current timestamp
* signal the target process, to generate memory context info
* wait until the DSA gets filled with stats_ts > request_ts
* return the data, don't erase anything

target backend
--------------
* clear the signal
* generate the statistics
* set stats_ts to current timestamp
* wait all the backends waiting for the stats (through CV)

I see v11 does almost this, except that it accepts somewhat stale data.
But why would that be necessary? I don't think it's needed, and I don't
think we should accept data from before the process sends the signal.

regards

-- 
Tomas Vondra

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

29 January, 15:45:38

Hi,

On Sat, Jan 25, 2025 at 3:50 AM Tomas Vondra <tomas@vondra.me> wrote:

On 1/24/25 14:47, Rahila Syed wrote:
>
> Hi,
>
>
> Just idea; as an another option, how about blocking new requests to
> the target process (e.g., causing them to fail with an error or
> returning NULL with a warning) if a previous request is still pending?
> Users can simply retry the request if it fails. IMO failing quickly
> seems preferable to getting stuck for a while in cases with concurrent
> requests.
>
> Thank you for the suggestion. I agree that it is better to fail
> early and avoid waiting for a timeout in such cases. I will add a
> "pending request" tracker for this in shared memory. This approach
> will help prevent sending a concurrent request if a request for the
> same backend is still being processed.
>

AFAIK these failures should be extremely rare - we're only talking about
that because the workload I used for testing is highly concurrent, i.e.
it requests memory context info extremely often. I doubt anyone sane is
going to do that in practice ...

Yes, that makes sense.

> IMO, one downside of throwing an error in such cases is that the
> users might wonder if they need to take a corrective action, even
> though the issue is actually going to solve itself and they just
> need to retry. Therefore, issuing a warning or displaying previously
> updated statistics might be a better alternative to throwing an
> error.
>

Wouldn't this be mostly mitigated by adding proper detail/hint to the
error message? Sure, the user can always ignore that (especially when
calling this from a script), but well ... we can only do so much.

OK.

All this makes me think about how we shared pgstat data before the shmem
approach was introduced in PG15. Until then the process signaled pgstat
collector, and the collector wrote the statistics into a file, with a
timestamp. And the process used the timestamp to decide if it's fresh
enough ... Wouldn't the same approach work here?

I imagined it would work something like this:

requesting backend:
-------------------
* set request_ts to current timestamp
* signal the target process, to generate memory context info
* wait until the DSA gets filled with stats_ts > request_ts
* return the data, don't erase anything

target backend
--------------
* clear the signal
* generate the statistics
* set stats_ts to current timestamp
* wait all the backends waiting for the stats (through CV)

I see v11 does almost this, except that it accepts somewhat stale data.

That's correct.

But why would that be necessary? I don't think it's needed, and I don't
think we should accept data from before the process sends the signal.

This is done in an attempt to avoid concurrent requests from timing out.
In such cases, data in response to another request is likely to already be in the

dynamic shared memory. Hence instead of waiting for the latest data and risking a

timeout, the approach displays available statistics that are newer than a defined
threshold. Additionally, since we can't distinguish between sequential and

concurrent requests, we accept somewhat stale data for all requests.

I realize this approach has some issues, mainly regarding how to determine

an appropriate threshold value or a limit for old data.

Therefore, I agree that it makes sense to display the data that is published

after the request is made. If such data can't be published due to concurrent

requests or other delays, the function should detect this and return as soon as

possible.

Thank you,

Rahila Syed

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

20 February, 16:26:49

Hi,

Please find attached the updated patches after some cleanup and test fixes.

Thank you,
Rahila Syed

On Tue, Feb 18, 2025 at 6:35 PM Rahila Syed <rahilasyed90@gmail.com> wrote:

Hi,

Thanks for updating the patch!

The below comments would be a bit too detailed at this stage, but I’d
like to share the points I noticed.

Thanks for sharing the detailed comments. I have incorporated some of them
into the new version of the patch. I will include the rest when I refine and
comment the code further.

Meanwhile, I have fixed the following outstanding issues:

1. Currently one DSA is created per backend when the first request for
statistics is made and remains for the lifetime of the server.
I think I should add logic to periodically destroy DSAs, when memory
context statistics are not being *actively* queried from the backend,
as determined by the statistics timestamp.

After an offline discussion with Andres and Tomas, I have fixed this to use
only one DSA for all the publishing backends/processes. Each backend
allocates smaller chunks of memory within the DSA while publishing statistics.
These chunks are tracked independently by each backend, ensuring that two
publishing backends/processes do not block each other despite using the same
DSA. This approach eliminates the overhead of creating multiple DSAs,
one for each backend.

I am not destroying the DSA area because it stores the previously published
statistics for each process. This allows the system to display older statistics
when the latest data cannot be retrieved within a reasonable time.
Only the most recently updated statistics are kept, while all earlier ones
are freed using dsa_free by each backend when they are no longer needed.
.
2. The two issues reported by Fujii-san here: [1].
i. I have proposed a fix for the first issue here [2].
ii. I am able to reproduce the second issue. This happens when we try
to query statistics of a backend running infinite_recurse.sql. While I am
working on finding a root-cause, I think it happens due to some memory
being overwritten due to to stack-depth violation, as the issue is not seen
when I reduce the max_stack_depth to 100kb.
}
}

The second issue is also resolved by using smaller allocations within a DSA.
Previously, it occurred because a few statically allocated strings were placed
within a single large chunk of DSA allocation. I have changed this to use
dynamically allocated chunks with dsa_allocate0 within the same DSA.

Please find attached updated and rebased patches.

Thank you,
Rahila Syed

Attachment

Re: Enhancing Memory Context Statistics Reporting

From

Tomas Vondra

Date:

21 February, 18:01:00

On 2/20/25 14:26, Rahila Syed wrote:
> Hi,
> 
> Please find attached the updated patches after some cleanup and test
> fixes.  
> 
> Thank you,
> Rahila Syed
> 
> On Tue, Feb 18, 2025 at 6:35 PM Rahila Syed <rahilasyed90@gmail.com
> <mailto:rahilasyed90@gmail.com>> wrote:
> 
>     Hi,
> 
> 
>         Thanks for updating the patch!
> 
>         The below comments would be a bit too detailed at this stage,
>         but I’d
>         like to share the points I noticed.
> 
>     Thanks for sharing the detailed comments. I have incorporated some
>     of them
>     into the new version of the patch. I will include the rest when I
>     refine and
>     comment the code further.
> 
>     Meanwhile, I have fixed the following outstanding issues:
> 
>         1.  Currently one DSA  is created per backend when the first
>         request for
>         statistics is made and remains for the lifetime of the server.
>         I think I should add logic to periodically destroy DSAs, when memory
>         context statistics are not being *actively* queried from the
>         backend,
>         as determined by the statistics timestamp.
> 
>       
>     After an offline discussion with Andres and Tomas, I have fixed this
>     to use
>     only one DSA for all the publishing backends/processes. Each backend
>      allocates smaller chunks of memory within the DSA while publishing
>     statistics.
>     These chunks are tracked independently by each backend, ensuring
>     that two
>     publishing backends/processes do not block each other despite using
>     the same
>     DSA. This approach eliminates the overhead of creating multiple DSAs,
>     one for each backend. 
> 
>     I am not destroying the DSA area because it stores the previously
>     published
>     statistics for each process. This allows the system to display older
>     statistics
>     when the latest data cannot be retrieved within a reasonable time.
>     Only the most recently updated statistics are kept, while all
>     earlier ones
>     are freed using dsa_free by each backend when they are no longer needed.
>     .  

I think something is not quite right, because if I try running a simple
pgbench script that does pg_get_process_memory_contexts() on PIDs of
random postgres process (just like in the past), I immediately get this:

pgbench: error: client 28 script 0 aborted in command 0 query 0: ERROR:
can't attach the same segment more than once
pgbench: error: client 10 script 0 aborted in command 0 query 0: ERROR:
can't attach the same segment more than once
pgbench: error: client 5 script 0 aborted in command 0 query 0: ERROR:
can't attach the same segment more than once
pgbench: error: client 8 script 0 aborted in command 0 query 0: ERROR:
can't attach the same segment more than once
...

Perhaps the backends need to synchronize creation of the DSA?

> 
>         2. The two issues reported by Fujii-san here: [1].
>         i. I have proposed a fix for the first issue here [2].
>         ii. I am able to reproduce the second issue. This happens when
>         we try 
>         to query statistics of a backend running infinite_recurse.sql.
>         While I am 
>         working on finding a root-cause, I think it happens due to some
>         memory 
>         being overwritten due to to stack-depth violation, as the issue
>         is not seen 
>         when I reduce the max_stack_depth to 100kb.
>          }
>          }
> 
> 
>     The second issue is also resolved by using smaller allocations
>     within a DSA.
>     Previously, it occurred because a few statically allocated strings
>     were placed
>     within a single large chunk of DSA allocation. I have changed this
>     to use
>     dynamically allocated chunks with dsa_allocate0 within the same DSA.  
> 

Sounds good. Do you have any measurements how much this reduced the size
of the entries written to the DSA? How many entries will fit into 1MB of
shared memory?


regards

-- 
Tomas Vondra

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

24 February, 15:46:45

I think something is not quite right, because if I try running a simple
pgbench script that does pg_get_process_memory_contexts() on PIDs of
random postgres process (just like in the past), I immediately get this:

Thank you for testing. This issue occurs when a process that previously attached
to a DSA area for publishing its own context statistics tries to attach to it again while
querying statistics from another backend. Previously, I was not detaching at the end
of publishing the statistics. I have now changed it to detach from the area after the
statistics are published. The fix is included in the updated patch.

Perhaps the backends need to synchronize creation of the DSA?

This has been implemented in the patch.

Sounds good. Do you have any measurements how much this reduced the size
of the entries written to the DSA? How many entries will fit into 1MB of
shared memory?

The size of the entries has approximately halved after dynamically allocating the

strings and a datum array.
Also, previously, I was allocating the entire memory for all contexts in one large chunk
from DSA. I have now separated them into smaller allocations

per context. The integer counters are still allocated at once for all contexts, but

the size of an allocated chunk will not exceed approximately 128 bytes * total_num_of_contexts.
Average total number of contexts is in the hundreds.

PFA the updated and rebased patches.

Thank you,

Rahila Syed

Attachment

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

28 February, 18:42:37

> On 24 Feb 2025, at 13:46, Rahila Syed <rahilasyed90@gmail.com> wrote:

> PFA the updated and rebased patches.

Thanks for the rebase, a few mostly superficial comments from a first
read-through.  I'll do some more testing and playing around with it for
functional comments.

+  ...
+  child contexts' statistics, with num_agg_contexts indicating the number
+  of these aggregated child contexts.
The documentation refers to attributes in the return row but the format of that
row isn't displayed which makes following along hard.  I think we should
include a table or a programlisting showing the return data before this
paragraph.


+const char *
+AssignContextType(NodeTag type)
This function doesn't actually assign anything so the name is a bit misleading,
it would be better with ContextTypeToString or something similar.


+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
This sentence is really long and should probably be broken up.


+ * The shared memory buffer has a limited size - it the process has too many
s/it/if/


+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any.
This comment should mention what happens if the process gives up and there is
no previously published stats.


+   int         i;
    ...
+   for (i = 0; i < memCtxState[procNumber].total_stats; i++)
This can be rewritten as "for (int i = 0; .." since we allow C99.


+    * process running and consuming lots of memory, that it might end on its
+    * own first and its memory contexts are not logged is not a problem.
This comment is copy/pasted from pg_log_backend_memory_contexts and while it
mostly still apply it should at least be rewritten to not refer to logging as
this function doesn't do that.


+   ereport(WARNING,
+           (errmsg("PID %d is not a PostgreSQL server process",
No need to add the extra parenthesis around errmsg anymore, so I think new code
should omit those.


+  errhint("Use pg_backend_memory_contexts view instead")));
Super nitpick, but errhints should be complete sentences ending with a period.


+    * statitics have previously been published by the backend. In which case,
s/statitics/statistics/


+    * statitics have previously been published by the backend. In which case,
+    * check if statistics are not older than curr_timestamp, if they are wait
I think the sentence around the time check is needlessly confusing, could it be
rewritten into something like:
    "A valid DSA pointer isn't proof that statistics are available, it can be
    valid due to previously published stats.  Check if the stats are updated by
    comparing the timestamp, if the stats are newer than our previously
    recorded timestamp from before sending the procsignal they must by
    definition be updated."


+   /* Assert for dsa_handle to be valid */
Was this intended to be turned into an Assert call? Else it seems better to remove.


+   if (print_location != PRINT_STATS_NONE)
This warrants a comment stating why it makes sense.


+    * Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
s/print_to_stderr/print_location/.  Also, do we really need print_to_stderr in
this function at all?  It seems a bit awkward to combine a boolean and a
paramter for a tri-state value when the parameter holds the tri_state already.
For readability I think just checking print_location will be better since the
value will be clear, where print_to_stderr=false is less clear in a tri-state
scenario.


+ * its ancestors to a list, inorder to compute a path.
s/inorder/in order/


+   elog(LOG, "hash table corrupted, can't construct path value");
+   break;
This will return either a NIL list or a partial path, but PublishMemoryContext
doesn't really take into consideration that it might be so.  Is this really
benign to the point that we can blindly go on?  Also, elog(LOG..) is mostly for
tracing or debugging as elog() isn't intended for user facing errors.


+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+                       int *stats_count, bool get_summary)
This function does a lot than compute the number of contexts so the name seems
a bit misleading.  Perhaps a rename to compute_contexts() or something similar?


+   memctx_info[curr_id].name = dsa_allocate0(area,
+                                             strlen(clipped_ident) + 1);
These calls can use idlen instead of more strlen() calls no?  While there is no
performance benefit, it would increase readability IMHO if the code first
calculates a value, and then use it.


+   strncpy(name,
+           clipped_ident, strlen(clipped_ident));
Since clipped_ident has been nul terminated earlier there is no need to use
strncpy, we can instead use strlcpy and take the target buffer size into
consideration rather than the input string length.


    PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+   PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
This comment should be different from the LOG_MEMORY_xx one.


+#define MEM_CONTEXT_SHMEM_STATS_SIZE   30
+#define MAX_TYPE_STRING_LENGTH 64
These are unused, from an earlier version of the patch perhaps?


+ * Singe DSA area is created and used by all the processes,
s/Singe/Since/


+typedef struct MemoryContextBackendState
This is only used in mcxtfuncs.c and can be moved there rather than being
exported in the header.


+}          MemoryContextId;
This lacks an entry in the typedefs.list file.

--
Daniel Gustafsson

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

04 March, 10:00:02

Hi Daniel,

Thanks for the rebase, a few mostly superficial comments from a first
read-through.

Thank you for your comments.

The documentation refers to attributes in the return row but the format of that
row isn't displayed which makes following along hard. I think we should
include a table or a programlisting showing the return data before this
paragraph.

I included the sql function call and its output in programlisting format after the
function description.
Since the description was part of a table, I added this additional information at the
end of that table.

+const char *
+AssignContextType(NodeTag type)
This function doesn't actually assign anything so the name is a bit misleading,
it would be better with ContextTypeToString or something similar.

Done.

+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
This sentence is really long and should probably be broken up.

Fixed.

+ * The shared memory buffer has a limited size - it the process has too many
s/it/if/

Fixed.

+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any.
This comment should mention what happens if the process gives up and there is
no previously published stats.

Done.

+ int i;
...
+ for (i = 0; i < memCtxState[procNumber].total_stats; i++)
This can be rewritten as "for (int i = 0; .." since we allow C99.

Done.

+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
This comment is copy/pasted from pg_log_backend_memory_contexts and while it
mostly still apply it should at least be rewritten to not refer to logging as
this function doesn't do that.

Fixed.

+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
No need to add the extra parenthesis around errmsg anymore, so I think new code
should omit those.

Done.

+ errhint("Use pg_backend_memory_contexts view instead")));
Super nitpick, but errhints should be complete sentences ending with a period.

Done.

+ * statitics have previously been published by the backend. In which case,
s/statitics/statistics/

Fixed.

+ * statitics have previously been published by the backend. In which case,
+ * check if statistics are not older than curr_timestamp, if they are wait
I think the sentence around the time check is needlessly confusing, could it be
rewritten into something like:
"A valid DSA pointer isn't proof that statistics are available, it can be
valid due to previously published stats. Check if the stats are updated by
comparing the timestamp, if the stats are newer than our previously
recorded timestamp from before sending the procsignal they must by
definition be updated."

Replaced accordingly.

+ /* Assert for dsa_handle to be valid */
Was this intended to be turned into an Assert call? Else it seems better to remove.

Added an assert and removed the comment.

+ if (print_location != PRINT_STATS_NONE)
This warrants a comment stating why it makes sense.

+ * Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
s/print_to_stderr/print_location/. Also, do we really need print_to_stderr in
this function at all? It seems a bit awkward to combine a boolean and a
paramter for a tri-state value when the parameter holds the tri_state already.
For readability I think just checking print_location will be better since the
value will be clear, where print_to_stderr=false is less clear in a tri-state
scenario.

I removed the boolean print_to_stderr, the checks are now using
the tri-state enum-print_location.

+ * its ancestors to a list, inorder to compute a path.
s/inorder/in order/

Fixed.

+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
This will return either a NIL list or a partial path, but PublishMemoryContext
doesn't really take into consideration that it might be so. Is this really
benign to the point that we can blindly go on? Also, elog(LOG..) is mostly for
tracing or debugging as elog() isn't intended for user facing errors.

I agree that this should be addressed. I added a check for path value before
storing it in shared memory. If the path is NIL, the path pointer in DSA will point
to InvalidDsaPointer.
When a client encounters an InvalidDsaPointer it will print NULL in the path column.
Partial path scenario is unlikely IMO, and I am not sure if it warrants additional
checks.

+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
This function does a lot than compute the number of contexts so the name seems
a bit misleading. Perhaps a rename to compute_contexts() or something similar?

Renamed to compute_contexts_count_and_ids.

+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
These calls can use idlen instead of more strlen() calls no? While there is no
performance benefit, it would increase readability IMHO if the code first
calculates a value, and then use it.

Done.

+ strncpy(name,
+ clipped_ident, strlen(clipped_ident));
Since clipped_ident has been nul terminated earlier there is no need to use
strncpy, we can instead use strlcpy and take the target buffer size into
consideration rather than the input string length.

Replaced with the strlcpy calls.

PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
This comment should be different from the LOG_MEMORY_xx one.

Fixed.

+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MAX_TYPE_STRING_LENGTH 64
These are unused, from an earlier version of the patch perhaps?

Removed

+ * Singe DSA area is created and used by all the processes,
s/Singe/Since/

Fixed.

+typedef struct MemoryContextBackendState
This is only used in mcxtfuncs.c and can be moved there rather than being
exported in the header.

This is being used in mcxt.c too in the form of the variable memCtxState.

+} MemoryContextId;
This lacks an entry in the typedefs.list file.

Added.

Please find attached the updated patches with the above fixes.

Thank you,

Rahila Syed

Attachment

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

13 March, 16:56:51

Hi,

Please find attached updated and rebased patches. It has the following changes

1. To prevent memory leaks, ensure that the latest statistics published by a process
are freed before it exits. This can be achieved by calling dsa_free in the
before_shmem_exit callback.
2. Add a level column to maintain consistency with the output of
pg_backend_memory_contexts.

Thank you,

Rahila Syed

On Tue, Mar 4, 2025 at 12:30 PM Rahila Syed <rahilasyed90@gmail.com> wrote:

Hi Daniel,

Thanks for the rebase, a few mostly superficial comments from a first
read-through.
Thank you for your comments.

The documentation refers to attributes in the return row but the format of that
row isn't displayed which makes following along hard. I think we should
include a table or a programlisting showing the return data before this
paragraph.

I included the sql function call and its output in programlisting format after the
function description.
Since the description was part of a table, I added this additional information at the
end of that table.

+const char *
+AssignContextType(NodeTag type)
This function doesn't actually assign anything so the name is a bit misleading,
it would be better with ContextTypeToString or something similar.

Done.

+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
This sentence is really long and should probably be broken up.

Fixed.

+ * The shared memory buffer has a limited size - it the process has too many
s/it/if/

Fixed.

+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any.
This comment should mention what happens if the process gives up and there is
no previously published stats.

Done.

+ int i;
...
+ for (i = 0; i < memCtxState[procNumber].total_stats; i++)
This can be rewritten as "for (int i = 0; .." since we allow C99.

Done.

+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
This comment is copy/pasted from pg_log_backend_memory_contexts and while it
mostly still apply it should at least be rewritten to not refer to logging as
this function doesn't do that.

Fixed.

+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
No need to add the extra parenthesis around errmsg anymore, so I think new code
should omit those.

Done.

+ errhint("Use pg_backend_memory_contexts view instead")));
Super nitpick, but errhints should be complete sentences ending with a period.

Done.

+ * statitics have previously been published by the backend. In which case,
s/statitics/statistics/

Fixed.

+ * statitics have previously been published by the backend. In which case,
+ * check if statistics are not older than curr_timestamp, if they are wait
I think the sentence around the time check is needlessly confusing, could it be
rewritten into something like:
"A valid DSA pointer isn't proof that statistics are available, it can be
valid due to previously published stats. Check if the stats are updated by
comparing the timestamp, if the stats are newer than our previously
recorded timestamp from before sending the procsignal they must by
definition be updated."

Replaced accordingly.

+ /* Assert for dsa_handle to be valid */
Was this intended to be turned into an Assert call? Else it seems better to remove.

Added an assert and removed the comment.

+ if (print_location != PRINT_STATS_NONE)
This warrants a comment stating why it makes sense.

+ * Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
s/print_to_stderr/print_location/. Also, do we really need print_to_stderr in
this function at all? It seems a bit awkward to combine a boolean and a
paramter for a tri-state value when the parameter holds the tri_state already.
For readability I think just checking print_location will be better since the
value will be clear, where print_to_stderr=false is less clear in a tri-state
scenario.

I removed the boolean print_to_stderr, the checks are now using
the tri-state enum-print_location.

+ * its ancestors to a list, inorder to compute a path.
s/inorder/in order/

Fixed.

+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
This will return either a NIL list or a partial path, but PublishMemoryContext
doesn't really take into consideration that it might be so. Is this really
benign to the point that we can blindly go on? Also, elog(LOG..) is mostly for
tracing or debugging as elog() isn't intended for user facing errors.

I agree that this should be addressed. I added a check for path value before
storing it in shared memory. If the path is NIL, the path pointer in DSA will point
to InvalidDsaPointer.
When a client encounters an InvalidDsaPointer it will print NULL in the path column.
Partial path scenario is unlikely IMO, and I am not sure if it warrants additional
checks.

+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
This function does a lot than compute the number of contexts so the name seems
a bit misleading. Perhaps a rename to compute_contexts() or something similar?

Renamed to compute_contexts_count_and_ids.

+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
These calls can use idlen instead of more strlen() calls no? While there is no
performance benefit, it would increase readability IMHO if the code first
calculates a value, and then use it.

Done.

+ strncpy(name,
+ clipped_ident, strlen(clipped_ident));
Since clipped_ident has been nul terminated earlier there is no need to use
strncpy, we can instead use strlcpy and take the target buffer size into
consideration rather than the input string length.

Replaced with the strlcpy calls.

PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
This comment should be different from the LOG_MEMORY_xx one.

Fixed.

+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MAX_TYPE_STRING_LENGTH 64
These are unused, from an earlier version of the patch perhaps?

Removed

+ * Singe DSA area is created and used by all the processes,
s/Singe/Since/

Fixed.

+typedef struct MemoryContextBackendState
This is only used in mcxtfuncs.c and can be moved there rather than being
exported in the header.

This is being used in mcxt.c too in the form of the variable memCtxState.

+} MemoryContextId;
This lacks an entry in the typedefs.list file.

Added.

Please find attached the updated patches with the above fixes.

Thank you,
Rahila Syed

Attachment

Re: Enhancing Memory Context Statistics Reporting

From

Alexander Korotkov

Date:

15 March, 11:40:39

Hi, Rahila!

On Thu, Mar 13, 2025 at 3:57 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Please find attached updated and rebased patches. It has the following changes
>
> 1. To prevent memory leaks, ensure that the latest statistics published by a process
> are freed before it exits. This can be achieved by calling dsa_free in the
> before_shmem_exit callback.
> 2. Add a level column to maintain consistency with the output of
> pg_backend_memory_contexts.

Thank you for your work on this subject.

v17-0001-Preparatory-changes-for-reporting-memory-context-sta.patch

It looks like we're increasing *num_contexts twice per child memory
context.  First, it gets increased with a recursive
MemoryContextStatsInternal() call, then by adding an ichild.  I might
be wrong, but I think these calculations at least deserve more
comments.

v17-0002-Function-to-report-memory-context-statistics.patch

+   if (procNumber == MyProcNumber)
+   {
+       ereport(WARNING,
+               errmsg("cannot return statistics for local backend"),
+               errhint("Use pg_backend_memory_contexts view instead."));
+       PG_RETURN_NULL();
+   }

Is it worth it to keep this restriction?  Can we fetch info about
local memory context for the sake of generality?

I know there have been discussions in the thread before, but the
mechanism of publishing memory context stats via DSA looks quite
complicated.  Also, the user probably intends to inspect memory
contexts when there is not a huge amount of free memory.  So, failure
is probable on DSA allocation.  Could we do simpler?  For instance,
allocate some amount of static shared memory and use it as a message
queue between processes.  As a heavy load is not supposed to be here,
I think one queue would be enough.

------
Regards,
Alexander Korotkov
Supabase

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

17 March, 10:52:41

Hi Alexander,

Thank you for the review.

It looks like we're increasing *num_contexts twice per child memory
context. First, it gets increased with a recursive
MemoryContextStatsInternal() call, then by adding an ichild. I might
be wrong, but I think these calculations at least deserve more
comments.

I believe that's not the case. The recursive calls only work for children

encountered up to max_level and less than max_children per context.
The rest of the children are handled using MemoryContextTraverseNext,
without recursive calls. Thus, num_contexts is incremented for those
children separately from the recursive call counter.

I will add more comments around this.

v17-0002-Function-to-report-memory-context-statistics.patch

+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead."));
+ PG_RETURN_NULL();
+ }

Is it worth it to keep this restriction? Can we fetch info about
local memory context for the sake of generality?

I think that could be done, but using pg_backend_memory_context would
be more efficient in this case.

I know there have been discussions in the thread before, but the
mechanism of publishing memory context stats via DSA looks quite
complicated. Also, the user probably intends to inspect memory
contexts when there is not a huge amount of free memory. So, failure
is probable on DSA allocation. Could we do simpler? For instance,
allocate some amount of static shared memory and use it as a message
queue between processes. As a heavy load is not supposed to be here,
I think one queue would be enough.

There could be other uses for such a function, such as a monitoring dashboard
that periodically queries all running backends for memory statistics. If we use a
single queue shared between all the backends, they will need to wait for the queue
to become available before sharing their statistics, leading to processing delays at
the publishing backend.

Even with separate queues for each backend or without expecting concurrent use,
publishing statistics could be delayed if a message queue is full. This is because a
backend needs to wait for a client process to consume existing messages or
statistics before publishing more.
If a client process exits without consuming messages, the publishing backend will
experience timeouts when trying to publish stats. This will impact backend performance
as statistics are published during CHECK_FOR_INTERRUPTS.

In the current implementation, the backend publishes all the statistics in one go
without waiting for clients to read any statistics.

In addition, allocating complete message queues in static shared memory can be
expensive, especially since these static structures need to be created even if memory
context statistics are never queried.
On the contrary, a dsa is created for the feature whenever statistics are first queried.
We are not preallocating shared memory for this feature, except for small structures
to store the dsa_handle and dsa_pointers for each backend.

Thank you,

Rahila Syed

Re: Enhancing Memory Context Statistics Reporting

From

Ashutosh Bapat

Date:

17 March, 11:36:50

On Mon, Mar 17, 2025 at 1:23 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
>>
>> v17-0002-Function-to-report-memory-context-statistics.patch
>>
>> +   if (procNumber == MyProcNumber)
>> +   {
>> +       ereport(WARNING,
>> +               errmsg("cannot return statistics for local backend"),
>> +               errhint("Use pg_backend_memory_contexts view instead."));
>> +       PG_RETURN_NULL();
>> +   }
>>
>> Is it worth it to keep this restriction?  Can we fetch info about
>> local memory context for the sake of generality?
>>
>
> I think that could be done, but using pg_backend_memory_context would
> be more efficient in this case.
>

I have raised a similar concern before. Having two separate functions
one for local backend and other for remote is going to be confusing.
We should have one function doing both and renamed appropriately.


--
Best Wishes,
Ashutosh Bapat

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

20 March, 10:39:45

Hi,

>>
>> + if (procNumber == MyProcNumber)
>> + {
>> + ereport(WARNING,
>> + errmsg("cannot return statistics for local backend"),
>> + errhint("Use pg_backend_memory_contexts view instead."));
>> + PG_RETURN_NULL();
>> + }
>>
>> Is it worth it to keep this restriction? Can we fetch info about
>> local memory context for the sake of generality?
>>
>
> I think that could be done, but using pg_backend_memory_context would
> be more efficient in this case.
>

I have raised a similar concern before. Having two separate functions
one for local backend and other for remote is going to be confusing.
We should have one function doing both and renamed appropriately.

This is a separate concern from what has been raised by Alexander.
He has suggested removing the restriction and fetching local backend statistics also
with the proposed function.
I've removed this restriction in the latest version of the patch. Now, the proposed
function can be used to fetch local backend statistics too.

Regarding your suggestion on merging these functions, although they both report memory
context statistics, they differ in how they fetch these statistics—locally versus from dynamic
shared memory. Additionally, the function signatures are different: the proposed function
takes three arguments (pid, get_summary, and num_tries), whereas
pg_get_backend_memory_contexts does not take any arguments. Therefore, I believe
these functions can remain separate as long as we document their usages correctly.

Please find attached rebased and updated patches. I have added some more comments and
fixed an issue caused due to registering before_shmem_exit callback from interrupt processing
routine. To address this issue, I am registering this callback in the InitProcess() function.

This happened because interrupt processing could be triggered from a
PG_ENSURE_ERROR_CLEANUP block. This block operates under the assumption that
the before_shmem_exit callback registered at the beginning of the block, will be the last one
in the registered callback list at the end of the block, which would not be the case if I register
before_shmem_exit callback in the interrupt handling routine.

Thank you,

Rahila Syed

Attachment

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

25 March, 17:14:08

> On 20 Mar 2025, at 08:39, Rahila Syed <rahilasyed90@gmail.com> wrote:

Thanks for the new version, I believe this will be a welcome tool in the
debugging toolbox.

I took a cleanup pass over the docs with among others the below changes:
  * You had broken the text in paragraphs, but without <para/> tags they are
    rendered as a single blob of text so added that.
  * Removed the "(PID)" explanation as process id is used elsewhere on the same
    page already without explanation.
  * Added <productname/> markup on PostgreSQL
  * Added <literal/> markup on paramater values
  * Switched the example query output to use \x
  * Added a note on when pg_backend_memory_contexts is a better choice
The paragraphs need some re-indenting to avoid too long lines, but I opted out
of doing so here to make reviewing the changes easier.

A few comments on the code (all comments are performed in 0003 attached here
which also has smaller cleanups wrt indentation, code style etc):

+#include <math.h>
I don't think we need this, maybe it was from an earlier version of the patch?


+MEM_CTX_PUBLISH    "Waiting for backend to publish memory information."
I wonder if this should really be "process" and not backend?


+       default:
+           context_type = "???";
+           break;
In ContextTypeToString() I'm having second thoughts about this, there shouldn't
be any legitimate use-case of passing a nodetag this function which would fail
MemoryContextIsValid().  I wonder if we aren't helping callers more by erroring
out rather than silently returning an unknown?  I haven't changed this but
maybe we should to set the API right from the start?


+ * if the process has more memory contexts than that can fit in the allocated
s/than that can/than what can/?


+   errmsg("memory context statistics privilege error"));
Similar checks elsewhere in the tree mostly use "permission denied to .." so I
think we should adopt that here as well.


+       LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+       msecs =
+           TimestampDifferenceMilliseconds(curr_timestamp,
+                                           memCtxState[procNumber].stats_timestamp);
Since we only want to consider the stats if they are from the current process,
we can delay checking the time difference until after we've checked the pid and
thus reduce the amount of time we hold the lock in the error case.


+       /*
+        * Recheck the state of the backend before sleeping on the condition
+        * variable
+        */
+       proc = BackendPidGetProc(pid);
Here we are really rechecking that the process is still alive, but I wonder if
we should take the opportunity to ensure that the type is what we expect it to
be?  If the pid has moved from being a backend to an aux proc or vice versa we
really don't want to go on.


+     ereport(WARNING,
+             errmsg("PID %d is not a PostgreSQL server process",
+                    pid));
I wonder if we should differentiate between the warnings?  When we hit this in
the loop the errmsg is describing a slightly different case.  I did leave it
for now, but it's food for thought if we should perhaps reword this one.


+       ereport(LOG,
+               errmsg("Wait for %d process to publish stats timed out, trying again",
+                      pid));
This should probably by DEBUG1, in a congested cluster it might cause a fair
bit of logging which isn't really helping the user.  Also, nitpick, errmsg
starts with a lowercase letter.


+static Size
+MemCtxShmemSize(void)
We don't really need this function anymore and keeping it separate we risk it
going out of sync with the matching calcuation in MemCtxBackendShmemInit, so I
think we should condense into one.


    else
    {
+       Assert(print_location == PRINT_STATS_NONE);
Rather than an if-then-else and an assert we can use a switch statement without
a default, this way we'll automatically get a warning if a value is missed.


+       ereport(LOG,
+               errmsg("hash table corrupted, can't construct path value"));
I know you switched from elog(LOG..  to ereport(LOG..  but I still think a LOG
entry stating corruption isn't helpful, it's not actionable for the user.
Given that it's a case that shouldn't happen I wonder if we should downgrade it
to an Assert(false) and potentially a DEBUG1?

--
Daniel Gustafsson

Attachment

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

26 March, 13:34:17

Hi Daniel,

Thank you for your review.

I have incorporated all your changes in v20 patches and ensured that the review comments
corresponding to 0001 patch are included in that patch and not in 0002.

+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
I wonder if this should really be "process" and not backend?

Fixed.

+ default:
+ context_type = "???";
+ break;
In ContextTypeToString() I'm having second thoughts about this, there shouldn't
be any legitimate use-case of passing a nodetag this function which would fail
MemoryContextIsValid(). I wonder if we aren't helping callers more by erroring
out rather than silently returning an unknown? I haven't changed this but
maybe we should to set the API right from the start?

I cannot think of any legitimate scenario where the context type would be unknown.
However, if we were to throw an error, it would prevent us from reporting any memory
usage information when the context type is unidentified. Perhaps, it would be more
informative and less restrictive to label it as "Unrecognized" or "Unknown."
I wonder if this was the reasoning behind doing it when it was added with the
pg_backend_memory_contexts() function.

+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable
+ */
+ proc = BackendPidGetProc(pid);
Here we are really rechecking that the process is still alive, but I wonder if
we should take the opportunity to ensure that the type is what we expect it to
be? If the pid has moved from being a backend to an aux proc or vice versa we
really don't want to go on.

The reasoning makes sense to me. For periodic monitoring of all processes,
any PID that reincarnates into a different type could be queried in subsequent
function calls. Regarding targeted monitoring of a specific process, such a reincarnated
process would exhibit a completely different memory consumption,
likely not aligning with the user's original intent behind requesting the statistics.

+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
I wonder if we should differentiate between the warnings? When we hit this in
the loop the errmsg is describing a slightly different case. I did leave it
for now, but it's food for thought if we should perhaps reword this one.

Changed it to "PID %d is no longer the same PostgreSQL server process".

+ ereport(LOG,
+ errmsg("hash table corrupted, can't construct path value"));
I know you switched from elog(LOG.. to ereport(LOG.. but I still think a LOG
entry stating corruption isn't helpful, it's not actionable for the user.
Given that it's a case that shouldn't happen I wonder if we should downgrade it
to an Assert(false) and potentially a DEBUG1?

How about changing it to ERROR, in accordance with current occurrences of the
same message? I did it in the attached version, however I am open to changing
it to an Assert(false) and DEBUG1.

Apart from the above, I made the following improvements.

1. Eliminated the unnecessary creation of an extra memory context before calling hash_create.
The hash_create function already generates a memory context containing the hash table,
enabling easy memory deallocation by simply deleting the context via hash_destroy.
Therefore, the patch relies on hash_destroy for memory management instead of manual freeing.

2. Optimized memory usage by storing the path as an array of integers rather than as an array of
Datums.
This approach conserves DSA memory allocated for storing this information.

3. Miscellaneous comment cleanups and introduced macros to simplify calculations.

Thank you,

Rahila Syed

Attachment

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

03 April, 00:44:25

> On 26 Mar 2025, at 11:34, Rahila Syed <rahilasyed90@gmail.com> wrote:

> +       ereport(LOG,
> +               errmsg("hash table corrupted, can't construct path value"));
> I know you switched from elog(LOG..  to ereport(LOG..  but I still think a LOG
> entry stating corruption isn't helpful, it's not actionable for the user.
> Given that it's a case that shouldn't happen I wonder if we should downgrade it
> to an Assert(false) and potentially a DEBUG1?
>
> How about changing it to ERROR, in accordance with current occurrences of the
> same message? I did it in the attached version, however I am open to changing
> it to an Assert(false) and DEBUG1.

In the attached I moved it to an elog() as it's an internal error, and spending
translation effort on it seems fruitless.

> 1. Eliminated the unnecessary creation of an extra memory context before calling hash_create.
> The hash_create function already generates a memory context containing the hash table,
> enabling easy memory deallocation by simply deleting the context via hash_destroy.
> Therefore, the patch relies on hash_destroy for memory management instead of manual freeing.

Nice

> 2. Optimized memory usage by storing the path as an array of integers rather than as an array of
> Datums.
> This approach conserves DSA memory allocated for storing this information.

Ah yes, much better.

The attached v21 has a few improvements:

* The function documentation didn't specify the return type, only the fact that
it's setof record.  I've added all output columns.

* Some general cleaups of the docs with better markup, improved xref linking
and various rewording.

* Comment cleanups and language alignment

* Added a missing_ok parameter to ContextTypeToString().  While all callers are
fine with unknown context types, if we introduce an API for this it seems
prudent to not place that burden on callers but to take it on in the function.

* Renamed get_summary to just summary, and num_of_tries to retries which feels
more in line with the naming convention in other functions

* Deferred calling InitMaterializedSRF() until after the PID has been checked
for validity.

* Pulled back the timeout to 500msec from 1 second.  In running congested
pgbench simulations I saw better performance and improved results in getting stats.

* Replaced strncpy with strlcpy and consistently used idlen to keep all length
calculations equal.

* Fixed misspelled param name in pg_proc.dat

* Pulled back maximum memory usage from 8Mb to 1Mb.  8Mb for the duration of a
process (once allocated) is a lot for a niche feature and I while I'm still not
sure 1Mb is the right value I think from experimentation that it's closer.

I think this version is close to a committable state, will spend a little more
time testing, polishing and rewriting the commit message.  I will also play
around with placement within the memory context code files to keep it from
making backpatch issues.

--
Daniel Gustafsson

Attachment

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

05 April, 22:29:21

> On 2 Apr 2025, at 23:44, Daniel Gustafsson <daniel@yesql.se> wrote:

> I think this version is close to a committable state, will spend a little more
> time testing, polishing and rewriting the commit message.  I will also play
> around with placement within the memory context code files to keep it from
> making backpatch issues.

After a bit more polish I landed with the attached, which I most likely will go
ahead with after another round in CI.

--
Daniel Gustafsson

Attachment

v23-0001-Add-function-to-get-memory-context-stats-for-pro.patch

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

07 April, 09:51:43

Hi Daniel,

After a bit more polish I landed with the attached, which I most likely will go
ahead with after another round in CI.

Thank you for refining the code. The changes look good to me.
Regression tests ran smoothly in parallel with the memory monitoring function,
pgbench results with the following custom script also shows good performance.
```

SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
ORDER BY random() LIMIT 1)
, false, 5);
```

Thank you,
Rahila Syed

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

07 April, 16:41:37

Following up on some off-list comments, attached is a v26 with a few small last
changes:

  * Improved documentation (docs and comments)
  * Fixed up Shmem sizing and init
  * Delayed registering to the shmem cleanup to get it earlier in cleanup
  * Renamed a few datastructures to improve readability
  * Various bits of polish

I think this function can be a valuable debugging aid going forward.

--
Daniel Gustafsson

Attachment

v26-0001-Add-function-to-get-memory-context-stats-for-pro.patch

Re: Enhancing Memory Context Statistics Reporting

From

Andres Freund

Date:

07 April, 18:43:51

Hi,

On 2025-04-07 15:41:37 +0200, Daniel Gustafsson wrote:
> I think this function can be a valuable debugging aid going forward.

What I am most excited about for this is to be able to measure server-wide and
fleet-wide memory usage over time. Today I have actually very little idea
about what memory is being used for across all connections, not to speak of a
larger number of servers.


> diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
> index 4f6795f7265..d3b4df27935 100644
> --- a/src/backend/postmaster/auxprocess.c
> +++ b/src/backend/postmaster/auxprocess.c
> @@ -84,6 +84,13 @@ AuxiliaryProcessMainCommon(void)
>      /* register a before-shutdown callback for LWLock cleanup */
>      before_shmem_exit(ShutdownAuxiliaryProcess, 0);
>  
> +    /*
> +     * The before shmem exit callback frees the DSA memory occupied by the
> +     * latest memory context statistics that could be published by this aux
> +     * proc if requested.
> +     */
> +    before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
> +
>      SetProcessingMode(NormalProcessing);
>  }

How about putting it into BaseInit()?  Or maybe just register it when its
first used?


> diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
> index fda91ffd1ce..d3cb3f1891c 100644
> --- a/src/backend/postmaster/checkpointer.c
> +++ b/src/backend/postmaster/checkpointer.c
> @@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
>      /* Perform logging of memory contexts of this process */
>      if (LogMemoryContextPending)
>          ProcessLogMemoryContextInterrupt();
> +
> +    /* Publish memory contexts of this process */
> +    if (PublishMemoryContextPending)
> +        ProcessGetMemoryContextInterrupt();
>  }
>  
>  /*

Not this patch's responsibility, but we really desperately need to unify our
interrupt handling.  Manually keeping a ~dozen of functions similar, but not
exactly the same, is an insane approach.


> --- a/src/backend/utils/activity/wait_event_names.txt
> +++ b/src/backend/utils/activity/wait_event_names.txt
> @@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT    "Waiting for the WAL receiver to exit."
>  WAL_RECEIVER_WAIT_START    "Waiting for startup process to send initial data for streaming replication."
>  WAL_SUMMARY_READY    "Waiting for a new WAL summary to be generated."
>  XACT_GROUP_UPDATE    "Waiting for the group leader to update transaction status at transaction end."
> +MEM_CTX_PUBLISH    "Waiting for a process to publish memory information."

The memory context stuff abbreviates as cxt not ctx.  There's a few more cases
of that in the patch.


> +const char *
> +ContextTypeToString(NodeTag type)
> +{
> +    const char *context_type;
> +
> +    switch (type)
> +    {
> +        case T_AllocSetContext:
> +            context_type = "AllocSet";
> +            break;
> +        case T_GenerationContext:
> +            context_type = "Generation";
> +            break;
> +        case T_SlabContext:
> +            context_type = "Slab";
> +            break;
> +        case T_BumpContext:
> +            context_type = "Bump";
> +            break;
> +        default:
> +            context_type = "???";
> +            break;
> +    }
> +    return (context_type);

Why these parens?



> + * If the publishing backend does not respond before the condition variable
> + * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is
> + * time left within the timeout specified by the user, before giving up and
> + * returning previously published statistics, if any. If no previous statistics
> + * exist, return NULL.

Why do we need to repeatedly wake up rather than just sleeping with the
"remaining" amount of time based on the time the function was called and the
time that has passed since?


> +    /*
> +     * A valid DSA pointer isn't proof that statistics are available, it can
> +     * be valid due to previously published stats.

Somehow "valid DSA pointer" is a bit too much about the precise mechanics and
not enough about what's actually happening. I'd rather say something like

"Even if the proc has published statistics, they may not be due to the current
request, but previously published stats."


> +        if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
> +                                        MEMSTATS_WAIT_TIMEOUT,
> +                                        WAIT_EVENT_MEM_CTX_PUBLISH))
> +        {
> +            timer += MEMSTATS_WAIT_TIMEOUT;
> +
> +            /*
> +             * Wait for the timeout as defined by the user. If no updated
> +             * statistics are available within the allowed time then display
> +             * previously published statistics if there are any. If no
> +             * previous statistics are available then return NULL.  The timer
> +             * is defined in milliseconds since thats what the condition
> +             * variable sleep uses.
> +             */
> +            if ((timer * 1000) >= timeout)
> +            {

I'd suggest just comparing how much time has elapsed since the timestamp
you've requested earlier.


> +                LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
> +                /* Displaying previously published statistics if available */
> +                if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
> +                    break;
> +                else
> +                {
> +                    LWLockRelease(&memCtxState[procNumber].lw_lock);
> +                    PG_RETURN_NULL();
> +                }
> +            }
> +        }
> +    }


> +/*
> + * Initialize shared memory for displaying memory context statistics
> + */
> +void
> +MemoryContextReportingShmemInit(void)
> +{
> +    bool        found;
> +
> +    memCtxArea = (MemoryContextState *)
> +        ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState), &found);
> +
> +    if (!IsUnderPostmaster)
> +    {
> +        Assert(!found);

I don't really understand why this uses IsUnderPostmaster?  Seems like this
should just use found like most (or all) the other *ShmemInit() functions do?


> +        LWLockInitialize(&memCtxArea->lw_lock, LWLockNewTrancheId());

I think for builtin code we just hardcode the tranches in BuiltinTrancheIds.



> +    memCtxState = (MemoryContextBackendState *)
> +        ShmemInitStruct("MemoryContextBackendState",
> +                        ((MaxBackends + NUM_AUXILIARY_PROCS) * sizeof(MemoryContextBackendState)),
> +                        &found);

FWIW, I think it'd be mildly better if these two ShmemInitStruct()'s were
combined.


>  static void
>  MemoryContextStatsInternal(MemoryContext context, int level,
>                             int max_level, int max_children,
>                             MemoryContextCounters *totals,
> -                           bool print_to_stderr)
> +                           PrintDestination print_location, int *num_contexts)
>  {
>      MemoryContext child;
>      int            ichild;
> @@ -884,10 +923,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
>      Assert(MemoryContextIsValid(context));
>  
>      /* Examine the context itself */
> -    context->methods->stats(context,
> -                            MemoryContextStatsPrint,
> -                            &level,
> -                            totals, print_to_stderr);
> +    switch (print_location)
> +    {
> +        case PRINT_STATS_TO_STDERR:
> +            context->methods->stats(context,
> +                                    MemoryContextStatsPrint,
> +                                    &level,
> +                                    totals, true);
> +            break;
> +
> +        case PRINT_STATS_TO_LOGS:
> +            context->methods->stats(context,
> +                                    MemoryContextStatsPrint,
> +                                    &level,
> +                                    totals, false);
> +            break;
> +
> +        case PRINT_STATS_NONE:
> +
> +            /*
> +             * Do not print the statistics if print_location is
> +             * PRINT_STATS_NONE, only compute totals. This is used in
> +             * reporting of memory context statistics via a sql function. Last
> +             * parameter is not relevant.
> +             */
> +            context->methods->stats(context,
> +                                    NULL,
> +                                    NULL,
> +                                    totals, false);
> +            break;
> +    }
> +
> +    /* Increment the context count for each of the recursive call */
> +    *num_contexts = *num_contexts + 1;

It feels a bit silly to duplicate the call to context->methods->stats three
times. We've changed these parameters a bunch in the past, having more callers
to fix makes that more work. Can't the switch just set up the args that are
then passed to one call to context->methods->stats?


> +
> +    /* Compute the number of stats that can fit in the defined limit */
> +    max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
> +        / (MAX_MEMORY_CONTEXT_STATS_SIZE);

MAX_SEGMENTS_PER_BACKEND sounds way too generic to me for something defined in
memutils.h.  I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is
something that makes sense to use here?

The header says:

> +/* Maximum size (in Mb) of DSA area per process */
> +#define MAX_SEGMENTS_PER_BACKEND 1

But the name doesn't at all indicate it's in megabytes. Nor does the way it's
used clearly indicate that. That seems to be completely incidental based on
the current default value DSA_DEFAULT_INIT_SEGMENT_SIZE.


> +    /*
> +     * Hold the process lock to protect writes to process specific memory. Two
> +     * processes publishing statistics do not block each other.
> +     */

s/specific/process specific/


> +    LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
> +    memCtxState[idx].proc_id = MyProcPid;
> +
> +    if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
> +    {
> +        /*
> +         * Free any previous allocations, free the name, ident and path
> +         * pointers before freeing the pointer that contains them.
> +         */
> +        free_memorycontextstate_dsa(area, memCtxState[idx].total_stats,
> +                                    memCtxState[idx].memstats_dsa_pointer);
> +
> +        dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
> +        memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;

Both callers to free_memorycontextstate_dsa() do these lines immediately after
calling free_memorycontextstate_dsa(), why not do that inside?



> +        for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
> +             c = c->nextchild)
> +        {
> +            MemoryContextCounters grand_totals;
> +            int            num_contexts = 0;
> +            int            level = 0;
> +
> +            path = NIL;
> +            memset(&grand_totals, 0, sizeof(grand_totals));
> +
> +            MemoryContextStatsInternal(c, level, 100, 100, &grand_totals,
> +                                       PRINT_STATS_NONE, &num_contexts);
> +
> +            path = compute_context_path(c, context_id_lookup);
> +
> +            PublishMemoryContext(meminfo, ctx_id, c, path,
> +                                 grand_totals, num_contexts, area, 100);
> +            ctx_id = ctx_id + 1;
> +        }
> +        memCtxState[idx].total_stats = ctx_id;
> +        /* Notify waiting backends and return */
> +        hash_destroy(context_id_lookup);
> +        dsa_detach(area);
> +        signal_memorycontext_reporting();
> +    }
> +
> +    foreach_ptr(MemoryContextData, cur, contexts)
> +    {
> +        List       *path = NIL;
> +
> +        /*
> +         * Figure out the transient context_id of this context and each of its
> +         * ancestors, to compute a path for this context.
> +         */
> +        path = compute_context_path(cur, context_id_lookup);
> +
> +        /* Account for saving one statistics slot for cumulative reporting */
> +        if (context_id < (max_stats - 1) || stats_count <= max_stats)
> +        {
> +            /* Examine the context stats */
> +            memset(&stat, 0, sizeof(stat));
> +            (*cur->methods->stats) (cur, NULL, NULL, &stat, true);

Hm. So here we call the callback ourselves, even though we extended
MemoryContextStatsInternal() to satisfy the summary output.  I guess it's
tolerable, but it's not great.


> +            /* Copy statistics to DSA memory */
> +            PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area, 100);
> +        }
> +        else
> +        {
> +            /* Examine the context stats */
> +            memset(&stat, 0, sizeof(stat));
> +            (*cur->methods->stats) (cur, NULL, NULL, &stat, true);

But do we really do it twice in a row?  The lines are exactly the same, so it
seems that should just be done before the if?

> +
> +    /* Notify waiting backends and return */
> +    hash_destroy(context_id_lookup);
> +    dsa_detach(area);
> +    signal_memorycontext_reporting();
> +}
> +
> +/*
> + * Signal all the waiting client backends after copying all the statistics.
> + */
> +static void
> +signal_memorycontext_reporting(void)
> +{
> +    memCtxState[MyProcNumber].stats_timestamp = GetCurrentTimestamp();
> +    LWLockRelease(&memCtxState[MyProcNumber].lw_lock);
> +    ConditionVariableBroadcast(&memCtxState[MyProcNumber].memctx_cv);
> +}

IMO somewhat confusing to release the lock in a function named
signal_memorycontext_reporting().  Why do we do that after
hash_destroy()/dsa_detach()?



> +static void
> +compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
> +                               int *stats_count, bool summary)
> +{
> +    foreach_ptr(MemoryContextData, cur, contexts)
> +    {
> +        MemoryContextId *entry;
> +        bool        found;
> +
> +        entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
> +                                                HASH_ENTER, &found);
> +        Assert(!found);
> +
> +        /* context id starts with 1 */
> +        entry->context_id = ++(*stats_count);

Given that we don't actually do anything here relating to starting with 1, I
find that comment confusing.


> +static void
> +PublishMemoryContext(MemoryContextStatsEntry *memctx_info, int curr_id,
> +                     MemoryContext context, List *path,
> +                     MemoryContextCounters stat, int num_contexts,
> +                     dsa_area *area, int max_levels)
> +{
> +    const char *ident = context->ident;
> +    const char *name = context->name;
> +    int           *path_list;
> +
> +    /*
> +     * To be consistent with logging output, we label dynahash contexts with
> +     * just the hash table name as with MemoryContextStatsPrint().
> +     */
> +    if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
> +    {
> +        name = context->ident;
> +        ident = NULL;
> +    }
> +
> +    if (name != NULL)
> +    {
> +        int            namelen = strlen(name);
> +        char       *nameptr;
> +
> +        if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
> +            namelen = pg_mbcliplen(name, namelen,
> +                                   MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
> +
> +        memctx_info[curr_id].name = dsa_allocate0(area, namelen + 1);

Given the number of references to memctx_info[curr_id] I'd put it in a local variable.

Why is this a dsa_allocate0 given that we're immediately overwriting it?


> +        nameptr = (char *) dsa_get_address(area, memctx_info[curr_id].name);
> +        strlcpy(nameptr, name, namelen + 1);
> +    }
> +    else
> +        memctx_info[curr_id].name = InvalidDsaPointer;
> +
> +    /* Trim and copy the identifier if it is not set to NULL */
> +    if (ident != NULL)
> +    {
> +        int            idlen = strlen(context->ident);
> +        char       *identptr;
> +
> +        /*
> +         * Some identifiers such as SQL query string can be very long,
> +         * truncate oversize identifiers.
> +         */
> +        if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
> +            idlen = pg_mbcliplen(ident, idlen,
> +                                 MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
> +
> +        memctx_info[curr_id].ident = dsa_allocate0(area, idlen + 1);
> +        identptr = (char *) dsa_get_address(area, memctx_info[curr_id].ident);
> +        strlcpy(identptr, ident, idlen + 1);

Hm. First I thought we'd leak memory if this second (and subsequent)
dsa_allocate failed. Then I thought we'd be ok, because the memory would be
memory because it'd be reachable from memCtxState[idx].memstats_dsa_pointer.

But I think it wouldn't *quite* work, because memCtxState[idx].total_stats is
only set *after* we would have failed.




> +    /* Allocate DSA memory for storing path information */
> +    if (path == NIL)
> +        memctx_info[curr_id].path = InvalidDsaPointer;
> +    else
> +    {
> +        int            levels = Min(list_length(path), max_levels);
> +
> +        memctx_info[curr_id].path_length = levels;
> +        memctx_info[curr_id].path = dsa_allocate0(area, levels * sizeof(int));
> +        memctx_info[curr_id].levels = list_length(path);
> +        path_list = (int *) dsa_get_address(area, memctx_info[curr_id].path);
> +
> +        foreach_int(i, path)
> +        {
> +            path_list[foreach_current_index(i)] = i;
> +            if (--levels == 0)
> +                break;
> +        }
> +    }
> +    memctx_info[curr_id].type = ContextTypeToString(context->type);

I don't think this works across platforms. On windows / EXEC_BACKEND builds
the location of string constants can differ across backends.  And: Why do we
need the string here? You can just call ContextTypeToString when reading?


> +/*
> + * Free the memory context statistics stored by this process
> + * in DSA area.
> + */
> +void
> +AtProcExit_memstats_dsa_free(int code, Datum arg)
> +{

FWIW, to me the fact that it does a dsa_free() is an implementation
detail. It's also not the only thing this does.

And, I don't think AtProcExit* really is accurate, given that it runs *before*
shmem is cleaned up?

I wonder if the best approach here wouldn't be to forgo the use of a
before_shmem_exit() callback, but instead use on_dsm_detach(). That would
require we'd not constantly detach from the dsm segment, but I don't
understand why we do that in the first place?


> +    int            idx = MyProcNumber;
> +    dsm_segment *dsm_seg = NULL;
> +    dsa_area   *area = NULL;
> +
> +    if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
> +        return;
> +
> +    dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
> +
> +    LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
> +
> +    if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
> +    {
> +        LWLockRelease(&memCtxState[idx].lw_lock);
> +        return;
> +    }
> +
> +    /* If the dsm mapping could not be found, attach to the area */
> +    if (dsm_seg != NULL)
> +        return;

I don't understand what we do here with the dsm?  Why do we not need cleanup
if we are already attached to the dsm segment?


> +/*
> + * Static shared memory state representing the DSA area created for memory
> + * context statistics reporting.  A single DSA area is created and used by all
> + * the processes, each having its specific DSA allocations for sharing memory
> + * statistics, tracked by per backend static shared memory state.
> + */
> +typedef struct MemoryContextState
> +{
> +    dsa_handle    memstats_dsa_handle;
> +    LWLock        lw_lock;
> +} MemoryContextState;

IMO that's too generic a name for something in a header.


> +/*
> + * Used for storage of transient identifiers for pg_get_backend_memory_contexts
> + */
> +typedef struct MemoryContextId
> +{
> +    MemoryContext context;
> +    int            context_id;
> +} MemoryContextId;

This too.  Particularly because MemoryContextData->ident exist but is
something different.



> +DO $$
> +DECLARE
> +    launcher_pid int;
> +    r RECORD;
> +BEGIN
> +        SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
> +     INTO launcher_pid;
> +
> +        select type, name, ident
> +        from pg_get_process_memory_contexts(launcher_pid, false, 20)
> +     where path = '{1}' into r;
> +    RAISE NOTICE '%', r;
> +        select type, name, ident
> +        from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
> +     where path = '{1}' into r;
> +    RAISE NOTICE '%', r;
> +END $$;

I'd also test an aux process.  I think the AV launcher isn't one, because it
actually does "table" access of shared relations.


Greetings,

Andres Freund

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

07 April, 19:27:57

Hi,

Please see some responses below.

On Mon, Apr 7, 2025 at 9:13 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2025-04-07 15:41:37 +0200, Daniel Gustafsson wrote:
> I think this function can be a valuable debugging aid going forward.

What I am most excited about for this is to be able to measure server-wide and
fleet-wide memory usage over time. Today I have actually very little idea
about what memory is being used for across all connections, not to speak of a
larger number of servers.

> diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
> index 4f6795f7265..d3b4df27935 100644
> --- a/src/backend/postmaster/auxprocess.c
> +++ b/src/backend/postmaster/auxprocess.c
> @@ -84,6 +84,13 @@ AuxiliaryProcessMainCommon(void)
> /* register a before-shutdown callback for LWLock cleanup */
> before_shmem_exit(ShutdownAuxiliaryProcess, 0);
>
> + /*
> + * The before shmem exit callback frees the DSA memory occupied by the
> + * latest memory context statistics that could be published by this aux
> + * proc if requested.
> + */
> + before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
> +
> SetProcessingMode(NormalProcessing);
> }

How about putting it into BaseInit()? Or maybe just register it when its
first used?

Problem with registering it when dsa is first used is that dsa is used in an interrupt handler.
The handler could be called from the PG_ENSURE_ERROR_CLEANUP block. This block
operates under the assumption that the before_shmem_exit callback registered at the beginning,
will be the last one in the registered callback list at the end of the block. However, this won't be
the case if a callback is registered from an interrupt handler called in the
PG_ENSURE_ERROR_CLEANUP block.

I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is

something that makes sense to use here?

To determine the memory limit per backend in multiples of DSA_DEFAULT_INIT_SEGMENT_SIZE.
Currently it is set to 1 * DSA_DEFAULT_INIT_SEGMENT_SIZE.
Since a call to dsa_create would create a DSA segment of this size, I thought it makes sense
to define a limit related to the segment size.

> +/*

> + /* If the dsm mapping could not be found, attach to the area */
> + if (dsm_seg != NULL)
> + return;

I don't understand what we do here with the dsm? Why do we not need cleanup
if we are already attached to the dsm segment?

I am not expecting to hit this case, since we are always detaching from the dsa.
This could be an assert but since it is a cleanup code, I thought returning would be
a harmless step.

Thank you,

Rahila Syed

Re: Enhancing Memory Context Statistics Reporting

From

Andres Freund

Date:

07 April, 19:38:28

Hi,

On 2025-04-07 21:57:57 +0530, Rahila Syed wrote:
> > > diff --git a/src/backend/postmaster/auxprocess.c
> > b/src/backend/postmaster/auxprocess.c
> > > index 4f6795f7265..d3b4df27935 100644
> > > --- a/src/backend/postmaster/auxprocess.c
> > > +++ b/src/backend/postmaster/auxprocess.c
> > > @@ -84,6 +84,13 @@ AuxiliaryProcessMainCommon(void)
> > >       /* register a before-shutdown callback for LWLock cleanup */
> > >       before_shmem_exit(ShutdownAuxiliaryProcess, 0);
> > >
> > > +     /*
> > > +      * The before shmem exit callback frees the DSA memory occupied by
> > the
> > > +      * latest memory context statistics that could be published by
> > this aux
> > > +      * proc if requested.
> > > +      */
> > > +     before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
> > > +
> > >       SetProcessingMode(NormalProcessing);
> > >  }
> >
> > How about putting it into BaseInit()?  Or maybe just register it when its
> > first used?
> >
> >
> Problem with registering it when dsa is first used is that dsa is used in an
> interrupt handler. The handler could be called from the
> PG_ENSURE_ERROR_CLEANUP block. This block operates under the assumption that
> the before_shmem_exit callback registered at the beginning, will be the last
> one in the registered callback list at the end of the block. However, this
> won't be the case if a callback is registered from an interrupt handler
> called in the PG_ENSURE_ERROR_CLEANUP block.

Ugh, I really dislike PG_ENSURE_ERROR_CLEANUP().

That's not an argument against moving it to BaseInit() though, as that's
called before procsignal is even initialized and before signals are unmasked.


> I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is
> 
> something that makes sense to use here?
> >
> >
> To determine the memory  limit per backend in multiples of
> DSA_DEFAULT_INIT_SEGMENT_SIZE.
> Currently it is set to  1 * DSA_DEFAULT_INIT_SEGMENT_SIZE.
> Since a call to dsa_create would create a DSA segment of this size, I
> thought it makes sense
> to define a limit related to the segment size.

I strongly disagree.  The limit should be in an understandable unit, not on
another subystems's defaults that might change at some point.



> > +     /* If the dsm mapping could not be found, attach to the area */
> > > +     if (dsm_seg != NULL)
> > > +             return;
> >
> > I don't understand what we do here with the dsm?  Why do we not need
> > cleanup
> > if we are already attached to the dsm segment?
> >
> 
> I am not expecting to hit this case, since we are always detaching from the
> dsa.

Pretty sure it's reachable, consider a failure of dsa_allocate(). That'll
throw an error, while attached to the segment.


> This could be an assert but since it is a cleanup code, I thought returning
> would be a harmless step.

The problem is that the code seems wrong - if we are already attached we'll
leak the memory!


As I also mentioned, I don't understand why we're constantly
attaching/detaching from the dsa/dsm either. It just seems to make things more
complicated an dmore expensive.

Greetings,

Andres Freund

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

07 April, 21:30:54

That's not an argument against moving it to BaseInit() though, as that's
called before procsignal is even initialized and before signals are unmasked.

Yes, OK.

> I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is
>
> something that makes sense to use here?
> >
> >
> To determine the memory limit per backend in multiples of
> DSA_DEFAULT_INIT_SEGMENT_SIZE.
> Currently it is set to 1 * DSA_DEFAULT_INIT_SEGMENT_SIZE.
> Since a call to dsa_create would create a DSA segment of this size, I
> thought it makes sense
> to define a limit related to the segment size.

I strongly disagree. The limit should be in an understandable unit, not on
another subystems's defaults that might change at some point.

OK, makes sense.

> > + /* If the dsm mapping could not be found, attach to the area */
> > > + if (dsm_seg != NULL)
> > > + return;
> >
> > I don't understand what we do here with the dsm? Why do we not need
> > cleanup
> > if we are already attached to the dsm segment?
> >
>
> I am not expecting to hit this case, since we are always detaching from the
> dsa.

Pretty sure it's reachable, consider a failure of dsa_allocate(). That'll
throw an error, while attached to the segment.

You are right, I did not think of this scenario.

> This could be an assert but since it is a cleanup code, I thought returning
> would be a harmless step.

The problem is that the code seems wrong - if we are already attached we'll
leak the memory!

I understand your concern. One issue I recall is that we do not have a dsa_find_mapping
function similar to dsm_find_mapping(). If I understand correctly, the only way to access
an already attached DSA is to ensure we store the DSA area mapping in a global variable.
I'm considering using a global variable and accessing it from the cleanup function in case
it is already mapped.
Does that sound fine?

As I also mentioned, I don't understand why we're constantly
attaching/detaching from the dsa/dsm either. It just seems to make things more
complicated an dmore expensive.

OK, I see that this could be expensive if a process is periodically being queried for
statistics. However, in scenarios where a process is queried only once for memory,
statistics, keeping the area mapped would consume memory resources, correct?

Thank you,

Rahila Syed

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

08 April, 02:17:17

> On 7 Apr 2025, at 17:43, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2025-04-07 15:41:37 +0200, Daniel Gustafsson wrote:
>> I think this function can be a valuable debugging aid going forward.
>
> What I am most excited about for this is to be able to measure server-wide and
> fleet-wide memory usage over time. Today I have actually very little idea
> about what memory is being used for across all connections, not to speak of a
> larger number of servers.

Thanks for looking, Rahila and I took a collective stab at the review comments.

>> + before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
>> +
>> SetProcessingMode(NormalProcessing);
>> }
>
> How about putting it into BaseInit()?  Or maybe just register it when its
> first used?

Moved to BaseInit().

>> +MEM_CTX_PUBLISH "Waiting for a process to publish memory information."
>
> The memory context stuff abbreviates as cxt not ctx.  There's a few more cases
> of that in the patch.

I never get that right. Fixed.

>> + return (context_type);
>
> Why these parens?

Must be a leftover from something, fixed. Sorry about that.

>> + * If the publishing backend does not respond before the condition variable
>> + * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is
>> + * time left within the timeout specified by the user, before giving up and
>> + * returning previously published statistics, if any. If no previous statistics
>> + * exist, return NULL.
>
> Why do we need to repeatedly wake up rather than just sleeping with the
> "remaining" amount of time based on the time the function was called and the
> time that has passed since?

Fair point, the current coding was a conversion from the previous retry-based
approach but your suggestion is clearly correct.  There is still potential for
refactoring but at this point I don't want to change too much all at once.

>> + * A valid DSA pointer isn't proof that statistics are available, it can
>> + * be valid due to previously published stats.
>
> Somehow "valid DSA pointer" is a bit too much about the precise mechanics and
> not enough about what's actually happening. I'd rather say something like
>
> "Even if the proc has published statistics, they may not be due to the current
> request, but previously published stats."

Agreed, thats better. Changed.

>> + if (!IsUnderPostmaster)
>> + {
>> + Assert(!found);
>
> I don't really understand why this uses IsUnderPostmaster?  Seems like this
> should just use found like most (or all) the other *ShmemInit() functions do?

Agreed, Fixed.

>> + LWLockInitialize(&memCtxArea->lw_lock, LWLockNewTrancheId());
>
> I think for builtin code we just hardcode the tranches in BuiltinTrancheIds.

Fixed.

> It feels a bit silly to duplicate the call to context->methods->stats three
> times. We've changed these parameters a bunch in the past, having more callers
> to fix makes that more work. Can't the switch just set up the args that are
> then passed to one call to context->methods->stats?

I don't disagree, but I prefer to do that as a separate refactoring to not
change too many things all at once.

>> +
>> + /* Compute the number of stats that can fit in the defined limit */
>> + max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
>> + / (MAX_MEMORY_CONTEXT_STATS_SIZE);
>
> MAX_SEGMENTS_PER_BACKEND sounds way too generic to me for something defined in
> memutils.h.  I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is
> something that makes sense to use here?

Renamed, and dependency on DSA_DEFAULT_INIT_SEGMENT_SIZE removed.

>> + /*
>> + * Hold the process lock to protect writes to process specific memory. Two
>> + * processes publishing statistics do not block each other.
>> + */
>
> s/specific/process specific/

That's what it says though.. isn't it? I might be missing something obvious.

>> + dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
>> + memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
>
> Both callers to free_memorycontextstate_dsa() do these lines immediately after
> calling free_memorycontextstate_dsa(), why not do that inside?

Fixed.

>> + /* Copy statistics to DSA memory */
>> + PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area, 100);
>> + }
>> + else
>> + {
>> + /* Examine the context stats */
>> + memset(&stat, 0, sizeof(stat));
>> + (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
>
> But do we really do it twice in a row?  The lines are exactly the same, so it
> seems that should just be done before the if?

Fixed.

>> +signal_memorycontext_reporting(void)
>
> IMO somewhat confusing to release the lock in a function named
> signal_memorycontext_reporting().  Why do we do that after
> hash_destroy()/dsa_detach()?

The function has been renamed for clarity.

>> + /* context id starts with 1 */
>> + entry->context_id = ++(*stats_count);
>
> Given that we don't actually do anything here relating to starting with 1, I
> find that comment confusing.

Reworded, not sure if it's much better tbh.

>> + memctx_info[curr_id].name = dsa_allocate0(area, namelen + 1);
>
> Given the number of references to memctx_info[curr_id] I'd put it in a local variable.

I might be partial, but I sort of prefer this way since it makes the underlying
data structure clear to the reader.

> Why is this a dsa_allocate0 given that we're immediately overwriting it?

It doesn't need to be zeroed as it's immediately overwritten. Fixed.

>> + memctx_info[curr_id].ident = dsa_allocate0(area, idlen + 1);
>> + identptr = (char *) dsa_get_address(area, memctx_info[curr_id].ident);
>> + strlcpy(identptr, ident, idlen + 1);
>
> Hm. First I thought we'd leak memory if this second (and subsequent)
> dsa_allocate failed. Then I thought we'd be ok, because the memory would be
> memory because it'd be reachable from memCtxState[idx].memstats_dsa_pointer.
>
> But I think it wouldn't *quite* work, because memCtxState[idx].total_stats is
> only set *after* we would have failed.

Keeping a running total in .total_stats should make the leak window smaller.

>> + memctx_info[curr_id].type = ContextTypeToString(context->type);
>
> I don't think this works across platforms. On windows / EXEC_BACKEND builds
> the location of string constants can differ across backends.  And: Why do we
> need the string here? You can just call ContextTypeToString when reading?

Correct, we can just store the type and call ContextTypeToString when
generating the tuple. Fixed.

>> +/*
>> + * Free the memory context statistics stored by this process
>> + * in DSA area.
>> + */
>> +void
>> +AtProcExit_memstats_dsa_free(int code, Datum arg)
>> +{
>
> FWIW, to me the fact that it does a dsa_free() is an implementation
> detail. It's also not the only thing this does.

Renamed.

> And, I don't think AtProcExit* really is accurate, given that it runs *before*
> shmem is cleaned up?
>
> I wonder if the best approach here wouldn't be to forgo the use of a
> before_shmem_exit() callback, but instead use on_dsm_detach(). That would
> require we'd not constantly detach from the dsm segment, but I don't
> understand why we do that in the first place?

The attach/detach has been removed.

>> + /* If the dsm mapping could not be found, attach to the area */
>> + if (dsm_seg != NULL)
>> + return;
>
> I don't understand what we do here with the dsm?  Why do we not need cleanup
> if we are already attached to the dsm segment?

Fixed.

>> +} MemoryContextState;
>
> IMO that's too generic a name for something in a header.
>
>> +} MemoryContextId;
>
> This too.  Particularly because MemoryContextData->ident exist but is
> something different.

Renamed both to use MemoryContextReporting* namespace, which leaves
MemoryContextReportingBackendState at an unwieldly long name.  I'm running out
of ideas on how to improve and it does make purpose quite explicit at least.

>> +        from pg_get_process_memory_contexts(launcher_pid, false, 20)
>> + where path = '{1}' into r;
>> + RAISE NOTICE '%', r;
>> +        select type, name, ident
>> +        from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
>> + where path = '{1}' into r;
>> + RAISE NOTICE '%', r;
>> +END $$;
>
> I'd also test an aux process.  I think the AV launcher isn't one, because it
> actually does "table" access of shared relations.

Fixed, switched from the AV launcher.

--
Daniel Gustafsson

Attachment

v27-0001-Add-function-to-get-memory-context-stats-for-pro.patch

Re: Enhancing Memory Context Statistics Reporting

From

Andres Freund

Date:

08 April, 03:03:36

Hi,

On 2025-04-08 01:17:17 +0200, Daniel Gustafsson wrote:
> > On 7 Apr 2025, at 17:43, Andres Freund <andres@anarazel.de> wrote:
> 
> >> + /*
> >> + * Hold the process lock to protect writes to process specific memory. Two
> >> + * processes publishing statistics do not block each other.
> >> + */
> > 
> > s/specific/process specific/
> 
> That's what it says though.. isn't it? I might be missing something obvious.

Understandable confusion, not sure what my brain was doing anymore
either...



> >> +} MemoryContextState;
> > 
> > IMO that's too generic a name for something in a header.
> > 
> >> +} MemoryContextId;
> > 
> > This too.  Particularly because MemoryContextData->ident exist but is
> > something different.
> 
> Renamed both to use MemoryContextReporting* namespace, which leaves
> MemoryContextReportingBackendState at an unwieldly long name.  I'm running out
> of ideas on how to improve and it does make purpose quite explicit at least.

How about

MemoryContextReportingBackendState -> MemoryStatsBackendState
MemoryContextReportingId -> MemoryStatsContextId
MemoryContextReportingSharedState -> MemoryStatsCtl
MemoryContextReportingStatsEntry -> MemoryStatsEntry


> >> + /* context id starts with 1 */
> >> + entry->context_id = ++(*stats_count);
> > 
> > Given that we don't actually do anything here relating to starting with 1, I
> > find that comment confusing.
> 
> Reworded, not sure if it's much better tbh.

I'd probably just remove the comment.


> > Hm. First I thought we'd leak memory if this second (and subsequent)
> > dsa_allocate failed. Then I thought we'd be ok, because the memory would be
> > memory because it'd be reachable from memCtxState[idx].memstats_dsa_pointer.
> > 
> > But I think it wouldn't *quite* work, because memCtxState[idx].total_stats is
> > only set *after* we would have failed.
> 
> Keeping a running total in .total_stats should make the leak window smaller.

Why not just initialize .total_stats *before* calling any fallible code?
Afaict it's zero-allocated, so the free function should have no problem
dealing with the entries that haven't yet been populated/

Greetings,

Andres Freund

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

08 April, 08:40:34

Hi Daniel, Andres,

> >> +} MemoryContextState;
> >
> > IMO that's too generic a name for something in a header.
> >
> >> +} MemoryContextId;
> >
> > This too. Particularly because MemoryContextData->ident exist but is
> > something different.
>
> Renamed both to use MemoryContextReporting* namespace, which leaves
> MemoryContextReportingBackendState at an unwieldly long name. I'm running out
> of ideas on how to improve and it does make purpose quite explicit at least.

How about

MemoryContextReportingBackendState -> MemoryStatsBackendState
MemoryContextReportingId -> MemoryStatsContextId
MemoryContextReportingSharedState -> MemoryStatsCtl
MemoryContextReportingStatsEntry -> MemoryStatsEntry

Fixed accordingly.

> >> + /* context id starts with 1 */
> >> + entry->context_id = ++(*stats_count);
> >
> > Given that we don't actually do anything here relating to starting with 1, I
> > find that comment confusing.
>
> Reworded, not sure if it's much better tbh.

I'd probably just remove the comment.

Reworded to mention that we pre-increment stats_count to make sure
id starts with 1.

> > Hm. First I thought we'd leak memory if this second (and subsequent)
> > dsa_allocate failed. Then I thought we'd be ok, because the memory would be
> > memory because it'd be reachable from memCtxState[idx].memstats_dsa_pointer.
> >
> > But I think it wouldn't *quite* work, because memCtxState[idx].total_stats is
> > only set *after* we would have failed.
>
> Keeping a running total in .total_stats should make the leak window smaller.

Why not just initialize .total_stats *before* calling any fallible code?
Afaict it's zero-allocated, so the free function should have no problem
dealing with the entries that haven't yet been populated/

Fixed accordingly.

PFA a v28 which passes all local and github CI tests.

Thank you,
Rahila Syed

Attachment

v28-0001-Add-function-to-get-memory-context-stats-for-process.patch

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

08 April, 11:03:09

On 8 Apr 2025, at 07:40, Rahila Syed <rahilasyed90@gmail.com> wrote:

> Renamed both to use MemoryContextReporting* namespace, which leaves
> MemoryContextReportingBackendState at an unwieldly long name. I'm running out
> of ideas on how to improve and it does make purpose quite explicit at least.

How about

MemoryContextReportingBackendState -> MemoryStatsBackendState
MemoryContextReportingId -> MemoryStatsContextId
MemoryContextReportingSharedState -> MemoryStatsCtl
MemoryContextReportingStatsEntry -> MemoryStatsEntry

Fixed accordingly.

That's much better, thanks.

There was a bug in the shmem init function which caused it to fail on Windows,

the attached fixes that.

--
Daniel Gustafsson

Attachment

v29-0001-Add-function-to-get-memory-context-stats-for-pro.patch

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

08 April, 12:46:59

> On 8 Apr 2025, at 10:03, Daniel Gustafsson <daniel@yesql.se> wrote:

> There was a bug in the shmem init function which caused it to fail on Windows,
> the attached fixes that.

With this building green in CI over several re-builds, and another pass over
the docs and code with pgindent etc done, I pushed this earlier today.  A few
BF animals have built green so far but I will continue to monitor it.

--
Daniel Gustafsson

Re: Enhancing Memory Context Statistics Reporting

From

Fujii Masao

Date:

08 April, 19:41:49

On 2025/04/08 18:46, Daniel Gustafsson wrote:
>> On 8 Apr 2025, at 10:03, Daniel Gustafsson <daniel@yesql.se> wrote:
> 
>> There was a bug in the shmem init function which caused it to fail on Windows,
>> the attached fixes that.
> 
> With this building green in CI over several re-builds, and another pass over
> the docs and code with pgindent etc done, I pushed this earlier today.  A few
> BF animals have built green so far but I will continue to monitor it.

Thanks for committing this feature!

I noticed that the third argument of pg_get_process_memory_contexts() is named
"retries" in pg_proc.dat, while the documentation refers to it as "timeout".
Since "retries" is misleading, how about renaming it to "timeout" in pg_proc.dat?
Patch attached.

Also, as I mentioned earlier, I encountered an issue when calling
pg_get_process_memory_contexts() on the PID of a backend that had just
encountered an error but hadn't finished rolling back. It led to
the following situation:

    Session 1 (PID=70011):
    =# begin;
    =# select 1/0;
    ERROR:  division by zero

    Session 2:
    =# select * from pg_get_process_memory_contexts(70011, false, 10);

    Session 1 terminated with:
    ERROR:  ResourceOwnerEnlarge called after release started
    FATAL:  terminating connection because protocol synchronization was lost

Shouldn't this be addressed?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

v1-0001-Rename-misleading-argument-in-pg_get_process_memo.patch

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

08 April, 19:44:41

> On 8 Apr 2025, at 18:41, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2025/04/08 18:46, Daniel Gustafsson wrote:
>>> On 8 Apr 2025, at 10:03, Daniel Gustafsson <daniel@yesql.se> wrote:
>>> There was a bug in the shmem init function which caused it to fail on Windows,
>>> the attached fixes that.
>> With this building green in CI over several re-builds, and another pass over
>> the docs and code with pgindent etc done, I pushed this earlier today.  A few
>> BF animals have built green so far but I will continue to monitor it.
>
> Thanks for committing this feature!
>
> I noticed that the third argument of pg_get_process_memory_contexts() is named
> "retries" in pg_proc.dat, while the documentation refers to it as "timeout".
> Since "retries" is misleading, how about renaming it to "timeout" in pg_proc.dat?
> Patch attached.

Ugh, that's my bad.  It was changed from using retries to a timeout and I
missed that.

> Also, as I mentioned earlier, I encountered an issue when calling
> pg_get_process_memory_contexts() on the PID of a backend that had just
> encountered an error but hadn't finished rolling back. It led to
> the following situation:
>
>   Session 1 (PID=70011):
>   =# begin;
>   =# select 1/0;
>   ERROR:  division by zero
>
>   Session 2:
>   =# select * from pg_get_process_memory_contexts(70011, false, 10);
>
>   Session 1 terminated with:
>   ERROR:  ResourceOwnerEnlarge called after release started
>   FATAL:  terminating connection because protocol synchronization was lost
>
> Shouldn't this be addressed?

Sorry, this must've been missed in this fairly lon thread, will have a look at
it tonight.

--
Daniel Gustafsson

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

09 April, 00:27:41

> On 8 Apr 2025, at 18:41, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

> I noticed that the third argument of pg_get_process_memory_contexts() is named
> "retries" in pg_proc.dat, while the documentation refers to it as "timeout".

I've committed this patch as it was obviously correct, thanks!

> Also, as I mentioned earlier, I encountered an issue when calling
> pg_get_process_memory_contexts() on the PID of a backend that had just
> encountered an error but hadn't finished rolling back. It led to
> the following situation:

I reconfirmed that the bugfix that Rahila shared in [0] fixes this issue (and
will fix others like it, as it's not related to this patch in particular but is
a bug in DSM attaching).  My plan is to take that for a more thorough review
and test tomorrow and see how far it can be safely backpatched.  Thanks for
bringing this up, sorry about it getting a bit lost among all the emails.

--
Daniel Gustafsson

[0] CAH2L28shr0j3JE5V3CXDFmDH-agTSnh2V8pR23X0UhRMbDQD9Q@mail.gmail.com

Re: Enhancing Memory Context Statistics Reporting

From

Fujii Masao

Date:

09 April, 02:28:09

On 2025/04/09 6:27, Daniel Gustafsson wrote:
>> On 8 Apr 2025, at 18:41, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> 
>> I noticed that the third argument of pg_get_process_memory_contexts() is named
>> "retries" in pg_proc.dat, while the documentation refers to it as "timeout".
> 
> I've committed this patch as it was obviously correct, thanks!

Thanks a lot!

Since pg_proc.dat was modified, do we need to bump the catalog version?


>> Also, as I mentioned earlier, I encountered an issue when calling
>> pg_get_process_memory_contexts() on the PID of a backend that had just
>> encountered an error but hadn't finished rolling back. It led to
>> the following situation:
> 
> I reconfirmed that the bugfix that Rahila shared in [0] fixes this issue (and
> will fix others like it, as it's not related to this patch in particular but is
> a bug in DSM attaching).  My plan is to take that for a more thorough review
> and test tomorrow and see how far it can be safely backpatched.  Thanks for
> bringing this up, sorry about it getting a bit lost among all the emails.

Appreciate your work on this!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Enhancing Memory Context Statistics Reporting

From

Rahila Syed

Date:

29 April, 16:13:07

Hi,

Please find attached a patch with some comments and documentation changes.
Additionaly, added a missing '\0' termination to "Remaining Totals" string.
I think this became necessary after we replaced dsa_allocate0()
with dsa_allocate() is the latest version.

Thank you,

Rahila Syed

Attachment

0001-Fix-typos-and-modify-few-comments.patch

Re: Enhancing Memory Context Statistics Reporting

From

Peter Eisentraut

Date:

30 April, 13:14:26

On 29.04.25 15:13, Rahila Syed wrote:
> Please find attached a patch with some comments and documentation changes.
> Additionaly, added a missing '\0' termination to "Remaining Totals" string.
> I think this became necessary after we replaced dsa_allocate0()
> with dsa_allocate() is the latest version.

 >              strncpy(nameptr, "Remaining Totals", namelen);
 > +            nameptr[namelen] = '\0';

Looks like a case for strlcpy()?

Re: Enhancing Memory Context Statistics Reporting

From

Daniel Gustafsson

Date:

30 April, 13:43:24

> On 30 Apr 2025, at 12:14, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 29.04.25 15:13, Rahila Syed wrote:
>> Please find attached a patch with some comments and documentation changes.
>> Additionaly, added a missing '\0' termination to "Remaining Totals" string.
>> I think this became necessary after we replaced dsa_allocate0()
>> with dsa_allocate() is the latest version.
>
> >   strncpy(nameptr, "Remaining Totals", namelen);
> > + nameptr[namelen] = '\0';
>
> Looks like a case for strlcpy()?

True.  I did go ahead with the strncpy and nul terminator assignment, mostly
out of muscle memory, but I agree that this would be a good place for a
strlcpy() instead.

--
Daniel Gustafsson