Thread: Re: Enhancing Memory Context Statistics Reporting
On 2024-Nov-14, Michael Paquier wrote:

> Already mentioned previously at [1] and echoing with some surrounding
> arguments, but I'd suggest to keep it simple and just remove entirely
> the part of the patch where the stats information gets spilled into
> disk.  With more than 6000-ish context information available with a
> hard limit in place, there should be plenty enough to know what's
> going on anyway.

Functionally-wise I don't necessarily agree with _removing_ the spill
code, considering that production systems with thousands of tables would
easily reach that number of contexts (each index gets its own index info
context, each regexp gets its own memcxt); and I don't think silently
omitting a fraction of people's memory situation (or erroring out if the
case is hit) is going to make us any friends.

That said, it worries me that we choose a shared memory size so large
that it becomes impractical to hit the spill-to-disk code in regression
testing.  Maybe we can choose a much smaller limit size when
USE_ASSERT_CHECKING is enabled, and use a test that hits that number?
That way, we know the code is being hit and tested, without imposing a
huge memory consumption on test machines.

-- 
Álvaro Herrera  48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)
On Wed, Nov 20, 2024 at 2:39 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi,
>
>> To achieve both completeness and avoid writing to a file, I can consider
>> displaying the numbers for the remaining contexts as a cumulative total
>> at the end of the output.
>>
>> Something like follows:
>>
>> postgres=# select * from pg_get_process_memory_contexts('237244', false);
>>              name             | ident |   type   | path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
>> ------------------------------+-------+----------+-------+-------------+---------------+------------+-------------+------------+--------
>>  TopMemoryContext             |       | AllocSet | {0}   |       97696 |             5 |      14288 |          11 |      83408 | 237244
>>  search_path processing cache |       | AllocSet | {0,1} |        8192 |             1 |       5328 |           7 |       2864 | 237244
>>  Remaining contexts total: 23456 bytes (total_bytes), 12345 (used_bytes), 11,111 (free_bytes)
>
> Please find attached an updated patch with this change. The file
> previously used to store spilled statistics has been removed. Instead,
> a cumulative total of the remaining/spilled context statistics is now
> stored in the DSM segment, which is displayed as follows.
>
> postgres=# select * from pg_get_process_memory_contexts('352966', false);
>        name       | ident |   type   |  path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
> ------------------+-------+----------+--------+-------------+---------------+------------+-------------+------------+--------
>  TopMemoryContext |       | AllocSet | {0}    |       97696 |             5 |      14288 |          11 |      83408 | 352966
>  .
>  .
>  .
>  MdSmgr           |       | AllocSet | {0,18} |        8192 |             1 |       7424 |           0 |        768 | 352966
>  Remaining Totals |       |          |        |     1756016 |           188 |     658584 |         132 |    1097432 | 352966
> (7129 rows)
>
> I believe this serves as a good compromise between completeness
> and avoiding the overhead of file handling.
> However, I am open to reintroducing file handling if displaying the
> complete statistics of the remaining contexts proves to be more
> important.
>
> All the known bugs in the patch have been fixed.
>
> In summary, one DSA per PostgreSQL process is used to share
> the statistics of that process. A DSA is created by the first client
> backend that requests memory context statistics, and it is pinned
> for all future requests to that process.
> A handle to this DSA is shared between the client and the publishing
> process using fixed shared memory. The fixed shared memory consists
> of an array of size MaxBackends + auxiliary processes, indexed
> by procno. Each element in this array is less than 100 bytes in size.
>
> A PostgreSQL process uses a condition variable to signal a waiting
> client backend once it has finished publishing the statistics. If, for
> some reason, the signal is not sent, the waiting client backend will
> time out.

How does the process know that the client backend has finished reading
stats and they can be refreshed? What happens if the next request for
memory context stats comes before the first requester has consumed the
statistics it requested? Does the shared memory get deallocated when the
backend which allocated it exits?

> When statistics of a local backend are requested, this function
> returns the following WARNING and exits, since this can be handled by
> an existing function which doesn't require a DSA.
>
> WARNING:  cannot return statistics for local backend
> HINT:  Use pg_get_backend_memory_contexts instead

How about using pg_get_backend_memory_contexts() for both - local as
well as other backends? Let the PID argument default to NULL, which
would indicate the local backend, otherwise some other backend?

-- 
Best Wishes,
Ashutosh Bapat
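[Editor's illustration] The fixed shared-memory array summarized above (one sub-100-byte slot per backend or auxiliary process, indexed by procno) could be sketched roughly as below. This is a hypothetical stand-in, not the patch's actual struct: all names (MemCtxSlot, the dsa_handle/stats_ptr fields, MAX_BACKENDS, NUM_AUX) are made up, and PostgreSQL's real LWLock and ConditionVariable types are replaced with opaque placeholders just to show the size budget.

```c
#include <stdint.h>
#include <sys/types.h>

/* Opaque stand-ins for PostgreSQL's LWLock and ConditionVariable. */
typedef struct { uint32_t state; } FakeLWLock;
typedef struct { uint32_t mutex; uint64_t wakeup; } FakeConditionVariable;

/* One per-process slot in the fixed shared-memory array (hypothetical layout). */
typedef struct MemCtxSlot
{
    pid_t                 pid;          /* publisher PID, for sanity checks */
    uint64_t              dsa_handle;   /* handle to the per-process DSA */
    uint64_t              stats_ptr;    /* dsa_pointer to the published stats */
    int                   num_contexts; /* how many contexts were published */
    FakeLWLock            lock;         /* publisher: exclusive; reader: shared */
    FakeConditionVariable cv;           /* signals "stats are ready" */
} MemCtxSlot;

/* The array would be sized as MaxBackends + number of auxiliary processes. */
enum { MAX_BACKENDS = 100, NUM_AUX = 8 };

static MemCtxSlot shared_array[MAX_BACKENDS + NUM_AUX];

int slot_size(void)     { return (int) sizeof(MemCtxSlot); }
int array_entries(void) { return (int) (sizeof(shared_array) / sizeof(shared_array[0])); }
```

Even with generous padding, such a slot stays comfortably under the 100-byte-per-element figure quoted above.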
On Fri, Nov 22, 2024 at 6:33 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi,
>
>> How does the process know that the client backend has finished reading
>> stats and they can be refreshed? What happens if the next request for
>> memory context stats comes before the first requester has consumed the
>> statistics it requested?
>
> A process that's copying its statistics does not need to know that.
> Whenever it receives a signal to copy statistics, it goes ahead and
> copies the latest statistics to the DSA after acquiring an exclusive
> lwlock.
>
> A requestor takes a lock before it starts consuming the statistics.
> If the next request comes while the first requestor is consuming the
> statistics, the publishing process will wait on the lwlock to be
> released by the consuming process before it can write the statistics.
> If the next request arrives before the first requester begins consuming
> the statistics, the publishing process will acquire the lock and
> overwrite the earlier statistics with the most recent ones.
> As a result, both the first and second requesters will consume the
> updated statistics.

IIUC, the publisher and the consumer processes both use the same LWLock.
The publisher acquires an exclusive lock; does the consumer acquire a
shared lock? The publisher process might be in a transaction, processing
a query or doing something else. Having to wait for an LWLock may affect
its performance. This will become even more visible if the client
backend is trying to diagnose a slow-running query. Have we tried to
measure how long the publisher might have to wait for an LWLock while
the consumer is consuming statistics, or what the impact of this wait
is?

>> > When statistics of a local backend are requested, this function
>> > returns the following WARNING and exits, since this can be handled
>> > by an existing function which doesn't require a DSA.
>> >
>> > WARNING:  cannot return statistics for local backend
>> > HINT:  Use pg_get_backend_memory_contexts instead
>>
>> How about using pg_get_backend_memory_contexts() for both - local as
>> well as other backends? Let the PID argument default to NULL, which
>> would indicate the local backend, otherwise some other backend?
>
> I don't see much value in combining the two, especially since with
> pg_get_process_memory_contexts() we can query both a postgres backend
> and a background process; the name pg_get_backend_memory_context()
> would be inaccurate, and I am not sure whether a change to rename the
> existing function would be welcome.

Having two separate functions for the same functionality isn't a
friendly user interface. I played a bit with pg_terminate_backend(),
which is another function dealing with backends, to understand
a. what it does to its own backend and b. which processes are considered
backends.

1. pg_terminate_backend() allows terminating the backend from which it
is fired.

#select pid, application_name, backend_type, pg_terminate_backend(pid)
from pg_stat_activity;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

2. It considers autovacuum launcher and logical replication launcher as
postgres backends, but not checkpointer, background writer and
walwriter.
#select pid, application_name, backend_type, pg_terminate_backend(pid)
from pg_stat_activity where pid <> pg_backend_pid();
WARNING:  PID 644887 is not a PostgreSQL backend process
WARNING:  PID 644888 is not a PostgreSQL backend process
WARNING:  PID 644890 is not a PostgreSQL backend process
  pid   | application_name |         backend_type         | pg_terminate_backend
--------+------------------+------------------------------+----------------------
 645636 |                  | autovacuum launcher          | t
 645677 |                  | logical replication launcher | t
 644887 |                  | checkpointer                 | f
 644888 |                  | background writer            | f
 644890 |                  | walwriter                    | f
(5 rows)

In that sense you are correct that pg_get_backend_memory_context()
should not provide context information of the WAL writer process, for
example. But pg_get_process_memory_contexts() would be expected to
provide its own memory context information instead of redirecting to
another function through a WARNING. It could do that redirection itself.
That would also prevent the functions' output formats from going out of
sync.

-- 
Best Wishes,
Ashutosh Bapat
Hi,

I took a quick look at the patch today. Overall, I think this would be
very useful - I've repeatedly needed to inspect why a backend uses so
much memory, and I ended up triggering MemoryContextStats() from gdb.
This would be more convenient / safer. So +1 to the patch intent.

A couple review comments:

1) I read through the thread, and in general I agree with the reasoning
for removing the file part - it seems perfectly fine to just dump as
much as we can fit into a buffer, and then summarize the rest. But do we
need to invent a "new" limit here? The other places logging memory
contexts do something like this:

    MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);

Which means we only print the 100 memory contexts at the top, and that's
it. Wouldn't that give us a reasonable memory limit too?

2) I see the function got renamed to pg_get_process_memory_contexts(),
but the comment still says pg_get_remote_backend_memory_contexts().

3) I don't see any SGML docs for this new function. I was a bit unsure
what the "summary" argument is meant to do. The comment does not explain
that either.

4) I wonder if the function needs to return the PID. I mean, the caller
knows which PID it is for, so it seems rather unnecessary.

5) In the "summary" mode, it might be useful to include info about how
many child contexts were aggregated. It's useful to know whether there
was 1 child or 10000 children. In the regular (non-summary) mode it'd
always be "1", probably, but maybe it'd interact with the limit in (1).
Not sure.

6) I feel a bit uneasy about the whole locking / communication scheme.
In particular, I'm worried about lockups, deadlocks, etc. So I decided
to do a trivial stress-test - just run the new function through pgbench
with many clients.
The memstats.sql script does just this:

    SELECT * FROM pg_get_process_memory_contexts(
       (SELECT pid FROM pg_stat_activity
         WHERE pid != pg_backend_pid()
         ORDER BY random() LIMIT 1)
       , false);

where the inner query just picks a PID for some other backend, and asks
for memory context stats for that.

And just run it like this on a scale 1 pgbench database:

    pgbench -n -f memstats.sql -c 10 test

And it gets stuck *immediately*. I've seen it wait for other client
backends and auxiliary processes like autovacuum launcher. This is an
absolutely idle system, there's no reason why a process would not
respond almost immediately.

I wonder if e.g. autovacuum launcher may not be handling these requests,
or what if client backends can wait in a cycle. IIRC condition variables
are not covered by the deadlock detector, so that would be an issue. But
maybe I remember wrong?

7) I've also seen this error:

    pgbench: error: client 6 script 0 aborted in command 0 query 0: \
    ERROR:  can't attach the same segment more than once

I haven't investigated it, but it seems like a problem handling errors,
where we fail to detach from a segment after a timeout. I may be wrong,
but it might be related to this:

> I opted for DSAs over DSMs to enable memory reuse by freeing
> segments for subsequent statistics copies of the same backend,
> without needing to recreate DSMs for each request.

I feel like this might be a premature optimization - I don't have a
clear idea how expensive it is to create a DSM per request, but my
intuition is that it's cheaper than processing the contexts and
generating the info.

I'd just remove that, unless someone demonstrates it really matters. I
don't really worry about how expensive it is to process a request
(within reason, of course) - it will happen only very rarely. It's more
important to make sure there's no overhead when no one asks the backend
for memory context info, and simplicity.

Also, how expensive is it to just keep the DSA "just in case"?
Imagine someone asks for the memory context info once - isn't it a waste
to still keep the DSA? I don't recall how many resources that could be.
I don't have a clear opinion on that, I'm more asking for opinions.

8) Two minutes seems pretty arbitrary, and also quite high. If a timeout
is necessary, I think it should not be hard-coded.

regards

-- 
Tomas Vondra
On 11/29/24 00:23, Rahila Syed wrote:
> Hi Tomas,
>
> Thank you for the review.
>
>> 1) I read through the thread, and in general I agree with the
>> reasoning for removing the file part - it seems perfectly fine to
>> just dump as much as we can fit into a buffer, and then summarize the
>> rest. But do we need to invent a "new" limit here? The other places
>> logging memory contexts do something like this:
>>
>>     MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
>>
>> Which means we only print the 100 memory contexts at the top, and
>> that's it. Wouldn't that give us a reasonable memory limit too?
>
> I think this prints more than 100 memory contexts, since 100 denotes
> the max_level, and contexts at each level could have up to 100
> children. This limit seems much higher than what I am currently
> storing in the DSA, which is approx. 7000 contexts. I will verify this
> again.

Yeah, you may be right. I don't remember what exactly that limit does.

>> 2) I see the function got renamed to pg_get_process_memory_contexts(),
>> but the comment still says pg_get_remote_backend_memory_contexts().
>
> Fixed

>> 3) I don't see any SGML docs for this new function. I was a bit
>> unsure what the "summary" argument is meant to do. The comment does
>> not explain that either.
>
> Added docs.
> The intention behind adding a summary argument is to report statistics
> of contexts at levels 0 and 1, i.e. TopMemoryContext and its immediate
> children.

OK

>> 4) I wonder if the function needs to return the PID. I mean, the
>> caller knows which PID it is for, so it seems rather unnecessary.
>
> Perhaps it can be used to ascertain that the information indeed
> belongs to the requested pid.

I find that a bit ... suspicious. By this logic we'd include the input
parameters in every result, but we don't. So why is this case different?

>> 5) In the "summary" mode, it might be useful to include info about
>> how many child contexts were aggregated.
>> It's useful to know whether there was 1 child or 10000 children. In
>> the regular (non-summary) mode it'd always be "1", probably, but
>> maybe it'd interact with the limit in (1). Not sure.
>
> Sure, I will add this in the next iteration.

OK

>> 6) I feel a bit uneasy about the whole locking / communication
>> scheme. In particular, I'm worried about lockups, deadlocks, etc. So
>> I decided to do a trivial stress-test - just run the new function
>> through pgbench with many clients.
>>
>> The memstats.sql script does just this:
>>
>>     SELECT * FROM pg_get_process_memory_contexts(
>>        (SELECT pid FROM pg_stat_activity
>>          WHERE pid != pg_backend_pid()
>>          ORDER BY random() LIMIT 1)
>>        , false);
>>
>> where the inner query just picks a PID for some other backend, and
>> asks for memory context stats for that.
>>
>> And just run it like this on a scale 1 pgbench database:
>>
>>     pgbench -n -f memstats.sql -c 10 test
>>
>> And it gets stuck *immediately*. I've seen it wait for other client
>> backends and auxiliary processes like autovacuum launcher.
>>
>> This is an absolutely idle system, there's no reason why a process
>> would not respond almost immediately.
>
> In my reproduction, this issue occurred because the process was
> terminated while the requesting backend was waiting on the condition
> variable to be signaled by it. I don't see any solution other than
> having the waiting client backend time out using
> ConditionVariableTimedSleep.
>
> In the patch, since the timeout was set to a high value, pgbench ended
> up stuck waiting for the timeout to occur. The failure happens less
> frequently after I added an additional check for the process's
> existence, but it cannot be entirely avoided. This is because a
> process can terminate after we check for its existence but before it
> signals the client. In such cases, the client will not receive any
> signal.

Hmmm, I see.
I guess there's no way to know if a process responds to us, but I guess
it should be possible to wake up regularly and check if the process
still exists? Wouldn't that solve the case you mentioned?

>> I wonder if e.g. autovacuum launcher may not be handling these
>> requests, or what if client backends can wait in a cycle.
>
> Did not see a cyclic wait in client backends due to the pgbench stress
> test.

Not sure, but if I modify the query to only request memory contexts from
non-client processes, i.e.

    SELECT * FROM pg_get_process_memory_contexts(
       (SELECT pid FROM pg_stat_activity
         WHERE pid != pg_backend_pid()
           AND backend_type != 'client backend'
         ORDER BY random() LIMIT 1)
       , false);

then it gets stuck and reports this:

    pgbench -n -f select.sql -c 4 -T 10 test
    pgbench (18devel)
    WARNING:  Wait for 105029 process to publish stats timed out, ...

But process 105029 still very much exists, and it's the checkpointer:

    $ ps ax | grep 105029
    105029 ?  Ss  0:00 postgres: checkpointer

OTOH if I modify the script to only look at client backends, and wait
until the processes get "stuck" (i.e. waiting on the condition variable,
consuming 0% CPU), I get this:

    $ pgbench -n -f select.sql -c 4 -T 10 test
    pgbench (18devel)
    WARNING:  Wait for 107146 process to publish stats timed out, try again
    WARNING:  Wait for 107144 process to publish stats timed out, try again
    WARNING:  Wait for 107147 process to publish stats timed out, try again
    transaction type: select.sql
    ...

but when it gets 'stuck', most of the processes are still very much
running (but waiting for contexts from some other process). In the above
example I see this:

    107144 ?  Ss  0:02 postgres: user test [local] SELECT
    107145 ?  Ss  0:01 postgres: user test [local] SELECT
    107147 ?  Ss  0:02 postgres: user test [local] SELECT

So yes, 107146 seems to be gone. But why would that block getting info
from 107144 and 107147?
Maybe that's acceptable, but couldn't this be an issue with short-lived
connections, making it hard to implement the kind of automated
collection of stats that you envision? If it hits these kinds of
timeouts often, it'll make it hard to reliably collect info. No?

>>> I opted for DSAs over DSMs to enable memory reuse by freeing
>>> segments for subsequent statistics copies of the same backend,
>>> without needing to recreate DSMs for each request.
>>
>> I feel like this might be a premature optimization - I don't have a
>> clear idea how expensive it is to create a DSM per request, but my
>> intuition is that it's cheaper than processing the contexts and
>> generating the info.
>>
>> I'd just remove that, unless someone demonstrates it really matters.
>> I don't really worry about how expensive it is to process a request
>> (within reason, of course) - it will happen only very rarely. It's
>> more important to make sure there's no overhead when no one asks the
>> backend for memory context info, and simplicity.
>>
>> Also, how expensive is it to just keep the DSA "just in case"?
>> Imagine someone asks for the memory context info once - isn't it a
>> waste to still keep the DSA? I don't recall how many resources that
>> could be.
>>
>> I don't have a clear opinion on that, I'm more asking for opinions.
>
> Imagining a tool that periodically queries the backends for
> statistics, it would be beneficial to avoid recreating the DSAs for
> each call.

I think it would be nice if you backed this with some numbers. I mean,
how expensive is it to create/destroy the DSA? How does it compare to
the other stuff this function needs to do?

> Currently, DSAs of size 1MB per process
> (i.e., a maximum of 1MB * (MaxBackends + auxiliary processes))
> would be created and pinned for subsequent reporting. This size does
> not seem excessively high, even for approx 100 backends and
> auxiliary processes.

That seems like a pretty substantial amount of memory reserved for each
connection.
IMHO the benefits would have to be pretty significant to justify this,
especially considering it's kept "forever", even if you run the function
only once per day.

>> 8) Two minutes seems pretty arbitrary, and also quite high. If a
>> timeout is necessary, I think it should not be hard-coded.
>
> Not sure which is the ideal value. Changed it to 15 secs and added a
> #define as of now.
> Something that gives enough time for the process to respond but does
> not hold up the client for too long would be ideal. 15 secs seems to
> be not enough for the github CI tests, which fail with a timeout error
> with this setting.
>
> PFA an updated patch with the above changes.

Why not make this a parameter of the function? With some sensible
default, but easy to override.

regards

-- 
Tomas Vondra
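[Editor's illustration] The "wake up regularly and check if the process still exists" idea from the exchange above can be implemented with a plain kill(pid, 0) probe, which performs only the existence/permission check and sends no signal. A userland sketch - the patch itself would do the equivalent check between ConditionVariableTimedSleep wakeups, and the function name here is made up:

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Does a process with this PID still exist? kill() with signal 0 only
 * checks existence and permissions. EPERM means the process exists but
 * we may not signal it, which still counts as "alive"; ESRCH means it
 * is gone, so the waiter can stop waiting and report a failure. */
bool pid_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;
}
```

This closes the window Rahila describes only partially: the target can still exit after the probe but before signaling, which is why the probe must be repeated on every timed wakeup rather than done once up front.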
On 12/3/24 20:09, Rahila Syed wrote:
> Hi,
>
>>> 4) I wonder if the function needs to return the PID. I mean, the
>>> caller knows which PID it is for, so it seems rather unnecessary.
>>
>>> Perhaps it can be used to ascertain that the information indeed
>>> belongs to the requested pid.
>>
>> I find that a bit ... suspicious. By this logic we'd include the
>> input parameters in every result, but we don't. So why is this case
>> different?
>
> This was added to address a review suggestion. I had left it in case
> anyone found it useful for verification.
> Previously, I included a check for scenarios where multiple processes
> could write to the same shared memory. Now, each process has a
> separate shared memory space identified by pgprocno, making it highly
> unlikely for the receiving process to see another process's memory
> dump.
> Such a situation could theoretically occur if another process were
> mapped to the same pgprocno, although I'm not sure how likely that is.
> That said, I've added a check in the receiver to ensure the PID
> written in the shared memory matches the PID for which the dump is
> requested.
> This guarantees that a user will never see the memory dump of another
> process.
> Given this, I'm fine with removing the pid column if it helps to make
> the output more readable.

I'd just remove that. I agree it might have been useful with the single
chunk of shared memory, but I think with separate chunks it's not very
useful. And if we can end up with multiple processes getting the same
pgprocno, I guess we have way bigger problems; this won't fix that.

>>> 5) In the "summary" mode, it might be useful to include info about
>>> how many child contexts were aggregated. It's useful to know whether
>>> there was 1 child or 10000 children. In the regular (non-summary)
>>> mode it'd always be "1", probably, but maybe it'd interact with the
>>> limit in (1). Not sure.
>>
>> Sure, I will add this in the next iteration.
>> OK
>
> I have added this information as a column named "num_agg_contexts",
> which indicates the number of contexts whose statistics have been
> aggregated/added for a particular output.
>
> In summary mode, all the child contexts of a given level-1 context are
> aggregated, and their statistics are presented as part of the parent
> context's statistics. In this case, num_agg_contexts provides the
> count of all child contexts under a given level-1 context.
>
> In regular (non-summary) mode, this column shows a value of 1, meaning
> the statistics correspond to a single context, with all context
> statistics displayed individually. In this mode an aggregate result is
> displayed if the number of contexts exceeds the DSA size limit. In
> this case num_agg_contexts will display the number of the remaining
> contexts.

OK

>>> In the patch, since the timeout was set to a high value, pgbench
>>> ended up stuck waiting for the timeout to occur. The failure happens
>>> less frequently after I added an additional check for the process's
>>> existence, but it cannot be entirely avoided. This is because a
>>> process can terminate after we check for its existence but before it
>>> signals the client. In such cases, the client will not receive any
>>> signal.
>>
>> Hmmm, I see. I guess there's no way to know if a process responds to
>> us, but I guess it should be possible to wake up regularly and check
>> if the process still exists? Wouldn't that solve the case you
>> mentioned?
>
> I have fixed it accordingly in the attached patch by waking up every
> 5 seconds to check if the process exists, and sleeping again if the
> wake-up condition is not satisfied. The number of such tries is
> limited to 20, so the total wait time can be 100 seconds. I will make
> the retries configurable, in line with your suggestion to be able to
> override the default waiting time.
Makes sense, although 100 seconds seems a bit weird; it seems we usually
pick "natural" values like 60s, or multiples of that. But if it's
configurable, that's not a huge issue.

Could the process wake up earlier than the timeout, say if it gets an
EINTR signal? That'd break the "total timeout is 100 seconds", and it
would be better to check that explicitly. Not sure if this can happen,
though.

One thing I'd maybe consider is starting with a short timeout, and
gradually increasing it until e.g. 5 seconds (or maybe just 1 second
would be perfectly fine, IMHO). With the current coding it means we
either get the response right away, or wait 5+ seconds. That's a big
jump. If we start e.g. with 10ms, and then gradually multiply it by 1.2,
it means we only wait "0-20% extra" on average. But perhaps this is very
unlikely and not worth the complexity.

>>> I wonder if e.g. autovacuum launcher may not be handling these
>>> requests, or what if client backends can wait in a cycle.
>>
>> Did not see a cyclic wait in client backends due to the pgbench
>> stress test.
>
>> Not sure, but if I modify the query to only request memory contexts
>> from non-client processes, i.e.
>>
>>     SELECT * FROM pg_get_process_memory_contexts(
>>        (SELECT pid FROM pg_stat_activity
>>          WHERE pid != pg_backend_pid()
>>            AND backend_type != 'client backend'
>>          ORDER BY random() LIMIT 1)
>>        , false);
>>
>> then it gets stuck and reports this:
>>
>>     pgbench -n -f select.sql -c 4 -T 10 test
>>     pgbench (18devel)
>>     WARNING:  Wait for 105029 process to publish stats timed out, ...
>>
>> But process 105029 still very much exists, and it's the checkpointer:
>
> In the case of checkpointer, I also see some wait time after running
> the tests that you mentioned, but it eventually completes the request
> in my runs.

OK, but why should it even wait that long? Surely the checkpointer
should be able to report memory contexts too?

>>     $ ps ax | grep 105029
>>     105029 ?  Ss  0:00 postgres: checkpointer
>>
>> OTOH if I modify the script to only look at client backends, and wait
>> until the processes get "stuck" (i.e. waiting on the condition
>> variable, consuming 0% CPU), I get this:
>>
>>     $ pgbench -n -f select.sql -c 4 -T 10 test
>>     pgbench (18devel)
>>     WARNING:  Wait for 107146 process to publish stats timed out, try again
>>     WARNING:  Wait for 107144 process to publish stats timed out, try again
>>     WARNING:  Wait for 107147 process to publish stats timed out, try again
>>     transaction type: select.sql
>>     ...
>>
>> but when it gets 'stuck', most of the processes are still very much
>> running (but waiting for contexts from some other process). In the
>> above example I see this:
>>
>>     107144 ?  Ss  0:02 postgres: user test [local] SELECT
>>     107145 ?  Ss  0:01 postgres: user test [local] SELECT
>>     107147 ?  Ss  0:02 postgres: user test [local] SELECT
>>
>> So yes, 107146 seems to be gone. But why would that block getting
>> info from 107144 and 107147?
>
> Most likely 107144 and/or 107147 must also be waiting for 107146,
> which is gone. Something like 107144 -> 107147 -> 107146 (dead), or
> 107144 -> 107146 (dead) and 107147 -> 107146 (dead).

I think I forgot to mention only 107145 was waiting for 107146 (dead),
and the other processes were waiting for 107145 in some way. But yeah,
detecting the dead process would improve this, although it also shows
the issues can "spread" easily.

OTOH it's unlikely to have multiple pg_get_process_memory_contexts()
queries pointing at each other like this - monitoring will just do that
from one backend, and that's it. So not a huge issue.

>> Maybe that's acceptable, but couldn't this be an issue with
>> short-lived connections, making it hard to implement the kind of
>> automated collection of stats that you envision? If it hits these
>> kinds of timeouts often, it'll make it hard to reliably collect info.
>> No?
> Yes, if there is a chain of waiting clients due to a process no longer
> existing, the waiting time to receive information will increase.
> However, as long as a failed request caused by a non-existent process
> is detected promptly, the wait time should remain manageable, allowing
> other waiting clients to obtain the requested information from the
> existing processes.
>
> In such cases, it might be necessary to experiment with the waiting
> times at the receiving client. Making the waiting time
> user-configurable, as you suggested, by passing it as an argument to
> the function, could help address this scenario.
> Thanks for highlighting this, I will test this some more.

I think we should try very hard to make this work well without the user
having to mess with the timeouts. These are exceptional conditions that
happen only very rarely, which makes it hard to find good values.

>>>> I opted for DSAs over DSMs to enable memory reuse by freeing
>>>> segments for subsequent statistics copies of the same backend,
>>>> without needing to recreate DSMs for each request.
>>>
>>> I feel like this might be a premature optimization - I don't have a
>>> clear idea how expensive it is to create a DSM per request, but my
>>> intuition is that it's cheaper than processing the contexts and
>>> generating the info.
>>>
>>> I'd just remove that, unless someone demonstrates it really matters.
>>> I don't really worry about how expensive it is to process a request
>>> (within reason, of course) - it will happen only very rarely. It's
>>> more important to make sure there's no overhead when no one asks the
>>> backend for memory context info, and simplicity.
>>>
>>> Also, how expensive is it to just keep the DSA "just in case"?
>>> Imagine someone asks for the memory context info once - isn't it a
>>> waste to still keep the DSA? I don't recall how many resources that
>>> could be.
>>> I don't have a clear opinion on that, I'm more asking for opinions.
>>
>> Imagining a tool that periodically queries the backends for
>> statistics, it would be beneficial to avoid recreating the DSAs for
>> each call.
>
>> I think it would be nice if you backed this with some numbers. I
>> mean, how expensive is it to create/destroy the DSA? How does it
>> compare to the other stuff this function needs to do?
>
> After instrumenting the code with timestamps, I observed that DSA
> creation accounts for approximately 17% to 26% of the total execution
> time of the function pg_get_process_memory_contexts().
>
>>> Currently, DSAs of size 1MB per process
>>> (i.e., a maximum of 1MB * (MaxBackends + auxiliary processes))
>>> would be created and pinned for subsequent reporting. This size does
>>> not seem excessively high, even for approx 100 backends and
>>> auxiliary processes.
>>
>> That seems like a pretty substantial amount of memory reserved for
>> each connection. IMHO the benefits would have to be pretty
>> significant to justify this, especially considering it's kept
>> "forever", even if you run the function only once per day.
>
> I can reduce the initial segment size to DSA_MIN_SEGMENT_SIZE, which
> is 256KB per process. If needed, this could grow up to 16MB based on
> the current settings.
>
> However, for the scenario you mentioned, it would be ideal to have a
> mechanism to mark a pinned DSA (using dsa_pin()) for deletion if it is
> not used/attached within a specified duration. Alternatively, I could
> avoid using dsa_pin() altogether, allowing the DSA to be automatically
> destroyed once all processes detach from it, and recreate it for a new
> request.
>
> At the moment, I am unsure which approach is most feasible. Any
> suggestions would be greatly appreciated.

I'm entirely unconcerned about the pg_get_process_memory_contexts()
performance, within some reasonable limits.
It's something executed every now and then - no one is going to
complain it takes 10ms extra, measure tps with this function, etc.
17-26% seems surprisingly high. Even 256kB is too much, IMHO. I'd just
get rid of this optimization until someone complains and explains why
it's worth it. Yes, let's make it fast, but I don't think we should
optimize it at the expense of the "regular workload" ...

regards

--
Tomas Vondra
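For scale, the memory-reservation concern above is easy to put in numbers. The sketch below (plain Python, sizes taken from the figures quoted in the discussion; the function name is illustrative, not from the patch) estimates the total shared memory pinned if every process keeps one initial DSA segment:

```python
# Back-of-the-envelope estimate of memory pinned for per-process DSAs,
# assuming one initial segment kept per process, as discussed above.
# Segment sizes are the ones quoted in the thread, not read from the patch.

DSA_MIN_SEGMENT_SIZE = 256 * 1024    # 256kB initial segment
PROPOSED_SEGMENT_SIZE = 1024 * 1024  # 1MB, as originally proposed

def pinned_dsa_total(n_backends, n_aux, segment_size):
    """Total memory reserved if every process pins one initial segment."""
    return (n_backends + n_aux) * segment_size

mb = 1024 * 1024
# ~100 backends plus a handful of auxiliary processes:
print(pinned_dsa_total(100, 10, PROPOSED_SEGMENT_SIZE) / mb)  # 110.0 (MB)
print(pinned_dsa_total(100, 10, DSA_MIN_SEGMENT_SIZE) / mb)   # 27.5 (MB)
```

So even the reduced 256KB segments reserve tens of megabytes "forever" on a busy server, which is the trade-off being weighed against the 17-26% creation cost.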
Hi Rahila,

Thanks for working on this. I've wanted something like this a number of
times to replace my current method of attaching gdb, like everyone else
I suppose.

I have a question / suggestion about the interface.

+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+	int			pid = PG_GETARG_INT32(0);
+	bool		get_summary = PG_GETARG_BOOL(1);

IIUC, this always returns all memory contexts starting from
TopMemoryContext, summarizing some child contexts if memory doesn't
suffice. Would it be helpful to allow users to specify a context other
than TopMemoryContext as the root? This could be particularly useful in
cases where the information a user is looking for would otherwise be
grouped under "Remaining Totals." Alternatively, is there a way to
achieve this with the current function, perhaps by specifying a
condition in the WHERE clause?
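Since the 'path' column is an array of context IDs from the root down to each context, selecting a subtree amounts to a prefix match on that array, which is roughly what a WHERE clause on the function's output would have to express. A small sketch of the predicate (the paths are illustrative, mirroring the {0}, {0,1}, ... values shown in the thread; the child of MdSmgr is hypothetical):

```python
# Subtree selection by prefix match on the 'path' array, as a plain-Python
# model of what a WHERE clause over pg_get_process_memory_contexts() output
# would need to express. Paths below are illustrative.

def in_subtree(path, root_path):
    """True if 'path' denotes root_path itself or one of its descendants."""
    return path[:len(root_path)] == root_path

contexts = [
    [0],         # TopMemoryContext
    [0, 1],      # search_path processing cache
    [0, 18],     # MdSmgr
    [0, 18, 2],  # hypothetical child of MdSmgr
]

# Everything at or below {0,18}:
subtree = [p for p in contexts if in_subtree(p, [0, 18])]
print(subtree)  # [[0, 18], [0, 18, 2]]
```

The limitation the mail points out remains, though: a WHERE clause can only filter rows that were returned individually, so contexts already folded into "Remaining Totals" cannot be recovered this way.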
Hi,

Thanks for updating the patch and here are some comments:

The 'path' column of pg_get_process_memory_contexts() begins with 0,
but that column of the pg_backend_memory_contexts view begins with 1:

  =# select path FROM pg_get_process_memory_contexts('20271', false);
   path
  -------
   {0}
   {0,1}
   {0,2}
   ..

  =# select path from pg_backend_memory_contexts;
   path
  -------
   {1}
   {1,2}
   {1,3}
   ..

Would it be better to begin with 1 to make them consistent?

pg_log_backend_memory_contexts() does not allow non-superusers to
execute it by default, since it can peek at other sessions'
information. pg_get_process_memory_contexts() does not have this
restriction, but wouldn't it be necessary?

When the target pid is the local backend, the HINT suggests using
pg_get_backend_memory_contexts(), but this function is not described in
the manual. How about suggesting the pg_backend_memory_contexts view
instead?

  =# select pg_get_process_memory_contexts('27041', false);
  WARNING:  cannot return statistics for local backend
  HINT:  Use pg_get_backend_memory_contexts instead

There is no explanation of 'num_agg_contexts', but I thought an
explanation like the one below would be useful:

> I have added this information as a column named "num_agg_contexts",
> which indicates the number of contexts whose statistics have been
> aggregated/added for a particular output.

git apply caused some warnings:

$ git apply v7-Function-to-report-memory-context-stats-of-any-backe.patch
v7-Function-to-report-memory-context-stats-of-any-backe.patch:71: space before tab in indent.
        Requests to return the memory contexts of the backend with the
v7-Function-to-report-memory-context-stats-of-any-backe.patch:72: space before tab in indent.
        specified process ID. This function can send the request to
v7-Function-to-report-memory-context-stats-of-any-backe.patch:73: space before tab in indent.
        both the backends and auxiliary processes.
        After receiving the memory
v7-Function-to-report-memory-context-stats-of-any-backe.patch:74: space before tab in indent.
        contexts from the process, it returns the result as one row per
v7-Function-to-report-memory-context-stats-of-any-backe.patch:75: space before tab in indent.
        context. When get_summary is true, memory contexts at level 0

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
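To make the num_agg_contexts semantics discussed above concrete, here is a plain-Python model of the "Remaining Totals" aggregation described in the thread: once the fixed number of per-context slots is exhausted, the remaining contexts' statistics are folded into one cumulative row, and num_agg_contexts records how many contexts that row stands for. Field and function names are illustrative, not the patch's actual struct members.

```python
# Model of the "Remaining Totals" row: contexts beyond the slot limit are
# summed into one aggregate entry whose num_agg_contexts counts how many
# contexts it represents. Names are illustrative, not from the patch.

def summarize(contexts, max_slots):
    """Keep the first max_slots contexts; fold the rest into one total row."""
    shown = list(contexts[:max_slots])
    rest = contexts[max_slots:]
    if rest:
        shown.append({
            "name": "Remaining Totals",
            "total_bytes": sum(c["total_bytes"] for c in rest),
            "free_bytes": sum(c["free_bytes"] for c in rest),
            "used_bytes": sum(c["used_bytes"] for c in rest),
            "num_agg_contexts": len(rest),
        })
    return shown

rows = summarize(
    [{"name": f"ctx{i}", "total_bytes": 8192, "free_bytes": 1024,
      "used_bytes": 7168} for i in range(10)],
    max_slots=4,
)
print(rows[-1]["num_agg_contexts"])  # 6
print(rows[-1]["total_bytes"])       # 49152
```

With this shape, every non-aggregated row would presumably carry num_agg_contexts = 1, which is the kind of detail the requested documentation should spell out.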