Re: Enhancing Memory Context Statistics Reporting - Mailing list pgsql-hackers

From torikoshia
Subject Re: Enhancing Memory Context Statistics Reporting
Date
Msg-id 44a9d3857aaed49b818dc893dde9f6b4@oss.nttdata.com
Whole thread Raw
In response to Re: Enhancing Memory Context Statistics Reporting  (Rahila Syed <rahilasyed90@gmail.com>)
List pgsql-hackers
On 2025-11-28 18:22, Rahila Syed wrote:

Hi,

> I'm attaching the updated patches, which primarily include cleanup and
> have been rebased
> following the CFbot report.

Thanks for updating the patch!

I observed an assertion failure when forcing a timeout as follows:

```
$ psql
   (pid:38587)=#

$ kill -s SIGSTOP 38587

$ psql
   (pid:38618) =# select * from pg_get_process_memory_contexts(38587, 
false);
     name  | ident  |  type  | level  |  path  | total_bytes | 
total_nblocks | free_bytes | free_chunks | used_bytes | num_agg_contexts
     

--------+--------+--------+--------+--------+-------------+---------------+------------+-------------+------------+------------------
      [NULL] | [NULL] | [NULL] | [NULL] | [NULL] |      [NULL] |        
[NULL] |     [NULL] |      [NULL] |     [NULL] |           [NULL]
     (1 row)

    Time: 5013.515 ms (00:05.014)

$ kill -s SIGCONT 38587

$ tail postgresql.log

   TRAP: failed Assert("client_keys[MyProcNumber] != -1"), File: 
"mcxtfuncs.c", Line: 881, PID: 38587
   0   postgres                            0x0000000104943400 
ExceptionalCondition + 216
   1   postgres                            0x000000010480f738 
ProcessGetMemoryContextInterrupt + 140
   2   postgres                            0x00000001046f2710 
ProcessInterrupts + 3008
   3   postgres                            0x00000001046f1a78 
ProcessClientReadInterrupt + 80
   4   postgres                            0x0000000104433994 secure_read 
+ 404
   5   postgres                            0x00000001044411dc pq_recvbuf 
+ 260
   6   postgres                            0x0000000104441088 pq_getbyte 
+ 96
   7   postgres                            0x00000001046fa0fc 
SocketBackend + 44
   8   postgres                            0x00000001046f6d3c ReadCommand 
+ 44
   9   postgres                            0x00000001046f6284 
PostgresMain + 2900
   10  postgres                            0x00000001046ed558 
BackendInitialize + 0
   11  postgres                            0x00000001045c0a48 
postmaster_child_launch + 456
   12  postgres                            0x00000001045c8520 
BackendStartup + 304
   13  postgres                            0x00000001045c636c ServerLoop 
+ 372
   14  postgres                            0x00000001045c4e24 
PostmasterMain + 6448
   15  postgres                            0x0000000104445b4c main + 924
   16  dyld                                0x0000000188662b98 start + 
6076
   2025-12-08 07:35:32.608 JST   [38540] LOG:  00000: client backend (PID 
38587) was terminated by signal 6: Abort trap: 6
   2025-12-08 07:35:32.608 JST   [38540] LOCATION:  LogChildExit, 
postmaster.c:2872
```


Below are comments regarding the v42-0001 patch:

> In order to not block on busy processes, we have hardcoded
> the number of seconds during which to retry before timing out.
> In the case where no statistics are published within the set
> timeout, NULL is returned.

It might be good to also document in func-admin.sgml that the function 
times out after 5 seconds when the target backend does not respond, and 
that in such a case NULLs are returned.

+    * If DSA exists, created by another process requesting statistics, 
attach
+    * to it. We expect the client process to create required DSA and 
Dshash
+    * table.
+    */
+   if (MemoryStatsDsaArea == NULL)
+       MemoryStatsDsaArea = 
GetNamedDSA("memory_context_statistics_dsa",
+                                        &found);
+
+   if (MemoryStatsDsHash == NULL)
+       MemoryStatsDsHash = 
GetNamedDSHash("memory_context_statistics_dshash",
+                                          &memctx_dsh_params, &found);

 From the comment, it sounded to me as if the client executing 
pg_get_process_memory_contexts() might not create the DSA in some cases.
Is it correct to assume that such a situation can happen?
In [1], as a response to concerns about using DSA inside a CFI handler, 
you wrote that “all the dynamic shared memory needed to store the 
statistics is created and deleted in the client function”.
So I understood that it would never create the DSA inside the CFI 
handler.
If that understanding is correct, perhaps the comment should be reworded 
to make that clear.

+   context_id_lookup = 
hash_create("pg_get_remote_backend_memory_contexts",

This appears to use the old function name. Should this be updated to 
"pg_get_process_memory_contexts" instead?

[1] 
https://www.postgresql.org/message-id/CAH2L28sc-rEhyntPLoaC2XUa0ZjS5ka6KzEbuSVxQBBnUYu1KQ%40mail.gmail.com


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Extended Statistics set/restore/clear functions.
Next
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Newly created replication slot may be invalidated by checkpoint