Re: Get memory contexts of an arbitrary backend process - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Get memory contexts of an arbitrary backend process
Date
Msg-id 168dbd34-c081-b832-e795-ffcd7c5be35c@oss.nttdata.com
Whole thread Raw
In response to Re: Get memory contexts of an arbitrary backend process  (torikoshia <torikoshia@oss.nttdata.com>)
Responses Re: Get memory contexts of an arbitrary backend process
List pgsql-hackers

On 2021/03/29 11:59, torikoshia wrote:
> On 2021-03-26 14:08, Kyotaro Horiguchi wrote:
>> At Fri, 26 Mar 2021 14:02:49 +0900, Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote in
>>>
>>>
>>> On 2021/03/26 13:28, Kyotaro Horiguchi wrote:
>>> >> "some contexts are omitted"
>>> >> "n child contexts: total_bytes = ..."
>>> > Sorry I missed that is already implemented.  So my opnion is I agree
>>> > with limiting with a fixed-number, and preferablly sorted in
>>> > descending order of... totalspace/nblocks?
>>>
>>> This may be an improvement, but makes us modify
>>> MemoryContextStatsInternal()
>>> very much. I'm afraid that it's too late to do that at this stage...
>>> What about leaving the output order as it is at the first version?
>>
>> So I said "preferably":p  (with a misspelling...)
>> I'm fine with that.
>>
>> regards.
> 
> Thanks for the comments!
> 
> Attached a new patch.

Thanks!

> It adds pg_log_backend_memory_contexts(pid) which logs memory contexts
> of the specified backend process.
> 
> The number of child contexts to be logged per parent is limited to 100
> as with MemoryContextStats().
> 
> As written in commit 7b5ef8f2d07, which limits the verbosity of
> memory context statistics dumps, it supposes that practical cases
> where the dump gets long will typically be huge numbers of
> siblings under the same parent context; while the additional
> debugging value from seeing details about individual siblings
> beyond 100 will not be large.
> 
> Thoughts?

I'm OK with 100. We should comment why we chose 100 for that.

Here are some review comments.

Isn't it better to move HandleProcSignalLogMemoryContext() and
ProcessLogMemoryContextInterrupt() to mcxt.c from procsignal.c
(like the functions for notify interrupt are defined in async.c)
because they are the functions for memory contexts?

+ * HandleProcSignalLogMemoryContext
+ *
+ * Handle receipt of an interrupt indicating log memory context.
+ * Signal handler portion of interrupt handling.

IMO it's better to comment why we need to separate the function into two,
i.e., HandleProcSignalLogMemoryContext() and ProcessLogMemoryContextInterrupt(),
like the comment for other similar function explains. What about the followings?

-------------------------------
HandleLogMemoryContextInterrupt

Handle receipt of an interrupt indicating logging of memory contexts.

All the actual work is deferred to ProcessLogMemoryContextInterrupt(),
because we cannot safely emit a log message inside the signal handler.
-------------------------------
ProcessLogMemoryContextInterrupt

Perform logging of memory contexts of this backend process.

Any backend that participates in ProcSignal signaling must arrange to
call this function if we see LogMemoryContextPending set. It is called
from CHECK_FOR_INTERRUPTS(), which is enough because the target process
for logging of memory contexts is a backend.
-------------------------------


+    if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
+        HandleProcSignalLogMemoryContext();
+
      if (CheckProcSignal(PROCSIG_BARRIER))
          HandleProcSignalBarrierInterrupt();

The code for memory context logging interrupt came after barrier interrupt
in other places, e.g., procsignal.h. Why is this order of code different?

+/*
+ * pg_log_backend_memory_contexts
+ *        Print memory context of the specified backend process.

Isn't it better to move pg_log_backend_memory_contexts() to mcxtfuncs.c
from mcxt.c because this is the SQL function memory contexts?

IMO we should comment why we allow only superuser to call this function.
What about the following?

-----------------
Signal a backend process to log its memory contexts.

Only superusers are allowed to signal to log the memory contexts
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.
-----------------

+    PGPROC        *proc = BackendPidGetProc(pid);
+
+    /* Check whether the target process is PostgreSQL backend process. */
+    if (proc == NULL)

What about adding more comments as follows?

---------------------------------
+    /*
+     * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
+     * we reach kill(), a process for which we get a valid proc here might
+     * have terminated on its own.  There's no way to acquire a lock on an
+     * arbitrary process to prevent that. But since this mechanism is usually
+     * used to debug a backend running and consuming lots of memory,
+     * that it might end on its own first and its memory contexts are not
+     * logged is not a problem.
+     */
+    if (proc == NULL)
+    {
+        /*
+         * This is just a warning so a loop-through-resultset will not abort
+         * if one backend logged its memory contexts during the run.
+         */
+        ereport(WARNING,
---------------------------------

+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                 errmsg("must be a superuser to log memory contexts")));
+        PG_RETURN_BOOL(false);

This PG_RETURN_BOOL(false) is unnecessary because ereport() will emit an ERROR?

+$node->psql('postgres', 'select pg_log_backend_memory_contexts(pg_backend_pid())');
+
+my $logfile = slurp_file($node->logfile);
+like($logfile, qr/Grand total: \d+ bytes in \d+ blocks/,
+        'found expected memory context print in the log file');

Isn't there the case where the memory contexts have not been logged yet
when slurp_file() is called? It's rare, though. If there is the risk for that,
we should wait for the memory contexts to be actually logged?

BTW, I agree that we should have the regression test that calls
pg_log_backend_memory_contexts() and checks, e.g., that it doesn't cause
segmentation fault. But I'm just wondering if it's a bit overkill to make perl
scriptand start new node  to test only this function...

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Maksim Milyutin
Date:
Subject: Re: Add client connection check during the execution of the query
Next
From: James Hilliard
Date:
Subject: Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.