Thread: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
Hi,

After commit 3e98c0bafb28de, we can display the usage of the
memory contexts using the pg_backend_memory_contexts system
view.

However, its target is limited to the process attached to
the current session.

As discussed in the thread[1], it'll be useful to make it
possible to get the memory contexts of an arbitrary backend
process.

The attached PoC patch makes pg_get_backend_memory_contexts()
display the memory contexts of the process with the specified PID.


   =# -- PID of the target process is 17051
   =# SELECT * FROM pg_get_backend_memory_contexts(17051);
            name          | ident |      parent      | level | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes
   -----------------------+-------+------------------+-------+-------------+---------------+------------+-------------+------------
    TopMemoryContext      |       |                  |     0 |       68720 |             5 |      16816 |          16 |      51904
    RowDescriptionContext |       | TopMemoryContext |     1 |        8192 |             1 |       6880 |           0 |       1312
    MessageContext        |       | TopMemoryContext |     1 |       65536 |             4 |      19912 |           1 |      45624
    ...

It doesn't display the contexts of all backends, only
the contexts of the specified process.
I think that would be enough, because I suppose this function
will be used after investigations using the ps command or other
OS-level utilities.


The rough idea of the implementation is as follows:

   1. send a signal to the specified process
   2. signaled process dumps its memory contexts to a file
   3. read the dumped file and display it to the user


Any thoughts?

[1] 
https://www.postgresql.org/message-id/72a656e0f71d0860161e0b3f67e4d771%40oss.nttdata.com


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: Get memory contexts of an arbitrary backend process

From
Kasahara Tatsuhito
Date:
Hi,

On Mon, Aug 31, 2020 at 8:22 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> As discussed in the thread[1], it'll be useful to make it
> possible to get the memory contexts of an arbitrary backend
> process.
+1

> Attached PoC patch makes pg_get_backend_memory_contexts()
> display meory contexts of the specified PID of the process.
Thanks, it's a very good patch for discussion.

> It doesn't display contexts of all the backends but only
> the contexts of specified process.
Alternatively, we can run "SELECT (pg_get_backend_memory_contexts(pid)).* FROM
pg_stat_activity WHERE ...",
so I don't think it's a big deal.

> The rough idea of implementation is like below:
>
>    1. send  a signal to the specified process
>    2. signaled process dumps its memory contexts to a file
>    3. read the dumped file and display it to the user
I agree with the overview of the idea.
Here are some comments and questions.

- Currently, "the signal transmission for dumping memory information"
and "the read & output of dump information"
  are in the same interface, but I think it would be better to separate them.
  How about providing the following three types of functions for users?
  - send a signal to the specified pid
  - check the status of the signal sent and received
  - read the dumped information
- How about managing the status of signal send/receive and dump
operations in a shared hash or similar?
  Sending and receiving signals, dumping memory information, and
referencing dump information all work asynchronously.
  Therefore, it would be good to have management information to check
the status of each process.
  A simple idea is:
   - when sending a dump signal to a PID, first record the following
information in the shared hash:
     pid (the specified pid)
     loc (dump location; currently might be ASAP)
     recv (did the pid process receive the signal? initially false)
     dumped (did the pid process dump its memory information? initially false)
   - the specified process receives the signal, updates the status in the
shared hash, then dumps at the specified location.
   - when the specified process finishes dumping memory information, it
updates the status in the shared hash.
- Does it allow one process to output multiple dump files?
  The current behavior appears to be to overwrite, but I
thought it would be good to be able to generate
  multiple dump files in different phases (e.g., the planning phase and
the execution phase) in the future.
- How is the dump file cleaned up?

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

From
Pavel Stehule
Date:
Hi

On Mon, Aug 31, 2020 at 5:03 PM Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> wrote:
Hi,

On Mon, Aug 31, 2020 at 8:22 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> As discussed in the thread[1], it'll be useful to make it
> possible to get the memory contexts of an arbitrary backend
> process.
+1

> Attached PoC patch makes pg_get_backend_memory_contexts()
> display meory contexts of the specified PID of the process.
Thanks, it's a very good patch for discussion.

> It doesn't display contexts of all the backends but only
> the contexts of specified process.
or we can  "SELECT (pg_get_backend_memory_contexts(pid)).* FROM
pg_stat_activity WHERE ...",
so I don't think it's a big deal.

> The rough idea of implementation is like below:
>
>    1. send  a signal to the specified process
>    2. signaled process dumps its memory contexts to a file
>    3. read the dumped file and display it to the user
I agree with the overview of the idea.
Here are some comments and questions.

- Currently, "the signal transmission for dumping memory information"
and "the read & output of dump information "
  are on the same interface, but I think it would be better to separate them.
  How about providing the following three types of functions for users?
  - send a signal to specified pid
  - check the status of the signal sent and received
  - read the dumped information
- How about managing the status of signal send/receive and dump
operations on a shared hash or others ?
  Sending and receiving signals, dumping memory information, and
referencing dump information all work asynchronously.
  Therefore, it would be good to have management information to check
the status of each process.
  A simple idea is that ..
   - send a signal to dump to a PID, it first record following
information into the shared hash.
     pid (specified pid)
     loc (dump location, currently might be ASAP)
     recv (did the pid process receive a signal? first false)
     dumped (did the pid process dump a mem information?  first false)
   - specified process receive the signal, update the status in the
shared hash, then dumped at specified location.
   - specified process finish dump mem information,  update the status
in the shared hash.
- Does it allow one process to output multiple dump files?
  It appears to be a specification to overwrite at present, but I
thought it would be good to be able to generate
  multiple dump files in different phases (e.g., planning phase and
execution phase) in the future.
- How is the dump file cleaned up?

For a very long time there has been a similar discussion about getting the session query and session execution plans from other sessions.

I am not sure how necessary the information in the memory dump is, but I am sure that the current execution plan and the complete text of the current query are pretty necessary information.

It would also be great if this infrastructure could be used for any debugging purpose.

Regards

Pavel


Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


Re: Get memory contexts of an arbitrary backend process

From
Andres Freund
Date:
Hi,

On 2020-08-31 20:22:18 +0900, torikoshia wrote:
> After commit 3e98c0bafb28de, we can display the usage of the
> memory contexts using pg_backend_memory_contexts system
> view.
> 
> However, its target is limited to the  process attached to
> the current session.
> 
> As discussed in the thread[1], it'll be useful to make it
> possible to get the memory contexts of an arbitrary backend
> process.
> 
> Attached PoC patch makes pg_get_backend_memory_contexts()
> display meory contexts of the specified PID of the process.

Awesome!


> It doesn't display contexts of all the backends but only
> the contexts of specified process.
> I think it would be enough because I suppose this function
> is used after investigations using ps command or other OS
> level utilities.

It can be used as a building block if all are needed. Getting the
infrastructure right is the big thing here, I think. Adding more
detailed views on top of that data later is easier.



> diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
> index a2d61302f9..88fb837ecd 100644
> --- a/src/backend/catalog/system_views.sql
> +++ b/src/backend/catalog/system_views.sql
> @@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
>  REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
>  
>  CREATE VIEW pg_backend_memory_contexts AS
> -    SELECT * FROM pg_get_backend_memory_contexts();
> +    SELECT * FROM pg_get_backend_memory_contexts(-1);

-1 is odd. Why not use NULL or even 0?

> +    else
> +    {
> +        int            rc;
> +        int parent_len = strlen(parent);
> +        int name_len = strlen(name);
> +
> +        /*
> +         * write out the current memory context information.
> +         * Since some elements of values are reusable, we write it out.

Not sure what the second comment line here is supposed to mean?


> +         */
> +        fputc('D', fpout);
> +        rc = fwrite(values, sizeof(values), 1, fpout);
> +        rc = fwrite(nulls, sizeof(nulls), 1, fpout);
> +
> +        /* write out information which is not resuable from serialized values */

s/resuable/reusable/


> +        rc = fwrite(&name_len, sizeof(int), 1, fpout);
> +        rc = fwrite(name, name_len, 1, fpout);
> +        rc = fwrite(&idlen, sizeof(int), 1, fpout);
> +        rc = fwrite(clipped_ident, idlen, 1, fpout);
> +        rc = fwrite(&level, sizeof(int), 1, fpout);
> +        rc = fwrite(&parent_len, sizeof(int), 1, fpout);
> +        rc = fwrite(parent, parent_len, 1, fpout);
> +        (void) rc;                /* we'll check for error with ferror */
> +
> +    }

This format is not descriptive. How about serializing to json or
something? Or at least having field names?

Alternatively, build the same tuple we build for the SRF, and serialize
that. Then there's basically no conversion needed.


> @@ -117,6 +157,8 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
>  Datum
>  pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
>  {
> +    int            pid =  PG_GETARG_INT32(0);
> +
>      ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
>      TupleDesc    tupdesc;
>      Tuplestorestate *tupstore;
> @@ -147,11 +189,258 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
>  
>      MemoryContextSwitchTo(oldcontext);
>  
> -    PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
> -                                TopMemoryContext, NULL, 0);
> +    if (pid == -1)
> +    {
> +        /*
> +         * Since pid -1 indicates target is the local process, simply
> +         * traverse memory contexts.
> +         */
> +        PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
> +                                TopMemoryContext, "", 0, NULL);
> +    }
> +    else
> +    {
> +        /*
> +         * Send signal for dumping memory contexts to the target process,
> +         * and read the dumped file.
> +         */
> +        FILE       *fpin;
> +        char        dumpfile[MAXPGPATH];
> +
> +        SendProcSignal(pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
> +
> +        snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", pid);
> +
> +        while (true)
> +        {
> +            CHECK_FOR_INTERRUPTS();
> +
> +            pg_usleep(10000L);
> +

Need better signalling back/forth here.



> +/*
> + * dump_memory_contexts
> + *     Dumping local memory contexts to a file.
> + *     This function does not delete the file as it is intended to be read by
> + *     another process.
> + */
> +static void
> +dump_memory_contexts(void)
> +{
> +    FILE       *fpout;
> +    char        tmpfile[MAXPGPATH];
> +    char        dumpfile[MAXPGPATH];
> +
> +    snprintf(tmpfile, sizeof(tmpfile), "pg_memusage/%d.tmp", MyProcPid);
> +    snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", MyProcPid);
> +
> +    /*
> +     * Open a temp file to dump the current memory context.
> +     */
> +    fpout = AllocateFile(tmpfile, PG_BINARY_W);
> +    if (fpout == NULL)
> +    {
> +        ereport(LOG,
> +                (errcode_for_file_access(),
> +                 errmsg("could not write temporary memory context file \"%s\": %m",
> +                        tmpfile)));
> +        return;
> +    }

Probably should be opened with O_CREAT | O_TRUNC?


Greetings,

Andres Freund



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2020-09-01 03:29, Pavel Stehule wrote:
> Hi
> 
> On Mon, Aug 31, 2020 at 5:03 PM Kasahara Tatsuhito
> <kasahara.tatsuhito@gmail.com> wrote:
> 
>> Hi,
>> 
>> On Mon, Aug 31, 2020 at 8:22 PM torikoshia
>> <torikoshia@oss.nttdata.com> wrote:
>>> As discussed in the thread[1], it'll be useful to make it
>>> possible to get the memory contexts of an arbitrary backend
>>> process.
>> +1
>> 
>>> Attached PoC patch makes pg_get_backend_memory_contexts()
>>> display meory contexts of the specified PID of the process.
>> Thanks, it's a very good patch for discussion.
>> 
>>> It doesn't display contexts of all the backends but only
>>> the contexts of specified process.
>> or we can  "SELECT (pg_get_backend_memory_contexts(pid)).* FROM
>> pg_stat_activity WHERE ...",
>> so I don't think it's a big deal.
>> 
>>> The rough idea of implementation is like below:
>>> 
>>> 1. send  a signal to the specified process
>>> 2. signaled process dumps its memory contexts to a file
>>> 3. read the dumped file and display it to the user
>> I agree with the overview of the idea.
>> Here are some comments and questions.

Thanks for the comments!

>> 
>> - Currently, "the signal transmission for dumping memory
>> information"
>> and "the read & output of dump information "
>> are on the same interface, but I think it would be better to
>> separate them.
>> How about providing the following three types of functions for
>> users?
>> - send a signal to specified pid
>> - check the status of the signal sent and received
>> - read the dumped information

Is this for future extensibility, to make it possible to get
other information like the current execution plan, which was
suggested by Pavel?

If so, I agree with considering extensibility, but I'm not
sure whether it's necessary to provide these types of
functions to 'users'.

>> - How about managing the status of signal send/receive and dump
>> operations on a shared hash or others ?
>> Sending and receiving signals, dumping memory information, and
>> referencing dump information all work asynchronously.
>> Therefore, it would be good to have management information to
>> check
>> the status of each process.
>> A simple idea is that ..
>> - send a signal to dump to a PID, it first record following
>> information into the shared hash.
>> pid (specified pid)
>> loc (dump location, currently might be ASAP)
>> recv (did the pid process receive a signal? first false)
>> dumped (did the pid process dump a mem information?  first
>> false)
>> - specified process receive the signal, update the status in the
>> shared hash, then dumped at specified location.
>> - specified process finish dump mem information,  update the
>> status
>> in the shared hash.

Adding management information in shared memory seems necessary
when we want more control over dumping, such as a 'dump
location', or to expose other information such as the 'current
execution plan'.
I'm going to consider this.


>> - Does it allow one process to output multiple dump files?
>> It appears to be a specification to overwrite at present, but I
>> thought it would be good to be able to generate
>> multiple dump files in different phases (e.g., planning phase and
>> execution phase) in the future.
>> - How is the dump file cleaned up?
> 
>  For a very long time there has been similar discussion about taking
> session query and session execution plans from other sessions.
> 
> I am not sure how necessary information is in the memory dump, but I
> am sure so taking the current execution plan and complete text of the
> current query is pretty necessary information.
> 
> but can be great so this infrastructure can be used for any debugging
> purpose.

Thanks!
It would be good if some part of this effort could serve as
infrastructure for other debugging purposes.
It may be hard, but I will keep your comment in mind.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

> 
> Regards
> 
> Pavel
> 
>> Best regards,
>> 
>> --
>> Tatsuhito Kasahara
>> kasahara.tatsuhito _at_ gmail.com
> 
> 



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
Thanks for reviewing!

I'm going to modify the patch according to your comments.

On 2020-09-01 10:54, Andres Freund wrote:
> Hi,
> 
> On 2020-08-31 20:22:18 +0900, torikoshia wrote:
>> After commit 3e98c0bafb28de, we can display the usage of the
>> memory contexts using pg_backend_memory_contexts system
>> view.
>> 
>> However, its target is limited to the  process attached to
>> the current session.
>> 
>> As discussed in the thread[1], it'll be useful to make it
>> possible to get the memory contexts of an arbitrary backend
>> process.
>> 
>> Attached PoC patch makes pg_get_backend_memory_contexts()
>> display meory contexts of the specified PID of the process.
> 
> Awesome!
> 
> 
>> It doesn't display contexts of all the backends but only
>> the contexts of specified process.
>> I think it would be enough because I suppose this function
>> is used after investigations using ps command or other OS
>> level utilities.
> 
> It can be used as a building block if all are needed. Getting the
> infrastructure right is the big thing here, I think. Adding more
> detailed views on top of that data later is easier.
> 
> 
> 
>> diff --git a/src/backend/catalog/system_views.sql 
>> b/src/backend/catalog/system_views.sql
>> index a2d61302f9..88fb837ecd 100644
>> --- a/src/backend/catalog/system_views.sql
>> +++ b/src/backend/catalog/system_views.sql
>> @@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
>>  REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
>> 
>>  CREATE VIEW pg_backend_memory_contexts AS
>> -    SELECT * FROM pg_get_backend_memory_contexts();
>> +    SELECT * FROM pg_get_backend_memory_contexts(-1);
> 
> -1 is odd. Why not use NULL or even 0?
> 
>> +    else
>> +    {
>> +        int            rc;
>> +        int parent_len = strlen(parent);
>> +        int name_len = strlen(name);
>> +
>> +        /*
>> +         * write out the current memory context information.
>> +         * Since some elements of values are reusable, we write it out.
> 
> Not sure what the second comment line here is supposed to mean?
> 
> 
>> +         */
>> +        fputc('D', fpout);
>> +        rc = fwrite(values, sizeof(values), 1, fpout);
>> +        rc = fwrite(nulls, sizeof(nulls), 1, fpout);
>> +
>> +        /* write out information which is not resuable from serialized 
>> values */
> 
> s/resuable/reusable/
> 
> 
>> +        rc = fwrite(&name_len, sizeof(int), 1, fpout);
>> +        rc = fwrite(name, name_len, 1, fpout);
>> +        rc = fwrite(&idlen, sizeof(int), 1, fpout);
>> +        rc = fwrite(clipped_ident, idlen, 1, fpout);
>> +        rc = fwrite(&level, sizeof(int), 1, fpout);
>> +        rc = fwrite(&parent_len, sizeof(int), 1, fpout);
>> +        rc = fwrite(parent, parent_len, 1, fpout);
>> +        (void) rc;                /* we'll check for error with ferror */
>> +
>> +    }
> 
> This format is not descriptive. How about serializing to json or
> something? Or at least having field names?
> 
> Alternatively, build the same tuple we build for the SRF, and serialize
> that. Then there's basically no conversion needed.
> 
> 
>> @@ -117,6 +157,8 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate 
>> *tupstore,
>>  Datum
>>  pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
>>  {
>> +    int            pid =  PG_GETARG_INT32(0);
>> +
>>      ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
>>      TupleDesc    tupdesc;
>>      Tuplestorestate *tupstore;
>> @@ -147,11 +189,258 @@ 
>> pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
>> 
>>      MemoryContextSwitchTo(oldcontext);
>> 
>> -    PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>> -                                TopMemoryContext, NULL, 0);
>> +    if (pid == -1)
>> +    {
>> +        /*
>> +         * Since pid -1 indicates target is the local process, simply
>> +         * traverse memory contexts.
>> +         */
>> +        PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>> +                                TopMemoryContext, "", 0, NULL);
>> +    }
>> +    else
>> +    {
>> +        /*
>> +         * Send signal for dumping memory contexts to the target process,
>> +         * and read the dumped file.
>> +         */
>> +        FILE       *fpin;
>> +        char        dumpfile[MAXPGPATH];
>> +
>> +        SendProcSignal(pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
>> +
>> +        snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", pid);
>> +
>> +        while (true)
>> +        {
>> +            CHECK_FOR_INTERRUPTS();
>> +
>> +            pg_usleep(10000L);
>> +
> 
> Need better signalling back/forth here.

Do you mean I should also send another signal back from the dumping
process to the caller of pg_get_backend_memory_contexts()
when it finishes dumping?

Regards,



--
Atsushi Torikoshi
NTT DATA CORPORATION

> 
> 
> 
>> +/*
>> + * dump_memory_contexts
>> + *     Dumping local memory contexts to a file.
>> + *     This function does not delete the file as it is intended to be 
>> read by
>> + *     another process.
>> + */
>> +static void
>> +dump_memory_contexts(void)
>> +{
>> +    FILE       *fpout;
>> +    char        tmpfile[MAXPGPATH];
>> +    char        dumpfile[MAXPGPATH];
>> +
>> +    snprintf(tmpfile, sizeof(tmpfile), "pg_memusage/%d.tmp", MyProcPid);
>> +    snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", MyProcPid);
>> +
>> +    /*
>> +     * Open a temp file to dump the current memory context.
>> +     */
>> +    fpout = AllocateFile(tmpfile, PG_BINARY_W);
>> +    if (fpout == NULL)
>> +    {
>> +        ereport(LOG,
>> +                (errcode_for_file_access(),
>> +                 errmsg("could not write temporary memory context file \"%s\": 
>> %m",
>> +                        tmpfile)));
>> +        return;
>> +    }
> 
> Probably should be opened with O_CREAT | O_TRUNC?
> 
> 
> Greetings,
> 
> Andres Freund



Re: Get memory contexts of an arbitrary backend process

From
Kasahara Tatsuhito
Date:
Hi,

On Thu, Sep 3, 2020 at 3:38 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> >> - Currently, "the signal transmission for dumping memory
> >> information"
> >> and "the read & output of dump information "
> >> are on the same interface, but I think it would be better to
> >> separate them.
> >> How about providing the following three types of functions for
> >> users?
> >> - send a signal to specified pid
> >> - check the status of the signal sent and received
> >> - read the dumped information
>
> Is this for future extensibility to make it possible to get
> other information like the current execution plan which was
> suggested by Pavel?
Yes, but it's not only for future expansion; it's also for the
usability and the stability of this feature.
For example, if you want to read one dumped file multiple times and
analyze it, you will want the ability to just read the dump.
Moreover, when it takes a long time from receiving the signal to
producing the dump output, or the dump output itself takes a long
time, users can investigate where the time is spent if each step is
separated.

> If so, I agree with considering extensibility, but I'm not
> sure it's necessary whether providing these types of
> functions for 'users'.
Of course, it is possible, and may be necessary, to provide a wrapper
for the whole sequence of steps, from sending a signal to reading the
dump files.
But IMO some users would like to perform the signal transmission,
state management, and dump-file reading steps separately.

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

From
Tom Lane
Date:
Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> writes:
> Yes, but it's not only for future expansion, but also for the
> usability and the stability of this feature.
> For example, if you want to read one dumped file multiple times and analyze it,
> you will want the ability to just read the dump.

If we design it to make that possible, how are we going to prevent disk
space leaks from never-cleaned-up dump files?

            regards, tom lane



Re: Get memory contexts of an arbitrary backend process

From
Kasahara Tatsuhito
Date:
On Fri, Sep 4, 2020 at 2:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> writes:
> > Yes, but it's not only for future expansion, but also for the
> > usability and the stability of this feature.
> > For example, if you want to read one dumped file multiple times and analyze it,
> > you will want the ability to just read the dump.
>
> If we design it to make that possible, how are we going to prevent disk
> space leaks from never-cleaned-up dump files?
In my opinion, along with features such as a view that lets us see a
list of dumped files,
it would be better to have a function that simply deletes the dump
files associated with a specific PID,
or deletes all dump files.
Some files may be dumped with unexpected delays, so I think a
cleanup feature will be necessary.
(Also, as with pgsql_tmp files, it might be better to delete dump files
when PostgreSQL starts.)

Or should we try to delete the dump file as soon as we can read it?

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

From
Tomas Vondra
Date:
On Fri, Sep 04, 2020 at 11:47:30AM +0900, Kasahara Tatsuhito wrote:
>On Fri, Sep 4, 2020 at 2:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> writes:
>> > Yes, but it's not only for future expansion, but also for the
>> > usability and the stability of this feature.
>> > For example, if you want to read one dumped file multiple times and analyze it,
>> > you will want the ability to just read the dump.
>>
>> If we design it to make that possible, how are we going to prevent disk
>> space leaks from never-cleaned-up dump files?
>In my thought, with features such as a view that allows us to see a
>list of dumped files,
>it would be better to have a function that simply deletes the dump
>files associated with a specific PID,
>or to delete all dump files.
>Some files may be dumped with unexpected delays, so I think the
>cleaning feature will be necessary.
>( Also, as the pgsql_tmp file, it might better to delete dump files
>when PostgreSQL start.)
>
>Or should we try to delete the dump file as soon as we can read it?
>

IMO making the cleanup a responsibility of the users (e.g. by exposing
the list of dumped files through a view and expecting users to delete
them in some way) is rather fragile.

I don't quite see what's the point of designing it this way. It was
suggested this improves stability and usability of this feature, but
surely making it unnecessarily complex contradicts both points?

IMHO if the user needs to process the dump repeatedly, what's preventing
him/her from storing it in a file, or something like that? At that point
it's clear it's up to them to remove the file. So I suggest to keep the
feature as simple as possible - hand the dump over and delete.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2020-09-04 21:46, Tomas Vondra wrote:
> On Fri, Sep 04, 2020 at 11:47:30AM +0900, Kasahara Tatsuhito wrote:
>> On Fri, Sep 4, 2020 at 2:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> writes:
>>> > Yes, but it's not only for future expansion, but also for the
>>> > usability and the stability of this feature.
>>> > For example, if you want to read one dumped file multiple times and analyze it,
>>> > you will want the ability to just read the dump.
>>> 
>>> If we design it to make that possible, how are we going to prevent 
>>> disk
>>> space leaks from never-cleaned-up dump files?
>> In my thought, with features such as a view that allows us to see a
>> list of dumped files,
>> it would be better to have a function that simply deletes the dump
>> files associated with a specific PID,
>> or to delete all dump files.
>> Some files may be dumped with unexpected delays, so I think the
>> cleaning feature will be necessary.
>> ( Also, as the pgsql_tmp file, it might better to delete dump files
>> when PostgreSQL start.)
>> 
>> Or should we try to delete the dump file as soon as we can read it?
>> 
> 
> IMO making the cleanup a responsibility of the users (e.g. by exposing
> the list of dumped files through a view and expecting users to delete
> them in some way) is rather fragile.
> 
> I don't quite see what's the point of designing it this way. It was
> suggested this improves stability and usability of this feature, but
> surely making it unnecessarily complex contradicts both points?
> 
> IMHO if the user needs to process the dump repeatedly, what's 
> preventing
> him/her from storing it in a file, or something like that? At that 
> point
> it's clear it's up to them to remove the file. So I suggest to keep the
> feature as simple as possible - hand the dump over and delete.

+1.
If there are no other objections, I'm going to accept this
suggestion.

Regards



Re: Get memory contexts of an arbitrary backend process

From
Kasahara Tatsuhito
Date:
Hi,

On Thu, Sep 10, 2020 at 8:53 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2020-09-04 21:46, Tomas Vondra wrote:
> > On Fri, Sep 04, 2020 at 11:47:30AM +0900, Kasahara Tatsuhito wrote:
> >> On Fri, Sep 4, 2020 at 2:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> writes:
> >>> > Yes, but it's not only for future expansion, but also for the
> >>> > usability and the stability of this feature.
> >>> > For example, if you want to read one dumped file multiple times and analyze it,
> >>> > you will want the ability to just read the dump.
> >>>
> >>> If we design it to make that possible, how are we going to prevent
> >>> disk
> >>> space leaks from never-cleaned-up dump files?
> >> In my thought, with features such as a view that allows us to see a
> >> list of dumped files,
> >> it would be better to have a function that simply deletes the dump
> >> files associated with a specific PID,
> >> or to delete all dump files.
> >> Some files may be dumped with unexpected delays, so I think the
> >> cleaning feature will be necessary.
> >> ( Also, as the pgsql_tmp file, it might better to delete dump files
> >> when PostgreSQL start.)
> >>
> >> Or should we try to delete the dump file as soon as we can read it?
> >>
> >
> > IMO making the cleanup a responsibility of the users (e.g. by exposing
> > the list of dumped files through a view and expecting users to delete
> > them in some way) is rather fragile.
> >
> > I don't quite see what's the point of designing it this way. It was
> > suggested this improves stability and usability of this feature, but
> > surely making it unnecessarily complex contradicts both points?
> >
> > IMHO if the user needs to process the dump repeatedly, what's
> > preventing
> > him/her from storing it in a file, or something like that? At that
> > point
> > it's clear it's up to them to remove the file. So I suggest to keep the
> > feature as simple as possible - hand the dump over and delete.
Yeah, it might be better to avoid making the user responsible for removal.

I think it's fine to have an interface to delete in an emergency, but I
agree that users shouldn't be made aware of the existence or deletion of
dump files, basically.

> +1.
> If there are no other objections, I'm going to accept this
> suggestion.
So +1

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

From
Michael Paquier
Date:
On Thu, Sep 10, 2020 at 09:11:21PM +0900, Kasahara Tatsuhito wrote:
> I think it's fine to have an interface to delete in an emergency, but
> I agree that
> users shouldn't be made aware of the existence or deletion of dump
> files, basically.

Per the CF bot, the number of tests needs to be tweaked, because we
test each entry filtered out with is_deeply(), meaning that the number
of tests needs to be updated to reflect that if the filtered list is
changed:
t/010_pg_basebackup.pl ... 104/109 # Looks like you planned 109 tests but ran 110.
t/010_pg_basebackup.pl ... Dubious, test returned 255 (wstat 65280, 0xff00)
All 109 subtests passed

Simple enough to fix.
--
Michael

Attachment

Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
Hi,

Thanks for all your comments, I updated the patch.


On Tue, Sep 1, 2020 at 12:03 AM Kasahara Tatsuhito 
<kasahara.tatsuhito@gmail.com> wrote:

> - How about managing the status of signal send/receive and dump
> operations on a shared hash or others ?
> Sending and receiving signals, dumping memory information, and
> referencing dump information all work asynchronously.
> Therefore, it would be good to have management information to
> check
> the status of each process.
> A simple idea is that ..
> - send a signal to dump to a PID, it first record following
> information into the shared hash.
> pid (specified pid)
> loc (dump location, currently might be ASAP)
> recv (did the pid process receive a signal? first false)
> dumped (did the pid process dump a mem information?  first
> false)
> - specified process receive the signal, update the status in the
> shared hash, then dumped at specified location.
> - specified process finish dump mem information,  update the
> status
> in the shared hash.

I added a shared hash table consisting of minimal members,
mainly for managing whether the file is dumped or not.
Some members like 'loc' seem useful in the future, but I
haven't added them since they're not essential at this point.
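For illustration, a minimal status-tracking entry of this kind might look like the following sketch (all names here are hypothetical and may differ from the actual patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the actual struct in the patch may differ. */
typedef enum McxtDumpStatus
{
    MCXTDUMPSTATUS_NOTYET,      /* request recorded, dump not finished */
    MCXTDUMPSTATUS_DUMPED,      /* dump file completed */
    MCXTDUMPSTATUS_ERROR        /* target exited or failed before dumping */
} McxtDumpStatus;

typedef struct McxtDumpEntry
{
    int             dst_pid;     /* hash key: PID of the target backend */
    McxtDumpStatus  dump_status; /* set by the dumper, polled by requestor */
} McxtDumpEntry;

/* True once the dumper has reported a final state, success or failure. */
bool
mcxt_dump_finished(const McxtDumpEntry *entry)
{
    return entry->dump_status != MCXTDUMPSTATUS_NOTYET;
}
```

In the real patch such entries would live in shared memory and be protected by a lock; this sketch only shows the shape of the state being shared.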



On 2020-09-01 10:54, Andres Freund wrote:

>>  CREATE VIEW pg_backend_memory_contexts AS
>> -    SELECT * FROM pg_get_backend_memory_contexts();
>> +    SELECT * FROM pg_get_backend_memory_contexts(-1);
> 
> -1 is odd. Why not use NULL or even 0?

Changed it from -1 to NULL.

>> +             rc = fwrite(&name_len, sizeof(int), 1, fpout);
>> +             rc = fwrite(name, name_len, 1, fpout);
>> +             rc = fwrite(&idlen, sizeof(int), 1, fpout);
>> +             rc = fwrite(clipped_ident, idlen, 1, fpout);
>> +             rc = fwrite(&level, sizeof(int), 1, fpout);
>> +             rc = fwrite(&parent_len, sizeof(int), 1, fpout);
>> +             rc = fwrite(parent, parent_len, 1, fpout);
>> +             (void) rc;                              /* we'll check 
>> for error with ferror */
>> +
>> +     }
> This format is not descriptive. How about serializing to json or
> something? Or at least having field names?

Added field names for each value.
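As a rough illustration of what "field names for each value" could mean on disk (the patch's actual format is not shown here, so this is only a sketch of the idea), each value can be written as a length-prefixed name followed by the value, making the file self-describing:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustration only: a length-prefixed "field name, then value" record.
 * The format actually used by the patch may differ. */
int
write_int_field(FILE *fp, const char *name, int value)
{
    int namelen = (int) strlen(name);

    if (fwrite(&namelen, sizeof(int), 1, fp) != 1 ||
        fwrite(name, namelen, 1, fp) != 1 ||
        fwrite(&value, sizeof(int), 1, fp) != 1)
        return -1;              /* caller checks ferror() as in the patch */
    return 0;
}

int
read_int_field(FILE *fp, char *name, size_t namesize, int *value)
{
    int namelen;

    if (fread(&namelen, sizeof(int), 1, fp) != 1 ||
        namelen <= 0 || (size_t) namelen >= namesize ||
        fread(name, namelen, 1, fp) != 1 ||
        fread(value, sizeof(int), 1, fp) != 1)
        return -1;
    name[namelen] = '\0';
    return 0;
}
```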

>> +             while (true)
>> +             {
>> +                     CHECK_FOR_INTERRUPTS();
>> +
>> +                     pg_usleep(10000L);
>> +
> 
> Need better signalling back/forth here.

Added a shared hash table that has a flag for managing whether the file 
is dumped or not.
I modified it to use this flag.

>> +     /*
>> +      * Open a temp file to dump the current memory context.
>> +      */
>> +     fpout = AllocateFile(tmpfile, PG_BINARY_W);
>> +     if (fpout == NULL)
>> +     {
>> +             ereport(LOG,
>> +                             (errcode_for_file_access(),
>> +                              errmsg("could not write temporary 
>> memory context file \"%s\": %m",
>> +                                             tmpfile)));
>> +             return;
>> +     }
> 
> Probably should be opened with O_CREAT | O_TRUNC?

AllocateFile() calls fopen() and AFAIU fopen() with mode "w" corresponds 
to open() with "O_WRONLY | O_CREAT | O_TRUNC".

Do you mean I should not use fopen() here?
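For what it's worth, fopen() mode "w" is specified to create the file if absent and truncate it otherwise, which matches O_WRONLY | O_CREAT | O_TRUNC. A quick standalone check (file name is arbitrary):

```c
#include <assert.h>
#include <stdio.h>

/* Demonstrates that reopening an existing file with mode "w" truncates
 * it, i.e. O_CREAT | O_TRUNC semantics are already implied by fopen(). */
long
size_after_w_reopen(const char *path)
{
    FILE *fp = fopen(path, "w");
    long  sz;

    if (fp == NULL)
        return -1;
    fputs("some leftover dump data", fp);
    fclose(fp);

    fp = fopen(path, "w");      /* truncates the existing file */
    if (fp == NULL)
        return -1;
    fseek(fp, 0, SEEK_END);
    sz = ftell(fp);             /* 0 if truncation happened */
    fclose(fp);
    remove(path);
    return sz;
}
```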

On 2020-09-24 13:01, Michael Paquier wrote:
> On Thu, Sep 10, 2020 at 09:11:21PM +0900, Kasahara Tatsuhito wrote:
>> I think it's fine to have an interface to delete in an emergency, but
>> I agree that
>> users shouldn't be made aware of the existence or deletion of dump
>> files, basically.
> 
> Per the CF bot, the number of tests needs to be tweaked, because we
> test each entry filtered out with is_deeply(), meaning that the number
> of tests needs to be updated to reflect that if the filtered list is
> changed:
> t/010_pg_basebackup.pl ... 104/109 # Looks like you planned 109 tests
> but ran 110.
> t/010_pg_basebackup.pl ... Dubious, test returned 255 (wstat 65280, 
> 0xff00)
> All 109 subtests passed
> 
> Simple enough to fix.

Incremented the number of tests.


Any thoughts?

Regards,

-- 
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

Re: Get memory contexts of an arbitrary backend process

From
Kasahara Tatsuhito
Date:
Hi,

On Fri, Sep 25, 2020 at 4:28 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> Thanks for all your comments, I updated the patch.
Thanks for updating the patch.
I did a brief test and code review.

> I added a shared hash table consisted of minimal members
> mainly for managing whether the file is dumped or not.
> Some members like 'loc' seem useful in the future, but I
> haven't added them since it's not essential at this point.
Yes, that would be good.

+        /*
+         * Since we allow only one session can request to dump
memory context at
+         * the same time, check whether the dump files already exist.
+         */
+        while (stat(dumpfile, &stat_tmp) == 0 || stat(tmpfile, &stat_tmp) == 0)
+        {
+            pg_usleep(1000000L);
+        }

If pg_get_backend_memory_contexts() is executed by two or more
sessions at the same time, it cannot be run exclusively in this way.
Currently it seems to cause a crash when doing so.
This is easy to reproduce and can be done as follows.

[session-1]
BEGIN;
LOCK TABLE t1;
  [Session-2]
  BEGIN;
  LOCK TABLE t1; <- waiting
    [Session-3]
    select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
      [Session-4]
      select * FROM pg_get_backend_memory_contexts(<pid of session-2>);

If you issue commit or abort at session-1, you will get SEGV.

Instead of checking for the existence of the file, it might be better
to use a hash (mcxtdumpHash) entry with LWLock.

+        if (proc == NULL)
+        {
+            ereport(WARNING,
+                    (errmsg("PID %d is not a PostgreSQL server
process", dst_pid)));
+            return (Datum) 1;
+        }

Shouldn't it clear the hash entry before return?

+        /* Wait until target process finished dumping file. */
+        while (!entry->is_dumped)
+        {
+            CHECK_FOR_INTERRUPTS();
+            pg_usleep(10000L);
+        }

If the target is killed and exits before dumping the memory
information, you're in an infinite loop here.
So how about making sure that any process that has to stop before
doing a memory dump changes the status of the hash (entry->is_dumped)
before stopping, so that the caller can detect it?
I'm not sure it's best or not, but you might want to use something
like the on_shmem_exit callback.
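The fix suggested above — have the exiting target flip the shared status so waiters never spin forever — amounts to a poll loop that breaks on either completion or failure. A minimal sketch (status names hypothetical; the array of poll results stands in for successive reads of the shared flag):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    DUMP_NOTYET,
    DUMP_DONE,
    DUMP_FAILED         /* set by the target's exit callback */
} DumpStatus;

/* Each element of 'polls' stands in for one read of the shared status;
 * in the real loop each iteration would CHECK_FOR_INTERRUPTS() and
 * pg_usleep() between reads.  Returns true only on a completed dump. */
bool
wait_for_dump(const DumpStatus *polls, int npolls)
{
    int i;

    for (i = 0; i < npolls; i++)
    {
        if (polls[i] == DUMP_DONE)
            return true;
        if (polls[i] == DUMP_FAILED)
            return false;       /* dumper died; stop waiting */
    }
    return false;               /* never spin forever without a verdict */
}
```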

In the current design, if the caller stops processing before reading
the dumped file, you will have an orphaned file.
It looks like the following.

[session-1]
BEGIN;
LOCK TABLE t1;
  [Session-2]
  BEGIN;
  LOCK TABLE t1; <- waiting
    [Session-3]
    select * FROM pg_get_backend_memory_contexts(<pid of session-2>);

If you cancel or terminate the session-3, then issue commit or abort
at session-1, you will get orphaned files in pg_memusage.

So if you allow only one session to request a dump file, it could
call pg_memusage_reset() before sending the signal in this function.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
> On Thu, Oct 1, 2020 at 4:06 PM Kasahara Tatsuhito 
> <kasahara.tatsuhito@gmail.com> wrote:
> Hi,
> 
> On Fri, Sep 25, 2020 at 4:28 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
> > Thanks for all your comments, I updated the patch.
> Thanks for updating the patch.
> I did a brief test and code review.

Thanks for your tests and review!

> > I added a shared hash table consisted of minimal members
> > mainly for managing whether the file is dumped or not.
> > Some members like 'loc' seem useful in the future, but I
> > haven't added them since it's not essential at this point.
> Yes, that would be good.
> 
> +        /*
> +         * Since we allow only one session can request to dump
> memory context at
> +         * the same time, check whether the dump files already exist.
> +         */
> +        while (stat(dumpfile, &stat_tmp) == 0 || stat(tmpfile, 
> &stat_tmp) == 0)
> +        {
> +            pg_usleep(1000000L);
> +        }
> 
> If pg_get_backend_memory_contexts() is executed by two or more
> sessions at the same time, it cannot be run exclusively in this way.
> Currently it seems to cause a crash when do it so.
> This is easy to reproduce and can be done as follows.
> 
> [session-1]
> BEGIN;
> LOCK TABLE t1;
>   [Session-2]
>   BEGIN;
>   LOCK TABLE t1; <- waiting
>     [Session-3]
>     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
>       [Session-4]
>       select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
> 
> If you issue commit or abort at session-1, you will get SEGV.
> 
> Instead of checking for the existence of the file, it might be better
> to use a hash (mcxtdumpHash) entry with LWLock.

Thanks!
Added a LWLock and changed the way from checking the file existence
to finding the hash entry.

> +        if (proc == NULL)
> +        {
> +            ereport(WARNING,
> +                    (errmsg("PID %d is not a PostgreSQL server
> process", dst_pid)));
> +            return (Datum) 1;
> +        }
> 
> Shouldn't it clear the hash entry before return?

Yeah. added codes for removing the entry.

> 
> +        /* Wait until target process finished dumping file. */
> +        while (!entry->is_dumped)
> +        {
> +            CHECK_FOR_INTERRUPTS();
> +            pg_usleep(10000L);
> +        }
> 
> If the target is killed and exit before dumping the memory
> information, you're in an infinite loop here.
> So how about making sure that any process that has to stop before
> doing a memory dump changes the status of the hash (entry->is_dumped)
> before stopping and the caller detects it?
> I'm not sure it's best or not, but you might want to use something
> like the on_shmem_exit callback.

Thanks for your idea!
Added a callback to change the status of the hash table entry.

Although I think it's necessary to remove this callback when it finishes
processing the memory dump, on_shmem_exit() does not seem to have such
a function.
I used before_shmem_exit() since it has a corresponding function
(cancel_before_shmem_exit()) to remove a registered callback.
If that's inappropriate, I'm going to add a function removing the
registered callback of on_shmem_exit().

> In the current design, if the caller stops processing before reading
> the dumped file, you will have an orphaned file.
> It looks like the following.
> 
> [session-1]
> BEGIN;
> LOCK TABLE t1;
>   [Session-2]
>   BEGIN;
>   LOCK TABLE t1; <- waiting
>     [Session-3]
>     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
> 
> If you cancel or terminate the session-3, then issue commit or abort
> at session-1, you will get orphaned files in pg_memusage.
> 
> So if you allow only one session can request to dump file, it could
> call pg_memusage_reset() before send the signal in this function.

Although I'm going to allow only one session per target process,
I'd like to allow running multiple pg_get_backend_memory_contexts()
calls whose target processes are different.

Instead of calling pg_memusage_reset(), I added a callback for
cleaning up orphaned files and the elements of the hash table
using before_shmem_exit() through PG_ENSURE_ERROR_CLEANUP() and
PG_END_ENSURE_ERROR_CLEANUP().

I chose PG_ENSURE_ERROR_CLEANUP() and PG_END_ENSURE_ERROR_CLEANUP()
here since it can handle not only termination but also cancellation.

Any thoughts?


-- 
Atsushi Torikoshi
Attachment

Re: Get memory contexts of an arbitrary backend process

From
Kyotaro Horiguchi
Date:
Wait...

> Attachments: 0003-Enabled-pg_get_backend_memory_contexts-to-collect.patch

For a moment I thought that the number is a patch version number, but
the predecessors are 0002-Enabled..collect.patch and 0001-(same
name). It's not mandatory, but we usually do as follows, which is
the way of git:

v1-0001-Enabled...collect.patch
v2-0001-Enabled...collect.patch

The vN is added by the -v option of git-format-patch.

At Thu, 22 Oct 2020 21:32:00 +0900, torikoshia <torikoshia@oss.nttdata.com> wrote in
> > > I added a shared hash table consisted of minimal members
> > > mainly for managing whether the file is dumped or not.
> > > Some members like 'loc' seem useful in the future, but I
> > > haven't added them since it's not essential at this point.
> > Yes, that would be good.
> > +        /*
> > +         * Since we allow only one session can request to dump
> > memory context at
> > +         * the same time, check whether the dump files already exist.
> > +         */
> > +        while (stat(dumpfile, &stat_tmp) == 0 || stat(tmpfile,
> > &stat_tmp) == 0)
> > +        {
> > +            pg_usleep(1000000L);
> > +        }
> > If pg_get_backend_memory_contexts() is executed by two or more
> > sessions at the same time, it cannot be run exclusively in this way.
> > Currently it seems to cause a crash when do it so.
> > This is easy to reproduce and can be done as follows.
> > [session-1]
> > BEGIN;
> > LOCK TABLE t1;
> >   [Session-2]
> >   BEGIN;
> >   LOCK TABLE t1; <- waiting
> >     [Session-3]
> >     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
> >       [Session-4]
> >       select * FROM pg_get_backend_memory_contexts(<pid of
> > session-2>);
> > If you issue commit or abort at session-1, you will get SEGV.
> > Instead of checking for the existence of the file, it might be better
> > to use a hash (mcxtdumpHash) entry with LWLock.
>
> Thanks!
> Added a LWLock and changed the way from checking the file existence
> to finding the hash entry.

> > +        if (proc == NULL)
> > +        {
> > +            ereport(WARNING,
> > +                    (errmsg("PID %d is not a PostgreSQL server
> > process", dst_pid)));
> > +            return (Datum) 1;
> > +        }
> > Shouldn't it clear the hash entry before return?
>
> Yeah. added codes for removing the entry.

+        entry = AddEntryToMcxtdumpHash(dst_pid);
+
+        /* Check whether the target process is PostgreSQL backend process. */
+        /* TODO: Check also whether backend or not. */
+        proc = BackendPidGetProc(dst_pid);
+
+        if (proc == NULL)
+        {
+            ereport(WARNING,
+                    (errmsg("PID %d is not a PostgreSQL server process", dst_pid)));
+
+            LWLockAcquire(McxtDumpHashLock, LW_EXCLUSIVE);
+
+            if (hash_search(mcxtdumpHash, &dst_pid, HASH_REMOVE, NULL) == NULL)
+                elog(WARNING, "hash table corrupted");
+
+            LWLockRelease(McxtDumpHashLock);
+
+            return (Datum) 1;
+        }

Why do you enter a useless entry and then remove it immediately?

+        PG_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) Int32GetDatum(dst_pid));
+        {
+            SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);

"PROCSIG_DUMP_MEMORY" is somewhat misleading. How about
"PROCSIG_DUMP_MEMCXT" or "PROCSIG_DUMP_MEMORY_CONTEXT"?

I thought that the hash table would prevent multiple requestors from
making a request at once, but the patch doesn't seem to do that.

+            /* Wait until target process finished dumping file. */
+            while (entry->dump_status == MCXTDUMPSTATUS_NOTYET)

This needs an LWLock. And this could read the entry after it is reused by
another backend if the dumper process is gone. That isn't likely to
happen, but theoretically the other backend may set it to
MCXTDUMPSTATUS_NOTYET in between two successive checks on the member.

+    /*
+     * Make dump file ends with 'D'.
+     * This is checked by the caller when reading the file.
+     */
+    fputc('E', fpout);

Which is right?

+    fputc('E', fpout);
+
+    CHECK_FOR_INTERRUPTS();

This means that the process accepts another request and rewrites the
file even while the first requester is reading it. And the file can
be removed by the first requestor before the second requestor can read
it.

+    mcxtdumpHash = ShmemInitHash("mcxtdump hash",
+                               SHMEM_MEMCONTEXT_SIZE,
+                               SHMEM_MEMCONTEXT_SIZE,

The space needed for this hash doesn't seem to be secured. The hash is
sized to 64 entries, so pg_get_backend_memory_contexts() may fail with
ERROR "out of shared memory".

The hash is used only to check if the dump file is completed and if it
ended with an error. If we need only those, a shared byte array with the
length of max_backends is sufficient.
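The byte-array alternative suggested here can be sketched as follows (sizes and names illustrative, not the actual patch); a flat array indexed by backend slot sidesteps hash-table sizing entirely, and refusing a request while one is in flight also gives the missing mutual exclusion:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BACKENDS_DEMO 8     /* stands in for MaxBackends */

/* One status byte per backend slot:
 * 0 = idle, 1 = requested, 2 = done, 3 = error.
 * In the real server this would live in shared memory under a lock. */
typedef struct McxtDumpArray
{
    unsigned char status[MAX_BACKENDS_DEMO];
} McxtDumpArray;

void
mcxt_array_init(McxtDumpArray *a)
{
    memset(a->status, 0, sizeof(a->status));
}

/* Record a request for 'slot'; refuse if one is already in flight. */
bool
mcxt_array_request(McxtDumpArray *a, int slot)
{
    if (slot < 0 || slot >= MAX_BACKENDS_DEMO || a->status[slot] != 0)
        return false;
    a->status[slot] = 1;
    return true;
}
```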

+        PG_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) Int32GetDatum(dst_pid));
+        {
+            SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
+
+            /* Wait until target process finished dumping file. */
+            while (entry->dump_status == MCXTDUMPSTATUS_NOTYET)
+            {
+                CHECK_FOR_INTERRUPTS();
+                pg_usleep(10000L);
+            }
+        }
+        PG_END_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) Int32GetDatum(dst_pid));
+
+        if (entry->dump_status == MCXTDUMPSTATUS_ERROR)
+        {
..
+            if (stat(tmpfile, &stat_tmp) == 0)
+                unlink(tmpfile);
+            if (stat(dumpfile, &stat_tmp) == 0)
+                unlink(dumpfile);
...
+            return (Datum) 0;
+        }
+
+        /* Read values from the dumped file and put them on tuplestore. */
+        PutDumpedValuesOnTuplestore(dumpfile, tupstore, tupdesc, dst_pid);

This means that if the function gets sigint before the dumper creates
the file, the dumper can leave a dump file?

> > +        /* Wait until target process finished dumping file. */
> > +        while (!entry->is_dumped)
> > +        {
> > +            CHECK_FOR_INTERRUPTS();
> > +            pg_usleep(10000L);
> > +        }
> > If the target is killed and exit before dumping the memory
> > information, you're in an infinite loop here.
> > So how about making sure that any process that has to stop before
> > doing a memory dump changes the status of the hash (entry->is_dumped)
> > before stopping and the caller detects it?
> > I'm not sure it's best or not, but you might want to use something
> > like the on_shmem_exit callback.
>
> Thanks for your idea!
> Added a callback to change the status of the hash table entry.
>
> Although I think it's necessary to remove this callback when it
> finished
> processing memory dumping, on_shmem_exit() does not seem to have such
> a function.
> I used before_shmem_exit() since it has a corresponding function to
> remove registered callback.
> If it's inappropriate, I'm going to add a function removing the
> registered callback of on_shmem_exit().

This seems to leave a file for the pid.

> > In the current design, if the caller stops processing before reading
> > the dumped file, you will have an orphaned file.
> > It looks like the following.
> > [session-1]
> > BEGIN;
> > LOCK TABLE t1;
> >   [Session-2]
> >   BEGIN;
> >   LOCK TABLE t1; <- waiting
> >     [Session-3]
> >     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
> > If you cancel or terminate the session-3, then issue commit or abort
> > at session-1, you will get orphaned files in pg_memusage.
> > So if you allow only one session can request to dump file, it could
> > call pg_memusage_reset() before send the signal in this function.
>
> Although I'm going to allow only one session per one target process,
> I'd like to allow running multiple pg_get_backend_memory_contexts()
> which target process is different.
>
> Instead of calling pg_memusage_reset(), I added a callback for
> cleaning up orphaned files and the elements of the hash table
> using before_shmem_exit() through PG_ENSURE_ERROR_CLEANUP() and
> PG_END_ENSURE_ERROR_CLEANUP().
>
> I chose PG_ENSURE_ERROR_CLEANUP() and PG_END_ENSURE_ERROR_CLEANUP()
> here since it can handle not only termination but also cancellation.
>
> Any thoughts?

+/*
+ * pg_memusage_reset
+ *         Remove the memory context dump files.
+ */
+void
+pg_memusage_reset(int pid)

The function name seems to represent something different from what it
does.


I think we might need to step-back to basic design of this feature
since this patch seems to have unhandled corner cases that are
difficult to find.

- Dump file lifecycle or state-transition of the dumper

  Currently the lifecycle of a dump file, or the state-transition of
  the dumper process doesn't seem to be well defined.

  The file is created only by the dumper.
  If the requestor reads the completed file, the reader removes it.

  If the dumper receives a cancel request, the dumper removes it if
  any.

  Of course dumper removes it if it fails to complete the file.

- Better way of signalling between the requestor and the dumper

  I think there's no doubt about the request signal.

  About the complete signal, currently the requestor polls on a flag
  in a hash entry.  I'm wondering if we can get rid of polling. The
  problem on doing that is the lack of a means for a dumper to know
  the requestor.  We need to store requestor pid or backend id in the
  shared hash entry.

  By the way, about the shared hash entry: it uses about 70kB for only 64
  entries, so it seems less efficient than a shared array that has
  MaxBackends entries.  If we used the following array on shared memory,

  struct hoge
  {
    BackendId requestor[MAX_BACKENDS];
    int          status[MAX_BACKENDS];
    LWLock lock;
  };

  This array has the size of 24 * MaxBackends + 16. 24kB for 1000
  backends.  It could be on dsm since this feature is not used
  commonly.


- The way to cancel a request already made. (or management of the dump
  state transition.)

  Cancellation seems to contain some race conditions. But basically
  that could be done by sending a request signal after setting the
  hoge.requestor above to some special value, not needing the third
  type of signal.  The special value should be different from the
  initial state, which signals that the process is accepting a new
  request.

  As a whole, that would look like the following?

------------------------------------------------------------
Successful request.

  Requestor         dumper       state
                    [idle]       initial
  [request] -------------------> requestor pid/backendid
           -signal->
                    [dumping]
           <-signal-[done]
  [read]
  [done]   --------------------> initial

------------------------------------------------------------
On failure, the dumper signals with setting state to initial.
  [request] -------------------> requestor pid/backendid
           -signal->
                    [dumping]
                    [failed]     initial
           <-signal-
     (some other requestor might come in the meantime.)
  <sees that the requestor is not me>
  [failed]

------------------------------------------------------------
If the requestor wants to cancel the request, it sets the state to
'cancel' then signal.

Requestor         dumper       state
                    [idle]       initial
  [request] -------------------> cancel
                    <if canceled. clean up>
                    [dumping]
                    <if canceled. clean up>
           <-signal-[done]

           -signal-><try to clean up>
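The state transitions in the diagrams above can be captured as a small table of legal steps. The sketch below is only an illustration of the proposed protocol (state names invented here), but writing it down this way makes it easy to see which transitions are allowed and which indicate a race:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative states for the per-backend request slot; not the final
 * design. */
typedef enum
{
    SLOT_IDLE,          /* initial: accepting a new request */
    SLOT_REQUESTED,     /* requestor recorded, signal sent */
    SLOT_DUMPING,       /* dumper is writing the file */
    SLOT_DONE,          /* file complete, requestor may read it */
    SLOT_CANCELED       /* requestor asked to abandon the request */
} SlotState;

/* Returns true if moving from 'from' to 'to' is a legal step in the
 * protocol sketched in the diagrams. */
bool
slot_transition_ok(SlotState from, SlotState to)
{
    switch (from)
    {
        case SLOT_IDLE:
            return to == SLOT_REQUESTED;
        case SLOT_REQUESTED:
            return to == SLOT_DUMPING || to == SLOT_CANCELED ||
                   to == SLOT_IDLE;     /* dumper failed before starting */
        case SLOT_DUMPING:
            return to == SLOT_DONE || to == SLOT_CANCELED ||
                   to == SLOT_IDLE;     /* dump failed: reset */
        case SLOT_DONE:
            return to == SLOT_IDLE;     /* requestor read it, cleaned up */
        case SLOT_CANCELED:
            return to == SLOT_IDLE;     /* dumper cleaned up */
    }
    return false;
}
```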


Other aspects to consider?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center







Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2020-10-23 13:46, Kyotaro Horiguchi wrote:
> Wait...
> 
>> Attachments: 
>> 0003-Enabled-pg_get_backend_memory_contexts-to-collect.patch
> 
> For a moment I thought that the number is patch number but the
> predecessors are 0002-Enabled..collect.patch and 0001-(same
> name). It's not mandatory but we usually do as the follows and it's
> the way of git.
> 
> v1-0001-Enabled...collect.patch
> v2-0001-Enabled...collect.patch
> 
> The vn is added by -v option for git-format-patch.

Sorry for the confusion. I'll follow that way next time.

> At Thu, 22 Oct 2020 21:32:00 +0900, torikoshia
> <torikoshia@oss.nttdata.com> wrote in
>> > > I added a shared hash table consisted of minimal members
>> > > mainly for managing whether the file is dumped or not.
>> > > Some members like 'loc' seem useful in the future, but I
>> > > haven't added them since it's not essential at this point.
>> > Yes, that would be good.
>> > +        /*
>> > +         * Since we allow only one session can request to dump
>> > memory context at
>> > +         * the same time, check whether the dump files already exist.
>> > +         */
>> > +        while (stat(dumpfile, &stat_tmp) == 0 || stat(tmpfile,
>> > &stat_tmp) == 0)
>> > +        {
>> > +            pg_usleep(1000000L);
>> > +        }
>> > If pg_get_backend_memory_contexts() is executed by two or more
>> > sessions at the same time, it cannot be run exclusively in this way.
>> > Currently it seems to cause a crash when do it so.
>> > This is easy to reproduce and can be done as follows.
>> > [session-1]
>> > BEGIN;
>> > LOCK TABLE t1;
>> >   [Session-2]
>> >   BEGIN;
>> >   LOCK TABLE t1; <- waiting
>> >     [Session-3]
>> >     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
>> >       [Session-4]
>> >       select * FROM pg_get_backend_memory_contexts(<pid of
>> > session-2>);
>> > If you issue commit or abort at session-1, you will get SEGV.
>> > Instead of checking for the existence of the file, it might be better
>> > to use a hash (mcxtdumpHash) entry with LWLock.
>> 
>> Thanks!
>> Added a LWLock and changed the way from checking the file existence
>> to finding the hash entry.
> 
>> > +        if (proc == NULL)
>> > +        {
>> > +            ereport(WARNING,
>> > +                    (errmsg("PID %d is not a PostgreSQL server
>> > process", dst_pid)));
>> > +            return (Datum) 1;
>> > +        }
>> > Shouldn't it clear the hash entry before return?
>> 
>> Yeah. added codes for removing the entry.
> 
> +        entry = AddEntryToMcxtdumpHash(dst_pid);
> +
> +        /* Check whether the target process is PostgreSQL backend process. 
> */
> +        /* TODO: Check also whether backend or not. */
> +        proc = BackendPidGetProc(dst_pid);
> +
> +        if (proc == NULL)
> +        {
> +            ereport(WARNING,
> +                    (errmsg("PID %d is not a PostgreSQL server process", dst_pid)));
> +
> +            LWLockAcquire(McxtDumpHashLock, LW_EXCLUSIVE);
> +
> +            if (hash_search(mcxtdumpHash, &dst_pid, HASH_REMOVE, NULL) == NULL)
> +                elog(WARNING, "hash table corrupted");
> +
> +            LWLockRelease(McxtDumpHashLock);
> +
> +            return (Datum) 1;
> +        }
> 
> Why do you enter a useless entry and then remove it immediately?

Do you mean I should check the process existence first
since it enables us to skip entering hash entries?

> 
> +        PG_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) 
> Int32GetDatum(dst_pid));
> +        {
> +            SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
> 
> "PROCSIG_DUMP_MEMORY" is somewhat misleading. How about
> "PROCSIG_DUMP_MEMCXT" or "PROCSIG_DUMP_MEMORY_CONTEXT"?

I'll go with "PROCSIG_DUMP_MEMCXT".

> 
> I thought that the hash table would prevent multiple requestors from
> making a request at once, but the patch doesn't seem to do that.
> 
> +            /* Wait until target process finished dumping file. */
> +            while (entry->dump_status == MCXTDUMPSTATUS_NOTYET)
> 
> This needs LWLock. And this could read the entry after reused by
> another backend if the dumper process is gone. That isn't likely to
> happen, but theoretically the other backend may set it to
> MCXTDUMPSTATUS_NOTYET in between two successive checks on the member.

Thanks for your notification.
I'll use an LWLock.

> 
> +    /*
> +     * Make dump file ends with 'D'.
> +     * This is checked by the caller when reading the file.
> +     */
> +    fputc('E', fpout);
> 
> Which is right?

Sorry, the comment was wrong..

> 
> +    fputc('E', fpout);
> +
> +    CHECK_FOR_INTERRUPTS();
> 
> This means that the process accepts another request and rewrites the
> file even while the first requester is reading it. And, the file can
> be removed by the first requestor before the second requestor can read
> it.

I added CHECK_FOR_INTERRUPTS() here to make dump cancellation
possible; however, considering your point, I need to think
about a way to handle only the dump cancellation.

> 
> +    mcxtdumpHash = ShmemInitHash("mcxtdump hash",
> +                               SHMEM_MEMCONTEXT_SIZE,
> +                               SHMEM_MEMCONTEXT_SIZE,
> .
> The space needed for this hash doesn't seem to be secured. The hash is
> sized to 64 entries, so pg_get_backend_memory_contexts() may fail with
> ERROR "out of shared memory".
> 
> The hash is used only to check if the dump file is completed and if
> ended with error. If we need only those, a shared byte array with the
> length of max_backend is sufficient.

Although there was a comment that controlling the dump location may be a
good idea, it also seems possible to include the location information
in the struct hoge you suggested.

On Tue, Sep 1, 2020 at 12:03 AM Kasahara Tatsuhito <[hidden email]> 
wrote:
|   - send a signal to dump to a PID, it first record following
information into the shared hash.
|     pid (specified pid)
|     loc (dump location, currently might be ASAP)


> 
> +        PG_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) 
> Int32GetDatum(dst_pid));
> +        {
> +            SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
> +
> +            /* Wait until target process finished dumping file. */
> +            while (entry->dump_status == MCXTDUMPSTATUS_NOTYET)
> +            {
> +                CHECK_FOR_INTERRUPTS();
> +                pg_usleep(10000L);
> +            }
> +        }
> +        PG_END_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) 
> Int32GetDatum(dst_pid));
> +
> +        if (entry->dump_status == MCXTDUMPSTATUS_ERROR)
> +        {
> ..
> +            if (stat(tmpfile, &stat_tmp) == 0)
> +                unlink(tmpfile);
> +            if (stat(dumpfile, &stat_tmp) == 0)
> +                unlink(dumpfile);
> ...
> +            return (Datum) 0;
> +        }
> +
> +        /* Read values from the dumped file and put them on tuplestore. */
> +        PutDumpedValuesOnTuplestore(dumpfile, tupstore, tupdesc, dst_pid);
> 
> This means that if the function gets sigint before the dumper creates
> the file, the dumper can leave a dump file?

In that case, the requestor removes the corresponding hash entry in the
callback McxtReqKill, so the dumper, finding no hash entry, does not
dump a file.

However, I've now noticed that if the requestor gets SIGINT just
after the dumper checks the entry, the dump file remains.

In the current design, only the requestor can remove the dump file,
but it seems necessary to allow the dumper to remove it as well.


>> > +        /* Wait until target process finished dumping file. */
>> > +        while (!entry->is_dumped)
>> > +        {
>> > +            CHECK_FOR_INTERRUPTS();
>> > +            pg_usleep(10000L);
>> > +        }
>> > If the target is killed and exit before dumping the memory
>> > information, you're in an infinite loop here.
>> > So how about making sure that any process that has to stop before
>> > doing a memory dump changes the status of the hash (entry->is_dumped)
>> > before stopping and the caller detects it?
>> > I'm not sure it's best or not, but you might want to use something
>> > like the on_shmem_exit callback.
>> 
>> Thanks for your idea!
>> Added a callback to change the status of the hash table entry.
>> 
>> Although I think it's necessary to remove this callback when it
>> finished
>> processing memory dumping, on_shmem_exit() does not seem to have such
>> a function.
>> I used before_shmem_exit() since it has a corresponding function to
>> remove registered callback.
>> If it's inappropriate, I'm going to add a function removing the
>> registered callback of on_shmem_exit().
> 
> This seems to leave a file for the pid.

As mentioned above, dump files can still be left behind.

> 
>> > In the current design, if the caller stops processing before reading
>> > the dumped file, you will have an orphaned file.
>> > It looks like the following.
>> > [session-1]
>> > BEGIN;
>> > LOCK TABKE t1;
>> >   [Session-2]
>> >   BEGIN;
>> >   LOCK TABLE t1; <- waiting
>> >     [Session-3]
>> >     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
>> > If you cancel or terminate the session-3, then issue commit or abort
>> > at session-1, you will get orphaned files in pg_memusage.
>> > So if you allow only one session can request to dump file, it could
>> > call pg_memusage_reset() before send the signal in this function.
>> 
>> Although I'm going to allow only one session per one target process,
>> I'd like to allow running multiple pg_get_backend_memory_contexts()
>> which target process is different.
>> 
>> Instead of calling pg_memusage_reset(), I added a callback for
>> cleaning up orphaned files and the elements of the hash table
>> using before_shmem_exit() through PG_ENSURE_ERROR_CLEANUP() and
>> PG_END_ENSURE_ERROR_CLEANUP().
>> 
>> I chose PG_ENSURE_ERROR_CLEANUP() and PG_END_ENSURE_ERROR_CLEANUP()
>> here since it can handle not only termination but also cancellation.
>> 
>> Any thoughts?
> 
> +/*
> + * pg_memusage_reset
> + *         Remove the memory context dump files.
> + */
> +void
> +pg_memusage_reset(int pid)
> 
> The function name seem to represents somewhat different from what it
> does.

Yeah, the name makes it sound like it actually resets memory.
I'll rename it to remove_memcxt_file or something similar.

> 
> I think we might need to step-back to basic design of this feature
> since this patch seems to have unhandled corner cases that are
> difficult to find.

Agreed and thanks for writing it down below.


> - Dump file lifecycle or state-transition of the dumper
> 
>   Currently the lifecycle of a dump file, or the state-transition of
>   the dumper process doesn't seem to be well defined.
> 
>   The file is create only by the dumper.
>   If the requestor reads the completed file, the reader removes it.
> 
>   If the dumper receives a cancel request, the dumper removes it if
>   any.


> 
>   Of course dumper removes it if it fails to complete the file.
> 
> - Better way of signalling between the requestor and the dumper
> 
>   I think there's no doubt about request signal.
> 
>   About the complete signal, currently the requestor polls on a flag
>   in a hash entry.  I'm wondering if we can get rid of polling. The
>   problem on doing that is the lack of a means for a dumper to know
>   the requestor.  We need to store requestor pid or backend id in the
>   shared hash entry.

Agreed on getting rid of polling.

BTW, as far as I read latch.h, it seems common to use a latch instead
of pg_usleep() to wait for signals to arrive.
I'm now thinking of using a latch here, which would also eliminate
the polling.

>   By the way, about shared hash entry, it uses about 70kB for only 64
>   entries so it seems inefficient than a shared array that has
>   MaxBackends entries.  If we used a following array on shared memory,
> 
>   struct hoge
>   {
>     BackendId requestor[MAX_BACKENDS];
>     int          status[MAX_BACKENDS];
>     LWLock lock;
>   };

If the requestor's id is 5 and the dumper's id is 10,
is this struct used like below?

- hoge.requestor[10] = 5
- Both status[5] and status[10] change like "request", "idle" or "done"

>   This array has the size of 24 * MaxBackends + 16. 24kB for 1000
>   backends.  It could be on dsm since this feature is not used
>   commonly.

Sorry, but I'm not sure about this calculation.
Do you mean 16.24kB for 1000 backends?

Regarding dsm, do you imagine putting hoge on dsm?

> 
> - The way to cancel a request already made. (or management of the dump
>   state transition.)
> 
>   Cancellation seems to contain some race conditions. But basically
>   that could be done by sending a request signal after setting the
>   hoge.requestor above to some special value, not needing the third

I imagined hoge.requestor was set to the requestor's backend id.
Isn't it hoge.status that should hold the special value?

>   type of signal.  The special value should be different from the
>   initial state, which signals that the process is accepting a new
>   request.
> 
>   As the whole, that would looks like the folloing?
> 
> ------------------------------------------------------------
> Successful request.
> 
>   Requestor         dumper       state
>                     [idle]       initial
>   [request] -------------------> requestor pid/backendid
>            -signal->
>                     [dumping]
>            <-signal-[done]
>   [read]
>   [done]   --------------------> initial
> 
> ------------------------------------------------------------
> On failure, the dumper signals with setting state to initial.
>   [request] -------------------> requestor pid/backendid
>            -signal->
>                     [dumping]
>                     [failed]     initial
>            <-signal-
>      (some other requestor might come meantime.)
>   <sees that the requestor is not me>

Is this "some other requestor comes meantime" case related to the dumper failure?

Regards,

--
Atsushi Torikoshi

>   [failed]
> 
> ------------------------------------------------------------
> If the requestor wants to cancel the request, it sets the state to
> 'cancel' then signal.
> 
> Requestor         dumper       state
>                     [idle]       initial
>   [request] -------------------> cancel
>                     <if canceled. clean up>
>                     [dumping]
>                     <if canceled. clean up>
>            <-signal-[done]
> 
>            -signal-><try to clean up>
> 
> 
> Other aspects to cnosider?
> 
> regards.



Re: Get memory contexts of an arbitrary backend process

From
Georgios Kokolatos
Date:
Hi,

I noticed that this patch fails on the cfbot.
For this, I changed the status to: 'Waiting on Author'.

Cheers,
//Georgios

The new status of this patch is: Waiting on Author

Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2020-10-28 15:32, torikoshia wrote:
> On 2020-10-23 13:46, Kyotaro Horiguchi wrote:

>> I think we might need to step-back to basic design of this feature
>> since this patch seems to have unhandled corner cases that are
>> difficult to find.

I've written out the basic design below and attached
corresponding patch.

   # Communication flow between the dumper and the requestor
   - (1) When requesting a memory context dump, the requestor adds an
entry to the shared memory. The entry manages the dump state and it is
initially set to 'REQUESTING'.
   - (2) The requestor sends the signal to the dumper and waits on the
latch.
   - (3) The dumper looks into the corresponding shared memory entry
and changes its state to 'DUMPING'.
   - (4) When the dumper completes dumping, it changes the state to
'DONE' and sets the latch.
   - (5) The requestor reads the dump file and shows it to the user.
Finally, the requestor removes the dump file and resets the shared
memory entry.

   # Query cancellation
   - When the requestor cancels dumping, e.g. by signaling with Ctrl-C,
the requestor changes the status of the shared memory entry to
'CANCELING'.
   - The dumper checks the status when it tries to change the state to
'DONE' at (4), and if the state is 'CANCELING', it removes the dump
file and resets the shared memory entry.

   # Cleanup of the dump file and the shared memory entry
   - In the normal case, the requestor removes the dump file and resets
the shared memory entry as described in (5).
   - When something like query cancellation or process termination
happens on the dumper after (1) and before (3), in other words while
the state is 'REQUESTING', the requestor does the cleanup.
   - When something happens on the dumper or the requestor after (3)
and before (4), in other words while the state is 'DUMPING', the dumper
does the cleanup. Specifically, if the requestor cancels the query, it
just changes the state to 'CANCELING' and the dumper notices it and
cleans things up later. OTOH, when the dumper fails to dump, it cleans
up the dump file and deletes the entry on the shared memory.
   - When something happens on the requestor after (4), i.e. while the
state is 'DONE', the requestor does the cleanup.
   - In the case of SIGKILL or power failure, all dump files are
removed during crash recovery.


Although there was a suggestion that the shared memory hash
table should be changed to a more efficient structure,
I haven't done that in this patch.
I think it can be treated separately, and I'm going to work
on it later.


On 2020-11-11 00:07, Georgios Kokolatos wrote:
> Hi,
> 
> I noticed that this patch fails on the cfbot.
> For this, I changed the status to: 'Waiting on Author'.
> 
> Cheers,
> //Georgios
> 
> The new status of this patch is: Waiting on Author

Thanks for your notification; I have updated the patch.
Changed the status to: 'Waiting on Author'.

Regards,

--
Atsushi Torikoshi
Attachment

Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2020/11/16 19:58, torikoshia wrote:
> On 2020-10-28 15:32, torikoshia wrote:
>> On 2020-10-23 13:46, Kyotaro Horiguchi wrote:
> 
>>> I think we might need to step-back to basic design of this feature
>>> since this patch seems to have unhandled corner cases that are
>>> difficult to find.
> 
> I've written out the basic design below and attached
> corresponding patch.

I'm starting to study how this feature behaves. At first, when I executed
the following query, the function never returned. ISTM that since
the autovacuum launcher cannot respond to the request for a memory
contexts dump, the function keeps waiting infinitely. Is this a bug?
Probably we should exclude non-backend processes from the target
processes to dump? Sorry if this was already discussed.

     SELECT pg_get_backend_memory_contexts(pid) FROM pg_stat_activity;


Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Get memory contexts of an arbitrary backend process

From
Tom Lane
Date:
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> I'm starting to study how this feature behaves. At first, when I executed
> the following query, the function never returned. ISTM that since
> the autovacuum launcher cannot respond to the request of memory
> contexts dump, the function keeps waiting infinity. Is this a bug?
> Probably we should exclude non-backend proceses from the target
> processes to dump? Sorry if this was already discussed.

>      SELECT pg_get_backend_memory_contexts(pid) FROM pg_stat_activity;

FWIW, I think this patch is fundamentally unsafe.  It's got a
lot of the same problems that I complained about w.r.t. the
nearby proposal to allow cross-backend stack trace dumping.
It does avoid the trap of thinking that it can do work in
a signal handler, but instead it supposes that it can do
work involving very high-level objects such as shared hash tables
in anyplace that might execute CHECK_FOR_INTERRUPTS.  That's
never going to be safe: the only real expectation the system
has is that CHECK_FOR_INTERRUPTS is called at places where our
state is sane enough that a transaction abort can clean up.
Trying to do things like taking LWLocks is going to lead to
deadlocks or worse.  We need not even get into the hard questions,
such as what happens when one process or the other exits
unexpectedly.

I also find the idea that this should be the same SQL function
as pg_get_backend_memory_contexts to be a seriously bad decision.
That means that it's not possible to GRANT the right to examine
only your own process's memory --- with this proposal, that means
granting the right to inspect every other process as well.

Beyond that, the fact that there's no way to restrict the capability
to just, say, other processes owned by the same user means that
it's not really safe to GRANT to non-superusers anyway.  Even with
such a restriction added, things are problematic, since for example
it would be possible to inquire into the workings of a
security-definer function executing in another process that
nominally is owned by your user.

Between the security and implementation issues here, I really
think we'd be best advised to just reject the concept, period.

            regards, tom lane



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2020-12-03 10:36, Tom Lane wrote:
> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>> I'm starting to study how this feature behaves. At first, when I 
>> executed
>> the following query, the function never returned. ISTM that since
>> the autovacuum launcher cannot respond to the request of memory
>> contexts dump, the function keeps waiting infinity. Is this a bug?
>> Probably we should exclude non-backend proceses from the target
>> processes to dump? Sorry if this was already discussed.
> 
>>      SELECT pg_get_backend_memory_contexts(pid) FROM pg_stat_activity;

Thanks for trying it!

It was not discussed explicitly, and I was going to address it later,
as noted in this comment in the patch:

> +        /* TODO: Check also whether backend or not. */

> 
> FWIW, I think this patch is fundamentally unsafe.  It's got a
> lot of the same problems that I complained about w.r.t. the
> nearby proposal to allow cross-backend stack trace dumping.
> It does avoid the trap of thinking that it can do work in
> a signal handler, but instead it supposes that it can do
> work involving very high-level objects such as shared hash tables
> in anyplace that might execute CHECK_FOR_INTERRUPTS.  That's
> never going to be safe: the only real expectation the system
> has is that CHECK_FOR_INTERRUPTS is called at places where our
> state is sane enough that a transaction abort can clean up.
> Trying to do things like taking LWLocks is going to lead to
> deadlocks or worse.  We need not even get into the hard questions,
> such as what happens when one process or the other exits
> unexpectedly.

Thanks for reviewing!

I may be misunderstanding something, but the dumper works not at
CHECK_FOR_INTERRUPTS() but during client reads, i.e., in
ProcessClientReadInterrupt().

Is that also unsafe?


BTW, since there was a comment that the shared hash table
used too much memory, I'm now rewriting this patch to use
a simpler static shared memory struct instead of the shared hash table.

> I also find the idea that this should be the same SQL function
> as pg_get_backend_memory_contexts to be a seriously bad decision.
> That means that it's not possible to GRANT the right to examine
> only your own process's memory --- with this proposal, that means
> granting the right to inspect every other process as well.
> 
> Beyond that, the fact that there's no way to restrict the capability
> to just, say, other processes owned by the same user means that
> it's not really safe to GRANT to non-superusers anyway.  Even with
> such a restriction added, things are problematic, since for example
> it would be possible to inquire into the workings of a
> security-definer function executing in another process that
> nominally is owned by your user.

I'm going to change the function name and restrict execution to
superusers. Is that enough?


Regards,



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2020-12-04 19:16, torikoshia wrote:
> On 2020-12-03 10:36, Tom Lane wrote:
>> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>>> I'm starting to study how this feature behaves. At first, when I 
>>> executed
>>> the following query, the function never returned. ISTM that since
>>> the autovacuum launcher cannot respond to the request of memory
>>> contexts dump, the function keeps waiting infinity. Is this a bug?
>>> Probably we should exclude non-backend proceses from the target
>>> processes to dump? Sorry if this was already discussed.
>> 
>>>      SELECT pg_get_backend_memory_contexts(pid) FROM 
>>> pg_stat_activity;
> 
> Thanks for trying it!
> 
> It was not discussed explicitly, and I was going to do it later
> as commented.
> 
>> +        /* TODO: Check also whether backend or not. */
> 
>> 
>> FWIW, I think this patch is fundamentally unsafe.  It's got a
>> lot of the same problems that I complained about w.r.t. the
>> nearby proposal to allow cross-backend stack trace dumping.
>> It does avoid the trap of thinking that it can do work in
>> a signal handler, but instead it supposes that it can do
>> work involving very high-level objects such as shared hash tables
>> in anyplace that might execute CHECK_FOR_INTERRUPTS.  That's
>> never going to be safe: the only real expectation the system
>> has is that CHECK_FOR_INTERRUPTS is called at places where our
>> state is sane enough that a transaction abort can clean up.
>> Trying to do things like taking LWLocks is going to lead to
>> deadlocks or worse.  We need not even get into the hard questions,
>> such as what happens when one process or the other exits
>> unexpectedly.
> 
> Thanks for reviewing!
> 
> I may misunderstand something, but the dumper works not at
> CHECK_FOR_INTERRUPTS but during the client read, i.e.,
> ProcessClientReadInterrupt().
> 
> Is it also unsafe?
> 
> 
> BTW, since there was a comment that the shared hash table
> used too much memory, I'm now rewriting this patch not to use
> the shared hash table but a simpler static shared memory struct.

Attached a rewritten patch.

Accordingly, I also slightly modified the basic design as below.

---
# Communication flow between the dumper and the requestor
- (1) When requesting a memory context dump, the requestor changes
the state in the shared memory struct from 'ACCEPTABLE' to
'REQUESTING'.
- (2) The requestor sends the signal to the dumper process and waits
on the latch.
- (3) When the dumper notices the signal, it changes the state to
'DUMPING'.
- (4) When the dumper completes dumping, it changes the state to
'DONE' and sets the latch.
- (5) The requestor reads the dump file and shows it to the user.
Finally, the requestor removes the dump file and resets the shared
memory state to 'ACCEPTABLE'.

# Query cancellation
- When the requestor cancels dumping, e.g. by signaling with Ctrl-C,
the requestor changes the state of the shared memory to 'CANCELING'.
- The dumper checks the state when it tries to change the state to
'DONE' at (4), and if the state is 'CANCELING', it initializes the
dump file and resets the shared memory state to 'ACCEPTABLE'.

# Cleanup of the dump file and the shared memory
- In the normal case, the requestor removes the dump file and resets
the shared memory state as described in (5).
- When something like query cancellation or process termination
happens on the dumper after (1) and before (3), in other words while
the state is 'REQUESTING', the requestor does the cleanup.
- When something happens on the dumper or the requestor after (3)
and before (4), in other words while the state is 'DUMPING', the
dumper does the cleanup. Specifically, if the requestor cancels the
query, it just changes the state to 'CANCELING' and the dumper
notices it and cleans things up later.
OTOH, when the dumper fails to dump, it cleans up the dump file and
resets the shared memory state.
- When something happens on the requestor after (4), i.e. while the
state is 'DONE', the requestor does the cleanup.
- In the case of SIGKILL or power failure, all dump files are
removed during crash recovery.
---


> 
>> I also find the idea that this should be the same SQL function
>> as pg_get_backend_memory_contexts to be a seriously bad decision.
>> That means that it's not possible to GRANT the right to examine
>> only your own process's memory --- with this proposal, that means
>> granting the right to inspect every other process as well.
>> 
>> Beyond that, the fact that there's no way to restrict the capability
>> to just, say, other processes owned by the same user means that
>> it's not really safe to GRANT to non-superusers anyway.  Even with
>> such a restriction added, things are problematic, since for example
>> it would be possible to inquire into the workings of a
>> security-definer function executing in another process that
>> nominally is owned by your user.
> 
> I'm going to change the function name and restrict the executor to
> superusers. Is it enough?

In the attached patch, I changed the function name to
pg_get_target_backend_memory_contexts() for now.


Regards,

Attachment

Re: Get memory contexts of an arbitrary backend process

From
Kasahara Tatsuhito
Date:
Hi,

On Thu, Dec 10, 2020 at 10:48 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2020-12-04 19:16, torikoshia wrote:
> > On 2020-12-03 10:36, Tom Lane wrote:
> >> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> >>> I'm starting to study how this feature behaves. At first, when I
> >>> executed
> >>> the following query, the function never returned. ISTM that since
> >>> the autovacuum launcher cannot respond to the request of memory
> >>> contexts dump, the function keeps waiting infinity. Is this a bug?
> >>> Probably we should exclude non-backend proceses from the target
> >>> processes to dump? Sorry if this was already discussed.
> >>
> >>>      SELECT pg_get_backend_memory_contexts(pid) FROM
> >>> pg_stat_activity;
> >
> > Thanks for trying it!
> >
> > It was not discussed explicitly, and I was going to do it later
> > as commented.
> >
> >> +            /* TODO: Check also whether backend or not. */
> >
> >>
> >> FWIW, I think this patch is fundamentally unsafe.  It's got a
> >> lot of the same problems that I complained about w.r.t. the
> >> nearby proposal to allow cross-backend stack trace dumping.
> >> It does avoid the trap of thinking that it can do work in
> >> a signal handler, but instead it supposes that it can do
> >> work involving very high-level objects such as shared hash tables
> >> in anyplace that might execute CHECK_FOR_INTERRUPTS.  That's
> >> never going to be safe: the only real expectation the system
> >> has is that CHECK_FOR_INTERRUPTS is called at places where our
> >> state is sane enough that a transaction abort can clean up.
> >> Trying to do things like taking LWLocks is going to lead to
> >> deadlocks or worse.  We need not even get into the hard questions,
> >> such as what happens when one process or the other exits
> >> unexpectedly.
> >
> > Thanks for reviewing!
> >
> > I may misunderstand something, but the dumper works not at
> > CHECK_FOR_INTERRUPTS but during the client read, i.e.,
> > ProcessClientReadInterrupt().
> >
> > Is it also unsafe?
> >
> >
> > BTW, since there was a comment that the shared hash table
> > used too much memory, I'm now rewriting this patch not to use
> > the shared hash table but a simpler static shared memory struct.
>
> Attached a rewritten patch.
Thanks for updating the patch.

But when I applied the patch to the current HEAD and ran make, I
got an error due to duplicate OIDs.
You need to rebase the patch.

> Accordingly, I also slightly modified the basic design as below.
>
> ---
> # Communication flow between the dumper and the requester
> - (1) When requesting memory context dumping, the dumper changes
> the struct on the shared memory state from 'ACCEPTABLE' to
> 'REQUESTING'.
> - (2) The dumper sends the signal to the dumper process and wait on
> the latch.
> - (3) When the dumper noticed the signal, it changes the state to
> 'DUMPING'.
> - (4) When the dumper completes dumping, it changes the state to
> 'DONE' and set the latch.
> - (5) The requestor reads the dump file and shows it to the user.
> Finally, the requestor removes the dump file and reset the shared
> memory state to 'ACCEPTABLE'.
>
> # Query cancellation
> - When the requestor cancels dumping, e.g. signaling using ctrl-C,
> the requestor changes the state of the shared memory to 'CANCELING'.
> - The dumper checks the state when it tries to change the state to
> 'DONE' at (4), and if the state is 'CANCELING', it initializes the
> dump file and reset the shared memory state to 'ACCEPTABLE'.
>
> # Cleanup dump file and the shared memory
> - In the normal case, the dumper removes the dump file and resets
> the shared memory entry as described in (5).
> - When something like query cancellation or process termination
> happens on the dumper after (1) and before (3), in other words,
> the state is 'REQUESTING', the requestor does the cleanup.
> - When something happens on the dumper or the requestor after (3)
> and before (4), in other words, the state is 'DUMPING', the dumper
> does the cleanup. Specifically, if the requestor cancels the query,
> it just changes the state to 'CANCELING' and the dumper notices it
> and cleans up things later.
> OTOH, when the dumper fails to dump, it cleans up the dump file and
> reset the shared memory state.
> - When something happens on the requestor after (4), i.e., the state
> is 'DONE', the requestor does the cleanup.
> - In the case of receiving SIGKILL or power failure, all dump files
> are removed in the crash recovery process.
> ---
If the dumper is terminated before it dumps, the requestor will appear
to enter an infinite loop because the status of mcxtdumpShmem will not
change.
The following are the steps to reproduce.

 - session1
   BEGIN; LOCK TABLE t;
   - session2
     SELECT * FROM t; -- wait
     - session3
       select pg_get_target_backend_memory_contexts(<pid of session2>); -- wait
 - session1
   select pg_terminate_backend(<pid of session2>); -- kill session2
     - session3 waits forever.

Therefore, you may need to set mcxtdumpShmem->dump_status to
MCXTDUMPSTATUS_CANCELING or some other status before the dumper
terminates.

Also, although I have not been able to reproduce it, I believe that
with the current design, if the requestor disappears right after the
dumper dumps the memory information, the dump file will remain.
Since the current design appears to allow only one requestor per
instance, it might be a good idea for the requestor to delete any
remaining dump files when it requests a dump.

The following are comments on the code.

+   proc = BackendPidGetProc(dst_pid);
+
+   if (proc == NULL)
+   {
+       ereport(WARNING,
+               (errmsg("PID %d is not a PostgreSQL server process", dst_pid)));
+
+       return (Datum) 1;
+   }
For now, the background writer, checkpointer, and WAL writer are
auxiliary processes.
Therefore, if we specify the PID of one of these processes to
pg_get_target_backend_memory_contexts(),
"PID xxxx is not a PostgreSQL server process" would be output.
This confuses the user.
How about using AuxiliaryPidGetProc() to detect these processes?

+               ereport(INFO,
+                   (errmsg("The request has failed and now PID %d is
requsting dumping.",
+                       mcxtdumpShmem->src_pid)));
+
+               LWLockRelease(McxtDumpLock);
You can release LWLock before ereport.

+   Assert(mcxtdumpShmem->dump_status = MCXTDUMPSTATUS_REQUESTING);
typo?
It might be "mcxtdumpShmem->dump_status == MCXTDUMPSTATUS_REQUESTING".

+           ereport(INFO,
+                   (errmsg("PID %d is looked up by another process", pid)));
This message is not accurate: in the current design, only one session
can request a dump, so if there are multiple requests, this message
will be output even when the request is for a PID that is not being
looked up.

+   if (fpout == NULL)
+   {
+       ereport(LOG,
+               (errcode_for_file_access(),
+                errmsg("could not write memory context file \"%s\": %m",
+                       dumpfile)));
It would be better to use "dump file" instead of "memory context file".

+static void
+McxtReqKill(int code, Datum arg)
The function McxtReqKill looks like it should have a different name,
since it doesn't do any kill.
How about McxtReqCleanup or something similar?

Best regards,

> >> I also find the idea that this should be the same SQL function
> >> as pg_get_backend_memory_contexts to be a seriously bad decision.
> >> That means that it's not possible to GRANT the right to examine
> >> only your own process's memory --- with this proposal, that means
> >> granting the right to inspect every other process as well.
> >>
> >> Beyond that, the fact that there's no way to restrict the capability
> >> to just, say, other processes owned by the same user means that
> >> it's not really safe to GRANT to non-superusers anyway.  Even with
> >> such a restriction added, things are problematic, since for example
> >> it would be possible to inquire into the workings of a
> >> security-definer function executing in another process that
> >> nominally is owned by your user.
> >
> > I'm going to change the function name and restrict the executor to
> > superusers. Is it enough?
>
> In the attached patch, I changed the function name to
> pg_get_target_backend_memory_contexts() for now.
>
>
> Regards,



-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On Fri, Dec 25, 2020 at 6:08 PM Kasahara Tatsuhito 
<kasahara.tatsuhito@gmail.com> wrote:

Thanks for reviewing and kind suggestion!

>> Attached a rewritten patch.
> Thanks for updating patch.
> 
> But when I had applyed the patch to the current HEAD and did make, I
> got an error due to duplicate OIDs.
> You need to rebase the patch.

Assigned another OID.

>> Accordingly, I also slightly modified the basic design as below.
>> 
>> ---
>> # Communication flow between the dumper and the requestor
>> - (1) When requesting a memory context dump, the requestor changes
>> the state on the shared memory from 'ACCEPTABLE' to 'REQUESTING'.
>> - (2) The requestor sends the signal to the dumper process and waits
>> on the latch.
>> - (3) When the dumper notices the signal, it changes the state to
>> 'DUMPING'.
>> - (4) When the dumper completes dumping, it changes the state to
>> 'DONE' and sets the latch.
>> - (5) The requestor reads the dump file and shows it to the user.
>> Finally, the requestor removes the dump file and resets the shared
>> memory state to 'ACCEPTABLE'.
>> 
>> # Query cancellation
>> - When the requestor cancels dumping, e.g. by signaling with ctrl-C,
>> the requestor changes the state of the shared memory to 'CANCELING'.
>> - The dumper checks the state when it tries to change the state to
>> 'DONE' at (4), and if the state is 'CANCELING', it initializes the
>> dump file and resets the shared memory state to 'ACCEPTABLE'.
>> 
>> # Cleanup dump file and the shared memory
>> - In the normal case, the dumper removes the dump file and resets
>> the shared memory entry as described in (5).
>> - When something like query cancellation or process termination
>> happens on the dumper after (1) and before (3), in other words,
>> the state is 'REQUESTING', the requestor does the cleanup.
>> - When something happens on the dumper or the requestor after (3)
>> and before (4), in other words, the state is 'DUMPING', the dumper
>> does the cleanup. Specifically, if the requestor cancels the query,
>> it just changes the state to 'CANCELING' and the dumper notices it
>> and cleans up things later.
>> OTOH, when the dumper fails to dump, it cleans up the dump file and
>> resets the shared memory state.
>> - When something happens on the requestor after (4), i.e., the state
>> is 'DONE', the requestor does the cleanup.
>> - In the case of receiving SIGKILL or power failure, all dump files
>> are removed in the crash recovery process.
>> ---
> If the dumper is terminated before it dumps, the requestor will appear
> to enter an infinite loop because the status of mcxtdumpShmem will not
> change.
> The following are the steps to reproduce.
> 
>  - session1
>    BEGIN; LOCK TABLE t;
>    - session2
>      SELECT * FROM t; -- wait
>      - session3
>        select pg_get_target_backend_memory_contexts(<pid of session2>); -- wait
>  - session1
>    select pg_terminate_backend(<pid of session2>); -- kill session2
>      - session3 waits forever.
> 
> Therefore, you may need to set mcxtdumpShmem->dump_status to
> MCXTDUMPSTATUS_CANCELING
> or other status before the dumper terminates.

In this case, it may be difficult for the dumper to change dump_status
because it is waiting on a latch and dump_memory_contexts() has not been
called yet.

Instead, it's possible for the requestor to periodically check whether
the dumper process still exists while waiting.
I added this logic to the attached patch.


> Also, although I have not been able to reproduce it, I believe that
> with the current design,
> if the requestor disappears right after the dumper dumps the memory 
> information,
> the dump file will remain.
> Since the current design appears to allow only one requestor per
> instance, when the requestor
> requests a dump, it might be a good idea to delete any remaining dump
> files, if any.

Although I'm not sure in which cases a dump file can remain, deleting
any leftover dump files seems good for safety.
I also added this idea to the attached patch.


> The following are comments on the code.
> 
> +   proc = BackendPidGetProc(dst_pid);
> +
> +   if (proc == NULL)
> +   {
> +       ereport(WARNING,
> +               (errmsg("PID %d is not a PostgreSQL server process", 
> dst_pid)));
> +
> +       return (Datum) 1;
> +   }
> For now, the background writer, checkpointer and WAL writer belong to
> the auxiliary processes.
> Therefore, if we specify the PIDs of these processes for
> pg_get_target_backend_memory_contexts(),
> "PID xxxx is not a PostgreSQL server process" would be output.
> This confuses the user.
> How about using AuxiliaryPidGetProc() to determine these processes?

Thanks and I modified the patch to output the below message when it's an
auxiliary process.

| PID %d is not a PostgreSQL backend process but an auxiliary process.

> 
> +               ereport(INFO,
> +                   (errmsg("The request has failed and now PID %d is
> requsting dumping.",
> +                       mcxtdumpShmem->src_pid)));
> +
> +               LWLockRelease(McxtDumpLock);
> You can release LWLock before ereport.

Modified to release the lock before ereport.

> +   Assert(mcxtdumpShmem->dump_status = MCXTDUMPSTATUS_REQUESTING);
> typo?
> It might be "mcxtdumpShmem->dump_status == MCXTDUMPSTATUS_REQUESTING".

Oops, it's a serious typo. Fixed it.

> +           ereport(INFO,
> +                   (errmsg("PID %d is looked up by another process", 
> pid)));
> This message is not accurate, because in the current design only one
> session can request a dump, so if there are multiple requests, this
> message will be output even when the request is for a PID that is not
> being looked up.

Exactly. Modified the message as below.

| Only one session can dump at a time and another session is dumping 
currently.

> +   if (fpout == NULL)
> +   {
> +       ereport(LOG,
> +               (errcode_for_file_access(),
> +                errmsg("could not write memory context file \"%s\": 
> %m",
> +                       dumpfile)));
> It would be better to use "dump file" instead of "memory context file".
> 
> +static void
> +McxtReqKill(int code, Datum arg)
> The function McxtReqKill looks like it should have a different name
> (since it doesn't do any kill).
> How about McxtReqCleanup or something similar?

Thanks for your idea and I changed the name to McxtReqCleanup.


Regards,
Attachment

Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
v7 that fixes recent conflicts.

It also changed the behavior of requestor when another requestor is
already working for simplicity.
In this case, v6 patch makes the requestor wait. v7 patch makes the
requestor quit.


Regards,

--
Atsushi Torikoshi
Attachment

Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
Since pg_get_target_backend_memory_contexts() waits for the dump, it
could lead to a deadlock as below.

   - session1
   BEGIN; TRUNCATE t;

   - session2
   BEGIN; TRUNCATE t; -- wait

   - session1
   SELECT * FROM pg_get_target_backend_memory_contexts(<pid of session 2>); --wait


Thanks for notifying me, Fujii-san.


Attached v8 patch, which prohibits calling the function inside a
transaction block.


Regards,

--
Atsushi Torikoshi
Attachment

Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/03/17 22:24, torikoshia wrote:
> I remade the patch and introduced a function
> pg_print_backend_memory_contexts(PID) which prints the memory contexts of
> the specified PID to elog.

Thanks for the patch!


>    =# SELECT pg_print_backend_memory_contexts(450855);
> 
>    ** log output **
>    2021-03-17 15:21:01.942 JST [450855] LOG:  Printing memory contexts of PID 450855
>    2021-03-17 15:21:01.942 JST [450855] LOG:  level: 0 TopMemoryContext: 68720 total in 5 blocks; 16312 free (15 chunks); 52408 used
>    2021-03-17 15:21:01.942 JST [450855] LOG:  level: 1 Prepared Queries: 65536 total in 4 blocks; 35088 free (14 chunks); 30448 used
>    2021-03-17 15:21:01.942 JST [450855] LOG:  level: 1 pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used
>    ..(snip)..
>    2021-03-17 15:21:01.942 JST [450855] LOG:  level: 2 CachedPlanSource: 4096 total in 3 blocks; 680 free (0 chunks); 3416 used: PREPARE hoge_200 AS SELECT * FROM pgbench_accounts WHERE aid = 1111111111111111111111111111111111111...
>    2021-03-17 15:21:01.942 JST [450855] LOG:  level: 3 CachedPlanQuery: 4096 total in 3 blocks; 464 free (0 chunks); 3632 used
>    ..(snip)..
>    2021-03-17 15:21:01.945 JST [450855] LOG:  level: 1 Timezones: 104128 total in 2 blocks; 2584 free (0 chunks); 101544 used
>    2021-03-17 15:21:01.945 JST [450855] LOG:  level: 1 ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
>    2021-03-17 15:21:01.945 JST [450855] LOG:  Grand total: 2802080 bytes in 1399 blocks; 480568 free (178 chunks); 2321512 used
> 
> 
> As above, the output is almost the same as MemoryContextStatsPrint()
> except for the way of expression of the level.
> MemoryContextStatsPrint() uses indents, but
> pg_print_backend_memory_contexts() writes it as "level: %d".

This format looks better to me.


> Since there was discussion that enlarging a StringInfo may cause
> errors on OOM[1], this patch calls elog for each context.
> 
> As with MemoryContextStatsPrint(), each context shows 100
> children at most.
> I once thought it should be configurable, but something like
> pg_print_backend_memory_contexts(PID, num_children) needs to send
> the 'num_children' from requestor to dumper and it seems to require
> another infrastructure.
> Creating a new GUC for this seems overkill.
> If MemoryContextStatsPrint(), i.e. showing 100 children at most is
> enough, this hard limit may be acceptable.

Can't this number be passed via shared memory?


> Only superusers can call pg_print_backend_memory_contexts().

+    /* Only allow superusers to signal superuser-owned backends. */
+    if (superuser_arg(proc->roleId) && !superuser())

The patch seems to allow even non-superuser to request to print the memory
contexts if the target backend is owned by non-superuser. Is this intentional?
I think that only superuser should be allowed to execute
pg_print_backend_memory_contexts() whoever owns the target backend.
Because that function can cause lots of log messages.


> I'm going to add documentation and regression tests.

Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2021-03-18 15:09, Fujii Masao wrote:

Thanks for your comments!

> On 2021/03/17 22:24, torikoshia wrote:
>> I remade the patch and introduced a function
>> pg_print_backend_memory_contexts(PID) which prints the memory contexts 
>> of
>> the specified PID to elog.
> 
> Thanks for the patch!
> 
> 
>>    =# SELECT pg_print_backend_memory_contexts(450855);
>> 
>>    ** log output **
>>    2021-03-17 15:21:01.942 JST [450855] LOG:  Printing memory contexts 
>> of PID 450855
>>    2021-03-17 15:21:01.942 JST [450855] LOG:  level: 0 
>> TopMemoryContext: 68720 total in 5 blocks; 16312 free (15 chunks); 
>> 52408 used
>>    2021-03-17 15:21:01.942 JST [450855] LOG:  level: 1 Prepared 
>> Queries: 65536 total in 4 blocks; 35088 free (14 chunks); 30448 used
>>    2021-03-17 15:21:01.942 JST [450855] LOG:  level: 1 pgstat 
>> TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 
>> chunks); 6784 used
>>    ..(snip)..
>>    2021-03-17 15:21:01.942 JST [450855] LOG:  level: 2 
>> CachedPlanSource: 4096 total in 3 blocks; 680 free (0 chunks); 3416 
>> used: PREPARE hoge_200 AS SELECT * FROM pgbench_accounts WHERE aid = 
>> 1111111111111111111111111111111111111...
>>    2021-03-17 15:21:01.942 JST [450855] LOG:  level: 3 
>> CachedPlanQuery: 4096 total in 3 blocks; 464 free (0 chunks); 3632 
>> used
>>    ..(snip)..
>>    2021-03-17 15:21:01.945 JST [450855] LOG:  level: 1 Timezones: 
>> 104128 total in 2 blocks; 2584 free (0 chunks); 101544 used
>>    2021-03-17 15:21:01.945 JST [450855] LOG:  level: 1 ErrorContext: 
>> 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
>>    2021-03-17 15:21:01.945 JST [450855] LOG:  Grand total: 2802080 
>> bytes in 1399 blocks; 480568 free (178 chunks); 2321512 used
>> 
>> 
>> As above, the output is almost the same as MemoryContextStatsPrint()
>> except for the way of expression of the level.
>> MemoryContextStatsPrint() uses indents, but
>> pg_print_backend_memory_contexts() writes it as "level: %d".
> 
> This format looks better to me.
> 
> 
>> Since there was discussion that enlarging a StringInfo may cause
>> errors on OOM[1], this patch calls elog for each context.
>> 
>> As with MemoryContextStatsPrint(), each context shows 100
>> children at most.
>> I once thought it should be configurable, but something like
>> pg_print_backend_memory_contexts(PID, num_children) needs to send
>> the 'num_children' from requestor to dumper and it seems to require
>> another infrastructure.
>> Creating a new GUC for this seems overkill.
>> If MemoryContextStatsPrint(), i.e. showing 100 children at most is
>> enough, this hard limit may be acceptable.
> 
> Can't this number be passed via shared memory?

The attached patch uses static shared memory to pass the number.

As documented, the current implementation allows that when multiple
pg_print_backend_memory_contexts() calls are made in succession or
simultaneously, max_children can be that of another
pg_print_backend_memory_contexts() call.

I had tried to avoid this by adding some state information and using
before_shmem_exit() in case of process termination for cleaning up the
state information as in the patch I presented earlier, but since kill()
returns success before the dumper called signal handler, it seemed
there were times when we couldn't clean up the state.

Since this happens only when multiple pg_print_backend_memory_contexts()
are called and their specified number of children are different, and the
effect is just the not intended number of children to print, it might be
acceptable.

Or it might be better to wait for some seconds if num_children on shared
memory is not the initialized value (meaning some other process is
requesting to print memory contexts).

>> Only superusers can call pg_print_backend_memory_contexts().
> 
> +    /* Only allow superusers to signal superuser-owned backends. */
> +    if (superuser_arg(proc->roleId) && !superuser())
> 
> The patch seems to allow even non-superuser to request to print the 
> memory
> contexts if the target backend is owned by non-superuser. Is this 
> intentional?
> I think that only superuser should be allowed to execute
> pg_print_backend_memory_contexts() whoever owns the target backend.
> Because that function can cause lots of log messages.

Thanks, it's not intentional, modified it.

>> I'm going to add documentation and regression tests.

Added them.

Regards,
Attachment

Re: Get memory contexts of an arbitrary backend process

From
Kyotaro Horiguchi
Date:
At Mon, 22 Mar 2021 15:09:58 +0900, torikoshia <torikoshia@oss.nttdata.com> wrote in 
> >> If MemoryContextStatsPrint(), i.e. showing 100 children at most is
> >> enough, this hard limit may be acceptable.
> > Can't this number be passed via shared memory?
> 
> The attached patch uses static shared memory to pass the number.

"pg_print_backend_memory_contexts"

That name looks as if it returns the result as text when used on the
command line. We could have pg_get_backend_memory_context(bool
dump_to_log (or where to dump), int limit).  Or couldn't we name it
differently even in the case we add a separate function?


+/*
+ * MaxChildrenPerContext
+ *         Max number of children to print per one parent context.
+ */
+int     *MaxChildrenPerContext = NULL;

Perhaps it'd be better to have a struct even if it consists only of
one member.  (Aligned) C-int values are atomic so we can omit the
McxtPrintLock. (I don't think it's a problem even if it is modifed
while reading^^:)


+    if(max_children <= 0)
+    {
+        ereport(WARNING,
+                (errmsg("%d is invalid value", max_children),
+                 errhint("second parameter is the number of context and it must be set to a value greater than or equal to 1")));

It's annoying to choose a number large enough when I want to dump
children unlimitedly.  Couldn't we use 0 to specify "unlimited"?

+                (errmsg("%d is invalid value", max_children),
+                 errhint("second parameter is the number of context and it must be set to a value greater than or equal to 1")));

For the main message, (I think) we usually spell the "%d is invalid
value" as "maximum number of children must be positive" or such.  For
the hint, we don't need a copy of the primary section of the
documentation here.

I think we should ERROR out for invalid parameters, at least for
max_children.  I'm not sure about pid since we might call it based on
pg_stat_activity..


+    if(!SendProcSignal(pid, PROCSIG_PRINT_MEMORY_CONTEXT, InvalidBackendId))

We know the backendid of the process here.


+            if (is_dst_stderr)
+            {
+                for (i = 0; i <= level; i++)
+                    fprintf(stderr, "  ");

The fprintf path is used nowhere in the patch at all. It can be used
while attaching a debugger but I'm not sure we need that code.  The
footprint of this patch is largely shrunk by removing it.


+        strcat(truncated_ident, delimiter);

strcpy is sufficient here.  And we don't need the delimiter to be a
variable.  (We can copy a string literal into truncated_ident, then
count the length of truncated_ident, instead of using the delimiter
variable.)


+        $current_logfiles = slurp_file($node->data_dir . '/current_logfiles');
...
+my $lfname = $current_logfiles;
+$lfname =~ s/^stderr //;
+chomp $lfname;

$node->logfile is the current log file name.

+    'target PID is not PostgreSQL server process');

Maybe "check if PID check is working" or such?  And, we can do
something like the following to exercise in a more practical way.

 select pg_print_backend...(pid,) from pg_stat_activity where backend_type = 'checkpointer';

> As documented, the current implementation allows that when multiple
> pg_print_backend_memory_contexts() called in succession or
> simultaneously, max_children can be the one of another
> pg_print_backend_memory_contexts().
> I had tried to avoid this by adding some state information and using
> before_shmem_exit() in case of process termination for cleaning up the
> state information as in the patch I presented earlier, but since
> kill()
> returns success before the dumper called signal handler, it seemed
> there were times when we couldn't clean up the state.
> Since this happens only when multiple
> pg_print_backend_memory_contexts()
> are called and their specified number of children are different, and
> the
> effect is just the not intended number of children to print, it might
> be
> acceptable.

I see it as a non-issue. Even though the behavior looks somewhat
strange, that usage is stranger than the behavior.

> Or it might be better to wait for some seconds if num_children on
> shared memory is not the initialized value (meaning some other process
> is requesting to print memory contexts).
> 
> >> Only superusers can call pg_print_backend_memory_contexts().
> > +    /* Only allow superusers to signal superuser-owned backends. */
> > +    if (superuser_arg(proc->roleId) && !superuser())
> > The patch seems to allow even non-superuser to request to print the
> > memory
> > contexts if the target backend is owned by non-superuser. Is this
> > intentional?
> > I think that only superuser should be allowed to execute
> > pg_print_backend_memory_contexts() whoever owns the target backend.
> > Because that function can cause lots of log messages.
> 
> Thanks, it's not intentional, modified it.

By the way we can send a procsig to processes other than a client
backend. And most of the postgres processes use the standard signal
handler and initialize the procsig facility. So some of those kinds of
processes can dump the memory context information as-is. Otherwise we
can add CHECK_FOR_INTERRUPTS() at appropriate places to allow that.
I'm not sure how useful it is for other kinds of processes, but it
might be useful for autovacuum workers.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2021-03-23 17:24, Kyotaro Horiguchi wrote:

Thanks for reviewing and suggestions!

> At Mon, 22 Mar 2021 15:09:58 +0900, torikoshia
> <torikoshia@oss.nttdata.com> wrote in
>> >> If MemoryContextStatsPrint(), i.e. showing 100 children at most is
>> >> enough, this hard limit may be acceptable.
>> > Can't this number be passed via shared memory?
>> 
>> The attached patch uses static shared memory to pass the number.
> 
> "pg_print_backend_memory_contexts"
> 
> That name looks like as if it returns the result as text when used on
> command-line. We could have pg_get_backend_memory_context(bool
> dump_to_log (or where to dump), int limit).  Or couldn't we name it
> differently even in the ase we add a separate function?

Redefined pg_get_backend_memory_contexts() as
pg_get_backend_memory_contexts(int pid, int max_children).

When pid equals 0, pg_get_backend_memory_contexts() prints local memory
contexts as the original pg_get_backend_memory_contexts() does.
In this case, 'max_children' is ignored.

When 'pid' is not 0 and it is the PID of a client backend,
memory contexts are logged through elog().

> 
> +/*
> + * MaxChildrenPerContext
> + *           Max number of children to print per one parent context.
> + */
> +int  *MaxChildrenPerContext = NULL;
> 
> Perhaps it'd be better to have a struct even if it consists only of
> one member.  (Aligned) C-int values are atomic so we can omit the
> McxtPrintLock. (I don't think it's a problem even if it is modifed
> while reading^^:)

Fixed them.

> +     if(max_children <= 0)
> +     {
> +             ereport(WARNING,
> +                             (errmsg("%d is invalid value", max_children),
> +                              errhint("second parameter is the number of context and it must be set to a value greater than or equal to 1")));
> 
> It's annoying to choose a number large enough when I want to dump
> children unlimitedly.  Couldn't we use 0 to specify "unlimited"?

Modified as you suggested.

> +                             (errmsg("%d is invalid value", max_children),
> +                              errhint("second parameter is the number of context and it must be set to a value greater than or equal to 1")));
> 
> For the main message, (I think) we usually spell the "%d is invalid
> value" as "maximum number of children must be positive" or such.  For
> the hint, we don't need a copy of the primary section of the
> documentation here.

Modified it to "The maximum number of children must be greater than 0".

> 
> I think we should ERROR out for invalid parameters, at least for
> max_children.  I'm not sure about pid since we might call it based on
> pg_stat_activity..

Changed to ERROR out when the 'max_children' is less than 0.

Regarding pid, I left it untouched for consistency with other
signal-sending functions such as pg_cancel_backend().

> +     if(!SendProcSignal(pid, PROCSIG_PRINT_MEMORY_CONTEXT, 
> InvalidBackendId))
> 
> We know the backendid of the process here.

Added it.

> 
> +                     if (is_dst_stderr)
> +                     {
> +                             for (i = 0; i <= level; i++)
> +                                     fprintf(stderr, "  ");
> 
> The fprintf path is used nowhere in the patch at all. It can be used
> while attaching debugger but I'm not sure we need that code.  The
> footprint of this patch is largely shrinked by removing it.

According to the past discussion[1], people wanted MemoryContextStats
as it was, so I think it's better that MemoryContextStats can be used
as before.

> 
> +             strcat(truncated_ident, delimiter);
> 
> strcpy is sufficient here.  And we don't need the delimiter to be a
> variable.  (we can copy a string literal into truncate_ident, then
> count the length of truncate_ident, instead of the delimiter
> variable.)

True.

> +             $current_logfiles = slurp_file($node->data_dir . 
> '/current_logfiles');
> ...
> +my $lfname = $current_logfiles;
> +$lfname =~ s/^stderr //;
> +chomp $lfname;
> 
> $node->logfile is the current log file name.
> 
> +     'target PID is not PostgreSQL server process');
> 
> Maybe "check if PID check is working" or such?  And, we can do
> something like the following to exercise in a more practical way.
> 
>  select pg_print_backend...(pid,) from pg_stat_activity where
> backend_type = 'checkpointer';

It seems better.

>> As documented, the current implementation allows that when multiple
>> pg_print_backend_memory_contexts() called in succession or
>> simultaneously, max_children can be the one of another
>> pg_print_backend_memory_contexts().
>> I had tried to avoid this by adding some state information and using
>> before_shmem_exit() in case of process termination for cleaning up the
>> state information as in the patch I presented earlier, but since
>> kill()
>> returns success before the dumper called signal handler, it seemed
>> there were times when we couldn't clean up the state.
>> Since this happens only when multiple
>> pg_print_backend_memory_contexts()
>> are called and their specified number of children are different, and
>> the
>> effect is just the not intended number of children to print, it might
>> be
>> acceptable.
> 
> I see it as a non-issue. Even though the behavior looks somewhat
> strange, that usage is stranger than the behavior.

Thanks for your comments!

>> Or it might be better to wait for some seconds if num_chilren on
>> shared
>> memory is not the initialized value(meaning some other process is
>> requesting to print memory contexts).
>> 
>> >> Only superusers can call pg_print_backend_memory_contexts().
>> > +  /* Only allow superusers to signal superuser-owned backends. */
>> > +  if (superuser_arg(proc->roleId) && !superuser())
>> > The patch seems to allow even non-superuser to request to print the
>> > memory
>> > contexts if the target backend is owned by non-superuser. Is this
>> > intentional?
>> > I think that only superuser should be allowed to execute
>> > pg_print_backend_memory_contexts() whoever owns the target backend.
>> > Because that function can cause lots of log messages.
>> 
>> Thanks, it's not intentional, modified it.
> 
> By the way we can send a procsig to other than a client backend. And
> most of the postgres processes are using the standard signal handler
> and intializes the procsig facility. So some of such kind of processes
> can dump the memory context information as-is. Otherwise we can add
> CHECK_FOR_INTERRUPT to appropriate place to allow that.  I'm not sure
> how it is useful for other kind of processes, but it might be useful
> for autovacuum workers.

Yeah, I also think it's possible to get the memory contexts of
processes other than client backends.
But I'm not sure whether people want to do it...


[1] https://www.postgresql.org/message-id/906.1513707472%40sss.pgh.pa.us

regards.
Attachment

Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/03/25 0:17, torikoshia wrote:
> On 2021-03-23 17:24, Kyotaro Horiguchi wrote:
> 
> Thanks for reviewing and suggestions!

The patched version failed to compile as follows. Could you fix this issue?

mcxtfuncs.c:22:10: fatal error: utils/mcxtfuncs.h: No such file or directory
  #include "utils/mcxtfuncs.h"
           ^~~~~~~~~~~~~~~~~~~
compilation terminated.
make[4]: *** [<builtin>: mcxtfuncs.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../../../src/backend/common.mk:39: adt-recursive] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [common.mk:39: utils-recursive] Error 2
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2

https://cirrus-ci.com/task/4621477321375744

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2021-03-25 22:02, Fujii Masao wrote:
> On 2021/03/25 0:17, torikoshia wrote:
>> On 2021-03-23 17:24, Kyotaro Horiguchi wrote:
>> 
>> Thanks for reviewing and suggestions!
> 
> The patched version failed to compile as follows. Could you fix
> this issue?

Sorry, it included a header file that isn't contained in
the current version of the patch.

Attached new one.

> mcxtfuncs.c:22:10: fatal error: utils/mcxtfuncs.h: No such file or 
> directory
>  #include "utils/mcxtfuncs.h"
>           ^~~~~~~~~~~~~~~~~~~
> compilation terminated.
> make[4]: *** [<builtin>: mcxtfuncs.o] Error 1
> make[4]: *** Waiting for unfinished jobs....
> make[3]: *** [../../../src/backend/common.mk:39: adt-recursive] Error 2
> make[3]: *** Waiting for unfinished jobs....
> make[2]: *** [common.mk:39: utils-recursive] Error 2
> make[1]: *** [Makefile:42: all-backend-recurse] Error 2
> make: *** [GNUmakefile:11: all-src-recurse] Error 2
> 
> https://cirrus-ci.com/task/4621477321375744
> 
> Regards,

Attachment

Re: Get memory contexts of an arbitrary backend process

From
Kyotaro Horiguchi
Date:
At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia <torikoshia@oss.nttdata.com> wrote in 
> Attached new one.

Thanks!

-    SELECT * FROM pg_get_backend_memory_contexts();
+    SELECT * FROM pg_get_backend_memory_contexts(0, 0);

We can freely change the signature since it's new in 14, but I
see the (void) signature as useful.  I vaguely thought of defining
pg_get_backend_memory_contexts(void) in pg_proc.dat and then defining
(int, int) as a different function in system_views.sql.  That allows
the function of the second signature to output nothing. You can make a
distinction between the two signatures by using PG_NARGS() in the C
function.

That being said, it's somewhat uneasy that two functions with the same
name return different values.  I'd like to hear what others feel
about the behavior.


+# print memory context of specified backend
+

This looks like leftover garbage.


+    if( pid == 0)

Odd spacing:p

+note "current_logfiles = $current_logfiles";
+
+like(
+    $current_logfiles,
+    qr|^stderr log/postgresql-.*log$|,
+    'current_logfiles is sane');
+
+my $lfname = $current_logfiles;
+$lfname =~ s/^stderr //;
+chomp $lfname;
+
+my $logfile;
+my $print_count;
+
+# Verify that the backtraces of the processes are logged into logfile.
+for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
+{
+    $logfile = $node->data_dir . '/' . $lfname;

It seems that you forgot to include the change on this part in v4.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/03/26 12:01, Kyotaro Horiguchi wrote:
> At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia <torikoshia@oss.nttdata.com> wrote in
>> Attached new one.

Regarding the argument max_children, isn't it better to set its default value,
e.g., 100 like MemoryContextStats() uses?

+        ereport(WARNING,
+                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                 errmsg("must be a superuser to log memory contexts")));

IMO this type of error, i.e., permission error, should be treated as ERROR
rather than WARNING, like pg_terminate_backend() does.

+        ereport(WARNING,
+                (errmsg("failed to send signal: %m")));

Per message style rule, "could not send signal to process %d: %m" is better?

> Thanks!
> 
> -    SELECT * FROM pg_get_backend_memory_contexts();
> +    SELECT * FROM pg_get_backend_memory_contexts(0, 0);
> 
> However we can freely change the signature since it's new in 14, but I
> see the (void) signature as useful.  I vaguely thought of defining
> pg_get_backend_memory_contexts(void) in pg_proc.dat then defining
> (int, int) as a different function in system_views.sql.  That allows
> the function of the second signature to output nothing. You can make a
> distinction between the two signatures by using PG_NARGS() in the C
> function.
> 
> That being said, it's somewhat uneasy that two functions with the same
> name returns different values.  I'd like to hear others what they feel
> like about the behavior.

I think that it's confusing to merge two different features into one function.
Isn't it better to leave pg_get_backend_memory_contexts() as it is, and
make the log-memory-contexts feature as separate function, e.g.,
pg_log_backend_memory_contexts()?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/03/26 12:26, Fujii Masao wrote:
> 
> 
> On 2021/03/26 12:01, Kyotaro Horiguchi wrote:
>> At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia <torikoshia@oss.nttdata.com> wrote in
>>> Attached new one.
> 
> Regarding the argument max_children, isn't it better to set its default value,
> e.g., 100 like MemoryContextStats() uses?

+    mcxtLogData->maxChildrenPerContext = max_children;
+
+    if(!SendProcSignal(pid, PROCSIG_LOG_MEMORY_CONTEXT, proc->backendId))
+        PG_RETURN_BOOL(true);

Do we need memory barrier here? Otherwise, there is a race condition
where maxChildrenPerContext has not been set yet when the target backend
that received signal read that shared variable. No?

+        Note that when multiple
+        <function>pg_get_backend_memory_contexts</function> called in
+        succession or simultaneously, <parameter>max_children</parameter> can
+        be the one of another
+        <function>pg_get_backend_memory_contexts</function>.
+       </para></entry>

This might not be an issue in practice as Horiguchi-san said upthread.
But this looks a bit ugly and might be a footgun later. The current approach
using shared memory to pass this information to the target backends
might be overkill, and too complicated to polish the design at the stage
just before feature freeze. So I'd withdraw my previous comment and
am OK to use the hard-coded value as max_children at the first version
of this feature. Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Get memory contexts of an arbitrary backend process

From
Kyotaro Horiguchi
Date:
At Fri, 26 Mar 2021 12:26:31 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2021/03/26 12:01, Kyotaro Horiguchi wrote:
> > At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia
> > <torikoshia@oss.nttdata.com> wrote in
> >> Attached new one.
> 
> Regarding the argument max_children, isn't it better to set its
> default value,
> e.g., 100 like MemoryContextStats() uses?
> +        ereport(WARNING,
> +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> + errmsg("must be a superuser to log memory contexts")));
> 
> IMO this type of error, i.e., permission error, should be treated as
> ERROR
> rather than WARNING, like pg_terminate_backend() does.
> +        ereport(WARNING,
> +                (errmsg("failed to send signal: %m")));
> 
> Per message style rule, "could not send signal to process %d: %m" is
> better?

+1 x 3 for the above.

> > Thanks!
> > -    SELECT * FROM pg_get_backend_memory_contexts();
> > +    SELECT * FROM pg_get_backend_memory_contexts(0, 0);
> > However we can freely change the signature since it's new in 14, but I
> > see the (void) signature as useful.  I vaguely thought of defining
> > pg_get_backend_memory_contexts(void) in pg_proc.dat then defining
> > (int, int) as a different function in system_views.sql.  That allows
> > the function of the second signature to output nothing. You can make a
> > distinction between the two signatures by using PG_NARGS() in the C
> > function.
> > That being said, it's somewhat uneasy that two functions with the same
> > name returns different values.  I'd like to hear others what they feel
> > like about the behavior.
> 
> I think that it's confusing to merge two different features into one
> function.
> Isn't it better to leave pg_get_backend_memory_contexts() as it is,
> and
> make the log-memory-contexts feature as separate function, e.g.,
> pg_log_backend_memory_contexts()?

Yeah, that name looks better than pg_print_ba.. and I agree to
separate the two features.  Sorry for wagging the discussion
back-and-forth.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Get memory contexts of an arbitrary backend process

From
Kyotaro Horiguchi
Date:
At Fri, 26 Mar 2021 12:49:16 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> On 2021/03/26 12:26, Fujii Masao wrote:
> > On 2021/03/26 12:01, Kyotaro Horiguchi wrote:
> >> At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia
> >> <torikoshia@oss.nttdata.com> wrote in
> >>> Attached new one.
> > Regarding the argument max_children, isn't it better to set its
> > default value,
> > e.g., 100 like MemoryContextStats() uses?
> 
> +    mcxtLogData->maxChildrenPerContext = max_children;
> +
> +    if(!SendProcSignal(pid, PROCSIG_LOG_MEMORY_CONTEXT, proc->backendId))
> +        PG_RETURN_BOOL(true);
> 
> Do we need memory barrier here? Otherwise, there is a race condition
> where maxChildrenPerContext has not been set yet when the target
> backend
> that received signal read that shared variable. No?
> 
> +        Note that when multiple
> +        <function>pg_get_backend_memory_contexts</function> called in
> +        succession or simultaneously, <parameter>max_children</parameter> can
> +        be the one of another
> +        <function>pg_get_backend_memory_contexts</function>.
> +       </para></entry>
> 
> This might not be an issue in practice as Horiguchi-san said upthread.
> But this looks a bit ugly and might be footgun later. The current
> approach
> using shared memory to pass this information to the target backends
> might be overkill, and too complicated to polish the design at the
> stage
> just before feature freeze. So I'd withdraw my previous comment and
> am OK to use the hard-coded value as max_children at the first version
> of this feature. Thought?

The dumper function silently suppresses logging when there are too
many children.  Suppressing part of the output when the user didn't ask
for it looks like a misbehavior (even if it is written in the
documentation..), especially when the suppressed contexts occupy the
majority of memory usage.  I think the fixed limit of lines works if
the lines are in descending order of memory usage.

At least we need an additional line to inform the suppression.

"some contexts are omitted"
"n child contexts: total_bytes = ..."

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Get memory contexts of an arbitrary backend process

From
Kyotaro Horiguchi
Date:
At Fri, 26 Mar 2021 13:17:21 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Fri, 26 Mar 2021 12:49:16 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > On 2021/03/26 12:26, Fujii Masao wrote:
> > > On 2021/03/26 12:01, Kyotaro Horiguchi wrote:
> > >> At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia
> > >> <torikoshia@oss.nttdata.com> wrote in
> > >>> Attached new one.
> > > Regarding the argument max_children, isn't it better to set its
> > > default value,
> > > e.g., 100 like MemoryContextStats() uses?
> > 
> > +    mcxtLogData->maxChildrenPerContext = max_children;
> > +
> > +    if(!SendProcSignal(pid, PROCSIG_LOG_MEMORY_CONTEXT, proc->backendId))
> > +        PG_RETURN_BOOL(true);
> > 
> > Do we need memory barrier here? Otherwise, there is a race condition
> > where maxChildrenPerContext has not been set yet when the target
> > backend
> > that received signal read that shared variable. No?
> > 
> > +        Note that when multiple
> > +        <function>pg_get_backend_memory_contexts</function> called in
> > +        succession or simultaneously, <parameter>max_children</parameter> can
> > +        be the one of another
> > +        <function>pg_get_backend_memory_contexts</function>.
> > +       </para></entry>
> > 
> > This might not be an issue in practice as Horiguchi-san said upthread.
> > But this looks a bit ugly and might be footgun later. The current
> > approach
> > using shared memory to pass this information to the target backends
> > might be overkill, and too complicated to polish the design at the
> > stage
> > just before feature freeze. So I'd withdraw my previous comment and
> > am OK to use the hard-coded value as max_children at the first version
> > of this feature. Thought?
> 
> The dumper function silently suppresses logging when there are too
> many children.  Suppressing a part of output when the user didn't told
> anything looks like just a misbehavior (even if it is written in the
> documentation..), especially when the suppressed contexts occupy the
> majority of memory usage.  I think the fixed-limit of lines works if
> the lines are in descending order of memory usage.
> 
> At least we need an additional line to inform the suppression.


> "some contexts are omitted"
> "n child contexts: total_bytes = ..."

Sorry I missed that is already implemented.  So my opnion is I agree
with limiting with a fixed-number, and preferablly sorted in
descending order of... totalspace/nblocks?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/03/26 13:28, Kyotaro Horiguchi wrote:
>> "some contexts are omitted"
>> "n child contexts: total_bytes = ..."
> 
> Sorry I missed that is already implemented.  So my opnion is I agree
> with limiting with a fixed-number, and preferablly sorted in
> descending order of... totalspace/nblocks?

This may be an improvement, but makes us modify MemoryContextStatsInternal()
very much. I'm afraid that it's too late to do that at this stage...
What about leaving the output order as it is at the first version?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Get memory contexts of an arbitrary backend process

From
Kyotaro Horiguchi
Date:
At Fri, 26 Mar 2021 14:02:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2021/03/26 13:28, Kyotaro Horiguchi wrote:
> >> "some contexts are omitted"
> >> "n child contexts: total_bytes = ..."
> > Sorry I missed that is already implemented.  So my opnion is I agree
> > with limiting with a fixed-number, and preferablly sorted in
> > descending order of... totalspace/nblocks?
> 
> This may be an improvement, but makes us modify
> MemoryContextStatsInternal()
> very much. I'm afraid that it's too late to do that at this stage...
> What about leaving the output order as it is at the first version?

So I said "preferably":p  (with a misspelling...)
I'm fine with that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2021-03-26 14:08, Kyotaro Horiguchi wrote:
> At Fri, 26 Mar 2021 14:02:49 +0900, Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote in
>> 
>> 
>> On 2021/03/26 13:28, Kyotaro Horiguchi wrote:
>> >> "some contexts are omitted"
>> >> "n child contexts: total_bytes = ..."
>> > Sorry I missed that is already implemented.  So my opnion is I agree
>> > with limiting with a fixed-number, and preferablly sorted in
>> > descending order of... totalspace/nblocks?
>> 
>> This may be an improvement, but makes us modify
>> MemoryContextStatsInternal()
>> very much. I'm afraid that it's too late to do that at this stage...
>> What about leaving the output order as it is at the first version?
> 
> So I said "preferably":p  (with a misspelling...)
> I'm fine with that.
> 
> regards.

Thanks for the comments!

Attached a new patch.

It adds pg_log_backend_memory_contexts(pid) which logs memory contexts
of the specified backend process.

The number of child contexts to be logged per parent is limited to 100
as with MemoryContextStats().

As written in commit 7b5ef8f2d07, which limits the verbosity of
memory context statistics dumps, it supposes that practical cases
where the dump gets long will typically be huge numbers of
siblings under the same parent context; while the additional
debugging value from seeing details about individual siblings
beyond 100 will not be large.

Thoughts?


Regards.
Attachment

Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/03/29 11:59, torikoshia wrote:
> On 2021-03-26 14:08, Kyotaro Horiguchi wrote:
>> At Fri, 26 Mar 2021 14:02:49 +0900, Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote in
>>>
>>>
>>> On 2021/03/26 13:28, Kyotaro Horiguchi wrote:
>>> >> "some contexts are omitted"
>>> >> "n child contexts: total_bytes = ..."
>>> > Sorry I missed that is already implemented.  So my opnion is I agree
>>> > with limiting with a fixed-number, and preferablly sorted in
>>> > descending order of... totalspace/nblocks?
>>>
>>> This may be an improvement, but makes us modify
>>> MemoryContextStatsInternal()
>>> very much. I'm afraid that it's too late to do that at this stage...
>>> What about leaving the output order as it is at the first version?
>>
>> So I said "preferably":p  (with a misspelling...)
>> I'm fine with that.
>>
>> regards.
> 
> Thanks for the comments!
> 
> Attached a new patch.

Thanks!

> It adds pg_log_backend_memory_contexts(pid) which logs memory contexts
> of the specified backend process.
> 
> The number of child contexts to be logged per parent is limited to 100
> as with MemoryContextStats().
> 
> As written in commit 7b5ef8f2d07, which limits the verbosity of
> memory context statistics dumps, it supposes that practical cases
> where the dump gets long will typically be huge numbers of
> siblings under the same parent context; while the additional
> debugging value from seeing details about individual siblings
> beyond 100 will not be large.
> 
> Thoughts?

I'm OK with 100. We should comment why we chose 100 for that.

Here are some review comments.

Isn't it better to move HandleProcSignalLogMemoryContext() and
ProcessLogMemoryContextInterrupt() to mcxt.c from procsignal.c
(like the functions for notify interrupt are defined in async.c)
because they are the functions for memory contexts?

+ * HandleProcSignalLogMemoryContext
+ *
+ * Handle receipt of an interrupt indicating log memory context.
+ * Signal handler portion of interrupt handling.

IMO it's better to comment why we need to separate the function into two,
i.e., HandleProcSignalLogMemoryContext() and ProcessLogMemoryContextInterrupt(),
like the comment for other similar function explains. What about the followings?

-------------------------------
HandleLogMemoryContextInterrupt

Handle receipt of an interrupt indicating logging of memory contexts.

All the actual work is deferred to ProcessLogMemoryContextInterrupt(),
because we cannot safely emit a log message inside the signal handler.
-------------------------------
ProcessLogMemoryContextInterrupt

Perform logging of memory contexts of this backend process.

Any backend that participates in ProcSignal signaling must arrange to
call this function if we see LogMemoryContextPending set. It is called
from CHECK_FOR_INTERRUPTS(), which is enough because the target process
for logging of memory contexts is a backend.
-------------------------------


+    if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
+        HandleProcSignalLogMemoryContext();
+
      if (CheckProcSignal(PROCSIG_BARRIER))
          HandleProcSignalBarrierInterrupt();

The code for memory context logging interrupt came after barrier interrupt
in other places, e.g., procsignal.h. Why is this order of code different?

+/*
+ * pg_log_backend_memory_contexts
+ *        Print memory context of the specified backend process.

Isn't it better to move pg_log_backend_memory_contexts() to mcxtfuncs.c
from mcxt.c, because this is an SQL function for memory contexts?

IMO we should comment why we allow only superuser to call this function.
What about the following?

-----------------
Signal a backend process to log its memory contexts.

Only superusers are allowed to signal to log the memory contexts
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages, which can lead to denial of service.
-----------------

+    PGPROC        *proc = BackendPidGetProc(pid);
+
+    /* Check whether the target process is PostgreSQL backend process. */
+    if (proc == NULL)

What about adding more comments as follows?

---------------------------------
+    /*
+     * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
+     * we reach kill(), a process for which we get a valid proc here might
+     * have terminated on its own.  There's no way to acquire a lock on an
+     * arbitrary process to prevent that. But since this mechanism is usually
+     * used to debug a backend running and consuming lots of memory,
+     * that it might end on its own first and its memory contexts are not
+     * logged is not a problem.
+     */
+    if (proc == NULL)
+    {
+        /*
+         * This is just a warning so a loop-through-resultset will not abort
+         * if one backend logged its memory contexts during the run.
+         */
+        ereport(WARNING,
---------------------------------

+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                 errmsg("must be a superuser to log memory contexts")));
+        PG_RETURN_BOOL(false);

This PG_RETURN_BOOL(false) is unnecessary because ereport() will emit an ERROR?

+$node->psql('postgres', 'select pg_log_backend_memory_contexts(pg_backend_pid())');
+
+my $logfile = slurp_file($node->logfile);
+like($logfile, qr/Grand total: \d+ bytes in \d+ blocks/,
+        'found expected memory context print in the log file');

Isn't there the case where the memory contexts have not been logged yet
when slurp_file() is called? It's rare, though. If there is the risk for that,
we should wait for the memory contexts to be actually logged?

BTW, I agree that we should have the regression test that calls
pg_log_backend_memory_contexts() and checks, e.g., that it doesn't cause
segmentation fault. But I'm just wondering if it's a bit overkill to make a perl
script and start a new node to test only this function...

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2021-03-30 02:28, Fujii Masao wrote:

Thanks for reviewing and kind suggestions!

>> It adds pg_log_backend_memory_contexts(pid) which logs memory contexts
>> of the specified backend process.
>> 
>> The number of child contexts to be logged per parent is limited to 100
>> as with MemoryContextStats().
>> 
>> As written in commit 7b5ef8f2d07, which limits the verbosity of
>> memory context statistics dumps, it supposes that practical cases
>> where the dump gets long will typically be huge numbers of
>> siblings under the same parent context; while the additional
>> debugging value from seeing details about individual siblings
>> beyond 100 will not be large.
>> 
>> Thoughts?
> 
> I'm OK with 100. We should comment why we chose 100 for that.

Added following comments.

+       /*
+        * When a backend process is consuming huge memory, logging all its
+        * memory contexts might overrun available disk space. To prevent
+        * this, we limit the number of child contexts per parent to 100.
+        *
+        * As with MemoryContextStats(), we suppose that practical cases
+        * where the dump gets long will typically be huge numbers of
+        * siblings under the same parent context; while the additional
+        * debugging value from seeing details about individual siblings
+        * beyond 100 will not be large.
+        */
+       MemoryContextStatsDetail(TopMemoryContext, 100, false);

> 
> Here are some review comments.
> 
> Isn't it better to move HandleProcSignalLogMemoryContext() and
> ProcessLogMemoryContextInterrupt() to mcxt.c from procsignal.c
> (like the functions for notify interrupt are defined in async.c)
> because they are the functions for memory contexts?

Agreed.
Also renamed HandleProcSignalLogMemoryContext to
HandleLogMemoryContextInterrupt.

> + * HandleProcSignalLogMemoryContext
> + *
> + * Handle receipt of an interrupt indicating log memory context.
> + * Signal handler portion of interrupt handling.
> 
> IMO it's better to comment why we need to separate the function into 
> two,
> i.e., HandleProcSignalLogMemoryContext() and 
> ProcessLogMemoryContextInterrupt(),
> like the comment for other similar function explains. What about the 
> followings?

Thanks! Changed them to the suggested one.

> -------------------------------
> HandleLogMemoryContextInterrupt
> 
> Handle receipt of an interrupt indicating logging of memory contexts.
> 
> All the actual work is deferred to ProcessLogMemoryContextInterrupt(),
> because we cannot safely emit a log message inside the signal handler.
> -------------------------------
> ProcessLogMemoryContextInterrupt
> 
> Perform logging of memory contexts of this backend process.
> 
> Any backend that participates in ProcSignal signaling must arrange to
> call this function if we see LogMemoryContextPending set. It is called
> from CHECK_FOR_INTERRUPTS(), which is enough because the target process
> for logging of memory contexts is a backend.
> -------------------------------
> 
> 
> +    if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
> +        HandleProcSignalLogMemoryContext();
> +
>      if (CheckProcSignal(PROCSIG_BARRIER))
>          HandleProcSignalBarrierInterrupt();
> 
> The code for memory context logging interrupt came after barrier 
> interrupt
> in other places, e.g., procsignal.h. Why is this order of code 
> different?

Fixed.

> +/*
> + * pg_log_backend_memory_contexts
> + *        Print memory context of the specified backend process.
> 
> Isn't it better to move pg_log_backend_memory_contexts() to mcxtfuncs.c
> from mcxt.c because this is the SQL function memory contexts?

Agreed.

> IMO we should comment why we allow only superuser to call this 
> function.
> What about the following?

Thanks!
Modified the patch according to the suggestions.

> -----------------
> Signal a backend process to log its memory contexts.
> 
> Only superusers are allowed to signal to log the memory contexts
> because allowing any users to issue this request at an unbounded rate
> would cause lots of log messages and which can lead to denial of 
> service.
> -----------------
> 
> +    PGPROC        *proc = BackendPidGetProc(pid);
> +
> +    /* Check whether the target process is PostgreSQL backend process. */
> +    if (proc == NULL)
> 
> What about adding more comments as follows?
> 
> ---------------------------------
> +    /*
> +     * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
> +     * we reach kill(), a process for which we get a valid proc here might
> +     * have terminated on its own.  There's no way to acquire a lock on an
> +     * arbitrary process to prevent that. But since this mechanism is usually
> +     * used to debug a backend running and consuming lots of memory,
> +     * that it might end on its own first and its memory contexts are not
> +     * logged is not a problem.
> +     */
> +    if (proc == NULL)
> +    {
> +        /*
> +         * This is just a warning so a loop-through-resultset will not abort
> +         * if one backend logged its memory contexts during the run.
> +         */
> +        ereport(WARNING,
> ---------------------------------
> 
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +                 errmsg("must be a superuser to log memory contexts")));
> +        PG_RETURN_BOOL(false);
> 
> This PG_RETURN_BOOL(false) is unnecessary because ereport() will emit 
> an ERROR?
> 
> +$node->psql('postgres', 'select pg_log_backend_memory_contexts(pg_backend_pid())');
> +
> +my $logfile = slurp_file($node->logfile);
> +like($logfile, qr/Grand total: \d+ bytes in \d+ blocks/,
> +        'found expected memory context print in the log file');
> 
> Isn't there the case where the memory contexts have not been logged yet
> when slurp_file() is called? It's rare, though. If there is the risk 
> for that,
> we should wait for the memory contexts to be actually logged?
> 
> BTW, I agree that we should have the regression test that calls
> pg_log_backend_memory_contexts() and checks, e.g., that it doesn't 
> cause
> segmentation fault. But I'm just wondering if it's a bit overkill to 
> make perl
> scriptand start new node  to test only this function...

OK, I removed the perl TAP test and added a regression test running 
pg_log_backend_memory_contexts(pg_backend_pid()) and checking it returns 
't'.


Regards.
Attachment

Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/03/30 22:06, torikoshia wrote:
> Modified the patch according to the suggestions.

Thanks for updating the patch!

I applied the cosmetic changes to the patch and added the example of
the function call into the document. Attached is the updated version
of the patch. Could you check this version?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2021-03-31 04:36, Fujii Masao wrote:
> On 2021/03/30 22:06, torikoshia wrote:
>> Modified the patch according to the suggestions.
> 
> Thanks for updating the patch!
> 
> I applied the cosmetic changes to the patch and added the example of
> the function call into the document. Attached is the updated version
> of the patch. Could you check this version?
> 

Thanks a lot!


+The memory contexts will be logged in the log file. For example:

When 'log_destination = stderr' and 'logging_collector = off', it does
not log to the file but to stderr.

A description like the one below would be a bit more accurate, but I'm
wondering if it's repeating the same words.

+ The memory contexts will be logged based on the log configuration set. For example:

How do you think?


+<programlisting>
+postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+ pg_log_backend_memory_contexts
+--------------------------------
+ t
+(1 row)
+
+The memory contexts will be logged in the log file. For example:
+LOG:  logging memory contexts of PID 10377
+STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+LOG:  level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used
+LOG:  level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used

The line "The memory contexts will be logged in the log file. For
example:" is neither an SQL command nor its output, so it might be
better to differentiate it.


What about the following like attached patch?

+<programlisting>
+postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+ pg_log_backend_memory_contexts
+--------------------------------
+ t
+(1 row)
+</programlisting>
+The memory contexts will be logged in the log file. For example:
+<screen>
+LOG:  logging memory contexts of PID 10377
+STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+LOG:  level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used
+LOG:  level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used
...(snip)...
+LOG:  level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used
+LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used
+</screen>


Regards.
Attachment

Re: Get memory contexts of an arbitrary backend process

From
Kyotaro Horiguchi
Date:
At Wed, 31 Mar 2021 15:02:02 +0900, torikoshia <torikoshia@oss.nttdata.com> wrote in 
> On 2021-03-31 04:36, Fujii Masao wrote:
> > On 2021/03/30 22:06, torikoshia wrote:
> >> Modified the patch according to the suggestions.
> > Thanks for updating the patch!
> > I applied the cosmetic changes to the patch and added the example of
> > the function call into the document. Attached is the updated version
> > of the patch. Could you check this version?
> > 
> 
> Thanks a lot!
> 
> 
> +The memory contexts will be logged in the log file. For example:
> 
> When 'log_destination = stderr' and 'logging_collector = off', it does
> not log in the file but in the stderr.
> 
> Description like below would be a bit more accurate but I'm wondering
> it's  repeating the same words.
> 
> + The memory contexts will be logged based on the log configuration
> set. For example:
> 
> How do you think?

How about "The memory contexts will be logged in the server log" ?
I think "server log" doesn't suggest any concrete target.

> +<programlisting>
> +postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid());
> + pg_log_backend_memory_contexts
> +--------------------------------
> + t
> +(1 row)
> +
> +The memory contexts will be logged in the log file. For example:
> +LOG:  logging memory contexts of PID 10377
> +STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
> +LOG: level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free
> (5 chunks); 66368 used
> +LOG: level: 1; pgstat TabStatusArray lookup hash table: 8192 total in
> 1 blocks; 1408 free (0 chunks); 6784 used
> 
> The line "The memory contexts will be logged in the log file. For
> example:"
> is neither nor SQL command and its outputs, it might be better to
> differentiate it.
> 
> 
> What about the following like attached patch?
> 
> +<programlisting>
> +postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid());
> + pg_log_backend_memory_contexts
> +--------------------------------
> + t
> +(1 row)
> +</programlisting>
> +The memory contexts will be logged in the log file. For example:
> +<screen>
> +LOG:  logging memory contexts of PID 10377
> +STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
> +LOG: level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free
> (5 chunks); 66368 used
> +LOG: level: 1; pgstat TabStatusArray lookup hash table: 8192 total in
> 1 blocks; 1408 free (0 chunks); 6784 used
> ...(snip)...
> +LOG: level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3
> chunks); 264 used
> +LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88
> chunks); 1029560 used
> +</screen>

+1 from me. It looks more compliant with the standard?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/03/31 15:16, Kyotaro Horiguchi wrote:
>> + The memory contexts will be logged based on the log configuration
>> set. For example:
>>
>> How do you think?
> 
> How about "The memory contexts will be logged in the server log" ?
> I think "server log" doesn't suggest any concrete target.

Or just using "logged" is enough?

Also I'd like to document that one message for each memory context is logged.
So what about the following?

     One message for each memory context will be logged. For example,


> +1 from me. It looks like more compliant with the standard?

+1, too.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2021-04-01 19:13, Fujii Masao wrote:
> On 2021/03/31 15:16, Kyotaro Horiguchi wrote:
>>> + The memory contexts will be logged based on the log configuration
>>> set. For example:
>>> 
>>> How do you think?
>> 
>> How about "The memory contexts will be logged in the server log" ?
>> I think "server log" doesn't suggest any concrete target.
> 
> Or just using "logged" is enough?
> 
> Also I'd like to document that one message for each memory context is 
> logged.
> So what about the following?
> 
>     One message for each memory context will be logged. For example,


Agreed.

BTW, there was a conflict since c30f54ad732 (Detect POLLHUP/POLLRDHUP
while running queries), attached v9.


Regards,
Attachment

Re: Get memory contexts of an arbitrary backend process

From
Zhihong Yu
Date:


On Sun, Apr 4, 2021 at 7:56 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2021-04-01 19:13, Fujii Masao wrote:
> On 2021/03/31 15:16, Kyotaro Horiguchi wrote:
>>> + The memory contexts will be logged based on the log configuration
>>> set. For example:
>>>
>> What do you think?
>>
>> How about "The memory contexts will be logged in the server log" ?
>> I think "server log" doesn't suggest any concrete target.
>
> Or just using "logged" is enough?
>
> Also I'd like to document that one message for each memory context is
> logged.
> So what about the following?
>
>     One message for each memory context will be logged. For example,


Agreed.

BTW, there was a conflict since c30f54ad732 (Detect POLLHUP/POLLRDHUP
while running queries); attached v9.


Regards,

Hi,

+ * On receipt of this signal, a backend sets the flag in the signal
+ * handler, and then which causes the next CHECK_FOR_INTERRUPTS()

I think the 'and then' is not needed:

  handler which causes the next ...

+        * This is just a warning so a loop-through-resultset will not abort
+        * if one backend logged its memory contexts during the run.

The pid given by arg 0 is not a PostgreSQL server process. Which other backend could it be?

Thanks

Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/04/05 12:20, Zhihong Yu wrote:
> +        * This is just a warning so a loop-through-resultset will not abort
> +        * if one backend logged its memory contexts during the run.
> 
> The pid given by arg 0 is not a PostgreSQL server process. Which other backend could it be ?

This is the comment that I added wrongly. So the comment should be
"This is just a warning so a loop-through-resultset will not abort
if one backend terminated on its own during the run.",
like pg_signal_backend(). Thoughts?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2021-04-05 12:59, Fujii Masao wrote:
> On 2021/04/05 12:20, Zhihong Yu wrote:

Thanks for reviewing!

>> + * On receipt of this signal, a backend sets the flag in the signal
>> + * handler, and then which causes the next CHECK_FOR_INTERRUPTS()

>> I think the 'and then' is not needed:

Although I suppose either would be fine, I removed the words.

>> +        * This is just a warning so a loop-through-resultset will not 
>> abort
>> +        * if one backend logged its memory contexts during the run.
>> 
>> The pid given by arg 0 is not a PostgreSQL server process. Which other 
>> backend could it be?
> 
> This is the comment that I added wrongly. So the comment should be
> "This is just a warning so a loop-through-resultset will not abort
> if one backend terminated on its own during the run.",
> like pg_signal_backend(). Thoughts?

+1.

Attached v10 patch.


Regards,
Attachment

Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/04/05 21:03, torikoshia wrote:
> On 2021-04-05 12:59, Fujii Masao wrote:
>> On 2021/04/05 12:20, Zhihong Yu wrote:
> 
> Thanks for reviewing!
> 
>>> + * On receipt of this signal, a backend sets the flag in the signal
>>> + * handler, and then which causes the next CHECK_FOR_INTERRUPTS()
> 
>>> I think the 'and then' is not needed:
> 
> Although I suppose either would be fine, I removed the words.
> 
>>> +        * This is just a warning so a loop-through-resultset will not abort
>>> +        * if one backend logged its memory contexts during the run.
>>>
>>> The pid given by arg 0 is not a PostgreSQL server process. Which other backend could it be?
>>
>> This is the comment that I added wrongly. So the comment should be
>> "This is just a warning so a loop-through-resultset will not abort
>> if one backend terminated on its own during the run.",
>> like pg_signal_backend(). Thoughts?
> 
> +1.
> 
> Attached v10 patch.

Thanks for updating the patch!

I updated the patch as follows. Could you check the attached patch?

+        Memory contexts will be logged based on the log configuration set.
+        See <xref linkend="runtime-config-logging"/> for more information.

Those memory contexts are logged at LOG level, but they are not sent to
the client regardless of the setting of client_min_messages. I think
this information should be documented. So I updated the document as follows.

     These memory contexts will be logged at <literal>LOG</literal>
     message level. They will appear in the server log based on
     the log configuration set (See <xref linkend="runtime-config-logging"/>
     for more information), but will not be sent to the client whatever
     the setting of <xref linkend="guc-client-min-messages"/>.

+        Only superusers can log the memory contexts.

We can read this description as "only superusers can request to log ...".
But ISTM that we can also read this as "only superusers can log (dump)
the memory contexts of their own backend". Right? To avoid this confusion,
I updated this description as follows.

     Only superusers can request to log the memory contexts.

I added the following note about the performance overhead of this function.

     Note that frequent calls to this function could incur significant overhead,
     because it may generate a large number of log messages.

I also added some comments.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: Get memory contexts of an arbitrary backend process

From
torikoshia
Date:
On 2021-04-06 00:08, Fujii Masao wrote:
> On 2021/04/05 21:03, torikoshia wrote:
>> On 2021-04-05 12:59, Fujii Masao wrote:
>>> On 2021/04/05 12:20, Zhihong Yu wrote:
>> 
>> Thanks for reviewing!
>> 
>>>> + * On receipt of this signal, a backend sets the flag in the signal
>>>> + * handler, and then which causes the next CHECK_FOR_INTERRUPTS()
>> 
>>>> I think the 'and then' is not needed:
>> 
>> Although I suppose either would be fine, I removed the words.
>> 
>>>> +        * This is just a warning so a loop-through-resultset will 
>>>> not abort
>>>> +        * if one backend logged its memory contexts during the run.
>>>> 
>>>> The pid given by arg 0 is not a PostgreSQL server process. Which 
>>>> other backend could it be?
>>> 
>>> This is the comment that I added wrongly. So the comment should be
>>> "This is just a warning so a loop-through-resultset will not abort
>>> if one backend terminated on its own during the run.",
>>> like pg_signal_backend(). Thoughts?
>> 
>> +1.
>> 
>> Attached v10 patch.
> 
> Thanks for updating the patch!
> 
> I updated the patch as follows. Could you check the attached patch?

Thanks a lot!

I don't have any objections to your improvements.

Regards,



Re: Get memory contexts of an arbitrary backend process

From
Fujii Masao
Date:

On 2021/04/06 10:57, torikoshia wrote:
> I don't have any objections to your improvements.

Thanks for the check! I pushed the patch. Thanks a lot!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION