Thread: Creating a function for exposing memory usage of backend process

Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
Hi,

As you may know better than I do, backend processes sometimes use a lot
of memory because of the various reasons like caches, prepared
statements and cursors.
When supporting PostgreSQL, I face situations for investigating the
reason of memory bloat.

AFAIK, the way to examine it is attaching a debugger and call
MemoryContextStats(TopMemoryContext), however, I feel some difficulties
doing it:

   - some production environments don't allow us to run a debugger easily
   - many lines about memory contexts are hard to analyze

Using an extension(pg_stat_get_memory_context() in pg_cheat_funcs[1]),
we can get the view of the memory contexts, but it's the memory contexts
of the backend which executed the pg_stat_get_memory_context().


[user interface]
If we have a function exposing memory contexts for specified PID,
we can easily examine them.
I imagine a user interface something like this:

   =# SELECT * FROM pg_stat_get_backend_memory_context(PID);

            name           |       parent       | level | total_bytes | 
total_nblocks | free_bytes | free_chunks | used_bytes | some other 
attibutes..

--------------------------+--------------------+-------+-------------+---------------+------------+-------------+------------
  TopMemoryContext         |                    |     0 |       68720 |   
           5 |       9936 |          16 |      58784
  TopTransactionContext    | TopMemoryContext   |     1 |        8192 |   
           1 |       7720 |           0 |        472
  PL/pgSQL function        | TopMemoryContext   |     1 |       16384 |   
           2 |       5912 |           1 |      10472
  PL/pgSQL function        | TopMemoryContext   |     1 |       32768 |   
           3 |      15824 |           3 |      16944
  dynahash                 | TopMemoryContext   |     1 |        8192 |   
           1 |        512 |           0 |       7680
...


[rough implementation ideas and challenges]
I suppose communication between a process which runs
pg_stat_get_backend_memory_context()(referred to as A) and
target backend(referred to as B) is like:

   1. A sends a message to B and order to dump the memory contexts
   2. B dumps its memory contexts to some shared area
   3. A reads the shared area and returns it to the function invoker

To do so, there seem some challenges.

(1) how to share memory contexts information between backend processes
The amount of memory contexts greatly varies depending on the
situation, so it's not appropriate to share the memory contexts using
fixed shared memory.
Also using the file on 'stats_temp_directory' seems difficult thinking
about the background of the shared-memory based stats collector
proposal[2].
Instead, I'm thinking about using dsm_mq, which allows messages of
arbitrary length to be sent and receive.

(2) how to send messages wanting memory contexts
Communicating via signal seems simple but assigning a specific number
of signal for this purpose seems wasting.
I'm thinking about using fixed shared memory to put dsm_mq handle.
To send a message, A creates a dsm_mq and put the handle on the shared
memory area. When B founds a handle, B dumps the memory contexts to the
corresponding dsm_mq.

However, enabling B to find the handle needs to check the shared memory
periodically. I'm not sure the suitable location and timing for this
checking yet, and doubt this way of communication is acceptable because
it gives certain additional loads to all the backends.

(3) clarifying the necessary attributes
As far as reading the past disucussion[3], it's not so clear what kind
of information should be exposed regarding memory contexts.


As a first step, to deal with (3) I'd like to add
pg_stat_get_backend_memory_context() which target is limited to the
local backend process.


Thanks for reading and how do you think?


[1] 
https://github.com/MasaoFujii/pg_cheat_funcs#setof-record-pg_stat_get_memory_context
[2] 
https://www.postgresql.org/message-id/flat/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp
[3] 
https://www.postgresql.org/message-id/20190805171608.g22gxwmfr2r7uf6t%40alap3.anarazel.de


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/06/17 22:00, torikoshia wrote:
> Hi,
> 
> As you may know better than I do, backend processes sometimes use a lot
> of memory because of the various reasons like caches, prepared
> statements and cursors.
> When supporting PostgreSQL, I face situations for investigating the
> reason of memory bloat.
> 
> AFAIK, the way to examine it is attaching a debugger and call
> MemoryContextStats(TopMemoryContext), however, I feel some difficulties
> doing it:
> 
>    - some production environments don't allow us to run a debugger easily
>    - many lines about memory contexts are hard to analyze

Agreed. The feature to view how local memory contexts are used in
each process is very useful!


> Using an extension(pg_stat_get_memory_context() in pg_cheat_funcs[1]),
> we can get the view of the memory contexts, but it's the memory contexts
> of the backend which executed the pg_stat_get_memory_context().
> 
> 
> [user interface]
> If we have a function exposing memory contexts for specified PID,
> we can easily examine them.
> I imagine a user interface something like this:
> 
>    =# SELECT * FROM pg_stat_get_backend_memory_context(PID);

I'm afraid that this interface is not convenient when we want to monitor
the usages of local memory contexts for all the processes. For example,
I'd like to monitor how much memory is totally used to store prepared
statements information. For that purpose, I wonder if it's more convenient
to provide the view displaying the memory context usages for
all the processes.

To provide that view, all the processes need to save their local memory
context usages into the shared memory or the special files in their
convenient timing. For example, backends do that every end of query
execution (during waiting for new request from clients). OTOH,
the query on the view scans and displays all those information.

Of course there would be several issues in this idea. One issue is
the performance overhead caused when each process stores
its own memory context usage to somewhere. Even if backends do that
during waiting for new request from clients, non-negligible overhead
might happen. Performance test is necessary. Also this means that
we cannot see the memory context usage of the process in the middle of
query execution since it's saved at the end of query. If local memory bloat
occurs only during query execution and we want to investigate it, we still
need to use gdb to output the memory context information.

Another issue is that the large amount of shared memory might be
necessary to save the memory context usages for all the proceses. We can
save the usage information into the file instead, but which would cause
more overhead. If we use shared memory, the similar parameter like
track_activity_query_size might be necessary. That is, backends save
only the specified number of memory context information. If it's zero,
the feature is disabled.

Also we should reduce the same of information to save. For example,
instead of saving all memory context information like MemoryContextStats()
prints, it might be better to save the summary stats (per memory context
type) from them.


> 
>             name           |       parent       | level | total_bytes | total_nblocks | free_bytes | free_chunks |
used_bytes| some other attibutes..
 
>
--------------------------+--------------------+-------+-------------+---------------+------------+-------------+------------
>   TopMemoryContext         |                    |     0 |       68720 |           5 |       9936 |          16 |     
58784
>   TopTransactionContext    | TopMemoryContext   |     1 |        8192 |           1 |       7720 |           0
|       472
 
>   PL/pgSQL function        | TopMemoryContext   |     1 |       16384 |           2 |       5912 |           1 |     
10472
>   PL/pgSQL function        | TopMemoryContext   |     1 |       32768 |           3 |      15824 |           3 |     
16944
>   dynahash                 | TopMemoryContext   |     1 |        8192 |           1 |        512 |           0
|      7680
 
> ...
> 
> 
> [rough implementation ideas and challenges]
> I suppose communication between a process which runs
> pg_stat_get_backend_memory_context()(referred to as A) and
> target backend(referred to as B) is like:
> 
>    1. A sends a message to B and order to dump the memory contexts
>    2. B dumps its memory contexts to some shared area
>    3. A reads the shared area and returns it to the function invoker
> 
> To do so, there seem some challenges.
> 
> (1) how to share memory contexts information between backend processes
> The amount of memory contexts greatly varies depending on the
> situation, so it's not appropriate to share the memory contexts using
> fixed shared memory.
> Also using the file on 'stats_temp_directory' seems difficult thinking
> about the background of the shared-memory based stats collector
> proposal[2].
> Instead, I'm thinking about using dsm_mq, which allows messages of
> arbitrary length to be sent and receive.
> 
> (2) how to send messages wanting memory contexts
> Communicating via signal seems simple but assigning a specific number
> of signal for this purpose seems wasting.
> I'm thinking about using fixed shared memory to put dsm_mq handle.
> To send a message, A creates a dsm_mq and put the handle on the shared
> memory area. When B founds a handle, B dumps the memory contexts to the
> corresponding dsm_mq.
> 
> However, enabling B to find the handle needs to check the shared memory
> periodically. I'm not sure the suitable location and timing for this
> checking yet, and doubt this way of communication is acceptable because
> it gives certain additional loads to all the backends.
> 
> (3) clarifying the necessary attributes
> As far as reading the past disucussion[3], it's not so clear what kind
> of information should be exposed regarding memory contexts.
> 
> 
> As a first step, to deal with (3) I'd like to add
> pg_stat_get_backend_memory_context() which target is limited to the
> local backend process.

+1


Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Robert Haas
Date:
On Wed, Jun 17, 2020 at 11:56 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> > As a first step, to deal with (3) I'd like to add
> > pg_stat_get_backend_memory_context() which target is limited to the
> > local backend process.
>
> +1

+1 from me, too. Something that exposed this via shared memory would
be quite useful, and there are other things we'd like to expose
similarly, such as the plan(s) from the running queries, or even just
the untruncated query string(s). I'd like to have a good
infrastructure for that sort of thing, but I think it's quite tricky
to do properly. You need a variable-size chunk of shared memory, so
probably it has to use DSM somehow, and you need to update it as
things change, but if you update it too frequently performance will
stink. If you ping processes to do the updates, how do you know when
they've completed them, and what happens if they don't respond in a
timely fashion? These are probably all solvable problems, but I don't
think they are very easy ones.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Creating a function for exposing memory usage of backend process

From
Kasahara Tatsuhito
Date:
Hi !

On Thu, Jun 18, 2020 at 12:56 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> Agreed. The feature to view how local memory contexts are used in
> each process is very useful!
+1

> >    =# SELECT * FROM pg_stat_get_backend_memory_context(PID);
>
> I'm afraid that this interface is not convenient when we want to monitor
> the usages of local memory contexts for all the processes. For example,
> I'd like to monitor how much memory is totally used to store prepared
> statements information. For that purpose, I wonder if it's more convenient
> to provide the view displaying the memory context usages for
> all the processes.
How about separating a function that examines memory consumption
trends for all processes and a function that examines memory
consumption for a particular phase of a particular process?

For the former, as Fujii said, the function shows the output limited
information for each context type. All processes calculation and
output the information at idle status.

I think the latter is useful for debugging and other purposes.
For example, imagine preparing a function for registration like the following.
=# SELECT pg_stat_get_backend_memory_context_regist (pid, context,
max_children, calc_point)

pid: A target process
context: The top level of the context of interest
max_children: Maximum number of output for the target context's children
 (similar to MemoryContextStatsInternal()'s max_children)
calc_point: Single or multiple position(s) to calculate and output
context information
 (Existing hooks such as planner_hook, executor_start, etc.. could be used. )

This function informs the target PID to output the information of the
specified context at the specified calc_point.
When the target PID process reaches the calc_point, it calculates and
output the context information one time to a file or DSM.

(Currently PostgreSQL has no formal ways of externally modifying the
parameters of a particular process, so it may need to be
implemented...)

Sometimes I want to know the memory usage in the planning phase or
others with a query_string and/or plan_tree that before target process
move to the idle status.
So it would be nice to retrieve memory usage at some arbitrary point in time !

Regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Creating a function for exposing memory usage of backend process

From
Bharath Rupireddy
Date:
Hi,

While going through the mail chain on relation, plan and catalogue
caching [1], I'm thinking on the lines that is there a way to know the
current relation, plan and catalogue cache sizes? If there is a way
already,  please ignore this and it would be grateful if someone point
me to that.

Posting this here as I felt it's relevant.

If there is no such way to know the cache sizes and other info such as
statistics, number of entries, cache misses, hits etc.  can the
approach discussed here be applied?

If the user knows the cache statistics and other information, may be
we can allow user to take appropriate actions such as allowing him to
delete few entries through a command or some other way.

I'm sorry, If I'm diverting the topic being discussed in this mail
thread, please ignore if it is irrelevant.

[1] - https://www.postgresql.org/message-id/flat/20161219.201505.11562604.horiguchi.kyotaro%40lab.ntt.co.jp

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Creating a function for exposing memory usage of backend process

From
Kasahara Tatsuhito
Date:
Hi,

On Fri, Jun 26, 2020 at 3:42 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> While going through the mail chain on relation, plan and catalogue
> caching [1], I'm thinking on the lines that is there a way to know the
> current relation, plan and catalogue cache sizes? If there is a way
> already,  please ignore this and it would be grateful if someone point
> me to that.
AFAIK the only way to get statistics on PostgreSQL's backend  internal
local memory usage is to use MemoryContextStats() via gdb to output
the information to the log, so far.

> If there is no such way to know the cache sizes and other info such as
> statistics, number of entries, cache misses, hits etc.  can the
> approach discussed here be applied?
I think it's partially yes.

> If the user knows the cache statistics and other information, may be
> we can allow user to take appropriate actions such as allowing him to
> delete few entries through a command or some other way.
Yeah, one of the purposes of the features we are discussing here is to
use them for such situation.

Regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-06-20 03:11, Robert Haas wrote:
> On Wed, Jun 17, 2020 at 11:56 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> > As a first step, to deal with (3) I'd like to add
>> > pg_stat_get_backend_memory_context() which target is limited to the
>> > local backend process.
>> 
>> +1
> 
> +1 from me, too.

Attached a patch that adds a function exposing memory usage of local
backend.

It's almost same as pg_cheat_funcs's pg_stat_get_memory_context().
I've also added MemoryContexts identifier because it seems useful to
distinguish the same kind of memory contexts.

For example, when there are many prepared statements we can
distinguish them using identifiers, which shows the cached query.

   =# SELECT name, ident FROM pg_stat_get_memory_contexts() WHERE name = 
'CachedPlanSource';
          name       |            ident
   ------------------+--------------------------------
    CachedPlanSource | PREPARE q1(text) AS SELECT ..;
    CachedPlanSource | PREPARE q2(text) AS SELECT ..;
   (2 rows)


> Something that exposed this via shared memory would
> be quite useful, and there are other things we'd like to expose
> similarly, such as the plan(s) from the running queries, or even just
> the untruncated query string(s). I'd like to have a good
> infrastructure for that sort of thing, but I think it's quite tricky
> to do properly. You need a variable-size chunk of shared memory, so
> probably it has to use DSM somehow, and you need to update it as
> things change, but if you update it too frequently performance will
> stink. If you ping processes to do the updates, how do you know when
> they've completed them, and what happens if they don't respond in a
> timely fashion? These are probably all solvable problems, but I don't
> think they are very easy ones.

Thanks for your comments!

It seems hard as you pointed out.
I'm going to consider it along with the advice of Fujii and Kasahara.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment

Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/06/29 12:01, torikoshia wrote:
> On 2020-06-20 03:11, Robert Haas wrote:
>> On Wed, Jun 17, 2020 at 11:56 PM Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote:
>>> > As a first step, to deal with (3) I'd like to add
>>> > pg_stat_get_backend_memory_context() which target is limited to the
>>> > local backend process.
>>>
>>> +1
>>
>> +1 from me, too.
> 
> Attached a patch that adds a function exposing memory usage of local
> backend.

Thanks for the patch!
Could you add this patch to Commitfest 2020-07?

> 
> It's almost same as pg_cheat_funcs's pg_stat_get_memory_context().

This patch provides only the function, but isn't it convenient to
provide the view like pg_shmem_allocations?


> I've also added MemoryContexts identifier because it seems useful to
> distinguish the same kind of memory contexts.

Sounds good. But isn't it better to document each column?
Otherwise, users cannot undersntad what "ident" column indicates.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Bharath Rupireddy
Date:
> > If there is no such way to know the cache sizes and other info such as
> > statistics, number of entries, cache misses, hits etc.  can the
> > approach discussed here be applied?
> I think it's partially yes.
>

> > If the user knows the cache statistics and other information, may be
> > we can allow user to take appropriate actions such as allowing him to
> > delete few entries through a command or some other way.
> Yeah, one of the purposes of the features we are discussing here is to
> use them for such situation.
>

Thanks Kasahara for the response. I will try to understand more about
getting the cache statistics and
also will study the possibility of applying this approach being
discussed here in this thread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On Mon, Jun 29, 2020 at 3:13 PM Fujii Masao 
<masao.fujii@oss.nttdata.com> wrote:

> Could you add this patch to Commitfest 2020-07?

Thanks for notifying, I've added it.
BTW, I registered you as an author because this patch used
lots of pg_cheat_funcs' codes.

   https://commitfest.postgresql.org/28/2622/

> This patch provides only the function, but isn't it convenient to
> provide the view like pg_shmem_allocations?

> Sounds good. But isn't it better to document each column?
> Otherwise, users cannot undersntad what "ident" column indicates.

Agreed.
Attached a patch for creating a view for local memory context
and its explanation on the document.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment

Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/07/01 14:48, torikoshia wrote:
> On Mon, Jun 29, 2020 at 3:13 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> 
>> Could you add this patch to Commitfest 2020-07?
> 
> Thanks for notifying, I've added it.
> BTW, I registered you as an author because this patch used
> lots of pg_cheat_funcs' codes.
> 
>    https://commitfest.postgresql.org/28/2622/

Thanks!

> 
>> This patch provides only the function, but isn't it convenient to
>> provide the view like pg_shmem_allocations?
> 
>> Sounds good. But isn't it better to document each column?
>> Otherwise, users cannot undersntad what "ident" column indicates.
> 
> Agreed.
> Attached a patch for creating a view for local memory context
> and its explanation on the document.

Thanks for updating the patch!

You treat pg_stat_local_memory_contexts view as a dynamic statistics view.
But isn't it better to treat it as just system view like pg_shmem_allocations
or pg_prepared_statements  because it's not statistics information? If yes,
we would need to rename the view, move the documentation from
monitoring.sgml to catalogs.sgml, and move the code from pgstat.c to
the more appropriate source file.

+    tupdesc = rsinfo->setDesc;
+    tupstore = rsinfo->setResult;

These seem not to be necessary.

+    /*
+     * It seems preferable to label dynahash contexts with just the hash table
+     * name.  Those are already unique enough, so the "dynahash" part isn't
+     * very helpful, and this way is more consistent with pre-v11 practice.
+     */
+    if (ident && strcmp(name, "dynahash") == 0)
+    {
+        name = ident;
+        ident = NULL;
+    }

IMO it seems better to report both name and ident even in the case of
dynahash than report only ident (as name). We can easily understand
the context is used for dynahash when it's reported. But you think it's
better to report NULL rather than "dynahash"?

+/* ----------
+ * The max bytes for showing identifiers of MemoryContext.
+ * ----------
+ */
+#define MEMORY_CONTEXT_IDENT_SIZE    1024

Do we really need this upper size limit? Could you tell me why
this is necessary?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Daniel Gustafsson
Date:
> On 1 Jul 2020, at 07:48, torikoshia <torikoshia@oss.nttdata.com> wrote:

> Attached a patch for creating a view for local memory context
> and its explanation on the document.

For the next version (if there will be one), please remove the catversion bump
from the patch as it will otherwise just break patch application without
constant rebasing (as it's done now).  The committer will handle the catversion
change if/when it gets committed.

cheers ./daniel


Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On Wed, Jul 1, 2020 at 4:43 PM Fujii Masao <masao.fujii@oss.nttdata.com> 
wrote:

Thanks for reviewing!

> You treat pg_stat_local_memory_contexts view as a dynamic statistics 
> view.
> But isn't it better to treat it as just system view like 
> pg_shmem_allocations
> or pg_prepared_statements  because it's not statistics information? If 
> yes,
> we would need to rename the view, move the documentation from
> monitoring.sgml to catalogs.sgml, and move the code from pgstat.c to
> the more appropriate source file.

Agreed.
At first, I thought not only statistical but dynamic information about 
exactly
what is going on was OK when reading the sentence on the manual below.

> PostgreSQL also supports reporting dynamic information about exactly 
> what is going on in the system right now, such as the exact command 
> currently being executed by other server processes, and which other 
> connections exist in the system. This facility is independent of the 
> collector process.

However, now I feel something strange because existing pg_stats_* views 
seem
to be per cluster information but the view I'm adding is about per 
backend
information.

I'm going to do some renaming and transportations.

- view name: pg_memory_contexts
- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c


> +       tupdesc = rsinfo->setDesc;
> +       tupstore = rsinfo->setResult;
> 
> These seem not to be necessary.

Thanks!

> +       /*
> +        * It seems preferable to label dynahash contexts with just the 
> hash table
> +        * name.  Those are already unique enough, so the "dynahash" 
> part isn't
> +        * very helpful, and this way is more consistent with pre-v11 
> practice.
> +        */
> +       if (ident && strcmp(name, "dynahash") == 0)
> +       {
> +               name = ident;
> +               ident = NULL;
> +       }
> 
> IMO it seems better to report both name and ident even in the case of
> dynahash than report only ident (as name). We can easily understand
> the context is used for dynahash when it's reported. But you think it's
> better to report NULL rather than "dynahash"?

These codes come from MemoryContextStatsPrint() and my intension is to
keep consistent with it.

> +/* ----------
> + * The max bytes for showing identifiers of MemoryContext.
> + * ----------
> + */
> +#define MEMORY_CONTEXT_IDENT_SIZE      1024
> 
> Do we really need this upper size limit? Could you tell me why
> this is necessary?

It also derived from MemoryContextStatsPrint().
I suppose it is for keeping readability of the log..

I'm going to follow the discussion on the mailing list and find why
these codes were introduced.
If there's no important reason to do the same in our context, I'll
change them.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-07-01 20:47, Daniel Gustafsson wrote:

> For the next version (if there will be one), please remove the 
> catversion bump
> from the patch as it will otherwise just break patch application 
> without
> constant rebasing (as it's done now).  The committer will handle the 
> catversion
> change if/when it gets committed.
> 
> cheers ./daniel

Thanks for telling me it!
I'll do that way next time.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/07/01 22:15, torikoshia wrote:
> On Wed, Jul 1, 2020 at 4:43 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> 
> Thanks for reviewing!
> 
>> You treat pg_stat_local_memory_contexts view as a dynamic statistics view.
>> But isn't it better to treat it as just system view like pg_shmem_allocations
>> or pg_prepared_statements  because it's not statistics information? If yes,
>> we would need to rename the view, move the documentation from
>> monitoring.sgml to catalogs.sgml, and move the code from pgstat.c to
>> the more appropriate source file.
> 
> Agreed.
> At first, I thought not only statistical but dynamic information about exactly
> what is going on was OK when reading the sentence on the manual below.
> 
>> PostgreSQL also supports reporting dynamic information about exactly what is going on in the system right now, such
asthe exact command currently being executed by other server processes, and which other connections exist in the
system.This facility is independent of the collector process.
 
> 
> However, now I feel something strange because existing pg_stats_* views seem
> to be per cluster information but the view I'm adding is about per backend
> information.
> 
> I'm going to do some renaming and transportations.
> 
> - view name: pg_memory_contexts
> - function name: pg_get_memory_contexts()
> - source file: mainly src/backend/utils/mmgr/mcxt.c
> 
> 
>> +       tupdesc = rsinfo->setDesc;
>> +       tupstore = rsinfo->setResult;
>>
>> These seem not to be necessary.
> 
> Thanks!
> 
>> +       /*
>> +        * It seems preferable to label dynahash contexts with just the hash table
>> +        * name.  Those are already unique enough, so the "dynahash" part isn't
>> +        * very helpful, and this way is more consistent with pre-v11 practice.
>> +        */
>> +       if (ident && strcmp(name, "dynahash") == 0)
>> +       {
>> +               name = ident;
>> +               ident = NULL;
>> +       }
>>
>> IMO it seems better to report both name and ident even in the case of
>> dynahash than report only ident (as name). We can easily understand
>> the context is used for dynahash when it's reported. But you think it's
>> better to report NULL rather than "dynahash"?
> 
> These codes come from MemoryContextStatsPrint() and my intension is to
> keep consistent with it.

Ok, understood! I agree that it's strange to display different names
for the same memory context between this view and logging.

It's helpful if the comment there refers to MemoryContextStatsPrint()
and mentions the reason that you told.


> 
>> +/* ----------
>> + * The max bytes for showing identifiers of MemoryContext.
>> + * ----------
>> + */
>> +#define MEMORY_CONTEXT_IDENT_SIZE      1024
>>
>> Do we really need this upper size limit? Could you tell me why
>> this is necessary?
> 
> It also derived from MemoryContextStatsPrint().
> I suppose it is for keeping readability of the log..

Understood. I may want to change the upper limit of query size to display.
But at the first step, I'm fine with limitting 1024 bytes.


> 
> I'm going to follow the discussion on the mailing list and find why
> these codes were introduced.

https://www.postgresql.org/message-id/12319.1521999065%40sss.pgh.pa.us

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On Wed, Jul 1, 2020 at 10:15 PM torikoshia <torikoshia@oss.nttdata.com> 
wrote:
> I'm going to do some renaming and transportations.
> 
> - view name: pg_memory_contexts
> - function name: pg_get_memory_contexts()
> - source file: mainly src/backend/utils/mmgr/mcxt.c

Attached an updated patch.

On Wed, Jul 1, 2020 at 10:58 PM Fujii Masao 
<masao.fujii@oss.nttdata.com> wrote:
> Ok, understood! I agree that it's strange to display different names
> for the same memory context between this view and logging.
> 
> It's helpful if the comment there refers to MemoryContextStatsPrint()
> and mentions the reason that you told.

Agreed. I changed the comments.

> > It also derived from MemoryContextStatsPrint().
> > I suppose it is for keeping readability of the log..
> 
> Understood. I may want to change the upper limit of query size to 
> display.
> But at the first step, I'm fine with limitting 1024 bytes.

Thanks, I've left it as it was.

> > I'm going to follow the discussion on the mailing list and find why
> > these codes were introduced.
> 
> https://www.postgresql.org/message-id/12319.1521999065%40sss.pgh.pa.us

Thanks for sharing!


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment

Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/07/03 11:45, torikoshia wrote:
> On Wed, Jul 1, 2020 at 10:15 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>> I'm going to do some renaming and transportations.
>>
>> - view name: pg_memory_contexts

I like more specific name like pg_backend_memory_contexts.
But I'd like to hear more opinions about the name from others.


>> - function name: pg_get_memory_contexts()
>> - source file: mainly src/backend/utils/mmgr/mcxt.c
> 
> Attached an updated patch.

Thanks for updating the patch!

+       <structfield>level</structfield> <type>integer</type>

In catalog.sgml, "int4" and "int8" are used in other catalogs tables.
So "integer" in the above should be "int4"?

+       <structfield>total_bytes</structfield> <type>bigint</type>

"bigint" should be "int8"?

+       Identification information of the memory context. This field is truncated if the identification field is longer
than1024 characters
 

"characters" should be "bytes"?

It's a bit confusing to have both "This field" and "the identification field"
in one description. What about just "This field is truncated at 1024 bytes"?

+      <para>
+       Total bytes requested from malloc

Isn't it better not to use "malloc" in the description? For example,
what about something like "Total bytes allocated for this memory context"?

+#define PG_STAT_GET_MEMORY_CONTEXT_COLS     9

Isn't it better to rename this to PG_GET_MEMORY_CONTEXTS_COLS
for the consistency with the function name?

+    memset(nulls, 0, sizeof(nulls));

"values[]" also should be initialized with zero?

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> 
wrote:

Thanks for your review!

> I like more specific name like pg_backend_memory_contexts.

Agreed.

When I was trying to add this function as statistics function,
I thought that naming pg_stat_getbackend_memory_context()
might make people regarded it as a "per-backend statistics
function", whose parameter is backend ID number.
So I removed "backend", but now there is no necessity to do
so.

> But I'd like to hear more opinions about the name from others.

I changed the name to pg_backend_memory_contexts for the time
being.


>> - function name: pg_get_memory_contexts()
>> - source file: mainly src/backend/utils/mmgr/mcxt.c


>> +       Identification information of the memory context. This field 
>> is truncated if the identification field is longer than 1024 
>> characters
> 
> "characters" should be "bytes"?

Fixed, but I used "characters" while referring to the
descriptions on the manual of pg_stat_activity.query
below.

| By default the query text is truncated at 1024 characters;

It has nothing to do with this thread, but considering
multibyte characters, it also may be better to change it
to "bytes".


Regarding the other comments, I revised the patch as you pointed.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment

Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/07/06 12:12, torikoshia wrote:
> On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> 
> Thanks for your review!
> 
>> I like more specific name like pg_backend_memory_contexts.
> 
> Agreed.
> 
> When I was trying to add this function as statistics function,
> I thought that naming pg_stat_getbackend_memory_context()
> might make people regarded it as a "per-backend statistics
> function", whose parameter is backend ID number.
> So I removed "backend", but now there is no necessity to do
> so.
> 
>> But I'd like to hear more opinions about the name from others.
> 
> I changed the name to pg_backend_memory_contexts for the time
> being.

+1


>>> - function name: pg_get_memory_contexts()
>>> - source file: mainly src/backend/utils/mmgr/mcxt.c
> 
> 
>>> +       Identification information of the memory context. This field is truncated if the identification field is
longerthan 1024 characters
 
>>
>> "characters" should be "bytes"?
> 
> Fixed, but I used "characters" while referring to the
> descriptions on the manual of pg_stat_activity.query
> below.
> 
> | By default the query text is truncated at 1024 characters;
> 
> It has nothing to do with this thread, but considering
> multibyte characters, it also may be better to change it
> to "bytes".

Yeah, I agree we should write the separate patch fixing that. You will?
If not, I will do that later.


> Regarding the other comments, I revised the patch as you pointed.

Thanks for updating the patch! The patch basically looks good to me/
Here are some minor comments:

+#define MEMORY_CONTEXT_IDENT_SIZE    1024

This macro varible name sounds like the maximum allowed length of ident that
each menory context has. But actually this limits the maximum bytes of ident
to display. So I think that it's better to rename this macro to something like
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE. Thought?

+#define PG_GET_MEMORY_CONTEXTS_COLS    9
+    Datum        values[PG_GET_MEMORY_CONTEXTS_COLS];
+    bool        nulls[PG_GET_MEMORY_CONTEXTS_COLS];

This macro variable name should be PG_GET_BACKEND_MEMORY_CONTEXTS_COLS
for the consistency with the function name?

+{ oid => '2282', descr => 'statistics: information about all memory contexts of local backend',

Isn't it better to remove "statistics: " from the above description?

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>parent</structfield> <type>text</type>

There can be multiple memory contexts with the same name. So I'm afraid
that it's difficult to identify the actual parent memory context from this
"parent" column. This is ok when logging memory contexts by calling
MemoryContextStats() via gdb. Because child memory contexts are printed
just under their parent, with indents. But this doesn't work in the view.
To identify the actual parent memory or calculate the memory contexts tree
from the view, we might need to assign unique ID to each memory context
and display it. But IMO this is overkill. So I'm fine with current "parent"
column. Thought? Do you have any better idea?


Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-07-06 15:16, Fujii Masao wrote:
> On 2020/07/06 12:12, torikoshia wrote:
>> On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao 
>> <masao.fujii@oss.nttdata.com> wrote:
>> 
>> Thanks for your review!
>> 
>>> I like more specific name like pg_backend_memory_contexts.
>> 
>> Agreed.
>> 
>> When I was trying to add this function as statistics function,
>> I thought that naming pg_stat_getbackend_memory_context()
>> might make people regarded it as a "per-backend statistics
>> function", whose parameter is backend ID number.
>> So I removed "backend", but now there is no necessity to do
>> so.
>> 
>>> But I'd like to hear more opinions about the name from others.
>> 
>> I changed the name to pg_backend_memory_contexts for the time
>> being.
> 
> +1
> 
> 
>>>> - function name: pg_get_memory_contexts()
>>>> - source file: mainly src/backend/utils/mmgr/mcxt.c
>> 
>> 
>>>> +       Identification information of the memory context. This field 
>>>> is truncated if the identification field is longer than 1024 
>>>> characters
>>> 
>>> "characters" should be "bytes"?
>> 
>> Fixed, but I used "characters" while referring to the
>> descriptions on the manual of pg_stat_activity.query
>> below.
>> 
>> | By default the query text is truncated at 1024 characters;
>> 
>> It has nothing to do with this thread, but considering
>> multibyte characters, it also may be better to change it
>> to "bytes".
> 
> Yeah, I agree we should write the separate patch fixing that. You will?
> If not, I will do that later.

Thanks, I will try it!

>> Regarding the other comments, I revised the patch as you pointed.
> 
> Thanks for updating the patch! The patch basically looks good to me/
> Here are some minor comments:
> 
> +#define MEMORY_CONTEXT_IDENT_SIZE    1024
> 
> This macro varible name sounds like the maximum allowed length of ident 
> that
> each menory context has. But actually this limits the maximum bytes of 
> ident
> to display. So I think that it's better to rename this macro to 
> something like
> MEMORY_CONTEXT_IDENT_DISPLAY_SIZE. Thought?

Agreed.
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE seems more accurate.

> +#define PG_GET_MEMORY_CONTEXTS_COLS    9
> +    Datum        values[PG_GET_MEMORY_CONTEXTS_COLS];
> +    bool        nulls[PG_GET_MEMORY_CONTEXTS_COLS];
> 
> This macro variable name should be PG_GET_BACKEND_MEMORY_CONTEXTS_COLS
> for the consistency with the function name?

Thanks! Fixed it.

> 
> +{ oid => '2282', descr => 'statistics: information about all memory
> contexts of local backend',
> 
> Isn't it better to remove "statistics: " from the above description?

Yeah, it's my oversight.

> 
> +     <row>
> +      <entry role="catalog_table_entry"><para 
> role="column_definition">
> +       <structfield>parent</structfield> <type>text</type>
> 
> There can be multiple memory contexts with the same name. So I'm afraid
> that it's difficult to identify the actual parent memory context from 
> this
> "parent" column. This is ok when logging memory contexts by calling
> MemoryContextStats() via gdb. Because child memory contexts are printed
> just under their parent, with indents. But this doesn't work in the 
> view.
> To identify the actual parent memory or calculate the memory contexts 
> tree
> from the view, we might need to assign unique ID to each memory context
> and display it. But IMO this is overkill. So I'm fine with current 
> "parent"
> column. Thought? Do you have any better idea?

Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view displayed.

We show each context using a recursive function and this is
a kind of depth-first search. So, as far as I understand,
we can identify its parent using both the "parent" column
and the order of the rows.

Documenting these things may worth for who want to identify
the relation between parents and children.

Of course, in the relational model, the order of relation
does not have meaning so it's also unusual in this sense..


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment

Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/07/07 22:02, torikoshia wrote:
> On 2020-07-06 15:16, Fujii Masao wrote:
>> On 2020/07/06 12:12, torikoshia wrote:
>>> On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>
>>> Thanks for your review!
>>>
>>>> I like more specific name like pg_backend_memory_contexts.
>>>
>>> Agreed.
>>>
>>> When I was trying to add this function as statistics function,
>>> I thought that naming pg_stat_getbackend_memory_context()
>>> might make people regarded it as a "per-backend statistics
>>> function", whose parameter is backend ID number.
>>> So I removed "backend", but now there is no necessity to do
>>> so.
>>>
>>>> But I'd like to hear more opinions about the name from others.
>>>
>>> I changed the name to pg_backend_memory_contexts for the time
>>> being.
>>
>> +1
>>
>>
>>>>> - function name: pg_get_memory_contexts()
>>>>> - source file: mainly src/backend/utils/mmgr/mcxt.c
>>>
>>>
>>>>> +       Identification information of the memory context. This field is truncated if the identification field is
longerthan 1024 characters
 
>>>>
>>>> "characters" should be "bytes"?
>>>
>>> Fixed, but I used "characters" while referring to the
>>> descriptions on the manual of pg_stat_activity.query
>>> below.
>>>
>>> | By default the query text is truncated at 1024 characters;
>>>
>>> It has nothing to do with this thread, but considering
>>> multibyte characters, it also may be better to change it
>>> to "bytes".
>>
>> Yeah, I agree we should write the separate patch fixing that. You will?
>> If not, I will do that later.
> 
> Thanks, I will try it!

Thanks!


> 
>>> Regarding the other comments, I revised the patch as you pointed.
>>
>> Thanks for updating the patch! The patch basically looks good to me/
>> Here are some minor comments:
>>
>> +#define MEMORY_CONTEXT_IDENT_SIZE    1024
>>
>> This macro varible name sounds like the maximum allowed length of ident that
>> each menory context has. But actually this limits the maximum bytes of ident
>> to display. So I think that it's better to rename this macro to something like
>> MEMORY_CONTEXT_IDENT_DISPLAY_SIZE. Thought?
> 
> Agreed.
> MEMORY_CONTEXT_IDENT_DISPLAY_SIZE seems more accurate.
> 
>> +#define PG_GET_MEMORY_CONTEXTS_COLS    9
>> +    Datum        values[PG_GET_MEMORY_CONTEXTS_COLS];
>> +    bool        nulls[PG_GET_MEMORY_CONTEXTS_COLS];
>>
>> This macro variable name should be PG_GET_BACKEND_MEMORY_CONTEXTS_COLS
>> for the consistency with the function name?
> 
> Thanks! Fixed it.

Thanks for updating the patch! It basically looks good to me.

+  <indexterm zone="view-pg-backend-memory-contexts">
+   <primary>backend memory contexts</primary>
+  </indexterm>

Do we need this indexterm?


> 
>>
>> +{ oid => '2282', descr => 'statistics: information about all memory
>> contexts of local backend',
>>
>> Isn't it better to remove "statistics: " from the above description?
> 
> Yeah, it's my oversight.
> 
>>
>> +     <row>
>> +      <entry role="catalog_table_entry"><para role="column_definition">
>> +       <structfield>parent</structfield> <type>text</type>
>>
>> There can be multiple memory contexts with the same name. So I'm afraid
>> that it's difficult to identify the actual parent memory context from this
>> "parent" column. This is ok when logging memory contexts by calling
>> MemoryContextStats() via gdb. Because child memory contexts are printed
>> just under their parent, with indents. But this doesn't work in the view.
>> To identify the actual parent memory or calculate the memory contexts tree
>> from the view, we might need to assign unique ID to each memory context
>> and display it. But IMO this is overkill. So I'm fine with current "parent"
>> column. Thought? Do you have any better idea?
> 
> Indeed.
> I also feel it's not usual to assign a unique ID, which
> can vary every time the view displayed.

Agreed. Displaying such ID would be more confusing to users.
Ok, let's leave the code as it is.

Another comment about parent column is: dynahash can be parent?
If yes, its indent instead of name should be displayed in parent column?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Andres Freund
Date:
Hi,

I think this is an incredibly useful feature.


On 2020-07-07 22:02:10 +0900, torikoshia wrote:
> > There can be multiple memory contexts with the same name. So I'm afraid
> > that it's difficult to identify the actual parent memory context from
> > this
> > "parent" column. This is ok when logging memory contexts by calling
> > MemoryContextStats() via gdb. Because child memory contexts are printed
> > just under their parent, with indents. But this doesn't work in the
> > view.
> > To identify the actual parent memory or calculate the memory contexts
> > tree
> > from the view, we might need to assign unique ID to each memory context
> > and display it. But IMO this is overkill. So I'm fine with current
> > "parent"
> > column. Thought? Do you have any better idea?
> 
> Indeed.
> I also feel it's not usual to assign a unique ID, which
> can vary every time the view displayed.

Hm. I wonder if we just could include the address of the context
itself. There might be reasons not to do so (e.g. security concerns
about leaked pointers making attacks easier), but I think it's worth
considering.


> We show each context using a recursive function and this is
> a kind of depth-first search. So, as far as I understand,
> we can identify its parent using both the "parent" column
> and the order of the rows.
> 
> Documenting these things may worth for who want to identify
> the relation between parents and children.
> 
> Of course, in the relational model, the order of relation
> does not have meaning so it's also unusual in this sense..

It makes it pretty hard to write summarizing queries, so I am not a huge
fan of just relying on the order.


> +/*
> + * PutMemoryContextsStatsTupleStore
> + *        One recursion level for pg_get_backend_memory_contexts.
> + */
> +static void
> +PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
> +                                TupleDesc tupdesc, MemoryContext context,
> +                                MemoryContext parent, int level)
> +{
> +#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS    9
> +    Datum        values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
> +    bool        nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
> +    MemoryContextCounters stat;
> +    MemoryContext child;
> +    const char *name = context->name;
> +    const char *ident = context->ident;
> +
> +    if (context == NULL)
> +        return;
> +
> +    /*
> +     * To be consistent with logging output, we label dynahash contexts
> +     * with just the hash table name as with MemoryContextStatsPrint().
> +     */
> +    if (ident && strcmp(name, "dynahash") == 0)
> +    {
> +        name = ident;
> +        ident = NULL;
> +    }
> +
> +    /* Examine the context itself */
> +    memset(&stat, 0, sizeof(stat));
> +    (*context->methods->stats) (context, NULL, (void *) &level, &stat);
> +
> +    memset(values, 0, sizeof(values));
> +    memset(nulls, 0, sizeof(nulls));
> +
> +    values[0] = CStringGetTextDatum(name);
> +
> +    if (ident)
> +    {
> +        int        idlen = strlen(ident);
> +        char        clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
> +
> +        /*
> +         * Some identifiers such as SQL query string can be very long,
> +         * truncate oversize identifiers.
> +         */
> +        if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
> +            idlen = pg_mbcliplen(ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
> +

Why?

Greetings,

Andres Freund



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-07-08 22:12, Fujii Masao wrote:
> Thanks for updating the patch! It basically looks good to me.
> 
> +  <indexterm zone="view-pg-backend-memory-contexts">
> +   <primary>backend memory contexts</primary>
> +  </indexterm>
> 
> Do we need this indexterm?

Thanks! it's not necessary. I remove this indexterm.


>>> 
>>> +{ oid => '2282', descr => 'statistics: information about all memory
>>> contexts of local backend',
>>> 
>>> Isn't it better to remove "statistics: " from the above description?
>> 
>> Yeah, it's my oversight.
>> 
>>> 
>>> +     <row>
>>> +      <entry role="catalog_table_entry"><para 
>>> role="column_definition">
>>> +       <structfield>parent</structfield> <type>text</type>
>>> 
>>> There can be multiple memory contexts with the same name. So I'm 
>>> afraid
>>> that it's difficult to identify the actual parent memory context from 
>>> this
>>> "parent" column. This is ok when logging memory contexts by calling
>>> MemoryContextStats() via gdb. Because child memory contexts are 
>>> printed
>>> just under their parent, with indents. But this doesn't work in the 
>>> view.
>>> To identify the actual parent memory or calculate the memory contexts 
>>> tree
>>> from the view, we might need to assign unique ID to each memory 
>>> context
>>> and display it. But IMO this is overkill. So I'm fine with current 
>>> "parent"
>>> column. Thought? Do you have any better idea?
>> 
>> Indeed.
>> I also feel it's not usual to assign a unique ID, which
>> can vary every time the view displayed.
> 
> Agreed. Displaying such ID would be more confusing to users.
> Ok, let's leave the code as it is.
> 
> Another comment about parent column is: dynahash can be parent?
> If yes, its indent instead of name should be displayed in parent 
> column?

I'm not sure yet, but considering the changes in the future, it seems
better to do so.

But if we add information for identifying parent-child relation like the
memory address suggested from Andres, it seems not necessary.

So I'd like to go back to this point.



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-07-09 02:03, Andres Freund wrote:
> Hi,
> 
> I think this is an incredibly useful feature.

Thanks for your kind comments and suggestion!


> On 2020-07-07 22:02:10 +0900, torikoshia wrote:
>> > There can be multiple memory contexts with the same name. So I'm afraid
>> > that it's difficult to identify the actual parent memory context from
>> > this
>> > "parent" column. This is ok when logging memory contexts by calling
>> > MemoryContextStats() via gdb. Because child memory contexts are printed
>> > just under their parent, with indents. But this doesn't work in the
>> > view.
>> > To identify the actual parent memory or calculate the memory contexts
>> > tree
>> > from the view, we might need to assign unique ID to each memory context
>> > and display it. But IMO this is overkill. So I'm fine with current
>> > "parent"
>> > column. Thought? Do you have any better idea?
>> 
>> Indeed.
>> I also feel it's not usual to assign a unique ID, which
>> can vary every time the view displayed.
> 
> Hm. I wonder if we just could include the address of the context
> itself. There might be reasons not to do so (e.g. security concerns
> about leaked pointers making attacks easier), but I think it's worth
> considering.


I tried exposing addresses of each context and their parent.
Attached a poc patch.

   =# SELECT name, address, parent_address, total_bytes FROM 
pg_backend_memory_contexts ;

            name           |  address  | parent_address | total_bytes
   --------------------------+-----------+----------------+-------------
    TopMemoryContext         | 0x1280da0 |                |       80800
    TopTransactionContext    | 0x1309040 | 0x1280da0      |        8192
    Prepared Queries         | 0x138a480 | 0x1280da0      |       16384
    Type information cache   | 0x134b8c0 | 0x1280da0      |       24624
    ...
    CacheMemoryContext       | 0x12cb390 | 0x1280da0      |     1048576
    CachedPlanSource         | 0x13c47f0 | 0x12cb390      |        4096
    CachedPlanQuery          | 0x13c9ae0 | 0x13c47f0      |        4096
    CachedPlanSource         | 0x13c7310 | 0x12cb390      |        4096
    CachedPlanQuery          | 0x13c1230 | 0x13c7310      |        4096
    ...


Now it's possible to identify the actual parent memory context even when
there are multiple memory contexts with the same name.

I'm not sure, but I'm also worrying about this might incur some security
related problems..

I'd like to hear more opinions about:

- whether information for identifying parent-child relation is necessary 
or it's an overkill
- if this information is necessary, memory address is suitable or other 
means like assigning unique numbers are required


>> +/*
>> + * PutMemoryContextsStatsTupleStore
>> + *        One recursion level for pg_get_backend_memory_contexts.
>> + */
>> +static void
>> +PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
>> +                                TupleDesc tupdesc, MemoryContext context,
>> +                                MemoryContext parent, int level)
>> +{
>> +#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS    9
>> +    Datum        values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
>> +    bool        nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
>> +    MemoryContextCounters stat;
>> +    MemoryContext child;
>> +    const char *name = context->name;
>> +    const char *ident = context->ident;
>> +
>> +    if (context == NULL)
>> +        return;
>> +
>> +    /*
>> +     * To be consistent with logging output, we label dynahash contexts
>> +     * with just the hash table name as with MemoryContextStatsPrint().
>> +     */
>> +    if (ident && strcmp(name, "dynahash") == 0)
>> +    {
>> +        name = ident;
>> +        ident = NULL;
>> +    }
>> +
>> +    /* Examine the context itself */
>> +    memset(&stat, 0, sizeof(stat));
>> +    (*context->methods->stats) (context, NULL, (void *) &level, &stat);
>> +
>> +    memset(values, 0, sizeof(values));
>> +    memset(nulls, 0, sizeof(nulls));
>> +
>> +    values[0] = CStringGetTextDatum(name);
>> +
>> +    if (ident)
>> +    {
>> +        int        idlen = strlen(ident);
>> +        char        clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
>> +
>> +        /*
>> +         * Some identifiers such as SQL query string can be very long,
>> +         * truncate oversize identifiers.
>> +         */
>> +        if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
>> +            idlen = pg_mbcliplen(ident, idlen, 
>> MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
>> +
> 
> Why?

As described below[1], too long messages caused problems in the past and 
now
MemoryContextStatsPrint() truncates ident, so I decided to truncate it 
also
here.

Do you think it's not necessary here?

[1] https://www.postgresql.org/message-id/12319.1521999065@sss.pgh.pa.us


Regards,


--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment

Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/07/10 17:32, torikoshia wrote:
> On 2020-07-09 02:03, Andres Freund wrote:
>> Hi,
>>
>> I think this is an incredibly useful feature.
> 
> Thanks for your kind comments and suggestion!
> 
> 
>> On 2020-07-07 22:02:10 +0900, torikoshia wrote:
>>> > There can be multiple memory contexts with the same name. So I'm afraid
>>> > that it's difficult to identify the actual parent memory context from
>>> > this
>>> > "parent" column. This is ok when logging memory contexts by calling
>>> > MemoryContextStats() via gdb. Because child memory contexts are printed
>>> > just under their parent, with indents. But this doesn't work in the
>>> > view.
>>> > To identify the actual parent memory or calculate the memory contexts
>>> > tree
>>> > from the view, we might need to assign unique ID to each memory context
>>> > and display it. But IMO this is overkill. So I'm fine with current
>>> > "parent"
>>> > column. Thought? Do you have any better idea?
>>>
>>> Indeed.
>>> I also feel it's not usual to assign a unique ID, which
>>> can vary every time the view displayed.
>>
>> Hm. I wonder if we just could include the address of the context
>> itself. There might be reasons not to do so (e.g. security concerns
>> about leaked pointers making attacks easier), but I think it's worth
>> considering.
> 
> 
> I tried exposing addresses of each context and their parent.
> Attached a poc patch.
> 
>    =# SELECT name, address, parent_address, total_bytes FROM pg_backend_memory_contexts ;
> 
>             name           |  address  | parent_address | total_bytes
>    --------------------------+-----------+----------------+-------------
>     TopMemoryContext         | 0x1280da0 |                |       80800
>     TopTransactionContext    | 0x1309040 | 0x1280da0      |        8192
>     Prepared Queries         | 0x138a480 | 0x1280da0      |       16384
>     Type information cache   | 0x134b8c0 | 0x1280da0      |       24624
>     ...
>     CacheMemoryContext       | 0x12cb390 | 0x1280da0      |     1048576
>     CachedPlanSource         | 0x13c47f0 | 0x12cb390      |        4096
>     CachedPlanQuery          | 0x13c9ae0 | 0x13c47f0      |        4096
>     CachedPlanSource         | 0x13c7310 | 0x12cb390      |        4096
>     CachedPlanQuery          | 0x13c1230 | 0x13c7310      |        4096
>     ...
> 
> 
> Now it's possible to identify the actual parent memory context even when
> there are multiple memory contexts with the same name.
> 
> I'm not sure, but I'm also worrying about this might incur some security
> related problems..
> 
> I'd like to hear more opinions about:
> 
> - whether information for identifying parent-child relation is necessary or it's an overkill
> - if this information is necessary, memory address is suitable or other means like assigning unique numbers are
required

To consider this, I'd like to know what security issue can actually
happen when memory addresses are exposed. I have no idea about this..

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Kasahara Tatsuhito
Date:
Hi,

On Fri, Jul 10, 2020 at 5:32 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> - whether information for identifying parent-child relation is necessary
> or it's an overkill
I think it's important to understand the parent-child relationship of
the context.
Personally, I often want to know the following two things ..

- In which life cycle is the target context? (Remaining as long as the
process is living? per query?)
- Does the target context belong to the correct (parent) context?

> - if this information is necessary, memory address is suitable or other
> means like assigning unique numbers are required
IMO, If each context can be uniquely identified (or easily guessed) by
"name" and "ident",
then I don't think the address information is necessary.
Instead, I like the way that directly shows the context name of the
parent, as in the 0005 patch.

Best regards

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-07-30 15:13, Kasahara Tatsuhito wrote:
> Hi,
> 
> On Fri, Jul 10, 2020 at 5:32 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> - whether information for identifying parent-child relation is 
>> necessary
>> or it's an overkill
> I think it's important to understand the parent-child relationship of
> the context.
> Personally, I often want to know the following two things ..
> 
> - In which life cycle is the target context? (Remaining as long as the
> process is living? per query?)
> - Does the target context belong to the correct (parent) context?
> 
>> - if this information is necessary, memory address is suitable or 
>> other
>> means like assigning unique numbers are required
> IMO, If each context can be uniquely identified (or easily guessed) by
> "name" and "ident",
> then I don't think the address information is necessary.
> Instead, I like the way that directly shows the context name of the
> parent, as in the 0005 patch.

Thanks for your opinion!

I also feel it'll be sufficient to know not the exact memory context
of the parent but the name of the parent context.

And as Fujii-san told me in person, exposing memory address seems
not preferable considering there are security techniques like
address space layout randomization.



> On 2020-07-10 08:30:22 +0900, torikoshia wrote:
>> On 2020-07-08 22:12, Fujii Masao wrote:

>> Another comment about parent column is: dynahash can be parent?
>> If yes, its indent instead of name should be displayed in parent
>> column?

> I'm not sure yet, but considering the changes in the future, it seems
> better to do so.

Attached a patch which displays ident as parent when dynahash is a
parent.

I could not find the case when dynahash can be a parent so I tested it
using attached test purposed patch.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment

Re: Creating a function for exposing memory usage of backend process

From
Robert Haas
Date:
On Fri, Jul 31, 2020 at 4:25 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> And as Fujii-san told me in person, exposing memory address seems
> not preferable considering there are security techniques like
> address space layout randomization.

Yeah, exactly. ASLR wouldn't do anything to improve security if there
were no other security bugs, but there are, and some of those bugs are
harder to exploit if you don't know the precise memory addresses of
certain data structures. Similarly, exposing the addresses of our
internal data structures is harmless if we have no other security
bugs, but if we do, it might make those bugs easier to exploit. I
don't think this information is useful enough to justify taking that
risk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Creating a function for exposing memory usage of backend process

From
Kasahara Tatsuhito
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

I tested the latest patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch) 
with the latest PG-version (199cec9779504c08aaa8159c6308283156547409) and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer

Re: Creating a function for exposing memory usage of backend process

From
Michael Paquier
Date:
On Fri, Jul 31, 2020 at 03:23:52PM -0400, Robert Haas wrote:
> On Fri, Jul 31, 2020 at 4:25 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>> And as Fujii-san told me in person, exposing memory address seems
>> not preferable considering there are security techniques like
>> address space layout randomization.
>
> Yeah, exactly. ASLR wouldn't do anything to improve security if there
> were no other security bugs, but there are, and some of those bugs are
> harder to exploit if you don't know the precise memory addresses of
> certain data structures. Similarly, exposing the addresses of our
> internal data structures is harmless if we have no other security
> bugs, but if we do, it might make those bugs easier to exploit. I
> don't think this information is useful enough to justify taking that
> risk.

FWIW, this is the class of issues where it is possible to print some
areas of memory, or even manipulate the stack so as it was possible to
pass down a custom pointer, so exposing the pointer locations is a
real risk, and this has happened in the past.  Anyway, it seems to me
that if this part is done, we could just make it superuser-only with
restrictive REVOKE privileges, but I am not sure that we have enough
user cases to justify this addition.
--
Michael

Attachment

Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-08-08 10:44, Michael Paquier wrote:
> On Fri, Jul 31, 2020 at 03:23:52PM -0400, Robert Haas wrote:
>> On Fri, Jul 31, 2020 at 4:25 AM torikoshia 
>> <torikoshia@oss.nttdata.com> wrote:
>>> And as Fujii-san told me in person, exposing memory address seems
>>> not preferable considering there are security techniques like
>>> address space layout randomization.
>> 
>> Yeah, exactly. ASLR wouldn't do anything to improve security if there
>> were no other security bugs, but there are, and some of those bugs are
>> harder to exploit if you don't know the precise memory addresses of
>> certain data structures. Similarly, exposing the addresses of our
>> internal data structures is harmless if we have no other security
>> bugs, but if we do, it might make those bugs easier to exploit. I
>> don't think this information is useful enough to justify taking that
>> risk.
> 
> FWIW, this is the class of issues where it is possible to print some
> areas of memory, or even manipulate the stack so as it was possible to
> pass down a custom pointer, so exposing the pointer locations is a
> real risk, and this has happened in the past.  Anyway, it seems to me
> that if this part is done, we could just make it superuser-only with
> restrictive REVOKE privileges, but I am not sure that we have enough
> user cases to justify this addition.


Thanks for your comments!

I convinced that exposing pointer locations introduce security risks
and it seems better not to do so.

And I now feel identifying exact memory context by exposing memory
address or other means seems overkill.
Showing just the context name of the parent would be sufficient and
0007 pattch takes this way.


On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
> The following review has been posted through the commitfest 
> application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           not tested
> Documentation:            tested, passed
> 
> I tested the latest
> patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
> with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
> and test was passed.
> It looks good to me.
> 
> The new status of this patch is: Ready for Committer

Thanks for your testing!


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/11 15:24, torikoshia wrote:
> On 2020-08-08 10:44, Michael Paquier wrote:
>> On Fri, Jul 31, 2020 at 03:23:52PM -0400, Robert Haas wrote:
>>> On Fri, Jul 31, 2020 at 4:25 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>>>> And as Fujii-san told me in person, exposing memory address seems
>>>> not preferable considering there are security techniques like
>>>> address space layout randomization.
>>>
>>> Yeah, exactly. ASLR wouldn't do anything to improve security if there
>>> were no other security bugs, but there are, and some of those bugs are
>>> harder to exploit if you don't know the precise memory addresses of
>>> certain data structures. Similarly, exposing the addresses of our
>>> internal data structures is harmless if we have no other security
>>> bugs, but if we do, it might make those bugs easier to exploit. I
>>> don't think this information is useful enough to justify taking that
>>> risk.
>>
>> FWIW, this is the class of issues where it is possible to print some
>> areas of memory, or even manipulate the stack so as it was possible to
>> pass down a custom pointer, so exposing the pointer locations is a
>> real risk, and this has happened in the past.  Anyway, it seems to me
>> that if this part is done, we could just make it superuser-only with
>> restrictive REVOKE privileges, but I am not sure that we have enough
>> user cases to justify this addition.
> 
> 
> Thanks for your comments!
> 
> I convinced that exposing pointer locations introduce security risks
> and it seems better not to do so.
> 
> And I now feel identifying exact memory context by exposing memory
> address or other means seems overkill.
> Showing just the context name of the parent would be sufficient and
> 0007 pattch takes this way.

Agreed.


> 
> 
> On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  tested, passed
>> Implements feature:       tested, passed
>> Spec compliant:           not tested
>> Documentation:            tested, passed
>>
>> I tested the latest
>> patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
>> with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
>> and test was passed.
>> It looks good to me.
>>
>> The new status of this patch is: Ready for Committer
> 
> Thanks for your testing!

Thanks for updating the patch! Here are the review comments.

+     <row>
+      <entry><link
linkend="view-pg-backend-memory-contexts"><structname>pg_backend_memory_contexts</structname></link></entry>
+      <entry>backend memory contexts</entry>
+     </row>

The above is located just after pg_matviews entry. But it should be located
just after pg_available_extension_versions entry. Because the rows in the table
"System Views" should be located in alphabetical order.


+ <sect1 id="view-pg-backend-memory-contexts">
+  <title><structname>pg_backend_memory_contexts</structname></title>

Same as above.


+   The view <structname>pg_backend_memory_contexts</structname> displays all
+   the local backend memory contexts.

This description seems a bit confusing because maybe we can interpret this
as "... displays the memory contexts of all the local backends" wrongly. Thought?
What about the following description, instead?

     The view <structname>pg_backend_memory_contexts</structname> displays all
     the memory contexts of the server process attached to the current session.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+        return;

The above check "context == NULL" is useless? If "context" is actually NULL,
"context->name" would cause segmentation fault, so ISTM that the check
will never be performed.

If "context" can be NULL, the check should be performed before accessing
to "contect". OTOH, if "context" must not be NULL per the specification of
PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/17 21:14, Fujii Masao wrote:
> 
> 
> On 2020/08/11 15:24, torikoshia wrote:
>> On 2020-08-08 10:44, Michael Paquier wrote:
>>> On Fri, Jul 31, 2020 at 03:23:52PM -0400, Robert Haas wrote:
>>>> On Fri, Jul 31, 2020 at 4:25 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>>>>> And as Fujii-san told me in person, exposing memory address seems
>>>>> not preferable considering there are security techniques like
>>>>> address space layout randomization.
>>>>
>>>> Yeah, exactly. ASLR wouldn't do anything to improve security if there
>>>> were no other security bugs, but there are, and some of those bugs are
>>>> harder to exploit if you don't know the precise memory addresses of
>>>> certain data structures. Similarly, exposing the addresses of our
>>>> internal data structures is harmless if we have no other security
>>>> bugs, but if we do, it might make those bugs easier to exploit. I
>>>> don't think this information is useful enough to justify taking that
>>>> risk.
>>>
>>> FWIW, this is the class of issues where it is possible to print some
>>> areas of memory, or even manipulate the stack so as it was possible to
>>> pass down a custom pointer, so exposing the pointer locations is a
>>> real risk, and this has happened in the past.  Anyway, it seems to me
>>> that if this part is done, we could just make it superuser-only with
>>> restrictive REVOKE privileges, but I am not sure that we have enough
>>> user cases to justify this addition.
>>
>>
>> Thanks for your comments!
>>
>> I convinced that exposing pointer locations introduce security risks
>> and it seems better not to do so.
>>
>> And I now feel identifying exact memory context by exposing memory
>> address or other means seems overkill.
>> Showing just the context name of the parent would be sufficient and
>> 0007 pattch takes this way.
> 
> Agreed.
> 
> 
>>
>>
>> On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
>>> The following review has been posted through the commitfest application:
>>> make installcheck-world:  tested, passed
>>> Implements feature:       tested, passed
>>> Spec compliant:           not tested
>>> Documentation:            tested, passed
>>>
>>> I tested the latest
>>> patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
>>> with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
>>> and test was passed.
>>> It looks good to me.
>>>
>>> The new status of this patch is: Ready for Committer
>>
>> Thanks for your testing!
> 
> Thanks for updating the patch! Here are the review comments.
> 
> +     <row>
> +      <entry><link
linkend="view-pg-backend-memory-contexts"><structname>pg_backend_memory_contexts</structname></link></entry>
> +      <entry>backend memory contexts</entry>
> +     </row>
> 
> The above is located just after pg_matviews entry. But it should be located
> just after pg_available_extension_versions entry. Because the rows in the table
> "System Views" should be located in alphabetical order.
> 
> 
> + <sect1 id="view-pg-backend-memory-contexts">
> +  <title><structname>pg_backend_memory_contexts</structname></title>
> 
> Same as above.
> 
> 
> +   The view <structname>pg_backend_memory_contexts</structname> displays all
> +   the local backend memory contexts.
> 
> This description seems a bit confusing because maybe we can interpret this
> as "... displays the memory contexts of all the local backends" wrongly. Thought?
> What about the following description, instead?
> 
>      The view <structname>pg_backend_memory_contexts</structname> displays all
>      the memory contexts of the server process attached to the current session.
> 
> 
> +    const char *name = context->name;
> +    const char *ident = context->ident;
> +
> +    if (context == NULL)
> +        return;
> 
> The above check "context == NULL" is useless? If "context" is actually NULL,
> "context->name" would cause segmentation fault, so ISTM that the check
> will never be performed.
> 
> If "context" can be NULL, the check should be performed before accessing
> to "contect". OTOH, if "context" must not be NULL per the specification of
> PutMemoryContextStatsTupleStore(), assertion test checking
> "context != NULL" should be used here, instead?

Here is another comment.

+    if (parent == NULL)
+        nulls[2] = true;
+    else
+        /*
+         * We labeled dynahash contexts with just the hash table name.
+         * To make it possible to identify its parent, we also display
+         * parent's ident here.
+         */
+        if (parent->ident && strcmp(parent->name, "dynahash") == 0)
+            values[2] = CStringGetTextDatum(parent->ident);
+        else
+            values[2] = CStringGetTextDatum(parent->name);

PutMemoryContextsStatsTupleStore() doesn't need "parent" memory context,
but uses only the name of "parent" memory context. So isn't it better to use
"const char *parent" instead of "MemoryContext parent", as the argument of
the function? If we do that, we can simplify the above code.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-08-17 21:19, Fujii Masao wrote:
> On 2020/08/17 21:14, Fujii Masao wrote:
>>> On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
>>>> The following review has been posted through the commitfest 
>>>> application:
>>>> make installcheck-world:  tested, passed
>>>> Implements feature:       tested, passed
>>>> Spec compliant:           not tested
>>>> Documentation:            tested, passed
>>>> 
>>>> I tested the latest
>>>> patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
>>>> with the latest PG-version 
>>>> (199cec9779504c08aaa8159c6308283156547409)
>>>> and test was passed.
>>>> It looks good to me.
>>>> 
>>>> The new status of this patch is: Ready for Committer
>>> 
>>> Thanks for your testing!
>> 
>> Thanks for updating the patch! Here are the review comments.

Thanks for reviewing!

>> 
>> +     <row>
>> +      <entry><link 
>> linkend="view-pg-backend-memory-contexts"><structname>pg_backend_memory_contexts</structname></link></entry>
>> +      <entry>backend memory contexts</entry>
>> +     </row>
>> 
>> The above is located just after pg_matviews entry. But it should be 
>> located
>> just after pg_available_extension_versions entry. Because the rows in 
>> the table
>> "System Views" should be located in alphabetical order.
>> 
>> 
>> + <sect1 id="view-pg-backend-memory-contexts">
>> +  <title><structname>pg_backend_memory_contexts</structname></title>
>> 
>> Same as above.

Modified both.

>> 
>> 
>> +   The view <structname>pg_backend_memory_contexts</structname> 
>> displays all
>> +   the local backend memory contexts.
>> 
>> This description seems a bit confusing because maybe we can interpret 
>> this
>> as "... displays the memory contexts of all the local backends" 
>> wrongly. Thought?
>> What about the following description, instead?

>>      The view <structname>pg_backend_memory_contexts</structname> 
>> displays all
>>      the memory contexts of the server process attached to the current 
>> session.

Thanks! it seems better.

>> +    const char *name = context->name;
>> +    const char *ident = context->ident;
>> +
>> +    if (context == NULL)
>> +        return;
>> 
>> The above check "context == NULL" is useless? If "context" is actually 
>> NULL,
>> "context->name" would cause segmentation fault, so ISTM that the check
>> will never be performed.
>> 
>> If "context" can be NULL, the check should be performed before 
>> accessing
>> to "contect". OTOH, if "context" must not be NULL per the 
>> specification of
>> PutMemoryContextStatsTupleStore(), assertion test checking
>> "context != NULL" should be used here, instead?

Yeah, "context" cannot be NULL because "context" must be 
TopMemoryContext
or it is already checked as not NULL as follows(child != NULL).

I added the assertion check.

| for (child = context->firstchild; child != NULL; child = 
child->nextchild)
| {
|  ...
|         PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
|                                                           child, 
parentname, level + 1);
| }

> Here is another comment.
> 
> +     if (parent == NULL)
> +             nulls[2] = true;
> +     else
> +             /*
> +              * We labeled dynahash contexts with just the hash table 
> name.
> +              * To make it possible to identify its parent, we also 
> display
> +              * parent's ident here.
> +              */
> +             if (parent->ident && strcmp(parent->name, "dynahash") == 
> 0)
> +                     values[2] = CStringGetTextDatum(parent->ident);
> +             else
> +                     values[2] = CStringGetTextDatum(parent->name);
> 
> PutMemoryContextsStatsTupleStore() doesn't need "parent" memory 
> context,
> but uses only the name of "parent" memory context. So isn't it better 
> to use
> "const char *parent" instead of "MemoryContext parent", as the argument 
> of
> the function? If we do that, we can simplify the above code.

Thanks, the attached patch adopted the advice.

However, since PutMemoryContextsStatsTupleStore() used not only the name
but also the ident of the "parent", I could not help but adding similar
codes before calling the function.
The total amount of codes and complexity seem not to change so much.

Any thoughts? Am I misunderstanding something?


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment

Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/18 18:41, torikoshia wrote:
> On 2020-08-17 21:19, Fujii Masao wrote:
>> On 2020/08/17 21:14, Fujii Masao wrote:
>>>> On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
>>>>> The following review has been posted through the commitfest application:
>>>>> make installcheck-world:  tested, passed
>>>>> Implements feature:       tested, passed
>>>>> Spec compliant:           not tested
>>>>> Documentation:            tested, passed
>>>>>
>>>>> I tested the latest
>>>>> patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
>>>>> with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
>>>>> and test was passed.
>>>>> It looks good to me.
>>>>>
>>>>> The new status of this patch is: Ready for Committer
>>>>
>>>> Thanks for your testing!
>>>
>>> Thanks for updating the patch! Here are the review comments.
> 
> Thanks for reviewing!
> 
>>>
>>> +     <row>
>>> +      <entry><link
linkend="view-pg-backend-memory-contexts"><structname>pg_backend_memory_contexts</structname></link></entry>
>>> +      <entry>backend memory contexts</entry>
>>> +     </row>
>>>
>>> The above is located just after pg_matviews entry. But it should be located
>>> just after pg_available_extension_versions entry. Because the rows in the table
>>> "System Views" should be located in alphabetical order.
>>>
>>>
>>> + <sect1 id="view-pg-backend-memory-contexts">
>>> +  <title><structname>pg_backend_memory_contexts</structname></title>
>>>
>>> Same as above.
> 
> Modified both.
> 
>>>
>>>
>>> +   The view <structname>pg_backend_memory_contexts</structname> displays all
>>> +   the local backend memory contexts.
>>>
>>> This description seems a bit confusing because maybe we can interpret this
>>> as "... displays the memory contexts of all the local backends" wrongly. Thought?
>>> What about the following description, instead?
> 
>>>      The view <structname>pg_backend_memory_contexts</structname> displays all
>>>      the memory contexts of the server process attached to the current session.
> 
> Thanks! it seems better.
> 
>>> +    const char *name = context->name;
>>> +    const char *ident = context->ident;
>>> +
>>> +    if (context == NULL)
>>> +        return;
>>>
>>> The above check "context == NULL" is useless? If "context" is actually NULL,
>>> "context->name" would cause segmentation fault, so ISTM that the check
>>> will never be performed.
>>>
>>> If "context" can be NULL, the check should be performed before accessing
>>> to "contect". OTOH, if "context" must not be NULL per the specification of
>>> PutMemoryContextStatsTupleStore(), assertion test checking
>>> "context != NULL" should be used here, instead?
> 
> Yeah, "context" cannot be NULL because "context" must be TopMemoryContext
> or it is already checked as not NULL as follows(child != NULL).
> 
> I added the assertion check.

Isn't it better to add AssertArg(MemoryContextIsValid(context)), instead?


> 
> | for (child = context->firstchild; child != NULL; child = child->nextchild)
> | {
> |  ...
> |         PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
> |                                                           child, parentname, level + 1);
> | }
> 
>> Here is another comment.
>>
>> +     if (parent == NULL)
>> +             nulls[2] = true;
>> +     else
>> +             /*
>> +              * We labeled dynahash contexts with just the hash table name.
>> +              * To make it possible to identify its parent, we also display
>> +              * parent's ident here.
>> +              */
>> +             if (parent->ident && strcmp(parent->name, "dynahash") == 0)
>> +                     values[2] = CStringGetTextDatum(parent->ident);
>> +             else
>> +                     values[2] = CStringGetTextDatum(parent->name);
>>
>> PutMemoryContextsStatsTupleStore() doesn't need "parent" memory context,
>> but uses only the name of "parent" memory context. So isn't it better to use
>> "const char *parent" instead of "MemoryContext parent", as the argument of
>> the function? If we do that, we can simplify the above code.
> 
> Thanks, the attached patch adopted the advice.
> 
> However, since PutMemoryContextsStatsTupleStore() used not only the name
> but also the ident of the "parent", I could not help but adding similar
> codes before calling the function.
> The total amount of codes and complexity seem not to change so much.
> 
> Any thoughts? Am I misunderstanding something?

I was thinking that we can simplify the code as follows.
That is, we can just pass "name" as the argument of PutMemoryContextsStatsTupleStore()
since "name" indicates context->name or ident (if name is "dynahash").

      for (child = context->firstchild; child != NULL; child = child->nextchild)
      {
-        const char *parentname;
-
-        /*
-         * We labeled dynahash contexts with just the hash table name.
-         * To make it possible to identify its parent, we also use
-         * the hash table as its context name.
-         */
-        if (context->ident && strcmp(context->name, "dynahash") == 0)
-            parentname = context->ident;
-        else
-            parentname = context->name;
-
          PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
-                                  child, parentname, level + 1);
+                                  child, name, level + 1);

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-08-18 22:54, Fujii Masao wrote:
> On 2020/08/18 18:41, torikoshia wrote:
>> On 2020-08-17 21:19, Fujii Masao wrote:
>>> On 2020/08/17 21:14, Fujii Masao wrote:
>>>>> On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
>>>>>> The following review has been posted through the commitfest 
>>>>>> application:
>>>>>> make installcheck-world:  tested, passed
>>>>>> Implements feature:       tested, passed
>>>>>> Spec compliant:           not tested
>>>>>> Documentation:            tested, passed
>>>>>> 
>>>>>> I tested the latest
>>>>>> patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
>>>>>> with the latest PG-version 
>>>>>> (199cec9779504c08aaa8159c6308283156547409)
>>>>>> and test was passed.
>>>>>> It looks good to me.
>>>>>> 
>>>>>> The new status of this patch is: Ready for Committer
>>>>> 
>>>>> Thanks for your testing!
>>>> 
>>>> Thanks for updating the patch! Here are the review comments.
>> 
>> Thanks for reviewing!
>> 
>>>> 
>>>> +     <row>
>>>> +      <entry><link 
>>>> linkend="view-pg-backend-memory-contexts"><structname>pg_backend_memory_contexts</structname></link></entry>
>>>> +      <entry>backend memory contexts</entry>
>>>> +     </row>
>>>> 
>>>> The above is located just after pg_matviews entry. But it should be 
>>>> located
>>>> just after pg_available_extension_versions entry. Because the rows 
>>>> in the table
>>>> "System Views" should be located in alphabetical order.
>>>> 
>>>> 
>>>> + <sect1 id="view-pg-backend-memory-contexts">
>>>> +  
>>>> <title><structname>pg_backend_memory_contexts</structname></title>
>>>> 
>>>> Same as above.
>> 
>> Modified both.
>> 
>>>> 
>>>> 
>>>> +   The view <structname>pg_backend_memory_contexts</structname> 
>>>> displays all
>>>> +   the local backend memory contexts.
>>>> 
>>>> This description seems a bit confusing because maybe we can 
>>>> interpret this
>>>> as "... displays the memory contexts of all the local backends" 
>>>> wrongly. Thought?
>>>> What about the following description, instead?
>> 
>>>>      The view <structname>pg_backend_memory_contexts</structname> 
>>>> displays all
>>>>      the memory contexts of the server process attached to the 
>>>> current session.
>> 
>> Thanks! it seems better.
>> 
>>>> +    const char *name = context->name;
>>>> +    const char *ident = context->ident;
>>>> +
>>>> +    if (context == NULL)
>>>> +        return;
>>>> 
>>>> The above check "context == NULL" is useless? If "context" is 
>>>> actually NULL,
>>>> "context->name" would cause segmentation fault, so ISTM that the 
>>>> check
>>>> will never be performed.
>>>> 
>>>> If "context" can be NULL, the check should be performed before 
>>>> accessing
>>>> to "contect". OTOH, if "context" must not be NULL per the 
>>>> specification of
>>>> PutMemoryContextStatsTupleStore(), assertion test checking
>>>> "context != NULL" should be used here, instead?
>> 
>> Yeah, "context" cannot be NULL because "context" must be 
>> TopMemoryContext
>> or it is already checked as not NULL as follows(child != NULL).
>> 
>> I added the assertion check.
> 
> Isn't it better to add AssertArg(MemoryContextIsValid(context)), 
> instead?

Thanks, that's better.

>> 
>> | for (child = context->firstchild; child != NULL; child = 
>> child->nextchild)
>> | {
>> |  ...
>> |         PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>> |                                                           child, 
>> parentname, level + 1);
>> | }
>> 
>>> Here is another comment.
>>> 
>>> +     if (parent == NULL)
>>> +             nulls[2] = true;
>>> +     else
>>> +             /*
>>> +              * We labeled dynahash contexts with just the hash 
>>> table name.
>>> +              * To make it possible to identify its parent, we also 
>>> display
>>> +              * parent's ident here.
>>> +              */
>>> +             if (parent->ident && strcmp(parent->name, "dynahash") 
>>> == 0)
>>> +                     values[2] = CStringGetTextDatum(parent->ident);
>>> +             else
>>> +                     values[2] = CStringGetTextDatum(parent->name);
>>> 
>>> PutMemoryContextsStatsTupleStore() doesn't need "parent" memory 
>>> context,
>>> but uses only the name of "parent" memory context. So isn't it better 
>>> to use
>>> "const char *parent" instead of "MemoryContext parent", as the 
>>> argument of
>>> the function? If we do that, we can simplify the above code.
>> 
>> Thanks, the attached patch adopted the advice.
>> 
>> However, since PutMemoryContextsStatsTupleStore() used not only the 
>> name
>> but also the ident of the "parent", I could not help but adding 
>> similar
>> codes before calling the function.
>> The total amount of codes and complexity seem not to change so much.
>> 
>> Any thoughts? Am I misunderstanding something?
> 
> I was thinking that we can simplify the code as follows.
> That is, we can just pass "name" as the argument of
> PutMemoryContextsStatsTupleStore()
> since "name" indicates context->name or ident (if name is "dynahash").
> 
>      for (child = context->firstchild; child != NULL; child = 
> child->nextchild)
>      {
> -        const char *parentname;
> -
> -        /*
> -         * We labeled dynahash contexts with just the hash table name.
> -         * To make it possible to identify its parent, we also use
> -         * the hash table as its context name.
> -         */
> -        if (context->ident && strcmp(context->name, "dynahash") == 0)
> -            parentname = context->ident;
> -        else
> -            parentname = context->name;
> -
>          PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
> -                                  child, parentname, level + 1);
> +                                  child, name, level + 1);

I got it, thanks for the clarification!

Attached a revised patch.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment

Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/19 9:43, torikoshia wrote:
> On 2020-08-18 22:54, Fujii Masao wrote:
>> On 2020/08/18 18:41, torikoshia wrote:
>>> On 2020-08-17 21:19, Fujii Masao wrote:
>>>> On 2020/08/17 21:14, Fujii Masao wrote:
>>>>>> On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
>>>>>>> The following review has been posted through the commitfest application:
>>>>>>> make installcheck-world:  tested, passed
>>>>>>> Implements feature:       tested, passed
>>>>>>> Spec compliant:           not tested
>>>>>>> Documentation:            tested, passed
>>>>>>>
>>>>>>> I tested the latest
>>>>>>> patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
>>>>>>> with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
>>>>>>> and test was passed.
>>>>>>> It looks good to me.
>>>>>>>
>>>>>>> The new status of this patch is: Ready for Committer
>>>>>>
>>>>>> Thanks for your testing!
>>>>>
>>>>> Thanks for updating the patch! Here are the review comments.
>>>
>>> Thanks for reviewing!
>>>
>>>>>
>>>>> +     <row>
>>>>> +      <entry><link
linkend="view-pg-backend-memory-contexts"><structname>pg_backend_memory_contexts</structname></link></entry>
>>>>> +      <entry>backend memory contexts</entry>
>>>>> +     </row>
>>>>>
>>>>> The above is located just after pg_matviews entry. But it should be located
>>>>> just after pg_available_extension_versions entry. Because the rows in the table
>>>>> "System Views" should be located in alphabetical order.
>>>>>
>>>>>
>>>>> + <sect1 id="view-pg-backend-memory-contexts">
>>>>> + <title><structname>pg_backend_memory_contexts</structname></title>
>>>>>
>>>>> Same as above.
>>>
>>> Modified both.
>>>
>>>>>
>>>>>
>>>>> +   The view <structname>pg_backend_memory_contexts</structname> displays all
>>>>> +   the local backend memory contexts.
>>>>>
>>>>> This description seems a bit confusing because maybe we can interpret this
>>>>> as "... displays the memory contexts of all the local backends" wrongly. Thought?
>>>>> What about the following description, instead?
>>>
>>>>>      The view <structname>pg_backend_memory_contexts</structname> displays all
>>>>>      the memory contexts of the server process attached to the current session.
>>>
>>> Thanks! it seems better.
>>>
>>>>> +    const char *name = context->name;
>>>>> +    const char *ident = context->ident;
>>>>> +
>>>>> +    if (context == NULL)
>>>>> +        return;
>>>>>
>>>>> The above check "context == NULL" is useless? If "context" is actually NULL,
>>>>> "context->name" would cause segmentation fault, so ISTM that the check
>>>>> will never be performed.
>>>>>
>>>>> If "context" can be NULL, the check should be performed before accessing
>>>>> to "contect". OTOH, if "context" must not be NULL per the specification of
>>>>> PutMemoryContextStatsTupleStore(), assertion test checking
>>>>> "context != NULL" should be used here, instead?
>>>
>>> Yeah, "context" cannot be NULL because "context" must be TopMemoryContext
>>> or it is already checked as not NULL as follows(child != NULL).
>>>
>>> I added the assertion check.
>>
>> Isn't it better to add AssertArg(MemoryContextIsValid(context)), instead?
> 
> Thanks, that's better.
> 
>>>
>>> | for (child = context->firstchild; child != NULL; child = child->nextchild)
>>> | {
>>> |  ...
>>> |         PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>>> |                                                           child, parentname, level + 1);
>>> | }
>>>
>>>> Here is another comment.
>>>>
>>>> +     if (parent == NULL)
>>>> +             nulls[2] = true;
>>>> +     else
>>>> +             /*
>>>> +              * We labeled dynahash contexts with just the hash table name.
>>>> +              * To make it possible to identify its parent, we also display
>>>> +              * parent's ident here.
>>>> +              */
>>>> +             if (parent->ident && strcmp(parent->name, "dynahash") == 0)
>>>> +                     values[2] = CStringGetTextDatum(parent->ident);
>>>> +             else
>>>> +                     values[2] = CStringGetTextDatum(parent->name);
>>>>
>>>> PutMemoryContextsStatsTupleStore() doesn't need "parent" memory context,
>>>> but uses only the name of "parent" memory context. So isn't it better to use
>>>> "const char *parent" instead of "MemoryContext parent", as the argument of
>>>> the function? If we do that, we can simplify the above code.
>>>
>>> Thanks, the attached patch adopted the advice.
>>>
>>> However, since PutMemoryContextsStatsTupleStore() used not only the name
>>> but also the ident of the "parent", I could not help but adding similar
>>> codes before calling the function.
>>> The total amount of codes and complexity seem not to change so much.
>>>
>>> Any thoughts? Am I misunderstanding something?
>>
>> I was thinking that we can simplify the code as follows.
>> That is, we can just pass "name" as the argument of
>> PutMemoryContextsStatsTupleStore()
>> since "name" indicates context->name or ident (if name is "dynahash").
>>
>>      for (child = context->firstchild; child != NULL; child = child->nextchild)
>>      {
>> -        const char *parentname;
>> -
>> -        /*
>> -         * We labeled dynahash contexts with just the hash table name.
>> -         * To make it possible to identify its parent, we also use
>> -         * the hash table as its context name.
>> -         */
>> -        if (context->ident && strcmp(context->name, "dynahash") == 0)
>> -            parentname = context->ident;
>> -        else
>> -            parentname = context->name;
>> -
>>          PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>> -                                  child, parentname, level + 1);
>> +                                  child, name, level + 1);
> 
> I got it, thanks for the clarification!
> 
> Attached a revised patch.

Thanks for updating the patch! I pushed it.

BTW, I guess that you didn't add the regression test for this view because
the contents of the view are not stable. Right? But isn't it better to just
add the "stable" test like

     SELECT name, ident, parent, level, total_bytes >= free_bytes FROM pg_backend_memory_contexts WHERE level = 0;

rather than adding nothing?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-08-19 15:48, Fujii Masao wrote:
> On 2020/08/19 9:43, torikoshia wrote:
>> On 2020-08-18 22:54, Fujii Masao wrote:
>>> On 2020/08/18 18:41, torikoshia wrote:
>>>> On 2020-08-17 21:19, Fujii Masao wrote:
>>>>> On 2020/08/17 21:14, Fujii Masao wrote:
>>>>>>> On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
>>>>>>>> The following review has been posted through the commitfest 
>>>>>>>> application:
>>>>>>>> make installcheck-world:  tested, passed
>>>>>>>> Implements feature:       tested, passed
>>>>>>>> Spec compliant:           not tested
>>>>>>>> Documentation:            tested, passed
>>>>>>>> 
>>>>>>>> I tested the latest
>>>>>>>> patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
>>>>>>>> with the latest PG-version 
>>>>>>>> (199cec9779504c08aaa8159c6308283156547409)
>>>>>>>> and test was passed.
>>>>>>>> It looks good to me.
>>>>>>>> 
>>>>>>>> The new status of this patch is: Ready for Committer
>>>>>>> 
>>>>>>> Thanks for your testing!
>>>>>> 
>>>>>> Thanks for updating the patch! Here are the review comments.
>>>> 
>>>> Thanks for reviewing!
>>>> 
>>>>>> 
>>>>>> +     <row>
>>>>>> +      <entry><link 
>>>>>> linkend="view-pg-backend-memory-contexts"><structname>pg_backend_memory_contexts</structname></link></entry>
>>>>>> +      <entry>backend memory contexts</entry>
>>>>>> +     </row>
>>>>>> 
>>>>>> The above is located just after pg_matviews entry. But it should 
>>>>>> be located
>>>>>> just after pg_available_extension_versions entry. Because the rows 
>>>>>> in the table
>>>>>> "System Views" should be located in alphabetical order.
>>>>>> 
>>>>>> 
>>>>>> + <sect1 id="view-pg-backend-memory-contexts">
>>>>>> + 
>>>>>> <title><structname>pg_backend_memory_contexts</structname></title>
>>>>>> 
>>>>>> Same as above.
>>>> 
>>>> Modified both.
>>>> 
>>>>>> 
>>>>>> 
>>>>>> +   The view <structname>pg_backend_memory_contexts</structname> 
>>>>>> displays all
>>>>>> +   the local backend memory contexts.
>>>>>> 
>>>>>> This description seems a bit confusing because maybe we can 
>>>>>> interpret this
>>>>>> as "... displays the memory contexts of all the local backends" 
>>>>>> wrongly. Thought?
>>>>>> What about the following description, instead?
>>>> 
>>>>>>      The view <structname>pg_backend_memory_contexts</structname> 
>>>>>> displays all
>>>>>>      the memory contexts of the server process attached to the 
>>>>>> current session.
>>>> 
>>>> Thanks! it seems better.
>>>> 
>>>>>> +    const char *name = context->name;
>>>>>> +    const char *ident = context->ident;
>>>>>> +
>>>>>> +    if (context == NULL)
>>>>>> +        return;
>>>>>> 
>>>>>> The above check "context == NULL" is useless? If "context" is 
>>>>>> actually NULL,
>>>>>> "context->name" would cause segmentation fault, so ISTM that the 
>>>>>> check
>>>>>> will never be performed.
>>>>>> 
>>>>>> If "context" can be NULL, the check should be performed before 
>>>>>> accessing
>>>>>> to "contect". OTOH, if "context" must not be NULL per the 
>>>>>> specification of
>>>>>> PutMemoryContextStatsTupleStore(), assertion test checking
>>>>>> "context != NULL" should be used here, instead?
>>>> 
>>>> Yeah, "context" cannot be NULL because "context" must be 
>>>> TopMemoryContext
>>>> or it is already checked as not NULL as follows(child != NULL).
>>>> 
>>>> I added the assertion check.
>>> 
>>> Isn't it better to add AssertArg(MemoryContextIsValid(context)), 
>>> instead?
>> 
>> Thanks, that's better.
>> 
>>>> 
>>>> | for (child = context->firstchild; child != NULL; child = 
>>>> child->nextchild)
>>>> | {
>>>> |  ...
>>>> |         PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>>>> |                                                           child, 
>>>> parentname, level + 1);
>>>> | }
>>>> 
>>>>> Here is another comment.
>>>>> 
>>>>> +     if (parent == NULL)
>>>>> +             nulls[2] = true;
>>>>> +     else
>>>>> +             /*
>>>>> +              * We labeled dynahash contexts with just the hash 
>>>>> table name.
>>>>> +              * To make it possible to identify its parent, we 
>>>>> also display
>>>>> +              * parent's ident here.
>>>>> +              */
>>>>> +             if (parent->ident && strcmp(parent->name, "dynahash") 
>>>>> == 0)
>>>>> +                     values[2] = 
>>>>> CStringGetTextDatum(parent->ident);
>>>>> +             else
>>>>> +                     values[2] = 
>>>>> CStringGetTextDatum(parent->name);
>>>>> 
>>>>> PutMemoryContextsStatsTupleStore() doesn't need "parent" memory 
>>>>> context,
>>>>> but uses only the name of "parent" memory context. So isn't it 
>>>>> better to use
>>>>> "const char *parent" instead of "MemoryContext parent", as the 
>>>>> argument of
>>>>> the function? If we do that, we can simplify the above code.
>>>> 
>>>> Thanks, the attached patch adopted the advice.
>>>> 
>>>> However, since PutMemoryContextsStatsTupleStore() used not only the 
>>>> name
>>>> but also the ident of the "parent", I could not help but adding 
>>>> similar
>>>> codes before calling the function.
>>>> The total amount of codes and complexity seem not to change so much.
>>>> 
>>>> Any thoughts? Am I misunderstanding something?
>>> 
>>> I was thinking that we can simplify the code as follows.
>>> That is, we can just pass "name" as the argument of
>>> PutMemoryContextsStatsTupleStore()
>>> since "name" indicates context->name or ident (if name is 
>>> "dynahash").
>>> 
>>>      for (child = context->firstchild; child != NULL; child = 
>>> child->nextchild)
>>>      {
>>> -        const char *parentname;
>>> -
>>> -        /*
>>> -         * We labeled dynahash contexts with just the hash table 
>>> name.
>>> -         * To make it possible to identify its parent, we also use
>>> -         * the hash table as its context name.
>>> -         */
>>> -        if (context->ident && strcmp(context->name, "dynahash") == 
>>> 0)
>>> -            parentname = context->ident;
>>> -        else
>>> -            parentname = context->name;
>>> -
>>>          PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>>> -                                  child, parentname, level + 1);
>>> +                                  child, name, level + 1);
>> 
>> I got it, thanks for the clarification!
>> 
>> Attached a revised patch.
> 
> Thanks for updating the patch! I pushed it.

Thanks a lot!

> 
> BTW, I guess that you didn't add the regression test for this view 
> because
> the contents of the view are not stable. Right? But isn't it better to 
> just
> add the "stable" test like
> 
>     SELECT name, ident, parent, level, total_bytes >= free_bytes FROM
> pg_backend_memory_contexts WHERE level = 0;
> 
> rather than adding nothing?

Yes, I didn't add regression tests because of the unstability of the 
output.
I thought it would be OK since other views like pg_stat_slru and 
pg_shmem_allocations
didn't have tests for their outputs.

I don't have strong objections for adding tests like you proposed, but 
I'm not sure
whether there are special reasons to add tests compared with these 
existing views.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/19 17:40, torikoshia wrote:
> On 2020-08-19 15:48, Fujii Masao wrote:
>> On 2020/08/19 9:43, torikoshia wrote:
>>> On 2020-08-18 22:54, Fujii Masao wrote:
>>>> On 2020/08/18 18:41, torikoshia wrote:
>>>>> On 2020-08-17 21:19, Fujii Masao wrote:
>>>>>> On 2020/08/17 21:14, Fujii Masao wrote:
>>>>>>>> On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
>>>>>>>>> The following review has been posted through the commitfest application:
>>>>>>>>> make installcheck-world:  tested, passed
>>>>>>>>> Implements feature:       tested, passed
>>>>>>>>> Spec compliant:           not tested
>>>>>>>>> Documentation:            tested, passed
>>>>>>>>>
>>>>>>>>> I tested the latest
>>>>>>>>> patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
>>>>>>>>> with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
>>>>>>>>> and test was passed.
>>>>>>>>> It looks good to me.
>>>>>>>>>
>>>>>>>>> The new status of this patch is: Ready for Committer
>>>>>>>>
>>>>>>>> Thanks for your testing!
>>>>>>>
>>>>>>> Thanks for updating the patch! Here are the review comments.
>>>>>
>>>>> Thanks for reviewing!
>>>>>
>>>>>>>
>>>>>>> +     <row>
>>>>>>> +      <entry><link
linkend="view-pg-backend-memory-contexts"><structname>pg_backend_memory_contexts</structname></link></entry>
>>>>>>> +      <entry>backend memory contexts</entry>
>>>>>>> +     </row>
>>>>>>>
>>>>>>> The above is located just after pg_matviews entry. But it should be located
>>>>>>> just after pg_available_extension_versions entry. Because the rows in the table
>>>>>>> "System Views" should be located in alphabetical order.
>>>>>>>
>>>>>>>
>>>>>>> + <sect1 id="view-pg-backend-memory-contexts">
>>>>>>> + <title><structname>pg_backend_memory_contexts</structname></title>
>>>>>>>
>>>>>>> Same as above.
>>>>>
>>>>> Modified both.
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> +   The view <structname>pg_backend_memory_contexts</structname> displays all
>>>>>>> +   the local backend memory contexts.
>>>>>>>
>>>>>>> This description seems a bit confusing because maybe we can interpret this
>>>>>>> as "... displays the memory contexts of all the local backends" wrongly. Thought?
>>>>>>> What about the following description, instead?
>>>>>
>>>>>>>      The view <structname>pg_backend_memory_contexts</structname> displays all
>>>>>>>      the memory contexts of the server process attached to the current session.
>>>>>
>>>>> Thanks! it seems better.
>>>>>
>>>>>>> +    const char *name = context->name;
>>>>>>> +    const char *ident = context->ident;
>>>>>>> +
>>>>>>> +    if (context == NULL)
>>>>>>> +        return;
>>>>>>>
>>>>>>> The above check "context == NULL" is useless? If "context" is actually NULL,
>>>>>>> "context->name" would cause segmentation fault, so ISTM that the check
>>>>>>> will never be performed.
>>>>>>>
>>>>>>> If "context" can be NULL, the check should be performed before accessing
>>>>>>> to "contect". OTOH, if "context" must not be NULL per the specification of
>>>>>>> PutMemoryContextStatsTupleStore(), assertion test checking
>>>>>>> "context != NULL" should be used here, instead?
>>>>>
>>>>> Yeah, "context" cannot be NULL because "context" must be TopMemoryContext
>>>>> or it is already checked as not NULL as follows(child != NULL).
>>>>>
>>>>> I added the assertion check.
>>>>
>>>> Isn't it better to add AssertArg(MemoryContextIsValid(context)), instead?
>>>
>>> Thanks, that's better.
>>>
>>>>>
>>>>> | for (child = context->firstchild; child != NULL; child = child->nextchild)
>>>>> | {
>>>>> |  ...
>>>>> |         PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>>>>> |                                                           child, parentname, level + 1);
>>>>> | }
>>>>>
>>>>>> Here is another comment.
>>>>>>
>>>>>> +     if (parent == NULL)
>>>>>> +             nulls[2] = true;
>>>>>> +     else
>>>>>> +             /*
>>>>>> +              * We labeled dynahash contexts with just the hash table name.
>>>>>> +              * To make it possible to identify its parent, we also display
>>>>>> +              * parent's ident here.
>>>>>> +              */
>>>>>> +             if (parent->ident && strcmp(parent->name, "dynahash") == 0)
>>>>>> +                     values[2] = CStringGetTextDatum(parent->ident);
>>>>>> +             else
>>>>>> +                     values[2] = CStringGetTextDatum(parent->name);
>>>>>>
>>>>>> PutMemoryContextsStatsTupleStore() doesn't need "parent" memory context,
>>>>>> but uses only the name of "parent" memory context. So isn't it better to use
>>>>>> "const char *parent" instead of "MemoryContext parent", as the argument of
>>>>>> the function? If we do that, we can simplify the above code.
>>>>>
>>>>> Thanks, the attached patch adopted the advice.
>>>>>
>>>>> However, since PutMemoryContextsStatsTupleStore() used not only the name
>>>>> but also the ident of the "parent", I could not help but adding similar
>>>>> codes before calling the function.
>>>>> The total amount of codes and complexity seem not to change so much.
>>>>>
>>>>> Any thoughts? Am I misunderstanding something?
>>>>
>>>> I was thinking that we can simplify the code as follows.
>>>> That is, we can just pass "name" as the argument of
>>>> PutMemoryContextsStatsTupleStore()
>>>> since "name" indicates context->name or ident (if name is "dynahash").
>>>>
>>>>      for (child = context->firstchild; child != NULL; child = child->nextchild)
>>>>      {
>>>> -        const char *parentname;
>>>> -
>>>> -        /*
>>>> -         * We labeled dynahash contexts with just the hash table name.
>>>> -         * To make it possible to identify its parent, we also use
>>>> -         * the hash table as its context name.
>>>> -         */
>>>> -        if (context->ident && strcmp(context->name, "dynahash") == 0)
>>>> -            parentname = context->ident;
>>>> -        else
>>>> -            parentname = context->name;
>>>> -
>>>>          PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>>>> -                                  child, parentname, level + 1);
>>>> +                                  child, name, level + 1);
>>>
>>> I got it, thanks for the clarification!
>>>
>>> Attached a revised patch.
>>
>> Thanks for updating the patch! I pushed it.
> 
> Thanks a lot!
> 
>>
>> BTW, I guess that you didn't add the regression test for this view because
>> the contents of the view are not stable. Right? But isn't it better to just
>> add the "stable" test like
>>
>>     SELECT name, ident, parent, level, total_bytes >= free_bytes FROM
>> pg_backend_memory_contexts WHERE level = 0;
>>
>> rather than adding nothing?
> 
> Yes, I didn't add regression tests because of the unstability of the output.
> I thought it would be OK since other views like pg_stat_slru and pg_shmem_allocations
> didn't have tests for their outputs.

You're right.

> I don't have strong objections for adding tests like you proposed, but I'm not sure
> whether there are special reasons to add tests compared with these existing views.

Ok, understood. So I'd withdraw my suggestion.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Michael Paquier
Date:
On Wed, Aug 19, 2020 at 06:12:02PM +0900, Fujii Masao wrote:
> On 2020/08/19 17:40, torikoshia wrote:
>> Yes, I didn't add regression tests because of the unstability of the output.
>> I thought it would be OK since other views like pg_stat_slru and pg_shmem_allocations
>> didn't have tests for their outputs.
>
> You're right.

If you can make a test with something minimal and with a stable
output, adding a test is helpful IMO, or how can you make easily sure
that this does not get broken, particularly in the event of future
refactorings, or even with platform-dependent behaviors?

By the way, I was looking at the code that has been committed, and I
think that it is awkward to have a SQL function in mcxt.c, which is a
rather low-level interface.  I think that this new code should be
moved to its own file, one suggestion for a location I have being
src/backend/utils/adt/mcxtfuncs.c.
--
Michael

Attachment

Re: Creating a function for exposing memory usage of backend process

From
Tom Lane
Date:
Hadn't been paying attention to this thread up till now, but ...

Michael Paquier <michael@paquier.xyz> writes:
> By the way, I was looking at the code that has been committed, and I
> think that it is awkward to have a SQL function in mcxt.c, which is a
> rather low-level interface.  I think that this new code should be
> moved to its own file, one suggestion for a location I have being
> src/backend/utils/adt/mcxtfuncs.c.

I agree with that, but I think this patch has a bigger problem:
why bother at all?  It seems like a waste of code space and future
maintenance effort, because there is no use-case.  In the situations
where you need to know where the memory went, you are almost never
in a position to leisurely execute a query and send the results over
to your client.  This certainly would be useless to figure out why
an already-running query is eating space, for instance.

The only situation I could imagine where this would have any use is
where there is long-term (cross-query) bloat in, say, CacheMemoryContext
--- but it's not even very helpful for that, since you can't examine
anything finer-grain than a memory context.  Plus you need to be
running an interactive session, or else be willing to hack up your
application to try to get it to inspect the view (and log the
results somewhere) at useful times.

On top of all that, the functionality has Heisenberg problems,
because simply using it changes what you are trying to observe,
in complex and undocumented ways (not that the documentation
would be of any use to non-experts anyway).

My own thoughts about improving the debugging situation would've been
to create a way to send a signal to a session to make it dump its
current memory map to the postmaster log (not the client, since the
client is unlikely to be prepared to receive anything extraneous).
But this is nothing like that.

Given the lack of clear use-case, and the possibility (admittedly
not strong) that this is still somehow a security hazard, I think
we should revert it.  If it stays, I'd like to see restrictions
on who can read the view.

            regards, tom lane



Re: Creating a function for exposing memory usage of backend process

From
Andres Freund
Date:
Hi,

On 2020-08-19 11:01:37 -0400, Tom Lane wrote:
> Hadn't been paying attention to this thread up till now, but ...
> 
> Michael Paquier <michael@paquier.xyz> writes:
> > By the way, I was looking at the code that has been committed, and I
> > think that it is awkward to have a SQL function in mcxt.c, which is a
> > rather low-level interface.  I think that this new code should be
> > moved to its own file, one suggestion for a location I have being
> > src/backend/utils/adt/mcxtfuncs.c.
> 
> I agree with that, but I think this patch has a bigger problem:
> why bother at all?  It seems like a waste of code space and future
> maintenance effort, because there is no use-case.  In the situations
> where you need to know where the memory went, you are almost never
> in a position to leisurely execute a query and send the results over
> to your client.  This certainly would be useless to figure out why
> an already-running query is eating space, for instance.

I don't agree with this at all. I think there's plenty use cases. It's
e.g. very common to try to figure out why the memory usage of a process
is high. Is it memory not returned to the OS? Is it caches that have
grown too much etc.

I agree it's not perfect:

> Plus you need to be running an interactive session, or else be willing
> to hack up your application to try to get it to inspect the view (and
> log the results somewhere) at useful times.

and that we likely should address that by *also* allowing to view the
memory usage of another process. Which obviously isn't entirely
trivial. But some infrastructure likely could be reused.


> My own thoughts about improving the debugging situation would've been
> to create a way to send a signal to a session to make it dump its
> current memory map to the postmaster log (not the client, since the
> client is unlikely to be prepared to receive anything extraneous).
> But this is nothing like that.

That doesn't really work in a large number of environments, I'm
afraid. Many many users don't have access to the server log.


> If it stays, I'd like to see restrictions on who can read the view.

As long as access is grantable rather than needing a security definer
wrapper I'd be fine with that.

Greetings,

Andres Freund



Re: Creating a function for exposing memory usage of backend process

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2020-08-19 11:01:37 -0400, Tom Lane wrote:
>> I agree with that, but I think this patch has a bigger problem:
>> why bother at all?  It seems like a waste of code space and future
>> maintenance effort, because there is no use-case.

> I don't agree with this at all. I think there's plenty use cases. It's
> e.g. very common to try to figure out why the memory usage of a process
> is high. Is it memory not returned to the OS? Is it caches that have
> grown too much etc.

Oh, I agree completely that there are lots of use-cases for finding
out what a process' memory map looks like.  But this patch fails to
address any of them in a usable way.

>> My own thoughts about improving the debugging situation would've been
>> to create a way to send a signal to a session to make it dump its
>> current memory map to the postmaster log (not the client, since the
>> client is unlikely to be prepared to receive anything extraneous).

> That doesn't really work in a large number of environments, I'm
> afraid. Many many users don't have access to the server log.

My rationale for that was (a) it can be implemented without a lot
of impact on the memory map, and (b) requiring access to the server
log eliminates questions about whether you have enough privilege to
examine the map.  I'm prepared to compromise about (b), but less so
about (a).  Having to run a SQL query to find this out is a mess.

            regards, tom lane



Re: Creating a function for exposing memory usage of backend process

From
Andres Freund
Date:
Hi,

On 2020-08-19 21:29:06 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2020-08-19 11:01:37 -0400, Tom Lane wrote:
> >> I agree with that, but I think this patch has a bigger problem:
> >> why bother at all?  It seems like a waste of code space and future
> >> maintenance effort, because there is no use-case.
> 
> > I don't agree with this at all. I think there's plenty use cases. It's
> > e.g. very common to try to figure out why the memory usage of a process
> > is high. Is it memory not returned to the OS? Is it caches that have
> > grown too much etc.
> 
> Oh, I agree completely that there are lots of use-cases for finding
> out what a process' memory map looks like.  But this patch fails to
> address any of them in a usable way.

Even just being able to see the memory usage in a queryable way is a
huge benefit. The difference over having to parse the log, then parse
the memory usage dump, and then aggregate the data in there in a
meaningful way is *huge*.  We've been slacking around lowering our
memory usage, and I think the fact that it's annoying to analyze is a
partial reason for that.

I totally agree that it's not *enough*, but in contrast to you I think
it's a good step. Subsequently we should add a way to get any backends
memory usage.
It's not too hard to imagine how to serialize it in a way that can be
easily deserialized by another backend. I am imagining something like
sending a procsignal that triggers (probably at CFR() time) a backend to
write its own memory usage into pg_memusage/<pid> or something roughly
like that.

Greetings,

Andres Freund



Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/20 0:01, Tom Lane wrote:
> Hadn't been paying attention to this thread up till now, but ...
> 
> Michael Paquier <michael@paquier.xyz> writes:
>> By the way, I was looking at the code that has been committed, and I
>> think that it is awkward to have a SQL function in mcxt.c, which is a
>> rather low-level interface.  I think that this new code should be
>> moved to its own file, one suggestion for a location I have being
>> src/backend/utils/adt/mcxtfuncs.c.
> 
> I agree with that, but I think this patch has a bigger problem:
> why bother at all?  It seems like a waste of code space and future
> maintenance effort, because there is no use-case.  In the situations
> where you need to know where the memory went, you are almost never
> in a position to leisurely execute a query and send the results over
> to your client.  This certainly would be useless to figure out why
> an already-running query is eating space, for instance.
> 
> The only situation I could imagine where this would have any use is
> where there is long-term (cross-query) bloat in, say, CacheMemoryContext

Yes, this feature is useful to check a cross-query memory bloat like
the bloats of relcache, prepared statements, PL/pgSQL cache,
SMgrRelationHash, etc. For example, several years before, my colleague
investigated the cause of the memory bloat by using the almost same
feature that pg_cheat_funcs extension provides, and then found that
the cause was that the application forgot to release lots of SAVEPONT.


> --- but it's not even very helpful for that, since you can't examine
> anything finer-grain than a memory context.

Yes, but even that information can be good hint when investigating
the memory bloat.


> Plus you need to be
> running an interactive session, or else be willing to hack up your
> application to try to get it to inspect the view (and log the
> results somewhere) at useful times.
> 
> On top of all that, the functionality has Heisenberg problems,
> because simply using it changes what you are trying to observe,
> in complex and undocumented ways (not that the documentation
> would be of any use to non-experts anyway).
> 
> My own thoughts about improving the debugging situation would've been
> to create a way to send a signal to a session to make it dump its
> current memory map to the postmaster log (not the client, since the
> client is unlikely to be prepared to receive anything extraneous).
> But this is nothing like that.

I agree that we need something like this, i.e., the way to monitor the memory
usage of any process even during query running. OTOH, I think that the added
feature has a use case and is good as the first step.


> Given the lack of clear use-case, and the possibility (admittedly
> not strong) that this is still somehow a security hazard, I think
> we should revert it.  If it stays, I'd like to see restrictions
> on who can read the view.

For example, allowing only the role with pg_monitor to see this view?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/20 10:43, Andres Freund wrote:
> Hi,
> 
> On 2020-08-19 21:29:06 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> On 2020-08-19 11:01:37 -0400, Tom Lane wrote:
>>>> I agree with that, but I think this patch has a bigger problem:
>>>> why bother at all?  It seems like a waste of code space and future
>>>> maintenance effort, because there is no use-case.
>>
>>> I don't agree with this at all. I think there's plenty use cases. It's
>>> e.g. very common to try to figure out why the memory usage of a process
>>> is high. Is it memory not returned to the OS? Is it caches that have
>>> grown too much etc.
>>
>> Oh, I agree completely that there are lots of use-cases for finding
>> out what a process' memory map looks like.  But this patch fails to
>> address any of them in a usable way.
> 
> Even just being able to see the memory usage in a queryable way is a
> huge benefit.

+1

> The difference over having to parse the log, then parse
> the memory usage dump, and then aggregate the data in there in a
> meaningful way is *huge*.  We've been slacking around lowering our
> memory usage, and I think the fact that it's annoying to analyze is a
> partial reason for that.

Agreed.

  
> I totally agree that it's not *enough*, but in contrast to you I think
> it's a good step. Subsequently we should add a way to get any backends
> memory usage.
> It's not too hard to imagine how to serialize it in a way that can be
> easily deserialized by another backend. I am imagining something like
> sending a procsignal that triggers (probably at CFR() time) a backend to
> write its own memory usage into pg_memusage/<pid> or something roughly
> like that.

Sounds good. Maybe we can also provide the SQL-callable function
or view to read pg_memusage/<pid>, to make the analysis easier.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Kasahara Tatsuhito
Date:
On 2020/08/20 0:01, Tom Lane wrote:
> The only situation I could imagine where this would have any use is
> where there is long-term (cross-query) bloat in, say, CacheMemoryContext
Yeah, in cases where a very large number of sessions are connected to
the DB for very long periods of time, the memory consumption of the
back-end processes may increase slowly, and such a feature is useful
for analysis.

And ,as Fujii said, this feature very useful to see which contexts are
consuming a lot of memory and to narrow down the causes.

On Thu, Aug 20, 2020 at 11:18 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2020/08/20 10:43, Andres Freund wrote:
> > Hi,
> > Even just being able to see the memory usage in a queryable way is a
> > huge benefit.
>
> +1
+1

I think this feature is very useful in environments where gdb is not
available or access to server logs is limited.
And it is cumbersome to extract and analyze the memory information
from the very large server logs.


> > I totally agree that it's not *enough*, but in contrast to you I think
> > it's a good step. Subsequently we should add a way to get any backends
> > memory usage.
> > It's not too hard to imagine how to serialize it in a way that can be
> > easily deserialized by another backend. I am imagining something like
> > sending a procsignal that triggers (probably at CFR() time) a backend to
> > write its own memory usage into pg_memusage/<pid> or something roughly
> > like that.
>
> Sounds good. Maybe we can also provide the SQL-callable function
> or view to read pg_memusage/<pid>, to make the analysis easier.
+1

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
Thanks for all your comments!

Thankfully it seems that this feature is regarded as not
meaningless one, I'm going to do some improvements.


On Wed, Aug 19, 2020 at 10:56 PM Michael Paquier <michael@paquier.xyz> 
wrote:
> On Wed, Aug 19, 2020 at 06:12:02PM +0900, Fujii Masao wrote:
>> On 2020/08/19 17:40, torikoshia wrote:
>>> Yes, I didn't add regression tests because of the unstability of the 
>>> output.
>>> I thought it would be OK since other views like pg_stat_slru and 
>>> pg_shmem_allocations
>>> didn't have tests for their outputs.
>> 
>> You're right.
> 
> If you can make a test with something minimal and with a stable
> output, adding a test is helpful IMO, or how can you make easily sure
> that this does not get broken, particularly in the event of future
> refactorings, or even with platform-dependent behaviors?

OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one considering
the simplicity of other tests on that.


On Thu, Aug 20, 2020 at 12:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > By the way, I was looking at the code that has been committed, and I
> > think that it is awkward to have a SQL function in mcxt.c, which is a
> > rather low-level interface.  I think that this new code should be
> > moved to its own file, one suggestion for a location I have being
> > src/backend/utils/adt/mcxtfuncs.c.
> 
> I agree with that,

Thanks for pointing out.
Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


> On Thu, Aug 20, 2020 at 11:09 AM Fujii Masao 
> <masao.fujii@oss.nttdata.com> wrote:
> On 2020/08/20 0:01, Tom Lane wrote:
>> Given the lack of clear use-case, and the possibility (admittedly
>> not strong) that this is still somehow a security hazard, I think
>> we should revert it.  If it stays, I'd like to see restrictions
>> on who can read the view.

> For example, allowing only the role with pg_monitor to see this view?

Attached a patch adding that restriction.
(0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch)

Of course, this restriction makes pg_backend_memory_contexts hard to use
when the user of the target session is not granted pg_monitor because 
the
scope of this view is session local.

In this case, I imagine additional operations something like temporarily
granting pg_monitor to that user.

Thoughts?


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment

Re: Creating a function for exposing memory usage of backend process

From
Michael Paquier
Date:
On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:
> OK. Added a regression test on sysviews.sql.
> (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)
>
> Fujii-san gave us an example, but I added more simple one considering
> the simplicity of other tests on that.

What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.

> Added a patch for relocating the codes to mcxtfuncs.c.
> (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)

The same code is moved around line-by-line.

> Of course, this restriction makes pg_backend_memory_contexts hard to use
> when the user of the target session is not granted pg_monitor because the
> scope of this view is session local.
>
> In this case, I imagine additional operations something like temporarily
> granting pg_monitor to that user.

Hmm.  I am not completely sure either that pg_monitor is the best fit
here, because this view provides information about a bunch of internal
structures.  Something that could easily be done though is to revoke
the access from public, and then users could just set up GRANT
permissions post-initdb, with pg_monitor as one possible choice.  This
is the safest path by default, and this stuff is of a caliber similar
to pg_shmem_allocations in terms of internal contents.

It seems to me that you are missing one "REVOKE ALL on
pg_backend_memory_contexts FROM PUBLIC" in patch 0003.

By the way, if that was just for me, I would remove used_bytes, which
is just a computation from the total and free numbers.  I'll defer
that point to Fujii-san.
--
Michael

Attachment

Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-08-22 21:18, Michael Paquier wrote:

Thanks for reviewing!

> On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:
>> OK. Added a regression test on sysviews.sql.
>> (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)
>> 
>> Fujii-san gave us an example, but I added more simple one considering
>> the simplicity of other tests on that.
> 
> What you have sent in 0001 looks fine to me.  A small test is much
> better than nothing.
> 
>> Added a patch for relocating the codes to mcxtfuncs.c.
>> (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)
> 
> The same code is moved around line-by-line.
> 
>> Of course, this restriction makes pg_backend_memory_contexts hard to 
>> use
>> when the user of the target session is not granted pg_monitor because 
>> the
>> scope of this view is session local.
>> 
>> In this case, I imagine additional operations something like 
>> temporarily
>> granting pg_monitor to that user.
> 
> Hmm.  I am not completely sure either that pg_monitor is the best fit
> here, because this view provides information about a bunch of internal
> structures.  Something that could easily be done though is to revoke
> the access from public, and then users could just set up GRANT
> permissions post-initdb, with pg_monitor as one possible choice.  This
> is the safest path by default, and this stuff is of a caliber similar
> to pg_shmem_allocations in terms of internal contents.

I think this is a better way than what I did in
0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.

Attached a patch.

> 
> It seems to me that you are missing one "REVOKE ALL on
> pg_backend_memory_contexts FROM PUBLIC" in patch 0003.
> 
> By the way, if that was just for me, I would remove used_bytes, which
> is just a computation from the total and free numbers.  I'll defer
> that point to Fujii-san.
> --
> Michael


On 2020/08/20 2:59, Kasahara Tatsuhito wrote:
>>> I totally agree that it's not *enough*, but in contrast to you I 
>>> think
>>> it's a good step. Subsequently we should add a way to get any 
>>> backends
>>> memory usage.
>>> It's not too hard to imagine how to serialize it in a way that can be
>>> easily deserialized by another backend. I am imagining something like
>>> sending a procsignal that triggers (probably at CFR() time) a backend 
>>> to
>>> write its own memory usage into pg_memusage/<pid> or something 
>>> roughly
>>> like that.
>> 
>> Sounds good. Maybe we can also provide the SQL-callable function
>> or view to read pg_memusage/<pid>, to make the analysis easier.
> +1


I'm thinking about starting a new thread to discuss exposing other
backend's memory context.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION


Attachment

Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/24 13:01, torikoshia wrote:
> On 2020-08-22 21:18, Michael Paquier wrote:
> 
> Thanks for reviewing!
> 
>> On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:
>>> OK. Added a regression test on sysviews.sql.
>>> (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)
>>>
>>> Fujii-san gave us an example, but I added more simple one considering
>>> the simplicity of other tests on that.
>>
>> What you have sent in 0001 looks fine to me.  A small test is much
>> better than nothing.

+1

But as I proposed upthread, what about a bit complicated test as follows,
e.g., to confirm that the internal logic for level works expectedly?

      SELECT name, ident, parent, level, total_bytes >= free_bytes FROM pg_backend_memory_contexts WHERE level = 0;


>>
>>> Added a patch for relocating the codes to mcxtfuncs.c.
>>> (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)

Thanks for the patch! Looks good to me.
Barring any objection, I will commit this patch at first.


>>
>> The same code is moved around line-by-line.
>>
>>> Of course, this restriction makes pg_backend_memory_contexts hard to use
>>> when the user of the target session is not granted pg_monitor because the
>>> scope of this view is session local.
>>>
>>> In this case, I imagine additional operations something like temporarily
>>> granting pg_monitor to that user.
>>
>> Hmm.  I am not completely sure either that pg_monitor is the best fit
>> here, because this view provides information about a bunch of internal
>> structures.  Something that could easily be done though is to revoke
>> the access from public, and then users could just set up GRANT
>> permissions post-initdb, with pg_monitor as one possible choice.  This
>> is the safest path by default, and this stuff is of a caliber similar
>> to pg_shmem_allocations in terms of internal contents.
> 
> I think this is a better way than what I did in
> 0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.

You mean 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch?


> Attached a patch.

Thanks for updating the patch! This also looks good to me.


>> It seems to me that you are missing one "REVOKE ALL on
>> pg_backend_memory_contexts FROM PUBLIC" in patch 0003.
>>
>> By the way, if that was just for me, I would remove used_bytes, which
>> is just a computation from the total and free numbers.  I'll defer
>> that point to Fujii-san.

Yeah, I was just thinking that displaying also used_bytes was useful,
but this might be inconsistent with the other views' ways.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/24 13:13, Fujii Masao wrote:
> 
> 
> On 2020/08/24 13:01, torikoshia wrote:
>> On 2020-08-22 21:18, Michael Paquier wrote:
>>
>> Thanks for reviewing!
>>
>>> On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:
>>>> OK. Added a regression test on sysviews.sql.
>>>> (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)
>>>>
>>>> Fujii-san gave us an example, but I added more simple one considering
>>>> the simplicity of other tests on that.
>>>
>>> What you have sent in 0001 looks fine to me.  A small test is much
>>> better than nothing.
> 
> +1
> 
> But as I proposed upthread, what about a bit complicated test as follows,
> e.g., to confirm that the internal logic for level works expectedly?
> 
>       SELECT name, ident, parent, level, total_bytes >= free_bytes FROM pg_backend_memory_contexts WHERE level = 0;
> 
> 
>>>
>>>> Added a patch for relocating the codes to mcxtfuncs.c.
>>>> (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)
> 
> Thanks for the patch! Looks good to me.
> Barring any objection, I will commit this patch at first.

As far as I know, utils/adt is the directory to basically include the files
for a particular type or operator. So ISTM that mcxtfuncs.c doesn't
fit to this directory. Isn't it better to put that in utils/mmgr ?


The copyright line in new file mcxtfuncs.c should be changed as follows
because it contains new code?

- * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Michael Paquier
Date:
On Mon, Aug 24, 2020 at 02:48:50PM +0900, Fujii Masao wrote:
> As far as I know, utils/adt is the directory to basically include the files
> for a particular type or operator. So ISTM that mcxtfuncs.c doesn't
> fit to this directory. Isn't it better to put that in utils/mmgr ?

We have also stuff like ruleutils.c, dbsize.c, genfile.c there which
is rather generic, so I would rather leave utils/mmgr/ out of the
business of this thread, and just keep inside all the lower-level APIs
for memory context handling.  I don't have a strong feeling for one
being better than the other, so if you prefer more one way than the
other, that's fine by me as long as the split is done as the new
functions depend on nothing static in mcxt.c.  And you are the
committer of this feature.

> The copyright line in new file mcxtfuncs.c should be changed as follows
> because it contains new code?

> - * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> - * Portions Copyright (c) 1994, Regents of the University of California
> + * Portions Copyright (c) 2020, PostgreSQL Global Development Group

FWIW, I usually choose what's proposed in the patch as a matter of
consistency, because it is a no-brainer and because you don't have to
think about past references when it comes to structures or such.
--
Michael

Attachment

Re: Creating a function for exposing memory usage of backend process

From
torikoshia
Date:
On 2020-08-24 13:13, Fujii Masao wrote:
> On 2020/08/24 13:01, torikoshia wrote:
>> On 2020-08-22 21:18, Michael Paquier wrote:
>> 
>> Thanks for reviewing!
>> 
>>> On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:
>>>> OK. Added a regression test on sysviews.sql.
>>>> (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)
>>>> 
>>>> Fujii-san gave us an example, but I added more simple one 
>>>> considering
>>>> the simplicity of other tests on that.
>>> 
>>> What you have sent in 0001 looks fine to me.  A small test is much
>>> better than nothing.
> 
> +1
> 
> But as I proposed upthread, what about a bit complicated test as 
> follows,
> e.g., to confirm that the internal logic for level works expectedly?
> 
>      SELECT name, ident, parent, level, total_bytes >= free_bytes FROM
> pg_backend_memory_contexts WHERE level = 0;

OK!
Attached an updated patch.

> 
> 
>>> 
>>>> Added a patch for relocating the codes to mcxtfuncs.c.
>>>> (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)
> 
> Thanks for the patch! Looks good to me.
> Barring any objection, I will commit this patch at first.
> 
> 
>>> 
>>> The same code is moved around line-by-line.
>>> 
>>>> Of course, this restriction makes pg_backend_memory_contexts hard to 
>>>> use
>>>> when the user of the target session is not granted pg_monitor 
>>>> because the
>>>> scope of this view is session local.
>>>> 
>>>> In this case, I imagine additional operations something like 
>>>> temporarily
>>>> granting pg_monitor to that user.
>>> 
>>> Hmm.  I am not completely sure either that pg_monitor is the best fit
>>> here, because this view provides information about a bunch of 
>>> internal
>>> structures.  Something that could easily be done though is to revoke
>>> the access from public, and then users could just set up GRANT
>>> permissions post-initdb, with pg_monitor as one possible choice.  
>>> This
>>> is the safest path by default, and this stuff is of a caliber similar
>>> to pg_shmem_allocations in terms of internal contents.
>> 
>> I think this is a better way than what I did in
>> 0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.
> 
> You mean 
> 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch?

Oops, I meant 
0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch.

> 
> 
>> Attached a patch.
> 
> Thanks for updating the patch! This also looks good to me.
> 
> 
>>> It seems to me that you are missing one "REVOKE ALL on
>>> pg_backend_memory_contexts FROM PUBLIC" in patch 0003.
>>> 
>>> By the way, if that was just for me, I would remove used_bytes, which
>>> is just a computation from the total and free numbers.  I'll defer
>>> that point to Fujii-san.
> 
> Yeah, I was just thinking that displaying also used_bytes was useful,
> but this might be inconsistent with the other views' ways.
> 
> Regards,

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION


Attachment

Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/24 17:09, Michael Paquier wrote:
> On Mon, Aug 24, 2020 at 02:48:50PM +0900, Fujii Masao wrote:
>> As far as I know, utils/adt is the directory to basically include the files
>> for a particular type or operator. So ISTM that mcxtfuncs.c doesn't
>> fit to this directory. Isn't it better to put that in utils/mmgr ?
> 
> We have also stuff like ruleutils.c, dbsize.c, genfile.c there which
> is rather generic, so I would rather leave utils/mmgr/ out of the
> business of this thread, and just keep inside all the lower-level APIs
> for memory context handling.

Understood. So I will commit the latest patch 0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/24 21:56, torikoshia wrote:
> On 2020-08-24 13:13, Fujii Masao wrote:
>> On 2020/08/24 13:01, torikoshia wrote:
>>> On 2020-08-22 21:18, Michael Paquier wrote:
>>>
>>> Thanks for reviewing!
>>>
>>>> On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:
>>>>> OK. Added a regression test on sysviews.sql.
>>>>> (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)
>>>>>
>>>>> Fujii-san gave us an example, but I added more simple one considering
>>>>> the simplicity of other tests on that.
>>>>
>>>> What you have sent in 0001 looks fine to me.  A small test is much
>>>> better than nothing.
>>
>> +1
>>
>> But as I proposed upthread, what about a bit complicated test as follows,
>> e.g., to confirm that the internal logic for level works expectedly?
>>
>>      SELECT name, ident, parent, level, total_bytes >= free_bytes FROM
>> pg_backend_memory_contexts WHERE level = 0;
> 
> OK!
> Attached an updated patch.

Thanks for updating the patch! Looks good to me.
Barring any objection, I will commit this patch.

> 
>>
>>
>>>>
>>>>> Added a patch for relocating the codes to mcxtfuncs.c.
>>>>> (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)
>>
>> Thanks for the patch! Looks good to me.
>> Barring any objection, I will commit this patch at first.
>>
>>
>>>>
>>>> The same code is moved around line-by-line.
>>>>
>>>>> Of course, this restriction makes pg_backend_memory_contexts hard to use
>>>>> when the user of the target session is not granted pg_monitor because the
>>>>> scope of this view is session local.
>>>>>
>>>>> In this case, I imagine additional operations something like temporarily
>>>>> granting pg_monitor to that user.
>>>>
>>>> Hmm.  I am not completely sure either that pg_monitor is the best fit
>>>> here, because this view provides information about a bunch of internal
>>>> structures.  Something that could easily be done though is to revoke
>>>> the access from public, and then users could just set up GRANT
>>>> permissions post-initdb, with pg_monitor as one possible choice. This
>>>> is the safest path by default, and this stuff is of a caliber similar
>>>> to pg_shmem_allocations in terms of internal contents.
>>>
>>> I think this is a better way than what I did in
>>> 0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.
>>
>> You mean 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch?
> 
> Oops, I meant 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch.

This patch also looks good to me. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Creating a function for exposing memory usage of backend process

From
Fujii Masao
Date:

On 2020/08/25 11:39, Fujii Masao wrote:
> 
> 
> On 2020/08/24 21:56, torikoshia wrote:
>> On 2020-08-24 13:13, Fujii Masao wrote:
>>> On 2020/08/24 13:01, torikoshia wrote:
>>>> On 2020-08-22 21:18, Michael Paquier wrote:
>>>>
>>>> Thanks for reviewing!
>>>>
>>>>> On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:
>>>>>> OK. Added a regression test on sysviews.sql.
>>>>>> (0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)
>>>>>>
>>>>>> Fujii-san gave us an example, but I added more simple one considering
>>>>>> the simplicity of other tests on that.
>>>>>
>>>>> What you have sent in 0001 looks fine to me.  A small test is much
>>>>> better than nothing.
>>>
>>> +1
>>>
>>> But as I proposed upthread, what about a bit complicated test as follows,
>>> e.g., to confirm that the internal logic for level works expectedly?
>>>
>>>      SELECT name, ident, parent, level, total_bytes >= free_bytes FROM
>>> pg_backend_memory_contexts WHERE level = 0;
>>
>> OK!
>> Attached an updated patch.
> 
> Thanks for updating the patch! Looks good to me.
> Barring any objection, I will commit this patch.
> 
>>
>>>
>>>
>>>>>
>>>>>> Added a patch for relocating the codes to mcxtfuncs.c.
>>>>>> (patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)
>>>
>>> Thanks for the patch! Looks good to me.
>>> Barring any objection, I will commit this patch at first.
>>>
>>>
>>>>>
>>>>> The same code is moved around line-by-line.
>>>>>
>>>>>> Of course, this restriction makes pg_backend_memory_contexts hard to use
>>>>>> when the user of the target session is not granted pg_monitor because the
>>>>>> scope of this view is session local.
>>>>>>
>>>>>> In this case, I imagine additional operations something like temporarily
>>>>>> granting pg_monitor to that user.
>>>>>
>>>>> Hmm.  I am not completely sure either that pg_monitor is the best fit
>>>>> here, because this view provides information about a bunch of internal
>>>>> structures.  Something that could easily be done though is to revoke
>>>>> the access from public, and then users could just set up GRANT
>>>>> permissions post-initdb, with pg_monitor as one possible choice. This
>>>>> is the safest path by default, and this stuff is of a caliber similar
>>>>> to pg_shmem_allocations in terms of internal contents.
>>>>
>>>> I think this is a better way than what I did in
>>>> 0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.
>>>
>>> You mean 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch?
>>
>> Oops, I meant 0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch.
> 
> This patch also looks good to me. Thanks!

I pushed the proposed three patches. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION