Thread: Re: Enhancing Memory Context Statistics Reporting

Re: Enhancing Memory Context Statistics Reporting

From
torikoshia
Date:
On 2024-10-22 03:24, Rahila Syed wrote:
> Hi,
> 
> PostgreSQL provides the following capabilities for reporting memory
> context statistics.
> 1. pg_get_backend_memory_contexts(); [1]
> 2. pg_log_backend_memory_contexts(pid); [2]
>
> [1] provides a view of memory context statistics for the local backend,
> while [2] prints the memory context statistics of any backend or
> auxiliary process to the PostgreSQL logs. Although [1] offers detailed
> statistics, it is limited to the local backend, restricting its use to
> PostgreSQL client backends only.
> On the other hand, [2] provides the statistics for all backends but
> logs them in a file, which may not be convenient for quick access.
>
> I propose enhancing memory context statistics reporting by combining
> these capabilities and offering a view of memory statistics for all
> PostgreSQL backends and auxiliary processes.

Thanks for working on this!

I originally tried to develop something like your proposal in [2], but
ran into some difficulties and settled on implementing
pg_log_backend_memory_contexts().

> Attached is a patch that implements this functionality. It introduces
> a SQL function that takes the PID of a backend as an argument,
> returning a set of records, each containing statistics for a single
> memory context. The underlying C function sends a signal to the
> backend and waits for it to publish its memory context statistics
> before returning them to the user. The publishing backend copies
> these statistics during the next CHECK_FOR_INTERRUPTS call.

I remember that waiting for the target to dump its memory context stats
could cause trouble in some error cases.

For example, if the caller of pg_get_remote_backend_memory_contexts() is
terminated just after the target process finishes dumping stats but
before the caller reads them, subsequent calls to
pg_get_remote_backend_memory_contexts() no longer respond:

[session1] $ psql
           (40699)=#

$ kill -s SIGSTOP 40699

[session2] $ psql
           (40866)=# select * FROM pg_get_remote_backend_memory_contexts('40699', false); -- waiting

$ kill -s SIGSTOP 40866

$ kill -s SIGCONT 40699

[session3] $ psql
           (47656)=# select pg_terminate_backend(40866);

$ kill -s SIGCONT 40866 -- session2 terminated

[session3] (47656)=# select * FROM pg_get_remote_backend_memory_contexts('47656', false); -- no response

It seems the reason is that memCtxState->in_use is now true and
memCtxState->proc_id is 40699.
We can continue to use pg_get_remote_backend_memory_contexts() after
calling it with 40699, but that would be hard for users to understand.
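
To make the stuck state easier to follow, here is a minimal Python model
of the shared slot. It is only a sketch, not PostgreSQL code: the names
in_use and proc_id come from the patch as discussed here, while
MemCtxStateSketch, publish_stats() and read_stats() are hypothetical
helpers invented for illustration, and the real patch waits on a
condition variable instead of returning False.

```python
from dataclasses import dataclass


@dataclass
class MemCtxStateSketch:
    """Hypothetical model of the shared slot (only the field names are from the thread)."""
    in_use: bool = False   # set by the publisher, cleared by the reader
    proc_id: int = 0       # PID whose statistics occupy the slot


def publish_stats(state, publisher_pid):
    """Publisher side: refuse to publish while the slot is still occupied."""
    if state.in_use:
        return False
    state.in_use = True
    state.proc_id = publisher_pid
    return True


def read_stats(state, requested_pid):
    """Reader side: consume only statistics that belong to the requested PID."""
    if not state.in_use or state.proc_id != requested_pid:
        return False       # the real caller would keep waiting here
    state.in_use = False
    return True


state = MemCtxStateSketch()
events = []
# 40699 publishes for session2 (40866), which is terminated before reading.
events.append(publish_stats(state, 40699))
# session3 asks about 47656: that backend cannot publish, so the read fails.
events.append(publish_stats(state, 47656))
events.append(read_stats(state, 47656))
# Asking for 40699 drains the stale entry, after which 47656 can publish.
events.append(read_stats(state, 40699))
events.append(publish_stats(state, 47656))
print(events)
```

In this model the slot stays occupied until some caller asks for 40699
again, which mirrors the behaviour seen in the reproduction above.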

> This approach facilitates on-demand publication of memory statistics
> for a specific backend, rather than collecting them at regular
> intervals.
> Since past memory context statistics may no longer be relevant,
> there is little value in retaining historical data. Any collected
> statistics can be discarded once read by the client backend.
>
> A fixed-size shared memory block, currently accommodating 30 records,
> is used to store the statistics. This number was chosen arbitrarily,
> as it covers all parent contexts at level 1 (i.e., direct children of
> the top memory context) based on my tests.
> Further experiments are needed to determine the optimal number
> for summarizing memory statistics.
>
> Any additional statistics that exceed the shared memory capacity
> are written to a file per backend in the PG_TEMP_FILES_DIR. The client
> backend first reads from the shared memory and, if necessary, retrieves
> the remaining data from the file, combining everything into a unified
> view. The files are cleaned up automatically if a backend crashes or
> during server restarts.
> 
> The statistics are reported in a breadth-first search order of the
> memory context tree, with parent contexts reported before their
> children. This provides a cumulative summary before diving into the
> details of each child context's consumption.
>
> The rationale behind the shared memory chunk is to ensure that the
> majority of contexts that are direct children of TopMemoryContext
> fit into memory.
> This allows a client to request a summary of memory statistics,
> which can be served from memory without the overhead of file access,
> unless necessary.
> 
> A publishing backend signals waiting client backends using a condition
> variable when it has finished writing its statistics to memory.
> The client backend checks whether the statistics belong to the
> requested backend.
> If not, it continues waiting on the condition variable, timing out
> after 2 minutes.
> This timeout is an arbitrary choice, and further work is required to
> determine a more practical value.
> 
> All backends use the same memory space to publish their statistics.
> Before publishing, a backend checks whether the previous statistics
> have been successfully read by a client using a shared flag, "in_use."
> This flag is set by the publishing backend and cleared by the client
> backend once the data is read. If a backend cannot publish due to
> shared memory being occupied, it exits the interrupt processing code,
> and the client backend times out with a warning.
>
> Please find below an example query to fetch memory contexts from the
> backend with id '106114'. The second argument, 'get_summary', is
> 'false', indicating a request for statistics of all the contexts.
> 
> postgres=#
> select * FROM pg_get_remote_backend_memory_contexts('116292', false)
> LIMIT 2;
> -[ RECORD 1 ]-+----------------------
> name          | TopMemoryContext
> ident         |
> type          | AllocSet
> path          | {0}
> total_bytes   | 97696
> total_nblocks | 5
> free_bytes    | 15376
> free_chunks   | 11
> used_bytes    | 82320
> pid           | 116292
> -[ RECORD 2 ]-+----------------------
> name          | RowDescriptionContext
> ident         |
> type          | AllocSet
> path          | {0,1}
> total_bytes   | 8192
> total_nblocks | 1
> free_bytes    | 6912
> free_chunks   | 0
> used_bytes    | 1280
> pid           | 116292

32d3ed8165f821f introduced a 1-based path to pg_backend_memory_contexts,
but pg_get_remote_backend_memory_contexts() seems to have a 0-based path.

pg_backend_memory_contexts has a "level" column, but
pg_get_remote_backend_memory_contexts() doesn't.

Is there any reason for these differences?


> TODO:
> 1. Determine the behaviour when the statistics don't fit in one file.
> 
> [1] PostgreSQL: Re: Creating a function for exposing memory usage of
> backend process [1]
> 
> [2] PostgreSQL: Re: Get memory contexts of an arbitrary backend
> process [2]
> 
> Thank you,
> Rahila Syed
> 
> 
> 
> Links:
> ------
> [1] https://www.postgresql.org/message-id/0a768ae1-1703-59c7-86cc-7068ff5e318c%40oss.nttdata.com
> [2] https://www.postgresql.org/message-id/bea016ad-d1a7-f01d-a7e8-01106a1de77f%40oss.nttdata.com

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.



Re: Enhancing Memory Context Statistics Reporting

From
Rahila Syed
Date:
Hi Torikoshia,

Thank you for reviewing the patch!

On Wed, Oct 23, 2024 at 9:28 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> On 2024-10-22 03:24, Rahila Syed wrote:
>> Hi,
>>
>> PostgreSQL provides the following capabilities for reporting memory
>> context statistics.
>> 1. pg_get_backend_memory_contexts(); [1]
>> 2. pg_log_backend_memory_contexts(pid); [2]
>>
>> [1] provides a view of memory context statistics for the local backend,
>> while [2] prints the memory context statistics of any backend or
>> auxiliary process to the PostgreSQL logs. Although [1] offers detailed
>> statistics, it is limited to the local backend, restricting its use to
>> PostgreSQL client backends only.
>> On the other hand, [2] provides the statistics for all backends but
>> logs them in a file, which may not be convenient for quick access.
>>
>> I propose enhancing memory context statistics reporting by combining
>> these capabilities and offering a view of memory statistics for all
>> PostgreSQL backends and auxiliary processes.
>
> Thanks for working on this!
>
> I originally tried to develop something like your proposal in [2], but
> ran into some difficulties and settled on implementing
> pg_log_backend_memory_contexts().

Yes. I am revisiting this problem :)
 
>> Attached is a patch that implements this functionality. It introduces
>> a SQL function that takes the PID of a backend as an argument,
>> returning a set of records, each containing statistics for a single
>> memory context. The underlying C function sends a signal to the
>> backend and waits for it to publish its memory context statistics
>> before returning them to the user. The publishing backend copies
>> these statistics during the next CHECK_FOR_INTERRUPTS call.
>
> I remember that waiting for the target to dump its memory context stats
> could cause trouble in some error cases.
>
> For example, if the caller of pg_get_remote_backend_memory_contexts() is
> terminated just after the target process finishes dumping stats but
> before the caller reads them, subsequent calls to
> pg_get_remote_backend_memory_contexts() no longer respond:
>
> [session1] $ psql
>            (40699)=#
>
> $ kill -s SIGSTOP 40699
>
> [session2] $ psql
>            (40866)=# select * FROM pg_get_remote_backend_memory_contexts('40699', false); -- waiting
>
> $ kill -s SIGSTOP 40866
>
> $ kill -s SIGCONT 40699
>
> [session3] $ psql
>            (47656)=# select pg_terminate_backend(40866);
>
> $ kill -s SIGCONT 40866 -- session2 terminated
>
> [session3] (47656)=# select * FROM pg_get_remote_backend_memory_contexts('47656', false); -- no response
>
> It seems the reason is that memCtxState->in_use is now true and
> memCtxState->proc_id is 40699.
> We can continue to use pg_get_remote_backend_memory_contexts() after
> calling it with 40699, but that would be hard for users to understand.

Thanks for testing and reporting. Although I am not able to reproduce
this problem, I think it may be happening because the requesting
backend (the caller) is terminated before it gets a chance to mark
memCtxState->in_use as false.

In this case, memCtxState->in_use should be reset to false, possibly
during the processing of ProcDiePending in ProcessInterrupts().
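
The idea can be modelled outside PostgreSQL with a small Python sketch.
This is an assumption-laden illustration and not the patch's code:
SharedSlot, request_stats() and BackendTerminated are hypothetical
names, and try/finally only stands in for whatever cleanup hook the
backend would actually use when it is asked to die.

```python
class BackendTerminated(Exception):
    """Stands in for the requester being terminated (ProcDiePending in the thread)."""


class SharedSlot:
    """Hypothetical model of the shared state discussed in the thread."""

    def __init__(self):
        self.in_use = False
        self.proc_id = 0


def request_stats(slot, target_pid, wait_for_publisher):
    """Wait for target_pid's statistics and always release the slot on exit."""
    try:
        wait_for_publisher(slot, target_pid)   # may raise BackendTerminated
        return "stats of %d" % target_pid
    finally:
        # Runs on normal return and on termination alike: release the slot
        # only if it holds the statistics this requester asked for.
        if slot.in_use and slot.proc_id == target_pid:
            slot.in_use = False


def publisher_then_terminate(slot, target_pid):
    """The target publishes, then the requester is terminated before reading."""
    slot.in_use = True
    slot.proc_id = target_pid
    raise BackendTerminated()


slot = SharedSlot()
try:
    request_stats(slot, 40699, publisher_then_terminate)
    terminated = False
except BackendTerminated:
    terminated = True
print(terminated, slot.in_use)
```

This model does not cover the case where the requester is terminated
before the target has published; the slot would then be filled later
with nobody left to read it, so that case needs separate handling.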

>> This approach facilitates on-demand publication of memory statistics
>> for a specific backend, rather than collecting them at regular
>> intervals.
>> Since past memory context statistics may no longer be relevant,
>> there is little value in retaining historical data. Any collected
>> statistics can be discarded once read by the client backend.
>>
>> A fixed-size shared memory block, currently accommodating 30 records,
>> is used to store the statistics. This number was chosen arbitrarily,
>> as it covers all parent contexts at level 1 (i.e., direct children of
>> the top memory context) based on my tests.
>> Further experiments are needed to determine the optimal number
>> for summarizing memory statistics.
>>
>> Any additional statistics that exceed the shared memory capacity
>> are written to a file per backend in the PG_TEMP_FILES_DIR. The client
>> backend first reads from the shared memory and, if necessary, retrieves
>> the remaining data from the file, combining everything into a unified
>> view. The files are cleaned up automatically if a backend crashes or
>> during server restarts.
>>
>> The statistics are reported in a breadth-first search order of the
>> memory context tree, with parent contexts reported before their
>> children. This provides a cumulative summary before diving into the
>> details of each child context's consumption.
>>
>> The rationale behind the shared memory chunk is to ensure that the
>> majority of contexts that are direct children of TopMemoryContext
>> fit into memory.
>> This allows a client to request a summary of memory statistics,
>> which can be served from memory without the overhead of file access,
>> unless necessary.
>>
>> A publishing backend signals waiting client backends using a condition
>> variable when it has finished writing its statistics to memory.
>> The client backend checks whether the statistics belong to the
>> requested backend.
>> If not, it continues waiting on the condition variable, timing out
>> after 2 minutes.
>> This timeout is an arbitrary choice, and further work is required to
>> determine a more practical value.
>>
>> All backends use the same memory space to publish their statistics.
>> Before publishing, a backend checks whether the previous statistics
>> have been successfully read by a client using a shared flag, "in_use."
>> This flag is set by the publishing backend and cleared by the client
>> backend once the data is read. If a backend cannot publish due to
>> shared memory being occupied, it exits the interrupt processing code,
>> and the client backend times out with a warning.
>>
>> Please find below an example query to fetch memory contexts from the
>> backend with id '106114'. The second argument, 'get_summary', is
>> 'false', indicating a request for statistics of all the contexts.
>>
>> postgres=#
>> select * FROM pg_get_remote_backend_memory_contexts('116292', false)
>> LIMIT 2;
>> -[ RECORD 1 ]-+----------------------
>> name          | TopMemoryContext
>> ident         |
>> type          | AllocSet
>> path          | {0}
>> total_bytes   | 97696
>> total_nblocks | 5
>> free_bytes    | 15376
>> free_chunks   | 11
>> used_bytes    | 82320
>> pid           | 116292
>> -[ RECORD 2 ]-+----------------------
>> name          | RowDescriptionContext
>> ident         |
>> type          | AllocSet
>> path          | {0,1}
>> total_bytes   | 8192
>> total_nblocks | 1
>> free_bytes    | 6912
>> free_chunks   | 0
>> used_bytes    | 1280
>> pid           | 116292
>
> 32d3ed8165f821f introduced a 1-based path to pg_backend_memory_contexts,
> but pg_get_remote_backend_memory_contexts() seems to have a 0-based path.

Right. I will change it to match this commit. 
 
> pg_backend_memory_contexts has a "level" column, but
> pg_get_remote_backend_memory_contexts() doesn't.
>
> Is there any reason for these differences?

No particular reason. I can add this column as well.

Thank you,
Rahila Syed

Re: Enhancing Memory Context Statistics Reporting

From
torikoshia
Date:
On 2024-10-24 14:59, Rahila Syed wrote:
> Hi Torikoshia,
> 
> Thank you for reviewing the patch!
> 
> On Wed, Oct 23, 2024 at 9:28 AM torikoshia
> <torikoshia@oss.nttdata.com> wrote:
> 
>> On 2024-10-22 03:24, Rahila Syed wrote:
>>> Hi,
>>> 
>>> PostgreSQL provides following capabilities for reporting memory
>>> contexts statistics.
>>> 1. pg_get_backend_memory_contexts(); [1]
>>> 2. pg_log_backend_memory_contexts(pid); [2]
>>> 
>>> [1]  provides a view of memory context statistics for a local
>> backend,
>>> while [2] prints the memory context statistics of any backend or
>>> auxiliary
>>> process to the PostgreSQL logs. Although [1] offers detailed
>>> statistics,
>>> it is limited to the local backend, restricting its use to
>> PostgreSQL
>>> client backends only.
>>> On the other hand, [2] provides the statistics for all backends
>> but
>>> logs them in a file,
>>> which may not be convenient for quick access.
>>> 
>>> I propose enhancing memory context statistics reporting by
>> combining
>>> these
>>> capabilities and offering a view of memory statistics for all
>>> PostgreSQL backends
>>> and auxiliary processes.
>> 
>> Thanks for working on this!
>> 
>> I originally tried to develop something like your proposal in [2],
>> but ran into some difficulties and settled on implementing
>> pg_log_backend_memory_contexts().
> 
> Yes. I am revisiting this problem :)
> 
>>> Attached is a patch that implements this functionality. It
>> introduces
>>> a SQL function
>>> that takes the PID of a backend as an argument, returning a set of
>>> records,
>>> each containing statistics for a single memory context. The
>> underlying
>>> C function
>>> sends a signal to the backend and waits for it to publish its
>> memory
>>> context statistics
>>> before returning them to the user. The publishing backend copies
>>> these statistics
>>> during the next CHECK_FOR_INTERRUPTS call.
>> 
>> I remember that waiting for the target to dump its memory context
>> stats could cause trouble in some error cases.
>>
>> For example, if the caller of pg_get_remote_backend_memory_contexts()
>> is terminated just after the target process finishes dumping stats but
>> before the caller reads them, subsequent calls to
>> pg_get_remote_backend_memory_contexts() no longer respond:
>>
>> [session1] $ psql
>>            (40699)=#
>>
>> $ kill -s SIGSTOP 40699
>>
>> [session2] $ psql
>>            (40866)=# select * FROM pg_get_remote_backend_memory_contexts('40699', false); -- waiting
>>
>> $ kill -s SIGSTOP 40866
>>
>> $ kill -s SIGCONT 40699
>>
>> [session3] $ psql
>>            (47656)=# select pg_terminate_backend(40866);
>>
>> $ kill -s SIGCONT 40866 -- session2 terminated
>>
>> [session3] (47656)=# select * FROM pg_get_remote_backend_memory_contexts('47656', false); -- no response
>>
>> It seems the reason is that memCtxState->in_use is now true and
>> memCtxState->proc_id is 40699.
>> We can continue to use pg_get_remote_backend_memory_contexts() after
>> calling it with 40699, but that would be hard for users to understand.
> 
> Thanks for testing and reporting. Although I am not able to reproduce
> this problem, I think it may be happening because the requesting
> backend (the caller) is terminated before it gets a chance to mark
> memCtxState->in_use as false.

Yeah, when I attached a debugger to 47656 while it was waiting on
pg_get_remote_backend_memory_contexts('47656', false),
memCtxState->in_use was true, as you suspected:

   (lldb) p memCtxState->in_use
   (bool) $1 = true
   (lldb) p memCtxState->proc_id
   (int) $2 = 40699
   (lldb) p pid
   (int) $3 = 47656

> In this case, memCtxState->in_use should be reset to false, possibly
> during the processing of ProcDiePending in ProcessInterrupts().
> 
>>> This approach facilitates on-demand publication of memory
>> statistics
>>> for a specific backend, rather than collecting them at regular
>>> intervals.
>>> Since past memory context statistics may no longer be relevant,
>>> there is little value in retaining historical data. Any collected
>>> statistics
>>> can be discarded once read by the client backend.
>>> 
>>> A fixed-size shared memory block, currently accommodating 30
>> records,
>>> is used to store the statistics. This number was chosen
>> arbitrarily,
>>> as it covers all parent contexts at level 1 (i.e., direct
>> children of
>>> the top memory context)
>>> based on my tests.
>>> Further experiments are needed to determine the optimal number
>>> for summarizing memory statistics.
>>> 
>>> Any additional statistics that exceed the shared memory capacity
>>> are written to a file per backend in the PG_TEMP_FILES_DIR. The
>> client
>>> backend
>>> first reads from the shared memory, and if necessary, retrieves
>> the
>>> remaining data from the file,
>>> combining everything into a unified view. The files are cleaned up
>>> automatically
>>> if a backend crashes or during server restarts.
>>> 
>>> The statistics are reported in a breadth-first search order of the
>>> memory context tree,
>>> with parent contexts reported before their children. This
>> provides a
>>> cumulative summary
>>> before diving into the details of each child context's
>> consumption.
>>> 
>>> The rationale behind the shared memory chunk is to ensure that the
>>> majority of contexts which are the direct children of
>>> TopMemoryContext,
>>> fit into memory
>>> This allows a client to request a summary of memory statistics,
>>> which can be served from memory without the overhead of file
>> access,
>>> unless necessary.
>>> 
>>> A publishing backend signals waiting client backends using a
>> condition
>>> 
>>> variable when it has finished writing its statistics to memory.
>>> The client backend checks whether the statistics belong to the
>>> requested backend.
>>> If not, it continues waiting on the condition variable, timing out
>>> after 2 minutes.
>>> This timeout is an arbitrary choice, and further work is required
>> to
>>> determine
>>> a more practical value.
>>> 
>>> All backends use the same memory space to publish their
>> statistics.
>>> Before publishing, a backend checks whether the previous
>> statistics
>>> have been
>>> successfully read by a client using a shared flag, "in_use."
>>> This flag is set by the publishing backend and cleared by the
>> client
>>> backend once the data is read. If a backend cannot publish due to
>>> shared
>>> memory being occupied, it exits the interrupt processing code,
>>> and the client backend times out with a warning.
>>> 
>>> Please find below an example query to fetch memory contexts from
>> the
>>> backend
>>> with id '106114'. Second argument -'get_summary' is 'false',
>>> indicating a request for statistics of all the contexts.
>>> 
>>> postgres=#
>>> select * FROM pg_get_remote_backend_memory_contexts('116292',
>> false)
>>> LIMIT 2;
>>> -[ RECORD 1 ]-+----------------------
>>> name          | TopMemoryContext
>>> ident         |
>>> type          | AllocSet
>>> path          | {0}
>>> total_bytes   | 97696
>>> total_nblocks | 5
>>> free_bytes    | 15376
>>> free_chunks   | 11
>>> used_bytes    | 82320
>>> pid           | 116292
>>> -[ RECORD 2 ]-+----------------------
>>> name          | RowDescriptionContext
>>> ident         |
>>> type          | AllocSet
>>> path          | {0,1}
>>> total_bytes   | 8192
>>> total_nblocks | 1
>>> free_bytes    | 6912
>>> free_chunks   | 0
>>> used_bytes    | 1280
>>> pid           | 116292
>> 
>> 32d3ed8165f821f introduced a 1-based path to
>> pg_backend_memory_contexts, but
>> pg_get_remote_backend_memory_contexts() seems to have a 0-based path.
>
> Right. I will change it to match this commit.
>
>> pg_backend_memory_contexts has a "level" column, but
>> pg_get_remote_backend_memory_contexts() doesn't.
>>
>> Is there any reason for these differences?
>
> No particular reason. I can add this column as well.
> 
> Thank you,
> Rahila Syed

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.