Thread: Re: Trim the heap free memory
Hi Shawn,
On Fri, Aug 23, 2024 at 2:24 PM shawn wang <shawn.wang.pg@gmail.com> wrote:
>
> Hi hackers,
>
> Currently, all processes in PostgreSQL use malloc to allocate and free memory. With long-lived connections that execute queries over extended periods, the process's memory layout can become extremely complex.
>
> Under certain circumstances, a common issue in memory usage due to the caching strategy of malloc may arise: even if memory is released through the free function, it may not be returned to the OS in a timely manner. This can lead to high system memory usage, affecting performance and the operation of other applications, and may even result in Out-Of-Memory (OOM) errors.
>
> To address this issue, I have developed a new function called pg_trim_backend_heap_free_memory, based on the existing pg_log_backend_memory_contexts function. This function triggers the specified process to execute the malloc_trim operation by sending signals, thereby releasing as much unreturned memory to the operating system as possible. This not only helps to optimize memory usage but can also significantly enhance system performance under memory pressure.
>
> Here is an example of using the pg_trim_backend_heap_free_memory function to demonstrate its effect:
>>
>> CREATE OR REPLACE FUNCTION public.partition_create(schemaname character varying, numberofpartition integer)
>> RETURNS integer
>> LANGUAGE plpgsql
>> AS $function$
>> declare
>> currentTableId integer;
>> currentSchemaName varchar(100);
>> currentTableName varchar(100);
>> begin
>> execute 'create schema ' || schemaname;
>> execute 'create table ' || schemaname || '.' || schemaname || 'hashtable (p1 text, p2 text, p3 text, p4 int, p5 int, p6 int, p7 int, p8 text, p9 name, p10 varchar, p11 text, p12 text, p13 text) PARTITION BY HASH(p1);';
>> currentTableId := 1;
>> loop
>> currentTableName := schemaname || '.' || schemaname || 'hashtable' || ltrim(currentTableId::varchar(10));
>> execute 'create table ' || currentTableName || ' PARTITION OF ' || schemaname || '.' || schemaname || 'hashtable' || ' FOR VALUES WITH(MODULUS ' || numberofpartition || ', REMAINDER ' || currentTableId - 1 || ')';
>> currentTableId := currentTableId + 1;
>> if (currentTableId > numberofpartition) then exit; end if;
>> end loop;
>> return currentTableId - 1;
>> END $function$;
>>
>> select public.partition_create('test3', 5000);
>> select public.partition_create('test4', 5000);
>> select count(*) from test4.test4hashtable a, test3.test3hashtable b where a.p1=b.p1;
>
> You can now observe the memory footprint of the process executing the query:
>>
>> postgres 68673 1.2 0.0 610456 124768 ? Ss 08:25 0:01 postgres: postgres postgres [local] idle
>> Size: 89600 kB
>> KernelPageSize: 4 kB
>> MMUPageSize: 4 kB
>> Rss: 51332 kB
>> Pss: 51332 kB
>>
>> 02b65000-082e5000 rw-p 00000000 00:00 0 [heap]
>
>
>
> After calling pg_trim_backend_heap_free_memory, you will see:
>>
>> postgres=# select pg_trim_backend_heap_free_memory(pg_backend_pid());
>> 2024-08-23 08:27:53.958 UTC [68673] LOG: trimming heap free memory of PID 68673
>> pg_trim_backend_heap_free_memory
>> ----------------------------------
>> t
>> (1 row)
>> 02b65000-082e5000 rw-p 00000000 00:00 0 [heap]
>> Size: 89600 kB
>> KernelPageSize: 4 kB
>> MMUPageSize: 4 kB
>> Rss: 4888 kB
>> Pss: 4888 kB
>>
>> postgres 68673 1.2 0.0 610456 75244 ? Ss 08:26 0:01 postgres: postgres postgres [local] idle
>
>
> Looking forward to your feedback,
Looks useful.
How much time does malloc_trim() take to finish? Does it affect the
current database activity in that backend? It may be good to see
effect of this function by firing the function on random backends
while the query is running through pgbench.
In the patch I don't see definitions of
ProcessTrimHeapFreeMemoryInterrupt() and
HandleTrimHeapFreeMemoryInterrupt(). Am I missing something?
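
For reference, the pg_log_backend_memory_contexts() machinery this patch
is modeled on splits the work into a signal-time part that only sets
flags and a CHECK_FOR_INTERRUPTS()-time part that does the real work.
The missing definitions are presumably along these lines - a sketch by
analogy with Handle/ProcessLogMemoryContextInterrupt(), not the patch's
actual code:

volatile sig_atomic_t TrimHeapFreeMemoryPending = false;

/* signal-handler part: runs in the SIGUSR1 procsignal handler */
void
HandleTrimHeapFreeMemoryInterrupt(void)
{
	InterruptPending = true;
	TrimHeapFreeMemoryPending = true;
}

/* worker part: runs at the next CHECK_FOR_INTERRUPTS() */
void
ProcessTrimHeapFreeMemoryInterrupt(void)
{
	TrimHeapFreeMemoryPending = false;

	ereport(LOG,
			(errmsg("trimming heap free memory of PID %d", MyProcPid)));

	malloc_trim(0);		/* release as much free heap to the OS as possible */
}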
--
Best Wishes,
Ashutosh Bapat
Hi Shawn,

It will be good to document usage of this function. Please add
document changes in your patch. We need to document the impact of this
function so that users can judiciously decide whether or not to use
this function and under what conditions. Also they would know what to
expect when they use this function.

Running it after a query finishes is one thing but that can't be
guaranteed because of the asynchronous nature of signal handlers.
malloc_trim() may be called while a query is being executed. We need
to assess that impact as well.

Can you please share some numbers - TPS, latency etc. with and without
this function invoked during a benchmark run?

--
Best Wishes,
Ashutosh Bapat

On Sat, Aug 24, 2024 at 8:12 AM shawn wang <shawn.wang.pg@gmail.com> wrote:
>
> Hi Ashutosh, thank you for your response.
> Firstly, the purpose of caching memory in malloc is for performance, so when we execute malloc_trim(), it will affect the efficiency of memory usage in the subsequent operation. Secondly, what malloc_trim() does is lock and traverse the bins, then execute madvise on the memory that can be released. When there is a lot of memory in the bins, the traversal time will also increase. I once placed malloc_trim() at the end of each query, which resulted in a 20% performance drop. Therefore, I made it an on-demand function instead. The new v2 patch has included the omitted code.
> [...]
#include <malloc.h>
         ^~~~~~~~~~
1 error generated.
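
The error above is what a non-glibc toolchain (apparently clang here)
reports, since <malloc.h> and malloc_trim() are glibc-specific. A
hypothetical guard along these lines would keep the build working
elsewhere - a sketch, not code from the patch:

#ifdef __GLIBC__
#include <malloc.h>
#endif

/* later, at the trim call site: */
#ifdef __GLIBC__
	malloc_trim(0);
#else
	/* malloc_trim() is unavailable on this platform; do nothing */
#endif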
Hi Ashutosh,

On Mon, Aug 26, 2024 at 7:05 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> Hi Shawn,
>
> It will be good to document usage of this function. Please add
> document changes in your patch. We need to document the impact of this
> function so that users can judiciously decide whether or not to use
> this function and under what conditions. Also they would know what to
> expect when they use this function.

I have already incorporated the usage of this function into the new patch.

Currently, there is no single memory statistic that tells you precisely
whether a trim operation should be performed. Here are two conditions
that can be used as references:

1. Check the difference between the process's memory usage (for
   example, from the top command; because of shared memory, it is
   necessary to subtract SHR from RES) and the statistics of the memory
   contexts. If the difference is very large, this function should be
   used to release memory.

2. Execute malloc_stats(). If the "system bytes" are greater than the
   "in use bytes", this indicates that this function can be used to
   release memory.

> Running it after a query finishes is one thing but that can't be
> guaranteed because of the asynchronous nature of signal handlers.
> malloc_trim() may be called while a query is being executed. We need
> to assess that impact as well.
>
> Can you please share some numbers - TPS, latency etc. with and without
> this function invoked during a benchmark run?

I have placed malloc_trim() at the end of the exec_simple_query
function, so that malloc_trim() is executed once for each SQL statement
executed. I used pgbench to reproduce the performance impact, and the
results are as follows.

Database preparation:

create database testc;
create user t1;
alter database testc owner to t1;
./pgbench testc -U t1 -i -s 100
./pgbench testc -U t1 -S -c 100 -j 100 -T 600

Without Trim:

$./pgbench testc -U t1 -S -c 100 -j 100 -T 600
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 551984376
number of failed transactions: 0 (0.000%)
latency average = 0.109 ms
initial connection time = 23.569 ms
tps = 920001.842189 (without initial connection time)

With Trim:

$./pgbench testc -U t1 -S -c 100 -j 100 -T 600
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 470690787
number of failed transactions: 0 (0.000%)
latency average = 0.127 ms
initial connection time = 23.632 ms
tps = 784511.901558 (without initial connection time)
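
One way to avoid paying that roughly 15% TPS penalty on every statement
would be to trim only when the allocator is actually holding a lot of
freed memory. A sketch of such a gate using glibc's mallinfo2()
(glibc 2.33+); the 64MB threshold is an arbitrary assumption, not
something from the patch:

#include <malloc.h>

static void
maybe_trim_heap(void)
{
	struct mallinfo2 mi = mallinfo2();

	/*
	 * fordblks is the total free space held inside the heap; only pay
	 * for malloc_trim() when a substantial amount could be returned.
	 */
	if (mi.fordblks > 64 * 1024 * 1024)
		malloc_trim(0);
}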
--
Rafia Sabih
On Thu, 12 Sept 2024 at 14:40, shawn wang <shawn.wang.pg@gmail.com> wrote:
> Could you please perform another round of testing to ensure that everything is functioning as expected with this modification?

One way to get a few machines with various build systems testing this
is to register the patch on the commitfest app in [1]. You can then
see if the patch is passing the continuous integration tests in [2].
One day soon the features of [2] should be combined with [1].

David

[1] https://commitfest.postgresql.org/50/
[2] http://cfbot.cputube.org/
shawn wang <shawn.wang.pg@gmail.com> writes:
> I have successfully registered my patch for the commitfest. However, upon
> integration, I encountered several errors during the testing phase. I am
> currently investigating the root causes of these issues and will work on
> providing the necessary fixes.

I should think the root cause is pretty obvious: malloc_trim() is
a glibc-ism. I'm fairly doubtful that this is something we should
spend time on. It can never work on any non-glibc platform.

Even granting that a Linux-only feature could be worth having, I'm
really doubtful that our memory allocation patterns are such that
malloc_trim() could be expected to free a useful amount of memory
mid-query. The single test case you showed suggested that maybe we
could usefully prod glibc to free memory at query completion, but we
don't need all this interrupt infrastructure to do that. I think we
could likely get 95% of the benefit with about a five-line patch.

			regards, tom lane
I wrote:
> The single test case you showed suggested that maybe we could
> usefully prod glibc to free memory at query completion, but we
> don't need all this interrupt infrastructure to do that. I think
> we could likely get 95% of the benefit with about a five-line
> patch.

To try to quantify that a little, I wrote a very quick-n-dirty
patch to apply malloc_trim during finish_xact_command and log
the effects. (I am not asserting this is the best place to
call malloc_trim; it's just one plausible possibility.) Patch
attached, as well as statistics collected from a run of the
core regression tests followed by

grep malloc_trim postmaster.log | sed 's/.*LOG:/LOG:/' | sort -k4n | uniq -c >trim_savings.txt

We can see that out of about 43K test queries, 32K saved nothing
whatever, and in only four was more than a couple of meg saved.
That's pretty discouraging IMO. It might be useful to look closer
at the behavior of those top four though. I see them as

2024-09-15 14:58:06.146 EDT [960138] LOG: malloc_trim saved 7228 kB
2024-09-15 14:58:06.146 EDT [960138] STATEMENT: ALTER TABLE delete_test_table ADD PRIMARY KEY (a,b,c,d);

2024-09-15 14:58:09.861 EDT [960949] LOG: malloc_trim saved 12488 kB
2024-09-15 14:58:09.861 EDT [960949] STATEMENT: with recursive search_graph(f, t, label, is_cycle, path) as (
	select *, false, array[row(g.f, g.t)] from graph g
	union distinct
	select g.*, row(g.f, g.t) = any(path), path || row(g.f, g.t)
	from graph g, search_graph sg
	where g.f = sg.t and not is_cycle
)
select * from search_graph;

2024-09-15 14:58:09.866 EDT [960949] LOG: malloc_trim saved 12488 kB
2024-09-15 14:58:09.866 EDT [960949] STATEMENT: with recursive search_graph(f, t, label) as (
	select * from graph g
	union distinct
	select g.*
	from graph g, search_graph sg
	where g.f = sg.t
) cycle f, t set is_cycle to 'Y' default 'N' using path
select * from search_graph;

2024-09-15 14:58:09.853 EDT [960949] LOG: malloc_trim saved 12616 kB
2024-09-15 14:58:09.853 EDT [960949] STATEMENT: with recursive search_graph(f, t, label) as (
	select * from graph0 g
	union distinct
	select g.*
	from graph0 g, search_graph sg
	where g.f = sg.t
) search breadth first by f, t set seq
select * from search_graph order by seq;

I don't understand why WITH RECURSIVE queries might be more prone
to leave non-garbage-collected memory behind than other queries,
but maybe that is worth looking into.

			regards, tom lane
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..9efb4f7636 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -21,6 +21,7 @@
 #include <fcntl.h>
 #include <limits.h>
+#include <malloc.h>
 #include <signal.h>
 #include <unistd.h>
 #include <sys/resource.h>
@@ -2797,6 +2798,16 @@ finish_xact_command(void)
 		MemoryContextStats(TopMemoryContext);
 #endif
 
+		{
+			char	   *old_brk = sbrk(0);
+			char	   *new_brk;
+
+			malloc_trim(0);
+			new_brk = sbrk(0);
+			elog(LOG, "malloc_trim saved %zu kB",
+				 (old_brk - new_brk + 1023) / 1024);
+		}
+
 		xact_started = false;
 	}
 }

trim_savings.txt, condensed as occurrences:kB-saved pairs:

32293:0  4:4  12:8  7:12  57:16  3:20  22:24  14:28
288:32  20:36  26:40  18:44  26:48  27:52  37:56  26:60
218:64  20:68  44:72  44:76  45:80  29:84  91:88  31:92
191:96  30:100  81:104  24:108  214:112  32:116  178:120  86:124
4498:128  29:132  286:136  34:140  563:144  20:148  821:152  987:156
212:160  8:164  22:168  8:172  19:176  7:180  10:184  5:188
128:192  7:196  14:200  14:204  9:208  11:212  11:216  12:224
6:228  5:232  2:236  3:240  7:244  3:248  3:252  186:256
17:260  28:264  51:268  33:272  33:276  69:280  25:284  39:288
12:292  20:296  12:300  34:304  5:308  2:312  6:316  12:320
1:324  1:328  1:332  6:336  3:344  1:348  6:352  1:356
43:360  103:364  1:368  119:384  1:388  1:392  2:396  4:412
4:416  1:420  3:424  1:428  5:432  3:436  1:440  27:448
6:452  5:464  10:480  2:484  1:492  1:496  1:500  4:508
16:512  1:528  2:536  1:540  1:552  2:556  1:576  1:580
6:588  1:604  1:616  1:620  1:628  1:632  3:640  2:668
2:672  1:716  2:728  1:732  1:736  2:748  1:760  11:768
2:776  1:800  10:844  1:880  1:884  1:896  2:904  6:960
4:996  1:1024  3:1084  1:1088  1:1136  7:1152  1:1160  1:1248
1:1256  1:1280  1:1288  2:1296  3:1300  2:1412  2:1416  1:1512
3:1520  5:1536  2:1568  1:1608  2:1612  1:1648  1:1768  1:1772
1:1792  1:1860  1:1872  1:1908  1:1936  2:2004  1:2104  1:2112
1:2116  1:2176  1:2228  2:2240  1:2348  1:2496  1:2624  1:2676
1:3568  1:3712  1:3760  1:3836  1:7228  2:12488  1:12616
Thank you very much for your response and suggestions.
As you mentioned, the patch here is actually designed for glibc's
ptmalloc2 and is not applicable to other platforms. I will consider
supporting it only on the Linux platform in the future.

In the memory management strategy of ptmalloc2, there is a certain
amount of non-garbage-collected memory, which is closely related to the
order and method of memory allocation and release. To reduce the
performance overhead caused by frequent allocation and release of small
blocks of memory, ptmalloc2 intentionally retains this memory. The
malloc_trim function locks, traverses memory blocks, and uses madvise
to release this memory, but this process may also have a negative
impact on performance.

In the process of exploring solutions, I also considered a variety of
strategies, including scheduling malloc_trim to be executed at regular
intervals or triggering malloc_trim after a specific number of free
operations. However, we found that these methods are not optimal
solutions.
> We can see that out of about 43K test queries, 32K saved nothing
> whatever, and in only four was more than a couple of meg saved.
> That's pretty discouraging IMO. It might be useful to look closer
> at the behavior of those top four though. I see them as

I have previously encountered situations where the non-garbage-collected
memory of wal_sender was approximately hundreds of megabytes or even
exceeded 1GB, but I was unable to reproduce this situation using simple
SQL. Therefore, I introduced an asynchronous processing function, hoping
to manage memory more efficiently without affecting performance.
In addition, I have considered the following optimization strategies:
1. Adjust the configuration of ptmalloc2 through the mallopt function
   to use mmap rather than sbrk for memory allocation. This can
   immediately return the memory to the operating system when it is
   released, but it may affect performance due to the higher overhead
   of mmap.

2. Use other memory allocators such as jemalloc or tcmalloc, and adjust
   relevant parameters to reduce the generation of non-garbage-collected
   memory. However, these allocators are designed for multi-threaded use
   and may lead to increased memory usage per process.

3. Build a set of memory context allocation functions based on mmap,
   delegating the responsibility of memory management entirely to the
   database level (see the sketch after this list). Although this
   solution can effectively control memory allocation, it requires a
   large-scale engineering implementation.
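
As a minimal illustration of why mmap-backed allocation (strategies 1
and 3) returns memory promptly: an mmap()'d block goes back to the
kernel as soon as it is munmap()'d, with no allocator cache in between.
A sketch, not the proposed design:

#include <sys/mman.h>

	/* back one memory-context block with an anonymous mapping */
	size_t		len = 8 * 1024 * 1024;
	void	   *block = mmap(NULL, len, PROT_READ | PROT_WRITE,
							 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* ... carve palloc-style chunks out of block ... */

	munmap(block, len);			/* RSS drops immediately, nothing cached */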
I look forward to further discussing these solutions with you and exploring the best memory management practices together.
Best regards, Shawn
On 9/18/24 04:56, shawn wang wrote:
> [...]
>
> I have previously encountered situations where the non-garbage-collected
> memory of wal_sender was approximately hundreds of megabytes or even
> exceeded 1GB, but I was unable to reproduce this situation using simple
> SQL. Therefore, I introduced an asynchronous processing function, hoping
> to manage memory more efficiently without affecting performance.

I doubt a system function is the right approach to deal with these
memory allocation issues. The function has to be called by the user,
which means the user is expected to monitor the system and decide when
to invoke the function. That seems far from trivial - it would require
collecting OS-level information about memory usage, and I suppose it'd
need to happen fairly often to actually help with OOM reliably.

> In addition, I have considered the following optimization strategies:
>
> 1. Adjust the configuration of ptmalloc2 through the mallopt function
>    to use mmap rather than sbrk for memory allocation. This can
>    immediately return the memory to the operating system when it is
>    released, but it may affect performance due to the higher overhead
>    of mmap.

Sure, forcing the system to release memory more aggressively may affect
performance - that's the tradeoff done by glibc. But calling the new
pg_trim_backend_heap_free_memory() function is not free either.

But why would it force the memory to be returned immediately? The
decision whether to trim memory is driven by M_TRIM_THRESHOLD, and
that does not need to be 0. In fact, it's 128kB by default, i.e. glibc
trims memory automatically, if it can trim at least 128kB.

Yes, by default the thresholds are adjusted dynamically, which I guess
is one way to get excessive memory usage that could have been solved by
calling malloc_trim(). But setting the option to any value disables the
dynamic behavior; it doesn't need to be set to 0.

> 2. Use other memory allocators such as jemalloc or tcmalloc, and adjust
>    relevant parameters to reduce the generation of non-garbage-collected
>    memory. However, these allocators are designed for multi-threaded use
>    and may lead to increased memory usage per process.

Right, that's kinda the opposite of trying to not waste memory.

But it also suggests syscalls (done by malloc) may be a problem under
high concurrency - not just with multi-threading, but even with regular
processes. And for glibc that matters too, of course - in fact, it may
be pretty important to allow glibc to cache more memory (by increasing
M_TOP_PAD) to get good throughput in certain workloads ...

> 3. Build a set of memory context allocation functions based on mmap,
>    delegating the responsibility of memory management entirely to the
>    database level. Although this solution can effectively control
>    memory allocation, it requires a large-scale engineering
>    implementation.

Why would it be complex? You could just as well set M_MMAP_THRESHOLD to
some low value, so that all malloc() calls are handled by mmap()
internally. Not sure it's a good idea, though.

> I look forward to further discussing these solutions with you and
> exploring the best memory management practices together.

Adjusting the glibc malloc() behavior may be important, but I don't
think a system function is a good approach. It's possible to change the
behavior by setting environment variables, which is pretty easy, but
maybe we could have something that does the same thing using mallopt().

That's what Ronan Dunklau proposed in thread [1] a year ago ... I like
that approach much more, it's much simpler for the user.

[1] https://www.postgresql.org/message-id/flat/3424675.QJadu78ljV%40aivenlaptop

regards

--
Tomas Vondra
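
For concreteness, the knobs discussed above are plain mallopt() calls;
per glibc's documentation, setting any of them disables the
corresponding dynamic adjustment. A sketch with illustrative values,
not recommendations from this thread:

#include <malloc.h>

	/* trim automatically whenever ~4MB could be returned to the OS */
	mallopt(M_TRIM_THRESHOLD, 4 * 1024 * 1024);

	/* keep ~4MB of extra heap as a cache for future allocations */
	mallopt(M_TOP_PAD, 4 * 1024 * 1024);

	/* serve allocations >= 8MB via mmap(); glibc otherwise adjusts this
	 * threshold dynamically, up to 32MB on 64-bit systems */
	mallopt(M_MMAP_THRESHOLD, 8 * 1024 * 1024);

The same settings can be applied without code via glibc's environment
variables (MALLOC_TRIM_THRESHOLD_, MALLOC_TOP_PAD_,
MALLOC_MMAP_THRESHOLD_ - note the trailing underscores).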
On 12/8/24 05:23, Tomas Vondra wrote:
> [...]
>
> Adjusting the glibc malloc() behavior may be important, but I don't
> think a system function is a good approach. It's possible to change the
> behavior by setting environment variables, which is pretty easy, but
> maybe we could have something that does the same thing using mallopt().
>
> That's what Ronan Dunklau proposed in thread [1] a year ago ... I like
> that approach much more, it's much simpler for the user.

To propose something less abstract / more tangible, I think we should
do something like this:

1) add a bit of code for glibc-based systems, that adjusts selected
   malloc parameters using mallopt() during startup

2) add a GUC that enables this, with the default being the regular
   glibc behavior (with dynamic adjustment of various thresholds)

Exactly which parameters this would set is an open question, but based
on my earlier experiments, Ronan's earlier patches, etc. I think it
should adjust at least

M_TRIM_THRESHOLD - to make sure we trim heap regularly

M_TOP_PAD - to make sure we cache some allocated memory

I wonder if maybe we should tune M_MMAP_THRESHOLD, which on 64-bit
systems defaults to 32MB, so we don't really mmap() very often for
regular memory contexts. But I don't know if that's a good idea, that
would need some experiments.

I believe that's essentially what Ronan Dunklau proposed, but it
stalled. Not because of some inherent complexity, but because of
concerns about introducing glibc-specific code. Based on my recent
experiments I think it's clearly worth it (esp. with high concurrency
workloads). If glibc was a niche, it'd be a different situation, but
I'd guess the vast majority of databases run on glibc.

Yes, it's possible to do these changes without new code (e.g. by
setting the environment variables), but that's rather inconvenient.

Perhaps it'd be possible to make it a bit smarter by looking at malloc
stats, and adjusting the trim/pad thresholds, but I'd leave that for
the future. It might even lead to similar issues with excessive memory
usage just like the logic built into glibc. But maybe we could at least
print / provide some debugging information? That would help with
adjusting the GUC ...

regards

--
Tomas Vondra
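
A rough sketch of how 1) and 2) might fit together; the GUC name, the
chosen values, and the call site are hypothetical, not from any posted
patch:

#ifdef __GLIBC__
#include <malloc.h>
#endif

/* hypothetical GUC: off by default, i.e. stock glibc behavior */
bool		glibc_malloc_tuning = false;

/* called once during backend startup, e.g. from PostgresMain() */
static void
maybe_tune_glibc_malloc(void)
{
#ifdef __GLIBC__
	if (!glibc_malloc_tuning)
		return;

	/* fixed thresholds; setting them disables glibc's dynamic tuning */
	mallopt(M_TRIM_THRESHOLD, 4 * 1024 * 1024); /* trim heap regularly */
	mallopt(M_TOP_PAD, 4 * 1024 * 1024);		/* cache some memory */
#endif
}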