Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18 - Mailing list pgsql-bugs
| From | Raphaël Perissat |
|---|---|
| Subject | Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18 |
| Date | |
| Msg-id | CAOLPA2ds60bcR7ZnL1uwdottW+2DS+Hx=3Gy28ex2E9-+UXQAg@mail.gmail.com |
| In response to | Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18 (Andres Freund <andres@anarazel.de>) |
| List | pgsql-bugs |
Hi, and thanks for the explanation.
Indeed, the pmap output seems to confirm that the private memory usage is reasonable:
# pmap -d -p $(pgrep -f "postgres.*checkpointer") | tail -n 1
mapped: 8647256K writeable/private: 2632K shared: 8574584K
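To trend that private figure over time rather than eyeballing it, the summary line can be parsed mechanically; a minimal sketch (the sample line is copied from the output above, and the field positions assume procps pmap's `-d` summary format):

```shell
# Parse the writeable/private figure (KiB) out of a pmap -d summary line.
# Sample line copied from the output above; against a live process, feed in
#   pmap -d -p "$pid" | tail -n 1
line='mapped: 8647256K writeable/private: 2632K shared: 8574584K'
private_kb=$(printf '%s\n' "$line" | awk '{gsub("K", "", $4); print $4}')
echo "private: ${private_kb} KiB"
```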
I used pg_log_backend_memory_contexts() with the pid of the checkpointer process, and I can't see much on that side either:
2026-02-11 08:58:13.794 UTC [1988622] LOG: logging memory contexts of PID 1988622
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 1; TopMemoryContext: 61568 total in 3 blocks; 2768 free (0 chunks); 58800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; smgr relation table: 32768 total in 3 blocks; 16904 free (9 chunks); 15864 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; Checkpointer: 24576 total in 2 blocks; 24296 free (13 chunks); 280 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; LOCALLOCK hash: 8192 total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; WAL record construction: 50200 total in 2 blocks; 6400 free (0 chunks); 43800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; PrivateRefCount: 8192 total in 1 blocks; 2672 free (0 chunks); 5520 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; MdSmgr: 8192 total in 1 blocks; 7952 free (62 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; Pending ops context: 8192 total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 3; Pending Ops Table: 16384 total in 2 blocks; 6712 free (3 chunks); 9672 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; Rendezvous variable hash: 8192 total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 2; GUCMemoryContext: 32768 total in 3 blocks; 3264 free (19 chunks); 29504 used
2026-02-11 08:58:13.794 UTC [1988622] LOG: level: 3; GUC hash table: 32768 total in 3 blocks; 10664 free (6 chunks); 22104 used
2026-02-11 08:58:13.795 UTC [1988622] LOG: level: 2; Timezones: 104112 total in 2 blocks; 2672 free (0 chunks); 101440 used
2026-02-11 08:58:13.795 UTC [1988622] LOG: level: 2; ErrorContext: 8192 total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.795 UTC [1988622] LOG: Grand total: 404296 bytes in 26 blocks; 101440 free (122 chunks); 302856 used
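The grand total can also be pulled out of that log output mechanically for trending; a hypothetical helper (the sample line is taken from the output above, and the parsing assumes that log line layout):

```shell
# Extract the "Grand total" byte count logged by
# pg_log_backend_memory_contexts() (sample log line embedded below).
log='2026-02-11 08:58:13.795 UTC [1988622] LOG: Grand total: 404296 bytes in 26 blocks; 101440 free (122 chunks); 302856 used'
total_bytes=$(printf '%s\n' "$log" | awk -F'Grand total: ' '{split($2, a, " "); print a[1]}')
echo "contexts total: ${total_bytes} bytes"
```

At roughly 400KB, the tracked memory contexts clearly cannot account for gigabytes of RSS.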
I monitor my servers' metrics in ELK (Grafana-like), and I can clearly see memory usage growing at approximately 100MB/hour until it reaches 8GB for the postgres process, causing Patroni to crash on the primary.
The fact that this memory "leak" appears on both the primary and the two replicas makes me think it is not caused by ingest delay or index creation, but maybe I'm wrong.
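That rate can be sanity-checked from two RSS samples; a small sketch (the first figure is the checkpointer RSS from the ps output further down, the second is an assumed, purely illustrative sample one hour later):

```shell
# Compute MB/hour growth from two RSS samples (KiB) taken one hour apart.
rss_t0=3492940   # checkpointer RSS at first sample (from ps), KiB
rss_t1=3595340   # assumed RSS one hour later, KiB (illustrative only)
awk -v a="$rss_t0" -v b="$rss_t1" 'BEGIN { printf "%.0f MB/hour\n", (b - a) / 1024 }'
```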
I restarted the 3 nodes yesterday; here is the watch output of `free` on the primary:
total used free shared buff/cache available
Mem: 31988 5861 670 4571 30530 26126
Swap: 4095 55 4040
free is already showing only 670 MiB, and I can watch it go down in real time like a countdown.
Here is what I see on the replicas:
:~# ps aux --sort=-%mem | head -20
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 62338 0.1 10.6 8652016 3492940 ? Ss Feb10 1:04 postgres: atmo_data: checkpointer
postgres 62340 0.2 9.8 8655736 3227356 ? Ss Feb10 2:36 postgres: atmo_data: startup recovering 0000001E00000246000000C9
Couldn't the startup (recovery) process be causing this? I run this command on the same replica from time to time, and I can clearly see that the %MEM used by those two processes grows over time.
Thanks for your help.
On Tue, Feb 10, 2026 at 20:58, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2026-02-10 15:28:38 +0000, PG Bug reporting form wrote:
> I recently migrated my cluster with 3 dedicated servers to a new cluster. I
> was running on PG12 and I am now on PG18.1.
> I noticed an increasing memory usage on all of my 3 node, until at some
> point there is no memory left and patroni crashes on the leader, leaving the
> cluster with no available primary.
> The cluster is a Data Warehouse type using TimescaleDB, ingesting approx. 1M
> of time-serie a day.
> It appears that the memory leak is affecting both the checkpointer and
> startup (WAL replay) processes in PostgreSQL 18.0 and 18.1.
> I never had such issue on the old cluster with PG12 and the server's
> configuration and cluster usage are the same (except the upgrade of PG)
>
> SYMPTOMS:
> - Checkpointer process grows to 5.6GB RSS after 24 hours
> - Startup process on replicas grows to 3.9GB RSS
> - Memory growth rate: approximately 160-200MB per hour
> - Eventually causes out-of-memory conditions
>
> CONFIGURATION:
> - PostgreSQL version: Initially 18.0, upgraded to 18.1 - same issue persists
> - Platform: Debian 13
> - TimescaleDB: 2.23.0
> - Deployment: 3-node Patroni cluster with streaming replication
> - WAL level: logical
> - Hot standby enabled
>
> SYSTEM RESOURCES:
> RAM: 32GB
> Proc: 12 core of Intel(R) Xeon(R) E-2386G 3.50GHz
>
> KEY SETTINGS:
> - wal_level: logical
> - hot_standby: on
> - max_wal_senders: 20
> - max_replication_slots: 20
> - wal_keep_size: 1GB
> - shared_buffer: 8GB
>
> WAL STATISTICS (over 7 days):
> - Total WAL generated: 2.3TB (approximately 31GB/day)
> - Replication lag: 0 bytes (replicas are caught up)
> - No long-running transactions
>
> MEMORY STATE AFTER 24 HOURS:
> On primary:
> postgres checkpointer: 3.9GB RSS
>
> On replicas:
> postgres checkpointer: 5.6GB RSS
> postgres startup recovering: 3.9GB RSS <-- This is abnormal
The RSS slowly increasing towards shared_buffers is normal if you're not using
huge_pages. The OS only counts pages in shared memory as part of RSS once a
page has been used in the process. Over time the checkpointer process touches
more and more of shared_buffers, thus increasing the RSS.
You can use "pmap -d -p $pid_of_process" to see how much of the RSS is
actually shared memory.
To show this, here's a PS for a new backend:
ps:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
andres 2544694 0.0 0.0 8719956 25744 ? Ss 14:55 0:00 postgres: dev assert: andres postgres [local] idle
and then after reading in a 1.3GB relation:
andres 2544694 1.7 2.2 8720972 1403576 ? Ss 14:55 0:00 postgres: dev assert: andres postgres [local] idle
So you can see that RSS increased proportionally with the amount of touched
data.
Whereas with pmap:
pmap -d -p 2544694|tail -n 1
mapped: 8721924K writeable/private: 5196K shared: 8646284K
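The same shared/private split is visible directly in /proc, which is where pmap gets its data; a sketch (Linux only, using the current shell's pid as a stand-in for the checkpointer's):

```shell
# Show the shared vs. private portions of a process's RSS from
# /proc/<pid>/smaps_rollup (Linux; pmap summarizes the same per-mapping
# data). Substitute the checkpointer's pid for a real measurement.
pid=$$
grep -E '^(Rss|Shared_Clean|Shared_Dirty|Private_Clean|Private_Dirty):' \
    "/proc/${pid}/smaps_rollup"
```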
I think you would need to monitor the real memory usage of various processes
to know why you're OOMing.
You can use pg_log_backend_memory_contexts() to get the memory usage
information of backend processes.
Greetings,
Andres Freund