Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18 - Mailing list pgsql-bugs

From Raphaël Perissat
Subject Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18
Date
Msg-id CAOLPA2ds60bcR7ZnL1uwdottW+2DS+Hx=3Gy28ex2E9-+UXQAg@mail.gmail.com
In response to Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs

Hi, and thanks for the explanation.

Indeed, the pmap output seems to confirm that the private memory usage is in the normal range:
# pmap -d -p $(pgrep -f "postgres.*checkpointer") | tail -n 1
mapped: 8647256K    writeable/private: 2632K    shared: 8574584K
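To put numbers on it: only ~2.6 MB of the ~8.6 GB mapping is writeable/private; the rest is the shared memory segment. A quick awk over that summary line (a sketch, parsing the exact output pasted above):

```shell
# Parse the pmap -d summary line pasted above and convert to MB.
# In practice, pipe `pmap -d -p $PID | tail -n 1` into the same awk.
line='mapped: 8647256K    writeable/private: 2632K    shared: 8574584K'
echo "$line" | awk '{
    gsub(/K/, "")                                 # strip the K suffixes
    printf "private: %d MB, shared: %d MB\n", $4/1024, $6/1024
}'
```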

I also ran pg_log_backend_memory_contexts() with the PID of the checkpointer process, and I can't see much on this side either:
2026-02-11 08:58:13.794 UTC [1988622] LOG:  logging memory contexts of PID 1988622
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 1; TopMemoryContext: 61568 total in 3 blocks; 2768 free (0 chunks); 58800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; smgr relation table: 32768 total in 3 blocks; 16904 free (9 chunks); 15864 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; Checkpointer: 24576 total in 2 blocks; 24296 free (13 chunks); 280 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; LOCALLOCK hash: 8192 total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; WAL record construction: 50200 total in 2 blocks; 6400 free (0 chunks); 43800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; PrivateRefCount: 8192 total in 1 blocks; 2672 free (0 chunks); 5520 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; MdSmgr: 8192 total in 1 blocks; 7952 free (62 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; Pending ops context: 8192 total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 3; Pending Ops Table: 16384 total in 2 blocks; 6712 free (3 chunks); 9672 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; Rendezvous variable hash: 8192 total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; GUCMemoryContext: 32768 total in 3 blocks; 3264 free (19 chunks); 29504 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 3; GUC hash table: 32768 total in 3 blocks; 10664 free (6 chunks); 22104 used
2026-02-11 08:58:13.795 UTC [1988622] LOG:  level: 2; Timezones: 104112 total in 2 blocks; 2672 free (0 chunks); 101440 used
2026-02-11 08:58:13.795 UTC [1988622] LOG:  level: 2; ErrorContext: 8192 total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.795 UTC [1988622] LOG:  Grand total: 404296 bytes in 26 blocks; 101440 free (122 chunks); 302856 used

I monitor my servers' metrics in ELK (Grafana-like dashboards) and I can clearly see the memory usage growing by approx. 100 MB/hour until it reaches 8 GB for the postgres process, at which point Patroni crashes on the primary.
The fact that this memory "leak" appears on both the primary and the two replicas makes me think it is not caused by ingest delay / index creation, but maybe I'm wrong.

I restarted the 3 nodes yesterday, and here is the output of the watch command (watching free -m) on the primary:
total        used        free      shared  buff/cache   available
Mem:           31988        5861         670        4571       30530       26126
Swap:           4095          55        4040

free is already showing only 670 MB, and I can see it going down in real time like a countdown.
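That said, a low "free" value by itself is expected on a database host: most of the RAM sits in buff/cache (which on Linux includes shmem such as PostgreSQL's shared memory) and is largely reclaimable, so "available" is the column to watch. A small awk over the Mem: line above (a sketch using the pasted values, in MB):

```shell
# "free" alone is misleading; "available" is the kernel's estimate of
# memory that can still be allocated without swapping.
# Parsing the Mem: line pasted above (free -m output, values in MB).
memline='Mem:           31988        5861         670        4571       30530       26126'
echo "$memline" | awk '{
    printf "free: %d MB, available: %d MB (%.0f%% of total)\n", $4, $7, 100 * $7 / $2
}'
```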

And here is the corresponding ps output on the replicas:
:~# ps aux --sort=-%mem | head -20
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres   62338  0.1 10.6 8652016 3492940 ?     Ss   Feb10   1:04 postgres: atmo_data: checkpointer
postgres   62340  0.2  9.8 8655736 3227356 ?     Ss   Feb10   2:36 postgres: atmo_data: startup recovering 0000001E00000246000000C9

Couldn't the startup (recovery) process be causing this? I run this command on the same replica from time to time, and I can clearly see that the %MEM used by those two processes is growing over time.
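Since %MEM counts shared pages too, the private portion can be tracked per PID to separate the two. A small helper (a sketch; `private_kb` is a hypothetical name, and it assumes the `pmap -d` summary layout shown earlier):

```shell
# Extract the writeable/private figure (in KB) from `pmap -d` output;
# shared_buffers pages are excluded, so a genuine leak would keep
# growing here even while RSS alone stays ambiguous.
private_kb() {
    tail -n 1 | awk '{ gsub(/K/, ""); print $4 }'
}

# Example, using the checkpointer summary pasted earlier:
echo 'mapped: 8647256K    writeable/private: 2632K    shared: 8574584K' | private_kb

# In practice, something like:
#   pmap -d -p "$(pgrep -f 'postgres.*startup')" | private_kb
```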

Thanks for your help.


On Tue, Feb 10, 2026 at 20:58, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2026-02-10 15:28:38 +0000, PG Bug reporting form wrote:
> I recently migrated my cluster with 3 dedicated servers to a new cluster. I
> was running on PG12 and I am now on PG18.1.
> I noticed an increasing memory usage on all of my 3 nodes, until at some
> point there is no memory left and patroni crashes on the leader, leaving the
> cluster with no available primary.
> The cluster is a Data Warehouse type using TimescaleDB, ingesting approx. 1M
> time-series points a day.
> It appears that the memory leak is affecting both the checkpointer and
> startup (WAL replay) processes in PostgreSQL 18.0 and 18.1.
> I never had such issue on the old cluster with PG12 and the server's
> configuration and cluster usage are the same (except the upgrade of PG)
>
> SYMPTOMS:
> - Checkpointer process grows to 5.6GB RSS after 24 hours
> - Startup process on replicas grows to 3.9GB RSS
> - Memory growth rate: approximately 160-200MB per hour
> - Eventually causes out-of-memory conditions
>
> CONFIGURATION:
> - PostgreSQL version: Initially 18.0, upgraded to 18.1 - same issue persists
> - Platform: Debian 13
> - TimescaleDB: 2.23.0
> - Deployment: 3-node Patroni cluster with streaming replication
> - WAL level: logical
> - Hot standby enabled
>
> SYSTEM RESOURCES:
> RAM: 32GB
> Proc: 12 core of Intel(R) Xeon(R) E-2386G 3.50GHz
>
> KEY SETTINGS:
> - wal_level: logical
> - hot_standby: on
> - max_wal_senders: 20
> - max_replication_slots: 20
> - wal_keep_size: 1GB
> - shared_buffers: 8GB
>
> WAL STATISTICS (over 7 days):
> - Total WAL generated: 2.3TB (approximately 31GB/day)
> - Replication lag: 0 bytes (replicas are caught up)
> - No long-running transactions
>
> MEMORY STATE AFTER 24 HOURS:
> On primary:
>   postgres checkpointer: 3.9GB RSS
>
> On replicas:
>   postgres checkpointer: 5.6GB RSS
>   postgres startup recovering: 3.9GB RSS  <-- This is abnormal

The RSS slowly increasing towards shared_buffers is normal if you're not using
huge_pages. The OS only counts pages in shared memory as part of RSS once a
page has been used in the process. Over time the checkpointer process touches
more and more of shared_buffers, thus increasing the RSS.

You can use "pmap -d -p $pid_of_process" to see how much of the RSS is
actually shared memory.

To show this, here's a PS for a new backend:

ps:
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
andres   2544694  0.0  0.0 8719956 25744 ?       Ss   14:55   0:00 postgres: dev assert: andres postgres [local] idle

and then after reading in a 1.3GB relation:

andres   2544694  1.7  2.2 8720972 1403576 ?     Ss   14:55   0:00 postgres: dev assert: andres postgres [local] idle

So you can see that RSS increased proportionally with the amount of touched
data.

Whereas with pmap:

pmap -d -p 2544694|tail -n 1
mapped: 8721924K    writeable/private: 5196K    shared: 8646284K


I think you would need to monitor the real memory usage of various processes
to know why you're OOMing.

You can use pg_log_backend_memory_contexts() to get the memory usage
information of backend processes.

Greetings,

Andres Freund
