Re: What should I expect when creating many logical replication slots? - Mailing list pgsql-general

From Klaus Darilion
Subject Re: What should I expect when creating many logical replication slots?
Date
Msg-id 8d3be16df9f21a71c94c0ce0d7ebe553@pernau.at
Whole thread Raw
In response to Re: What should I expect when creating many logical replication slots?  (Jim Nasby <jim.nasby@gmail.com>)
List pgsql-general
On 2024-01-16 19:51, Jim Nasby wrote:
> On 1/11/24 6:17 PM, Antonin Bas wrote:
>> Hi all,
>> 
>> I have a use case for which I am considering using Postgres Logical 
>> Replication, but I would like to scale up to 100 or even 200 
>> replication slots.
>> 
>> I have increased max_wal_senders and max_replication_slots to 100 
>> (also making sure that max_connections is large enough). Things seem 
>> to be working pretty well so far based on some PoC code I have 
>> written. Postgres is creating a walsender process for each replication 
>> slot, as expected, and the memory footprint of each one is around 4MB.
>> 
>> So I am quite happy with the way things are working, but I am a bit 
>> uneasy about increasing these configuration values by 10-20x compared 
>> to their defaults (both max_wal_senders and max_replication_slots 
>> default to 10).
>> 
>> Is there anything I should be looking out for specifically? Is it 
>> considered an anti-pattern to use that many replication slots and 
>> walsender processes? And, when my database comes under heavy write 
>> load, will walsender processes start consuming a large amount of CPU / 
>> memory (I recognize that this is a vague question, I am still working 
>> on some empirical testing).
> 
> The biggest issue with logical decoding (which drives logical 
> replication) is that every subscriber has to completely decode 
> everything for its publication, which can be extremely memory 
> intensive under certain circumstances (long-running transactions being 
> one potential trigger). Decoders also have to read through all WAL 
> traffic, regardless of what their publication is set to - everything 
> runs off the single WAL stream.

Indeed, big/long-running transactions can be an issue. Today I added a 
SERIAL column to a table with 55 million rows. Although that table is 
not in the publication, the huge transaction was written to the WAL and 
spilled to disk by the WAL senders. With 50 WAL senders, we basically 
created a DoS amplification attack against our own disk. Luckily we 
could increase the disk size.*
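For context, each decoding session buffers a transaction's changes in memory and spills them to disk once they exceed logical_decoding_work_mem (64MB by default, per session, so 50 slots can spill 50 copies of the same large transaction). On PostgreSQL 14 and later the spilling can be inspected per slot, e.g.:

```sql
-- Per-slot spill statistics (PostgreSQL 14+):
-- spill_bytes grows when decoded changes exceed logical_decoding_work_mem
SELECT slot_name, spill_txns, spill_count, spill_bytes
FROM pg_stat_replication_slots;

-- The per-session memory threshold before spilling to disk
SHOW logical_decoding_work_mem;
```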

Besides that, our DB has (in my view) plenty of UPDATEs/INSERTs 
and has not had replication performance problems with the current 50 
replication slots. But if that were to become a bottleneck, we would use 
cascading replication, i.e. the master uses logical replication to 2 
"distribution" servers, which further replicate to the 50+ subscribers. 
That way, the in-between "distribution" servers would also filter out 
WAL decoding for changes that are not part of the publication.
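A minimal sketch of such a cascade (host names, database names, and the publication/subscription/table names below are placeholders, not anything from our setup):

```sql
-- On the master: publish the tables of interest
CREATE PUBLICATION master_pub FOR TABLE my_table;

-- On each "distribution" server: subscribe to the master ...
CREATE SUBSCRIPTION dist_sub
    CONNECTION 'host=master dbname=mydb'
    PUBLICATION master_pub;

-- ... and re-publish the replicated table for the leaf subscribers
CREATE PUBLICATION dist_pub FOR TABLE my_table;

-- On each of the 50+ subscribers: subscribe to a distribution server
CREATE SUBSCRIPTION leaf_sub
    CONNECTION 'host=dist1 dbname=mydb'
    PUBLICATION dist_pub;
```

This way the master only decodes WAL for 2 slots, and the 50+ leaf slots decode the (already filtered) WAL of the distribution servers instead.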

regards
Klaus

* Does somebody know why the WAL sender has to track transactions that 
make changes to tables which are not in the publication?


