Home > mailing lists

Re: Logical Replication Delay - Mailing list pgsql-general

From	Justin
Subject	Re: Logical Replication Delay
Date	September 24 02:37:22
Msg-id	CALL-XeMU_8vNsHY0PojWnfmPn4Q2+-TX7=m2arDKrOiisBA1oQ@mail.gmail.com Whole thread Raw
In response to	Re: Logical Replication Delay (Ramakrishna m <ram.pgdb@gmail.com>)
List	pgsql-general

Tree view

Hi Ramakrishna,

I am not following the reasoning on not separating the tables into different publications and subscriptions. I set up logical replication all the time in many different environments, one of the audits I perform before deploying LR is looking at pg_stat_all_tables and WAL creation rate. If it is above a certain threshold the tables are split up into different publications and subscriptions.

If the concern is FK constraints being violated LR does not validate those, it's not a concern.

If the concerns are triggers, the trigger has to be seto ALWAYS or REPLICA meaning LR will execute it. I can see an LR worker being AHEAD of other workers trying to reference ROWs that do not exist yet. Which can be dealt with by making sure the Triggers that reference other tables are in the same publication and subscription.

Thanks
Justin

On Mon, Sep 23, 2024 at 12:32 AM Ramakrishna m <ram.pgdb@gmail.com> wrote:

Hi Justin,

Thank you for your suggestions and detailed insights.

Due to certain business constraints, we are unable to split the tables into parallel publications. All of the tables involved are primary key tables, which adds further complexity to separating them into multiple publications.

That said, we truly appreciate your recommendations regarding the use of binary mode and reviewing the use of IDENTITY SET TO FULL. We will ensure that the subscriber is operating in binary mode and will recheck the identity setup to minimize WAL size and overhead.

Regards,
Ram.

On Sun, 22 Sept 2024 at 01:32, Justin <zzzzz.graf@gmail.com> wrote:
Hi Ramakrishna,

4GB of WAL generated per minute is a lot. I would expect the replay on the subscriber to lag behind because it is a single process. PostgreSQL 16 can create parallel workers for large transactions, however if there is a flood of small transactions touching many tables the single LR worker is going to fall behind.

The only option is split the LR into multiple Publications and Subscriptions as a single worker can't keep up.

What is the justification to not split the tables across multiple Publications and Subscriptions

Additional items to check

Make sure the Subscriber is using binary mode, this avoids an encoding step.
https://www.postgresql.org/docs/current/sql-createsubscription.html

Avoid the use of IDENTITY SET TO FULL on the publisher, if you do use IDENTITY FULL make sure the subscriber table identity is set to a qualifying unique index. In previous versions of PG the publisher and subscriber identities had to match...

IDENTITY SET TO FULL increase the size of the WAL and the work the publisher and subscriber has to do.

Hope this helps.

On Sat, Sep 21, 2024 at 3:08 PM Ramakrishna m <ram.pgdb@gmail.com> wrote:
Hi Team,
We have configured bidirectional replication (but traffic can only flow in one direction) between two data centers (distance: 1000 km, maximum Network latency: 100 ms) with an application TPS (transactions per second) of 700 at maximum.
We are fine with handling up to 500 TPS without observing any lag between the two data centers. However, when TPS increases, we notice a lag in WAL files of over 100 GB (initially, it was 1 TB, but after tuning, it was reduced to 100 GB). During peak times, WAL files are generated at a rate of 4 GB per minute.
All transactions (Tx) take less than 200 ms, with a maximum of 1 second at times (no long-running transactions).
Here are the configured parameters and resources:
OS: Ubuntu
RAM: 376 GB
CPU: 64 cores
Swap: 32 GB
PostgreSQL Version: 16.4 (each side has 3 nodes with Patroni and etcd configured)
DB Size: 15 TB
Parameters configured on both sides:
Name Setting Unit

log_replication_commands off
logical_decoding_work_mem 524288 kB
max_logical_replication_workers 16
max_parallel_apply_workers_per_subscription 2
max_replication_slots 20
max_sync_workers_per_subscription 2
max_wal_senders 20
max_worker_processes 40
wal_level logical
wal_receiver_timeout 600000 ms
wal_segment_size 1073741824 B
wal_sender_timeout 600000 ms
Optimizations applied:
Vacuum freeze is managed during off-hours; no aggressive vacuum is triggered during business hours.
Converted a few tables to unlogged.
Removed unwanted tables from publication.
Partitioned all large tables.
Pending:
Turning off/tuning autovacuum parameters to avoid triggering during business hours.
Not possible: We are running all tables in a single publication, and it is not possible to separate them.
I would greatly appreciate any suggestions you may have to help avoid logical replication delays, whether through tuning database or operating system parameters, or any other recommendations

--
Thanks & Regards,
Ram.


log_replication_commands	off
logical_decoding_work_mem	524288	kB
max_logical_replication_workers	16
max_parallel_apply_workers_per_subscription	2
max_replication_slots	20
max_sync_workers_per_subscription	2
max_wal_senders	20
max_worker_processes	40
wal_level	logical
wal_receiver_timeout	600000	ms
wal_segment_size	1073741824	B
wal_sender_timeout	600000	ms

pgsql-general by date:

From: Dominique Devienne
Date: 23 September, 21:37:58
Subject: Re: Customize psql prompt to show current_role

From: Ayush Vatsa
Date: 24 September, 17:14:38
Subject: Issues with PostgreSQL Source Code Installation

Re: Logical Replication Delay - Mailing list pgsql-general

Previous

Next