Logical Replication for Very Large Databases

From Taylor Sarrafian
Subject Logical Replication for Very Large Databases
Date
Msg-id 8B0D84B4-7A97-42E1-9E94-14E026244104@gmail.com
List pgsql-general

Hi friends,

We have a relatively large database (~5 TB) receiving very frequent INSERTs/UPDATEs, and we’d like to create a logical replica of it. To put things in perspective, we often generate up to 10 WAL segments (16 MB each) per second, i.e. roughly 160 MB/s of WAL (wal_level=replica).

We successfully set up logical replication in our smaller staging DB, but the scale of production is making things difficult.

---

Here are some approaches we considered:

1. `CREATE SUBSCRIPTION … WITH ( copy_data=true )` 

This would be ideal, except we’re afraid of accumulating too much WAL on the publishing server while waiting for ~5 TB to copy over: the replication slot created for the subscription retains WAL until the initial copy finishes and the subscriber catches up. We could theoretically increase the size of the drive to something like 10 TB, but that’s not currently an option for us.

There’s also the issue of performance. It’s unclear what impact `copy_data=true` would have on our DB read/write throughput, and we’re hesitant to try it.
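
For reference, the straightforward route is roughly the following (a minimal sketch; `ts_pub`, `ts_sub`, and the connection string are placeholders):

```
-- On the publisher:
CREATE PUBLICATION ts_pub FOR ALL TABLES;

-- On the (empty) subscriber; copy_data = true triggers the initial ~5TB table sync:
CREATE SUBSCRIPTION ts_sub
    CONNECTION 'host=publisher dbname=mydb user=replication_user'
    PUBLICATION ts_pub
    WITH (copy_data = true);
```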

2. Start from a base backup and catch up with `restore_command`:

`SELECT pg_replication_slot_advance('<slot_name>', pg_current_wal_lsn())` could work nicely with a matching `recovery_target_lsn` in `recovery.conf`.
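
Spelled out, the sequence we have in mind is roughly the following (a sketch only; slot, publication, and LSN values are placeholders, and the subscription is created only after the replica has been promoted at the target LSN):

```
-- On the publisher, create the slot and publication up front so WAL is retained:
SELECT pg_create_logical_replication_slot('ts_slot', 'pgoutput');
CREATE PUBLICATION ts_pub FOR ALL TABLES;

-- Pick a cutoff LSN and move the slot up to it; note the function takes
-- the slot name as its first argument:
SELECT pg_replication_slot_advance('ts_slot', pg_current_wal_lsn());

-- On the replica, recover the base backup to that same point, e.g. in recovery.conf:
--   recovery_target_lsn    = '2B5/4A001228'   -- the LSN returned above
--   recovery_target_action = 'promote'

-- After promotion, attach to the publisher without re-copying any data:
CREATE SUBSCRIPTION ts_sub
    CONNECTION 'host=publisher dbname=mydb user=replication_user'
    PUBLICATION ts_pub
    WITH (copy_data = false, create_slot = false, slot_name = 'ts_slot');
```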

We already use `archive_command` with `wal-g` to store WAL and base backups in GCS. Unfortunately, by the time our base backup has been downloaded onto the replica, it’s already at least 6 hours old. And since `restore_command` restores WAL at roughly the same rate `archive_command` produces it, the replica stays ~5-8 hours behind for as long as it’s in standby.
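
For context, the wal-g hookup is the standard pair of commands, roughly:

```
archive_command = 'wal-g wal-push %p'
restore_command = 'wal-g wal-fetch %f %p'
```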

The postgres process on the replica isn’t limited by CPU, memory, network, or disk speed. It doesn’t seem to be working at full capacity, which suggests a configuration issue.

Please look at our configuration below and let us know if there are any options we could tweak in `postgresql.conf` to speed up WAL ingestion through `restore_command`.

3. Start from a base backup and catch up via the `pg_wal` directory:

We could write a script to grab WAL segments from GCS and place them directly into the `pg_wal` directory. This is essentially what `restore_command` does, but we might be able to squeeze some extra efficiency out of a custom script, e.g. by prefetching several segments in parallel. Maybe. Something like the sketch below.
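
As a strawman, the prefetcher might look something like this (a rough bash sketch, assuming 16 MB segments, a single timeline, wal-g’s `wal-fetch` subcommand, and a replica reading WAL directly from `pg_wal` rather than through `restore_command`; paths and names are made up):

```
#!/usr/bin/env bash
# Prefetch the next N WAL segments from the archive into pg_wal so that
# recovery never has to wait on a single-threaded restore_command.
set -euo pipefail

PGWAL=/var/lib/postgresql/data/pg_wal    # adjust to the real data directory
START_SEGMENT=$1                         # e.g. 000000010000ABCD000000EF
COUNT=${2:-32}                           # how many segments to fetch ahead

# Compute the next 24-character WAL file name (timeline|log|seg, all hex).
next_segment() {
    local tli=${1:0:8} log=${1:8:8} seg=${1:16:8}
    local log_n=$((16#$log)) seg_n=$((16#$seg))
    if (( seg_n == 255 )); then          # 256 x 16MB segments per log file
        log_n=$((log_n + 1)); seg_n=0
    else
        seg_n=$((seg_n + 1))
    fi
    printf '%s%08X%08X' "$tli" "$log_n" "$seg_n"
}

seg=$START_SEGMENT
for _ in $(seq 1 "$COUNT"); do
    if [[ ! -e "$PGWAL/$seg" ]]; then
        # Download under a temp name, then rename so recovery never sees a partial file.
        ( wal-g wal-fetch "$seg" "$PGWAL/.$seg.tmp" &&
          mv "$PGWAL/.$seg.tmp" "$PGWAL/$seg" ) &
    fi
    seg=$(next_segment "$seg")
done
wait
```

If wal-g already prefetches on its own, a script like this would of course be redundant.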


Our server has 32 cores and 120GB of memory with the following selected configs:
```
shared_buffers = 30GB
huge_pages = try
maintenance_work_mem = 1GB
max_files_per_process = 2000
effective_io_concurrency = 100
max_worker_processes = 32
max_parallel_maintenance_workers = 2
max_parallel_workers_per_gather = 0
parallel_leader_participation = on
max_parallel_workers = 32
wal_level = archive
wal_buffers = 16MB
checkpoint_timeout = 30min
max_wal_size = 100GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
max_wal_senders = 20
wal_keep_segments = 10
effective_cache_size = 90GB
```

Thanks for all the help!

ts
