Add support for specifying tables in pg_createsubscriber. - Mailing list pgsql-hackers

From Shubham Khanna
Subject Add support for specifying tables in pg_createsubscriber.
Date
Msg-id CAHv8Rj+sxWutv10WiDEAPZnygaCbuY2RqiLMj2aRMH-H3iZwyA@mail.gmail.com
Whole thread Raw
List pgsql-hackers
Hi hackers,

Currently, pg_createsubscriber supports converting streaming
replication to logical replication for selected databases or all
databases. However, there is no provision to replicate only a few
selected tables. For such cases, users are forced to manually set up
logical replication using individual SQL commands (CREATE PUBLICATION,
CREATE SUBSCRIPTION, etc.), which can be time-consuming and
error-prone. Extending pg_createsubscriber to support table-level
replication would significantly improve the time taken to perform the
setup.
The attached patch introduces a new '--table' option that can be
specified after each '--database' argument. It allows users to
selectively replicate specific tables within a database instead of
defaulting to all tables. The syntax is like that used in 'vacuumdb'
and supports multiple '--table' arguments per database, including
optional column lists and row filters.
Example usage:
./pg_createsubscriber \ --database db1 \ --table 'public.t1' \ --table
'public.t2(a,b) WHERE a > 100' \ --database db2 \ --table 'public.t3'

I conducted tests comparing the patched pg_createsubscriber with
standard logical replication under various scenarios to assess
performance and flexibility. All test results represent the average of
five runs.

Scenario               pg_createsubscriber   Logical Replication  Improvement

Two databases
(postgres and
db1 each
having 100
tables), replicate
all 100 in
 postgres, 50
tables in db1
(100MB/table)
total 15GB data      2m4.823s                  7m23.294s               71.85%

One DB, 100
tables, replicate
50 tables
(200 MB/table)
total 10GB data     2m47.703s                 4m58.003s                43.73%

One DB, 200
tables, replicate
100 tables
(100 MB/table)
total 10GB data     3m6.476s                  4m35.130s                32.22%

One DB, 100
tables, replicate
50 tables
(100MB/table)
total 5GB data       1m54.384s                2m23.719s                 20.42%

These results demonstrate that pg_createsubscriber consistently
outperforms standard logical replication by 20.42% for 5GB data to
71.85% for 15GB data, the time taken reduces as the data increases.

The attached test scripts were used for all experiments.
Scenario 1 (Logical replication setup involving 50 tables across 2
databases, each containing 100 tables with 100 MB of data per table):
pg_createsubscriber_setup_multi_db.sh was used for setup, followed by
pg_createsubscriber_test_multi_db.sh to measure performance. For
logical replication, the setup was done using
logical_replication_setup_multi_db.sh, with performance measured via
logical_replication_test_multi_db.sh.

Scenario 2 and 3:
The pg_createsubscriber_setup_single_db.sh (uncomment appropriate
scenario mentioned in comments) script was used, with configuration
changes specific to Scenario 2 and Scenario 3. In both cases,
pg_createsubscriber_test_single_db.sh (uncomment appropriate scenario
mentioned in comments) was used for measuring performance. Logical
replication followed the same pattern, using
logical_replication_setup_single_db.sh (uncomment appropriate scenario
mentioned in comments) and logical_replication_test_single_db.sh
(uncomment appropriate scenario mentioned in comments) for
measurement.

Scenario 4 (Logical replication setup on 50 tables from a database
containing 100 tables, each with 100 MB of data):
pg_createsubscriber_setup_single_db.sh (without modifications) was
used for setup, and pg_createsubscriber_test_single_db.sh (without
modifications) was used for performance measurement. Logical
replication used logical_replication_setup_single_db.sh (without
modifications) for setup and logical_replication_test_single_db.sh
(without modifications) for measurement.

Thoughts?

Thanks and regards,
Shubham Khanna.

Attachment

pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Next
From: Amit Kapila
Date:
Subject: Re: Document slot's restart_lsn can go backward