Appending data locally to a logical replication subscriber - Mailing list pgsql-general

From andrew cooke
Subject Appending data locally to a logical replication subscriber
Date
Msg-id 20220530141323.y2saamlcv6y6h3zf@acooke.org
List pgsql-general
Hi All,

I would appreciate some advice on adding data to logically replicated
tables on the subscriber.  I am worried about contention between
writes from local data loading and replication.

We have 14 publisher databases (all with identical schema) that are
constantly receiving new data.  The tables have a composite primary
key where one column identifies the publisher machine.

We also have a single subscriber that subscribes to all these
machines.
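
For reference, the setup looks roughly like this (table, publication, and
connection names here are invented for illustration):

```sql
-- On each of the 14 publishers (all schemas are identical):
CREATE PUBLICATION site_pub FOR TABLE orders, companies;

-- On the single subscriber, one subscription per publisher:
CREATE SUBSCRIPTION site_a_sub
    CONNECTION 'host=site-a dbname=prod user=repl'
    PUBLICATION site_pub;
```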

In addition to the real-time data described above, we would like to
back-load some archive data.  For practical (i.e. political) reasons it
is easier to do this on the subscriber than on each publisher.  The
archive data will use a composite primary key that combines the
publisher code (used to differentiate publishers) with an incremental
ID that will start at a "sufficiently large number" to avoid
conflicts.
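To make the key scheme concrete, here is a sketch of what I have in mind
(column names and the starting offset are invented for illustration):

```sql
-- Composite primary key: publisher code plus an incremental ID.
CREATE TABLE orders (
    publisher_code text   NOT NULL,
    order_id       bigint NOT NULL,
    payload        jsonb,
    PRIMARY KEY (publisher_code, order_id)
);

-- On the subscriber, archive rows take IDs from a sequence that starts
-- at a "sufficiently large number", so they should never collide with
-- IDs generated on the publishers.
CREATE SEQUENCE archive_order_id START 1000000000;
```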

The loading process is not a simple append; some data are normalized
in separate tables.  So, for example, there may be a table with
orders, where each line is unique, but the orders reference companies
in another table, and usually these companies already exist.
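The archive loading step would therefore look something like this (a
sketch only; the table and column names, and the archive_order_id
sequence, are hypothetical):

```sql
-- Insert the company if it is not already present; most of the time it
-- already exists, so this is usually a no-op.  Requires a unique
-- constraint (here the primary key) on company_code.
INSERT INTO companies (company_code, name)
VALUES ('ACME', 'Acme Ltd')
ON CONFLICT (company_code) DO NOTHING;

-- Then append the unique order line, referencing the company.
INSERT INTO orders (publisher_code, order_id, company_code, payload)
VALUES ('siteA', nextval('archive_order_id'), 'ACME', '{}');
```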

My primary concern is that there is going to be contention between
replication and archive loading.

All tables on the subscriber may be queried by users, so disabling
indexes is not trivial (although perhaps we could drop constraints and
load archive data at night).

Really, after all that, I guess I am asking the following:

 - Is this just a bad idea?  Is it implicit (or maybe explicit and
   I've missed it) that the subscriber should be dedicated only to
   reading subscriptions?

 - Do subscriber transactions reflect publisher transactions?  For
   example, if the publisher receives many rows in a single
   transaction (which can happen if we know/guess that the table is
   only being written to by one process) does that mean that the
   subscriber also writes those rows in a single transaction?  Or is
   it "simplified" to a transaction per row?  Or something else (like
   one transaction per "batch")?

 - Is there anything else I should be concerned about?

Thanks,
Andrew



