Thread: Logical Replication - Single Destination Table With Multiple Source Tables - How to Handle Structure Changes

I'm using logical replication to copy data from multiple tables to a single destination table.  At times the structure of the source table needs to change.  However, not all source table will have their structure updated at the same time.  Let's assume, for example, a column type needs to be changed (but solution needs to work for column addition, deletion, rename etc.).  What is the preferable approach:

 

  1. To add another column to the destination table where its type will be the new type.  Source tables, that have the new column type, will write to the new column.  While source tables with old column type will write to the old column.  A view will do the proper casting and will show a single column to user.
  2. Add a new table (versioned: table_name_v2) where source tables that have a new structure will write to the new destination table, while old source tables will write to the old destination table.   A view with UNION and casting will combine all tables.
  3. A better way?

 

 

Thanks

IMPORTANT - This email and any attachments is intended for the above named addressee(s), and may contain information which is confidential or privileged. If you are not the intended recipient, please inform the sender immediately and delete this email: you should not copy or use this e-mail for any purpose nor disclose its contents to any person.
On Sun, 4 Jul 2021 at 15:53, Avi Weinberg <AviW@gilat.com> wrote:

I'm using logical replication to copy data from multiple tables to a single destination table.  At times the structure of the source table needs to change.  However, not all source table will have their structure updated at the same time.  Let's assume, for example, a column type needs to be changed (but solution needs to work for column addition, deletion, rename etc.).  What is the preferable approach:

 

  1. To add another column to the destination table where its type will be the new type.  Source tables, that have the new column type, will write to the new column.  While source tables with old column type will write to the old column.  A view will do the proper casting and will show a single column to user.
  2. Add a new table (versioned: table_name_v2) where source tables that have a new structure will write to the new destination table, while old source tables will write to the old destination table.   A view with UNION and casting will combine all tables.
  3. A better way?

Does the below not work for all alter table changes on publisher.
I have been testing on a demo setup pg14beta, and subscribers are able to catch up fine.

on publisher (port 5001)
postgres=# alter table t alter COLUMN name type bigint using name::bigint;

on subscriber (port 5002)
postgres=# alter table t alter COLUMN name type bigint using name::bigint;
ALTER TABLE
postgres=# alter subscription mysub refresh publication;
 
this should work for all the cases for ddl changes right.

-- demo 

-- create table on pub and sub
postgres@db:~/playground/logical_replication$ psql -p 5001
psql (14beta1)
Type "help" for help.
postgres=# create table t(id int primary key, name text);
CREATE TABLE
postgres=# \q

postgres@db:~/playground/logical_replication$ psql -p 5002
psql (14beta1)
Type "help" for help.
postgres=# create table t(id int primary key, name text);
CREATE TABLE
postgres=# \q

-- insert dummy data to check sub rx changes
postgres@db:~/playground/logical_replication$ psql -p 5001
psql (14beta1)
Type "help" for help.
postgres=# insert into t values (1, 1::text);
INSERT 0 1
postgres=# \q

postgres@db:~/playground/logical_replication$ psql -p 5002
psql (14beta1)
Type "help" for help.
postgres=# table t;
 id | name
----+------
(0 rows)
postgres=# alter subscription mysub refresh publication; -- this is because i dropped table with publication enabled
ALTER SUBSCRIPTION
postgres=# table t;
 id | name
----+------
  1 | 1
(1 row)
postgres=# \q

-- alter table alter column change type on pub
postgres@db:~/playground/logical_replication$ psql -p 5001
psql (14beta1)
Type "help" for help.
postgres=# alter table t alter COLUMN name type bigint using name::bigint;
ALTER TABLE
postgres=# \q

-- alter table alter column change type on sub
postgres@db:~/playground/logical_replication$ psql -p 5002
psql (14beta1)
Type "help" for help.
postgres=# alter table t alter COLUMN name type bigint using name::bigint;
ALTER TABLE
postgres=# \q

-- insert new data based on new column type
postgres@db:~/playground/logical_replication$ psql -p 5001
psql (14beta1)
Type "help" for help.
postgres=# insert into t values (2, 100000000000000);
INSERT 0 1
postgres=# \q

-- check new data on sub
postgres@db:~/playground/logical_replication$ psql -p 5002
psql (14beta1)
Type "help" for help.
postgres=# table t;
 id |      name
----+-----------------
  1 |               1
  2 | 100000000000000
(2 rows)
postgres=# \q


--alter table alter col type on pub and insert data
postgres@db:~/playground/logical_replication$ psql -p 5001
psql (14beta1)
Type "help" for help.
postgres=# alter table t alter COLUMN name type text using name::text;
ALTER TABLE
postgres=# insert into t values (3, 'three');
INSERT 0 1
postgres=# \q

--alter table alter col type on sub, changes will not come till refresh pub
postgres@db:~/playground/logical_replication$ psql -p 5002
psql (14beta1)
Type "help" for help.
postgres=# table t;
 id |      name
----+-----------------
  1 |               1
  2 | 100000000000000
(2 rows)
postgres=#  alter table t alter COLUMN name type text using name::text;
ALTER TABLE
postgres=# table t;
 id |      name
----+-----------------
  1 | 1
  2 | 100000000000000
(2 rows)
postgres=# alter subscription mysub refresh publication; -- refresh would get back changes 
ALTER SUBSCRIPTION
postgres=# table t;
 id |      name
----+-----------------
  1 | 1
  2 | 100000000000000
  3 | three
(3 rows)

or
have I misunderstood the question? 

--
Thanks,
Vijay
Mumbai, India

Thanks for the reply,

 

My question was, what will happen if I have one destination table which gets data from many source tables.  What is the best way to handle changes in the structure of SOME of the source tables, while other source tables remain in the old format.

Maybe in some cases where the type was changed it may be able to work with source tables of different types, but what if column was renamed in one source table but the column remained with the old name in the other source table?  What column name should the destination table have?  Do I need to duplicate the column to have both old and new names?

 

From: Vijaykumar Jain [mailto:vijaykumarjain.github@gmail.com]
Sent: Sunday, July 4, 2021 6:53 PM
To: Avi Weinberg <AviW@gilat.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Logical Replication - Single Destination Table With Multiple Source Tables - How to Handle Structure Changes

 

On Sun, 4 Jul 2021 at 15:53, Avi Weinberg <AviW@gilat.com> wrote:

I'm using logical replication to copy data from multiple tables to a single destination table.  At times the structure of the source table needs to change.  However, not all source table will have their structure updated at the same time.  Let's assume, for example, a column type needs to be changed (but solution needs to work for column addition, deletion, rename etc.).  What is the preferable approach:

 

  1. To add another column to the destination table where its type will be the new type.  Source tables, that have the new column type, will write to the new column.  While source tables with old column type will write to the old column.  A view will do the proper casting and will show a single column to user.
  2. Add a new table (versioned: table_name_v2) where source tables that have a new structure will write to the new destination table, while old source tables will write to the old destination table.   A view with UNION and casting will combine all tables.
  3. A better way?

 

Does the below not work for all alter table changes on publisher.

I have been testing on a demo setup pg14beta, and subscribers are able to catch up fine.

 

on publisher (port 5001)

postgres=# alter table t alter COLUMN name type bigint using name::bigint;

 

on subscriber (port 5002)

postgres=# alter table t alter COLUMN name type bigint using name::bigint;

ALTER TABLE

postgres=# alter subscription mysub refresh publication;

 

this should work for all the cases for ddl changes right.

 

-- demo 

 

-- create table on pub and sub

postgres@db:~/playground/logical_replication$ psql -p 5001

psql (14beta1)

Type "help" for help.

postgres=# create table t(id int primary key, name text);

CREATE TABLE

postgres=# \q

 

postgres@db:~/playground/logical_replication$ psql -p 5002

psql (14beta1)

Type "help" for help.

postgres=# create table t(id int primary key, name text);

CREATE TABLE

postgres=# \q

 

-- insert dummy data to check sub rx changes

postgres@db:~/playground/logical_replication$ psql -p 5001

psql (14beta1)

Type "help" for help.

postgres=# insert into t values (1, 1::text);

INSERT 0 1

postgres=# \q

 

postgres@db:~/playground/logical_replication$ psql -p 5002

psql (14beta1)

Type "help" for help.

postgres=# table t;

 id | name

----+------

(0 rows)

postgres=# alter subscription mysub refresh publication; -- this is because i dropped table with publication enabled

ALTER SUBSCRIPTION

postgres=# table t;

 id | name

----+------

  1 | 1

(1 row)

postgres=# \q

 

-- alter table alter column change type on pub

postgres@db:~/playground/logical_replication$ psql -p 5001

psql (14beta1)

Type "help" for help.

postgres=# alter table t alter COLUMN name type bigint using name::bigint;

ALTER TABLE

postgres=# \q

 

-- alter table alter column change type on sub

postgres@db:~/playground/logical_replication$ psql -p 5002

psql (14beta1)

Type "help" for help.

postgres=# alter table t alter COLUMN name type bigint using name::bigint;

ALTER TABLE

postgres=# \q

 

-- insert new data based on new column type

postgres@db:~/playground/logical_replication$ psql -p 5001

psql (14beta1)

Type "help" for help.

postgres=# insert into t values (2, 100000000000000);

INSERT 0 1

postgres=# \q

 

-- check new data on sub

postgres@db:~/playground/logical_replication$ psql -p 5002

psql (14beta1)

Type "help" for help.

postgres=# table t;

 id |      name

----+-----------------

  1 |               1

  2 | 100000000000000

(2 rows)

postgres=# \q

 

 

--alter table alter col type on pub and insert data

postgres@db:~/playground/logical_replication$ psql -p 5001

psql (14beta1)

Type "help" for help.

postgres=# alter table t alter COLUMN name type text using name::text;

ALTER TABLE

postgres=# insert into t values (3, 'three');

INSERT 0 1

postgres=# \q

 

--alter table alter col type on sub, changes will not come till refresh pub

postgres@db:~/playground/logical_replication$ psql -p 5002

psql (14beta1)

Type "help" for help.

postgres=# table t;

 id |      name

----+-----------------

  1 |               1

  2 | 100000000000000

(2 rows)

postgres=#  alter table t alter COLUMN name type text using name::text;

ALTER TABLE

postgres=# table t;

 id |      name

----+-----------------

  1 | 1

  2 | 100000000000000

(2 rows)

postgres=# alter subscription mysub refresh publication; -- refresh would get back changes 

ALTER SUBSCRIPTION

postgres=# table t;

 id |      name

----+-----------------

  1 | 1

  2 | 100000000000000

  3 | three

(3 rows)

 

or

have I misunderstood the question? 

 

--

Thanks,

Vijay

Mumbai, India

IMPORTANT - This email and any attachments is intended for the above named addressee(s), and may contain information which is confidential or privileged. If you are not the intended recipient, please inform the sender immediately and delete this email: you should not copy or use this e-mail for any purpose nor disclose its contents to any person.
On Mon, 5 Jul 2021 at 14:29, Avi Weinberg <AviW@gilat.com> wrote:

Thanks for the reply,

 

My question was, what will happen if I have one destination table which gets data from many source tables.  What is the best way to handle changes in the structure of SOME of the source tables, while other source tables remain in the old format.

Maybe in some cases where the type was changed it may be able to work with source tables of different types, but what if column was renamed in one source table but the column remained with the old name in the other source table?  What column name should the destination table have?  Do I need to duplicate the column to have both old and new names?


ADD NEW column: [1]
add a new column to the destination first and then to the source(s).
the WALs would be retained by the publisher till it can start publishing again, so no data loss.

ALTER COL: [2]
imho, add a new column of the to be modified datatype to both destination and source as above in [1]. then write a trigger for source tables to sync the columns which need to have the type changed to the new column added in the same source tables. let those changes get published to destination.  (for ex. check the int to bigint migration in PG, like int_to_bigint_migration
and then drop the old column at source and destination.

DROP COL [3]:
dropping-columns  if there are apps making use of that column, first you would have to ensure, the column is no longer in use in queries. then you can start by dropping the col at source(s) and then the destination.

I have a setup that I have used with three source dbs and 1 dest db for above. but things can get more complicated with FK constraints etc, so i am not sure i have the best answer to this as i have not done it in production.
I have only used LR for migration and was thinking of (federated setup /to unshard) where many shards -> LR -> one shard and when sharding was a bad decision, but in both cases did not allow DDL changes at source till it was completed.