Logical Replication Custom Column Expression - Mailing list pgsql-hackers

From Stavros Koureas
Subject Logical Replication Custom Column Expression
Date
Msg-id CA+O1jk6bpKcgc9HcjJtEgS6Cq=KfZobGF42GPeQ-ZCNP4uXRsQ@mail.gmail.com
Whole thread Raw
Responses Re: Logical Replication Custom Column Expression
List pgsql-hackers
Hi all,

Working with PostgreSQL Logical Replication is just great! It helps a lot doing real time replication for analytical purposes without using any other 3d party service. Although all these years working as product architect of reporting i have noted a few requirements which are always a challenge and may help enhance logical replication even better.

To the point:
PostgreSQL14 Logical Replication allows replication of a table to another table that exists in another database or even in another host. It also allows multiple upstream tables using the same structure to downstream into a single table.
CREATE PUBLICATION pb_test FOR TABLE test

PostgreSQL15 Logical Replication allows even better replication options, like selecting subsets of the columns from publisher tables. It also supports plenty of options like disable_on_error etc.
CREATE PUBLICATION pb_test FOR TABLE test ("id", "name")

What does not support is the option for defining custom column expressions, as keys or values, into the upstream (publication). This will give more flexibility into making replication from multiple upstreams into less downstreams adding more logic. For instance, in a project for analytical purposes there is the need to consolidate data from multiple databases into one and at the same time keep the origin of each replicated data identified by a tenanant_id column. In this case we also need the ability to define the new column as an additional key which will participate into the destination table.

Tenant 1 table
id serial pk
description varchar

Tenant 2 table
id integer pk
description varchar

Group table
tenant integer pk
id integer pk
description varchar

Possible syntax to archive that
CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias} ,"id", "name")

Example
CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} ,"id", "name")

I suppose the column definition should exist in the publication syntax as the publication should know from before the datatype and if is a key before being consumed by a subscriber which may already have the column.

So making an insert or update or delete statement into a tenant 1 database:
INSERT INTO test (id, description) VALUES (5, 'data')
UPDATE test SET description = 'data' WHERE id = 5
DELETE FROM test WHERE id = 5
Will be reflected into subscriber as the following
INSERT INTO test (tenant, id, description) VALUES (1, 5, 'data')
UPDATE test SET description = 'data' WHERE tenant=1 AND id = 5
DELETE FROM test WHERE tenant=1 AND id = 5

For more clarifications please reach me at koureasstavros@gmail.com
Thanks!



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Next
From: Andrew Dunstan
Date:
Subject: Re: Fix proposal for comparaison bugs in PostgreSQL::Version