Re: Support logical replication of DDLs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Support logical replication of DDLs
Date
Msg-id CAA4eK1LPX71LzSye6ac-OOx6=wWq7U2PL-1tBf3zvncSko8Srw@mail.gmail.com
Whole thread Raw
In response to Re: Support logical replication of DDLs  ("Euler Taveira" <euler@eulerto.com>)
Responses Re: Support logical replication of DDLs
List pgsql-hackers
On Thu, Apr 14, 2022 at 7:45 PM Euler Taveira <euler@eulerto.com> wrote:
>
> You should forbid it. Unless you can decompose the command into multiple SQL
> commands to make it a safe operation for logical replication.
>
> Let's say you want to add a column with a volatile default.
>
> ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();
>
> If you replicate the DDL command as is, you will have different data
> downstream. You should forbid it. However, this operation can be supported if
> the DDL command is decomposed in multiple steps.
>
> -- add a new column without DEFAULT to avoid rewrite
> ALTER TABLE foo ADD COLUMN bar double precision;
>
> -- future rows could use the DEFAULT expression
> -- it also doesn't rewrite the table
> ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random();
>
> -- it effectively rewrites the table
> -- all rows are built from one source node
> -- data will be the same on all nodes
> UPDATE foo SET bar = random();
>

While thinking about this, I see more to it than this. Say, we are
able to decompose/split the DDL command with the help of deparser, do
we want to update the additional rows on the subscriber that didn't
exist on the publisher? For example,

A table on the publisher side has rows:
ddl_test(foo)
a
----
1
2

The same table on the subscriber side has rows:
ddl_test(foo)
a
----
1
2
3
4

Now, say, the user has added a bar column with "ALTER TABLE foo ADD
COLUMN bar double precision NOT NULL DEFAULT random();" If we compare
with replication of DMLs like (UPDATE ddl_test SET bar = random();),
the replication won't update rows with values (3 and 4) on subscriber
as they don't exist on the publisher. However, if we follow the same
here for DDL replication of Alter, it will fail because of NOT NULL
constraint. So, it seems we should update all the existing rows on the
subscriber to make replication of such constraints successful. It
seems that IBM's replication solution allows replication of such DDLs
and does update all existing rows on the target table [1][2].

I think it would be tricky to update the rows in subscriber that
doesn't exist in the publisher as we need to distinguish/find such
rows during apply but I think we can find some way to achieve this if
we decide to go this way.

We can also conclude that we want to restrict the replication of Alter
Table for such cases but as other databases seem to support this, I
think it is worth trying to support such an operation. If it turns out
to be too complex or not at all feasible then we can always exclude it
from the first version.

Thoughts?

[1] - https://www.ibm.com/docs/en/idr/10.2.1?topic=replication-adding-existing-columns-subscription-unidirectional
(... Columns are added to the target table with the same data type,
null characteristic, and default value as the matching columns in the
source table... Rows that existed in the target table before the new
column is added will have a NULL or default value for the new column.
[2] - https://www.ibm.com/docs/en/idr/11.4.0?topic=replication-alter-add-column-command-multidirectional

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Allow file inclusion in pg_hba and pg_ident files
Next
From: Bharath Rupireddy
Date:
Subject: Re: Enforce "max_wal_size/ min_wal_size must be at least twice wal_segment_size" limit while setting GUCs