Re: Support logical replication of DDLs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Support logical replication of DDLs
Date
Msg-id CAA4eK1JVynFsj+mcRWj9sewR2yNUs6LuNxJ0eN-gNJ83oKcUOQ@mail.gmail.com
Whole thread Raw
In response to RE: Support logical replication of DDLs  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Thu, Jun 9, 2022 at 5:14 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> I did some research for one potential problem[1] mentioned earlier which is related
> to the function execution when replicating DDL.
>
> [1]> 4. Statements that have nondeterministic side effects (e.g., as caused
> > by triggers, stored procedures, user-defined functions) may result in
> > different side effects occurring on each subscriber.
> >
> > Think how to handle triggers and functions with same name but different
> > purpose.
> >
>
> Examples:
> ALTER TABLE ADD CONSTRAINT func()
> ALTER TABLE ADD COLUMN DEFAULT func()
> CREATE TRIGGER ... execute procedure func()
> ...
>
> When replication the above DDLs, there are some cases we need to think about.
>
> ----------------
> 1) The functions used in DDL have same definition among pub/sub.
>
> In this case, if the functions include DML/DDL operations, it will result in
> unexpected behavior.
>
> For example:
>
> --- both pub and sub have the same function test_func().
> create function test_func()
> returns integer as '
> begin
>     CREATE TABLE test(a int);
>         INSERT INTO test values(1);
>         return 0;
> end
> ' language plpgsql;
>
> --- We replicate the following DDL
> ALTER TABLE t ADD COLUMN a int DEFAULT test_func();
>
> There are three SQLs that would be replicated to the subscriber:
> CREATE TABLE test(a int);
> INSERT(1);
> ALTER TABLE t ADD COLUMN a int DEFAULT test_func();
>
> Then, we would create the table "test" twice and insert same value twice(first by
> executing DEFAULT function, second by replaying the CREATE TABLE and INSERT
> command) on subcriber.
>

The other kind of problem is that we don't know whether the tables
being accessed by these DDL/DML are published or not. So blindly
allowing such functions can allow unintended clauses like the tables
accessed in those functions may not even exist on the subscriber-side.

> One possible idea is that we only allow replicating 'immutable/stable' function
> which won't include write action so that we don't have this problem. But it
> seems a bit restrictive.
>
> ----------------
>
> 2) The functions used in DDL have different definitions among pub/sub.
>
> I am not sure how to handle this case as this could result in unpredictable
> behavior based on the different definitions. And it's difficult to compare the
> function definition among pub/sub because the function could call other
> functions nested.
>
> OTOH, the current behavior of this case on HEAD is that we don't check the
> consistency of the functions executed on pub/sub. For example: the row triggers
> will always be fired on subscriber(when enabled) without checking if the same
> trigger exists on publisher. So, it might be acceptable if we document this
> restriction, although I am not sure.
>
> *******************
>
> - About the solution for the above two points. I think one solution could be:
>
> We can document that user should make sure the DDL to be replicated should not
> execute any function which could execute DML/DDL. This seems acceptable as it's
> a pretty rare case to execute DML/DDL in a CONSTRAINT function or DEFAULT
> function. And the document[1] already suggest similar thing for CONSTRAINT
> function. Besides, we can also document that and the functions should be
> defined in a consistent way among pub/sub.
>
> [1] https://www.postgresql.org/docs/devel/ddl-constraints.html
> > PostgreSQL assumes that CHECK constraints' conditions are immutable, that is,
> > they will always give the same result for the same input row. This assumption
> > is what justifies examining CHECK constraints only when rows are inserted or
> > updated, and not at other times. (The warning above about not referencing
> > other table data is really a special case of this restriction.)
>
> - Another solution could be:
>
> We coud introduce a new function flag called (replica_safety) and the values
> could be 'safe'/'unsafe'. 'safe' indicates that it's safe to be replicated to
> subscriber and it's safe to be executed when replay the DDL from other
> publisher.
>

Yeah, something like this could be a good idea but I think for the
first version we should simply raise a WARNING for such DDLs
indicating that they won't be replicated.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Using PQexecQuery in pipeline mode produces unexpected Close messages