RE: Support logical replication of DDLs - Mailing list pgsql-hackers
From | houzj.fnst@fujitsu.com |
---|---|
Subject | RE: Support logical replication of DDLs |
Date | |
Msg-id | OS0PR01MB5716733D6C79D2198E5773CC94A79@OS0PR01MB5716.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | RE: Support logical replication of DDLs ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>) |
Responses |
Re: Support logical replication of DDLs
|
List | pgsql-hackers |
Hi, I did some research for one potential problem[1] mentioned earlier which is related to the function execution when replicating DDL. [1]> 4. Statements that have nondeterministic side effects (e.g., as caused > by triggers, stored procedures, user-defined functions) may result in > different side effects occurring on each subscriber. > > Think how to handle triggers and functions with same name but different > purpose. > Examples: ALTER TABLE ADD CONSTRAINT func() ALTER TABLE ADD COLUMN DEFAULT func() CREATE TRIGGER ... execute procedure func() ... When replication the above DDLs, there are some cases we need to think about. ---------------- 1) The functions used in DDL have same definition among pub/sub. In this case, if the functions include DML/DDL operations, it will result in unexpected behavior. For example: --- both pub and sub have the same function test_func(). create function test_func() returns integer as ' begin CREATE TABLE test(a int); INSERT INTO test values(1); return 0; end ' language plpgsql; --- We replicate the following DDL ALTER TABLE t ADD COLUMN a int DEFAULT test_func(); There are three SQLs that would be replicated to the subscriber: CREATE TABLE test(a int); INSERT(1); ALTER TABLE t ADD COLUMN a int DEFAULT test_func(); Then, we would create the table "test" twice and insert same value twice(first by executing DEFAULT function, second by replaying the CREATE TABLE and INSERT command) on subcriber. One possible idea is that we only allow replicating 'immutable/stable' function which won't include write action so that we don't have this problem. But it seems a bit restrictive. ---------------- 2) The functions used in DDL have different definitions among pub/sub. I am not sure how to handle this case as this could result in unpredictable behavior based on the different definitions. And it's difficult to compare the function definition among pub/sub because the function could call other functions nested. OTOH, the current behavior of this case on HEAD is that we don't check the consistency of the functions executed on pub/sub. For example: the row triggers will always be fired on subscriber(when enabled) without checking if the same trigger exists on publisher. So, it might be acceptable if we document this restriction, although I am not sure. ******************* - About the solution for the above two points. I think one solution could be: We can document that user should make sure the DDL to be replicated should not execute any function which could execute DML/DDL. This seems acceptable as it's a pretty rare case to execute DML/DDL in a CONSTRAINT function or DEFAULT function. And the document[1] already suggest similar thing for CONSTRAINT function. Besides, we can also document that and the functions should be defined in a consistent way among pub/sub. [1] https://www.postgresql.org/docs/devel/ddl-constraints.html > PostgreSQL assumes that CHECK constraints' conditions are immutable, that is, > they will always give the same result for the same input row. This assumption > is what justifies examining CHECK constraints only when rows are inserted or > updated, and not at other times. (The warning above about not referencing > other table data is really a special case of this restriction.) - Another solution could be: We coud introduce a new function flag called (replica_safety) and the values could be 'safe'/'unsafe'. 'safe' indicates that it's safe to be replicated to subscriber and it's safe to be executed when replay the DDL from other publisher. When replicating the DDL on publisher, we first check if the functions used in DDL are replica_safe, if not, we don't replicate that DDL. If yes, before replicating the DDL, we first send a new type FUNCTIONS message which include functions(used in DDL or ..) information to the subscriber which is similar to the RELATION messages. And on subscriber, we check if the received functions exist and if they are also marked replica safe on subscriber. if functions are not replica safe on sub, we report an error and suggest user to adjust the problematic function. I haven't tried to implement it so I could miss something. Thoughts ? BTW, attach the POC patch set which only fixed a CFbot error and added some testcases provided by Osumi-san. Best regards, Hou zj
Attachment
pgsql-hackers by date: