Re: Support logical replication of DDLs - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Support logical replication of DDLs
Date
Msg-id CAFiTN-u9McY9Df2dUOX3u326LMJAY8uP6YE1K8GdpYp95x=S5Q@mail.gmail.com
Whole thread Raw
In response to Re: Support logical replication of DDLs  (Zheng Li <zhengli10@gmail.com>)
Responses Re: Support logical replication of DDLs  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Support logical replication of DDLs  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Support logical replication of DDLs  (Zheng Li <zhengli10@gmail.com>)
List pgsql-hackers
On Thu, Mar 24, 2022 at 11:24 PM Zheng Li <zhengli10@gmail.com> wrote:
>
> Hi Dilip,
>
> Thanks for the feedback.
>
> > > > > The table creation WAL and table insert WAL are available. The tricky
> > > > > part is how do we break down this command into two parts (a normal
> > > > > CREATE TABLE followed by insertions) either from the parsetree or the
> > > > > WALs. I’ll have to dig more on this.
>
> > > I had put some more thought about this, basically, during CTAS we are
> > > generating the CreateStmt inside "create_ctas_internal" and executing
> > > it first before inserting the tuple, so can't we generate the
> > > independent sql just for creating the tuple maybe using deparsing or
> > > something?
>
> Yes, deparsing might help for edge cases like this. However I found
> a simple solution for this specific case:
>
> The idea is to force skipping any direct data population (which can
> potentially cause data inconsistency on the subscriber)
> in CREATE AS and SELECT INTO command on the subscriber by forcing the
> skipData flag in the intoClause of the parsetree after
> the logical replication worker parses the command. The data sync will
> be taken care of by the DML replication after the DDL replication
> finishes.

Okay, something like that should work, I am not sure it is the best
design though.

> This is implemented in the latest commit:
> https://github.com/zli236/postgres/commit/116c33451da8d44577b8d6fdb05c4b6998cd0167
>
> > > Apart from that I have one more question, basically if you are
> > > directly logging the sql query then how you are identifying under
> > > which schema you need to create that table, are you changing the sql
> > > and generating schema-qualified name?
> >
> > I was going through the patch and it seems you are logging the search
> > path as well along with the query so I think this will probably work.
>
> Yes, currently we log the search path as well as the user name. And we
> enforce the same search path and user name when applying the DDL command
> on the subscriber.

Yeah this looks fine to me.

>
> > I have got one more query while looking into the code.  In the below
> > code snippet you are logging DDL command only if it is a top level
> > query but there are no comments explaining what sort of queries we
> > don't want to log.  Suppose I am executing a DDL statement inside a PL
> > then that will not be a top level statement so is your intention to
> > block that as well or that is an unintentional side effect?
> >
> > +        /*
> > +         * Consider logging the DDL command if logical logging is
> > enabled and this is
> > +         * a top level query.
> > +         */
> > +        if (XLogLogicalInfoActive() && isTopLevel)
> > +            LogLogicalDDLCommand(parsetree, queryString);
>
> Good catch. The reason for having isTopLevel in the condition is
> because I haven't decided if a DDL statement inside a PL should
> be replicated from the user point of view. For example, if I execute a
> plpgsql function or a stored procedure which creates a table under the hood,
> does it always make sense to replicate the DDL without running the same
> function or stored procedure on the subscriber? It probably depends on
> the specific
> use case. Maybe we can consider making this behavior configurable by the user.

But then this could be true for DML as well right?  Like after
replicating the function to the subscriber if we are sending the DML
done by function then what's the problem in DDL.  I mean if there is
no design issue in implementing this then I don't think there is much
point in blocking the same or even providing configuration for this.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Nikola Ivanov
Date:
Subject: Re: SQL/JSON: functions
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)