Re: Support logical replication of DDLs - Mailing list pgsql-hackers

From Zheng Li
Subject Re: Support logical replication of DDLs
Date
Msg-id CAAD30UJ8v7RAxfLLx-nVO9COKNeBiKVzcwu1LdCe5dZJU2bUDA@mail.gmail.com
Whole thread Raw
In response to Re: Support logical replication of DDLs  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Support logical replication of DDLs
List pgsql-hackers
Hi Dilip,

Thanks for the feedback.

> > > > The table creation WAL and table insert WAL are available. The tricky
> > > > part is how do we break down this command into two parts (a normal
> > > > CREATE TABLE followed by insertions) either from the parsetree or the
> > > > WALs. I’ll have to dig more on this.

> > I had put some more thought about this, basically, during CTAS we are
> > generating the CreateStmt inside "create_ctas_internal" and executing
> > it first before inserting the tuple, so can't we generate the
> > independent sql just for creating the tuple maybe using deparsing or
> > something?

Yes, deparsing might help for edge cases like this. However I found
a simple solution for this specific case:

The idea is to force skipping any direct data population (which can
potentially cause data inconsistency on the subscriber)
in CREATE AS and SELECT INTO command on the subscriber by forcing the
skipData flag in the intoClause of the parsetree after
the logical replication worker parses the command. The data sync will
be taken care of by the DML replication after the DDL replication
finishes.

This is implemented in the latest commit:
https://github.com/zli236/postgres/commit/116c33451da8d44577b8d6fdb05c4b6998cd0167

> > Apart from that I have one more question, basically if you are
> > directly logging the sql query then how you are identifying under
> > which schema you need to create that table, are you changing the sql
> > and generating schema-qualified name?
>
> I was going through the patch and it seems you are logging the search
> path as well along with the query so I think this will probably work.

Yes, currently we log the search path as well as the user name. And we
enforce the same search path and user name when applying the DDL command
on the subscriber.

> I have got one more query while looking into the code.  In the below
> code snippet you are logging DDL command only if it is a top level
> query but there are no comments explaining what sort of queries we
> don't want to log.  Suppose I am executing a DDL statement inside a PL
> then that will not be a top level statement so is your intention to
> block that as well or that is an unintentional side effect?
>
> +        /*
> +         * Consider logging the DDL command if logical logging is
> enabled and this is
> +         * a top level query.
> +         */
> +        if (XLogLogicalInfoActive() && isTopLevel)
> +            LogLogicalDDLCommand(parsetree, queryString);

Good catch. The reason for having isTopLevel in the condition is
because I haven't decided if a DDL statement inside a PL should
be replicated from the user point of view. For example, if I execute a
plpgsql function or a stored procedure which creates a table under the hood,
does it always make sense to replicate the DDL without running the same
function or stored procedure on the subscriber? It probably depends on
the specific
use case. Maybe we can consider making this behavior configurable by the user.

Thanks,
Zheng



pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: [PATCH] Expose port->authn_id to extensions and triggers
Next
From: Robert Haas
Date:
Subject: Re: [PATCH] Expose port->authn_id to extensions and triggers