Re: pglogical - logical replication contrib module - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: pglogical - logical replication contrib module
Date
Msg-id CAMsr+YE5DnMkWF7iRCN_zRLHYte90xZYbfHpf2rbK3nku6cFUA@mail.gmail.com
Whole thread Raw
In response to Re: pglogical - logical replication contrib module  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: pglogical - logical replication contrib module
List pgsql-hackers
On 17 February 2016 at 16:24, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
 
Thanks for your explanation. I have to agree with your arguments that in general case replication of DDL statement using logical decoding seems to be problematic. But we are mostly considering logical decoding in quite limited context: replication between two identical Postgres database nodes (multimaster).

Yep, much like BDR. Where all this infrastructure came from and is/was aimed at.
 
Do you think that it in this case replication of DLL can be done as sequence of low level operations with system catalog tables
including manipulation with locks?

No.

For one thing logical decoding doesn't see catalog tuple changes right now. Though I imagine that could be changed easily enough.

More importantly - oids. You add a column to a table:

ALTER TABLE mytable ADD COLUMN mycolumn some_type UNIQUE NOT NULL DEFAULT some_function()

This writes to catalogs including:

pg_attribute
pg_constraint
pg_index
pg_class (for the index relation)

... probably more. It also refers to pg_class (for the definition of mytable), pg_type (definition of some_type), pg_proc (definition of some_function), the b-tree operator class for some_type in pg_opclass, the b-tree indexam in pg_am, ... more.

Everything is linked by oids, and the oids are all node local. You can't just blindly re-use them. If "some_type" is hstore, the oid of hstore in pg_type might be different on the upstream and downstream. The only exception is the oids of built-in types and even then that's not guaranteed across major versions.

So if you blindly replicate catalog row changes you'll get a horrible mess. That's before considering a table's relfilenode, which is initially the same as its oid, but subject to change if truncated or rewritten.

To even begin to do this half-sanely you'd have to maintain a mapping of upstream object oids->names on the downstream, with invalidations replicated from the upstream. That's only the beginning. There's handling of extensions and lots more fun.
 
So in your example with ALTER TABLE statement, can we correctly replicate it to other nodes
as request to set exclusive lock + some manipulations with catalog tables and data table itself?

Nope. No hope, not unless "some manipulations with catalog tables and data table its self" is a lot more comprehensive than I think you mean.
 
1. Add option whether to include operations on system catalog tables in logical replication or not.

I would like to have this anyway.
 
2. Make it possible to replicate lock requests (can be useful not only for DDLs)

I have no idea how you'd even begin to do that.
 
I looked how DDL was implemented in BDR and did it in similar way in our multimaster.
But it is awful: we need to have two different channels for propagating changes.

Yeah, it's not beautiful, but maybe you misunderstood something? The DDL is written to a table, and that table's changes are replayed along with everything else. It's consistent and keeps DDL changes as part of the xact that performed them. Maybe you misunderstood how it works in BDR and missed the indirection via a table?
 
Additionally, in multimaster we want to enforce cluster wide ACID. It certainly includes operations with metadata. It will be very difficult to implement if replication of DML and DDL is done in two different ways...

That's pretty much why BDR does it this way, warts and all. Though it doesn't offer cluster-wide ACID it does need atomic commit of xacts that may contain DML, DDL, or some mix of the two.
 
Let me ask one more question concerning logical replication: how difficult it will be from your point of view to support two phase commit in logical replication? Are there some principle problems?

I haven't looked closely yet. Andres will know more.

I very, very badly want to be able to decode 2PC prepared xacts myself.

The main issue I'm aware of is locking - specifically the inability to impersonate another backend and treat locks held by that backend (which might be a fake backend for a pg_prepared_xacts entry) as held by ourselves for the purpose of being able to access relations, etc.

The work Robert is doing on group locking looks absolutely ideal for this, but won't land before 9.7.

(Closely related, I also want to be able to hook into commit and transform a normal COMMIT into a PREPARE TRANSACTION, <do some stuff>, COMMIT PREPARED with the application that issued the commit none the wiser. This will allow pessimistic 2PC-based conflict handling for must-succeed xacts like those that do DDL).

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Identifying a message in emit_log_hook.
Next
From: Craig Ringer
Date:
Subject: Re: pglogical - logical replication contrib module