Re: [HACKERS] Dealing with logical replication - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: [HACKERS] Dealing with logical replication
Date
Msg-id CAMsr+YFMcHSN_SbrT0dXHhvrzCKHhiU22pGQwYxTO9pndyg4kw@mail.gmail.com
Whole thread Raw
In response to [HACKERS] Dealing with logical replication  (Tatsuo Ishii <ishii@sraoss.co.jp>)
Responses Re: [HACKERS] Dealing with logical replication
List pgsql-hackers
On 19 July 2017 at 16:34, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Now that we are going to have logical replication in PostgreSQL 10, I
have started thinking how Pgpool-II can deal with it. For example, the
logical replication does not replicate DDLs. Isn't it convenient for
users to do it automatically in Pgpool-II? Or even doing it for
TRUNCATE?

Or are they against the design philosophy of the logical replication?

(Disclaimer - Petr Jelinek and Peter Eisentraut were the main ones working on in in-core logical rep, and I haven't had time to play with it much).

TL;DR: a pooler can only solve a limited subset of the problem, fairly unreliably, and this is really something that needs further work in core.

Not replicating TRUNCATE and schema changes is more an implementation limitation than anything else. It's desirable to get to the point where logical replication can transparently replicate DDL. There are some hurdles for doing so, but it'll be possible to get something in place that's good enough when time/release progress permits.

Similarly, with TRUNCATE, AFAIK support just didn't make the cut for pg10.

A pooler could well help in the mean time, but you have to consider ordering with care. For example, given "U" upstream, "D" downstream:

U: CREATE TABLE x (a integer, b integer);
D: CREATE TABLE x (a); -- user script syncs
U: INSERT INTO x (a,b) VALUES (1,0);
D: [applies INSERT 1,0]
U: INSERT INTO x (a,b) VALUES (2,0);
U: ALTER TABLE x DROP COLUMN b;
D: ALTER TABLE x DROP COLUMN b; -- user script syncs
U: INSERT INTO x (a) VALUES (3);
D: [ERROR on INSERT of 2,0: no column 'b']

Because the DDL here is transported out of band vs the row data, you can easily create situations where the schema change is applied before the last committed-but-not-yet-replicated data from the upstream that was based on the old schema.

To achieve correct ordering, the simplest approach is to record DDL in a table when you perform it on the upstream, and replay it when you see rows in that table appear on the downstream.  You know it's safe to replay it now. This is the essence of what BDR and pglogical do with their DDL replication, but they handle DDL in the middle of transactions that also make row data changes by intercepting writes to the queue table and performing the DDL at the exact point in the transaction where it happened on the upstream. I don't think that's possible with the in-core logical replication yet, and certainly not something a pooler can do.

To do it externally, you have to take note of when a schema change happened on the upstream and apply it on the downstream at or after the point where the downstream has replayed and confirmed up to the upstream lsn where the schema change happened. Then apply the schema change.

A pooler trying to help here must also be very aware of the impact of multi-statements. If you send a single simple query message with a mixture of schema change commands and normal DML, you probably don't want to repeat the DML on downstream nodes or you'll get duplicate rows etc. But ... by unless it embeds transaction control commands, a simple query message executes in a single implicit transaction, so if you extract just the DDL you'll again have ordering issues of upstream vs downstream.

There are even a variety of difficulties to overcome with doing it in core: event triggers don't capture ddl command text and have no facility to turn the internal command representation back into SQL command text, nor do we have any way to turn the internal representation back into a parsenode tree for execution on a downstream's standard_ProcessUtility. However, we can capture raw command text with ProcessUtility_hook now that we have byte-ranges for the query-parts of a multi-part query (yay!), and that works well enough if you also capture the active search_path and apply with the same search_path. It can match the wrong object if extra objects with the same name are present earlier on the search_path on the downstream than on the upstream, so it's not ideal, but that's a weird corner case.

If we had a hook in the logical apply worker's insert or wal-message routines it'd be possible to write an extension to do this for pg10, but AFAICS we don't.

So schema changes in logical replication currently require more care than in physical replication.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: [HACKERS] psql's \r broken since e984ef5861d
Next
From: Craig Ringer
Date:
Subject: Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash