Logical Replication vs. 2PC - Mailing list pgsql-hackers

From Amit Kapila
Subject Logical Replication vs. 2PC
Date
Msg-id CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com
Whole thread Raw
Responses Re: Logical Replication vs. 2PC  (vignesh C <vignesh21@gmail.com>)
Re: Logical Replication vs. 2PC  (Ajin Cherian <itsajin@gmail.com>)
Re: Logical Replication vs. 2PC  (Markus Wanner <markus.wanner@enterprisedb.com>)
List pgsql-hackers
While reviewing/testing subscriber-side work for $SUBJECT [1], I
noticed a problem that seems to need a broader discussion, so started
this thread. We can get prepare for the same GID more than once for
the cases where we have defined multiple subscriptions for
publications on the same server and prepared transaction has
operations on tables subscribed to those subscriptions. For such
cases, one of the prepare will be successful and others will fail in
which case the server will send them again. Once the commit prepared
is done for the first one, the next prepare will be successful. Now,
this is not ideal but will work.

However, if the user has setup synchronous_standby_names for all the
subscriptions then we won't be able to proceed because the prepare on
publisher will wait for all the subscriptions to ack and the
subscriptions are waiting for the first prepare to finish. See an
example below for such a situation. I think this can also happen if we
get any key violation while applying the changes on the subscriber,
but for that, we can ask the user to remove the violating key on the
subscriber as that is what we suggest now also for commits. Similarly,
say the user has already prepared the transaction with the same GID on
subscriber-node, then also we can get into a similar situation but for
that, we can ask the user to commit such a GID.

We can think of appending some unique identifier (like subid) with GID
but that won't work for cascaded standby setup (where the prepares on
subscriber will be again sent to another subscriber) as the GID can
become too long. So that might not be a good solution, maybe we can
optimize it in some way that we append only when there is a GID clash.
The other thing we could do is to ask the user to temporarily disable
the subscription and change synchronous_standby_settings on the
publisher node. Any better ideas?

Example of the above scenario, you can see this problem after applying
the patches at [1].

Publisher
=================
CREATE TABLE mytbl(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

BEGIN;
INSERT INTO mytbl(somedata, text) VALUES (1, 1);
INSERT INTO mytbl(somedata, text) VALUES (1, 2);
COMMIT;

CREATE PUBLICATION mypub FOR TABLE mytbl;

CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;

CREATE PUBLICATION mypub1 FOR TABLE mytbl1;

Subscriber
=============
CREATE TABLE mytbl(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

CREATE SUBSCRIPTION mysub
         CONNECTION 'host=localhost port=5432 dbname=postgres'
        PUBLICATION mypub WITH(two_phase = on);

CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text
varchar(120)); CREATE SUBSCRIPTION mysub1
         CONNECTION 'host=localhost port=5432 dbname=postgres'
        PUBLICATION mypub1 WITH(two_phase = on);

Now, set synchronous_standby_names = 'FIRST 2 (mysub, mysub1)' on the
publisher in postgresql.conf and restart both publisher and
subscriber, actually restart is not required as
synchronous_standby_names is a SIGHUP parameter.

Publisher
=============
BEGIN;
Insert into mytbl values(17,1,18);
Insert into mytbl1 values(17,1,18);
Prepare Transaction 'foo';

Now, this Prepare transaction will wait forever because on subscriber
we are getting "ERROR:  transaction identifier "foo" is already in
use" which means it is waiting for a publisher to send commit prepared
for first apply worker and publisher is waiting for both the
subscriptions to send ack. This is happening because the prepared
transaction on publisher operates on tables of both subscriptions.

In short, on the subscriber, both the apply workers (corresponding to
two subscriptions) are getting the same prepare transaction GID,
leading to an error on the subscriber and making the publisher wait
forever.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAHut%2BPv3X7YH_nDEjH1ZJf5U6M6DHHtEjevu7PY5Dv5071jQ4A%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: crash during cascaded foreign key update
Next
From: Greg Nancarrow
Date:
Subject: Re: Avoid CommandCounterIncrement in RI trigger when INSERT INTO referencing table