On Tue, Feb 16, 2021 at 3:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> After further thinking on this problem and some off-list discussions
> with Ajin, there appears to be another way to solve the above problem
> by which we can avoid resending the prepare after restart if it has
> already been processed by the subscriber. The main reason why we were
> not able to distinguish between the two cases ((a) prepare happened
> before SNAPBUILD_CONSISTENT state but commit prepared happened after
> we reach SNAPBUILD_CONSISTENT state and (b) prepare is already
> decoded, successfully processed by the subscriber and we have
> restarted the decoding) is that we can re-use the serialized snapshot
> at LSN location prior to Prepare of some concurrent WALSender after
> the restart. Now, if we ensure that we don't use serialized snapshots
> for decoding via slots where two_phase decoding option is enabled then
> we won't have that problem. The drawback is that in some cases it can
> take a bit more time for initial snapshot building but maybe that is
> better than the current solution.
Based on this suggestion, I have created a patch on HEAD that no longer
allows a prepared transaction to be decoded more than once. To achieve
this, the code now enforces full_snapshot when two-phase decoding is
enabled, so serialized snapshots are not reused.
Please take a look at the patch and let me know if you have any comments.
One remaining problem with this, as you also mentioned in your last
mail, is the following: if two-phase decoding is disabled in
test_decoding when the PREPARE is decoded (so the prepared transaction
is not decoded at that point), and is later enabled when the COMMIT
PREPARED is decoded (where the decoder assumes the transaction was
already decoded at prepare time), then the transaction is never decoded
at all. For example:
postgres=# begin;
BEGIN
postgres=*# INSERT INTO do_write DEFAULT VALUES;
INSERT 0 1
postgres=*# PREPARE TRANSACTION 'test1';
PREPARE TRANSACTION
postgres=# SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit',
'0');
data
------
(0 rows)
postgres=# commit prepared 'test1';
COMMIT PREPARED
postgres=# SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit',
'1');
data
-------------------------
COMMIT PREPARED 'test1'
(1 row)
The first pg_logical_slot_get_changes() call is made with
two-phase-commit off, and the second with two-phase-commit on. As you
can see, the transaction's changes (the INSERT) are never decoded; only
the COMMIT PREPARED is emitted.
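To make the failure mode concrete, here is a toy Python model of the behaviour described above. It is not PostgreSQL code: the names ToyDecoder, PreparedXact, decode_prepare, decode_commit_prepared and run are hypothetical, and "<INSERT into do_write>" is a placeholder rather than real test_decoding output. With two-phase off, PREPARE emits nothing and the changes are deferred to commit time; with two-phase on, COMMIT PREPARED assumes the changes were already sent at PREPARE time. Toggling the option between the two calls therefore drops the INSERT.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreparedXact:
    gid: str
    changes: List[str]


@dataclass
class ToyDecoder:
    """Hypothetical, simplified model of output-plugin behaviour (not PostgreSQL code)."""
    output: List[str] = field(default_factory=list)

    def decode_prepare(self, xact: PreparedXact, two_phase: bool) -> None:
        if two_phase:
            # Two-phase on: the changes are streamed at PREPARE time.
            self.output.extend(xact.changes)
            self.output.append(f"PREPARE TRANSACTION '{xact.gid}'")
        # Two-phase off: nothing is emitted; the changes are expected at commit.

    def decode_commit_prepared(self, xact: PreparedXact, two_phase: bool) -> None:
        if two_phase:
            # Assumes the changes were already sent when PREPARE was decoded.
            self.output.append(f"COMMIT PREPARED '{xact.gid}'")
        else:
            # Two-phase off: the whole transaction is sent at commit time.
            self.output.extend(xact.changes)
            self.output.append("COMMIT")


def run(prepare_two_phase: bool, commit_two_phase: bool) -> List[str]:
    """Decode one prepared transaction, possibly toggling two-phase in between."""
    dec = ToyDecoder()
    xact = PreparedXact("test1", ["<INSERT into do_write>"])
    dec.decode_prepare(xact, prepare_two_phase)
    dec.decode_commit_prepared(xact, commit_two_phase)
    return dec.output


# Off at PREPARE, on at COMMIT PREPARED: the INSERT is never emitted.
print(run(False, True))
```

In this model, the INSERT only appears when the same setting is used for both decoding steps, which is the property the proposal below aims to guarantee.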
To address this, I plan to change the semantics so that
two-phase-commit can only be specified when creating the slot with
pg_create_logical_replication_slot(), and not in
pg_logical_slot_get_changes(). This prevents the two-phase-commit flag
from being toggled between restarts of the decoder. Let me know if
anybody objects to this change; otherwise, I will include it in the
next patch.
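A minimal sketch of the proposed semantics, again as a toy Python model rather than PostgreSQL code: ToySlot, create_slot and get_changes are hypothetical names. The two-phase setting is stored on the slot when it is created, so every decoding call, whether it decodes the PREPARE or the COMMIT PREPARED, sees the same value. The sketch assumes a per-call 'two-phase-commit' option is rejected with an error; the actual patch might instead ignore it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToySlot:
    """Hypothetical slot model: two_phase is fixed at creation and cannot change."""
    name: str
    two_phase: bool


def create_slot(name: str, two_phase: bool = False) -> ToySlot:
    # Analogue of choosing the option when the slot is created.
    return ToySlot(name, two_phase)


def get_changes(slot: ToySlot, options: Optional[Dict[str, str]] = None) -> bool:
    """Return the two-phase setting this decoding call will use.

    Assumption for this sketch: a per-call 'two-phase-commit' option is
    rejected rather than silently ignored.
    """
    options = options or {}
    if "two-phase-commit" in options:
        raise ValueError("two-phase-commit can only be set when the slot is created")
    return slot.two_phase


slot = create_slot("isolation_slot", two_phase=False)
# The PREPARE-time call and the COMMIT PREPARED-time call agree.
print(get_changes(slot), get_changes(slot))

try:
    get_changes(slot, {"two-phase-commit": "1"})
except ValueError as err:
    print(err)
```

Freezing the dataclass mirrors the intent that the flag cannot be flipped between decoder restarts; the mixed combination from the example above simply cannot arise.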
regards,
Ajin Cherian
Fujitsu Australia