From Amit Kapila
Subject Re: Track replica origin progress for Rollback Prepared
Date
Msg-id CAA4eK1KWu+wGj6Puo+OJG4ZuAM=fuJooQTrvXjNu5Xu_=j6hCw@mail.gmail.com
In response to Re: Track replica origin progress for Rollback Prepared  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Track replica origin progress for Rollback Prepared  (Ajin Cherian <itsajin@gmail.com>)
List pgsql-hackers
On Wed, Jan 6, 2021 at 5:18 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jan 05, 2021 at 04:24:21PM +0530, Amit Kapila wrote:
> > There are already tests [1] in one of the upcoming patches for logical
> > decoding of 2PC which covers this code using which I have found this
> > problem. So, I thought those would be sufficient. I have not checked
> > the feasibility of using test_decoding because I thought adding more
> > using test_decoding will unnecessarily duplicate the tests.
>
> Hmm.  This stuff does not check after replication origins even if it
> stresses 2PC, so that looks incomplete when seen from here.
>

I think it does. Let me try to explain in a bit more detail.
Internally, the apply worker uses replication origins to track the
progress of apply; see the code near
ApplyWorkerMain->replorigin_create. We store the progress (WAL LSN)
with this origin for each commit, commit prepared, and rollback
prepared. If the server crashes and restarts, we use the origin's LSN
as the point to start decoding from (the subscriber sends the last LSN
to the publisher). The bug here is that recovery did not advance the
origin for rollback prepared, so after a restart the origin lagged
behind. This patch fixes that.

Now, let us see how the tests I mentioned cover this code. In the
first test (check that 2PC gets replicated to subscriber then ROLLBACK
PREPARED), we run the following on the publisher and wait for it to be
applied on the subscriber:
BEGIN;
INSERT INTO tab_full VALUES (12);
PREPARE TRANSACTION 'test_prepared_tab_full';
ROLLBACK PREPARED 'test_prepared_tab_full';

Note that during apply we would have WAL-logged the LSN
(replication_origin_lsn) corresponding to ROLLBACK PREPARED on the
subscriber. Now, in the second test (Check that ROLLBACK PREPARED is
decoded properly on crash restart (publisher and subscriber crash)),
we prepare a transaction and crash the server. After the restart,
because recovery of Rollback Prepared did not advance the replication
origin, the subscriber doesn't consider that transaction applied, so
it requests that transaction again.
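
To make the failure mode concrete, here is a toy model in Python. It is
only a sketch and not PostgreSQL code: the Subscriber class, the
replay_advances_origin switch, and the integer LSNs are all invented
for illustration, and it assumes origin progress is rebuilt purely from
WAL records at recovery (real recovery also involves checkpointed
origin state).

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    """Toy subscriber whose origin progress is rebuilt from WAL on restart."""

    # Hypothetical switch: False models the bug, True models the fix.
    replay_advances_origin: bool
    # Each entry is (record_kind, remote_lsn), as logged during apply.
    wal: list = field(default_factory=list)
    # In-memory origin progress; lost on crash.
    origin_lsn: int = 0

    def apply(self, kind: str, remote_lsn: int) -> None:
        # During normal apply the remote LSN is WAL-logged with the record
        # and the in-memory origin is advanced.
        self.wal.append((kind, remote_lsn))
        self.origin_lsn = remote_lsn

    def crash_and_recover(self) -> int:
        # Recovery starts from nothing and replays the WAL records.
        self.origin_lsn = 0
        for kind, remote_lsn in self.wal:
            if kind == "ROLLBACK PREPARED" and not self.replay_advances_origin:
                # The bug: replay of this record skips the origin advance.
                continue
            self.origin_lsn = max(self.origin_lsn, remote_lsn)
        # This is the LSN the subscriber would send to the publisher.
        return self.origin_lsn


def restart_lsn(fixed: bool) -> int:
    sub = Subscriber(replay_advances_origin=fixed)
    sub.apply("COMMIT", 100)
    sub.apply("ROLLBACK PREPARED", 200)
    return sub.crash_and_recover()
```

With fixed=False the model restarts from LSN 100, so the publisher
would send the already-applied ROLLBACK PREPARED at 200 again; with
fixed=True it restarts from 200.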

Strictly speaking, we don't need the second test to reproduce this
exact problem; restarting after the first test would reproduce it too.
I was consistently hitting the problem with the tests as they are
currently written, though. We can change them slightly to restart
after the first test if we want.

-- 
With Regards,
Amit Kapila.


