Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: Minimal logical decoding on standbys
Date
Msg-id CAJ3gD9fwvgXO9L+gcoqj-XNuHxFR+iw10GiuoB7ytnUVWMeXeg@mail.gmail.com
In response to Re: Minimal logical decoding on standbys  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: Minimal logical decoding on standbys
List pgsql-hackers
On Fri, 8 Mar 2019 at 20:59, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> On Mon, 4 Mar 2019 at 14:09, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> >
> > On Fri, 14 Dec 2018 at 06:25, Andres Freund <andres@anarazel.de> wrote:
> > > I've a prototype attached, but let's discuss the details in a separate
> > > thread. This also needs to be changed for pluggable storage, as we don't
> > > know about table access methods in the startup process, so we can't
> > > determine which AM the heap is from during
> > > btree_xlog_delete_get_latestRemovedXid() (and sibling routines).
> >
> > Attached is a WIP test patch
> > 0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
> > modified version of Craig Ringer's test cases
>
> Hi Andres,
>
> I am trying to come up with new testcases to test the recovery
> conflict handling. Before that I have some queries :
>
> With Craig Ringer's approach, the way to reproduce the recovery
> conflict was, I believe, easy : Do a checkpoint, which will log the
> global-catalog-xmin-advance WAL record, due to which the standby -
> while replaying the message - may find out that it's a recovery
> conflict. But with your approach, the latestRemovedXid is passed only
> during specific vacuum-related WAL records, so to reproduce the
> recovery conflict error, we need to make sure some specific WAL
> records are logged, such as XLOG_BTREE_DELETE. So we need to create a
> testcase such that while creating an index tuple, it erases dead
> tuples from a page, so that it eventually calls
> _bt_vacuum_one_page()=>_bt_delitems_delete(), thus logging a
> XLOG_BTREE_DELETE record.
>
> I tried to come up with this reproducible testcase without success.
> This seems difficult. Do you have an easier option? Maybe we can use
> some other WAL records that have an easier, more reliable test case
> for showing a recovery conflict?
>

I managed to get a recovery conflict by:
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on the standby
3. Creating a table on the master and inserting some data
4. Running: VACUUM FULL;

This gives WARNING messages in the standby log file.
2019-03-14 14:57:56.833 IST [40076] WARNING:  slot decoding_standby w/ catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT:  WAL redo at 0/3069E98 for Heap2/CLEAN: remxid 477

But I did not add such a testcase to the test file, because with the
current patch nothing is done to the slot on a conflict; the standby
just keeps emitting the WARNING in the log file. So we can't test this
scenario with the TAP test as of now.
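
Until the patch acts on the slot, about the only thing a test could do
is scan the standby log for the WARNING quoted earlier in this mail.
Below is a minimal sketch of such a scan in Python (not the Perl TAP
framework). The names SlotConflict and find_slot_conflicts are
hypothetical, and the regex assumes the message wording from the log
excerpt, which comes from a WIP patch and may well change.

```python
import re
from typing import List, NamedTuple

# Assumption: the wording matches the WARNING in the log excerpt from
# this mail. It is emitted by a WIP patch, so it is not a stable format.
CONFLICT_RE = re.compile(
    r"WARNING:\s+slot (?P<slot>\S+) w/ catalog xmin (?P<xmin>\d+) "
    r"conflicts with removed xid (?P<xid>\d+)"
)


class SlotConflict(NamedTuple):
    """One slot-conflict WARNING found in the standby log (hypothetical type)."""
    slot: str
    catalog_xmin: int
    removed_xid: int


def find_slot_conflicts(log_text: str) -> List[SlotConflict]:
    """Return every slot-conflict WARNING found in log_text.

    Whitespace runs (including line breaks) are collapsed first, so a
    message that was wrapped across lines still matches.
    """
    flat = re.sub(r"\s+", " ", log_text)
    return [
        SlotConflict(m.group("slot"), int(m.group("xmin")), int(m.group("xid")))
        for m in CONFLICT_RE.finditer(flat)
    ]


if __name__ == "__main__":
    sample = (
        "2019-03-14 14:57:56.833 IST [40076] WARNING:  slot decoding_standby w/\n"
        "catalog xmin 474 conflicts with removed xid 477\n"
    )
    for conflict in find_slot_conflicts(sample):
        print(conflict)
```

This only detects that the message was logged. It says nothing about
whether the slot was invalidated or a backend was cancelled, which is
what a real recovery-conflict test would need to check.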


> Further, with your patch, in ResolveRecoveryConflictWithSlots(), it
> just throws a WARNING error level; so the wal receiver would not make
> the backends throw an error; hence the test case won't catch the
> error. Is that right ?

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

