Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From vignesh C
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id CALDaNm2mwkVFLfe8pLcU1W5Oy1vRr1Wzp53XGV08kr4Z2=SJpA@mail.gmail.com
Whole thread Raw
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, 7 Nov 2024 at 15:33, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Please find the attached v46 patch having changes for the above review
> > comments and your test review comments and Shveta's review comments.
> >
> Hi,
>
> I’ve reviewed this thread and am interested in working on the
> remaining tasks and comments, as well as the future review comments.
> However, Bharath, please let me know if you'd prefer to continue with
> it.
>
> Attached the rebased v47 patch, which also addresses Peter’s comments
> #2, #3, and #4 at [1]. I will try addressing other comments as well in
> next versions.

The following crash occurs while upgrading:
2024-11-13 14:19:45.955 IST [44539] LOG:  checkpoint starting: time
TRAP: failed Assert("!(*invalidated && SlotIsLogical(s) &&
IsBinaryUpgrade)"), File: "slot.c", Line: 1793, PID: 44539
postgres: checkpointer (ExceptionalCondition+0xbb)[0x555555e305bd]
postgres: checkpointer (+0x63ab04)[0x555555b8eb04]
postgres: checkpointer
(InvalidateObsoleteReplicationSlots+0x149)[0x555555b8ee5f]
postgres: checkpointer (CheckPointReplicationSlots+0x267)[0x555555b8f125]
postgres: checkpointer (+0x1f3ee8)[0x555555747ee8]
postgres: checkpointer (CreateCheckPoint+0x78f)[0x5555557475ee]
postgres: checkpointer (CheckpointerMain+0x632)[0x555555b2f1e7]
postgres: checkpointer (postmaster_child_launch+0x119)[0x555555b30892]
postgres: checkpointer (+0x5e2dc8)[0x555555b36dc8]
postgres: checkpointer (PostmasterMain+0x14bd)[0x555555b33647]
postgres: checkpointer (+0x487f2e)[0x5555559dbf2e]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7ffff6c29d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7ffff6c29e40]
postgres: checkpointer (_start+0x25)[0x555555634c25]
2024-11-13 14:19:45.967 IST [44538] LOG:  checkpointer process (PID
44539) was terminated by signal 6: Aborted

This can happen in the following case:
1) Setup a logical replication cluster with enough data so that it
will take at least few minutes to upgrade
2) Stop the publisher node
3) Configure replication_slot_inactive_timeout and checkpoint_timeout
to 30 seconds
4) Upgrade the publisher node.

This is happening because logical replication slots are getting
invalidated during upgrade and there is an assertion which checks that
the slots are not invalidated.
I feel this can be fixed by having a function similar to
check_max_slot_wal_keep_size which will make sure that
replication_slot_inactive_timeout is 0 during upgrade.

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Commit Timestamp and LSN Inversion issue
Next
From: Nisha Moond
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation