Re: WAL segments removed from primary despite the fact that logical replication slot needs it. - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: WAL segments removed from primary despite the fact that logical replication slot needs it.
Date
Msg-id CAA4eK1KcnTvwrVqmpRTEMpyarBeTxwr8KA+kaveQOiqJ0zYsXA@mail.gmail.com
In response to Re: WAL segments removed from primary despite the fact that logical replication slot needs it.  (Andres Freund <andres@anarazel.de>)
Responses Re: WAL segments removed from primary despite the fact that logical replication slot needs it.
List pgsql-bugs
On Tue, Nov 22, 2022 at 1:38 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-11-21 19:56:20 +0530, Amit Kapila wrote:
> > I think this problem could arise when walsender exits due to some
> > error like "terminating walsender process due to replication timeout".
> > Here is the theory I came up with:
> >
> > 1. Initially the restart_lsn is updated to 1039D/83825958. This will
> > allow all files till 000000000001039D00000082 to be removed.
> > 2. Next the slot->candidate_restart_lsn is updated to a 1039D/8B5773D8.
> > 3. walsender restarts due to replication timeout.
> > 4. After restart, it starts reading WAL from 1039D/83825958 as that
> > was restart_lsn.
> > 5. walsender gets a message to update write, flush, apply, etc. As
> > part of that, it invokes
> > ProcessStandbyReplyMessage->LogicalConfirmReceivedLocation.
> > 6. Due to step 5, the restart_lsn is updated to 1039D/8B5773D8 and
> > replicationSlotMinLSN will also be computed to the same value allowing
> > to remove of all files older than 000000000001039D0000008A. This will
> > allow removing 000000000001039D00000083, 000000010001039D00000084,
> > etc.
>
> This would require that the client acknowledged an LSN that we haven't
> sent out, no?  Shouldn't the
>   MyReplicationSlot->candidate_restart_valid <= lsn
> from LogicalConfirmReceivedLocation() prevented this from happening
> unless the client acknowledges up to candidate_restart_valid?
>

We would have sent the higher LSN prior to the restart. Let me share a
snippet of the log, extracted from a larger log that depesz shared
with me offlist at my request:

2022-11-10 21:03:23.478+00,upgrayedd,canvas,21748,10.1.238.101:35640,636d671b.54f4,4,idle,2022-11-10
21:03:23+00,7/0,0,DEBUG,00000,received replication command:
IDENTIFY_SYSTEM,,,,,,,,,focal14
2022-11-10 21:03:23.479+00,upgrayedd,canvas,21748,10.1.238.101:35640,636d671b.54f4,5,idle,2022-11-10
21:03:23+00,7/0,0,DEBUG,00000,"received replication command:
START_REPLICATION SLOT ""focal14"" LOGICAL 1039D/91F41B50
(proto_version '1', publication_names '""focal14""')",,,,,,,,,focal14
2022-11-10 21:03:23.481+00,upgrayedd,canvas,21748,10.1.238.101:35640,636d671b.54f4,6,idle,2022-11-10
21:03:23+00,7/0,0,LOG,00000,"starting logical decoding for slot
""focal14""","Streaming transactions committing after 1039D/911A8DB0,
reading WAL from 1039D/83825958.",,,,,"START_REPLICATION SLOT
""focal14"" LOGICAL 1039D/91F41B50 (proto_version '1',
publication_names '""focal14""')",,,focal14
2022-11-10 21:03:23.481+00,upgrayedd,canvas,21748,10.1.238.101:35640,636d671b.54f4,7,idle,2022-11-10
21:03:23+00,7/0,0,DEBUG,00000,sending replication
keepalive,,,,,,,,,focal14
2022-11-10 21:03:23.484+00,upgrayedd,canvas,21748,10.1.238.101:35640,636d671b.54f4,8,idle,2022-11-10
21:03:23+00,7/0,0,LOG,00000,logical decoding found consistent point at
1039D/83825958,Logical decoding will begin using saved
snapshot.,,,,,"START_REPLICATION SLOT ""focal14"" LOGICAL
1039D/91F41B50 (proto_version '1', publication_names
'""focal14""')",,,focal14
2022-11-10 21:03:23.484+00,upgrayedd,canvas,21748,10.1.238.101:35640,636d671b.54f4,9,idle,2022-11-10
21:03:23+00,7/0,0,DEBUG,00000,write 1039D/91F41B50 flush
1039D/91F41B50 apply 1039D/91F41B50 reply_time 2022-11-10
21:03:23.482052+00,,,,,,,,,focal14
2022-11-10 21:03:23.495+00,upgrayedd,canvas,21748,10.1.238.101:35640,636d671b.54f4,10,idle,2022-11-10
21:03:23+00,7/0,0,DEBUG,00000,updated xmin: 1 restart:
1,,,,,,,,,focal14

Here, we can see that even though we have started reading WAL from
1039D/83825958, the confirmed_flush location is 1039D/911A8DB0. Then
we get a client reply (write 1039D/91F41B50 flush 1039D/91F41B50),
probably in response to a keepalive message. The flush LSN here is
higher than candidate_restart_valid. The value of
candidate_restart_valid before the walsender restart was
1039D/9170B010, and candidate_restart_lsn was 1039D/8B5773D8.
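
For reference, this is the check Andres is referring to. Roughly (I am
paraphrasing LogicalConfirmReceivedLocation() from memory and have
simplified it, e.g. locking is elided, so please treat this as a
sketch rather than an exact quote):

    /* if we're past the location required for bumping restart_lsn, do so */
    if (MyReplicationSlot->candidate_restart_valid != InvalidXLogRecPtr &&
        MyReplicationSlot->candidate_restart_valid <= lsn)
    {
        MyReplicationSlot->data.restart_lsn =
            MyReplicationSlot->candidate_restart_lsn;
        MyReplicationSlot->candidate_restart_lsn = InvalidXLogRecPtr;
        MyReplicationSlot->candidate_restart_valid = InvalidXLogRecPtr;
    }

Because the client's flush LSN (1039D/91F41B50) is above
candidate_restart_valid (1039D/9170B010), this check does not prevent
restart_lsn from being moved to candidate_restart_lsn
(1039D/8B5773D8), which is step 6 in the theory above.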

>
> > 7. Now, we got new slot->candidate_restart_lsn as 1039D/83825958.
> > Remember from step 1, we are still reading WAL from that location.
>
> I don't think LogicalIncreaseRestartDecodingForSlot() would do anything
> in that case, because of the
>         /* don't overwrite if have a newer restart lsn */
> check.
>

But later in LogicalIncreaseRestartDecodingForSlot(), we have the
below check which can allow it to be overridden:
/*
* Only increase if the previous values have been applied, otherwise we
* might never end up updating if the receiver acks too slowly. A missed
* value here will just cause some extra effort after reconnecting.
*/
if (slot->candidate_restart_valid == InvalidXLogRecPtr)

Do we need "else if" here instead of "if"? I think that would save us
from retreating the restart_lsn value; see the sketch after the next
snippet. We have "else if" in the corresponding case in the function
LogicalIncreaseXminForSlot():
/*
* Only increase if the previous values have been applied, otherwise we
* might never end up updating if the receiver acks too slowly.
*/
else if (slot->candidate_xmin_lsn == InvalidXLogRecPtr)
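
To be concrete, here is a minimal sketch of the change I have in mind
for LogicalIncreaseRestartDecodingForSlot() (untested; the surrounding
code is paraphrased from memory and simplified, locking elided, so
this is only to illustrate the idea):

    /* don't overwrite if have a newer restart lsn */
    if (restart_lsn <= slot->data.restart_lsn)
    {
    }
    /*
     * We might have already flushed far enough to directly accept this lsn,
     * in which case there is no need to check for existing candidate LSNs.
     */
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        slot->candidate_restart_valid = current_lsn;
        slot->candidate_restart_lsn = restart_lsn;
        updated_lsn = true;
    }
    /*
     * Only increase if the previous values have been applied, otherwise we
     * might never end up updating if the receiver acks too slowly.
     */
    else if (slot->candidate_restart_valid == InvalidXLogRecPtr)  /* was plain "if" */
    {
        slot->candidate_restart_valid = current_lsn;
        slot->candidate_restart_lsn = restart_lsn;
        updated_lsn = true;
    }

With the "else if", once the "don't overwrite if have a newer restart
lsn" branch is taken, the candidate can no longer be replaced with an
older restart_lsn.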

-- 
With Regards,
Amit Kapila.


