Re: WAL segments removed from primary despite the fact that logical replication slot needs it. - Mailing list pgsql-bugs

From Masahiko Sawada
Subject Re: WAL segments removed from primary despite the fact that logical replication slot needs it.
Date
Msg-id CAD21AoAC7NvyFvTfDxy-bK6Dvnm8D=-TG0c7B6==W5vhA0dtxA@mail.gmail.com
In response to Re: WAL segments removed from primary despite the fact that logical replication slot needs it.  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: WAL segments removed from primary despite the fact that logical replication slot needs it.
List pgsql-bugs
On Tue, Nov 15, 2022 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Nov 15, 2022 at 4:55 PM hubert depesz lubaczewski
> <depesz@depesz.com> wrote:
> >
> > On Tue, Nov 15, 2022 at 11:20:19AM +0900, Masahiko Sawada wrote:
> > > On Fri, Nov 11, 2022 at 11:50 PM hubert depesz lubaczewski
> > > <depesz@depesz.com> wrote:
> > > >
> > > > 2022-11-10 20:59:55 UTC  103A7/7B666388      focal14    pgoutput  logical    16607   canvas    f          t          10414       \N    3241434855    1039D/83825958  1039D/8DECBA18
> > > > 2022-11-10 21:00:54 UTC  103A7/88225C28      focal14    pgoutput  logical    16607   canvas    f          t          10414       \N    3241434855    1039D/83825958  1039D/911A8DB0
> > > > 2022-11-10 21:01:53 UTC  103A7/9E77EF90      focal14    pgoutput  logical    16607   canvas    f          f          \N          \N    3241434855    1039D/83825958  1039D/911A8DB0
> > > > 2022-11-10 21:02:52 UTC  103A7/B45ECEA8      focal14    pgoutput  logical    16607   canvas    f          f          \N          \N    3241434855    1039D/83825958  1039D/911A8DB0
> > > > 2022-11-10 21:03:52 UTC  103A7/C4355F60      focal14    pgoutput  logical    16607   canvas    f          t          21748       \N    3241443528    1039D/83825958  1039D/955633D0
> > > > 2022-11-10 21:04:51 UTC  103A7/D5A6FDD8      focal14    pgoutput  logical    16607   canvas    f          t          21748       \N    3241443528    1039D/83825958  1039D/96453F38
> > > > 2022-11-10 21:05:50 UTC  103A7/E2C7F590      focal14    pgoutput  logical    16607   canvas    f          f          \N          \N    3241443528    1039D/83825958  1039D/96453F38
> > > > 2022-11-10 21:06:49 UTC  103A7/F28B2000      focal14    pgoutput  logical    16607   canvas    f          f          \N          \N    3241443528    1039D/83825958  1039D/96453F38
> > > > 2022-11-10 21:07:48 UTC  103A8/3828D20       focal14    pgoutput  logical    16607   canvas    f          f          \N          \N    3241443528    1039D/83825958  1039D/96453F38
> > > > 2022-11-10 21:08:47 UTC  103A8/15BBF088      focal14    pgoutput  logical    16607   canvas    f          f          \N          \N    3241443528    1039D/83825958  1039D/96453F38
> > >
> > > It happened twice that the walsender process was active for at least
> > > one minute without advancing restart_lsn, and then exited. Could you
> > > share the server logs from around 2022-11-10 20:59:55 to 21:08:47 so
> > > that we can find out what the walsender processes were doing?
> >
> > That will be ~270MB. Not sure how that would fly, plus that's an awful
> > lot of logs to check to make sure I don't leak any sensitive info.
> >
> > But if I remove the lines generated by walsenders to the pg12 replicas
> > (streaming replication to bionic/pg12 replicas), the logfile is only
> > ~3MB. Still a bit too much, as there can be some queries with sensitive
> > info.
> >
> > Perhaps I can grep something out?
> >
> > > Also, you may have logs for "failed to increase restart lsn: proposed
> > > %X/%X, after %X/%X, current candidate %X/%X, current after %X/%X,
> > > flushed up to %X/%X" and "got new restart lsn %X/%X at %X/%X". If so,
> > > please share them as well.
> >
> > Found one in the time frame you mentioned:
> > 2022-11-10 21:03:24.612 UTC,"upgrayedd","canvas",21748,"10.1.238.101:35640",636d671b.54f4,39,"idle",2022-11-10 21:03:23 UTC,7/0,0,DEBUG,00000,"failed to increase restart lsn: proposed 1039D/8B5773D8, after 1039D/9170B010, current candidate 1039D/83825958, current after 1039D/8B5773D8, flushed up to 1039D/91F41B50",,,,,,,,,"focal14"
> >
>
> Thanks!
>
> LSN 1039D/8B5773D8 seems to be related to this issue. If we advance the
> slot's restart_lsn to this LSN, we remove WAL files older than
> 000000000001039D0000008A.
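
(A minimal sketch of the LSN-to-segment arithmetic behind that file name,
assuming the default 16MB wal_segment_size and timeline 1; this is plain C
for illustration, not the backend's XLByteToSeg/XLogFileName macros.)

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t seg_size = 16 * 1024 * 1024;              /* default wal_segment_size */
        const uint64_t segs_per_id = 0x100000000ULL / seg_size;  /* 256 segments per "xlogid" */
        uint64_t lsn = ((uint64_t) 0x1039D << 32) | 0x8B5773D8;  /* LSN 1039D/8B5773D8 */
        uint64_t segno = lsn / seg_size;                         /* WAL segment number */
        uint32_t tli = 1;                                        /* timeline, assumed */

        /* Prints 000000010001039D0000008B: the segment holding the proposed
         * restart_lsn; segments before it become removable once the slot's
         * restart_lsn advances there. */
        printf("%08X%08X%08X\n", (unsigned) tli,
               (unsigned) (segno / segs_per_id),
               (unsigned) (segno % segs_per_id));
        return 0;
    }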
>
> In LogicalIncreaseRestartDecodingForSlot(), since
> "current_lsn(1039D/9170B010) <
> slot->data.confirmed_flush(1039D/91F41B50)", we executed the following
> part and called LogicalConfirmReceivedLocation():
>
>     else if (current_lsn <= slot->data.confirmed_flush)
>     {
>         slot->candidate_restart_valid = current_lsn;
>         slot->candidate_restart_lsn = restart_lsn;
>
>         /* our candidate can directly be used */
>         updated_lsn = true;
>     }
>
> Then in LogicalConfirmReceivedLocation(), IIUC we executed the following code:
>
>        if (MyReplicationSlot->candidate_restart_valid != InvalidXLogRecPtr &&
>            MyReplicationSlot->candidate_restart_valid <= lsn)
>        {
>            Assert(MyReplicationSlot->candidate_restart_lsn != InvalidXLogRecPtr);
>
>            MyReplicationSlot->data.restart_lsn = MyReplicationSlot->candidate_restart_lsn;
>            MyReplicationSlot->candidate_restart_lsn = InvalidXLogRecPtr;
>            MyReplicationSlot->candidate_restart_valid = InvalidXLogRecPtr;
>            updated_restart = true;
>        }
>
> So you should have a log line for "updated xmin: %u restart: %u"; could you check?
>
> After the above code, we do:
>
>         SpinLockRelease(&MyReplicationSlot->mutex);
>
>         /* first write new xmin to disk, so we know what's up after a crash */
>         if (updated_xmin || updated_restart)
>         {
>             ReplicationSlotMarkDirty();
>             ReplicationSlotSave();
>             elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
>         }
>
> Is something like the following scenario possible?
>
> 1. The walsender updates the slot's restart_lsn in memory and releases
> the spinlock (the slot is not saved to disk yet).
> 2. Someone recomputes the minimum restart_lsn across slots (note that
> this slot's restart_lsn in memory is already the new value).
> 3. The checkpointer removes WAL files older than the minimum
> restart_lsn calculated at step 2.
> 4. The walsender restarts for some reason (or the server crashes).
> 5. After the restart, the walsender acquires the slot, but the slot's
> restart_lsn is the old, on-disk value.
> 6. The WAL files required to restart decoding have already been removed.

I'm not sure whether the reported issue happened via the above scenario,
but I've confirmed that this scenario can happen in my environment by
adding sleeps and killing the walsender process. After the server
restart, pg_replication_slots.restart_lsn retreated from 0/5390F78 to
0/1659FC0, and the walsender raised the error "requested WAL segment
000000010000000000000001 has already been removed".
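
(To make the ordering in steps 1-6 concrete, here is a toy, self-contained
C sketch of the hazard. It is not PostgreSQL code; every name in it is
made up for illustration.)

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t Lsn;

    int main(void)
    {
        Lsn in_memory_restart_lsn = 0x1039D83825958ULL;     /* shared-memory copy (1039D/83825958) */
        Lsn on_disk_restart_lsn   = in_memory_restart_lsn;  /* what survives a restart */
        Lsn removal_horizon       = on_disk_restart_lsn;    /* oldest WAL still kept */

        /* 1. walsender advances restart_lsn in shared memory and releases
         *    the spinlock; the slot has not been flushed to disk yet. */
        in_memory_restart_lsn = 0x1039D8B5773D8ULL;

        /* 2-3. the minimum restart_lsn across slots is recomputed from the
         *      in-memory values, and the checkpointer removes older WAL. */
        removal_horizon = in_memory_restart_lsn;

        /* 4. the walsender exits (or the server crashes) before the slot
         *    is saved, so on_disk_restart_lsn still holds the old value. */

        /* 5. after the restart, the slot comes back with the on-disk value. */
        in_memory_restart_lsn = on_disk_restart_lsn;

        /* 6. decoding must restart at restart_lsn, but that WAL is gone. */
        if (in_memory_restart_lsn < removal_horizon)
            printf("restart_lsn %llX points at WAL that has already been removed\n",
                   (unsigned long long) in_memory_restart_lsn);
        return 0;
    }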

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


