Re: BUG #17903: There is a bug in the KeepLogSeg() - Mailing list pgsql-bugs

From Junwang Zhao
Subject Re: BUG #17903: There is a bug in the KeepLogSeg()
Date
Msg-id CAEG8a3L2BB5k9USO4fC_wQjULGX-1rqaX2JSHn1_0vco8KB28Q@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17903: There is a bug in the KeepLogSeg()  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: BUG #17903: There is a bug in the KeepLogSeg()
List pgsql-bugs
If `segno` can be larger than `currSegNo`, your patch seems to miss
the following branch,
are you missing this for some particular reason?

```
/* but, keep at least wal_keep_size if that's set */
if (wal_keep_size_mb > 0)
{
    uint64 keep_segs;

    keep_segs = ConvertToXSegs(wal_keep_size_mb, wal_segment_size);
    if (currSegNo - segno < keep_segs)           <<<<  see here
    {
        /* avoid underflow, don't go below 1 */
        if (currSegNo <= keep_segs)
            segno = 1;
        else
            segno = currSegNo - keep_segs;
        }
}
```

On Thu, Apr 20, 2023 at 11:04 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 19 Apr 2023 10:26:13 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in
> > I found that KeepLogSeg() has a piece of code that is not correctly.
> >
> > segno may be larger than currSegNo, since the slot_keep_segs variable is of
> > type "uint64", in this case the code "if (currSegNo - segno >
> > slot_keep_segs)" is incorrect.
> >
> > "if (currSegNo - segno < keep_segs)" is also the same.
> >
> > Checkpoint calls the KeepLogSeg function, and there are many operations
> > between recptr and XLogGetReplicationSlotMinimumLSN, including updating the
> > pg_control file, so segno may be larger than currSegNo.
>
> Correct. Thanks for the report.
>
> If checkpointer somehow takes a long time between inserting a
> checkpoint record and removing WAL files, while replication advances a
> certain distnace, it can actually happen. Although that behavior
> doesn't directly affect max_slot_wal_keep_size, it does disrupt the
> effect of wal_keep_size.
>
> The thinko was that we incorrectly assumed the slot minimum LSN can't
> be larger than the checkpoint record LSN. We don't need to consider
> max_slot_wal_keep_size if the slot minimum LSN is already larger than
> currSegNo.
>
> The attached fix works. However, I can't come up with a reasonable
> testing script.
>
> This dates back to 13, where max_slot_wal_keep_size was introduced.
>
> regards.
>
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center



--
Regards
Junwang Zhao



pgsql-bugs by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: pg_basebackup: errors on macOS on directories with ".DS_Store" files
Next
From: Daniel Gustafsson
Date:
Subject: Re: pg_basebackup: errors on macOS on directories with ".DS_Store" files