On Mon, Feb 6, 2023 at 8:15 PM hubert depesz lubaczewski
<depesz@depesz.com> wrote:
>
> On Mon, Feb 06, 2023 at 05:25:42PM +0900, Masahiko Sawada wrote:
> > Based on the analysis we did[1][2], I've created the manual scenario
> > to reproduce this issue with the attached patch and the script.
> >
> > The scenario.md explains the basic steps to reproduce this issue. It
> > consists of 13 steps (very tricky!!). It's not sophisticated and could
> > be improved. test.sh is the shell script I used to execute the
> > reproduction steps from 1 to 10. In my environment, I could reproduce
> > this issue by the following steps.
> >
> > 1. apply the patch and build PostgreSQL.
> > 2. run test.sh.
> > 3. execute step 11 and later, as described in scenario.md.
> >
> > test.sh is a very hacky and dirty script and is tuned for my
> > environment (in particular, it adds many sleeps). You might need to
> > adjust it while checking scenario.md.
> >
> > I've also confirmed that this issue is fixed by the attached patch,
> > which clears candidate_restart_lsn and friends during
> > ReplicationSlotRelease().
>
> Hi,
> one important question - do I patch the newer Pg, or the older one? The
> thing is that we were able to replicate the problem (with some luck)
> only on production databases, and patching them will be a hard sell.
> Maybe possible, but if it's enough to patch the pg14 (recipient), it
> would make my life much easier.

Unfortunately, the patch I attached is for the publisher (i.e., the
sender side). There might be a way to fix this issue from the receiver
side, but I don't have a concrete idea for now.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com