Re: pgsql: Prevent invalidation of newly synced replication slots. - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: pgsql: Prevent invalidation of newly synced replication slots.
Date
Msg-id CAA4eK1LhMuxYdf6aR+UZuxdp7+SJUT_4Mf9yz7eiXdY1VB0Z+g@mail.gmail.com
In response to Re: pgsql: Prevent invalidation of newly synced replication slots.  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: pgsql: Prevent invalidation of newly synced replication slots.
Re: pgsql: Prevent invalidation of newly synced replication slots.
List pgsql-hackers
On Wed, Jan 28, 2026 at 4:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 28, 2026 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > It is not clear to me either why a similar test,
> > 040_standby_failover_slots_sync, succeeds while
> > 046_checkpoint_logical_slot fails. I am still thinking about it
> > but thought I would share the information I could gather by debugging.
> >
>
> It seems there is some interaction with the previous test in the same
> file that is causing this failure, as we are using the primary node from
> the previous test. When I commented out get_changes and its
> corresponding injection_point in the previous test, as attached, the
> entire test passed. I think this test will pass if we use a freshly
> created primary node, but I wanted to spend some more time to see
> how/why the previous test is causing this issue.
>

I noticed that the previous test didn't quit the background psql
session used for the concurrent checkpoint. After quitting that background
session, the test passed for me consistently. See attached. The comment
atop background_psql says: "Be sure to "quit" the returned object when
done with it." Now, this background session doesn't directly access the
backup_label file, but it could be accessing one of the parent
directories of backup_label.
One gen-AI tool says the following: "In Windows, MoveFileEx (Error 32:
ERROR_SHARING_VIOLATION) can fail if a process is accessing the file's
parent directory in a way that creates a lock. While the error message
usually points to the file itself, the parent folder is a critical
part of the operation." I admit that I don't know the internals of
MoveFileEx, so I can't say this with complete conviction, but the attached
patch seems like a reasonable fix. Could anyone else who can reproduce the
issue test the attached patch and share the results?
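To illustrate the "always quit the background session" discipline the fix relies on, here is a minimal sketch in Python rather than in the Perl TAP framework; in the actual test the equivalent step is calling quit on the object returned by background_psql, as the quoted comment says. The names BackgroundSession and background_session are hypothetical, and the child process is only a stand-in for psql that reads stdin until EOF. The point is that a try/finally (here via a context manager) guarantees the session is quit and reaped even if the test body fails, so nothing from it is still running when the next test starts.

```python
import subprocess
import sys
from contextlib import contextmanager


class BackgroundSession:
    """Hypothetical stand-in for a background psql session: a long-lived
    child process that keeps running until it is explicitly quit."""

    def __init__(self):
        # The child just reads stdin until EOF, like an interactive client
        # waiting for more commands (it does not interpret them).
        self.proc = subprocess.Popen(
            [sys.executable, "-c", "import sys; sys.stdin.read()"],
            stdin=subprocess.PIPE,
            text=True,
        )

    def send(self, line):
        # Hand one command to the child without waiting for it to finish.
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()

    def quit(self, timeout=10):
        # Closing stdin delivers EOF, which makes the child exit; wait()
        # then reaps it so the session is fully gone before we return.
        if not self.proc.stdin.closed:
            self.proc.stdin.close()
        return self.proc.wait(timeout=timeout)


@contextmanager
def background_session():
    session = BackgroundSession()
    try:
        yield session
    finally:
        # Runs even when the test body raises, so the session never leaks
        # into whatever the test file does next.
        session.quit()


if __name__ == "__main__":
    with background_session() as s:
        s.send("CHECKPOINT;")
        proc = s.proc
    print(proc.returncode)  # 0
```

The context manager is used instead of a bare quit call at the end of the test so that an early failure or die cannot skip the cleanup.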

Does this fix/theory sound plausible?

--
With Regards,
Amit Kapila.

