On Mon, Oct 20, 2025 at 11:41 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Oct 17, 2025 at 03:08:07PM -0700, Masahiko Sawada wrote:
> > On Fri, Oct 17, 2025 at 12:18 AM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
>
> > > do you think that's also safe to not
> > > invalidate the slots for effective_xmin and catalog_effective_xmin if they
> > > advance far enough?
> >
> > I find the same in xmin cases. ResolveRecoveryConflictWithSnapshot()
> > is called only during the recovery by the startup process, and it also
> > tries to invalidate possibly obsolete slots. Catalog tuples are not
> > removed as long as the startup calls
> > ResolveRecoveryConflictWithSnapshot() before actually removing the
> > tuples and it's busy in InvalidatePossiblyObsoleteSlot().
>
> I looked more closely at the xmin related cases and I agree with the above.
>
> > I might be
> > missing something but this is the reason why I'm confused with the
> > 818fefd8fd4 fix and the proposed change.
>
> Yeah so 818fefd8fd4 is well suited for tests consistency but in some rare cases
> it invalidates a slot while it would be safe not to do so.
>
> OTOH it looks to me that the initial pre-818fefd8fd4 intend was to invalidate
> the slot as per this comment:
>
> "
> /*
> * Re-acquire lock and start over; we expect to invalidate the
> * slot next time (unless another process acquires the slot in the
> * meantime).
> */
> "
>
The comment doesn't indicate the intent that we will invalidate the
slot after re-acquiring the lock even when the new conditions don't
warrant the slot to be invalidated. The comment could be improved
though.
> The fact that it could move forward far enough before we terminate the
> process holding the slot is a race condition due to the fact that we released
> the mutex.
>
> If the above looks right to you then 818fefd8fd4 is doing what was "initially"
> expected, do you agree?
>
Based on the discussion and points presented in this thread, I don't
agree. I feel we should restore behavior prior to 818fefd8fd4 and fix
the test case which relies on different messages.
> If so, then maybe it's fine to keep 818fefd8fd4 as is?
>
I don't think so.
--
With Regards,
Amit Kapila.