Re: Question about InvalidatePossiblyObsoleteSlot() - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Question about InvalidatePossiblyObsoleteSlot()
Date
Msg-id CAA4eK1Kvj+XAWoLFx7ipFqKY3EB+htAF_dfyACZVW4UEbyX1bw@mail.gmail.com
Whole thread Raw
In response to Re: Question about InvalidatePossiblyObsoleteSlot()  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Mon, Oct 20, 2025 at 11:41 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Oct 17, 2025 at 03:08:07PM -0700, Masahiko Sawada wrote:
> > On Fri, Oct 17, 2025 at 12:18 AM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
>
> > > do you think that's also safe to not
> > > invalidate the slots for effective_xmin and catalog_effective_xmin if they
> > > advance far enough?
> >
> > I find the same in xmin cases. ResolveRecoveryConflictWithSnapshot()
> > is called only during the recovery by the startup process, and it also
> > tries to invalidate possibly obsolete slots. Catalog tuples are not
> > removed as long as the startup calls
> > ResolveRecoveryConflictWithSnapshot() before actually removing the
> > tuples and it's busy in InvalidatePossiblyObsoleteSlot().
>
> I looked more closely at the xmin related cases and I agree with the above.
>
> > I might be
> > missing something but this is the reason why I'm confused with the
> > 818fefd8fd4 fix and the proposed change.
>
> Yeah so 818fefd8fd4 is well suited for tests consistency but in some rare cases
> it invalidates a slot while it would be safe not to do so.
>
> OTOH it looks to me that the initial pre-818fefd8fd4 intend was to invalidate
> the slot as per this comment:
>
> "
> /*
>  * Re-acquire lock and start over; we expect to invalidate the
>  * slot next time (unless another process acquires the slot in the
>  * meantime).
>  */
> "
>

The comment doesn't indicate the intent that we will invalidate the
slot after re-acquiring the lock even when the new conditions don't
warrant the slot to be invalidated. The comment could be improved
though.

> The fact that it could move forward far enough before we terminate the
> process holding the slot is a race condition due to the fact that we released
> the mutex.
>
> If the above looks right to you then 818fefd8fd4 is doing what was "initially"
> expected, do you agree?
>

Based on the discussion and points presented in this thread, I don't
agree. I feel we should restore behavior prior to 818fefd8fd4 and fix
the test case which relies on different messages.

> If so, then maybe it's fine to keep 818fefd8fd4 as is?
>

I don't think so.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Add wal_fpi_bytes_[un]compressed to pg_stat_wal
Next
From: Daniel Gustafsson
Date:
Subject: Re: CI: Add task that runs pgindent