Re: Question about InvalidatePossiblyObsoleteSlot() - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: Question about InvalidatePossiblyObsoleteSlot() |
| Date | |
| Msg-id | CAD21AoB_C6V1PLNs=SuOejgGh1o6ZHJMstD7X4X1b_z==LdH1Q@mail.gmail.com Whole thread Raw |
| In response to | Re: Question about InvalidatePossiblyObsoleteSlot() (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
| Responses |
Re: Question about InvalidatePossiblyObsoleteSlot()
|
| List | pgsql-hackers |
On Thu, Oct 23, 2025 at 3:07 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Oct 22, 2025 at 02:18:33PM +0530, Amit Kapila wrote:
> > On Mon, Oct 20, 2025 at 11:41 AM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > Yeah so 818fefd8fd4 is well suited for tests consistency but in some rare cases
> > > it invalidates a slot while it would be safe not to do so.
> > >
> > > OTOH it looks to me that the initial pre-818fefd8fd4 intend was to invalidate
> > > the slot as per this comment:
> > >
> > > "
> > > /*
> > > * Re-acquire lock and start over; we expect to invalidate the
> > > * slot next time (unless another process acquires the slot in the
> > > * meantime).
> > > */
> > > "
> > >
> >
> > The comment doesn't indicate the intent that we will invalidate the
> > slot after re-acquiring the lock even when the new conditions don't
> > warrant the slot to be invalidated. The comment could be improved
> > though.
>
> Thanks for looking at it and clarifying this point. In the attached I try
> to improve the comment.
>
> > > The fact that it could move forward far enough before we terminate the
> > > process holding the slot is a race condition due to the fact that we released
> > > the mutex.
> > >
> > > If the above looks right to you then 818fefd8fd4 is doing what was "initially"
> > > expected, do you agree?
> > >
> >
> > Based on the discussion and points presented in this thread, I don't
> > agree. I feel we should restore behavior prior to 818fefd8fd4
>
> Done in the attached.
+1 to change the behavior to pre-818fefd8fd4. Here are some review
comments for the v2 patch:
- if (initial_restart_lsn != InvalidXLogRecPtr &&
- initial_restart_lsn < oldestLSN)
+ XLogRecPtr restart_lsn = s->data.restart_lsn;
+
+ if (restart_lsn != InvalidXLogRecPtr &&
+ restart_lsn < oldestLSN)
I would suggest using XLogRecPtrIsInvalid() for better readability.
---
/*
- * The slot's mutex will be released soon, and it is possible that
- * those values change since the process holding the slot has been
- * terminated (if any), so record them here to ensure that we
- * would report the correct invalidation cause.
- *
* Unlike other slot attributes, slot's inactive_since can't be
* changed until the acquired slot is released or the owning
* process is terminated. So, the inactive slot can only be
* invalidated immediately without being terminated.
*/
- if (!terminated)
- {
- initial_restart_lsn = s->data.restart_lsn;
- initial_effective_xmin = s->effective_xmin;
- initial_catalog_effective_xmin = s->effective_catalog_xmin;
- }
invalidation_cause = DetermineSlotInvalidationCause(possible_causes,
s, oldestLSN,
dboid,
After deleting the first paragraph of the comments, the comment starts
with "Unlike other slot attributes.." seems no longer match what this
block of codes do. It needs to be updated or moved to a more
appropriate place.
>
> > and fix
> > the test case which relies on different messages.
>
> I think that 105b2cb3361 fixed it already or do you have something else in
> mind?
IIUC the test instability issue was caused by the fact that
RUNNING_XACTS record is decoded and catalog_xmin gets advanced between
terminating the slot owner process and invalidating the slot. The
issue was fixed by 105b2cb3361 by stopping the writing of
RUNNING_XACTS by injection points. 818fefd8fd4 was meant to fix the
inconsistency that the reason of process termination could be
different from the reason of the subsequent slot invalidation.
Therefore, the test instability issue is already cleared, but I think
we might want to do something to deal with the inconsistency that we
originally wanted to address.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: