Re: Question about InvalidatePossiblyObsoleteSlot() - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Question about InvalidatePossiblyObsoleteSlot()
Date
Msg-id aQCFm2LeKOu1lRs0@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Question about InvalidatePossiblyObsoleteSlot()  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Question about InvalidatePossiblyObsoleteSlot()
List pgsql-hackers
Hi,

On Mon, Oct 27, 2025 at 10:22:32AM -0700, Masahiko Sawada wrote:
> On Thu, Oct 23, 2025 at 3:07 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Done in the attached.
> 
> +1 to change the behavior to pre-818fefd8fd4.

Thanks for looking at it!

> Here are some review
> comments for the v2 patch:
> 
> -       if (initial_restart_lsn != InvalidXLogRecPtr &&
> -           initial_restart_lsn < oldestLSN)
> +       XLogRecPtr  restart_lsn = s->data.restart_lsn;
> +
> +       if (restart_lsn != InvalidXLogRecPtr &&
> +           restart_lsn < oldestLSN)
> 
> I would suggest using XLogRecPtrIsInvalid() for better readability.

Yeah and I think that should be done consistently, see the proposal in [1].

> ---
>             /*
> -            * The slot's mutex will be released soon, and it is possible that
> -            * those values change since the process holding the slot has been
> -            * terminated (if any), so record them here to ensure that we
> -            * would report the correct invalidation cause.
> -            *
>              * Unlike other slot attributes, slot's inactive_since can't be
>              * changed until the acquired slot is released or the owning
>              * process is terminated. So, the inactive slot can only be
>              * invalidated immediately without being terminated.
>              */
> -           if (!terminated)
> -           {
> -               initial_restart_lsn = s->data.restart_lsn;
> -               initial_effective_xmin = s->effective_xmin;
> -               initial_catalog_effective_xmin = s->effective_catalog_xmin;
> -           }
> 
>             invalidation_cause = DetermineSlotInvalidationCause(possible_causes,
>                                                                 s, oldestLSN,
>                                                                 dboid,
> 
> After deleting the first paragraph of the comments, the comment starts
> with "Unlike other slot attributes.."

Yeah, it has been added in ac0e33136abc.

> seems no longer match what this
> block of codes do.

Agree.

> It needs to be updated or moved to a more
> appropriate place.

What about moving it after?

"
 * If the slot can be acquired, do so and mark it invalidated
 * immediately.  Otherwise we'll signal the owning process, below, and
 * retry."

That looks like a good place to me.

> 
> >
> > > and fix
> > > the test case which relies on different messages.
> >
> > I think that 105b2cb3361 fixed it already or do you have something else in
> > mind?
> 
> IIUC the test instability issue was caused by the fact that
> RUNNING_XACTS record is decoded and catalog_xmin gets advanced between
> terminating the slot owner process and invalidating the slot. The
> issue was fixed by 105b2cb3361 by stopping the writing of
> RUNNING_XACTS by injection points.

Right.

> 818fefd8fd4 was meant to fix the
> inconsistency that the reason of process termination could be
> different from the reason of the subsequent slot invalidation.

Right.

> Therefore, the test instability issue is already cleared,

Yes.

> but I think
> we might want to do something to deal with the inconsistency that we
> originally wanted to address.

I see, you mean that the tests are stable now (thanks to 105b2cb3361) but
that we should still do something for "production" cases? (i.e not making use
of injection points).

[1]: https://www.postgresql.org/message-id/flat/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Kirill Reshke
Date:
Subject: Re: on_error table, saving error info to a table
Next
From: Peter Eisentraut
Date:
Subject: Re: [PATCH] Check that index can return in get_actual_variable_range()