Re: Missing LWLock protection in pgstat_reset_replslot() - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Missing LWLock protection in pgstat_reset_replslot()
Date
Msg-id Z1FSOj66GaJUTbVi@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Missing LWLock protection in pgstat_reset_replslot()  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
Hi,

On Thu, Dec 05, 2024 at 01:26:38PM +0900, Michael Paquier wrote:
> On Wed, Dec 04, 2024 at 03:20:03PM +0000, Bertrand Drouvot wrote:
> > I need to think more about it but it seems to me that those values make sense,
> > so maybe we should drop the entry for this particular case (shmem_exit()).
> 
> The values reported in the central hash table make sense to me.

Yeah.

> What
> does not is that we could hold in the local cache of the process doing
> the peeks references to stats from a different slot when doing drops
> and creates that would reuse the same stats key.
> 
> I have added in the backends some logs to get into the details of the
> sequence with this specific test regarding the generation markup and
> the refcount (feel free to use the 0002 attached, grep for "key
> 4/0/0" in your logs by replaying the test), and here some details
> based on the two permutations in the test sent:

Agree.

I had created about the same 0002 and was seeing the exact same behavior that
lead to my previous message. BTW, that's a good idea to share your 0002, I'll keep
in mind to do the same in the future (for ease of discussion).

> The assertion is right to complain here at shutdown when writing the
> stats: we shouldn't try to drop this entry.

Yes.

> I've spent a few hours
> checking the whole, and we are doing the right things with
> gc_request_count, pgStatSharedRefAge and the local cache refresh, as
> I've thought first that we were not aggressive with this part of the
> cache resets.  The problem, as far as I can see, is that
> pgstat_gc_entry_refs() did not get the call that it needs to think
> about refreshing its local references when an entry updates its
> generation number; it only checks if the entry has been dropped, the
> generation check is equally important after a reinit because the local
> reference is also outdated.

I had the same diagnostic but did not reach the "what should be done" yet.

I agree with the proposal and with the fact that 818119afcc did miss to use
the "generation" in pgstat_gc_entry_refs() to also discard reinitialized entries:

-               if (!entry_ref->shared_entry->dropped)
+               /*
+                * "generation" checks for the case of entries being reinitialized, and
+                * "dropped" for the case where these are..  dropped.
+                */
+               if (!entry_ref->shared_entry->dropped &&
+                       pg_atomic_read_u32(&entry_ref->shared_entry->generation) ==
+                       entry_ref->generation)

Yeah that seems the right thing to do for me too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: generic plans and "initial" pruning
Next
From: Yugo NAGATA
Date:
Subject: Re: Doc: clarify the log message level of the VERBOSE option