Re: Fix race condition in InvalidatePossiblyObsoleteSlot() - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Fix race condition in InvalidatePossiblyObsoleteSlot()
Date
Msg-id ZdMHi/yKAd+fJBjd@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Fix race condition in InvalidatePossiblyObsoleteSlot()  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

On Mon, Feb 19, 2024 at 03:14:07PM +0900, Michael Paquier wrote:
> On Thu, Feb 15, 2024 at 11:24:59AM +0000, Bertrand Drouvot wrote:
> > Thanks for looking at it!
> > 
> > Agree, using an assertion instead in v3 attached.
> 
> And you did not forget the PG_USED_FOR_ASSERTS_ONLY.

Thanks to the "CompilerWarnings" cirrus test ;-)

> 
> > > It seems important to document why we are saving this data here; while
> > > we hold the LWLock for repslots, before we perform any termination,
> > > and we  want to protect conflict reports with incorrect data if the
> > > slot data got changed while the lwlock is temporarily released while
> > > there's a termination.
> > 
> > Yeah, comments added in v3.
> 
> The contents look rather OK, I may do some word-smithing for both.

Thanks!

> >> For example just after the kill()?  It seems to me
> >> that we should stuck the checkpointer, perform a forced upgrade of the
> >> xmins, release the checkpointer and see the effects of the change in
> >> the second loop.  Now, modules/injection_points/ does not allow that,
> >> yet, but I've posted a patch among these lines that can stop a process
> >> and release it using a condition variable (should be 0006 on [1]).  I
> >> was planning to start a new thread with a patch posted for the next CF
> >> to add this kind of facility with a regression test for an old bug,
> >> the patch needs a few hours of love, first.  I should be able to post
> >> that next week.
> > 
> > Great, that looks like a good idea!
> 
> I've been finally able to spend some time on what I had in mind and
> posted it here for the next CF:
> https://www.postgresql.org/message-id/ZdLuxBk5hGpol91B@paquier.xyz
> 
> You should be able to reuse that the same way I do in 0002 posted on
> the thread, where I'm causing the checkpointer to wait, then wake it
> up.

Thanks! I'll look at it.

> > Are we going to fix this on master and 16 stable first and then later on add a
> > test on master with the injection points?
> 
> Still, the other patch is likely going to take a couple of weeks
> before getting into the tree, so I have no objection to fix the bug
> first and backpatch, then introduce a test.  Things have proved that
> failures could show up in the buildfarm in this area, so more time
> running this code across two branches is not a bad concept, either.

Fully agree.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Nazir Bilal Yavuz
Date:
Subject: Re: Show WAL write and fsync stats in pg_stat_io
Next
From: Michael Paquier
Date:
Subject: Re: Injection points: some tools to wait and wake