Re: USE_BARRIER_SMGRRELEASE on Linux? - Mailing list pgsql-hackers

From	Nathan Bossart
Subject	Re: USE_BARRIER_SMGRRELEASE on Linux?
Date	February 16, 2022 21:00:53
Msg-id	20220216210053.GA3031150@nathanxps13
In response to	Re: USE_BARRIER_SMGRRELEASE on Linux? (Andres Freund <andres@anarazel.de>)
Responses	Re: USE_BARRIER_SMGRRELEASE on Linux?
List	pgsql-hackers

On Wed, Feb 16, 2022 at 11:27:31AM -0800, Andres Freund wrote:
> Did you check whether this is a problem recently introduced or long-lived?

I've reproduced it back to v9.3.  I'm assuming it's much older than that.

> Does USE_BARRIER_SMGRRELEASE actually prevent this problem? Or was it just
> that it influences the timing in a way that makes it less likely?

I think it just influences the timing.  I believe the WAL pre-allocation
stuff triggers the issue because it adds a step between
AbsorbSyncRequests() and incrementing the started counter.
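To make the timing window concrete, here is a toy, single-threaded model in C.
It is not PostgreSQL code: ToyCheckpointer, toy_absorb(), and
toy_run_checkpoint() are hypothetical names, and the "request in window" flag
merely stands in for a backend queueing a request during whatever step sits
between the absorb and the counter increment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical, not a PostgreSQL API. */
typedef struct ToyCheckpointer
{
    int queued;        /* requests still sitting in the shared queue */
    int pending;       /* requests absorbed and awaiting processing */
    int ckpt_started;  /* stand-in for the "started" counter */
    int processed;     /* requests handled by completed checkpoints */
} ToyCheckpointer;

/* Move everything from the shared queue into the pending set. */
static void
toy_absorb(ToyCheckpointer *c)
{
    c->pending += c->queued;
    c->queued = 0;
}

/*
 * Run one checkpoint cycle.  If request_in_window is true, one request is
 * queued after the first absorb but before the counter is incremented.
 * If extra_absorb is true, a second absorb runs just before the increment.
 * Returns the number of requests that were queued before the counter
 * advanced but were not handled by this cycle.
 */
static int
toy_run_checkpoint(ToyCheckpointer *c, bool request_in_window,
                   bool extra_absorb)
{
    assert(c != NULL);

    toy_absorb(c);

    /* The extra step: anything queued here misses the first absorb. */
    if (request_in_window)
        c->queued++;

    if (extra_absorb)
        toy_absorb(c);

    c->ckpt_started++;

    /* Process only what has been absorbed so far. */
    c->processed += c->pending;
    c->pending = 0;

    return c->queued;
}
```

In this model, calling toy_run_checkpoint() with request_in_window set and
extra_absorb unset leaves one request behind even though the counter has
advanced, while setting extra_absorb leaves none.  A longer step between the
absorb and the increment only makes the window easier to hit.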

> ISTM that the problem is partially caused by having multiple "checkpoint"
> counters that are used in different ways, but then only waiting for one of
> them. I wonder whether a better approach could be to either unify the
> different counters, or to request / wait for the sync counter specifically?
> 
> Historically the sync stuff was something in md.c that the global system
> didn't really know anything about, but now it's a "proper" subsystem, so we
> can change the separation of concerns a bit more.

If the counters were unified, I think we might still need to do an extra
absorb after incrementing it, and we'd need to make sure that all of those
requests were tagged with the previous counter value so that they are
processed in the current checkpoint.  If callers requested with a specific
counter value, they might need to lock the counter (ckpt_lck) when making
requests.  Maybe that is okay.
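A rough sketch of the tagging idea, again as a toy single-threaded model in C
and not as a proposed patch.  TaggedState, tagged_request(), and
tagged_checkpoint() are hypothetical names; the comments mark where a real
implementation would presumably need to hold a lock such as ckpt_lck.

```c
#include <assert.h>
#include <stddef.h>

#define TAGGED_MAX_REQUESTS 16

/* Toy model only; every name here is hypothetical, not a PostgreSQL API. */
typedef struct TaggedRequest
{
    int id;     /* which request this is */
    int cycle;  /* counter value observed when the request was made */
} TaggedRequest;

typedef struct TaggedState
{
    int           counter;  /* the single, unified checkpoint counter */
    TaggedRequest queue[TAGGED_MAX_REQUESTS];
    size_t        nqueued;
} TaggedState;

/*
 * Requester side.  Reading the counter and enqueueing would have to be
 * atomic with respect to the increment, hence the lock in a real system.
 * Returns the tag; the caller would wait until counter > tag.
 */
static int
tagged_request(TaggedState *s, int id)
{
    assert(s->nqueued < TAGGED_MAX_REQUESTS);

    /* (lock would be taken here) */
    s->queue[s->nqueued].id = id;
    s->queue[s->nqueued].cycle = s->counter;
    s->nqueued++;
    /* (lock would be released here) */

    return s->counter;
}

/*
 * Checkpointer side.  Increment the counter, then do the extra absorb:
 * every request tagged with the previous value (or older) is processed in
 * this checkpoint; anything tagged with the new value stays queued.
 * Returns the number of requests processed.
 */
static int
tagged_checkpoint(TaggedState *s)
{
    int    prev = s->counter;
    int    processed = 0;
    size_t kept = 0;

    s->counter++;  /* (under the lock in a real system) */

    for (size_t i = 0; i < s->nqueued; i++)
    {
        if (s->queue[i].cycle <= prev)
            processed++;
        else
            s->queue[kept++] = s->queue[i];
    }
    s->nqueued = kept;

    return processed;
}
```

A caller would keep the value returned by tagged_request() and treat its
request as handled once the counter exceeds that value, which is the property
the extra absorb is meant to guarantee here.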

>> Here's a patch that adds a call to AbsorbSyncRequests() in
>> CheckpointerMain() instead of SyncPreCheckpoint().
> 
> That doesn't strike me as great architecturally. E.g. in theory the same
> problem could exist in single user mode. I think it doesn't today, because
> RegisterSyncRequest() will effectively "absorb" it immediately, but that kind
> of feels like an implementation detail?

Yeah, maybe that is a reason to add an absorb somewhere within
CreateCheckPoint() instead, like v1 [0] does.  Then the extra absorb would
be sufficient for single-user mode if the requests were not absorbed
immediately.

[0]
https://www.postgresql.org/message-id/attachment/130994/v1-0001-call-AbsorbSyncRequests-before-advancing-checkpoi.patch

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
