Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
Msg-id: 20220301215848.e5dy3wmnfyxx2n27@alap3.anarazel.de
In response to: Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
List: pgsql-hackers
Hi,

On 2022-03-02 06:46:23 +1300, Thomas Munro wrote:
> > I do think it's worth giving that sleep a proper wait event though,
> > even in the back branches.
>
> I'm thinking that 0002 should be back-patched all the way, but 0001
> could be limited to 14.

No strong opinion on how far back to backpatch. It'd be nice to have a
proper wait event everywhere, but especially < 12 it looks different
enough to be some effort.

> From a9344bb2fb2a363bec4be526f87560cb212ca10b Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@gmail.com>
> Date: Mon, 28 Feb 2022 11:27:05 +1300
> Subject: [PATCH v2 1/3] Wake up for latches in CheckpointWriteDelay().

LGTM. Would be nice to have this fixed soon, even if it's just to reduce
test times :)

> From 1eb0266fed7ccb63a2430e4fbbaef2300f2aa0d0 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@gmail.com>
> Date: Tue, 1 Mar 2022 11:38:27 +1300
> Subject: [PATCH v2 2/3] Fix waiting in RegisterSyncRequest().

LGTM.

> From 50060e5a0ed66762680ddee9e30acbad905c6e98 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@gmail.com>
> Date: Tue, 1 Mar 2022 17:34:43 +1300
> Subject: [PATCH v2 3/3] Use condition variable to wait when sync request queue
>  is full.
>
> Previously, in the (hopefully) rare case that we need to wait for the
> checkpointer to create space in the sync request queue, we'd enter a
> sleep/retry loop. Instead, create a condition variable so the
> checkpointer can wake us up whenever there is a transition from 'full'
> to 'not full'.

> @@ -1076,10 +1078,11 @@ RequestCheckpoint(int flags)
>   * to perform its own fsync, which is far more expensive in practice.  It
>   * is theoretically possible a backend fsync might still be necessary, if
>   * the queue is full and contains no duplicate entries.  In that case, we
> - * let the backend know by returning false.
> + * let the backend know by returning false, or if 'wait' is true, then we
> + * wait for space to become available.
>   */
>  bool
> -ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
> +ForwardSyncRequest(const FileTag *ftag, SyncRequestType type, bool wait)
>  {
>  	CheckpointerRequest *request;
>  	bool		too_full;

> @@ -1101,9 +1104,9 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
>  	 * backend will have to perform its own fsync request.  But before forcing
>  	 * that to happen, we can try to compact the request queue.
>  	 */
> -	if (CheckpointerShmem->checkpointer_pid == 0 ||
> -		(CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
> -		 !CompactCheckpointerRequestQueue()))
> +	if (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
> +		!CompactCheckpointerRequestQueue() &&
> +		!wait)

Bit confused about the addition of the wait parameter / removal of the
CheckpointerShmem->checkpointer_pid check. What's that about?

> +	/*
> +	 * If we still don't have enough space and the caller asked us to wait,
> +	 * wait for the checkpointer to advertise that there is space.
> +	 */
> +	if (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests)
> +	{
> +		ConditionVariablePrepareToSleep(&CheckpointerShmem->requests_not_full_cv);
> +		while (CheckpointerShmem->num_requests >=
> +			   CheckpointerShmem->max_requests)
> +		{
> +			LWLockRelease(CheckpointerCommLock);
> +			ConditionVariableSleep(&CheckpointerShmem->requests_not_full_cv,
> +								   WAIT_EVENT_FORWARD_SYNC_REQUEST);
> +			LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
> +		}
> +		ConditionVariableCancelSleep();
> +	}

Could there be a problem with a lot of backends trying to acquire
CheckpointerCommLock in exclusive mode? I'm inclined to think it's rare
enough to not worry.

Greetings,

Andres Freund