Re: sync_standbys_defined read/write race on startup - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: sync_standbys_defined read/write race on startup |
Date | |
Msg-id | Z_csIqEv6dhZ77jz@paquier.xyz Whole thread Raw |
In response to | Re: sync_standbys_defined read/write race on startup ("Maksim.Melnikov" <m.melnikov@postgrespro.ru>) |
Responses |
Re: sync_standbys_defined read/write race on startup
|
List | pgsql-hackers |
On Wed, Apr 09, 2025 at 06:21:50PM +0300, Maksim.Melnikov wrote: > Test 041_checkpoint_at_promote.pl is really good example > of using injection points, but anyway, I still don't see > how injection points can help us here. In failed test case > we started postgres, immediately open psql connection and commit > prepared transaction. Postgres, on start, before accepting > connections, fork checkpointer and checkpointer initialize > shared parameter sync_standbys_defined in CheckpointerMain, > before process loop. So, in most attempts, we can't call > injection_points_attach in psql before > macro INJECTION_POINT() code will be executed in > checkpointer(I mean INJECTION_POINT() code instead sleep()), > bacause checkpointer process is forking before database > system is ready to accept connections. Confirmed. One thing where it would be possible to make things work is to introduce some persistency of the injection points, say these are flushed at shutdown. We could do that without touching at the backend code and only in the module injection_points, but perhaps this thread is not enough to justify this extra complexity. > The manual execution of checkpoint can't help us too. > > P.S. > sorry for style errors, I am new in postgres, so can miss > some rules. Anyway, to avoid any community rules mistakes, > attached patch with correct naming. No worries, we are all here to learn, as am I. + sync_standbys_defined = WalSndCtl->sync_standbys_defined & SYNCSTANDBYSINITIALIZED + ? (WalSndCtl->sync_standbys_defined & SYNCSTANDBYSDEFINED) : SyncStandbysDefined(); So this is the core point of the patch: if the sync standby data has not been initialized, we try the only thing we can do with a check on the GUC value. Then: - If the GUC has no value, then we know for sure that there is no point to wait, so the patch skips the wait. - If the GUC has a value, it may be possible that we have to wait but we are not sure yet without letting the checkpointer process the GUC and update the information in shared memory, so we decide that it's better to wait and let the checkpointer process the setup until it wakes up the queue, relying on the fact that there should be no processes that wait as long as the GUC is not set. This is mentioned in your commit message but there is close to zero explanation about the reason of this choice, and even worse this fact is hidden to the reader of the code because of the overly-complex setup around sync_standbys_defined and the fact that there is no comment to document this fact. +/* + * WalSndCtlData->sync_standbys_defined bit flags. 7th bit check standbys_defined or not, + * 8th bit check WalSndCtlData->sync_standbys_defined field has been initialized. + */ +#define SYNCSTANDBYSINITIALIZED (1 << 0) +#define SYNCSTANDBYSDEFINED (1 << 1) + +#define SET_SYNCSTANDBYS_INITIALIZED(sync_standbys) \ + (sync_standbys | SYNCSTANDBYSINITIALIZED) +#define SET_SYNCSTANDBYS_DEFINED(sync_standbys) \ + (sync_standbys | SYNCSTANDBYSINITIALIZED | SYNCSTANDBYSDEFINED) +#define RESET_SYNCSTANDBYS_DEFINED(sync_standbys) \ + (SET_SYNCSTANDBYS_INITIALIZED(sync_standbys) & (~SYNCSTANDBYSDEFINED)) +#define IS_SYNCSTANDBYS_INITIALIZED_ONLY(sync_standbys) \ + ((sync_standbys & (SYNCSTANDBYSINITIALIZED | SYNCSTANDBYSDEFINED)) == SYNCSTANDBYSINITIALIZED) + The use of the flags is a good idea to protect the atomicity of the lookup in the fast path, but hiding that in a set of macros, with some of them being used in only one location reduces the readability of the code in walsender.c and syncrep.c because we would need to do a permanent mapping with what walsender_private.h uses. The reset one depends on the set one, which is also confusing. A couple of comments in walsender.c and syncrep.c also need a refresh around the manipulations of sync_standbys_defined. Also, the name "sync_standbys_defined" is not adapted anymore, as it stores a status of the data in shared memory associated to the sync standbys, so I've switched that to sync_standbys_status, with bits8 as type so as we have the same size as a boolean. I have taken a stab at all that, finishing with the attached. The patch has a pg_usleep() that I have used to stress the CI. That should be removed in the final result, obviously. Side note: I am pretty sure that this issue has been the origin of some spurious failures in the buildfarm. I cannot point my finger to a particular one now, but it would not go unnoticed especially on the slow animals.. What do you think? -- Michael
Attachment
pgsql-hackers by date: