Re: sync_standbys_defined read/write race on startup - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: sync_standbys_defined read/write race on startup
Date
Msg-id Z_csIqEv6dhZ77jz@paquier.xyz
Whole thread Raw
In response to Re: sync_standbys_defined read/write race on startup  ("Maksim.Melnikov" <m.melnikov@postgrespro.ru>)
Responses Re: sync_standbys_defined read/write race on startup
List pgsql-hackers
On Wed, Apr 09, 2025 at 06:21:50PM +0300, Maksim.Melnikov wrote:
> Test 041_checkpoint_at_promote.pl is really good example
> of using injection points, but anyway, I still don't see
> how injection points can help us here. In failed test case
> we started postgres, immediately open psql connection and commit
> prepared transaction. Postgres, on start, before accepting
> connections, fork checkpointer and checkpointer initialize
> shared parameter sync_standbys_defined in CheckpointerMain,
> before process loop. So, in most attempts, we can't call
> injection_points_attach in psql before
> macro INJECTION_POINT() code will be executed in
> checkpointer(I mean INJECTION_POINT() code instead sleep()),
> bacause checkpointer process is forking before database
> system is ready to accept connections.

Confirmed.  One thing where it would be possible to make things work
is to introduce some persistency of the injection points, say these
are flushed at shutdown.  We could do that without touching at the
backend code and only in the module injection_points, but perhaps this
thread is not enough to justify this extra complexity.

> The manual execution of checkpoint can't help us too.
>
> P.S.
> sorry for style errors, I am new in postgres, so can miss
> some rules. Anyway, to avoid any community rules mistakes,
> attached patch with correct naming.

No worries, we are all here to learn, as am I.

+    sync_standbys_defined = WalSndCtl->sync_standbys_defined & SYNCSTANDBYSINITIALIZED
+        ? (WalSndCtl->sync_standbys_defined & SYNCSTANDBYSDEFINED) : SyncStandbysDefined();

So this is the core point of the patch: if the sync standby data has
not been initialized, we try the only thing we can do with a check on
the GUC value.  Then:
- If the GUC has no value, then we know for sure that there is no
point to wait, so the patch skips the wait.
- If the GUC has a value, it may be possible that we have to wait but
we are not sure yet without letting the checkpointer process the GUC
and update the information in shared memory, so we decide that it's
better to wait and let the checkpointer process the setup until it
wakes up the queue, relying on the fact that there should be no
processes that wait as long as the GUC is not set.

This is mentioned in your commit message but there is close to zero
explanation about the reason of this choice, and even worse this fact
is hidden to the reader of the code because of the overly-complex
setup around sync_standbys_defined and the fact that there is no
comment to document this fact.

+/*
+ * WalSndCtlData->sync_standbys_defined bit flags. 7th bit check standbys_defined or not,
+ * 8th bit check WalSndCtlData->sync_standbys_defined field has been initialized.
+ */
+#define SYNCSTANDBYSINITIALIZED    (1 << 0)
+#define SYNCSTANDBYSDEFINED        (1 << 1)
+
+#define SET_SYNCSTANDBYS_INITIALIZED(sync_standbys)    \
+    (sync_standbys | SYNCSTANDBYSINITIALIZED)
+#define SET_SYNCSTANDBYS_DEFINED(sync_standbys)    \
+    (sync_standbys | SYNCSTANDBYSINITIALIZED | SYNCSTANDBYSDEFINED)
+#define RESET_SYNCSTANDBYS_DEFINED(sync_standbys)    \
+    (SET_SYNCSTANDBYS_INITIALIZED(sync_standbys) & (~SYNCSTANDBYSDEFINED))
+#define IS_SYNCSTANDBYS_INITIALIZED_ONLY(sync_standbys)    \
+    ((sync_standbys & (SYNCSTANDBYSINITIALIZED | SYNCSTANDBYSDEFINED)) == SYNCSTANDBYSINITIALIZED)
+

The use of the flags is a good idea to protect the atomicity of the
lookup in the fast path, but hiding that in a set of macros, with some
of them being used in only one location reduces the readability of the
code in walsender.c and syncrep.c because we would need to do a
permanent mapping with what walsender_private.h uses.  The reset one
depends on the set one, which is also confusing.

A couple of comments in walsender.c and syncrep.c also need a refresh
around the manipulations of sync_standbys_defined.

Also, the name "sync_standbys_defined" is not adapted anymore, as it
stores a status of the data in shared memory associated to the sync
standbys, so I've switched that to sync_standbys_status, with bits8 as
type so as we have the same size as a boolean.

I have taken a stab at all that, finishing with the attached.  The
patch has a pg_usleep() that I have used to stress the CI.  That
should be removed in the final result, obviously.

Side note: I am pretty sure that this issue has been the origin of
some spurious failures in the buildfarm.  I cannot point my finger to
a particular one now, but it would not go unnoticed especially on the
slow animals..

What do you think?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: n_ins_since_vacuum stats for aborted transactions
Next
From: jian he
Date:
Subject: Re: speedup COPY TO for partitioned table.