Re: SyncRepGetSyncStandbysPriority() vs. SIGHUP - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: SyncRepGetSyncStandbysPriority() vs. SIGHUP
Date
Msg-id 20200207.125251.146972241588695685.horikyota.ntt@gmail.com
Whole thread Raw
In response to SyncRepGetSyncStandbysPriority() vs. SIGHUP  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
At Wed, 5 Feb 2020 23:45:52 -0800, Noah Misch <noah@leadboat.com> wrote in 
> Buildfarm runs have triggered the assertion at the end of
> SyncRepGetSyncStandbysPriority():
..
> On my development system, this delay injection reproduces the failure:
> 
> --- a/src/backend/replication/syncrep.c
> +++ b/src/backend/replication/syncrep.c
> @@ -399,6 +399,8 @@ SyncRepInitConfig(void)
>  {
>     int         priority;
>  
> +   pg_usleep(100 * 1000);

Though I couldn't see that, but actually that can happen.

That happens if:

  all potentially-sync standbys have lower priority than the number of
  syncrep list members.

  the number of sync standbys is short.

  the number of the potentially-sync standbys is enough.

If all of the above are true, the while (priority <-= lowest_priority)
doesn't loops and goes into the assertion.

> SyncRepInitConfig() is the function responsible for updating, after SIGHUP,
> the sync_standby_priority values that SyncRepGetSyncStandbysPriority()
> consults.  The assertion holds if each walsender's sync_standby_priority (in
> shared memory) accounts for the latest synchronous_standby_names GUC value.
> That ceases to hold for brief moments after a SIGHUP that changes the
> synchronous_standby_names GUC value.

Agreed.

> I think the way to fix this is to nominate one process to update all
> sync_standby_priority values after SIGHUP.  That process should acquire
> SyncRepLock once per ProcessConfigFile(), not once per walsender.  If
> walsender startup occurs at roughly the same time as a SIGHUP, the new
> walsender should avoid computing sync_standby_priority based on a GUC value
> different from the one used for the older walsenders.

If we update the priority of all walsenders at once, the other
walsenders may calculate required LSN using the old configuration with
the new priority. I'm not sure the probability of happening this, but
that causes similar situation.

The priority calcuation happens for every standby repliy. So if
there're some standbys with wrong priority, They will catch up at
receiving the next standby reply. Thus just asuuming walsenders with
out-of-range priority as non-sync standbys can "fix" it, I believe.
pg_stat_get_wal_senders() reveals such inconsistent state but I don't
think it's not worth addressing.

> Would anyone like to fix this?  I could add it to my queue, but it would wait
> a year or more.

The attached does that. Isn't it enough?

# The more significant problem is I haven't suceeded to replay the
# problem..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
From ac080100b692700688054630d77190d63027798c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 7 Feb 2020 12:40:56 +0900
Subject: [PATCH] Fix handling of tentative state after chainging
 synchronous_standby_names

When reloading config after updating synchronous_standby_names,
walsenders don't recalculate their sync-priority exactly at the same
time. SyncRepGetSyncStandbysPriority may fail under such inconsistent
situation.  Fix that by assuming a walsender as non-sync if it has a
priority number that is impossible against the
synchronous_standby_names.
---
 src/backend/replication/syncrep.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index c284103b54..2689421bcd 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -842,6 +842,16 @@ SyncRepGetSyncStandbysPriority(bool *am_sync)
         if (this_priority == 0)
             continue;
 
+        /*
+         * Walsenders don't reload configuration at the same time. A walsender
+         * may have out-of-range priority before updating by the reload. Treat
+         * them the same way as non-syncronous standbys.  Priority is going to
+         * be fixed soon and we can work with consistent priorities at the next
+         * standby-reply message.
+         */
+        if (this_priority > lowest_priority)
+            continue;
+        
         /* Must have a valid flush position */
         if (XLogRecPtrIsInvalid(flush))
             continue;
-- 
2.18.2


pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Expose lock group leader pid in pg_stat_activity
Next
From: Amit Kapila
Date:
Subject: Re: typos in comments and user docs