Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
Date
Msg-id CA+TgmoZVmJX1+QTWw2tSnPHrnkwKZxC3ZsRynFB-Fpzm1Oxuhw@mail.gmail.com
Whole thread Raw
In response to Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name  (Andres Freund <andres@anarazel.de>)
Responses Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name  (Andres Freund <andres@anarazel.de>)
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Thu, Jan 10, 2019 at 7:42 PM Andres Freund <andres@anarazel.de> wrote:
> I still think this whole direction of accessing the GUC in walreceiver
> is a bad idea and shouldn't be pursued further.  There's definite
> potential for startup process and WAL receiver having different states
> of GUCs, the code doesn't get meaningfully simpler, the GUC value checks
> in walreceiver make for horrible reporting up the chain.

I'm trying to assess the validity of this complaint.  It seems to me
that it might advance the discussion to be more clear about what the
problem is here.  When pg_ctl reload is executed (or the equivalent),
different backends reload the file at different times.  Currently,
because the values are explicitly passed down from the startup process
to the walreceiver, it's guaranteed that they will both use the same
value.  With this change, one might see an older value and the other
the current value.  So there's room to be nervous about code like
this:

                    if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0)
                    {
... assorted other code ...
                        RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
                                             PrimarySlotName);
                        receivedUpto = 0;
                    }

With the patch, the PrimaryConnInfo and PrimarySlotName arguments are
removed from RequestXLogStreaming.  That means that the new
walreceiver could come to a different conclusion about the values of
those arguments than the startup process.  In particular, it could end
up thinking that primary_conninfo is empty when, if the startup
process had thought that, the walreceiver would never have been
launched in the first place.  But it's not obvious that you've added
any logic in WALReceiverMain or elsewhere to compensate for this
possibility -- what would happen in that case?  Would we crash?
Connect to the wrong server?

I might be wrong here, but I'm inclined to think that this scenario
hasn't really been contemplated carefully by the patch authors.  There
are no added TAP tests for the scenario where the values differ
between the two processes, no code comments which explain why it's OK
if that happens, really no mention of it in the patch at all.  And on
that basis I'm inclined to think that Andres is really quite correct
to be worried about this.  The problem he's talking about here is very
low-probability because the race condition is narrow, but it's real,
and it surely needs to be handled somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: Ryu floating point output patch
Next
From: Robert Haas
Date:
Subject: Re: Ryu floating point output patch