Re: allow online change primary_conninfo - Mailing list pgsql-hackers

From Sergei Kornilov
Subject Re: allow online change primary_conninfo
Date
Msg-id 19414261544518748@iva1-a2ffb02749cf.qloud-c.yandex.net
Whole thread Raw
In response to Re: allow online change primary_conninfo  (Michael Paquier <michael@paquier.xyz>)
Responses Re: allow online change primary_conninfo  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hello

thank you for review!

> but I think that this patch should document clearly that if
> primary_conninfo or primary_slot_name are changed then the WAL receiver
> is stopped immediately.
Good idea, will change.

>     /*
> - * replication slot name; is also used for walreceiver to connect with the
> - * primary
> + * replication slot name used by runned walreceiver
>      */
> Not sure that there is much point in updating those comments, because
> it still has the same meaning in the new context.
"is also used" seems outdated for me. With proposed patch this field means only currently active slot, not "also".
Well, depends on RequestXLogStreaming discussion

> -RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
> - const char *slotname)
> +RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr)
> That's perhaps a matter of taste, but keeping the current API as-is
> looks cleaner to me, particularly because those fields get copied into
> WalRcv, so I'd rather not change what is done in
> WaitForWALToBecomeAvailable and keep the patch at its simplest shape.
Unfortunally here is race condition if i leave RequestXLogStreaming infrastructure as-is and walreceiver will use this
structfor startup.
 
This case:
- startup process calls RequestXLogStreaming and fill WalRcv conninfo/slotname
- reload primary_conninnfo by SIGHUP
- walreceiver start with new GUC but trying conninfo/slotname from WalRcv struct. If walreceiver is able to connect -
itwill not restart till another error or primary_conninfo/slotname will be changed again (SIGHUP without change is not
enough).

I already had such failures while testing this patch.

If walreceiver will use primary_conninfo GUC at startup - i see no reason to leave *conninfo in RequestXLogStreaming.
Theseparameters will be misleading, we request them, but not using for anything.
 

> +$node_standby_2->reload; # should have effect without restart
> This does not wait for the change to be effective, so I think that you
> introduce a race condition for slow machines with this test, making it
> unstable.
No, here is no race condition because just after this line we wait this replication slot in upstream
pg_catalog.pg_replication_slots(get_slot_xmins routine).
 
Before reload we have no replication slot and should reconnect here with replication slot. I can add some comment
here.

regards, Sergei


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] Can ICU be used for a database's default sort order?
Next
From: Dmitry Dolgov
Date:
Subject: Re: Pluggable Storage - Andres's take