Thread: Making WAL receiver startup rely on GUC context for primary_conninfoand primary_slot_name
Making WAL receiver startup rely on GUC context for primary_conninfoand primary_slot_name
From
Michael Paquier
Date:
Hi all, Since 2dedf4d, recovery.conf is dead and all recovery parameters are now GUCs. While looking at a patch to switch primary_conninfo and primary_slot_name to be reloadable, Sergei Kornilov had a very good point that the WAL receiver uses a connection string and a physical slot name based on what the startup process wants the WAL receiver to use: https://www.postgresql.org/message-id/20181212043208.GI17695@paquier.xyz It seems to me that doing so is now strange as the WAL receiver knows about the GUC context, and actually it knows the parameters it should use, so there is no point to pass down the values when requesting the WAL receiver to start. What do you think about the attached to simplify the logic? Even if primary_conninfo and primary_slot_name are not switched to SIGHUP this cleanup looks like a good thing to me. Thoughts? -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
From
Andres Freund
Date:
On December 11, 2018 9:30:42 PM PST, Michael Paquier <michael@paquier.xyz> wrote: >Hi all, > >Since 2dedf4d, recovery.conf is dead and all recovery parameters are >now >GUCs. While looking at a patch to switch primary_conninfo and >primary_slot_name to be reloadable, Sergei Kornilov had a very good >point that the WAL receiver uses a connection string and a physical >slot >name based on what the startup process wants the WAL receiver to use: >https://www.postgresql.org/message-id/20181212043208.GI17695@paquier.xyz > >It seems to me that doing so is now strange as the WAL receiver knows >about the GUC context, and actually it knows the parameters it should >use, so there is no point to pass down the values when requesting the >WAL receiver to start. > >What do you think about the attached to simplify the logic? Even if >primary_conninfo and primary_slot_name are not switched to SIGHUP this >cleanup looks like a good thing to me. I am not convinced this is a good idea. This allows the state of walrcv and startup to diverge, they could e.g. have differentconfiguration, due to differently time config reloads. And they need to communicate via shmem anyway, so there'snot a lot of complexity avoided. And I think it's good to be able to show the active connection via functions, ratherthan the one currently in pg.conf, which might be different. Andres Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Tue, Dec 11, 2018 at 09:34:58PM -0800, Andres Freund wrote: > I am not convinced this is a good idea. This allows the state of > walrcv and startup to diverge, they could e.g. have different > configuration, due to differently time config reloads. And they need > to communicate via shmem anyway, so there's not a lot of complexity > avoided. And I think it's good to be able to show the active > connection via functions, rather than the one currently in pg.conf, > which might be different. Well, the conninfo and slot name accessible to the user are the values available only once the information of the WAL receiver in shared memory is ready to be displayed. Relying more on the GUC context has the advantage to never have in shared memory the original string and only store the clobbered one, which actually simplifies what's stored in shared memory because you can entirely remove ready_to_display (I forgot to remove that in the patch actually). If those parameters become reloadable, you actually rely only on what the param context has, and not on what the shared memory context may have or not. Removing entirely ready_to_display is quite appealing actually.. -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Simon Riggs
Date:
On Wed, 12 Dec 2018 at 05:35, Andres Freund <andres@anarazel.de> wrote:
--
>What do you think about the attached to simplify the logic? Even if
>primary_conninfo and primary_slot_name are not switched to SIGHUP this
>cleanup looks like a good thing to me.
I am not convinced this is a good idea. This allows the state of walrcv and startup to diverge, they could e.g. have different configuration, due to differently time config reloads.
That sounds bad, but most parameters apply to one or the other, not both.
If there are some that apply to both, then yes, coordination would be important.
It does seem likely that the new scheme will require us to look carefully at when parameters are reloaded, since the timing of reloads was never taken into account in the previous coding.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
From
Sergei Kornilov
Date:
Hello > This allows the state of walrcv and startup to diverge, they could e.g. have different configuration, due to differentlytime config reloads. So we have exactly same problem with, for example, hot_standby_feedback, right? We change hot_standby_feedback value, reload it and we can have 'show hot_standby_feedback' different to currently runningwalreceiver. But why we have nothing about hot_standby_feedback in pg_stat_get_wal_receiver()? Where is difference? regards, Sergei
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Wed, Dec 12, 2018 at 06:55:04PM +0300, Sergei Kornilov wrote: >> This allows the state of walrcv and startup to diverge, they could >> e.g. have different configuration, due to differently time config >> reloads. > > So we have exactly same problem with, for example, hot_standby_feedback, right? > We change hot_standby_feedback value, reload it and we can have 'show > hot_standby_feedback' different to currently running walreceiver. But > why we have nothing about hot_standby_feedback in pg_stat_get_wal_receiver()? > Where is difference? The difference is in the fact that a WAL receiver uses the connection string and the slot name when establishing the connection when using START_STREAMING through the replication protocol, and hot_standby_feedback gets used depending on the context of the messages exchanged. -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Wed, Dec 12, 2018 at 02:55:17PM +0900, Michael Paquier wrote: > Well, the conninfo and slot name accessible to the user are the values > available only once the information of the WAL receiver in shared memory > is ready to be displayed. Relying more on the GUC context has the > advantage to never have in shared memory the original string and only > store the clobbered one, which actually simplifies what's stored in > shared memory because you can entirely remove ready_to_display (I forgot > to remove that in the patch actually). If those parameters become > reloadable, you actually rely only on what the param context has, and > not on what the shared memory context may have or not. > > Removing entirely ready_to_display is quite appealing actually.. I have been looking at that stuff this morning, and indeed ready_for_display can be entirely removed. I think that's quite an advantage as WalRcv->conninfo only stores the connection string cloberred to hide security-sensitive fields and it never has to save the initial state which could have sensitive data, simplifying how the WAL receiver data is filled. I am adding that to the next commit fest. -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
From
Sergei Kornilov
Date:
Hi Not sure i can be reviewer since this was initially my proposal. I vote +1 for this patch. Code looks good and i think we have no reason to leave RequestXLogStreaming as-is. regards, Sergei
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Fri, Dec 14, 2018 at 12:23:09PM +0300, Sergei Kornilov wrote: > Not sure i can be reviewer since this was initially my proposal. No issues from me if you wish to review the code. In my opinion, it is not a problem if you are a reviewer as I wrote the code. > I vote +1 for this patch. Code looks good and i think we have no > reason to leave RequestXLogStreaming as-is. Thanks for the input. The take on my side is to be able to remove ready_to_display from WalRcv so let's see what others think. -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Donald Dong
Date:
Hi Michael, Thank you for the information! > On Dec 11, 2018, at 9:55 PM, Michael Paquier <michael@paquier.xyz> wrote: > > Well, the conninfo and slot name accessible to the user are the values > available only once the information of the WAL receiver in shared memory > is ready to be displayed. Relying more on the GUC context has the > advantage to never have in shared memory the original string and only > store the clobbered one, which actually simplifies what's stored in > shared memory because you can entirely remove ready_to_display (I forgot > to remove that in the patch actually). If those parameters become > reloadable, you actually rely only on what the param context has, and > not on what the shared memory context may have or not. I wonder why do we need to have this addition? src/backend/replication/walreceiver.c + memset(walrcv->slotname, 0, NAMEDATALEN); + if (PrimarySlotName) + strlcpy((char *) walrcv->slotname, PrimarySlotName, NAMEDATALEN); src/backend/replication/walreceiver.c 288: PrimaryConnInfo == NULL || PrimaryConnInfo[0] == '\0' 291: errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined"))); 392: PrimarySlotName && PrimarySlotName[0] != '\0' src/backend/access/transam/xlog.c 11824: * If primary_conninfo is set, launch walreceiver to try 11833: PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0 I notice these patterns appear in different places. Maybe it will be better to have a common method such as pg_str_empty()? Regards, Donald Dong
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Wed, Jan 09, 2019 at 06:00:23AM -0800, Donald Dong wrote: > I wonder why do we need to have this addition? > > src/backend/replication/walreceiver.c > + memset(walrcv->slotname, 0, NAMEDATALEN); > + if (PrimarySlotName) > + strlcpy((char *) walrcv->slotname, PrimarySlotName, NAMEDATALEN); > That's much cleaner to me to clean up the field for safety before starting the process. When requesting a WAL receiver to start, slotname and conninfo gets zeroed before doing anything, you are right that we could do without it actually. > I notice these patterns appear in different places. Maybe it will be > better to have a common method such as pg_str_empty()? That's a pattern used quite a lot for many GUCs, so that's quite separate from this patch. Perhaps it would make sense to have a macro for that purpose for GUCs, I am not sure if that's worth it though. -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Donald Dong
Date:
On Jan 9, 2019, at 4:47 PM, Michael Paquier <michael@paquier.xyz> wrote: > That's much cleaner to me to clean up the field for safety before > starting the process. When requesting a WAL receiver to start, > slotname and conninfo gets zeroed before doing anything, you are right > that we could do without it actually. Cool! I agree that cleaning up the field would make the logic cleaner. > That's a pattern used quite a lot for many GUCs, so that's quite > separate from this patch. Perhaps it would make sense to have a macro > for that purpose for GUCs, I am not sure if that's worth it though. Yeah. I think having such macro would make the code a bit cleaner. Tho the duplication of logic is not too significant.
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Wed, Jan 09, 2019 at 05:38:53PM -0800, Donald Dong wrote: > On Jan 9, 2019, at 4:47 PM, Michael Paquier <michael@paquier.xyz> wrote: >> That's much cleaner to me to clean up the field for safety before >> starting the process. When requesting a WAL receiver to start, >> slotname and conninfo gets zeroed before doing anything, you are right >> that we could do without it actually. > > Cool! I agree that cleaning up the field would make the logic cleaner. I was just double-checking the code, and it is possible to remove the part from RequestXLogStreaming() which sets slotname and conninfo to '\0', so I removed that part from my local branch to simplify the code. -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
From
Sergei Kornilov
Date:
Hello > I was just double-checking the code, and it is possible to remove the > part from RequestXLogStreaming() which sets slotname and conninfo to > '\0', so I removed that part from my local branch to simplify the > code. Without both ready_to_display and cleaning in RequestXLogStreaming we do not show outdated PrimaryConnInfo in case walreceiverjust started, but does not connected yet? While waiting walrcv_connect for example. regards, Sergei
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Thu, Jan 10, 2019 at 01:10:16PM +0300, Sergei Kornilov wrote: > Without both ready_to_display and cleaning in RequestXLogStreaming > we do not show outdated PrimaryConnInfo in case walreceiver just > started, but does not connected yet? While waiting walrcv_connect > for example. Good point. There is a gap between the moment the WAL receiver PID is set when the WAL receiver starts up and the moment where the different fields are reset to 0 which is not good as it could be possible that the user sees the information from the previous WAL receiver, so we should move the memset calls when the PID is set, reaching a state where the PID is alive but where there is no connection yet. The port number needs a similar treatment. Looking closer at the code, it seems to me that it would be a good thing if we update the shared memory state when the WAL receiver dies, so as not only the PID is set to 0, but all connection-related information gets the call. With all those comments I am finishing with the updated attached. Thoughts? -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
From
Sergei Kornilov
Date:
Hi Thank you, patch looks good for me. regards, Sergei
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Thu, Jan 10, 2019 at 02:20:27PM +0300, Sergei Kornilov wrote: > Thank you, patch looks good for me. Thanks Sergei for the review. I have been looking at the patch again this morning with a fresh set of eyes and the thing looks in a committable shape (the GUC values for NULL checks are a bit inconsistent in the last patch by the way, so I have fixed them locally). I'll do an extra pass on it in the next couple of days and commit. Then let's move on with the reloadable portions. -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Andres Freund
Date:
Hi, On 2019-01-11 09:34:28 +0900, Michael Paquier wrote: > On Thu, Jan 10, 2019 at 02:20:27PM +0300, Sergei Kornilov wrote: > > Thank you, patch looks good for me. > > Thanks Sergei for the review. I have been looking at the patch again > this morning with a fresh set of eyes and the thing looks in a > committable shape (the GUC values for NULL checks are a bit > inconsistent in the last patch by the way, so I have fixed them > locally). > > I'll do an extra pass on it in the next couple of days and commit. > Then let's move on with the reloadable portions. I still think this whole direction of accessing the GUC in walreceiver is a bad idea and shouldn't be pursued further. There's definite potential for startup process and WAL receiver having different states of GUCs, the code doesn't get meaningfully simpler, the GUC value checks in walreceiver make for horrible reporting up the chain. Greetings, Andres Freund
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Thu, Jan 10, 2019 at 04:41:47PM -0800, Andres Freund wrote: > I still think this whole direction of accessing the GUC in walreceiver > is a bad idea and shouldn't be pursued further. There's definite > potential for startup process and WAL receiver having different states > of GUCs, the code doesn't get meaningfully simpler, the GUC value checks > in walreceiver make for horrible reporting up the chain. Did you notice the set of messages from upthread? The code *gets* simpler by removing ready_to_display and the need to manipulate the non-clobbered connection string sent directly from the startup process. In my opinion that's a clear gain. We gain also the possibility to track down that a WAL receiver is started but not connected yet for monitoring tools. -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Andres Freund
Date:
On 2019-01-11 09:50:49 +0900, Michael Paquier wrote: > On Thu, Jan 10, 2019 at 04:41:47PM -0800, Andres Freund wrote: > > I still think this whole direction of accessing the GUC in walreceiver > > is a bad idea and shouldn't be pursued further. There's definite > > potential for startup process and WAL receiver having different states > > of GUCs, the code doesn't get meaningfully simpler, the GUC value checks > > in walreceiver make for horrible reporting up the chain. > > Did you notice the set of messages from upthread? The code *gets* > simpler by removing ready_to_display and the need to manipulate the > non-clobbered connection string sent directly from the startup > process. It's a minor simplification for code that's existed for quite a while. Not worth it.
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Thu, Jan 10, 2019 at 04:52:53PM -0800, Andres Freund wrote: > It's a minor simplification for code that's existed for quite a > while. Not worth it. Meh, I disagree for the ready_to_display part as it has been around for a long time: commit: 9ed551e0a4fdb046d8e5ea779adfd3e184d5dc0d author: Alvaro Herrera <alvherre@alvh.no-ip.org> date: Wed, 29 Jun 2016 16:57:17 -0400 Add conninfo to pg_stat_wal_receiver I was honestly unhappy from the start with it because there was no actual way to have the WAL receiver *not* save the original connection string. In my opinion, getting rid of it is worth it because this really simplifies the way the WAL receiver data visibility is handled at SQL level and we don't finish by using again and again the same field for different purposes, consolidating the code. For the reloading part, we also make the WAL receiver rely on the GUC context and stop it if there is a change in primary_conninfo and primary_slot_name. It would be inconsistent to rely on the GUC context for one thing, and the startup process for the other one. Adding a safeguard to fail WAL receiver startup is actually more of sanity check that anything else (that could be an assertion but for forks a failure is of better benefit). At this stage, I would like to hear more opinions before doing or not doing something. Alvaro, you got involved in the introduction of ready_to_siplay. Peter, you have committed the merge of the recovery params. Perhaps you have an opinion to share? -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Fri, Jan 11, 2019 at 10:09:04AM +0900, Michael Paquier wrote: > On Thu, Jan 10, 2019 at 04:52:53PM -0800, Andres Freund wrote: > > It's a minor simplification for code that's existed for quite a > > while. Not worth it. > > Meh, I disagree for the ready_to_display part as it has been around > for a long time: > commit: 9ed551e0a4fdb046d8e5ea779adfd3e184d5dc0d > author: Alvaro Herrera <alvherre@alvh.no-ip.org> > date: Wed, 29 Jun 2016 16:57:17 -0400 > Add conninfo to pg_stat_wal_receiver This should read "ready_to_display has not been around for a long time" :) -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Robert Haas
Date:
On Thu, Jan 10, 2019 at 7:42 PM Andres Freund <andres@anarazel.de> wrote: > I still think this whole direction of accessing the GUC in walreceiver > is a bad idea and shouldn't be pursued further. There's definite > potential for startup process and WAL receiver having different states > of GUCs, the code doesn't get meaningfully simpler, the GUC value checks > in walreceiver make for horrible reporting up the chain. I'm trying to assess the validity of this complaint. It seems to me that it might advance the discussion to be more clear about what the problem is here. When pg_ctl reload is executed (or the equivalent), different backends reload the file at different times. Currently, because the values are explicitly passed down from the startup process to the walreceiver, it's guaranteed that they will both use the same value. With this change, one might see an older value and the other the current value. So there's room to be nervous about code like this: if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0) { ... assorted other code ... RequestXLogStreaming(tli, ptr, PrimaryConnInfo, PrimarySlotName); receivedUpto = 0; } With the patch, the PrimaryConnInfo and PrimarySlotName arguments are removed from RequestXLogStreaming. That means that the new walreceiver could come to a different conclusion about the values of those arguments than the startup process. In particular, it could end up thinking that primary_conninfo is empty when, if the startup process had thought that, the walreceiver would never have been launched in the first place. But it's not obvious that you've added any logic in WALReceiverMain or elsewhere to compensate for this possibility -- what would happen in that case? Would we crash? Connect to the wrong server? I might be wrong here, but I'm inclined to think that this scenario hasn't really been contemplated carefully by the patch authors. There are no added TAP tests for the scenario where the values differ between the two processes, no code comments which explain why it's OK if that happens, really no mention of it in the patch at all. And on that basis I'm inclined to think that Andres is really quite correct to be worried about this. The problem he's talking about here is very low-probability because the race condition is narrow, but it's real, and it surely needs to be handled somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Andres Freund
Date:
Hi, Thanks for chiming in. On 2019-01-11 12:52:08 -0500, Robert Haas wrote: > And on that basis I'm inclined to think that Andres is really quite > correct to be worried about this. The problem he's talking about here > is very low-probability because the race condition is narrow, but it's > real, and it surely needs to be handled somehow. I think it's also going to get more likely with projects we really got to tackle like providing more builtin support for failing over to different systems and such. Greetings, Andres Freund
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Andres Freund
Date:
Hi, On 2019-01-12 01:55:23 +0300, Sergei Kornilov wrote: > > pg_ctl reload is executed (or the equivalent), > > different backends reload the file at different times. > > There are no added TAP tests for the scenario where the values differ between > the two processes, no code comments which explain why it's OK > > But wait, connection string can not be changed via reload, only during cluster > start. How it can differs between processes? One of the major reasons for moving recovery parameters from recovery.conf to GUCs was to make them changable at runtime in the future. Greetings, Andres Freund
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Fri, Jan 11, 2019 at 12:52:08PM -0500, Robert Haas wrote: > With the patch, the PrimaryConnInfo and PrimarySlotName arguments are > removed from RequestXLogStreaming. That means that the new > walreceiver could come to a different conclusion about the values of > those arguments than the startup process. In particular, it could end > up thinking that primary_conninfo is empty when, if the startup > process had thought that, the walreceiver would never have been > launched in the first place. But it's not obvious that you've added > any logic in WALReceiverMain or elsewhere to compensate for this > possibility -- what would happen in that case? Would we crash? > Connect to the wrong server? If I contemplate the patch this morning there is this bit: @@ -291,32 +295,40 @@ WalReceiverMain(void) /* Unblock signals (they were blocked when the postmaster forked us) */ PG_SETMASK(&UnBlockSig); + /* + * Fail immediately if primary_conninfo goes missing, better safe than + * sorry. + */ + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined"))); So the answer to your question is: the WAL receiver fails to start. > I might be wrong here, but I'm inclined to think that this scenario > hasn't really been contemplated carefully by the patch authors. There > are no added TAP tests for the scenario where the values differ > between the two processes, no code comments which explain why it's OK > if that happens, really no mention of it in the patch at all. And on > that basis I'm inclined to think that Andres is really quite correct > to be worried about this. The problem he's talking about here is very > low-probability because the race condition is narrow, but it's real, > and it surely needs to be handled somehow. primary_conninfo and primary_slot_name are PGC_POSTMASTER now, so adding tests now don't really make sense. -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Sat, Jan 12, 2019 at 08:10:07AM +0900, Michael Paquier wrote: > On Fri, Jan 11, 2019 at 12:52:08PM -0500, Robert Haas wrote: >> With the patch, the PrimaryConnInfo and PrimarySlotName arguments are >> removed from RequestXLogStreaming. That means that the new >> walreceiver could come to a different conclusion about the values of >> those arguments than the startup process. In particular, it could end >> up thinking that primary_conninfo is empty when, if the startup >> process had thought that, the walreceiver would never have been >> launched in the first place. But it's not obvious that you've added >> any logic in WALReceiverMain or elsewhere to compensate for this >> possibility -- what would happen in that case? Would we crash? >> Connect to the wrong server? > > If I contemplate the patch this morning there is this bit: > @@ -291,32 +295,40 @@ WalReceiverMain(void) > /* Unblock signals (they were blocked when the postmaster forked > us) */ > PG_SETMASK(&UnBlockSig); > > + /* > + * Fail immediately if primary_conninfo goes missing, better safe than > + * sorry. > + */ > + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined"))); > > So the answer to your question is: the WAL receiver fails to start. Robert, does this answer your question? I agree that depending on the timing, we could finish by having the startup process spawning a WAL receiver with a given connection string, and that the WAL receiver could use a different one, but as long as we fail the WAL receiver startup this does not seem like an issue to me, so I disagree with the upthread statement that the patch author has not thought about such cases :) It seems to me that making the WAL receiver relying more on the GUC context makes it more robust when it comes to reloading the parameters which would be an upcoming change as we need to rely on the WAL receiver check the GUC context itself and FATAL (or we would need to have the startup process check for a change in primary_conninfo/primary_slot_name, and then have the startup process kill the WAL receiver by itself). In short, handling all that in the WAL receiver would be more robust in the long term in my opinion as we reduce interactions between the WAL receiver and the startup process. And on top of it we get rid of ready_to_display, which is something I am unhappy with since we had to introduce it. -- Michael
Attachment
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Robert Haas
Date:
On Tue, Jan 15, 2019 at 8:48 PM Michael Paquier <michael@paquier.xyz> wrote: > > So the answer to your question is: the WAL receiver fails to start. > > Robert, does this answer your question? I agree that depending on the > timing, we could finish by having the startup process spawning a WAL > receiver with a given connection string, and that the WAL receiver > could use a different one, but as long as we fail the WAL receiver > startup this does not seem like an issue to me, so I disagree with the > upthread statement that the patch author has not thought about such > cases :) OK, if that was in the patch, I dunno why I didn't see it. I really didn't think it was there, but if I'm wrong, then I am. > It seems to me that making the WAL receiver relying more on the GUC > context makes it more robust when it comes to reloading the parameters > which would be an upcoming change as we need to rely on the WAL > receiver check the GUC context itself and FATAL (or we would need to > have the startup process check for a change in > primary_conninfo/primary_slot_name, and then have the startup process > kill the WAL receiver by itself). In short, handling all that in the > WAL receiver would be more robust in the long term in my opinion as we > reduce interactions between the WAL receiver and the startup process. > And on top of it we get rid of ready_to_display, which is something I > am unhappy with since we had to introduce it. I think you have some valid points here, but I also think that the patch as written doesn't really seem to have any user-visible benefits. Sure, ready_to_display is a crock, but getting rid of it doesn't immediately help anybody. And on the flip side your patch might have bugs, in which case we'll be worse off. I'm not going to stand on my soapbox and say that committing this patch is a terrible idea, because as far as I can tell, it isn't. But it feels like it might be a commit for the sake of making a commit, and I can't get too excited about that either. Since the logic will have to be further revised anyway to make primary_conninfo PGC_SIGHUP, why not just wait until that's done and then commit all the changes together instead of guessing that this might make that easier? If this does get committed now, and count me as about -0.25 for that, then I at least think it should have some comments clearly addressing the synchronization issues. Unless those are also in the patch and I missed them, too, but I hope I'm not that dense. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
From
Michael Paquier
Date:
On Wed, Jan 16, 2019 at 11:02:48AM -0500, Robert Haas wrote: > I think you have some valid points here, but I also think that the > patch as written doesn't really seem to have any user-visible > benefits. Sure, ready_to_display is a crock, but getting rid of it > doesn't immediately help anybody. And on the flip side your patch > might have bugs, in which case we'll be worse off. I'm not going to > stand on my soapbox and say that committing this patch is a terrible > idea, because as far as I can tell, it isn't. But it feels like it > might be a commit for the sake of making a commit, and I can't get too > excited about that either. Since the logic will have to be further > revised anyway to make primary_conninfo PGC_SIGHUP, why not just wait > until that's done and then commit all the changes together instead of > guessing that this might make that easier? You can say that about any patch which gets committed, even refactoring patches have a risk of introducing bugs, and even subtle ones understood only after release. I was justifying the existence of this thread exactly for opposite reasons. After reviewing the other patch switch to reload the parameters we could do more, and spawning a new thread to attract the correct audience looked rather right (it still does): https://www.postgresql.org/message-id/20181212043208.GI17695@paquier.xyz And this refactoring seemed to be justified as part of a different thread. I don't mind dropping this patch and this thread and just go back there, but that seems like taking steps backward, and what's proposed here is a bit different than just making the parameters reloadable. Still the refactoring looks justified to me for correctness with the parameter handling. Anyway, based on the opinions gathered until now, I don't mind just dropping this patch and move on to the other thread, though we will likely finish with the same discussion as what's done here :) A patch on which two committers are expressing concerns about is as good as going to the waste bin. That's of course fine by me. -- Michael