Thread: Turning recovery.conf into GUCs
Hi everyone, Seems that the latest patch in this series is: http://www.postgresql.org/message-id/CAB7nPqRLWLH1b0YsvqYF-zOFjrv4FRiurQ6yqcP3wjBp=TJOLg@mail.gmail.com = Applying = Reading the threads it seems there is consensus in this happening, so let's move in that direction. The patch applies almost cleanly except for a minor change in xlog.c = Building = In pg_basebackup we have now 2 unused functions: escapeConnectionParameter and escape_quotes. the only use of these functions was when that tool created the recovery.conf file so they aren't needed anymore. = Functionality = I have been testing functionality and it looks good so far, i still need to test a few things but i don't expect anything bad. I have 2 complaints though: 1) the file to trigger recovery is now called standby.enabled. I know this is because we use the file to also make the node a standby. Now, reality is that the file doesn't make the node a standby but the combination of this file (which starts recovery) + standby_mode=on. so, i agree with the suggestion of naming the file: recovery.trigger 2) This patch removes the dual existence of recovery.conf: as a state file and as a configuration file - the former is accomplished by changing the name of the file that triggers recovery (from recovery.conf to standby.enabled) - the latter is done by moving all parameters to postgresql.conf and *not reading* recovery.conf anymore so, after applying this patch postgres don't use recovery.conf for anything... except for complaining. it's completely fair to say we are no longer using that file and ignoring its existence, but why we should care if users want to use a file with that name for inclusion in postgresql.conf or where they put that hypotetic file? after this patch, recovery.conf will have no special meaning anymore so let's the user put it whatever files he wants, wherever he wants. if we really want to warn the user, use WARNING instead of FATAL = Code & functionality = why did you changed the context in which we can set archive_command? please, let it as a SIGHUP parameter - {"archive_command", PGC_SIGHUP, WAL_ARCHIVING, + {"archive_command", PGC_POSTMASTER, WAL_ARCHIVING, All parameters moved from recovery.conf has been marked as PGC_POSTMASTER, but most of them are clearly PGC_SIGHUP candidates + {"restore_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, + {"archive_cleanup_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, + {"recovery_end_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, + {"recovery_target_xid", PGC_POSTMASTER, WAL_RECOVERY_TARGET, + {"recovery_target_name", PGC_POSTMASTER, WAL_RECOVERY_TARGET, + {"recovery_target_time", PGC_POSTMASTER, WAL_RECOVERY_TARGET, + {"trigger_file", PGC_POSTMASTER, REPLICATION_STANDBY, Not sure about these ones + {"recovery_target_timeline", PGC_POSTMASTER, WAL_RECOVERY_TARGET, + {"primary_conninfo", PGC_POSTMASTER, REPLICATION_STANDBY, This is the only one i agree, should be PGC_POSTMASTER only + {"standby_mode", PGC_POSTMASTER, REPLICATION_STANDBY, -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
Jaime, > = Building = > > In pg_basebackup we have now 2 unused functions: > escapeConnectionParameter and escape_quotes. the only use of these > functions was when that tool created the recovery.conf file so they > aren't needed anymore. Except that we'll want 9.4's -R to do something, probably create a file called conf.d/replication.conf. Mind you, it won't need the same wonky quoting stuff. > 1) the file to trigger recovery is now called standby.enabled. I know > this is because we use the file to also make the node a standby. > Now, reality is that the file doesn't make the node a standby but the > combination of this file (which starts recovery) + standby_mode=on. > so, i agree with the suggestion of naming the file: recovery.trigger > > 2) This patch removes the dual existence of recovery.conf: as a state > file and as a configuration file > > - the former is accomplished by changing the name of the file that > triggers recovery (from recovery.conf to standby.enabled) > - the latter is done by moving all parameters to postgresql.conf and > *not reading* recovery.conf anymore > > so, after applying this patch postgres don't use recovery.conf for > anything... except for complaining. > it's completely fair to say we are no longer using that file and > ignoring its existence, but why we should care if users want to use a > file with that name for inclusion in postgresql.conf or where they put > that hypotetic file? So this is a bit of a catch-22. If we allow the user to use a file named "recovery.conf" in $PGDATA, then the user may be unaware that the *meaning* of the file has changed, which can result in unplanned promotion of replicas after upgrade. *on the other hand*, if we prevent creation of a configuration file named "recovery.conf", then we block efforts to write backwards-compatible management utilities. AFAIK, there is no good solution for this, which is why it's taken so long for us to resolve the general issue. Given that, I think it's better to go for the breakage and all the warnings. If a user wants a file called recovery.conf, put it in the conf.d directory. I don't remember, though: how does this patch handle PITR recovery? > = Code & functionality = > + {"restore_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, > + {"archive_cleanup_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, > + {"recovery_end_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, > + {"recovery_target_xid", PGC_POSTMASTER, WAL_RECOVERY_TARGET, > + {"recovery_target_name", PGC_POSTMASTER, WAL_RECOVERY_TARGET, > + {"recovery_target_time", PGC_POSTMASTER, WAL_RECOVERY_TARGET, > + {"trigger_file", PGC_POSTMASTER, REPLICATION_STANDBY, > > Not sure about these ones > > + {"recovery_target_timeline", PGC_POSTMASTER, WAL_RECOVERY_TARGET, > + {"primary_conninfo", PGC_POSTMASTER, REPLICATION_STANDBY, It would be really nice to change these on the fly; it would help a lot of issues with minor changes to replication config. I can understand, though, that the replication code might not be prepared for that. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Oct 16, 2013 at 1:34 PM, Josh Berkus <josh@agliodbs.com> wrote: > Jaime, >> = Building = >> >> In pg_basebackup we have now 2 unused functions: >> escapeConnectionParameter and escape_quotes. the only use of these >> functions was when that tool created the recovery.conf file so they >> aren't needed anymore. > > Except that we'll want 9.4's -R to do something, probably create a file > called conf.d/replication.conf. Mind you, it won't need the same wonky > quoting stuff. > Currently the patch uses -R to create the recovery trigger file >> 1) the file to trigger recovery is now called standby.enabled. I know >> this is because we use the file to also make the node a standby. >> Now, reality is that the file doesn't make the node a standby but the >> combination of this file (which starts recovery) + standby_mode=on. >> so, i agree with the suggestion of naming the file: recovery.trigger >> >> 2) This patch removes the dual existence of recovery.conf: as a state >> file and as a configuration file >> >> - the former is accomplished by changing the name of the file that >> triggers recovery (from recovery.conf to standby.enabled) >> - the latter is done by moving all parameters to postgresql.conf and >> *not reading* recovery.conf anymore >> >> so, after applying this patch postgres don't use recovery.conf for >> anything... except for complaining. >> it's completely fair to say we are no longer using that file and >> ignoring its existence, but why we should care if users want to use a >> file with that name for inclusion in postgresql.conf or where they put >> that hypotetic file? > > So this is a bit of a catch-22. If we allow the user to use a file > named "recovery.conf" in $PGDATA, then the user may be unaware that the > *meaning* of the file has changed, which can result in unplanned > promotion of replicas after upgrade. > well, after upgrade you should do checks. and even if it happens, after it happens once people will be aware of the change. now, some suggestions were made to avoid the problem. 1) read the file if exists last in the process of postgresql.conf, 2) add a GUC indicating if it should read it and include it (not using it as a trigger file). another one, 3) include in this release an include_if_exists directive and give a warning if it sees the file then include it, on next release remove the include_if_exists (at least that way people will be warned before breaking compatibility) please, keep in mind none of these suggestions include make the recovery.conf file act as a trigger for recovery > *on the other hand*, if we prevent creation of a configuration file > named "recovery.conf", then we block efforts to write > backwards-compatible management utilities. > and all tools and procedures that currently exists. > AFAIK, there is no good solution for this, which is why it's taken so > long for us to resolve the general issue. Given that, I think it's > better to go for the breakage and all the warnings. If a user wants a > file called recovery.conf, put it in the conf.d directory. > well, there should be good solutions... maybe we haven't thought them yet. anyway, we can't postpone the decision forever. we need to make a decision and stick with it otherwise this patch will be stalled N releases for no good reason > I don't remember, though: how does this patch handle PITR recovery? > exactly as it is now, if it sees the recovery trigger file, then it starts ArchiveRecovery and after it finish delete the file (the only difference) and increment the timeline >> = Code & functionality = > >> + {"restore_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, >> + {"archive_cleanup_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, >> + {"recovery_end_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, >> + {"recovery_target_xid", PGC_POSTMASTER, WAL_RECOVERY_TARGET, >> + {"recovery_target_name", PGC_POSTMASTER, WAL_RECOVERY_TARGET, >> + {"recovery_target_time", PGC_POSTMASTER, WAL_RECOVERY_TARGET, >> + {"trigger_file", PGC_POSTMASTER, REPLICATION_STANDBY, >> >> Not sure about these ones >> >> + {"recovery_target_timeline", PGC_POSTMASTER, WAL_RECOVERY_TARGET, >> + {"primary_conninfo", PGC_POSTMASTER, REPLICATION_STANDBY, > > It would be really nice to change these on the fly; it would help a lot > of issues with minor changes to replication config. I can understand, > though, that the replication code might not be prepared for that. > well, archive_command can be changed right now with a SIGHUP so at least that one shouldn't change... and i don't think most of these are too different. even if we are not sure we can do this now and change them as SIGHUP later -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
On Fri, Oct 18, 2013 at 1:13 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote: >>> = Code & functionality = >> >>> + {"restore_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, >>> + {"archive_cleanup_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, >>> + {"recovery_end_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, >>> + {"recovery_target_xid", PGC_POSTMASTER, WAL_RECOVERY_TARGET, >>> + {"recovery_target_name", PGC_POSTMASTER, WAL_RECOVERY_TARGET, >>> + {"recovery_target_time", PGC_POSTMASTER, WAL_RECOVERY_TARGET, >>> + {"trigger_file", PGC_POSTMASTER, REPLICATION_STANDBY, >>> >>> Not sure about these ones >>> >>> + {"recovery_target_timeline", PGC_POSTMASTER, WAL_RECOVERY_TARGET, >>> + {"primary_conninfo", PGC_POSTMASTER, REPLICATION_STANDBY, >> >> It would be really nice to change these on the fly; it would help a lot >> of issues with minor changes to replication config. I can understand, >> though, that the replication code might not be prepared for that. >> > > well, archive_command can be changed right now with a SIGHUP so at > least that one shouldn't change... and i don't think most of these are > too different. even if we are not sure we can do this now and change > them as SIGHUP later Changing those parameters don't really matter as long as the node is not performing a recovery IMO, but I'd rather see a careful approach here and let all those parameters as PGC_POSTMASTER for now to avoid any surprises. Perhaps a second patch on top of this one could be the addition of context name like SIGHUP_RECOVERY, aka just allow those parameters to be updated with SIGHUP as long as the node is not in recovery. -- Michael
Jaime, >> Except that we'll want 9.4's -R to do something, probably create a file >> called conf.d/replication.conf. Mind you, it won't need the same wonky >> quoting stuff. >> > > Currently the patch uses -R to create the recovery trigger file Right, I'm saying that we'll want to do better than that for release, but that's dependant on committing the conf directory patch. Note that this change makes committing the conf.d patch extra-important; it's going to be a LOT easier to upgrade tools for 9.4 if we have that. > well, after upgrade you should do checks. and even if it happens, > after it happens once people will be aware of the change. > now, some suggestions were made to avoid the problem. 1) read the file > if exists last in the process of postgresql.conf, 2) add a GUC > indicating if it should read it and include it (not using it as a > trigger file). another one, 3) include in this release an > include_if_exists directive and give a warning if it sees the file > then include it, on next release remove the include_if_exists (at > least that way people will be warned before breaking compatibility) I think all of these suggestions just make our code more complicated without improving the upgrade situation. The reason given (and I think it's pretty good) for erroring on recovery.conf is that we don't want people to accidentally take a server out of replication because they didn't check which version of PostgreSQL they are on. >> *on the other hand*, if we prevent creation of a configuration file >> named "recovery.conf", then we block efforts to write >> backwards-compatible management utilities. >> > > and all tools and procedures that currently exists. Right. However, exploring your suggestions above, none of those workarounds prevent breakage. And in some cases, they make the breakage less obvious than the current patch does. If repmgr 1.2 / OmniPITR 1.2 won't work correctly with 9.4, then we want those tools to break at upgrade time, not later when the user is trying to fail over. For that matter, 9.4 is a very good time (relatively speaking) to break replication tools because the new logical replication is going to cause everyone to rev their tools anyway. This kind of breakage alone might end up being a good reason to call the next version 10.0. > well, there should be good solutions... maybe we haven't thought them yet. > anyway, we can't postpone the decision forever. we need to make a > decision and stick with it otherwise this patch will be stalled N > releases for no good reason I think if there were a good solution, sometime in the last 1.5 years someone would have suggested it. Gods know Simon has tried. > exactly as it is now, if it sees the recovery trigger file, then it > starts ArchiveRecovery and after it finish delete the file (the only > difference) and increment the timeline OK, so if I'm doing a PITR recovery, I just put the recovery variables into postgresql.conf, then? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Oct 18, 2013 at 11:32 AM, Josh Berkus <josh@agliodbs.com> wrote: > >> exactly as it is now, if it sees the recovery trigger file, then it >> starts ArchiveRecovery and after it finish delete the file (the only >> difference) and increment the timeline > > OK, so if I'm doing a PITR recovery, I just put the recovery variables > into postgresql.conf, then? > create a recovery trigger file (called standby.enabled in current patch) in $PGDATA and start the server -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
On 2013-10-18 09:32:15 -0700, Josh Berkus wrote: > For that matter, 9.4 is a very good time (relatively speaking) to break > replication tools because the new logical replication is going to cause > everyone to rev their tools anyway. We're hopefully getting changeset extraction in, but there's little chance of a full blown replication solution making it in in 9.4... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10/18/2013 12:29 PM, Andres Freund wrote: > On 2013-10-18 09:32:15 -0700, Josh Berkus wrote: >> For that matter, 9.4 is a very good time (relatively speaking) to break >> replication tools because the new logical replication is going to cause >> everyone to rev their tools anyway. > > We're hopefully getting changeset extraction in, but there's little > chance of a full blown replication solution making it in in 9.4... I thought changeset extraction was the only thing going into core? What else do we need? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2013-10-18 13:16:52 -0700, Josh Berkus wrote: > On 10/18/2013 12:29 PM, Andres Freund wrote: > > On 2013-10-18 09:32:15 -0700, Josh Berkus wrote: > >> For that matter, 9.4 is a very good time (relatively speaking) to break > >> replication tools because the new logical replication is going to cause > >> everyone to rev their tools anyway. > > > > We're hopefully getting changeset extraction in, but there's little > > chance of a full blown replication solution making it in in 9.4... > > I thought changeset extraction was the only thing going into core? What > else do we need? Well, I personally want more in core mid/long term, but anyway. Without released, proven and stable logical in-core replication technology using this, I don't see why repmgr or something related would need/want to change? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10/18/2013 01:35 PM, Andres Freund wrote: > On 2013-10-18 13:16:52 -0700, Josh Berkus wrote: >> I thought changeset extraction was the only thing going into core? What >> else do we need? > > Well, I personally want more in core mid/long term, but anyway. I've lost track of the plan, then. Hmmm ... we need replication of DDL commands, no? > Without released, proven and stable logical in-core replication > technology using this, I don't see why repmgr or something related would > need/want to change? Repmgr is designed to manage binary replication, not perform it. What will likely change first is Slony and Bucardo, who have a strong interest in dumping triggers and queues. A contrib module which did the simplest implementation -- that is, whole-database M-S replication -- would also be a good idea, especially since it would provide an example of how to build your own. But I'd be wary of going beyond that in core, because you very quickly get into the territory of trying to satisfy multiple exclusive use-cases. Let's focus on providing a really good API which enables people to build their own tools. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2013-10-18 14:16:04 -0700, Josh Berkus wrote: > On 10/18/2013 01:35 PM, Andres Freund wrote: > > On 2013-10-18 13:16:52 -0700, Josh Berkus wrote: > >> I thought changeset extraction was the only thing going into core? What > >> else do we need? > > > > Well, I personally want more in core mid/long term, but anyway. > > I've lost track of the plan, then. > > Hmmm ... we need replication of DDL commands, no? > > > Without released, proven and stable logical in-core replication > > technology using this, I don't see why repmgr or something related would > > need/want to change? > > Repmgr is designed to manage binary replication, not perform it. Obviously. > What will likely change first is Slony and Bucardo, who have a strong > interest in dumping triggers and queues. But I don't understand what that has to do with recovery.conf and breakage around it. > A contrib module which did the > simplest implementation -- that is, whole-database M-S replication -- > would also be a good idea, especially since it would provide an example > of how to build your own. > > But I'd be wary of going beyond that in core, because you very quickly > get into the territory of trying to satisfy multiple exclusive > use-cases. Let's focus on providing a really good API which enables > people to build their own tools. We'll see. I am certain we'll have many discussions about the bits and pieces you need to build a great replication solution (of which we imo don't have any yet). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Oct 18, 2013 at 11:32 AM, Josh Berkus <josh@agliodbs.com> wrote: > Jaime, > >> well, after upgrade you should do checks. and even if it happens, >> after it happens once people will be aware of the change. >> now, some suggestions were made to avoid the problem. 1) read the file >> if exists last in the process of postgresql.conf, 2) add a GUC >> indicating if it should read it and include it (not using it as a >> trigger file). another one, 3) include in this release an >> include_if_exists directive and give a warning if it sees the file >> then include it, on next release remove the include_if_exists (at >> least that way people will be warned before breaking compatibility) > > I think all of these suggestions just make our code more complicated > without improving the upgrade situation. > well #3 just add a line in postgresql.conf (an include_if_exists) and current patch gives an error in case it finds the file (i'm suggesting to make it a warning instead). how does that makes our code more complicated? > The reason given (and I think it's pretty good) for erroring on > recovery.conf is that we don't want people to accidentally take a server > out of replication because they didn't check which version of PostgreSQL > they are on. > well, people will go out of replication also if they forgot the recovery trigger file even if they set the variables in postgresql.conf it happens two me twice today ;) -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
On 10/18/2013 02:58 PM, Jaime Casanova wrote: > well #3 just add a line in postgresql.conf (an include_if_exists) and > current patch gives an error in case it finds the file (i'm suggesting > to make it a warning instead). > how does that makes our code more complicated? Well, that's a couple extra lines only, I know. However, it doesn't actually help with the breakage any, since recovery.conf *still* won't work as a trigger file. The only thing which would prevent breakage (proposed by Simon, I think) is having recovery.conf have an include_if_exists, AND have recovery.conf be an 'alternate' name for replication.trigger. However, even this would break, and in IMHO ways which would tend to happen at failover time rather than upgrade time. To put it clearly: if we're going to have breakage, I want it to be at upgrade time, when the database is *already down*, and not at failover time or some other time when downtime is not planned. > well, people will go out of replication also if they forgot the > recovery trigger file > even if they set the variables in postgresql.conf > > it happens two me twice today ;) Right. What I'd like to avoid is having folks try to use, for example, repmgr 1.2 with PostgreSQL 9.4 and have their replication break and them not notice for a couple hours of operation. I'd rather have PostgreSQL 9.4 refuse to come up, so that they know *immediately* that something is wrong. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 10/18/2013 02:36 PM, Andres Freund wrote: >> > What will likely change first is Slony and Bucardo, who have a strong >> > interest in dumping triggers and queues. > But I don't understand what that has to do with recovery.conf and > breakage around it. The simple thinking is this: if we announce and promote new replication, then our users who do upgrade are going to expect to upgrade their replication tools at the same time, even if they're not using the new replication. That is people will look for a repmgr 2.0 / OmniPITR 1.5 and update to it. Now, as a tool author, I know that supporting both models is going to be annoying. But necessary. And, as I said before, we need to do the GUC merger in the same release we introduce configuration directory (or after it). -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Oct 21, 2013 at 11:53 AM, Josh Berkus <josh@agliodbs.com> wrote: > On 10/18/2013 02:36 PM, Andres Freund wrote: >>> > What will likely change first is Slony and Bucardo, who have a strong >>> > interest in dumping triggers and queues. >> But I don't understand what that has to do with recovery.conf and >> breakage around it. > > The simple thinking is this: if we announce and promote new replication, > then our users who do upgrade are going to expect to upgrade their > replication tools at the same time, even if they're not using the new > replication. That is people will look for a repmgr 2.0 / OmniPITR 1.5 > and update to it. > > Now, as a tool author, I know that supporting both models is going to be > annoying. But necessary. > AFAIU, even if we get in all about logical replication today that won't affect tools that manage binary replication. > And, as I said before, we need to do the GUC merger in the same release > we introduce configuration directory (or after it). > you mean the ALTER SYSTEM syntax? anyway, why that restriction? -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
Jaime, >> And, as I said before, we need to do the GUC merger in the same release >> we introduce configuration directory (or after it). >> > > you mean the ALTER SYSTEM syntax? anyway, why that restriction? No, I'm referring to the proposal to have an automatically created and included conf.d directory for extra configuration files. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Oct 21, 2013 at 2:57 PM, Josh Berkus <josh@agliodbs.com> wrote: > Jaime, > >>> And, as I said before, we need to do the GUC merger in the same release >>> we introduce configuration directory (or after it). >>> >> >> you mean the ALTER SYSTEM syntax? anyway, why that restriction? > > No, I'm referring to the proposal to have an automatically created and > included conf.d directory for extra configuration files. > 9.3 has include_dir and postgresql.conf has this setted: """ #include_dir = 'conf.d' # include files ending in '.conf' from # directory 'conf.d' """ anything before this should be up to the packagers, no? or, do you mean something else? -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
> 9.3 has include_dir and postgresql.conf has this setted: > """ > #include_dir = 'conf.d' # include files ending in '.conf' from > # directory 'conf.d' > """ > > anything before this should be up to the packagers, no? > > or, do you mean something else? Well, I was just pointing out that it would make things *much* easier for tool authors if conf.d were enabled by default, instead of disabled by default. Then they could change tools to write a "recovery.conf" file to conf.d. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Oct 21, 2013 at 3:57 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> 9.3 has include_dir and postgresql.conf has this setted: >> """ >> #include_dir = 'conf.d' # include files ending in '.conf' from >> # directory 'conf.d' >> """ >> >> anything before this should be up to the packagers, no? >> >> or, do you mean something else? > > Well, I was just pointing out that it would make things *much* easier > for tool authors if conf.d were enabled by default, instead of disabled > by default. Then they could change tools to write a "recovery.conf" > file to conf.d. > I agree, it would be much easier, but that is not relevant to this patch. I have rebased Michael Paquier's patch and did a few changes: * changed standby.enabled filename to recovery.trigger * make archive_command a SIGHUP parameter again * make restore_command a SIGHUP parameter * rename restore_command variable to XLogRestoreCommand to match XLogArchiveCommand we can also make primary_conninfo a SIGHUP parameter, of course the DBA will need to force a reconection after that. after this is done we can look at the other recovery parameters. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
Attachment
On 11/13/13, 12:17 AM, Jaime Casanova wrote: > I have rebased Michael Paquier's patch and did a few changes: > > * changed standby.enabled filename to recovery.trigger > * make archive_command a SIGHUP parameter again > * make restore_command a SIGHUP parameter > * rename restore_command variable to XLogRestoreCommand to match > XLogArchiveCommand Please check for compiler warnings in pg_basebackup: pg_basebackup.c:1109:1: warning: ‘escapeConnectionParameter’ defined but not used [-Wunused-function] pg_basebackup.c:1168:1: warning: ‘escape_quotes’ defined but not used [-Wunused-function]
On Fri, Nov 15, 2013 at 9:28 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 11/13/13, 12:17 AM, Jaime Casanova wrote: >> I have rebased Michael Paquier's patch and did a few changes: >> >> * changed standby.enabled filename to recovery.trigger >> * make archive_command a SIGHUP parameter again >> * make restore_command a SIGHUP parameter >> * rename restore_command variable to XLogRestoreCommand to match >> XLogArchiveCommand > > Please check for compiler warnings in pg_basebackup: > > pg_basebackup.c:1109:1: warning: ‘escapeConnectionParameter’ defined but not used [-Wunused-function] > pg_basebackup.c:1168:1: warning: ‘escape_quotes’ defined but not used [-Wunused-function] > those are functions that are no longer used but Josh considered they could become useful before release. i can put them inside #ifdef _NOT_USED_ decorations or just remove them now and if/when we find some use for them re add them -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
On Fri, Nov 15, 2013 at 11:38 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote: > those are functions that are no longer used but Josh considered they > could become useful before release. > i can put them inside #ifdef _NOT_USED_ decorations or just remove > them now and if/when we find some use for them re add them Why not simply removing them? They will be kept in the git history either way. -- Michael
On 11/15/2013 06:38 AM, Jaime Casanova wrote: >> > Please check for compiler warnings in pg_basebackup: >> > >> > pg_basebackup.c:1109:1: warning: ‘escapeConnectionParameter’ defined but not used [-Wunused-function] >> > pg_basebackup.c:1168:1: warning: ‘escape_quotes’ defined but not used [-Wunused-function] >> > > those are functions that are no longer used but Josh considered they > could become useful before release. > i can put them inside #ifdef _NOT_USED_ decorations or just remove > them now and if/when we find some use for them re add them Wait, which Josh? Not me ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Nov 15, 2013 at 4:11 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/15/2013 06:38 AM, Jaime Casanova wrote: >>> > Please check for compiler warnings in pg_basebackup: >>> > >>> > pg_basebackup.c:1109:1: warning: ‘escapeConnectionParameter’ defined but not used [-Wunused-function] >>> > pg_basebackup.c:1168:1: warning: ‘escape_quotes’ defined but not used [-Wunused-function] >>> > >> those are functions that are no longer used but Josh considered they >> could become useful before release. >> i can put them inside #ifdef _NOT_USED_ decorations or just remove >> them now and if/when we find some use for them re add them > > Wait, which Josh? Not me ... > sorry, i clearly misunderstood you. attached a rebased patch with those functions removed. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
Attachment
Hi, On 2013-11-15 22:38:05 -0500, Jaime Casanova wrote: > sorry, i clearly misunderstood you. attached a rebased patch with > those functions removed. * --write-standby-enable seems to loose quite some functionality in comparison to --write-recovery-conf since it doesn'tseem to set primary_conninfo, standby anymore. * CheckRecoveryReadyFile() doesn't seem to be a very descriptive function name. * Why does StartupXLOG() now use ArchiveRecoveryRequested && StandbyModeRequested instead of just the former? * I am not sure I like "recovery.trigger" as a name. It seems to close to what I've seen people use to trigger failover andtoo close to trigger_file. * You check for a trigger file in a way that's not compatible with it being NULL. Why did you change that? - if (TriggerFile == NULL) + if (!trigger_file[0]) * You made recovery_target_inclusive/pause_at_recovery_target PGC_SIGHUP - did you review that actually works? Imo that shouldbe changed in a separate commit. * Maybe we should rename names like pause_at_recovery_target to recovery_pause_at_target? Since we already force everyoneto bother changing their setup... * the description of archive_cleanup_command seems wrong to me. * Why did you change some of the recovery gucs to lowercase names, but left out XLogRestoreCommand? * Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being really strangely formatted (multiline :? inside afunction?) it doesn't a) seem to be correct to ignore potential memory allocation errors by just switching back into thecontext that just errored out, and continue to work there b) forgo cleanup by just continuing as if nothing happened.That's unlikely to be acceptable. * You access recovery_target_name[0] unconditionally, although it's intialized to NULL. * Generally, ISTM there's too much logic in the assign hooks. * Why do you include xlog_internal.h in guc.c and not xlog.h? Ok, I think that's enough for now ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Nov 19, 2013 at 2:27 AM, Andres Freund <andres@2ndquadrant.com> wrote: > * --write-standby-enable seems to loose quite some functionality in > comparison to --write-recovery-conf since it doesn't seem to set > primary_conninfo, standby anymore. Yes... The idea here might be to generate a new file that is then included in postgresql.conf or to add parameters at the bottom of postgresql.conf itself. The code for plain base backup is straight-forward, but it could get ugly for the tar part... > * CheckRecoveryReadyFile() doesn't seem to be a very descriptive > function name. CheckRecoveryTriggerPresence? > * Why does StartupXLOG() now use ArchiveRecoveryRequested && > StandbyModeRequested instead of just the former? > * I am not sure I like "recovery.trigger" as a name. It seems to close > to what I've seen people use to trigger failover and too close to > trigger_file. This name was chosen and kept in accordance to the spec of this feature. Looks fine for me... > * You made recovery_target_inclusive/pause_at_recovery_target PGC_SIGHUP > - did you review that actually works? Imo that should be changed in a > separate commit. Yep, I'd rather see those parameters as PGC_POSTMASTER... As of now, once recovery is started those parameter values do not change once readRecoveryCommandFile is kicked. Having them SIGHUP would mean that you could change them during recovery. Sounds kind of dangerous, no? > * Maybe we should rename names like pause_at_recovery_target to > recovery_pause_at_target? Since we already force everyone to bother > changing their setup... I disagree here. It is true that this patch introduces many changes with a new configuration file layer, but this idea with this patch was to allow people to reuse their old recovery.conf as it is. And what is actually wrong with pause_at_recovery_target? > > * the description of archive_cleanup_command seems wrong to me. > * Why did you change some of the recovery gucs to lowercase names, but > left out XLogRestoreCommand? This was part of the former patch, perhaps you are right and keeping the names as close as possible to the old ones would make sense to facilitate maintenance across versions. > > * Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being > really strangely formatted (multiline :? inside a function?) it > doesn't a) seem to be correct to ignore potential memory allocation > errors by just switching back into the context that just errored out, > and continue to work there b) forgo cleanup by just continuing as if > nothing happened. That's unlikely to be acceptable. Interestingly, that was part of the first versions of the patch as well. I don't recall modifying anything in this area when I hacked that... But yes it should be modified to something like what is in check_recovery_target_xid. Regards, -- Michael
On 2013-11-19 22:09:48 +0900, Michael Paquier wrote: > On Tue, Nov 19, 2013 at 2:27 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > * --write-standby-enable seems to loose quite some functionality in > > comparison to --write-recovery-conf since it doesn't seem to set > > primary_conninfo, standby anymore. > Yes... The idea here might be to generate a new file that is then > included in postgresql.conf or to add parameters at the bottom of > postgresql.conf itself. The code for plain base backup is > straight-forward, but it could get ugly for the tar part... Well, just removing most of the feature - which the current patch seems to be doing - looks like a regression to me, so it has to be either fixed or explicitly discussed. > > * CheckRecoveryReadyFile() doesn't seem to be a very descriptive > > function name. > CheckRecoveryTriggerPresence? CheckStartingAsStandby()? Recovery alone doesn't say very much. > > * Why does StartupXLOG() now use ArchiveRecoveryRequested && > > StandbyModeRequested instead of just the former? ? > > * I am not sure I like "recovery.trigger" as a name. It seems to close > > to what I've seen people use to trigger failover and too close to > > trigger_file. > This name was chosen and kept in accordance to the spec of this > feature. Looks fine for me... I still think "start_as_standby.trigger" or such would be much clearer and far less likely to be confused with the promotion trigger file. > > * You made recovery_target_inclusive/pause_at_recovery_target PGC_SIGHUP > > - did you review that actually works? Imo that should be changed in a > > separate commit. > > Yep, I'd rather see those parameters as PGC_POSTMASTER... As of now, > once recovery is started those parameter values do not change once > readRecoveryCommandFile is kicked. Having them SIGHUP would mean that > you could change them during recovery. Sounds kind of dangerous, no? I think it's desirable to make them changeable during recovery, but it's a separate patch. > > * Maybe we should rename names like pause_at_recovery_target to > > recovery_pause_at_target? Since we already force everyone to bother > > changing their setup... > I disagree here. It is true that this patch introduces many changes > with a new configuration file layer, but this idea with this patch was > to allow people to reuse their old recovery.conf as it is. And what is > actually wrong with pause_at_recovery_target? That nearly all the other variables start with recovery_. But I don't feel very strongly abou thtis. > > * Why did you change some of the recovery gucs to lowercase names, but > > left out XLogRestoreCommand? > > This was part of the former patch, perhaps you are right and keeping > the names as close as possible to the old ones would make sense to > facilitate maintenance across versions. I think lowercase is slightly more consistent with the majority of the other GUCs, but if you change it you should change all the new GUC variables. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-11-19 22:09:48 +0900, Michael Paquier wrote: >> On Tue, Nov 19, 2013 at 2:27 AM, Andres Freund <andres@2ndquadrant.com> wrote: >>> * Why did you change some of the recovery gucs to lowercase names, but >>> left out XLogRestoreCommand? >> This was part of the former patch, perhaps you are right and keeping >> the names as close as possible to the old ones would make sense to >> facilitate maintenance across versions. > I think lowercase is slightly more consistent with the majority of the > other GUCs, but if you change it you should change all the new GUC variables. Please *don't* create any more mixed-case GUC names. The spelling of TimeZone and the one or two other historical exceptions was a very unfortunate thing; it's confused more people than it's helped. Put in some underscores if you feel a need for word boundaries. regards, tom lane
On Tue, Nov 19, 2013 at 3:32 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-11-19 22:09:48 +0900, Michael Paquier wrote: >> On Tue, Nov 19, 2013 at 2:27 AM, Andres Freund <andres@2ndquadrant.com> wrote: > >> > * I am not sure I like "recovery.trigger" as a name. It seems to close >> > to what I've seen people use to trigger failover and too close to >> > trigger_file. > >> This name was chosen and kept in accordance to the spec of this >> feature. Looks fine for me... > > I still think "start_as_standby.trigger" or such would be much clearer > and far less likely to be confused with the promotion trigger file. > the function of the file is to inform the server it's in recovery and it needs to consider recovery parameters, not to make the server a standby. yes, i admit that is half the way to make the server a standby. for example, if you are doing PITR and stopping the server before some specific point (recovery_target_*) then "start_as_standby.trigger" will has no meaning and could confuse people >> > * You made recovery_target_inclusive/pause_at_recovery_target PGC_SIGHUP >> > - did you review that actually works? Imo that should be changed in a >> > separate commit. >> >> Yep, I'd rather see those parameters as PGC_POSTMASTER... As of now, >> once recovery is started those parameter values do not change once >> readRecoveryCommandFile is kicked. Having them SIGHUP would mean that >> you could change them during recovery. Sounds kind of dangerous, no? > > I think it's desirable to make them changeable during recovery, but it's > a separate patch. > uh! i read the patch and miss that. will change > >> > * Why did you change some of the recovery gucs to lowercase names, but >> > left out XLogRestoreCommand? >> >> This was part of the former patch, perhaps you are right and keeping >> the names as close as possible to the old ones would make sense to >> facilitate maintenance across versions. > > I think lowercase is slightly more consistent with the majority of the > other GUCs, but if you change it you should change all the new GUC variables. > This one was my change, in the original patch is called "restore_command" and i changed it because archive_command's parameter is called XLogArchiveCommand so i wanted the name to follow the same format. i can return it to the original name if no one likes XLogRestoreCommand -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
On 2013-11-19 22:24:19 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-11-19 22:09:48 +0900, Michael Paquier wrote: > >> On Tue, Nov 19, 2013 at 2:27 AM, Andres Freund <andres@2ndquadrant.com> wrote: > >>> * Why did you change some of the recovery gucs to lowercase names, but > >>> left out XLogRestoreCommand? > > >> This was part of the former patch, perhaps you are right and keeping > >> the names as close as possible to the old ones would make sense to > >> facilitate maintenance across versions. > > > I think lowercase is slightly more consistent with the majority of the > > other GUCs, but if you change it you should change all the new GUC variables. > > Please *don't* create any more mixed-case GUC names. The spelling of > TimeZone and the one or two other historical exceptions was a very > unfortunate thing; it's confused more people than it's helped. > Put in some underscores if you feel a need for word boundaries. That's a misunderstanding - I was only talking about the variables below the GUCs, no the GUC's name. The patch changed quite some variable names, around, but left others leaving an inconsistent casing of related variables... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-11-20 08:10:44 -0500, Jaime Casanova wrote: > On Tue, Nov 19, 2013 at 3:32 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2013-11-19 22:09:48 +0900, Michael Paquier wrote: > >> On Tue, Nov 19, 2013 at 2:27 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > > >> > * I am not sure I like "recovery.trigger" as a name. It seems to close > >> > to what I've seen people use to trigger failover and too close to > >> > trigger_file. > > > >> This name was chosen and kept in accordance to the spec of this > >> feature. Looks fine for me... > > > > I still think "start_as_standby.trigger" or such would be much clearer > > and far less likely to be confused with the promotion trigger file. > > > > the function of the file is to inform the server it's in recovery and > it needs to consider recovery parameters, not to make the server a > standby. yes, i admit that is half the way to make the server a > standby. for example, if you are doing PITR and stopping the server > before some specific point (recovery_target_*) then > "start_as_standby.trigger" will has no meaning and could confuse > people 'recovery' includes crash recovery, that's why I quite dislike your function name since it's not crash recovery you're checking for since during that we certainly do not want to interpet those parameters. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Nov 18, 2013 at 12:27 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Hi, > > On 2013-11-15 22:38:05 -0500, Jaime Casanova wrote: >> sorry, i clearly misunderstood you. attached a rebased patch with >> those functions removed. > > * --write-standby-enable seems to loose quite some functionality in > comparison to --write-recovery-conf since it doesn't seem to set > primary_conninfo, standby anymore. we can add code that looks for postgresql.conf in $PGDATA but if postgresql.conf is not there (like the case in debian, there is not much we can do about it) or we can write a file ready to be included in postgresql.conf, any sugestion? > * CheckRecoveryReadyFile() doesn't seem to be a very descriptive > function name. I left it as CheckStartingAsStandby() but i still have a problem of this not being completely clear. this function is useful for standby or pitr. which leads me to the other problem i have: the recovery trigger file, i have left it as standby.enabled but i still don't like it. other options for the recovery trigger file: recovery.trigger (Andres objected on this name) forced_recovery.trigger user_forced_recovery.trigger or just allow standby.enabled and pitr.enabled. One advantage of this is that if you put pitr.enabled you can check that standby_mode is off. the bad part on this approach is that it can end being any_number_of_features.enabled, but i don't think that will happen > * Why does StartupXLOG() now use ArchiveRecoveryRequested && > StandbyModeRequested instead of just the former? good question. anyway, this patch shouldn't change that. if that should be changed that is probably a bug and deserves to be in its own patch > * I am not sure I like "recovery.trigger" as a name. It seems to close > to what I've seen people use to trigger failover and too close to > trigger_file. look above > * You check for a trigger file in a way that's not compatible with it > being NULL. Why did you change that? > - if (TriggerFile == NULL) > + if (!trigger_file[0]) returned to the old way of checking it > * You made recovery_target_inclusive/pause_at_recovery_target PGC_SIGHUP > - did you review that actually works? Imo that should be changed in a > separate commit. agreed. i left all parameters that were in recovery.conf as PGC_POSTMASTER and will start a new thread about which ones we should change > * Maybe we should rename names like pause_at_recovery_target to > recovery_pause_at_target? Since we already force everyone to bother > changing their setup... i don't have a problem with this, anyone else? if no one speaks i will do what Andres suggests > * the description of archive_cleanup_command seems wrong to me. why? it seems to be the same that was in recovery.conf. where did you see the description you're complaining at? > * Why did you change some of the recovery gucs to lowercase names, but > left out XLogRestoreCommand? my bad, left it as it was in the original patch: restore_command > * Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being > really strangely formatted (multiline :? inside a function?) it > doesn't a) seem to be correct to ignore potential memory allocation > errors by just switching back into the context that just errored out, > and continue to work there b) forgo cleanup by just continuing as if > nothing happened. That's unlikely to be acceptable. the code that read recovery.conf didn't has that, so i just removed it > * You access recovery_target_name[0] unconditionally, although it's > intialized to NULL. > * Generally, ISTM there's too much logic in the assign hooks. probably that is mood now, because the comment that Heikki put in commit 815d71deed5df2a91b06da76edbe5bc64965bfea says "If multiple recovery_targets are specified, use the latest one." so now we should just use the last one setted. Heikki? Any opinions? > * Why do you include xlog_internal.h in guc.c and not xlog.h? > we actually need both but including xlog_internal.h also includes xlog.h i added xlog.h and if someone things is enough only putting xlog_internal.h let me know thank you -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
Attachment
On Wed, Jan 15, 2014 at 2:00 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote: > On Mon, Nov 18, 2013 at 12:27 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> * Maybe we should rename names like pause_at_recovery_target to >> recovery_pause_at_target? Since we already force everyone to bother >> changing their setup... > > i don't have a problem with this, anyone else? if no one speaks i will > do what Andres suggests > Actually Michael had objected to this idea but i forgot about that... so i will wait for some consensus (my personal opinion is that Michael's argument is a good one) -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
On Wed, Jan 15, 2014 at 2:06 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote: > On Wed, Jan 15, 2014 at 2:00 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote: >> On Mon, Nov 18, 2013 at 12:27 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> >>> * Maybe we should rename names like pause_at_recovery_target to >>> recovery_pause_at_target? Since we already force everyone to bother >>> changing their setup... >> >> i don't have a problem with this, anyone else? if no one speaks i will >> do what Andres suggests >> > > Actually Michael had objected to this idea but i forgot about that... > so i will wait for some consensus (my personal opinion is that > Michael's argument is a good one) I prefer the current name. It's more like the way English is spoken. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2014-01-15 02:00:51 -0500, Jaime Casanova wrote: > On Mon, Nov 18, 2013 at 12:27 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > Hi, > > > > On 2013-11-15 22:38:05 -0500, Jaime Casanova wrote: > >> sorry, i clearly misunderstood you. attached a rebased patch with > >> those functions removed. > > > > * --write-standby-enable seems to loose quite some functionality in > > comparison to --write-recovery-conf since it doesn't seem to set > > primary_conninfo, standby anymore. > > we can add code that looks for postgresql.conf in $PGDATA but if > postgresql.conf is not there (like the case in debian, there is not > much we can do about it) or we can write a file ready to be included > in postgresql.conf, any sugestion? People might not like me for the suggestion, but I think we should simply always include a 'recovery.conf' in $PGDATA unconditionally. That'd make this easier. Alternatively we could pass a filename to --write-recovery-conf. > > * CheckRecoveryReadyFile() doesn't seem to be a very descriptive > > function name. > > I left it as CheckStartingAsStandby() but i still have a problem of > this not being completely clear. this function is useful for standby > or pitr. > which leads me to the other problem i have: the recovery trigger file, > i have left it as standby.enabled but i still don't like it. > recovery.trigger (Andres objected on this name) > forced_recovery.trigger > user_forced_recovery.trigger stay_in_recovery.trigger? That'd be pretty clear for anybody involved in pg, but otherwise... > > * the description of archive_cleanup_command seems wrong to me. > > why? it seems to be the same that was in recovery.conf. where did you > see the description you're complaining at? I dislike the description in guc.c > + {"archive_cleanup_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, > + gettext_noop("Sets the shell command that will be executed at every restartpoint."), > + NULL > + }, > + &archive_cleanup_command, previously it was: > -# specifies an optional shell command to execute at every restartpoint. > -# This can be useful for cleaning up the archive of a standby server. > > * Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being > > really strangely formatted (multiline :? inside a function?) it > > doesn't a) seem to be correct to ignore potential memory allocation > > errors by just switching back into the context that just errored out, > > and continue to work there b) forgo cleanup by just continuing as if > > nothing happened. That's unlikely to be acceptable. > > the code that read recovery.conf didn't has that, so i just removed it Well, that's not necessarily correct. recovery.conf was only read during startup, while this is read on SIGHUP. > > * Why do you include xlog_internal.h in guc.c and not xlog.h? > we actually need both but including xlog_internal.h also includes xlog.h > i added xlog.h and if someone things is enough only putting > xlog_internal.h let me know What's required from xlog_internal.h? > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > index b53ae87..54f6a0d 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -64,11 +64,12 @@ > extern uint32 bootstrap_data_checksum_version; > > /* File path names (all relative to $PGDATA) */ > -#define RECOVERY_COMMAND_FILE "recovery.conf" > -#define RECOVERY_COMMAND_DONE "recovery.done" > +#define RECOVERY_ENABLE_FILE "standby.enabled" Imo the file and variable names should stay coherent. > +/* recovery.conf is not supported anymore */ > +#define RECOVERY_COMMAND_FILE "recovery.conf" > +bool StandbyModeRequested = false; > +static TimestampTz recoveryDelayUntilTime; This imo should be lowercase now, the majority of GUC variables are. > +/* are we currently in standby mode? */ > +bool StandbyMode = false; Why did you move this? > - if (rtliGiven) > + if (strcmp(recovery_target_timeline_string, "") != 0) > { Why not have the convention that NULL indicates a unset target_timeline like you use for other GUCs? Mixing things like this is confusing. Why is recovery_target_timeline stored as a string? Because it's a unsigned int? If so, you should have an assign hook setting up a) rtliGiven, b) properly typed variable. > - if (rtli) > + if (recovery_target_timeline) > { > /* Timeline 1 does not have a history file, all else should */ > - if (rtli != 1 && !existsTimeLineHistory(rtli)) > + if (recovery_target_timeline != 1 && > + !existsTimeLineHistory(recovery_target_timeline)) > ereport(FATAL, > (errmsg("recovery target timeline %u does not exist", > - rtli))); > - recoveryTargetTLI = rtli; > + recovery_target_timeline))); > + recoveryTargetTLI = recovery_target_timeline; > recoveryTargetIsLatest = false; So, now we have a recoveryTargetTLI and recovery_target_timeline variable? Really? Why do we need recoveryTargetTLI at all now? > +static void > +assign_recovery_target_xid(const char *newval, void *extra) > +{ > + recovery_target_xid = *((TransactionId *) extra); > + > + if (recovery_target_xid != InvalidTransactionId) > + recovery_target = RECOVERY_TARGET_XID; > + else if (recovery_target_name[0]) > + recovery_target = RECOVERY_TARGET_NAME; > + else if (recovery_target_time != 0) > + recovery_target = RECOVERY_TARGET_TIME; > + else > + recovery_target = RECOVERY_TARGET_UNSET; > +} > +static void > +assign_recovery_target_time(const char *newval, void *extra) > +{ > + recovery_target_time = *((TimestampTz *) extra); > + > + if (recovery_target_xid != InvalidTransactionId) > + recovery_target = RECOVERY_TARGET_XID; > + else if (recovery_target_name[0]) > + recovery_target = RECOVERY_TARGET_NAME; > + else if (recovery_target_time != 0) > + recovery_target = RECOVERY_TARGET_TIME; > + else > + recovery_target = RECOVERY_TARGET_UNSET; > +} > + I don't think it's correct to do such hangups in the assign hook - you have no ideas in which order they will be called and such. Imo that should happen at startup, like we also do it for other interdependent variables like wal_buffers. > +static bool > +check_recovery_target_time(char **newval, void **extra, GucSource source) > +{ > + TimestampTz time; > + TimestampTz *myextra; > + > + if (strcmp(*newval, "") == 0) > + time = 0; > + else > + time = DatumGetTimestampTz(DirectFunctionCall3(timestamptz_in, > + CStringGetDatum(*newval), > + ObjectIdGetDatum(InvalidOid), > + Int32GetDatum(-1))); > + > + myextra = (TimestampTz *) guc_malloc(ERROR, sizeof(TimestampTz)); > + *myextra = time; > + *extra = (void *) myextra; > + > + return true; > +} Trailing space behind the strcmp(). I don't think that's correct. Afaics it will cause the postmaster to crash on a SIGHUP with invalid data. I think you're unfortunately going to have to copy a fair bit of timestamptz_in() and even DateTimeParseError(), replacing the ereport()s by the guc error reporting. Greetings, Andres Freund
This patch is in "Waiting for Author" for a couple of weeks and has received a review at least from Andres during this commit fest. As the situation is not much progressing, I am going to mark it as "Returned with feedback". If there are any problems with that please let me know. Thanks, -- Michael
All, Can we get this patch going again for 9.5? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Hi, On 2014-06-04 16:32:33 -0700, Josh Berkus wrote: > Can we get this patch going again for 9.5? A patch gets going by somebody working on it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 06/04/2014 04:35 PM, Andres Freund wrote: > > Hi, > > On 2014-06-04 16:32:33 -0700, Josh Berkus wrote: >> Can we get this patch going again for 9.5? > > A patch gets going by somebody working on it. Well yes, but it is also great to have someone remind others that it is of interest. JD > > Greetings, > > Andres Freund > -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc Political Correctness is for cowards.
Hello, Here's an attempt to revive this patch. It is rebased onto the latest master and also includes handling and documentation of newly added recovery.conf parameters such as primary_slot_name, recovery_min_apply_delay and recovery_target='immediate'. The following feedback had been addressed: Andres Freund <andres@2ndquadrant.com> writes: >> > >> > * --write-standby-enable seems to loose quite some functionality in >> > comparison to --write-recovery-conf since it doesn't seem to set >> > primary_conninfo, standby anymore. >> >> we can add code that looks for postgresql.conf in $PGDATA but if >> postgresql.conf is not there (like the case in debian, there is not >> much we can do about it) or we can write a file ready to be included >> in postgresql.conf, any sugestion? > > People might not like me for the suggestion, but I think we should > simply always include a 'recovery.conf' in $PGDATA > unconditionally. That'd make this easier. > Alternatively we could pass a filename to --write-recovery-conf. Well, the latest version of this patch fails to start when it sees 'recovery.conf' in PGDATA: FATAL: "recovery.conf" is not supported anymore as a recovery method DETAIL: Refer to appropriate documentation about migration methods I've missed all the discussion behind this decision and after reading the ALTER SYSTEM/conf.d thread I'm even more confused so I'd like someone more knowledgeable to speak up on the status of this. Do we want to keep this behavior of the patch? >> > * CheckRecoveryReadyFile() doesn't seem to be a very descriptive >> > function name. >> >> I left it as CheckStartingAsStandby() but i still have a problem of >> this not being completely clear. this function is useful for standby >> or pitr. There's not much left for this function in the current patch version, so maybe we should just move it to StartupXLOG (it's not called from anywhere else either way). >> which leads me to the other problem i have: the recovery trigger file, >> i have left it as standby.enabled but i still don't like it. > >> recovery.trigger (Andres objected on this name) >> forced_recovery.trigger >> user_forced_recovery.trigger > > stay_in_recovery.trigger? That'd be pretty clear for anybody involved in > pg, but otherwise... The docs use the term "continuous recovery". Either way, from the code it is clear that we only stay in recovery if standby_mode is directly turned on. This makes the whole check for a specially named file unnecessary, IMO: we should just check the value of standby_mode (which is off by default). By the way, is there any use in setting standby_mode=on and any of the recovery_target* GUCs at the same time? I think it can only play together if you set the target farther than the latest point you've got in the archive locally. So that's sort of "Point-in-Future-Recovery". Does that make any sense at all? >> > * the description of archive_cleanup_command seems wrong to me. >> >> why? it seems to be the same that was in recovery.conf. where did you >> see the description you're complaining at? > > I dislike the description in guc.c > >> + {"archive_cleanup_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, >> + gettext_noop("Sets the shell command that will be executed at every restartpoint."), >> + NULL >> + }, >> + &archive_cleanup_command, > > previously it was: > >> -# specifies an optional shell command to execute at every restartpoint. >> -# This can be useful for cleaning up the archive of a standby server. Expanded the GUC desc. >> > * Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being >> > really strangely formatted (multiline :? inside a function?) it >> > doesn't a) seem to be correct to ignore potential memory allocation >> > errors by just switching back into the context that just errored out, >> > and continue to work there b) forgo cleanup by just continuing as if >> > nothing happened. That's unlikely to be acceptable. >> >> the code that read recovery.conf didn't has that, so i just removed it > > Well, that's not necessarily correct. recovery.conf was only read during > startup, while this is read on SIGHUP. > [copied from the bottom, related] > > I don't think that's correct. Afaics it will cause the postmaster to > crash on a SIGHUP with invalid data. I think you're unfortunately going > to have to copy a fair bit of timestamptz_in() and even > DateTimeParseError(), replacing the ereport()s by the guc error > reporting. The use of PG_TRY/CATCH does protect from FATALs in SIGHUP indeed. Using CopyErrorData() we can also fetch the actual error message from timestamptz_in, though I wonder we really have to make a full copy. >> > * Why do you include xlog_internal.h in guc.c and not xlog.h? > >> we actually need both but including xlog_internal.h also includes xlog.h >> i added xlog.h and if someone things is enough only putting >> xlog_internal.h let me know > > What's required from xlog_internal.h? Looks like this was addressed before me. >> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c >> index b53ae87..54f6a0d 100644 >> --- a/src/backend/access/transam/xlog.c >> +++ b/src/backend/access/transam/xlog.c >> @@ -64,11 +64,12 @@ >> extern uint32 bootstrap_data_checksum_version; >> >> /* File path names (all relative to $PGDATA) */ >> -#define RECOVERY_COMMAND_FILE "recovery.conf" >> -#define RECOVERY_COMMAND_DONE "recovery.done" >> +#define RECOVERY_ENABLE_FILE "standby.enabled" > > Imo the file and variable names should stay coherent. Yes, once we settle on the name (and if we really need that extra trigger file.) >> +/* recovery.conf is not supported anymore */ >> +#define RECOVERY_COMMAND_FILE "recovery.conf" > >> +bool StandbyModeRequested = false; >> +static TimestampTz recoveryDelayUntilTime; > > This imo should be lowercase now, the majority of GUC variables are. This is not a GUC variable, though it's calculated based on a GUC recovery_min_apply_delay. >> +/* are we currently in standby mode? */ >> +bool StandbyMode = false; > > Why did you move this? It was easy to move it back though. >> - if (rtliGiven) >> + if (strcmp(recovery_target_timeline_string, "") != 0) >> { > > Why not have the convention that NULL indicates a unset target_timeline > like you use for other GUCs? Mixing things like this is confusing. > > Why is recovery_target_timeline stored as a string? Because it's a > unsigned int? If so, you should have an assign hook setting up a) > rtliGiven, b) properly typed variable. Yes, I believe setting these to NULL by default makes a lot more sense. Then one can check if there was a non-default setting by checking the *_string variable, which is not possible with int or TimestampTz. >> - if (rtli) >> + if (recovery_target_timeline) >> { >> /* Timeline 1 does not have a history file, all else should */ >> - if (rtli != 1 && !existsTimeLineHistory(rtli)) >> + if (recovery_target_timeline != 1 && >> + !existsTimeLineHistory(recovery_target_timeline)) >> ereport(FATAL, >> (errmsg("recovery target timeline %u does not exist", >> - rtli))); >> - recoveryTargetTLI = rtli; >> + recovery_target_timeline))); >> + recoveryTargetTLI = recovery_target_timeline; >> recoveryTargetIsLatest = false; > > So, now we have a recoveryTargetTLI and recovery_target_timeline > variable? Really? Why do we need recoveryTargetTLI at all now? Looks like we still do need both of them. The initial value of recoveryTargetTLI is derived from pg_control, but later it can be overriden by this setting. However, if the recovery_target_timeline was "latest" we need to find the latest TLI, based on the initial value from pg_control. But we can't set final recoveryTargetTLI value in the GUC assign hook, because we might need to fetch some history files first. >> +static void >> +assign_recovery_target_time(const char *newval, void *extra) >> +{ >> + recovery_target_time = *((TimestampTz *) extra); >> + >> + if (recovery_target_xid != InvalidTransactionId) >> + recovery_target = RECOVERY_TARGET_XID; >> + else if (recovery_target_name[0]) >> + recovery_target = RECOVERY_TARGET_NAME; >> + else if (recovery_target_time != 0) >> + recovery_target = RECOVERY_TARGET_TIME; >> + else >> + recovery_target = RECOVERY_TARGET_UNSET; >> +} >> + > > I don't think it's correct to do such hangups in the assign hook - you > have no ideas in which order they will be called and such. Imo that > should happen at startup, like we also do it for other interdependent > variables like wal_buffers. Yeah, that looked weird. Moved to StartupXLOG(). I disliked the strtoul handling in the earlier version of the patch, especially given that with the base=0 it can parse 0x-prefixed hex strings. I would rather error out on non-hex digit instead of stopping and calling it OK. This change is included in the new version. Should we really allow specifying negative values for XID/timeline? Right now it will happily consume "-1" for recovery_target_xid and complain if it's out of range, like this: LOG: starting point-in-time recovery to XID 4294967295 LOG: invalid primary checkpoint record LOG: invalid secondary checkpoint record Allowing negative values makes even less sense for timelines, IMO. -- Alex
Attachment
On 11/21/2014 09:35 AM, Alex Shulgin wrote: > Hello, > > Here's an attempt to revive this patch. Yayy! Thank you. >> People might not like me for the suggestion, but I think we should >> simply always include a 'recovery.conf' in $PGDATA >> unconditionally. That'd make this easier. >> Alternatively we could pass a filename to --write-recovery-conf. > > Well, the latest version of this patch fails to start when it sees > 'recovery.conf' in PGDATA: > > FATAL: "recovery.conf" is not supported anymore as a recovery method > DETAIL: Refer to appropriate documentation about migration methods > > I've missed all the discussion behind this decision and after reading > the ALTER SYSTEM/conf.d thread I'm even more confused so I'd like > someone more knowledgeable to speak up on the status of this. The argument was that people wanted to be able to have an empty recovery.conf file as a "we're in standby" trigger so that they could preserve backwards compatibility with external tools. I don't agree with this argument, but several people championed it. > The docs use the term "continuous recovery". > > Either way, from the code it is clear that we only stay in recovery if > standby_mode is directly turned on. This makes the whole check for a > specially named file unnecessary, IMO: we should just check the value of > standby_mode (which is off by default). So, what happens when someone does "pg_ctl promote"? Somehow standby_mode needs to get set to "off". Maybe we write "standby_mode = off" to postgresql.auto.conf? > By the way, is there any use in setting standby_mode=on and any of the > recovery_target* GUCs at the same time? See my thread on this topic from last week. Short answer: No. > I think it can only play together if you set the target farther than the > latest point you've got in the archive locally. So that's sort of > "Point-in-Future-Recovery". Does that make any sense at all? Again, see the thread. This doesn't work in a useful way, so there's no reason to write code to enable it. >>> /* File path names (all relative to $PGDATA) */ >>> -#define RECOVERY_COMMAND_FILE "recovery.conf" >>> -#define RECOVERY_COMMAND_DONE "recovery.done" >>> +#define RECOVERY_ENABLE_FILE "standby.enabled" >> >> Imo the file and variable names should stay coherent. > > Yes, once we settle on the name (and if we really need that extra > trigger file.) Personally, if we have three methods of promotion: 1) pg_ctl promote 2) edit postgresql.conf and reload 3) ALTER SYSTEM SET and reload ... I don't honestly think we need a 4th method for promotion. The main reason to want a "we're in recovery file" is for PITR rather than for replication, where it has a number of advantages as a method, the main one being that recovery.conf is unlikely to be overwritten by the contents of the backup. HOWEVER, there's a clear out for this with conf.d. If we enable conf.d by default, then we can simply put recovery settings in conf.d, and since that specific file (which can have whatever name the user chooses) will not be part of backups, it would have the same advantage as recovery.conf, without the drawbacks. Discuss? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
* Josh Berkus (josh@agliodbs.com) wrote: > > Either way, from the code it is clear that we only stay in recovery if > > standby_mode is directly turned on. This makes the whole check for a > > specially named file unnecessary, IMO: we should just check the value of > > standby_mode (which is off by default). > > So, what happens when someone does "pg_ctl promote"? Somehow > standby_mode needs to get set to "off". Maybe we write "standby_mode = > off" to postgresql.auto.conf? Uhh, and then not work for the sane folks who disable postgresql.auto.conf? No thanks.. > HOWEVER, there's a clear out for this with conf.d. If we enable conf.d > by default, then we can simply put recovery settings in conf.d, and > since that specific file (which can have whatever name the user chooses) > will not be part of backups, it would have the same advantage as > recovery.conf, without the drawbacks. conf.d is a possibility, but there may be environments where the postgres users doesn't have access to write into the conf.d directrory.. Not sure if that'd be an issue for what you're suggesting but wanted to point it out. Thanks, Stephen
On 11/21/2014 10:54 AM, Stephen Frost wrote: > * Josh Berkus (josh@agliodbs.com) wrote: >>> Either way, from the code it is clear that we only stay in recovery if >>> standby_mode is directly turned on. This makes the whole check for a >>> specially named file unnecessary, IMO: we should just check the value of >>> standby_mode (which is off by default). >> >> So, what happens when someone does "pg_ctl promote"? Somehow >> standby_mode needs to get set to "off". Maybe we write "standby_mode = >> off" to postgresql.auto.conf? > > Uhh, and then not work for the sane folks who disable > postgresql.auto.conf? No thanks.. Other ideas then, without reverting to the old system? Being able to promote servers over port 5432 will be a huge advantage for container-based systems, so I don't want to give that up as a feature. >> HOWEVER, there's a clear out for this with conf.d. If we enable conf.d >> by default, then we can simply put recovery settings in conf.d, and >> since that specific file (which can have whatever name the user chooses) >> will not be part of backups, it would have the same advantage as >> recovery.conf, without the drawbacks. > > conf.d is a possibility, but there may be environments where the > postgres users doesn't have access to write into the conf.d directrory.. > Not sure if that'd be an issue for what you're suggesting but wanted to > point it out. It's not a real issue for any use-case I'm currently dealing with. Would this approach break things for other people? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Sat, Nov 22, 2014 at 12:24 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> * Josh Berkus (josh@agliodbs.com) wrote:
> > > Either way, from the code it is clear that we only stay in recovery if
> > > standby_mode is directly turned on. This makes the whole check for a
> > > specially named file unnecessary, IMO: we should just check the value of
> > > standby_mode (which is off by default).
> >
> > So, what happens when someone does "pg_ctl promote"? Somehow
> > standby_mode needs to get set to "off". Maybe we write "standby_mode =
> > off" to postgresql.auto.conf?
>
> Uhh, and then not work for the sane folks who disable
> postgresql.auto.conf? No thanks..
>
What exactly you mean by 'disable postgresql.auto.conf', do you
>
> * Josh Berkus (josh@agliodbs.com) wrote:
> > > Either way, from the code it is clear that we only stay in recovery if
> > > standby_mode is directly turned on. This makes the whole check for a
> > > specially named file unnecessary, IMO: we should just check the value of
> > > standby_mode (which is off by default).
> >
> > So, what happens when someone does "pg_ctl promote"? Somehow
> > standby_mode needs to get set to "off". Maybe we write "standby_mode =
> > off" to postgresql.auto.conf?
>
> Uhh, and then not work for the sane folks who disable
> postgresql.auto.conf? No thanks..
>
What exactly you mean by 'disable postgresql.auto.conf', do you
mean user runs Alter System to remove that entry or manually disable
some particular entry?
By default postgresql.auto.conf is always read, so if there is any
setting present in that file, that will taken into consideration.
On 2014-11-21 10:59:23 -0800, Josh Berkus wrote: > On 11/21/2014 10:54 AM, Stephen Frost wrote: > > * Josh Berkus (josh@agliodbs.com) wrote: > >>> Either way, from the code it is clear that we only stay in recovery if > >>> standby_mode is directly turned on. This makes the whole check for a > >>> specially named file unnecessary, IMO: we should just check the value of > >>> standby_mode (which is off by default). > >> > >> So, what happens when someone does "pg_ctl promote"? Somehow > >> standby_mode needs to get set to "off". Maybe we write "standby_mode = > >> off" to postgresql.auto.conf? > > > > Uhh, and then not work for the sane folks who disable > > postgresql.auto.conf? No thanks.. > > Other ideas then, without reverting to the old system? Being able to > promote servers over port 5432 will be a huge advantage for > container-based systems, so I don't want to give that up as a feature. Why is a trigger file making that impossible? Adding the code from pg_ctl promote into a SQL callable function is a couple minutes worth of work? A guc alone won't work very well - standby_mode is checked in specific places, you can't just turn that off and expect that that'd result in speedy promotion. And it'd break people using scripts pg_standby. And it also has other effects. So, no. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Alex Shulgin <ash@commandprompt.com> writes: > >>> > * Why do you include xlog_internal.h in guc.c and not xlog.h? >> >>> we actually need both but including xlog_internal.h also includes xlog.h >>> i added xlog.h and if someone things is enough only putting >>> xlog_internal.h let me know >> >> What's required from xlog_internal.h? > > Looks like this was addressed before me. Actually, it's there to bring in MAXFNAMELEN which is used in check_recovery_target_name. Not sure how it even compiled without that, but after some branch switching it doesn't anymore. :-p Maybe we should move these check/assign hooks to xlog.c, but that's likely going to create header files dependency problem due to use of GucSource in the hook prototype... -- Alex
Josh Berkus <josh@agliodbs.com> writes: >> >> Well, the latest version of this patch fails to start when it sees >> 'recovery.conf' in PGDATA: >> >> FATAL: "recovery.conf" is not supported anymore as a recovery method >> DETAIL: Refer to appropriate documentation about migration methods >> >> I've missed all the discussion behind this decision and after reading >> the ALTER SYSTEM/conf.d thread I'm even more confused so I'd like >> someone more knowledgeable to speak up on the status of this. > > The argument was that people wanted to be able to have an empty > recovery.conf file as a "we're in standby" trigger so that they could > preserve backwards compatibility with external tools. I don't agree > with this argument, but several people championed it. It doesn't look like we can provide for 100% backwards compatibility with existing tools, here's why: a) The only sensible way to use recovery.conf as GUC source is via include/include_dir from postgresql.conf. b) The server used to rename $PGDATA/recovery.conf to $PGDATA/recovery.done so when it is re-started it doesn't accidentallystart in recovery mode. We can't keep doing (b) because that will make postgres fail to start on a missing include. Well, it won't error out if we place the file under include_dir, as only files with the ".conf" suffix are included, but then again the server wouldn't know which file to rename unless we tell it or hard-code the the filename. Either way this is not 100% backwards compatible. Now the question is if we are going to break compatibility anyway: should we try to minimize the difference or will it disserve the user by providing a false sense of compatibility? For one, if we're breaking things, the trigger_file GUC should be renamed to end_recovery_trigger_file or something like that, IMO. That's if we decide to keep it at all. >> The docs use the term "continuous recovery". >> >> Either way, from the code it is clear that we only stay in recovery if >> standby_mode is directly turned on. This makes the whole check for a >> specially named file unnecessary, IMO: we should just check the value of >> standby_mode (which is off by default). > > So, what happens when someone does "pg_ctl promote"? Somehow > standby_mode needs to get set to "off". Maybe we write "standby_mode = > off" to postgresql.auto.conf? Well, standby_mode is only consulted at startup after crash recovery. But suddenly, a tool that *writes* recovery.conf needs to also consult/edit postgresql.auto.conf... Maybe that's inevitable, but still a point to consider. >> By the way, is there any use in setting standby_mode=on and any of the >> recovery_target* GUCs at the same time? > > See my thread on this topic from last week. Short answer: No. > >> I think it can only play together if you set the target farther than the >> latest point you've got in the archive locally. So that's sort of >> "Point-in-Future-Recovery". Does that make any sense at all? > > Again, see the thread. This doesn't work in a useful way, so there's no > reason to write code to enable it. Makes sense. Should we incorporate the actual tech and doc fix in this patch? >>>> /* File path names (all relative to $PGDATA) */ >>>> -#define RECOVERY_COMMAND_FILE "recovery.conf" >>>> -#define RECOVERY_COMMAND_DONE "recovery.done" >>>> +#define RECOVERY_ENABLE_FILE "standby.enabled" >>> >>> Imo the file and variable names should stay coherent. >> >> Yes, once we settle on the name (and if we really need that extra >> trigger file.) > > Personally, if we have three methods of promotion: > > 1) pg_ctl promote > 2) edit postgresql.conf and reload > 3) ALTER SYSTEM SET and reload > > ... I don't honestly think we need a 4th method for promotion. Me neither. It is tempting to make everything spin around GUCs without the need for any external trigger files. > The main reason to want a "we're in recovery file" is for PITR rather > than for replication, where it has a number of advantages as a method, > the main one being that recovery.conf is unlikely to be overwritten by > the contents of the backup. > > HOWEVER, there's a clear out for this with conf.d. If we enable conf.d > by default, then we can simply put recovery settings in conf.d, and > since that specific file (which can have whatever name the user chooses) > will not be part of backups, it would have the same advantage as > recovery.conf, without the drawbacks. > > Discuss? Well, I don't readily see how conf.d is special with regard to base backup, wouldn't you need to exclude it explicitly still? -- Alex
* Amit Kapila (amit.kapila16@gmail.com) wrote: > What exactly you mean by 'disable postgresql.auto.conf', do you > mean user runs Alter System to remove that entry or manually disable > some particular entry? Last I paid attention to this, there was a clear way to disable the inclusion of postgresql.auto.conf in the postgresql.conf. If that's gone, then there is a serious problem. Administrators who manage their postgresql.conf (eg: through a CM system like puppet or chef..) must have a way to prevent other changes. Thanks, Stephen
Stephen Frost wrote: > * Amit Kapila (amit.kapila16@gmail.com) wrote: > > What exactly you mean by 'disable postgresql.auto.conf', do you > > mean user runs Alter System to remove that entry or manually disable > > some particular entry? > > Last I paid attention to this, there was a clear way to disable the > inclusion of postgresql.auto.conf in the postgresql.conf. If that's > gone, then there is a serious problem. Administrators who manage their > postgresql.conf (eg: through a CM system like puppet or chef..) must > have a way to prevent other changes. Sigh, here we go again. I don't think you can disable postgresql.auto.conf in the current code. As I recall, the argument was that it's harder to diagnose problems if postgresql.auto.conf takes effect in some system but not others. I think if you want puppet or chef etc you'd add postgresql.auto.conf as a config file in those systems, so that ALTER SYSTEM is reflected there. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > Stephen Frost wrote: > > * Amit Kapila (amit.kapila16@gmail.com) wrote: > > > What exactly you mean by 'disable postgresql.auto.conf', do you > > > mean user runs Alter System to remove that entry or manually disable > > > some particular entry? > > > > Last I paid attention to this, there was a clear way to disable the > > inclusion of postgresql.auto.conf in the postgresql.conf. If that's > > gone, then there is a serious problem. Administrators who manage their > > postgresql.conf (eg: through a CM system like puppet or chef..) must > > have a way to prevent other changes. > > Sigh, here we go again. I don't think you can disable postgresql.auto.conf > in the current code. As I recall, the argument was that it's harder to > diagnose problems if postgresql.auto.conf takes effect in some system > but not others. I don't buy this at all. What's going to be terribly confusing is to have config changes start happening for users who are managing their configs through a CM (which most should be..). ALTER SYSTEM is going to cause more problems than it solves. > I think if you want puppet or chef etc you'd add postgresql.auto.conf as > a config file in those systems, so that ALTER SYSTEM is reflected there. That's really a horrible, horrible answer. The DBA makes some change and then reloads remotely, only to have puppet or chef come along and change it back later? Talk about a recipe for disaster. The only reason I stopped worrying about the foolishness of ALTER SYSTEM was because it could be disabled. I'm very disappointed to hear that someone saw fit to remove that. I'll also predict that it'll be going back in for 9.5. Thanks, Stephen
Stephen Frost wrote: > * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > > > Last I paid attention to this, there was a clear way to disable the > > > inclusion of postgresql.auto.conf in the postgresql.conf. If that's > > > gone, then there is a serious problem. Administrators who manage their > > > postgresql.conf (eg: through a CM system like puppet or chef..) must > > > have a way to prevent other changes. > > > > Sigh, here we go again. I don't think you can disable postgresql.auto.conf > > in the current code. As I recall, the argument was that it's harder to > > diagnose problems if postgresql.auto.conf takes effect in some system > > but not others. > > I don't buy this at all. What's going to be terribly confusing is to > have config changes start happening for users who are managing their > configs through a CM (which most should be..). ALTER SYSTEM is going to > cause more problems than it solves. I guess if you have two DBAs who don't talk to each other, and one changes things through puppet and another through ALTER SYSTEM, it's going to be confusing, yes. > > I think if you want puppet or chef etc you'd add postgresql.auto.conf as > > a config file in those systems, so that ALTER SYSTEM is reflected there. > > That's really a horrible, horrible answer. The DBA makes some change > and then reloads remotely, only to have puppet or chef come along and > change it back later? Talk about a recipe for disaster. Are you saying puppet/chef don't have the concept that a file is to be backed up and changes on it notified, but that direct changes to it should not be allowed? That sounds, um, limited. > The only reason I stopped worrying about the foolishness of ALTER SYSTEM > was because it could be disabled. I'm very disappointed to hear that > someone saw fit to remove that. I'll also predict that it'll be going > back in for 9.5. *shrug* -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > Stephen Frost wrote: > > * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > > > > > Last I paid attention to this, there was a clear way to disable the > > > > inclusion of postgresql.auto.conf in the postgresql.conf. If that's > > > > gone, then there is a serious problem. Administrators who manage their > > > > postgresql.conf (eg: through a CM system like puppet or chef..) must > > > > have a way to prevent other changes. > > > > > > Sigh, here we go again. I don't think you can disable postgresql.auto.conf > > > in the current code. As I recall, the argument was that it's harder to > > > diagnose problems if postgresql.auto.conf takes effect in some system > > > but not others. > > > > I don't buy this at all. What's going to be terribly confusing is to > > have config changes start happening for users who are managing their > > configs through a CM (which most should be..). ALTER SYSTEM is going to > > cause more problems than it solves. > > I guess if you have two DBAs who don't talk to each other, and one > changes things through puppet and another through ALTER SYSTEM, it's > going to be confusing, yes. It's not DBAs, that's the point.. You have sysadmins who manage the system configs (things like postgresql.conf) and you have DBAs whose only access to the system is through 5432. This seperation of responsibilities is very common, in my experience at least, and conflating the two through ALTER SYSTEM is going to cause nothing but problems. There had been a way to keep that seperation by simply disabling the postgresql.auto.conf, but that's now been removed. > > > I think if you want puppet or chef etc you'd add postgresql.auto.conf as > > > a config file in those systems, so that ALTER SYSTEM is reflected there. > > > > That's really a horrible, horrible answer. The DBA makes some change > > and then reloads remotely, only to have puppet or chef come along and > > change it back later? Talk about a recipe for disaster. > > Are you saying puppet/chef don't have the concept that a file is to be > backed up and changes on it notified, but that direct changes to it > should not be allowed? That sounds, um, limited. Of course they can but that's completely missing the point. The postgresql.conf file is *already* managed in puppet or chef in a *lot* of places. We're removing the ability to do that and reverting to a situation where auditing has to be done instead. That's a regression, not a step forward. > > The only reason I stopped worrying about the foolishness of ALTER SYSTEM > > was because it could be disabled. I'm very disappointed to hear that > > someone saw fit to remove that. I'll also predict that it'll be going > > back in for 9.5. > > *shrug* ... and I'll continue to argue against anything which requires postgresql.auto.conf to be hacked to work (as was proposed on this thread..). Thanks, Stephen
On 2014-11-24 10:13:58 -0500, Stephen Frost wrote: > * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > > Stephen Frost wrote: > > > * Amit Kapila (amit.kapila16@gmail.com) wrote: > > > > What exactly you mean by 'disable postgresql.auto.conf', do you > > > > mean user runs Alter System to remove that entry or manually disable > > > > some particular entry? > > > > > > Last I paid attention to this, there was a clear way to disable the > > > inclusion of postgresql.auto.conf in the postgresql.conf. If that's > > > gone, then there is a serious problem. Administrators who manage their > > > postgresql.conf (eg: through a CM system like puppet or chef..) must > > > have a way to prevent other changes. > > > > Sigh, here we go again. I don't think you can disable postgresql.auto.conf > > in the current code. As I recall, the argument was that it's harder to > > diagnose problems if postgresql.auto.conf takes effect in some system > > but not others. > > I don't buy this at all. What's going to be terribly confusing is to > have config changes start happening for users who are managing their > configs through a CM (which most should be..). ALTER SYSTEM is going to > cause more problems than it solves. I fail to see how this really has really anything to do with this topic. Obviously ALTER SYSTEM isn't a applicable solution for this as HS might not be in use or HS might not have reached consistency at that point. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Alex Shulgin <ash@commandprompt.com> writes: > Maybe we should move these check/assign hooks to xlog.c, but that's > likely going to create header files dependency problem due to use of > GucSource in the hook prototype... As far as that goes, there is already plenty of precedent for declaring assorted check/assign hook functions in guc.h rather than including guc.h into the headers where they would otherwise belong. regards, tom lane
Stephen Frost wrote > * Alvaro Herrera ( > alvherre@ > ) wrote: >> Stephen Frost wrote: >> > * Amit Kapila ( > amit.kapila16@ > ) wrote: >> > > What exactly you mean by 'disable postgresql.auto.conf', do you >> > > mean user runs Alter System to remove that entry or manually disable >> > > some particular entry? >> >> Sigh, here we go again. I don't think you can disable >> postgresql.auto.conf >> in the current code. As I recall, the argument was that it's harder to >> diagnose problems if postgresql.auto.conf takes effect in some system >> but not others. > > I don't buy this at all. What's going to be terribly confusing is to > have config changes start happening for users who are managing their > configs through a CM (which most should be..). ALTER SYSTEM is going to > cause more problems than it solves. So what happens if someone makes postgresql.auto.conf read-only (to everyone)? David J. -- View this message in context: http://postgresql.nabble.com/Turning-recovery-conf-into-GUCs-tp5774757p5828052.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On Fri, Nov 21, 2014 at 12:55 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/21/2014 09:35 AM, Alex Shulgin wrote: >> Hello, >> >> Here's an attempt to revive this patch. > > Yayy! Thank you. > >>> People might not like me for the suggestion, but I think we should >>> simply always include a 'recovery.conf' in $PGDATA >>> unconditionally. That'd make this easier. >>> Alternatively we could pass a filename to --write-recovery-conf. >> >> Well, the latest version of this patch fails to start when it sees >> 'recovery.conf' in PGDATA: >> >> FATAL: "recovery.conf" is not supported anymore as a recovery method >> DETAIL: Refer to appropriate documentation about migration methods >> >> I've missed all the discussion behind this decision and after reading >> the ALTER SYSTEM/conf.d thread I'm even more confused so I'd like >> someone more knowledgeable to speak up on the status of this. > > The argument was that people wanted to be able to have an empty > recovery.conf file as a "we're in standby" trigger so that they could > preserve backwards compatibility with external tools. I don't agree > with this argument, but several people championed it. > well, my approach was that postgres just ignore the file completely. I mean, recovery.conf will no longer mean anything special. Then, every tool that create recovery.conf in $PGDATA only has to add an ALTER SYSTEM to include it >> The docs use the term "continuous recovery". >> >> Either way, from the code it is clear that we only stay in recovery if >> standby_mode is directly turned on. This makes the whole check for a >> specially named file unnecessary, IMO: we should just check the value of >> standby_mode (which is off by default). > no. currently we enter in recovery mode when postgres see a recovery.conf and stays in recovery mode when standby_mode is on or an appropiate restore_command is provided. which means recovery.conf has two uses: 1) start in recovery mode (not continuous) 2) provide parameters for recovery mode and for streaming we still need a "recovery trigger" file that forces postgres to start in recovery mode and acts accordingly to recovery GUCs > So, what happens when someone does "pg_ctl promote"? Somehow > standby_mode needs to get set to "off". Maybe we write "standby_mode = > off" to postgresql.auto.conf? > we need to delete or rename the "recovery trigger" file, all standby GUCs are ignored (and recovery GUCs should be ignored too) unless you're in recovery mode >> By the way, is there any use in setting standby_mode=on and any of the >> recovery_target* GUCs at the same time? > > See my thread on this topic from last week. Short answer: No. > haven't read that thread, will do > >>>> /* File path names (all relative to $PGDATA) */ >>>> -#define RECOVERY_COMMAND_FILE "recovery.conf" >>>> -#define RECOVERY_COMMAND_DONE "recovery.done" >>>> +#define RECOVERY_ENABLE_FILE "standby.enabled" >>> >>> Imo the file and variable names should stay coherent. >> >> Yes, once we settle on the name (and if we really need that extra >> trigger file.) > yes, we need it. but other names were suggested "standby.enabled" transmit the wrong idea > Personally, if we have three methods of promotion: > > 1) pg_ctl promote > 2) edit postgresql.conf and reload > 3) ALTER SYSTEM SET and reload > > ... I don't honestly think we need a 4th method for promotion. > this is not for promotion, this is to force postgres to start in recovery mode and read recovery configuration parameters. > The main reason to want a "we're in recovery file" is for PITR rather > than for replication, where it has a number of advantages as a method, > the main one being that recovery.conf is unlikely to be overwritten by > the contents of the backup. > only that you need to start in recovery mode to start replication -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
On Mon, Nov 24, 2014 at 12:24 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote: > On Fri, Nov 21, 2014 at 12:55 PM, Josh Berkus <josh@agliodbs.com> wrote: >> On 11/21/2014 09:35 AM, Alex Shulgin wrote: >>> Hello, >>> >>> Here's an attempt to revive this patch. >> >> Yayy! Thank you. >> >>>> People might not like me for the suggestion, but I think we should >>>> simply always include a 'recovery.conf' in $PGDATA >>>> unconditionally. That'd make this easier. >>>> Alternatively we could pass a filename to --write-recovery-conf. >>> >>> Well, the latest version of this patch fails to start when it sees >>> 'recovery.conf' in PGDATA: >>> >>> FATAL: "recovery.conf" is not supported anymore as a recovery method >>> DETAIL: Refer to appropriate documentation about migration methods >>> >>> I've missed all the discussion behind this decision and after reading >>> the ALTER SYSTEM/conf.d thread I'm even more confused so I'd like >>> someone more knowledgeable to speak up on the status of this. >> >> The argument was that people wanted to be able to have an empty >> recovery.conf file as a "we're in standby" trigger so that they could >> preserve backwards compatibility with external tools. I don't agree >> with this argument, but several people championed it. >> > > well, my approach was that postgres just ignore the file completely. I > mean, recovery.conf will no longer mean anything special. > Then, every tool that create recovery.conf in $PGDATA only has to add > an ALTER SYSTEM to include it > i mean ALTER SYSTEM in master before copying or just add the line in postgresql.conf but, as the patch shows agreement was to break backwards compatibility and fail to start if the file is present -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
On 11/24/2014 09:24 AM, Jaime Casanova wrote: >> ... I don't honestly think we need a 4th method for promotion. >> > > this is not for promotion, this is to force postgres to start in > recovery mode and read recovery configuration parameters. > >> The main reason to want a "we're in recovery file" is for PITR rather >> than for replication, where it has a number of advantages as a method, >> the main one being that recovery.conf is unlikely to be overwritten by >> the contents of the backup. >> > > only that you need to start in recovery mode to start replication Right, but my point is that having a trigger file *is not necessary for replication, only for PITR* -- and maybe not necessary even for PITR. That is, in a streaming replication configuration, having a "standby_mode = on|off" parameter is 100% sufficient to control replication with the small detail that "pg_ctl promote" needs to set it in pg.auto.conf or conf.d. And, now, having given it some thought, I'm going to argue that it's not required for PITR either, provided that we can use the auto.conf method. Before I go into my ideas, though, what does the current patch do regarding non-replication PITR? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/24/2014 07:29 AM, Stephen Frost wrote: >> > > > Sigh, here we go again. I don't think you can disable postgresql.auto.conf >> > > > in the current code. As I recall, the argument was that it's harder to >> > > > diagnose problems if postgresql.auto.conf takes effect in some system >> > > > but not others. >> > >> > I don't buy this at all. What's going to be terribly confusing is to >> > have config changes start happening for users who are managing their >> > configs through a CM (which most should be..). ALTER SYSTEM is going to >> > cause more problems than it solves. The main reason why disabling auto.conf was found not to be worthwhile is that anyone with superuser rights can *already* change the config by using ALTER DATABASE and ALTER ROLE, and that's a LOT less auditable than pg.auto.conf is. Heck, we don't even have a good system view for SET parameters on DB objects. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
* Josh Berkus (josh@agliodbs.com) wrote: > On 11/24/2014 07:29 AM, Stephen Frost wrote: > >> > > > Sigh, here we go again. I don't think you can disable postgresql.auto.conf > >> > > > in the current code. As I recall, the argument was that it's harder to > >> > > > diagnose problems if postgresql.auto.conf takes effect in some system > >> > > > but not others. > >> > > >> > I don't buy this at all. What's going to be terribly confusing is to > >> > have config changes start happening for users who are managing their > >> > configs through a CM (which most should be..). ALTER SYSTEM is going to > >> > cause more problems than it solves. > > The main reason why disabling auto.conf was found not to be worthwhile > is that anyone with superuser rights can *already* change the config by > using ALTER DATABASE and ALTER ROLE, and that's a LOT less auditable > than pg.auto.conf is. Heck, we don't even have a good system view for > SET parameters on DB objects. If this was accurate, we wouldn't have any need for ALTER SYSTEM. They *can't* make certain configuration changes which is why ALTER SYSTEM was put into place to begin with. Those pieces are also, generally speaking, what's needed to get the database started and which control authentication (and possibly authorization to connect). As I've pointed out before, we'll end up with DBAs making changes which break the system from starting and they can't properly test that change nor do anything about it when the DB can no longer start, and the sysadmins will be extremely confused and, worse, will have zero clue that this 'auto.conf' thing even exists or where the config is coming from that's preventing the DB from starting. At least when postgresql.auto.conf was explicitly in the top-level postgresql.conf, there was a hope that they'd realize there's another config file in play (and figure out how to disable it to get the system back online..) and now we haven't even got that. I agree that it'd be nice to have a better view of what has been set using ALTER DATABASE and ALTER ROLE, but nothing here makes me feel any differently about ALTER SYSTEM. Thanks, Stephen
Jaime Casanova <jaime@2ndquadrant.com> writes: >>> >>> Either way, from the code it is clear that we only stay in recovery if >>> standby_mode is directly turned on. This makes the whole check for a >>> specially named file unnecessary, IMO: we should just check the value of >>> standby_mode (which is off by default). > > no. currently we enter in recovery mode when postgres see a > recovery.conf and stays in recovery mode when standby_mode is on or an > appropiate restore_command is provided. > > which means recovery.conf has two uses: > 1) start in recovery mode (not continuous) > 2) provide parameters for recovery mode and for streaming > > we still need a "recovery trigger" file that forces postgres to start > in recovery mode and acts accordingly to recovery GUCs Yes, these were my latest findings also, but if instead of removing the trigger file upon successful recovery the server would set standby_mode=off, that would also work. -- Alex
Josh Berkus <josh@agliodbs.com> writes: >> >> only that you need to start in recovery mode to start replication > > Right, but my point is that having a trigger file *is not necessary for > replication, only for PITR* -- and maybe not necessary even for PITR. > That is, in a streaming replication configuration, having a > "standby_mode = on|off" parameter is 100% sufficient to control > replication with the small detail that "pg_ctl promote" needs to set it > in pg.auto.conf or conf.d. > > And, now, having given it some thought, I'm going to argue that it's not > required for PITR either, provided that we can use the auto.conf method. > > Before I go into my ideas, though, what does the current patch do > regarding non-replication PITR? It removes that $PGDATA/standby.enable trigger file it relies on to start the PITR in the first place. -- Alex
On 2014-11-24 12:02:52 -0800, Josh Berkus wrote: > On 11/24/2014 09:24 AM, Jaime Casanova wrote: > >> ... I don't honestly think we need a 4th method for promotion. > >> > > > > this is not for promotion, this is to force postgres to start in > > recovery mode and read recovery configuration parameters. > > > >> The main reason to want a "we're in recovery file" is for PITR rather > >> than for replication, where it has a number of advantages as a method, > >> the main one being that recovery.conf is unlikely to be overwritten by > >> the contents of the backup. > >> > > > > only that you need to start in recovery mode to start replication > > Right, but my point is that having a trigger file *is not necessary for > replication, only for PITR* -- and maybe not necessary even for PITR. > That is, in a streaming replication configuration, having a > "standby_mode = on|off" parameter is 100% sufficient to control > replication with the small detail that "pg_ctl promote" needs to set it > in pg.auto.conf or conf.d. > > And, now, having given it some thought, I'm going to argue that it's not > required for PITR either, provided that we can use the auto.conf method. > > Before I go into my ideas, though, what does the current patch do > regarding non-replication PITR? Guys. We aren't rereading the GUCs in the relevant places. It's also decidedly nontrivial to make standby_mode PGC_SIGHUP. Don't make this patch more complex than it has to be. That's what stalled it the last times round. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11/24/2014 02:00 PM, Alex Shulgin wrote: > > Josh Berkus <josh@agliodbs.com> writes: >>> >>> only that you need to start in recovery mode to start replication >> >> Right, but my point is that having a trigger file *is not necessary for >> replication, only for PITR* -- and maybe not necessary even for PITR. >> That is, in a streaming replication configuration, having a >> "standby_mode = on|off" parameter is 100% sufficient to control >> replication with the small detail that "pg_ctl promote" needs to set it >> in pg.auto.conf or conf.d. >> >> And, now, having given it some thought, I'm going to argue that it's not >> required for PITR either, provided that we can use the auto.conf method. >> >> Before I go into my ideas, though, what does the current patch do >> regarding non-replication PITR? > > It removes that $PGDATA/standby.enable trigger file it relies on to > start the PITR in the first place. OK, and that's required for replication too? I'm OK with that if it gets the patch in. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >>> >>> Before I go into my ideas, though, what does the current patch do >>> regarding non-replication PITR? >> >> It removes that $PGDATA/standby.enable trigger file it relies on to >> start the PITR in the first place. > > OK, and that's required for replication too? I'm OK with that if it > gets the patch in. In the current form of the patch, yes. Thought I don't think I like it. -- Alex
On 11/24/2014 02:18 PM, Alex Shulgin wrote: > > Josh Berkus <josh@agliodbs.com> writes: >>>> >>>> Before I go into my ideas, though, what does the current patch do >>>> regarding non-replication PITR? >>> >>> It removes that $PGDATA/standby.enable trigger file it relies on to >>> start the PITR in the first place. >> >> OK, and that's required for replication too? I'm OK with that if it >> gets the patch in. > > In the current form of the patch, yes. Thought I don't think I like it. One step at a time. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Alex Shulgin <ash@commandprompt.com> writes: > > Here's an attempt to revive this patch. Here's the patch rebased against current HEAD, that is including the recently committed action_at_recovery_target option. The default for the new GUC is 'pause', as in HEAD, and pause_at_recovery_target is removed completely in favor of it. I've also taken the liberty to remove that part that errors out when finding $PGDATA/recovery.conf. Now get your rotten tomatoes ready. ;-) -- Alex
Attachment
On Tue, Dec 2, 2014 at 1:58 AM, Alex Shulgin <ash@commandprompt.com> wrote: > Here's the patch rebased against current HEAD, that is including the > recently committed action_at_recovery_target option. If this patch gets in, it gives a good argument to jump to 10.0 IMO. That's not a bad thing, only the cost of making recovery params as GUCs which is still a feature wanted. > The default for the new GUC is 'pause', as in HEAD, and > pause_at_recovery_target is removed completely in favor of it. Makes sense. Another idea that popped out was to rename this parameter as recovery_target_action as well, but that's not really something this patch should care about. > I've also taken the liberty to remove that part that errors out when > finding $PGDATA/recovery.conf. I am not in favor of this part. It may be better to let the users know that their old configuration is not valid anymore with an error. This patch cuts in the flesh with a huge axe, let's be sure that users do not ignore the side pain effects, or recovery.conf would be simply ignored and users would not be aware of that. Regards, -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > On Tue, Dec 2, 2014 at 1:58 AM, Alex Shulgin <ash@commandprompt.com> wrote: >> Here's the patch rebased against current HEAD, that is including the >> recently committed action_at_recovery_target option. > If this patch gets in, it gives a good argument to jump to 10.0 IMO. > That's not a bad thing, only the cost of making recovery params as > GUCs which is still a feature wanted. > >> The default for the new GUC is 'pause', as in HEAD, and >> pause_at_recovery_target is removed completely in favor of it. > Makes sense. Another idea that popped out was to rename this parameter > as recovery_target_action as well, but that's not really something > this patch should care about. Indeed, but changing the name after the fact is straightforward. >> I've also taken the liberty to remove that part that errors out when >> finding $PGDATA/recovery.conf. > I am not in favor of this part. It may be better to let the users know > that their old configuration is not valid anymore with an error. This > patch cuts in the flesh with a huge axe, let's be sure that users do > not ignore the side pain effects, or recovery.conf would be simply > ignored and users would not be aware of that. Yeah, that is good point. I'd be in favor of a solution that works the same way as before the patch, without the need for extra trigger files, etc., but that doesn't seem to be nearly possible. Whatever tricks we might employ will likely be defeated by the fact that the oldschool user will fail to *include* recovery.conf in the main conf file. -- Alex
On 12/02/2014 06:25 AM, Alex Shulgin wrote: >> I am not in favor of this part. It may be better to let the users know >> > that their old configuration is not valid anymore with an error. This >> > patch cuts in the flesh with a huge axe, let's be sure that users do >> > not ignore the side pain effects, or recovery.conf would be simply >> > ignored and users would not be aware of that. > Yeah, that is good point. > > I'd be in favor of a solution that works the same way as before the > patch, without the need for extra trigger files, etc., but that doesn't > seem to be nearly possible. As previously discussed, there are ways to avoid having a trigger file for replication. However, it's hard to avoid having one for PITR recovery; at least, I can't think of a method which isn't just as awkward, and we might as well stick with the awkward method we know. > Whatever tricks we might employ will likely > be defeated by the fact that the oldschool user will fail to *include* > recovery.conf in the main conf file. Well, can we merge this patch and then fight out what to do for the transitional users as a separate patch? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > On 12/02/2014 06:25 AM, Alex Shulgin wrote: > > Whatever tricks we might employ will likely > > be defeated by the fact that the oldschool user will fail to *include* > > recovery.conf in the main conf file. > > Well, can we merge this patch and then fight out what to do for the > transitional users as a separate patch? You seem to be saying "I don't have any good idea how to solve this problem now, but I will magically have one once this is committed". I'm not sure that works very well. In any case, the proposal upthread that we raise an error if recovery.conf is found seems sensible enough. Users will see it and they will adjust their stuff -- it's a one-time thing. It's not like they switch a version forwards one week and back the following week. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 12/02/2014 10:31 AM, Alvaro Herrera wrote: > Josh Berkus wrote: >> On 12/02/2014 06:25 AM, Alex Shulgin wrote: > >>> Whatever tricks we might employ will likely >>> be defeated by the fact that the oldschool user will fail to *include* >>> recovery.conf in the main conf file. >> >> Well, can we merge this patch and then fight out what to do for the >> transitional users as a separate patch? > > You seem to be saying "I don't have any good idea how to solve this > problem now, but I will magically have one once this is committed". I'm > not sure that works very well. No, I'm saying "this problem is easy to solve technically, but we have intractable arguments on this list about the best way to solve it, even though the bulk of the patch isn't in dispute". > In any case, the proposal upthread that we raise an error if > recovery.conf is found seems sensible enough. Users will see it and > they will adjust their stuff -- it's a one-time thing. It's not like > they switch a version forwards one week and back the following week. I'm OK with that solution. Apparently others aren't though. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-12-02 17:25:14 +0300, Alex Shulgin wrote: > I'd be in favor of a solution that works the same way as before the > patch, without the need for extra trigger files, etc., but that doesn't > seem to be nearly possible. Whatever tricks we might employ will likely > be defeated by the fact that the oldschool user will fail to *include* > recovery.conf in the main conf file. I think removing the ability to define a trigger file is pretty much unacceptable. It's not *too* bad to rewrite recovery.conf's contents into actual valid postgresql.conf format, but it's harder to change an existing complex failover setup that relies on the existance of such a trigger. I think aiming for removal of that is a sure way to prevent the patch from getting in. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-12-02 17:25:14 +0300, Alex Shulgin wrote: >> I'd be in favor of a solution that works the same way as before the >> patch, without the need for extra trigger files, etc., but that doesn't >> seem to be nearly possible. Whatever tricks we might employ will likely >> be defeated by the fact that the oldschool user will fail to *include* >> recovery.conf in the main conf file. > > I think removing the ability to define a trigger file is pretty much > unacceptable. It's not *too* bad to rewrite recovery.conf's contents > into actual valid postgresql.conf format, but it's harder to change an > existing complex failover setup that relies on the existance of such a > trigger. I think aiming for removal of that is a sure way to prevent > the patch from getting in. To make it clear, I was talking not about trigger_file, but about standby.enabled file which triggers the recovery/pitr/standby scenario in the current form of the patch and stands as a replacement check instead of the original check for the presense of recovery.conf. -- Alex
Alex Shulgin <ash@commandprompt.com> writes: > Alex Shulgin <ash@commandprompt.com> writes: >> >> Here's an attempt to revive this patch. > > Here's the patch rebased against current HEAD, that is including the > recently committed action_at_recovery_target option. > > The default for the new GUC is 'pause', as in HEAD, and > pause_at_recovery_target is removed completely in favor of it. > > I've also taken the liberty to remove that part that errors out when > finding $PGDATA/recovery.conf. Now get your rotten tomatoes ready. ;-) This was rather short-sighted, so I've restored that part. Also, rebased on current HEAD, following the rename of action_at_recovery_target to recovery_target_action. -- Alex
Attachment
On 12/12/14 16:34, Alex Shulgin wrote: > Alex Shulgin <ash@commandprompt.com> writes: > >> Alex Shulgin <ash@commandprompt.com> writes: >>> >>> Here's an attempt to revive this patch. >> >> Here's the patch rebased against current HEAD, that is including the >> recently committed action_at_recovery_target option. >> >> The default for the new GUC is 'pause', as in HEAD, and >> pause_at_recovery_target is removed completely in favor of it. >> >> I've also taken the liberty to remove that part that errors out when >> finding $PGDATA/recovery.conf. Now get your rotten tomatoes ready. ;-) > > This was rather short-sighted, so I've restored that part. > > Also, rebased on current HEAD, following the rename of > action_at_recovery_target to recovery_target_action. > Hi, I had a quick look, the patch does not apply cleanly anymore but it's just release notes so nothing too bad. I did not do any testing yet, but I have few comments about the code: - the patch mixes functionality change with the lowercasing of the config variables, I wonder if those could be separated into 2 separate diffs - it would make review somewhat easier, but I can live with it as it is if it's too much work do separate into 2 patches - the StandbyModeRequested and StandbyMode should be lowercased like the rest of the GUCs - I am wondering if StandbyMode and hot_standby should be merged into one GUC if we are breaking backwards compatibility anyway - Could you explain the reasoning behind: + if ((*newval)[0] == 0) + xid = InvalidTransactionId; + else in check_recovery_target_xid() (and same check in check_recovery_target_timeline()), isn't the strtoul which is called later enough? - The whole CopyErrorData and memory context logic is not needed in check_recovery_target_time() as the FlushErrorState() is not called there - The new code in StartupXLOG() like + if (recovery_target_xid_string != NULL) + InitRecoveryTarget(RECOVERY_TARGET_XID); + + if (recovery_target_time_string != NULL) + InitRecoveryTarget(RECOVERY_TARGET_TIME); + + if (recovery_target_name != NULL) + InitRecoveryTarget(RECOVERY_TARGET_NAME); + + if (recovery_target_string != NULL) + InitRecoveryTarget(RECOVERY_TARGET_IMMEDIATE); would probably be better in separate function that you then call StartupXLOG() as StartupXLOG() is crazy long already so I think it's preferable to not make it worse. - I wonder if we should rename trigger_file to standby_tigger_file -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek <petr@2ndquadrant.com> writes: > > I had a quick look, the patch does not apply cleanly anymore but it's > just release notes so nothing too bad. Yes, there were some ongoing changes that touched some parts of this and I must have missed the latest one (or maybe I was waiting for it to settle down). > I did not do any testing yet, but I have few comments about the code: > > - the patch mixes functionality change with the lowercasing of the > config variables, I wonder if those could be separated into 2 separate > diffs - it would make review somewhat easier, but I can live with it > as it is if it's too much work do separate into 2 patches If we can get the lowercasing committed soon, I'd be in favor of that, otherwise it doesn't sound as pleasing, and there's some renaming to be made too (that standby.enabled, trigger_file, etc.) > - the StandbyModeRequested and StandbyMode should be lowercased like > the rest of the GUCs Yes, except that standby_mode is linked to StandbyModeRequested, not the other one. I can try see if there's a sane way out of this. > - I am wondering if StandbyMode and hot_standby should be merged into > one GUC if we are breaking backwards compatibility anyway I also have the feeling that there's room for merging some knobs together. > - Could you explain the reasoning behind: > + if ((*newval)[0] == 0) > + xid = InvalidTransactionId; > + else > > in check_recovery_target_xid() (and same check in > check_recovery_target_timeline()), isn't the strtoul which is called > later enough? Well, that makes it a bit more apparent to the reader I think and we're not abusing the fact that InvalidTransactionId equals zero. > - The whole CopyErrorData and memory context logic is not needed in > check_recovery_target_time() as the FlushErrorState() is not called > there You must be right. I recall having some trouble with strings being free'd prematurely, but if it's ERROR, then there should be no need for that. I'll check again. > - The new code in StartupXLOG() like > + if (recovery_target_xid_string != NULL) > + InitRecoveryTarget(RECOVERY_TARGET_XID); > + > + if (recovery_target_time_string != NULL) > + InitRecoveryTarget(RECOVERY_TARGET_TIME); > + > + if (recovery_target_name != NULL) > + InitRecoveryTarget(RECOVERY_TARGET_NAME); > + > + if (recovery_target_string != NULL) > + InitRecoveryTarget(RECOVERY_TARGET_IMMEDIATE); > > would probably be better in separate function that you then call > StartupXLOG() as StartupXLOG() is crazy long already so I think it's > preferable to not make it worse. We can move it at top of CheckStartingAsStandby() obviously. > - I wonder if we should rename trigger_file to standby_tigger_file I was also suggesting that, just not sure if mixing it into the same patch is any good. Thank you for the review! -- Alex
Alex Shulgin <ash@commandprompt.com> writes: > Petr Jelinek <petr@2ndquadrant.com> writes: >> >> I had a quick look, the patch does not apply cleanly anymore but it's >> just release notes so nothing too bad. > > Yes, there were some ongoing changes that touched some parts of this and > I must have missed the latest one (or maybe I was waiting for it to > settle down). The rebased version is attached. >> - the StandbyModeRequested and StandbyMode should be lowercased like >> the rest of the GUCs > > Yes, except that standby_mode is linked to StandbyModeRequested, not the > other one. I can try see if there's a sane way out of this. As I see it, renaming these will only add noise to this patch, and there is also ArchiveRecoveryRequested / InArchiveRecovery. This is going to be tricky and I'm not sure it's really worth the effort. >> - The whole CopyErrorData and memory context logic is not needed in >> check_recovery_target_time() as the FlushErrorState() is not called >> there > > You must be right. I recall having some trouble with strings being > free'd prematurely, but if it's ERROR, then there should be no need for > that. I'll check again. Hrm, after going through this again I'm pretty sure that was correct: the only way to obtain the current error message is to use CopyErrorData(), but that requires you to switch back to non-error memory context (via Assert). The FlushErrorState() call is not there because it's handled by the hook caller (or it can exit via ereport() with elevel==ERROR). >> - The new code in StartupXLOG() like >> + if (recovery_target_xid_string != NULL) >> + InitRecoveryTarget(RECOVERY_TARGET_XID); >> + >> + if (recovery_target_time_string != NULL) >> + InitRecoveryTarget(RECOVERY_TARGET_TIME); >> + >> + if (recovery_target_name != NULL) >> + InitRecoveryTarget(RECOVERY_TARGET_NAME); >> + >> + if (recovery_target_string != NULL) >> + InitRecoveryTarget(RECOVERY_TARGET_IMMEDIATE); >> >> would probably be better in separate function that you then call >> StartupXLOG() as StartupXLOG() is crazy long already so I think it's >> preferable to not make it worse. > > We can move it at top of CheckStartingAsStandby() obviously. Moved. -- Alex
Attachment
On 24/12/14 20:11, Alex Shulgin wrote: > Alex Shulgin <ash@commandprompt.com> writes: > >>> - the StandbyModeRequested and StandbyMode should be lowercased like >>> the rest of the GUCs >> >> Yes, except that standby_mode is linked to StandbyModeRequested, not the >> other one. I can try see if there's a sane way out of this. > > As I see it, renaming these will only add noise to this patch, and there > is also ArchiveRecoveryRequested / InArchiveRecovery. This is going to > be tricky and I'm not sure it's really worth the effort. > I don't buy that to be honest, most (if not all) places that would be affected are already in diff as part of context around other renames anyway and it just does not seem right to rename some variables and not the others. I still think there should be some thought given to merging the hot_standby and standby_mode, but I think we'd need opinion of more people (especially some committers) on this one. >>> - The whole CopyErrorData and memory context logic is not needed in >>> check_recovery_target_time() as the FlushErrorState() is not called >>> there >> >> You must be right. I recall having some trouble with strings being >> free'd prematurely, but if it's ERROR, then there should be no need for >> that. I'll check again. > > Hrm, after going through this again I'm pretty sure that was correct: > the only way to obtain the current error message is to use > CopyErrorData(), but that requires you to switch back to non-error > memory context (via Assert). Right, my bad. > > The FlushErrorState() call is not there because it's handled by the hook > caller (or it can exit via ereport() with elevel==ERROR). > Yes I've seen that, shouldn't you FreeErrorData(edata) then though? I guess it does not matter too much considering that you are going to throw error and die eventually anyway once you are in that code path, maybe just for the clarity... -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
I think this is a valuable effort, but I wonder how we are going to arrive at a consensus. I don't see any committer supporting these changes. Clearly, there are some advantages to having recovery parameters be GUCs, but the proposed changes also come with some disadvantages, and the tradeoffs aren't so clear. For example, putting recovery target parameters into postgresql.conf might not make sense to some people. Once you have reached the recovery end point, the information is obsolete. Keeping it set could be considered confusing. Moreover, when I'm actually doing point-in-time-recovery attempts, I don't think I want to be messing with my pristine postgresql.conf file.Having those settings separate isn't a bad idea inthat case. Some people might like the recovery.done file as a historical record. Having standby.enabled just be deleted would destroy that information once recovery has ended. In the past, when some people have complained that recovery.conf cannot be moved to a configuration directory, I (and others?) have argued that recovery.conf is really more of a command script file than a configuration file (that was before replication was commonplace). The premise of this patch is that some options really are more configuration than command most of the time, but that doesn't mean the old view was invalid. The current system makes it easy to share postgresql.conf between primary and standby and just maintain the information related to the standby locally in recovery.conf. pg_basebackup -R makes that even easier. It's still possible to do this in the new system, but it's decidedly more work. These are all "soft" reasons, but they add up. The wins on the other hand are obscure: You can now use SHOW to inspect recovery settings. You can design your own configuration file include structures to set them. These are not bad, but is that all? I note that all the new settings are PGC_POSTMASTER, so you can't set them on the fly. Changing primary_conninfo without a restart would be a big win, for example. Is that planned? It might be acceptable to break all the old workflows if the new system was obviously great, but it's not. It's still confusing as heck. For example, we would have a file "standby.enabled" and a setting "standby_mode", which would mean totally different things. I think there is also some conceptual overlap between standby_mode and recovery_target_action (standby_mode is really something like recovery_target_action = keep_going). I also find the various uses of "trigger file" or "to trigger" confusing. The old trigger file to trigger promotion makes sense. But calling "standby.enabled" a trigger file as well confuses two entirely different concepts. I have previously argued for a different approach: Ready recovery.conf as a configuration file after postgresql.conf, but keep it as a file to start recovery. That doesn't break any old workflows, it gets you all the advantages of having recovery parameters in the GUC system, and it keeps all the options open for improving things further in the future.
On 01/05/2015 05:43 PM, Peter Eisentraut wrote: > The wins on the other hand are obscure: You can now use SHOW to inspect > recovery settings. You can design your own configuration file include > structures to set them. These are not bad, but is that all? That's not the only potential win, and it's not small either. I'll be able to tell what master a replica is replicating from using via a port 5432 connection (currently there is absolutely no way to do this). > I note that all the new settings are PGC_POSTMASTER, so you can't set > them on the fly. Changing primary_conninfo without a restart would be a > big win, for example. Is that planned? ... but there you have the flaw in the ointment. Unless we make some of the settings superuserset, no, it doesn't win us a lot, except ... > I have previously argued for a different approach: Ready recovery.conf > as a configuration file after postgresql.conf, but keep it as a file to > start recovery. That doesn't break any old workflows, it gets you all > the advantages of having recovery parameters in the GUC system, and it > keeps all the options open for improving things further in the future. ... and there you hit on one of the other issues with recovery.conf, which is that it's a configuration file with configuration parameters which gets automatically renamed when a standby is promoted. This plays merry hell with configuration management systems. The amount of conditional logic I've had to write for Salt to handle recovery.conf truly doesn't bear thinking about. There may be some other way to make recovery.conf configuration-management friendly, but I haven't thought of it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 06/01/15 18:57, Josh Berkus wrote: > On 01/05/2015 05:43 PM, Peter Eisentraut wrote: >> I have previously argued for a different approach: Ready recovery.conf >> as a configuration file after postgresql.conf, but keep it as a file to >> start recovery. That doesn't break any old workflows, it gets you all >> the advantages of having recovery parameters in the GUC system, and it >> keeps all the options open for improving things further in the future. > > ... and there you hit on one of the other issues with recovery.conf, > which is that it's a configuration file with configuration parameters > which gets automatically renamed when a standby is promoted. This plays > merry hell with configuration management systems. The amount of > conditional logic I've had to write for Salt to handle recovery.conf > truly doesn't bear thinking about. There may be some other way to make > recovery.conf configuration-management friendly, but I haven't thought > of it. > It hurts and it helps with config management - because right now primary and standby can have identical postgresql.conf. Obviously if we merge the recovery.conf settings in there then this is no longer true in the most simple case of one conf file...I guess we can work around it using a conf.d include dir and having the recovery specific (and any other am-I-a-replica params) in a (lol) recovery.conf config file in there... Cheers Mark
On 1/6/15 12:57 AM, Josh Berkus wrote: > On 01/05/2015 05:43 PM, Peter Eisentraut wrote: >> The wins on the other hand are obscure: You can now use SHOW to inspect >> recovery settings. You can design your own configuration file include >> structures to set them. These are not bad, but is that all? > > That's not the only potential win, and it's not small either. I'll be > able to tell what master a replica is replicating from using via a port > 5432 connection (currently there is absolutely no way to do this). That's one particular case of what I mentioned above under using SHOW to inspect recovery settings. I agree that that's important, but it doesn't look like there is a consensus that it justifies all the drawbacks. That said, there is a much simpler way to achieve that specific functionality: Expose all the recovery settings as fake read-only GUC variables. See attached patch for an example. Btw., I'm not sure that everyone will be happy to have primary_conninfo visible, since it might contain passwords. > ... and there you hit on one of the other issues with recovery.conf, > which is that it's a configuration file with configuration parameters > which gets automatically renamed when a standby is promoted. This plays > merry hell with configuration management systems. The amount of > conditional logic I've had to write for Salt to handle recovery.conf > truly doesn't bear thinking about. There may be some other way to make > recovery.conf configuration-management friendly, but I haven't thought > of it. I have written similar logic, and while it's not pleasant, it's doable. This issue would really only go away if you don't use a file to signal recovery at all, which you have argued for, but which is really a separate and more difficult problem.
Attachment
On 01/06/2015 01:22 PM, Peter Eisentraut wrote: > That said, there is a much simpler way to achieve that specific > functionality: Expose all the recovery settings as fake read-only GUC > variables. See attached patch for an example. I have no objections to that idea in principle. As long as it gets into 9.5. > Btw., I'm not sure that everyone will be happy to have primary_conninfo > visible, since it might contain passwords. Didn't we discuss this? I forgot what the conclusion was ... probably not to put passwords in primary_conninfo. > >> ... and there you hit on one of the other issues with recovery.conf, >> which is that it's a configuration file with configuration parameters >> which gets automatically renamed when a standby is promoted. This plays >> merry hell with configuration management systems. The amount of >> conditional logic I've had to write for Salt to handle recovery.conf >> truly doesn't bear thinking about. There may be some other way to make >> recovery.conf configuration-management friendly, but I haven't thought >> of it. > > I have written similar logic, and while it's not pleasant, it's doable. > This issue would really only go away if you don't use a file to signal > recovery at all, which you have argued for, but which is really a > separate and more difficult problem. As far as CMSes are concerned, there is a vast difference between a file which contains configuration variables and one which does not. That is, an *empty* recovery.conf file which just signals the start of recovery is not a configuration problem. The problem comes in in that recovery.conf serves two separate purposes: it's a configuration file, and it's also a trigger file. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Peter, all: Let me go over the issues I find with recovery.conf, based on 3 aspects of my experience (1) doing DBA support (2) as a tool author (HandyRep) and (3) as a trainer, teaching new users about it. If we agree on a list of problems with the current setup (as well as a list of benefits) that's a good litmus test on whether any new version is worth adopting.Basically, new ways of doing this should remove someof the issues while not jettisoning the benefits as much as possible. Issues: A. different formatting compared with PostgreSQL.conf, particularly quoting, and lack of support for include files. B. Inability to find out recovery.conf variables over port 5432. C. Difficulty of management due to being both a trigger file and a configuration file. D. Wierd name: for those doing only replication, not PITR, "recovery.conf" is completely baffling. E. Replication/PITR confusion: some configuration items (particularly recovery_target_*) are contradictory. F. Inability to remaster without restarting the replica. G. Inability to change parameters using ALTER SYSTEM SET. H. Requirement of being in the data directory instead of the configuration directory. I. Fairly duplicative options between pg.conf and recovery.conf (i.e. "standby_mode" vs. "hot_standby") Benefits: 1. Ability to share the exact same postgresql.conf for replica and master. 2. Easy pg_basebackup because you can exclude/generate a recovery.conf automatically. 3. Battle-tested way to make sure that replication/recovery state persists across unexpected restarts, and simple promotion workflow. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2015-01-06 16:33:36 -0800, Josh Berkus wrote: > F. Inability to remaster without restarting the replica. That has pretty much nothing to do with the topic at hand. > I. Fairly duplicative options between pg.conf and recovery.conf (i.e. > "standby_mode" vs. "hot_standby") Those aren't the same. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 01/06/2015 04:42 PM, Andres Freund wrote: > On 2015-01-06 16:33:36 -0800, Josh Berkus wrote: >> F. Inability to remaster without restarting the replica. > > That has pretty much nothing to do with the topic at hand. It has *everything* to do with the topic at hand. The *only* reason we can't remaster without restarting is that recovery.conf is only read at startup time, and can't be reloaded. http://www.databasesoup.com/2014/05/remastering-without-restarting.html -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2015-01-05 20:43:12 -0500, Peter Eisentraut wrote: > For example, putting recovery target parameters into postgresql.conf > might not make sense to some people. Once you have reached the recovery > end point, the information is obsolete. Keeping it set could be > considered confusing. I don't know, but I think that ship has sailed. hot_standby, archive_command, archive_mode, hot_standby_feedback are all essentially treated differently between primary and standby. > Moreover, when I'm actually doing point-in-time-recovery attempts, I > don't think I want to be messing with my pristine postgresql.conf file. > Having those settings separate isn't a bad idea in that case. Well, nothing stops you from having a include file or something similar. I think we should just make recovery.conf behave exactly the way it does right now, except parse it according to guc rules. That way the changes when migrating are minimal and we don't desupport any usecases. Disallowing that way of operating just seems like intentionally throwing rocks in the way of getting this done. > In the past, when some people have complained that recovery.conf cannot > be moved to a configuration directory, I (and others?) have argued that > recovery.conf is really more of a command script file than a > configuration file (that was before replication was commonplace). The > premise of this patch is that some options really are more configuration > than command most of the time, but that doesn't mean the old view was > invalid. Again, I think this ship has long since sailed. > The current system makes it easy to share postgresql.conf between > primary and standby and just maintain the information related to the > standby locally in recovery.conf. pg_basebackup -R makes that even > easier. It's still possible to do this in the new system, but it's > decidedly more work. Really? Howso? > The wins on the other hand are obscure: You can now use SHOW to inspect > recovery settings. You can design your own configuration file include > structures to set them. These are not bad, but is that all? It's much more: a) One configuration format instead of two somewhat, but not really, similar ones. b) Proper infrastructure to deal with configuration variable boundaries and such. Just a few days ago there was e7c11887et al. c) Infrastructure for changing settings effective during recovery. Right now we'd have to rebuild a lot of guc infrasturctureto allow that. It'd not be that hard to allow changing parameters like restore_command, primary_conninfo,recovery_target_* et al. That's for sure not the same commit, but once the infrastructure is in those won't be too hard. > I note that all the new settings are PGC_POSTMASTER, so you can't set > them on the fly. Changing primary_conninfo without a restart would be a > big win, for example. Is that planned? I think it's not too hard to do - but I'll fight hard to do that separately. Once we've the infrastructure I'd be surprised if took too long to change some of them to PGC_SIGHUP. > I have previously argued for a different approach: Ready recovery.conf > as a configuration file after postgresql.conf, but keep it as a file to > start recovery. That doesn't break any old workflows, it gets you all > the advantages of having recovery parameters in the GUC system, and it > keeps all the options open for improving things further in the future. Well, it breaks because quoting and such is different. Otherwise I think I agree. It allows you to keep parameters in recovery.conf if you want, if not not. If we add a recovery_file guc that defaults to '$PGDATA/recovery.conf' we'd have a rather easy way to move forward imo. We can even allow two filenames so we could default to something like recovery_file = 'PGDATA/recovery.conf; PGDATA/recovery.trigger' Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2015-01-06 17:08:20 -0800, Josh Berkus wrote: > On 01/06/2015 04:42 PM, Andres Freund wrote: > > On 2015-01-06 16:33:36 -0800, Josh Berkus wrote: > >> F. Inability to remaster without restarting the replica. > > > > That has pretty much nothing to do with the topic at hand. > > It has *everything* to do with the topic at hand. The *only* reason we > can't remaster without restarting is that recovery.conf is only read at > startup time, and can't be reloaded. > http://www.databasesoup.com/2014/05/remastering-without-restarting.html Not really. It's a good way to introduce suble and hard to understand corruption and other strange corner cases. Your replication connection was lost/reconnected in the wrong moment? Oops, received/replayed too much. To achieve what you describe there, you don't even need a proxy, simple dns based failover suffices. A real solution to this requires more. You need to 1) disable writing any wal 2) force catchup of all possible standbys, including apply. Most importantly of the new master. This requires knowing which standbys exist. 3) promote new master. 4) only now allow reconnects. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 01/06/2015 05:17 PM, Andres Freund wrote: > A real solution to this requires more. You need to 1) disable writing > any wal 2) force catchup of all possible standbys, including apply. Most > importantly of the new master. This requires knowing which standbys > exist. 3) promote new master. 4) only now allow reconnect That can all be handled externally to PostgreSQL. However, as long as we have a recovery.conf file which only gets read at server start, and at no other time, no external solution is possible. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-06 17:08:20 -0800, Josh Berkus wrote: >> On 01/06/2015 04:42 PM, Andres Freund wrote: >>> On 2015-01-06 16:33:36 -0800, Josh Berkus wrote: >>>> F. Inability to remaster without restarting the replica. >>> >>> That has pretty much nothing to do with the topic at hand. >> >> It has *everything* to do with the topic at hand. The *only* reason we >> can't remaster without restarting is that recovery.conf is only read at >> startup time, and can't be reloaded. >> >> http://www.databasesoup.com/2014/05/remastering-without-restarting.html > > Not really. It's a good way to introduce suble and hard to understand > corruption and other strange corner cases. Your replication connection > was lost/reconnected in the wrong moment? Oops, received/replayed too > much. > A real solution to this requires more. You need to 1) disable writing > any wal 2) force catchup of all possible standbys, including apply. Most > importantly of the new master. This requires knowing which standbys > exist. 3) promote new master. 4) only now allow reconnects. I'm not following. As long as each standby knows what point it is at in the transaction stream, it could make a request to any transaction source for transactions past that point. If a node which will be promoted to the new master isn't caught up to that point, it should not send anything. When it does get caught up it could start sending transactions past that point, including the timeline switch when it is promoted. If you were arguing that some changes besides *just* allowing a standby to read from a new source without a restart, OK -- some changes might also be needed for a promoted master to behave as described above; but certainly the ability to read WAL from a new source on the floor would be a prerequisite, and possibly the biggest hurdle to clear. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/6/15 7:33 PM, Josh Berkus wrote: > D. Wierd name: for those doing only replication, not PITR, > "recovery.conf" is completely baffling. I don't disagree, but "standby.enabled" or whatever isn't any better, for the inverse reason. But replication and PITR are the same thing, so any name is going to have that problem. One way out of that would be to develop higher-level abstractions, like pg_ctl start -m standby or something, similar to how pg_ctl promote is an abstraction and got people away from fiddling with trigger files.
On 01/07/2015 02:31 PM, Peter Eisentraut wrote: > On 1/6/15 7:33 PM, Josh Berkus wrote: >> D. Wierd name: for those doing only replication, not PITR, >> "recovery.conf" is completely baffling. > > I don't disagree, but "standby.enabled" or whatever isn't any better, > for the inverse reason. Yeah. Like I said, I posted a list of bugs and features so that we can test your solution and the pg.conf merger to see how much they actually improve things. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 1/6/15 8:08 PM, Andres Freund wrote: > On 2015-01-05 20:43:12 -0500, Peter Eisentraut wrote: >> For example, putting recovery target parameters into postgresql.conf >> might not make sense to some people. Once you have reached the recovery >> end point, the information is obsolete. Keeping it set could be >> considered confusing. > > I don't know, but I think that ship has sailed. hot_standby, > archive_command, archive_mode, hot_standby_feedback are all essentially > treated differently between primary and standby. I don't mind those. I mean things like recovery_target_time. >> Moreover, when I'm actually doing point-in-time-recovery attempts, I >> don't think I want to be messing with my pristine postgresql.conf file. >> Having those settings separate isn't a bad idea in that case. > > Well, nothing stops you from having a include file or something similar. Sure, I need to update postgresql.conf to have an include file. > I think we should just make recovery.conf behave exactly the way it does > right now, except parse it according to guc rules. That way the changes > when migrating are minimal and we don't desupport any > usecases. Disallowing that way of operating just seems like > intentionally throwing rocks in the way of getting this done. That was more or less my proposal. >> The current system makes it easy to share postgresql.conf between >> primary and standby and just maintain the information related to the >> standby locally in recovery.conf. pg_basebackup -R makes that even >> easier. It's still possible to do this in the new system, but it's >> decidedly more work. > > Really? Howso? You have to set up include files, override the include file on the standby, I don't know how pg_basebackup -R would even work. And most importantly, you have to come up with all of that yourself, instead of it just working. >> The wins on the other hand are obscure: You can now use SHOW to inspect >> recovery settings. You can design your own configuration file include >> structures to set them. These are not bad, but is that all? > > It's much more: > a) One configuration format instead of two somewhat, but not really, > similar ones. Agreed, but that's also fixable by just changing how recovery.conf is parsed. It doesn't require user space changes. > b) Proper infrastructure to deal with configuration variable boundaries > and such. Just a few days ago there was e7c11887 et al. That's just an implementation issue. > c) Infrastructure for changing settings effective during recovery. Right > now we'd have to rebuild a lot of guc infrasturcture to allow > that. It'd not be that hard to allow changing parameters like > restore_command, primary_conninfo, recovery_target_* et al. That's > for sure not the same commit, but once the infrastructure is in those > won't be too hard. Right, if that worked, then it would be a real win. But this discussion is about right now, and the perspective of the user.
On 1/6/15 4:40 PM, Josh Berkus wrote: >> Btw., I'm not sure that everyone will be happy to have primary_conninfo >> > visible, since it might contain passwords. > Didn't we discuss this? I forgot what the conclusion was ... probably > not to put passwords in primary_conninfo. One can always say, don't do that then. But especially with pg_basebackup -R mindlessly copying passwords from .pgpass into recovery.conf, the combination of these factors would proliferate passwords a bit too easily for my taste. Maybe a separate primary_conninfo_password that is a kind of write-only GUC would work. (That's how passwords usually work: You can change your password, but can't see your existing one.)
On 01/08/2015 12:57 PM, Peter Eisentraut wrote: >> > c) Infrastructure for changing settings effective during recovery. Right >> > now we'd have to rebuild a lot of guc infrasturcture to allow >> > that. It'd not be that hard to allow changing parameters like >> > restore_command, primary_conninfo, recovery_target_* et al. That's >> > for sure not the same commit, but once the infrastructure is in those >> > won't be too hard. > Right, if that worked, then it would be a real win. But this discussion > is about right now, and the perspective of the user. That's rather a catch-22, isn't it? Last I checked, it was our policy to try to get smaller, more discrete patches rather than patches which try to change everything at once. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Jan 5, 2015 at 8:43 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > I have previously argued for a different approach: Ready recovery.conf > as a configuration file after postgresql.conf, but keep it as a file to > start recovery. That doesn't break any old workflows, it gets you all > the advantages of having recovery parameters in the GUC system, and it > keeps all the options open for improving things further in the future. But this is confusing, too, because it means that if you set a parameter in both postgresql.conf and recovery.conf, you'll end up with the recovery.conf value of the parameter even after recovery is done. Until you restart, and then you won't. That's weird. I think your idea of adding read-only GUCs with the same names as all of the recovey.conf parameters is a clear win. Even if we do nothing more than that, it makes the values visible from the SQL level, and that's good. But I think we should go further and make them really be GUCs. Otherwise, if you want to be able to change one of those parameters other than at server startup time, you've got to invent a separate system for reloading them on SIGHUP. If you make them part of the GUC mechanism, you can use that. I think it's quite safe to say that the whole reason we *have* a GUC mechanism is so that all of our configuration can be done through one grand, unified mechanism rather than being fragmented across many files. Some renaming might be in order. Heikki previously suggested merging all of the recovery_target_whatever settings down into a single parameter recovery_target='kindofrecoverytarget furtherdetailsgohere', like recovery_target='xid 1234' or recovery_target='name bob'. Maybe that would be more clear (or not). Maybe standby_mode needs a better name. But I think the starting point for this discussion shouldn't be "why in the world would we merge recovery.conf into postgresql.conf?" but "why, when we've otherwise gone to such trouble to push all of our configuration through a single, unified mechanism that offers many convenient features, do we continue to suffer our recovery.conf settings to go through some other, less-capable mechanism?". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14/01/15 14:24, Robert Haas wrote: > On Mon, Jan 5, 2015 at 8:43 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >> I have previously argued for a different approach: Ready recovery.conf >> as a configuration file after postgresql.conf, but keep it as a file to >> start recovery. That doesn't break any old workflows, it gets you all >> the advantages of having recovery parameters in the GUC system, and it >> keeps all the options open for improving things further in the future. > > But this is confusing, too, because it means that if you set a > parameter in both postgresql.conf and recovery.conf, you'll end up > with the recovery.conf value of the parameter even after recovery is > done. Until you restart, and then you won't. That's weird. > > I think your idea of adding read-only GUCs with the same names as all > of the recovey.conf parameters is a clear win. Even if we do nothing > more than that, it makes the values visible from the SQL level, and > that's good. But I think we should go further and make them really be > GUCs. Otherwise, if you want to be able to change one of those > parameters other than at server startup time, you've got to invent a > separate system for reloading them on SIGHUP. If you make them part > of the GUC mechanism, you can use that. I think it's quite safe to > say that the whole reason we *have* a GUC mechanism is so that all of > our configuration can be done through one grand, unified mechanism > rather than being fragmented across many files. This is basically what the patch which is in commitfest does no? > > Some renaming might be in order. Heikki previously suggested merging > all of the recovery_target_whatever settings down into a single > parameter recovery_target='kindofrecoverytarget furtherdetailsgohere', > like recovery_target='xid 1234' or recovery_target='name bob'. Maybe > that would be more clear (or not). I was thinking about this a lot myself while reviewing the patch too, seems strange to have multiple config parameters for what is essentially same thing, my thinking was similar except I'd use ":" as separator ('kindofrecoverytarget:furtherdetailsgohere'). I think though while it is technically nicer to do this, it might hurt usability for users. > Maybe standby_mode needs a better name. I actually think standby_mode should be merged with hot_standby (as in standby_mode = 'hot'/'warm'/'off' or something). > But I think the starting point for this discussion shouldn't be > "why in the world would we merge recovery.conf into postgresql.conf?" > but "why, when we've otherwise gone to such trouble to push all of our > configuration through a single, unified mechanism that offers many > convenient features, do we continue to suffer our recovery.conf > settings to go through some other, less-capable mechanism?". > +1 -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 01/15/2015 02:24 AM, Robert Haas wrote: > I think your idea of adding read-only GUCs with the same names as all > of the recovey.conf parameters is a clear win. Even if we do nothing > more than that, it makes the values visible from the SQL level, and > that's good. But I think we should go further and make them really be > GUCs. Otherwise, if you want to be able to change one of those > parameters other than at server startup time, you've got to invent a > separate system for reloading them on SIGHUP. If you make them part > of the GUC mechanism, you can use that. I think it's quite safe to > say that the whole reason we *have* a GUC mechanism is so that all of > our configuration can be done through one grand, unified mechanism > rather than being fragmented across many files. +1 I do find it ironic that the creator of the Grand Unified Configuration Settings is arguing against unifying the files. > Some renaming might be in order. Heikki previously suggested merging > all of the recovery_target_whatever settings down into a single > parameter recovery_target='kindofrecoverytarget furtherdetailsgohere', > like recovery_target='xid 1234' or recovery_target='name bob'. Maybe > that would be more clear (or not). Not keen on non-atomic settings, personally. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Jan 7, 2015 at 6:22 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > I have written similar logic, and while it's not pleasant, it's doable. > This issue would really only go away if you don't use a file to signal > recovery at all, which you have argued for, but which is really a > separate and more difficult problem. Moving this patch to the next CF and marking it as returned with feedback for current CF as there is visibly no consensus reached. -- Michael
On 2015-01-16 21:43:43 +0900, Michael Paquier wrote: > On Wed, Jan 7, 2015 at 6:22 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > > I have written similar logic, and while it's not pleasant, it's doable. > > This issue would really only go away if you don't use a file to signal > > recovery at all, which you have argued for, but which is really a > > separate and more difficult problem. > Moving this patch to the next CF and marking it as returned with > feedback for current CF as there is visibly no consensus reached. I don't think that's a good idea. If we defer this another couple months we'l *never* reach anything coming close to concensus. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 16, 2015 at 9:45 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-16 21:43:43 +0900, Michael Paquier wrote: >> On Wed, Jan 7, 2015 at 6:22 AM, Peter Eisentraut <peter_e@gmx.net> wrote: >> > I have written similar logic, and while it's not pleasant, it's doable. >> > This issue would really only go away if you don't use a file to signal >> > recovery at all, which you have argued for, but which is really a >> > separate and more difficult problem. >> Moving this patch to the next CF and marking it as returned with >> feedback for current CF as there is visibly no consensus reached. > > I don't think that's a good idea. If we defer this another couple months > we'l *never* reach anything coming close to concensus. What makes you think that the situation could move suddendly move into a direction more than another? (FWIW, my vote goes to the all GUC approach with standby.enabled.) -- Michael
On 2015-01-16 21:50:16 +0900, Michael Paquier wrote: > On Fri, Jan 16, 2015 at 9:45 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2015-01-16 21:43:43 +0900, Michael Paquier wrote: > >> On Wed, Jan 7, 2015 at 6:22 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > >> > I have written similar logic, and while it's not pleasant, it's doable. > >> > This issue would really only go away if you don't use a file to signal > >> > recovery at all, which you have argued for, but which is really a > >> > separate and more difficult problem. > >> Moving this patch to the next CF and marking it as returned with > >> feedback for current CF as there is visibly no consensus reached. > > > > I don't think that's a good idea. If we defer this another couple months > > we'l *never* reach anything coming close to concensus. > What makes you think that the situation could move suddendly move into > a direction more than another? That we have to fix this. I see absolutely no advantage of declaring the discussion closed for now. That doesn't exactly increase the chance of this ever succeeding. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 1/6/15 4:22 PM, Peter Eisentraut wrote: > That said, there is a much simpler way to achieve that specific > functionality: Expose all the recovery settings as fake read-only GUC > variables. See attached patch for an example. The commit fest app has this as the patch of record. I don't think there is a real updated patch under consideration here. This item should probably not be in the commit fest at all.
On 02/19/2015 12:23 PM, Peter Eisentraut wrote: > On 1/6/15 4:22 PM, Peter Eisentraut wrote: >> That said, there is a much simpler way to achieve that specific >> functionality: Expose all the recovery settings as fake read-only GUC >> variables. See attached patch for an example. > > The commit fest app has this as the patch of record. I don't think > there is a real updated patch under consideration here. This item > should probably not be in the commit fest at all. What happened to the original patch? Regardless of what we do, keeping recovery.conf the way it is can't be what we go forward with. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2/19/15 4:33 PM, Josh Berkus wrote: > On 02/19/2015 12:23 PM, Peter Eisentraut wrote: >> On 1/6/15 4:22 PM, Peter Eisentraut wrote: >>> That said, there is a much simpler way to achieve that specific >>> functionality: Expose all the recovery settings as fake read-only GUC >>> variables. See attached patch for an example. >> >> The commit fest app has this as the patch of record. I don't think >> there is a real updated patch under consideration here. This item >> should probably not be in the commit fest at all. > > What happened to the original patch? Regardless of what we do, keeping > recovery.conf the way it is can't be what we go forward with. There was disagreement on many of the details, and no subsequent new proposal.
On Sat, Feb 21, 2015 at 6:45 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 2/19/15 4:33 PM, Josh Berkus wrote: >> On 02/19/2015 12:23 PM, Peter Eisentraut wrote: >>> On 1/6/15 4:22 PM, Peter Eisentraut wrote: >>>> That said, there is a much simpler way to achieve that specific >>>> functionality: Expose all the recovery settings as fake read-only GUC >>>> variables. See attached patch for an example. >>> >>> The commit fest app has this as the patch of record. I don't think >>> there is a real updated patch under consideration here. This item >>> should probably not be in the commit fest at all. >> >> What happened to the original patch? Regardless of what we do, keeping >> recovery.conf the way it is can't be what we go forward with. > > There was disagreement on many of the details, and no subsequent new > proposal. Patch has been marked as "Returned with feedback". -- Michael