Re: unite recovery.conf and postgresql.conf - Mailing list pgsql-hackers
From | Greg Smith |
---|---|
Subject | Re: unite recovery.conf and postgresql.conf |
Date | |
Msg-id | 4EE91248.8010505@2ndQuadrant.com Whole thread Raw |
In response to | Re: unite recovery.conf and postgresql.conf (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: unite recovery.conf and postgresql.conf
Re: unite recovery.conf and postgresql.conf Re: unite recovery.conf and postgresql.conf |
List | pgsql-hackers |
I've submitted two patches for adding new include features to the postgresql.conf file. While not quite commit quality yet, I hope everyone will agree their reviews this week suggest both are close enough that any number of people could finish them off. Before re-opening this can of worms, I wanted people to be comfortable that we can expect them to be available as building blocks before 9.2 development is finished. Both of those came out of requests from this unification thread, and they're a helpful part of what I'd like to propose. I don't see any path forward here that still expects the recovery.conf file to function as it used to. The "make replication easy" crowd will eventually be larger than the pre-9.0 user base, if it isn't already. And they clearly want no parts of that thing. There's been over a year of arguing around how to cope with it that will satisfy everyone, so many messages I couldn't even read them all usefully in our archives and had to go here: http://postgresql.1045698.n5.nabble.com/recovery-conf-location-td2854644.html http://postgresql.1045698.n5.nabble.com/unite-recovery-conf-and-postgresql-conf-td4785717.html I don't think it's possible. What I would propose is a specification based on forced failure if there's any sign of recovery.conf, combined with the simplest migration path we can design to ease upgrades from older versions. I think we can make the transition easy enough. Users and tool vendors can make relatively simple changes to support 9.2 without changing everything they're used to just yet--while still being very clear deprecation has arrived and they should reconsider their approach. Only bug-compatible levels of backwards compatibility would make this transparent to them, and there's way too many issues to allow moving forward that way--a forward path that as far as I can see is desired by the majority of users, and just as importantly for all of the potential new users we're impacting with the current mess. There's another, perhaps under considered, concern I want to highlight as well. Tom has repeatedly noted that one of the worst problems here would go away if the "existence means do recovery" nature of recovery.conf went elsewhere. And we know some packagers want to separate the necessary to manipulate configuration files from the database directory, for permissions and management reasons. As Heikki nicely put it (over a year ago), "You don't want to give monitoring tools that decide on failover write access to the data directory". Any information that the user is supplying for the purpose of specifying things needs to be easy to relocate to a separate config file area, instead of treating it more like a control file in $PGDATA. Some chatting this morning with Simon pointed out a second related concern there, which makes ideas like "specify the path to the recovery.conf file" infeasible. The data_directory is itself a parameter, so anything tied to that or a new GUC means that config files specified there we would need two passes. First identify the data directory, then go back again to read recovery.conf from somewhere else. And nobody wants to wander that way. If it doesn't fit cleanly into the existing postgresql.conf parsing, it's gotta go. Here's the rough outline of what I think would work here: -All settings move into the postgresql.conf. -As of 9.2, relocating the postgresql.conf means there are no user writable files needed in the data directory. -Creating a standby.enabled file in the directory that houses the postgresql.conf (same logic as "include" uses to locate things) puts the system into recovery mode. That feature needs to save some state, and making those decisions based on existence of a file is already a thing we do. Rather than emulating the rename to recovery.done that happens now, the server can just delete it, to keep from incorrectly returning to a state it's exited. A UI along the lines of the promote one, allowing "pg_ctl standby", should fall out of here. I think this is enough that people who just want to use replication features need never hear about this file at all, at least during the important to simplify first run through. -If startup finds a recovery.conf file where it used to live at, abort--someone is expecting the old behavior. Hint to RTFM or include a short migration guide right on the spot. That can have a nice section about how you might use the various postgresql.conf include* features if they want to continue managing those files separately. Example: rename it as replication.conf and use include_if_exists if you want to be able to rename it to recovery.done like before. Or drop it into a conf.d directory where the rename will make it then skipped. -Tools such as pgpool that want to write a simple configuration file, only touching the things that used to go into recovery.conf, can tell people to do the same trick. End their postgresql.conf with a call to \include_if_exists replication.conf as part of setup. While I don't like pushing problems toward tool vendors, as one I think validating if this has been done doesn't require the sort of fully GUC compatible parser people (rightly) want to avoid. A simple scan of the postgresql.conf looking for the recommended text at the front of a line could confirm whether that bit is there. And adding a single "include_if_exists" line to the end of the postgresql.conf is not a terrible edit job to consider pushing toward tools. None of this is any more complicated than the little search and replace job that initdb does right now. -[Optional: for 9.2 but not forever, report incompatibilities, or potentially stop working altogether, if the things that used to go into recovery.conf show up when they aren't going to do anything. That would match the 'stop if the old way is used' theme, but may introduce its own new operational issues.] That's basically it for new ideas. The mechanics of making everything in recovery.conf turn into a GUC seemed pretty uncontroversial when I scanned the archives here. Another idea not to forget is getting the read permissions on the new GUCs correct, like making primary_conninfo superuser visible only. Seems like some of the restart error handling might still have a tricky part or two left in it. But if you've accepted that a clear break from existing behavior is happening, some of those simplify I think. If we're moving forward this way for 9.2, we do need to be careful that a spec is nailed down and people take responsibility for their respective parts of it. There's a few interlocking parts that won't work well if partially completed. I marked myself as reviewer for this patch hoping I could crack the deadlock around the specification. One open item I'm not going to do is a more serious review of the already submitted code. To step back toward the thinking that brought me to here (which has helped pull Simon a bit closer to the middle of the opinion pack too), an application facing change to PostgreSQL has serious backward compatibility requirements. It's really bad if the server runs but will quietly malfunction, such that you may not notice the change until much later when a rare code path is hit. But when we're talking about something that impacts whether the server will start or not, that can be different. You have a new option. Make sure the break is obvious and total when it happens: do not function at all if someone tries to operate the old way. The migration path toward the new one should be as painless as possible. If people only have to tweak one or two simple things to get an approximation of the old behavior, provide that, give migration instructions. But this is a new major version; if people expected all their old configurations to just move over with zero changes, that's full of fail in all directions. The best we can do is try to improve the odds their test migrations will fail early, and make the path to upgrade as simple and well defined as possible. (It can be argued that people will not actually run into the proposed "server blocks if recovery.conf exists" fail-safe during migration testing. It might hit them only when the first fail-over happens in the field. Then they've got a production server down and are staring at the new error message, at best; if they don't see the logs even that may not be facing them. I would say that someone who migrates a high-availability system over to a new major version, and doesn't read the release notes nor do the simplest of fail-over testing, is doomed regardless of what we do to try and help them.) -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
pgsql-hackers by date: