Thread: PostgreSQL configuration
About a year or two ago I submitted a configuration patch that allowed PostgreSQL to be fully configured by postgresql.conf, enabling data and configuration to be in separate locations. The idea was that, as on most UNIX systems, the configuration file could be stored in the /etc directory (or /etc/postgres or /usr/etc or whatever) and could contain all the various system directory and file locations, like pg_hba.conf, and so on. There was a lot of debate about it, and I don't recall many arguments against this sort of configuration strategy, only a dislike of my patch because it wasn't an all-encompassing rewrite of the configuration system. I have been maintaining it for the various versions of PostgreSQL since that time for my own use. Can we re-open this debate? It has been a good deal of time with no progress, and I don't think anyone can deny that a more flexible configuration, based on the idea that configuration and data are in SEPARATE locations, is important.
On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote: > more flexible configuration based on the idea that configuration and data > are in SEPARATE locations is important. Why is it important and wouldn't it just make it harder to have several database clusters (for example with different locale) or several versions of pg installed at the same time? I guess I should search the archive for the old discussion. If someone has a link please post :-) -- /Dennis Björklund
> On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote: > >> more flexible configuration based on the idea that configuration and >> data >> are in SEPARATE locations is important. > > Why is it important and wouldn't it just make it harder to have several > database clusters (for example with different locale) or several versions > of pg installed at the same time? My patch did not remove any functionality, it merely augmented it. To say that it would make it more difficult to deploy multiple databases is misleading for (2) reasons. (1) It need not do that, because the configuration system would seem unchanged for those who do not wish to use it in this way. (2) I would bet that *most* deployments of PostgreSQL only use one database environment per server, so I'm not even sure that it would be an issue for the majority of current or prospective users. It is all well and good to say "our way is better," (with which I do not agree) but there are, more or less, if not "standards," "standard concepts" from which good software design follows. Besides PostgreSQL, name one popular open source project that is widely used that stores its configuration information inside its data repository. From the "new user" perspective, configuration within the data directory is an alien concept. From a sysadmin perspective, having configuration in a standard location makes sense. It makes these things easy to back up, archive, and put under version control. (Many sysadmins put machine configuration under version control to see what changes are made over time.) Finally, I'm not suggesting removing any functionality, I am suggesting that configuration can and should be able to be located in a standard location and the configuration be able to point to the data volume. How many systems have you been asked to inspect for problems? It is one of the things I do for a living. On many systems, I can just look in the '/etc' directory for most of what I need.
If they are running PostgreSQL, I have to look around and figure out where the database is located.
Dennis Bjorklund <db@zigo.dhs.org> writes: > On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote: >> more flexible configuration based on the idea that configuration and data >> are in SEPARATE locations is important. > Why is it important and wouldn't it just make it harder to have several > database clusters (for example with different locale) or several versions > of pg installed at the same time? My recollection of the arguments against were first that and second reliability --- there was concern about getting config and data of multiple installations mixed up if they weren't kept together. In the worst case you could conceivably bollix an installation unrecoverably that way. (Right now I do not think there is anything quite that critical in postgresql.conf, but someday there might be. My very vague recollection is that the proposed patch changed things so that WAL and DATA directories would be separately specified in the config file; if correct, mismatching them definitely would be a great chance to shoot oneself in the foot.) I've recently had some very unpleasant experiences trying to install test versions of MySQL on machines that already had older versions installed normally. It seems that MySQL *will* read /etc/my.cnf if it exists, whether it's appropriate or not, and so it's impossible to have a truly independent test installation, even though you can configure it to build/install into nonstandard directories. Let's not emulate that bit of brain damage. regards, tom lane
On Thu, Apr 08, 2004 at 10:31:44AM -0400, Tom Lane wrote: > > I've recently had some very unpleasant experiences trying to install > test versions of MySQL on machines that already had older versions > installed normally. It seems that MySQL *will* read /etc/my.cnf if it > exists, whether it's appropriate or not, and so it's impossible to have > a truly independent test installation, even though you can configure it > to build/install into nonstandard directories. Let's not emulate that > bit of brain damage. A counterexample of Apache shows that you can easily use -f or another command line option to point the server to alternate master config file (which I believe is the same with MySQL). From that config file, other files can be included, making it easy to share pieces of configuration, or separate them in any way. -- ------------------------------------------------------------------------ Honza Pazdziora | adelton@fi.muni.cz | http://www.fi.muni.cz/~adelton/ .project: Perl, mod_perl, DBI, Oracle, large Web systems, XML/XSL, ... Only self-confident people can be simple.
I have the file location discussion in my 7.4 hold mailbox (http://momjian.postgresql.org/cgi-bin/pgpatches2). I am going to revisit it in the next month and see if I can get all the opinions merged into a plan everyone can agree on. I think it can be done. --------------------------------------------------------------------------- Tom Lane wrote: > Dennis Bjorklund <db@zigo.dhs.org> writes: > > On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote: > >> more flexible configuration based on the idea that configuration and data > >> are in SEPARATE locations is important. > > > Why is it important and wouldn't it just make it harder to have several > > database clusters (for example with different locale) or several versions > > of pg installed at the same time? > > My recollection of the arguments against were first that and second > reliability --- there was concern about getting config and data of > multiple installations mixed up if they weren't kept together. In the > worst case you could conceivably bollix an installation unrecoverably > that way. (Right now I do not think there is anything quite that > critical in postgresql.conf, but someday there might be. My very vague > recollection is that the proposed patch changed things so that WAL and > DATA directories would be separately specified in the config file; if > correct, mismatching them definitely would be a great chance to shoot > oneself in the foot.) > > I've recently had some very unpleasant experiences trying to install > test versions of MySQL on machines that already had older versions > installed normally. It seems that MySQL *will* read /etc/my.cnf if it > exists, whether it's appropriate or not, and so it's impossible to have > a truly independent test installation, even though you can configure it > to build/install into nonstandard directories. Let's not emulate that > bit of brain damage.
> > regards, tom lane -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Tom Lane wrote: > I've recently had some very unpleasant experiences trying to install > test versions of MySQL on machines that already had older versions > installed normally. It seems that MySQL *will* read /etc/my.cnf if it > exists, whether it's appropriate or not, and so it's impossible to have > a truly independent test installation, even though you can configure it > to build/install into nonstandard directories. Let's not emulate that > bit of brain damage. > > regards, tom lane It seems to me that this is a packaging problem and not a PostgreSQL problem. If someone wants to package PostgreSQL so that there's a symlink to a config file in /etc/pgsql, or vice versa, for the main database, they're welcome to do that, and why not? As for test databases, there's already a -D for the datadir, so why not add a -C for the config file, as many software packages allow? Then packagers could put the config file anywhere they wanted. I would certainly welcome this feature, as it would allow for easy tweaking/benchmarking. I agree that we should avoid the viral-like MySQL configuration plague. As pgsql AT mohawksoft.com requested, here are a few widely used software packages that keep configuration close to the data, some in /var, some in /usr: Mailman, OpenSSL, Cyrus-IMAP, and Apache (which, I believe, doesn't install anything to /etc/ when you build from source).
Honza Pazdziora <adelton@informatics.muni.cz> writes: > On Thu, Apr 08, 2004 at 10:31:44AM -0400, Tom Lane wrote: >> It seems that MySQL *will* read /etc/my.cnf if it >> exists, whether it's appropriate or not, and so it's impossible to have >> a truly independent test installation, even though you can configure it >> to build/install into nonstandard directories. Let's not emulate that >> bit of brain damage. > A counterexample of Apache shows that you can easily use -f or another > command line option to point the server to alternate master config > file (which I believe is the same with MySQL). According to http://www.mysql.com/documentation/mysql/bychapter/manual_Using_MySQL_Programs.html#Option_files /etc/my.cnf will be read if it exists, no matter what you say on the command line. So AFAICS the only way to make a private installation is to make sure that you have overridden each and every setting in /etc/my.cnf in a private config file that you do control. This is tedious and breakage-prone, of course. regards, tom lane
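The tedium Tom describes follows from how layered option files merge: each file read later overrides only the keys it mentions, so anything set in the always-read global file and not repeated in the private file silently carries over. What follows is a minimal Python sketch of that merge rule, assuming plain key/value dictionaries. It models the behavior described above, not MySQL's actual option-file parser, and the option values are hypothetical.

```python
from typing import Dict, List


def merge_option_files(layers: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge option layers in the order they are read.

    Later layers override earlier ones key by key; keys a later layer
    does not mention keep their earlier value.
    """
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return merged


def leaked_keys(global_opts: Dict[str, str], private_opts: Dict[str, str]) -> List[str]:
    """Return the global settings that the private file forgot to override."""
    return sorted(key for key in global_opts if key not in private_opts)


# Hypothetical values: a system-wide file that is always read first,
# followed by a private test file that overrides only some keys.
system_wide = {"port": "3306", "datadir": "/var/lib/mysql", "socket": "/tmp/mysql.sock"}
private = {"port": "3307", "datadir": "/home/test/data"}

effective = merge_option_files([system_wide, private])
print(effective["socket"])                # /tmp/mysql.sock, inherited from the global file
print(leaked_keys(system_wide, private))  # ['socket']
```

The test instance ends up sharing the production socket path, and it does so without any error. This is exactly the "override each and every setting" burden Tom describes.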
> Dennis Bjorklund <db@zigo.dhs.org> writes: >> On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote: >>> more flexible configuration based on the idea that configuration and >>> data >>> are in SEPARATE locations is important. > >> Why is it important and wouldn't it just make it harder to have several >> database clusters (for example with different locale) or several >> versions >> of pg installed at the same time? > > My recollection of the arguments against were first that and second > reliability --- there was concern about getting config and data of > multiple installations mixed up if they weren't kept together. In the > worst case you could conceivably bollix an installation unrecoverably > that way. (Right now I do not think there is anything quite that > critical in postgresql.conf, but someday there might be. My very vague > recollection is that the proposed patch changed things so that WAL and > DATA directories would be separately specified in the config file; if > correct, mismatching them definitely would be a great chance to shoot > oneself in the foot.) The patch I had kept the directory layout as a single setting; it simply allowed postgresql.conf to contain the location of pg_hba.conf, pg_ident.conf, and the data directory. Thus, one could start PostgreSQL as "postmaster -C /etc/postgres/webdb.conf", which would allow full configuration from that one file. > > I've recently had some very unpleasant experiences trying to install > test versions of MySQL on machines that already had older versions > installed normally. It seems that MySQL *will* read /etc/my.cnf if it > exists, whether it's appropriate or not, and so it's impossible to have > a truly independent test installation, even though you can configure it > to build/install into nonstandard directories. Let's not emulate that > bit of brain damage. MySQL is, in general, unpleasant, but that is more or less a packaging issue.
On Thu, 2004-04-08 at 09:49, pgsql@mohawksoft.com wrote: > > On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote: > > > >> more flexible configuration based on the idea that configuration and > >> data > >> are in SEPARATE locations is important. > > > > Why is it important and wouldn't it just make it harder to have several > > database clusters (for example with different locale) or several versions > > of pg installed at the same time? > > My patch did not remove any functionality, it merely augmented it. > > To say that it would make it more difficult to deploy multiple databases > is misleading for (2) reasons. > > (1) It need not do that, because the configuration system would seem > unchanged for those who do not wish to use it in this way. > True, but it is more difficult to deal with multiple databases if one configures their system in this fashion... debian packages its installations this way via symlinks, so i've experienced the difficulty first hand. > (2) I would bet that *most* deployments of PostgreSQL only use one > database environment per server, so I'm not even sure that it would be an > issue for the majority of current or prospective users. > except that when doing major version upgrades, i find it far better practice to install multiple versions on the machine whenever possible, even if you only intend to run a single version. > It is all well and good to say "our way is better," (with which I do not > agree) but there are, more or less, if not "standards," "standard > concepts" from which good software design follows. Besides PostgreSQL, > name one popular open source project that is widely used that stores its > configuration information inside its data repository. From the "new user" > perspective, configuration within the data directory is an alien concept. > i remember refuting this last time, and i have to say something again because this is equally misleading...
apache does things this way if you build from source, and there are others as well. > From a sysadmin perspective, having configuration in a standard location > makes sense. It makes these things easy to back up, archive, and put under > version control. (Many sysadmins put machine configuration under version > control to see what changes are made over time.) and i would say that right now the way postgresql does it is much easier. when you first get on a machine and need to find the webroot of an apache install, there's no telling where it could be, simply because a lot of packagers package things up differently. > > Finally, I'm not suggesting removing any functionality, I am suggesting > that configuration can and should be able to be located in a standard > location and the configuration be able to point to the data volume. > IIRC part of the problem with the initial patch/proposal is that it had implementation issues following a couple of OS guidelines/specs, and there was an issue with the pid. One potential bonus I would see to this type of functionality is that on some servers I have multiple postgresql.confs tuned to specific tasks at hand, i.e. one for a pg_restore vs. one for normal operations... it would be nice to point the db at a specific one rather than having to copy files back and forth. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes: > On Thu, 2004-04-08 at 09:49, pgsql@mohawksoft.com wrote: >> (2) I would bet that *most* deployments of PostgreSQL only use one >> database environment per server, so I'm not even sure that it would be an >> issue for the majority of current or prospective users. > except that when doing major version upgrades, i find it far better > practice to install multiple versions on the machine whenever possible, > even if you only intend to run a single version. In any case, you will never get such a proposal past the core developers, because we all run multiple PG installs per machine. My primary development machine currently has six postmasters alive on it (7.0, 7.1, ..., 7.4 + CVS tip); my alternate machine has five installations on it, though not all are alive since I've not had reason to restart them all since last reboot; even the laptop I'm physically typing on right now has more than one Postgres installation on it. And practically any time someone allows me access to a machine of theirs to check out some kind of portability issue, I'll build a test installation in my guest-account home directory, rather than muck with their live server. So, don't bother proposing anything that makes it even slightly harder to run multiple servers per machine. It will not happen. End of discussion. regards, tom lane
On Thu, Apr 08, 2004 at 11:32:19AM -0400, Tom Lane wrote: > > > A counterexample of Apache shows that you can easily use -f or another > > command line option to point the server to alternate master config > > file (which I believe is the same with MySQL). > > According to > http://www.mysql.com/documentation/mysql/bychapter/manual_Using_MySQL_Programs.html#Option_files > /etc/my.cnf will be read if it exists, no matter what you say on the > command line. So AFAICS the only way to make a private installation is > to make sure that you have overridden each and every setting in :-) I never used that "feature" so was never bitten by it. Anyway, Apache HTTP server seems to do it the right way, doesn't it?
> Robert Treat <xzilla@users.sourceforge.net> writes: >> On Thu, 2004-04-08 at 09:49, pgsql@mohawksoft.com wrote: >>> (2) I would bet that *most* deployments of PostgreSQL only use one >>> database environment per server, so I'm not even sure that it would be >>> an >>> issue for the majority of current or prospective users. > >> except that when doing major version upgrades, i find it far better >> practice to install multiple versions on the machine whenever possible, >> even if you only intend to run a single version. > > In any case, you will never get such a proposal past the core > developers, because we all run multiple PG installs per machine. > My primary development machine currently has six postmasters alive > on it (7.0, 7.1, ..., 7.4 + CVS tip); my alternate machine has five > installations on it, though not all are alive since I've not had reason > to restart them all since last reboot; even the laptop I'm physically > typing on right now has more than one Postgres installation on it. > And practically any time someone allows me access to a machine of > theirs to check out some kind of portability issue, I'll build a test > installation in my guest-account home directory, rather than muck with > their live server. > > So, don't bother proposing anything that makes it even slightly harder > to run multiple servers per machine. It will not happen. End of > discussion. > The problem with this conversation is that you assume the functionality desired would affect your methodology in any way. All I am asking for, and this is what my patch did, was to add a few entries to postgresql.conf: "data_dir," "hba_conf," and "ident_conf." A later version of the patch added "include" and "runtime_pidfile." These features allow a PostgreSQL system to be fully configurable via a postgresql.conf file. It may, in fact, make it easier to have multiple installs.
Tom Lane wrote: > Honza Pazdziora <adelton@informatics.muni.cz> writes: > > On Thu, Apr 08, 2004 at 10:31:44AM -0400, Tom Lane wrote: > >> It seems that MySQL *will* read /etc/my.cnf if it > >> exists, whether it's appropriate or not, and so it's impossible to have > >> a truly independent test installation, even though you can configure it > >> to build/install into nonstandard directories. Let's not emulate that > >> bit of brain damage. > > > A counterexample of Apache shows that you can easily use -f or another > > command line option to point the server to alternate master config > > file (which I believe is the same with MySQL). > > According to > http://www.mysql.com/documentation/mysql/bychapter/manual_Using_MySQL_Programs.html#Option_files > /etc/my.cnf will be read if it exists, no matter what you say on the > command line. So AFAICS the only way to make a private installation is > to make sure that you have overridden each and every setting in > /etc/my.cnf in a private config file that you do control. This is > tedious and breakage-prone, of course. Yes. But we don't have to do that. If we're truly concerned about the possibility of multiple installations attempting to use the same config, then the answer is simple: require that the location of the config file be specified on the command line and don't compile a default location into the binary. Similarly, don't take the value from an environment variable. Packaged installations won't have trouble with this: they supply a startup script which would pass the appropriate argument to the postmaster. If we want to be a bit paranoid (justifiable if you've got really important data on the line), we could also require that a version string exist in the config file. If the version string doesn't match the version of the postmaster being started, the postmaster exits with an error (and a hint of what to set the version string to and what the name of the version string parameter is). 
That way, even if you screw up on the command line, you won't hose a database by starting the wrong version of the postmaster against it. Not sure if this would break anything, though. -- Kevin Brown kevin@sysexperts.com
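The first half of Kevin's proposal is to have no compiled-in default and no environment-variable fallback, so that the config file must be named explicitly. It can be sketched as a small argument check. This is a hypothetical Python illustration that uses the "-C" switch from the patch under discussion. It is not the postmaster's real option handling, and StartupError is an illustrative name.

```python
from typing import Sequence


class StartupError(Exception):
    """Raised when the (hypothetical) server refuses to start."""


def resolve_config_path(argv: Sequence[str]) -> str:
    """Return the file named by '-C <path>' in argv.

    There is intentionally no compiled-in default and no environment
    variable fallback: if -C is missing, startup fails instead of
    guessing which installation's config file to read.
    """
    args = list(argv)
    for i, arg in enumerate(args):
        if arg == "-C":
            if i + 1 >= len(args):
                raise StartupError("-C requires a file name")
            return args[i + 1]
    raise StartupError("no configuration file given; use -C <file>")


print(resolve_config_path(["-C", "/etc/postgres/webdb.conf", "-p", "5433"]))
# /etc/postgres/webdb.conf
```

A packaged startup script would always pass -C, so in practice only hand-started test servers would ever hit the error path.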
Kevin Brown wrote: > >If we're truly concerned about the possibility of multiple installations >attempting to use the same config, then the answer is simple: require >that the location of the config file be specified on the command line >and don't compile a default location into the binary. Similarly, don't >take the value from an environment variable. > >Packaged installations won't have trouble with this: they supply a startup >script which would pass the appropriate argument to the postmaster. > In order to keep with existing practice, you could say that you have to supply *either* a config file, which points to the data dir etc., *or* a data dir, in which case the config files must be in the data dir. I very much agree with the idea of not compiling in a default config file location. > > >If we want to be a bit paranoid (justifiable if you've got really >important data on the line), we could also require that a version >string exist in the config file. If the version string doesn't match >the version of the postmaster being started, the postmaster exits with >an error (and a hint of what to set the version string to and what the >name of the version string parameter is). That way, even if you screw >up on the command line, you won't hose a database by starting the wrong >version of the postmaster against it. Not sure if this would break >anything, though. > It won't start now if there's a version mismatch, and that's nothing whatever to do with the config file - it matches against the PG_VERSION file. We're already rightly paranoid on this point. cheers andrew
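Andrew's either/or rule might look like the following Python sketch. It assumes the config file has already been parsed into a dictionary and uses the data_dir setting name from the patch. The sketch treats the two switches as mutually exclusive, which is one reading of his suggestion; the patch's README instead lets -D override data_dir. Layout and choose_layout are hypothetical names.

```python
import posixpath
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Layout:
    config_file: str
    data_dir: str


def choose_layout(config_file: Optional[str], data_dir: Optional[str],
                  config_settings: Optional[Dict[str, str]] = None) -> Layout:
    """Accept exactly one of a config file or a data directory.

    - Data directory only: today's behaviour, the config lives inside it.
    - Config file only: the config must name the data directory itself.
    """
    if (config_file is None) == (data_dir is None):
        raise ValueError("give exactly one of a config file or a data directory")
    if data_dir is not None:
        return Layout(posixpath.join(data_dir, "postgresql.conf"), data_dir)
    settings = config_settings or {}
    if "data_dir" not in settings:
        raise ValueError(f"{config_file} does not set data_dir")
    return Layout(config_file, settings["data_dir"])


print(choose_layout(None, "/vol01/postgres"))
print(choose_layout("/etc/postgres/webdb.conf", None, {"data_dir": "/vol01/postgres"}))
```

Existing users who pass only a data directory see no change, which addresses the concern that the new mode must not get in the way of running many clusters side by side.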
In the last exciting episode, kevin@sysexperts.com (Kevin Brown) wrote: > If we want to be a bit paranoid (justifiable if you've got really > important data on the line), we could also require that a version > string exist in the config file. If the version string doesn't match > the version of the postmaster being started, the postmaster exits with > an error (and a hint of what to set the version string to and what the > name of the version string parameter is). That way, even if you screw > up on the command line, you won't hose a database by starting the wrong > version of the postmaster against it. Not sure if this would break > anything, though. How would this differ from the present situation where $PGDATA/PG_VERSION is already required to match against the postmaster? As far as I can see, the only thing that is to be changed by the proposal is that instead of postgresql.conf being in $PGDATA, it might be found somewhere else. (And perhaps pg_hba.conf and pg_ident.conf will also be located in that mystical "somewhere else.") The change that _might_ be relevant would be to put a version string into postgresql.conf so that there would be _two_ matches made, not just one: - $PGDATA/PG_VERSION is required to match the postmaster; - $SOMEWHERE_ELSE/postgresql.conf's variable "version" is required to match the postmaster. But I think Tom put it pretty well when he commented that all of the core developers make extensive use of the notion of having _many_ backends around, and therefore would oppose any proposal that would make it less convenient to do that. Core folk aren't likely to write up patches designed to shoot themselves in the foot this way, nor are they likely to accept patches that clearly do so. -- let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;; http://www3.sympatico.ca/cbbrowne/linuxxian.html "There's no longer a boycott of Apple. But MacOS is still a proprietary OS." -- RMS - June 13, 1998
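Christopher outlines a two-match check, which can be sketched in Python as follows. The PG_VERSION comparison mirrors the existing safeguard. The "version" entry in the external config file is the proposed, hypothetical second check, and VersionMismatch and check_versions are illustrative names.

```python
from pathlib import Path
from typing import Dict


class VersionMismatch(Exception):
    """Raised when a version marker does not match the server binary."""


def check_versions(server_version: str, data_dir: Path, config: Dict[str, str]) -> None:
    """Refuse to start unless both version markers match the server.

    1. <data_dir>/PG_VERSION must match (the safeguard that exists today).
    2. A hypothetical 'version' entry in the external config file must
       match as well (the proposed second check).
    """
    on_disk = (data_dir / "PG_VERSION").read_text().strip()
    if on_disk != server_version:
        raise VersionMismatch(
            f"data directory is version {on_disk}, server is {server_version}")
    declared = config.get("version")
    if declared != server_version:
        raise VersionMismatch(
            f"config file declares version {declared!r}, server is {server_version}")
```

The second check only catches a config file written for a different major version. It would not notice a config file from one 7.4 cluster paired with the data directory of another 7.4 cluster, which is the mix-up Tom was worried about.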
> Robert Treat <xzilla@users.sourceforge.net> writes: >> On Thu, 2004-04-08 at 09:49, pgsql@mohawksoft.com wrote: >>> (2) I would bet that *most* deployments of PostgreSQL only use one database environment per server, so I'm not even sure that it would be an >>> issue for the majority of current or prospective users. > >> except that when doing major version upgrades, i find it far better practice to install multiple versions on the machine whenever possible, even if you only intend to run a single version. > > In any case, you will never get such a proposal past the core > developers, because we all run multiple PG installs per machine. My primary development machine currently has six postmasters alive on it (7.0, 7.1, ..., 7.4 + CVS tip); my alternate machine has five installations on it, though not all are alive since I've not had reason to restart them all since last reboot; even the laptop I'm physically typing on right now has more than one Postgres installation on it. And practically any time someone allows me access to a machine of theirs to check out some kind of portability issue, I'll build a test installation in my guest-account home directory, rather than muck with their live server. > > So, don't bother proposing anything that makes it even slightly harder to run multiple servers per machine. It will not happen. End of discussion. I'll just post the README file at the bottom for reference. If PostgreSQL can appear completely unchanged and yet, with the addition of just two command line arguments, work "better" in situations where this sort of functionality is desired, I don't see why people would not want it included. If there are issues with coding style or other finer details, I would be very willing to address them. Personally, I think the requirement of using symlinks to synthesize this functionality is a hard sell to many administrators.
::::::::::
This patch enables PostgreSQL to be far more flexible in its configuration methodology. Specifically, it adds two more command line parameters: "-C", which specifies either the location of the postgres configuration file or a directory containing the configuration files, and "-R", which directs PostgreSQL to write its runtime process ID to a standard file that control scripts can use to control PostgreSQL.

A patched version of PostgreSQL will function as follows.

--- Configuration file ---

postmaster -C /etc/postgres/postgresql.conf

This will direct the postmaster program to use the configuration file "/etc/postgres/postgresql.conf".

--- Configuration Directory ---

postmaster -C /etc/postgres

This will direct the postmaster program to search the directory "/etc/postgres" for the standard configuration file names: postgresql.conf, pg_hba.conf, and pg_ident.conf.

--- Run-time process ID ---

postmaster -R /var/run/postmaster.pid

This will direct PostgreSQL to write its process ID number to the file /var/run/postmaster.pid.

--- postgresql.conf options ---

Within the configuration file there are five additional parameters: include, hba_conf, ident_conf, data_dir, and runtime_pidfile. They are used as:

include = '/etc/postgres/debug.conf'
data_dir = '/vol01/postgres'
hba_conf = '/etc/postgres/pg_hba.conf'
ident_conf = '/etc/postgres/pg_ident.conf'
runtime_pidfile = '/var/run/postmaster.pid'

The "-D" option on the command line overrides "data_dir" in the configuration file. The "-R" option on the command line overrides "runtime_pidfile" in the configuration file. If no hba_conf and/or ident_conf setting is specified, the default $PGDATA/pg_hba.conf and/or $PGDATA/pg_ident.conf will be used. If the "-C" option specifies a directory, the pg_hba.conf and pg_ident.conf files must be in the specified directory.

This patch is intended to move the PostgreSQL configuration out of the data directory so that it can be modified and backed up, and to answer some of the issues with deploying PostgreSQL in an FHS (Filesystem Hierarchy Standard) way.

This patch is also useful for running multiple servers with the same parameters:

postmaster -C /etc/postgres/postgresql.conf -D /VOL01/postgres -p 5432
postmaster -C /etc/postgres/postgresql.conf -D /VOL02/postgres -p 5433

To apply the patch, enter your PostgreSQL source directory and run:

cat pgec-PGVERSION.patch | patch -p 1
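Here is one reading of the README's resolution rules, written as a Python sketch. The function and type names are hypothetical, and the config file is assumed to be already parsed into a dictionary. The caller is assumed to decide (for example with os.path.isdir) whether -C named a file or a directory. In directory mode the sketch ignores any hba_conf/ident_conf settings, following the README's statement that both files must be in that directory.

```python
import posixpath
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ResolvedPaths:
    config_file: str
    data_dir: str
    hba_file: str
    ident_file: str
    pid_file: Optional[str]


def resolve_paths(c_option: str, c_is_dir: bool, settings: Dict[str, str],
                  d_option: Optional[str] = None,
                  r_option: Optional[str] = None) -> ResolvedPaths:
    """Apply the README's rules to the -C/-D/-R switches and the parsed config."""
    # -C may name the config file itself or a directory holding all three files.
    config_file = posixpath.join(c_option, "postgresql.conf") if c_is_dir else c_option

    # -D on the command line overrides data_dir from the config file.
    data_dir = d_option or settings.get("data_dir")
    if not data_dir:
        raise ValueError("no data directory: set data_dir or pass -D")

    if c_is_dir:
        # Directory mode: pg_hba.conf and pg_ident.conf sit next to postgresql.conf.
        hba_file = posixpath.join(c_option, "pg_hba.conf")
        ident_file = posixpath.join(c_option, "pg_ident.conf")
    else:
        # Unset hba_conf / ident_conf fall back to $PGDATA, as today.
        hba_file = settings.get("hba_conf", posixpath.join(data_dir, "pg_hba.conf"))
        ident_file = settings.get("ident_conf", posixpath.join(data_dir, "pg_ident.conf"))

    # -R on the command line overrides runtime_pidfile; None means no extra PID file.
    pid_file = r_option or settings.get("runtime_pidfile")
    return ResolvedPaths(config_file, data_dir, hba_file, ident_file, pid_file)


# The README's two-server example: one shared config, different -D values.
shared = {"data_dir": "/vol01/postgres"}
a = resolve_paths("/etc/postgres/postgresql.conf", False, shared)
b = resolve_paths("/etc/postgres/postgresql.conf", False, shared, d_option="/VOL02/postgres")
print(a.hba_file)  # /vol01/postgres/pg_hba.conf
print(b.hba_file)  # /VOL02/postgres/pg_hba.conf
```

When hba_conf and ident_conf are left unset, the shared-config setup still gives each postmaster its own pg_hba.conf and pg_ident.conf under its data directory.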
On Fri, 9 Apr 2004, Christopher Browne wrote: > > ...Tom ... commented that all of the core developers make extensive use > of the notion of having _many_ backends around, and therefore ... > > Core folk aren't likely to write up patches designed to shoot > themselves in the foot this way ... It's not just core developers who use this feature. For a program that's trying to be compatible with Oracle, MySQL, MSSQLServer and PostgreSQL for backends, it's nice to have 7.3.X, 7.4.X, heck, even 7.0-family PostgreSQL servers running. And indeed, all except SQLServer (another guy's doing this one) are running on my machine. I test frequently against whatever database(s) are running on my development machines. I test rarely against databases that aren't. Anything that makes that harder would be bad for developers using PostgreSQL as well as for the core team. Ron
> > On Fri, 9 Apr 2004, Christopher Browne wrote: >> >> ...Tom ... commented that all of the core developers make extensive use >> of the notion of having _many_ backends around, and therefore ... >> >> Core folk aren't likely to write up patches designed to shoot >> themselves in the foot this way ... > > It's not just core developers who use this feature. > > For a program that's trying to be compatible with Oracle, > MySQL, MSSQLServer and PostgreSQL for backends, it's nice > to have 7.3.X, 7.4.X, heck, even 7.0-family PostgreSQL servers > running. And indeed, all except SQLServer (another guy's > doing this one) are running on my machine. > > I test frequently against whatever database(s) are running on > my development machines. I test rarely against databases that > aren't. Anything that makes that harder would be bad for developers > using PostgreSQL as well as for the core team. > This is so frustrating! NO ONE IS TRYING TO MAKE IT HARDER! All the patch that I propose does is ADD functionality: two command line switches and five config file entries:
include = '/etc/postgres/debug.conf'
data_dir = '/vol01/postgres'
hba_conf = '/etc/postgres/pg_hba.conf'
ident_conf = '/etc/postgres/pg_ident.conf'
runtime_pidfile = '/var/run/postmaster.pid'
I am neither suggesting nor implementing any change in the current default behavior of PostgreSQL. I am merely adding features that would make it easier to do things like configure from a centralized directory which is different than the data directory, the ability to include "sub-configuration" like specific tuning or debug info, and to write a usable PID file for standard UNIX admin scripts.
pgsql@mohawksoft.com writes: > I am neither suggesting nor implementing any change in the current default > behavior of PostgreSQL. I am merely adding features that would make it > easier to do things like configure from a centralized directory which is > different than the data directory, the ability to include > "sub-configuration" like specific tuning or debug info, and to write a > usable PID file for standard UNIX admin scripts. Well, let's take it one piece at a time here. I can see some value in providing "#include" functionality in postgresql.conf (and the other config files too). I'm not convinced that it's a must-have, because the desired contents of the config files tend to change with each new PG version. But to the extent that you're administering multiple clusters of the same version, it would have some use. Moving the PID file out of the data directory is actively dangerous, because we use that file as part of the safety interlock against starting multiple postmasters in the same data directory. I suppose we could offer an option to write a second copy of the PID file at a different place, but I'm not seeing what that buys except confusion (especially if two postmasters are mistakenly instructed to put their copied PID files at the same place). The whole idea of having multiple command-line switches to pick config and data separately bothers me. ISTM this would mostly create great new opportunities to shoot yourself in the foot (by accidentally picking the wrong combination), without nearly enough benefit to outweigh the risk. Possibly this perspective is somewhat developer-centric --- I'm sure I manually start postmasters far more often than the average person. But then this whole discussion seems of interest only to people with outlier requirements; the existing setup works fine for the average user with only one Postgres installation. Could we compromise on just adding #include functionality?
ISTM that would cover the desire for separate config and data directories. You could keep a postgresql.conf file in each data directory that simply says

#include /etc/postgres/debug.conf

and likewise for other config files. Doesn't that accomplish what you want? regards, tom lane
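Tom's `#include` suggestion and the patch's `include = '...'` entry both amount to textual inclusion. A minimal sketch of that, assuming the patch's `include = 'path'` syntax with no trailing comment on include lines, relative paths resolved against the including file's directory, and a nesting limit (10, an arbitrary choice) so that a file which includes itself fails instead of looping forever. `flatten_conf` is a hypothetical name; this is not PostgreSQL's configuration parser.

```sh
#!/bin/sh
# flatten_conf FILE [DEPTH]
# Hypothetical helper: print FILE with every  include = 'path'  line
# replaced by the (recursively flattened) contents of the included file.
# The function body runs in a subshell, so each recursion level keeps
# its own variables.
flatten_conf() (
    file=$1
    depth=${2:-0}
    if [ "$depth" -gt 10 ]; then
        echo "flatten_conf: include nesting too deep at $file" >&2
        exit 1
    fi
    if [ ! -r "$file" ]; then
        echo "flatten_conf: cannot read $file" >&2
        exit 1
    fi
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            include=*|include[[:space:]]*=*)
                # Drop blanks and single quotes around the path.
                target=$(printf '%s' "${line#*=}" | tr -d " '")
                case $target in
                    /*) ;;                                   # absolute path
                    *) target=$(dirname "$file")/$target ;;  # relative path
                esac
                flatten_conf "$target" $((depth + 1)) || exit 1
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done < "$file"
)
```

Flattening is also a cheap way to see exactly which settings a given cluster ends up with, which speaks to Tom's concern that config contents drift between PG versions.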
> pgsql@mohawksoft.com writes: >> I am neither suggesting nor implementing any change in the current >> default >> behavior of PostgreSQL. I am merely adding features that would make it >> easier to do things like configure from a centralized directory which is >> different than the data directory, the ability to include >> "sub-configuration" like specific tuning or debug info, and to write a >> usable PID file for standard UNIX admin scripts. > > Well, let's take it one piece at a time here. Cool. > > I can see some value in providing "#include" functionality in > postgresql.conf (and the other config files too). I'm not convinced > that it's a must-have, because the desired contents of the config files > tend to change with each new PG version. But to the extent that you're > administering multiple clusters of the same version, it would have some use. Speaking for myself, I like it because I can keep a set of debugging parameters in a separate file, and only comment out the "include ..." for production. (Not to mention that multiple databases can use it.) > > Moving the PID file out of the data directory is actively dangerous, > because we use that file as part of the safety interlock against > starting multiple postmasters in the same data directory. I suppose > we could offer an option to write a second copy of the PID file at > a different place, but I'm not seeing what that buys except confusion > (especially if two postmasters are mistakenly instructed to put their > copied PID files at the same place). The patch that I have creates a completely separate PID file from the one in the PGDATA directory. It is used for more compatible UNIX init scripting. (Obviously on machines with only one database system.) > > The whole idea of having multiple command-line switches to pick config > and data separately bothers me.
ISTM this would mostly create great new > opportunities to shoot yourself in the foot (by accidentally picking the > wrong combination), without nearly enough benefit to outweigh the risk. This is where I think we disagree. Very much so, in fact. I think having something like:

/etc/postgres/webdb.conf, in which there is a line:
datadir=/RAID0/postgres

and

/etc/postgres/testdb.conf, in which there is this line:
datadir=/RAID1/postgres

allows for a very standardized and, IMHO, very self-documenting installation. > Possibly this perspective is somewhat developer-centric --- I'm sure > I manually start postmasters far more often than the average person. > But then this whole discussion seems of interest only to people with > outlier requirements; the existing setup works fine for the average user > with only one Postgres installation. Tom, I really disagree here. I really don't know how to convey my feelings about this, other than banging my head against the wall. I set up, develop on, and manage a lot of different systems, and PostgreSQL is frustrating for me because I do not always have control over what is what. I have to deploy systems on machines whose layout I do not get to specify. I do not know where the various volumes will be. A year later, I will have completely forgotten, and of course my notes are nowhere to be found. One of the reasons I wrote these mods was so I could create a "standard." All my PostgreSQL systems have an /etc/postgres/postgresql.conf file. I sit down and know immediately where to look. I can *always* tell a user, over the phone, to run: "/usr/local/pgsql/bin/postmaster -C /etc/postgres/postgresql.conf" It *always* works, and when it doesn't, it is because something has changed. It may make it easier for an expert to shoot themselves in the foot, but it also makes it easier for an expert to make it bulletproof. > > Could we compromise on just adding #include functionality?
ISTM that > would cover the desire for separate config and data directories. You > could keep a postgresql.conf file in each data directory that simply > says > #include /etc/postgres/debug.conf > and likewise for other config files. Doesn't that accomplish what you > want? > The include functionality was added as a result of a debate about this patch a couple of years ago. The primary purpose of my patch was to have the configuration in a standard location.
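The patch's separate `runtime_pidfile` is aimed at init scripts, which typically read a PID from a file and probe that process. A minimal sketch of such a status check, assuming the PID sits on the first line of the file and using the LSB init-script exit codes for `status` (0 running, 1 dead but PID file exists, 3 not running, 4 unknown). `pidfile_status` is hypothetical, and it only reports: it plays no part in the postmaster's own interlock on the PID file inside the data directory that Tom describes.

```sh
#!/bin/sh
# pidfile_status PIDFILE
# Hypothetical init-script helper. Exit codes follow the LSB convention:
# 0 = running, 1 = dead but PID file exists, 3 = not running, 4 = unknown.
pidfile_status() (
    pidfile=$1
    if [ ! -e "$pidfile" ]; then
        echo "not running (no PID file)"
        exit 3
    fi
    pid=$(head -n 1 "$pidfile" 2>/dev/null)
    case $pid in
        ''|*[!0-9]*)
            echo "status unknown (unreadable or malformed PID file)" >&2
            exit 4
            ;;
    esac
    # kill -0 sends no signal; it only checks that the process exists and
    # that we may signal it. Run as root or as the postgres user: for another
    # user's process it fails and would be misreported as stale.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running (pid $pid)"
        exit 0
    fi
    echo "not running (stale PID file)"
    exit 1
)
```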
Tom Lane wrote: > pgsql@mohawksoft.com writes: > > I am neither suggesting nor implementing any change in the current default > > behavior of PostgreSQL. I am merely adding features that would make it > > easier to do things like configure from a centralized directory which is > > different than the data directory, the ability to include > > "sub-configuration" like specific tuning or debug info, and to write a > > usable PID file for standard UNIX admin scripts. > > Well, let's take it one piece at a time here. > The whole idea of having multiple command-line switches to pick config > and data separately bothers me. ISTM this would mostly create great new > opportunities to shoot yourself in the foot (by accidentally picking the > wrong combination), without nearly enough benefit to outweigh the risk. > Possibly this perspective is somewhat developer-centric --- I'm sure > I manually start postmasters far more often than the average person. > But then this whole discussion seems of interest only to people with > outlier requirements; the existing setup works fine for the average user > with only one Postgres installation. > > Could we compromise on just adding #include functionality? ISTM that > would cover the desire for separate config and data directories. You > could keep a postgresql.conf file in each data directory that simply > says > #include /etc/postgres/debug.conf > and likewise for other config files. Doesn't that accomplish what you > want? As I remember, there were two threads in the 7.4 discussion: http://momjian.postgresql.org/cgi-bin/pgpatches2 The discussions are the top-most threads. One issue was having the config file, postgresql.conf, drive the PGDATA location. The second issue was putting all the config files, postgresql.conf, pg_hba.conf, and pg_ident.conf, in a separate directory, so it was easier to back up, easier to know which files to edit, and easier to symlink it to some other location.
On the issue of having postgresql.conf point to the data directory, that basically adds a level of indirection between the config file and the data file, and I know some are concerned that there could be a configuration error that could corrupt the database. It is basically putting the config file first, and letting the data directory derive from that, rather than pointing to the data directory and finding the config file in there. A third option just mentioned is adding an #include capability to the config file. That gives per-line control over the file contents. We already have the ability to include a list of database/user/group names in pg_hba.conf. A fourth idea, where someone just posted a patch, was to have the config directory and data directory independent and add flags to point to each separately. I think lots of folks didn't like that because forgetting to specify the config directory would give you a running postmaster with different config values from previous times you did specify the config directory. That just seems too error-prone. Obviously, we need to do something. There are just too many people who want improvement in this area. The question is what changes to make. My personal opinion is that we move the config files into /data/etc, and allow admins to move that directory somewhere else with symlinks. If we want to add #include capability too, that would help things. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
> Tom Lane wrote: >> pgsql@mohawksoft.com writes: >> > I am neither suggesting nor implementing any change in the current >> default >> > behavior of PostgreSQL. I am merely adding features that would make it >> > easier to do things like configure from a centralized directory which >> is >> > different than the data directory, the ability to include >> > "sub-configuration" like specific tuning or debug info, and to write a >> > usable PID file for standard UNIX admin scripts. >> >> Well, let's take it one piece at a time here. >> The whole idea of having multiple command-line switches to pick config >> and data separately bothers me. ISTM this would mostly create great new >> opportunities to shoot yourself in the foot (by accidentally picking the >> wrong combination), without nearly enough benefit to outweigh the risk. >> Possibly this perspective is somewhat developer-centric --- I'm sure >> I manually start postmasters far more often than the average person. >> But then this whole discussion seems of interest only to people with >> outlier requirements; the existing setup works fine for the average user >> with only one Postgres installation. >> >> Could we compromise on just adding #include functionality? ISTM that >> would cover the desire for separate config and data directories. You >> could keep a postgresql.conf file in each data directory that simply >> says >> #include /etc/postgres/debug.conf >> and likewise for other config files. Doesn't that accomplish what you >> want? > > As I remember, there were two threads in the 7.4 discussion: > > http://momjian.postgresql.org/cgi-bin/pgpatches2 > > The discussions are the top-most threads. The threads I am talking about took place about a year or two ago. February 2003 sounds about right. > > One issue was having the config file, postgresql.conf, drive the PGDATA > location.
The second issue was putting all the config files, > postgresql.conf, pg_hba.conf, and pg_ident.conf, in a separate directory, > so it was easier to back up, easier to know which files to edit, and > easier to symlink it to some other location. Most DBAs and admins, myself included, don't like symlinks. > > On the issue of having postgresql.conf point to the data directory, that > basically adds a level of indirection between the config file and the > data file, and I know some are concerned that there could be a > configuration error that could corrupt the database. It is basically > putting the config file first, and letting the data directory derive > from that, rather than pointing to the data directory and finding the > config file in there. This is a philosophical argument about software configuration: do you configure software through configuration files, or through known files within a directory? I prefer everything to be relative to a configuration file. > > A third option just mentioned is adding an #include capability to the > config file. That gives per-line control over the file contents. We > already have the ability to include a list of database/user/group names > in pg_hba.conf. That is easy enough. > > A fourth idea, where someone just posted a patch, was to have the config > directory and data directory independent and add flags to point to each > separately. I think lots of folks didn't like that because forgetting > to specify the config directory would give you a running postmaster with > different config values from previous times you did specify the config > directory. That just seems too error-prone. I have two huge problems with using the data directory as the location of the configuration: (1) Backup and sharing of configuration state is not obvious. (2) There is no self-documenting equivalent using the data directory. This directory can be *anywhere* on the system. With a standardized configuration location, the install becomes obvious.
> > Obviously, we need to do something. There are just too many people who > want improvement in this area. The question is what changes to make. > > My personal opinion is that we move the config files into /data/etc, and > allow admins to move that directory somewhere else with symlinks. If we > want to add #include capability too, that would help things. > I wish I could impress on you the distaste the average admin has for symlinks. If you knew how much DBAs and sys-admins hated symlinks, you wouldn't think of them as a solution. To most, a symlink is used when the software has no other viable option. When an admin needs to use a symlink to configure software, they view this as a cop-out.
It seems to me that the existing situation is actually correct: The configuration is a property of the initialized database cluster, so a logical place for it is in the root of said cluster. It is *not* a property of the installed binary distribution (e.g. /usr/local/pgsql/etc), as you may have *several* database clusters created using *this* binary distribution, each of which requires a different configuration. Having said that, I am OK with the 'include' idea. regards Mark
> It seems to me that the existing situation is actually correct: > > The configuration is a property of the initialized database cluster, so > a logical place for it is in the root of said cluster. > > It is *not* a property of the installed binary distribution (e.g. > /usr/local/pgsql/etc), as you may have *several* database clusters > created using *this* binary distribution, each of which requires a > different configuration. > > Having said that, I am OK with the 'include' idea. What I am finding difficult in this debate is that people are so resistant, not to change, but to the idea that someone would want to manage the system in a different way than they would. Yes, there are probably many people who have multiple PostgreSQL database clusters installed and operating simultaneously on their systems. No one is saying that this needs to change in any way. IMHO my patch can do this in a self-documenting way, thus making it easier to do, i.e.

postmaster -C /etc/postgres/fundb.conf
postmaster -C /etc/postgres/testdb.conf

I think that is far more intuitive than:

postmaster -D /some/path/who/knows/where/fundb
postmaster -D /another/path/i/don/t/know/testdb

(Sorry for the sarcasm :-) The point is that configuration, including data cluster location, through the configuration file is where a lot of PostgreSQL admins would like to be. I understand the ease and historical necessity of having everything in the PGDATA directory, and as I've said many, many times, I'm not suggesting changing this default behavior; I simply want to add the features that would allow PostgreSQL to be managed similarly to more mainstream UNIX daemons like named, dhcpd, and so on. I have been using this patch for a while and it makes administration easier for me. What is difficult in this patch is that it is not technically a "SQL feature" which can be debated on functionality; it is more of a usability feature which, by nature, is quite subjective.
After a certain point, people get polarized, debate sort of stops, and discussion becomes stating and restating the same contrary opinions. It is frustrating. I think this is important, as I would not have written and maintained it otherwise, but by being a somewhat subjective feature I can't make any ironclad arguments for it. I can only say it makes administration easier for those who would like PostgreSQL administered this way. If the prevailing view is "we don't think so," then it doesn't get put in, but it doesn't make my arguments any less valid.
On Sat, Apr 10, 2004 at 03:53:49PM -0400, pgsql@mohawksoft.com wrote: > > The whole idea of having multiple command-line switches to pick config > > and data separately bothers me. ISTM this would mostly create great new > > opportunities to shoot yourself in the foot (by accidentally picking the > > wrong combination), without nearly enough benefit to outweigh the risk. > > This is where I think we disagree. Very much so, in fact. I think having > something like: > > /etc/postgres/webdb.conf > In which there is a line: > datadir=/RAID0/postgres > > and > > /etc/postgres/testdb.conf > In which there is this line > datadir=/RAID1/postgres > > Allows for a very standardized, and IMHO, very self documenting installation. But not as flexible as the existing alternative. For instance, what if webdb is PostgreSQL 7.3 and testdb is PostgreSQL 7.4? There is no way you can put that difference in a configuration file, so the user will still need to know which binary of postgresql to fire up. So, yes, let's have a standard directory for storing the configuration for all the PostgreSQL installations on the machine. /etc/postgres sounds fine. In /etc/postgres/webdb:

#!/bin/sh
datadir=/RAID0/postgres
/usr/local/pgsql73/bin/postmaster -D "$datadir"

and in /etc/postgres/testdb:

#!/bin/sh
datadir=/RAID1/postgres
/usr/local/pgsql742/bin/postmaster -D "$datadir"

Much more flexible and explicitly self-documenting. For more flexibility still, do what I do and make the scripts standard rc.d-style startup scripts. To walk a user through listing the supported installations is easy: 'ls /etc/postgres'. Starting and stopping one: '/etc/postgres/webdb start' or '/etc/postgres/webdb stop'. Checking system status and displaying the data directory: '/etc/postgres/webdb status'. It seems to me to be far more intuitive to the end user, and to the typical admin, than your -C suggestion, it's certainly safer, and it works fine now. Cheers, Steve
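Steve's rc.d-style variant might look roughly like this: one script per cluster, named after it, that pins both the binary directory and the data directory and hands `start`, `stop` and `status` to `pg_ctl`. A sketch only: the paths are the ones from the webdb example above, `pg_service` is a hypothetical name, and using `-m fast` for stop and a `serverlog` file in the data directory are arbitrary choices.

```sh
#!/bin/sh
# /etc/postgres/webdb: rc.d-style wrapper for one cluster (sketch).
# Each cluster gets its own copy with its own binary and data paths.
PGBIN=${PGBIN:-/usr/local/pgsql73/bin}
PGDATA_DIR=${PGDATA_DIR:-/RAID0/postgres}

pg_service() {
    case $1 in
        start)
            "$PGBIN/pg_ctl" -D "$PGDATA_DIR" -l "$PGDATA_DIR/serverlog" start
            ;;
        stop)
            "$PGBIN/pg_ctl" -D "$PGDATA_DIR" -m fast stop
            ;;
        status)
            echo "data directory: $PGDATA_DIR"
            "$PGBIN/pg_ctl" -D "$PGDATA_DIR" status
            ;;
        *)
            echo "usage: $0 {start|stop|status}" >&2
            return 2
            ;;
    esac
}

# Dispatch only when an action was given on the command line.
[ $# -eq 0 ] || pg_service "$@"
```

Because the binary path lives in the same file as the data path, the 7.3-versus-7.4 question Steve raises is answered by the script itself.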
> On Sat, Apr 10, 2004 at 03:53:49PM -0400, pgsql@mohawksoft.com wrote: >> > The whole idea of having multiple command-line switches to pick config >> > and data separately bothers me. ISTM this would mostly create great >> new >> > opportunities to shoot yourself in the foot (by accidentally picking >> the >> > wrong combination), without nearly enough benefit to outweigh the >> risk. >> >> This is where I think we disagree. Very much so, in fact. I think having >> something like: >> >> /etc/postgres/webdb.conf >> In which there is a line: >> datadir=/RAID0/postgres >> >> and >> >> /etc/postgres/testdb.conf >> In which there is this line >> datadir=/RAID1/postgres >> >> Allows for a very standardized, and IMHO, very self documenting >> installation. > > But not as flexible as the existing alternative. But your existing alternative is *NOT* going away. > > For instance, what if webdb is PostgreSQL 7.3 and testdb is PostgreSQL > 7.4? There is no way you can put that difference in a configuration > file, so the user will still need to know which binary of postgresql > to fire up. > > So, yes, let's have a standard directory for storing the configuration > for all the PostgreSQL installations on the machine. > > /etc/postgres sounds fine. > > In /etc/postgres/webdb: > > #!/bin/sh > datadir=/RAID0/postgres > /usr/local/pgsql73/bin/postmaster -D $datadir > > and in /etc/postgres/testdb > > #!/bin/sh > datadir=/RAID1/postgres > /usr/local/pgsql742/bin/postmaster -D $datadir > > Much more flexible and explicitly self-documenting. But that also means multiple shell scripts, and you can't share or have standard configuration files like pg_hba or pg_ident. > > For more flexibility still, do what I do and make the scripts standard > rc.d style startup scripts. > > To walk a user through listing the supported installations is easy - > 'ls /etc/postgres'. Starting and stopping one - '/etc/postgres/webdb > start' > or '/etc/postgres/webdb stop'.
Checking system status and displaying the > data directory '/etc/postgres/webdb status'. > > It seems to me to be far more intuitive to the end user, and to the > typical admin than your -C suggestion, it's certainly safer, and it > works fine now. I don't see the "safer" argument. If we wanted "safer" we would code PostgreSQL in Java or BASIC. What we want is efficiency. Admittedly, my patch is not intended to make things any easier, or for that matter any different, for users of multiple installations/versions of PostgreSQL. No one is suggesting changing the default behavior of PostgreSQL. All the people arguing against this patch will never even notice that it is there. People who would like PostgreSQL to fit easily into an FHS-compliant system will probably use it. In fact, I would bet real money that if this functionality is incorporated into PostgreSQL, it will become the de facto methodology for the various distributions.
pgsql@mohawksoft.com wrote: > >IMHO my patch can do this in a self >documenting way, thus making it easier to do, i.e. > >postmaster -C /etc/postgres/fundb.conf >postmaster -C /etc/postgres/testdb.conf > >I think that is far more intuitive than: > >postmaster -D /some/path/who/knows/where/fundb >postmaster -D /another/path/i/don/t/know/testdb > > > To be honest, to me both of these look about the same on the intuitiveness front :-) I do not like lots of command-line arguments, so I usually use:

export PGDATA=/var/pgdata/<version>
pg_ctl start

I realize that I cannot objectively argue that this is intuitively better... it is just what I prefer. >It is frustrating. I think this is important, as I would not have written >and maintained it otherwise, but by being a somewhat subjective feature I >can't make any ironclad arguments for it. I can only say it makes >administration easier for those who would like PostgreSQL administered >this way. If the prevailing view is "we don't think so," then it doesn't >get put in, but it doesn't make my arguments any less valid. > > > I completely agree. We are discussing what we would prefer, which is a valid thing to do. Clearly if most people prefer most of what is in your patch, then it would be silly to ignore it! So anyway, here is my vote on it: i) the include - I like it ii) the -C switch - could be persuaded (provided some safety is there, like being mutually exclusive with -D or PGDATA) iii) the pid file - don't like it regards Mark
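Mark's condition for accepting `-C` (that it be mutually exclusive with `-D` and `PGDATA`) is easy to state as a startup check. A minimal sketch written as a shell wrapper, assuming `-C` is the switch from the patch under discussion; `validate_startup_opts` is hypothetical, and a real implementation would presumably live in the postmaster's own option parsing.

```sh
#!/bin/sh
# validate_startup_opts [ARGS...]
# Hypothetical check: reject -C (central config file) when a data
# directory is also given via -D or the PGDATA environment variable.
validate_startup_opts() (
    conf=
    data=
    while [ $# -gt 0 ]; do
        case $1 in
            -C) [ $# -ge 2 ] || exit 2; conf=$2; shift 2 ;;
            -D) [ $# -ge 2 ] || exit 2; data=$2; shift 2 ;;
            *)  shift ;;
        esac
    done
    if [ -n "$conf" ] && { [ -n "$data" ] || [ -n "${PGDATA:-}" ]; }; then
        echo "error: -C cannot be combined with -D or PGDATA" >&2
        exit 1
    fi
    exit 0
)
```

Refusing the combination outright is the conservative reading of Mark's vote; it trades away the `-C` plus `-D` usage for the guarantee that only one source names the data directory.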
Mark Kirkwood wrote: > It seems to me that the existing situation is actually correct: > > The configuration is a property of the initialized database cluster, so > a logical place for it is in the root of said cluster. > > It is *not* a property of the installed binary distribution (e.g. > /usr/local/pgsql/etc), as you may have *several* database clusters > created using *this* binary distribution, each of which requires a > different configuration. > > Having said that, I am OK with the 'include' idea. My idea was to put config files in /usr/local/pgsql/data/etc, not pgsql/etc. We don't put Unix configuration files in /; we put them in /etc. -- Bruce Momjian
pgsql@mohawksoft.com wrote: > > Obviously, we need to do something. There are just too many people who > > want improvement in this area. The question is what changes to make. > > > > My personal opinion is that we move the config files into /data/etc, and > > allow admins to move that directory somewhere else with symlinks. If we > > want to add #include capability too, that would help things. > > > > I wish I could impress on you the distaste the average admin has for > symlinks. If you knew how much DBAs and sys-admins hated symlinks, you > wouldn't think of them as a solution. To most, a symlink is used when the > software has no other viable option. When an admin needs to use a symlink > to configure software, they view this as a cop-out. Let me tell you the compromise I thought of. First, we put the config files (postgresql.conf, pg_hba.conf, pg_ident.conf) in data/etc by default. Then, we could add an initdb option to put the config files in another location. If you choose that, the config files are put into that new directory, and a symlink is created in /data/etc to point to that new location. That way, you can centralize all your config files under one central directory, you can find and back them up easily, and the /data directory contains a symlink pointing to the config directory so you don't need to specify a separate config directory on the command line. -- Bruce Momjian
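Bruce's proposed initdb option boils down to three file moves and one symlink. A minimal sketch of those steps, assuming the config files already sit in `DATADIR/etc` as he proposes (today they live directly in the data directory) and that the target directory is given as an absolute path; `relocate_config` is a hypothetical name, not an initdb flag.

```sh
#!/bin/sh
# relocate_config DATADIR CONFDIR
# Hypothetical sketch: move the three config files from DATADIR/etc to
# CONFDIR, then leave DATADIR/etc as a symlink to CONFDIR so a postmaster
# started with only -D DATADIR still finds them. CONFDIR should be
# absolute; a relative symlink target would be resolved from DATADIR.
relocate_config() (
    datadir=$1
    confdir=$2
    mkdir -p "$confdir" || exit 1
    for f in postgresql.conf pg_hba.conf pg_ident.conf; do
        if [ -f "$datadir/etc/$f" ]; then
            mv "$datadir/etc/$f" "$confdir/" || exit 1
        fi
    done
    rmdir "$datadir/etc" || exit 1   # fails if anything else was left there
    ln -s "$confdir" "$datadir/etc"
)
```

Whether backup and copy tools follow or preserve that link varies from tool to tool, which is the crux of the objection to symlinks raised in this thread.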
Bruce Momjian wrote: > My idea was to put config files in /usr/local/pgsql/data/etc, not > >pgsql/etc. > >We don't put Unix configuration files in /; we put them in /etc. > > > Sorry, I missed the 'data' pathname. However, I may be a bit slow, but I do not see how this will handle the situation where you have one installation of pgsql running several clusters. (I am not sure how common this situation is, mind you.) regards Mark
Mark Kirkwood wrote: > Bruce Momjian wrote: > > > My idea was to put config files in /usr/local/pgsql/data/etc, not > > > >pgsql/etc. > > > >We don't put Unix configuration files in /; we put them in /etc. > > > > > > > Sorry, I missed the 'data' pathname. However, I may be a bit slow, but > I do not see how this will handle the situation where you have one > installation of pgsql running several clusters. (I am not sure how > common this situation is, mind you.) It is common. Moving things to data/etc will make things clearer, and see my later email on an initdb option to put /data/etc/ somewhere else and put a symlink at /data/etc. -- Bruce Momjian
> pgsql@mohawksoft.com wrote: > >> >>IMHO my patch can do this in a self >>documenting way, thus making it easier to do, i.e. >> >>postmaster -C /etc/postgres/fundb.conf >>postmaster -C /etc/postgres/testdb.conf >> >>I think that is far more intuitive than: >> >>postmaster -D /some/path/who/knows/where/fundb >>postmaster -D /another/path/i/don/t/know/testdb >> >> >> > > To be honest, to me both of these look about the same on the > intuitiveness front :-) OK, I am yelling in a soundproof room. :-) > > I do not like lots of command-line arguments, so I usually use: > > export PGDATA=/var/pgdata/<version> > pg_ctl start > > I realize that I cannot objectively argue that this is intuitively > better... it is just what I prefer. > >>It is frustrating. I think this is important, as I would not have written >>and maintained it otherwise, but by being a somewhat subjective feature I >>can't make any ironclad arguments for it. I can only say it makes >>administration easier for those who would like PostgreSQL administered >>this way. If the prevailing view is "we don't think so," then it doesn't >>get put in, but it doesn't make my arguments any less valid. >> >> >> > I completely agree. We are discussing what we would prefer, which is a > valid thing to do. Clearly if most people prefer most of what is in your > patch, then it would be silly to ignore it! > > So anyway, here is my vote on it: > > i) the include - I like it > ii) the -C switch - could be persuaded (provided some safety is there, > like being mutually exclusive with -D or PGDATA) > iii) the pid file - don't like it i) include: I don't care too much. I like it, but it isn't important to me. (Ironic, yes?) ii) I think the -C switch *WITH* the -D switch has viable usability. Consider this: you are testing two different database layouts and/or RAID controllers.
You could easily bounce back and forth from *identical* configurations like this:

postmaster -C /etc/postgres/postgresql.conf -D /OLDRAID
(test performance on various clients)
postmaster -C /etc/postgres/postgresql.conf -D /NEWRAID
(test performance again with the same clients)

In the above example, you don't need to configure the two systems separately. iii) I don't like the PID file at all. Not one bit, but I had a few people ask for it in the patch, and it works as advertised and expected. It isn't my place to say how someone should use something. One of my customers wanted it, so I provided them with it. That is the beauty of open source.
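The `-C` plus `-D` usage above implies a precedence order. A minimal sketch, assuming (as the example suggests) that an explicit `-D` overrides the `data_dir` entry in the `-C` file, with `PGDATA` as a last fallback; that fallback and the `resolve_datadir` name are assumptions for illustration, not the patch's documented behavior.

```sh
#!/bin/sh
# resolve_datadir [-C CONF] [-D DIR]
# Hypothetical precedence: explicit -D, then data_dir from the -C file,
# then $PGDATA. Prints the chosen directory or fails if none is known.
resolve_datadir() (
    conf=
    data=
    while [ $# -gt 0 ]; do
        case $1 in
            -C) [ $# -ge 2 ] || exit 2; conf=$2; shift 2 ;;
            -D) [ $# -ge 2 ] || exit 2; data=$2; shift 2 ;;
            *)  shift ;;
        esac
    done
    if [ -z "$data" ] && [ -n "$conf" ]; then
        # Last "data_dir = 'path'" line wins; the quotes are optional.
        data=$(sed -n "s/^[[:space:]]*data_dir[[:space:]]*=[[:space:]]*'\{0,1\}\([^' ]*\).*/\1/p" "$conf" | tail -n 1)
    fi
    data=${data:-${PGDATA:-}}
    if [ -z "$data" ]; then
        echo "resolve_datadir: no data directory given" >&2
        exit 1
    fi
    printf '%s\n' "$data"
)
```

Under this ordering the OLDRAID/NEWRAID comparison works with one shared config file, which is exactly the combination Mark's stricter rule would forbid.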
> pgsql@mohawksoft.com wrote: >> > Obviously, we need to do something. There are just too many people >> who >> > want improvement in this area. The question is what changes to make. >> > >> > My personal opinion is that we move the config files into /data/etc, and >> > allow admins to move that directory somewhere else with symlinks. If >> we >> > want to add #include capability too, that would help things. >> > >> >> I wish I could impress on you the distaste the average admin has for >> symlinks. If you knew how much DBAs and sys-admins hated symlinks, you >> wouldn't think of them as a solution. To most, a symlink is used when >> the >> software has no other viable option. When an admin needs to use a >> symlink >> to configure software, they view this as a cop-out. > > Let me tell you the compromise I thought of. > > First, we put the config files (postgresql.conf, pg_hba.conf, > pg_ident.conf) in data/etc by default. What does that really give you? > > Then, we could add an initdb option to put the config files in another > location. If you choose that, the config files are put into that new > directory, and a symlink is created in /data/etc to point to that new > location. Symlinks don't survive an scp. This is frustrating. Please take no offense at this, but symlinks do not always act exactly like files. Most of the time they do, but every now and then, depending on the application or utilities used, symlinks get copied with invalid links or ignored altogether. IMHO, the PostgreSQL team depends too much on symlinks as a band-aid for real defects and issues. I would prefer to be able to configure a system without symlinks. Sysadmins and DBAs do not like symlinks. Any "solution" based on symlinks will be used grudgingly.
> > That way, you can centralize all your config files under one central > directory, you can find and back them up easily, and the /data directory > contains a symlink pointing to the config directory so you don't need to > specify a separate config directory on the command line. I would like to ask you, why does there need to be a compromise? (I am not opposed to compromise, but you are relying on symlinks again, and this is a problem.) (1) The code is written. (2) The code is working. (3) The code does not affect current default behavior. (4) I am willing to change it to fit any coding standards which may be an issue. It makes no sense to me to write something new as a compromise, when I already have something that works, is (obviously) already what I want, and does not, in fact, change any default PostgreSQL behavior. Take a look at the patch. I submitted it about a year or so ago, and it was rejected in favor of a redesign Peter was going to do. Needless to say, that was not done. This is such a *little* thing (the patch is only 760 lines); I can't believe it is so difficult. I simply do not understand the opposition to it, not then and not now. Could someone please tell me why this is bad? I "get it" that people on this group don't want to do it this way, but what is *wrong*, and by wrong, I mean harmful to PostgreSQL, about it? No one who does not like this functionality would ever even be inconvenienced by it. Those of us who want it would find it more convenient. Could someone please tell me why this is such a fight? I've been maintaining this patch for well over a year now, it spans two major versions, and I have people downloading it from my site every month.
Bruce Momjian wrote: >Mark Kirkwood wrote: > >>Bruce Momjian wrote: >> >>>My idea was to put config files in /usr/local/pgsql/data/etc, not >>>pgsql/etc. >>> >>>We don't put Unix configuration files in /, we put them in /etc. >> >>Sorry, I missed the 'data' pathname. However - I may be a bit slow - but >>I do not see how this will handle the situation where you have one >>installation of pgsql running several clusters. (I am not sure how >>common this situation is, mind you.) > >It is common. Moving things to data/etc will make things clearer, and >see my later email on an initdb option to put /data/etc/ somewhere else >and put a symlink for /data/etc. Hmmm, the current setup handles this situation sensibly and without the need for symlinks. So this does not look like an improvement to me... This *could* work without symlinks if you introduce a "name" for each initialized cluster, and make this part of the config file name. This would mean that you could use 'data/etc' and have many config files therein, each of which would *unambiguously* point to a given cluster. As a general point I share Tom's concern about breaking the association between the initialized cluster and its configuration file - e.g: I start "prod" with the configuration for "test" by mistake, and "test" has fsync=false... and something pulls the power... regards Mark
The only other idea I can think of is to create a new pg_path.conf file. It would have the same format as postgresql.conf, but contain information about /data location, config file location, and perhaps pg_xlog location. The file would be created by special flags to initdb, and once created, would have to be used instead of pgdata for postmaster startup. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
> The only other idea I can think of is to create a new pg_path.conf file. > It would have the same format as postgresql.conf, but contain > information about /data location, config file location, and perhaps > pg_xlog location. > > The file would be created by special flags to initdb, and once created, > would have to be used instead of pgdata for postmaster startup. That seems a lot more risky, doesn't it? What is technically bad about my patch? Why is it "bad"? Everyone is offering something different from what I suggest. What is technically wrong with the patch? What can I alter to correct any concerns? I'm not very good at politics, and I sometimes tend to alienate people in discussions, but I am simply unable to understand why the features I suggest are not being considered "as is." I have been using them for a while now, I find them very useful, and I have people downloading the patch from my site on a regular basis. Yet I am unable to say "Here, can we add this?" The response is "We don't like this for x, y, and z," but reasons x, y, and z already exist in one form or another in the current implementation. (1) What tangible harm comes to postgresql.conf from these features? (2) What problem (security, stability, safety, etc.) is created by these features that doesn't already exist in some form? (3) Isn't having this as an option "better" than making it normal for people to mess around in the PGDATA directory? (4) Isn't open source and UNIX philosophy about providing capability, not enforcing policy?
On Sunday 11 April 2004 11:56, pgsql@mohawksoft.com wrote: > > On Sat, Apr 10, 2004 at 03:53:49PM -0400, pgsql@mohawksoft.com wrote: > For all the people who would like PostgreSQL to fit in a FHS system, > easily, they will probably use it. In fact, I would bet real money that > if this functionality is incorporated into PostgreSQL, it will become the > de facto methodology for the various distributions. > IIRC (and admittedly I am being too lazy to look it up here), doesn't the FHS require the pid file to be in a specific location (/tmp?) ISTR that this became an issue last time around, since your patch didn't actually allow full FHS compliance (while admittedly allowing more compliance, but that's like being a little pregnant). So this is the one thing I think is a potential sticking point... how do we prevent users from blowing up their databases by specifying multiple PID locations for the same DATA dir? Anything that makes this easier to do is A Bad Thing (tm) because it can certainly lead to irrecoverable data corruption. One other thought relevant to this topic... one thing I have always wished for was some type of GUC (for lack of a better mechanism) that would tell me from inside the database what PGDATA path is currently being used to power the database. I've certainly seen enough cases of people modifying the /wrong/ postgresql.conf on their systems to think that the ability to figure out which configuration files you are using inside the db would certainly be a bonus... and this would have also solved the original complaint of not knowing where the $PGDATA path was... connect to the database and query for it... Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
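Robert's worry is two servers started against the same DATA dir. A minimal Python sketch of why an interlock kept *inside* the data directory guards against that regardless of where any external pidfile is written. It assumes a lock file named postmaster.pid in the data directory (PostgreSQL does keep one there), but the function start_server and its parameters are hypothetical, and real servers also detect and clear stale locks left behind by a crash, which this sketch does not attempt.

```python
import os


def start_server(data_dir, external_pidfile=None, pid=None):
    """Hypothetical, simplified startup interlock for one data directory.

    The authoritative lock is a file created atomically inside data_dir, so a
    second server aimed at the same data_dir fails here no matter where its
    external (advisory) pidfile is configured to go.
    """
    pid = os.getpid() if pid is None else pid
    lock_path = os.path.join(data_dir, "postmaster.pid")
    try:
        # O_CREAT | O_EXCL: creation fails if the lock file already exists
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        raise RuntimeError("data directory %s is already in use" % data_dir)
    with os.fdopen(fd, "w") as f:
        f.write("%d\n" % pid)
    # Advisory copy for init scripts and monitoring tools; not used for locking
    if external_pidfile is not None:
        with open(external_pidfile, "w") as f:
            f.write("%d\n" % pid)
    return lock_path
```

A second call with the same data directory raises RuntimeError even when it names a different external pidfile, which is the property that matters; where the advisory pidfile lives then becomes a packaging choice rather than a safety one.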
pgsql@mohawksoft.com wrote: > > The only other idea I can think of is to create a new pg_path.conf file. > > It would have the same format as postgresql.conf, but contain > > information about /data location, config file location, and perhaps > > pg_xlog location. > > > > The file would be created by special flags to initdb, and once created, > > would have to be used instead of pgdata for postmaster startup. > > That seems a lot more risky, doesn't it? What is technically bad > about my patch? Why is it "bad"? Everyone is offering something different > from what I suggest. What is technically wrong with the patch? What can I > alter to correct any concerns? > > I'm not very good at politics, and I sometimes tend to alienate people in > discussions, but I am simply unable to understand why the features I > suggest are not being considered "as is." I have been using them for a > while now, I find them very useful, and I have people downloading the > patch from my site on a regular basis. Yet I am unable to say "Here, can we > add this?" The response is "We don't like this for x, y, and z," but > reasons x, y, and z already exist in one form or another in the current > implementation. > > (1) What tangible harm comes to postgresql.conf from these features? > (2) What problem (security, stability, safety, etc.) is created by these > features that doesn't already exist in some form? > (3) Isn't having this as an option "better" than making it normal for > people to mess around in the PGDATA directory? > (4) Isn't open source and UNIX philosophy about providing capability, not > enforcing policy? I think the major problem with your -C & -D idea is that you require the administrator to link the config file and data directory every time you start the db, and that might be error-prone.
On Mon, 12 Apr 2004, Bruce Momjian wrote: > pgsql@mohawksoft.com wrote: > > > The only other idea I can think of is to create a new pg_path.conf file. [snip] > > I think the major problem with your -C & -D idea is that you require the > administrator to link the config file and data directory every time you > start the db, and that might be error-prone.
Well, AFAICS the patch doesn't require that actually; it merely allows the separation. You can place the data directory in the configuration file and only use -C, you can place the configuration in the standard place under data and only use -D, or you can specify both on the command line. I think the real potential harm would be from any current or future options where it'd be possible to have the system behave improperly when started up with the wrong value relative to a particular data directory. This would be especially bad if it was difficult or impossible to realize that it had happened and might then actually destroy data. I'm reasonably sure, though, that such an option shouldn't be in a configuration file that the admin is expected to edit.
<quote who="Bruce Momjian"> > The only other idea I can think of is to create a new pg_path.conf file. > It would have the same format as postgresql.conf, but contain > information about /data location, config file location, and perhaps > pg_xlog location. > > The file would be created by special flags to initdb, and once created, > would have to be used instead of pgdata for postmaster startup. > Bruce, I thought the idea was to *reduce* the number of config files and provide a unified configuration file. Ideally, the unified configuration file could eliminate the need for environment variables altogether. If I understand this correctly, the author was adding the ability to do this, not removing the default behavior. A single configuration point (which can be changed with a command-line switch) with the ability to include would be an exceptionally versatile asset for PostgreSQL. Maybe relocating the PID file would be a bad idea and someone could clobber their database, but that could be addressed with a LARGE WARNING in that config file where the option is available. Outside of the unified config file argument, "configuration includes" give PostgreSQL the ability to have shared settings. You could have a shared pg_hba.conf and test all other manner of settings (sort_mem, shared_buffers, etc.) with a set of config files that each include a standard_pg_hba.conf to control access. The single config file argument has the capacity to emulate the existing default behavior:
# SINGLE DEFAULT CONFIG FILE
Include /var/lib/data/postgresql/postgresql.conf
Include /var/lib/data/postgresql/pg_hba.conf
Include /var/lib/data/postgresql/pg_ident.conf
or
#SINGLE DEFAULT CONFIG FILE
include options /var/lib/postgresql/data/postgresql.conf
include access /var/lib/postgresql/data/pg_hba.conf
include identity_map /var/lib/postgresql/data/pg_ident.conf
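A rough Python sketch of how include directives like the ones above could be expanded, assuming simple name = value lines, # comments, last-assignment-wins semantics, and relative include paths resolved against the including file's directory. The function load_config and these rules are illustrative assumptions, not PostgreSQL's actual parser; it accepts the `include = 'file'` and bare `Include file` spellings but not the named `include options file` variant.

```python
import os


def load_config(path, settings=None, _stack=None):
    """Parse a simple 'name = value' config file, expanding include lines in place.

    Later assignments override earlier ones, so a setting that appears after an
    include wins over the included value, and vice versa.
    """
    settings = {} if settings is None else settings
    _stack = [] if _stack is None else _stack
    real = os.path.realpath(path)
    if real in _stack:  # refuse include cycles such as a -> b -> a
        raise ValueError("include cycle detected at " + path)
    _stack.append(real)
    with open(path) as f:
        for raw in f:
            line = raw.split("#", 1)[0].strip()  # naive: '#' always starts a comment
            if not line:
                continue
            name, sep, value = line.partition("=")
            if not sep:  # bare form: "Include /path/to/file"
                name, _, value = line.partition(" ")
            name = name.strip().lower()  # treat names case-insensitively
            value = value.strip().strip("'\"")
            if name == "include":
                if not os.path.isabs(value):
                    value = os.path.join(os.path.dirname(path), value)
                load_config(value, settings, _stack)
            else:
                settings[name] = value
    _stack.pop()
    return settings
```

Because assignments are applied in file order, putting the include first and local overrides after it gives the "shared base, local tweaks" pattern Thomas describes for a common pg_hba.conf.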
Thomas Swan wrote: > I thought the idea was to *reduce* the number of config files and provide > a unified configuration file. Ideally, the unified configuration file > could eliminate the need for environment variables altogether. > > If I understand this correctly, the author was adding the ability to do > this, not removing the default behavior. > > A single configuration point (which can be changed with a command-line > switch) with the ability to include would be an exceptionally versatile > asset for PostgreSQL. Maybe relocating the PID file would be a bad idea and > someone could clobber their database, but that could be addressed with > a LARGE WARNING in that config file where the option is available. > > Outside of the unified config file argument, "configuration includes" > give PostgreSQL the ability to have shared settings. You could have a > shared pg_hba.conf and test all other manner of settings (sort_mem, > shared_buffers, etc.) with a set of config files that each include a > standard_pg_hba.conf to control access. I suggested a new pg_path configuration file because it would enable centralized config only if it was used. By adding the /data location to postgresql.conf, you have the postgresql.conf file sometimes being found via PGDATA and sometimes acting as a central config file, and I thought that was confusing.
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Obviously, we need to do something. There are just too many people who > want improvement in this area. The question is what changes to make. As far as I've seen in this thread, there's only *one* person arguing for change, and even he isn't advocating changing the default behavior. So why are you of the opinion that we need to make radical changes in the default behavior? Which is what the proposals you've suggested are. I haven't seen anything that involves changing the default behavior that would not make it materially harder to run multiple copies of PG, and especially would make it materially harder to create a test installation without needing root privileges to do the install. Anything that pushes config files into fixed places means you need root. The whole discussion reminds me quite a bit of Tom Lockhart's patch to specify WAL file location on the postmaster command line. That one unfortunately degenerated into a religious war :-( which it seems we are coming perilously close to here as well. But I think the issues are very similar --- convenience of setup versus probability of accidentally setting up the wrong thing. The potential downside to the WAL location business was a lot worse than what we face for config, but it's still a real risk. Mark Kirkwood pointed out the risk of starting a production server with the fsync=off setting you use for a test database. Another example is starting server A with the pg_hba.conf settings you mean to use with server B, and thereby allowing the wrong set of people access to server A; in the worst case scenario that'd be a major security breach. My general feeling about it is that adding additional postmaster command line switches is not the way to go, especially not when those switches can specify things that might be subtly incompatible with other switch-selected things. That's why I don't like the -C versus -D business.
It's too easy to make a mistake if you are starting the postmaster manually, and it's too hard to handle if you are starting the postmaster from an init script (since generally users aren't supposed to edit init scripts directly, no?). There should be just *one* switch. From a pure functionality point of view it wouldn't matter much whether it was -C or -D, as the system could be designed to find either from the other. But we have a longstanding precedent that it is -D and you find the config from that. I don't think we should lightly cast aside backwards compatibility just to reverse the convention. I have not heard any argument so far that explains to me why it wouldn't work fine to leave the postmaster switch set as-is (-D only), and expect people who want centralized config to set up the config files inside that data directory to be dummies that point to master config files elsewhere. You can do that today with symlinks, and for those who dislike symlinks I'm willing to adopt the part of the patch that allows "#include"-type functionality. This approach keeps the config-to-data association stored in the filesystem where it should be, rather than relying on the DBA to remember to specify the correct -C and -D pair every time he starts the postmaster. It also allows many-to-one relationships to work properly. You can easily make multiple data directories point to the same config files, if that is indeed what you mean to do. You can't make one config file point to multiple data directories, so the other way requires both -C and -D on the command line which is error-prone. I find no merit in the argument about "I can't remember where the data directory is". If you can't remember that then how are you going to remember where the config file is either? The only way is to establish a personal standard. 
If you want to have a personal standard about where the centralized config files are, fine --- you can even add comments to them to remind you of which data directory(s) each one is used with. But I don't see that that's fundamentally superior to doing things in the reverse way. regards, tom lane
> pgsql@mohawksoft.com wrote: >> > The only other idea I can think of is to create a new pg_path.conf file. [snip] > > I think the major problem with your -C & -D idea is that you require the > administrator to link the config file and data directory every time you > start the db, and that might be error-prone. > The patch does no such thing. This is a misunderstanding of the description. (I don't even know where it is in this chain of emails.) The -C parameter names a configuration file whose settings act as defaults that can be overridden by the command line, which seems "logical," correct?
postmaster -C /etc/db/postgresql.conf
can be sufficient to start PostgreSQL. However, since command-line arguments take precedence (as one would expect),
postmaster -C /etc/db/postgresql.conf -D /RAID1/test_cluster
also works. PostgreSQL continues to use the defaults it currently does, but the patch adds five extra configuration entries:
include = '/etc/postgres/debug.conf'
data_dir = '/vol01/postgres'
hba_conf = '/etc/postgres/pg_hba.conf'
ident_conf = '/etc/postgres/pg_ident.conf'
runtime_pidfile = '/var/run/postgresql.conf'
The order of precedence is this: PostgreSQL built-in default, then the configuration file, and finally the command line, with each later source overriding the earlier ones. Lastly, do not confuse "runtime_pidfile" with the PID stored in $PGDATA. It is separate; it is used ONLY for external administration utilities that assume something like /var/run/foobar.pid.
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Obviously, we need to do something. There are just too many people who > > want improvement in this area. The question is what changes to make. > > As far as I've seen in this thread, there's only *one* person arguing > for change, and even he isn't advocating changing the default behavior. > So why are you of the opinion that we need to make radical changes in > the default behavior? Which is what the proposals you've suggested are. > > I haven't seen anything that involves changing the default behavior that > would not make it materially harder to run multiple copies of PG, and > especially would make it materially harder to create a test installation > without needing root privileges to do the install. Anything that pushes > config files into fixed places means you need root. I don't see any big reason to change our existing default, but we have had a lot of requests/discussion on this in the past, so though there is only one person proposing a patch now, we do have folks who want improvement in this area. My personal opinion is that we should move the config files from pgsql/data to pgsql/data/etc. Unix config files aren't put in /, they are in /etc, so this seems logical. I was never comfortable with having editable files right next to files that shouldn't be touched. This makes backup of the config files easier, and allows for use of a symlink for the directory for those who want them. I assume some will argue that the change isn't worth it. Secondly, everyone seems to like the 'include' idea, and it gives per-line control over file sharing.
Bruce Momjian <pgman@candle.pha.pa.us> writes: > My personal opinion is that we should move the config files from > pgsql/data to pgsql/data/etc. Unix config files aren't put in /, they > are in /etc, so this seems logical. I was never comfortable with having > editable files right next to files that shouldn't be touched. Perhaps we are arguing at cross-purposes. Are you saying that the postmaster should seek config files as, eg, $PGDATA/etc/postgresql.conf instead of $PGDATA/postgresql.conf? That would be all right with me. I thought you were proposing to move them to /etc (absolute path), which isn't all right ... > Secondly, everyone seems to like the 'include' idea, and it gives > per-line control over file sharing. Yeah, I think include is non-controversial, the argument is about what else (if anything) to change. regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: >> Obviously, we need to do something. There are just too many people who >> want improvement in this area. The question is what changes to make. [snip] > > The whole discussion reminds me quite a bit of Tom Lockhart's patch to > specify WAL file location on the postmaster command line. That one > unfortunately degenerated into a religious war :-( which it seems we are > coming perilously close to here as well. But I think the issues are > very similar --- convenience of setup versus probability of accidentally > setting up the wrong thing. The potential downside to the WAL location > business was a lot worse than what we face for config, but it's still a > real risk. Mark Kirkwood pointed out the risk of starting a production > server with the fsync=off setting you use for a test database. Another > example is starting server A with the pg_hba.conf settings you mean to > use with server B, and thereby allowing the wrong set of people access > to server A; in the worst case scenario that'd be a major security > breach. I am concerned about trying to "protect" users from themselves too aggressively. A chainsaw that won't cut off a person's arm is probably not a useful chainsaw. "Dangerous" tools often need to be dangerous to do their job. > > My general feeling about it is that adding additional postmaster command > line switches is not the way to go, especially not when those switches > can specify things that might be subtly incompatible with other switch-selected > things. That's why I don't like the -C versus -D business. I don't understand this position. There are settings in the configuration file which can be overridden by the command line already. The problem already exists. > It's too easy to make a mistake if you are starting the postmaster > manually, and it's too hard to handle if you are starting the postmaster > from an init script (since generally users aren't supposed to edit init > scripts directly, no?).
> There should be just *one* switch. From a > pure functionality point of view it wouldn't matter much whether it was > -C or -D, as the system could be designed to find either from the other. > But we have a longstanding precedent that it is -D and you find the > config from that. I don't think we should lightly cast aside backwards > compatibility just to reverse the convention. I don't understand why you say there needs to be "one" switch. The command line already overrides config settings. All I am arguing for is adding one more command-line switch and four or five GUC settings. > > I have not heard any argument so far that explains to me why it wouldn't > work fine to leave the postmaster switch set as-is (-D only), and expect > people who want centralized config to set up the config files inside > that data directory to be dummies that point to master config files > elsewhere. You can do that today with symlinks, and for those who > dislike symlinks I'm willing to adopt the part of the patch that allows > "#include"-type functionality. This approach keeps the config-to-data > association stored in the filesystem where it should be, rather than > relying on the DBA to remember to specify the correct -C and -D pair > every time he starts the postmaster. Ahh, I see the problem: -D is not required if you specify the data directory in the config file.
postmaster -C /etc/db/postgresql.conf
is sufficient; however, if "-D" is specified, it overrides the config file, just like other parameters. Here are the GUC parameters I want to add:
include = '/etc/postgres/debug.conf'
data_dir = '/vol01/postgres'
hba_conf = '/etc/postgres/pg_hba.conf'
ident_conf = '/etc/postgres/pg_ident.conf'
> It also allows many-to-one > relationships to work properly. You can easily make multiple data > directories point to the same config files, if that is indeed what you > mean to do.
> You can't make one config file point to multiple data > directories, so the other way requires both -C and -D on the command > line which is error-prone. I'm not sure how the misconception became part of the debate; I did use one example where you could have multiple databases with the same configuration, but it was in no way the motivation for the patch. > > I find no merit in the argument about "I can't remember where the data > directory is". If you can't remember that then how are you going to > remember where the config file is either? This I don't agree with. I have been using this for a while, and I wrote it so I can set a standard. "/etc/postgres/postgresql.conf" is a nice standard. Yes, during development and testing, multiple databases are key, but for most enterprise deployments, it is a boot-time initialization script running one database. The difficulty is that all systems are different: where the storage is mounted, rights given, etc. The "-C" switch allows me, and people like me, to define a standard that does not use symlinks, is independent of the storage layout of the system, and is fairly self-documenting. I said before, if this functionality gets put into PostgreSQL, I bet that most VARs and Linux distributions will adopt this as the de facto standard. It makes configuration much more flexible.
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > My personal opinion is that we should move the config files from > > pgsql/data to pgsql/data/etc. Unix config files aren't put in /, they > > are in /etc, so this seems logical. I was never comfortable with having > > editable files right next to files that shouldn't be touched. > > Perhaps we are arguing at cross-purposes. Are you saying that the > postmaster should seek config files as, eg, $PGDATA/etc/postgresql.conf > instead of $PGDATA/postgresql.conf? That would be all right with me. > I thought you were proposing to move them to /etc (absolute path), > which isn't all right ... I was always proposing $PGDATA/etc/postgresql.conf. /etc would be terrible, as you say. One of my other ideas was to auto-create a symlink during initdb if someone wants the config directory (or pg_xlog directory) in a different location, but that is another issue. This is the Lockhart issue that I think we actually agreed to, but Thomas didn't want us to use symlinks, hence the propagation of flags to many programs that we didn't like. I eventually had to back out the patch, and no one continued the process. > > Secondly, everyone seems to like the 'include' idea, and it gives > > per-line control over file sharing. > > Yeah, I think include is non-controversial, the argument is about what > else (if anything) to change. Yeah.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > On Mon, 12 Apr 2004, Bruce Momjian wrote: >> I think the major problem with your -C & -D idea is that you require the >> administrator to link the config file and data directory every time you >> start the db, and that might be error-prone. > Well, AFAICS the patch doesn't require that actually; it merely allows the > separation. Well, it doesn't *require* it, but if you actually *use* the patch in the proposed way then you end up with the error-prone need to specify the correct combination of -C and -D on the command line. I think what people are questioning is whether we can't find a variant solution that avoids that risk. The bottom line to me is that config versus data ought to be a one-to-many relationship, at least if you accept the premise that shared config is reasonable at all. Putting a datadir spec inside the config file makes it impossible to share config files across datadirs, and so that seems to conflict with the argument (which is being made in support of this very same patch) that sharable config is good. On the other hand, if you make data point to config then you have a very natural way to manage the one-to-many relationship. Separate -C and -D would make sense if it were a many-to-many relationship (ie, you could sensibly use many different configs with the same data dir), but the case for multiple configs with one data dir seems pretty weak to me, and outweighed by the risk factors. regards, tom lane
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes: >> On Mon, 12 Apr 2004, Bruce Momjian wrote: >>> I think the major problem with your -C & -D idea is that you require the >>> administrator to link the config file and data directory every time you >>> start the db, and that might be error-prone. > >> Well, AFAICS the patch doesn't require that actually; it merely allows >> the separation. > > Well, it doesn't *require* it, but if you actually *use* the patch in > the proposed way then you end up with the error-prone need to specify > the correct combination of -C and -D on the command line. I think what > people are questioning is whether we can't find a variant solution that > avoids that risk. This is completely wrong with regard to the patch. The patch "allows" "-D" on the command line, just like you can override the socket port, number of buffers, and other options, but the intention is that you do NOT use the "-D" option. > > The bottom line to me is that config versus data ought to be a one-to-many > relationship, at least if you accept the premise that shared config > is reasonable at all. Putting a datadir spec inside the config file > makes it impossible to share config files across datadirs, and so that > seems to conflict with the argument (which is being made in support of > this very same patch) that sharable config is good. On the other hand, > if you make data point to config then you have a very natural way to > manage the one-to-many relationship. > > Separate -C and -D would make sense if it were a many-to-many > relationship (ie, you could sensibly use many different configs with the > same data dir), but the case for multiple configs with one data dir > seems pretty weak to me, and outweighed by the risk factors. I hear "risk," but what risk?
pgsql@mohawksoft.com wrote: > > The bottom line to me is that config versus data ought to be a one-to-many > > relationship, at least if you accept the premise that shared config > > is reasonable at all. Putting a datadir spec inside the config file > > makes it impossible to share config files across datadirs, and so that > > seems to conflict with the argument (which is being made in support of > > this very same patch) that sharable config is good. On the other hand, > > if you make data point to config then you have a very natural way to > > manage the one-to-many relationship. > > > > Separate -C and -D would make sense if it were a many-to-many > > relationship (ie, you could sensibly use many different configs with the > > same data dir), but the case for multiple configs with one data dir > > seems pretty weak to me, and outweighed by the risk factors. > > I hear "risk" but what risk? OK, you look at your postgresql.conf file, and it says the data is in /var/data, but the postgresql.conf file was found via PGDATA, so it is ignored, and the directory is /var/local/pgsql. That seems confusing because someone looking at the file sees the wrong information. For me, having a config file that both "is found" with ignored values, and another mode where the config file points to everything, seems strange. Does any other OS project do this? What if someone does -C /var/data/postgresql.conf, and postgresql.conf says to use /usr/local/data for data, what do we do? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > pgsql@mohawksoft.com wrote: > What if someone does -C /var/data/postgresql.conf, and postgresql.conf > says to use /usr/local/data for data, what do we do? Well, the patch says that the command line switch wins, which is consistent with what we do for other command line switches (they all override the equivalent postgresql.conf entries). This does seem a bit at variance with the stated goal of making the configuration more clearly documented, though :-(. If you actually use the capability then your config file will be lying to you about where things are. It's worth pointing out in this connection that for the most part I think people are moving *away* from using command line switches; it's better to set the value in postgresql.conf, both for documentation reasons and because that way you have some chance of changing the value via config file update and SIGHUP. The only way to change a value on the command line is to restart the postmaster. Plus, if you're using a distribution-supplied init script to start the postmaster, it's hard to get any switches in without hacking the script anyway. Most of these objections also apply to values obtained from environment variables (the exception is that postgresql.conf can override environment variables). So all in all I feel that we don't want to encourage more use of command line switches or environment variables to configure the postmaster. regards, tom lane
Bruce Momjian wrote: > My personal opinion is that we should move the config files from > pgsql/data to pgsql/data/etc. Unix config files aren't put in /, > they are in /etc, so this seems logical. I was never comfortable > with having editable files right next to files that shouldn't be > touched. This makes backup of the config files easier, and allows > for use of a symlink for the directory for those who want them. I > assume some will argue that the change isn't worth it. I would say that moving the configuration files even deeper into the data directory makes it all the more likely for people to not find them or be inclined to edit or delete other files nearby ("which of these log files can I delete?"). As much as I would like to see a solution that allows us to move the configuration files out of the data directory, I find some of the tendency in this thread to be rather ludicrous: trying to improve the administration facility of the system by adding half a dozen options to move things all over the place and half a dozen additional rules about how these options interact when conflicting values are given. I don't see how that would help the end goal.
Peter Eisentraut wrote: > Bruce Momjian wrote: > > My personal opinion is that we should move the config files from > > pgsql/data to pgsql/data/etc. Unix config files aren't put in /, > > they are in /etc, so this seems logical. I was never comfortable > > with having editable files right next to files that shouldn't be > > touched. This makes backup of the config files easier, and allows > > for use of a symlink for the directory for those who want them. I > > assume some will argue that the change isn't worth it. > > I would say that moving the configuration files even deeper into the > data directory makes it all the more likely for people to not find them > or be inclined to edit or delete other files nearby ("which of these > log files can I delete"?). My idea was that we put the config files in /data/etc, and folks are less likely to look at the top-level directory for things to muck with. They can look in data/etc and know exactly which files they should be touching. Right now they see: PG_VERSION pg_hba.conf postmaster.opts base/ pg_ident.conf postmaster.pid global/ pg_xlog/ pg_clog/ postgresql.conf and it isn't clear which files to touch. After the reorganization it would be: PG_VERSION global/ postmaster.opts base/ pg_clog/ postmaster.pid etc/ pg_xlog/ and /etc would be: pg_hba.conf pg_ident.conf postgresql.conf which is much cleaner, I think, no? It also makes backup of the config files easier, and you can symlink the directory somewhere else if you want. -- Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes: > ... it isn't clear which files to touch. After the reorganization it > would be: > PG_VERSION global/ postmaster.opts > base/ pg_clog/ postmaster.pid > etc/ pg_xlog/ > and /etc would be: > pg_hba.conf pg_ident.conf postgresql.conf > which is much cleaner, I think, no? I think if you spelled the subdir name "config" rather than "etc", it would be more obvious what's what. A further possibility is to move the runtime-changeable files (postmaster.pid and postmaster.opts) into still another subdirectory, but I'm not really in favor of that. I think there might be some possibilities for cross-version confusion if we move the .pid interlock file. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > ... it isn't clear which files to touch. After the reorganization it > > would be: > > > PG_VERSION global/ postmaster.opts > > base/ pg_clog/ postmaster.pid > > etc/ pg_xlog/ > > > and /etc would be: > > > pg_hba.conf pg_ident.conf postgresql.conf > > > which is much cleaner, I think, no? > > I think if you spelled the subdir name "config" rather than "etc", > it would be more obvious what's what. OK. > A further possibility is to move the runtime-changeable files > (postmaster.pid and postmaster.opts) into still another subdirectory, > but I'm not really in favor of that. I think there might be some > possibilities for cross-version confusion if we move the .pid interlock > file. Agreed. That is too fancy. -- Bruce Momjian
pgsql@mohawksoft.com writes: >> Well, it doesn't *require* it, but if you actually *use* the patch in >> the proposed way then you end up with the error-prone need to specify >> the correct combination of -C and -D on the command line. I think what >> people are questioning is whether we can't find a variant solution that >> avoids that risk. > This is completely wrong with regards to the patch. The patch "allows" > "-D" on the command line, just like you can override the socket port, > number of buffers, and other options, but the intention is that you do NOT > use the "-D" option. Well, yeah, if you are considering only the single-database case (or even the separate-config-for-every-database case) then you could put "datadir = foo" in the config file and not use -D. The complaints are basically coming from the fact that this doesn't seem to scale up to more complex cases. To make use of shared config files you'd have to start the postmaster with both -C and -D, and I for one think that's risky. Plus it negates the claimed documentation benefit, since the filesystem contains no indication (or a wrong one) of which data dirs use the config file. If we're going to tackle this problem then I'd like to see a solution that works conveniently in the general case of N config files each being used by multiple databases. If we don't solve the general case then we'll just have to revisit the problem again sometime soon ... and one of the things we avoid when possible is API thrashing. If we have to break DBAs' established habits to improve things, then so be it, but let's not do so only to do it over again in the next release. >> Separate -C and -D would make sense if it were a many-to-many >> relationship (ie, you could sensibly use many different configs with the >> same data dir), but the case for multiple configs with one data dir >> seems pretty weak to me, and outweighed by the risk factors. > I hear "risk" but what risk? 
Two specific risks were pointed out already: starting a production server with fsync=off risks data loss, and starting it with the wrong pg_hba.conf risks security breaches (eg, letting the developer weenies into the payroll database ;-)). But those same settings would very likely be in use "next door" for a development database. With separate config and data it's real easy to foresee a DBA making the wrong association, if there's nothing in the filesystem to strongly tie a data directory to the config it should be used with. I think the feature needs to be designed to minimize that risk. regards, tom lane
> pgsql@mohawksoft.com wrote: >> > The bottom line to me is that config versus data ought to be a one-to-many >> > relationship, at least if you accept the premise that shared config >> > is reasonable at all. Putting a datadir spec inside the config file >> > makes it impossible to share config files across datadirs, and so that >> > seems to conflict with the argument (which is being made in support of >> > this very same patch) that sharable config is good. On the other hand, >> > if you make data point to config then you have a very natural way to >> > manage the one-to-many relationship. >> > >> > Separate -C and -D would make sense if it were a many-to-many >> > relationship (ie, you could sensibly use many different configs with the >> > same data dir), but the case for multiple configs with one data dir >> > seems pretty weak to me, and outweighed by the risk factors. >> >> I hear "risk" but what risk? > > OK, you look at your postgresql.conf file, and it says the data is in > /var/data, but the postgresql.conf file was found via PGDATA, so it is > ignored, and the directory is /var/local/pgsql. That seems confusing > because someone looking at the file sees the wrong information. Given enough time and tinkering, anyone can screw up an installation of anything. > > For me, having a config file that both "is found" with ignored values, > and another mode where the config file points to everything seems > strange. Does any other OS project do this? Almost all of the open source services allow you to override the default settings in the configuration file with a command line option. Anyone can make a case for almost any system which seems confusing. I think we all agree that all configuration should be made in a configuration file; however, we all recognize that, sometimes, an easy-to-use command line option to override the configuration settings is helpful for many reasons.
> > What if someone does -C /var/data/postgresql.conf, and postgresql.conf > says to use /usr/local/data for data, what do we do? The command line is always the final authority, followed by the configuration file, followed by the environment, followed by any hard-coded defaults. Sure, you can come up with problems in every system, but that's easy. You guys are very fond of symlinks; do you know how problematic they are? Their behavior is very much dependent on the application and the options used. I can't tell you how often I've seen "unexpected" behavior with symlinks. Either the backup system just backs up the link when you think it should copy the data, or copies the data when it should copy just the link.
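The precedence order described above (command line, then configuration file, then environment, then built-in defaults) amounts to a first-match lookup across sources. The following is a minimal Python sketch of that rule only, not PostgreSQL's actual GUC code; the function name `resolve_setting` is hypothetical, and it assumes environment variables such as PGDATA have already been mapped onto the same setting names the config file uses.

```python
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    command_line: Mapping[str, str],
    config_file: Mapping[str, str],
    environment: Mapping[str, str],
    defaults: Mapping[str, str],
) -> Optional[str]:
    """Return the value from the highest-priority source that defines `name`.

    Sources are checked in the order given in the message: command line,
    configuration file, environment, then built-in defaults.
    """
    for source in (command_line, config_file, environment, defaults):
        if name in source:
            return source[name]
    return None  # not set anywhere
```

Under this rule a data directory given on the command line silently wins over a datadir entry in postgresql.conf, so the file can show a value the running server is not using, which is the "config file lying to you" effect Tom describes.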
I just had a thought about this: seems like a big part of the objection is the risk of specifying -C and -D that don't go together. Well, what if they were the same switch? Consider the following simplification of the proposed patch: 1. Postmaster has just one switch, '-D datadir' with fallback to environment variable PGDATA, same as it ever was. 2. The files that must be found in this directory are the configuration files, namely postgresql.conf, pg_hba.conf, pg_ident.conf. (And any files they include are taken as relative to this directory if not specified with absolute path. We'll still add the #include facility to postgresql.conf; the others have it already IIRC.) 3. postgresql.conf can contain a new changeable-only-at-startup configuration setting which we need to think of a good name for. ("datadir" seems confusing to me in this context, though maybe it would do; anyway I haven't got a better idea yet.) All the non-configuration files are located under that directory. Of course it defaults to being the -D directory if not specified in postgresql.conf. If we do things this way, we have the following properties: * Default behavior is same as it ever was, in particular there is no difficulty in making a test installation in a nontypical place. * Config files can easily be separated from data and can be backed up separately (no need for the etc/ or config/ subdirectory Bruce suggested). * It is not directly possible to use the same config with multiple databases. However one can easily imagine pointing the postmaster to a config file that contains only a "datadir = " spec and a #include of a sharable config file. (I have to confess not having thought about doing that in connection with the original patch proposal.) * If you want to think of this as config-centric, you can; if you want to think of it as data-centric, you can do that too. It's agnostic.
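Tom's three numbered rules can be read as a small path-resolution step. The following is a minimal Python sketch under stated assumptions, not code from the patch: `resolve_dirs` and `resolve_include` are hypothetical helper names, the settings mapping is assumed to have already been parsed from postgresql.conf in the -D directory, and POSIX-style paths are assumed.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_dirs(
    d_switch: Optional[str],
    environ: Mapping[str, str],
    settings: Mapping[str, str],
) -> Tuple[str, str]:
    """Return (config_dir, data_dir) under the single-switch proposal.

    `settings` is assumed to be already parsed from
    <config_dir>/postgresql.conf (including any includes).
    """
    # Rule 1: -D names the directory, falling back to PGDATA.
    config_dir = d_switch if d_switch is not None else environ.get("PGDATA")
    if config_dir is None:
        raise ValueError("no -D switch given and PGDATA is not set")
    # Rule 3: the startup-only "datadir" setting defaults to the -D directory.
    data_dir = settings.get("datadir", config_dir)
    return config_dir, data_dir


def resolve_include(config_dir: str, include_path: str) -> str:
    """Rule 2: relative include paths are taken relative to the -D directory."""
    if os.path.isabs(include_path):
        return include_path
    return os.path.normpath(os.path.join(config_dir, include_path))
```

With the shared-config layout Tom goes on to describe, a postmaster started with -D /etc/postgresql/postmaster1 would resolve "include ../sharedconfigfile" to /etc/postgresql/sharedconfigfile, while an installation with no datadir entry behaves exactly as before.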
A typical setup for sharable config files would look like this: you make directories named say "/etc/postgresql/postmasterN" which will be the -D targets for each of your postmasters. These contain postgresql.conf files that contain "datadir = someplace" and "include ../sharedconfigfile" and nothing else. Shared config files live in /etc/postgresql, per-database ones in its subdirectories. This notion is really almost the same as the patch-as-submitted, but there are a couple of key differences: * I did not like the patch's confusion over -C-specifies-config-directory versus -C-specifies-config-file. One big reason not to like it is that in the latter case it's not very clear what is the origin directory for #include references in the config files. I think we would do fine with less confusion if we adopt just the specify-a-config-directory behavior. I don't see a use-case that justifies the config-file option nor the separate postgresql.conf entries for pg_hba.conf and pg_ident.conf (which would have to be extended any time we add another config file). Surely requiring a separate config subdirectory for each postmaster isn't an objectionable amount of overhead. * There isn't a way to get things wrong on the command line. Well, actually there is: if the "datadir" parameter works the same as all other GUC parameters then one could override it on the command line with "-c datadir=whatever". Depending on how strongly you feel about that being a Bad Idea, we could imagine putting in a special prohibition against it. But at least it wouldn't be the designed-in way of working with shared config files. * Barring the "-c datadir" scenario, there is a strong link from a config subdirectory to its data area. A simple addition to the proposal would be to add a back-link: on first start, the postmaster would automatically make a file in the data directory that contains the absolute path of the config dir; on subsequent starts, check it still matches. 
This provides a simple interlock against accidentally starting a postmaster with the wrong config files for the data area. (You could break the interlock at need by deleting the back-link file.) In particular, if you'd not bothered to remove the config files placed in the data area by initdb, something like this is useful to ensure you don't accidentally start the postmaster with -D pointing straight at the data area where previously you'd pointed to a config directory. It also provides documentation in both places about where the other place is. Something that remains unclear to me is what to do with the proposed patch to support a secondary PID file. This strikes me as a solution in search of a problem --- it was claimed that this makes it easier to manipulate the postmaster with "standard Unix tools", but what tools are those and do we really want people frobbing the postmaster with them? Again I'm not sold on the use-case for the feature. regards, tom lane
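The back-link interlock Tom proposes can be sketched in a few lines. This is a hypothetical illustration in Python, not an existing PostgreSQL feature: the file name `config_dir.backlink` and the function `check_backlink` are invented here, and the check is assumed to run at postmaster startup before anything else in the data area is touched.

```python
import os

# Hypothetical name for the back-link file kept in the data directory.
BACKLINK_NAME = "config_dir.backlink"


def check_backlink(data_dir: str, config_dir: str) -> None:
    """Create the back-link on first start; refuse to start on a mismatch later."""
    link_path = os.path.join(data_dir, BACKLINK_NAME)
    expected = os.path.abspath(config_dir)
    if not os.path.exists(link_path):
        # First start: record which config directory this data area belongs to.
        with open(link_path, "w", encoding="utf-8") as f:
            f.write(expected + "\n")
        return
    with open(link_path, encoding="utf-8") as f:
        recorded = f.read().strip()
    if recorded != expected:
        # Wrong config for this data area: stop before doing any damage.
        raise RuntimeError(
            f"data directory {data_dir} belongs to config {recorded}, "
            f"not {expected}; delete {link_path} to override"
        )
```

Deleting the file is the "break the interlock at need" escape hatch; the next start simply records the new association.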
> I just had a thought about this: seems like a big part of the objection > is the risk of specifying -C and -D that don't go together. Well, what > if they were the same switch? Consider the following simplification of > the proposed patch: I was really excited about this idea, then I thought about it, and while it would answer some of the issues I mean to address, I find myself a little disappointed that some of the functionality I wanted, i.e. multiple databases with the same configuration, was not possible. However, compromise is good. > > 1. Postmaster has just one switch, '-D datadir' with fallback to > environment variable PGDATA, same as it ever was. I like this, I think, ... but it removes the possibility to run the same configuration with the same database. This scenario is one of my "best case" reasons why I think my patch is good, but, I think I can get 99% of what I'm looking for with my modification outlined at the bottom of this post. > > 2. The files that must be found in this directory are the configuration > files, namely postgresql.conf, pg_hba.conf, pg_ident.conf. (And any > files they include are taken as relative to this directory if not > specified with absolute path. We'll still add the #include facility > to postgresql.conf; the others have it already IIRC.) My patch *already* has this functionality if it is a directory. I agree with this; it was suggested (maybe even by you) over a year ago. [snip -- good stuff] Tom, this is great! I think we are almost there and I really appreciate your flexibility in view of my obstinance. :-) I like what you suggest. While I don't get the -D and -C functionality (which I don't use, but thought was cool), I think I would like to add one thing: postmaster -D /etc/postgres/postgresql.conf If the path specified is a config file, then "data_dir" MUST address a valid PostgreSQL data directory.
So, here is (how I see) the logical breakdown of the feature: "postmaster -D /somedir/data" works as it always has: it points to the data directory in which all the various config files live. If no "data_dir" is specified, then "/somedir/data" is assumed to be where base, pg_xlog, pg_clog, etc. reside. If, however, "data_dir" is specified, the data-oriented elements like "global," "base," "pg_clog," and "pg_xlog" are contained within that directory. (In the future, we may be able to specify these locations separately.) If "postmaster -D /etc/postgresql.conf" points to a file, then that file MUST specify the location of "data_dir," "hba_conf," and "ident_conf." Like I said, while I don't get the convenience of combining "-D ..." and "-C ..." I do get most of what I'm asking for. If this works for all you guys, I'll submit a patch Wednesday.
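The rule proposed here, that -D may name either a directory (the old behavior) or a config file (which must then set data_dir, hba_conf and ident_conf), could be checked at startup roughly as follows. This is a Python sketch only; `check_d_target` is a hypothetical name, and the setting names are taken from the message rather than from any released PostgreSQL.

```python
import os
from typing import Mapping

# Settings the message says a config file named by -D must provide.
REQUIRED_IN_FILE_MODE = ("data_dir", "hba_conf", "ident_conf")


def check_d_target(d_target: str, settings: Mapping[str, str]) -> str:
    """Classify the -D argument and enforce the file-mode requirements.

    `settings` is assumed to be parsed already from the file when
    d_target is a file; it is ignored in directory mode.
    """
    if not os.path.exists(d_target):
        raise FileNotFoundError(d_target)
    if os.path.isdir(d_target):
        # Directory mode: works as it always has; data_dir is optional.
        return "directory"
    missing = [name for name in REQUIRED_IN_FILE_MODE if name not in settings]
    if missing:
        raise ValueError(f"{d_target} must set: {', '.join(missing)}")
    return "file"
```

Failing fast on a missing setting keeps file mode from silently falling back to some default data directory.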
Bruce Momjian wrote: >Thomas Swan wrote: > > >>I thought the idea was to *reduce* the number of config files and provide >>a unified configuration file. Ideally, the unified configuration file >>could eliminate the need for environment variables altogether. >> >>If I understand this correctly, the author was adding the ability to do >>this, not remove the default behavior. >> >>A single configuration point (which can be changed with a commandline >>switch) with the ability to include would be an exceptionally versatile >>asset for postgresql. Maybe relocating PID would be a bad idea and >>someone could clobber their database, but that could be addressed with >>LARGE WARNING in that config file where the option is available. >> >>Outside of the unified config file argument. "Configuration includes" >>give postgresql the ability to have shared settings. You could have a >>shared pg_hba.conf and test all other manner of settings with a set of >>config files (sort_mem, shared_buffers, etc.) that say include a >>standard_pg_hba.conf to control access. >> >> > >I suggested a new pg_path configuration file because it would enable >centralized config only if it was used. By adding /data location to >postgresql.conf, you have the postgresql.conf file acting sometimes via >PGDATA and sometimes as a central config file, and I thought that was >confusing. > > > Understandably. I think that using a config file that can specify all of this would be a big win. Imagine a simple start of the postmaster with only a pointer to a config file, and not having to rely on special environment variables or other command line switches.
pgsql@mohawksoft.com wrote: >>I just had a thought about this: seems like a big part of the objection >>is the risk of specifying -C and -D that don't go together. Well, what >>if they were the same switch? Consider the following simplification of >>the proposed patch: >> >> > >I was really excited about this idea, then I thought about it, and while >it would answer some of the issues I mean to address, I find myself a >little disappointed that some of the functionality I wanted, i.e. multiple >databases with the same configuration, was not possible. However, >compromise is good. > > > >>1. Postmaster has just one switch, '-D datadir' with fallback to >>environment variable PGDATA, same as it ever was. >> >> > >I like this, I think, ... but it removes the possibility to run the same >configuration with the same database. This scenario is one of my "best >case" reasons why I think my patch is good, but, I think I can get 99% of >what I'm looking for with my modification outlined at the bottom of this >post. > > > > >>2. The files that must be found in this directory are the configuration >>files, namely postgresql.conf, pg_hba.conf, pg_ident.conf. (And any >>files they include are taken as relative to this directory if not >>specified with absolute path. We'll still add the #include facility >>to postgresql.conf; the others have it already IIRC.) >> >> > >My patch *already* has this functionality if it is a directory. I agree >with this; it was suggested (maybe even by you) over a year ago. > > >[snip -- good stuff] > >Tom, this is great! I think we are almost there and I really appreciate >your flexibility in view of my obstinance. :-) > >I like what you suggest. While I don't get the -D and -C functionality >(which I don't use, but thought was cool), I think I would like to add one >thing: > >postmaster -D /etc/postgres/postgresql.conf > >If the path specified is a config file, then "data_dir" MUST address a >valid PostgreSQL data directory.
> > This is exceptionally confusing. Why not do a test and say that you cannot specify a -C and a -D option at the same time? This would still assure backwards compatibility and safeguard future installations. If the -C option is specified, the datadir must be present in the config file. If someone wants to specify the config file from a startup option, then they must follow the new rules. And, as this is new functionality, the rules can be set now. Adding one command line switch with the future possibility of eliminating the others is a good tradeoff, IMHO. >So, here is (how I see) the logical breakdown of the feature: > >"postmaster -D /somedir/data" works as it always has, it points to the >data directory in which all the various config files live. If no >"data_dir" is specified, then "/somedir/data" is assumed to be where base, >pg_xlog, pg_clog, and etc. reside. > >If, however, "data_dir" is specified, the data oriented elements like >"global," "base," "pg_clog," and "pg_xlog" are contained within that >directory. (In the future, we may be able to specify these locations >separately) > >If "postmaster -D /etc/postgresql.conf" points to a file, then that file >MUST specify the location of "data_dir," "hba_conf," and "ident_conf." > >Like I said, while I don't get the convenience of combining "-D ..." and >"-C ..." I do get most of what I'm asking for. > >If this works for all you guys, I'll submit a patch Wednesday.
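Thomas Swan's alternative, making -C and -D mutually exclusive and requiring datadir in the config file whenever -C is used, reduces to a simple validation step at startup. This is a hypothetical Python sketch: `validate_switches` is an invented name, and the PGDATA fallback on the legacy path is an assumption based on the existing behavior described elsewhere in the thread.

```python
from typing import Mapping, Optional


def validate_switches(
    c_opt: Optional[str],
    d_opt: Optional[str],
    pgdata: Optional[str],
    settings: Mapping[str, str],
) -> str:
    """Return the data directory to use, or raise if the switches conflict.

    `settings` is assumed to come from the file named by -C, if any.
    """
    if c_opt is not None and d_opt is not None:
        raise ValueError("-C and -D cannot be specified together")
    if c_opt is not None:
        # New rule: a config file named with -C must say where the data is.
        if "datadir" not in settings:
            raise ValueError(f"{c_opt} does not set datadir")
        return settings["datadir"]
    # Legacy path, unchanged for backwards compatibility: -D, then PGDATA.
    data_dir = d_opt if d_opt is not None else pgdata
    if data_dir is None:
        raise ValueError("no -C, no -D, and PGDATA is not set")
    return data_dir
```

Because the two switches can never be combined, there is no way to pair a config file with the wrong data directory on the command line; the price is that sharing one config across several data directories needs the include-based approach instead.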
Tom Lane wrote: > Well, the patch says that the command line switch wins, which is > consistent with what we do for other command line switches (they all > override the equivalent postgresql.conf entries). This does seem a > bit at variance with the stated goal of making the configuration more > clearly documented, though :-(. Hmm...well, think of it as a tool. It makes it *possible* to make the configuration more clearly documented, and in fact makes it easy to do so, but doesn't guarantee safety in all cases. > If you actually use the capability then > your config file will be lying to you about where things are. Of course. Just like your config file is lying about any configuration option that is overridden on the command line. I don't see why this is a problem, unless we intend to change the way the entire GUC system works. > It's worth pointing out in this connection that for the most part > I think people are moving *away* from using command line switches; > it's better to set the value in postgresql.conf, both for documentation > reasons and because that way you have some chance of changing the value > via config file update and SIGHUP. The only way to change a value on > the command line is to restart the postmaster. Plus, if you're using a > distribution-supplied init script to start the postmaster, it's hard to > get any switches in without hacking the script anyway. Now this raises a very interesting problem. Namely, what happens if you use the -C option to the postmaster as is being advocated, then change the datadir entry in the config file, and send SIGHUP to the postmaster? Ooops. Score one for Tom. :-) > Most of these objections also apply to values obtained from environment > variables (the exception is that postgresql.conf can override > environment variables). To be honest, I think the use of the PG_DATA environment variable is the biggest impediment to "self documentation" - the postmaster should not use it. 
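One way to address the SIGHUP problem raised here is to treat datadir as a changeable-only-at-startup setting: a reload applies ordinary changes but only warns about startup-only ones. The following Python sketch works under that assumption; `apply_reload` and the `STARTUP_ONLY` set are hypothetical and do not describe how the patch under discussion behaves.

```python
from typing import Dict, List, Mapping

# Hypothetical: settings that may only change with a postmaster restart.
STARTUP_ONLY = frozenset({"datadir"})


def apply_reload(current: Dict[str, str], new_file: Mapping[str, str]) -> List[str]:
    """Apply changed entries from a re-read config file to `current` in place.

    Startup-only settings are left unchanged and reported as warnings.
    Entries removed from the file are not handled in this sketch.
    """
    warnings: List[str] = []
    for name, value in new_file.items():
        if current.get(name) == value:
            continue  # unchanged
        if name in STARTUP_ONLY:
            warnings.append(f"{name} changed in config file; ignored until restart")
            continue
        current[name] = value
    return warnings
```

The running postmaster keeps its original data directory, and the administrator gets an explicit message instead of a silent mismatch.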
The reason is that if PG_DATA is used to specify the location of the data directory, you won't be able to find out where a running postmaster's data directory is located without doing some heavy-duty investigation. Not all operating systems make it possible to determine the values of a particular process' environment variables. By requiring that the data directory be specified on the postmaster command line, it becomes possible to always determine where a postmaster's data directory resides just by looking at the ps output. Now, I know you guys who do heavy-duty development make use of PG_DATA. I see no problem with having the code in postmaster that looks at PG_DATA be surrounded by a #ifdef that is active whenever you're doing development work. But it should *not* be active on a production system. Oh, as to the safety issue of a config file not properly corresponding to a given data directory, that seems easy enough to solve: if a file (call it "magic" for the purposes of discussion, though perhaps a better name would be "do_not_remove" :-) ) exists in the data directory, then the value of a configuration variable (call it "magic", too) must match the contents of that file. If the values don't match then the postmaster will issue an error and refuse to start. If the file doesn't exist then no "magic" configuration option need exist in the config file, and the postmaster will start as usual. So any administrator who wants to make sure that a configuration file has to explicitly be targeted at the data directory can do so. End result: if you use the -D option on the command line with an inappropriate -C option, the postmaster will refuse to run. -- Kevin Brown kevin@sysexperts.com
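Kevin's opt-in "magic" check could look roughly like this. It is a Python sketch; the file name `magic` and the setting name `magic` are the placeholder names used in the message, and `check_magic` is a hypothetical function. Unlike an automatically written back-link, the marker here is created by the administrator, so installations that never create it are unaffected.

```python
import os
from typing import Mapping

# Placeholder name from the message for the marker file in the data directory.
MAGIC_FILE = "magic"


def check_magic(data_dir: str, settings: Mapping[str, str]) -> None:
    """Refuse to start if the data directory's marker and the config disagree."""
    path = os.path.join(data_dir, MAGIC_FILE)
    if not os.path.exists(path):
        return  # no marker: no "magic" setting needed, start as usual
    with open(path, encoding="utf-8") as f:
        expected = f.read().strip()
    if settings.get("magic") != expected:
        raise RuntimeError(
            f"config file's magic setting does not match {path}; refusing to start"
        )
```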
Bruce Momjian wrote: > Let me tell you the compromise I thought of. > >First, we put the config files (postgresql.conf, pg_hba.conf, >pg_ident.conf) in data/etc by default. > > > > Sorry Bruce, I was being slow :-) , I was thinking you were going to associate the config files with the binary distribution - I think I now realize that you were looking at pushing them down into $PGDATA/etc, which is quite nice and tidy. best wishes Mark
pgsql@mohawksoft.com wrote: >ii) I think the -C switch *WITH* the -D switch has viable usability. >Consider this, you are testing two different database layouts and/or RAID >controllers. You could easily bounce back and forth from *identical* >configurations like this: > > > Convenient indeed, but I would like to see the association of .conf file -> data dir remain reasonably solid. It's all about the foot gun. >iii) I don't like the PID file at all. Not one bit, but I had a few people >ask for it in the patch, it works as advertised and expected. It isn't my >place to say how someone should use something. One of my customers wanted >it, so I provided them with it. That is the beauty of open source. > > > > > I think that there is a difference between a special patch suitable for a particular customer and general release, and that maybe this addition falls right in there. best wishes Mark
Tom Lane wrote: >I think if you spelled the subdir name "config" rather than "etc", >it would be more obvious what's what. > > > > How about 'conf' - (familiar to anyone who has used apache or tomcat ....) regards Mark
On Tuesday 13 April 2004 01:14, Kevin Brown wrote: > Tom Lane wrote: <snip> > To be honest, I think the use of the PG_DATA environment variable is the > biggest impediment to "self documentation" - the postmaster should not > use it. > > The reason is that if PG_DATA is used to specify the location of the > data directory, you won't be able to find out where a running > postmaster's data directory is located without doing some heavy-duty > investigation. Not all operating systems make it possible to determine > the values of a particular process' environment variables. > I think this is another vote for "store the PGDATA dir value inside a running postgresql" so you can query the running database to find out what datafiles it is using. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes: > On Tuesday 13 April 2004 01:14, Kevin Brown wrote: >> To be honest, I think the use of the PG_DATA environment variable is the >> biggest impediment to "self documentation" - the postmaster should not >> use it. > I think this is another vote for "store the PGDATA dir value inside a running > postgresql" so you can query the running database to find out what datafiles > it is using. I agree --- we could answer this by adding some readout capability (think "show datadir") rather than by taking away functionality. Personally I rely quite a lot on setting PGDATA to keep straight which installation I'm currently working with, so I'm not going to be happy with a redesign that eliminates that variable without providing an adequate substitute :-( regards, tom lane
Tom Lane wrote: > Personally I rely quite a lot on setting PGDATA to keep straight which > installation I'm currently working with, so I'm not going to be happy > with a redesign that eliminates that variable without providing an > adequate substitute :-( I'll second that. Joe
Mark Kirkwood wrote: > > Tom Lane wrote: > >> I think if you spelled the subdir name "config" rather than "etc", >> it would be more obvious what's what. >> > How about 'conf' - (familiar to anyone who has used apache or tomcat ....) How about 'etc' - (familiar to anyone who has used UNIX) -- Andrew Hammond
Joe Conway wrote: > Tom Lane wrote: > > Personally I rely quite a lot on setting PGDATA to keep straight which > > installation I'm currently working with, so I'm not going to be happy > > with a redesign that eliminates that variable without providing an > > adequate substitute :-( > > I'll second that. Very much agreed. PGDATA is important, let's keep it, please. For one thing, this type of mechanism is already used by Oracle, with ORACLE_SID and ORACLE_HOME. [It might not work like Apache, but IMHO this is less relevant. Apache isn't typically configured by DBAs. PostgreSQL is, and familiar concepts from other industry areas are probably more important to the success of pg than conformance to Internet/Linux norms.] Best Regards, Simon Riggs
Simon Riggs wrote: > Very much agreed. PGDATA is important, let's keep it, please. To me the question is not so much whether PGDATA is kept around for the system as a whole as how it's used. In the general case, scripts are used to start the postmaster. So continuing to use PGDATA, even if the postmaster itself doesn't read it, is a simple matter of adding '-D "$PGDATA"' to the command that invokes the postmaster. The goal here is simply to make it obvious to a system administrator where the PG data directory that a given postmaster is using resides. We can't rely on the mechanism used to change the command string that ps shows for the process: in my experience it often does not work. And in any case, the system administrator will also want to know exactly what options were passed to the postmaster when it was invoked. If there's any group that can figure out how to effortlessly get PGDATA onto the command line of the backend utilities, it's the developer group. :-) In any case, I'm not at all opposed to having the backend stuff know about PGDATA during development, but for production you should have to explicitly specify the data directory on the command line. That seems easy enough to do: #ifdef is your friend. -- Kevin Brown kevin@sysexperts.com
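Kevin's proposal (startup scripts keep using PGDATA but forward it as -D "$PGDATA", and production builds refuse the environment fallback) can be modelled as a small resolution function. This is a hypothetical Python sketch, not the postmaster's real option parsing: resolve_data_dir, DataDirError and the require_explicit flag (a stand-in for the compile-time #ifdef) are invented for illustration, and only the "-D path" form is handled.

```python
from typing import Mapping, Optional, Sequence


class DataDirError(RuntimeError):
    """Raised when no data directory can be determined (hypothetical)."""


def resolve_data_dir(argv: Sequence[str],
                     environ: Mapping[str, str],
                     require_explicit: bool = False) -> str:
    """Pick the data directory a hypothetical postmaster would use.

    An explicit "-D path" on the command line always wins. PGDATA is only a
    fallback, and require_explicit=True (standing in for an #ifdef'd
    production build) disables that fallback entirely.
    """
    data_dir: Optional[str] = None
    i = 1  # argv[0] is the program name
    while i < len(argv):
        if argv[i] == "-D":
            if i + 1 >= len(argv):
                raise DataDirError("-D requires an argument")
            data_dir = argv[i + 1]  # a later -D overrides an earlier one
            i += 2
        else:
            i += 1
    if data_dir is not None:
        return data_dir
    if not require_explicit and environ.get("PGDATA"):
        return environ["PGDATA"]
    hint = "pass -D" if require_explicit else "pass -D or set PGDATA"
    raise DataDirError("no data directory specified: " + hint)


# A startup script that still relies on PGDATA simply forwards it,
# the equivalent of: postmaster -D "$PGDATA"
env = {"PGDATA": "/srv/pg/main"}  # hypothetical path
print(resolve_data_dir(["postmaster", "-D", env["PGDATA"]], env, require_explicit=True))
# prints /srv/pg/main
```

The point of this arrangement is the one Kevin makes: PGDATA stays a convenience for whoever writes the script, while the directory the running process actually uses always appears on its command line.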
Kevin Brown <kevin@sysexperts.com> writes: > The goal here is simply to make it obvious to a system administrator where > the PG data directory that a given postmaster is using resides. Why would it not be sufficient to add a read-only GUC variable that tells that? Connect to the postmaster and do "show datadir" and you're done. (Without this, it's not clear you've made any particular gain anyway, since "a given postmaster" would typically mean "the one I can connect to at this port", no?) In any case I don't see how removing PGDATA would make this more obvious. You yourself just pointed out that the command-line arguments of a postmaster aren't necessarily visible through ps; if they're not, what have you gained in transparency by forbidding PGDATA? > In any case, I'm not at all opposed to having the backend stuff know > about PGDATA during development, but for production you should have to > explicitly specify the data directory on the command line. If you wish to do things that way, you can; but that doesn't mean that everyone else should have to do it that way too. If there were a security or reliability hazard involved, I might agree with taking the fascist approach, but I see no such hazard here ... regards, tom lane
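Tom's alternative, a read-only run-time variable that any client can query with something like "show datadir", might look roughly like the toy model below. It is a hypothetical Python sketch, not PostgreSQL's GUC machinery: the Settings class, the "datadir" name and the "example_option" setting are made up for illustration. (Later PostgreSQL releases did add a data_directory parameter that can be read with SHOW data_directory.)

```python
from typing import Dict


class ReadOnlySettingError(Exception):
    """Raised when a client tries to change a read-only setting."""


class Settings:
    """Toy model of run-time settings with SHOW/SET semantics (hypothetical)."""

    def __init__(self, data_dir: str) -> None:
        # Fixed once at startup, however it was chosen (-D or PGDATA);
        # clients may read it but never change it.
        self._readonly: Dict[str, str] = {"datadir": data_dir}
        # Ordinary settings that a session is allowed to change.
        self._mutable: Dict[str, str] = {"example_option": "off"}

    def show(self, name: str) -> str:
        """Equivalent of SHOW name; raises KeyError for unknown names."""
        if name in self._readonly:
            return self._readonly[name]
        return self._mutable[name]

    def set(self, name: str, value: str) -> None:
        """Equivalent of SET name = value; read-only names are rejected."""
        if name in self._readonly:
            raise ReadOnlySettingError(f'parameter "{name}" cannot be changed')
        if name not in self._mutable:
            raise KeyError(name)
        self._mutable[name] = value


settings = Settings("/srv/pg/main")  # hypothetical path
print(settings.show("datadir"))  # prints /srv/pg/main
```

Because the value is recorded no matter how the directory was chosen, this answers "which data directory is this postmaster using?" without forbidding PGDATA, which is exactly Tom's argument.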
Joe Conway wrote: > Tom Lane wrote: > >> Personally I rely quite a lot on setting PGDATA to keep straight which >> installation I'm currently working with, so I'm not going to be happy >> with a redesign that eliminates that variable without providing an >> adequate substitute :-( > > > I'll second that. > > I'll third (or whatever) it too :-)
Tom Lane wrote: > Kevin Brown <kevin@sysexperts.com> writes: > > The goal here is simply to make it obvious to a system administrator where > > the PG data directory that a given postmaster is using resides. > > Why would it not be sufficient to add a read-only GUC variable that > tells that? Connect to the postmaster and do "show datadir" and you're > done. (Without this, it's not clear you've made any particular gain > anyway, since "a given postmaster" would typically mean "the one I can > connect to at this port", no?) That would probably be sufficient for most cases. It wouldn't cover the case where there's a strict separation of powers between the system administrator and the DBA, but that case only matters if the system is managed badly (i.e., the SA and the DBA don't talk to each other very well). That's probably something we shouldn't concern ourselves with. > In any case I don't see how removing PGDATA would make this more > obvious. You yourself just pointed out that the command-line arguments > of a postmaster aren't necessarily visible through ps; if they're not, > what have you gained in transparency by forbidding PGDATA? I think you misunderstood what I was saying (which means I didn't say it right). There are ways within a program to change what 'ps' shows as the command line. We use those methods to make it possible to see what a given backend is doing by looking at the 'ps' output. It would be possible to have the postmaster use them to show which data directory it is using even if it wasn't specified on the command line. But in my experience, those methods don't work reliably on all systems. On systems where they don't work, 'ps' shows the original command line that was used. So clearly, the only way 'ps' will show the data directory in that instance is if it was actually specified on the command line.
> > In any case, I'm not at all opposed to having the backend stuff know > > about PGDATA during development, but for production you should have to > > explicitly specify the data directory on the command line. > > If you wish to do things that way, you can; but that doesn't mean that > everyone else should have to do it that way too. If there were a > security or reliability hazard involved, I might agree with taking the > fascist approach, but I see no such hazard here ... Fair enough. The PGDATA issue isn't big enough that I'm terribly concerned about it, especially if a read-only GUC variable is available to give that information (something that, I think, should be there anyway). -- Kevin Brown kevin@sysexperts.com
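Kevin's point about ps can be illustrated on Linux, where /proc/<pid>/cmdline holds a process's arguments as NUL-separated strings. The sketch below is a hypothetical Python helper (Linux only; split_cmdline, data_dir_from_cmdline and read_cmdline are invented names, and only the "-D path" form is recognised) that pulls a -D value out of that blob. It finds something only if -D was actually on the command line: a postmaster that took its directory from PGDATA yields None. Also, if the process has rewritten its title, the blob typically shows the rewritten text rather than the original arguments.

```python
from typing import List, Optional


def split_cmdline(raw: bytes) -> List[str]:
    """Split a /proc/<pid>/cmdline blob into its arguments.

    Arguments are separated, and normally terminated, by NUL bytes.
    """
    if not raw:
        return []  # e.g. kernel threads have an empty cmdline
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # drop the terminator so it does not yield an empty arg
    return [part.decode("utf-8", "replace") for part in raw.split(b"\0")]


def data_dir_from_cmdline(raw: bytes) -> Optional[str]:
    """Return the argument following -D, or None if -D is absent."""
    args = split_cmdline(raw)
    found: Optional[str] = None
    for i in range(1, len(args) - 1):  # skip argv[0]; -D needs a following arg
        if args[i] == "-D":
            found = args[i + 1]  # the last -D wins, as in in-order option parsing
    return found


def read_cmdline(pid: int) -> bytes:
    """Read the raw argument blob for a running process (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return f.read()


print(data_dir_from_cmdline(b"postmaster\0-D\0/srv/pg/main\0"))  # prints /srv/pg/main
print(data_dir_from_cmdline(b"postmaster\0"))  # prints None (directory came from PGDATA)
```

An administrator would combine these as data_dir_from_cmdline(read_cmdline(pid)) for the postmaster's PID; the None case is precisely the situation in which Tom's read-only variable is the only reliable answer.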