Thread: PostgreSQL configuration

PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
About a year or two ago I submitted a configuration patch that allowed
PostgreSQL to be fully configured by postgresql.conf -- enabling data and
configuration to be in separate locations. The idea was that, as on most
UNIX systems, the configuration file could be stored in the /etc
directory (or /etc/postgres or /usr/etc or whatever) and it could contain
all the various system directory and file locations, like pg_hba, and so
on.

There was a lot of debate about it, and I don't recall many arguments
against this sort of configuration strategy, only that there was a dislike
of my patch because it wasn't an all encompassing re-write of the
configuration system.

I have been maintaining it for the various versions of PostgreSQL since
that time for my own use. Can we re-open this debate? It has been a good
deal of time with no progress, and I don't think anyone can deny that a
more flexible configuration based on the idea that configuration and data
are in SEPARATE locations is important.




Re: PostgreSQL configuration

From
Dennis Bjorklund
Date:
On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote:

> more flexible configuration based on the idea that configuration and data
> are in SEPARATE locations is important.

Why is it important and wouldn't it just make it harder to have several 
database clusters (for example with different locale) or several versions 
of pg installed at the same time?

I guess I should search the archive for the old discussion. If someone
has a link, please post :-)

-- 
/Dennis Björklund



Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote:
>
>> more flexible configuration based on the idea that configuration and
>> data
>> are in SEPARATE locations is important.
>
> Why is it important and wouldn't it just make it harder to have several
> database clusters (for example with different locale) or several versions
> of pg installed at the same time?

My patch did not remove any functionality, it merely augmented it.

To say that it would make it more difficult to deploy multiple databases
is misleading for (2) reasons.

(1) It need not do that, because the configuration system would seem
unchanged for those who do not wish to use it in this way.

(2) I would bet that *most* deployments of PostgreSQL only use one
database environment per server, so I'm not even sure that it would be an
issue for the majority of current or prospective users.

It is all well and good to say "our way is better," (with which I do not
agree) but there are, more or less, if not "standards," "standard
concepts" from which good software design follows. Besides PostgreSQL,
name one popular open source project that is widely used that stores its
configuration information inside its data repository. From the "new user"
perspective, configuration within the data directory is an alien concept.

From a sysadmin perspective, having configuration in a standard location
makes sense. It makes these things easy to backup, archive, and put under
version control. (Many sysadmins put machine configuration under version
control to see what changes are made over time.)

Finally, I'm not suggesting removing any functionality, I am suggesting
that configuration can and should be able to be located in a standard
location and that the configuration be able to point to the data volume.

How many systems have you been asked to inspect for problems? It is one of
the things I do for a living. On many systems, I can just look in the
'/etc' directory for most of what I need. If they are running PostgreSQL,
I have to look around and figure out where the database is located.


Re: PostgreSQL configuration

From
Tom Lane
Date:
Dennis Bjorklund <db@zigo.dhs.org> writes:
> On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote:
>> more flexible configuration based on the idea that configuration and data
>> are in SEPARATE locations is important.

> Why is it important and wouldn't it just make it harder to have several 
> database clusters (for example with different locale) or several versions 
> of pg installed at the same time?

My recollection of the arguments against were first that and second
reliability --- there was concern about getting config and data of
multiple installations mixed up if they weren't kept together.  In the
worst case you could conceivably bollix an installation unrecoverably
that way.  (Right now I do not think there is anything quite that
critical in postgresql.conf, but someday there might be.  My very vague
recollection is that the proposed patch changed things so that WAL and
DATA directories would be separately specified in the config file; if
correct, mismatching them definitely would be a great chance to shoot
oneself in the foot.)

I've recently had some very unpleasant experiences trying to install
test versions of MySQL on machines that already had older versions
installed normally.  It seems that MySQL *will* read /etc/my.cnf if it
exists, whether it's appropriate or not, and so it's impossible to have
a truly independent test installation, even though you can configure it
to build/install into nonstandard directories.  Let's not emulate that
bit of brain damage.
        regards, tom lane


Re: PostgreSQL configuration

From
Honza Pazdziora
Date:
On Thu, Apr 08, 2004 at 10:31:44AM -0400, Tom Lane wrote:
> 
> I've recently had some very unpleasant experiences trying to install
> test versions of MySQL on machines that already had older versions
> installed normally.  It seems that MySQL *will* read /etc/my.cnf if it
> exists, whether it's appropriate or not, and so it's impossible to have
> a truly independent test installation, even though you can configure it
> to build/install into nonstandard directories.  Let's not emulate that
> bit of brain damage.

Apache is a counterexample: you can easily use -f or another
command line option to point the server to an alternate master config
file (which I believe is the same with MySQL). From that config
file, other files can be included, making it easy to share pieces
of configuration, or separate them in any way.

-- 
------------------------------------------------------------------------
 Honza Pazdziora | adelton@fi.muni.cz | http://www.fi.muni.cz/~adelton/
 .project: Perl, mod_perl, DBI, Oracle, large Web systems, XML/XSL, ...
              Only self-confident people can be simple.
 


Re: PostgreSQL configuration

From
Bruce Momjian
Date:
I have the file location discussion in my 7.4 hold mailbox:
http://momjian.postgresql.org/cgi-bin/pgpatches2

I am going to revisit it in the next month and see if I can get all the
opinions merged into a plan everyone can agree on.  I think it can be
done.

---------------------------------------------------------------------------

Tom Lane wrote:
> Dennis Bjorklund <db@zigo.dhs.org> writes:
> > On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote:
> >> more flexible configuration based on the idea that configuration and data
> >> are in SEPARATE locations is important.
> 
> > Why is it important and wouldn't it just make it harder to have several 
> > database clusters (for example with different locale) or several versions 
> > of pg installed at the same time?
> 
> My recollection of the arguments against were first that and second
> reliability --- there was concern about getting config and data of
> multiple installations mixed up if they weren't kept together.  In the
> worst case you could conceivably bollix an installation unrecoverably
> that way.  (Right now I do not think there is anything quite that
> critical in postgresql.conf, but someday there might be.  My very vague
> recollection is that the proposed patch changed things so that WAL and
> DATA directories would be separately specified in the config file; if
> correct, mismatching them definitely would be a great chance to shoot
> oneself in the foot.)
> 
> I've recently had some very unpleasant experiences trying to install
> test versions of MySQL on machines that already had older versions
> installed normally.  It seems that MySQL *will* read /etc/my.cnf if it
> exists, whether it's appropriate or not, and so it's impossible to have
> a truly independent test installation, even though you can configure it
> to build/install into nonstandard directories.  Let's not emulate that
> bit of brain damage.
> 
>             regards, tom lane
> 
> 

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: PostgreSQL configuration

From
Joseph Tate
Date:
Tom Lane wrote:
> I've recently had some very unpleasant experiences trying to install
> test versions of MySQL on machines that already had older versions
> installed normally.  It seems that MySQL *will* read /etc/my.cnf if it
> exists, whether it's appropriate or not, and so it's impossible to have
> a truly independent test installation, even though you can configure it
> to build/install into nonstandard directories.  Let's not emulate that
> bit of brain damage.
> 
>             regards, tom lane

It seems to me that this is a packaging problem and not a postgresql 
problem.  If someone wants to package PostgreSQL so that there's a 
symlink to a config file in /etc/pgsql or vice versa for the main 
database they're welcome to do that, and why not?  As for test 
databases, there's already a -D for the datadir; why not add a -C for
the config file, as many software packages allow?  Then packagers could
put the config file anywhere they wanted.  I would certainly welcome 
this feature as it would allow for easy tweaking/benchmarking.

I agree that we should avoid the viral-like MySQL configuration plague.

As pgsql AT mohawksoft.com requested, here are a few widely used
software packages that keep configuration close to the data, some in 
/var, some in /usr:

Mailman
OpenSSL
Cyrus-IMAP
Apache I believe doesn't install anything to /etc/ when you build from 
source.


Re: PostgreSQL configuration

From
Tom Lane
Date:
Honza Pazdziora <adelton@informatics.muni.cz> writes:
> On Thu, Apr 08, 2004 at 10:31:44AM -0400, Tom Lane wrote:
>> It seems that MySQL *will* read /etc/my.cnf if it
>> exists, whether it's appropriate or not, and so it's impossible to have
>> a truly independent test installation, even though you can configure it
>> to build/install into nonstandard directories.  Let's not emulate that
>> bit of brain damage.

> A counterexample of Apache shows that you can easily use -f or another
> command line option to point the server to alternate master config
> file (which I believe is the same with MySQL).

According to
http://www.mysql.com/documentation/mysql/bychapter/manual_Using_MySQL_Programs.html#Option_files
/etc/my.cnf will be read if it exists, no matter what you say on the
command line.  So AFAICS the only way to make a private installation is
to make sure that you have overridden each and every setting in
/etc/my.cnf in a private config file that you do control.  This is
tedious and breakage-prone, of course.
        regards, tom lane


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> Dennis Bjorklund <db@zigo.dhs.org> writes:
>> On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote:
>>> more flexible configuration based on the idea that configuration and
>>> data
>>> are in SEPARATE locations is important.
>
>> Why is it important and wouldn't it just make it harder to have several
>> database clusters (for example with different locale) or several
>> versions
>> of pg installed at the same time?
>
> My recollection of the arguments against were first that and second
> reliability --- there was concern about getting config and data of
> multiple installations mixed up if they weren't kept together.  In the
> worst case you could conceivably bollix an installation unrecoverably
> that way.  (Right now I do not think there is anything quite that
> critical in postgresql.conf, but someday there might be.  My very vague
> recollection is that the proposed patch changed things so that WAL and
> DATA directories would be separately specified in the config file; if
> correct, mismatching them definitely would be a great chance to shoot
> oneself in the foot.)

The patch I had kept the directory layout as one single setting; it just
let postgresql.conf contain the location of pg_hba.conf,
pg_ident.conf, and the data directory.

Thus, one could start PostgreSQL as:
postmaster -C /etc/postgres/webdb.conf

Which would allow full configuration from that one file.

>
> I've recently had some very unpleasant experiences trying to install
> test versions of MySQL on machines that already had older versions
> installed normally.  It seems that MySQL *will* read /etc/my.cnf if it
> exists, whether it's appropriate or not, and so it's impossible to have
> a truly independent test installation, even though you can configure it
> to build/install into nonstandard directories.  Let's not emulate that
> bit of brain damage.

MySQL is, in general, unpleasant, but that is more or less a packaging issue.


Re: PostgreSQL configuration

From
Robert Treat
Date:
On Thu, 2004-04-08 at 09:49, pgsql@mohawksoft.com wrote:
> > On Thu, 8 Apr 2004 pgsql@mohawksoft.com wrote:
> >
> >> more flexible configuration based on the idea that configuration and
> >> data
> >> are in SEPARATE locations is important.
> >
> > Why is it important and wouldn't it just make it harder to have several
> > database clusters (for example with different locale) or several versions
> > of pg installed at the same time?
> 
> My patch did not remove any functionality, it merely augmented it.
> 
> To say that it would make it more difficult to deploy multiple databases
> is misleading for (2) reasons.
> 
> (1) It need not do that, because the configuration system would seem
> unchanged for those who do not wish to use it in this way.
> 

True, but it is more difficult to deal with multiple databases if one
configures their system in this fashion... debian packages their
installations this way via symlinks so i've experienced the difficulty
first hand.

> (2) I would bet that *most* deployments of PostgreSQL only use one
> database environment per server, so I'm not even sure that it would be an
> issue for the majority of current or prospective users.
> 

except that when doing major version upgrades, i find it far better
practice to install multiple versions on the machine whenever possible,
even if you only intend to run a single version. 

> It is all well and good to say "our way is better," (with which I do not
> agree) but there are, more or less, if not "standards," "standard
> concepts" from which good software design follows. Besides PostgreSQL,
> name one popular open source project that is widely used that stores its
> configuration information inside its data repository. From the "new user"
> perspective, configuration within the data directory is an alien concept.
> 

i remember refuting this last time and i have to say something again
because this is equally misleading... apache does things this way if you
build from source, and there are others as well. 

> From a sysadmin perspective, having configuration in a standard location
> makes sense. It makes these things easy to backup, archive, and put under
> version control. (Many sysadmins put machine configuration under version
> control to see what changes are made over time.)

and i would say that right now the way postgresql does it is much
easier. when you first get on a machine and need to find the webroot of
an apache install, theres no telling where it could be simply because a
lot of packagers do package things up differently.  

> 
> Finally, I'm not suggesting removing any functionality, I am suggesting
> that configuration can and should be able to be located in a standard
> location and that the configuration be able to point to the data volume.
> 

IIRC part of the problem with the initial patch/proposal is that it had
implementation issues following a couple of OS guidelines/specs, and
there was an issue with the pid. 

One potential bonus I would see to this type of functionality is that on
some servers I have multiple postgresql.confs on a server tuned to
specific tasks at hand... ie one for a pg_restore vs. one for normal
operations... it would be nice to point the db at a specific one rather
than having to copy files back and forth.

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL



Re: PostgreSQL configuration

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> On Thu, 2004-04-08 at 09:49, pgsql@mohawksoft.com wrote:
>> (2) I would bet that *most* deployments of PostgreSQL only use one
>> database environment per server, so I'm not even sure that it would be an
>> issue for the majority of current or prospective users.

> except that when doing major version upgrades, i find it far better
> practice to install multiple versions on the machine whenever possible,
> even if you only intend to run a single version. 

In any case, you will never get such a proposal past the core
developers, because we all run multiple PG installs per machine.
My primary development machine currently has six postmasters alive
on it (7.0, 7.1, ..., 7.4 + CVS tip); my alternate machine has five
installations on it, though not all are alive since I've not had reason
to restart them all since last reboot; even the laptop I'm physically
typing on right now has more than one Postgres installation on it.
And practically any time someone allows me access to a machine of
theirs to check out some kind of portability issue, I'll build a test
installation in my guest-account home directory, rather than muck with
their live server.

So, don't bother proposing anything that makes it even slightly harder
to run multiple servers per machine.  It will not happen.  End of
discussion.
        regards, tom lane


Re: PostgreSQL configuration

From
Honza Pazdziora
Date:
On Thu, Apr 08, 2004 at 11:32:19AM -0400, Tom Lane wrote:
> 
> > A counterexample of Apache shows that you can easily use -f or another
> > command line option to point the server to alternate master config
> > file (which I believe is the same with MySQL).
> 
> According to
> http://www.mysql.com/documentation/mysql/bychapter/manual_Using_MySQL_Programs.html#Option_files
> /etc/my.cnf will be read if it exists, no matter what you say on the
> command line.  So AFAICS the only way to make a private installation is
> to make sure that you have overridden each and every setting in

:-) I never used that "feature" so was never bitten by it. Anyway,
Apache HTTP server seems to do it the right way, doesn't it?

-- 
------------------------------------------------------------------------
 Honza Pazdziora | adelton@fi.muni.cz | http://www.fi.muni.cz/~adelton/
 .project: Perl, mod_perl, DBI, Oracle, large Web systems, XML/XSL, ...
              Only self-confident people can be simple.
 


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> Robert Treat <xzilla@users.sourceforge.net> writes:
>> On Thu, 2004-04-08 at 09:49, pgsql@mohawksoft.com wrote:
>>> (2) I would bet that *most* deployments of PostgreSQL only use one
>>> database environment per server, so I'm not even sure that it would be
>>> an
>>> issue for the majority of current or prospective users.
>
>> except that when doing major version upgrades, i find it far better
>> practice to install multiple versions on the machine whenever possible,
>> even if you only intend to run a single version.
>
> In any case, you will never get such a proposal past the core
> developers, because we all run multiple PG installs per machine.
> My primary development machine currently has six postmasters alive
> on it (7.0, 7.1, ..., 7.4 + CVS tip); my alternate machine has five
> installations on it, though not all are alive since I've not had reason
> to restart them all since last reboot; even the laptop I'm physically
> typing on right now has more than one Postgres installation on it.
> And practically any time someone allows me access to a machine of
> theirs to check out some kind of portability issue, I'll build a test
> installation in my guest-account home directory, rather than muck with
> their live server.
>
> So, don't bother proposing anything that makes it even slightly harder
> to run multiple servers per machine.  It will not happen.  End of
> discussion.
>

The problem with this conversation is that you assume the functionality
desired would affect your methodology in any way.

All I am asking for, and this is what my patch did, was to add a few entries
to postgresql.conf: "data_dir", "hba_conf", and "ident_conf". A later version
of the patch added "include" and "runtime_pidfile".

These features allow a PostgreSQL system to be fully configurable via a
postgresql.conf file. It may, in fact, make it easier to have multiple
installs.
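
As a rough illustration only, here is a minimal Python sketch of how a
loader might read such a file and follow "include" entries. It is not the
patch's C code: the function name load_settings, the quoting rules, and the
"later entries win" behaviour are all assumptions made for the sketch.

```python
import os
import re
import tempfile

# Matches: name = 'value'  or  name = value, with an optional trailing # comment
_SETTING = re.compile(r"^\s*(\w+)\s*=\s*'?([^'#]*?)'?\s*(?:#.*)?$")


def load_settings(path, _chain=()):
    """Parse settings from path, following include entries recursively."""
    real = os.path.realpath(path)
    if real in _chain:
        raise ValueError("include loop involving %s" % path)
    settings = {}
    with open(path) as f:
        for line in f:
            match = _SETTING.match(line)
            if match is None:
                continue  # blank line, comment, or syntax this sketch ignores
            name, value = match.groups()
            if name == "include":
                # Assumption: an included file is read in place, so entries
                # that come later override entries that came earlier.
                settings.update(load_settings(value, _chain + (real,)))
            else:
                settings[name] = value
    return settings


if __name__ == "__main__":
    # Build two throwaway files so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        debug = os.path.join(tmp, "debug.conf")
        main = os.path.join(tmp, "postgresql.conf")
        with open(debug, "w") as f:
            f.write("hba_conf = '/etc/postgres/pg_hba.conf'\n")
        with open(main, "w") as f:
            f.write("data_dir = '/vol01/postgres'\n")
            f.write("include = '%s'\n" % debug)
        print(load_settings(main))
```

The point of the sketch is that one file named on the command line is
enough to reach every other location, which is all the new entries are for.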


Re: PostgreSQL configuration

From
Kevin Brown
Date:
Tom Lane wrote:
> Honza Pazdziora <adelton@informatics.muni.cz> writes:
> > On Thu, Apr 08, 2004 at 10:31:44AM -0400, Tom Lane wrote:
> >> It seems that MySQL *will* read /etc/my.cnf if it
> >> exists, whether it's appropriate or not, and so it's impossible to have
> >> a truly independent test installation, even though you can configure it
> >> to build/install into nonstandard directories.  Let's not emulate that
> >> bit of brain damage.
> 
> > A counterexample of Apache shows that you can easily use -f or another
> > command line option to point the server to alternate master config
> > file (which I believe is the same with MySQL).
> 
> According to
> http://www.mysql.com/documentation/mysql/bychapter/manual_Using_MySQL_Programs.html#Option_files
> /etc/my.cnf will be read if it exists, no matter what you say on the
> command line.  So AFAICS the only way to make a private installation is
> to make sure that you have overridden each and every setting in
> /etc/my.cnf in a private config file that you do control.  This is
> tedious and breakage-prone, of course.

Yes.  But we don't have to do that.

If we're truly concerned about the possibility of multiple installations
attempting to use the same config, then the answer is simple: require
that the location of the config file be specified on the command line
and don't compile a default location into the binary.  Similarly, don't
take the value from an environment variable.

Packaged installations won't have trouble with this: they supply a startup
script which would pass the appropriate argument to the postmaster.


If we want to be a bit paranoid (justifiable if you've got really
important data on the line), we could also require that a version
string exist in the config file.  If the version string doesn't match
the version of the postmaster being started, the postmaster exits with
an error (and a hint of what to set the version string to and what the
name of the version string parameter is).  That way, even if you screw
up on the command line, you won't hose a database by starting the wrong
version of the postmaster against it.  Not sure if this would break
anything, though.


-- 
Kevin Brown                          kevin@sysexperts.com


Re: PostgreSQL configuration

From
Andrew Dunstan
Date:

Kevin Brown wrote:

>
>If we're truly concerned about the possibility of multiple installations
>attempting to use the same config, then the answer is simple: require
>that the location of the config file be specified on the command line
>and don't compile a default location into the binary.  Similarly, don't
>take the value from an environment variable.
>
>Packaged installations won't have trouble with this: they supply a startup
>script which would pass the appropriate argument to the postmaster.
>

In order to keep with existing practice, you could say that you have to 
supply *either* a config file, which points to the data dir etc., *or* a 
data dir, in which case the config files must be in the data dir. I very 
much agree with the idea of not compiling in a default config file location.
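
The either/or rule can be sketched in a few lines. This is Python for
illustration, not postmaster code: resolve_locations and the read_config
callback are hypothetical names, and "data_dir" is the setting name from the
proposed patch. Letting an explicit data dir win when both are given is an
assumption borrowed from the patch's "-D overrides data_dir" rule.

```python
import os


def resolve_locations(config_file=None, data_dir=None, read_config=None):
    """Return (config_file, data_dir) under the either/or rule.

    At least one starting point must be given explicitly: there is no
    compiled-in default and no environment variable fallback.
    read_config is a hypothetical callable that maps a config file path
    to a dict of settings; it is injected to keep the sketch self-contained.
    """
    if config_file is None and data_dir is None:
        raise ValueError("either a config file or a data directory is required")
    if config_file is None:
        # Existing practice: the config files live inside the data directory.
        return os.path.join(data_dir, "postgresql.conf"), data_dir
    if data_dir is None:
        # Proposed practice: the config file points at the data directory.
        settings = read_config(config_file)
        if "data_dir" not in settings:
            raise ValueError("config file does not name a data directory")
        data_dir = settings["data_dir"]
    # If both were given, the explicitly supplied data directory is kept.
    return config_file, data_dir
```

With only a data dir the result is what happens today; with only a config
file the data dir comes from the file; with neither, startup is refused.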

>
>
>If we want to be a bit paranoid (justifiable if you've got really
>important data on the line), we could also require that a version
>string exist in the config file.  If the version string doesn't match
>the version of the postmaster being started, the postmaster exits with
>an error (and a hint of what to set the version string to and what the
>name of the version string parameter is).  That way, even if you screw
>up on the command line, you won't hose a database by starting the wrong
>version of the postmaster against it.  Not sure if this would break
>anything, though.
>

It won't start now if there's a version mismatch, and that's nothing 
whatever to do with the config file - it matches against the PG_VERSION 
file. We're already rightly paranoid on this point.
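
For readers unfamiliar with that check, the idea is roughly the following.
This is a hedged Python sketch, not the server's actual C code: the function
name and the error wording are made up for illustration, and it assumes
PG_VERSION holds just the major version string (for example 7.4).

```python
import os
import tempfile


def check_pg_version(data_dir, server_major):
    """Return the on-disk version, or raise if it differs from the server's."""
    with open(os.path.join(data_dir, "PG_VERSION")) as f:
        on_disk = f.read().strip()
    if on_disk != server_major:
        raise RuntimeError(
            "data directory %s was initialized by version %s, "
            "but this server is version %s" % (data_dir, on_disk, server_major)
        )
    return on_disk


if __name__ == "__main__":
    # Simulate a data directory created by a 7.4 server.
    with tempfile.TemporaryDirectory() as data_dir:
        with open(os.path.join(data_dir, "PG_VERSION"), "w") as f:
            f.write("7.4\n")
        print(check_pg_version(data_dir, "7.4"))  # same version: accepted
        try:
            check_pg_version(data_dir, "7.3")     # different version: refused
        except RuntimeError as err:
            print("refused:", err)
```

Because the marker lives in the data directory itself, the check works the
same way no matter where the config file is kept.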

cheers

andrew





Re: PostgreSQL configuration

From
Christopher Browne
Date:
In the last exciting episode, kevin@sysexperts.com (Kevin Brown) wrote:
> If we want to be a bit paranoid (justifiable if you've got really
> important data on the line), we could also require that a version
> string exist in the config file.  If the version string doesn't match
> the version of the postmaster being started, the postmaster exits with
> an error (and a hint of what to set the version string to and what the
> name of the version string parameter is).  That way, even if you screw
> up on the command line, you won't hose a database by starting the wrong
> version of the postmaster against it.  Not sure if this would break
> anything, though.

How would this differ from the present situation where
$PGDATA/PG_VERSION is already required to match against the
postmaster?

As far as I can see, the only thing that is to be changed by the
proposal is that instead of postgresql.conf being in $PGDATA, it might
be found somewhere else.  (And perhaps pg_hba.conf and pg_ident.conf
will also be located in that mystical "somewhere else.")

The change that _might_ be relevant would be to put a version string
into postgresql.conf so that there would be _two_ matches made, not
just one:
- $PGDATA/PG_VERSION is required to match the postmaster;
- $SOMEWHERE_ELSE/postgresql.conf's variable "version" is required to  match the postmaster.

But I think Tom put it pretty well when he commented that all of the
core developers make extensive use of the notion of having _many_
backends around, and therefore would oppose any proposal that would
make it less convenient to do that.

Core folk aren't likely to write up patches designed to shoot
themselves in the foot this way, nor are they likely to accept patches
that clearly do so.
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/linuxxian.html
"There's  no  longer  a  boycott  of  Apple.  But  MacOS  is  still  a
proprietary OS." -- RMS - June 13, 1998


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> Robert Treat <xzilla@users.sourceforge.net> writes:
>> On Thu, 2004-04-08 at 09:49, pgsql@mohawksoft.com wrote:
>>> (2) I would bet that *most* deployments of PostgreSQL only use one
>>> database environment per server, so I'm not even sure that it would be
>>> an
>>> issue for the majority of current or prospective users.
>
>> except that when doing major version upgrades, i find it far better
>> practice to install multiple versions on the machine whenever possible,
>> even if you only intend to run a single version.
>
> In any case, you will never get such a proposal past the core
> developers, because we all run multiple PG installs per machine. My
> primary development machine currently has six postmasters alive on it
> (7.0, 7.1, ..., 7.4 + CVS tip); my alternate machine has five
> installations on it, though not all are alive since I've not had reason
> to restart them all since last reboot; even the laptop I'm physically
> typing on right now has more than one Postgres installation on it. And
> practically any time someone allows me access to a machine of theirs to
> check out some kind of portability issue, I'll build a test installation
> in my guest-account home directory, rather than muck with their live
> server.
>
> So, don't bother proposing anything that makes it even slightly harder
> to run multiple servers per machine.  It will not happen.  End of
> discussion.

I'll just post the README file at the bottom for reference, but if
PostgreSQL can appear completely unchanged, yet with the addition of just
two command line arguments work "better" in situations where this sort of
functionality is desired, I don't see why people would wish it not to be
included.

If there are issues with coding style, or other finer details, I would be
very much willing to address any issues. Personally, I think the
requirement of using symlinks to synthesize this functionality is a hard
sell to many administrators.




::::::::::

This patch enables PostgreSQL to be far more flexible in
its configuration methodology.

Specifically, it adds two more command line parameters, "-C"
which specifies either the location of the postgres
configuration file or a directory containing the configuration
files, and "-R" which directs PostgreSQL to write its runtime
process ID to a standard file which can be used by control
scripts to control PostgreSQL.

A patched version of PostgreSQL will function as:

--- Configuration file ---
postmaster -C /etc/postgres/postgresql.conf

This will direct the postmaster program to use the
configuration file "/etc/postgres/postgresql.conf"

--- Configuration Directory ---
postmaster -C /etc/postgres

This will direct the postmaster program to search the
directory "/etc/postgres" for the standard configuration
file names: postgresql.conf, pg_hba.conf, and pg_ident.conf.

--- Run-time process ID ---
postmaster -R /var/run/postmaster.pid

This will direct PostgreSQL to write its process ID number
to a file, /var/run/postmaster.pid


--- postgresql.conf  options ---
Within the configuration file there are five additional
parameters: include, hba_conf, ident_conf, data_dir, and
runtime_pidfile.

They are used as:
include = '/etc/postgres/debug.conf'
data_dir = '/vol01/postgres'
hba_conf = '/etc/postgres/pg_hba.conf'
ident_conf = '/etc/postgres/pg_ident.conf'
runtime_pidfile = '/var/run/postmaster.pid'


The "-D" option on the command line overrides the "data_dir"
in the configuration file.

The "-R" option on the command line overrides the
"runtime_pidfile" in the configuration file.

If no hba_conf and/or ident_conf setting is specified, the default
$PGDATA/pg_hba.conf and/or $PGDATA/pg_ident.conf will be used.

If the "-C" option specifies a directory, pg_hba.conf and pg_ident.conf
files must be in the specified directory.
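
The precedence rules above can be summarized in a short Python sketch. It
is an illustration, not code from the patch: effective_paths and its
argument names are hypothetical, and where the rules above are silent (for
example, whether an hba_conf setting still applies when "-C" names a
directory) the sketch simply assumes the directory wins.

```python
import os


def effective_paths(conf, config_dir=None, opt_D=None, opt_R=None):
    """Apply the precedence rules to a dict of parsed settings.

    conf        settings parsed from postgresql.conf (a plain dict)
    config_dir  the directory given to -C, or None if -C named a file
    opt_D       value of the -D command-line option, if given
    opt_R       value of the -R command-line option, if given
    """
    # -D on the command line overrides data_dir in the file.
    data_dir = opt_D or conf.get("data_dir")
    if data_dir is None:
        raise ValueError("no data directory: give -D or set data_dir")
    paths = {
        "data_dir": data_dir,
        # -R on the command line overrides runtime_pidfile in the file.
        "runtime_pidfile": opt_R or conf.get("runtime_pidfile"),
    }
    for setting, filename in (("hba_conf", "pg_hba.conf"),
                              ("ident_conf", "pg_ident.conf")):
        if config_dir is not None:
            # -C named a directory: the file must live there (assumed to win).
            paths[setting] = os.path.join(config_dir, filename)
        else:
            # Otherwise use the setting, falling back to the data directory.
            paths[setting] = conf.get(setting,
                                      os.path.join(data_dir, filename))
    return paths
```

For example, calling effective_paths({"data_dir": "/vol01/postgres"},
opt_D="/VOL02/postgres") yields a data directory of /VOL02/postgres, with
pg_hba.conf and pg_ident.conf looked up inside it.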

This patch is intended to move the PostgreSQL configuration out of the
data directory so that it can be modified and backed up as well as answer
some of the issues with deploying PostgreSQL in an FHS (Filesystem Hierarchy
Standard) way.

This patch is also useful for running multiple servers with the same
parameters:

postmaster -C /etc/postgres/postgresql.conf -D /VOL01/postgres -p 5432
postmaster -C /etc/postgres/postgresql.conf -D /VOL02/postgres -p 5433

To apply the patch, enter your PostgreSQL source directory, and run:

cat pgec-PGVERSON.patch | patch -p 1




Re: PostgreSQL configuration

From
rm_pg@cheapcomplexdevices.com
Date:
On Fri, 9 Apr 2004, Christopher Browne wrote:
>
> ...Tom ... commented that all of the core developers make extensive use
> of the notion of having _many_ backends around, and therefore ...
>
> Core folk aren't likely to write up patches designed to shoot
> themselves in the foot this way ...

It's not just core developers who use this feature.

For a program that's trying to be compatible with Oracle,
MySQL, MSSQLServer and PostgreSQL for backends, it's nice
to have 7.3.X, 7.4.X, heck, even 7.0 family postgresql's
running.  And indeed, all except SQLServer (another guy's
doing this one) are running on my machine.

I test frequently against whatever database(s) are running on
my development machines.  I test rarely against databases that
aren't.  Anything that makes that harder would be bad for developers
using PostgreSQL as well as for the core team.
  Ron


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
>
> On Fri, 9 Apr 2004, Christopher Browne wrote:
>>
>> ...Tom ... commented that all of the core developers make extensive use
>> of the notion of having _many_ backends around, and therefore ...
>>
>> Core folk aren't likely to write up patches designed to shoot
>> themselves in the foot this way ...
>
> It's not just core developers who use this feature.
>
> For a program that's trying to be compatible with Oracle,
> MySQL, MSSQLServer and PostgreSQL for backends, it's nice
> to have 7.3.X, 7.4.X, heck, even 7.0 family postgresql's
> running.  And indeed, all except SQLServer (another guy's
> doing this one) are running on my machine.
>
> I test frequently against whatever database(s) are running on
> my development machines.  I test rarely against databases that
> aren't.  Anything that makes that harder would be bad for developers
> using PostgreSQL as well as for the core team.
>

This is so frustrating, NO ONE IS TRYING TO MAKE IT HARDER! All the patch
that I propose does is ADD functionality. Two command line switches, and
five config file entries:

include = '/etc/postgres/debug.conf'
data_dir = '/vol01/postgres'
hba_conf = '/etc/postgres/pg_hba.conf'
ident_conf = '/etc/postgres/pg_ident.conf'
runtime_pidfile = '/var/run/postmaster.pid'

I am neither suggesting nor implementing any change in the current default
behavior of PostgreSQL. I am merely adding features that would make it
easier to do things like configure from a centralized directory which is
different than the data directory, the ability to include
"sub-configuration" like specific tuning or debug info, and to write a
usable PID file for standard UNIX admin scripts.





Re: PostgreSQL configuration

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> I am neither suggesting nor implementing any change in the current default
> behavior of PostgreSQL. I am merely adding features that would make it
> easier to do things like configure from a centralized directory which is
> different than the data directory, the ability to include
> "sub-configuration" like specific tuning or debug info, and to write a
> usable PID file for standard UNIX admin scripts.

Well, let's take it one piece at a time here.

I can see some value in providing "#include" functionality in
postgresql.conf (and the other config files too).  I'm not convinced
that it's a must-have, because the desired contents of the config files
tend to change with each new PG version.  But to the extent that you're
admining multiple clusters of the same version, it would have some use.

Moving the PID file out of the data directory is actively dangerous,
because we use that file as part of the safety interlock against
starting multiple postmasters in the same data directory.  I suppose
we could offer an option to write a second copy of the PID file at
a different place, but I'm not seeing what that buys except confusion
(especially if two postmasters are mistakenly instructed to put their
copied PID files at the same place).
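
To make the interlock idea concrete, here is a minimal sketch of a PID-file
check tied to a data directory. It is not PostgreSQL's actual implementation;
the function name pidfile_is_live is hypothetical, and the sketch assumes the
first line of the file holds the process ID.

```sh
#!/bin/sh
# Illustration only: return 0 if the PID file names a running process.
pidfile_is_live() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile")
    # Reject empty or non-numeric content.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only tests whether the process can be
    # signalled. It also fails for a live process owned by another user.
    kill -0 "$pid" 2>/dev/null
}

# A start script could refuse to launch a second postmaster like this:
#   if pidfile_is_live "$PGDATA/postmaster.pid"; then
#       echo "a postmaster already appears to be running" >&2
#       exit 1
#   fi
```

Because the check is keyed on a file inside the data directory, two
postmasters pointed at the same directory consult the same file. A PID file
kept elsewhere would not give that guarantee.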

The whole idea of having multiple command-line switches to pick config
and data separately bothers me.  ISTM this would mostly create great new
opportunities to shoot yourself in the foot (by accidentally picking the
wrong combination), without nearly enough benefit to outweigh the risk.
Possibly this perspective is somewhat developer-centric --- I'm sure
I manually start postmasters far more often than the average person.
But then this whole discussion seems of interest only to people with
outlier requirements; the existing setup works fine for the average user
with only one Postgres installation.

Could we compromise on just adding #include functionality?  ISTM that
would cover the desire for separate config and data directories.  You
could keep a postgresql.conf file in each data directory that simply
says
    #include /etc/postgres/debug.conf
and likewise for other config files.  Doesn't that accomplish what you
want?
        regards, tom lane
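
As a rough sketch of what the include idea amounts to, the function below
prints a config file with each "#include" line replaced by the named file's
contents. The "#include" syntax is the one proposed in this message, not an
existing feature; expand_includes is a hypothetical name, and nested includes
are not followed.

```sh
#!/bin/sh
# Illustration only: print a config file, replacing each line of the form
#   #include /path/to/file
# with the contents of that file. Only one level of inclusion is handled.
expand_includes() {
    prefix='#include '
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$prefix"*)
                included=${line#"$prefix"}
                cat "$included"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done < "$1"
}

# Example: a stub postgresql.conf in a data directory that pulls in
# shared settings from /etc/postgres:
#   expand_includes /vol01/postgres/postgresql.conf
```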


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com writes:
>> I am neither suggesting nor implementing any change in the current
>> default
>> behavior of PostgreSQL. I am merely adding features that would make it
>> easier to do things like configure from a centralized directory which is
>> different than the data directory, the ability to include
>> "sub-configuration" like specific tuning or debug info, and to write a
>> usable PID file for standard UNIX admin scripts.
>
> Well, let's take it one piece at a time here.

Cool.

>
> I can see some value in providing "#include" functionality in
> postgresql.conf (and the other config files too).  I'm not convinced
> that it's a must-have, because the desired contents of the config files
> tend to change with each new PG version.  But to the extent that you're
> admining multiple clusters of the same version, it would have some use.

Speaking for myself, I like it because I can keep a set of debugging
parameters in a separate file, and only comment out the "include ..." for
production. (Not to mention that multiple databases can use it.)

>
> Moving the PID file out of the data directory is actively dangerous,
> because we use that file as part of the safety interlock against
> starting multiple postmasters in the same data directory.  I suppose
> we could offer an option to write a second copy of the PID file at
> a different place, but I'm not seeing what that buys except confusion
> (especially if two postmasters are mistakenly instructed to put their
> copied PID files at the same place).

The patch that I have creates a completely separate PID file from that in
the PGDATA directory. It is used for more compatible UNIX init scripting.
(Obviously on machines with only one database system.)

>
> The whole idea of having multiple command-line switches to pick config
> and data separately bothers me.  ISTM this would mostly create great new
> opportunities to shoot yourself in the foot (by accidentally picking the
> wrong combination), without nearly enough benefit to outweigh the risk.

This is where I think we disagree. Very much so, in fact. I think having
something like:

/etc/postgres/webdb.conf
In which there is a line:
datadir=/RAID0/postgres

and

/etc/postgres/testdb.conf
In which there is this line
datadir=/RAID1/postgres

Allows for a very standardized, and IMHO, very self documenting installation.
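
To show how a named file maps to a data directory, here is a small sketch
that reads one key from a key=value file. The helper name conf_get is
hypothetical and this is not how the patch itself parses the file; it only
illustrates the lookup.

```sh
#!/bin/sh
# Illustration only: print the value of KEY from a file of key=value lines.
# Usage: conf_get FILE KEY
# Spaces and single quotes around the value are stripped; if the key
# appears more than once, the last occurrence wins.
conf_get() {
    awk -F= -v key="$2" -v q="'" '
        index($0, "=") > 0 {
            k = $1
            gsub(/[ \t]/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub("^[ \t" q "]+|[ \t" q "]+$", "", v)
                val = v
            }
        }
        END { if (val != "") print val }
    ' "$1"
}

# Example: find out where webdb keeps its data.
#   conf_get /etc/postgres/webdb.conf datadir
```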


> Possibly this perspective is somewhat developer-centric --- I'm sure
> I manually start postmasters far more often than the average person.
> But then this whole discussion seems of interest only to people with
> outlier requirements; the existing setup works fine for the average user
> with only one Postgres installation.

Tom, I really disagree here. I really don't know how to convey my feelings
about this, other than banging my head against the wall. I setup, develop
on, and manage a lot of different systems, PostgreSQL is frustrating for
me because I do not always have control over what is what. I have to
deploy systems on machines which I do not get to specify the layout. I do
not know where the various volumes will be. A year later, I will have
completely forgotten, and of course my notes are nowhere to be found.

One of the reasons I wrote these mods was so I could create a "standard."
All my PostgreSQL systems have an /etc/postgres/postgresql.conf file. I
sit down and know immediately where to look. I can *always* tell a user,
over the phone, run:
"/usr/local/pgsql/bin/postmaster -C /etc/postgres/postgresql.conf"

It *always* works, and when it doesn't it is because something has changed.

It may make it easier for an expert to shoot themselves in the foot, but
it also makes it easier for an expert to make it bullet proof.

>
> Could we compromise on just adding #include functionality?  ISTM that
> would cover the desire for separate config and data directories.  You
> could keep a postgresql.conf file in each data directory that simply
> says
>     #include /etc/postgres/debug.conf
> and likewise for other config files.  Doesn't that accomplish what you
> want?
>

The include functionality was added as a result of a debate about this
patch a couple years ago. The primary purpose of my patch was to have the
configuration in a standard location.



Re: PostgreSQL configuration

From
Bruce Momjian
Date:
Tom Lane wrote:
> pgsql@mohawksoft.com writes:
> > I am neither suggesting nor implementing any change in the current default
> > behavior of PostgreSQL. I am merely adding features that would make it
> > easier to do things like configure from a centralized directory which is
> > different than the data directory, the ability to include
> > "sub-configuration" like specific tuning or debug info, and to write a
> > usable PID file for standard UNIX admin scripts.
> 
> Well, let's take it one piece at a time here.
> The whole idea of having multiple command-line switches to pick config
> and data separately bothers me.  ISTM this would mostly create great new
> opportunities to shoot yourself in the foot (by accidentally picking the
> wrong combination), without nearly enough benefit to outweigh the risk.
> Possibly this perspective is somewhat developer-centric --- I'm sure
> I manually start postmasters far more often than the average person.
> But then this whole discussion seems of interest only to people with
> outlier requirements; the existing setup works fine for the average user
> with only one Postgres installation.
> 
> Could we compromise on just adding #include functionality?  ISTM that
> would cover the desire for separate config and data directories.  You
> could keep a postgresql.conf file in each data directory that simply
> says
>     #include /etc/postgres/debug.conf
> and likewise for other config files.  Doesn't that accomplish what you
> want?

As I remember, there were two threads in the 7.4 discussion:
http://momjian.postgresql.org/cgi-bin/pgpatches2

The discussions are the top-most threads.

One issue was having the config file, postgresql.conf, drive the PGDATA
location.  The second issue was putting all the config files,
postgresql.conf, pg_hba.conf, and pg_ident.conf in a separate directory,
so it was easier to backup, easier to know which files to edit, and
easier to symlink it to some other location.

On the issue of having postgresql.conf point to the data directory, that
basically adds a level of indirection between the config file and the
data file, and I know some are concerned that there could be a
configuration error that could corrupt the database.  It is basically
putting the config file first, and letting the data directory derive
from that, rather than pointing to the data directory and finding the
config file in there.

A third option just mentioned is adding an #include capability to the
config file.  That gives per-line control over the file contents.  We
already have the ability to include a list of database/user/group names
in pg_hba.conf.

A fourth idea, where someone just posted a patch, was to have the config
directory and data directory independent and add flags to point to each
separately.  I think lots of folks didn't like that because forgetting
to specify the config directory would give you a running postmaster with
different config values from previous times you did specify the config
directory.  That just seems too error-prone.

Obviously, we need to do something.  There are just too many people who
want improvement in this area.  The question is what changes to make.

My personal opinion is that we move the config files in /data/etc, and
allow admins to move that directory somewhere else with symlinks.  If we
want to add #include capability too, that would help things.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> Tom Lane wrote:
>> pgsql@mohawksoft.com writes:
>> > I am neither suggesting nor implementing any change in the current
>> default
>> > behavior of PostgreSQL. I am merely adding features that would make it
>> > easier to do things like configure from a centralized directory which
>> is
>> > different than the data directory, the ability to include
>> > "sub-configuration" like specific tuning or debug info, and to write a
>> > usable PID file for standard UNIX admin scripts.
>>
>> Well, let's take it one piece at a time here.
>> The whole idea of having multiple command-line switches to pick config
>> and data separately bothers me.  ISTM this would mostly create great new
>> opportunities to shoot yourself in the foot (by accidentally picking the
>> wrong combination), without nearly enough benefit to outweigh the risk.
>> Possibly this perspective is somewhat developer-centric --- I'm sure
>> I manually start postmasters far more often than the average person.
>> But then this whole discussion seems of interest only to people with
>> outlier requirements; the existing setup works fine for the average user
>> with only one Postgres installation.
>>
>> Could we compromise on just adding #include functionality?  ISTM that
>> would cover the desire for separate config and data directories.  You
>> could keep a postgresql.conf file in each data directory that simply
>> says
>>     #include /etc/postgres/debug.conf
>> and likewise for other config files.  Doesn't that accomplish what you
>> want?
>
> As I remember, there were two threads in the 7.4 discussion:
>
>     http://momjian.postgresql.org/cgi-bin/pgpatches2
>
> The discussions are the top-most threads.

The threads I am talking about took place about a year or two ago.
February 2003 sounds about right.

>
> One issue was having the config file, postgresql.conf, drive the PGDATA
> location.  The second issue was putting all the config files,
> postgresql.conf, pg_hba.conf, and pg_ident.conf in a separate directory,
> so it was easier to backup, easier to know which files to edit, and
> easier to symlink it to some other location.

Most DBA/Admins, myself included, don't like symlinks.

>
> On the issue of having postgresql.conf point to the data directory, that
> basically adds a level of indirection between the config file and the
> data file, and I know some are concerned that there could be a
> configuration error that could corrupt the database.  It is basically
> putting the config file first, and letting the data directory derive
> from that, rather than pointing to the data directory and finding the
> config file in there.

This is a philosophical argument about software configuration: how do you
configure software, in configuration files or in known files within a
directory? I prefer everything relative to a configuration file.

>
> A third option just mentioned is adding an #include capability to the
> config file.  That gives per-line control over the file contents.  We
> already have the ability to include a list of database/user/group names
> in pg_hba.conf.

That is easy enough.

>
> A fourth idea, where someone just posted a patch, was to have the config
> directory and data directory independent and add flags to point to each
> separately.  I think lots of folks didn't like that because forgetting
> to specify the config directory would give you a running postmaster with
> different config values from previous times you did specify the config
> directory.  That just seems too error-prone.

I have 2 huge problems with using the data directory as the location of
the configuration:

(1) Backup and sharing of configuration state is not obvious.
(2) There is no self documenting equivalent using the data directory. This
directory can be *anywhere* on the system. If using a standardized
configuration, the install becomes obvious.

>
> Obviously, we need to do something.  There are just too many people who
> want improvement in this area.  The question is what changes to make.
>
> My personal opinion is that we move the config files in /data/etc, and
> allow admins to move that directory somewhere else with symlinks.  If we
> want to add #include capability too, that would help things.
>

I wish I could impress on you the distaste the average admin has for
symlinks. If you knew how much DBAs and sys-admins hated symlinks, you
wouldn't think of them as a solution. To most, a symlink is used when the
software has no other viable option. When an admin needs to use a symlink
to configure software, they view this as a cop-out.




Re: PostgreSQL configuration

From
Mark Kirkwood
Date:
It seems to me that the existing situation is actually correct :

The configuration is a property of the initialized database cluster, so 
a logical place for it is in the root of said cluster.

It is *not* a property of the installed binary distribution (e.g. 
/usr/local/pgsql/etc) - as you may have *several*  database clusters 
created using *this* binary distribution, each of which requires a 
different configuration.

Having said that, I am ok about the 'include' idea.

regards

Mark


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> It seems to me that the existing situation is actually correct :
>
> The configuration is a property of the initialized database cluster, so
> a logical place for it is in the root of said cluster.
>
> It is *not* a property of the installed binary distribution (e.g
> /usr/local/pgsql/etc) - as you may have *several*  database clusters
> created using *this* binary distribution, each of which requiring a
> different configuration.
>
> Having said that, I am ok about the 'include' idea.

What I am finding difficult in this debate is that people are so
resistant, not to change, but to the idea that someone would want to
manage the system in a different way than they would. Yes, there are
probably many people who have multiple PostgreSQL database clusters
installed and operating simultaneously on their systems. No one is saying
that this needs to change in any way. IMHO my patch can do this in a self
documenting way, thus making it easier to do, i.e.

postmaster -C /etc/postgres/fundb.conf
postmaster -C /etc/postgres/testdb.conf

I think that is far more intuitive than:

postmaster -D /some/path/who/knows/where/fundb
postmaster -D /another/path/i/don/t/know/testdb

(Sorry for the sarcasm :-)

The point is, that configuration, including data cluster location, through
the configuration file is where a lot of PostgreSQL admins would like to
be. I understand the ease and historical necessity of having everything in
the PGDATA directory, and as I've said many many times, I'm not suggesting
changing this default behavior, I simply want to add the features that
would allow PostgreSQL to be managed similarly to more mainstream UNIX
daemons like named, dhcpd, and so on.

I have been using this patch for a while and it makes administration
easier for me.

What is difficult in this patch is that it is not technically a "SQL
feature" which can be debated on functionality, it is more of a usability
feature which, by nature, is quite subjective. After a certain point,
people get polarized and debate sort of stops and discussion becomes
stating and restating the same contrary opinions.

It is frustrating. I think this is important, as I would not have written
and maintained it otherwise, but by being a somewhat subjective feature I
can't make any iron clad arguments for it. I can only say it makes
administration easier for those who would like PostgreSQL administered
this way. If the prevailing view is "we don't think so," then it doesn't
get put in, but it doesn't make my arguments any less valid.



Re: PostgreSQL configuration

From
Steve Atkins
Date:
On Sat, Apr 10, 2004 at 03:53:49PM -0400, pgsql@mohawksoft.com wrote:
> > The whole idea of having multiple command-line switches to pick config
> > and data separately bothers me.  ISTM this would mostly create great new
> > opportunities to shoot yourself in the foot (by accidentally picking the
> > wrong combination), without nearly enough benefit to outweigh the risk.
> 
> This is where I think we disagree. Very much so, in fact. I think having
> something like:
> 
> /etc/postgres/webdb.conf
> In which there is a line:
> datadir=/RAID0/postgres
> 
> and
> 
> /etc/postgres/testdb.conf
> In which there is this line
> datadir=/RAID1/postgres
> 
> Allows for a very standardized, and IMHO, very self documenting installation.

But not as flexible as the existing alternative.

For instance, what if webdb is PostgreSQL 7.3 and testdb is PostgreSQL
7.4?  There is no way you can put that difference in a configuration
file, so the user will still need to know which binary of postgresql
to fire up.

So, yes, let's have a standard directory for storing the configuration
for all the PostgreSQL installations on the machine.

/etc/postgres sounds fine.

In /etc/postgres/webdb:

#!/bin/sh
datadir=/RAID0/postgres
/usr/local/pgsql73/bin/postmaster -D $datadir

and in /etc/postgres/testdb

#!/bin/sh
datadir=/RAID1/postgres
/usr/local/pgsql742/bin/postmaster -D $datadir

Much more flexible and explicitly self-documenting.

For more flexibility still, do what I do and make the scripts standard
rc.d style startup scripts.

To walk a user through listing the supported installations is easy -
'ls /etc/postgres'. Starting and stopping one - '/etc/postgres/webdb start'
or '/etc/postgres/webdb stop'. Checking system status and displaying the
data directory '/etc/postgres/webdb status'.
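
A minimal sketch of such a script follows. It is an illustration, not the
script actually in use here; it relies only on the pg_ctl start, stop and
status actions with -D. The PG_CTL, DATADIR and RUN variables and the pg_rc
function name are assumptions made for the example.

```sh
#!/bin/sh
# Illustration only: rc.d-style wrapper for one PostgreSQL installation.
# The default paths are the ones used for webdb earlier in this message.
PG_CTL=${PG_CTL:-/usr/local/pgsql73/bin/pg_ctl}
DATADIR=${DATADIR:-/RAID0/postgres}

pg_rc() {
    case $1 in
        start|stop|status)
            # Setting RUN=echo previews the command instead of running it.
            ${RUN:-} "$PG_CTL" "$1" -D "$DATADIR"
            ;;
        *)
            echo "usage: $0 {start|stop|status}" >&2
            return 2
            ;;
    esac
}

# When installed as /etc/postgres/webdb, the script would end with:
#   pg_rc "$1"
```

Keeping the binary path and the data directory side by side in one file is
what lets each installation use a different PostgreSQL version.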

It seems to me to be far more intuitive to the end user, and to the
typical admin than your -C suggestion, it's certainly safer, and it
works fine now.

Cheers, Steve


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> On Sat, Apr 10, 2004 at 03:53:49PM -0400, pgsql@mohawksoft.com wrote:
>> > The whole idea of having multiple command-line switches to pick config
>> > and data separately bothers me.  ISTM this would mostly create great
>> new
>> > opportunities to shoot yourself in the foot (by accidentally picking
>> the
>> > wrong combination), without nearly enough benefit to outweigh the
>> risk.
>>
>> This is where I think we disagree. Very much so, in fact. I think having
>> something like:
>>
>> /etc/postgres/webdb.conf
>> In which there is a line:
>> datadir=/RAID0/postgres
>>
>> and
>>
>> /etc/postgres/testdb.conf
>> In which there is this line
>> datadir=/RAID1/postgres
>>
>> Allows for a very standardized, and IMHO, very self documenting
>> installation.
>
> But not as flexible as the existing alternative.

But your existing alternative is *NOT* going away.

>
> For instance, what if webdb is PostgreSQL 7.3 and testdb is PostgreSQL
> 7.4?  There is no way you can put that difference in a configuration
> file, so the user will still need to know which binary of postgresql
> to fire up.
>
> So, yes, let's have a standard directory for storing the configuration
> for all the PostgreSQL installations on the machine.
>
> /etc/postgres sounds fine.
>
> In /etc/postgres/webdb:
>
> #!/bin/sh
> datadir=/RAID0/postgres
> /usr/local/pgsql73/bin/postmaster -D $datadir
>
> and in /etc/postgres/testdb
>
> #!/bin/sh
> datadir=/RAID1/postgres
> /usr/local/pgsql742/bin/postmaster -D $datadir
>
> Much more flexible and explicitly self-documenting.

But also has multiple shell scripts and you can't share or have standard
configuration files like  pg_hba or pg_ident.

>
> For more flexibility still, do what I do and make the scripts standard
> rc.d style startup scripts.
>
> To walk a user through listing the supported installations is easy -
> 'ls /etc/postgres'. Starting and stopping one - '/etc/postgres/webdb
> start'
> or '/etc/postgres/webdb stop'. Checking system status and displaying the
> data directory '/etc/postgres/webdb status'.
>
> It seems to me to be far more intuitive to the end user, and to the
> typical admin than your -C suggestion, it's certainly safer, and it
> works fine now.

I don't see the "safer" argument. If we wanted "safer" we would code
PostgreSQL in Java or BASIC. What we want is efficiency.

Admittedly, my patch is not intended to make life any easier for users of
multiple installations/versions of PostgreSQL or, for that matter, any
different. No one is suggesting changing the default behavior of
PostgreSQL. All the people arguing against this patch will never even
notice that it is there.

For all the people who would like PostgreSQL to fit in a FHS system,
easily, they will probably use it. In fact, I would bet real money, that
if this functionality is incorporated into PostgreSQL, it will become the
de facto methodology for the various distributions.




Re: PostgreSQL configuration

From
Mark Kirkwood
Date:
pgsql@mohawksoft.com wrote:

>
>IMHO my patch can do this in a self
>documenting way, thus making it easier to do, i.e.
>
>postmaster -C /etc/postgres/fundb.conf
>postmaster -C /etc/postgres/testdb.conf
>
>I think that is far more intuitive than:
>
>postmaster -D /some/path/who/knows/where/fundb
>postmaster -D /another/path/i/don/t/know/testdb
>
>  
>

To be honest - to me, both these look about the same on the 
intuitiveness front :-)

I do not like lots of command line arguments so usually use :

export PGDATA=/var/pgdata/<version>
pg_ctl start

I realize that I cannot objectively argue that this is intuitively 
better...it is just what I prefer.

>It is frustrating. I think this is important, as I would not have written
>and maintained it otherwise, but by being a somewhat subjective feature I
>can't make any iron clad arguments for it. I can only say it makes
>administration easier for those who would like PostgreSQL administered
>this way. If the prevailing view is "we don't think so," then it doesn't
>get put in, but it doesn't make my arguments any less valid.
>
>  
>
I completely agree. We are discussing what we would prefer - which is a 
valid thing to do. Clearly if most people prefer most of what is in your 
patch, then it would be silly to ignore it!

So anyway, here is my vote on it :

i) the include - I like it
ii) the -C switch - could be persuaded (provided some safety is there - 
like mutually exclusive with -D or PGDATA)
iii) the pid file - don't like it
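
The safety condition in (ii) could be expressed as a small check like the
one below. This is a sketch of the idea only, not part of the patch; the
function name check_exclusive is hypothetical.

```sh
#!/bin/sh
# Illustration only: reject -C when a data directory is also given,
# either through -D or through the PGDATA environment variable.
check_exclusive() {
    config_file=$1   # value passed to -C, or empty
    data_dir=$2      # value passed to -D, or empty
    if [ -n "$config_file" ] && [ -n "${data_dir:-${PGDATA:-}}" ]; then
        echo "error: -C cannot be combined with -D or PGDATA" >&2
        return 1
    fi
    return 0
}

# Example:
#   check_exclusive /etc/postgres/fundb.conf "" || exit 1
```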


regards

Mark


Re: PostgreSQL configuration

From
Bruce Momjian
Date:
Mark Kirkwood wrote:
> It seems to me that the existing situation is actually correct :
> 
> The configuration is a property of the initialized database cluster, so 
> a logical place for it is in the root of said cluster.
> 
> It is *not* a property of the installed binary distribution (e.g 
> /usr/local/pgsql/etc) - as you may have *several*  database clusters 
> created using *this* binary distribution, each of which requiring a 
> different configuration.
> 
> Having said that, I am ok about the 'include' idea.

My idea was to put config files in /usr/local/pgsql/data/etc, not
pgsql/etc.

We don't put Unix configuration files in /; we put them in /etc.



Re: PostgreSQL configuration

From
Bruce Momjian
Date:
pgsql@mohawksoft.com wrote:
> > Obviously, we need to do something.  There are just too many people who
> > want improvement in this area.  The question is what changes to make.
> >
> > My personal opinion is that we move the config files in /data/etc, and
> > allow admins to move that directory somewhere else with symlinks.  If we
> > want to add #include capability too, that would help things.
> >
> 
> I wish I could impress on you the distaste the average admin has for
> symlinks. If you knew how much DBAs and sys-admins hated symlinks, you
> wouldn't think of them as a solution. To most, a symlink is used when the
> software has no other viable option. When an admin needs to use a symlink
> to configure software, they view this as a cop-out.

Let me tell you the compromise I thought of.  

First, we put the config files (postgresql.conf, pg_hba.conf,
pg_ident.conf) in data/etc by default.

Then, we could add an initdb option to put the config files in another
location.  If you choose that, the config files are put into that new
directory, and a symlink is created in /data/etc to point to that new
location.

That way, you can centralize all your config files under one central
directory, you can find and back them up easily, and the /data directory
contains a symlink pointing to the config directory so you don't need to
specify a separate config directory on the command line.
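
The resulting layout can be sketched by hand as below. The proposal is for
initdb to do this itself; the function here only illustrates the end state
for an existing cluster, and the name relocate_config is hypothetical.

```sh
#!/bin/sh
# Illustration only: move a cluster's etc directory to a central place
# and leave a symlink behind so the data directory still finds it.
relocate_config() {
    data_dir=$1      # e.g. /usr/local/pgsql/data
    config_dir=$2    # new location; must not exist yet
    [ -d "$data_dir/etc" ] && [ ! -e "$config_dir" ] || return 1
    mv "$data_dir/etc" "$config_dir" &&
    ln -s "$config_dir" "$data_dir/etc"
}

# Example (with the server stopped):
#   relocate_config /usr/local/pgsql/data /etc/postgres/main
```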



Re: PostgreSQL configuration

From
Mark Kirkwood
Date:
Bruce Momjian wrote:

> My idea was to put config files in /usr/local/pgsql/data/etc, not
>
>pgsql/etc.
>
>We don't put Unix configuration files in /; we put them in /etc.
>
>  
>
Sorry, I missed the 'data' pathname. However - I may be a bit slow - but 
I do not see how this will handle the situation where you have one 
installation of pgsql running several clusters. (I am not sure how 
common this situation is mind you)

regards

Mark


Re: PostgreSQL configuration

From
Bruce Momjian
Date:
Mark Kirkwood wrote:
> Bruce Momjian wrote:
> 
> > My idea was to put config files in /usr/local/pgsql/data/etc, not
> >
> >pgsql/etc.
> >
> >We don't put Unix configuration files in /; we put them in /etc.
> >
> >  
> >
> Sorry, I missed the 'data' pathname. However - I may be a bit slow - but 
> I do not see how this will handle the situation where you have one 
> installation of pgsql running several clusters. (I am not sure how 
> common this situation is mind you)

It is common.  Moving things to data/etc will make things clearer, and
see my later email on an initdb option to put /data/etc/ somewhere else
and put a symlink for /data/etc.



Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com wrote:
>
>>
>>IMHO my patch can do this in a self
>>documenting way, thus making it easier to do, i.e.
>>
>>postmaster -C /etc/postgres/fundb.conf
>>postmaster -C /etc/postgres/testdb.conf
>>
>>I think that is far more intuitive than:
>>
>>postmaster -D /some/path/who/knows/where/fundb
>>postmaster -D /another/path/i/don/t/know/testdb
>>
>>
>>
>
> To be honest - to me, both these look about the same on the
> intuitiveness front :-)

OK, I am yelling in a sound proof room. :-)

>
> I do not like lots of command line arguments so usually use :
>
> export PGDATA=/var/pgdata/<version>
> pg_ctl start
>
> I realize that I cannot objectively argue that this is intuitively
> better...it is just what I prefer.
>
>>It is frustrating. I think this is important, as I would not have written
>>and maintained it otherwise, but by being a somewhat subjective feature I
>>can't make any iron clad arguments for it. I can only say it makes
>>administration easier for those who would like PostgreSQL administered
>>this way. If the prevailing view is "we don't think so," then it doesn't
>>get put in, but it doesn't make my arguments any less valid.
>>
>>
>>
> I completely agree. We are discussing what we would prefer - which is a
> valid thing to do. Clearly if most people prefer most of what is in your
> patch, then it would be silly to ignore it!
>
> So anyway, here is my vote on it :
>
> i) the include - I like it
> ii) the -C switch - could be persuaded (provided some safety is there -
> like mutually exclusive with -D or PGDATA)
> iii) the pid file - don't like it

i) include, I don't care too much, I like it, but it isn't important to
me. (ironic, yes?)

ii) I think the -C switch *WITH* the -D switch is genuinely useful.
Consider this: you are testing two different database layouts and/or RAID
controllers. You could easily bounce back and forth between *identical*
configurations like this:

postmaster -C /etc/postgres/postgresql.conf -D /OLDRAID
Test performance on various clients.

postmaster -C /etc/postgres/postgresql.conf -D /NEWRAID
Test performance again with same clients.

In the above example, you don't need to configure the two systems separately.

iii) I don't like the PID file at all. Not one bit. But I had a few people
ask for it in the patch, and it works as advertised and expected. It isn't my
place to say how someone should use something. One of my customers wanted
it, so I provided them with it. That is the beauty of open source.





Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com wrote:
>> > Obviously, we need to do something.  There are just too many people
>> who
>> > want improvement in this area.  The question is what changes to make.
>> >
>> > My personal opinion is that we move the config files in /data/etc, and
>> > allow admins to move that directory somewhere else with symlinks.  If
>> we
>> > want to add #include capability too, that would help things.
>> >
>>
>> I wish I could impress on you the distaste the average admin has for
>> symlinks. If you knew how much DBAs and sys-admins hated symlinks, you
>> wouldn't think of them as a solution. To most, a symlink is used when
>> the
>> software has no other viable option. When an admin needs to use a
>> symlink
>> to configure software, they view this as a cop-out.
>
> Let me tell you the compromise I thought of.
>
> First, we put the config files (postgresql.conf, pg_hba.conf,
> pg_ident.conf) in data/etc by default.

What does that really give you?

>
> Then, we could add an initdb option to put the config files in another
> location.  If you choose that, the config files are put into that new
> directory, and a symlink is created in /data/etc to point to that new
> location.

Symlinks don't go with an scp. This is frustrating. Please take no offense
at this, but symlinks do not always act exactly like files. Most of the time
they do, but every now and then, depending on the application or utilities
used, symlinks get copied with invalid links or ignored altogether. IMHO,
the PostgreSQL team depends too much on symlinks as a band-aid for real
defects and issues.

I would prefer to be able to configure a system without symlinks.
Sysadmins and DBAs do not like symlinks. Any "solution" based on symlinks
will be used grudgingly.
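
The copy problem described above can be reproduced with a small sketch.
This is only an illustration under stated assumptions: the directory
layout (an "etc" directory next to "data", with data/postgresql.conf
as a relative symlink) and the function name copy_outcomes are
hypothetical, it assumes a POSIX system where symlinks can be created,
and it stands in for "some copy tool" in general rather than modelling
scp itself.

```python
import os
import shutil
import tempfile


def copy_outcomes() -> dict:
    """Copy a data directory two ways and report what happens to its symlink."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: the real config lives outside the data directory.
        etc = os.path.join(root, "etc")
        data = os.path.join(root, "data")
        os.makedirs(etc)
        os.makedirs(data)
        with open(os.path.join(etc, "postgresql.conf"), "w") as f:
            f.write("fsync = true\n")
        # data/postgresql.conf is a relative link to ../etc/postgresql.conf
        os.symlink(os.path.join(os.pardir, "etc", "postgresql.conf"),
                   os.path.join(data, "postgresql.conf"))

        # Behaviour 1: links are kept as links. Only data/ is copied, so the
        # relative target does not exist next to the copy.
        kept = os.path.join(root, "backup_links", "data")
        shutil.copytree(data, kept, symlinks=True)

        # Behaviour 2: links are followed and the target's contents are stored.
        followed = os.path.join(root, "backup_files", "data")
        shutil.copytree(data, followed, symlinks=False)

        kept_conf = os.path.join(kept, "postgresql.conf")
        followed_conf = os.path.join(followed, "postgresql.conf")
        return {
            "kept_is_link": os.path.islink(kept_conf),
            "kept_resolves": os.path.exists(kept_conf),
            "followed_is_link": os.path.islink(followed_conf),
            "followed_resolves": os.path.exists(followed_conf),
        }
```

In the first copy the link survives but points at nothing; in the second
the copy is usable but is no longer tied to the central file. Which of
the two a given tool does depends on the tool and its options.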

>
> That way, you can centralize all your config files under one central
> directory, you can find and back them up easily, and the /data directory
> contains a symlink pointing to the config directory so you don't need to
> specify a separate config directory on the command line.

I would like to ask you, why does there need to be a compromise?
(I am not opposed to compromise, but you are relying on symlinks again, and
this is a problem.)

(1) The code is written.
(2) The code is working.
(3) The code does not affect current default behavior.
(4) I am willing to change to fit any coding standards which may be an issue.

It makes no sense to me to write something new as a compromise, when I
already have something that works, is (obviously) already what I want, and
does not, in fact, change any default PostgreSQL behavior.

Take a look at the patch. I submitted it about a year or so ago, and it
was rejected in favor of a redesign Peter was going to do. Needless to say,
that was not done. This is such a *little* thing (the patch is only 760
lines) that I can't believe it is so difficult. I simply do not understand
the opposition to it, not then and not now. Could someone please tell me why
this is bad? I "get it" that people on this group don't want to do it this
way, but what is *wrong*, and by wrong I mean harmful to PostgreSQL,
about it?

No one who dislikes this functionality would ever be inconvenienced by
it. Those of us who want it would find it more convenient.

Could someone please tell me why this is such a fight? I've been
maintaining this patch for well over a year now, it spans two major
versions, and I have people downloading it from my site every month.


Re: PostgreSQL configuration

From
Mark Kirkwood
Date:
Bruce Momjian wrote:

>Mark Kirkwood wrote:
>  
>
>>Bruce Momjian wrote:
>>
>>    
>>
>>>My idea was to put config files in /usr/local/pgsql/data/etc, not
>>>
>>>pgsql/etc.
>>>
>>>We don't put Unix configuration files in /, we put them in /etc.
>>>
>>> 
>>>
>>>      
>>>
>>Sorry, I missed the 'data' pathname. However - I may be a bit slow - but 
>>I do not see how this will handle the situation where you have one 
>>installation of pgsql running several clusters. (I am not sure how 
>>common this situation is mind you)
>>    
>>
>
>It is common.  Moving things to data/etc will make things clearer, and
>see my later email on an initdb option to put /data/etc/ somewhere else
>and put a symlink for /data/etc.
>  
>
Hmmm, the current setup handles this situation sensibly and without the
need for symlinks. So this does not look like an improvement to me...

This *could* work without symlinks if you introduce a "name" for each
initialized cluster, and make this part of the config file name. This
would mean that you could use 'data/etc' and have many config files
therein, each of which would *unambiguously* point to a given cluster.

As a general point I share Tom's concern about breaking the association
between the initialized cluster and its configuration file - e.g: I start
"prod" with the configuration for "test" by mistake, and "test" has
fsync=false... and something pulls the power...
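
The "name per cluster" idea above can be sketched as a pairing check.
Everything here is hypothetical and not part of any existing PostgreSQL
behavior: the marker file CLUSTER_NAME inside the data directory, the
<name>.conf naming rule, and the function names are all assumptions
made for illustration.

```python
import os
import tempfile

MARKER = "CLUSTER_NAME"  # hypothetical marker file written when the cluster is created


class ClusterMismatch(Exception):
    """Raised when a config file is paired with the wrong data directory."""


def cluster_name(data_dir: str) -> str:
    """Read the name recorded inside the data directory."""
    with open(os.path.join(data_dir, MARKER)) as f:
        return f.read().strip()


def config_path_for(etc_dir: str, data_dir: str) -> str:
    """Derive the one config file that belongs to this cluster."""
    return os.path.join(etc_dir, cluster_name(data_dir) + ".conf")


def check_pairing(config_path: str, data_dir: str) -> None:
    """Refuse to start if the config file name does not match the cluster."""
    expected = cluster_name(data_dir)
    actual = os.path.splitext(os.path.basename(config_path))[0]
    if actual != expected:
        raise ClusterMismatch(
            f"config {actual!r} does not belong to cluster {expected!r}")
```

With a check like this, starting "prod" with test.conf would fail
before the server reads fsync=false, instead of after the power is
pulled.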
 

regards

Mark 




Re: PostgreSQL configuration

From
Bruce Momjian
Date:
The only other idea I can think of is to create a new pg_path.conf file.
It would have the same format as postgresql.conf, but contain
information about /data location, config file location, and perhaps
pg_xlog location.

The file would be created by special flags to initdb, and once created,
would have to be used instead of pgdata for postmaster startup.

 


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> The only other idea I can think of is to create a new pg_path.conf file.
> It would have the same format as postgresql.conf, but contain
> information about /data location, config file location, and perhaps
> pg_xlog location.
>
> The file would be created by special flags to initdb, and once created,
> would have to be used instead of pgdata for postmaster startup.

That seems a lot more risky, doesn't it? What is technically bad
about my patch? Why is it "bad"? Everyone is offering something different
from what I suggest. What is technically wrong with the patch? What can I
alter to correct any concerns?

I'm not very good at politics, and I sometimes tend to alienate people in
discussions, but I am simply unable to understand why the features I
suggest are not being considered "as is." I have been using them for a
while now, I find them very useful, and I have people downloading the
patch from my site on a regular basis. Yet I am unable to say "Here, can we
add this?" The response is "We don't like this for x, y, and z," but
reasons x, y, and z already exist in one form or another in the current
implementation.

(1) What tangible harm comes to postgresql.conf from these features?
(2) What problem (security, stability, safety, etc.) is created by these
features that doesn't already exist in some form?
(3) Isn't having this as an option "better" than making it normal for
people to mess around in the PGDATA directory?
(4) Isn't the open source and UNIX philosophy about providing capability,
not enforcing policy?









Re: PostgreSQL configuration

From
Robert Treat
Date:
On Sunday 11 April 2004 11:56, pgsql@mohawksoft.com wrote:
> > On Sat, Apr 10, 2004 at 03:53:49PM -0400, pgsql@mohawksoft.com wrote:
> For all the people who would like PostgreSQL to fit in a FHS system,
> easily, they will probably use it. In fact, I would bet real money, that
> if this functionality is incorporated into PostgreSQL, it will become the
> defacto methodology for the various distributions.
>

IIRC (and admittedly I am being too lazy to look it up here), doesn't the 
FHS require the pid file to be in a specific location (/tmp?)  ISTR that this 
became an issue last time around, since your patch didn't actually allow full 
FHS compliance (while admittedly allowing more compliance, but that's like 
being a little pregnant).  So this is the one thing I think is a potential 
sticking point... how do we prevent users from blowing up their databases by 
specifying multiple PID locations for the same DATA dir?  Anything that makes 
this easier to do is A Bad Thing (tm) because it can certainly lead to 
irrecoverable data corruption.  

One other thought relevant to this topic... one thing I have always wished for 
was some type of GUC (for lack of a better mechanism) that would tell me from 
inside the database what PGDATA path is currently being used to power the 
database.  I've certainly seen enough cases of people modifying the /wrong/ 
postgresql.conf on their systems to think that the ability to figure out 
which configuration files you are using inside the db would certainly be a 
bonus... and this would have also solved the original complaint of not 
knowing where the $PGDATA path was... connect to the database and query for 
it... 

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: PostgreSQL configuration

From
Bruce Momjian
Date:
pgsql@mohawksoft.com wrote:
> > The only other idea I can think of is to create a new pg_path.conf file.
> > It would have the same format as postgresql.conf, but contain
> > information about /data location, config file location, and perhaps
> > pg_xlog location.
> >
> > The file would be created by special flags to initdb, and once created,
> > would have to be used instead of pgdata for postmaster startup.
> 
> That seems like a lot more risky, doesn't it? What is technically bad
> about my patch? Why is it "bad?" Everyone is offering something different
> than what I suggest. What is technically wrong with the patch? What can I
> alter to correct any concerns?
> 
> I'm not a very good at politics, I sometimes tend to alianate people in
> discussions, but I am simply unable to understand why the features I
> suggest are not being considered "as is." I have been using them for a
> while now, I find them very useful, and I have people downloading the
> patch from my site on a regular basis. Yet I an unable to say "Here can we
> add this." The response is "We don't like this for x, y, and z," but
> reasons x, y, and z already exist in one form or another in the current
> implementation.
> 
> (1) What tangable harm comes to postgresql.conf from these features?
> (2) What problem (security, stabilitry, safety, etc.) is created by these
> features that doesn't already exist in some form already.
> (3) Isn't having this as an option "better" than making it normal for
> people to mess around in the PGDATA directory?
> (4) Isn't open source and UNIX phylosophy about providing capability not
> enforcing policy?

I think the major problem with your -C & -D idea is that you require the
administrator to link the config file and data directory every time he
starts the db, and that might be error-prone.

 


Re: PostgreSQL configuration

From
Stephan Szabo
Date:
On Mon, 12 Apr 2004, Bruce Momjian wrote:

> pgsql@mohawksoft.com wrote:
> > > The only other idea I can think of is to create a new pg_path.conf file.
> > > It would have the same format as postgresql.conf, but contain
> > > information about /data location, config file location, and perhaps
> > > pg_xlog location.
> > >
> > > The file would be created by special flags to initdb, and once created,
> > > would have to be used instead of pgdata for postmaster startup.
> >
> > That seems like a lot more risky, doesn't it? What is technically bad
> > about my patch? Why is it "bad?" Everyone is offering something different
> > than what I suggest. What is technically wrong with the patch? What can I
> > alter to correct any concerns?
> >
> > I'm not a very good at politics, I sometimes tend to alianate people in
> > discussions, but I am simply unable to understand why the features I
> > suggest are not being considered "as is." I have been using them for a
> > while now, I find them very useful, and I have people downloading the
> > patch from my site on a regular basis. Yet I an unable to say "Here can we
> > add this." The response is "We don't like this for x, y, and z," but
> > reasons x, y, and z already exist in one form or another in the current
> > implementation.
> >
> > (1) What tangable harm comes to postgresql.conf from these features?
> > (2) What problem (security, stabilitry, safety, etc.) is created by these
> > features that doesn't already exist in some form already.
> > (3) Isn't having this as an option "better" than making it normal for
> > people to mess around in the PGDATA directory?
> > (4) Isn't open source and UNIX phylosophy about providing capability not
> > enforcing policy?
>
> I think the major problem with your -C & -D idea is that you require the
> administrator to link the config file and data directory everytime you
> start the db, and that might be error-prone.

Well, AFAICS the patch doesn't require that actually, it merely allows the
separation. You can place the data directory in the configuration file
and only use -C, you can place the configuration in the standard place
under data and only use -D or you can specify both on the command line.

I think the real potential harm would be from any current or future
options where it'd be possible to have the system behave improperly when
started up with the wrong value relative to a particular data directory.
This would be especially bad if it was difficult or impossible to realize
that it had happened and might then actually destroy data. I'm reasonably
sure that such an option shouldn't be in a configuration file that admins
are expected to edit, though.



Re: PostgreSQL configuration

From
"Thomas Swan"
Date:
<quote who="Bruce Momjian">
> The only other idea I can think of is to create a new pg_path.conf file.
> It would have the same format as postgresql.conf, but contain
> information about /data location, config file location, and perhaps
> pg_xlog location.
>
> The file would be created by special flags to initdb, and once created,
> would have to be used instead of pgdata for postmaster startup.
>

Bruce,

I thought the idea was to *reduce* the number of config files and provide
a unified configuration file.  Ideally, the unified configuration file
could eliminate the need for environment variables altogether.

If I understand this correctly, the author was adding the ability to do
this, not remove the default behavior.

A single configuration point (which can be changed with a commandline
switch) with the ability to include would be an exceptionally versatile
asset for postgresql.  Maybe relocating PID would be a bad idea and
someone could clobber their database, but that could be addressed with
a LARGE WARNING in that config file where the option is available.

Outside of the unified config file argument, "configuration includes"
give postgresql the ability to have shared settings.  You could have a
shared pg_hba.conf and test all other manner of settings with a set of
config files (sort_mem, shared_buffers, etc.) that, say, include a
standard_pg_hba.conf to control access.

The single config file argument has the capacity to emulate the existing
default behavior.

# SINGLE DEFAULT CONFIG FILE
Include /var/lib/data/postgresql/postgresql.conf
Include /var/lib/data/postgresql/pg_hba.conf
Include /var/lib/data/postgresql/pg_ident.conf

or

#SINGLE DEFAULT CONFIG FILE
include options /var/lib/postgresql/data/postgresql.conf
include access /var/lib/postgresql/data/pg_hba.conf
include identity_map /var/lib/postgresql/data/pg_ident.conf
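
To make the include idea concrete, here is a minimal sketch of how a
loader might process it. It is not the patch's code. It assumes the
include = 'path' key form quoted elsewhere in this thread rather than
the two hypothetical syntaxes just above, it assumes relative paths are
resolved against the including file, and it handles only simple
key = value lines with # comments. The function name load_config is
made up for the example.

```python
import os
import tempfile


def load_config(path, _ancestors=frozenset()):
    """Parse key = value lines; include = 'file' splices in another file.

    Later lines override earlier ones, so a file can include shared
    settings first and then adjust individual values.
    """
    real = os.path.realpath(path)
    if real in _ancestors:
        raise ValueError(f"include cycle detected at {path}")
    ancestors = _ancestors | {real}

    settings = {}
    with open(path) as f:
        for lineno, raw in enumerate(f, start=1):
            # Simplification: '#' always starts a comment, even inside quotes.
            line = raw.split("#", 1)[0].strip()
            if not line:
                continue
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"{path}:{lineno}: expected key = value")
            key = key.strip()
            value = value.strip().strip("'")
            if key == "include":
                # Assumption: relative includes are relative to this file.
                target = os.path.join(os.path.dirname(path), value)
                settings.update(load_config(target, ancestors))
            else:
                settings[key] = value
    return settings
```

A test configuration could then consist of one include line for the
shared settings followed by the handful of values being varied, such as
sort_mem or shared_buffers.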



Re: PostgreSQL configuration

From
Bruce Momjian
Date:
Thomas Swan wrote:
> I thought the idea was to *reduce* the number of config files and provide
> a unified configuration file.  Ideally, the unified configuration file
> could eliminate the need for environment variables altogether.
> 
> If I understand this correctly, the author was adding the ability to do
> this, not remove the default behavior.
> 
> A single configuration point (which can be changed with a commandline
> switch) with the ability to include would be an exceptionally versatile
> asset for postgresql.  Maybe relocating PID would be a bad idea and
> someone could clobber their database, but that could be addressed with
> LARGE WARNING in that config file where the option is available.
> 
> Outside of the unified config file argument.   "Configuration includes"
> give postgresql the ability to have shared settings.  You could have a
> shared pg_hba.conf and test all other manner of settings with a set of
> config files (sort_mem, shared_buffers, etc.) that say include a
> standard_pg_hba.conf to control access.

I suggested a new pg_path configuration file because it would enable
centralized config only if it was used.  By adding /data location to
postgresql.conf, you have the postgresql.conf file acting sometimes via
PGDATA and sometimes as a central config file, and I thought that was
confusing.

 


Re: PostgreSQL configuration

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Obviously, we need to do something.  There are just too many people who
> want improvement in this area.  The question is what changes to make.

As far as I've seen in this thread, there's only *one* person arguing
for change, and even he isn't advocating changing the default behavior.
So why are you of the opinion that we need to make radical changes in
the default behavior?  Which is what the proposals you've suggested are.

I haven't seen anything that involves changing the default behavior that
would not make it materially harder to run multiple copies of PG, and
especially would make it materially harder to create a test installation
without needing root privileges to do the install.  Anything that pushes
config files into fixed places means you need root.

The whole discussion reminds me quite a bit of Tom Lockhart's patch to
specify WAL file location on the postmaster command line.  That one
unfortunately degenerated into a religious war :-( which it seems we are
coming perilously close to here as well.  But I think the issues are
very similar --- convenience of setup versus probability of accidentally
setting up the wrong thing.  The potential downside to the WAL location
business was a lot worse than what we face for config, but it's still a
real risk.  Mark Kirkwood pointed out the risk of starting a production
server with the fsync=off setting you use for a test database.  Another
example is starting server A with the pg_hba.conf settings you mean to
use with server B, and thereby allowing the wrong set of people access
to server A; in the worst case scenario that'd be a major security
breach.

My general feeling about it is that adding additional postmaster command
line switches is not the way to go, especially not when those switches
can specify things that might be subtly incompatible with other switch-
selected things.  That's why I don't like the -C versus -D business.
It's too easy to make a mistake if you are starting the postmaster
manually, and it's too hard to handle if you are starting the postmaster
from an init script (since generally users aren't supposed to edit init
scripts directly, no?).  There should be just *one* switch.  From a
pure functionality point of view it wouldn't matter much whether it was
-C or -D, as the system could be designed to find either from the other.
But we have a longstanding precedent that it is -D and you find the
config from that.  I don't think we should lightly cast aside backwards
compatibility just to reverse the convention.

I have not heard any argument so far that explains to me why it wouldn't
work fine to leave the postmaster switch set as-is (-D only), and expect
people who want centralized config to set up the config files inside
that data directory to be dummies that point to master config files
elsewhere.  You can do that today with symlinks, and for those who
dislike symlinks I'm willing to adopt the part of the patch that allows
"#include"-type functionality.  This approach keeps the config-to-data
association stored in the filesystem where it should be, rather than
relying on the DBA to remember to specify the correct -C and -D pair
every time he starts the postmaster.  It also allows many-to-one
relationships to work properly.  You can easily make multiple data
directories point to the same config files, if that is indeed what you
mean to do.  You can't make one config file point to multiple data
directories, so the other way requires both -C and -D on the command
line which is error-prone.

I find no merit in the argument about "I can't remember where the data
directory is".  If you can't remember that then how are you going to
remember where the config file is either?  The only way is to establish
a personal standard.  If you want to have a personal standard about
where the centralized config files are, fine --- you can even add
comments to them to remind you of which data directory(s) each one is
used with.  But I don't see that that's fundamentally superior to doing
things in the reverse way.
        regards, tom lane


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com wrote:
>> > The only other idea I can think of is to create a new pg_path.conf
>> file.
>> > It would have the same format as postgresql.conf, but contain
>> > information about /data location, config file location, and perhaps
>> > pg_xlog location.
>> >
>> > The file would be created by special flags to initdb, and once
>> created,
>> > would have to be used instead of pgdata for postmaster startup.
>>
>> That seems like a lot more risky, doesn't it? What is technically bad
>> about my patch? Why is it "bad?" Everyone is offering something
>> different
>> than what I suggest. What is technically wrong with the patch? What can
>> I
>> alter to correct any concerns?
>>
>> I'm not a very good at politics, I sometimes tend to alianate people in
>> discussions, but I am simply unable to understand why the features I
>> suggest are not being considered "as is." I have been using them for a
>> while now, I find them very useful, and I have people downloading the
>> patch from my site on a regular basis. Yet I an unable to say "Here can
>> we
>> add this." The response is "We don't like this for x, y, and z," but
>> reasons x, y, and z already exist in one form or another in the current
>> implementation.
>>
>> (1) What tangable harm comes to postgresql.conf from these features?
>> (2) What problem (security, stabilitry, safety, etc.) is created by
>> these
>> features that doesn't already exist in some form already.
>> (3) Isn't having this as an option "better" than making it normal for
>> people to mess around in the PGDATA directory?
>> (4) Isn't open source and UNIX phylosophy about providing capability not
>> enforcing policy?
>
> I think the major problem with your -C & -D idea is that you require the
> administrator to link the config file and data directory everytime you
> start the db, and that might be error-prone.
>

The patch does no such thing. This is a misunderstanding of the
description. (I don't even know where it is in this chain of emails)

The -C parameter sets the defaults which can be overridden by the command
line, which seems "logical," correct?

postmaster -C /etc/db/postgresql.conf

Can be sufficient to start PostgreSQL. However, since command line
arguments take precedence (as one would expect),

postmaster -C /etc/db/postgresql.conf -D /RAID1/test_cluster

Also works. PostgreSQL continues to use the defaults it currently does,
but the patch adds five extra configuration entries:

include = '/etc/postgres/debug.conf'
data_dir = '/vol01/postgres'
hba_conf = '/etc/postgres/pg_hba.conf'
ident_conf = '/etc/postgres/pg_ident.conf'
runtime_pidfile = '/var/run/postgresql.pid'

The order of precedence, from lowest to highest, is this:
PostgreSQL default, configuration file setting, and finally command line.

Lastly, do not confuse "runtime_pidfile" with the PID stored in $PGDATA.
It is separate, it is used ONLY for external administration utilities that
assume something like /var/run/foobar.pid
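
The layering described in this message can be sketched as follows. This
is an illustration, not the patch itself: the function name
resolve_settings is made up, the keys are the ones listed above, and
the fallback of pg_hba.conf and pg_ident.conf to the data directory
when they are not named is an assumption based on the existing default
behavior.

```python
import posixpath


def resolve_settings(config_file: dict, command_line: dict) -> dict:
    """Merge settings: built-in defaults, then config file, then command line."""
    # Layer 1: built-in defaults (nothing is known until a data dir is given).
    settings = {"data_dir": None, "hba_conf": None, "ident_conf": None}
    # Layer 2: values read from the file named by -C.
    settings.update(config_file)
    # Layer 3: command-line switches such as -D override the file.
    settings.update({k: v for k, v in command_line.items() if v is not None})

    if settings["data_dir"] is None:
        raise ValueError("no data directory given by config file or -D")
    # Assumption: files not named explicitly live in the data directory.
    if settings["hba_conf"] is None:
        settings["hba_conf"] = posixpath.join(settings["data_dir"], "pg_hba.conf")
    if settings["ident_conf"] is None:
        settings["ident_conf"] = posixpath.join(settings["data_dir"], "pg_ident.conf")
    return settings
```

Under this sketch, "-C /etc/db/postgresql.conf" alone takes data_dir
from the file, adding "-D /RAID1/test_cluster" replaces only data_dir,
and plain "-D" with no central file behaves as it does today.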




Re: PostgreSQL configuration

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Obviously, we need to do something.  There are just too many people who
> > want improvement in this area.  The question is what changes to make.
> 
> As far as I've seen in this thread, there's only *one* person arguing
> for change, and even he isn't advocating changing the default behavior.
> So why are you of the opinion that we need to make radical changes in
> the default behavior?  Which is what the proposals you've suggested are.
> 
> I haven't seen anything that involves changing the default behavior that
> would not make it materially harder to run multiple copies of PG, and
> especially would make it materially harder to create a test installation
> without needing root privileges to do the install.  Anything that pushes
> config files into fixed places means you need root.

I don't see any big reason to change our existing default, but we have
had a lot of requests/discussion on this in the past, so though there is
only one person proposing a patch now, we do have folks who want
improvement in this area.

My personal opinion is that we should move the config files from
pgsql/data to pgsql/data/etc.  Unix config files aren't put in /, they
are in /etc, so this seems logical.  I was never comfortable with having
editable files right next to files that shouldn't be touched.  This
makes backup of the config files easier, and allows for use of a symlink
for the directory for those who want them.  I assume some will argue
that the change isn't worth it.

Secondly, everyone seems to like the 'include' idea, and it gives
per-line control over file sharing.

 


Re: PostgreSQL configuration

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> My personal opinion is that we should move the config files from
> pgsql/data to pgsql/data/etc.  Unix config files aren't put in /, they
> are in /etc, so this seems logical.  I was never comfortable with having
> editable files right next to files that shouldn't be touched.

Perhaps we are arguing at cross-purposes.  Are you saying that the
postmaster should seek config files as, eg, $PGDATA/etc/postgresql.conf
instead of $PGDATA/postgresql.conf?  That would be all right with me.
I thought you were proposing to move them to /etc (absolute path),
which isn't all right ...

> Secondly, everyone seems to like the 'include' idea, and it gives
> per-line control over file sharing.

Yeah, I think include is non-controversial, the argument is about what
else (if anything) to change.
        regards, tom lane


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Obviously, we need to do something.  There are just too many people who
>> want improvement in this area.  The question is what changes to make.
[snip]
>
> The whole discussion reminds me quite a bit of Tom Lockhart's patch to
> specify WAL file location on the postmaster command line.  That one
> unfortunately degenerated into a religious war :-( which it seems we are
> coming perilously close to here as well.  But I think the issues are
> very similar --- convenience of setup versus probability of accidentally
> setting up the wrong thing.  The potential downside to the WAL location
> business was a lot worse than what we face for config, but it's still a
> real risk.  Mark Kirkwood pointed out the risk of starting a production
> server with the fsync=off setting you use for a test database.  Another
> example is starting server A with the pg_hba.conf settings you mean to
> use with server B, and thereby allowing the wrong set of people access
> to server A; in the worst case scenario that'd be a major security
> breach.

I am concerned about trying to "protect" users from themselves too
aggressively. A chainsaw that won't cut off a person's arm is probably not
a useful chainsaw. "Dangerous" tools often need to be dangerous to do
their job.

>
> My general feeling about it is that adding additional postmaster command
> line switches is not the way to go, especially not when those switches
> can specify things that might be subtly incompatible with other switch-
> selected things.  That's why I don't like the -C versus -D business.

I don't understand this position. There are settings in the configuration
file which can be overridden by the command line already. The problem
already exists.


> It's too easy to make a mistake if you are starting the postmaster
> manually, and it's too hard to handle if you are starting the postmaster
> from an init script (since generally users aren't supposed to edit init
> scripts directly, no?).  There should be just *one* switch.  From a
> pure functionality point of view it wouldn't matter much whether it was
> -C or -D, as the system could be designed to find either from the other.
> But we have a longstanding precedent that it is -D and you find the
> config from that.  I don't think we should lightly cast aside backwards
> compatibility just to reverse the convention.

I don't understand why you say there needs to be "one" switch. The
command line already overrides config settings. All I am arguing for is
one more command line switch and four or five GUC settings.

>
> I have not heard any argument so far that explains to me why it wouldn't
> work fine to leave the postmaster switch set as-is (-D only), and expect
> people who want centralized config to set up the config files inside
> that data directory to be dummies that point to master config files
> elsewhere.  You can do that today with symlinks, and for those who
> dislike symlinks I'm willing to adopt the part of the patch that allows
> "#include"-type functionality.  This approach keeps the config-to-data
> association stored in the filesystem where it should be, rather than
> relying on the DBA to remember to specify the correct -C and -D pair
> every time he starts the postmaster.

Ahh, I see the problem: -D is not required if you specify the data
directory in the config file.

postmaster -C /etc/db/postgresql.conf

is sufficient. However, if "-D" is specified it overrides the config file,
just like other parameters. Here are the GUC parameters I want to add:

include = '/etc/postgres/debug.conf'
data_dir = '/vol01/postgres'
hba_conf = '/etc/postgres/pg_hba.conf'
ident_conf = '/etc/postgres/pg_ident.conf'
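
As an illustration only (this is not the patch's code), the settings listed
above could be read with a small parser like the following Python sketch.
The function name parse_settings and the regular expression are assumptions
made for the example; the setting names come from the list above, and the
paths are placeholders.

```python
import re

# Matches lines of the form: name = 'value'
# (the quoting style used in the list above; an optional trailing comment is allowed)
_SETTING = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*'([^']*)'\s*(?:#.*)?$")


def parse_settings(text):
    """Return a dict of name -> value for every "name = 'value'" line in text."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and whole-line comments
        match = _SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


# Hypothetical file contents using the four proposed settings.
example = """
include = '/etc/postgres/debug.conf'
data_dir = '/vol01/postgres'
hba_conf = '/etc/postgres/pg_hba.conf'
ident_conf = '/etc/postgres/pg_ident.conf'
"""

locations = parse_settings(example)
```

With every location in one dictionary, a startup script would need only the
path of the config file to find everything else.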


> It also allows many-to-one
> relationships to work properly.  You can easily make multiple data
> directories point to the same config files, if that is indeed what you
> mean to do.  You can't make one config file point to multiple data
> directories, so the other way requires both -C and -D on the command
> line which is error-prone.

I'm not sure how this misconception became part of the debate. I did use
one example where you could have multiple databases with the same
configuration, but it is in no way the motivation for the patch.

>
> I find no merit in the argument about "I can't remember where the data
> directory is".  If you can't remember that then how are you going to
> remember where the config file is either?

This I don't agree with. I have been using this for a while and I wrote it
so I can set a standard. "/etc/postgres/postgresql.conf" is a nice
standard.

Yes, during development and testing, multiple databases are key, but for
most enterprise deployments, it is a boot time initialization script running
one database. The difficulty is that all systems are different: where the
storage is mounted, what rights are given, etc. The "-C" switch allows me, and
people like me, to define a standard that does not use symlinks, is
independent of the storage layout of the system, and is fairly self
documenting.


As I said before, if this functionality gets put into PostgreSQL, I bet that
most VARs and Linux distributions will adopt this as the de facto standard.
It makes configuration much more flexible.


Re: PostgreSQL configuration

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > My personal opinion is that we should move the config files from
> > pgsql/data to pgsql/data/etc.  Unix config files aren't put in /, they
> > are in /etc, so this seems logical.  I was never comfortable with having
> > editable files right next to files that shouldn't be touched.
> 
> Perhaps we are arguing at cross-purposes.  Are you saying that the
> postmaster should seek config files as, eg, $PGDATA/etc/postgresql.conf
> instead of $PGDATA/postgresql.conf?  That would be all right with me.
> I thought you were proposing to move them to /etc (absolute path),
> which isn't all right ...

I was always proposing $PGDATA/etc/postgresql.conf.  /etc would be
terrible, as you say.

One of my other ideas was to auto-create a symlink during initdb if
someone wants the config directory (or pg_xlog directory) in a different
location, but that is another issue.  This is the Lockhart issue that I
think we actually agreed to, but Thomas didn't want us to use symlinks,
hence the propagation of flags to many programs that we didn't like.  I
eventually had to back out the patch, and no one continued the process.

> > Secondly, everyone seems to like the 'include' idea, and it gives
> > per-line control over file sharing.
> 
> Yeah, I think include is non-controversial, the argument is about what
> else (if anything) to change.

Yea.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: PostgreSQL configuration

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Mon, 12 Apr 2004, Bruce Momjian wrote:
>> I think the major problem with your -C & -D idea is that you require the
>> administrator to link the config file and data directory everytime you
>> start the db, and that might be error-prone.

> Well, AFAICS the patch doesn't require that actually, it merely allows the
> separation.

Well, it doesn't *require* it, but if you actually *use* the patch in
the proposed way then you end up with the error-prone need to specify
the correct combination of -C and -D on the command line.  I think what
people are questioning is whether we can't find a variant solution that
avoids that risk.

The bottom line to me is that config versus data ought to be a one-to-
many relationship, at least if you accept the premise that shared config
is reasonable at all.  Putting a datadir spec inside the config file
makes it impossible to share config files across datadirs, and so that
seems to conflict with the argument (which is being made in support of
this very same patch) that sharable config is good.  On the other hand,
if you make data point to config then you have a very natural way to
manage the one-to-many relationship.

Separate -C and -D would make sense if it were a many-to-many
relationship (ie, you could sensibly use many different configs with the
same data dir), but the case for multiple configs with one data dir
seems pretty weak to me, and outweighed by the risk factors.
        regards, tom lane


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>> On Mon, 12 Apr 2004, Bruce Momjian wrote:
>>> I think the major problem with your -C & -D idea is that you require
>>> the
>>> administrator to link the config file and data directory everytime you
>>> start the db, and that might be error-prone.
>
>> Well, AFAICS the patch doesn't require that actually, it merely allows
>> the
>> separation.
>
> Well, it doesn't *require* it, but if you actually *use* the patch in
> the proposed way then you end up with the error-prone need to specify
> the correct combination of -C and -D on the command line.  I think what
> people are questioning is whether we can't find a variant solution that
> avoids that risk.

This is completely wrong with regards to the patch. The patch "allows"
"-D" on the command line, just like you can override the socket port,
number of buffers, and other options, but the intention is that you do NOT
use the "-D" option.

>
> The bottom line to me is that config versus data ought to be a one-to-
> many relationship, at least if you accept the premise that shared config
> is reasonable at all.  Putting a datadir spec inside the config file
> makes it impossible to share config files across datadirs, and so that
> seems to conflict with the argument (which is being made in support of
> this very same patch) that sharable config is good.  On the other hand,
> if you make data point to config then you have a very natural way to
> manage the one-to-many relationship.
>
> Separate -C and -D would make sense if it were a many-to-many
> relationship (ie, you could sensibly use many different configs with the
> same data dir), but the case for multiple configs with one data dir
> seems pretty weak to me, and outweighed by the risk factors.

I hear "risk" but what risk?



Re: PostgreSQL configuration

From
Bruce Momjian
Date:
pgsql@mohawksoft.com wrote:
> > The bottom line to me is that config versus data ought to be a one-to-
> > many relationship, at least if you accept the premise that shared config
> > is reasonable at all.  Putting a datadir spec inside the config file
> > makes it impossible to share config files across datadirs, and so that
> > seems to conflict with the argument (which is being made in support of
> > this very same patch) that sharable config is good.  On the other hand,
> > if you make data point to config then you have a very natural way to
> > manage the one-to-many relationship.
> >
> > Separate -C and -D would make sense if it were a many-to-many
> > relationship (ie, you could sensibly use many different configs with the
> > same data dir), but the case for multiple configs with one data dir
> > seems pretty weak to me, and outweighed by the risk factors.
> 
> I hear "risk" but what risk?

OK, you look at your postgresql.conf file, and it says the data is in
/var/data, but the postgresql.conf file was found via PGDATA, so it is
ignored, and the directory is /var/local/pgsql.  That seems confusing
because someone looking at the file sees the wrong information.

For me, having a config file that both "is found" with ignored values,
and another mode where the config file points to everything seems
strange.  Does any other OS project do this?

What if someone does -C /var/data/postgresql.conf, and postgresql.conf
says to use /usr/local/data for data, what do we do?



Re: PostgreSQL configuration

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> pgsql@mohawksoft.com wrote:
> What if someone does -C /var/data/postgresql.conf, and postgresql.conf
> say to use /usr/local/data for data, what do we do?

Well, the patch says that the command line switch wins, which is
consistent with what we do for other command line switches (they all
override the equivalent postgresql.conf entries).  This does seem a
bit at variance with the stated goal of making the configuration more
clearly documented, though :-(.  If you actually use the capability then
your config file will be lying to you about where things are.

It's worth pointing out in this connection that for the most part
I think people are moving *away* from using command line switches;
it's better to set the value in postgresql.conf, both for documentation
reasons and because that way you have some chance of changing the value
via config file update and SIGHUP.  The only way to change a value on
the command line is to restart the postmaster.  Plus, if you're using a
distribution-supplied init script to start the postmaster, it's hard to
get any switches in without hacking the script anyway.

Most of these objections also apply to values obtained from environment
variables (the exception is that postgresql.conf can override
environment variables).

So all in all I feel that we don't want to encourage more use of command
line switches or environment variables to configure the postmaster.
        regards, tom lane


Re: PostgreSQL configuration

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> My personal opinion is that we should move the config files from
> pgsql/data to pgsql/data/etc.  Unix config files aren't put in /,
> they are in /etc, so this seems logical.  I was never comfortable
> with having editable files right next to files that shouldn't be
> touched.  This makes backup of the config files easier, and allows
> for use of a symlink for the directory for those who want them.  I
> assume some will argue that the change isn't worth it.

I would say that moving the configuration files even deeper into the 
data directory makes it all the more likely for people to not find them 
or be inclined to edit or delete other files nearby ("which of these 
log files can I delete"?).

As much as I would like to see a solution that allows us to move the 
configuration files out of the data directory, I find some of the 
tendency in this thread to be rather ludicrous: trying to improve the 
administration facility of the system by adding half a dozen options to 
move things all over the place and half a dozen additional rules about
how these options interact when conflicting values are given.  I don't 
see how that would help the end goal.



Re: PostgreSQL configuration

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > My personal opinion is that we should move the config files from
> > pgsql/data to pgsql/data/etc.  Unix config files aren't put in /,
> > they are in /etc, so this seems logical.  I was never comfortable
> > with having editable files right next to files that shouldn't be
> > touched.  This makes backup of the config files easier, and allows
> > for use of a symlink for the directory for those who want them.  I
> > assume some will argue that the change isn't worth it.
> 
> I would say that moving the configuration files even deeper into the 
> data directory makes it all the more likely for people to not find them 
> or be inclined to edit or delete other files nearby ("which of these 
> log files can I delete"?).

My idea was that we put the config files in /data/etc, and folks are
less likely to look at the top-level directory for things to muck with.
They can look in data/etc and know exactly which files they should be
touching.  Right now they see:

    PG_VERSION        pg_hba.conf        postmaster.opts
    base/             pg_ident.conf      postmaster.pid
    global/           pg_xlog/
    pg_clog/          postgresql.conf

and it isn't clear which files to touch.  After the reorganization it
would be:

    PG_VERSION              global/                 postmaster.opts
    base/                   pg_clog/                postmaster.pid
    etc/                    pg_xlog/

and /etc would be:

    pg_hba.conf             pg_ident.conf           postgresql.conf

which is much cleaner, I think, no?

It also makes backup of the config files easier, and you can symlink the
directory somewhere else if you want.



Re: PostgreSQL configuration

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> ... it isn't clear which files to touch.  After the reorganization it
> would be:

>     PG_VERSION              global/                 postmaster.opts
>     base/                   pg_clog/                postmaster.pid
>     etc/                    pg_xlog/

> and /etc would be:

>     pg_hba.conf             pg_ident.conf           postgresql.conf

> which is much cleaner, I think, no?

I think if you spelled the subdir name "config" rather than "etc",
it would be more obvious what's what.

A further possibility is to move the runtime-changeable files
(postmaster.pid and postmaster.opts) into still another subdirectory,
but I'm not really in favor of that.  I think there might be some
possibilities for cross-version confusion if we move the .pid interlock
file.
        regards, tom lane


Re: PostgreSQL configuration

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > ... it isn't clear which files to touch.  After the reorganization it
> > would be:
> 
> >     PG_VERSION              global/                 postmaster.opts
> >     base/                   pg_clog/                postmaster.pid
> >     etc/                    pg_xlog/
> 
> > and /etc would be:
> 
> >     pg_hba.conf             pg_ident.conf           postgresql.conf
> 
> > which is much cleaner, I think, no?
> 
> I think if you spelled the subdir name "config" rather than "etc",
> it would be more obvious what's what.

OK.

> A further possibility is to move the runtime-changeable files
> (postmaster.pid and postmaster.opts) into still another subdirectory,
> but I'm not really in favor of that.  I think there might be some
> possibilities for cross-version confusion if we move the .pid interlock
> file.

Agreed.  That is too fancy.



Re: PostgreSQL configuration

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
>> Well, it doesn't *require* it, but if you actually *use* the patch in
>> the proposed way then you end up with the error-prone need to specify
>> the correct combination of -C and -D on the command line.  I think what
>> people are questioning is whether we can't find a variant solution that
>> avoids that risk.

> This is completely wrong with regards to the patch. The patch "allows"
> "-D" on the command line, just like you can override the socket port,
> number of buffers, and other options, but the intention is that you do NOT
> use the "-D" option.

Well, yeah, if you are considering only the single-database case (or
even the separate-config-for-every-database case) then you could put
"datadir = foo" in the config file and not use -D.  The complaints are
basically coming from the fact that this doesn't seem to scale up to
more complex cases.  To make use of shared config files you'd have
to start the postmaster with both -C and -D, and I for one think that's
risky.  Plus it negates the claimed documentation benefit, since the
filesystem contains no indication (or a wrong one) of which data dirs
use the config file.

If we're going to tackle this problem then I'd like to see a solution
that works conveniently in the general case of N config files each being
used by multiple databases.  If we don't solve the general case then
we'll just have to revisit the problem again sometime soon ... and one
of the things we avoid when possible is API thrashing.  If we have to
break DBAs' established habits to improve things, then so be it, but
let's not do so only to do it over again in the next release.

>> Separate -C and -D would make sense if it were a many-to-many
>> relationship (ie, you could sensibly use many different configs with the
>> same data dir), but the case for multiple configs with one data dir
>> seems pretty weak to me, and outweighed by the risk factors.

> I hear "risk" but what risk?

Two specific risks were pointed out already: starting a production
server with fsync=off risks data loss, and starting it with the wrong
pg_hba.conf risks security breaches (eg, letting the developer weenies
into the payroll database ;-)).  But those same settings would very
likely be in use "next door" for a development database.  With separate
config and data it's real easy to foresee a DBA making the wrong
association, if there's nothing in the filesystem to strongly tie a
data directory to the config it should be used with.  I think the
feature needs to be designed to minimize that risk.
        regards, tom lane


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com wrote:
>> > The bottom line to me is that config versus data ought to be a one-to-
>> > many relationship, at least if you accept the premise that shared
>> config
>> > is reasonable at all.  Putting a datadir spec inside the config file
>> > makes it impossible to share config files across datadirs, and so that
>> > seems to conflict with the argument (which is being made in support of
>> > this very same patch) that sharable config is good.  On the other
>> hand,
>> > if you make data point to config then you have a very natural way to
>> > manage the one-to-many relationship.
>> >
>> > Separate -C and -D would make sense if it were a many-to-many
>> > relationship (ie, you could sensibly use many different configs with
>> the
>> > same data dir), but the case for multiple configs with one data dir
>> > seems pretty weak to me, and outweighed by the risk factors.
>>
>> I hear "risk" but what risk?
>
> OK, you look at your postgresql.conf file, and it says the data is in
> /var/data, but the postgresql.conf file was found via PGDATA, so it is
> ignored, and the directory is /var/local/pgsql.  That seems confusing
> because someone looking at the file sees the wrong information.

Given enough time and tinkering, anyone can screw up an installation of
anything.
>
> For me, having a config file that both "is found" with ignored values,
> and another mode where the config file points to everything seems
> strange.  Does any other OS project do this?

Almost all of the open source services allow you to override the default
settings in the configuration file with a command line option. Anyone can
make a case for almost any system which seems confusing. I think we all
agree that all configuration should be done in a configuration file,
however we all recognize that, sometimes, an easy to use command line
option to override the configuration settings is helpful for many reasons.


>
> What if someone does -C /var/data/postgresql.conf, and postgresql.conf
> say to use /usr/local/data for data, what do we do?

The command line is always the final authority, followed by the configuration
file, followed by the environment, followed by any hard-coded defaults.
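
The precedence order described here can be sketched in a few lines of
Python. This is an illustration under assumptions, not PostgreSQL code: the
function name resolve and the layer labels are invented for the example, and
each layer is simply a dictionary of settings that were explicitly given
there.

```python
def resolve(name, cmdline, config_file, environment, defaults):
    """Return (value, source) from the highest-priority layer that defines name.

    Priority, highest first: command line, config file, environment,
    hard-coded defaults.
    """
    layers = [
        ("command line", cmdline),
        ("config file", config_file),
        ("environment", environment),
        ("hard-coded default", defaults),
    ]
    for source, layer in layers:
        if name in layer:
            return layer[name], source
    raise KeyError(name)


# The conflict case raised in the thread: the config file names one data
# directory, the command line names another, and the command line wins.
value, source = resolve(
    "data_dir",
    cmdline={"data_dir": "/var/local/pgsql"},
    config_file={"data_dir": "/var/data"},
    environment={},
    defaults={},
)
```

Returning the source alongside the value makes it possible to log where a
setting came from, which goes some way toward the concern that the config
file may not show the value actually in use.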

Sure, you can come up with problems in every system, but that's easy. You
guys are very fond of symlinks, but do you know how problematic they are?
Their behavior is very much dependent on the application and the options
used. I can't tell you how often I've seen "unexpected" behavior with
symlinks. Either the backup system just backs up the link when you think
it should copy the data, or it copies the data when it should copy just the
link.


Re: PostgreSQL configuration

From
Tom Lane
Date:
I just had a thought about this: seems like a big part of the objection
is the risk of specifying -C and -D that don't go together.  Well, what
if they were the same switch?  Consider the following simplification of
the proposed patch:

1. Postmaster has just one switch, '-D datadir' with fallback to
environmental variable PGDATA, same as it ever was.

2. The files that must be found in this directory are the configuration
files, namely postgresql.conf, pg_hba.conf, pg_ident.conf.  (And any
files they include are taken as relative to this directory if not
specified with absolute path.  We'll still add the #include facility
to postgresql.conf; the others have it already IIRC.)

3. postgresql.conf can contain a new changeable-only-at-startup
configuration setting which we need to think of a good name for.
("datadir" seems confusing to me in this context, though maybe it would
do; anyway I haven't got a better idea yet.)  All the non-configuration
files are located under that directory.  Of course it defaults to being
the -D directory if not specified in postgresql.conf.

If we do things this way, we have the following properties:

* Default behavior is same as it ever was, in particular there is no
difficulty in making a test installation in a nontypical place.

* Config files can easily be separated from data and can be backed up
separately (no need for the etc/ or config/ subdirectory Bruce
suggested).

* It is not directly possible to use the same config with multiple
databases.  However one can easily imagine pointing the postmaster
to a config file that contains only a "datadir = " spec and a
#include of a sharable config file.  (I have to confess not having
thought about doing that in connection with the original patch
proposal.)

* If you want to think of this as config-centric, you can; if you want
to think of it as data-centric, you can do that too.  It's agnostic.

A typical setup for sharable config files would look like this:
you make directories named say "/etc/postgresql/postmasterN" which
will be the -D targets for each of your postmasters.  These contain
postgresql.conf files that contain "datadir = someplace" and
"include ../sharedconfigfile" and nothing else.  Shared config files
live in /etc/postgresql, per-database ones in its subdirectories.
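
To make the shared-config layout above concrete, here is a rough Python
sketch of a loader that starts from the -D directory and follows include
lines. It is only a sketch under assumptions: the function name load_config
is invented, the "include path" and "name = value" syntax follows the wording
of this message rather than any committed file format, and relative include
paths are resolved against the -D directory as item 2 proposes.

```python
import os


def load_config(config_dir, filename="postgresql.conf", _chain=()):
    """Read settings from config_dir/filename, following "include" lines.

    Relative paths (including the targets of include lines) are taken as
    relative to config_dir; absolute paths are used as given. Later
    settings override earlier ones.
    """
    # os.path.join returns filename unchanged when it is already absolute.
    path = os.path.normpath(os.path.join(config_dir, filename))
    if path in _chain:
        raise ValueError("include loop involving " + path)
    settings = {}
    with open(path) as handle:
        for raw in handle:
            line = raw.split("#", 1)[0].strip()  # drop comments
            if not line:
                continue
            if line.startswith("include "):
                target = line[len("include "):].strip()
                settings.update(load_config(config_dir, target, _chain + (path,)))
            elif "=" in line:
                name, value = line.split("=", 1)
                settings[name.strip()] = value.strip()
    return settings
```

With this layout, /etc/postgresql/postmaster1/postgresql.conf would hold only
a datadir line and "include ../sharedconfigfile", and calling
load_config("/etc/postgresql/postmaster1") would merge the two files.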

This notion is really almost the same as the patch-as-submitted, but
there are a couple of key differences:

* I did not like the patch's confusion over -C-specifies-config-directory
versus -C-specifies-config-file.  One big reason not to like it is that
in the latter case it's not very clear what is the origin directory for
#include references in the config files.  I think we would do fine with
less confusion if we adopt just the specify-a-config-directory behavior.
I don't see a use-case that justifies the config-file option nor the
separate postgresql.conf entries for pg_hba.conf and pg_ident.conf
(which would have to be extended any time we add another config file).
Surely requiring a separate config subdirectory for each postmaster
isn't an objectionable amount of overhead.

* There isn't a way to get things wrong on the command line.  Well,
actually there is: if the "datadir" parameter works the same as all
other GUC parameters then one could override it on the command line
with "-c datadir=whatever".  Depending on how strongly you feel about
that being a Bad Idea, we could imagine putting in a special prohibition
against it.  But at least it wouldn't be the designed-in way of working
with shared config files.

* Barring the "-c datadir" scenario, there is a strong link from a
config subdirectory to its data area.  A simple addition to the proposal
would be to add a back-link: on first start, the postmaster would
automatically make a file in the data directory that contains the
absolute path of the config dir; on subsequent starts, check it still
matches.  This provides a simple interlock against accidentally starting
a postmaster with the wrong config files for the data area.  (You could
break the interlock at need by deleting the back-link file.)  In
particular, if you'd not bothered to remove the config files placed in
the data area by initdb, something like this is useful to ensure you
don't accidentally start the postmaster with -D pointing straight at
the data area where previously you'd pointed to a config directory.
It also provides documentation in both places about where the other
place is.
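
The back-link interlock in the last point could look roughly like the
following Python sketch. The file name config_dir.backlink and the function
name check_backlink are hypothetical; the message does not name the file or
say how a mismatch should be reported.

```python
import os

BACKLINK_NAME = "config_dir.backlink"  # hypothetical file name


def check_backlink(data_dir, config_dir):
    """Record config_dir inside data_dir on first start; verify it afterwards.

    Returns the absolute config directory path on success and raises
    RuntimeError if data_dir was previously started from a different
    config directory.
    """
    wanted = os.path.abspath(config_dir)
    backlink = os.path.join(data_dir, BACKLINK_NAME)
    try:
        with open(backlink) as handle:
            recorded = handle.read().strip()
    except FileNotFoundError:
        # First start: create the back-link.
        with open(backlink, "w") as handle:
            handle.write(wanted + "\n")
        return wanted
    if recorded != wanted:
        raise RuntimeError(
            "data directory %s belongs to config directory %s, not %s"
            % (data_dir, recorded, wanted)
        )
    return recorded
```

Deleting the back-link file resets the check, which matches the escape hatch
described above.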


Something that remains unclear to me is what to do with the proposed
patch to support a secondary PID file.  This strikes me as a solution
in search of a problem --- it was claimed that this makes it easier to
manipulate the postmaster with "standard Unix tools", but what tools are
those and do we really want people frobbing the postmaster with them?
Again I'm not sold on the use-case for the feature.
        regards, tom lane


Re: PostgreSQL configuration

From
pgsql@mohawksoft.com
Date:
> I just had a thought about this: seems like a big part of the objection
> is the risk of specifying -C and -D that don't go together.  Well, what
> if they were the same switch?  Consider the following simplification of
> the proposed patch:

I was really excited about this idea, then I thought about it, and while
it would answer some of the issues I mean to address, I find myself a
little disappointed that some of the functionality I wanted, i.e. multiple
databases with the same configuration, was not possible. However,
compromise is good.

>
> 1. Postmaster has just one switch, '-D datadir' with fallback to
> environmental variable PGDATA, same as it ever was.

I like this, I think, ... but it removes the possibility to run the same
configuration with the same database. This scenario is one of my "best
case" reasons why I think my patch is good, but, I think I can get 99% of
what I'm looking for with my modification outlined at the bottom of this
post.


>
> 2. The files that must be found in this directory are the configuration
> files, namely postgresql.conf, pg_hba.conf, pg_ident.conf.  (And any
> files they include are taken as relative to this directory if not
> specified with absolute path.  We'll still add the #include facility
> to postgresql.conf; the others have it already IIRC.)

My patch *already* has this functionality if it is a directory. I agree
with this, it was suggested (maybe even by you) over a year ago.


[snip -- good stuff]

Tom, this is great! I think we are almost there and I really appreciate
your flexibility in view of my obstinance. :-)

I like what you suggest. While I don't get the -D and -C functionality
(which I don't use, but thought was cool), I think I would like to add one
thing:

postmaster -D /etc/postgres/postgresql.conf

If the path specified is a config file, then "data_dir" MUST address a
valid PostgreSQL data directory.

So, here is (how I see) the logical breakdown of the feature:

"postmaster -D /somedir/data" works as it always has, it points to the
data directory in which all the various config files live. If no
"data_dir" is specified, then "/somedir/data" is assumed to be where base,
pg_xlog, pg_clog, etc. reside.

If, however, "data_dir" is specified, the data oriented elements like
"global," "base," "pg_clog," and "pg_xlog" are contained within that
directory. (In the future, we may be able to specify these locations
separately)

If "postmaster -D /etc/postgresql.conf" points to a file, then that file
MUST specify the location of "data_dir," "hba_conf," and "ident_conf."
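
The breakdown above amounts to a dispatch on whether the -D argument is a
directory or a file. A minimal Python sketch of that logic follows, assuming
the settings have already been parsed into a dictionary; the function name
plan_startup and the returned keys are invented for illustration and are not
taken from the patch.

```python
import os

# Settings that must be present when -D names a config file.
REQUIRED_IN_FILE_MODE = ("data_dir", "hba_conf", "ident_conf")


def plan_startup(d_target, settings):
    """Work out file locations from the -D argument and the parsed settings."""
    if os.path.isdir(d_target):
        # Directory mode: the config files live in the directory itself, and
        # the data lives there too unless data_dir says otherwise.
        return {
            "config_file": os.path.join(d_target, "postgresql.conf"),
            "data_dir": settings.get("data_dir", d_target),
            "hba_conf": os.path.join(d_target, "pg_hba.conf"),
            "ident_conf": os.path.join(d_target, "pg_ident.conf"),
        }
    if os.path.isfile(d_target):
        # File mode: nothing can be inferred, so every location is required.
        missing = [name for name in REQUIRED_IN_FILE_MODE if name not in settings]
        if missing:
            raise ValueError("config file must set: " + ", ".join(missing))
        return {
            "config_file": d_target,
            "data_dir": settings["data_dir"],
            "hba_conf": settings["hba_conf"],
            "ident_conf": settings["ident_conf"],
        }
    raise FileNotFoundError(d_target)
```

A real implementation would have to read the file before it could know the
settings; the sketch takes them as an argument to keep the two modes easy to
compare.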

Like I said, while I don't get the convenience of combining "-D ..." and
"-C ..." I do get most of what I'm asking for.

If this works for all you guys, I'll submit a patch Wednesday.


Re: PostgreSQL configuration

From
Thomas Swan
Date:
Bruce Momjian wrote:

>Thomas Swan wrote:
>  
>
>>I thought the idea was to *reduce* the number of config files and provide
>>a unified configuration file.  Ideally, the unified configuration file
>>could eliminate the need for environment variables altogether.
>>
>>If I understand this correctly, the author was adding the ability to do
>>this, not remove the default behavior.
>>
>>A single configuration point (which can be changed with a commandline
>>switch) with the ability to include would be an exceptionally versatile
>>asset for postgresql.  Maybe relocating PID would be a bad idea and
>>someone could clobber their database, but that could be addressed with
>>LARGE WARNING in that config file where the option is available.
>>
>>Outside of the unified config file argument.   "Configuration includes"
>>give postgresql the ability to have shared settings.  You could have a
>>shared pg_hba.conf and test all other manner of settings with a set of
>>config files (sort_mem, shared_buffers, etc.) that say include a
>>standard_pg_hba.conf to control access.
>>    
>>
>
>I suggested a new pg_path configuration file because it would enable
>centralized config only if it was used.  By adding /data location to
>postgresql.conf, you have the postgresql.conf file acting sometimes via
>PGDATA and sometimes as a central config file, and I thought that was
>confusing.
>
>  
>
Understandably.    I think that using a config file that can specify all
of this would be a big win.   Imagine a simple start of the postmaster
with only a pointer to a config file, and not having to rely on special
environment variables or other command line switches.



Re: PostgreSQL configuration

From
Thomas Swan
Date:
pgsql@mohawksoft.com wrote:

>>I just had a thought about this: seems like a big part of the objection
>>is the risk of specifying -C and -D that don't go together.  Well, what
>>if they were the same switch?  Consider the following simplification of
>>the proposed patch:
>>    
>>
>
>I was really excited about this idea, then I thought about it, and while
>it would answer some of the issues I mean to address, I find myself a
>little disappointed that some of the functionality I wanted, i.e. multiple
>databases with the same configuration, was not possible. However,
>compromise is good.
>
>  
>
>>1. Postmaster has just one switch, '-D datadir' with fallback to
>>environmental variable PGDATA, same as it ever was.
>>    
>>
>
>I like this, I think, ... but it removes the posibility to run the same
>configuration with the same database. This scenario is one of my "best
>case" reasons why I think my patch is good, but, I think I can get 99% of
>what I'm looking for with my modification outlined at the bottom of this
>post.
>
>
>  
>
>>2. The files that must be found in this directory are the configuration
>>files, namely postgresql.conf, pg_hba.conf, pg_ident.conf.  (And any
>>files they include are taken as relative to this directory if not
>>specified with absolute path.  We'll still add the #include facility
>>to postgresql.conf; the others have it already IIRC.)
>>    
>>
>
>My patch *already* has this functionality if it is a directory. I agree
>with this, it was suggested (maybe even by you) over a year ago.
>
>
>[snip -- good stuff]
>
>Tom, this is great! I think we are almost there and I really appreciate
>your flexibility in view of my obstinance. :-)
>
>I like what you suggest. While I don't get the -D and -C functionality
>(which I don't use, but thought was cool), I think I would like to add one
>thing:
>
>postmaster -D /etc/postgres/postgresql.conf
>
>If the path specified is a config file, then "data_dir" MUST address a
>valid PostgreSQL data directory.
>  
>

This is exceptionally confusing.  Why not do a test and say that you
cannot specify a -C and a -D option at the same time?  This would still
assure backwards compatibility and safeguard future installations.  If
the -C option is specified the datadir must be present in the config
file.

If someone wants to specify the config file from a startup option, then
they must follow the new rules.  And, as this is new functionality, the
rules can be set now.

Adding one command line switch with the future possibility of
eliminating the others is a good tradeoff, IMHO.
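
The rule above can be sketched as a small startup check. This is only an illustration, not code from the patch: the function name, the dictionary standing in for the parsed config file, and the "datadir" key are assumptions made for the example.

```python
def resolve_data_dir(config_file=None, data_dir=None, config_settings=None):
    """Pick the data directory under the proposed "-C or -D, never both" rule.

    config_settings is a stand-in for the parsed contents of the file given
    with -C; the "datadir" key is an illustrative name.
    """
    if config_file is not None and data_dir is not None:
        raise ValueError("-C and -D cannot be specified at the same time")
    if data_dir is not None:
        # Old-style startup: the directory given with -D is used as is.
        return data_dir
    if config_file is not None:
        settings = config_settings or {}
        if "datadir" not in settings:
            raise ValueError(f"{config_file} must specify datadir when -C is used")
        return settings["datadir"]
    # Assumption for this sketch: one of the two switches is required.
    raise ValueError("either -C or -D is required")
```

Keeping the two switches mutually exclusive means an old-style "-D" startup behaves exactly as before, while a "-C" startup is governed entirely by the new rules.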

>So, here is (how I see) the logical breakdown of the feature:
>
>"postmaster -D /somedir/data" works as it always has: it points to the
>data directory in which all the various config files live. If no
>"data_dir" is specified, then "/somedir/data" is assumed to be where base,
>pg_xlog, pg_clog, etc. reside.
>
>If, however, "data_dir" is specified, the data oriented elements like
>"global," "base," "pg_clog," and "pg_xlog" are contained within that
>directory. (In the future, we may be able to specify these locations
>separately)
>
>If "postmaster -D /etc/postgresql.conf" points to a file, then that file
>MUST specify the location of "data_dir," "hba_conf," and "ident_conf."
>
>Like I said, while I don't get the convenience of combining "-D ..." and
>"-C ..." I do get most of what I'm asking for.
>
>If this works for all you guys, I'll submit a patch Wednesday.
>
>
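
The breakdown quoted above (-D pointing at either a directory or a config file) might look roughly like the following. It is a sketch under stated assumptions: the "name = value" parser is a simplification, and resolve_layout and parse_conf are hypothetical helper names; only "data_dir", "hba_conf" and "ident_conf" come from the quoted text.

```python
import os

# Settings the quoted proposal says a config *file* given to -D must contain.
REQUIRED_IN_FILE_MODE = ("data_dir", "hba_conf", "ident_conf")


def parse_conf(path):
    """Parse simple `name = value` lines; '#' starts a comment (simplified)."""
    settings = {}
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()
            if "=" in line:
                name, value = line.split("=", 1)
                settings[name.strip()] = value.strip().strip("'\"")
    return settings


def resolve_layout(d_arg):
    """Return (config_file, settings) for the argument given to -D."""
    if os.path.isdir(d_arg):
        # Classic behaviour: the config files live in the directory itself.
        conf_path = os.path.join(d_arg, "postgresql.conf")
        settings = parse_conf(conf_path) if os.path.isfile(conf_path) else {}
        # If no data_dir is given, the -D directory is the data directory.
        settings.setdefault("data_dir", d_arg)
        return conf_path, settings
    if os.path.isfile(d_arg):
        # -D names a config file: it MUST say where everything else is.
        settings = parse_conf(d_arg)
        missing = [k for k in REQUIRED_IN_FILE_MODE if k not in settings]
        if missing:
            raise ValueError(f"{d_arg} must specify: {', '.join(missing)}")
        return d_arg, settings
    raise FileNotFoundError(d_arg)
```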



Re: PostgreSQL configuration

From
Kevin Brown
Date:
Tom Lane wrote:
> Well, the patch says that the command line switch wins, which is
> consistent with what we do for other command line switches (they all
> override the equivalent postgresql.conf entries).  This does seem a
> bit at variance with the stated goal of making the configuration more
> clearly documented, though :-(.  

Hmm...well, think of it as a tool.  It makes it *possible* to make the
configuration more clearly documented, and in fact makes it easy to do
so, but doesn't guarantee safety in all cases.

> If you actually use the capability then
> your config file will be lying to you about where things are.

Of course.  Just like your config file is lying about any configuration
option that is overridden on the command line.

I don't see why this is a problem, unless we intend to change the way
the entire GUC system works.
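
To make the point concrete, here is a minimal sketch of "the command line switch wins". It is not the GUC code; merge_settings is a hypothetical name and plain dictionaries stand in for the two sources.

```python
def merge_settings(conf, cli):
    """Combine config-file and command-line settings; the command line wins.

    Returns the effective settings plus the names for which the config file
    no longer tells the truth about the running value.
    """
    effective = dict(conf)
    effective.update(cli)
    shadowed = sorted(name for name in cli if name in conf and conf[name] != cli[name])
    return effective, shadowed
```

Any name in the second return value is one where the config file is "lying", whether it is a data directory location or an ordinary tuning option.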

> It's worth pointing out in this connection that for the most part
> I think people are moving *away* from using command line switches;
> it's better to set the value in postgresql.conf, both for documentation
> reasons and because that way you have some chance of changing the value
> via config file update and SIGHUP.  The only way to change a value on
> the command line is to restart the postmaster.  Plus, if you're using a
> distribution-supplied init script to start the postmaster, it's hard to
> get any switches in without hacking the script anyway.

Now this raises a very interesting problem.  Namely, what happens if
you use the -C option to the postmaster as is being advocated, then
change the datadir entry in the config file, and send SIGHUP to the
postmaster?  Ooops.  Score one for Tom.  :-)
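
One way to picture the hazard is a reload step that treats the data directory as fixed at startup. Nobody in this thread has proposed this; it is only a sketch, and STARTUP_ONLY, reload_settings and the "datadir" name are assumptions for illustration.

```python
# Hypothetical: settings that cannot sensibly change while the server runs.
STARTUP_ONLY = {"datadir"}


def reload_settings(running, new_conf):
    """Simulate a SIGHUP reload; return (updated settings, ignored changes)."""
    updated = dict(running)
    ignored = {}
    for name, value in new_conf.items():
        if name in STARTUP_ONLY and running.get(name) != value:
            # Changing this on the fly would point a live postmaster at
            # different files, so record the change instead of applying it.
            ignored[name] = value
            continue
        updated[name] = value
    return updated, ignored
```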

> Most of these objections also apply to values obtained from environment
> variables (the exception is that postgresql.conf can override
> environment variables).

To be honest, I think the use of the PGDATA environment variable is the
biggest impediment to "self documentation" - the postmaster should not
use it.

The reason is that if PGDATA is used to specify the location of the
data directory, you won't be able to find out where a running
postmaster's data directory is located without doing some heavy-duty
investigation.  Not all operating systems make it possible to determine
the values of a particular process' environment variables.

By requiring that the data directory be specified on the postmaster
command line, it becomes possible to always determine where a
postmaster's data directory resides just by looking at the ps output.


Now, I know you guys who do heavy duty development make use of PGDATA.
I see no problem with having the code in postmaster that looks at
PGDATA be surrounded by an #ifdef that is active whenever you're doing
development work.  But it should *not* be active on a production system.



Oh, as to the safety issue of a config file not properly corresponding
to a given data directory, that seems easy enough to solve: if a file
(call it "magic" for the purposes of discussion, though perhaps a better
name would be "do_not_remove" :-)  ) exists in the data directory, then
the value of a configuration variable (call it "magic", too) must match
the contents of that file.  If the values don't match then the postmaster
will issue an error and refuse to start.  If the file doesn't exist then
no "magic" configuration option need exist in the config file, and the
postmaster will start as usual.  So any administrator who wants to make
sure that a configuration file has to explicitly be targeted at the data
directory can do so.  End result: if you use the -D option on the command
line with an inappropriate -C option, the postmaster will refuse to run.
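
A rough sketch of that check, assuming the marker file and the configuration variable are both literally named "magic" as in the description above (check_magic is a hypothetical name):

```python
import os


def check_magic(data_dir, settings, filename="magic"):
    """Refuse to start if the data directory's marker does not match the config.

    If the marker file is absent, nothing is checked and startup proceeds.
    """
    path = os.path.join(data_dir, filename)
    if not os.path.exists(path):
        return  # no marker file: start as usual
    with open(path) as fh:
        expected = fh.read().strip()
    if settings.get("magic") != expected:
        raise RuntimeError(
            "config file is not targeted at this data directory; refusing to start"
        )
```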



-- 
Kevin Brown                          kevin@sysexperts.com


Re: PostgreSQL configuration

From
Mark Kirkwood
Date:

Bruce Momjian wrote:

> Let me tell you the compromise I thought of.
>
>First, we put the config files (postgresql.conf, pg_hba.conf,
>pg_ident.conf) in data/etc by default.
>
>
>  
>
Sorry Bruce,

I was being slow :-) , I was thinking you were going to associate the 
config files with the binary distribution - I think I now realize that 
you were looking at pushing them down into $PGDATA/etc, which is quite 
nice and tidy.

best wishes

Mark


Re: PostgreSQL configuration

From
Mark Kirkwood
Date:
pgsql@mohawksoft.com wrote:

>ii) I think the -C switch *WITH* the -D switch has viable usability.
>Consider this, you are testing two different database layouts and/or RAID
>controllers. You could easily bounce back and forth from *identical*
>configurations like this:
>
>  
>
Convenient indeed, but I would like to see the association of .conf file 
-> data dir remain reasonably solid. It's all about the foot gun.

>iii) I don't like the PID file at all. Not one bit, but I had a few people
>ask for it in the patch, it works as advertised and expected. It isn't my
>place to say how someone should use something. One of my customers wanted
>it, so I provided them with it. That is the beauty of open source.
>
>
>
>  
>
I think that there is a difference between a special patch suitable for 
a particular customer and a general release, and that maybe this addition 
falls right in there.

best wishes

Mark


Re: PostgreSQL configuration

From
Mark Kirkwood
Date:
Tom Lane wrote:

>I think if you spelled the subdir name "config" rather than "etc",
>it would be more obvious what's what.
>
>
>  
>
How about 'conf' - (familiar to anyone who has used apache or tomcat ....)

regards

Mark


Re: PostgreSQL configuration

From
Robert Treat
Date:
On Tuesday 13 April 2004 01:14, Kevin Brown wrote:
> Tom Lane wrote:
<snip>
> To be honest, I think the use of the PGDATA environment variable is the
> biggest impediment to "self documentation" - the postmaster should not
> use it.
>
> The reason is that if PGDATA is used to specify the location of the
> data directory, you won't be able to find out where a running
> postmaster's data directory is located without doing some heavy-duty
> investigation.  Not all operating systems make it possible to determine
> the values of a particular process' environment variables.
>

I think this is another vote for "store the PGDATA dir value inside a running 
postgresql" so you can query the running database to find out what datafiles 
it is using.

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: PostgreSQL configuration

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> On Tuesday 13 April 2004 01:14, Kevin Brown wrote:
>> To be honest, I think the use of the PGDATA environment variable is the
>> biggest impediment to "self documentation" - the postmaster should not
>> use it.

> I think this is another vote for "store the PGDATA dir value inside a running
> postgresql" so you can query the running database to find out what datafiles 
> it is using.

I agree --- we could answer this by adding some readout capability
(think "show datadir") rather than by taking away functionality.
Personally I rely quite a lot on setting PGDATA to keep straight which
installation I'm currently working with, so I'm not going to be happy
with a redesign that eliminates that variable without providing an
adequate substitute :-(
        regards, tom lane


Re: PostgreSQL configuration

From
Joe Conway
Date:
Tom Lane wrote:
> Personally I rely quite a lot on setting PGDATA to keep straight which
> installation I'm currently working with, so I'm not going to be happy
> with a redesign that eliminates that variable without providing an
> adequate substitute :-(

I'll second that.

Joe


Re: PostgreSQL configuration

From
Andrew Hammond
Date:
Mark Kirkwood wrote:
> 
> Tom Lane wrote:
> 
>> I think if you spelled the subdir name "config" rather than "etc",
>> it would be more obvious what's what.
>>
> How about 'conf' - (familiar to anyone who has used apache or tomcat ....)

How about 'etc' - (familiar to anyone who has used unix)

--
Andrew Hammond


Re: PostgreSQL configuration

From
"Simon Riggs"
Date:
Joe Conway wrote
> Tom Lane wrote:
> > Personally I rely quite a lot on setting PGDATA to keep straight which
> > installation I'm currently working with, so I'm not going to be happy
> > with a redesign that eliminates that variable without providing an
> > adequate substitute :-(
>
> I'll second that.

Very much agreed. PGDATA is important, let's keep it, please.

For one thing, this type of mechanism is already used by Oracle, with
ORACLE_SID and ORACLE_HOME.

[It might not work like Apache, but IMHO this is less relevant. Apache
isn't typically configured by DBAs. PostgreSQL is, and familiar concepts
from other industry areas are probably more important to the success of
pg than conformance to Internet/Linux norms.]

Best Regards, Simon Riggs



Re: PostgreSQL configuration

From
Kevin Brown
Date:
Simon Riggs wrote:
> Very much agreed. PGDATA is important, let's keep it, please.

To me the question is not so much whether PGDATA is kept around for the
system as a whole as how it's used.

In the general case, scripts are used to start the postmaster.  So using
PGDATA even if the postmaster doesn't directly make use of it is a
simple matter of adding '-D "$PGDATA"' to the command that invokes the
postmaster.
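
As a sketch of what such a start script does (written here as a small helper rather than shell; postmaster_argv is a hypothetical name), the environment variable is consumed by the script and the postmaster only ever sees an explicit -D:

```python
import os


def postmaster_argv(env=None, binary="postmaster"):
    """Build the postmaster command line from PGDATA in the given environment."""
    env = os.environ if env is None else env
    data_dir = env.get("PGDATA")
    if not data_dir:
        raise RuntimeError("PGDATA is not set")
    # The data directory ends up on the command line, where ps can show it.
    return [binary, "-D", data_dir]
```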

The goal here is simply to make it obvious to a system administrator where
the PG data directory that a given postmaster is using resides.  We can't
rely on the mechanism used to change the command string that ps shows for
the process: in my experience it's something that often does not work.
And in any case, the system administrator will also want to know exactly
what options were passed to the postmaster when it was invoked.


If there's any group that can figure out how to effortlessly get PGDATA
onto the command line of the backend utilities, it's the developer
group.  :-)


In any case, I'm not at all opposed to having the backend stuff know
about PGDATA during development, but for production you should have to
explicitly specify the data directory on the command line.  That seems
easy enough to do: #ifdef is your friend.



-- 
Kevin Brown                          kevin@sysexperts.com


Re: PostgreSQL configuration

From
Tom Lane
Date:
Kevin Brown <kevin@sysexperts.com> writes:
> The goal here is simply to make it obvious to a system administrator where
> the PG data directory that a given postmaster is using resides.

Why would it not be sufficient to add a read-only GUC variable that
tells that?  Connect to the postmaster and do "show datadir" and you're
done.  (Without this, it's not clear you've made any particular gain
anyway, since "a given postmaster" would typically mean "the one I can
connect to at this port", no?)
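
Conceptually, a read-only variable is just a setting that can be shown but not set. The toy model below is not how GUC is implemented; the Settings class and the "datadir" name are assumptions for illustration.

```python
class Settings:
    """Toy settings table in which some names can be shown but never set."""

    def __init__(self, values, read_only=()):
        self._values = dict(values)
        self._read_only = frozenset(read_only)

    def show(self, name):
        return self._values[name]

    def set(self, name, value):
        if name in self._read_only:
            raise PermissionError(f"{name} is read-only")
        self._values[name] = value


# The data directory is fixed at startup and merely reported afterwards.
settings = Settings({"datadir": "/somedir/data", "sort_mem": "1024"},
                    read_only=["datadir"])
```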

In any case I don't see how removing PGDATA would make this more
obvious.  You yourself just pointed out that the command-line arguments
of a postmaster aren't necessarily visible through ps; if they're not,
what have you gained in transparency by forbidding PGDATA?

> In any case, I'm not at all opposed to having the backend stuff know
> about PGDATA during development, but for production you should have to
> explicitly specify the data directory on the command line.

If you wish to do things that way, you can; but that doesn't mean that
everyone else should have to do it that way too.  If there were a
security or reliability hazard involved, I might agree with taking the
fascist approach, but I see no such hazard here ...
        regards, tom lane


Re: PostgreSQL configuration

From
Mark Kirkwood
Date:
Joe Conway wrote:

> Tom Lane wrote:
>
>> Personally I rely quite a lot on setting PGDATA to keep straight which
>> installation I'm currently working with, so I'm not going to be happy
>> with a redesign that eliminates that variable without providing an
>> adequate substitute :-(
>
>
> I'll second that.
>
>
I'll third (or whatever) it too :-)


Re: PostgreSQL configuration

From
Kevin Brown
Date:
Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > The goal here is simply to make it obvious to a system administrator where
> > the PG data directory that a given postmaster is using resides.
> 
> Why would it not be sufficient to add a read-only GUC variable that
> tells that?  Connect to the postmaster and do "show datadir" and you're
> done.  (Without this, it's not clear you've made any particular gain
> anyway, since "a given postmaster" would typically mean "the one I can
> connect to at this port", no?)

That would probably be sufficient for most cases.  It wouldn't take care
of the case where there's a strict separation of powers between the
system administrator and the DBA, but that matters only if the system is
managed badly (i.e., the SA and the DBA don't talk to each other very well).
That's probably something we shouldn't concern ourselves with.

> In any case I don't see how removing PGDATA would make this more
> obvious.  You yourself just pointed out that the command-line arguments
> of a postmaster aren't necessarily visible through ps; if they're not,
> what have you gained in transparency by forbidding PGDATA?

I think you misunderstood what I was saying (which means I didn't say it
right).

There are ways within a program to change what 'ps' shows as the
command line.  We use those methods to make it possible to see what
a given backend is doing by looking at the 'ps' output.  It would be
possible to have the postmaster use those ways in order to show which data
directory it is using even if it wasn't specified on the command line.
But in my experience, those ways don't work reliably on all systems.
On the systems where those methods don't work, what 'ps' shows is the
original command line that was used.  So clearly, the only way 'ps'
will show the data directory in that instance is if it was actually
specified on the command line.

> > In any case, I'm not at all opposed to having the backend stuff know
> > about PGDATA during development, but for production you should have to
> > explicitly specify the data directory on the command line.
> 
> If you wish to do things that way, you can; but that doesn't mean that
> everyone else should have to do it that way too.  If there were a
> security or reliability hazard involved, I might agree with taking the
> fascist approach, but I see no such hazard here ...

Fair enough.  The PGDATA issue isn't a big enough one that I'm terribly
concerned about it, especially if a read-only GUC variable is available
to give that information (something that, I think, should be there
anyway).


-- 
Kevin Brown                          kevin@sysexperts.com