Thread: Re: [SQL] createdb -D xxxx not working
[hackers added to cc: list] Peter Eisentraut <peter_e@gmx.net> writes: > On 2000-01-08, Tom Lane mentioned: >> Chris Griffin <cgriffin@websales.com> writes: >>>> # setenv PGDATA2 /web/sites/1.192/db-data >>>> # initlocation PGDATA2 >>>> # createdb -D PGDATA2 >> >> Surely you want $PGDATA2 in the latter two lines? > You might recall me mentioning that the whole alternative location > business doesn't work in the first place. I'll fix that this week; Chris > will have to wait until the next release. Fine. BTW, am I the only one who thinks that it's really silly for initlocation to expect to be given an unexpanded environment variable name? That's not normal practice (either elsewhere in Postgres, or anywhere else that I know of). It's confusing because it's not how an ordinary Unix user would expect a shell command to behave, and it doesn't buy any functionality that I can see. I'd vote for taking out the auto-expansion, so that the correct invocation becomesinitlocation $PGDATA2 which is what an average user would expect. Actually, after looking at what initlocation really does, which is darn near nothing, I wonder whether we shouldn't dispense with it altogether... regards, tom lane
> >> Surely you want $PGDATA2 in the latter two lines? > > You might recall me mentioning that the whole alternative location > > business doesn't work in the first place. I'm not recalling the details here; what is the problem? It works for me (but then I wrote it ;) > Fine. BTW, am I the only one who thinks that it's really silly for > initlocation to expect to be given an unexpanded environment variable > name? That's not normal practice (either elsewhere in Postgres, or > anywhere else that I know of). It's confusing because it's not how > an ordinary Unix user would expect a shell command to behave, and it > doesn't buy any functionality that I can see. It is not silly at all, unless you want to shoot holes in the basic premise of the alternate location. This is described in the docs, but the short form is: initlocation is used to create the directory structure *with correct permissions* for an alternate location. It takes an environment variable because usually the backend should have that same environment variable defined to ensure consistancy and as a security measure. The environment variable can be expanded or not; initlocation does the right thing in either case. createdb takes a "-D" argument, which specifies the alternate location. Unless allowed at compile-time, via the ALLOW_ABSOLUTE_DBPATHS variable, all references to an alternate location must refer to an environment variable, to give the dbadmin control over *where* alternate locations will appear. The mechanism is enforced by the backend, by looking for a directory delimiter in the datpath field of pg_database, then expanding the first field as an environment variable. On my system, with one database in an alternate location, this table looks like: test=# select * from pg_database; datname | datdba | encoding | datpath ------------+--------+----------+--------------template1 | 100 | 0 | template1postgres | 100 | 0 |postgresregression | 100 | 0 | regressiontest | 100 | 0 | PGDATA2/test (4 rows) So, this works: [postgres@golem parser]$ createdb -D PGDATA2 test2 CREATE DATABASE But this does not (but can be enabled at compile-time if the sysadmin wants to allow users to scatter databases everywhere (?!!)): [postgres@golem regress]$ createdb -D /opt/postgres/data2 test2 ERROR: The path '/opt/postgres/data2/test2' is invalid. This may be due to a missing environment variable on the server. createdb: Database creation failed. > I'd vote for taking > out the auto-expansion, so that the correct invocation becomes > initlocation $PGDATA2 > which is what an average user would expect. But the average user does not necessarily have access to the PGDATA2 environment variable! This is usually a sysadmin function. > Actually, after looking at what initlocation really does, which is > darn near nothing, I wonder whether we shouldn't dispense with it > altogether... ?? I hope the explanation above helps... - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
On 2000-01-11, Thomas Lockhart mentioned: > > >> Surely you want $PGDATA2 in the latter two lines? > > > You might recall me mentioning that the whole alternative location > > > business doesn't work in the first place. > > I'm not recalling the details here; what is the problem? It works for > me (but then I wrote it ;) > Darn, now I already rewrote the thing to where I considered it working. We had several on-and-off discussions (mostly Tom and I) about it during the last two months at least, including users complaining. The sort of scheme I came up with scraps the environment variable stuff (since it's not quite safe either and has several bugs in the code that create a bunch of problems along the way) and lets the following happen (example paths): CREATE DATABASE foo; --> /usr/local/pgsql/data/base/foo/pg_class CREATE DATABASE foo WITH LOCATION 'bar'; --> /usr/local/pgsql/data/base/bar/pg_class CREATE DATABASE foo WITH LOCATION '/some/where'; --> /some/where/pg_class CREATE DATABASE foo WITH LOCATION 'foo/bar'; is disabled. I suppose I could stick the environment variable deal back in so that users don't have to remember any complicated paths, but this is the dbadmin's job anyway, so he should be able to remember it. Anyway, in order for a path to be allowed it has to be listed in PG_ALTLOC, which is a colon-delimited, marginally wildcard enabled list of allowed locations. In some future life this could be included in a configuration file. > initlocation is used to create the directory structure *with correct > permissions* for an alternate location. It takes an environment I think one of the problems was actually that the path initlocation assumed and the one createdb worked with were different. > variable because usually the backend should have that same environment > variable defined to ensure consistancy and as a security measure. The > environment variable can be expanded or not; initlocation does the > right thing in either case. It's false security though, since for complete security the dbadmin would have to delete all other environment variables altogether. > > I'd vote for taking > > out the auto-expansion, so that the correct invocation becomes > > initlocation $PGDATA2 > > which is what an average user would expect. > > But the average user does not necessarily have access to the PGDATA2 > environment variable! This is usually a sysadmin function. The average user doesn't even have access to the machine the database is running on, so he can't do any initlocation either way. Then again the average user doesn't create databases either. But there is no real point for initlocation since a mere mkdir() call in createdb() will do the job. Now I just got done with that coding 10 minutes ago but now that someone actually spoke up in defence of this mechanism I'm going to wait and see what you think about the revised (or any) scheme. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
> Darn, now I already rewrote the thing to where I considered it working. We > had several on-and-off discussions (mostly Tom and I) about it during the > last two months at least, including users complaining. afaik I've been on the hackers list most of the time, and don't recall significant discussion. > The sort of scheme I came up with scraps the environment variable stuff > (since it's not quite safe either and has several bugs in the code that > create a bunch of problems along the way) and lets the following > happen (example paths): > > CREATE DATABASE foo; --> /usr/local/pgsql/data/base/foo/pg_class > CREATE DATABASE foo WITH LOCATION 'bar'; > --> /usr/local/pgsql/data/base/bar/pg_class > CREATE DATABASE foo WITH LOCATION '/some/where'; > --> /some/where/pg_class > CREATE DATABASE foo WITH LOCATION 'foo/bar'; > is disabled. I suppose I could stick the > environment variable deal back in so that users don't have to remember any > complicated paths, but this is the dbadmin's job anyway, so he should be > able to remember it. Huh? The "environment variable deal" is intended to provide security and easier operation. The examples above which allow specifying absolute paths (e.g. "/some/where/pg_class") are specifically disallowed by default, for a reason. If there were bugs in that, let's talk about it, but I'm *very* uncomfortable making wholesale changes without a discussion. And what exactly were "several bugs in the code"?? Better be specific; that's my code we're talking about :/ It sounds like you and Tom Lane have had discussions on this, but I'm not certain it was with a clear understanding of what the environment variables provided. Before killing the capability, I'd like to make sure that we understand what it did. otoh, you have a new proposal (the first I've heard of it afaik)... > Anyway, in order for a path to be allowed it has to be listed in > PG_ALTLOC, which is a colon-delimited, marginally wildcard enabled list of > allowed locations. In some future life this could be included in a > configuration file. That sounds like a good capability. There is no reason (yet) why the environment variable mechanism can not use this too; e.g. setenv PG_ALTLOC PGDATA2:/home/peter/unsafe/unprotected/open/path (just had to put that last one in :))) > > initlocation is used to create the directory structure *with correct > > permissions* for an alternate location. It takes an environment > I think one of the problems was actually that the path initlocation > assumed and the one createdb worked with were different. I'm not sure to what you are referring. To the fact that there is an implicit "/base" in the actual path? e.g. initlocation PGDATA2 createdb -D PGDATA2 test gives an actual path $PGDATA2/base/pg_class ? That's a security/integrity benefit since: 1) there is not likely to be a random environment variable which happens to point to a valid directory which *also* has a subdirectory called "base". 2) it reduces the chance that a user/dbadmin will not use initlocation to create the database area, hence reducing the chance that a user/dbadmin has not set the permissions correctly. > > variable because usually the backend should have that same environment > > variable defined to ensure consistancy and as a security measure. The > > environment variable can be expanded or not; initlocation does the > > right thing in either case. > It's false security though, since for complete security the dbadmin would > have to delete all other environment variables altogether. Theoretically true (though low-risk per above discussion), but this would be addressed by your PG_ALTLOC augmentation, which I like btw. > > > I'd vote for taking > > > out the auto-expansion, so that the correct invocation becomes > > > initlocation $PGDATA2 > > > which is what an average user would expect. > > But the average user does not necessarily have access to the PGDATA2 > > environment variable! This is usually a sysadmin function. > The average user doesn't even have access to the machine the database is > running on, so he can't do any initlocation either way. Then again the > average user doesn't create databases either. But there is no real point > for initlocation since a mere mkdir() call in createdb() will do the job. Hmm. Until I see what you are actually doing, I can't comment on whether this is indeed adequate. But decoupling the location creation from database creation give the dbadmin control over how permissions are set and where databases can be created. I'm not sure that is the case if it is all done within the backend at database creation time. > Now I just got done with that coding 10 minutes ago but now that someone > actually spoke up in defence of this mechanism I'm going to wait and see > what you think about the revised (or any) scheme. PG_ALTLOC seems to be a great feature. But afaict the environment variable feature is useful, safe (as much or more so than absolute paths, anyway), can coexist, and is a convenience. I'd like to continue the discussion until I'm convinced, or least beaten into submission ;) - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
On 2000-01-12, Thomas Lockhart mentioned: > Huh? The "environment variable deal" is intended to provide security > and easier operation. The examples above which allow specifying > absolute paths (e.g. "/some/where/pg_class") are specifically > disallowed by default, for a reason. If there were bugs in that, let's > talk about it, but I'm *very* uncomfortable making wholesale changes > without a discussion. And what exactly were "several bugs in the > code"?? Better be specific; that's my code we're talking about :/ The starting point for my investigation was actually the issue of arbitrary characters in database names, which lead me to fix up the createdb and dropdb code to make it more error proof, which led me to all kinds of areas of the code that looked ages old, with comments not matching the code, dead code, etc. The location issue was just one of those. (Especially those /* this is work arround only !!! */ sections from more than 5 years ago concern me a little ...) It's my idea of a learning experience. > > > initlocation is used to create the directory structure *with correct > > > permissions* for an alternate location. It takes an environment If I create a database with a normal name, the code makes a directory /usr/local/pgsql/data/base/testdb with the proper permissions. There's no reason why it wouldn't be able to do the same somewhere else. It's redundant. > 1) there is not likely to be a random environment variable which > happens to point to a valid directory which *also* has a subdirectory > called "base". Forgive me, but "not likely" means zero security to me. If you want to make it secure, be for real. Perhaps I'm a little envvar-phobic in general. I got my reasons, but that shouldn't stand in other's ways. > 2) it reduces the chance that a user/dbadmin will not use initlocation > to create the database area, hence reducing the chance that a > user/dbadmin has not set the permissions correctly. The prototype code I got lying around insists on creating the directory itself, so the chances of using an insecure directory are zero. But there are also other ways to ensure this rather than another utility. Okay, now to the "formal" proposal: * A database name can contain any character (up to NAMEDATALEN) * By default, the database name is appended to DataDir/base/ to form the file system location of the database. * There are certain characters that will not be allowed in database paths because they are potentially "shell'ishly dangerous". In my current layout this includes everything from 1 to 31 ascii as well as single-quote (') and dot (.). If you choose to name a database this way, you need to override the path to something else. * If you override the path to a name not containing a slash, the name will in the same fashion be appended to DataDir/base/. Any future attempts to use the same path will fail. * If you override the path to a name including a slash but not at the start, the part up to the first slash will be interpreted as an environment variable. The database directory will be the immediate directory specified and will be created by createdb() (which must have permission to do so). If it already exists, it will attempt to fix the permissions. * If you specify an absolute path then it will use that very path and it will create the directory as above. * Either way, in order to use a path outside DataDir, it must be listed in some configuration option (such as PG_ALTLOC). Environment variable based paths can be included in the natural way, e.g., to allow a path 'PGDATA2/foo' write PG_ALTLOC=$PGDATA/foo, to allow anything under PGDATA2, write PG_ALTLOC=$PGDATA/*. In practice that would mean: 1. If you don't use this feature, nothing changes. 2. If you want to use this feature to "recode" funny characters, you may. (e.g., CREATE DATABASE "bbb'''..." WITH LOCATON = 'bbb______';) 3. If you want to stick one particular database somewhere else, create the directory (or at least the directory above it), include it in PG_ALTLOC and go ahead. 4. If you want to provide users the option of storing databases at several alternative locations, set up mnemonic environment variables, create the directories corresponding to them, and put something like PGDATA2/*:PGDATA3/*:... in PG_ALTLOC. Are there other circumstances where this would be used? What could be discussed it the exact interpretation of a path such as LOCATION = '/mnt/somewhere/foo' with regards to whether 'base' should be appended or not or the name of the database should be appended to it or not or how one could otherwise do the name "recoding" in that circumstance. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden