Thread: Re: [GENERAL] Physical Database Configuration
On Wednesday 25 June 2003 20:49, nolan@celery.tssi.com wrote: > > Well, correct solution is to implement tablespaces on which objects like > > databases, tables and indexes can be put. > > I've not looked at the SQL standard, but it seems to me like the order > should be: > > Databases > Tablespaces > Schemas > Objects (tables, indexes, functions, etc.) That should be Tablespaces databases schemas objects with each of them implemented as a directory and data files under it. If we could get a quota check propogated in both direction, that would be pretty good, may be a warning when things start getting close to limit. > And it really isn't hierarchical. As I understand them (based on my > Oracle background), tablespaces, unlike schemas, do NOT create a layer > of data abstraction. That is to say, while the same table name > can exist in multiple schemas, only one instance of a given table name > within a given schema can exist, regardless of what tablespace it is in. Well, if same table name exists in two different databases under same tablespace, what's the problem? > Whether or not two databases can share tablespaces isn't clear to me, > though as a DBA I can think of good reasons why they probably shouldn't > do so, I'm not sure if that is an absolute. Well, I would say they should be allowed to. Shridhar
> That should be > > Tablespaces > databases > schemas > objects > > with each of them implemented as a directory and data files under it. If we > could get a quota check propogated in both direction, that would be pretty > good, may be a warning when things start getting close to limit. I disagree. Just as you can have multiple schemas within one database you can have multiple tablespaces within one database. And the tablespace is irrelevant as far as specifying an object is concerned. A fully qualified object would be: database.schema.object, not tablespace.database.schema.object or database.tablespace.schema.object. -- Mike Nolan
nolan@celery.tssi.com writes: > I disagree. Just as you can have multiple schemas within one database > you can have multiple tablespaces within one database. > And the tablespace is irrelevant as far as specifying an object is concerned. > A fully qualified object would be: > database.schema.object, > not tablespace.database.schema.object or database.tablespace.schema.object. Right, the tablespace structure is really orthogonal to the database/schema structure. I would envision tablespaces as being named by database-cluster-wide names, just as users and groups are. Any given table could be placed in any tablespace (although perhaps we want to invent some permission mechanism here). Physically a tablespace is a directory with sub-directories for databases under it --- so $PGDATA/base plays the role of the default tablespace for a cluster. (The reason you need per-database sub-directories is mostly to support DROP DATABASE, which has to be able to nuke a database without knowing exactly what's in it.) But this structure doesn't have anything to do with the logical structure of the database cluster. There are a bunch of interesting locking issues to be solved, but the storage layout ideas are pretty clear in my mind. regards, tom lane
On Thursday 26 June 2003 20:22, Tom Lane wrote: > nolan@celery.tssi.com writes: > > I disagree. Just as you can have multiple schemas within one database > > you can have multiple tablespaces within one database. > > > > And the tablespace is irrelevant as far as specifying an object is > > concerned. A fully qualified object would be: > > database.schema.object, > > not tablespace.database.schema.object or > > database.tablespace.schema.object. > > Right, the tablespace structure is really orthogonal to the > database/schema structure. Well, consider this. Keep in mind that all of them are directories.. Tablespace1db1 schema1 table1 table2db2 schema1 table1 table2 Tablespace2db1 schema1 index1 Now, unlike Oracle, I don't remember anybody suggesting to get away with $PGDATA entirely because we have tablespaces. $PGDATA is going to be available and will continue to host at least one directory for each database. So postgresql now where to find all tablespaces that hosts that particular database related object. Certainly does not sound impossible or illogical to me. And I agree. The paradigm, 'A tablespace can host multiple database and a database can span across multiple tablespaces' is absolutely arthgonal. Convoluted is what I would term it. But hey, it's the most flexible one!!..:-) > Physically a tablespace is a directory with sub-directories for > databases under it --- so $PGDATA/base plays the role of the default > tablespace for a cluster. (The reason you need per-database > sub-directories is mostly to support DROP DATABASE, which has to be > able to nuke a database without knowing exactly what's in it.) But > this structure doesn't have anything to do with the logical structure > of the database cluster. Well, with above proposal, drop database should be as simple. It's just that it would be more than one `rm -rf`rather than just one. HTHShridhar
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes: > Well, consider this. Keep in mind that all of them are directories.. I can see no reason that we'd want a level of directory associated with schemas... > Well, with above proposal, drop database should be as simple. It's just that > it would be more than one `rm -rf`rather than just one. Right, there would be potentially one per tablespace. The key point here is that the tablespace definitions are known cluster-wide, so a "DROP DATABASE x" command running in database y would still be able to figure out which subdirectories it needs to zap. regards, tom lane
On Thursday 26 June 2003 21:29, Tom Lane wrote: > Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes: > > Well, consider this. Keep in mind that all of them are directories.. > > I can see no reason that we'd want a level of directory associated with > schemas... Moving a multi-hundreds-of-GB table across schemas would be sooo easy..:-) I don't know how difficult/time consuming that is right now. Shouldn't be actually if PG updates the schema contents in it's catalog but anyway.. I just put it for clarification. If PG can do everything directory has to offer, well, we don't need directory for schemas. Shridhar
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes: > On Thursday 26 June 2003 21:29, Tom Lane wrote: >> I can see no reason that we'd want a level of directory associated with >> schemas... > Moving a multi-hundreds-of-GB table across schemas would be sooo easy..:-) No, it would be harder. regards, tom lane
> > Well, with above proposal, drop database should be as simple. It's just that > > it would be more than one `rm -rf`rather than just one. > > Right, there would be potentially one per tablespace. The key point > here is that the tablespace definitions are known cluster-wide, so a > "DROP DATABASE x" command running in database y would still be able > to figure out which subdirectories it needs to zap. It sounds like you envision tablespaces (or at least the default tablespace) as being above databases in some contexts. Reducing other tablespaces to mere subdirectories under the 'base' directory for a database sounds like a practical implementation measure. I presume that the 'global' directory is for stuff that is not specific to any one database within a database cluster. If so, there is an inconsistency in the current directory structure in that SOME global information is in the main /usr/local/pgsql/data directory (or equivalent) while other global information is in the global subdirectory. Being able to zap a database with one or more 'rm -rf' commands assumes that there will be files from just ONE database permitted in any given tablespace, and ONLY files from that database. The former is probably a good thing to require and enforce, the latter is unenforcable but probably still advisable. Is this doable within the time frame for the 7.4 feature freeze? -- Mike Nolan
On Thursday 26 June 2003 21:56, nolan@celery.tssi.com wrote: > Is this doable within the time frame for the 7.4 feature freeze? Good question. If it gets in 7.4, that would be more than a killer feature to put against 7.4 release, with due respect to all other enhancements in progress.. Shridhar
nolan@celery.tssi.com writes: > Being able to zap a database with one or more 'rm -rf' commands assumes > that there will be files from just ONE database permitted in any given > tablespace, and ONLY files from that database. I said no such thing. Look at the structure again: $PGDATA/base/dboid/...stuff... sometablespace/dboid/...stuff... othertablespace/dboid/...stuff... DROPDB needs to nuke <somepath>/dboid/ for each tablespace's associated <somepath>. The other design simplifies DROPDB at the cost of increased complexity for every other tablespace management operation, since you'd need to cope with a symlink in each database for each tablespace. Also, this scheme is at least theoretically amenable to a symlink-free implementation, though I personally don't give a darn whether tablespaces are supported on Windows and thus wouldn't expend the extra effort needed to keep track of full paths. I'd want $PGDATA/tablespaces/tboid to be a symlink to the root of the tablespace with a given OID, and then the actual pathname used to access a table in tablespace tboid, database dboid, table filenode rfoid would look like$PGDATA/tablespaces/tboid/dboid/rfoid But a Windoze version could in theory keep track of tablespace locations directly, and replace the first part of this path with the actual tablespace location. If we put tablespaces under directories then the facility has zero functionality without symlinks, because you couldn't actually do anything to segregate stuff within a database across different devices. BTW, we'd probably remove $PGDATA/base in favor of $PGDATA/tablespaces/N for some fixed-in-advance N that is the system tablespace, and we'd require all system catalogs to live in this tablespace --- certainly at least pg_class and its indexes. Otherwise you have circularity problems in finding the catalogs ... regards, tom lane
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes: > On Thursday 26 June 2003 21:56, nolan@celery.tssi.com wrote: >> Is this doable within the time frame for the 7.4 feature freeze? > Good question. ROTFL... the answer is no. Feature freeze is Tuesday, people. In practice, the time to start coding new stuff is already long past. Especially major new stuff. If you start now you might have something done for 7.5. regards, tom lane
> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes: > > On Thursday 26 June 2003 21:56, nolan@celery.tssi.com wrote: > >> Is this doable within the time frame for the 7.4 feature freeze? > > > Good question. > > ROTFL... the answer is no. Feature freeze is Tuesday, people. In > practice, the time to start coding new stuff is already long past. > Especially major new stuff. > > If you start now you might have something done for 7.5. Forgive us, Tom, we are still learning how the cycle works. (Or at least I am.) I am also probably grossly underestimating the pervasiveness of implementing tablespaces. -- Mike Nolan
> Good question. > > If it gets in 7.4, that would be more than a killer feature to put against 7.4 > release, with due respect to all other enhancements in progress.. It's not going to happen. Chris
> > Tablespaces > > databases > > schemas > > objects > > > > with each of them implemented as a directory and data files under it. If we > > could get a quota check propogated in both direction, that would be pretty > > good, may be a warning when things start getting close to limit. Database do not exist inside tablespaces, as they are not relations. Only tables, views, sequences and indexes will live in tablespaces. Even then, I probably won't bother allowing it for sequences and views since they don't occupy much space. A database can specify a DEFAULT tablespace, as can a schema, but they do not live in tablespaces themselves. Chris
Tablespaces (was Re: [GENERAL] Physical Database Configuration )
From
"Christopher Kings-Lynne"
Date:
> ROTFL... the answer is no. Feature freeze is Tuesday, people. In > practice, the time to start coding new stuff is already long past. > Especially major new stuff. > > If you start now you might have something done for 7.5. Can everyone who is interested in actually coding a tablespaces implementation please email me, and I will create a list and we'll get cracking. Maybe we'll have to run our own cvs or maybe the Postgres guys will be nice and do a branch for us :) So far, Shridhar and Gavin seem interested?? Chris
I thought Tablespaces were already implemented. Are they not? On Thu, 2003-06-26 at 22:10, Christopher Kings-Lynne wrote: > > ROTFL... the answer is no. Feature freeze is Tuesday, people. In > > practice, the time to start coding new stuff is already long past. > > Especially major new stuff. > > > > If you start now you might have something done for 7.5. > > Can everyone who is interested in actually coding a tablespaces > implementation please email me, and I will create a list and we'll get > cracking. Maybe we'll have to run our own cvs or maybe the Postgres guys > will be nice and do a branch for us :) > > So far, Shridhar and Gavin seem interested?? > > Chris > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org -- Austin Gonyou <austin@coremetrics.com> Coremetrics, Inc.
> I thought Tablespaces were already implemented. Are they not? Apparently not. A group has been formed to work on it, though. -- Mike Nolan
No, they're not. There are some folks who have hacked on them in the past, but nothing's been committed. On 27 Jun 2003, Austin Gonyou wrote: > I thought Tablespaces were already implemented. Are they not? > > > On Thu, 2003-06-26 at 22:10, Christopher Kings-Lynne wrote: > > > ROTFL... the answer is no. Feature freeze is Tuesday, people. In > > > practice, the time to start coding new stuff is already long past. > > > Especially major new stuff. > > > > > > If you start now you might have something done for 7.5. > > > > Can everyone who is interested in actually coding a tablespaces > > implementation please email me, and I will create a list and we'll get > > cracking. Maybe we'll have to run our own cvs or maybe the Postgres guys > > will be nice and do a branch for us :) > > > > So far, Shridhar and Gavin seem interested?? > > > > Chris > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
No, they're not. Chris On 27 Jun 2003, Austin Gonyou wrote: > I thought Tablespaces were already implemented. Are they not? > > > On Thu, 2003-06-26 at 22:10, Christopher Kings-Lynne wrote: > > > ROTFL... the answer is no. Feature freeze is Tuesday, people. In > > > practice, the time to start coding new stuff is already long past. > > > Especially major new stuff. > > > > > > If you start now you might have something done for 7.5. > > > > Can everyone who is interested in actually coding a tablespaces > > implementation please email me, and I will create a list and we'll get > > cracking. Maybe we'll have to run our own cvs or maybe the Postgres guys > > will be nice and do a branch for us :) > > > > So far, Shridhar and Gavin seem interested?? > > > > Chris > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- > Austin Gonyou <austin@coremetrics.com> > Coremetrics, Inc. >
Tom Lane wrote: > nolan@celery.tssi.com writes: > > I disagree. Just as you can have multiple schemas within one database > > you can have multiple tablespaces within one database. > > > And the tablespace is irrelevant as far as specifying an object is concerned. > > A fully qualified object would be: > > database.schema.object, > > not tablespace.database.schema.object or database.tablespace.schema.object. > > Right, the tablespace structure is really orthogonal to the > database/schema structure. > > I would envision tablespaces as being named by database-cluster-wide > names, just as users and groups are. Any given table could be placed > in any tablespace (although perhaps we want to invent some permission > mechanism here). > > Physically a tablespace is a directory with sub-directories for > databases under it --- so $PGDATA/base plays the role of the default > tablespace for a cluster. (The reason you need per-database > sub-directories is mostly to support DROP DATABASE, which has to be > able to nuke a database without knowing exactly what's in it.) But > this structure doesn't have anything to do with the logical structure > of the database cluster. Another good reason for per-database directories under the tablespace is to prevent directories from containing too many files. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
> Another good reason for per-database directories under the tablespace is > to prevent directories from containing too many files. Actually, I would take that as an reason not to have database directories. If the number of files becomes a concern, we would need some kind of a hashing algorithm to disperse them appropriately. The database directory would just get in the way without really adding anything. Is 1000 per files as a directory limit still the rule of thumb for a limit? A single TB sized db would start to run into those types of limits.
On Fri, 18 Jul 2003, Bruce Momjian wrote: > Tom Lane wrote: > > nolan@celery.tssi.com writes: > > > I disagree. Just as you can have multiple schemas within one database > > > you can have multiple tablespaces within one database. > > > > > And the tablespace is irrelevant as far as specifying an object is concerned. > > > A fully qualified object would be: > > > database.schema.object, > > > not tablespace.database.schema.object or database.tablespace.schema.object. > > > > Right, the tablespace structure is really orthogonal to the > > database/schema structure. > > > > I would envision tablespaces as being named by database-cluster-wide > > names, just as users and groups are. Any given table could be placed > > in any tablespace (although perhaps we want to invent some permission > > mechanism here). > > > > Physically a tablespace is a directory with sub-directories for > > databases under it --- so $PGDATA/base plays the role of the default > > tablespace for a cluster. (The reason you need per-database > > sub-directories is mostly to support DROP DATABASE, which has to be > > able to nuke a database without knowing exactly what's in it.) But > > this structure doesn't have anything to do with the logical structure > > of the database cluster. > > Another good reason for per-database directories under the tablespace is > to prevent directories from containing too many files. I have a 3/4s finish implementation of table spaces. Per database directories under the table space are required because CREATE DATABASE just copies the data directory of the template database. If the databases were in the same table space, they would need to share system catalogs and it would also not be possible to create a database in a new table space because the pg_class and pg_namespace would have references to the table spaces used by the template database, not the new table space. I will elaborate further when I have a near complete patch. Thanks, Gavin