Thread: Re: [GENERAL] Physical Database Configuration

Re: [GENERAL] Physical Database Configuration

From
Shridhar Daithankar
Date:
On Wednesday 25 June 2003 20:49, nolan@celery.tssi.com wrote:
> > Well, correct solution is to implement tablespaces on which objects like
> > databases, tables and indexes can be put.
>
> I've not looked at the SQL standard, but it seems to me like the order
> should be:
>
> Databases
>    Tablespaces
>       Schemas
>          Objects (tables, indexes, functions, etc.)

That should be 

Tablespaces databases  schemas   objects

with each of them implemented as a directory and data files under it. If we 
could get a quota check propogated in both direction, that would be pretty 
good, may be a warning when things start getting close to limit. 

> And it really isn't hierarchical.  As I understand them (based on my
> Oracle background), tablespaces, unlike schemas, do NOT create a layer
> of data abstraction.   That is to say, while the same table name
> can exist in multiple schemas, only one instance of a given table name
> within a given schema can exist, regardless of what tablespace it is in.

Well, if same table name exists in two different databases under same 
tablespace, what's the problem?

> Whether or not two databases can share tablespaces isn't clear to me,
> though as a DBA I can think of good reasons why they probably shouldn't
> do so, I'm not sure if that is an absolute.

Well, I would say they should be allowed to.
Shridhar



Re: [GENERAL] Physical Database Configuration

From
nolan@celery.tssi.com
Date:
> That should be 
> 
> Tablespaces
>   databases
>    schemas
>     objects
> 
> with each of them implemented as a directory and data files under it. If we 
> could get a quota check propogated in both direction, that would be pretty 
> good, may be a warning when things start getting close to limit. 

I disagree.  Just as you can have multiple schemas within one database
you can have multiple tablespaces within one database.  

And the tablespace is irrelevant as far as specifying an object is concerned.
A fully qualified object would be:    database.schema.object,
not tablespace.database.schema.object or database.tablespace.schema.object.
--
Mike Nolan


Re: [GENERAL] Physical Database Configuration

From
Tom Lane
Date:
nolan@celery.tssi.com writes:
> I disagree.  Just as you can have multiple schemas within one database
> you can have multiple tablespaces within one database.  

> And the tablespace is irrelevant as far as specifying an object is concerned.
> A fully qualified object would be: 
>     database.schema.object,
> not tablespace.database.schema.object or database.tablespace.schema.object.

Right, the tablespace structure is really orthogonal to the
database/schema structure.

I would envision tablespaces as being named by database-cluster-wide
names, just as users and groups are.  Any given table could be placed
in any tablespace (although perhaps we want to invent some permission
mechanism here).

Physically a tablespace is a directory with sub-directories for
databases under it --- so $PGDATA/base plays the role of the default
tablespace for a cluster.  (The reason you need per-database
sub-directories is mostly to support DROP DATABASE, which has to be
able to nuke a database without knowing exactly what's in it.)  But
this structure doesn't have anything to do with the logical structure
of the database cluster.

There are a bunch of interesting locking issues to be solved, but the
storage layout ideas are pretty clear in my mind.
        regards, tom lane


Re: [GENERAL] Physical Database Configuration

From
Shridhar Daithankar
Date:
On Thursday 26 June 2003 20:22, Tom Lane wrote:
> nolan@celery.tssi.com writes:
> > I disagree.  Just as you can have multiple schemas within one database
> > you can have multiple tablespaces within one database.
> >
> > And the tablespace is irrelevant as far as specifying an object is
> > concerned. A fully qualified object would be:
> >     database.schema.object,
> > not tablespace.database.schema.object or
> > database.tablespace.schema.object.
>
> Right, the tablespace structure is really orthogonal to the
> database/schema structure.

Well, consider this. Keep in mind that all of them are directories..

Tablespace1db1 schema1  table1  table2db2 schema1  table1  table2

Tablespace2db1 schema1  index1


Now, unlike Oracle, I don't remember anybody suggesting to get away with 
$PGDATA entirely because we have tablespaces. $PGDATA is going to be 
available and will continue to host at least one directory for each database. 
So postgresql now where to find all tablespaces that hosts that particular 
database related object.

Certainly does not sound impossible or illogical to me.

And I agree. The paradigm, 'A tablespace can host multiple database and a 
database can span across multiple tablespaces' is absolutely arthgonal. 
Convoluted is what I would term it. But hey, it's the most flexible 
one!!..:-)

> Physically a tablespace is a directory with sub-directories for
> databases under it --- so $PGDATA/base plays the role of the default
> tablespace for a cluster.  (The reason you need per-database
> sub-directories is mostly to support DROP DATABASE, which has to be
> able to nuke a database without knowing exactly what's in it.)  But
> this structure doesn't have anything to do with the logical structure
> of the database cluster.

Well, with above proposal, drop database should be as simple. It's just that 
it would be more than one `rm -rf`rather than just one.

HTHShridhar



Re: [GENERAL] Physical Database Configuration

From
Tom Lane
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> Well, consider this. Keep in mind that all of them are directories..

I can see no reason that we'd want a level of directory associated with
schemas...

> Well, with above proposal, drop database should be as simple. It's just that 
> it would be more than one `rm -rf`rather than just one.

Right, there would be potentially one per tablespace.  The key point
here is that the tablespace definitions are known cluster-wide, so a
"DROP DATABASE x" command running in database y would still be able
to figure out which subdirectories it needs to zap.
        regards, tom lane


Re: [GENERAL] Physical Database Configuration

From
Shridhar Daithankar
Date:
On Thursday 26 June 2003 21:29, Tom Lane wrote:
> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> > Well, consider this. Keep in mind that all of them are directories..
>
> I can see no reason that we'd want a level of directory associated with
> schemas...

Moving a multi-hundreds-of-GB table  across schemas would be sooo easy..:-)

I don't know how difficult/time consuming that is right now. Shouldn't be 
actually if PG updates the schema contents in it's catalog but anyway..

I just put it for clarification. If PG can do everything directory has to 
offer, well, we don't need directory for schemas. 
Shridhar



Re: [GENERAL] Physical Database Configuration

From
Tom Lane
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> On Thursday 26 June 2003 21:29, Tom Lane wrote:
>> I can see no reason that we'd want a level of directory associated with
>> schemas...

> Moving a multi-hundreds-of-GB table  across schemas would be sooo easy..:-)

No, it would be harder.
        regards, tom lane


Re: [GENERAL] Physical Database Configuration

From
nolan@celery.tssi.com
Date:
> > Well, with above proposal, drop database should be as simple. It's just that 
> > it would be more than one `rm -rf`rather than just one.
> 
> Right, there would be potentially one per tablespace.  The key point
> here is that the tablespace definitions are known cluster-wide, so a
> "DROP DATABASE x" command running in database y would still be able
> to figure out which subdirectories it needs to zap.

It sounds like you envision tablespaces (or at least the default tablespace)
as being above databases in some contexts.  Reducing other tablespaces
to mere subdirectories under the 'base' directory for a database sounds 
like a practical implementation measure.  

I presume that the 'global' directory is for stuff that is not specific 
to any one database within a database cluster.   If so, there is an 
inconsistency in the current directory structure in that SOME global 
information is in the main /usr/local/pgsql/data directory (or equivalent) 
while other global information is in the global subdirectory.

Being able to zap a database with one or more 'rm -rf' commands assumes 
that there will be files from just ONE database permitted in any given 
tablespace, and ONLY files from that database.  The former is probably 
a good thing to require and enforce, the latter is unenforcable but 
probably still advisable.

Is this doable within the time frame for the 7.4 feature freeze?  
--
Mike Nolan


Re: [GENERAL] Physical Database Configuration

From
Shridhar Daithankar
Date:
On Thursday 26 June 2003 21:56, nolan@celery.tssi.com wrote:
> Is this doable within the time frame for the 7.4 feature freeze?

Good question. 

If it gets in 7.4, that would be more than a killer feature to put against 7.4 
release, with due respect to all other enhancements in progress..
Shridhar



Re: [GENERAL] Physical Database Configuration

From
Tom Lane
Date:
nolan@celery.tssi.com writes:
> Being able to zap a database with one or more 'rm -rf' commands assumes 
> that there will be files from just ONE database permitted in any given 
> tablespace, and ONLY files from that database.

I said no such thing.  Look at the structure again:

$PGDATA/base/dboid/...stuff...

sometablespace/dboid/...stuff...

othertablespace/dboid/...stuff...

DROPDB needs to nuke <somepath>/dboid/ for each tablespace's associated
<somepath>.  The other design simplifies DROPDB at the cost of increased
complexity for every other tablespace management operation, since you'd
need to cope with a symlink in each database for each tablespace.

Also, this scheme is at least theoretically amenable to a symlink-free
implementation, though I personally don't give a darn whether
tablespaces are supported on Windows and thus wouldn't expend the extra
effort needed to keep track of full paths.  I'd want
$PGDATA/tablespaces/tboid to be a symlink to the root of the tablespace
with a given OID, and then the actual pathname used to access a table in
tablespace tboid, database dboid, table filenode rfoid would look like$PGDATA/tablespaces/tboid/dboid/rfoid
But a Windoze version could in theory keep track of tablespace locations
directly, and replace the first part of this path with the actual
tablespace location.  If we put tablespaces under directories then the
facility has zero functionality without symlinks, because you couldn't
actually do anything to segregate stuff within a database across
different devices.

BTW, we'd probably remove $PGDATA/base in favor of $PGDATA/tablespaces/N
for some fixed-in-advance N that is the system tablespace, and we'd
require all system catalogs to live in this tablespace --- certainly at
least pg_class and its indexes.  Otherwise you have circularity problems
in finding the catalogs ...
        regards, tom lane


Re: [GENERAL] Physical Database Configuration

From
Tom Lane
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> On Thursday 26 June 2003 21:56, nolan@celery.tssi.com wrote:
>> Is this doable within the time frame for the 7.4 feature freeze?

> Good question. 

ROTFL... the answer is no.  Feature freeze is Tuesday, people.  In
practice, the time to start coding new stuff is already long past.
Especially major new stuff.

If you start now you might have something done for 7.5.
        regards, tom lane


Re: [GENERAL] Physical Database Configuration

From
nolan@celery.tssi.com
Date:
> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> > On Thursday 26 June 2003 21:56, nolan@celery.tssi.com wrote:
> >> Is this doable within the time frame for the 7.4 feature freeze?
> 
> > Good question. 
> 
> ROTFL... the answer is no.  Feature freeze is Tuesday, people.  In
> practice, the time to start coding new stuff is already long past.
> Especially major new stuff.
> 
> If you start now you might have something done for 7.5.

Forgive us, Tom, we are still learning how the cycle works.  (Or at least
I am.)  

I am also probably grossly underestimating the pervasiveness of 
implementing tablespaces.
--
Mike Nolan


Re: [GENERAL] Physical Database Configuration

From
"Christopher Kings-Lynne"
Date:
> Good question.
>
> If it gets in 7.4, that would be more than a killer feature to put against
7.4
> release, with due respect to all other enhancements in progress..

It's not going to happen.

Chris



Re: [GENERAL] Physical Database Configuration

From
"Christopher Kings-Lynne"
Date:
> > Tablespaces
> >   databases
> >    schemas
> >     objects
> >
> > with each of them implemented as a directory and data files under it. If
we
> > could get a quota check propogated in both direction, that would be
pretty
> > good, may be a warning when things start getting close to limit.

Database do not exist inside tablespaces, as they are not relations.  Only
tables, views, sequences and indexes will live in tablespaces.  Even then, I
probably won't bother allowing it for sequences and views since they don't
occupy much space.

A database can specify a DEFAULT tablespace, as can a schema, but they do
not live in tablespaces themselves.

Chris



Tablespaces (was Re: [GENERAL] Physical Database Configuration )

From
"Christopher Kings-Lynne"
Date:
> ROTFL... the answer is no.  Feature freeze is Tuesday, people.  In
> practice, the time to start coding new stuff is already long past.
> Especially major new stuff.
>
> If you start now you might have something done for 7.5.

Can everyone who is interested in actually coding a tablespaces
implementation please email me, and I will create a list and we'll get
cracking.  Maybe we'll have to run our own cvs or maybe the Postgres guys
will be nice and do a branch for us :)

So far, Shridhar and Gavin seem interested??

Chris



Re: Tablespaces (was Re: [GENERAL] Physical Database

From
Austin Gonyou
Date:
I thought Tablespaces were already implemented. Are they not?


On Thu, 2003-06-26 at 22:10, Christopher Kings-Lynne wrote:
> > ROTFL... the answer is no.  Feature freeze is Tuesday, people.  In
> > practice, the time to start coding new stuff is already long past.
> > Especially major new stuff.
> >
> > If you start now you might have something done for 7.5.
> 
> Can everyone who is interested in actually coding a tablespaces
> implementation please email me, and I will create a list and we'll get
> cracking.  Maybe we'll have to run our own cvs or maybe the Postgres guys
> will be nice and do a branch for us :)
> 
> So far, Shridhar and Gavin seem interested??
> 
> Chris
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
-- 
Austin Gonyou <austin@coremetrics.com>
Coremetrics, Inc.


Re: Tablespaces (was Re: [GENERAL] Physical Database

From
nolan@celery.tssi.com
Date:
> I thought Tablespaces were already implemented. Are they not?

Apparently not.  

A group has been formed to work on it, though.
--
Mike Nolan


Re: Tablespaces (was Re: [GENERAL] Physical Database

From
"scott.marlowe"
Date:
No, they're not.  There are some folks who have hacked on them in the 
past, but nothing's been committed.

On 27 Jun 2003, Austin Gonyou wrote:

> I thought Tablespaces were already implemented. Are they not?
> 
> 
> On Thu, 2003-06-26 at 22:10, Christopher Kings-Lynne wrote:
> > > ROTFL... the answer is no.  Feature freeze is Tuesday, people.  In
> > > practice, the time to start coding new stuff is already long past.
> > > Especially major new stuff.
> > >
> > > If you start now you might have something done for 7.5.
> > 
> > Can everyone who is interested in actually coding a tablespaces
> > implementation please email me, and I will create a list and we'll get
> > cracking.  Maybe we'll have to run our own cvs or maybe the Postgres guys
> > will be nice and do a branch for us :)
> > 
> > So far, Shridhar and Gavin seem interested??
> > 
> > Chris
> > 
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 



Re: Tablespaces (was Re: [GENERAL] Physical Database

From
Christopher Kings-Lynne
Date:
No, they're not.

Chris

On 27 Jun 2003, Austin Gonyou wrote:

> I thought Tablespaces were already implemented. Are they not?
>
>
> On Thu, 2003-06-26 at 22:10, Christopher Kings-Lynne wrote:
> > > ROTFL... the answer is no.  Feature freeze is Tuesday, people.  In
> > > practice, the time to start coding new stuff is already long past.
> > > Especially major new stuff.
> > >
> > > If you start now you might have something done for 7.5.
> >
> > Can everyone who is interested in actually coding a tablespaces
> > implementation please email me, and I will create a list and we'll get
> > cracking.  Maybe we'll have to run our own cvs or maybe the Postgres guys
> > will be nice and do a branch for us :)
> >
> > So far, Shridhar and Gavin seem interested??
> >
> > Chris
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> --
> Austin Gonyou <austin@coremetrics.com>
> Coremetrics, Inc.
>



Re: [GENERAL] Physical Database Configuration

From
Bruce Momjian
Date:
Tom Lane wrote:
> nolan@celery.tssi.com writes:
> > I disagree.  Just as you can have multiple schemas within one database
> > you can have multiple tablespaces within one database.  
> 
> > And the tablespace is irrelevant as far as specifying an object is concerned.
> > A fully qualified object would be: 
> >     database.schema.object,
> > not tablespace.database.schema.object or database.tablespace.schema.object.
> 
> Right, the tablespace structure is really orthogonal to the
> database/schema structure.
> 
> I would envision tablespaces as being named by database-cluster-wide
> names, just as users and groups are.  Any given table could be placed
> in any tablespace (although perhaps we want to invent some permission
> mechanism here).
> 
> Physically a tablespace is a directory with sub-directories for
> databases under it --- so $PGDATA/base plays the role of the default
> tablespace for a cluster.  (The reason you need per-database
> sub-directories is mostly to support DROP DATABASE, which has to be
> able to nuke a database without knowing exactly what's in it.)  But
> this structure doesn't have anything to do with the logical structure
> of the database cluster.

Another good reason for per-database directories under the tablespace is
to prevent directories from containing too many files.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: [GENERAL] Physical Database Configuration

From
Rod Taylor
Date:
> Another good reason for per-database directories under the tablespace is
> to prevent directories from containing too many files.

Actually, I would take that as an reason not to have database
directories.

If the number of files becomes a concern, we would need some kind of a
hashing algorithm to disperse them appropriately.  The database
directory would just get in the way without really adding anything. Is
1000 per files as a directory limit still the rule of thumb for a limit?
A single TB sized db would start to run into those types of limits.

Re: [GENERAL] Physical Database Configuration

From
Gavin Sherry
Date:
On Fri, 18 Jul 2003, Bruce Momjian wrote:

> Tom Lane wrote:
> > nolan@celery.tssi.com writes:
> > > I disagree.  Just as you can have multiple schemas within one database
> > > you can have multiple tablespaces within one database.  
> > 
> > > And the tablespace is irrelevant as far as specifying an object is concerned.
> > > A fully qualified object would be: 
> > >     database.schema.object,
> > > not tablespace.database.schema.object or database.tablespace.schema.object.
> > 
> > Right, the tablespace structure is really orthogonal to the
> > database/schema structure.
> > 
> > I would envision tablespaces as being named by database-cluster-wide
> > names, just as users and groups are.  Any given table could be placed
> > in any tablespace (although perhaps we want to invent some permission
> > mechanism here).
> > 
> > Physically a tablespace is a directory with sub-directories for
> > databases under it --- so $PGDATA/base plays the role of the default
> > tablespace for a cluster.  (The reason you need per-database
> > sub-directories is mostly to support DROP DATABASE, which has to be
> > able to nuke a database without knowing exactly what's in it.)  But
> > this structure doesn't have anything to do with the logical structure
> > of the database cluster.
> 
> Another good reason for per-database directories under the tablespace is
> to prevent directories from containing too many files.

I have a 3/4s finish implementation of table spaces. Per database
directories under the table space are required because CREATE DATABASE
just copies the data directory of the template database. If the databases
were in the same table space, they would need to share system catalogs
and it would also not be possible to create a database in a new table
space because the pg_class and pg_namespace would have references to the
table spaces used by the template database, not the new table space.

I will elaborate further when I have a near complete patch.

Thanks,

Gavin