RE: Question about databases in alternate locations... - Mailing list pgsql-general

From Culberson, Philip
Subject RE: Question about databases in alternate locations...
Date
Msg-id A95EFC3B707BD311986C00A0C9E95B6A01AB1CB9@datmail03.dat.com
Whole thread Raw
In response to Question about databases in alternate locations...  (Richard J Kuhns <rjk@grauel.com>)
List pgsql-general
I've worked with various versions of Oracle for several years and can share
some of my experiences with their "system catalog" implementation.

They use a fairly simple design in which a database instance consists of 1
.. n tablespaces (that can contain any type of database object) which in
turn consists of 1 .. n datafiles.  There is a system table which
essentially holds the name of the physical file (full path), file_id,
tablespace it belongs to, and sizing information.  Our database (110 Gig) is
split up into 24 tablespaces and these are further split into 73 datafiles.

Moving the physical location of a datafile is fairly straight forward.
  1) ALTER TABLESPACE foo OFFLINE
  2) Move the physical file using OS command
  3) ALTER TABLESPACE foo RENAME DATAFILE '/old/file' TO '/new/file'
  4) ALTER TABLESPACE foo ONLINE

Most of our datafiles run about 2 Gig, so the longest part of this is
actually doing the move.

One headache is if you want to completely change the locations of ALL your
files.  This involves editing all of the paths and is definitely prone to
error.  This is also where you rabidly curse the DBA who decided to have
path names that are 140 characters long!

A second headache is moving databases from one server to another.  You are
then forced into having the exact same file structure on the second machine.
This can be somewhat amusing if you have a different hardware configuration
which doesn't have the same number of disks, etc.

This second problem is further complicated by some of the backup solutions
available for Oracle. The one that we have uses the system catalog to locate
and backup the appropriate files. This again means that if you want to
restore the backup to another server, it must be configured exactly as the
first.

I think that Thomas' fear of having thousands of entries in the system
catalog for datafiles is alleviated in the Oracle implementation by the use
of the tablespace.  Tablespaces can contain any number of database objects,
so by using a reasonable tablespace layout, one can keep the number of
actual datafiles to a manageable level.

One thing that has been definitely useful is the ability to do load
balancing based on what tablespaces are "hot".  Our system is somewhat of a
cross between OLTP and a data warehouse (don't get me started) so the data
becomes pretty static after, say, about 30 days.  By monitoring which
datafiles are being accessed the most, they can be moved to different
locations on the storage array to avoid contention and maximize throughput.

My first reaction to the suggestion of a pg_location like table was "ARGH,
NO!", but after nursing my sprained back from that violent knee jerk
reaction and actually thinking about it, I talked myself into thinking it'd
probably be a good idea.  If we had our online system built on top of
Postgres, we would need a filesystem with 110+ Gig of disk space and there
would be roughly 3,500 files in its single data directory.  Having the
ability to organize tables, indices, etc into tablespaces, and then
distributing the datafiles in some quasi intelligent fashion is truly pretty
powerful.

Phil Culberson

pgsql-general by date:

Previous
From: Michael Ma
Date:
Subject: How to stop syslogd@mylinux displaying message?
Next
From: "Richard J. Kuhns"
Date:
Subject: [HACKERS] Re: Question about databases in alternate locations...