Re: Architecting a database - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: Architecting a database
Date
Msg-id 4C24E7570200002500032B5D@gw.wicourts.gov
In response to Architecting a database  (tony@exquisiteimages.com)
Responses Re: Architecting a database
Re: Architecting a database
List pgsql-performance
<tony@exquisiteimages.com> wrote:

> With the dBase and ISAM tables I have a good idea of how to handle
> them since I have been working with them since dBASE originally
> came out.

Ah, someone with whom I can reminisce about CP/M and WordStar?  :-)

> With the PostgreSQL type tables I am not so certain how the data
> is arranged within the one file. Does having the data all in one
> database allow PostgreSQL to better utilize indexes and caches or
> does having a number of smaller databases provide performance
> increases? In case it is important, there are 2000 clients
> involved, so that would be 2000 databases if I followed my current
> FoxPro related structure.

Well, there are many options here.  You could have:
 - one PostgreSQL cluster for each client,
 - one database for each client (all in one cluster),
 - one schema for each client (all in one database), or
 - a client_id column in each table to segregate data.

The first would probably be a maintenance nightmare; it's just
listed for completeness.  The cluster is the level at which you
start and stop the database engine, do real-time backups through the
database transaction logging, etc.  You probably don't want to do
that individually for each of 2,000 clients, I'm assuming.  Besides
that, each cluster has its own memory cache, which would probably be
a problem for you.  (The caching issues go away for all the
following options.)

The database is the level at which you can get a connection.  You
can see some cluster-level resources within all databases, like the
list of databases and the list of users, but for the most part, each
database is independent, even though they're running in the same
executable engine.  It would be relatively easy to keep the whole
cluster (all databases) backed up (especially after 9.0 is released
this summer), and you could have a cluster on another machine for
standby, if desired.  You are able to do dumps of individual
databases, but only as snapshots of a moment in time or through
external tools.  It's hard to efficiently join data from a table in
one database to a table in another.
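Just to make that concrete, here is a minimal sketch (the database
names are invented, and the shell-level commands are shown as SQL
comments):

    -- one database per client, all in one cluster
    CREATE DATABASE client1;
    CREATE DATABASE client2;

    -- a connection targets exactly one database:
    --   psql -d client1
    -- and a dump is a point-in-time snapshot of one database:
    --   pg_dump client1 > client1.sql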

A schema is a logical separation within a database.  Table
client1.account is a different table from client2.account.  While a
user can be limited to tables within a single schema, a user with
rights to all the tables can join between them as needed.  You could
put common reference data in a public schema which all users could
access in addition to their private schemas.
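For example (schema, table, and role names here are just for
illustration):

    CREATE SCHEMA client1;
    CREATE SCHEMA client2;

    CREATE TABLE client1.account
      (account_id int PRIMARY KEY, balance numeric);
    CREATE TABLE client2.account
      (account_id int PRIMARY KEY, balance numeric);

    -- common reference data in the public schema, visible to all
    CREATE TABLE public.state_code
      (code char(2) PRIMARY KEY, name text);

    -- a role confined to a single client's schema
    CREATE ROLE client1_user LOGIN;
    GRANT USAGE ON SCHEMA client1 TO client1_user;
    GRANT SELECT ON client1.account TO client1_user;

    -- while a role with rights to both schemas can join across them
    SELECT a1.account_id
      FROM client1.account a1
      JOIN client2.account a2 USING (account_id);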

The implications of putting multiple clients in a table, with a
client's rows identified by a client_id column, are probably fairly
obvious.  If many of those 2,000 clients have tables with millions
of rows, performance could suffer without very careful indexing;
managing tables with billions of rows can become challenging; and
there could be concerns about how to ensure that data from one
client isn't accidentally shown to another.
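A rough sketch of that layout (again, with invented names), keeping
client_id as the leading column of the primary key so per-client
lookups stay index-friendly:

    CREATE TABLE account (
      client_id  int NOT NULL,
      account_id int NOT NULL,
      balance    numeric,
      PRIMARY KEY (client_id, account_id)
    );

    -- every query must remember to filter by client ...
    SELECT balance FROM account
     WHERE client_id = 42 AND account_id = 7;

    -- ... so one safeguard is to expose each client's rows through
    -- a view rather than the base table:
    CREATE VIEW client42_account AS
      SELECT account_id, balance FROM account WHERE client_id = 42;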

Hopefully that's enough to allow you to make a good choice.  If any
of that wasn't clear, please ask.

-Kevin
