Re: Architecting a database - Mailing list pgsql-performance
| From | Kevin Grittner |
|---|---|
| Subject | Re: Architecting a database |
| Date | |
| Msg-id | 4C24E7570200002500032B5D@gw.wicourts.gov |
| In response to | Architecting a database (tony@exquisiteimages.com) |
| Responses | Re: Architecting a database; Re: Architecting a database |
| List | pgsql-performance |
<tony@exquisiteimages.com> wrote:

> With the dBase and ISAM tables I have a good idea of how to handle
> them since I have been working with them since dBASE originally
> came out.

Ah, someone with whom I can reminisce about CP/M and WordStar? :-)

> With the PostgreSQL type tables I am not so certain how the data
> is arranged within the one file. Does having the data all in one
> database allow PostgreSQL to better utilize indexes and caches or
> does having a number of smaller databases provide performance
> increases? In case it is important, there are 2000 clients
> involved, so that would be 2000 databases if I followed my current
> FoxPro related structure.

Well, there are many options here. You could have:

- one PostgreSQL cluster for each client,
- one database for each client (all in one cluster),
- one schema for each client (all in one database), or
- a client_id column in each table to segregate data.

The first would probably be a maintenance nightmare; it's just listed for completeness. The cluster is the level at which you start and stop the database engine, do real-time backups through the database transaction logging, etc. You probably don't want to do that individually for each of 2,000 clients, I'm assuming. Besides that, each cluster has its own memory cache, which would probably be a problem for you. (The caching issues go away for all the following options.)

The database is the level at which you can get a connection. You can see some cluster-level resources within all databases, like the list of databases and the list of users, but for the most part, each database is independent, even though they're running in the same executable engine. It would be relatively easy to keep the whole cluster (all databases) backed up (especially after 9.0 is released this summer), and you could have a cluster on another machine for standby, if desired. You are able to do dumps of individual databases, but only as snapshots of a moment in time or through external tools. It's hard to efficiently join data from a table in one database to a table in another.

A schema is a logical separation within a database. Table client1.account is a different table from client2.account. While a user can be limited to tables within a single schema, a user with rights to all the tables can join between them as needed. You could put common reference data in a public schema which all users could access in addition to their private schemas.

The implications of putting multiple clients in a table, with a client's rows identified by a client_id column, are probably fairly obvious. If many of those 2,000 clients have tables with millions of rows, performance could suffer without very careful indexing, managing tables with billions of rows can become challenging, and there could be concerns about how to ensure that data from one client isn't accidentally shown to another.

Hopefully that's enough to allow you to make a good choice. If any of that wasn't clear, please ask.

-Kevin
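To make the cross-database limitation above concrete: a join against a table in another database has to go through an external connection rather than the local planner. A minimal sketch using the contrib dblink module (assuming it is installed; the database and table names are hypothetical, not from the thread):

```sql
-- Pull rows from a table in another database over a dblink
-- connection; the remote rows arrive as an untyped record set and
-- must be given an explicit column list here.
SELECT a.name, r.name AS remote_name
  FROM account a
  JOIN dblink('dbname=client2_db',
              'SELECT account_id, name FROM account')
       AS r(account_id integer, name text)
    ON a.account_id = r.account_id;
```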
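A minimal sketch of the schema-per-client layout described in the message (all names here, client1, client1_user, account, state_code, are hypothetical):

```sql
-- One schema per client inside a single database; client1.account
-- and client2.account would be entirely separate tables.
CREATE SCHEMA client1;
CREATE TABLE client1.account (
    account_id integer PRIMARY KEY,
    name       text
);

-- Common reference data in the public schema, shared by all clients.
CREATE TABLE public.state_code (
    code char(2) PRIMARY KEY,
    name text
);

-- A per-client role limited to its own schema (plus public).
CREATE ROLE client1_user LOGIN;
GRANT USAGE ON SCHEMA client1 TO client1_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON client1.account TO client1_user;
ALTER ROLE client1_user SET search_path = client1, public;

-- A role with rights to every schema can still join across clients:
-- SELECT ... FROM client1.account a JOIN client2.account b ON (...);
```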
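And a minimal sketch of the client_id option (again with hypothetical names), illustrating the careful indexing the message warns about: leading the primary key with client_id keeps per-client lookups cheap even when the table grows to billions of rows.

```sql
-- All clients share one table; a client_id column segregates rows.
CREATE TABLE account (
    client_id  integer NOT NULL,
    account_id integer NOT NULL,
    name       text,
    PRIMARY KEY (client_id, account_id)
);

-- Every query must filter on client_id, both so the index is used
-- efficiently and so one client's data is never shown to another:
SELECT name
  FROM account
 WHERE client_id = 42
   AND account_id = 1001;
```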