Re: One huge db vs many small dbs - Mailing list pgsql-performance

From Oliver Seemann
Subject Re: One huge db vs many small dbs
Date
Msg-id 20131207124318.GA11696@server
Whole thread Raw
In response to One huge db vs many small dbs  (Max <maxabbr@yahoo.com.br>)
List pgsql-performance
On Thu, Dec 05, 2013 at 02:42:10AM -0800, Max wrote:
> Hello,
>
> We are starting a new project to deploy a solution in cloud with the
> possibility to be used for 2.000+ clients. Each of this clients will use
> several tables to store their information (our model has about 500+
> tables but there's less than 100 core table with heavy use). Also the
> projected ammout of information per client could be from small (few
> hundreds tuples/MB) to huge (few millions tuples/GB).
>
> One of the many questions we have is about performance of the db if we
> work with only one (using a ClientID to separete de clients info) or
> thousands of separate dbs. The management of the dbs is not a huge
> concert as we have an automated tool.
>
> At Google there's lots of cases about this subject but none have a
> scenario that matchs with the one I presented above, so I would like to
> know if anyone here has a similar situation or knowledgement and could
> share some thoughts.


We have made very good experiences with putting each client into its own
database. We have a few thousand dbs now on 5 machines (each 1TB capacity)
where each client/db is between 100MB and 100GB of data.
As Josh said you have to consider the db overhead. If you have only a few
MBs of data per client it might not be worth it. (An empty DB shows up with
6MB size in psql \l+).

The good thing with a db per client is you can easily scale horizontically
by just adding machines. We have between 100 and 1000 dbs per machine,
depending on client size. There's no real limit on growth regarding client
numbers, we can just always add more machines. We can also easily move
clients between machines with pg_dump piped into pg_restore.

I would not advise using one schema per client, because then you lose the
ability to really use schemas within each client 'namespace'. Afaik schemas
cannot be stacked in Postgres. Schemas are very helpful to seperate
different applications or to implement versioning for complex
views/functions, so let's not waste them for partitioning.

Further things we learned:

- "CREATE DATABASE foo TEMPLATE bar" is a nice way to cleanly create a new
partition/client based on a template database.

- On a very busy server (I/O wise) CREATE DATABASE can take a while to
complete, due to the enforced CHECKPOINT when creating a new DB. We worked
around this by creating empty dbs from the template beforehand, allocating
(renaming) them on demand and periodically restocking those spare dbs.

- pg_dump/pg_restore on individual client dbs is a neat way to implement
backup/restore. It allows you to backup all clients sequentially as well as
concurrently (especially from multiple machines) depending on your
requirements.

- When partitioning into databases it's not trivial to reference data in
other databases. E.g. you can't have foreign keys to your main db (where
you track all clients and their dbs). This could probably be worked around
with dblink / foreign data wrappers if necessary.

- We just completed painless migration from 9.0 to 9.3 simply by installing
9.3 next to 9.0 on all machines and selectively migrating individual client
dbs with pg_dump | pg_restore over a period of 6 weeks. (We did not notice
any problems btw).

- Queries over the data of all clients (e.g. for internal monitoring or
statistics) naturally take a while as you'll have to connect to all
individual dbs and then manually aggregate the result from each one.

- Schema changes are not trivial as you need to develop tools to apply them
to all client dbs and template dbs (in our case). It gets tricky when the
process in interrupted and there are race conditions when new dbs are
created in the process that you need to protect against.

- Deleting all data of an individual client is a simple as dropping the db.


Hope that helps.

Oliver


pgsql-performance by date:

Previous
From: chidamparam muthusamy
Date:
Subject: Re: postgres performance
Next
From: desmodemone
Date:
Subject: Re: postgres performance