Re: One huge db vs many small dbs - Mailing list pgsql-performance
From | Oliver Seemann |
---|---|
Subject | Re: One huge db vs many small dbs |
Date | |
Msg-id | 20131207124318.GA11696@server Whole thread Raw |
In response to | One huge db vs many small dbs (Max <maxabbr@yahoo.com.br>) |
List | pgsql-performance |
On Thu, Dec 05, 2013 at 02:42:10AM -0800, Max wrote: > Hello, > > We are starting a new project to deploy a solution in cloud with the > possibility to be used for 2.000+ clients. Each of this clients will use > several tables to store their information (our model has about 500+ > tables but there's less than 100 core table with heavy use). Also the > projected ammout of information per client could be from small (few > hundreds tuples/MB) to huge (few millions tuples/GB). > > One of the many questions we have is about performance of the db if we > work with only one (using a ClientID to separete de clients info) or > thousands of separate dbs. The management of the dbs is not a huge > concert as we have an automated tool. > > At Google there's lots of cases about this subject but none have a > scenario that matchs with the one I presented above, so I would like to > know if anyone here has a similar situation or knowledgement and could > share some thoughts. We have made very good experiences with putting each client into its own database. We have a few thousand dbs now on 5 machines (each 1TB capacity) where each client/db is between 100MB and 100GB of data. As Josh said you have to consider the db overhead. If you have only a few MBs of data per client it might not be worth it. (An empty DB shows up with 6MB size in psql \l+). The good thing with a db per client is you can easily scale horizontically by just adding machines. We have between 100 and 1000 dbs per machine, depending on client size. There's no real limit on growth regarding client numbers, we can just always add more machines. We can also easily move clients between machines with pg_dump piped into pg_restore. I would not advise using one schema per client, because then you lose the ability to really use schemas within each client 'namespace'. Afaik schemas cannot be stacked in Postgres. Schemas are very helpful to seperate different applications or to implement versioning for complex views/functions, so let's not waste them for partitioning. Further things we learned: - "CREATE DATABASE foo TEMPLATE bar" is a nice way to cleanly create a new partition/client based on a template database. - On a very busy server (I/O wise) CREATE DATABASE can take a while to complete, due to the enforced CHECKPOINT when creating a new DB. We worked around this by creating empty dbs from the template beforehand, allocating (renaming) them on demand and periodically restocking those spare dbs. - pg_dump/pg_restore on individual client dbs is a neat way to implement backup/restore. It allows you to backup all clients sequentially as well as concurrently (especially from multiple machines) depending on your requirements. - When partitioning into databases it's not trivial to reference data in other databases. E.g. you can't have foreign keys to your main db (where you track all clients and their dbs). This could probably be worked around with dblink / foreign data wrappers if necessary. - We just completed painless migration from 9.0 to 9.3 simply by installing 9.3 next to 9.0 on all machines and selectively migrating individual client dbs with pg_dump | pg_restore over a period of 6 weeks. (We did not notice any problems btw). - Queries over the data of all clients (e.g. for internal monitoring or statistics) naturally take a while as you'll have to connect to all individual dbs and then manually aggregate the result from each one. - Schema changes are not trivial as you need to develop tools to apply them to all client dbs and template dbs (in our case). It gets tricky when the process in interrupted and there are race conditions when new dbs are created in the process that you need to protect against. - Deleting all data of an individual client is a simple as dropping the db. Hope that helps. Oliver
pgsql-performance by date: