Thread: How many Cluster database on a single server
Hi everybody, I googled a bit around and also checked the mailing lists but I still can't make an idea. We plan to use postgres 9 and the Cluster Database Replica. My colleagues are asking how many Cluster Databases (initdb) can I create and run on a single server. I mean, supposed my server has the resources, can I create 100 or even 200 Cluster Databases? Everyone with the right configuration and in respect of the requisites? Or the postgres architecture doesn't provide similar numbers? We are thinking to use the replica from near 200 databases around the internet on a single db server. Does anyone already did something like this? BTW, this is my first email to postgresql mailing list. If I'm doing something wrong do not hesitate to correct me :) Thanks Davo
On 10/19/11 2:46 AM, d.davolio@mastertraining.it wrote: > Hi everybody, > I googled a bit around and also checked the mailing lists but I still can't make an idea. We plan to use postgres 9 andthe Cluster Database Replica. > My colleagues are asking how many Cluster Databases (initdb) can I create and run on a single server. I mean, supposedmy server has the resources, can I create 100 or even 200 Cluster Databases? Everyone with the right configurationand in respect of the requisites? > Or the postgres architecture doesn't provide similar numbers? > We are thinking to use the replica from near 200 databases around the internet on a single db server. You don't need to do initdb on each one. Postgres can create many databases on a single server and manage them without difficulty. We currently operate about 300 databases on a single server. Most are small, and one is an aggregate of all the small ones. I believe there are sites that have >1000 separate databases on one server. Postgres has a slightly different concept of a "database" than Oracle or MySQL, which is why your question about initdb isslightly off. You can indeed create several separate instances of Postgres (separate initdb for each), but the only reasonyou ever need to do that is if you're running different versions of Postgres (like 8.4 and 9.0) simultaneously. Postgres runs into problems when the total number of objects (tables, views, sequences, ...) across all databases gets verylarge, where "very large" is ill defined but is somewhere between a few hundred thousand and a million. We once hada rogue process that created 5 million tables, and we had to completely abandon the installation because of some sortof N^2 phenomenon that made it impossible to even use pg_dump to save and restore the system. So the advice is, "don'tdo dumb stuff like that" and you should be able to manage many databases. Craig
Hi Craig, thanks for your reply. I think I need to add some details on my question, like why we would need more than one Cluster Database. We are thinking to use the Streaming Replica feature to keep in sync a number of little DB servers around the net. The replica should happen on one or more centralized servers. I didn't tested the replica personally bus as I can see, it syncs the whole Cluster DB. So, on the centralized server(s), we will have perfect copies of the Cluster Databases. We sure need to test this configuration but first of all I was wondering if there are known drawbacks. Thanks again. On 10/19/2011 03:54 PM, Craig James wrote: > On 10/19/11 2:46 AM, d.davolio@mastertraining.it wrote: >> Hi everybody, >> I googled a bit around and also checked the mailing lists but I still >> can't make an idea. We plan to use postgres 9 and the Cluster >> Database Replica. >> My colleagues are asking how many Cluster Databases (initdb) can I >> create and run on a single server. I mean, supposed my server has the >> resources, can I create 100 or even 200 Cluster Databases? Everyone >> with the right configuration and in respect of the requisites? >> Or the postgres architecture doesn't provide similar numbers? >> We are thinking to use the replica from near 200 databases around the >> internet on a single db server. > You don't need to do initdb on each one. Postgres can create many > databases on a single server and manage them without difficulty. > > We currently operate about 300 databases on a single server. Most are > small, and one is an aggregate of all the small ones. I believe there > are sites that have >1000 separate databases on one server. > > Postgres has a slightly different concept of a "database" than Oracle > or MySQL, which is why your question about initdb is slightly off. > You can indeed create several separate instances of Postgres (separate > initdb for each), but the only reason you ever need to do that is if > you're running different versions of Postgres (like 8.4 and 9.0) > simultaneously. > > Postgres runs into problems when the total number of objects (tables, > views, sequences, ...) across all databases gets very large, where > "very large" is ill defined but is somewhere between a few hundred > thousand and a million. We once had a rogue process that created 5 > million tables, and we had to completely abandon the installation > because of some sort of N^2 phenomenon that made it impossible to even > use pg_dump to save and restore the system. So the advice is, "don't > do dumb stuff like that" and you should be able to manage many databases. > > Craig > >
"d.davolio@mastertraining.it" <d.davolio@mastertraining.it> wrote: > We are thinking to use the Streaming Replica feature to keep in > sync a number of little DB servers around the net. The replica > should happen on one or more centralized servers. I didn't tested > the replica personally bus as I can see, it syncs the whole > Cluster DB. So, on the centralized server(s), we will have perfect > copies of the Cluster Databases. We sure need to test this > configuration but first of all I was wondering if there are known > drawbacks. We do something very much like this with about 100 standby database clusters on a single machine. We don't have any illusion that we could switch to one of these for a normal production load and have good performance with all of these competing for resources -- it's primarily to confirm that the PITR backup process is working and staying up to date, and to provide a quick source for a copy to a standby production server. The one thing I would strongly recommend is that you use a separate OS user as the owner of each cluster's data directory (and, of course, to run the cluster's service). We didn't initially do this, and had problems on recovery when the server crashed. If you search the archives you can probably dig up all the details on why this is an issue and why separate users is a good solution; but really, this is important. -Kevin
On Wed, Oct 19, 2011 at 9:02 AM, d.davolio@mastertraining.it <d.davolio@mastertraining.it> wrote: > Hi Craig, > thanks for your reply. I think I need to add some details on my question, > like why we would need more than one Cluster Database. We are thinking to > use the Streaming Replica feature to keep in sync a number of little DB > servers around the net. The replica should happen on one or more centralized > servers. I didn't tested the replica personally bus as I can see, it syncs > the whole Cluster DB. So, on the centralized server(s), we will have perfect > copies of the Cluster Databases. > We sure need to test this configuration but first of all I was wondering if > there are known drawbacks. The problem with having so many clusters on one machine is the shared memory that each one needs. Even with a relatively small shared memory segment of say 16MB, with 100 clusters you're going to be using 1600MB of memory on that machine for shared memory. You might be better off with one cluster and using slony to replicate just the parts that need replication.
On 10/19/2011 05:46 PM, d.davolio@mastertraining.it wrote: > My colleagues are asking how many Cluster Databases (initdb) can I > create and run on a single server. I mean, supposed my server has the > resources, can I create 100 or even 200 Cluster Databases? Yep. It won't be fast, but it'll work. You'll have two performance problems to deal with: - The memory, CPU and disk I/O overhead of all those extra postmasters, bgwriters, autovacuum daemons etc running for each cluster; and - having to split the available shared memory up between each cluster, so no single cluster gets very much shared memory to use for shared_buffers. If you keep your shared_buffers low, it should work just fine, but it won't perform as well as a single PostgreSQL cluster with lots of databases. In the future I'm hoping someone'll be enthusiastic enough to / need to add support split WAL logging or partial replication so this sort of thing isn't necessary. For now it does seem to be the best way to handle cases where different databases need different replication. -- Craig Ringer
Thanks everybody for the suggestions. Now I have a better idea on which direction proceed. When we will have some results we will post again to share. Thanks again! On 10/19/2011 05:13 PM, Kevin Grittner wrote: > "d.davolio@mastertraining.it"<d.davolio@mastertraining.it> wrote: > >> We are thinking to use the Streaming Replica feature to keep in >> sync a number of little DB servers around the net. The replica >> should happen on one or more centralized servers. I didn't tested >> the replica personally bus as I can see, it syncs the whole >> Cluster DB. So, on the centralized server(s), we will have perfect >> copies of the Cluster Databases. We sure need to test this >> configuration but first of all I was wondering if there are known >> drawbacks. > > We do something very much like this with about 100 standby database > clusters on a single machine. We don't have any illusion that we > could switch to one of these for a normal production load and have > good performance with all of these competing for resources -- it's > primarily to confirm that the PITR backup process is working and > staying up to date, and to provide a quick source for a copy to a > standby production server. > > The one thing I would strongly recommend is that you use a separate > OS user as the owner of each cluster's data directory (and, of > course, to run the cluster's service). We didn't initially do this, > and had problems on recovery when the server crashed. If you search > the archives you can probably dig up all the details on why this is > an issue and why separate users is a good solution; but really, this > is important. > > -Kevin