Thread: Partitioning vs. Split Databases - performance?
I'm breaking up a database into several sets of data with a similar layout. (We currently have multiple customers using a single database and table set; we're splitting it out to give us more "wiggle room.")

It seems that there are basically two ways to proceed:

1) Copy out the data specific to a customer and load it into a separate database for that customer, or

2) Copy out the data specific to a customer and load it into separate tables (with slightly different names, e.g. table "dates" becomes "cust1_dates"), and use data partitioning to help with performance as needed.

Given the same physical hardware, which one is likely to perform better? Does it make any difference? Does using separate databases use more RAM than a single database with a bunch of different tables?

The company is growing rapidly, so growth room is important...

-Ben
--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978
On Thu, 2006-12-21 at 11:10 -0800, Benjamin Smith wrote:
> It seems that there are basically two ways to proceed:
>
> 1) Copy out the data specific to a customer and load it into a separate database for that customer, or
>
> 2) Copy out the data specific to a customer and load it into separate tables (with slightly different names, e.g. table "dates" becomes "cust1_dates"), and use data partitioning to help with performance as needed.

3) Put each customer in their own schema/namespace, which resides within its own tablespace. Then you can move customers wherever you need in terms of IO.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
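(For illustration, option 3 might look something like the following on 8.x; the tablespace path, schema, and table names below are made up.)

CREATE TABLESPACE array1 LOCATION '/mnt/array1/pgdata';
CREATE SCHEMA cust1;

-- Place this customer's tables on that array:
CREATE TABLE cust1.dates (
    id  serial PRIMARY KEY,
    d   date NOT NULL
) TABLESPACE array1;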
On 12/21/06 13:17, Joshua D. Drake wrote:
> On Thu, 2006-12-21 at 11:10 -0800, Benjamin Smith wrote:
>> 1) Copy out the data specific to a customer and load it into a separate database for that customer, or

This gives you linear growth potential: if your current box gets over-utilized, buy a second box and move some of the databases to it.

>> 2) Copy out the data specific to a customer and load it into separate tables (with slightly different names, e.g. table "dates" becomes "cust1_dates"), and use data partitioning to help with performance as needed.

Definitely *not* scalable. And *very* messy. Yech.

> 3) Put each customer in their own schema/namespace, which resides within its own tablespace.
>
> Then you can move customers wherever you need in terms of IO.

Splitting as described in these three tactics means that you've now got twice as many tables. Add more customers and you've got that many more tables - a perfect candidate for "schema drift". If each table has cust_id in it, then you could:

4) Retain one database and partition each table on cust_id.

http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

>> Given the same physical hardware, which one is likely to perform better? Does it make any difference? Does using separate databases use more RAM than a single database with a bunch of different tables?

Config files are global, so I doubt it.

>> Company is growing rapidly, so growth room is important...

Then go for Option 1.

--
Ron Johnson, Jr.
Jefferson LA USA
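(A minimal sketch of option 4, using the 8.1-style inheritance partitioning described at the link above; the table name and customer IDs are illustrative.)

-- Parent table holds no rows directly; one child table per customer:
CREATE TABLE dates (
    cust_id integer NOT NULL,
    d       date    NOT NULL
);
CREATE TABLE dates_cust1 (CHECK (cust_id = 1)) INHERITS (dates);
CREATE TABLE dates_cust2 (CHECK (cust_id = 2)) INHERITS (dates);

-- With constraint exclusion on, the planner skips partitions whose
-- CHECK constraint rules them out for a given query:
SET constraint_exclusion = on;
SELECT * FROM dates WHERE cust_id = 1;  -- scans only dates_cust1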
On 12/21/06, Ron Johnson <ron.l.johnson@cox.net> wrote:
> >> Given the same physical hardware, which one is likely to perform better? Does
> >> it make any difference? Does using separate databases use more RAM than a
> >> single database with a bunch of different tables?
>
> Config files are global, so I doubt it.

If it's a web app with persistent connections, then splitting onto several databases may consume more RAM. Example: 100 Apache clients connected to 3 databases create 300 forked PostgreSQL server processes, whereas 100 Apache clients connected to the same DB using three schemas take only 100.

-- Vlad
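(One way to observe the effect Vlad describes is to count backends per database. The pg_stat_activity view is standard in 8.x, assuming the stats collector is enabled.)

SELECT datname, count(*) AS backends
FROM pg_stat_activity
GROUP BY datname;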
On Thu, 21 Dec 2006, Vlad wrote:
> If it's a web app with persistent connections, then splitting onto several databases may consume more RAM. Example: 100 Apache clients connected to 3 databases create 300 forked PostgreSQL server processes, whereas 100 Apache clients connected to the same DB using three schemas take only 100.

Using something like pgpool between the web servers and the DB should help with that scaling problem...

- Marc
I will second that as well: if you have a lot of data, go for table partitioning too, but I would not recommend dividing into separate databases.
-----------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
On Thursday 21 December 2006 11:47, Ron Johnson wrote:
> This gives you linear growth potential: if your current box gets over-utilized, buy a second box and move some of the databases to it.

So far, I'm inclined to go this way, due to the option for linear scaling.

> > 2) Copy out the data specific to a customer and load it into separate tables (with slightly different names, e.g. table "dates" becomes "cust1_dates"), and use data partitioning to help with performance as needed.
>
> Definitely *not* scalable. And *very* messy. Yech.

Scales better than the present setup, methinks, but still not the best idea. I'd have to revisit all my queries to make sure that they use the correct table name.

> > 3) Put each customer in their own schema/namespace, which resides within its own tablespace.
> >
> > Then you can move customers wherever you need in terms of IO.

How is that functionally different from using a separate database? What's the advantage here? I don't *need* to restrict myself to one database, and doing this does require that I revisit 100% of the SQL queries to make sure that I'm referencing the right schema. This solution seems to have the same problems as using dynamic table names.

> If each table has cust_id in it, then you could:
>
> 4) Retain one database and partition each table on cust_id.
>
> http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

Intriguing idea, and one that I might do in the future. However, I have 170 normalized tables defined at present, and the task of auditing each of them for the constraint ranges sounds somewhat nightmarish. Not all the tables have the cust_id column defined - in some it is implicit.

> > Company is growing rapidly, so growth room is important...
>
> Then go for Option 1.

My conclusion, too. Another poster mentioned schema drift, and that's a real concern, but we're already updating the schema through a script which could be extended to update all databases, not just one, so I feel this problem would be minor to non-existent.

Thanks!

-Ben
> How is that functionally different from using a separate database? What's the advantage here? I don't *need* to restrict myself to one database, and doing this does require that I revisit 100% of the SQL queries to make sure that I'm referencing the right schema.
>
> This solution seems to have the same problems as using dynamic table names.

Not really. You should read up on schemas and how they work. Plus, the addition of schemas and tablespaces means you can get infinite scaling within the confines of your hardware itself.

Sincerely,

Joshua D. Drake
On 12/21/06 16:41, Joshua D. Drake wrote:
> Not really. You should read up on schemas and how they work. Plus, the addition of schemas and tablespaces means you can get infinite scaling within the confines of your hardware itself.

"Infinite scaling within the confines of your hardware"!

How is that accomplished?

--
Ron Johnson, Jr.
Jefferson LA USA
On Thu, 2006-12-21 at 16:54 -0600, Ron Johnson wrote:
> "Infinite scaling within the confines of your hardware"!
>
> How is that accomplished?

Well, with a tablespace you can easily place an entire schema on its own array - say, a nice little HP MSA30. If that array gets tired, you can add another array and move a couple of schemas off to it.

Also, if you spec the hardware correctly, you can get up to a total of 16 cores without any significant cost... just add CPUs as needed. PostgreSQL 8.1 does extremely well up to 8 CPUs (8 cores). 8.2 is said to perform better on more than 8 cores, but I have not tested it.

Sincerely,

Joshua D. Drake
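(There is no single command in 8.x to relocate a whole schema, so "moving a couple of schemas off to another array" would be done object by object, roughly as below with made-up names. Note that SET TABLESPACE rewrites the object and holds an exclusive lock while it runs.)

CREATE TABLESPACE array2 LOCATION '/mnt/array2/pgdata';

-- Repeat for each table and index in the customer's schema:
ALTER TABLE cust1.accounts SET TABLESPACE array2;
ALTER INDEX cust1.accounts_pkey SET TABLESPACE array2;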
On Thursday 21 December 2006 14:41, Joshua D. Drake wrote:
> You should read up on schemas and how they work. Plus, the addition of schemas and tablespaces means you can get infinite scaling within the confines of your hardware itself.

Ok, so I'd like you to correct me if I'm wrong:

1) Schemas operate within a database. A schema is analogized to a filesystem directory in the docs, except that you can't recurse schemas.

2) A database runs on one machine, with the following addenda:

A) Slony lets you copy that database to another system,

B) pgtool lets you duplicate the database if you're real careful about updating with unique IDs and aggregate functions,

C) you can essentially do cross-machine RAID, so that if your primary DB gets hosed, you can fire up the backup machine and continue working,

D) pg-cluster, a synchronous clustering solution, appears to be new/unstable, doesn't appear to be current with 8.x, and takes a performance hit on writes.

3) Thus, any service that splits up a database (e.g. a schema) is subject to all the limitations outlined in #2.

Did I miss anything?
On 12/21/06 17:15, Joshua D. Drake wrote:
> Well, with a tablespace you can easily place an entire schema on its own array. If that array gets tired, you can add another array and move a couple of schemas off to it.
>
> Also, if you spec the hardware correctly, you can get up to a total of 16 cores without any significant cost... just add CPUs as needed.

Sure, but *infinite*? Or were you exercising hyperbole?

With One Big Database, you can get a SAN and attach a whole lot of disk space, but your mobo will only accept a certain number of DIMMs and processors of certain designs. And when your growing mega-database maxes out your h/w, you're stuck.

--
Ron Johnson, Jr.
Jefferson LA USA
On Thu, 21 Dec 2006, Benjamin Smith wrote:
> I'm breaking up a database into several sets of data with a similar layout. [...]

Depending on what kind of company you have, you may (or may not) want to consider how you will upgrade the schema over time. Might it be the case that some customers will need a different schema than others? If so, simple data partitioning by customer id probably won't be the best option.
On Thu, 2006-12-21 at 15:22 -0800, Benjamin Smith wrote:
> 1) Schemas operate within a database. A schema is analogized to a filesystem directory in the docs, except that you can't recurse schemas.

Kind of... think of a schema as a database that doesn't need a new connection :)

catalog/cluster -> [n] databases -> [n] schemas (of which "public" is the default) -> [n] objects (tables, functions, views, etc.)

By default, your schema is public. You can create a new schema called customer_one. Within public you could have the table public.accounts, and within customer_one you could have customer_one.accounts. They are isolated and contain separate data sets. You can select the schema based on the user connecting, or as part of your queries with search_path. E.g.:

set search_path = customer_one;
select * from accounts;

will only grab customer_one.accounts, thus giving virtual multiple-database access without multiple databases.

> 2) A database runs on one machine, with the following addenda:
> A) Slony lets you copy that database to another system,

Replicate, not copy (being pedantic, but for good reason).

> B) pgtool lets you duplicate the database if you're real careful about updating with unique IDs and aggregate functions,

You mean pgpool? Yeah, but don't use it for that; use Slony or Replicator instead.

> C) you can essentially do cross-machine RAID, so that if your primary DB gets hosed, you can fire up the backup machine and continue working,

With Slony, Replicator or PITR, yes.

> D) pg-cluster, a synchronous clustering solution, appears to be new/unstable, doesn't appear to be current with 8.x, and takes a performance hit on writes.

Yes.

> 3) Thus, any service that splits up a database (e.g. a schema) is subject to all the limitations outlined in #2.

Well, that, and subject to your app understanding what to do, and when.

Joshua D. Drake
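(To avoid editing every query to add a schema prefix, the search_path can instead be attached to the role each customer's app connects as. A sketch, with hypothetical role and schema names:)

CREATE SCHEMA customer_one;
CREATE USER cust1_app PASSWORD 'secret';
ALTER USER cust1_app SET search_path = customer_one, public;

-- Whenever cust1_app connects, unqualified names resolve in
-- customer_one first:
--   SELECT * FROM accounts;   -- reads customer_one.accounts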
> >> Also, if you spec the hardware correctly, you can get up to a total of 16 cores without any significant cost... just add CPUs as needed.
> >>
> >> PostgreSQL 8.1 does extremely well up to 8 CPUs (8 cores). 8.2 is said to perform better on more than 8 cores, but I have not tested it.
>
> Sure, but *infinite*? Or were you exercising hyperbole?

I did say infinite within the *confines* of the hardware :)

> With One Big Database, you can get a SAN and attach a whole lot of disk space, but your mobo will only accept a certain number of DIMMs and processors of certain designs. And when your growing mega-database maxes out your h/w, you're stuck.

Define mega... because you would need to be in the multi-terabyte range.

Joshua D. Drake
On 12/21/06 18:12, Joshua D. Drake wrote:
> I did say infinite within the *confines* of the hardware :)

:)

> Define mega... because you would need to be in the multi-terabyte range.

I'm thinking more of RAM and CPU.

--
Ron Johnson, Jr.
Jefferson LA USA
> >> With One Big Database, you can get a SAN and attach a whole lot of disk space, but your mobo will only accept a certain number of DIMMs and processors of certain designs. And when your growing mega-database maxes out your h/w, you're stuck.
>
> I'm thinking more of RAM and CPU.

32GB is an awful lot of RAM... as is 8 cores. You can get 16-core machines now that will take 64GB.

Joshua D. Drake
On 12/21/06 23:44, Joshua D. Drake wrote:
> 32GB is an awful lot of RAM... as is 8 cores. You can get 16-core machines now that will take 64GB.

Sure, you *can*. A Sun Fire X4600 (max 16 cores) takes up to 128GB RAM. I'll bet my house, though, that the OP doesn't have a maxed-out X4600. (BTW, maxed out, with 2 U320 adapters and 6 4Gb HBAs, it's only $75,000!)

Please tell me, though: what's so controversial about saying that CPU and RAM resources are finite?

Besides, since pg_dump is single-threaded, backing up a huge database gets impossible. Federating the database allows multiple pg_dumps to simultaneously dump data to multiple tape drives.

--
Ron Johnson, Jr.
Jefferson LA USA
Ron Johnson <ron.l.johnson@cox.net> writes:
> Besides, since pg_dump is single-threaded, backing up a huge database gets impossible. Federating the database allows multiple pg_dumps to simultaneously dump data to multiple tape drives.

... as long as you don't care about having a self-consistent dump ...

regards, tom lane
On 12/22/06 01:22, Tom Lane wrote:
> ... as long as you don't care about having a self-consistent dump ...

Nothing's perfect. A pg_backup that writes multiple simultaneous data streams that are, taken as a unit, self-consistent would be très useful for Very Large databases.

--
Ron Johnson, Jr.
Jefferson LA USA
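(Tom's point is that each pg_dump process takes its own snapshot, so parallel dumps are only consistent per slice. Within one database, a single serializable transaction in 8.x sees one snapshot across all tables, which is why a self-consistent export is tied to a single session and stream. A rough sketch - COPY to a server-side file requires superuser, and the table names and paths are made up:)

BEGIN ISOLATION LEVEL SERIALIZABLE;
COPY cust1.accounts TO '/backup/cust1_accounts.copy';
COPY cust1.dates    TO '/backup/cust1_dates.copy';
COMMIT;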
Ben,

On Thu, 2006-12-21 at 20:10, Benjamin Smith wrote:
> I'm breaking up a database into several sets of data with a similar layout. (We currently have multiple customers using a single database and table set; we're splitting it out to give us more "wiggle room.")

We have a very similar situation here. We started out with one schema containing all customers, keyed on a customerid.

> 1) Copy out the data specific to a customer and load it into a separate database for that customer, or

We went with 1, combined with the original solution we had: i.e., we keep the same schema for customers, and if the number of customers, or the size of some of them, grows too much, we move them out to another machine. If a customer needs to be isolated from the rest, we move them to another DB, possibly on the same machine... but the original schema is still there, allowing us to keep multiple small customers in the same DB.

Separate DBs on the same box for each customer would be a maintenance nightmare, because we have a (sizable) cluster of application boxes connecting, and all of them must be configured with connection data for all the databases... so we must keep the number of databases to a minimum. Not to mention that we have a connection pool per DB, and the number of connections would grow too much if we had one connection pool per customer.

Grouping customers according to their business status allows us to upgrade the schema separately for each DB, so we have some choice about when to upgrade different customers to the next version of the application. Usually we make the DB changes in such a way that multiple versions of the software can work with the upgraded schema, so we can upgrade customers in the same DB independently, but this IS a concern when you have the same schema for all customers (we do get occasional pressure from some customers to upgrade sooner than we would like).

None of this excludes partitioning the biggest tables in your schema per customerid, or whatever else you can use to partition them (our biggest table does not have customerid as a column, but it could still be partitioned by something else).

Cheers,

Csaba.
At 08:12 AM 12/22/2006, Joshua D. Drake wrote:
> > With One Big Database, you can get a SAN and attach a whole lot of disk space, but your mobo will only accept a certain number of DIMMs and processors of certain designs. And when your growing mega-database maxes out your h/w, you're stuck.
>
> Define mega... because you would need to be in the multi-terabyte range.

Why multi-terabyte? All that needs to happen is for the hardware to run out of I/O. Nowadays, with the sizes of disks, you can run out of I/O way before you run out of space. It could also start to take way too long to backup/restore the entire database.

If your app lends itself to horizontal scaling and you think you will need to scale to more than, say, 5X, it's better to scale horizontally than vertically (getting a bigger box).

Has clustering technology advanced to the point where making a "bigger box" can be done easily and cheaply with just many small boxes? I've seen stuff like OpenSSI etc., but how well does PostgreSQL run on such stuff? Shared memory is usually slow/problematic on such systems.

Regards,

Link.
I haven't seen anyone using OpenSSI with PostgreSQL yet, but since it runs with MySQL and Oracle, it may well work with PostgreSQL too.
Just read the following --> http://wiki.openssi.org/go/FAQ#Does_OpenSSI_support_the_PostgreSQL_database.3F
If anyone has tried PostgreSQL on OpenSSI, that would be good info to have.