Thread: Partitioning Vs. Split Databases - performance?

Partitioning Vs. Split Databases - performance?

From
Benjamin Smith
Date:
I'm breaking up a database into several sets of data with similar layout. (we
currently have multiple customers using a single database and tableset, we're
splitting it out to give us more "wiggle room")

It seems that there are basically two ways to proceed:

1) Copy out the data specific to a customer and load into a separate database
for that customer, or

2) Copy out the data specific to a customer and load into separate tables
(with slightly different names, EG table "dates" becomes "cust1_dates") and
use data partitioning to help with performance as needed.

Given the same physical hardware, which one is likely to perform better? Does
it make any difference? Does using separate databases use more RAM than a
single database with a bunch of different tables?

Company is growing rapidly, so growth room is important...

-Ben
--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

Re: Partitioning Vs. Split Databases - performance?

From
"Joshua D. Drake"
Date:
On Thu, 2006-12-21 at 11:10 -0800, Benjamin Smith wrote:
> I'm breaking up a database into several sets of data with similar layout. (we
> currently have multiple customers using a single database and tableset, we're
> splitting it out to give us more "wiggle room")
>
> It seems that there are basically two ways to proceed:
>
> 1) Copy out the data specific to a customer and load into a separate database
> for that customer, or
>
> 2) Copy out the data specific to a customer and load into separate tables
> (with slightly different names, EG table "dates" becomes "cust1_dates") and
> use data partitioning to help with performance as needed.


3) Put each customer in their own schema/namespace which resides within
its own table space.

Then you can move customers wherever you need in terms of IO.

Joshua D. Drake


>
> Given the same physical hardware, which one is likely to perform better? Does
> it make any difference? Does using separate databases use more RAM than a
> single database with a bunch of different tables?
>
> Company is growing rapidly, so growth room is important...
>
> -Ben
--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: Partitioning Vs. Split Databases - performance?

From
Ron Johnson
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/21/06 13:17, Joshua D. Drake wrote:
> On Thu, 2006-12-21 at 11:10 -0800, Benjamin Smith wrote:
>> I'm breaking up a database into several sets of data with similar layout. (we
>> currently have multiple customers using a single database and tableset, we're
>> splitting it out to give us more "wiggle room")
>>
>> It seems that there are basically two ways to proceed:
>>
>> 1) Copy out the data specific to a customer and load into a separate database
>> for that customer, or

This gives you linear growth potential, since if your current box
gets over-utilized, buy a 2nd box and move some of the databases to it.

>> 2) Copy out the data specific to a customer and load into separate tables
>> (with slightly different names, EG table "dates" becomes "cust1_dates") and
>> use data partitioning to help with performance as needed.

Definitely *not* scalable.  And *very* messy.  Yech.

> 3) Put each customer in their own schema/namespace which resides within
> its own table space.
>
> Then you can move customers wherever you need in terms of IO.

Splitting like mentioned in these three tactics means that you've
now got 2x as many tables.  Add more customers and you've got that
many more tables.  Perfect candidate for "schema drift".

If each table has cust_id in it, then you could:

4) retain 1 database and partition each table on cust_id.

http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

>> Given the same physical hardware, which one is likely to perform better? Does
>> it make any difference? Does using separate databases use more RAM than a
>> single database with a bunch of different tables?

Config files are global, so I doubt it.

>> Company is growing rapidly, so growth room is important...

Then go for Option 1.


- --
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFiuTPS9HxQb37XmcRAoF/AJ0ZUcj9C3Bwn7II0hfFzFrZjzA2wQCg6pNS
Tbmm4Rr8uluu/hjZ5gqrT9s=
=lkm+
-----END PGP SIGNATURE-----

Re: Partitioning Vs. Split Databases - performance?

From
Vlad
Date:
On 12/21/06, Ron Johnson <ron.l.johnson@cox.net> wrote:

>> Given the same physical hardware, which one is likely to perform better? Does
>> it make any difference? Does using separate databases use more RAM than a
>> single database with a bunch of different tables?

Config files are global, so I doubt it.

if it's a web app with persistent connections, then splitting onto several databases may consume more RAM. Example: 100 apache clients connected to 3 databases creates 300 forked postmaster processes ; vs 100 apache clients connected to the same DB using three schemas only takes 100 postmasters

--  Vlad

Re: Partitioning Vs. Split Databases - performance?

From
Marc Evans
Date:
On Thu, 21 Dec 2006, Vlad wrote:

> On 12/21/06, Ron Johnson <ron.l.johnson@cox.net> wrote:
>>
>>
>> >> Given the same physical hardware, which one is likely to perform
>> better? Does
>> >> it make any difference? Does using separate databases use more RAM than
>> a
>> >> single database with a bunch of different tables?
>>
>> Config files are global, so I doubt it.
>>
>
> if it's a web app with persistent connections, then splitting onto several
> databases may consume more RAM. Example: 100 apache clients connected to 3
> databases creates 300 forked postmaster processes ; vs 100 apache clients
> connected to the same DB using three schemas only takes 100 postmasters
>
> --  Vlad

Using something like pgpool between the web servers and the DB should help
with that scaling problem...

- Marc

Re: Partitioning Vs. Split Databases - performance?

From
"Shoaib Mir"
Date:
I will also second that and if you got a lot of data, go for table partitioning as well but will not recommend dividing into different databases.

-----------------
Shoaib Mir
EnterpriseDB ( www.enterprisedb.com)

On 12/22/06, Vlad <marchenko@gmail.com> wrote:
On 12/21/06, Ron Johnson <ron.l.johnson@cox.net> wrote:

>> Given the same physical hardware, which one is likely to perform better? Does
>> it make any difference? Does using separate databases use more RAM than a
>> single database with a bunch of different tables?

Config files are global, so I doubt it.

if it's a web app with persistent connections, then splitting onto several databases may consume more RAM. Example: 100 apache clients connected to 3 databases creates 300 forked postmaster processes ; vs 100 apache clients connected to the same DB using three schemas only takes 100 postmasters

--  Vlad

Re: Partitioning Vs. Split Databases - performance?

From
Benjamin Smith
Date:
On Thursday 21 December 2006 11:47, Ron Johnson wrote:
> This gives you linear growth potential, since if your current box
> gets over-utilized, buy a 2nd box and move some of the databases to it.

So far, I'm inclined to go this way, due to the option for linear scaling.

> >> 2) Copy out the data specific to a customer and load into separate tables
> >> (with slightly different names, EG table "dates" becomes "cust1_dates")
> >> and
> >> use data partitioning to help with performance as needed.
>
> Definitely *not* scalable.  And *very* messy.  Yech.

Scales better than present, methinks, but still not the best idea. I'd have to
revisit all my queries to make sure that they use the correct tablename.

> > 3) Put each customer in their own schema/namespace which resides within
> > its own table space.
> >
> > Then you can move customers wherever you need in terms of IO.

How is that functionally different than using a separate database? What's the
advantage here? I don't *need* to restrict myself to one database, and doing
this does require that I revisit 100% of the SQL queries to make sure that
I'm referencing the right schema.

This solution seems to have the same problems as using dynamic tablenames.

> Splitting like mentioned in these three tactics means that you've
> now got 2x as many tables.  Add more customers and you've got that
> many more tables.  Perfect candidate for "schema drift".
>
> If each table has cust_id in it, then you could:
>
> 4) retain 1 database and partition each table on cust_id.
>
> http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

Intriguing idea, and one that I might do in the future. However, I have 170
normalized tables defined at present. The task of auditing each of these
tables for the constraint ranges sounds somewhat nightmarish. Not all the
tables have the cust_id record defined - some are implicit.

> >> Given the same physical hardware, which one is likely to perform better?
Does
> >> it make any difference? Does using separate databases use more RAM than a
> >> single database with a bunch of different tables?
>
> Config files are global, so I doubt it.
>
> >> Company is growing rapidly, so growth room is important...
>
> Then go for Option 1.

My conclusion, too. Another poster mentioned schema drift, and that's a real
concern, but we're already updating the schema through a script which could
be extended to update all databases, not just one, so I feel this problem
would be minor to non-existent.

Thanks!

-Ben
--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

Re: Partitioning Vs. Split Databases - performance?

From
"Joshua D. Drake"
Date:
> > > 3) Put each customer in their own schema/namespace which resides within
> > > its own table space.
> > >
> > > Then you can move customers wherever you need in terms of IO.
>
> How is that functionally different than using a separate database? What's the
> advantage here? I don't *need* to restrict myself to one database, and doing
> this does require that I revisit 100% of the SQL queries to make sure that
> I'm referencing the right schema.
>
> This solution seems to have the same problems as using dynamic tablenames.

Not really. You should read up on schemas and how they work. Plus the
addition of schemas and table spaces means you can infinite scaling
within the confines of your hardware itself.

Sincerely,

Joshua D. Drake

--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: Partitioning Vs. Split Databases - performance?

From
Ron Johnson
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/21/06 16:41, Joshua D. Drake wrote:
>>>> 3) Put each customer in their own schema/namespace which resides within
>>>> its own table space.
>>>>
>>>> Then you can move customers wherever you need in terms of IO.
>> How is that functionally different than using a separate database? What's the
>> advantage here? I don't *need* to restrict myself to one database, and doing
>> this does require that I revisit 100% of the SQL queries to make sure that
>> I'm referencing the right schema.
>>
>> This solution seems to have the same problems as using dynamic tablenames.
>
> Not really. You should read up on schemas and how they work. Plus the
> addition of schemas and table spaces means you can infinite scaling
> within the confines of your hardware itself.

"infinite scaling within the confines of your hardware"!

How is that accomplished?

- --
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFixCWS9HxQb37XmcRAnw/AJ4obPHIHvJcRKq1xzILN7YtKfQscACg1uaq
c6FRxkXjP/Pneyy1lxA+Dl8=
=iFX6
-----END PGP SIGNATURE-----

Re: Partitioning Vs. Split Databases - performance?

From
"Joshua D. Drake"
Date:
On Thu, 2006-12-21 at 16:54 -0600, Ron Johnson wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 12/21/06 16:41, Joshua D. Drake wrote:
> >>>> 3) Put each customer in their own schema/namespace which resides within
> >>>> its own table space.
> >>>>
> >>>> Then you can move customers wherever you need in terms of IO.
> >> How is that functionally different than using a separate database? What's the
> >> advantage here? I don't *need* to restrict myself to one database, and doing
> >> this does require that I revisit 100% of the SQL queries to make sure that
> >> I'm referencing the right schema.
> >>
> >> This solution seems to have the same problems as using dynamic tablenames.
> >
> > Not really. You should read up on schemas and how they work. Plus the
> > addition of schemas and table spaces means you can infinite scaling
> > within the confines of your hardware itself.
>
> "infinite scaling within the confines of your hardware"!
>
> How is that accomplished?

Well with a tablespace, you can place the entire schema, easily on its
own array. Say a nice little HP-MSA30. If that array gets tired you can
add another array, and move a couple of schemas off to another array.

Also if you spec the hardware correctly, you can get up to a total of 16
cores without any significant cost... just add cpus as needed.

PostgreSQL 8.1 does extremely well up to 8 cpus (8 cores). 8.2 is said
to perform better on > 8 cores, but I have not tested it.

Sincerely,

Joshua D. Drake


>
> - --
> Ron Johnson, Jr.
> Jefferson LA  USA
>
> Is "common sense" really valid?
> For example, it is "common sense" to white-power racists that
> whites are superior to blacks, and that those with brown skins
> are mud people.
> However, that "common sense" is obviously wrong.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
>
> iD8DBQFFixCWS9HxQb37XmcRAnw/AJ4obPHIHvJcRKq1xzILN7YtKfQscACg1uaq
> c6FRxkXjP/Pneyy1lxA+Dl8=
> =iFX6
> -----END PGP SIGNATURE-----
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/
>
--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: Partitioning Vs. Split Databases - performance?

From
Benjamin Smith
Date:
On Thursday 21 December 2006 14:41, Joshua D. Drake wrote:
>You should read up on schemas and how they work. Plus the
> addition of schemas and table spaces means you can infinite scaling
> within the confines of your hardware itself.

Ok, so I'd like you to correct me if I'm wrong:

1) Schemas operate within a database. A schema is analogized as a filesystem
directory in the docs, except that you can't recurse schemas.

2) A database runs on one machine, with the following addenda:
    A) slony lets you copy that database to another system,
    B) pgtool lets you duplicate the database if you're real careful around
updating with unique IDs and aggregate functions,
    C) you can essentially do cross-machine RAID so that if your primary DB gets
hosed, you can fire up the backup machine and continue working.
    D) pg-cluster, a synchronous clustering solution appears to be new/unstable,
doesn't appear to be current to 8.x, and takes a performance hit on writes.

3) Thus, any service that splits up a database (EG: a schema) is subject to
all the limitations outlined in #2.

Did I miss anything?

--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

Re: Partitioning Vs. Split Databases - performance?

From
Ron Johnson
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/21/06 17:15, Joshua D. Drake wrote:
> On Thu, 2006-12-21 at 16:54 -0600, Ron Johnson wrote:
> On 12/21/06 16:41, Joshua D. Drake wrote:
[snip]
>>>>> This solution seems to have the same problems as using dynamic tablenames.
>>>> Not really. You should read up on schemas and how they work. Plus the
>>>> addition of schemas and table spaces means you can infinite scaling
>>>> within the confines of your hardware itself.
> "infinite scaling within the confines of your hardware"!
>
> How is that accomplished?
>
>> Well with a tablespace, you can place the entire schema, easily on its
>> own array. Say a nice little HP-MSA30. If that array gets tired you can
>> add another array, and move a couple of schemas off to another array.
>
>> Also if you spec the hardware correctly, you can get up to a total of 16
>> cores without any significant cost... just add cpus as needed.
>
>> PostgreSQL 8.1 does extremely well up to 8 cpus (8 cores). 8.2 is said
>> to perform better on > 8 cores, but I have not tested it.

Sure, but *infinite*?  Or were you exercising hyperbole?

With One Big Database, you can get a SAN and attach a whole lot of
disk space, but your mobo will only accept a certain number of DIMMs
and processors of certain designs.  And when your growing mega
database maxes out your h/w, you're stuck.

- --
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFixgDS9HxQb37XmcRAm5TAKCVGpWvb9SKB+Kk44J/88WTEhnYoQCeKgcd
BhorUjImDd+SldObdWuhFjw=
=lSm5
-----END PGP SIGNATURE-----

Re: Partitioning Vs. Split Databases - performance?

From
Ben
Date:
Depending on what kind of company you have, you may (or may not) want to
consider how you will upgrade the schema over time. Might it be the case
that some customers will need a different schema than others? If so, a
simple data partitioning by customer id probably won't be the best option.

On Thu, 21 Dec 2006, Benjamin Smith wrote:

> I'm breaking up a database into several sets of data with similar layout. (we
> currently have multiple customers using a single database and tableset, we're
> splitting it out to give us more "wiggle room")
>
> It seems that there are basically two ways to proceed:
>
> 1) Copy out the data specific to a customer and load into a separate database
> for that customer, or
>
> 2) Copy out the data specific to a customer and load into separate tables
> (with slightly different names, EG table "dates" becomes "cust1_dates") and
> use data partitioning to help with performance as needed.
>
> Given the same physical hardware, which one is likely to perform better? Does
> it make any difference? Does using separate databases use more RAM than a
> single database with a bunch of different tables?
>
> Company is growing rapidly, so growth room is important...
>
> -Ben
> --
> "The best way to predict the future is to invent it."
> - XEROX PARC slogan, circa 1978
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>

Re: Partitioning Vs. Split Databases - performance?

From
"Joshua D. Drake"
Date:
On Thu, 2006-12-21 at 15:22 -0800, Benjamin Smith wrote:
> On Thursday 21 December 2006 14:41, Joshua D. Drake wrote:
> >You should read up on schemas and how they work. Plus the
> > addition of schemas and table spaces means you can infinite scaling
> > within the confines of your hardware itself.
>
> Ok, so I'd like you to correct me if I'm wrong:
>
> 1) Schemas operate within a database. A schema is analogized as a filesystem
> directory in the docs, except that you can't recurse schemas.

Kind of... think of a schema as a database without a new connection :).

catalog/cluster
   ->[n] databases
      ->[n] schemas (of which public is default)
         ->[n] objects (tables, functions, views etc..)

By default, your schema is public. You can create a new schema called
customer_one. Within public you could have table public.accounts and
within customer_one you could have customer_one.accounts. They are
isolated and contain seperate data sets.

You can set the schema based on the user connecting or as part of your
queries with set search_path. E.g;

set search_path =  customer_one;

select * from accounts;

Will only grab customer_one.accounts.

Thus giving virtual multiple database access without multiple databases.

>
> 2) A database runs on one machine, with the following addenda:
>     A) slony lets you copy that database to another system,

Replicate, not copy (being pedantic but it is for good reason).

>     B) pgtool lets you duplicate the database if you're real careful around
> updating with unique IDs and aggregate functions,

You mean pgpool? Yeah, but don't use it for that, use Slony or
replicator instead.

>     C) you can essentially do cross-machine RAID so that if your primary DB gets
> hosed, you can fire up the backup machine and continue working.

With Slony, Replicator or PITR yes.

>     D) pg-cluster, a synchronous clustering solution appears to be new/unstable,
> doesn't appear to be current to 8.x, and takes a performance hit on writes.

Yes.
>
> 3) Thus, any service that splits up a database (EG: a schema) is subject to
> all the limitations outlined in #2.

Well and subject to your app understanding what to do, when.

Joshua D. Drake


>
> Did I miss anything?
>
--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: Partitioning Vs. Split Databases - performance?

From
"Joshua D. Drake"
Date:
> >> Also if you spec the hardware correctly, you can get up to a total of 16
> >> cores without any significant cost... just add cpus as needed.
> >
> >> PostgreSQL 8.1 does extremely well up to 8 cpus (8 cores). 8.2 is said
> >> to perform better on > 8 cores, but I have not tested it.
>
> Sure, but *infinite*?  Or were you exercising hyperbole?

I did say, infinite with the *confines* of the hardware :)

>
> With One Big Database, you can get a SAN and attach a whole lot of
> disk space, but your mobo will only accept a certain number of DIMMs
> and processors of certain designs.  And when your growing mega
> database maxes out your h/w, you're stuck.

Define mega... Because you would need to be in the multi-terrabyte
range.

Joshua D. Drake



>
> - --
> Ron Johnson, Jr.
> Jefferson LA  USA
>
> Is "common sense" really valid?
> For example, it is "common sense" to white-power racists that
> whites are superior to blacks, and that those with brown skins
> are mud people.
> However, that "common sense" is obviously wrong.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
>
> iD8DBQFFixgDS9HxQb37XmcRAm5TAKCVGpWvb9SKB+Kk44J/88WTEhnYoQCeKgcd
> BhorUjImDd+SldObdWuhFjw=
> =lSm5
> -----END PGP SIGNATURE-----
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: Partitioning Vs. Split Databases - performance?

From
Ron Johnson
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/21/06 18:12, Joshua D. Drake wrote:
>>>> Also if you spec the hardware correctly, you can get up to a total of 16
>>>> cores without any significant cost... just add cpus as needed.
>>>> PostgreSQL 8.1 does extremely well up to 8 cpus (8 cores). 8.2 is said
>>>> to perform better on > 8 cores, but I have not tested it.
>> Sure, but *infinite*?  Or were you exercising hyperbole?
>
> I did say, infinite with the *confines* of the hardware :)

:)

>> With One Big Database, you can get a SAN and attach a whole lot of
>> disk space, but your mobo will only accept a certain number of DIMMs
>> and processors of certain designs.  And when your growing mega
>> database maxes out your h/w, you're stuck.
>
> Define mega... Because you would need to be in the multi-terrabyte
> range.

I'm thinking more of RAM and CPU.

- --
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFi0wFS9HxQb37XmcRAoaZAJ9s4msf5+3zfInemHzOObYwV4THSgCgihbj
oV4EcWu9/YtO75po/Bi9rys=
=PmCt
-----END PGP SIGNATURE-----

Re: Partitioning Vs. Split Databases - performance?

From
"Joshua D. Drake"
Date:
> >> With One Big Database, you can get a SAN and attach a whole lot of
> >> disk space, but your mobo will only accept a certain number of DIMMs
> >> and processors of certain designs.  And when your growing mega
> >> database maxes out your h/w, you're stuck.
> >
> > Define mega... Because you would need to be in the multi-terrabyte
> > range.
>
> I'm thinking more of RAM and CPU.

32GB is an awful lot of ram... as is 8 cores. You can get 16 core
machines now that will take 64GB.

Joshua D. Drake


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: Partitioning Vs. Split Databases - performance?

From
Ron Johnson
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/21/06 23:44, Joshua D. Drake wrote:
>>>> With One Big Database, you can get a SAN and attach a whole lot of
>>>> disk space, but your mobo will only accept a certain number of DIMMs
>>>> and processors of certain designs.  And when your growing mega
>>>> database maxes out your h/w, you're stuck.
>>> Define mega... Because you would need to be in the multi-terrabyte
>>> range.
>> I'm thinking more of RAM and CPU.
>
> 32GB is an awful lot of ram... as is 8 cores. You can get 16 core
> machines now that will take 64GB.

Sure, you *can*.  A Sun Fire X4600 (max 16 cores) takes up to 128GB
RAM.  I'll bet my house, though, that OP doesn't have a maxed-out
X4600.  (BTW, maxed out, with 2 U320 adapters and 6 4Gb HBAs, it's
only $75000!)

Please tell me, though, what's so controversial about saying that
CPU and RAM resources are finite?

Besides, since pg_dump is single-threaded, backing up a huge
database gets impossible.  Federating the database allows multiple
pg_dumps to simultaneously dump data to multiple tape drives.

- --
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFi3zlS9HxQb37XmcRAtVjAKDGPu4jHPyTH9EX5sqPbuHs+wKE3gCeLnnx
RvSfELtJ7bieg3HVVqB/7Zk=
=FQCv
-----END PGP SIGNATURE-----

Re: Partitioning Vs. Split Databases - performance?

From
Tom Lane
Date:
Ron Johnson <ron.l.johnson@cox.net> writes:
> Besides, since pg_dump is single-threaded, backing up a huge
> database gets impossible.  Federating the database allows multiple
> pg_dumps to simultaneously dump data to multiple tape drives.

... as long as you don't care about having a self-consistent dump ...

            regards, tom lane

Re: Partitioning Vs. Split Databases - performance?

From
Ron Johnson
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/22/06 01:22, Tom Lane wrote:
> Ron Johnson <ron.l.johnson@cox.net> writes:
>> Besides, since pg_dump is single-threaded, backing up a huge
>> database gets impossible.  Federating the database allows multiple
>> pg_dumps to simultaneously dump data to multiple tape drives.
>
> ... as long as you don't care about having a self-consistent dump ...

Nothing's perfect.

A pg_backup that writes multiple simultaneous data streams that are,
when taken as a unit, self-consistent would be tres' useful for Very
Large databases.

- --
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFi5UES9HxQb37XmcRAsinAJ0dkMdqdg3ROr9Jmpv6mtmhXaZubwCgi3PK
SfisJNzbFz/N5wR92tslHQ0=
=goE1
-----END PGP SIGNATURE-----

Re: Partitioning Vs. Split Databases - performance?

From
Csaba Nagy
Date:
Ben,

On Thu, 2006-12-21 at 20:10, Benjamin Smith wrote:
> I'm breaking up a database into several sets of data with similar layout. (we
> currently have multiple customers using a single database and tableset, we're
> splitting it out to give us more "wiggle room")

We have here a very similar situation. We started out with one schema
containing all customers, based on a customerid.

> 1) Copy out the data specific to a customer and load into a separate database
> for that customer, or

We went with 1, combined with the original solution we had... i.e. we
keep the same schema for customers, and if the number of customers/size
of them grows too much, we move them out to another machine. If they
need to be isolated from the rest, we move them to another DB possibly
on the same machine... but the original schema is still there to allow
us having multiple small customers on the same DB. Separate DBs on the
same box for each customer would be a maintenance nightmare because we
have a (sizable) cluster of application boxes connecting, and all of
them must be configured with all connection data to all data bases... so
we must keep the nr. of data bases to the minimum. Not to mention that
for each DB we have a connection pool, and the nr. of connections would
grow too much if we would have one connection pool per customer.

Grouping customers according to their business status allows us to
upgrade the schema separately for each DB, so we have some choice in
when to upgrade different customers to the next version of the
application... although usually we make the DB changes in such a way
that multiple versions of the software can work with the upgraded
schema, and we can upgrade customers from the same DB independently, but
this IS a concern when you have the same schema for all customers (we do
have occasional pressure from some customers to upgrade sooner than we
would like).

This does not exclude the possibility of partitioning the biggest tables
in your schema per customerid or whatever else you can use to partition
them (our biggest table does not have customerid as a column, but it
still could be partitioned by something else).

Cheers,
Csaba.



Re: Partitioning Vs. Split Databases - performance?

From
Lincoln Yeoh
Date:
At 08:12 AM 12/22/2006, Joshua D. Drake wrote:
>
> > With One Big Database, you can get a SAN and attach a whole lot of
> > disk space, but your mobo will only accept a certain number of DIMMs
> > and processors of certain designs.  And when your growing mega
> > database maxes out your h/w, you're stuck.
>
>Define mega... Because you would need to be in the multi-terrabyte
>range.

Why multi terabyte? All that needs happen is for the hardware to run
out of I/O. Nowadays with the sizes of disks, you can run out of I/O
way before you run out of space.

It could start to take way too long to backup/restore the entire database.

If your app lends itself to horizontal scaling and you think you will
need to scale to more than say 5X, its better to scale horizontally
than go vertically (get a bigger box).

Has clustering technology advanced to the point where making a
"bigger box" can be done easily and cheaply with just many small
boxes? I've seen stuff like OpenSSI etc, but how well does Postgresql
run on such stuff? Shared memory is usually slow/problematic on such systems.

Regards,
Link.


Re: Partitioning Vs. Split Databases - performance?

From
"Shoaib Mir"
Date:
I haven't seen anyone using OpenSSI with PostgreSQL yet but as it does run with MySQL and Oracle so it might be the case that it goes good with PostgreSQL as well.

Just read the following --> http://wiki.openssi.org/go/FAQ#Does_OpenSSI_support_the_PostgreSQL_database.3F

If anyone did try it PostgreSQL on OpenSSI that will be a good info to have.

-----------------
Shoaib Mir
EnterpriseDB ( www.enterprisedb.com)

On 12/24/06, Lincoln Yeoh <lyeoh@pop.jaring.my > wrote:
At 08:12 AM 12/22/2006, Joshua D. Drake wrote:
>
> > With One Big Database, you can get a SAN and attach a whole lot of
> > disk space, but your mobo will only accept a certain number of DIMMs
> > and processors of certain designs.  And when your growing mega
> > database maxes out your h/w, you're stuck.
>
>Define mega... Because you would need to be in the multi-terrabyte
>range.

Why multi terabyte? All that needs happen is for the hardware to run
out of I/O. Nowadays with the sizes of disks, you can run out of I/O
way before you run out of space.

It could start to take way too long to backup/restore the entire database.

If your app lends itself to horizontal scaling and you think you will
need to scale to more than say 5X, its better to scale horizontally
than go vertically (get a bigger box).

Has clustering technology advanced to the point where making a
"bigger box" can be done easily and cheaply with just many small
boxes? I've seen stuff like OpenSSI etc, but how well does Postgresql
run on such stuff? Shared memory is usually slow/problematic on such systems.

Regards,
Link.


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend