Re: Partitioning Vs. Split Databases - performance? - Mailing list pgsql-general

From Benjamin Smith
Subject Re: Partitioning Vs. Split Databases - performance?
Date
Msg-id 200612211407.14017.lists@benjamindsmith.com
Whole thread Raw
In response to Re: Partitioning Vs. Split Databases - performance?  (Ron Johnson <ron.l.johnson@cox.net>)
Responses Re: Partitioning Vs. Split Databases - performance?
List pgsql-general
On Thursday 21 December 2006 11:47, Ron Johnson wrote:
> This gives you linear growth potential, since if your current box
> gets over-utilized, buy a 2nd box and move some of the databases to it.

So far, I'm inclined to go this way, due to the option for linear scaling.

> >> 2) Copy out the data specific to a customer and load into separate tables
> >> (with slightly different names, EG table "dates" becomes "cust1_dates")
> >> and
> >> use data partitioning to help with performance as needed.
>
> Definitely *not* scalable.  And *very* messy.  Yech.

Scales better than present, methinks, but still not the best idea. I'd have to
revisit all my queries to make sure that they use the correct tablename.

> > 3) Put each customer in their own schema/namespace which resides within
> > its own table space.
> >
> > Then you can move customers wherever you need in terms of IO.

How is that functionally different than using a separate database? What's the
advantage here? I don't *need* to restrict myself to one database, and doing
this does require that I revisit 100% of the SQL queries to make sure that
I'm referencing the right schema.

This solution seems to have the same problems as using dynamic tablenames.

> Splitting like mentioned in these three tactics means that you've
> now got 2x as many tables.  Add more customers and you've got that
> many more tables.  Perfect candidate for "schema drift".
>
> If each table has cust_id in it, then you could:
>
> 4) retain 1 database and partition each table on cust_id.
>
> http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

Intriguing idea, and one that I might do in the future. However, I have 170
normalized tables defined at present. The task of auditing each of these
tables for the constraint ranges sounds somewhat nightmarish. Not all the
tables have the cust_id record defined - some are implicit.

> >> Given the same physical hardware, which one is likely to perform better?
Does
> >> it make any difference? Does using separate databases use more RAM than a
> >> single database with a bunch of different tables?
>
> Config files are global, so I doubt it.
>
> >> Company is growing rapidly, so growth room is important...
>
> Then go for Option 1.

My conclusion, too. Another poster mentioned schema drift, and that's a real
concern, but we're already updating the schema through a script which could
be extended to update all databases, not just one, so I feel this problem
would be minor to non-existent.

Thanks!

-Ben
--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

pgsql-general by date:

Previous
From: Walter Vaughan
Date:
Subject: Re: Website Hosting Service and Security
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Partitioning Vs. Split Databases - performance?