Re: One large v. many small - Mailing list pgsql-performance

From Josh Berkus
Subject Re: One large v. many small
Date
Msg-id 200301300956.56041.josh@agliodbs.com
Whole thread Raw
In response to One large v. many small  (Noah Silverman <noah@allresearch.com>)
Responses Re: One large v. many small  (Jeff <threshar@torgo.978.org>)
Re: One large v. many small  (Curt Sampson <cjs@cynic.net>)
List pgsql-performance
Noah,

> As we continue our evaluation of Postgres, another interesting topic
> has come up that I want to run by the group.
>
> In our current model, we have about 3,000 small tables that we use
> track data for our clients.  Each table is an identical structure, and
> holds the data for one client.

I'd list what's wrong with this structure, but frankly it would take me long
enough that I'd need a consulting fee.   Suffice it to say that the above is
a very, very bad (or at least antiquated) design idea and you need to
transition out of it as soon as possible.

> Another idea that we are considering is one big table instead of 3,000
> smaller ones.  We could simply add a numeric field to indicate which
> client a particular record was for.

Yes.   Absolutely.  Although I'd suggest an Integer field.

> Each table has between 500 and 50,000 records, so the big table could
> have up to 10 million rows if we combined everything.

Sure.

> A query on our current system is (for client #4)
>
> Select (*) from client_4 where foo=2;
>
> A query from the new, proposed system would be
>
> Select (*) from big_results where client=4 and foo=2.
>
> The big questions is, WHICH WILL BE FASTER with Postgres.  Is there any
> performance improvement or cost to switching to this new structure.

Oh, no question query 1 will be faster ... FOR THAT QUERY.  You are asking the
wrong question.

However, explain to me how, under the current system, you can find the client
who ordered $3000 worth of widgets on January 12th if you don't already know
who it is?   I'm not sure a 3000-table UNION query is even *possible*.

Or how about giving me the average number of customer transactions in a month,
across all clients?

<rant>

You've enslaved your application design to performance considerations ... an
approach which was valid in 1990, because processing power was so limited
then.  But now that dual-processor servers with RAID can be had for less than
$3000, there's simply no excuse for violating the principles of good
relational database design just to speed up a query.   Buying more RAM is
much cheaper than having an engineer spend 3 weeks fixing data integrity
problems.

The proper way to go about application design is to build your application on
paper or in a modelling program according to the best principles of software
design available, and *then* to discuss performance issues -- addressing them
*first* by buying hardware, and only compromising your applcation design when
no other alternative is available.

</rant>

I strongly suggest that you purchase Pascal's "Practical Issues in Database
Design" and give it a read.

--
Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-performance by date:

Previous
From: Noah Silverman
Date:
Subject: One large v. many small
Next
From: Andrew Sullivan
Date:
Subject: Re: One large v. many small