Re: One large v. many small - Mailing list pgsql-performance

From: Curt Sampson
Subject: Re: One large v. many small
Date:
Msg-id: Pine.NEB.4.51.0301311218530.376@angelic.cynic.net
In response to: Re: One large v. many small (Josh Berkus <josh@agliodbs.com>)
List: pgsql-performance
On Thu, 30 Jan 2003, Josh Berkus wrote:

> > Another idea that we are considering is one big table instead of 3,000
> > smaller ones.  We could simply add a numeric field to indicate which
> > client a particular record was for.
>
> Yes.   Absolutely.  Although I'd suggest an Integer field.

From the description given in Noah's message, and also the one given in
his later message, I have little doubt that 3000 small tables are going
to be significantly faster than one large table. If you don't believe
me, work out just where the disk blocks are going to end up, and how
many blocks are going to have to be fetched for his typical query in
a semi-clustered or non-clustered table. (If Postgres had clustered
indexes à la MS SQL Server, where the rows are physically stored in the
order of the clustered index, it would be a different matter.)
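
For illustration only (the actual schema is not given in the thread),
here is a minimal sketch of the single-table layout in question, using
current PostgreSQL syntax. Note that CLUSTER reorders the table just
once; unlike a SQL Server clustered index, the ordering is not
maintained as new rows arrive, so a client's rows drift back to being
scattered across many disk blocks:

    -- Hypothetical single-table layout; column names are illustrative.
    CREATE TABLE orders (
        client_id  integer NOT NULL,   -- which of the ~3000 clients owns this row
        order_date date    NOT NULL,
        amount     numeric(10,2)
    );
    CREATE INDEX orders_client_idx ON orders (client_id);

    -- One-time physical reordering by client; not kept up to date
    -- as new rows are inserted.
    CLUSTER orders USING orders_client_idx;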

> However, explain to me how, under the current system, you can find the client
> who ordered $3000 worth of widgets on January 12th if you don't already know
> who it is?

Explain to me why he has to do this.

It's all very nice to have a general system that can do well on all
sorts of queries, but if you lose time on the queries you actually run
in order to save time on queries you never run, you're definitely not
getting the best performance out of the system.

> I'm not sure a 3000-table UNION query is even *possible*.

This is not the only solution, either. You could simply do 3,000
separate queries. If that report is something you execute only once a
month, then making it three or four orders of magnitude more expensive
might be a small price to pay for making the queries you run several
times per second cheaper.
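
As a rough sketch (per-client table names like orders_0001 are my
assumption; the thread doesn't give the real naming scheme), the
once-a-month report simply becomes a script of one query per table,
while the frequent lookups stay on their own small, tightly packed
tables:

    -- Monthly report: one statement per client table, generated by a script.
    SELECT 1 AS client_id, * FROM orders_0001
     WHERE order_date = '2003-01-12' AND amount >= 3000;
    -- ... repeat for orders_0002 through orders_3000 ...

    -- The queries run many times per second touch only one small table:
    SELECT * FROM orders_0042 WHERE order_date = '2003-01-12';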

> <rant>
>
> You've enslaved your application design to performance considerations ... an
> approach which was valid in 1990, because processing power was so limited
> then.  But now that dual-processor servers with RAID can be had for less than
> $3000, there's simply no excuse for violating the principles of good
> relational database design just to speed up a query.  Buying more RAM is
> much cheaper than having an engineer spend 3 weeks fixing data integrity
> problems.

*Sigh.* Ok, my turn to rant.

RAM is not cheap enough yet for me to buy several hundred gigabytes of
it for typical applications, even if I could find a server that I could
put it in. Disk performance is not growing the way CPU performance is.
And three weeks of engineering time plus a ten thousand dollar server
is, even at my top billing rate, still a heck of a lot cheaper than a
quarter-million dollar server.

Applying your strategy to all situations is not always going to produce
the most cost-effective solution. And for most businesses, that's what it's
all about. They're not interested in the more "theoretically pure" way of
doing things except insofar as it makes them money.

As for the data integrity problems, I don't know where that came from. I
think that was made up out of whole cloth, because it didn't seem to me
that the original question involved any.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
