Re: One large v. many small - Mailing list pgsql-performance
From: Curt Sampson
Subject: Re: One large v. many small
Date:
Msg-id: Pine.NEB.4.51.0301311218530.376@angelic.cynic.net
In response to: Re: One large v. many small (Josh Berkus <josh@agliodbs.com>)
Responses: Re: One large v. many small; Re: One large v. many small
List: pgsql-performance
On Thu, 30 Jan 2003, Josh Berkus wrote:

> > Another idea that we are considering is one big table instead of 3,000
> > smaller ones. We could simply add a numeric field to indicate which
> > client a particular record was for.
>
> Yes. Absolutely.

Although I'd suggest an integer field.

From the description given in Noah's message, and also the one given in his
later message, I have little doubt that 3000 small tables are going to be
significantly faster than one large table. If you don't believe me, work out
just where the disk blocks are going to end up, and how many blocks will have
to be fetched for his typical query against a semi-clustered or non-clustered
table. (If Postgres had clustered indexes a la MS SQL Server, where the rows
are physically stored in the order of the clustered index, it would be a
different matter.)

> However, explain to me how, under the current system, you can find the
> client who ordered $3000 worth of widgets on January 12th if you don't
> already know who it is?

Explain to me why he has to do this. It's all very nice to have a general
system that can do well on all sorts of queries, but if you lose time on the
queries you do run in order to save time on queries you don't, you're
definitely not getting the best performance out of the system.

> I'm not sure a 3000-table UNION query is even *possible*.

This is not the only solution, either. You could simply do 3000 separate
queries. If this is something you execute only once a month, then making that
query three or four orders of magnitude more expensive might be a small price
to pay for making cheaper the queries you run several times per second.

> <rant>
>
> You've enslaved your application design to performance considerations ...
> an approach which was valid in 1990, because processing power was so
> limited then. But now that dual-processor servers with RAID can be had for
> less than $3000, there's simply no excuse for violating the principles of
> good relational database design just to speed up a query. Buying more RAM
> is much cheaper than having an engineer spend 3 weeks fixing data
> integrity problems.

*Sigh.* Ok, my turn to rant.

RAM is not cheap enough yet for me to buy several hundred gigabytes of it for
typical applications, even if I could find a server to put it in. Disk
performance is not growing the way CPU performance is. And three weeks of
engineering time plus a ten-thousand-dollar server is, even at my top billing
rate, still a heck of a lot cheaper than a quarter-million-dollar server.
Applying your strategy to all situations is not always going to produce the
most cost-effective solution, and for most businesses that's what it's all
about. They're not interested in the more "theoretically pure" way of doing
things except insofar as it makes them money.

As for the data integrity problems, I don't know where that came from. I
think that was made up out of whole cloth, because it didn't seem to me that
the original question involved any.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
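[Editorial sketch, not part of the original post: the two designs the thread
contrasts can be illustrated in miniature. Below, Josh's approach is one
table with an integer `client_id` column, and Curt's "3000 queries instead
of a 3000-table UNION" idea is simulated by looping over clients and issuing
one cheap query each. All table and column names (`orders`, `client_id`,
`amount`, `ordered_on`) are made up for illustration; SQLite stands in for
PostgreSQL.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Josh's design: one big table with an integer client field, indexed so
# per-client queries stay cheap.
cur.execute(
    "CREATE TABLE orders (client_id INTEGER, amount REAL, ordered_on TEXT)"
)
cur.execute("CREATE INDEX orders_client_idx ON orders (client_id)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 3000.0, "2003-01-12"),
        (2, 150.0, "2003-01-13"),
        (1, 75.0, "2003-01-14"),
    ],
)

# The cross-client question ("who ordered $3000 worth on January 12th?")
# is a single query against the big table:
cur.execute(
    "SELECT client_id FROM orders "
    "WHERE amount = 3000 AND ordered_on = '2003-01-12'"
)
rows = cur.fetchall()
print(rows)  # -> [(1,)]

# Curt's alternative for the per-client-table design: run one small query
# per client rather than a huge UNION. With real per-client tables the loop
# would hit orders_1, orders_2, ...; here a WHERE clause simulates that.
hits = []
for client_id in (1, 2):  # would be the full client list in practice
    cur.execute(
        "SELECT 1 FROM orders "
        "WHERE client_id = ? AND amount = 3000 AND ordered_on = '2003-01-12'",
        (client_id,),
    )
    if cur.fetchone():
        hits.append(client_id)
print(hits)  # -> [1]
```

The loop trades one expensive rarely-run query for many cheap ones, which is
exactly the cost argument made above: pay more on the monthly report to keep
the per-second per-client queries fast.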