Curt Sampson wrote:
> From the description given in Noah's message, and also the
> one given in his later message, I have little doubt that 3000
> small tables are going to be significantly faster than one
> large table. If you don't believe me, work out just where the
> disk blocks are going to end up, and how many blocks are going
> to have to be fetched for his typical query in a semi-clustered or
> non-clustered table.
You may be right, Curt, but I've seen counterintuitive results
for this kind of thing in the past.
Depending on how the records are accessed and on the cache size,
the exact opposite could be true. With 3000 different tables, the
index pages will most likely be in memory only rarely, meaning
that each search will require at least three or four index-page
reads plus the tuple page.
So what you might lose from the lack of clustering could well be
made up for by more efficient caching of the upper levels of the
index btree.
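If you want a rough feel for how many index pages end up competing
for cache space in each layout, the page estimates in pg_class give
a quick picture (this just reads the system catalog):

    -- Estimated page counts for every index in the database; with
    -- 3000 tables you get 3000+ rows here, all fighting for the
    -- same buffer cache.
    SELECT relname, relpages
    FROM pg_class
    WHERE relkind = 'i'
    ORDER BY relpages DESC;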
Combine that with a multi-column index on both client and foo
(which column comes first would depend on the access pattern),
re-clustered once a week or so with the admittedly non-optimal
PostgreSQL CLUSTER command, and I'll bet you can get equivalent
or better performance from the single table, with the concomitant
advantage of much better reporting options.
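Something along these lines, with made-up names (a single combined
table called data_all, with the client number in a client column):

    -- Multi-column index; put client first if every query supplies it.
    CREATE INDEX data_all_client_foo_idx ON data_all (client, foo);

    -- Re-cluster on that index periodically, e.g. weekly from cron.
    -- CLUSTER rewrites the whole table, hence "non-optimal"; check
    -- the exact syntax for your PostgreSQL version.
    CLUSTER data_all_client_foo_idx ON data_all;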
I've also seen many databases whose data dictionaries use linear
algorithms, which would make a 3000+ table database perform
poorly during the parsing/query-optimization stage. I have no
idea whether or not PostgreSQL suffers from this problem.
I don't think there is any substitute for just trying it out. It
shouldn't be that hard to create a bunch of SQL statements that
concatenate the tables into one large one.
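Roughly like this, assuming the per-client tables are named
data_0001 through data_3000 and all share the same layout (the
names are invented for the sake of the example):

    -- Build the combined table from the first per-client table,
    -- tagging each row with the client it came from.
    CREATE TABLE data_all AS SELECT 1 AS client, * FROM data_0001;

    -- Then one INSERT per remaining table; the statements themselves
    -- can be generated by a small script or a catalog query.
    INSERT INTO data_all SELECT 2, * FROM data_0002;
    INSERT INTO data_all SELECT 3, * FROM data_0003;
    -- ... and so on up to data_3000.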
Try the most common queries against both scenarios. You might be
surprised.
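For the comparison itself, EXPLAIN ANALYZE (or plain EXPLAIN if
your version doesn't have it) on the same logical query against
each layout should tell the story, for example:

    -- One client's rows for a given foo value, old layout vs. new.
    EXPLAIN ANALYZE SELECT * FROM data_1234 WHERE foo = 42;
    EXPLAIN ANALYZE SELECT * FROM data_all WHERE client = 1234 AND foo = 42;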
- Curtis