Hello,
When large databases, complex queries and frequent access are combined,
the database backend can become a performance bottleneck; that much is
obvious.
Am I right to assume that the CPU will be the first limiting factor
for a database (given a reasonable amount of memory in the box)?
If so, I imagine one could deal with it either by switching to a more
powerful machine, or by distributing the database across several
equivalent machines. The latter makes sense if the data can be
partitioned so that consistency and dependencies can be worked around,
i.e. I can move two subsets of the data (say A and B) to two different
boxes if rows from those subsets are very unlikely to be combined in a
single query. Then I can have my application code pick the correct
database instance based on some simple rule, for example: users with
IDs 0-1000000 have their messages stored on box A and all others on
box B. Or: articles for newsgroups with hash code == 0 go to box A,
the rest to box B. Do we get linear scalability that way? I guess so.
Does it sound cool? Yes. But does it cost less than upgrading a single
server? I'm not sure.
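To make the rule concrete, here is roughly the kind of dispatch code I
have in mind (a minimal sketch in Python; the host names, DSNs, table
name and the 1000000 cutoff are just placeholders):

import psycopg2

# DSNs for the two boxes; host names and credentials are placeholders.
DSN_BOX_A = "host=box-a dbname=messages user=app"
DSN_BOX_B = "host=box-b dbname=messages user=app"

def dsn_for_user(user_id):
    # Range rule: users with IDs 0-1000000 live on box A, the rest on B.
    return DSN_BOX_A if 0 <= user_id <= 1000000 else DSN_BOX_B

def fetch_messages(user_id):
    # Connect to whichever box holds this user's rows and query it there.
    with psycopg2.connect(dsn_for_user(user_id)) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT id, body FROM messages WHERE user_id = %s",
                        (user_id,))
            return cur.fetchall()

The newsgroup variant would be the same dispatch, just keyed on
something like hash(group_name) % 2 instead of the ID range.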
Is anyone out there using such a load balancing scheme with PostgreSQL?
Are there any foreseeable problems which would make it impractical?
Thanks -
JPL