Thread: Scalability
Hello,

When large databases, complex queries and frequent access are combined, the database backend may become a performance bottleneck. This is quite obvious.

Do I assume correctly that CPU will be the earliest limiting factor for a database (if we have a reasonable amount of memory in the box)?

If yes, one could deal with it by switching to a more powerful machine, OR one could distribute the database between several equivalent machines, as I imagine. The latter makes sense if the data can be partitioned so that consistency and dependencies can be worked around; that is, I can move two subsets of the data (say A and B) to two different boxes if rows in these subsets are very unlikely to be combined in a single query. Then I can have my application code access the correct database instance based on some simple rule (a sketch of such a dispatch rule follows this message), for example: users with IDs 0-1000000 have their messages stored on box A and all others on box B. Or: articles for newsgroups with hash code == 0 -> box A, others -> box B. Do we get linear scalability that way? I guess so. Does it sound cool? Yes. But does it cost less than upgrading a single server? I'm not sure.

Is anyone out there using such a load balancing scheme with PostgreSQL? Are there any foreseeable problems that would make it impractical?

Thanks -
JPL
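To make the dispatch rule above concrete, here is a minimal sketch in Python of what the client-side routing could look like. Everything in it is hypothetical: the DSN strings, the "messages" table, the shard_for_user helper and the 1000000 cutoff are made-up examples, and psycopg2 stands in for whatever driver the application actually uses.

import psycopg2

# Hypothetical connection strings for the two boxes described above.
SHARDS = {
    "box_a": "host=box-a dbname=msgs user=app",
    "box_b": "host=box-b dbname=msgs user=app",
}

def shard_for_user(user_id):
    # The simple range rule from the post: IDs 0-1000000 live on
    # box A, everyone else on box B.
    return "box_a" if user_id <= 1000000 else "box_b"

def fetch_messages(user_id):
    # Connect only to the box that holds this user's rows; the
    # partitioning assumes no query ever combines rows from both.
    conn = psycopg2.connect(SHARDS[shard_for_user(user_id)])
    try:
        cur = conn.cursor()
        cur.execute("SELECT id, body FROM messages WHERE user_id = %s",
                    (user_id,))
        return cur.fetchall()
    finally:
        conn.close()

The scheme scales roughly linearly only as long as the rule really keeps cross-box queries rare; any query that must touch both subsets has to be merged in the application.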
(Frequent Access)
If you just have lots of queries in parallel, try replication, and pick a random server for each connection (a sketch of this follows below).

(Complex Queries)
If you absolutely, positively need one query to be executed across all nodes in the cluster because one machine would just take too long no matter how big, then you may have to shell out for Oracle...

(Splitting the Data)
Alternatively, you could put separate tables on separate machines, or wait until the "schema" support gets advanced enough to let you do full cross-database SQL queries... No idea how far off this is in PostgreSQL...

Personally, I find that CPU is rarely the bottleneck. It is usually either lack of memory or the speed of the disks...
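To illustrate the "pick a random server for each connection" idea from the first point above, here is a minimal sketch in Python. It assumes a pool of identical read-only replicas kept in sync by some replication mechanism; the host names and the connect_to_random_replica helper are hypothetical, and psycopg2 again serves only as an example driver.

import random
import psycopg2

# Hypothetical pool of identical read-only replicas.
REPLICAS = [
    "host=replica1 dbname=appdb user=app",
    "host=replica2 dbname=appdb user=app",
]

def connect_to_random_replica():
    # Each new connection lands on a randomly chosen replica, which
    # spreads parallel read-only queries across the pool.
    return psycopg2.connect(random.choice(REPLICAS))

conn = connect_to_random_replica()
cur = conn.cursor()
cur.execute("SELECT count(*) FROM messages")
print(cur.fetchone()[0])
conn.close()

Note that this only helps read traffic; writes still have to go wherever the replication mechanism expects them.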
Jan Ploski <jpljpl@gmx.de> writes:

> Do I assume correctly that CPU will be the earliest limiting factor
> for a database (if we have a reasonable amount of memory in the box)?

Depends on your query mix. If you have a lot of updates, then I'd recommend spending plenty of money on fast disk drives and good disk controller boards. If it's mostly SELECTs, then you don't need such fast drives, as long as you have enough RAM to cache a good chunk of the active database. In that case the CPU probably would be the limiting factor.

> Is anyone out there using such a load balancing scheme with PostgreSQL?

Hardware is so cheap these days that I'd think it makes more sense just to throw more hardware at the problem... I think client-driven load balancing will be a pain in the neck to maintain. (There has been some talk of actual server support for load balancing, but it's still a long way off.)

regards, tom lane
In article <006001c0ac91$4ebe3ea0$800010ac@localdomain>, "Gordan Bobic" <gordan@freeuk.com> wrote:

> (Complex Queries)
> If you absolutely, positively need one query to be executed across all
> nodes in the cluster because one machine would just take too long no
> matter how big, then you may have to shell out for Oracle...

or DB2 EEE (Enterprise Extended Edition). It's cheaper than Oracle, too.

Gordon.

--
It doesn't get any easier, you just go faster.
        -- Greg LeMond