>>I know what I would choose. I'd get the mega server w/ a ton of RAM and skip
>>all the trickiness of partitioning a DB over multiple servers. Yes, your data
>>will grow to a point where even the XXGB can't cache everything. On the
>>other hand, memory prices drop just as fast. By that time, you can eBay your
>>original 16/32GB and get 64/128GB.
>
>
> a) What do you do when your calculations show you need 256G of RAM? [Yes, such
> machines exist, but you're no longer in the realm of simply "add more RAM".
> Administering such machines is nigh on as complex as clustering.]
If you need that much memory, you've got enough customers paying you
cash to pay for anything. :) Technology keeps advancing: 8-way Opteron
boxes would double your memory capacity, higher-capacity DIMMs keep coming, etc.
> b) What do you do when you find you need multiple machines anyway to divide
> up the CPU, I/O, or network load? Now you need n big beefy servers when n
> servers 1/nth as large would really have sufficed. This matters a lot when
> you're comparing colocating 16 1U boxen with 4G of RAM vs 16 4U Opterons
> with 64G of RAM...
>
> All that said, yes, speaking as a user I think the path of least resistance is
> to build n complete slaves using Slony and then just divide the workload.
> That's how I'm picturing going when I get to that point.
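To put numbers on the comparison in (b), here's a quick sketch that just multiplies out the two colocation options quoted above (16 1U boxes with 4G each vs 16 4U Opterons with 64G each). The `Option` class is a hypothetical helper for illustration; the box counts, sizes and RAM figures are the ones from the quote.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One colocation option (hypothetical helper for illustration)."""
    name: str
    boxes: int
    rack_units_per_box: int
    ram_gb_per_box: int

    def total_ram_gb(self) -> int:
        return self.boxes * self.ram_gb_per_box

    def total_rack_units(self) -> int:
        return self.boxes * self.rack_units_per_box


# The two options from point (b) above.
small = Option("16 x 1U, 4G each", boxes=16, rack_units_per_box=1, ram_gb_per_box=4)
big = Option("16 x 4U Opteron, 64G each", boxes=16, rack_units_per_box=4, ram_gb_per_box=64)

for opt in (small, big):
    print(f"{opt.name}: {opt.total_ram_gb()}G RAM in {opt.total_rack_units()}U")
```

That comes out to 64G in 16U versus 1024G in 64U: 16 times the RAM for 4 times the rack space, which is the overprovisioning (b) is worried about if the load only ever needed to be split n ways.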
Replication is good for uptime and read-heavy systems. The problem is
that if your system has a high volume of writes and you need near-real-time
data syncing, clusters don't buy you anything. A write on one
server means a write on every server, so spreading out the damage over
multiple machines doesn't help a bit.
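A back-of-the-envelope way to see the write problem: when every node holds a full copy, reads can be divided among the replicas but every write still has to be applied on every node. The function name and the 8000 reads/s / 2000 writes/s workload below are made-up figures for illustration only.

```python
def per_node_load(reads: float, writes: float, nodes: int) -> tuple[float, float]:
    """Per-node (reads/s, writes/s) when every node holds a full copy.

    Reads can be spread across the replicas, but every write must be
    applied on every node to keep the copies in sync.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return reads / nodes, writes


# Hypothetical workload: 8000 reads/s and 2000 writes/s in total.
for n in (1, 2, 4, 8):
    r, w = per_node_load(reads=8000, writes=2000, nodes=n)
    print(f"{n} node(s): {r:.0f} reads/s, {w:.0f} writes/s per node")
```

The per-node read load falls from 8000/s to 1000/s as you add boxes, while the per-node write load stays pinned at 2000/s no matter how many you add.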
Plus, the fact that we don't have multi-master replication yet is quite a
bugaboo. Working around it requires writing quite extensive code if you
can't afford to have one server be your single point of failure. We wrote
our own multi-master replication code at the client app level, and it's
quite a chore making sure the replication acts logically. Every table needs
separate logic to resolve situations like "voucher was posted on server 1
but voided afterward on server 2; what's the correct action here?"
So I've got a slew of complicated if-then-else statements that have to
take into account not only the type of update being made but also the sequence.
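For anyone facing the same thing, one way to keep those if-then-else chains manageable is to merge the change events from all masters, replay them in order, and run each one through a per-table state machine, flagging anything that doesn't fit for manual review. This is a minimal sketch of that idea, not our actual code: the `State`/`Event` types, the "post"/"void" action names and the transition table are hypothetical, and ordering by timestamp assumes the servers' clocks are kept in sync (e.g. via NTP).

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    POSTED = "posted"
    VOIDED = "voided"


@dataclass(frozen=True)
class Event:
    voucher_id: int
    action: str        # hypothetical action names: "post" or "void"
    timestamp: float   # assumes synchronized server clocks
    server: int        # tie-breaker when timestamps are equal


# Allowed transitions for a hypothetical voucher table.
TRANSITIONS = {
    (State.DRAFT, "post"): State.POSTED,
    (State.POSTED, "void"): State.VOIDED,
}


def resolve(events: list[Event]) -> tuple[dict[int, State], list[Event]]:
    """Replay events from all masters in (timestamp, server) order.

    Returns the final state per voucher, plus the events that could not be
    applied (e.g. a void that sorts before any post) for manual review.
    """
    states: dict[int, State] = {}
    conflicts: list[Event] = []
    for ev in sorted(events, key=lambda e: (e.timestamp, e.server)):
        current = states.get(ev.voucher_id, State.DRAFT)
        nxt = TRANSITIONS.get((current, ev.action))
        if nxt is None:
            conflicts.append(ev)   # transition not allowed: flag it
        else:
            states[ev.voucher_id] = nxt
    return states, conflicts


# Posted on server 1, voided afterward on server 2; arrival order doesn't matter.
events = [
    Event(42, "void", timestamp=105.0, server=2),
    Event(42, "post", timestamp=100.0, server=1),
]
final, conflicts = resolve(events)
print(final[42], conflicts)
```

The upside of a transition table is that the sequence question ("which happened first?") is handled once, by the sort, and each table only has to declare which transitions are legal.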
And yes, I tried doing real-time locks over a VPN link between our servers
in SF and VA. Ugh... latency was absolutely horrible and made
transactions run 1000X slower.
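The reason it hurts so much is that each synchronous remote lock costs a full network round trip, and cross-country round trips are orders of magnitude longer than LAN ones. A crude model, where all the figures (0.5 ms of local work, 10 lock round trips per transaction, ~0.1 ms LAN RTT, ~70 ms coast-to-coast VPN RTT) are assumptions for illustration, not measurements from our setup:

```python
def txn_time_ms(local_work_ms: float, lock_round_trips: int, rtt_ms: float) -> float:
    """Rough wall-clock time for one transaction that must take
    `lock_round_trips` synchronous locks on a remote server."""
    return local_work_ms + lock_round_trips * rtt_ms


# Hypothetical figures, not measured values.
lan = txn_time_ms(local_work_ms=0.5, lock_round_trips=10, rtt_ms=0.1)
wan = txn_time_ms(local_work_ms=0.5, lock_round_trips=10, rtt_ms=70.0)
print(f"LAN: {lan:.1f} ms, WAN: {wan:.1f} ms, slowdown: {wan / lan:.0f}x")
```

Once round trips dominate, the slowdown tracks the RTT ratio, so no amount of server horsepower at either end fixes it.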