On Thu, 2008-08-21 at 22:17 +0800, Amber wrote:
> Another question: how many people are there maintaining this huge database?
> We have about 2 TB of compressed SAS datasets, and we are now considering loading them into an RDBMS.
> From your experience, it seems a single PostgreSQL instance can't manage databases of that size well; is that right?
Yahoo has a 2 PB single-instance Postgres database (with a modified
engine), but the biggest pure-Pg single instance I've heard of is 4 TB.
The 4 TB database has the additional interesting property that they've
done none of the standard "scalable" architecture changes (such as
partitioning; see the sketch below). To me, this is a shining example
that even a naive Postgres database can scale to as much hardware as
you're willing to throw at it. Of course, clever solutions will get you
much more bang for your hardware buck.
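For anyone unfamiliar with the partitioning being referenced, here's a
minimal sketch using the inheritance-based approach available in 8.x-era
Pg. The table and column names are invented for illustration, not taken
from any real schema:

  -- Hypothetical parent table; all names are made up for illustration.
  CREATE TABLE measurements (
      id        bigint      NOT NULL,
      logged_at timestamptz NOT NULL,
      payload   text
  );

  -- One child table per month; the CHECK constraints let the planner
  -- skip partitions that can't match a query's date filter.
  CREATE TABLE measurements_2008_08 (
      CHECK (logged_at >= '2008-08-01' AND logged_at < '2008-09-01')
  ) INHERITS (measurements);

  CREATE TABLE measurements_2008_09 (
      CHECK (logged_at >= '2008-09-01' AND logged_at < '2008-10-01')
  ) INHERITS (measurements);

  -- With constraint exclusion on, a filtered scan of the parent only
  -- touches the matching child table(s).
  SET constraint_exclusion = on;
  SELECT count(*)
    FROM measurements
   WHERE logged_at >= '2008-08-15' AND logged_at < '2008-08-16';

Inserts have to be routed to the right child with a rule or trigger,
which is most of the bookkeeping this approach costs you.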
As for my personal experience, I'd say the only reason we're currently
running a dual Pg instance (master plus replica/hot standby)
configuration is report times. It's really important to us to have
snappy access to our data warehouse. During maintenance, our site and
processes can easily be powered by the master database alone, with some
noticeable performance degradation for users.
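For concreteness, and purely as an assumption on my part (Mark doesn't
say how his replica is fed), an 8.x-era master/standby pair like that
was commonly kept in sync with WAL shipping; the archive paths below
are placeholders:

  # postgresql.conf on the master (8.3-style settings)
  archive_mode    = on
  archive_command = 'cp %p /wal_archive/%f'   # ship WAL segments out

  # recovery.conf on the standby
  restore_command = 'cp /wal_archive/%f %p'   # replay shipped segments

Slony-style trigger-based replication was the other common choice at
the time; either way, the standby absorbs the reporting load while the
master handles writes.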
The "grid" that we (I) am looking to build is coming out of changing
(yet ever static!) business needs: we're looking to immediately get 2x
the data volume and soon need to scale to 10x. Couple this with
increased user load and the desire to make reports run even faster than
they currently do and we're really going to run up against a hardware
boundary. Besides, writing grid/distributed databases is *fun*!
Uh, for a one-sentence answer: a single Pg instance can absolutely
handle 2+ TB without flinching.
> How many CPU cores and how much memory does your server have :)
My boss asked me not to answer the questions I've skipped... sorry. I
will say that the hardware is pretty modest, but has a good amount of
RAM and disk space.
-Mark