On Tue, Dec 9, 2008 at 9:37 AM, Scott Carey <scott@richrelevance.com> wrote:
> As for tipping points and pg_bench -- It doesn't seem to reflect the
> kind of workload we use postgres for at all, though my workload does a
> lot of big hashes and seqscans, and I'm curious how much improved those
> may be due to the hash improvements. 32GB RAM and 3TB data (about 250GB
> scanned regularly) here. And yes, we are almost completely CPU bound
> now except for a few tasks. Iostat only reports above 65% disk
> utilization for about 5% of the workload duty-cycle, and is regularly
> < 20%. COPY doesn't get anywhere near platter speeds on indexless bulk
> transfer. The highest disk usage spikes occur when some of our
> random-access data/indexes get shoved out of cache. These aren't too
> large, but high enough seqscan load will cause postgres and the OS to
> dump them from cache. If we put these on some SSDs the disk
> utilization % would drop a lot further.
It definitely reflects our usage pattern, which is very random and
involves tiny bits of data scattered throughout the database. Our
current database is about 20-25 GB, which means it's quickly reaching
the point where it will not fit in our 32GB of RAM, and it's likely to
grow too big for 64GB before a year or two is out.
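
As an aside, for anyone watching the same crossover point: on any
reasonably recent release (8.1 or later, if memory serves) something
like this is a cheap way to compare on-disk size against RAM. Just a
sketch, nothing fancy:

  -- rough on-disk size of the current database
  SELECT pg_size_pretty(pg_database_size(current_database()));

  -- largest relations, counting indexes and toast
  SELECT relname,
         pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM pg_class
   WHERE relkind = 'r'
   ORDER BY pg_total_relation_size(oid) DESC
   LIMIT 10;
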
> I feel confident in saying that in about a year, I could spec out a
> medium-sized budget for hardware ($25k) for almost any postgres setup
> and make it almost pure CPU bound.
> SSDs and hybrid tech such as ZFS L2ARC make this possible with easy
> access to 10k+ iops, and it will take no more than 12 SATA drives in
> RAID 10 next year (and a good controller or software RAID) to get
> 1GB/sec sequential reads.
Lucky you, having needs that are fulfilled by sequential reads. :)
I wonder how many hard drives it would take to be CPU bound on random
access patterns? About 40 to 60? And probably 15k SAS drives to boot,
because that's what we're looking at in the next few years where I
work.
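
Back of the envelope, assuming a 15k drive is good for something like
200 random iops (a ballpark figure, not a measurement): matching the
10k+ iops you mention for SSDs works out to roughly 10000 / 200 = 50
spindles before any RAID overhead, so 40 to 60 seems about right.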