Thread: What setup would you choose for postgresql 9.2 installation?
LSI MegaRAID SAS 9260-4i with four Intel SSDSC2CW240A3K5 SSDs, OR four Hitachi Ultrastar 15K600 SAS drives?

My app is pretty write-heavy and I have a lot of concurrent connections - around 300 (though I'm considering adding pgpool2 in front to increase throughput).

Regards
Niels Kristian
Apologies for the tangential question, but how would pgpool2 "increase throughput"? Wouldn't the same number of statements be issued by your application? It would likely reduce the number of concurrent connections, but that doesn't necessarily equate to "increased throughput".

-AJ

On 3/4/2013 8:52 AM, Niels Kristian Schjødt wrote:
> LSI MegaRAID SAS 9260-4i with four Intel SSDSC2CW240A3K5 SSDs OR four Hitachi Ultrastar 15K600 SAS drives?
>
> My app is pretty write-heavy and I have a lot of concurrent connections - around 300 (though I'm considering adding pgpool2 in front to increase throughput).
>
> Regards Niels Kristian
On 04/03/13 13:52, Niels Kristian Schjødt wrote:
> LSI MegaRAID SAS 9260-4i with four Intel SSDSC2CW240A3K5 SSDs OR four Hitachi Ultrastar 15K600 SAS drives?
>
> My app is pretty write-heavy and I have a lot of concurrent connections - around 300 (though I'm considering adding pgpool2 in front to increase throughput).

If you can afford it, there's no question in my mind that SSDs are the way to go. They can be 1000 times faster for random reads. May I suggest that you do some experiments though - perhaps with just one disk of each type. You can get some pretty good illustrative tests with ordinary SATA drives in an ordinary laptop/desktop (but not through a USB adapter). I did this originally when evaluating the (then new) Intel X25 SSD.

The other things to note are:

* The filesystem matters. For the important operation, fdatasync(), ext2 is 2x as fast as ext4, which is itself much faster than ext3. BUT ext2's fsck is horrid, so we chose ext4.

* Will you enable the disk (or RAID controller) write cache?

* Have you enough RAM for your key tables (and indexes) to fit in memory? If not, 64GB of RAM is cheap these days.

* In some applications, you can get a speed boost by turning synchronous_commit off - this means that in a database crash, the last few seconds of transactions are potentially lost, even though the application thinks they were committed. You may find this an acceptable tradeoff.

* Postgres doesn't always write straight to the tables, but uses the WAL (write-ahead log). So the benefit of SSD performance for random writes is less relevant than for random reads.

Lastly, don't overdo the concurrent connections. You may end up with less throughput than if you let postgres devote more resources to each request and let it finish faster.

Hope that helps,
Richard
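The fdatasync() comparison above is easy to reproduce yourself. Here's a minimal Python sketch of such a probe - illustrative only, and my own construction rather than Richard's actual test: run it on the disk and filesystem you're actually evaluating, since numbers from tmpfs or the OS cache won't transfer.

```python
# Rough fdatasync() latency probe: time write + fdatasync of an 8 KB
# block, which is roughly the shape of a WAL flush. Illustrative sketch
# only - point it at a file on the disk/filesystem under evaluation.
import os
import tempfile
import time


def fdatasync_latency(path: str, iterations: int = 100) -> float:
    """Average seconds per write+fdatasync of one 8 KB block."""
    block = b"x" * 8192
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, block)
            os.fdatasync(fd)  # force the block to stable storage
        return (time.perf_counter() - start) / iterations
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(dir=".") as f:
        print(f"avg fdatasync: {fdatasync_latency(f.name) * 1000:.3f} ms")
```

On a spinning disk expect something near one rotation per sync; on a good SSD, a small fraction of a millisecond.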
On Mon, Mar 4, 2013 at 7:04 AM, AJ Weber <aweber@comcast.net> wrote:
> Apologies for the tangential question, but how would pgpool2 "increase
> throughput"? Wouldn't the same number of statements be issued by your
> application? It would likely reduce the number of concurrent connections,
> but that doesn't necessarily equate to "increased throughput".

This is a pretty common subject. Most servers have a "peak throughput" that occurs at some fixed number of connections. For instance, a common throughput graph of pgbench on a server might look like this:

conns : tps
    1 : 200
    2 : 250
    4 : 400
    8 : 750
   12 : 1200
   16 : 2000
   24 : 2200
   28 : 2100
   32 : 2000
   40 : 1800
   64 : 1200
   80 : 800
  100 : 400

So by concentrating your connections at ~24 you would get maximum throughput. Such a graph is typical for most db servers, just with a different "sweet spot" for where maximum throughput occurs. Some servers fall off fast past this number, some just slowly drop off.
Great info, I really appreciate the insight. Is there a FAQ/recommended setup for running pgbench to determine where this might be? (Is there a reason to set up pgbench differently based on the server's cores/memory/etc?)

Sorry if this detracts from the OP's original question.

-AJ

On 3/4/2013 9:36 AM, Scott Marlowe wrote:
> On Mon, Mar 4, 2013 at 7:04 AM, AJ Weber <aweber@comcast.net> wrote:
>> Apologies for the tangential question, but how would pgpool2 "increase
>> throughput"? Wouldn't the same number of statements be issued by your
>> application? It would likely reduce the number of concurrent connections,
>> but that doesn't necessarily equate to "increased throughput".
> This is a pretty common subject. Most servers have a "peak
> throughput" that occurs at some fixed number of connections. For
> instance, a common throughput graph of pgbench on a server might look
> like this:
>
> conns : tps
> 1 : 200
> 2 : 250
> 4 : 400
> 8 : 750
> 12 : 1200
> 16 : 2000
> 24 : 2200
> 28 : 2100
> 32 : 2000
> 40 : 1800
> 64 : 1200
> 80 : 800
> 100 : 400
>
> So by concentrating your connections at ~24 you would get maximum
> throughput. Such a graph is typical for most db servers, just with a
> different "sweet spot" for where maximum throughput occurs. Some
> servers fall off fast past this number, some just slowly drop off.
On Mon, Mar 4, 2013 at 7:43 AM, AJ Weber <aweber@comcast.net> wrote:
> Great info, I really appreciate the insight. Is there a FAQ/recommended
> setup for running pgbench to determine where this might be? (Is there a
> reason to set up pgbench differently based on the server's cores/memory/etc?)

Well, keep in mind that pgbench may or may not represent your real load. However, it can be run with custom sql scripts to generate a different load than the one it runs by default, so you can get some idea of where your peak connections / throughput sits. And let's face it: if you're currently running 500 connections and your peak occurs at 64, then a pooler is going to make a difference whether you set it to 64 or 100 or 48, etc.

The basic starting point with pgbench is to use a scale factor of at least 2x however many connections you'll be testing. You can also run read-only transactions to get an idea of what the peak number of connections is for read-only versus read/write transactions. If read-only transactions peak at say 100 while r/w peaks at 24, and your app is 95% read, then you're probably pretty safe setting a pooler to ~100 conns instead of the lower 24. If your app is set up to have one pool for read-only stuff (say reporting) and another for r/w, then you can set up two different poolers, but that's a bit of added complexity you may not really need.

The real danger with lots of connections comes from having lots and lots of idle connections. Let's say you've got 1000 connections and 950 are idle. Then the server gets a load spike and queries start piling up. Suddenly, instead of ~50 active connections, that number starts to climb to 100, 200, 300, etc. Given the slower throughput most servers see as the number of active connections climbs, your server may slow to a crawl and never recover until you remove load.
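A sweep over client counts like the one described above can be scripted. A hedged sketch (the database name "bench" and the chosen client counts are my placeholders; -c, -j, -T and -S are standard pgbench options, and the sample tps line at the end is illustrative, not a real measurement):

```python
# Sweep pgbench over several client counts and pull the tps figure out
# of each run's summary output. Assumes pgbench is installed and a test
# database (here called "bench") has been initialized with a scale
# factor of at least 2x the highest client count tested.
import re
import subprocess


def pgbench_cmd(clients, seconds=60, read_only=False, db="bench"):
    cmd = ["pgbench", "-c", str(clients), "-j", str(min(clients, 8)),
           "-T", str(seconds)]
    if read_only:
        cmd.append("-S")  # SELECT-only transactions
    return cmd + [db]


def parse_tps(output):
    """Pull the tps figure out of pgbench's summary output."""
    m = re.search(r"tps = ([\d.]+)", output)
    if not m:
        raise ValueError("no tps line found")
    return float(m.group(1))


def sweep(counts=(8, 16, 24, 32, 48, 64), seconds=60):
    results = {}
    for c in counts:
        out = subprocess.run(pgbench_cmd(c, seconds), capture_output=True,
                             text=True, check=True).stdout
        results[c] = parse_tps(out)
    return results


# parse_tps against an illustrative pgbench summary line:
print(parse_tps("tps = 2200.35 (including connections establishing)"))
```

Run the read-only variant (read_only=True) separately to find the two peaks Scott describes.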
That's around the behavior I'm seeing - I'll be testing tonight! :-)