Re: hardware upgrade, performance degrade? - Mailing list pgsql-performance
From: Scott Marlowe
Subject: Re: hardware upgrade, performance degrade?
Date:
Msg-id: CAOR=d=1qQN5r1LracmbJcahemq9sR5gc9QfTUuAQ9U2+7cEj7Q@mail.gmail.com
In response to: Re: hardware upgrade, performance degrade? (Steven Crandell <steven.crandell@gmail.com>)
List: pgsql-performance
On Fri, Mar 1, 2013 at 9:49 AM, Steven Crandell <steven.crandell@gmail.com> wrote:
> We saw the same performance problems when this new hardware was running
> cent 6.3 with a 2.6.32-279.19.1.el6.x86_64 kernel and when it was matched
> to the OS/kernel of the old hardware, which was cent 5.8 with a
> 2.6.18-308.11.1.el5 kernel.
>
> Yes, the new hardware was thoroughly tested with bonnie before being put
> into service and has been tested since. We are unable to find any
> interesting differences in our bonnie test comparisons between the old
> and new hardware. pgbench was not used prior to our discovery of the
> problem but has been used extensively since. FWIW, this server ran a
> zabbix database (much lower load requirements) for a month without any
> problems prior to taking over as our primary production DB server.
>
> After quite a bit of trial and error we were able to find a pgbench test
> (2x 300 concurrent client sessions doing selects, along with 1x 50
> concurrent client session doing the standard pgbench query rotation) that
> showed the new hardware underperforming the old hardware by about 1000
> TPS (2300 vs. 1300) on the 50-concurrent-client pgbench run, and by about
> 1000 TPS on each of the select-only runs (~24000 vs. ~23000). Less
> demanding tests were handled equally well by both old and new servers.
> More demanding tests tipped both old and new over with very similar
> efficacy.
>
> Hopefully that fleshes things out a bit more.
> Please let me know if I can provide additional information.

OK, I'd recommend testing with various numbers of clients and seeing what kind of shape you get when you plot the curve. I.e. does it fall off really hard at some particular number, etc.? If the old server degrades more gracefully under very heavy load, it may be that you're simply admitting too many connections for the new one and not hitting its sweet spot.
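To sketch what that client sweep could look like: a hedged example below loops pgbench over increasing client counts and pulls the TPS figure out of its output for plotting. The database name `pgbench_db`, the client counts, the thread count, and the 60-second duration are all placeholders; adjust them to your setup.

```shell
#!/bin/sh
# Pull the first "tps = NNNN.NN" figure out of pgbench's summary output.
extract_tps() {
  grep 'tps' | head -n1 | sed 's/.*tps = \([0-9.]*\).*/\1/'
}

# Sweep client counts and emit "clients,tps" CSV suitable for plotting.
# -S runs the select-only script; drop it to use the default TPC-B-ish mix.
for clients in 8 16 32 64 128 256 384 512; do
  tps=$(pgbench -S -c "$clients" -j 8 -T 60 pgbench_db 2>/dev/null | extract_tps)
  echo "$clients,$tps"
done
```

Plotting old vs. new hardware on the same axes should make it obvious whether the new box falls off a cliff at some connection count where the old one degrades gradually.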
FWIW, the newest Intel 10-core Xeons and their cousins just barely keep up with or beat the 8- or 12-core AMD Opterons from 3 years ago in most of my testing. They look great on paper, but under heavy load they are lucky to keep up most of the time.

There's also the possibility that even though you've turned off zone reclaim, your new hardware is still running in a NUMA mode that makes internode communication much more expensive, and that's costing you. This may especially be true with 1TB of memory: it may be both running at a lower speed AND paying much higher internode connection costs. Use the numactl command (I think that's it) to see what the internode costs are, and compare them to the old hardware. If the internode comm costs are really high, see if you can turn off NUMA in the BIOS and whether things get somewhat better.

Of course, check the usual: that your battery-backed cache is really working in write-back mode, not write-through, etc.

Good luck. Acceptance testing can really suck when newer, supposedly faster hardware is in fact slower.
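For the NUMA check above, `numactl --hardware` prints a node-distance matrix (10 = local access; remote nodes are higher). A hedged sketch below scans that table and flags anything above a threshold; the cutoff of 22 is an assumption, roughly "worse than a typical 2-socket remote hop", not a documented constant.

```shell
#!/bin/sh
# Scan the "node distances:" matrix from `numactl --hardware` and flag
# internode costs above a threshold. Distances are relative: 10 means
# local, ~20-21 is a typical remote hop on a 2-socket box; 30+ across
# nodes usually means multi-hop paths on big 4/8-socket machines.
check_numa() {
  awk '/^node distances:/ { grab = 1; next }
       grab && /^ *[0-9]+:/ {
         # Fields 2..NF are this row of the distance matrix.
         for (i = 2; i <= NF; i++) if ($i + 0 > 22) bad = 1
       }
       END { print (bad ? "high internode cost" : "ok") }'
}

numactl --hardware | check_numa
```

Running this on both the old and the new server gives a quick apples-to-apples comparison; if the new box reports much larger distances, BIOS NUMA/interleave settings are the first place to look.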