Thread: Opteron scaling with PostgreSQL
Some time ago, I asked about how well PostgreSQL scales with the number
of processors in an Opteron system. To my surprise, no one seemed to
know! Well, a couple of days ago, a shiny, new Celestica A8440 showed
up at my office, so I decided to put it through its paces. Hopefully,
this will be useful to someone else as well!

Hardware info
-------------
Celestica A8440
4x Opteron 848
8 GB PC3200 registered/ECC memory

Software info
-------------
Fedora Core 2 x86-64
PostgreSQL 7.4.2
Added compile options: -O3 -m64
Startup options: 256 MB shared buffers, fsync off (to eliminate the
disk system as a variable), 128 MB sort memory

Testing method
--------------
I logged 10,000 queries from our production DB server and wrote a Perl
program to issue them via an arbitrary number of "workers". Before each
run, the database was "warmed up" with two preliminary runs to ensure
that caches and buffers were populated.

Instead of removing processors (which would have also reduced the
memory), I used the boot argument "maxcpus" to limit the number of CPUs
that Linux would use.

Preliminary thoughts
--------------------
After some experimentation, I found that the optimal size for the
shared buffers was 256 MB. Contrary to my expectations, a larger shared
buffer resulted in lower throughput.

Results!
--------

maxcpus   max queries per second
-------   ----------------------
   1       378 qps @ 32 connections (baseline)
   2       609 qps @ 96 connections (161% of baseline)
   3       853 qps @ 48 connections (225% of baseline)
   4      1033 qps @ 64 connections (273% of baseline)

A graph of the throughputs for various numbers of CPUs and connections
can be found at http://www.codon.com/PG-scaling.gif

steve
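[Editor's note: the harness described above was a Perl program that is
not included in the thread. As a rough illustration of the worker
approach, here is a minimal Python sketch; the names `replay` and
`execute` are illustrative, and `execute` stands in for a real
per-connection PostgreSQL call, so the sketch is self-contained.]

```python
# Hypothetical sketch of the benchmark harness described above:
# drain a fixed list of logged queries through N concurrent workers
# and report throughput in queries per second.
import queue
import threading
import time


def replay(queries, n_workers, execute):
    """Issue `queries` through `n_workers` workers; return queries/sec.

    `execute` is a callable taking one SQL string -- in a real run it
    would hold a per-worker database connection (e.g. via DBI in the
    original Perl, or a driver of your choice in Python).
    """
    work = queue.Queue()
    for sql in queries:
        work.put(sql)

    def worker():
        # Each worker pulls queries until the shared queue is empty.
        while True:
            try:
                sql = work.get_nowait()
            except queue.Empty:
                return
            execute(sql)

    start = time.time()
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = max(time.time() - start, 1e-9)
    return len(queries) / elapsed
```

As in the tests above, you would call `replay` a couple of times first
to warm the caches, then take the timed run, repeating across worker
counts (and `maxcpus` settings) to find the peak for each CPU count.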
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Steve Wolfe
> Sent: Thursday, June 10, 2004 2:09 PM
> To: pgsql-general
> Subject: [GENERAL] Opteron scaling with PostgreSQL
>
> Some time ago, I asked about how well PostgreSQL scales with the
> number of processors in an Opteron system.
[snip]
> Software info
> -------------
> Fedora Core 2 x86-64
> PostgreSQL 7.4.2
> Added compile options: -O3 -m64
> Startup options: 256 MB shared buffers, fsync off to eliminate the
> disk system as a variable, 128 MB sort memory

I would very much like to see the same test with fsync on. A test that
does not reflect real-world use has less value than one that just shows
how fast the system can go.

For a read-only database, fsync could be turned off. For any other
system it would be hare-brained, and nobody in their right mind would
do it.

> Testing method
> --------------
[snip]
> Results!
> --------
>
> maxcpus   max queries per second
> -------   ----------------------
>    1       378 qps @ 32 connections (baseline)
>    2       609 qps @ 96 connections (161% of baseline)
>    3       853 qps @ 48 connections (225% of baseline)
>    4      1033 qps @ 64 connections (273% of baseline)
>
> A graph of the throughputs for various numbers of CPUs and
> connections can be found at http://www.codon.com/PG-scaling.gif

It is very impressive how well the system scales. I would like to see a
PostgreSQL system run against these guys:

http://www.tpc.org/

It might prove interesting to see how it stacks up against commercial
systems. Certainly when it comes to dollars per TPS, PostgreSQL would
start with a stupendous leg up! ;-)
> I would very much like to see the same test with fsync on. A test
> that does not reflect real-world use has less value than one that
> just shows how fast the system can go.
>
> For a read-only database, fsync could be turned off. For any other
> system it would be hare-brained, and nobody in their right mind would
> do it.

Then I must not be in my right mind. : )  Before I explain why *I* run
with fsync turned off: the main reason the tests were done without
fsync was to test the scalability of the Opteron platform, not the
scalability of my disk subsystem. = )

I've run with fsync off on my production servers for years. Power never
goes off, and RAID 5 protects me from disk failures. Sooner or later,
it may bite me in the butt. We make backups sufficiently often that the
small amount of data we'd lose will be far offset by the tremendous
performance boost that we've enjoyed. In fact, we even have a backup
server sitting there doing nothing, which can take over the duties of
the main DB server within a VERY short amount of time.

steve
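[Editor's note: for readers trying to reproduce this, the settings
discussed in the thread would look roughly like the following in a
7.4-era postgresql.conf. This is a hypothetical reconstruction, not
taken from Steve's actual config; note that in that release
`shared_buffers` is specified in 8 kB pages and `sort_mem` in kB.]

```
# postgresql.conf fragment (PostgreSQL 7.4 parameter names)
shared_buffers = 32768    # 32768 x 8 kB pages = 256 MB
sort_mem = 131072         # 131072 kB = 128 MB per sort operation
fsync = false             # UNSAFE for most production use: an OS
                          # crash or power loss can corrupt the
                          # on-disk data files (see discussion below)
```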
Steve Wolfe <nw@codon.com> writes:

> I've run with fsync off on my production servers for years. Power
> never goes off, and RAID 5 protects me from disk failures. Sooner or
> later, it may bite me in the butt. We make backups sufficiently often
> that the small amount of data we'll lose will be far offset by the
> tremendous performance boost that we've enjoyed. In fact, we even
> have a backup server sitting there doing nothing, which can take over
> the duties of the main DB server within a VERY short amount of time.

That's good, because you'll eventually need it.

All it will take will be a Linux crash for the database files on disk
to become corrupted. No amount of UPS or RAID protection will protect
from that.

-- 
greg
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Greg Stark
> Sent: Saturday, June 12, 2004 12:18 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Opteron scaling with PostgreSQL
>
> Steve Wolfe <nw@codon.com> writes:
>
> > I've run with fsync off on my production servers for years.
[snip]
>
> That's good, because you'll eventually need it.
>
> All it will take will be a Linux crash for the database files on disk
> to become corrupted. No amount of UPS or RAID protection will protect
> from that.

Another important point is that the data in an organization is always
more valuable than the hardware and the software.

Hose up the hardware and the software, and insurance gets you new
stuff.

Hose up the data, and you are really hosed for good.
"Dann Corbit" <DCorbit@connx.com> wrote:
[snip]
>
> Another important point is that the data in an organization is always
> more valuable than the hardware and the software.
>
> Hose up the hardware and the software, and insurance gets new stuff.
>
> Hose up the data and you are really hosed for good.

It's amazing how many people don't seem to get that.

Jim
Greg Stark <gsstark@mit.edu> writes:
> Steve Wolfe <nw@codon.com> writes:
>> I've run with fsync off on my production servers for years.

> All it will take will be a Linux crash for the database files on disk
> to become corrupted. No amount of UPS or RAID protection will protect
> from that.

And neither will fsync'ing, so I'm not sure what your point is.

Steve clearly understands the need for backups, so I think he's
prepared as well as he can for worst-case scenarios. He's determined
that the particular scenarios fsync can protect him against are not
big enough risks *in his environment* to justify the cost. I can't say
that I see any flaws in his reasoning.

			regards, tom lane
On Sat, Jun 12, 2004 at 07:19:05AM -0400, Jim Seymour wrote:
> "Dann Corbit" <DCorbit@connx.com> wrote:
> [snip]
> >
> > Another important point is that the data in an organization is
> > always more valuable than the hardware and the software.
> >
> > Hose up the hardware and the software, and insurance gets new
> > stuff.
> >
> > Hose up the data and you are really hosed for good.
>
> It's amazing, how many people don't seem to get that.

It's often not true.

I use postgresql for massive data-mining of a bunch of high-update-rate
data sources. The value of the data decreases rapidly as it ages. Data
over a month old is worthless. Data over a week old has very little
value.

If I lose all the data and can't recover it from backups, then I can be
back up and running within two days' worth of new data handling, and
back to business as usual within a week of new data.

If I lose a router or a controller and have to fault-find, order a
replacement and get it overnighted, reload the OS, and restore the
database and analysis software, it'll take me offline for at least a
couple of days, during which I can't even handle new incoming data, so
it'd still take me two or three days after that before I was back up
and running with usable data.

So, for that particular case, the data really isn't as valuable as the
infrastructure, despite that segment of the business being primarily
data analysis.

In other words, different people have different needs. There are
perfectly valid cases where you just don't care too much about the
data but need a decent SQL engine, others where you care about data
integrity (no silent corruption) but don't care about data loss, others
where recovery from the previous day's backup is fine if the system
crashes, and others where loss of a single transaction is a serious
problem. PostgreSQL handles all those cases quite nicely, and provides
some good performance/reliability trade-off configuration options.

Cheers,
  Steve
Steve Atkins <steve@blighty.com> wrote:
>
> On Sat, Jun 12, 2004 at 07:19:05AM -0400, Jim Seymour wrote:
> > "Dann Corbit" <DCorbit@connx.com> wrote:
> > [snip]
> > >
> > > Another important point is that the data in an organization is
> > > always more valuable than the hardware and the software.
[snip]
>
> It's often not true.
>
> I use postgresql for massive data-mining of a bunch of high-update
> rate data sources. The value of the data decreases rapidly as it
> ages. Data over a month old is worthless. Data over a week old has
> very little value.
[snip]

Good argument, and well made. So s/always/frequently/ in Dann Corbit's
comments. Perhaps even "most often." The point is: many people, some
even so-called "SysAdmins," will compromise on hardware and software,
apparently without thought to the fact that the unique, original,
irreplaceable data that hardware and software is handling is indeed
valuable and (possibly) irreplaceable.

Jim
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Steve Wolfe <nw@codon.com> writes:
> > > I've run with fsync off on my production servers for years.
>
> > All it will take will be a Linux crash for the database files on
> > disk to become corrupted. No amount of UPS or RAID protection will
> > protect from that.
>
> And neither will fsync'ing, so I'm not sure what your point is.

Um, well, a typical panic causes the machine to halt. It's possible
that causes the OS to scribble all over the disk, if that's what you
mean, but it's pretty rare. Usually I just get random reboots or halts
when things are going wrong. In that case you have a consistent
database if you use fsync, but not if you don't.

> Steve clearly understands the need for backups, so I think he's
> prepared as well as he can for worst-case scenarios. He's determined
> that the particular scenarios fsync can protect him against are not
> big enough risks *in his environment* to justify the cost. I can't
> say that I see any flaws in his reasoning.

I wasn't disagreeing with that, just trying to make sure it was clear
what the risk is. Without fsync, anything that causes the OS to stop
flushing blocks without syncing (including power loss, but also a
panic of any kind) could, and probably would, corrupt the DB.

-- 
greg
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jim Seymour
> Sent: Saturday, June 12, 2004 12:27 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Opteron scaling with PostgreSQL
>
> Steve Atkins <steve@blighty.com> wrote:
> >
> > I use postgresql for massive data-mining of a bunch of high-update
> > rate data sources. The value of the data decreases rapidly as it
> > ages. Data over a month old is worthless. Data over a week old has
> > very little value.
> [snip]
>
> Good argument and well-made. So s/always/frequently/ in Dann Corbit's
> comments. Perhaps even "most often." The point is: many people, some
> even so-called "SysAdmins," will compromise on hardware and software,
> apparently without thought to the fact that the unique, original,
> irreplaceable data that hardware and software is handling is indeed
> valuable and (possibly) irreplaceable.

In addition, a data warehouse is a special case, since the source data
remains untouched. With a data warehouse, you intentionally destroy and
recreate it on a frequent basis.

My statement remains true: the data is more valuable than the hardware.
But in the case of a data warehouse, if the warehouse "burns to the
ground" you can create another one on demand. Since the original data
is not destroyed, the data is not destroyed.

If the original data from which the warehouse is derived were to be
destroyed, then we would see the value of the data.

Of course, there are special cases where you don't care if you lose
data. But it is not unusual for DBAs and sysadmins to underestimate the
value of the data, even in these special cases. For example, if a data
warehouse goes down, and the data warehouse is used to compute
month-end closing information, a delay of three days to redo everything
can be a tremendous expense.

There is an exception to every rule, of course. But I raise my hand and
shout about this issue just so that people will think about it. What
will the real cost be if my database fails? Will it really be cheaper
to take data-integrity shortcuts than to buy faster hardware? I will
tend to err on the side of data integrity, for sure.