Thread: New system recommendations
I am about to take a system from testing into production.
The system is a combination oltp/bi (network monitoring platform).
We are currently inserting about 1 million rows per day, and will increase to probably 5 million once it goes into full deployment.
The current test production server is running Windows Server 2003, PostgreSQL 8.1.3.
Record insertion is done via an ODBC call to a stored procedure from a Windows based host monitor.
In addition to PostgreSQL, the server also runs Apache/PHP for the GUI which displays our dashboards with the querying and drill-down front end..
It is currently running on a 3GHz Xeon HT, 2GB RAM, dual 72GB disks running RAID 1. This server is a 1U without only 2 drive bays, so I have a potential issue with drive space.
As a result, I will be moving the db server to a Dell 1650 with 3 146GB SCSI drives running RAID 0. System is a dual processor, 1.2GHz, with 4GB RAM.
Due to the amount of record insertions being performed, record insertion speed is paramount – also because excessive execution time will have a side effect of causing the monitoring agent to go stale. Some of the users have mentioned that I will get better performance running under a *nix OS. We are mostly a Microsoft OS house, but also run FreeBSD. I am considering deploying with FreeBSD 6.0. I was wondering if anyone has benchmarks showing speed of execution of PostgreSQL 8.1.3 under Win2003 and FreeBSD 6.0. Also, are there any caveats or items I should be aware of if running under FreeBSD? Any issues when running under a multi-processor kernel? Anything in specific which I should include in the kernel build to give me optimum performance for running PostgreSQL?
Needless to say, I am a bit nervous of moving to FreeBSD since I have not tested it in a production environment.
Any advice will be deeply appreciated.
Regards,
Benjamin
On Wed, Apr 26, 2006 at 07:30:49PM -0600, Benjamin Krajmalnik wrote: > I am about to take a system from testing into production. > > The system is a combination oltp/bi (network monitoring platform). > > We are currently inserting about 1 million rows per day, and will > increase to probably 5 million once it goes into full deployment. > > The current test production server is running Windows Server 2003, > PostgreSQL 8.1.3. > > Record insertion is done via an ODBC call to a stored procedure from a > Windows based host monitor. Make sure you're batching inserts with transactions. > It is currently running on a 3GHz Xeon HT, 2GB RAM, dual 72GB disks > running RAID 1. This server is a 1U without only 2 drive bays, so I > have a potential issue with drive space. > > As a result, I will be moving the db server to a Dell 1650 with 3 146GB > SCSI drives running RAID 0. System is a dual processor, 1.2GHz, with > 4GB RAM. 3 drive raid0 is likely to fail within 3-4 years, just so you know; unless it's not new hardware, in which case I'd expect something closer to 2 years (my general experience is that server HDs will last 4-6 years, so with 3 of them you're looking at a failure every ~2 years). You sure you want to trust a monitoring app to raid0? :) > Due to the amount of record insertions being performed, record insertion > speed is paramount - also because excessive execution time will have a > side effect of causing the monitoring agent to go stale. Some of the > users have mentioned that I will get better performance running under a > *nix OS. We are mostly a Microsoft OS house, but also run FreeBSD. I > am considering deploying with FreeBSD 6.0. I was wondering if anyone > has benchmarks showing speed of execution of PostgreSQL 8.1.3 under > Win2003 and FreeBSD 6.0. Also, are there any caveats or items I should > be aware of if running under FreeBSD? Any issues when running under a > multi-processor kernel? Anything in specific which I should include in > the kernel build to give me optimum performance for running PostgreSQL? > > Needless to say, I am a bit nervous of moving to FreeBSD since I have > not tested it in a production environment. Aside from some very large shops running FreeBSD (yahoo does I believe, and hotmail did for a long time, even after MS bought it), it's also a preferred OS by PostgreSQL developers. On the other hand we've only had Windows support for about 2 years and there's known windows-only issues. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Benjamin Krajmalnik wrote: > We are currently inserting about 1 million rows per day, and will > increase to probably 5 million once it goes into full deployment. > > It is currently running on a 3GHz Xeon HT, 2GB RAM, dual 72GB disks > running RAID 1. This server is a 1U without only 2 drive bays, so I > have a potential issue with drive space. > > As a result, I will be moving the db server to a Dell 1650 with 3 146GB > SCSI drives running RAID 0. System is a dual processor, 1.2GHz, with > 4GB RAM. I'm assuming you've already optimized your code with all the necessary tricks for maximum insert/update performance because your disk upgrade plan is rather underwhelming. 3X the disk bandwidth (3dr-RAID0 versus 2dr-RAID1) won't even keep pace with your expected 5X increased volume. Anytime you do a system upgrade, you always have to plan to handle many times the current projected volume so you don't have to upgrade yet again next year. Go nuts -- get as many disks as you can afford. Get as much onboard cache as you can afford. If number of drive bays is an issue, I'd get a separate case/rackmount and run external scsi/sata cables to it.
Hi, > It is currently running on a 3GHz Xeon HT, 2GB RAM, dual 72GB disks > running RAID 1. This server is a 1U without only 2 drive bays, so I > have a potential issue with drive space. > > As a result, I will be moving the db server to a Dell 1650 with 3 146GB > SCSI drives running RAID 0. System is a dual processor, 1.2GHz, with > 4GB RAM. > > > Do you mean RAID0? You face total data loss when the first disk breaks - a 1.2GHz machine is not going to be young, so I assume neither are the disks - a recipe for disaster I think. If your backup strategy is such that you can cope with total data loss and you are more interested in performance I believe you will get better performance by logical splitting of the database across multiple spindles rather than striping. For example if you put the write-ahead-log onto a spindle by itself you will get better performance than running everything on one jumbo RAID0 drive. Databases are usually sensitive to head latency rather than raw transfer rate. Dedicating one drive to the log (which must be written for each and every insert/update) minimizes the latency on writing the log. Is it perhaps an option to take the 146GB drives out of the 1650 and putting them into the Xeon machine? That might get you the best of both worlds (faster CPU, bigger hard disks, RAID1). One final word or warning, whichever O/S you use HT doesn't always work well for postgres. It's hard to pin down exactly what happens, but on a P4 HT system (2.6 kernel again, PG 8.0) we found that we had some very odd performance problems that were hard to reproduce in the lab but occurred in the live environment a lot. Turning off HT made them go away. Hope this helps, Robin
I was thinking of going Raid 0 to increase the active spindles. But, come to think about it, if I go to FreeBSD I really do not need to go that route, since I can use a symlink and, as suggested, move the logs to another spindle. I was also wary of the Raid 0 approach - specially since last year I had 2 drives in a Raid 5 fail within 5 minutes of each other! Assuming that presently the server's capability as far as drives is 3 drives, would you recommend doing a RAID-1 on 2 drives and an additional non RAIDed drive to hold the WAL? Anything else which shold be moved to the other spindle? Any recommendations when building the kernel (FreeBSD 6.0)? Regards, Benjamin
>Assuming that presently the server's capability as far as drives is 3 >drives, would you recommend doing a RAID-1 on 2 drives and an additional >non RAIDed drive to hold the WAL? Anything else which shold be moved to >the other spindle? > > > I would RAID-1 the WAL. Assuming you have a mechanism to keep the logs since the last backup you should be able to restore from that last backup and logs without the data drive. The reverse isn't true - if you lose the WAL but have the data drive, it may not be consistent ... Robin
On Wed, 2006-04-26 at 20:30, Benjamin Krajmalnik wrote: > I am about to take a system from testing into production. > > The system is a combination oltp/bi (network monitoring platform). > > We are currently inserting about 1 million rows per day, and will > increase to probably 5 million once it goes into full deployment. Assuming most of that is during the nominal 8 hour workday, that's about 35 inserts per second, up to 173 inserts per second, on average. Anytime you're looking at a high write load on a database (not just PostgreSQL) you should be looking at a hardware RAID controller with battery backed cache and RAID 1 or RAID 1+0. > The current test production server is running Windows Server 2003, > PostgreSQL 8.1.3. > > Record insertion is done via an ODBC call to a stored procedure from a > Windows based host monitor. Windows support is still pretty new, but I'm guessing that much of the performance problems we've seen have been as much tuning as OS related. > In addition to PostgreSQL, the server also runs Apache/PHP for the GUI > which displays our dashboards with the querying and drill-down front > end.. You might be able to get better performance if you make the postgresql box a single purpose machine. > It is currently running on a 3GHz Xeon HT, 2GB RAM, dual 72GB disks > running RAID 1. This server is a 1U without only 2 drive bays, so I > have a potential issue with drive space. > > As a result, I will be moving the db server to a Dell 1650 with 3 > 146GB SCSI drives running RAID 0. System is a dual processor, 1.2GHz, > with 4GB RAM. Don't run important servers on RAID 0. RAID 1 is a better choice. RAID 1+0 is a betterer choice. I do NOT recommend the Dell 16xx/26xx series, from experience. Last job I had I had a 2600, fairly early model, and it was stable. We had the AMI based RAID controller in it (Perc4/DC), and ran the latest 2.xx version of the megaraid driver. All the other 2600s where I worked (and we had lots of them) had occasional, unexplainable lockups. They were never corrected. They all had the adaptec controllers (Perc4/DI) in them. Started a new job last year, and our servers were occasionally locking up for no good reason. go and look in the server room and guess what we're running. yep. 2650s with Perc4/DI controllers. We have 2850s now, and they've been quite solid. Performance-wise neither the 26xx or 28xx is a stellar performer, but at least the 28xx series seems to be stable. > Due to the amount of record insertions being performed, record > insertion speed is paramount – also because excessive execution time > will have a side effect of causing the monitoring agent to go stale. Then definitely get the right tool for the job: a battery backed caching RAID controller. > Some of the users have mentioned that I will get better performance > running under a *nix OS. We are mostly a Microsoft OS house, but also > run FreeBSD. I am considering deploying with FreeBSD 6.0. I was > wondering if anyone has benchmarks showing speed of execution of > PostgreSQL 8.1.3 under Win2003 and FreeBSD 6.0. Also, are there any > caveats or items I should be aware of if running under FreeBSD? Any > issues when running under a multi-processor kernel? Anything in > specific which I should include in the kernel build to give me optimum > performance for running PostgreSQL? If you've got basic experience with FreeBSD, then running PostgreSQL on it should be a no brainer.
> Anytime you're looking at a high write load on a database (not just > PostgreSQL) you should be looking at a hardware RAID controller with > battery backed cache and RAID 1 or RAID 1+0. > We recently benchmarked the latest and greatest PCI-X SATA mega raid controller (with NCQ support, so approaching SCSI performance) and we found it was no faster than running s/w RAID-1 on a dual-core P4 with the 2.6 series kernel. It would no doubt offer an advantage on running RAID-5 or any other RAID scheme that required XOR computation, but we don't want to use RAID-5. However we did not fit the Transportable Battery Backup Unit and so we didn't measure the benefit of early commit to the cache (fsync() returns early, data is written to disk later, battery keeps data safe in the meanwhile). I plan on repeating the experiment again later on in May once we have a TBBU and will post the results here if anyone is interested. Best wishes, Robin
On Thu, 2006-04-27 at 13:43, Robin Iddon wrote: > > Anytime you're looking at a high write load on a database (not just > > PostgreSQL) you should be looking at a hardware RAID controller with > > battery backed cache and RAID 1 or RAID 1+0. > > > We recently benchmarked the latest and greatest PCI-X SATA mega raid > controller (with NCQ support, so approaching SCSI performance) and we > found it was no faster than running s/w RAID-1 on a dual-core P4 with > the 2.6 series kernel. It would no doubt offer an advantage on running > RAID-5 or any other RAID scheme that required XOR computation, but we > don't want to use RAID-5. > > However we did not fit the Transportable Battery Backup Unit and so we > didn't measure the benefit of early commit to the cache (fsync() returns > early, data is written to disk later, battery keeps data safe in the > meanwhile). That battery backed cache makes all the difference. Also, unless someone's done some work on it, the older linux kernel raid modules serialized the access via multi-level RAID in such a way that RAID 1+0 was no faster than RAID 1. You're right about RAID5 sw versus hw. Without the BBCache unit, they're pretty much the same, and generally limited by the speed of the drives, not the method of implementation. Note that the Areca controllers and a few others get much better scores. But, the megaraid and adaptec controllers are the only ones dell includes in their boxen, so if it's a choice between the adaptec and the megaraid, I'd normally pick the megaraid. Note that some folks have mentioned that Dell uses their own firmware on these cards, and that firmware is supposedly slower than the stock firmware those cards come with. > I plan on repeating the experiment again later on in May once we have a > TBBU and will post the results here if anyone is interested. I'd love to see how the newer cards perform. I know that for RAID 1, straight up, linux kernel sw RAID is quite speedy. I'd imagine it would take a fairly heavily written environment to see an advantage for the HW controller.
smarlowe@g2switchworks.com (Scott Marlowe) writes: > Note that some folks have mentioned that Dell uses their own > firmware on these cards, and that firmware is supposedly slower than > the stock firmware those cards come with. Worse, the default behaviours apparently involve "works *like* RAID10" as opposed to "implements RAID10." I have visions here of Tom Hanks' SNL sketch, _Sabra Price Is Right_, where he's selling all sorts of shoddy bits of junk, using the line "like Sony guts" in it... -- select 'cbbrowne' || '@' || 'ntlug.org'; http://www3.sympatico.ca/cbbrowne/rdbms.html What should you do when you see an endangered animal that is eating an endangered plant?
Jim C. Nasby wrote: > On Wed, Apr 26, 2006 at 07:30:49PM -0600, Benjamin Krajmalnik wrote: > >> I am about to take a system from testing into production. >> >> The system is a combination oltp/bi (network monitoring platform). >> >> We are currently inserting about 1 million rows per day, and will >> increase to probably 5 million once it goes into full deployment. >> >> The current test production server is running Windows Server 2003, >> PostgreSQL 8.1.3. >> >> Record insertion is done via an ODBC call to a stored procedure from a >> Windows based host monitor. >> > > Make sure you're batching inserts with transactions. > > >> It is currently running on a 3GHz Xeon HT, 2GB RAM, dual 72GB disks >> running RAID 1. This server is a 1U without only 2 drive bays, so I >> have a potential issue with drive space. >> >> As a result, I will be moving the db server to a Dell 1650 with 3 146GB >> SCSI drives running RAID 0. System is a dual processor, 1.2GHz, with >> 4GB RAM. >> > > 3 drive raid0 is likely to fail within 3-4 years, just so you know; > unless it's not new hardware, in which case I'd expect something closer > to 2 years (my general experience is that server HDs will last 4-6 > years, so with 3 of them you're looking at a failure every ~2 years). > You sure you want to trust a monitoring app to raid0? :) > > >> Due to the amount of record insertions being performed, record insertion >> speed is paramount - also because excessive execution time will have a >> side effect of causing the monitoring agent to go stale. Some of the >> users have mentioned that I will get better performance running under a >> *nix OS. We are mostly a Microsoft OS house, but also run FreeBSD. I >> am considering deploying with FreeBSD 6.0. I was wondering if anyone >> has benchmarks showing speed of execution of PostgreSQL 8.1.3 under >> Win2003 and FreeBSD 6.0. Also, are there any caveats or items I should >> be aware of if running under FreeBSD? Any issues when running under a >> multi-processor kernel? Anything in specific which I should include in >> the kernel build to give me optimum performance for running PostgreSQL? >> >> Needless to say, I am a bit nervous of moving to FreeBSD since I have >> not tested it in a production environment. >> > > Aside from some very large shops running FreeBSD (yahoo does I believe, > and hotmail did for a long time, even after MS bought it), it's also a > preferred OS by PostgreSQL developers. On the other hand we've only had > Windows support for about 2 years and there's known windows-only issues. > Not certain about freebsd 6.x, but on 4.x, running an intel SMP build limits each machine to three and a half gigs of memory: FreeBSD 5.4-STABLE #0: Fri Sep 2 11:32:58 PDT 2005 Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Xeon(TM) CPU 3.06GHz (3056.50-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0xf27 Stepping = 7 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Hyperthreading: 2 logical CPUs real memory = 3759144960 (3585 MB) avail memory = 3679334400 (3508 MB) There is 4 gigs of physical memory in the machine that produced the above. PAE cannot be built into this kernel... I can't even remember what the error was.. I think PAE ended up being incompatible with HTT. -Brian
> Record insertion is done via an ODBC call to a stored procedure from a > Windows based host monitor" I wondered what this stored procedure was (is it a function or something outside postgresql). There was an interesting article on odbc - that application code can make big differences in database related performance. http://ourworld.compuserve.com/homepages/Ken_North/ODBCPERF.HTM Thought odbc performance might be also a thing to consider but this article seemed to attest otherwise. Hence the question. Regards, Ben K.
The stored procedure is a plpgsql function which gets passed parameters from the monitoring agent. It then dynamically creates a device record for the monitored device if one does not yet exist. Once that is done it creates a test record for the particular test if one does not exist. Once that is done, it aggregates dynamically the data into 4 tables - a daily snapshot, a weekly snapshot, and a monthly snapshot, and a dashboard snapshot. Once all of that has been accomplished, it creates a raw log entry (which as of next week will go to a given partition). This data goes into the partitioned table to facilitate purging of retention periods without hammering at the database and having to rebalance indices (I just truncate the partition once it is no longer needed). -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Ben K. Sent: Thursday, April 27, 2006 9:19 PM To: pgsql-admin@postgresql.org Subject: Re: [ADMIN] New system recommendations > Record insertion is done via an ODBC call to a stored procedure from a > Windows based host monitor" I wondered what this stored procedure was (is it a function or something outside postgresql). There was an interesting article on odbc - that application code can make big differences in database related performance. http://ourworld.compuserve.com/homepages/Ken_North/ODBCPERF.HTM Thought odbc performance might be also a thing to consider but this article seemed to attest otherwise. Hence the question. Regards, Ben K. ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
I'm putting both the private email and the thread back on the list, as there's interesting data in here people could use. I don't think I'm betraying any trust here, but if I am, please, let me know... On Thu, 2006-04-27 at 17:37, Benjamin Krajmalnik wrote: > Thanks for the feedback. > I wonder if the Dell backplane will take a stabdard RAID controller > instead of using the ROMB. > I may investigate just getting 146GB drives for the DL360 and using that > instead, and maybe setting up the dell as a backup server, creating > backups and moving the files over. If you've got some spare smaller drives laying about, you can always set up both and benchmark them against each other, plus that gives you a ready to go backup machine. > Quick question - I do not have a transaction frame explicitly declared > in m stored procedure (which is used to handle all of the inserts). > In plpgsql, how would I wrap the code inside a transaction frame? All stored procs run as a single transaction. I.e. the simple act of making it a user defined function has made it a single transaction, with all the benefits that entails. > This is such a dilemma - we are launching a new service, but until we > sign up clients on it I don't have the budget to go crazy on hardware. > I must make do with what I have. What I was thinking was mybe setting > up the Dell to be the web server and the backup database server, and > having the production database run on the DL 360, provided I can get > better database performance. If I can get significant better > performance on FreeBSD I will go to it, otherwise I will stay with > Windows. I'd test the two. I'm guessing that at least for the initial ramp up, windows will be ok. The real performance issue for windows is opening connections is much slower. If the majority of your machine's time is spent on the query, the overhead of opening connections will be lost in the noise. > The stored procedure is a plpgsql function which gets passed parameters > from the monitoring agent. > It then dynamically creates a device record for the monitored device if > one does not yet exist. > Once that is done it creates a test record for the particular test if > one does not exist. Up to now, fine. Performance on any machine should be ok. > Once that is done, it aggregates dynamically the data into 4 tables - a > daily snapshot, a weekly snapshot, and a monthly snapshot, and a > dashboard snapshot. Are you running aggregate functions against the whole table to do this? If so, this isn't going to scale. If you're just taking the data entered in this stored proc and adding it to a table that contains that data, then maybe it's ok. But if it's updating based on a huge amount of data, then that's going to be slow. > Once all of that has been accomplished, it creates a raw log entry > (which as of next week will go to a given partition). This data goes > into the partitioned table to facilitate purging of retention periods > without hammering at the database and having to rebalance indices (I > just truncate the partition once it is no longer needed). Sounds reasonable.
Concerning the aggregation, no – I am not running aggregate functions on the database itself.
Let’s take a small example. Let’s say I am aggregating avg ping time on an hourly basis.
I have, inside the table declarations, structures with an array[24].
As data comes in, I retrieve the test record for the particular test, based on the test time I establish which array offset needs to be taken care of, and then perform the relevant computatis.
For example, I have a testcounter array and a testresult array. When a new event comes in, average is going to be (testresult[n] * testcounter[n] + newvalue) / testcounter[n] + 1, followed by a testcounter[n] += 1. So I am not loading the database doing aggregate functions.
On our UI side of things, now that everything is working properly, I am also optimizing code. The current test code was looping through an array elemnt and aggregating through the backend. Of course, this is inefficient since, for example, let’s say I am showing an hourly graph, now I have 24 selects going on. Right now we are recoding to have one select which will return the aggregate of all of the offsets at once. It would be great if we could aggregate an array in one hot, but in the meantime we’ll do it this way.
From: Scott Marlowe [mailto:smarlowe@g2switchworks.com]
Sent: Friday, April 28, 2006 9:24 AM
To: Benjamin Krajmalnik
Cc: Ben K.; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] New system recommendations
I'm putting both the private email and the thread back on the list, as
there's interesting data in here people could use. I don't think I'm
betraying any trust here, but if I am, please, let me know...
On Thu, 2006-04-27 at 17:37, Benjamin Krajmalnik wrote:
> Thanks for the feedback.
> I wonder if the Dell backplane will take a stabdard RAID controller
> instead of using the ROMB.
> I may investigate just getting 146GB drives for the DL360 and using that
> instead, and maybe setting up the dell as a backup server, creating
> backups and moving the files over.
If you've got some spare smaller drives laying about, you can always set
up both and benchmark them against each other, plus that gives you a
ready to go backup machine.
> Quick question - I do not have a transaction frame explicitly declared
> in m stored procedure (which is used to handle all of the inserts).
> In plpgsql, how would I wrap the code inside a transaction frame?
All stored procs run as a single transaction. I.e. the simple act of
making it a user defined function has made it a single transaction, with
all the benefits that entails.
> This is such a dilemma - we are launching a new service, but until we
> sign up clients on it I don't have the budget to go crazy on hardware.
> I must make do with what I have. What I was thinking was mybe setting
> up the Dell to be the web server and the backup database server, and
> having the production database run on the DL 360, provided I can get
> better database performance. If I can get significant better
> performance on FreeBSD I will go to it, otherwise I will stay with
> Windows.
I'd test the two. I'm guessing that at least for the initial ramp up,
windows will be ok. The real performance issue for windows is opening
connections is much slower. If the majority of your machine's time is
spent on the query, the overhead of opening connections will be lost in
the noise.
> The stored procedure is a plpgsql function which gets passed parameters
> from the monitoring agent.
> It then dynamically creates a device record for the monitored device if
> one does not yet exist.
> Once that is done it creates a test record for the particular test if
> one does not exist.
Up to now, fine. Performance on any machine should be ok.
> Once that is done, it aggregates dynamically the data into 4 tables - a
> daily snapshot, a weekly snapshot, and a monthly snapshot, and a
> dashboard snapshot.
Are you running aggregate functions against the whole table to do this?
If so, this isn't going to scale. If you're just taking the data
entered in this stored proc and adding it to a table that contains that
data, then maybe it's ok. But if it's updating based on a huge amount
of data, then that's going to be slow.
> Once all of that has been accomplished, it creates a raw log entry
> (which as of next week will go to a given partition). This data goes
> into the partitioned table to facilitate purging of retention periods
> without hammering at the database and having to rebalance indices (I
> just truncate the partition once it is no longer needed).
Sounds reasonable.