Thread: Hardware estimation
Hi, I am a first-timer in this forum. We have an application using a PostgreSQL database, with considerable growth expected for the next year. We are facing some difficulties estimating the equipment we'll need, and I would like to exchange ideas about:

- hardware estimation
- performance concerns

with anyone who has faced PostgreSQL applications with more than 150 GB.

I am talking about a database that will reach:

- 200 GB total space
- 2 major space-consuming tables (400 million and 50 million tuples)
- major system process: a batch process reading from and writing to text files (telephony records), approx. 2 million records processed (read or written) every day

Operating System: Linux

Equipment we are planning:
- IBM xSeries 255, 4 Intel Xeon 1.5 GHz MP, 512 KB cache, 2 GB RAM, external storage system (IBM SSA)

Thanks in advance,
Vidal Zebulum
vzebulum@yahoo.com.br
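As a back-of-envelope cross-check on those figures, here is a rough heap-size estimate for the two big tables. The ~100-byte row width is an assumption (nothing in the post states it), and the ~36-byte tuple header, 8 KB page, and fill slack are approximations for PostgreSQL of that era; indexes and TOAST are ignored:

```python
# Rough sizing sketch for the two large tables described above.
# Row width is an ASSUMED ~100 bytes of user data per telephony record.

def table_size_gb(n_tuples, data_bytes_per_row, tuple_overhead=36,
                  page_size=8192, fill_factor=0.9):
    """Estimate heap size in GB, ignoring indexes and TOAST."""
    row = data_bytes_per_row + tuple_overhead
    rows_per_page = int((page_size * fill_factor) // row)
    pages = -(-n_tuples // rows_per_page)   # ceiling division
    return pages * page_size / 1024**3

big = table_size_gb(400_000_000, 100)
small = table_size_gb(50_000_000, 100)
print(f"400M-row table: ~{big:.0f} GB, 50M-row table: ~{small:.0f} GB")
```

With those assumptions the two heaps come to roughly 55-60 GB and ~7 GB, which leaves plenty of headroom inside a 200 GB budget for indexes, WAL, and dead-tuple bloat.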
On 5 Nov 2002, Vidal wrote:
> I am talking about a database that will reach:
>
> - 200 GB total space
> - 2 major space-consuming tables (400 million and 50 million tuples)
> - major system process: a batch process reading from and writing to
>   text files (telephony records), approx. 2 million records processed
>   (read or written) every day
>
> Operating System: Linux
>
> Equipment we are planning:
> - IBM xSeries 255, 4 Intel Xeon 1.5 GHz MP, 512 KB cache, 2 GB RAM,
>   external storage system (IBM SSA)

Are you going to be supporting a fair number of simultaneous users, or is this gonna be mostly a single-batch-at-a-time operation?

If you're mostly gonna be running single-user batch jobs for all the heavy lifting, save the money you'd spend on a quad Xeon and spend it on more / bigger / faster drive arrays and faster individual CPUs, like a dual Athlon 2800.

But maybe the multi-user part is still very important to your use. If so...

If you are looking at a quad Xeon, then take a look at going all the way to a 64-bit architecture (UltraSPARC, HP, IBM p6xx, Itanium 2), where you can throw scads of memory at your problems and PostgreSQL can access it.

You might find a dual UltraSPARC will outrun the quad Xeon due to better memory access, larger caches (8 MB L2 cache), and being able to put ungodly amounts of memory into the machine. Some of the "low end" 64-bit machines are not any more expensive than a quad Xeon can run.

A quad 1 GHz Power4 with 8 GB RAM, dual 36 GB drives, and all the fixings goes for $44,000. That's with 64 MB of L3 cache.
> Some of the "low end" 64 bit machines are not any more expensive than a
> quad xeon can run.

Are you sure? That might be the case if you use the most over-priced Xeon system you can find....

> A quad 1GHz Power4 with 8 gig ram, dual 36 Gig drives and all the fixings
> goes for $44,000. That's with 64 Meg L3 cache.

Wowee. That's a lot of money. You can build a quad Xeon with the same trimmings for a QUARTER of that price. Will the Power4 provide four times the performance?

MB, chassis, triple power supply:  $4,000
4 x 2.4 GHz P4 Xeon CPUs:          $1,200
8 GB RAM:                          $3,000
2 x 15k 36 GB HDD:                   $800
                                  -------
Total:                             $9,000

Of course, the limiting factor in PG seems (to me, at least) to be I/O, not CPU power. In a shared-bus arrangement with 4 processors, I think you'll generally run out of I/O (whether it be from the disks, memory, or the CPU bus) long before you'll run out of CPU.

Once you get above the level of performance that an Intel or AMD system can provide, your relative cost (based on performance) goes up pretty steeply -- but since there aren't any other options, you're pretty much stuck with it.

steve
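To make the "four times the performance" question concrete, here is the arithmetic from the parts list above (all prices are the figures quoted in this thread, nothing independently verified):

```python
# Cost comparison using the prices quoted in the thread.
power4_quad = 44_000          # quad 1 GHz Power4, 8 GB RAM, as quoted
xeon_parts = {
    "MB, chassis, triple PSU": 4_000,
    "4 x 2.4 GHz P4 Xeon":     1_200,
    "8 GB RAM":                3_000,
    "2 x 15k 36 GB HDD":         800,
}
xeon_total = sum(xeon_parts.values())
print(f"white-box quad Xeon: ${xeon_total:,}")
print(f"Power4 premium: {power4_quad / xeon_total:.1f}x the price")
```

So the Power4 quote carries roughly a 4.9x price premium over the white-box build, which is the gap the rest of the thread argues about.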
On Thu, 7 Nov 2002, Steve Wolfe wrote:
> > Some of the "low end" 64 bit machines are not any more expensive than a
> > quad xeon can run.
>
> Are you sure? That might be the case if you use the most over-priced
> Xeon system you can find....
>
> > A quad 1GHz Power4 with 8 gig ram, dual 36 Gig drives and all the
> > fixings goes for $44,000. That's with 64 Meg L3 cache.
>
> Wowee. That's a lot of money. You can build a quad Xeon with the same
> trimmings for a QUARTER of that price. Will the Power4 provide four times
> the performance?

Well, with a maximum memory footprint of 64 gigabytes, ALL of which can be accessed by PostgreSQL if need be, it is quite likely to scale much better. Power4 64-bit CPUs are generally much faster at the same clock rate than a P4 Xeon, at least in my limited experience.

The dual-CPU version of that box with only 4 GB of RAM is only about $26,000. This is also a rack-mount box with dual hot-swappable power supplies; it can map out bad memory on the fly automatically, and can provide REAL five-nines reliability. 32 MB of L3 cache and 6 MB of L2 cache, PER PROCESSOR. A backend connection with gigabytes of data bandwidth per second. An Intel-based box isn't even close to being in the same class.

> MB, chassis, triple power supply: $4,000
> 4 x 2.4 GHz P4 Xeon CPU's: $1,200
> 8 gigs RAM: $3,000
> 2 x 15k 36 gig HDD: $800
> ------------
> Total: $9,000

Are these white-box prices, or from someone like IBM or Dell? The best price I've seen with that configuration is about $20k. Is that ECC DDR memory?

Keep in mind that out of that 8 GB of RAM, only 1.5 or so is gonna be available for PostgreSQL. The rest will be system cache. On a 64-bit machine you can give as much as you want to the database.

> Of course, the limitting factor in PG seems (to me, at least) to be
> I/O, not CPU power. In a shared-bus sort of arrangement with 4
> processers, I think you'll generally run out of I/O (whether it be from
> the disks, memory, or the CPU bus) long before you'll run out of CPU.

Actually, even when the CPUs are sitting at 5% load, having faster CPUs tends to speed things up for PostgreSQL; maybe it's interrupt bound, and faster CPUs can handle more interrupts a second. Not sure why that is, but I've definitely noticed it in my testing.

> Once you get above the level of performance that an Intel or AMD system
> can provide, then your relative cost (based on performance) goes up pretty
> steeply - but since there aren't any other options, you're pretty much
> stuck with it.

Amen brother, amen. But the 64-bit machines have room to grow. If you find that you need more memory, you can go up to 64 GB on many of them.

If you do wanna look at 64-bit systems that are Intel based, then Dell sells a quad Itanium for a fair price, but by the time you've upped it to 8 GB and a pair of 36 GB hard drives, and subtracted their gold-star on-site support, the price is $46k. For 4 800 MHz CPUs. A Dell quad Xeon 1.6 GHz with 8 GB RAM is $29k; IBM is about $20k.

Dropping down to a 2-way IBM Power4-series box drops you into the $26,000 range, which is very competitive with the Xeons. And I wouldn't be at all surprised to see a dual Power4 1 GHz box outrun the quad Xeon, due to the much faster I/O and much larger L2 and L3 caches.
> This is also in a rack mount box with dual Hot swappable power supplies,
> it can map out bad memory on the fly automatically, and can provide REAL
> 5 9 reliability. 32 MEGS of L3 cache, 6 megs of L2 cache, PER PROCESSOR.
> A backend connection with GIGs of data bandwidth per second. An Intel
> based box isn't even close to being in the same class.

And I didn't say that it was in the same class. You said that it was about the same price, and I refuted that. That's all I was saying. I'm not a zealot or a devotee of any one class of server, by any means!

> Are these white box prices, or from someone like IBM or Dell? The best
> price I've seen with that configuration is about $20k. Is that ECC DDR
> memory?

White-box, and yes, it's registered/ECC DDR.

> Keep in mind, that out of that 8 gigs of ram, only 1.5 or so is gonna be
> available for Postgresql. The rest will be system cache. On a 64 bit
> machine you can give as much as you want to the database.

Actually, any one process will only be able to use the ~1.5 GB -- and PG forks off new processes for each backend, so you are able to make use of all 8 GB -- although I do admit that having a larger address space can be advantageous.

> If you do wanna look at 64 bit systems that are Intel based then Dell
> sells a quad Itanium for a fair price, but by the time you've upped it
> to 8 gigs and a pair of 36 gig hard drives, and subtracted their gold
> star on site support, the price is $46k. For 4 800 MHz CPUs.
>
> A Dell quad Xeon 1.6Gig with 8 gig ram is $29k  IBM is about $20k

Several years ago, I was in the market for a quad P3 Xeon, and prices from the "big names" were about the same. I built one based on a Supermicro chassis and motherboard for something like $12,000, including a fairly decent SCSI RAID array. I've worked with Compaq's servers before, and haven't seen much advantage. Yes, they have all of the fancy features that management thinks are necessary for uptime, but when the rubber meets the road, the machine I built has run for over two years with absolutely *NO* downtime other than a few planned shutdowns for planned hardware or kernel upgrades. Eventually, it was demoted from production DB server to development server, simply because we needed more horsepower than it could provide. (A dual Athlon filled the spot nicely.)

It should also be noted that simply going to a 64-bit architecture isn't a magic cure-all. Right after I built the machine I just spoke of, a Compaq rep tried to win us over, and loaned us a $25,000 dual-CPU Alpha for a week. I ran some PostgreSQL stress tests on it with some of our production data, and the Xeon handily kept up with or beat the Alpha, at half of the price. Now if I was doing some raytracing, I'm sure the outcome would have been very different, but for database work, it just didn't cut it.

In the end, it's the same argument that gets hashed over in various forms: when it comes to commodity vs. specialized hardware, commodity hardware is always going to be a cheaper way to get things done within the realm of its capabilities, but you eventually come to a performance level where commodity hardware just won't cut it any more. That's where the specialized hardware comes in, be it a high-end server like the Power4, a high-end router for an OC192, or a CAD/CAM graphics card.

steve
On Thu, 7 Nov 2002, Steve Wolfe wrote:
> > This is also in a rack mount box with dual Hot swappable power supplies,
> > it can map out bad memory on the fly automatically, and can provide
> > REAL 5 9 reliability. [...] An Intel based box isn't even close to
> > being in the same class.
>
> And I didn't say that it was in the same class. You said that it was
> about the same price, and I refuted that. That's all I was saying. I'm
> not a zealot or a devotee of any one class of server, by any means!

Hold on there. I didn't mean to sound snippy when I asked that. It's pretty common for folks to think of Intel hardware as equivalent when it has similar clock rates, and I was only pointing out that there is more to a server than numbers. That's all. For the amount of performance you're getting, you'd easily spend that much on a Xeon and still not catch up.

> White-box, and yes, it's registered/ECC DDR.

Cool. We use a local builder for all our Intel boxen, and get much better deals than we would from the big boys. Of course, we had to fight tooth and nail at first to get them to build quality units (they had really poor ESD procedures in place, and our return rate was about 25%).

> Actually, any one process will only be able to use the ~1.5 gigs - and
> PG forks off new processes for each backend, so you are able to make use
> of all 8 gigs - although I do admit that having a larger address space
> can be advantageous.

Sorry, but that is incorrect. PostgreSQL uses a single large memory segment for all its shared buffers. While sorts could use the extra memory, the database itself is limited to the single largest shared memory segment you can allocate, and on 32-bit Intel that is something under 2 GB with Linux and BSD both.

> Several years ago, I was in the market for a quad P3 Xeon, and prices
> from the "big names" were about the same. I built one based on a
> Supermicro chassis and motherboard for something like $12,000, including
> a fairly decent SCSI RAID array.

Flashback. Five years ago when I first started working here (ihs), I built our PDC/BDC pair on a Supermicro dual PPro-200 motherboard. Having long since moved on to web development and such, I never expected to see them again. Then, walking down the hall past the equipment cage, there they were. They were being retired. The bigger of the two is now my test server, happily running along under my desk -- running Linux now instead of Windows NT. Supermicro makes kick-ass mobos, IMnsvHO.

> I've worked with Compaq's servers before, and haven't seen much
> advantage. [...] the machine I built has run for over two years with
> absolutely *NO* downtime other than a few planned shutdowns for planned
> hardware or kernel upgrades.

But I wasn't really talking about the advantages of Compaq or Dell Intel boxen; I was mainly pointing out the advantage of the bigger-iron RISC boxen running Unix or Linux. There, it gets kinda hard to build your own, but not impossible. There are some companies that sell dual USparc-clone motherboards in ATX form factor. But the reason I asked if it was white-box was that I was looking for a fair dollar comparison of RISC versus Intel. Both can be had cheaper than what I was quoting, but not from a big name. Plus most companies usually have some silly policy about buying everything from one or two companies, so I'd bet the guy asking the question can buy any machine he wants to, as long as it says IBM on the front. :-)

And, fwiw, I hate Compaq boxes. Dell I can live with, but Compaq gives me stomach ulcers. Everything is proprietary, and anything you try to do is a pain with those things.

> It should also be noted that simply going to a 64-bit architecture
> isn't a magic cure-all. [...] I ran some PostgreSQL stress tests on it
> with some of our production data, and the Xeon handily kept up with or
> beat the Alpha, at half of the price.

The Alpha was one of the very first chips to really focus on floating point over integer operation. When it came out we were using HP K-class machines to build database servers (running O****e) with very fast integer performance. The early K-class machines literally stomped the Alphas into the ground on database performance. I think they had 4 integer processors and one FPU back then. This was especially true under simultaneous load, say a hundred or so database users at a time. The Intel boxes then were in the 200 to 300 MHz range, i.e. just after the PPro and at the beginning of the PII range. The HP was something like 150 MHz. The Intel boxes were actually a little faster than the 150 MHz Alphas back then, but no match for the HPs. I think we had an RS6000 too, and it was close to the HPs, but the version of AIX on it was just horrible to administer (that from the guys who adminned it; I never had to actually touch that box).

Modern 64-bit CPUs like the USparc III and Power4 are very fast at both FP and integer operations. The big advantage is addressable memory and VERY large L2/L3 caches.

> In the end, it's the same argument that gets hashed over in various
> forms: [...] you eventually come to a performance level where commodity
> hardware just won't cut it any more.

Yep. Now, if PostgreSQL could run in a load-balanced cluster, I'd go dual Athlons all the way. Racks full of them. But if you're handling a database of 200 gigabytes like the original poster was looking at, and you're stuck running it on one machine, fast I/O AND a huge buffer memory really help, and 32-bit Intel with PostgreSQL really does have a serious limit on maximum buffer memory that 64-bit architectures don't suffer from.

So, even a 2-way 400 MHz box (say a two-year-old USparc) with 2 MB or more of L2/L3 cache, that can hold say 8 GB of RAM and let PostgreSQL use most of it for buffers, is likely a better choice than a quad Xeon 1.5 GHz, if it can hold more of the data you're accessing in memory and keep you away from disk I/O. This is especially true if you're gonna have a high simultaneous load.
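A sketch of the arithmetic behind the 32-bit buffer limit discussed in this thread. The ~1.5 GB practical segment ceiling is the figure quoted above (the hard limit on 32-bit Linux/BSD of the era was under 2 GB of process address space); the 8 KB buffer page is PostgreSQL's standard block size; the 200 GB database size is the original poster's:

```python
# shared_buffers lives in ONE SysV shared memory segment of 8 KB pages,
# so on 32-bit Intel the whole buffer pool is capped by the largest
# segment that fits in a single process address space.
PAGE = 8192
DB_SIZE_GB = 200   # the original poster's database

def buffers_for(segment_gb):
    """Number of 8 KB shared buffers a segment of this size holds."""
    return int(segment_gb * 1024**3 // PAGE)

for seg_gb in (1.5, 8.0):   # 32-bit practical cap vs. a 64-bit box's RAM
    frac = seg_gb / DB_SIZE_GB
    print(f"{seg_gb:>4} GB segment = {buffers_for(seg_gb):,} buffers, "
          f"caching {frac:.1%} of a {DB_SIZE_GB} GB database")
```

Even 8 GB only caches a few percent of a 200 GB database, but it is five times more of the hot working set than a 32-bit box can pin, which is the core of the 64-bit argument here.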
On Thursday 07 November 2002 19:31, scott.marlowe wrote:
> Flashback. Five years ago when I first started working here (ihs), I
> built our PDC/BDC pair on a Supermicro Dual PPro-200 motherboard. Having
> long since moved on to web development and such, I never expected to see
> them again.

I have three of them. P6DNE. They are wonderful.

> So, even a 2 way 400 MHz box (say a two year old USparc) with 2Megs or
> more of L2/3 cache, that can hold say 8 gigs of ram and let postgresql use
> most of it for buffer is likely a better choice than a quad Xeon 1.5GHz,
> if it can hold more of the data you're accessing in memory and keep you
> away from disk IO. This is especially true if you're gonna have a high
> simo load.

FWIW, I ran the regression test 'benchmark' in parallel mode on a few machines. I was mildly surprised at the results:

- My Athlon 1.2 notebook runs the set in about 1.5 minutes.
- A Pentium II 400 at work, with a _fast_ HD, runs them in 1.75 minutes.
- My Sun Ultra 5, with a moderately fast IDE HD and a 333 MHz UltraSPARC IIi w/ 768 MB RAM, runs them in 2.5 minutes.
- An old SPARCstation 5/110 with a relatively quick 4.3 GB Fujitsu SCSI drive and 160 MB RAM takes _23.5_ minutes.
- An Ultra 30/248 with 128 MB and the same Fujitsu drive takes 5 minutes.

A larger Sun box would have increasingly better times, of course, due to the SMP support. The OS in all cases was Red Hat Linux 7.3 or its SPARC twin, Aurora 0.42. The U5 has 2 MB ecache, the U30 has 1 MB ecache (ecache = L2 in PC jargon). One must note, however, that the U5 has much slower RAM than the Athlon notebook or the PII 400, which both use PC100 SDRAM -- the U5 uses 60 ns EDO. But it still holds its own. I need to rerun them on my dual PPro 200/512 system, once I get it running again. The disk I/O is the killer.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
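For what it's worth, the timings above expressed relative to the fastest machine (all numbers taken directly from the message; nothing remeasured):

```python
# Regression-test wall-clock times (minutes) from the message above.
times_min = {
    "Athlon 1.2 notebook":          1.5,
    "Pentium II 400 (fast HD)":     1.75,
    "Ultra 5 (333 MHz IIi)":        2.5,
    "Ultra 30/248":                 5.0,
    "SPARCstation 5/110":           23.5,
}
fastest = min(times_min.values())
for name, t in sorted(times_min.items(), key=lambda kv: kv[1]):
    print(f"{name}: {t / fastest:.1f}x the fastest time")
```

The spread (the SS5 at roughly 15.7x the notebook) supports the closing point: the pattern tracks disk and memory speed far more than raw clock rate.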
On Fri, 8 Nov 2002, Vidal Salem Zebulum wrote:
> Mr Scott,
>
> Thanks for the prompt response.
>
> Our processes will be mostly batch ones. Online activity will be
> restricted to a few users, and for some online processes we will use
> consolidated data. But all batch processes will have to use the tables
> with 50M and 500M records.
>
> We will strongly consider your option for external storage. We would
> rather rely on a RISC architecture due to the I/O effort of our batch
> processes. Our IBM rep then raised a problem: there was no Linux driver
> in the Power3 (p610) for the external storage solution we wanted. Then
> they mentioned the possibility of using the quad Xeon solution. We will
> still push them and HP for a RISC solution with external storage. Your
> statement reinforces this choice. Thanks for that.
>
> Do you think we should have problems with PostgreSQL at that database
> size, assuming we take double care with tuning? Do you know anyone with
> a similar volume?

There are LOTS of PostgreSQL installations out there with gigabytes of data. Take a search through the pgsql-general and pgsql-hackers lists for "large database" or "gigabyte" and you should find a few folks who are running hundreds of gigs of data on PostgreSQL.
Vidal,

There is a pgsql-PERFORMANCE mailing list you may be interested in. To subscribe (I think), send an e-mail to majordomo@postgreSQL.org with "subscribe pgsql-performance your@email.address" as the message.

-- 
-Josh Berkus
Aglio Database Solutions
San Francisco
----- Original Message -----
From: "scott.marlowe" <scott.marlowe@ihs.com>
To: "Vidal" <vzebulum@yahoo.com.br>
Cc: <pgsql-general@postgresql.org>
Sent: Thursday, November 07, 2002 3:08 PM
Subject: Re: [GENERAL] Hardware estimation

> Are you going to be supporting a fair number of simo users, or is this
> gonna be mostly a single batch at a time operation?
>
> [...]
>
> If you are looking at a quad Xeon, then take a look at going all the way
> to 64 bit architecture (Ultra Sparc, HP, P6xx IBM, Itanium 2) where you
> can throw scads of memory at your problems and postgresql can access it.
>
> Some of the "low end" 64 bit machines are not any more expensive than a
> quad xeon can run.
>
> A quad 1GHz Power4 with 8 gig ram, dual 36 Gig drives and all the fixings
> goes for $44,000. That's with 64 Meg L3 cache.

Mr Scott,

Thanks for the prompt response.

Our processes will be mostly batch ones. Online activity will be restricted to a few users, and for some online processes we will use consolidated data. But all batch processes will have to use the tables with 50M and 500M records.

We will strongly consider your option for external storage. We would rather rely on a RISC architecture due to the I/O effort of our batch processes. Our IBM rep then raised a problem: there was no Linux driver in the Power3 (p610) for the external storage solution we wanted. Then they mentioned the possibility of using the quad Xeon solution. We will still push them and HP for a RISC solution with external storage. Your statement reinforces this choice. Thanks for that.

Do you think we should have problems with PostgreSQL at that database size, assuming we take double care with tuning? Do you know anyone with a similar volume?

Thanks a lot. Best regards,
Vidal Salem Zebulum
On Fri, 8 Nov 2002, Vidal Salem Zebulum wrote:
> Our IBM rep then raised a problem: there was no Linux driver in
> the Power3 (p610) for the external storage solution we wanted.

Consider a SAN, or a device which can look like a SCSI drive. I don't have brands in mind right now, but there are some storage devices out there that can appear as a SCSI subsystem. I don't know anything about the 64-bit architecture in question, but I wonder if it could connect to a SCSI subsystem. If so, you could go with the Linux setup you would like, and then just attach it to one of these devices.