Thread: High end server and storage for a PostgreSQL OLTP system
Hi all,

I've been following this list for nearly a year now. I've always managed to get PostgreSQL 7.1.x right for the job, which in my case is a large and complex OLTP system, run under Pg for 6 years now. We were already planning the switch from 7.1 to 7.4 (or even 8.0). The latest project we're facing has a transaction volume unlike anything we've dealt with before. By "transaction" I mean something involving 10 to 10,000 (and more) SQL queries (a complex mix of insert/update/delete/select).

I'd like to ask:

1) What kind of performance gain can I expect switching from 7.1 to 7.4
   (or 8.0)? Obviously I'm doing my own testing, but I'm not very
   impressed by 8.0 speed; maybe I'm doing the testing on a low-end
   server...

2) The goal is to make the db handle 100 tps (something like 100 users).
   What kind of server and storage should I provide? The actual servers
   our application runs on normally have 2 Intel Xeon processors, 2-4 GB
   RAM, and RAID 0/1/5 SCSI disk storage with hard drives at 10,000 rpm.

3) Which SCSI adapters give the highest I/O throughput? Adaptec?

4) Is it correct to suppose that multiple RAID 1 arrays can provide the
   fastest I/O? I usually reserve one RAID 1 array for the db data
   directory, one RAID 1 array for the pg_xlog directory, and one RAID 1
   array for OS and application needs.

5) OS and Pg specific tuning? Usually I modify the shared memory settings
   and most of the postgresql.conf settings available in 7.1, like
   `effective_cache', `shared_buffers', `wal_buffers', `wal_files',
   and so on.

-- 
Cosimo
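The parameter names in question 5 are the 7.1-era ones; a rough map onto their 8.0 equivalents is sketched below. The values are placeholder assumptions for a 2-4 GB dual-Xeon box of the kind described, not recommendations from the thread, and a shared_buffers setting this size also means raising the kernel shared-memory limits (SHMMAX and friends).

    # postgresql.conf sketch for 8.0 -- placeholder values for a 2-4 GB box,
    # to be validated against the real workload.
    shared_buffers = 20000           # in 8 kB pages (~160 MB); 7.1 defaults were far too low
    effective_cache_size = 200000    # in 8 kB pages (~1.6 GB); what the OS is expected to cache
    work_mem = 8192                  # per-sort memory in kB (was sort_mem before 8.0)
    maintenance_work_mem = 131072    # in kB; speeds up VACUUM and CREATE INDEX
    wal_buffers = 32                 # in 8 kB pages
    checkpoint_segments = 16         # the old wal_files knob is gone; WAL segments
                                     # are now recycled automatically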
Cosimo Streppone <cosimo@streppone.it> writes:
> 1) What kind of performance gain can I expect switching from
>    7.1 to 7.4 (or 8.0)? Obviously I'm doing my own testing,
>    but I'm not very impressed by 8.0 speed; maybe I'm doing
>    the testing on a low-end server...

Most people report a noticeable speedup in each new release; we hit different things in different releases, but usually at least one performance gain is useful to any one person. For a jump as far as from 7.1 to 8.0 I'm surprised that you're not seeing any gain at all. What was your test case exactly? Have you perhaps tuned your app so specifically to 7.1 that you need to detune it?

			regards, tom lane
On Mon, Jan 31, 2005 at 09:41:32PM +0100, Cosimo Streppone wrote:
> 2) The goal is to make the db handle 100 tps (something like
>    100 users). What kind of server and storage should I provide?
>
>    The actual servers our application runs on normally have
>    2 Intel Xeon processors, 2-4 GB RAM, and RAID 0/1/5 SCSI
>    disk storage with hard drives at 10,000 rpm.

You might look at Opterons, which theoretically have higher data bandwidth. If you're doing anything data intensive, like a sort in memory, this could make a difference.

> 4) Is it correct to suppose that multiple RAID 1 arrays
>    can provide the fastest I/O?
>    I usually reserve one RAID 1 array for the db data directory,
>    one RAID 1 array for the pg_xlog directory, and one RAID 1
>    array for OS and application needs.

RAID 10 will be faster than RAID 1. The key factor in a high performance database is a high performance I/O system. If you look in the archives you'll find people running PostgreSQL on 30 and 40 drive arrays.

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Tom Lane wrote:
> Cosimo writes:
>
>> 1) What kind of performance gain can I expect switching from
>>    7.1 to 7.4 (or 8.0)? Obviously I'm doing my own testing,
>>    but I'm not very impressed by 8.0 speed; maybe I'm doing
>>    the testing on a low-end server...
>
> Most people report a noticeable speedup in each new release
> [...]
> I'm surprised that you're not seeing any gain at all.
> What was your test case exactly? Have you perhaps tuned your app
> so specifically to 7.1 that you need to detune it?

We tend to use the lowest common SQL features that will allow us to work with any db, so probably the problem is the opposite: there is no pg-specific overtuning.

Also, the real pg load, which would be my ideal test case, is somewhat difficult to reproduce (~50 users with handhelds and browser clients). Another good test is a particular procedure that opens several (~1000) subsequent transactions, composed of many repeated selection queries with massive write loads on 6/7 different tables as big as 300-400k tuples. Every transaction ends in either a commit or a rollback.

Indexing here should be ok, for I've analyzed every single query also under database "stress". Probably one big issue is that I need to vacuum/reindex too often to keep db performance at a good(tm) level. I realize that this has been addressed in several ways in newer PGs.

However, I need to do a lot of application and performance tests and do them more seriously. Then I'll report the results here.

-- 
Cosimo
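For reference, the vacuum/reindex routine described above comes down to statements like the ones below. The table name is a placeholder invented for the example; on 7.4/8.0 the usual advice is frequent plain VACUUMs (or the contrib pg_autovacuum daemon) rather than REINDEX, which takes an exclusive lock.

    -- Hypothetical table name, purely for illustration.
    -- VACUUM reclaims dead tuples without blocking readers and writers;
    -- ANALYZE refreshes the planner statistics at the same time.
    VACUUM ANALYZE orders;

    -- REINDEX rebuilds the index from scratch but locks the table,
    -- so it is normally reserved for badly bloated indexes
    -- (much less of a problem from 7.4 on, which recycles empty
    -- btree pages).
    REINDEX TABLE orders;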
Jim C. Nasby wrote:
> On Mon, Jan 31, 2005 at 09:41:32PM +0100, Cosimo wrote:
>
>> 2) The goal is to make the db handle 100 tps (something like
>>    100 users). What kind of server and storage should I provide?
>
> You might look at Opterons, which theoretically have higher data
> bandwidth. If you're doing anything data intensive, like a sort in
> memory, this could make a difference.

Would Opteron systems need a 64-bit PostgreSQL (and OS, gcc, ...) build to have that advantage?

>> 4) Is it correct to suppose that multiple RAID 1 arrays
>>    can provide the fastest I/O?
>>    I usually reserve one RAID 1 array for the db data directory,
>>    one RAID 1 array for the pg_xlog directory, and one RAID 1
>>    array for OS and application needs.
>
> RAID 10 will be faster than RAID 1.

Sorry Jim, by RAID 10 do you mean several RAID 1 arrays mounted on different Linux partitions? Or several RAID 1 arrays that build up a RAID 0 array? In the latter case, who decides which data goes in which RAID 1 array? The RAID adapter?

> The key factor in a high performance database is a high
> performance I/O system. If you look in the archives
> you'll find people running PostgreSQL on 30 and 40
> drive arrays.

I'll do a search, thank you.

-- 
Cosimo
On Tue, Feb 01, 2005 at 07:35:35AM +0100, Cosimo Streppone wrote:
>> You might look at Opterons, which theoretically have higher data
>> bandwidth. If you're doing anything data intensive, like a sort in
>> memory, this could make a difference.
>
> Would Opteron systems need a 64-bit PostgreSQL (and OS, gcc, ...)
> build to have that advantage?

Well, that would give you the most benefit, but the memory bandwidth is still greater than on a Xeon. There's really no issue with 64 bit if you're using open source software; it all compiles for 64 bits and you're good to go. http://stats.distributed.net runs on a dual Opteron box running FreeBSD and I've had no issues.

>> RAID 10 will be faster than RAID 1.
>
> Sorry Jim, by RAID 10 do you mean several RAID 1 arrays mounted on
> different Linux partitions? Or several RAID 1 arrays that build up
> a RAID 0 array? In the latter case, who decides which data goes in
> which RAID 1 array? The RAID adapter?

You should take a look around online for a description of RAID types.

There's technically RAID 0+1 and RAID 1+0: one is a stripe of mirrored drives (a RAID 0 built out of RAID 1s), the other is a mirror of two RAID 0s. The stripe of mirrors (RAID 1+0) is much better: if you're lucky you can lose half your drives without any data loss (if each dead drive is part of a different mirror), and recovery is also faster.

You'll almost certainly be much happier with hardware RAID instead of software RAID. stats.distributed.net runs a 3ware controller and SATA drives.

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
To be honest I've used Compaq, Dell and LSI SCSI RAID controllers and got pretty pathetic benchmarks from all of them. The best system I have is the one I just built: 2x Opteron 242, Tyan S2885 MoBo, 4 GB RAM, 14x SATA WD Raptor drives (2x RAID 1, 1x 4-disk RAID 10, 1x 6-drive RAID 10), on 2x 3ware (now AMCC) Escalade 9500S-8MI controllers. This system with fsync on has managed 2500 insert transactions/sec (granted, they are simple transactions, but still).

RAID 10 is a stripe of mirrors. RAID 10 gives you the best read and write performance combined. RAID 5 gives very bad write performance but good read performance. With RAID 5 you can only lose a single drive, and rebuild times are slow. RAID 10 can lose up to half the array, depending on which drives fail, without losing data.

I would be interested in starting a site listing RAID benchmarks under Linux. If anyone is interested let me know. I would be interested in at least some bonnie++ benchmarks, and perhaps others if people would like.

Alex Turner
NetEconomist

On Tue, 1 Feb 2005 05:27:27 -0600, Jim C. Nasby <decibel@decibel.org> wrote:
> There's technically RAID 0+1 and RAID 1+0: one is a stripe of mirrored
> drives (a RAID 0 built out of RAID 1s), the other is a mirror of two
> RAID 0s. The stripe of mirrors (RAID 1+0) is much better: if you're
> lucky you can lose half your drives without any data loss (if each dead
> drive is part of a different mirror), and recovery is also faster.
>
> You'll almost certainly be much happier with hardware RAID instead of
> software RAID. stats.distributed.net runs a 3ware controller and SATA
> drives.
> [...]
> Hi all,
> 1) What kind of performance gain can I expect switching from
>    7.1 to 7.4 (or 8.0)? Obviously I'm doing my own testing,
>    but I'm not very impressed by 8.0 speed; maybe I'm doing
>    the testing on a low-end server...

8.0 gives you savepoints. While this may not seem like a big deal at first, the ability to handle exceptions inside pl/pgsql functions gives you much more flexibility to move code into the server. Also, recent versions of pl/pgsql give you more flexibility with cursors, including returning them outside of the function.

Corollary: use pl/pgsql. It can be 10 times or more faster than query-by-query execution.

You also have the parse/bind interface. This may not be so easy to implement in your app, but if you are machine-gunning your server with queries, use parameterized prepared queries and reap 50%+ better performance, meaning lower load and quicker transaction turnaround time.

Merlin
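To make the savepoint/exception point concrete, here is a minimal sketch of the kind of 8.0 pl/pgsql Merlin is describing. The table, function, and column names are invented for the example, and it assumes the plpgsql language has been installed in the database (createlang plpgsql). The EXCEPTION clause, new in 8.0, runs the enclosed block under an implicit savepoint, so a failure rolls back only that block rather than the whole transaction.

    -- Hypothetical schema, purely for illustration.
    CREATE TABLE item_counter (
        item_id integer PRIMARY KEY,
        hits    integer NOT NULL
    );

    -- Try an UPDATE first; if the row is missing, INSERT it, and if a
    -- concurrent transaction beats us to the INSERT, fall back to UPDATE.
    CREATE FUNCTION bump_counter(p_item integer) RETURNS void AS $$
    BEGIN
        UPDATE item_counter SET hits = hits + 1 WHERE item_id = p_item;
        IF NOT FOUND THEN
            BEGIN
                INSERT INTO item_counter (item_id, hits) VALUES (p_item, 1);
            EXCEPTION WHEN unique_violation THEN
                -- Only this inner block is rolled back, not the caller's work.
                UPDATE item_counter SET hits = hits + 1 WHERE item_id = p_item;
            END;
        END IF;
    END;
    $$ LANGUAGE plpgsql;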
Merlin Moncure wrote:
> Corollary: use pl/pgsql. It can be 10 times or more faster than
> query-by-query execution.

Merlin, thanks for your good suggestions.

So far, our system has never used the "stored procedures" approach, due to the fact that we're staying on the minimum common SQL features that are supported by most db engines. I realize, though, that it would provide a heavy performance boost.

> You also have the parse/bind interface

This is something I have already engineered in our core classes (that use DBI + DBD::Pg), so that switching to 8.0 should automatically enable the "single-prepare, multiple-execute" behavior, saving a lot of query planner processing, if I understand correctly.

-- 
Cosimo
Alex Turner wrote:
> To be honest I've used Compaq, Dell and LSI SCSI RAID controllers and
> got pretty pathetic benchmarks from all of them.

I have also seen average-to-low results for LSI (at least the 1020 card).

> 2x Opteron 242, Tyan S2885 MoBo, 4 GB RAM, 14x SATA WD Raptor drives
> (2x RAID 1, 1x 4-disk RAID 10, 1x 6-drive RAID 10), on 2x 3ware
> (now AMCC) Escalade 9500S-8MI controllers.

Thanks, this is precious information.

> I would be interested in starting a site listing RAID benchmarks under
> Linux. If anyone is interested let me know. I would be interested in
> at least some bonnie++ benchmarks, and perhaps others if people would
> like.

I have also used tiobench [http://tiobench.sourceforge.net/].
Any experience with it?

-- 
Cosimo
Jim C. Nasby wrote:
> On Tue, Feb 01, 2005 at 07:35:35AM +0100, Cosimo Streppone wrote:
>
>>> You might look at Opterons, which theoretically have higher data
>>> bandwidth. If you're doing anything data intensive, like a sort in
>>> memory, this could make a difference.
>>
>> Would Opteron systems need a 64-bit PostgreSQL (and OS, gcc, ...)
>> build to have that advantage?
>
> Well, that would give you the most benefit, but the memory bandwidth is
> still greater than on a Xeon. There's really no issue with 64 bit if
> you're using open source software; it all compiles for 64 bits and
> you're good to go. http://stats.distributed.net runs on a dual Opteron
> box running FreeBSD and I've had no issues.

You can get 64-bit Xeons also, but they take a hit in the I/O department due to the lack of a hardware I/O MMU, which limits DMA transfers to addresses below 4GB. This has a two-fold impact:

1) Transferring data to addresses above 4GB requires first a transfer to below 4GB and then a copy to the final destination.

2) You must allocate real memory at 2X the address space of the devices to act as bounce buffers. This is especially problematic for workstations, because if you put a 512MB Nvidia card in your computer for graphics work you've just lost 1GB of memory. (I dunno how much the typical SCSI/NIC/etc. take up.)
None - but I'll definitely take a look.

Alex Turner
NetEconomist

On Tue, 01 Feb 2005 22:11:30 +0100, Cosimo Streppone <cosimo@streppone.it> wrote:
> I have also used tiobench [http://tiobench.sourceforge.net/].
> Any experience with it?
> [...]
William Yu wrote:
>> Well, that would give you the most benefit, but the memory bandwidth is
>> still greater than on a Xeon. There's really no issue with 64 bit if
>> you're using open source software; it all compiles for 64 bits and
>> you're good to go. http://stats.distributed.net runs on a dual Opteron
>> box running FreeBSD and I've had no issues.
>
> You can get 64-bit Xeons also, but they take a hit in the I/O department
> due to the lack of a hardware I/O MMU, which limits DMA transfers to
> addresses below 4GB. This has a two-fold impact:
>
> 1) Transferring data to addresses above 4GB requires first a transfer
>    to below 4GB and then a copy to the final destination.
>
> 2) You must allocate real memory at 2X the address space of the devices
>    to act as bounce buffers. This is especially problematic for
>    workstations, because if you put a 512MB Nvidia card in your computer
>    for graphics work you've just lost 1GB of memory. (I dunno how much
>    the typical SCSI/NIC/etc. take up.)

I thought Intel was copying AMD's 64-bit API. Is Intel's implementation as poor as your description suggests? Does Intel have any better 64-bit offering other than the Itanium/Itanic?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
>> You can get 64-bit Xeons also, but they take a hit in the I/O department
>> due to the lack of a hardware I/O MMU, which limits DMA transfers to
>> addresses below 4GB. [...]
>
> I thought Intel was copying AMD's 64-bit API. Is Intel's
> implementation as poor as your description suggests? Does Intel have
> any better 64-bit offering other than the Itanium/Itanic?

Unfortunately, there's no easy way for Intel to have implemented a 64-bit IOMMU under their current restrictions. The memory controller resides on the chipset, and upgrading the functionality significantly would probably require changing the bus protocol. It's not that they couldn't do it -- it would just require all Intel chipset/MB vendors/partners to go through the process of creating and validating totally new products, a way lengthier process than just producing 64-bit CPUs that drop into current motherboards.
William Yu wrote:
>>> You can get 64-bit Xeons also, but they take a hit in the I/O department
>>> due to the lack of a hardware I/O MMU, which limits DMA transfers to
>>> addresses below 4GB. This has a two-fold impact:
>>>
>>> 1) Transferring data to addresses above 4GB requires first a transfer
>>>    to below 4GB and then a copy to the final destination.
>>>
>>> 2) You must allocate real memory at 2X the address space of the devices
>>>    to act as bounce buffers. This is especially problematic for
>>>    workstations, because if you put a 512MB Nvidia card in your computer
>>>    for graphics work you've just lost 1GB of memory. (I dunno how much
>>>    the typical SCSI/NIC/etc. take up.)

When you say "allocate real memory 2X", are you saying that if you have 16GB of RAM only 8GB is actually usable and the other 8GB is for bounce buffers, or is it just address space being used up?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> So far, our system has never used the "stored procedures" approach,
> due to the fact that we're staying on the minimum common SQL features
> that are supported by most db engines.
> I realize, though, that it would provide a heavy performance boost.

I feel your pain. Well, sometimes you have to bite the bullet and do a couple of implementation-specific hacks in especially time-sensitive components.

>> You also have the parse/bind interface
>
> This is something I have already engineered in our core classes
> (that use DBI + DBD::Pg), so that switching to 8.0 should
> automatically enable the "single-prepare, multiple-execute" behavior,
> saving a lot of query planner processing, if I understand correctly.

Yes. You save the planning step (which adds up, even for trivial plans). The 'ExecPrepared' variant of prepared statement execution also provides substantial savings (on server CPU load and execution time) because the statement does not have to be parsed. Oh, and network traffic is reduced correspondingly.

I know that the Perl people were pushing for certain features into the libpq library (describing prepared statements, IIRC). I think this stuff made it into 8.0... I have no clue about DBD::Pg.

If everything is working the way it's supposed to, 8.0 should be faster than 7.1 (like, twice as fast) for what you are probably trying to do. If it isn't, something else is wrong and it's very likely a solvable problem. In short, in pg 8.0, statement-by-statement query execution is highly optimizable at the driver level, much more so than in 7.1. Law of Unintended Consequences aside, this will translate into direct benefits for your app if it uses this programming model.

Merlin
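As a rough illustration of the single-prepare/multiple-execute idea at the SQL level (drivers such as DBD::Pg achieve the same effect through the protocol-level parse/bind messages), here is a minimal sketch; the statement and table names are invented for the example.

    -- Parse and plan once per session...
    PREPARE get_order (integer) AS
        SELECT * FROM orders WHERE order_id = $1;

    -- ...then execute as many times as needed with different parameters;
    -- only the execute step repeats, the parse/plan work does not.
    EXECUTE get_order(42);
    EXECUTE get_order(43);

    -- Prepared statements last for the lifetime of the session;
    -- DEALLOCATE drops one explicitly.
    DEALLOCATE get_order;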
Bruce Momjian wrote:
> William Yu wrote:
>>>> 2) You must allocate real memory at 2X the address space of the
>>>>    devices to act as bounce buffers. [...]
>
> When you say "allocate real memory 2X", are you saying that if you have
> 16GB of RAM only 8GB is actually usable and the other 8GB is for
> bounce buffers, or is it just address space being used up?

It's 2X the memory space of the devices. E.g. an Nvidia graphics card with 512MB of RAM would require 1GB of memory to act as bounce buffers. And it has to be real chunks of memory in 64-bit mode, since a DMA transfer must drop the data into real memory below 4GB before it can then be copied to above 4GB.
Merlin Moncure wrote:
> [...]
>> (...DBI + DBD::Pg), so that switching to 8.0 should
>> automatically enable the "single-prepare, multiple-execute" behavior,
>> saving a lot of query planner processing, if I understand correctly.
> [...]
> I know that the Perl people were pushing for certain features into the
> libpq library (describing prepared statements, IIRC). I think this
> stuff made it into 8.0... I have no clue about DBD::Pg.

For the record: yes, DBD::Pg in CVS (> 1.32) has support for server-side prepared statements.

> If everything is working the way it's supposed to, 8.0 should be faster
> than 7.1 (like, twice as fast) for what you are probably trying to do.

In the next days I will be testing the entire application with the same database, only changing the backend from 7.1 to 8.0, so this is a somewhat perfect condition for a "real-world" benchmark of Pg 8.0 vs 7.1.x performance.

-- 
Cosimo
pgman@candle.pha.pa.us (Bruce Momjian) wrote:
> I thought Intel was copying AMD's 64-bit API. Is Intel's
> implementation as poor as your description suggests? Does Intel have
> any better 64-bit offering other than the Itanium/Itanic?

From what I can see, the resulting "copy of AMD64" amounts to little more than rushing together a project to glue a bag on the side of a Xeon chip with some 64-bit parts in it.

I see no reason to expect what is only billed as an "extension technology" <http://www.eweek.com/article2/0,1759,1545734,00.asp> to alleviate the deeply rooted memory bandwidth problems seen on Xeon.

-- 
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/advocacy.html
Q: What does the function NULL do?
A: The function NULL tests whether or not its argument is NIL or not.
   If its argument is NIL the value of NULL is NIL.
   -- Ken Tracton, Programmer's Guide to Lisp, page 73.
Cosimo Streppone wrote:
> Merlin Moncure wrote:
>> If everything is working the way it's supposed to, 8.0 should be faster
>> than 7.1 (like, twice as fast) for what you are probably trying to do.
>
> In the next days I will be testing the entire application with the
> same database, only changing the backend from 7.1 to 8.0, so this is
> a somewhat perfect condition for a "real-world" benchmark
> of Pg 8.0 vs 7.1.x performance.

The "next days" have come. I did a complete migration to Pg 8.0.1 from 7.1.3. It was a *huge* jump.

The application is exactly the same, and the database structure is also the same. I only dumped the entire 7.1.3 db, changed the backend version, and restored the data into the 8.0.1 db.

The performance level of Pg 8 is at least *five* times higher (faster!) than 7.1.3 in "query-intensive" transactions, which is absolutely astounding.

In my experience, Pg 8 handles far better non-unique indexes with low cardinality built on numeric and integer types, which is very common in our application.

-- 
Cosimo
Cosimo Streppone <cosimo@streppone.it> writes:
> The performance level of Pg 8 is at least *five* times higher
> (faster!) than 7.1.3 in "query-intensive" transactions,
> which is absolutely astounding.

Cool.

> In my experience, Pg 8 handles far better non-unique indexes
> with low cardinality built on numeric and integer types, which
> is very common in our application.

Yes, we've fixed a number of places where the btree code was inefficient with large numbers of equal keys. I'm not sure that that explains a 5x speedup all by itself, though.

			regards, tom lane