Thread: The results of my PostgreSQL/filesystem performance tests
Hey all. I said I was going to do it, and I finally did it. As with all performance tests/benchmarks, there are probably dozens or more reasons why these results aren't as accurate or wonderful as they should be. Take them for what they are and hopefully everyone can learn a few things from them. Intelligent feedback is welcome. http://www.potentialtech.com/wmoran/postgresql.php -- Bill Moran Potential Technologies http://www.potentialtech.com
Bill, Very interesting results. I'd like to command you on your honesty. Having started out with the intentions of proving that FreeBSD is faster than Linux only to find that the opposite is true must not have been rewarding for you. However, these unexpected results serve only to reinforce the integrity of your tests. Thanks for all the work. Balazs -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Bill Moran Sent: Tuesday, August 26, 2003 6:48 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] The results of my PostgreSQL/filesystem performance tests Hey all. I said I was going to do it, and I finally did it. As with all performance tests/benchmarks, there are probably dozens or more reasons why these results aren't as accurate or wonderful as they should be. Take them for what they are and hopefully everyone can learn a few things from them. Intelligent feedback is welcome. http://www.potentialtech.com/wmoran/postgresql.php -- Bill Moran Potential Technologies http://www.potentialtech.com ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
On 26 Aug 2003 at 21:47, Bill Moran wrote: > Hey all. > > I said I was going to do it, and I finally did it. > > As with all performance tests/benchmarks, there are probably dozens or > more reasons why these results aren't as accurate or wonderful as they > should be. Take them for what they are and hopefully everyone can > learn a few things from them. > > Intelligent feedback is welcome. > > http://www.potentialtech.com/wmoran/postgresql.php Can we have these benchmarks and relevant information stored in a central archive at techdocs(Say)? That would be better than searching mail archives.. Bye Shridhar -- "... freedom ... is a worship word..." "It is our worship word too." -- Cloud William and Kirk, "The Omega Glory", stardate unknown
On Tue, 26 Aug 2003, Bill Moran wrote: > As with all performance tests/benchmarks, there are probably dozens or > more reasons why these results aren't as accurate or wonderful as they > should be. Take them for what they are and hopefully everyone can > learn a few things from them. What version of pg was used in debian and redhat? For freebsd it's 7.2.4 it says on the page, but I see nothing about the other two. The version that comes with Redhat 9 (Shrike) is 7.3.2. -- /Dennis
On Tue, 2003-08-26 at 20:47, Bill Moran wrote: > Hey all. > > I said I was going to do it, and I finally did it. > > As with all performance tests/benchmarks, there are probably dozens or > more reasons why these results aren't as accurate or wonderful as they > should be. Take them for what they are and hopefully everyone can > learn a few things from them. > > Intelligent feedback is welcome. > > http://www.potentialtech.com/wmoran/postgresql.php Hi, Woody has pg 7.2.1. Note also that Woody comes with kernel 2.4.18. It would be interesting to see how Debian Sid (kernel 2.4.21 and pg 7.3.3) would perform. Thanks for the results! -- ----------------------------------------------------------------- Ron Johnson, Jr. ron.l.johnson@cox.net Jefferson, LA USA "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" Calvin, regarding TV
On Tue, 2003-08-26 at 20:47, Bill Moran wrote: > Hey all. > > I said I was going to do it, and I finally did it. > > As with all performance tests/benchmarks, there are probably dozens or > more reasons why these results aren't as accurate or wonderful as they > should be. Take them for what they are and hopefully everyone can > learn a few things from them. > > Intelligent feedback is welcome. > > http://www.potentialtech.com/wmoran/postgresql.php I notice that the Linux FSs weren't tested with noatime. Any reason? -- ----------------------------------------------------------------- Ron Johnson, Jr. ron.l.johnson@cox.net Jefferson, LA USA "As I like to joke, I may have invented it, but Microsoft made it popular" David Bradley, regarding Ctrl-Alt-Del
> Intelligent feedback is welcome. > > http://www.potentialtech.com/wmoran/postgresql.php Good work. But I can't find information about xfs. Do you plan to add this one FS in test? Luf
A long time ago, in a galaxy far, far away, balazs@neusolutions.com ("Balazs Wellisch") wrote: > Very interesting results. I'd like to command you on your honesty. > Having started out with the intentions of proving that FreeBSD is faster > than Linux only to find that the opposite is true must not have been > rewarding for you. However, these unexpected results serve only to > reinforce the integrity of your tests. Well put. To see a result that the tester didn't really want to see/present does suggest good things about the tester's honesty. There was incentive to hide unfavorable results. What it still leaves quite open is just what happens when the OS has more than one disk drive or CPU to play with. It's not clear what happens in such cases, whether FreeBSD would catch up, or be "left further in the dust." The traditional "propaganda" has been that there are all sorts of reasons to expect PostgreSQL on FreeBSD to run a bit faster than on Linux; it is a bit unexpected for the opposite to seem true. -- output = reverse("gro.mca" "@" "enworbbc") http://www3.sympatico.ca/cbbrowne/sap.html "I am aware of the benefits of a micro kernel approach. However, the fact remains that Linux is here, and GNU isn't --- and people have been working on Hurd for a lot longer than Linus has been working on Linux." -- Ted T'so, 1992.
On Tue, 26 Aug 2003, Bill Moran wrote: > > Intelligent feedback is welcome. > That's some good work there, Lou. You'll make sgt for that someday. But I think the next step, before trying out other filesystems and options would be concurrency. Run a bunch of these beasts together and see what happens (I don't think too many of us have a single session running). Perhaps even make them "interfere" with each other to create as much "pain" as possible? on a side note - I might be blind here - I didn't see what version of pg you were using or any postgresql.conf tweaks - or did you just use whatever came with each distro? -- Jeff Trout <jeff@jefftrout.com> http://www.jefftrout.com/ http://www.stuarthamm.net/
Couple of questions: What was the postgresql.conf configuration used? Default? How many threads of the script ran? Looks like a single user only. I assume there was nothing else running at the time (cron, sendmail, etc. were all off?) Do you know whether the machines were disk or I/O bound? Was PostgreSQL compiled the same for each OS or did you use the rpm, deb, tgz that were available? On Tue, 2003-08-26 at 21:47, Bill Moran wrote: > Hey all. > > I said I was going to do it, and I finally did it. > > As with all performance tests/benchmarks, there are probably dozens or > more reasons why these results aren't as accurate or wonderful as they > should be. Take them for what they are and hopefully everyone can > learn a few things from them. > > Intelligent feedback is welcome. > > http://www.potentialtech.com/wmoran/postgresql.php
Attachment
2003-08-28 ragyogó napján Christopher Browne ezt üzente: > A long time ago, in a galaxy far, far away, balazs@neusolutions.com ("Balazs Wellisch") wrote: > > Very interesting results. I'd like to command you on your honesty. > > Having started out with the intentions of proving that FreeBSD is faster > > than Linux only to find that the opposite is true must not have been > > rewarding for you. However, these unexpected results serve only to > > reinforce the integrity of your tests. > > Well put. > > To see a result that the tester didn't really want to see/present does > suggest good things about the tester's honesty. There was incentive > to hide unfavorable results. > > What it still leaves quite open is just what happens when the OS has > more than one disk drive or CPU to play with. It's not clear what > happens in such cases, whether FreeBSD would catch up, or be "left > further in the dust." The traditional "propaganda" has been that > there are all sorts of reasons to expect PostgreSQL on FreeBSD to run > a bit faster than on Linux; it is a bit unexpected for the opposite to > seem true. AFAIK *BSD better in the handling of big loads - maybe when multiple concurrent tests run against a linux and a bsd box, we see better result. Or not. -- Tomka Gergely "S most - vajon barbárok nélkül mi lesz velünk? Ők mégiscsak megoldás voltak valahogy..."
2003-08-28 ragyogó napján Ludek Finstrle ezt üzente: > > Intelligent feedback is welcome. > > > > http://www.potentialtech.com/wmoran/postgresql.php > > Good work. But I can't find information about xfs. Do you plan to add > this one FS in test? http://mail.sth.sze.hu/~hsz/sql/ -- Tomka Gergely "S most - vajon barbárok nélkül mi lesz velünk? Ők mégiscsak megoldás voltak valahogy..."
> http://www.potentialtech.com/wmoran/postgresql.php > -- > Bill Moran > Potential Technologies > http://www.potentialtech.com Adding my voice to the many, thanks for sharing your results Bill. Very instructive. -- Best, Al Hulaton | Sr. Account Engineer | Command Prompt, Inc. 503.222.2783 | ahulaton@commandprompt.com Home of Mammoth PostgreSQL and 'Practical PostgreSQL' Managed PostgreSQL, Linux services and consulting Read and Search O'Reilly's 'Practical PostgreSQL' at http://www.commandprompt.com
> What it still leaves quite open is just what happens when the OS has > more than one disk drive or CPU to play with. It's not clear what > happens in such cases, whether FreeBSD would catch up, or be "left > further in the dust." The traditional "propaganda" has been that > there are all sorts of reasons to expect PostgreSQL on FreeBSD to > run a bit faster than on Linux; it is a bit unexpected for the > opposite to seem true. Let me nip this in the butt before people run away with ideas that aren't correct. When the tests were performed in FreeBSD 5.1 and Linux, the hard drives were running UDMA. When running 4.8, for some reason his drives settled in on PIO mode: ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) falling back to PIOmode The benchmarks were hardly conclusive as UDMA runs vastly faster than PIO. Until we hear back as to whether cables were jarred loose between the tests or hearing if something else changed, I'd hardly consider these conclusive tests given PIO/UDMA is apples to oranges in terms of speed and I fully expect that FreeBSD 4.8 will perform at least faster than 5.1 (5.x is still being unwound from Giant), but should out perform Linux as well if industry experience iss any indicator. -sc -- Sean Chittenden
I need to step in and do 2 things: First, apologize for posting inaccurate test results. Second, verify that Sean is absolutely correct. FreeBSD 4.8 was accessing the drives in PIO mode, which is significantly lousier than DMA, which RedHat was able to use. As a result, the tests are unreasonably skewed in favor of Linux. The only thing that the currently posted results prove is that Linux is better at dealing with crappy hardware than BSD (which I feel we already knew). I did some rescrounging, and found some newer hardware stuffed in a corner that I had forgotten was even around. I am currently re-running the tests and will post new results as soon as there are enough to be interesting to talk about. In an attempt to avoid the same mistake, I did a timed test with dd(1) to a raw partition on both Linux and FreeBSD to ensure that both systems are able to access the hardware at more or less the same speed. The results of this will be included. I'm also gathering considerably more information about the state of the system during the tests, which should answer a number of questions I've been getting. To the many people who asked questions like "why not try filesystem x on distribution y" and similar questions, the answer in most cases is time. I've pared the tests down some so they run faster, and I'm hoping to be able to run more tests on more combinations of configurations as a result. Also, I never intended for anyone to assume that I was _done_ testing, just that I had enough results for folks to talk about. I'll post again when I have enough results to be interesting, until then, I apologize again for the inaccurate results. Sean Chittenden wrote: >>What it still leaves quite open is just what happens when the OS has >>more than one disk drive or CPU to play with. It's not clear what >>happens in such cases, whether FreeBSD would catch up, or be "left >>further in the dust." The traditional "propaganda" has been that >>there are all sorts of reasons to expect PostgreSQL on FreeBSD to >>run a bit faster than on Linux; it is a bit unexpected for the >>opposite to seem true. > > Let me nip this in the butt before people run away with ideas that > aren't correct. When the tests were performed in FreeBSD 5.1 and > Linux, the hard drives were running UDMA. When running 4.8, for some > reason his drives settled in on PIO mode: > > ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying > ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying > ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying > ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) falling back to PIOmode > > The benchmarks were hardly conclusive as UDMA runs vastly faster than > PIO. Until we hear back as to whether cables were jarred loose > between the tests or hearing if something else changed, I'd hardly > consider these conclusive tests given PIO/UDMA is apples to oranges in > terms of speed and I fully expect that FreeBSD 4.8 will perform at > least faster than 5.1 (5.x is still being unwound from Giant), but > should out perform Linux as well if industry experience iss any > indicator. -- Bill Moran Potential Technologies http://www.potentialtech.com
> I need to step in and do 2 things: Thanks for posting that. Let me know if you have any questions while doing your testing. I've found that using 16K blocks on FreeBSD results in about an 8% speedup in writes to the database, fwiw. I'm likely going to make this the default for PostgreSQL on FreeBSD starting with 7.4 (just posted something to -hackers about this)f. If you'd like to do this in your testing, just apply the following patch. Right now PostgreSQL defaults to 8K blocks, but FreeBSD uses 16K blocks which means that currently, reading two blocks of data in PG is two read calls to the OS, one reads 16K of data off disk and returns the 1st page, the 2nd call pulls the 2nd block from the FS cache. In making things 16K, it avoids the need for the 2nd system call which is where the performance difference is coming from, afaikt. -sc -- Sean Chittenden
Attachment
>>>>> "SC" == Sean Chittenden <sean@chittenden.org> writes: >> I need to step in and do 2 things: SC> Thanks for posting that. Let me know if you have any questions while SC> doing your testing. I've found that using 16K blocks on FreeBSD SC> results in about an 8% speedup in writes to the database, fwiw. Where/how does one set this? In postgresql.conf or on the file system or during compilation of postgres? I'm on FreeBSD 4.8 still. I've got a box right now on which I'm comparing the speed merits of hardware RAID10, RAID5, and RAID50 using a filesystem benchmark utility (bonnie++). If I have time I'm gonna try different striping block sizes. Right now I'm using 32k byte stripe size. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
>>>>> "SC" == Sean Chittenden <sean@chittenden.org> writes: >> I need to step in and do 2 things: SC> Thanks for posting that. Let me know if you have any questions while SC> doing your testing. I've found that using 16K blocks on FreeBSD SC> results in about an 8% speedup in writes to the database, fwiw. ok.. ignore my prior request about how to set that... i missed you had included a patch. Any recommendations on newfs parameters for an overly large file system used solely for Postgres? Over 100Gb (with raid 10) or over 200Gb (with raid 5)? -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
On Thu, 28 Aug 2003, Sean Chittenden wrote: > > What it still leaves quite open is just what happens when the OS has > > more than one disk drive or CPU to play with. It's not clear what > > happens in such cases, whether FreeBSD would catch up, or be "left > > further in the dust." The traditional "propaganda" has been that > > there are all sorts of reasons to expect PostgreSQL on FreeBSD to > > run a bit faster than on Linux; it is a bit unexpected for the > > opposite to seem true. > > Let me nip this in the butt before people run away with ideas that > aren't correct. When the tests were performed in FreeBSD 5.1 and > Linux, the hard drives were running UDMA. When running 4.8, for some > reason his drives settled in on PIO mode: > > ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying > ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying > ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying > ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) falling back to PIOmode > > The benchmarks were hardly conclusive as UDMA runs vastly faster than > PIO. Until we hear back as to whether cables were jarred loose > between the tests or hearing if something else changed, I'd hardly > consider these conclusive tests given PIO/UDMA is apples to oranges in > terms of speed and I fully expect that FreeBSD 4.8 will perform at > least faster than 5.1 (5.x is still being unwound from Giant), but > should out perform Linux as well if industry experience iss any > indicator. Plus, in most "real" servers you're gonna be running SCSI, so it might be nice to see a test with a good SCSI controller (Symbios 875 is a nice choice) and a couple hard drives, one each for WAL and data. This would more closely resemble actual usage and there are likely to be fewer issues with things like UDMA versus PIO on SCSI.
> >> I need to step in and do 2 things: > SC> Thanks for posting that. Let me know if you have any questions while > SC> doing your testing. I've found that using 16K blocks on FreeBSD > SC> results in about an 8% speedup in writes to the database, fwiw. > > ok.. ignore my prior request about how to set that... i missed you > had included a patch. > > Any recommendations on newfs parameters for an overly large file > system used solely for Postgres? Over 100Gb (with raid 10) or over > 200Gb (with raid 5)? Nope, you'll have to test and see. If you find something that works, however, let me know. -sc -- Sean Chittenden
Shridhar Daithankar wrote: > On 26 Aug 2003 at 21:47, Bill Moran wrote: > > >>Hey all. >> >>I said I was going to do it, and I finally did it. >> >>As with all performance tests/benchmarks, there are probably dozens or >>more reasons why these results aren't as accurate or wonderful as they >>should be. Take them for what they are and hopefully everyone can >>learn a few things from them. >> >>Intelligent feedback is welcome. >> >>http://www.potentialtech.com/wmoran/postgresql.php > > > Can we have these benchmarks and relevant information stored in a central > archive at techdocs(Say)? > > That would be better than searching mail archives.. I agree. There doesn't seem to be a place on techdocs for benchmarks at the time. Is there another part of the site that would be good for these to go? I'll keep them posted on my site until a better location is found. -- Bill Moran Potential Technologies http://www.potentialtech.com
> > As with all performance tests/benchmarks, there are probably dozens or > > more reasons why these results aren't as accurate or wonderful as they > > should be. Take them for what they are and hopefully everyone can > > learn a few things from them. > > > > Intelligent feedback is welcome. > > > > http://www.potentialtech.com/wmoran/postgresql.php > > I notice that the Linux FSs weren't tested with noatime. Any > reason? My friend, (a FreeBSD committer), was wondering what the results are if you turn off softupdates (to match Linux default installation) and use noatime. He also wonders how bug the default IO is? Chris
> I'm likely going to make this the default for PostgreSQL on FreeBSD > starting with 7.4 (just posted something to -hackers about this)f. If > you'd like to do this in your testing, just apply the following patch. > > Right now PostgreSQL defaults to 8K blocks, but FreeBSD uses 16K > blocks which means that currently, reading two blocks of data in PG is > two read calls to the OS, one reads 16K of data off disk and returns > the 1st page, the 2nd call pulls the 2nd block from the FS cache. In > making things 16K, it avoids the need for the 2nd system call which is > where the performance difference is coming from, afaikt. -sc Are you _sure_ this won't cause any atomicity problems? Can FreeBSD write 16k as an atomic unit? Chris
Christopher Kings-Lynne wrote: >>>As with all performance tests/benchmarks, there are probably dozens or >>>more reasons why these results aren't as accurate or wonderful as they >>>should be. Take them for what they are and hopefully everyone can >>>learn a few things from them. >>> >>>Intelligent feedback is welcome. >>> >>>http://www.potentialtech.com/wmoran/postgresql.php >> >>I notice that the Linux FSs weren't tested with noatime. Any >>reason? > > My friend, (a FreeBSD committer), was wondering what the results are if you > turn off softupdates (to match Linux default installation) and use noatime. Keep an eye on the page. The test results will be posted shortly after I finish them. Keep in mind, I'm more interested in figuring out what can be done to make Postgres _faster_, so tests along that line are going to have a higher priority than ones that specifically compare "apples to apples" or anything like that. > He also wonders how bug the default IO is? Huh? -- Bill Moran Potential Technologies http://www.potentialtech.com
Christopher Kings-Lynne wrote: > > I'm likely going to make this the default for PostgreSQL on FreeBSD > > starting with 7.4 (just posted something to -hackers about this)f. If > > you'd like to do this in your testing, just apply the following patch. > > > > Right now PostgreSQL defaults to 8K blocks, but FreeBSD uses 16K > > blocks which means that currently, reading two blocks of data in PG is > > two read calls to the OS, one reads 16K of data off disk and returns > > the 1st page, the 2nd call pulls the 2nd block from the FS cache. In > > making things 16K, it avoids the need for the 2nd system call which is > > where the performance difference is coming from, afaikt. -sc > > Are you _sure_ this won't cause any atomicity problems? Can FreeBSD write > 16k as an atomic unit? We pre-modified page images to WAL before modifying the page. The disks are only 512-byte blocks, so we don't rely on file system atomicity anymore anyway. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Balazs Wellisch wrote: > > Bill, > > Very interesting results. I'd like to command you on your honesty. > Having started out with the intentions of proving that FreeBSD is faster > than Linux only to find that the opposite is true must not have been > rewarding for you. However, these unexpected results serve only to > reinforce the integrity of your tests. Looking at the results, ext2 is out because it isn't crash safe, and I have heard Reiser uses a lot more CPU to do its work. It does show ext3 as slow, which was expected. Interesting how JFS came out, and XFS would be interesting. And, of course, it is multiple backends that really shows PostgreSQL off, so it could radically affect the results. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
FreeBSD page size (was Re: The results of my PostgreSQL/filesystem performance tests)
From
Vivek Khera
Date:
>>>>> "SC" == Sean Chittenden <sean@chittenden.org> writes: >> I need to step in and do 2 things: SC> Thanks for posting that. Let me know if you have any questions while SC> doing your testing. I've found that using 16K blocks on FreeBSD SC> results in about an 8% speedup in writes to the database, fwiw. Just double checking: if I do this, then I need to halve the parameters in postgresql.conf that involve buffers, specifically, max_fsm_pages and shared_buffers. I think max_fsm_pages should be adjusted since the number of pages in the system overall has been halved. Anything else that should be re-tuned for this? My tests are still running so I don't have numbers yet. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
Ok... simple tests have completed. Here are some numbers. FreeBSD 4.8 PG 7.4b2 4GB Ram Dual Xeon 2.4GHz processors 14 U320 SCSI disks attached to Dell PERC3/DC RAID controller in RAID 5 config with 32k stripe size Dump file: -rw-r--r-- 1 vivek wheel 1646633745 Aug 28 11:01 19-Aug-2003.dump When restored (after deleting one index that took up ~1Gb -- turned out it was redundant to another multi-column index): % df -k /u/d02 Filesystem 1K-blocks Used Avail Capacity Mounted on /dev/amrd1s1e 226408360 18067260 190228432 9% /u/d02 postgresql.conf alterations from standard: shared_buffers = 60000 sort_mem = 8192 vacuum_mem=131702 max_fsm_pages=1000000 effective_cache_size=25600 random_page-cost = 2 restore time: 14777 seconds vacuum analyze time: 30 minutes select count(*) from user_list where owner_id=315; 50388.64 ms the restore complained often about checkpoints occurring every few seconds: Sep 2 11:57:14 d02 postgres[49721]: [5-1] LOG: checkpoints are occurring too frequently (15 seconds apart) Sep 2 11:57:14 d02 postgres[49721]: [5-2] HINT: Consider increasing CHECKPOINT_SEGMENTS. The HINT threw me off since I had to set checkpoint_segments in postgresql.conf, where as CHECKPOINT_SEGMENTS implied to me a compile-time constant. Anyhow, so I deleted the PG data directory, and made these two changes: checkpoint_segments=50 sort_mem = 131702 This *really* improved the time for the restore: restore time: 11594 seconds then I reset the checkpoint_segments and sort_mem back to old values... vacuum analyze time is still 30 minutes select count(*) from user_list where owner_id=315; 51363.98 ms so the select appears a bit slower but it is hard to say why. the system is otherwise idle as it is not in production yet. Then I took the suggestion to update PG's page size to 16k and did the same increase on sort_mem and checkpoint_segments as above. I also halved the shared_buffers and max_fsm_pages (probably should have halved the effective_cache_size too...) restore time: 11322 seconds vacuum analyze time: 27 minutes select count(*) from user_list where owner_id=315; 48267.66 ms Granted, given this simple test it is hard to say whether the 16k blocks will make an improvement under live load, but I'm gonna give it a shot. The 16k block size shows me roughly 2-6% improvement on these tests. So throw in my vote for 16k blocks on FreeBSD (and annotate the docs to tell which parameters need to be halved to account for it). -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
> Ok... simple tests have completed. Here are some numbers. > > FreeBSD 4.8 > PG 7.4b2 > 4GB Ram > Dual Xeon 2.4GHz processors > 14 U320 SCSI disks attached to Dell PERC3/DC RAID controller in RAID 5 > config with 32k stripe size [snip] > Then I took the suggestion to update PG's page size to 16k and did the > same increase on sort_mem and checkpoint_segments as above. I also > halved the shared_buffers and max_fsm_pages (probably should have > halved the effective_cache_size too...) > > restore time: 11322 seconds > vacuum analyze time: 27 minutes > select count(*) from user_list where owner_id=315; 48267.66 ms > > > Granted, given this simple test it is hard to say whether the 16k > blocks will make an improvement under live load, but I'm gonna give it > a shot. The 16k block size shows me roughly 2-6% improvement on these > tests. > > So throw in my vote for 16k blocks on FreeBSD (and annotate the docs > to tell which parameters need to be halved to account for it). I haven't had a chance to run any tests yet (ELIFE), but there was a suggestion that 32K blocks was a better performer than 16K blocks (!!??!!??). I'm not sure why this is and my only guess is that it relies more heavily on the disk cache to ease IO. Since you have the hardware setup, Vivek, would it be possible for you to run a test with 32K blocks? I've started writing a threaded benchmarking program called pg_crush that I hope to post here in a few days that'll time connection startup times, INSERTs, DELETEs, UPDATEs, and both sequential scans as well as index scans for random and sequentially ordered tuples. It's similar to pgbench, except it generates its own data, uses pthreads (chears on KSE!), and returns more fine grained timing information for the various activities. -sc -- Sean Chittenden
>>>>> "SC" == Sean Chittenden <sean@chittenden.org> writes: SC> hardware setup, Vivek, would it be possible for you to run a test with SC> 32K blocks? Will do. What's another 4 hours... ;-) I guess I'll halve the buffer size parameters again... SC> I've started writing a threaded benchmarking program called pg_crush SC> that I hope to post here in a few days that'll time connection startup Ok. Please post it when it is ready. I've decided to wait until 7.4 is final before going to production so I've got this very expensive very fast box doing not much of anything for a little while...
Just curious, but Bruce(?) mentioned that apparently a 32k block size was found to show a 15% improvement ... care to run one more test? :) On Wed, 3 Sep 2003, Vivek Khera wrote: > Ok... simple tests have completed. Here are some numbers. > > FreeBSD 4.8 > PG 7.4b2 > 4GB Ram > Dual Xeon 2.4GHz processors > 14 U320 SCSI disks attached to Dell PERC3/DC RAID controller in RAID 5 > config with 32k stripe size > > Dump file: > -rw-r--r-- 1 vivek wheel 1646633745 Aug 28 11:01 19-Aug-2003.dump > > When restored (after deleting one index that took up ~1Gb -- turned > out it was redundant to another multi-column index): > > % df -k /u/d02 > Filesystem 1K-blocks Used Avail Capacity Mounted on > /dev/amrd1s1e 226408360 18067260 190228432 9% /u/d02 > > > > postgresql.conf alterations from standard: > shared_buffers = 60000 > sort_mem = 8192 > vacuum_mem=131702 > max_fsm_pages=1000000 > effective_cache_size=25600 > random_page-cost = 2 > > > restore time: 14777 seconds > vacuum analyze time: 30 minutes > select count(*) from user_list where owner_id=315; 50388.64 ms > > > the restore complained often about checkpoints occurring every few > seconds: > > Sep 2 11:57:14 d02 postgres[49721]: [5-1] LOG: checkpoints are occurring too frequently (15 seconds apart) > Sep 2 11:57:14 d02 postgres[49721]: [5-2] HINT: Consider increasing CHECKPOINT_SEGMENTS. > > The HINT threw me off since I had to set checkpoint_segments in > postgresql.conf, where as CHECKPOINT_SEGMENTS implied to me a > compile-time constant. > > Anyhow, so I deleted the PG data directory, and made these two > changes: > > checkpoint_segments=50 > sort_mem = 131702 > > This *really* improved the time for the restore: > > restore time: 11594 seconds > > then I reset the checkpoint_segments and sort_mem back to old > values... > > vacuum analyze time is still 30 minutes > select count(*) from user_list where owner_id=315; 51363.98 ms > > so the select appears a bit slower but it is hard to say why. the > system is otherwise idle as it is not in production yet. > > > Then I took the suggestion to update PG's page size to 16k and did the > same increase on sort_mem and checkpoint_segments as above. I also > halved the shared_buffers and max_fsm_pages (probably should have > halved the effective_cache_size too...) > > restore time: 11322 seconds > vacuum analyze time: 27 minutes > select count(*) from user_list where owner_id=315; 48267.66 ms > > > Granted, given this simple test it is hard to say whether the 16k > blocks will make an improvement under live load, but I'm gonna give it > a shot. The 16k block size shows me roughly 2-6% improvement on these > tests. > > So throw in my vote for 16k blocks on FreeBSD (and annotate the docs > to tell which parameters need to be halved to account for it). > > > -- > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > Vivek Khera, Ph.D. Khera Communications, Inc. > Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 > AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
Vivek Khera wrote: > the restore complained often about checkpoints occurring every few > seconds: > > Sep 2 11:57:14 d02 postgres[49721]: [5-1] LOG: checkpoints are occurring too frequently (15 seconds apart) > Sep 2 11:57:14 d02 postgres[49721]: [5-2] HINT: Consider increasing CHECKPOINT_SEGMENTS. > > The HINT threw me off since I had to set checkpoint_segments in > postgresql.conf, where as CHECKPOINT_SEGMENTS implied to me a > compile-time constant. Woo hoo, my warning worked. Great. I uppercased it because config parameters are uppercased in the documentation. Do we mention config parameters in any other error messages? Should it be lowercased? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
> I uppercased it because config parameters are uppercased in the > documentation. Do we mention config parameters in any other error > messages? Should it be lowercased? How about changing the hint? Consider increasing CHECKPOINT_SEGMENTS in your postgresql.conf
Attachment
On Wed, 3 Sep 2003, Bruce Momjian wrote: > Vivek Khera wrote: > > the restore complained often about checkpoints occurring every few > > seconds: > > > > Sep 2 11:57:14 d02 postgres[49721]: [5-1] LOG: checkpoints are occurring too frequently (15 seconds apart) > > Sep 2 11:57:14 d02 postgres[49721]: [5-2] HINT: Consider increasing CHECKPOINT_SEGMENTS. > > > > The HINT threw me off since I had to set checkpoint_segments in > > postgresql.conf, where as CHECKPOINT_SEGMENTS implied to me a > > compile-time constant. > > Woo hoo, my warning worked. Great. > > I uppercased it because config parameters are uppercased in the > documentation. Do we mention config parameters in any other error > messages? Should it be lowercased? k, to me upper case denotes a compiler #define, so I would have been confused ... I'd go with lower case and single quotes around it to denote its a variable to be changed ...
Marc G. Fournier wrote: > On Wed, 3 Sep 2003, Bruce Momjian wrote: > > > Vivek Khera wrote: > > > the restore complained often about checkpoints occurring every few > > > seconds: > > > > > > Sep 2 11:57:14 d02 postgres[49721]: [5-1] LOG: checkpoints are occurring too frequently (15 seconds apart) > > > Sep 2 11:57:14 d02 postgres[49721]: [5-2] HINT: Consider increasing CHECKPOINT_SEGMENTS. > > > > > > The HINT threw me off since I had to set checkpoint_segments in > > > postgresql.conf, where as CHECKPOINT_SEGMENTS implied to me a > > > compile-time constant. > > > > Woo hoo, my warning worked. Great. > > > > I uppercased it because config parameters are uppercased in the > > documentation. Do we mention config parameters in any other error > > messages? Should it be lowercased? > > k, to me upper case denotes a compiler #define, so I would have been > confused ... I'd go with lower case and single quotes around it to denote > its a variable to be changed ... Done. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>>>>> "MGF" == Marc G Fournier <scrappy@hub.org> writes: MGF> Just curious, but Bruce(?) mentioned that apparently a 32k block size was MGF> found to show a 15% improvement ... care to run one more test? :) Well, it is hard to tell based on my quick and dirty test: 16k page size: restore time: 11322 seconds vacuum analyze time: 1663 seconds (27 minutes) select count(*) from user_list where owner_id=315; 56666.64 ms 32k page size: restore time: 11430 seconds vacuum analyze time: 1346 seconds select count(*) from user_list where owner_id=315; 63275.73 ms one anomaly I note is that if I re-run the select count(*) query above, the large the page size, the longer the query takes. In the standard 8k page size, it was on the order of 306ms, with 16k page size it was over 1400, and with 32k page size nearly 3000ms. Another anomaly I note is that for the larger indexes, the relpages doesn't scale as expected. ie, I'd expect roughly half the relpages per index for 32k page size as for 16k page size, but this is not always the case... some are about the same size and some are about 2/3 and some are about 1/2. The smaller indexes are often the same number of pages (when under 20 pages). I think I'm going to write a synthetic load generator that does a bunch of inserts to some linked tables with several indexes, then goes thru and pounds on it (update/select) from multiple children with occasional vacuum's thrown in. That's the only way to get 'real' numbers, it seems. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
Vivek Khera wrote: > >>>>> "MGF" == Marc G Fournier <scrappy@hub.org> writes: > > MGF> Just curious, but Bruce(?) mentioned that apparently a 32k block size was > MGF> found to show a 15% improvement ... care to run one more test? :) > > > Well, it is hard to tell based on my quick and dirty test: The 32k number is from Tatsuo testing a few years ago. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes: BM> The 32k number is from Tatsuo testing a few years ago. Can you verify for me that these parameters in postgresql.conf are based on the BLCKSZ (ie one buffer is the size of the BLCKSIZ macro): shared_buffers effective_cache_size Logically it makes sense, but I want to be sure I'm adjusting my postgresql.conf parameters accordingly when I try different block sizes. Thanks. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
On Wed, 3 Sep 2003, Vivek Khera wrote: > >>>>> "SC" == Sean Chittenden <sean@chittenden.org> writes: > > >> I need to step in and do 2 things: > SC> Thanks for posting that. Let me know if you have any questions while > SC> doing your testing. I've found that using 16K blocks on FreeBSD > SC> results in about an 8% speedup in writes to the database, fwiw. > > Just double checking: if I do this, then I need to halve the > parameters in postgresql.conf that involve buffers, specifically, > max_fsm_pages and shared_buffers. I think max_fsm_pages should be > adjusted since the number of pages in the system overall has been > halved. > > Anything else that should be re-tuned for this? Yes, effective_cache_size as well is measured in pgsql blocks.
Vivek Khera wrote: > >>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > BM> The 32k number is from Tatsuo testing a few years ago. > > Can you verify for me that these parameters in postgresql.conf are > based on the BLCKSZ (ie one buffer is the size of the BLCKSIZ macro): > > shared_buffers > effective_cache_size > > Logically it makes sense, but I want to be sure I'm adjusting my > postgresql.conf parameters accordingly when I try different block > sizes. Uh, yes, I think they have to be the same because they are pages in the shared buffer cache, not disk blocks. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Vivek Khera wrote: > >>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > BM> The 32k number is from Tatsuo testing a few years ago. > > Can you verify for me that these parameters in postgresql.conf are > based on the BLCKSZ (ie one buffer is the size of the BLCKSIZ macro): > > shared_buffers > effective_cache_size > > Logically it makes sense, but I want to be sure I'm adjusting my > postgresql.conf parameters accordingly when I try different block > sizes. Also, to check, you can use ipcs to see the shared memory sizes allocated. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073