Thread: Some pgbench results
I was doing some load testing on a server, and decided to test it with different file systems to see how it reacts to load/speed. I tested xfs, jfs and ext3. The machine runs FC4 with the latest 2.6.15 kernel from Fedora. Hardware: Dual Opteron 246, 4GB RAM, Adaptec 2230 with battery backup, 2 10K SCSI disks in RAID1 for OS and WAL (with it's own partiton on ext3), 6 10K scsi disks in RAID10 (RAID1 in hw, RAID0 on top of that in sw). Postgres config tweaked as per the performance guide. Initialized the data with: pgbench -i -s 100 Test runs: pgbench -s 100 -t 10000 -c 20 I did 20 runs, removed the first 3 runs from each sample to account for stabilization. Here are the results in tps without connection establishing: FS: JFS XFS EXT3 Avg: 462 425 319 Stdev: 104 74 106 Intererstingly, the first 3 samples I removed had a MUCH higher tps count. Up to 900+. Bye, Guy. -- Family management on rails: http://www.famundo.com - coming soon! My develpment related blog: http://devblog.famundo.com
Just Someone wrote:
> 2 10K SCSI disks in RAID1 for OS and WAL (with its own partition on
> ext3),

You'll want the WAL on its own spindle. IIRC a separate partition on a shared disc won't give you much benefit. The idea is to keep the disc's head from moving away for other tasks. Or so they say.
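A common way to do that is to move pg_xlog to the dedicated disk and symlink it back; a rough sketch, with the device, mount point, and init script as placeholders:

    # stop postgres before touching pg_xlog
    /etc/init.d/postgresql stop
    mkdir -p /mnt/wal
    mount /dev/sdc1 /mnt/wal
    mv /var/lib/pgsql/data/pg_xlog /mnt/wal/pg_xlog
    ln -s /mnt/wal/pg_xlog /var/lib/pgsql/data/pg_xlog
    /etc/init.d/postgresql start

regards,
bkw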
On Mar 23, 2006, at 11:32 AM, Bernhard Weisshuhn wrote:
> Just Someone wrote:
>
>> 2 10K SCSI disks in RAID1 for OS and WAL (with its own partition on
>> ext3),
>
> You'll want the WAL on its own spindle. IIRC a separate partition
> on a shared disc won't give you much benefit. The idea is to keep
> the disc's head from moving away for other tasks. Or so they say.

Actually, the OS partitions are normally quiet enough that it won't make a huge difference, unless you're really hammering the database all the time.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Mar 23, 2006, at 11:01 AM, Just Someone wrote:
> I was doing some load testing on a server, and decided to test it with
> different file systems to see how it reacts to load/speed. I tested
> xfs, jfs and ext3. The machine runs FC4 with the latest 2.6.15 kernel
> from Fedora.

You should also try testing ext3 with data=writeback, on both partitions. People have found it makes a big difference in performance.
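For a non-root partition it's just a mount option; a sketch of an /etc/fstab line (device and mount point are placeholders):

    /dev/md0  /var/lib/pgsql  ext3  noatime,data=writeback  0 0

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461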
Jim, I did another test with ext3 using data=writeback, and indeed it's much better:

Avg: 429.87
Stdev: 77

A bit (a very tiny bit) faster than xfs and a bit slower than jfs. Still, very much improved.

Bye, Guy.

On 3/23/06, Jim Nasby <jnasby@pervasive.com> wrote:
> On Mar 23, 2006, at 11:32 AM, Bernhard Weisshuhn wrote:
>
> > Just Someone wrote:
> >
> >> 2 10K SCSI disks in RAID1 for OS and WAL (with its own partition on
> >> ext3),
> >
> > You'll want the WAL on its own spindle. IIRC a separate partition
> > on a shared disc won't give you much benefit. The idea is to keep
> > the disc's head from moving away for other tasks. Or so they say.
>
> Actually, the OS partitions are normally quiet enough that it won't
> make a huge difference, unless you're really hammering the database
> all the time.
> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

--
Family management on rails: http://www.famundo.com - coming soon!
My development related blog: http://devblog.famundo.com
Hi,

> Did you re-initialize the test pgbench database between runs?
> I get weird results otherwise, since some integers overflow in the
> test (it doesn't complete the full 10000 transactions after the first run).

No, I didn't. The reason is that I noticed that the first run is always MUCH faster. My initial runs, if I reinit pgbench and run again, will always hover around 900-970 tps for xfs. And I didn't need this as a real performance test; it was a side effect of a load test I was doing on the server. Also, pgbench isn't close to the load I'll see on my server (a web application which will be mostly read).

> Could you please tell me what stripe size you have on the raid system?
> Could you also share the mkfs and mount options on each filesystem you
> tried?

RAID stripe size of 256K. File system creation:

xfs: mkfs -t xfs -l size=64m /dev/md0
jfs: mkfs -t jfs /dev/md0

Mount for xfs with -o noatime,nodiratime,logbufs=8
jfs: -o noatime,nodiratime

> A hint on using a raided ext3 system is to use the whole block device
> instead of partitions to align the data better, and to use data=journal
> with a big journal. This might seem counter-productive at first (it did
> to me) but I increased my throughput a lot when using this.

Thanks for the advice! Actually, the RAID 10 I have is mounted as /var/lib/pgsql, so it's ONLY for postgres data, and the pg_xlog directory is mounted on another disk.

> My filesystem parameters are calculated like this:
> stripe=256 # <- 256k raid stripe size
> bsize=4 # 4k blocksize
> bsizeb=$(( $bsize * 1024 )) # in bytes
> stride=$(( $stripe / $bsize ))
>
> mke2fs -b $bsizeb -j -J size=400 -m 1 -O sparse_super \
>    -T largefile4 -E stride=$stride /dev/sdb
>
> Mounted with: mount -t ext3 -o data=journal,noatime /dev/sdb /mnt/test8

That's an interesting thing to try, though because of other things I want, I prefer xfs or jfs anyway. I will have an extreme number of schemas and files, which makes high demands on the directory structure. My tests showed me that ext3 doesn't cope with many files in directories very well. With xfs and jfs I can create 500K files in one directory in no time (about 250 seconds); with ext3 it starts to crawl after about 30K files (a sketch of that test is at the end of this message).

> I'm a little surprised that I can get more pgbench performance out of my
> system, since you're using 10K scsi disks. Please try the above settings
> and see if it helps you...
>
> I've not run so many tests yet; I'll do some more after the weekend...

Please share the results. It's very interesting...

Bye, Guy.

BTW, one thing I also tested is a software RAID0 over two RAID5 SATA arrays. Total disk count in this is 15. The read performance was really good. The write performance (as expected) not so great. But that was just a test to get a feeling of the speed. This RAID5 system is only used for file storage, not database.
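Roughly what the file-creation test looks like (directory name and mount point are placeholders):

    # create 500,000 empty files in one directory and time it
    mkdir /mnt/test/manyfiles
    time sh -c 'i=0; while [ $i -lt 500000 ]; do : > /mnt/test/manyfiles/f$i; i=$((i+1)); done'

--
Family management on rails: http://www.famundo.com - coming soon!
My development related blog: http://devblog.famundo.com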
I played a bit with kernel versions, as I was getting a kernel panic on my Adaptec card. I downgraded to 2.6.11 (the original that came with Fedora Core 4) and the panic went away. But more than that, the performance on XFS went considerably higher: with the exact same settings as before, I now got an average of 813.65 tps with a standard deviation of 130.33.

I hope this kernel doesn't panic on me. But I'll know just tomorrow, as I'm pounding on the machine now.

Bye, Guy.

On 3/23/06, Magnus Naeslund(f) <mag@fbab.net> wrote:
> Just Someone wrote:
> >
> > Initialized the data with: pgbench -i -s 100
> > Test runs: pgbench -s 100 -t 10000 -c 20
> > I did 20 runs and removed the first 3 runs from each sample to account
> > for stabilization.
>
> Did you re-initialize the test pgbench database between runs?
> I get weird results otherwise, since some integers overflow in the
> test (it doesn't complete the full 10000 transactions after the first run).
>
> > Here are the results in tps, without connection
> > establishing:
> >
> > FS:     JFS   XFS   EXT3
> > Avg:    462   425   319
> > Stdev:  104    74   106
>
> Could you please tell me what stripe size you have on the raid system?
> Could you also share the mkfs and mount options on each filesystem you
> tried?
>
> I ran some tests on a somewhat similar system: a Supermicro H8SSL-i-B
> motherboard with one dual core Opteron 165 and 4GB of memory, running
> Debian sarge amd64 (current stable) but with a pristine kernel.org
> 2.6.16 kernel (there are no Debian patches or packages yet).
>
> It has a 3ware 9550 + BBU SATA raid card with 6 disks in a RAID 10
> configuration with 256KB stripe size. I think this results in about
> 200MB/s raw read performance and about 155MB/s raw write performance
> (as tested by dd'ing a 10GB file back and forth).
> I had no separate WAL device/partition, only tweaked postgresql.conf.
>
> I get about 520-530 tps with your pgbench parameters on ext3, but very
> poor (order of magnitude) performance on xfs (that's why I ask about
> your mkfs parameters).
>
> A hint on using a raided ext3 system is to use the whole block device
> instead of partitions to align the data better, and to use data=journal
> with a big journal. This might seem counter-productive at first (it did
> to me) but I increased my throughput a lot when using this.
>
> My filesystem parameters are calculated like this:
> stripe=256 # <- 256k raid stripe size
> bsize=4 # 4k blocksize
> bsizeb=$(( $bsize * 1024 )) # in bytes
> stride=$(( $stripe / $bsize ))
>
> mke2fs -b $bsizeb -j -J size=400 -m 1 -O sparse_super \
>    -T largefile4 -E stride=$stride /dev/sdb
>
> Mounted with: mount -t ext3 -o data=journal,noatime /dev/sdb /mnt/test8
>
> I'm a little surprised that I can get more pgbench performance out of my
> system, since you're using 10K scsi disks. Please try the above settings
> and see if it helps you...
>
> I've not run so many tests yet; I'll do some more after the weekend...
>
> Regards,
> Magnus

--
Family management on rails: http://www.famundo.com - coming soon!
My development related blog: http://devblog.famundo.com
Just Someone wrote:
[snip]
>>
>> mke2fs -b $bsizeb -j -J size=400 -m 1 -O sparse_super \
>>    -T largefile4 -E stride=$stride /dev/sdb
>>
>> Mounted with: mount -t ext3 -o data=journal,noatime /dev/sdb /mnt/test8
>
> That's an interesting thing to try, though because of other things I
> want, I prefer xfs or jfs anyway. I will have an extreme number of
> schemas and files, which makes high demands on the directory structure.
> My tests showed me that ext3 doesn't cope with many files in
> directories very well. With xfs and jfs I can create 500K files in one
> directory in no time (about 250 seconds); with ext3 it starts to crawl
> after about 30K files.

It might seem that I'm selling ext3 or something :) but it's the linux filesystem I know best. If you want ext3 to perform with large directories, there is an mkfs option that enables directory hashing that you can try: -O dir_index.
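For a new filesystem that would look something like this (device is a placeholder):

    # create an ext3 filesystem with hashed (indexed) directories
    mke2fs -j -O dir_index /dev/sdb

Regards,
Magnus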
Hi Magnus,

> It might seem that I'm selling ext3 or something :) but it's the linux
> filesystem I know best.
> If you want ext3 to perform with large directories, there is an mkfs
> option that enables directory hashing that you can try: -O dir_index.

Not at all (sell ext3 ;-) ). It's great to get this kind of info! I'd rather use ext3 as it's VERY stable, and the default in Fedora anyway. So thanks for the tip!

Bye, Guy.

--
Family management on rails: http://www.famundo.com - coming soon!
My development related blog: http://devblog.famundo.com
"Magnus Naeslund(f)" <mag@fbab.net> writes: > It might seem that I'm selling ext3 or something :) but it's the linux > filesystem I know best. > If you want ext3 to perform with large directories, there is an mkfs > option that enables directory hashing that you can try: -O dir_index. You can also turn it on for an existing filesystem using 'tune2fs' and a remount, but it won't hash already-existing large directories--those will have to be recreated to take advantage of hashing. -Doug
Just Someone wrote:
>
> Initialized the data with: pgbench -i -s 100
> Test runs: pgbench -s 100 -t 10000 -c 20
> I did 20 runs and removed the first 3 runs from each sample to account
> for stabilization.

Did you re-initialize the test pgbench database between runs? I get weird results otherwise, since some integers overflow in the test (it doesn't complete the full 10000 transactions after the first run).

> Here are the results in tps, without connection
> establishing:
>
> FS:     JFS   XFS   EXT3
> Avg:    462   425   319
> Stdev:  104    74   106

Could you please tell me what stripe size you have on the raid system? Could you also share the mkfs and mount options on each filesystem you tried?

I ran some tests on a somewhat similar system: a Supermicro H8SSL-i-B motherboard with one dual core Opteron 165 and 4GB of memory, running Debian sarge amd64 (current stable) but with a pristine kernel.org 2.6.16 kernel (there are no Debian patches or packages yet).

It has a 3ware 9550 + BBU SATA raid card with 6 disks in a RAID 10 configuration with 256KB stripe size. I think this results in about 200MB/s raw read performance and about 155MB/s raw write performance (as tested by dd'ing a 10GB file back and forth). I had no separate WAL device/partition, only tweaked postgresql.conf.

I get about 520-530 tps with your pgbench parameters on ext3, but very poor (order of magnitude) performance on xfs (that's why I ask about your mkfs parameters).

A hint on using a raided ext3 system is to use the whole block device instead of partitions to align the data better, and to use data=journal with a big journal. This might seem counter-productive at first (it did to me) but I increased my throughput a lot when using this.

My filesystem parameters are calculated like this:

stripe=256 # <- 256k raid stripe size
bsize=4 # 4k blocksize
bsizeb=$(( $bsize * 1024 )) # in bytes
stride=$(( $stripe / $bsize ))

mke2fs -b $bsizeb -j -J size=400 -m 1 -O sparse_super \
   -T largefile4 -E stride=$stride /dev/sdb

Mounted with: mount -t ext3 -o data=journal,noatime /dev/sdb /mnt/test8

I'm a little surprised that I can get more pgbench performance out of my system, since you're using 10K scsi disks. Please try the above settings and see if it helps you...

I've not run so many tests yet; I'll do some more after the weekend...
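With those numbers, the stride works out to 256 / 4 = 64 and the block size to 4 * 1024 = 4096 bytes, so the effective command is: mke2fs -b 4096 -j -J size=400 -m 1 -O sparse_super -T largefile4 -E stride=64 /dev/sdb. The dd test was along these lines (exact paths, sizes, and flags here are my approximations):

    # sequential write, ~10GB, flushing to disk at the end
    dd if=/dev/zero of=/mnt/test8/bigfile bs=1M count=10240 conv=fdatasync
    # sequential read back
    dd if=/mnt/test8/bigfile of=/dev/null bs=1M

Regards,
Magnus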