Thread: dbt2 & opteron performance
I'm starting to get results with dbt2 on a 4-way Opteron system and wanted to share what I've got so far, since people have told me in the past that this architecture is more interesting than the Itanium 2 that I've been using. This 4-way has 8GB of memory and four Adaptec 2200S controllers attached to 80 spindles (eight 10-disk arrays).

For those familiar with the schema, here is a visual of the disk layout:
http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html

Results for a 600 warehouse run are here:
http://www.osdl.org/projects/dbt2dev/results/dev4-015/6/

The tuning is still a bit off, but feel free to let me know if there are any issues anyway.

Mark
Hi Mark,

Great stuff. One of the things that led me to PostgreSQL a couple of years back was the exceptional OLTP performance I was able to wring out of it when running my own internal benchmarks against it.

I have a couple of questions (apologies if they are answered elsewhere). The reason I ask is that I want to make sure we are getting close to comparing apples to apples with how the commercial companies "legally" run their TPC benchmarks.

1 - Are we using 15,000 RPM SCSI drives, mostly configured together as RAID-0? Also, what about write-ahead logging and background writing and grouping transactions and ...

2 - I forget the brand off the top of my head, but I don't think that most commercial TPC tests use Adaptec controllers.

3 - Are we yet testing with dual-core Opterons (or at least two dual-core Opterons)?

It's perfectly reasonable if, because of cost considerations, the answer to my questions above is "not yet". For #1 above, a lot of people say "but that's a bad idea...". To that I say yeah, but... it's the only way to compare apples to apples when comparing Postgres to published commercial RDBMS benchmark results.

Please email me privately if there is some way EnterpriseDB may be able to help.

--Denis Lussier
Chief Architect and Chairman
EnterpriseDB Corporation

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Mark Wong
> Sent: Tuesday, July 12, 2005 3:47 PM
> To: pgsql-hackers@postgresql.org; testperf-general@pgfoundry.org
> Subject: [HACKERS] dbt2 & opteron performance
>
> I'm starting to get results with dbt2 on a 4-way opteron system and
> wanted to share what I've got so far since people have told me in the
> past that this architecture is more interesting than the itanium2 that
> I've been using.
>
> This 4-way has 8GB of memory and four Adaptec 2200s controllers attached
> to 80 spindles (eight 10-disk arrays). For those familiar with the
> schema, here is a visual of the disk layout:
> http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html
>
> Results for a 600 warehouse run are there:
> http://www.osdl.org/projects/dbt2dev/results/dev4-015/6/
>
> The tuning is still a bit off, but feel free to let me know if there
> are any issues anyway.
>
> Mark
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
Mark,

> I'm starting to get results with dbt2 on a 4-way opteron system and
> wanted to share what I've got so far since people have told me in the
> past that this architecture is more interesting than the itanium2 that
> I've been using.
>
> This 4-way has 8GB of memory and four Adaptec 2200s controllers attached
> to 80 spindles (eight 10-disk arrays). For those familiar with the
> schema, here is a visual of the disk layout:
> http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html
>
> Results for a 600 warehouse run are there:
> http://www.osdl.org/projects/dbt2dev/results/dev4-015/6/
>
> The tuning is still a bit off, but feel free to let me know if there
> are any issues anyway.

This e-mail came in while I was away. I, of course, am very interested in running tests on this machine. Which version of PostgreSQL is this? What configuration are you using? I would expect that we could get at least 7000 on this platform; let me try to tweak it.

--
Josh Berkus
Aglio Database Solutions
San Francisco
On Wed, Jul 27, 2005 at 07:32:34PM -0700, Josh Berkus wrote:
> Mark,
>
> > I'm starting to get results with dbt2 on a 4-way opteron system and
> > wanted to share what I've got so far since people have told me in the
> > past that this architecture is more interesting than the itanium2 that
> > I've been using.
> >
> > This 4-way has 8GB of memory and four Adaptec 2200s controllers attached
> > to 80 spindles (eight 10-disk arrays). For those familiar with the
> > schema, here is a visual of the disk layout:
> > http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html
> >
> > Results for a 600 warehouse run are there:
> > http://www.osdl.org/projects/dbt2dev/results/dev4-015/6/
> >
> > The tuning is still a bit off, but feel free to let me know if there
> > are any issues anyway.
>
> This e-mail came in while I was away. I, of course, am very interested in
> running tests on this machine. Which version of PostgreSQL is this? What
> configuration are you doing? I would expect that we could get at least 7000
> on this platform; let me try to tweak it.

It's dev4-015. You should be able to log in as root and create yourself an account. I'm not doing anything on the system now, so feel free to poke around.

I've done a bad job of tracking what that first test was, but recently I've been trying CVS from July 25, 2005, and I also have that base installed with v15 of the fast copy patch and Bruce's version of the xlog patch. I've only tried DBT2 on the system so far.

After seeing the discussion about how bad the disk performance is with a lot of SCSI controllers on Linux, I'm wondering if we should run some disk tests to see how things look.

Mark
On Wed, Jul 27, 2005 at 09:31:39PM -0700, Mark Wong wrote:
> After seeing the discussion about how bad the disk performance is with a
> lot of scsi controllers on linux, I'm wondering if we should run some
> disk tests to see how things look.

I'd be very interested to see how FreeBSD compares to Linux on the box... how hard would it be to do some form of multi-boot?

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
On Wed, Jul 27, 2005 at 07:32:34PM -0700, Josh Berkus wrote:
> > This 4-way has 8GB of memory and four Adaptec 2200s controllers attached
> > to 80 spindles (eight 10-disk arrays). For those familiar with the
> > schema, here is a visual of the disk layout:
> > http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html

Have you by chance tried it with the logs and data just going to separate RAID10s? I'm wondering if a large RAID10 would do a better job of spreading the load than segmenting things to specific drives.

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
On Thu, Jul 28, 2005 at 05:14:41PM -0500, Jim C. Nasby wrote:
> On Wed, Jul 27, 2005 at 09:31:39PM -0700, Mark Wong wrote:
> > After seeing the discussion about how bad the disk performance is with a
> > lot of scsi controllers on linux, I'm wondering if we should run some
> > disk tests to see how things look.
>
> I'd be very interested to see how FreeBSD compares to Linux on the
> box... how hard would it be to do some form of multi-boot?

Err, I sent that before realizing where the tests were happening. I'm guessing the answer is 'no'. :)

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
On Thu, 28 Jul 2005 17:17:25 -0500
"Jim C. Nasby" <decibel@decibel.org> wrote:

> On Wed, Jul 27, 2005 at 07:32:34PM -0700, Josh Berkus wrote:
> > > This 4-way has 8GB of memory and four Adaptec 2200s controllers attached
> > > to 80 spindles (eight 10-disk arrays). For those familiar with the
> > > schema, here is a visual of the disk layout:
> > > http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html
>
> Have you by-chance tried it with the logs and data just going to
> seperate RAID10s? I'm wondering if a large RAID10 would do a better job
> of spreading the load than segmenting things to specific drives.

No, haven't tried that. That would reduce my number of spindles as I scale up. ;) I have the disks attached as JBODs and use LVM2 to stripe the disks together.

Mark
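[Editor's note: the LVM2 striping Mark describes can be sketched roughly as below. This is a hedged illustration only: the device names, volume group/volume names, sizes, and stripe parameters are hypothetical, and the real setup spanned eight 10-disk JBOD arrays across four controllers.]

```shell
# Tag each JBOD disk as an LVM physical volume (device names hypothetical)
pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Group the disks into one volume group
vgcreate dbt2vg /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Create a logical volume striped across all four disks, analogous to
# software RAID0: -i is the stripe count, -I the stripe size in KB
lvcreate -n stock_lv -L 100G -i 4 -I 64 dbt2vg
```

The reason to do this in LVM2 (or software RAID0) rather than hardware RAID is the point Mark makes later in the thread: the host can interleave a stripe across disks on *different* controllers, which a single hardware RAID controller cannot.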
On Thu, 28 Jul 2005 17:19:34 -0500
"Jim C. Nasby" <decibel@decibel.org> wrote:

> On Thu, Jul 28, 2005 at 05:14:41PM -0500, Jim C. Nasby wrote:
> > On Wed, Jul 27, 2005 at 09:31:39PM -0700, Mark Wong wrote:
> > > After seeing the discussion about how bad the disk performance is with a
> > > lot of scsi controllers on linux, I'm wondering if we should run some
> > > disk tests to see how things look.
> >
> > I'd be very interested to see how FreeBSD compares to Linux on the
> > box... how hard would it be to do some form of multi-boot?
>
> Err, I sent that before realizing where the tests were happening. I'm
> guessing the answer is 'no'. :)

Yeah, I might get in trouble. ;)

Mark
On Thu, Jul 28, 2005 at 04:15:31PM -0700, Mark Wong wrote:
> On Thu, 28 Jul 2005 17:17:25 -0500
> "Jim C. Nasby" <decibel@decibel.org> wrote:
>
> > On Wed, Jul 27, 2005 at 07:32:34PM -0700, Josh Berkus wrote:
> > > > This 4-way has 8GB of memory and four Adaptec 2200s controllers attached
> > > > to 80 spindles (eight 10-disk arrays). For those familiar with the
> > > > schema, here is a visual of the disk layout:
> > > > http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html
> >
> > Have you by-chance tried it with the logs and data just going to
> > seperate RAID10s? I'm wondering if a large RAID10 would do a better job
> > of spreading the load than segmenting things to specific drives.
>
> No, haven't tried that. That would reduce my number of spindles as I
> scale up. ;) I have the disks attached as JBODs and use LVM2 to stripe
> the disks together.

I'm confused... why would it reduce the number of spindles? Is everything just striped right now? You could always s/RAID10/RAID0/.

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
On Thu, 28 Jul 2005 18:48:09 -0500
"Jim C. Nasby" <decibel@decibel.org> wrote:

> On Thu, Jul 28, 2005 at 04:15:31PM -0700, Mark Wong wrote:
> > On Thu, 28 Jul 2005 17:17:25 -0500
> > "Jim C. Nasby" <decibel@decibel.org> wrote:
> >
> > > On Wed, Jul 27, 2005 at 07:32:34PM -0700, Josh Berkus wrote:
> > > > > This 4-way has 8GB of memory and four Adaptec 2200s controllers attached
> > > > > to 80 spindles (eight 10-disk arrays). For those familiar with the
> > > > > schema, here is a visual of the disk layout:
> > > > > http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html
> > >
> > > Have you by-chance tried it with the logs and data just going to
> > > seperate RAID10s? I'm wondering if a large RAID10 would do a better job
> > > of spreading the load than segmenting things to specific drives.
> >
> > No, haven't tried that. That would reduce my number of spindles as I
> > scale up. ;) I have the disks attached as JBODs and use LVM2 to stripe
> > the disks together.
>
> I'm confused... why would it reduce the number of spindles? Is
> everything just striped right now? You could always s/RAID10/RAID0/.

RAID10 requires a minimum of 4 devices per LUN, I think: at least 2 devices per mirror, and at least 2 mirrors to stripe across.

RAID0 wouldn't be any different from what I have now, except that if I use hardware RAID I can't stripe across controllers. That's treating LVM2 striping as equal to software RAID0, of course.

Mark
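[Editor's note: Mark's minimum-device-count point can be illustrated with Linux software RAID. A hedged sketch with hypothetical device names; a two-disk RAID0 is valid, while a nested RAID10 needs at least two two-disk mirrors.]

```shell
# RAID0: striping only, so two devices suffice (devices hypothetical)
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc

# RAID10 built by hand: two RAID1 mirrors...
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd /dev/sde
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdf /dev/sdg
# ...striped together, hence four devices is the practical minimum per LUN
mdadm --create /dev/md3 --level=0 --raid-devices=2 /dev/md1 /dev/md2
```

Linux md also offers a native `--level=10` personality that builds the equivalent layout in one step; either way, half the spindles go to mirror copies rather than independent read/write positions, which is the capacity Mark is reluctant to give up.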
On Thu, 28 Jul 2005 16:55:55 -0700
Mark Wong <markw@osdl.org> wrote:

> On Thu, 28 Jul 2005 18:48:09 -0500
> "Jim C. Nasby" <decibel@decibel.org> wrote:
>
> > On Thu, Jul 28, 2005 at 04:15:31PM -0700, Mark Wong wrote:
> > > On Thu, 28 Jul 2005 17:17:25 -0500
> > > "Jim C. Nasby" <decibel@decibel.org> wrote:
> > >
> > > > On Wed, Jul 27, 2005 at 07:32:34PM -0700, Josh Berkus wrote:
> > > > > > This 4-way has 8GB of memory and four Adaptec 2200s controllers attached
> > > > > > to 80 spindles (eight 10-disk arrays). For those familiar with the
> > > > > > schema, here is a visual of the disk layout:
> > > > > > http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html
> > > >
> > > > Have you by-chance tried it with the logs and data just going to
> > > > seperate RAID10s? I'm wondering if a large RAID10 would do a better job
> > > > of spreading the load than segmenting things to specific drives.
> > >
> > > No, haven't tried that. That would reduce my number of spindles as I
> > > scale up. ;) I have the disks attached as JBODs and use LVM2 to stripe
> > > the disks together.
> >
> > I'm confused... why would it reduce the number of spindles? Is
> > everything just striped right now? You could always s/RAID10/RAID0/.
>
> RAID10 requires a minimum of 4 devices per LUN, I think. At least 2
> devices in a mirror, at least 2 mirrored devices to stripe.
>
> RAID0 wouldn't be any different than what I have now, except if I use
> hardware RAID I can't stripe across controllers. That's treating LVM2
> striping equal to software RAID0 of course.

Oops, spindles was the wrong word to describe what I was losing. But I wouldn't be able to spread the reads/writes across as many spindles if I have any mirroring.

Mark
On Thu, Jul 28, 2005 at 05:00:44PM -0700, Mark Wong wrote:
> On Thu, 28 Jul 2005 16:55:55 -0700
> Mark Wong <markw@osdl.org> wrote:
>
> > On Thu, 28 Jul 2005 18:48:09 -0500
> > "Jim C. Nasby" <decibel@decibel.org> wrote:
> >
> > > On Thu, Jul 28, 2005 at 04:15:31PM -0700, Mark Wong wrote:
> > > > On Thu, 28 Jul 2005 17:17:25 -0500
> > > > "Jim C. Nasby" <decibel@decibel.org> wrote:
> > > >
> > > > > On Wed, Jul 27, 2005 at 07:32:34PM -0700, Josh Berkus wrote:
> > > > > > > This 4-way has 8GB of memory and four Adaptec 2200s controllers attached
> > > > > > > to 80 spindles (eight 10-disk arrays). For those familiar with the
> > > > > > > schema, here is a visual of the disk layout:
> > > > > > > http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html
> > > > >
> > > > > Have you by-chance tried it with the logs and data just going to
> > > > > seperate RAID10s? I'm wondering if a large RAID10 would do a better job
> > > > > of spreading the load than segmenting things to specific drives.
> > > >
> > > > No, haven't tried that. That would reduce my number of spindles as I
> > > > scale up. ;) I have the disks attached as JBODs and use LVM2 to stripe
> > > > the disks together.
> > >
> > > I'm confused... why would it reduce the number of spindles? Is
> > > everything just striped right now? You could always s/RAID10/RAID0/.
> >
> > RAID10 requires a minimum of 4 devices per LUN, I think. At least 2
> > devices in a mirror, at least 2 mirrored devices to stripe.
> >
> > RAID0 wouldn't be any different than what I have now, except if I use
> > hardware RAID I can't stripe across controllers. That's treating LVM2
> > striping equal to software RAID0 of course.
>
> Oops, spindles was the wrong word to describe what I was losing. But I
> wouldn't be able to spread the reads/writes across as many spindles if I
> have any mirroring.

Not sure I fully understand what you're trying to say, but it seems like it might still be worth trying my original idea of just turning all 80 disks into one giant RAID0/striped array and seeing how much more bandwidth you get out of that. At a minimum it would allow you to utilize the remaining spindles, which appear to be unused right now.

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
On Fri, 29 Jul 2005 14:39:08 -0500
"Jim C. Nasby" <decibel@decibel.org> wrote:

> On Thu, Jul 28, 2005 at 05:00:44PM -0700, Mark Wong wrote:
> > On Thu, 28 Jul 2005 16:55:55 -0700
> > Mark Wong <markw@osdl.org> wrote:
> >
> > > On Thu, 28 Jul 2005 18:48:09 -0500
> > > "Jim C. Nasby" <decibel@decibel.org> wrote:
> > >
> > > > On Thu, Jul 28, 2005 at 04:15:31PM -0700, Mark Wong wrote:
> > > > > On Thu, 28 Jul 2005 17:17:25 -0500
> > > > > "Jim C. Nasby" <decibel@decibel.org> wrote:
> > > > >
> > > > > > On Wed, Jul 27, 2005 at 07:32:34PM -0700, Josh Berkus wrote:
> > > > > > > > This 4-way has 8GB of memory and four Adaptec 2200s controllers attached
> > > > > > > > to 80 spindles (eight 10-disk arrays). For those familiar with the
> > > > > > > > schema, here is a visual of the disk layout:
> > > > > > > > http://www.osdl.org/projects/dbt2dev/results/dev4-015/layout-6.html
> > > > > >
> > > > > > Have you by-chance tried it with the logs and data just going to
> > > > > > seperate RAID10s? I'm wondering if a large RAID10 would do a better job
> > > > > > of spreading the load than segmenting things to specific drives.
> > > > >
> > > > > No, haven't tried that. That would reduce my number of spindles as I
> > > > > scale up. ;) I have the disks attached as JBODs and use LVM2 to stripe
> > > > > the disks together.
> > > >
> > > > I'm confused... why would it reduce the number of spindles? Is
> > > > everything just striped right now? You could always s/RAID10/RAID0/.
> > >
> > > RAID10 requires a minimum of 4 devices per LUN, I think. At least 2
> > > devices in a mirror, at least 2 mirrored devices to stripe.
> > >
> > > RAID0 wouldn't be any different than what I have now, except if I use
> > > hardware RAID I can't stripe across controllers. That's treating LVM2
> > > striping equal to software RAID0 of course.
> >
> > Oops, spindles was the wrong word to describe what I was losing. But I
> > wouldn't be able to spread the reads/writes across as many spindles if I
> > have any mirroring.
>
> Not sure I fully understand what you're trying to say, but it seems like
> it might still be worth trying my original idea of just turning all 80
> disks into one giant RAID0/striped array and see how much more bandwidth
> you get out of that. At a minimum it would allow you to utilize the
> remaining spindles, which appear to be unused right now.

I have done that before actually, when the tablespace patch came out. I was able to get almost 40% more throughput with half the drives than striping all the disks together.

Mark
On Fri, Jul 29, 2005 at 12:51:57PM -0700, Mark Wong wrote:
> > Not sure I fully understand what you're trying to say, but it seems like
> > it might still be worth trying my original idea of just turning all 80
> > disks into one giant RAID0/striped array and see how much more bandwidth
> > you get out of that. At a minimum it would allow you to utilize the
> > remaining spindles, which appear to be unused right now.
>
> I have done that before actually, when the tablespace patch came out. I
> was able to get almost 40% more throughput with half the drives than
> striping all the disks together.

Wow, that's a pretty stunning difference... any idea why?

I think it might be very useful to see some raw disk IO benchmarks...

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
On Fri, 29 Jul 2005 14:57:42 -0500
"Jim C. Nasby" <decibel@decibel.org> wrote:

> On Fri, Jul 29, 2005 at 12:51:57PM -0700, Mark Wong wrote:
> > > Not sure I fully understand what you're trying to say, but it seems like
> > > it might still be worth trying my original idea of just turning all 80
> > > disks into one giant RAID0/striped array and see how much more bandwidth
> > > you get out of that. At a minimum it would allow you to utilize the
> > > remaining spindles, which appear to be unused right now.
> >
> > I have done that before actually, when the tablespace patch came out. I
> > was able to get almost 40% more throughput with half the drives than
> > striping all the disks together.
>
> Wow, that's a pretty stunning difference... any idea why?
>
> I think it might be very useful to see some raw disk IO benchmarks...

A lot of it has to do with how the disk is being accessed. The log is ideally doing sequential writes; some tables are read-only, some read/write. The varying access patterns between the tables, log, and indexes can conflict with each other.

Some of it has to do with how the OS deals with file systems. I think on Linux there is a page buffer flush daemon per file system. A real OS person can answer this part better than me.

Mark
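[Editor's note: the raw disk IO benchmarks Jim asks for could start with something as crude as dd. A hedged sketch; on the real machine the target path would sit on one of the LVM2 stripes under test, not /tmp, and the file size would be made much larger than RAM to defeat caching.]

```shell
# Sequential write: conv=fdatasync forces the data to disk before dd
# reports a rate, so the page cache doesn't inflate the number
dd if=/dev/zero of=/tmp/seqwrite.dat bs=1M count=64 conv=fdatasync

# Sequential read of the same file back
dd if=/tmp/seqwrite.dat of=/dev/null bs=1M
```

Comparing the per-stripe sequential rates against random-read rates (and against the same test on one giant 80-disk stripe) would show how much of the 40% gain comes from keeping the log's sequential writes off the randomly accessed table spindles.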
Mark,

> I have done that before actually, when the tablespace patch came out. I
> was able to get almost 40% more throughput with half the drives than
> striping all the disks together.

Those aren't the figures you showed me. In your report last year it was 14%, not 40%.

--
Josh Berkus
Aglio Database Solutions
San Francisco
On Fri, 29 Jul 2005 13:35:32 -0700
Josh Berkus <josh@agliodbs.com> wrote:

> Mark,
>
> > I have done that before actually, when the tablespace patch came out. I
> > was able to get almost 40% more throughput with half the drives than
> > striping all the disks together.
>
> That's not the figures you showed me. In your report last year it was 14%,
> not 40%.

Sorry I wasn't clear; I'll elaborate. In the BOF at LWE-SF 2004, I did report a 13% improvement, but at the same time I also said I had not quantified it as well as I would have liked and was still working on a better physical disk layout. For LWE-Boston 2005, I did a little better and reported 35% (and misquoted myself above as 40%) in these slides:
http://developer.osdl.org/markw/presentations/lwebos2005bof.sxi

In that test I still had not separated the primary keys into separate tablespaces. I would imagine there is more throughput to be gained by doing that. I have the build scripts do that now, but again haven't quite quantified it yet.

Mark
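[Editor's note: the tablespace separation Mark describes uses the tablespace feature added in PostgreSQL 8.0. A minimal hedged sketch; the tablespace names, mount paths, and index name are hypothetical, though the `stock` table is part of the DBT-2 schema. Each LOCATION would point at a filesystem on its own dedicated disk stripe.]

```shell
psql -d dbt2 <<'SQL'
-- One tablespace per dedicated stripe (paths hypothetical)
CREATE TABLESPACE stock_ts LOCATION '/mnt/stripe1/pgdata';
CREATE TABLESPACE stock_pk_ts LOCATION '/mnt/stripe2/pgdata';

-- Move a hot table and its primary key onto separate spindle sets,
-- so their access patterns stop competing for the same disk heads
ALTER TABLE stock SET TABLESPACE stock_ts;
ALTER INDEX stock_pkey SET TABLESPACE stock_pk_ts;
SQL
```

Repeating this for the other high-traffic tables and indexes (and keeping the WAL on its own spindles) is what the thread's reported 35% throughput gain over one big stripe came from.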
On Fri, 29 Jul 2005 13:19:06 -0700
"Luke Lonergan" <llonergan@greenplum.com> wrote:

> Mark,
>
> On 7/29/05 12:51 PM, "Mark Wong" <markw@osdl.org> wrote:
>
> > Adaptec 2200s
>
> Have you tried non-RAID SCSI controllers in this configuration? When we
> used the Adaptec 2120s previously, we got very poor performance using SW
> RAID (though much better than HW RAID) compared to simple SCSI controllers.
>
> See attached, particularly the RAW RESULTS tab. Comments welcome :-)

No, we actually don't have any non-RAID SCSI controllers to try...

Mark
On Fri, Jul 29, 2005 at 01:11:35PM -0700, Mark Wong wrote:
> On Fri, 29 Jul 2005 14:57:42 -0500
> "Jim C. Nasby" <decibel@decibel.org> wrote:
>
> > On Fri, Jul 29, 2005 at 12:51:57PM -0700, Mark Wong wrote:
> > > > Not sure I fully understand what you're trying to say, but it seems like
> > > > it might still be worth trying my original idea of just turning all 80
> > > > disks into one giant RAID0/striped array and see how much more bandwidth
> > > > you get out of that. At a minimum it would allow you to utilize the
> > > > remaining spindles, which appear to be unused right now.
> > >
> > > I have done that before actually, when the tablespace patch came out. I
> > > was able to get almost 40% more throughput with half the drives than
> > > striping all the disks together.
> >
> > Wow, that's a pretty stunning difference... any idea why?
> >
> > I think it might be very useful to see some raw disk IO benchmarks...
>
> A lot of it has to do with how the disk is being accessed. The log is
> ideally doing sequential writes, some tables only read, some
> read/write. The varying access patterns between tables/log/indexes can
> negatively conflict with each other.

Well, separating logs from everything else does make a lot of sense. Still, it's interesting that you've been able to see so much gain.

> Some of it has to do with how the OS deals with file systems. I think
> on linux is there a page buffer flush daemon per file system. A real OS
> person can answer this part better than me.

So, about testing with FreeBSD.... :P

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"