Thread: What's the best hardware for PostgreSQL 8.1?
Hi! What do you suggest for the following problem? We have complex databases with some 100 million rows (2-3 million new records per month). Our current servers are responding poorly these days, so we have to buy new hardware for the database server. Some weeks ago we started to work with PostgreSQL 8.1, which solved the problem for some months. There are some massive, heavy queries which are too slow (5-10 minutes or more). Parallel processing is infrequent (rarely more than 4-5 queries running at once). So we need high performance in query execution with moderate parallel processing. In your opinion, which products could help us? What is the best, or at least a better, choice? The budget is about $30,000 - $40,000. Regards, Atesz
On Tue, Dec 20, 2005 at 07:27:15PM +0100, Antal Attila wrote:
> We have complex databases with some 100 million rows (2-3 million new
> records per month).

How much space does that equate to?

> Our current servers are responding poorly these days, so we have to buy
> new hardware for the database server. Some weeks ago we started to work
> with PostgreSQL 8.1, which solved the problem for some months.
> There are some massive, heavy queries which are too slow (5-10 minutes
> or more). Parallel processing is infrequent (rarely more than 4-5
> queries running at once). So we need high performance in query
> execution with moderate parallel processing.
> In your opinion, which products could help us? What is the best, or at
> least a better, choice? The budget is about $30,000 - $40,000.

Have you optimized the queries?

Items that generally have the biggest impact on performance, in decreasing order:
1. System architecture
2. Database design
3. (for long-running/problem queries) Query plans
4. Disk I/O
5. Memory
6. CPU

So, I'd make sure that the queries have been optimized (and that includes tuning postgresql.conf) before assuming you need more hardware. Based on what you've told us (very little parallelization), your biggest priority is probably either disk I/O or memory (or both). Without knowing the size of your database/working set it's difficult to provide more specific advice.
-- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
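To make Jim's last two points concrete, here is the kind of first check often suggested on this list. Everything below is illustrative only: the table names are invented, and the settings assume a dedicated 8.1 box with a few GB of RAM -- they are starting points to benchmark against, not a recommendation.

    -- How much space does the database actually occupy?  (8.1 ships these functions.)
    SELECT pg_size_pretty(pg_database_size(current_database()));

    -- Illustrative postgresql.conf starting points for a dedicated ~4GB machine.
    -- 8.1 units: shared_buffers and effective_cache_size count 8KB pages, work_mem is in KB.
    --   shared_buffers       = 50000      -- ~400MB
    --   work_mem             = 65536      -- 64MB per sort/hash, per backend
    --   effective_cache_size = 262144     -- ~2GB assumed cached by the OS
    --   random_page_cost     = 3          -- default is 4; often lowered on fast arrays

    -- Then see where a slow query really spends its time (hypothetical tables):
    EXPLAIN ANALYZE
    SELECT c.customer_id, sum(o.amount)
      FROM orders o
      JOIN customers c ON c.customer_id = o.customer_id
     WHERE o.order_date >= date '2005-11-01'
     GROUP BY c.customer_id;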
On Dec 20, 2005, at 1:27 PM, Antal Attila wrote:
> The budget is about $30,000 - $40,000.

Like Jim said, without more specifics it is hard to give more specific recommendations, but I'm architecting something like this for my current app, which needs ~100GB of disk space. I made room to grow in my configuration:

dual Opteron 2.2GHz
4GB RAM
LSI MegaRAID 320-2X
14-disk SCSI U320 enclosure with 15k RPM drives, 7 connected to each channel on the RAID:
- 1 pair in a RAID1 mirror for OS + pg_xlog
- the rest in RAID10, with each mirrored pair coming from opposite SCSI channels, for data

I run FreeBSD, but whatever you prefer should be sufficient if it is not Windows. I don't know how prices are in Hungary, but around here something like this with 36GB drives comes to around $11,000 or $12,000. The place I concentrate on is disk I/O bandwidth, which is why I prefer Opteron over Intel Xeon.
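As a sketch of how PostgreSQL 8.1 would actually consume a layout like Vivek's: pg_xlog usually ends up on the mirrored pair by moving the directory and symlinking it while the postmaster is stopped, and the data disks can be addressed with a tablespace. The mount point, tablespace name and table below are made up for illustration.

    -- Assumes the RAID10 array is mounted at /raid10/pgdata and owned by the
    -- postgres user (requires superuser; tablespaces exist since 8.0).
    CREATE TABLESPACE fastdata LOCATION '/raid10/pgdata';

    -- Place individual objects there explicitly...
    CREATE TABLE orders (
        order_id   serial PRIMARY KEY,
        amount     numeric,
        order_date date
    ) TABLESPACE fastdata;

    -- ...or make it the default location for everything created in a database:
    ALTER DATABASE mydb SET default_tablespace = 'fastdata';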
Can you elaborate on the reasons the Opteron is better than the Xeon when it comes to disk I/O? I have a PostgreSQL 7.4.8 box running a DSS. One of our tables is about 13 million rows. I had a number of queries against this table that used inner joins on 5 or 6 tables including the 13 million row one. The performance was atrocious. The database itself is about 20 gigs but I want it to scale to 100 gigs. I tuned postgresql as best I could and gave the server huge amounts of memory for caching as well. I also tweaked the cost parameters for a sequential scan vs an index scan of the query optimizer and used the query explain mechanism to get some idea of what the optimizer was doing and where I should index the tables. When I added the sixth table to the inner join the query performance took a nose dive. Admittedly this system is a single PIII 1000MHz with 1.2 gigs of ram and no raid. I do have two Ultra160 SCSI drives, with the database tables mount point on a partition of one physical drive and the pg_xlog mount point on another partition of the second drive. I have been trying to get my employer to spring for new hardware ($8k to $10k), which I had planned to be a dual dual-core Opteron system from HP. Until they agree to spend the money, I resorted to writing plpgsql functions to handle the queries. Inside plpgsql I can break the query apart into separate stages, each of which runs much faster. I can use temporary tables to store intermediate results without worrying about temp table collisions with different users thanks to transaction isolation. I am convinced we need new hardware to scale this application *but* I agree with the consensus voiced here that it is more important to optimize the query first before going out to buy new hardware. I was able to do things with PostgreSQL on this cheap server that I could never imagine doing with SQL Server or even Oracle on such a low-end box. My OS is Fedora Core 3, but I wonder if anyone has tested and benchmarked PostgreSQL on the new Sun x64 servers running Solaris 10 x86. Thanks, Juan On Tuesday 20 December 2005 16:08, Vivek Khera wrote: > On Dec 20, 2005, at 1:27 PM, Antal Attila wrote: > > The budget line is about 30 000$ - 40 000$. > > Like Jim said, without more specifics it is hard to give more > specific recommendations, but I'm architecting something like this > for my current app which needs ~100GB disk space. I made room to > grow in my configuration: > > dual opteron 2.2GHz > 4GB RAM > LSI MegaRAID 320-2X > 14-disk SCSI U320 enclosure with 15k RPM drives, 7 connected to each > channel on the RAID. > 1 pair in RAID1 mirror for OS + pg_xlog > rest in RAID10 with each mirrored pair coming from opposite SCSI > channels for data > > I run FreeBSD but whatever you prefer should be sufficient if it is > not windows. > > I don't know how prices are in Hungary, but around here something > like this with 36GB drives comes to around $11,000 or $12,000. > > The place I concentrate on is the disk I/O bandwidth which is why I > prefer Opteron over Intel XEON.
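For readers who want to picture the staging trick Juan describes, here is a minimal sketch against 8.x PL/pgSQL. Every table and column name is invented; the point is only the shape: each stage materializes a small intermediate result in a temp table instead of one six-table join, and the statements that touch the temp tables go through EXECUTE so a cached plan never points at a dropped table.

    CREATE OR REPLACE FUNCTION build_sales_report(p_from date, p_to date)
    RETURNS void AS $$
    BEGIN
        -- Stage 1: cut the big fact table down to the date range first.
        -- ON COMMIT DROP means each call (run in its own transaction) starts clean.
        CREATE TEMP TABLE stage1 (
            order_id    integer,
            customer_id integer,
            amount      numeric
        ) ON COMMIT DROP;

        EXECUTE 'INSERT INTO stage1
                     SELECT order_id, customer_id, amount
                       FROM big_fact
                      WHERE order_date BETWEEN '
                || quote_literal(p_from::text) || ' AND '
                || quote_literal(p_to::text);

        EXECUTE 'CREATE INDEX stage1_cust_idx ON stage1 (customer_id)';
        EXECUTE 'ANALYZE stage1';

        -- Stage 2: join the much smaller intermediate result to a dimension table.
        CREATE TEMP TABLE report (
            region text,
            total  numeric
        ) ON COMMIT DROP;

        EXECUTE 'INSERT INTO report
                     SELECT c.region, sum(s.amount)
                       FROM stage1 s
                       JOIN customers c ON c.customer_id = s.customer_id
                      GROUP BY c.region';
    END;
    $$ LANGUAGE plpgsql;

    -- The caller runs the stages and reads the result inside one transaction:
    BEGIN;
    SELECT build_sales_report(date '2005-01-01', date '2005-12-31');
    SELECT * FROM report;
    COMMIT;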
On Tue, 20 Dec 2005, Juan Casero wrote: > Date: Tue, 20 Dec 2005 19:50:47 -0500 > From: Juan Casero <caseroj@comcast.net> > To: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] What's the best hardver for PostgreSQL 8.1? > > Can you elaborate on the reasons the opteron is better than the Xeon when it > comes to disk io? The Opteron is cheaper, so you have more money to spend on disks :-) Also, when you go into multi-CPU systems, the front-side-bus design of the Xeons can easily become your system bottleneck, so that you can't take advantage of all the CPUs because they stall waiting for memory accesses; Opteron systems have a memory bus per socket, so the more CPUs you have, the more memory bandwidth you have. > The database itself is about 20 gigs > but I want it to scale to 100 gigs. How large is the working set? In your tests you ran into swapping on your 1.2G system; buying a dual Opteron with 16 gigs of ram will allow you to work with much larger sets of data, and you can go beyond that if needed. David Lang
Jim C. Nasby wrote: >How much space does that equate to? > > >Have you optimized the queries? > >Items that generally have the biggest impact on performance in >decreasing order: >1. System architecture >2. Database design >3. (for long-running/problem queries) Query plans >4. Disk I/O >5. Memory >6. CPU > >So, I'd make sure that the queries have been optimized (and that >includes tuning postgresql.conf) before assuming you need more hardware. > >Based on what you've told us (very little parallelization), then your >biggest priority is probably either disk IO or memory (or both). Without >knowing the size of your database/working set it's difficult to provide >more specific advice. > > Hi! We have 3 Compaq ProLiant ML530 servers with dual Xeon 2.8GHz processors, 3 GB DDR RAM, Ultra Wide SCSI RAID5 with 10000rpm drives, and Gigabit Ethernet. We partitioned our databases among these machines, but there are cross references among the machines in principle. The size of the data is now about 100-110GB. We've used these servers for 3 years with Debian Linux. We have already optimized the given queries and postgresql.conf. We tried many tricks and ideas, and we read and asked on mailing lists. There is nothing more we can do there; we have to buy a new server for the databases, because we are developing our system for newer services, so the size will keep growing. We also need better responsiveness and shorter execution times for the big queries (these queries are too complicated to discuss here, and in many cases we have already optimized them with plpgsql stored procedures). PostgreSQL 8.1 solved several parallelization and overload problems, and the average load has decreased significantly on our servers. But the big queries aren't fast enough. We think the hardware is the limit. Generally 2 queries run in parallel during working hours; we make backups at night. Regards, Atesz
Juan Casero wrote: > Can you elaborate on the reasons the opteron is better than the Xeon when it > comes to disk io? I have a PostgreSQL 7.4.8 box running a DSS. One of our Opterons have a 64-bit IOMMU -- Xeons don't. That means in 64-bit mode, for transfers to addresses above 4GB, the OS must allocate memory below 4GB, DMA to that block, and then the CPU must do extra work copying the memory above 4GB. On the Opteron, it's done by the I/O adapter using DMA in the background.
I just sent my boss an email asking him for a Sun v20z with dual 2.2 GHz Opterons, 2 gigs of RAM and RAID 1. I would have liked a better server capable of RAID but that seems to be out of his budget right now. Ok so I assume I get this Sun box. Most likely I will go with Linux since it is a fair bet he doesn't want to pay for the Solaris 10 x86 license. Although I kind of like the idea of using Solaris 10 x86 for this. I will assume I need to install the x64 kernel that comes with say Fedora Core 4. Should I run the PostgreSQL 8.x binaries in 32 bit mode or 64 bit mode? My instinct tells me 64 bit mode is most efficient for our database size, about 20 gigs right now but it may grow to 100 gigs in a year or so. I just finished loading a 20 gig database on a dual 900 MHz UltraSPARC III system with 2 gigs of ram and about 768 megs of shared memory available for the postgresql server running Solaris 10. The load has smoked a P4 3.2 GHz system I am also using with 2 gigs of ram running postgresql 8.0.3. I mean I started the sparc load after the P4 load. The sparc load has already finished rebuilding the database from a pg_dump file but the P4 system is still going. The P4 has 1.3 gigs of shared memory allocated to postgresql. How about them apples? Thanks, Juan On Wednesday 21 December 2005 18:57, William Yu wrote: > Juan Casero wrote: > > Can you elaborate on the reasons the opteron is better than the Xeon when > > it comes to disk io? I have a PostgreSQL 7.4.8 box running a DSS. One > > of our > > Opterons have 64-bit IOMMU -- Xeons don't. That means in 64-bit mode, > transfers to > 4GB, the OS must allocated the memory < 4GB, DMA to that > block and then the CPU must do extra work in copying the memory to > > 4GB. Versus on the Opteron, it's done by the IO adaptor using DMA in the > background.
AFAIK there are no licensing costs for Solaris, unless you are talking about a software support agreement, which is not required.
Hi Juan, The Solaris 10 license is free. In fact, I believe you receive the media with the Sun Fire V20z. If you want support, then there are various "pay" plans depending on the level of support. If not, your license allows you Right to Use anyway for free. That said, I haven't done much testing with 32/64-bit differences. However, for long-term purposes, 64-bit always seems to be the safe bet. As for your load performance, a lot of it depends on your file system layout also. Regards, Jignesh Juan Casero wrote: >I just sent my boss an email asking him for a Sun v20z with dual 2.2 Ghz >opterons, 2 Gigs of RAM and RAID 1. I would have liked a better server >capable of RAID but that seems to be out of his budget right now. Ok so I >assume I get this Sun box. Most likely I will go with Linux since it is a >fair bet he doesn't want to pay for the Solaris 10 x86 license. Although I >kind of like the idea of using Solaris 10 x86 for this. I will assume I >need to install the x64 kernel that comes with say Fedora Core 4. Should I >run the Postgresql 8.x binaries in 32 bit mode or 64 bit mode? My instinct >tells me 64 bit mode is most efficient for our database size about 20 gigs >right now but may grow to 100 gigs in a year or so. I just finished loading >a 20 gig database on a dual 900 Mhz Ultrasparc III system with 2 gigs of ram >and about 768 megs of shared memory available for the posgresql server >running Solaris 10. The load has smoked a P4 3.2 Ghz system I am using also >with 2 gigs of ram running postgresql 8.0.3. I mean I started the sparc >load after the P4 load. The sparc load has finished already rebuilding the >database from a pg_dump file but the P4 system is still going. The p4 has >1.3 Gigs of shared memory allocated to postgresql. How about them apples? > >Thanks, >Juan > >On Wednesday 21 December 2005 18:57, William Yu wrote: > >>Juan Casero wrote: >> >>>Can you elaborate on the reasons the opteron is better than the Xeon when >>>it comes to disk io? I have a PostgreSQL 7.4.8 box running a DSS. One >>>of our >> >>Opterons have 64-bit IOMMU -- Xeons don't. That means in 64-bit mode, >>transfers to > 4GB, the OS must allocated the memory < 4GB, DMA to that >>block and then the CPU must do extra work in copying the memory to > >>4GB. Versus on the Opteron, it's done by the IO adaptor using DMA in the >>background.
Sorry folks. I had a couple of glasses of wine as I wrote this. Anyway, I originally wanted the box to have more than two drives so I could do RAID 5, but that is going to cost too much. Also, contrary to my statement below, it seems to me I should run the 32 bit postgresql server on the 64 bit kernel. Would you agree this will probably yield the best performance? I know it depends a lot on the system, but for now this database is about 20 gigabytes. Not too large right now but it may grow 5x in the next year. Thanks, Juan On Wednesday 21 December 2005 22:09, Juan Casero wrote: > I just sent my boss an email asking him for a Sun v20z with dual 2.2 Ghz > opterons, 2 Gigs of RAM and RAID 1. I would have liked a better server > capable of RAID but that seems to be out of his budget right now. Ok so I > assume I get this Sun box. Most likely I will go with Linux since it is a > fair bet he doesn't want to pay for the Solaris 10 x86 license. Although I > kind of like the idea of using Solaris 10 x86 for this. I will assume I > need to install the x64 kernel that comes with say Fedora Core 4. Should I > run the Postgresql 8.x binaries in 32 bit mode or 64 bit mode? My > instinct tells me 64 bit mode is most efficient for our database size about > 20 gigs right now but may grow to 100 gigs in a year or so. I just > finished loading a 20 gig database on a dual 900 Mhz Ultrasparc III system > with 2 gigs of ram and about 768 megs of shared memory available for the > posgresql server running Solaris 10. The load has smoked a P4 3.2 Ghz > system I am using also with 2 gigs of ram running postgresql 8.0.3. I > mean I started the sparc load after the P4 load. The sparc load has > finished already rebuilding the database from a pg_dump file but the P4 > system is still going. The p4 has 1.3 Gigs of shared memory allocated to > postgresql. How about them apples? > > Thanks, > Juan > > On Wednesday 21 December 2005 18:57, William Yu wrote: > > Juan Casero wrote: > > > Can you elaborate on the reasons the opteron is better than the Xeon > > > when it comes to disk io? I have a PostgreSQL 7.4.8 box running a > > > DSS. One of our > > > > Opterons have 64-bit IOMMU -- Xeons don't. That means in 64-bit mode, > > transfers to > 4GB, the OS must allocated the memory < 4GB, DMA to that > > block and then the CPU must do extra work in copying the memory to > > > 4GB. Versus on the Opteron, it's done by the IO adaptor using DMA in the > > background.
On Wed, 21 Dec 2005, Juan Casero wrote: > Date: Wed, 21 Dec 2005 22:31:54 -0500 > From: Juan Casero <caseroj@comcast.net> > To: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] What's the best hardver for PostgreSQL 8.1? > > Sorry folks. I had a couple of glasses of wine as I wrote this. Anyway I > originally wanted the box to have more than two drives so I could do RAID 5 > but that is going to cost too much. Also, contrary to my statement below it > seems to me I should run the 32 bit postgresql server on the 64 bit kernel. > Would you agree this will probably yield the best performance? You definitely need a 64 bit kernel to address as much ram as you will need. The question of 32 bit vs 64 bit postgres needs to be benchmarked, but my inclination is that you probably do want 64 bit for that as well. 64 bit binaries are slightly larger than 32 bit ones (less so on x86/AMD64 than on any other mixed platform though), but the 64 bit version also has access to twice as many registers as a 32 bit one, and the Opteron chips have some other features that become available (or more useful) in 64 bit mode. Like everything else this needs benchmarks to prove with your workload (I'm trying to get some started, but haven't had a chance yet). David Lang > I know it > depends alot on the system but for now this database is about 20 gigabytes. > Not too large right now but it may grow 5x in the next year. > > Thanks, > Juan > > On Wednesday 21 December 2005 22:09, Juan Casero wrote: >> I just sent my boss an email asking him for a Sun v20z with dual 2.2 Ghz >> opterons, 2 Gigs of RAM and RAID 1. I would have liked a better server >> capable of RAID but that seems to be out of his budget right now. Ok so I >> assume I get this Sun box. Most likely I will go with Linux since it is a >> fair bet he doesn't want to pay for the Solaris 10 x86 license. Although I >> kind of like the idea of using Solaris 10 x86 for this. I will assume I >> need to install the x64 kernel that comes with say Fedora Core 4. Should I >> run the Postgresql 8.x binaries in 32 bit mode or 64 bit mode? My >> instinct tells me 64 bit mode is most efficient for our database size about >> 20 gigs right now but may grow to 100 gigs in a year or so. I just >> finished loading a 20 gig database on a dual 900 Mhz Ultrasparc III system >> with 2 gigs of ram and about 768 megs of shared memory available for the >> posgresql server running Solaris 10. The load has smoked a P4 3.2 Ghz >> system I am using also with 2 gigs of ram running postgresql 8.0.3. I >> mean I started the sparc load after the P4 load. The sparc load has >> finished already rebuilding the database from a pg_dump file but the P4 >> system is still going. The p4 has 1.3 Gigs of shared memory allocated to >> postgresql. How about them apples? >> >> Thanks, >> Juan >> >> On Wednesday 21 December 2005 18:57, William Yu wrote: >>> Juan Casero wrote: >>>> Can you elaborate on the reasons the opteron is better than the Xeon >>>> when it comes to disk io? I have a PostgreSQL 7.4.8 box running a >>>> DSS. One of our >>> >>> Opterons have 64-bit IOMMU -- Xeons don't. That means in 64-bit mode, >>> transfers to > 4GB, the OS must allocated the memory < 4GB, DMA to that >>> block and then the CPU must do extra work in copying the memory to > >>> 4GB. Versus on the Opteron, it's done by the IO adaptor using DMA in the >>> background.
Ok thanks. I think I will go with 64 bit everything on the box. If I can get the Sun Fire V20Z then I will stick with Solaris 10 x86 and download the 64 bit PostgreSQL 8.1 binaries from blastwave.org. I develop the PHP code for my DSS system on my Windows XP laptop. Normally, I test the code on this laptop but let it hit the live database when I want to run some tests. Well, just this afternoon I installed PostgreSQL 8.1.1 on my Windows laptop and rebuilt the entire live database instance on there from a pg_dump archive. I am blown away by the performance increase in PostgreSQL 8.1.x. Has anyone else had a chance to test it? All the queries I run against it are remarkably fast, but more importantly I can see that both logical processors of my Hyper-Threaded P4 are being used. One of the questions I posted on this list was whether PostgreSQL could make use of the large number of cores available on the UltraSPARC T1000/T2000. I am beginning to think that with PostgreSQL 8.1.x the buffer manager could indeed use all those cores. This could make running a DSS or OLTP system on an UltraSPARC T1000/T2000 with PostgreSQL a much better bargain than on an Intel system. Any thoughts? Thanks, Juan On Thursday 22 December 2005 22:12, David Lang wrote: > On Wed, 21 Dec 2005, Juan Casero wrote: > > Date: Wed, 21 Dec 2005 22:31:54 -0500 > > From: Juan Casero <caseroj@comcast.net> > > To: pgsql-performance@postgresql.org > > Subject: Re: [PERFORM] What's the best hardver for PostgreSQL 8.1? > > > > Sorry folks. I had a couple of glasses of wine as I wrote this. Anyway > > I originally wanted the box to have more than two drives so I could do > > RAID 5 but that is going to cost too much. Also, contrary to my > > statement below it seems to me I should run the 32 bit postgresql server > > on the 64 bit kernel. Would you agree this will probably yield the best > > performance? > > you definantly need a 64 bit kernel to address as much ram as you will > need. > > the question of 32 bit vs 64 bit postgres needs to be benchmarked, but my > inclination is that you probably do want 64 bit for that as well. > > 64 bit binaries are slightly larger then 32 bit ones (less so on x86/AMD64 > then on any other mixed platform though), but the 64 bit version also has > access to twice as many registers as a 32 bit one, and the Opteron chips > have some other features that become availabel in 64 bit mode (or more > useful) > > like everything else this needs benchmarks to prove with your workload > (I'm trying to get some started, but haven't had a chance yet) > > David Lang > > > I know it > > depends alot on the system but for now this database is about 20 > > gigabytes. Not too large right now but it may grow 5x in the next year. > > > > Thanks, > > Juan > > > > On Wednesday 21 December 2005 22:09, Juan Casero wrote: > >> I just sent my boss an email asking him for a Sun v20z with dual 2.2 Ghz > >> opterons, 2 Gigs of RAM and RAID 1. I would have liked a better server > >> capable of RAID but that seems to be out of his budget right now. Ok so > >> I assume I get this Sun box. Most likely I will go with Linux since it > >> is a fair bet he doesn't want to pay for the Solaris 10 x86 license. > >> Although I kind of like the idea of using Solaris 10 x86 for this. I > >> will assume I need to install the x64 kernel that comes with say Fedora > >> Core 4. Should I run the Postgresql 8.x binaries in 32 bit mode or 64 > >> bit mode?
On Thu, 22 Dec 2005, Juan Casero wrote: > Ok thanks. I think I will go with 64 bit everything on the box. If I can get > the Sun Fire V20Z then I will stick with Solaris 10 x86 and download the 64 > bit PostgreSQL 8.1 binaries from blastwave.org. I develop the PHP code to > my DSS system on my Windows XP laptop. Normally, I test the code on this > laptop but let it hit the live database when I want to run some tests. Well > just this afternoon I installed PostgreSQL 8.1.1 on my windows laptop and > rebuilt the the entire live database instance on there from a pg_dump > archive. I am blown away by the performance increase in PostgreSQL 8.1.x. > Has anyone else had a chance to test it? All the queries I run against it > are remarkably fast but more importantly I can see that the two cores of my > Hyper Threaded P4 are being used. One of the questions I posted on this > list was whether PostgreSQL could make use of the large number of cores > available on the Ultrasparc T1000/T2000 cores. I am beginning to think that > with PostgreSQL 8.1.x the buffer manager could indeed use all those cores. > This could make running a DSS or OLTP on an Ultrasparc T1000/T2000 with > PostgreSQL a much better bargain than on an intel system. Any thoughts? If you have enough simultaneous transactions, and your I/O systems (disk and memory interfaces) can keep up with your needs, then postgres can use quite a few cores. There are some limits that will show up with more cores, but I don't think it's well known where they are (this will also be very dependent on your workload). There was a discussion within the last month or two that hit the postgres weekly news where more attention is being paid to the locking mechanisms used, so this is an area under active development (note especially that some locking strategies that work well with multiple full cores can be crippling with virtual cores (Intel HT etc)). But it boils down to the fact that there just isn't enough experience with the new Sun systems to know how well they will work. They could end up being fabulous speed demons, or dogs (and it could even be both, depending on your workload). David Lang > Thanks, > Juan > > On Thursday 22 December 2005 22:12, David Lang wrote: >> On Wed, 21 Dec 2005, Juan Casero wrote: >>> Date: Wed, 21 Dec 2005 22:31:54 -0500 >>> From: Juan Casero <caseroj@comcast.net> >>> To: pgsql-performance@postgresql.org >>> Subject: Re: [PERFORM] What's the best hardver for PostgreSQL 8.1? >>> >>> Sorry folks. I had a couple of glasses of wine as I wrote this. Anyway >>> I originally wanted the box to have more than two drives so I could do >>> RAID 5 but that is going to cost too much. Also, contrary to my >>> statement below it seems to me I should run the 32 bit postgresql server >>> on the 64 bit kernel. Would you agree this will probably yield the best >>> performance? >> >> you definantly need a 64 bit kernel to address as much ram as you will >> need. >> >> the question of 32 bit vs 64 bit postgres needs to be benchmarked, but my >> inclination is that you probably do want 64 bit for that as well.
On Dec 22, 2005, at 11:14 PM, David Lang wrote: > but it boils down to the fact that there just isn't enough > experience with the new Sun systems to know how well they will > work. they could end up being fabulous speed demons, or dogs (and > it could even be both, depending on your workload) The v20z isn't the newest Sun hardware anyhow... The X2100, X4100, and X4200 are. I've been trying to buy an X4100 for going on three weeks now, but the local Sun reseller is making it very hard. You'd think they'd actually want to go out of their way to make a sale, but they seem to do the opposite. For those of you who say 'well, it is a small sale': my original request was for over $50k in equipment, and after a while I decided that other equipment from other vendors who do care was sufficient, and only the Opteron boxes needed to come from Sun. Add a zero return policy and you wonder how they expect to keep in business.... Sorry, I had to vent. But once it does come in I'll be glad to post up some numbers :-)
Vivek Khera wrote: > and only the > opteron boxes needed to come from sun. add a zero return policy and you > wonder how they expect to keep in business.... > > sorry, i had to vent. > Just out of interest - why did the opterons need to come from Sun?
On Dec 23, 2005, at 5:15 PM, Mark Kirkwood wrote: > Vivek Khera wrote: > >> and only the opteron boxes needed to come from sun. add a zero >> return policy and you wonder how they expect to keep in business.... >> sorry, i had to vent. >> > > Just out of interest - why did the opterons need to come from Sun? There are three tier-1 vendors selling opteron: IBM, Sun, and HP. HP's have historically had slow RAID configurations, and IBM tries to hide them and only offers really one model, a 1U unit. I've already been through buying opteron systems from the smaller vendors and it basically wasted a lot of my time due to what seems like quality control issues. So it could be Sun or IBM. IBM seems to make it harder to buy from them than Sun...
On Wed, 21 Dec 2005 22:31:54 -0500 Juan Casero <caseroj@comcast.net> wrote: > Sorry folks. I had a couple of glasses of wine as I wrote this. > Anyway I originally wanted the box to have more than two drives so I > could do RAID 5 but that is going to cost too much. Also, contrary > to my statement below it seems to me I should run the 32 bit > postgresql server on the 64 bit kernel. Would you agree this will > probably yield the best performance? I know it depends alot on the > system but for now this database is about 20 gigabytes. Not too large > right now but it may grow 5x in the next year. You definitely DO NOT want to do RAID 5 on a database server. That is probably the worst setup you could have, I've seen it have lower performance than just a single hard disk. RAID 1 and RAID 1+0 are optimal, but you want to stay far away from RAID 5. IMHO RAID 5 is only useful on near line backup servers or Samba file servers where space is more important than speed. --------------------------------- Frank Wiles <frank@wiles.org> http://www.wiles.org ---------------------------------
At 02:50 PM 12/24/2005, Frank Wiles wrote: >On Wed, 21 Dec 2005 22:31:54 -0500 >Juan Casero <caseroj@comcast.net> wrote: > > > Sorry folks. I had a couple of glasses of wine as I wrote this. > > Anyway I originally wanted the box to have more than two drives so I > > could do RAID 5 but that is going to cost too much. Also, contrary > > to my statement below it seems to me I should run the 32 bit > > postgresql server on the 64 bit kernel. Would you agree this will > > probably yield the best performance? I know it depends alot on the > > system but for now this database is about 20 gigabytes. Not too large > > right now but it may grow 5x in the next year. > > You definitely DO NOT want to do RAID 5 on a database server. That > is probably the worst setup you could have, I've seen it have lower > performance than just a single hard disk. > > RAID 1 and RAID 1+0 are optimal, but you want to stay far away from > RAID 5. IMHO RAID 5 is only useful on near line backup servers or > Samba file servers where space is more important than speed. That's a bit misleading. RAID 5 excels when you want read speed but don't care as much about write speed. Writes are typically ~2/3 the speed of reads on a decent RAID 5 setup. Side Note: Some years ago Mylex had a family of fast (for the time) RAID 5 HW controllers that actually read and wrote at the same speed. IBM bought them to kill them and protect LSI Logic. Mylex X24's (?IIRC the model number correctly?) are still reasonable HW. So if you have tables that are read often and written to rarely or not at all, putting them on RAID 5 is optimal. In both data-mining-like and OLTP-like apps there are usually at least some such tables. RAID 1 is good for stuff where speed doesn't matter and all you are looking for is an insurance policy. RAID 10 is the best way to get high performance on both reads and writes, but it has a significantly greater cost for the same amount of usable physical media. If you've got the budget or are dealing with small enough physical storage needs, by all means use RAID 10. OTOH, if you are dealing with large enterprise class apps like Sarbanes Oxley compliance, medical and/or insurance, etc, etc, the storage needs can get so large that RAID 10 for everything or even most things is not possible, even if economically feasible. RAID levels are like any other tool. Each is useful in the proper circumstances. Happy holidays, Ron Peacetree
> > If you've got the budget or are dealing with small enough physical > storage needs, by all means use RAID 10. OTOH, if you are dealing > with large enterprise class apps like Sarbanes Oxley compliance, > medical and/or insurance, etc, etc, the storage needs can get so large > that RAID 10 for everything or even most things is not possible. Even > if economically feasible. > > RAID levels are like any other tool. Each is useful in the proper > circumstances. > There is also RAID 50 which is quite nice. Joshua D. Drake > Happy holidays, > Ron Peacetree -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: PLphp, PLperl - http://www.commandprompt.com/
On Sat, 24 Dec 2005, Ron wrote: > At 02:50 PM 12/24/2005, Frank Wiles wrote: >> Juan Casero <caseroj@comcast.net> wrote: >> >> > Sorry folks. I had a couple of glasses of wine as I wrote this. >> > Anyway I originally wanted the box to have more than two drives so I >> > could do RAID 5 but that is going to cost too much. Also, contrary >> > to my statement below it seems to me I should run the 32 bit >> > postgresql server on the 64 bit kernel. Would you agree this will >> > probably yield the best performance? I know it depends alot on the >> > system but for now this database is about 20 gigabytes. Not too large >> > right now but it may grow 5x in the next year. >> >> You definitely DO NOT want to do RAID 5 on a database server. That >> is probably the worst setup you could have, I've seen it have lower >> performance than just a single hard disk. >> >> RAID 1 and RAID 1+0 are optimal, but you want to stay far away from >> RAID 5. IMHO RAID 5 is only useful on near line backup servers or >> Samba file servers where space is more important than speed. > That's a bit misleading. RAID 5 excels when you want read speed but don't > care as much about write speed. Writes are typical ~2/3 the speed of reads > on a typical decent RAID 5 set up. > > So if you have tables that are read often and written to rarely or not at > all, putting them on RAID 5 is optimal. In both data mining like and OLTP > like apps there are usually at least some such tables. raid 5 is bad for random writes as you state, but how does it do for sequential writes (for example data mining where you do a large import at one time, but seldom do other updates). I'm assuming a controller with a reasonable amount of battery-backed cache. David Lang
At 04:42 PM 12/24/2005, Joshua D. Drake wrote: >>If you've got the budget or are dealing with small enough physical >>storage needs, by all means use RAID 10. OTOH, if you are dealing >>with large enterprise class apps like Sarbanes Oxley compliance, >>medical and/or insurance, etc, etc, the storage needs can get so >>large that RAID 10 for everything or even most things is not >>possible. Even if economically feasible. >> >>RAID levels are like any other tool. Each is useful in the proper >>circumstances. >There is also RAID 50 which is quite nice. The "quite nice" part that Joshua is referring to is that RAID 50 gets most of the write performance of RAID 10 w/o using nearly as many HD's as RAID 10. OTOH, there still is a significant increase in the number of HD's used, which means failures become more frequent (the array's MTBF drops), yet you are not getting the protection levels you would with RAID 10. IME RAID 50 gets mixed reviews. My two biggest issues are a= Admin of RAID 50 is more complex than the other commonly used versions (1, 10, 5, and 6) b= Once a HD failure takes place, you suffer a _permanent_ performance drop, even after the automatic volume rebuild, until you take the entire RAID 50 array offline, reinitialize it, and rebuild it from scratch. IME "a" and "b" make RAID 50 inappropriate for any but the biggest and most dedicated of DB admin groups. YMMV, Ron
David Lang wrote: > raid 5 is bad for random writes as you state, but how does it do for > sequential writes (for example data mining where you do a large import > at one time, but seldom do other updates). I'm assuming a controller > with a reasonable amount of battery-backed cache.

Random write performance (a small block that only writes to 1 drive): 1 write requires N-1 reads + N writes --> efficiency ~ 1/(2N-1).

Sequential write performance (a block big enough to use all N drives): N-1 writes require N writes --> efficiency ~ (N-1)/N.

This assumes enough cache so all reads/writes are done in 1 transaction, plus an onboard processor that calculates RAID parity fast enough not to cause an extra delay.
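Plugging a hypothetical drive count into those two expressions makes the gap concrete (N = 8 is assumed purely for illustration):

    % worked example of the efficiency model above, with N = 8 drives
    \frac{1}{2N-1} = \frac{1}{15} \approx 6.7\%   \quad\text{(small random writes)}
    \frac{N-1}{N} = \frac{7}{8} = 87.5\%          \quad\text{(full-stripe sequential writes)}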
At 04:54 PM 12/24/2005, David Lang wrote: >raid 5 is bad for random writes as you state, but how does it do for >sequential writes (for example data mining where you do a large >import at one time, but seldom do other updates). I'm assuming a >controller with a reasonable amount of battery-backed cache. The issue with RAID 5 writes centers on the need to recalculate checksums for the ECC blocks distributed across the array and then write the new ones to physical media. Caches help, and the bigger the cache the better, but once you are doing enough writes fast enough (and that doesn't take much even with a few GBs of cache) the recalculate-checksums-and-write-new-ones overhead will decrease the write speed of real data. Bear in mind that the HD's _raw_ write speed hasn't been decreased. Those HD's are pounding away as fast as they can for you. Your _effective_ or _data level_ write speed is what decreases due to overhead. Side Note: people often forget the other big reason to use RAID 10 over RAID 5. RAID 5 is always only 2 HD failures from data loss. RAID 10 can lose up to 1/2 the HD's in the array w/o data loss unless you get unlucky and lose both members of a RAID 1 set. This can be seen as an example of the classic space vs. time trade off in performance tuning. You can use 2x the HDs you need and implement RAID 10 for best performance and reliability or you can dedicate less HD's to RAID and implement RAID 5 for less (write) performance and lower reliability. TANSTAAFL. Ron Peacetree
Hi, > b= Once a HD failure takes place, you suffer a _permenent_ performance > drop, even after the automatic volume rebuild, until you take the entire > RAID 50 array off line, reinitialize it, and rebuild it from scratch. Where did you get that crazy idea? When you have replaced the drive and the RAID is rebuilt, you have exactly the same situation as before the drive failed. Why would you get less performance? Sander.
Frank, > You definitely DO NOT want to do RAID 5 on a database server. That > is probably the worst setup you could have, I've seen it have lower > performance than just a single hard disk. I've seen that on RAID0 and RAID10 as well. This is more about the quality and modernity of the RAID controller than anything else at this point, although there are some theoretical advantages of RAID10 from a random seek standpoint even if the adapter CPU is infinitely fast at checksumming. We're using RAID5 in practice for OLAP / Data Warehousing systems very successfully using the newest RAID cards from 3Ware (9550SX). Note that host-based SCSI RAID cards from LSI, Adaptec, Intel, Dell, HP and others have proven to have worse performance than a single disk drive in many cases, whether for RAID0 or RAID5. In most circumstances I've seen, people don't even notice until they write a message to a mailing list about "my query runs slowly on xxx dbms". In many cases, after they run a simple sequential transfer rate test using dd, they see that their RAID controller is the culprit. Recently, I helped a company named DeepData to improve their dbms performance, which was a combination of moving them to software RAID50 on Linux and getting them onto Bizgres. The disk subsystem sped up on the same hardware (minus the HW RAID card) by over a factor of 10. The downside is that SW RAID is a pain in the neck for management - you have to shut down the Linux host to replace a failed disk. - Luke
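The quick dd check Luke mentions usually looks something like the following (illustrative only: the path, size and block size are arbitrary, and the test file should be several times larger than RAM so the OS cache doesn't flatter the result):

    # Sequential write of ~8GB, then read it back; GNU dd reports the MB/s rate when it finishes.
    dd if=/dev/zero of=/data/ddtest bs=8k count=1000000
    dd if=/data/ddtest of=/dev/null bs=8k
    rm /data/ddtest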
On Sat, 24 Dec 2005, Luke Lonergan wrote: > Recently, I helped a company named DeepData to improve their dbms > performance, which was a combination of moving them to software RAID50 > on Linux and getting them onto Bizgres. The disk subsystem sped up on > the same hardware (minus the HW RAID card) by over a factor of 10. The > downside is that SW RAID is a pain in the neck for management - you have > to shut down the Linux host to replace a failed disk. Luke, you should not need to shut down the Linux host when a disk fails. You should be able to use mdadm to mark the drive as failed, then remove it from the system and replace it, then use mdadm to add the drive to the array. I'm fighting through a double disk failure on my system at home, and when I hit a bad spot on a drive (failing it from the array) I can just re-add it without having to restart everything (if it's the second drive I will have to stop and restart the array, but that's because the entire array has failed at that point). Now hot-swap may not be supported on all interface types, and that may be what you have run into, but with SCSI or SATA you should be able to hot-swap with the right controller. David Lang
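For reference, the mdadm sequence David describes looks roughly like this (a sketch only: /dev/md0 and /dev/sdc1 are placeholder device names, and the physical swap still needs a controller and backplane that support hot-plug):

    # Mark the failing disk as failed and pull it out of the array.
    mdadm --manage /dev/md0 --fail /dev/sdc1
    mdadm --manage /dev/md0 --remove /dev/sdc1
    # ...physically replace the drive and partition it to match, then re-add it:
    mdadm --manage /dev/md0 --add /dev/sdc1
    # Watch the rebuild progress:
    cat /proc/mdstat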
David, > now hot-swap may not be supported on all interface types, that may be what > you have run into, but with SCSI or SATA you should be able to hot-swap > with the right controller. That's actually the problem - Linux hot swap is virtually non-functional for SCSI. You can write into the proper places in /proc, then remove and rescan to get a new drive up, but I've found that the resulting OS state is flaky. This is true of the latest 2.6 kernels and LSI and Adaptec SCSI controllers. The problems I've seen are with Linux, not the controllers. - Luke
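The "proper places in /proc" are the old SCSI mid-layer hooks; the usual incantation is roughly the following, where the host/channel/id/lun numbers are placeholders for the slot in question. As Luke says, the state the kernel is left in can be flaky, so treat this as a last resort rather than a routine procedure:

    # Detach the device at host 0, channel 0, id 3, lun 0, swap the drive, then rescan that slot.
    echo "scsi remove-single-device 0 0 3 0" > /proc/scsi/scsi
    # ...replace the physical disk...
    echo "scsi add-single-device 0 0 3 0" > /proc/scsi/scsi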
Luke Lonergan wrote: > Note that host-based SCSI raid cards from LSI, Adaptec, Intel, Dell, HP > and others have proven to have worse performance than a single disk > drive in many cases, whether for RAID0 or RAID5. In most circumstances This is my own experience. Running an LSI MegaRAID in pure passthrough mode + Linux software RAID10 is a ton faster than configuring the RAID via the LSI card. One of the things I've noticed is that the card does not seem to be able to do parallel reads on mirrors. While looking at iostat under Linux, I can see software RAID1 reading all drives and the MD number adding up to the sum of all drives. The ARECA SATA controller I just got, though, doesn't seem to exhibit these problems. Performance is a few % points above Linux software RAID at lower CPU usage. In fact, I'm getting better single-threaded bandwidth on a 4x7200RPM SATA config versus a 6x15K SCSI config on the LSI. The drives are bigger for the SATA config (300GB versus 36GB for the SCSI), so the heads don't have to move anywhere near as much and can stay on the fast portion of the disk. Haven't had a chance to test a multi-user DB between the two setups though.
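William's observation is easy to reproduce with iostat from the sysstat package; on a software RAID1 you would expect both member disks to be busy during reads, with the md device's throughput roughly the sum of the two (device names below are placeholders):

    # Extended per-device statistics, refreshed every 5 seconds; compare the read-throughput
    # columns of sda, sdb and md0 while a large sequential read is running.
    iostat -x 5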
Luke Lonergan wrote: >David, > >>now hot-swap may not be supported on all interface types, that may be what >>you have run into, but with SCSI or SATA you should be able to hot-swap >>with the right controller. > >That's actually the problem - Linux hot swap is virtually non-functional for SCSI. You can write into the proper places in /proc, then remove and rescan to get a new drive up, but I've found that the resulting OS state is flaky. This is true of the latest 2.6 kernels and LSI and Adaptec SCSI controllers. > >The problems I've seen are with Linux, not the controllers. > Interesting, I have had zero problems with Linux and SATA with LSI controllers and hot plug. I wonder what the difference is. Even though it is SATA, the LSI controller just uses the SCSI driver. Joshua D. Drake >- Luke -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: PLphp, PLperl - http://www.commandprompt.com/
On Sat, 24 Dec 2005, Luke Lonergan wrote: > David, > >> now hot-swap may not be supported on all interface types, that may be what >> you have run into, but with SCSI or SATA you should be able to hot-swap >> with the right controller. > > That's actually the problem - Linux hot swap is virtually non-functional for SCSI. You can write into the proper places in /proc, then remove and rescan to get a new drive up, but I've found that the resulting OS state is flaky. This is true of the latest 2.6 kernels and LSI and Adaptec SCSI controllers. > > The problems I've seen are with Linux, not the controllers. Thanks for the clarification. I knew that PATA didn't do hotswap, and I've seen discussions on the linux-kernel list about SATA hotswap being worked on, but I thought that SCSI handled it. How recent a kernel have you had problems with? David Lang
On Sun, Dec 25, 2005 at 04:13:57AM -0800, David Lang wrote: > Thanks for the clarification, I knew that PATA didn't do hotswap, and I've > seen discussions on the linux-kernel list about SATA hotswap being worked > on, but I thought that scsi handled it. how recent a kernel have you had > problems with? It has largely worked for us, even though it's a bit hackish -- you _must_ disconnect the drive properly in the kernel before ejecting it physically, though, or it will never reconnect. At least that's how it is with our Adaptec 19160. /* Steinar */ -- Homepage: http://www.sesse.net/
On Sat, Dec 24, 2005 at 22:13:43 -0500, Luke Lonergan <LLonergan@greenplum.com> wrote: > David, > > > now hot-swap may not be supported on all interface types, that may be what > > you have run into, but with SCSI or SATA you should be able to hot-swap > > with the right controller. > > That's actually the problem - Linux hot swap is virtually non-functional for SCSI. You can write into the proper places in /proc, then remove and rescan to get a new drive up, but I've found that the resulting OS state is flaky. This is true of the latest 2.6 kernels and LSI and Adaptec SCSI controllers. > > The problems I've seen are with Linux, not the controllers. The other option is to keep hot spares available so that you can have a failure or two before you have to pull drives out. This might allow you to get to a maintenance window to swap out the bad drives.
On Sat, Dec 24, 2005 at 05:45:20PM -0500, Ron wrote: >Caches help, and the bigger the cache the better, but once you are >doing enough writes fast enough (and that doesn't take much even with >a few GBs of cache) the recalculate-checksums-and-write-new-ones >overhead will decrease the write speed of real data. Bear in mind >that the HD's _raw_ write speed hasn't been decreased. Those HD's >are pounding away as fast as they can for you. Your _effective_ or >_data level_ write speed is what decreases due to overhead. You're overgeneralizing. Assuming a large cache and a sequential write, there need be no penalty for RAID 5. (For random writes you may need to read unrelated blocks in order to calculate parity, but for large sequential writes the parity blocks should all be read from cache.) A modern CPU can calculate parity for RAID 5 on the order of gigabytes per second, and even crummy embedded processors can do hundreds of megabytes per second. You may have run into some lousy implementations, but you should be much more specific about what hardware you're talking about instead of making sweeping generalizations. >Side Note: people often forget the other big reason to use RAID 10 >over RAID 5. RAID 5 is always only 2 HD failures from data >loss. RAID 10 can lose up to 1/2 the HD's in the array w/o data loss >unless you get unlucky and lose both members of a RAID 1 set. IOW, your RAID 10 is only 2 HD failures from data loss also. If that's an issue you need to go with RAID 6 or add another disk to each mirror. Mike Stone
Have you done any benchmarking of the 9550SX against a software raid configuration?
Luke Lonergan wrote:
Frank, > You definitely DO NOT want to do RAID 5 on a database server. That is probably the worst setup you could have, I've seen it have lower performance than just a single hard disk. I've seen that on RAID0 and RAID10 as well. This is more about the quality and modernity of the RAID controller than anything else at this point, although there are some theoretical advantages of RAID10 from a random seek standpoint even if the adapter CPU is infinitely fast at checksumming. We're using RAID5 in practice for OLAP / Data Warehousing systems very successfully using the newest RAID cards from 3Ware (9550SX). Note that host-based SCSI raid cards from LSI, Adaptec, Intel, Dell, HP and others have proven to have worse performance than a single disk drive in many cases, whether for RAID0 or RAID5. In most circumstances I've seen, people don't even notice until they write a message to a mailing list about "my query runs slowly on xxx dbms". In many cases, after they run a simple sequential transfer rate test using dd, they see that their RAID controller is the culprit. Recently, I helped a company named DeepData to improve their dbms performance, which was a combination of moving them to software RAID50 on Linux and getting them onto Bizgres. The disk subsystem sped up on the same hardware (minus the HW RAID card) by over a factor of 10. The downside is that SW RAID is a pain in the neck for management - you have to shut down the Linux host when a disk fails to replace it. - Luke ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend
Benjamin, > Have you done any benchmarking of the 9550SX against a software raid configuration? Interesting - no, not on SATA, mostly because I've had awful luck with Linux drivers and SATA. The popular manufacturers of SATA to PCI bridge chipsets are Silicon Image and Highpoint, and I've not seen Linux work with them at any reasonable performance yet. I've also had problems with Adaptec's cards - I think they manufacture their own SATA to PCI chipset as well. So far, I've only had good luck with the on-chipset Intel SATA implementation. I think the problems I've had could be entirely driver-related, but in the end it doesn't matter if you can't find drivers that work for Linux. The other problem is getting enough SATA connections for the number of disks we want. I do have two new Areca SATA RAID cards and I'm going to benchmark those against the 3Ware 9550SX with 2 x 8 = 16 disks on one host. I guess we could run the HW RAID controllers in JBOD mode to get a good driver / chipset configuration for software RAID, but frankly I prefer HW RAID if it performs well. So far the SATA host-based RAID is blowing the doors off of every other HW RAID solution I've tested. - Luke
It's irrelevant what controller you use; you still have to actually write the parity blocks, which slows down your write speed because you have to write n+n/2 blocks instead of just n blocks, making the system write 50% more data. RAID 5 must write 50% more data to disk, therefore it will always be slower. Alex. On 12/25/05, Michael Stone <mstone+postgres@mathom.us> wrote: > On Sat, Dec 24, 2005 at 05:45:20PM -0500, Ron wrote: > >Caches help, and the bigger the cache the better, but once you are > >doing enough writes fast enough (and that doesn't take much even with > >a few GBs of cache) the recalculate-checksums-and-write-new-ones > >overhead will decrease the write speed of real data. Bear in mind > >that the HD's _raw_ write speed hasn't been decreased. Those HD's > >are pounding away as fast as they can for you. Your _effective_ or > >_data level_ write speed is what decreases due to overhead. > > You're overgeneralizing. Assuming a large cache and a sequential write, > there's need be no penalty for raid 5. (For random writes you may > need to read unrelated blocks in order to calculate parity, but for > large sequential writes the parity blocks should all be read from > cache.) A modern cpu can calculate parity for raid 5 on the order of > gigabytes per second, and even crummy embedded processors can do > hundreds of megabytes per second. You may have run into some lousy > implementations, but you should be much more specific about what > hardware you're talking about instead of making sweeping > generalizations. > > >Side Note: people often forget the other big reason to use RAID 10 > >over RAID 5. RAID 5 is always only 2 HD failures from data > >loss. RAID 10 can lose up to 1/2 the HD's in the array w/o data loss > >unless you get unlucky and lose both members of a RAID 1 set. > > IOW, your RAID 10 is only 2 HD failures from data loss also. If that's > an issue you need to go with RAID 6 or add another disk to each mirror. > > Mike Stone > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend >
On Mon, 26 Dec 2005, Alex Turner wrote: > It's irrelavent what controller, you still have to actualy write the > parity blocks, which slows down your write speed because you have to > write n+n/2 blocks. instead of just n blocks making the system write > 50% more data. > > RAID 5 must write 50% more data to disk therefore it will always be slower. raid5 writes n+1 blocks, not n+n/2 (unless n=2 for a 3-disk raid); you can have a 15+1 disk raid5 array, for example. However, raid1 (and raid10) have to write 2*n blocks to disk, so if you are talking about pure I/O needed, raid5 wins hands down (the same 16 drives would be an 8+8 array). What slows down raid 5 is that to modify a block you have to read blocks from all your drives to re-calculate the parity. This interleaving of reads and writes when all you are logically doing is writes can really hurt. (This is why I asked the question that got us off on this tangent: when doing new writes to an array you don't have to read the blocks as they are blank, assuming your caching is enough so that you can write blocksize*n before the system starts actually writing the data.) David Lang > Alex. > > On 12/25/05, Michael Stone <mstone+postgres@mathom.us> wrote: >> On Sat, Dec 24, 2005 at 05:45:20PM -0500, Ron wrote: >>> Caches help, and the bigger the cache the better, but once you are >>> doing enough writes fast enough (and that doesn't take much even with >>> a few GBs of cache) the recalculate-checksums-and-write-new-ones >>> overhead will decrease the write speed of real data. Bear in mind >>> that the HD's _raw_ write speed hasn't been decreased. Those HD's >>> are pounding away as fast as they can for you. Your _effective_ or >>> _data level_ write speed is what decreases due to overhead. >> >> You're overgeneralizing. Assuming a large cache and a sequential write, >> there's need be no penalty for raid 5. (For random writes you may >> need to read unrelated blocks in order to calculate parity, but for >> large sequential writes the parity blocks should all be read from >> cache.) A modern cpu can calculate parity for raid 5 on the order of >> gigabytes per second, and even crummy embedded processors can do >> hundreds of megabytes per second. You may have run into some lousy >> implementations, but you should be much more specific about what >> hardware you're talking about instead of making sweeping >> generalizations. >> >>> Side Note: people often forget the other big reason to use RAID 10 >>> over RAID 5. RAID 5 is always only 2 HD failures from data >>> loss. RAID 10 can lose up to 1/2 the HD's in the array w/o data loss >>> unless you get unlucky and lose both members of a RAID 1 set. >> >> IOW, your RAID 10 is only 2 HD failures from data loss also. If that's >> an issue you need to go with RAID 6 or add another disk to each mirror. >> >> Mike Stone >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 6: explain analyze is your friend >> > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
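As a sanity check on the block counts in David's correction, the following short Python sketch (my own illustration, not from the thread; drive counts and block totals are arbitrary examples) counts physical blocks written per batch of logical blocks for best-case RAID 5 (full-stripe writes, one parity block per stripe) versus RAID 1/10 mirroring.

# Best-case physical blocks written for N logical data blocks.
# RAID 5: one parity block per full stripe; RAID 1/10: every block mirrored.
def raid5_blocks_written(data_blocks, total_drives):
    data_per_stripe = total_drives - 1              # n data + 1 parity
    stripes = -(-data_blocks // data_per_stripe)    # ceiling division
    return data_blocks + stripes

def raid10_blocks_written(data_blocks):
    return 2 * data_blocks

logical = 1000
for drives in (4, 8, 16):
    print("%2d drives: RAID5 writes %d blocks, RAID10 writes %d blocks for %d logical blocks"
          % (drives, raid5_blocks_written(logical, drives),
             raid10_blocks_written(logical), logical))

For the 16-drive case this comes out to roughly 1067 blocks written for RAID 5 versus 2000 for RAID 10, which is the "n+1 versus 2*n" point above; the read-before-write cost of partial-stripe updates is a separate issue.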
Have you had any experience rebuilding arrays in Linux using the 3Ware utilities? If so, did it work well?
Luke Lonergan wrote:
Benjamin, > Have you done any benchmarking of the 9550SX against a software raid configuration? Interesting - no, not on SATA, mostly because I've had awful luck with Linux drivers and SATA. The popular manufacturers of SATA to PCI bridge chipsets are Silicon Image and Highpoint, and I've not seen Linux work with them at any reasonable performance yet. I've also had problems with Adaptec's cards - I think they manufacture their own SATA to PCI chipset as well. So far, I've only had good luck with the on-chipset Intel SATA implementation. I think the problems I've had could be entirely driver-related, but in the end it doesn't matter if you can't find drivers that work for Linux. The other problem is getting enough SATA connections for the number of disks we want. I do have two new Areca SATA RAID cards and I'm going to benchmark those against the 3Ware 9550SX with 2 x 8 = 16 disks on one host. I guess we could run the HW RAID controllers in JBOD mode to get a good driver / chipset configuration for software RAID, but frankly I prefer HW RAID if it performs well. So far the SATA host-based RAID is blowing the doors off of every other HW RAID solution I've tested. - Luke
Benjamin, On 12/26/05 10:21 AM, "Benjamin Arai" <barai@cs.ucr.edu> wrote: > Have you have any experience rebuilding arrays in linux using the 3Ware > utilities? If so, did it work well? Sure we have - nowadays with disks failing as much as they do how could we not? ;-) 3Ware has some *nice* tools - including a web browser utility for managing the RAID. Rebuilds have been super easy - and the e-mail notification is fine. They even have some decent migration options. What they don't have are tools like snapshot backup, like EMC has, or SRDF or any of the enterprise SAN features. We don't need them because Bizgres MPP takes care of the need in software, but some people have become accustomed to the features for other uses. We're pretty happy with 3Ware, but their new 9550SX is, well, new. We managed to find a good enough combination of driver and firmware to make it work well on CentOs 4.1 and that's good enough for us, but there are definitely some issues with some combinations now. Lastly, you do have to set the block device readahead to 16MB to get performance. - Luke
Yes - they work excellently. I have several medium and large servers running 3ware 9500S series cards with great success. We have rebuilt many failed RAID 10s over the years with no problems. Alex On 12/26/05, Benjamin Arai <barai@cs.ucr.edu> wrote: > Have you have any experience rebuilding arrays in linux using the 3Ware > utilities? If so, did it work well? > > > Luke Lonergan wrote: > Benjamin, > > > > Have you done any benchmarking of the 9550SX against a software raid > configuration? > > > Interesting - no, not on SATA, mostly because I've had awful luck with Linux > drivers and SATA. The popular manufacturers of SATA to PCI bridge chipsets > are Silicon Image and Highpoint, and I've not seen Linux work with them at > any reasonable performance yet. I've also had problems with Adaptec's cards > - I think they manufacture their own SATA to PCI chipset as well. So far, > I've only had good luck with the on-chipset Intel SATA implementation. I > think the problems I've had could be entirely driver-related, but in the end > it doesn't matter if you can't find drivers that work for Linux. > > The other problem is getting enough SATA connections for the number of disks > we want. I do have two new Areca SATA RAID cards and I'm going to benchmark > those against the 3Ware 9550SX with 2 x 8 = 16 disks on one host. > > I guess we could run the HW RAID controllers in JBOD mode to get a good > driver / chipset configuration for software RAID, but frankly I prefer HW > RAID if it performs well. So far the SATA host-based RAID is blowing the > doors off of every other HW RAID solution I've tested. > > - Luke > > > > > -- > Benjamin Arai > barai@cs.ucr.edu > http://www.benjaminarai.com
Yes, but those blocks in RAID 10 are largely irrelevant as they are to independent disks. In RAID 5 you have to write parity to an 'active' drive that is part of the stripe. (They are irrelevant unless of course you are maxing out your SCSI bus - yet another reason why SATA can be faster than SCSI, particularly in RAID 10, where every channel is independent.) Sorry - my math for RAID 5 was a bit off - I don't know why I was considering only a three drive situation, which is the worst. It's n+1, you are right. Still, for small arrays that's a big penalty, and there is definitely a penalty, contrary to the assertion of the original poster. I agree totally that the read+parity-calc+write in the worst case is totally bad, which is why I always recommend people should _never ever_ use RAID 5. In this day and age of large capacity chassis, and large capacity SATA drives, RAID 5 is totally inappropriate IMHO for _any_ application, least of all databases. In reality I have yet to benchmark a system where RAID 5 on the same number of drives with 8 drives or less in a single array beat a RAID 10 with the same number of drives. I would definitely be interested in a SCSI card that could actually achieve the theoretical performance of RAID 5, especially under Linux. With RAID 5 you get to watch your system crumble and fail when a drive fails and the array goes into a failed state. It's just not worth it. Alex. On 12/26/05, David Lang <dlang@invendra.net> wrote: > On Mon, 26 Dec 2005, Alex Turner wrote: > > > It's irrelavent what controller, you still have to actualy write the > > parity blocks, which slows down your write speed because you have to > > write n+n/2 blocks. instead of just n blocks making the system write > > 50% more data. > > > > RAID 5 must write 50% more data to disk therefore it will always be slower. > > raid5 writes n+1 blocks not n+n/2 (unless n=2 for a 3-disk raid). you can > have a 15+1 disk raid5 array for example > > however raid1 (and raid10) have to write 2*n blocks to disk. so if you are > talking about pure I/O needed raid5 wins hands down. (the same 16 drives > would be a 8+8 array) > > what slows down raid 5 is that to modify a block you have to read blocks > from all your drives to re-calculate the parity. this interleaving of > reads and writes when all you are logicly doing is writes can really hurt. > (this is why I asked the question that got us off on this tangent, when > doing new writes to an array you don't have to read the blocks as they are > blank, assuming your cacheing is enough so that you can write blocksize*n > before the system starts actually writing the data) > > David Lang > > > Alex. > > > > On 12/25/05, Michael Stone <mstone+postgres@mathom.us> wrote: > >> On Sat, Dec 24, 2005 at 05:45:20PM -0500, Ron wrote: > >>> Caches help, and the bigger the cache the better, but once you are > >>> doing enough writes fast enough (and that doesn't take much even with > >>> a few GBs of cache) the recalculate-checksums-and-write-new-ones > >>> overhead will decrease the write speed of real data. Bear in mind > >>> that the HD's _raw_ write speed hasn't been decreased. Those HD's > >>> are pounding away as fast as they can for you. Your _effective_ or > >>> _data level_ write speed is what decreases due to overhead. > >> > >> You're overgeneralizing. Assuming a large cache and a sequential write, > >> there's need be no penalty for raid 5. 
(For random writes you may > >> need to read unrelated blocks in order to calculate parity, but for > >> large sequential writes the parity blocks should all be read from > >> cache.) A modern cpu can calculate parity for raid 5 on the order of > >> gigabytes per second, and even crummy embedded processors can do > >> hundreds of megabytes per second. You may have run into some lousy > >> implementations, but you should be much more specific about what > >> hardware you're talking about instead of making sweeping > >> generalizations. > >> > >>> Side Note: people often forget the other big reason to use RAID 10 > >>> over RAID 5. RAID 5 is always only 2 HD failures from data > >>> loss. RAID 10 can lose up to 1/2 the HD's in the array w/o data loss > >>> unless you get unlucky and lose both members of a RAID 1 set. > >> > >> IOW, your RAID 10 is only 2 HD failures from data loss also. If that's > >> an issue you need to go with RAID 6 or add another disk to each mirror. > >> > >> Mike Stone > >> > >> ---------------------------(end of broadcast)--------------------------- > >> TIP 6: explain analyze is your friend > >> > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 4: Have you searched our list archives? > > > > http://archives.postgresql.org > > >
On Mon, 26 Dec 2005, Alex Turner wrote: > > Yes, but those blocks in RAID 10 are largely irrelevant as they are to > independant disks. In RAID 5 you have to write parity to an 'active' > drive that is part of the stripe. (They are irrelevant unless of > course you are maxing out your SCSI bus - yet another reason why SATA > can be faster than SCSI, particularly in RAID 10, every channel is > independant). I don't understand your 'active' vs 'inactive' drive argument; in raid 1 or 1+0 all drives are active. With good components you need to worry about maxing out your PCI bus as much as any other one (this type of thing is where the hardware raid has a definite advantage, since the card handles the extra I/O, not your system). > Sorry - my math for RAID 5 was a bit off - I don't know why I was > considering only a three dirve situation - which is the worst. It's > n+1 you are right. still, for small arrays thats a big penalty. > Still, there is definately a penatly contrary to the assertion of the > orignal poster. > > I agree totally that the read+parity-calc+write in the worst case is > totaly bad, which is why I alway recommend people should _never ever_ > use RAID 5. In this day and age of large capacity chassis, and large > capacity SATA drives, RAID 5 is totally inapropriate IMHO for _any_ > application least of all databases. > > In reality I have yet to benchmark a system where RAID 5 on the same > number of drives with 8 drives or less in a single array beat a RAID > 10 with the same number of drives. I would definately be interested > in a SCSI card that could actualy achieve the theoretical performance > of RAID 5 especially under Linux. But it's not a 'same number of drives' comparison you should be making: if you have an 8 drive RAID5 array you need to compare it with a 14 drive RAID1/10 array. > With RAID 5 you get to watch you system crumble and fail when a drive > fails and the array goes into a failed state. It's just not worth it. Speed is worth money (and therefore number of drives) in some cases, but not in all cases. Also, the speed penalty when you have a raid drive fail varies based on your controller. It's wrong to flatly rule out any RAID configuration; they all have their place, and the important thing is to understand what the advantages and disadvantages are for each of them so you can know when to use each one. For example, I have a situation I am looking at where RAID0 is looking appropriate for a database (a multi-TB array that gets completely reloaded every month or so as data expires and new data is loaded from the authoritative source; adding another 16 drives to get redundancy isn't reasonable). David Lang > Alex. > > > On 12/26/05, David Lang <dlang@invendra.net> wrote: >> On Mon, 26 Dec 2005, Alex Turner wrote: >> >>> It's irrelavent what controller, you still have to actualy write the >>> parity blocks, which slows down your write speed because you have to >>> write n+n/2 blocks. instead of just n blocks making the system write >>> 50% more data. >>> >>> RAID 5 must write 50% more data to disk therefore it will always be slower. >> >> raid5 writes n+1 blocks not n+n/2 (unless n=2 for a 3-disk raid). you can >> have a 15+1 disk raid5 array for example >> >> however raid1 (and raid10) have to write 2*n blocks to disk. so if you are >> talking about pure I/O needed raid5 wins hands down. (the same 16 drives >> would be a 8+8 array) >> >> what slows down raid 5 is that to modify a block you have to read blocks >> from all your drives to re-calculate the parity. 
this interleaving of >> reads and writes when all you are logicly doing is writes can really hurt. >> (this is why I asked the question that got us off on this tangent, when >> doing new writes to an array you don't have to read the blocks as they are >> blank, assuming your cacheing is enough so that you can write blocksize*n >> before the system starts actually writing the data) >> >> David Lang >> >>> Alex. >>> >>> On 12/25/05, Michael Stone <mstone+postgres@mathom.us> wrote: >>>> On Sat, Dec 24, 2005 at 05:45:20PM -0500, Ron wrote: >>>>> Caches help, and the bigger the cache the better, but once you are >>>>> doing enough writes fast enough (and that doesn't take much even with >>>>> a few GBs of cache) the recalculate-checksums-and-write-new-ones >>>>> overhead will decrease the write speed of real data. Bear in mind >>>>> that the HD's _raw_ write speed hasn't been decreased. Those HD's >>>>> are pounding away as fast as they can for you. Your _effective_ or >>>>> _data level_ write speed is what decreases due to overhead. >>>> >>>> You're overgeneralizing. Assuming a large cache and a sequential write, >>>> there's need be no penalty for raid 5. (For random writes you may >>>> need to read unrelated blocks in order to calculate parity, but for >>>> large sequential writes the parity blocks should all be read from >>>> cache.) A modern cpu can calculate parity for raid 5 on the order of >>>> gigabytes per second, and even crummy embedded processors can do >>>> hundreds of megabytes per second. You may have run into some lousy >>>> implementations, but you should be much more specific about what >>>> hardware you're talking about instead of making sweeping >>>> generalizations. >>>> >>>>> Side Note: people often forget the other big reason to use RAID 10 >>>>> over RAID 5. RAID 5 is always only 2 HD failures from data >>>>> loss. RAID 10 can lose up to 1/2 the HD's in the array w/o data loss >>>>> unless you get unlucky and lose both members of a RAID 1 set. >>>> >>>> IOW, your RAID 10 is only 2 HD failures from data loss also. If that's >>>> an issue you need to go with RAID 6 or add another disk to each mirror. >>>> >>>> Mike Stone >>>> >>>> ---------------------------(end of broadcast)--------------------------- >>>> TIP 6: explain analyze is your friend >>>> >>> >>> ---------------------------(end of broadcast)--------------------------- >>> TIP 4: Have you searched our list archives? >>> >>> http://archives.postgresql.org >>> >> >
On 12/26/05, David Lang <dlang@invendra.net> wrote: > raid5 writes n+1 blocks not n+n/2 (unless n=2 for a 3-disk raid). you can > have a 15+1 disk raid5 array for example > > however raid1 (and raid10) have to write 2*n blocks to disk. so if you are > talking about pure I/O needed raid5 wins hands down. (the same 16 drives > would be a 8+8 array) > > what slows down raid 5 is that to modify a block you have to read blocks > from all your drives to re-calculate the parity. this interleaving of > reads and writes when all you are logicly doing is writes can really hurt. > (this is why I asked the question that got us off on this tangent, when > doing new writes to an array you don't have to read the blocks as they are > blank, assuming your cacheing is enough so that you can write blocksize*n > before the system starts actually writing the data) Not exactly true. Let's assume you have a 4+1 RAID5 (drives A, B, C, D and E), and you want to update drive A. Let's assume the parity for this particular write is stored on drive E. One way to write it is: write A; read A, B, C, D; combine A+B+C+D and write it to E (4 reads + 2 writes). The other way to write it is: read oldA, read old parity oldE; write newA, write E = oldE + (newA-oldA) -- calculate the difference between new and old A, apply it to the old parity, then write (2 reads + 2 writes). The more drives you have, the smarter it is to use the second approach, unless of course A, B, C and D are available in the cache, which is the nicest situation. Regards, Dawid
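A quick way to see where the crossover between Dawid's two strategies lies is to count the I/Os for each as a function of the number of data drives in the stripe. The short Python sketch below (illustrative bookkeeping only, assuming nothing useful is already in the controller cache) does just that.

# I/O cost of updating one data block in an (n data + 1 parity) RAID 5 stripe.
def reconstruct_write_ios(n_data):
    # read the other data blocks in the stripe, then write new data + new parity
    return {"reads": n_data - 1, "writes": 2}

def read_modify_write_ios(n_data):
    # read old data + old parity, then write new data + new parity
    return {"reads": 2, "writes": 2}

for n in (3, 4, 8, 15):
    print("n=%2d  reconstruct: %s  incremental: %s"
          % (n, reconstruct_write_ios(n), read_modify_write_ios(n)))

Beyond three or four data drives the incremental (read-modify-write) path clearly wins, which matches the observation that the more drives you have, the smarter the second approach becomes.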
On Mon, Dec 26, 2005 at 12:32:19PM -0500, Alex Turner wrote: >It's irrelavent what controller, you still have to actualy write the >parity blocks, which slows down your write speed because you have to >write n+n/2 blocks. instead of just n blocks making the system write >50% more data. > >RAID 5 must write 50% more data to disk therefore it will always be >slower. At this point you've drifted into complete nonsense mode. On Mon, Dec 26, 2005 at 10:11:00AM -0800, David Lang wrote: >what slows down raid 5 is that to modify a block you have to read blocks >from all your drives to re-calculate the parity. this interleaving of >reads and writes when all you are logicly doing is writes can really hurt. >(this is why I asked the question that got us off on this tangent, when >doing new writes to an array you don't have to read the blocks as they are >blank, assuming your cacheing is enough so that you can write blocksize*n >before the system starts actually writing the data) Correct; there's no reason for the controller to read anything back if your write will fill a complete stripe. That's why I said that there isn't a "RAID 5 penalty" assuming you've got a reasonably fast controller and you're doing large sequential writes (or have enough cache that random writes can be batched as large sequential writes). On Mon, Dec 26, 2005 at 06:04:40PM -0500, Alex Turner wrote: >Yes, but those blocks in RAID 10 are largely irrelevant as they are to >independant disks. In RAID 5 you have to write parity to an 'active' >drive that is part of the stripe. Once again, this doesn't make any sense. Can you explain which parts of a RAID 10 array are inactive? >I agree totally that the read+parity-calc+write in the worst case is >totaly bad, which is why I alway recommend people should _never ever_ >use RAID 5. In this day and age of large capacity chassis, and large >capacity SATA drives, RAID 5 is totally inapropriate IMHO for _any_ >application least of all databases. So I've got a 14 drive chassis full of 300G SATA disks and need at least 3.5TB of data storage. In your mind the only possible solution is to buy another 14 drive chassis? Must be nice to never have a budget. Must be a hard sell if you've bought decent enough hardware that your benchmarks can't demonstrate a difference between a RAID 5 and a RAID 10 configuration on that chassis except in degraded mode (and the customer doesn't want to pay double for degraded mode performance). >In reality I have yet to benchmark a system where RAID 5 on the same >number of drives with 8 drives or less in a single array beat a RAID >10 with the same number of drives. Well, those are frankly little arrays, probably on lousy controllers... Mike Stone
At 08:35 AM 12/27/2005, Michael Stone wrote: >On Mon, Dec 26, 2005 at 10:11:00AM -0800, David Lang wrote: >>what slows down raid 5 is that to modify a block you have to read >>blocks from all your drives to re-calculate the parity. this >>interleaving of reads and writes when all you are logicly doing is >>writes can really hurt. (this is why I asked the question that got >>us off on this tangent, when doing new writes to an array you don't >>have to read the blocks as they are blank, assuming your cacheing >>is enough so that you can write blocksize*n before the system >>starts actually writing the data) > >Correct; there's no reason for the controller to read anything back >if your write will fill a complete stripe. That's why I said that >there isn't a "RAID 5 penalty" assuming you've got a reasonably fast >controller and you're doing large sequential writes (or have enough >cache that random writes can be batched as large sequential writes). Sorry. A decade+ RWE in production with RAID 5 using controllers as bad as Adaptec and as good as Mylex, Chaparral, LSI Logic (including their Engino stuff), and Xyratex under 5 different OS's (Sun, Linux, M$, DEC, HP) on each of Oracle, SQL Server, DB2, mySQL, and pg shows that RAID 5 writes are slower than RAID 5 reads With the one notable exception of the Mylex controller that was so good IBM bought Mylex to put them out of business. Enough IO load, random or sequential, will cause the effect no matter how much cache you have or how fast the controller is. The even bigger problem that everyone is ignoring here is that large RAID 5's spend increasingly larger percentages of their time with 1 failed HD in them. The math of having that many HDs operating simultaneously 24x7 makes it inevitable. This means you are operating in degraded mode an increasingly larger percentage of the time under exactly the circumstance you least want to be. In addition, you are =one= HD failure from data loss on that array an increasingly larger percentage of the time under exactly the least circumstances you want to be. RAID 5 is not a silver bullet. > On Mon, Dec 26, 2005 at 06:04:40PM -0500, Alex Turner wrote: >>Yes, but those blocks in RAID 10 are largely irrelevant as they are >>to independant disks. In RAID 5 you have to write parity to an >>'active' drive that is part of the stripe. > >Once again, this doesn't make any sense. Can you explain which parts of >a RAID 10 array are inactive? > >>I agree totally that the read+parity-calc+write in the worst case >>is totaly bad, which is why I alway recommend people should _never >>ever_ use RAID 5. In this day and age of large capacity chassis, >>and large capacity SATA drives, RAID 5 is totally inapropriate IMHO >>for _any_ application least of all databases. I vote with Michael here. This is an extreme position to take that can't be followed under many circumstances ITRW. >So I've got a 14 drive chassis full of 300G SATA disks and need at >least 3.5TB of data storage. In your mind the only possible solution >is to buy another 14 drive chassis? Must be nice to never have a budget. I think you mean an infinite budget. That's even assuming it's possible to get the HD's you need. I've had arrays that used all the space I could give them in 160 HD cabinets. Two 160 HD cabinets was neither within the budget nor going to perform well. I =had= to use RAID 5. RAID 10 was just not usage efficient enough. 
>Must be a hard sell if you've bought decent enough hardware that >your benchmarks can't demonstrate a difference between a RAID 5 and >a RAID 10 configuration on that chassis except in degraded mode (and >the customer doesn't want to pay double for degraded mode performance) I have =never= had this situation. RAID 10 latency is better than RAID 5 latency. RAID 10 write speed under heavy enough load, of any type, is faster than RAID 5 write speed under the same circumstances. RAID 10 robustness is better as well. Problem is that sometimes budget limits or number of HDs needed limits mean you can't use RAID 10. >>In reality I have yet to benchmark a system where RAID 5 on the >>same number of drives with 8 drives or less in a single array beat >>a RAID 10 with the same number of drives. > >Well, those are frankly little arrays, probably on lousy controllers... Nah. Regardless of controller I can take any RAID 5 and any RAID 10 built on the same HW under the same OS running the same DBMS and =guarantee= there is an IO load above which it can be shown that the RAID 10 will do writes faster than the RAID 5. The only exception in my career thus far has been the aforementioned Mylex controller. OTOH, sometimes you have no choice but to "take the hit" and use RAID 5. cheers, Ron
Historically, I have heard that RAID5 is only faster than RAID10 if there are six or more drives. --------------------------------------------------------------------------- Ron wrote: > At 08:35 AM 12/27/2005, Michael Stone wrote: > >On Mon, Dec 26, 2005 at 10:11:00AM -0800, David Lang wrote: > >>what slows down raid 5 is that to modify a block you have to read > >>blocks from all your drives to re-calculate the parity. this > >>interleaving of reads and writes when all you are logicly doing is > >>writes can really hurt. (this is why I asked the question that got > >>us off on this tangent, when doing new writes to an array you don't > >>have to read the blocks as they are blank, assuming your cacheing > >>is enough so that you can write blocksize*n before the system > >>starts actually writing the data) > > > >Correct; there's no reason for the controller to read anything back > >if your write will fill a complete stripe. That's why I said that > >there isn't a "RAID 5 penalty" assuming you've got a reasonably fast > >controller and you're doing large sequential writes (or have enough > >cache that random writes can be batched as large sequential writes). > > Sorry. A decade+ RWE in production with RAID 5 using controllers as > bad as Adaptec and as good as Mylex, Chaparral, LSI Logic (including > their Engino stuff), and Xyratex under 5 different OS's (Sun, Linux, > M$, DEC, HP) on each of Oracle, SQL Server, DB2, mySQL, and pg shows > that RAID 5 writes are slower than RAID 5 reads > > With the one notable exception of the Mylex controller that was so > good IBM bought Mylex to put them out of business. > > Enough IO load, random or sequential, will cause the effect no matter > how much cache you have or how fast the controller is. > > The even bigger problem that everyone is ignoring here is that large > RAID 5's spend increasingly larger percentages of their time with 1 > failed HD in them. The math of having that many HDs operating > simultaneously 24x7 makes it inevitable. > > This means you are operating in degraded mode an increasingly larger > percentage of the time under exactly the circumstance you least want > to be. In addition, you are =one= HD failure from data loss on that > array an increasingly larger percentage of the time under exactly the > least circumstances you want to be. > > RAID 5 is not a silver bullet. > > > > On Mon, Dec 26, 2005 at 06:04:40PM -0500, Alex Turner wrote: > >>Yes, but those blocks in RAID 10 are largely irrelevant as they are > >>to independant disks. In RAID 5 you have to write parity to an > >>'active' drive that is part of the stripe. > > > >Once again, this doesn't make any sense. Can you explain which parts of > >a RAID 10 array are inactive? > > > >>I agree totally that the read+parity-calc+write in the worst case > >>is totaly bad, which is why I alway recommend people should _never > >>ever_ use RAID 5. In this day and age of large capacity chassis, > >>and large capacity SATA drives, RAID 5 is totally inapropriate IMHO > >>for _any_ application least of all databases. > I vote with Michael here. This is an extreme position to take that > can't be followed under many circumstances ITRW. > > > >So I've got a 14 drive chassis full of 300G SATA disks and need at > >least 3.5TB of data storage. In your mind the only possible solution > >is to buy another 14 drive chassis? Must be nice to never have a budget. > > I think you mean an infinite budget. That's even assuming it's > possible to get the HD's you need. 
I've had arrays that used all the > space I could give them in 160 HD cabinets. Two 160 HD cabinets was > neither within the budget nor going to perform well. I =had= to use > RAID 5. RAID 10 was just not usage efficient enough. > > > >Must be a hard sell if you've bought decent enough hardware that > >your benchmarks can't demonstrate a difference between a RAID 5 and > >a RAID 10 configuration on that chassis except in degraded mode (and > >the customer doesn't want to pay double for degraded mode performance) > > I have =never= had this situation. RAID 10 latency is better than > RAID 5 latency. RAID 10 write speed under heavy enough load, of any > type, is faster than RAID 5 write speed under the same > circumstances. RAID 10 robustness is better as well. > > Problem is that sometimes budget limits or number of HDs needed > limits mean you can't use RAID 10. > > > >>In reality I have yet to benchmark a system where RAID 5 on the > >>same number of drives with 8 drives or less in a single array beat > >>a RAID 10 with the same number of drives. > > > >Well, those are frankly little arrays, probably on lousy controllers... > Nah. Regardless of controller I can take any RAID 5 and any RAID 10 > built on the same HW under the same OS running the same DBMS and > =guarantee= there is an IO load above which it can be shown that the > RAID 10 will do writes faster than the RAID 5. The only exception in > my career thus far has been the aforementioned Mylex controller. > > OTOH, sometimes you have no choice but to "take the hit" and use RAID 5. > > > cheers, > Ron > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, Dec 27, 2005 at 11:50:16AM -0500, Ron wrote: >Sorry. A decade+ RWE in production with RAID 5 using controllers as >bad as Adaptec and as good as Mylex, Chaparral, LSI Logic (including >their Engino stuff), and Xyratex under 5 different OS's (Sun, Linux, >M$, DEC, HP) on each of Oracle, SQL Server, DB2, mySQL, and pg shows >that RAID 5 writes are slower than RAID 5 reads What does that have to do with anything? That wasn't the question... >RAID 5 is not a silver bullet. Who said it was? Nothing is, not even RAID 10. The appropriate thing to do is to make decisions based on requirements, not to make sweeping statements that eliminate entire categories of solutions based on hand waving. Mike Stone
Bruce, On 12/27/05 9:51 AM, "Bruce Momjian" <pgman@candle.pha.pa.us> wrote: > Historically, I have heard that RAID5 is only faster than RAID10 if > there are six or more drives. I think the real question here is "faster for what?" Also, just like the optimizer tunables for cpu/disk/memory speed relationships, the standing guidance for RAID has become outdated. Couple that with the predominance of really bad hardware RAID controllers and people not testing them or reporting their performance (HP, Adaptec, LSI, Dell) and we've got a mess. All we can really do is report success with various point solutions. RAID5 and RAID50 work fine for our customers who do OLAP type applications which are read-mostly. However, it only works well on good hardware and software, which at this time include the HW RAID controllers from 3Ware and reputedly Areca and SW using Linux SW RAID. I've heard that the external storage RAID controllers from EMC work well, and I'd suspect there are others, but none of the host-based SCSI HW RAID controllers I've tested work well on Linux. I say Linux, because I'm pretty sure that the HP smartarray controllers work well on Windows, but the Linux driver is so bad I'd say it doesn't work at all. WRT RAID10, it seems like throwing double the number of disks at the problems is something to be avoided if possible, though the random write performance may be important for OLTP. I think this assertion should be retested however in light of the increased speed of checksumming hardware and / or CPUs and faster, more effective drive electronics (write combining, write cache, etc). - Luke
Bruce, On 12/27/05 9:51 AM, "Bruce Momjian" <pgman@candle.pha.pa.us> wrote: > Historically, I have heard that RAID5 is only faster than RAID10 if > there are six or more drives. Speaking of testing / proof, check this site out: http://www.wlug.org.nz/HarddiskBenchmarks I really like the idea - post your bonnie++ results so people can learn from your configurations. We've built a performance reporting site, but we can't seem to get it into shape for release. I'd really like to light a performance leaderboard / experiences site up somewhere... - Luke
At 02:05 PM 12/27/2005, Michael Stone wrote: >On Tue, Dec 27, 2005 at 11:50:16AM -0500, Ron wrote: >>Sorry. A decade+ RWE in production with RAID 5 using controllers >>as bad as Adaptec and as good as Mylex, Chaparral, LSI Logic >>(including their Engino stuff), and Xyratex under 5 different OS's >>(Sun, Linux, M$, DEC, HP) on each of Oracle, SQL Server, DB2, >>mySQL, and pg shows that RAID 5 writes are slower than RAID 5 reads > >What does that have to do with anything? That wasn't the question... Your quoted position is "there isn't a 'RAID 5 penalty' assuming you've got a reasonably fast controller and you're doing large sequential writes (or have enough cache that random writes can be batched as large sequential writes)." My experience across a wide range of HW, OSs, DBMS, and applications says you are wrong. Given enough IO, RAID 5 takes a bigger performance hit for writes than RAID 10 does. Enough IO, sequential or otherwise, will result in a situation where a RAID 10 array using the same number of HDs (and therefore of ~1/2 the usable capacity) will have better write performance than the equivalent RAID 5 built using the same number of HDs. There is a 'RAID 5 write penalty'. Said RAID 10 array will also be more robust than a RAID 5 built using the same number of HDs. OTOH, that does not make RAID 5 "bad". Nor are statements like "Never use RAID 5!" realistic or reasonable. Also, performance is not the only or even most important reason for choosing RAID 10 or RAID 50 over RAID 5. Robustness considerations can be more important than performance ones. cheers, Ron
On Tue, Dec 27, 2005 at 02:57:13PM -0500, Ron wrote: >Your quoted position is "there isn't a 'RAID 5 penalty' assuming >you've got a reasonably fast controller and you're doing large >sequential writes (or have enough cache that random writes can be >batched as large sequential writes)." And you said that RAID 5 writes are slower than reads. That's a completely different statement. The traditional meaning of "RAID 5 penalty" is the cost of reading a stripe to calculate parity if only a small part of the stripe changes. It has a special name because it can result in a condition where the performance is catastrophically worse than an optimal workload, or even the single-disk non-RAID case. It's still an issue, but might not be relevant for a particular workload. (Hence the recommendation to benchmark.) >My experience across a wide range of HW, OSs, DBMS, and applications >says you are wrong. Given enough IO, RAID 5 takes a bigger >performance hit for writes than RAID 10 does. I don't understand why you keep using the pejorative term "performance hit". Try describing the "performance characteristics" instead. Also, performance claims based on experience are fairly useless. Either you have data to provide (in which case claiming vast experience is unnecessary) or you don't. >Said RAID 10 array will also be more robust than a RAID 5 built using >the same number of HDs. And a RAID 6 will be more robust than either. Basing reliability on "hopefully you won't have both disks in a mirror fail" is just silly. Either you need double disk failure protection or you don't. Mike Stone
At 04:15 PM 12/27/2005, Michael Stone wrote: >I don't understand why you keep using the pejorative term "performance >hit". Try describing the "performance characteristics" instead. pe·jor·a·tive, adj.: Tending to make or become worse. Disparaging; belittling. RAID 5 write performance is sufficiently lower than RAID 5 read performance to be a matter of professional note and concern. That's not "disparaging or belittling" nor is it "tending to make or become worse". It's a measurable fact that has an adverse impact on capacity planning, budgeting, HW deployment, etc. If you consider calling a provable decrease in performance while doing a certain task that has such effects "a hit" or "bad" pejorative, you are using a definition for the word that is different than the standard one. >Also, performance claims based on experience are fairly useless. >Either you have data to provide (in which case claiming vast experience >is unnecessary) or you don't. My experience _is_ the data provided. Isn't it convenient for you that I don't have the records for every job I've done in 20 years, nor do I necessarily have the right to release some specifics for some of what I do have. I've said what I can as a service to the community. Including to you. Your reaction implies that I and others with perhaps equally or more valuable experience to share shouldn't bother. "One of the major differences between Man and Beast is that Man learns from others' experience." It's also impressive that you evidently seem to be implying that you keep such records for your own job experience _and_ that you have the legal right to publish them. In which case, please feel free to impress me further by doing so. >>Said RAID 10 array will also be more robust >>than a RAID 5 built using the same number of HDs. > >And a RAID 6 will be more robust than either. Basing reliability on >"hopefully you won't have both disks in a mirror fail" is just silly. >Either you need double disk failure protection or you don't. That statement is incorrect and ignores both probability and real world statistical failure patterns. The odds of a RAID 10 array of n HDs suffering a failure that loses data are less than the odds of it happening in a RAID 6 array of n HDs. You are correct that RAID 6 is more robust than RAID 5. cheers, Ron
Hi, William, William Yu wrote: > Random write performance (small block that only writes to 1 drive): > 1 write requires N-1 reads + N writes --> 1/2N-1 % This is not true. Most Raid-5 engines use XOR or similar checksum methods. As opposed to cryptographic checksums, those can be updated and corrected incrementally. check_new = check_old xor data_old xor data_new So 2 reads and 2 writes are enough: read data and checksum, then adjust the checksum via the data difference, and write data and new checksum. And often, the old data block still is in cache, accounting to 1 read and two writes. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
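Markus's incremental update rule is easy to check numerically. Here is a tiny Python sketch (illustrative only; block and stripe sizes are arbitrary) verifying that adjusting the old parity with the old and new contents of one block gives the same result as recomputing XOR parity over the whole stripe.

import random
from functools import reduce

def xor_blocks(a, b):
    # byte-wise XOR of two equal-length blocks
    return bytes(x ^ y for x, y in zip(a, b))

# five data blocks of 16 bytes each, plus their XOR parity
stripe = [bytes(random.randrange(256) for _ in range(16)) for _ in range(5)]
parity_old = reduce(xor_blocks, stripe)

# overwrite one block and update the parity incrementally:
# check_new = check_old xor data_old xor data_new
data_old = stripe[2]
data_new = bytes(random.randrange(256) for _ in range(16))
stripe[2] = data_new

parity_incremental = xor_blocks(xor_blocks(parity_old, data_old), data_new)
parity_recomputed = reduce(xor_blocks, stripe)
assert parity_incremental == parity_recomputed
print("incremental parity update matches full recomputation")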