Thread: Tuning PostgreSQL

Tuning PostgreSQL

From
"Alexander Priem"
Date:
Hi guys,
 
I am new to PostgreSQL and have done some "extensive" research already. If you could give me some advice/confirmation, I would be really grateful.
 
I am going to build a PostgreSQL database server for a client. This database will contain many tables (over 100, maybe more), with some of them containing over 1 million records pretty soon. The database will be accessed via a desktop application (Windows) and a web application (PHP/Apache). There may be up to 100 people accessing the database at any given time (selecting, inserting and updating); I don't think it will be many more than that, and most of the time it will be less.
 
What I am thinking about is buying a server with the following specifications:
 
* 1 or 2 Intel Xeon processors (2.4 GHz).
* 2 Gigabytes of RAM (DDR/ECC).
* Three 36Gb SCSI160 disks (10.000rpm) in a RAID-5 config, giving 72Gb storage space (right?). The RAID-5 controller has a (hardware) cache of 128Mb.
* 100Mbit ethernet.
 
I will run Red Hat Linux 9 (kernel 2.4.20) with PostgreSQL 7.3.3 on this server.
 
What would you think of this hardware config? Would it do? Or would 4Gb RAM be a lot better? What do you think about the need for two Xeon procs?
 
Finally, I have some questions about postgresql.conf (who doesn't?). After some research, I think I will go for the following settings initially. Maybe benchmarking will lead to other values, but I think these settings will be a fine starting point:
 
shared_buffers = 6000 (kernel.shmmax = 60000000)
sort_mem = 4096
max_connections = 150
vacuum_mem = 65536
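(If my arithmetic is right, the shmmax figure works out fine: 6000 buffers x 8 KB = about 49 MB of shared memory, so kernel.shmmax = 60000000 bytes leaves some headroom.)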
 
What do you think of these settings? Do you have any other hints for optimizing PostgreSQL?
 
Many many thanks in advance    :)
 

Kind regards,

Alexander Priem
CICT Solutions
Email: ap@cict.nl
Internet: www.cict.nl

Re: Tuning PostgreSQL

From
"Shridhar Daithankar"
Date:
On 21 Jul 2003 at 10:31, Alexander Priem wrote:
> What I am thinking about is buying a server with the following specifications:
>
> * 1 or 2 Intel Xeon processors (2.4 GHz).
> * 2 Gigabytes of RAM (DDR/ECC).
> * Three 36Gb SCSI160 disks (10.000rpm) in a RAID-5 config, giving 72Gb storage
> space (right?). The RAID-5 controller has a (hardware) cache of 128Mb.
> * 100Mbit ethernet.
>
> I will run Red Hat Linux 9 (kernel 2.4.20) with PostgreSQL 7.3.3 on this server.

You might scale down a little on the hardware front if required. Of course, if you
can get it, get it.

> What would you think of this hardware config? Would it do? Or would 4Gb RAM be
> a lot better? What do you think about the need for two Xeon procs?

I would say get an SMP board with one processor in it. If required, you can
upgrade later. I suppose that would make a hefty difference in price.

> shared_buffers = 6000(kernel.shmmax = 60000000)
> sort_mem = 4096
> max_connections = 150
> vacuum_mem = 65536

Beyond those, look at:
- effective_cache_size
- noatime for the data partition
- a good filesystem
- WAL on a separate drive

Now that is a good start.

Bye
 Shridhar

--
QOTD:    "I'm on a seafood diet -- I see food and I eat it."


Re: Tuning PostgreSQL

From
"Shridhar Daithankar"
Date:
Hi Alexander ,

On 21 Jul 2003 at 11:23, Alexander Priem wrote:
> So the memory settings I specified are pretty much OK?

As of now, yes. You need to test with these settings and make sure that they
perform as per your requirements. That tweaking will always be there...

> What would be good guidelines for setting effective_cache_size, noatime ?

I suggest you look at
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html.

That should help you.

> I use ext3 filesystem, which probably is not the best performer, is it?

No. You also need to check ext2, reiser and XFS. There is no agreement between
users as to what works best. You need to benchmark and decide.

> I will set the WAL on a separate drive. What do I need to change in the conf
> files to achieve this?

Nothing in the conf files. You need to shut down the PostgreSQL server process
and symlink the WAL (pg_xlog) and clog (pg_clog) directories in the database
cluster to another location. That should do it.
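Something like this, assuming the cluster lives in /var/lib/pgsql/data and the
second drive is mounted at /mnt/wal (the paths are only examples, adjust them
to your layout):

    pg_ctl -D /var/lib/pgsql/data stop
    # move the WAL directory to the other drive and link it back
    mv /var/lib/pgsql/data/pg_xlog /mnt/wal/pg_xlog
    ln -s /mnt/wal/pg_xlog /var/lib/pgsql/data/pg_xlog
    pg_ctl -D /var/lib/pgsql/data start

The same trick works for pg_clog.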

HTH

Bye
 Shridhar

--
Meade's Maxim:    Always remember that you are absolutely unique, just like everyone else.


Re: Tuning PostgreSQL

From
"Alexander Priem"
Date:
Thanks, I will look at the site you sent me and purchase some hardware. Then
I will run some benchmarks.

Kind regards,
Alexander.


Re: Tuning PostgreSQL

From
Ang Chin Han
Date:
Shridhar Daithankar wrote:
> On 21 Jul 2003 at 11:23, Alexander Priem wrote:

>>I use ext3 filesystem, which probably is not the best performer, is it?
>
> No. You also need to check ext2, reiser and XFS. There is no agreement between
> users as to what works best. You need to benchmark and decide.

Need? Maybe I'm a bit disillusioned, but do the filesystems really differ so
much in performance as to warrant the additional effort? (e.g. XFS doesn't
come with Red Hat 9 -- you'll have to patch the source and compile it
yourself).

Benchmarking it properly before deployment is tough: is the test load on the
db/fs representative of the actual load? Is a 0.5% reduction in CPU usage
worth it? Did you test for catastrophic failure by pulling the plug during
write operations (ext2) to see if the fs can handle it? Is the code base for
the particular fs stable enough? Are there obscure bugs in the fs?

For the record, we tried several filesystems, but stuck with 2.4.9's
ext3 (Red Hat Advanced Server). Didn't hit a load high enough for the
filesystem choices to matter after all. :(

--
Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386
GNU/Linux
   5:30pm  up 207 days,  8:35,  5 users,  load average: 5.33, 5.16, 5.21


Re: Tuning PostgreSQL

From
"Shridhar Daithankar"
Date:
On 21 Jul 2003 at 18:09, Ang Chin Han wrote:

> Shridhar Daithankar wrote:
> > On 21 Jul 2003 at 11:23, Alexander Priem wrote:
>
> >>I use ext3 filesystem, which probably is not the best performer, is it?
> >
> > No. You also need to check ext2, reiser and XFS. There is no agreement between
> > users as to what works best. You need to benchmark and decide.
>
> Need? Maybe I'm a bit disillusioned, but do the filesystems really differ so
> much in performance as to warrant the additional effort? (e.g. XFS doesn't
> come with Red Hat 9 -- you'll have to patch the source and compile it
> yourself).

Well, the benchmarking is not to prove which filesystem is fastest and most
feature-rich, but to find out which one suits your needs best.

> Benchmarking it properly before deployment is tough: is the test load on the
> db/fs representative of the actual load? Is a 0.5% reduction in CPU usage
> worth it? Did you test for catastrophic failure by pulling the plug during
> write operations (ext2) to see if the fs can handle it? Is the code base for
> the particular fs stable enough? Are there obscure bugs in the fs?

Well, that is what the 'benchmark' is supposed to find out. Call it
pre-deployment testing or whatever other fancy name one sees fit. But it is a
must in almost all serious usage.

> For the record, we tried several filesystems, but stuck with 2.4.9's
> ext3 (Red Hat Advanced Server). Didn't hit a load high enough for the
> filesystem choices to matter after all. :(

Good for you. You have time at hand to find out which one suits you best. Do
the testing before you have a load that needs another FS. :-)

Bye
 Shridhar

--
It would be illogical to assume that all conditions remain stable.  -- Spock, "The Enterprise Incident", stardate 5027.3


Re: Tuning PostgreSQL

From
Ang Chin Han
Date:
Shridhar Daithankar wrote:

> Good for you. You have time at hand to find out which one suits you best. Do
> the testing before you have a load that needs another FS. :-)

Kinda my point is that when we've got more load, we'd be using RAID-0 over
RAID-5, or getting faster SCSI drives, or even turning fsync off if that's
a bottleneck, because the different filesystems do not have that much
performance difference[1] -- the filesystem is not a bottleneck. You just
need to tweak most of them a bit, like noatime,data=writeback.

[1] That is, AFAIK, from our testing. Please, please correct me if I'm
wrong: has anyone found that different filesystems produce wildly
different performance for PostgreSQL, FreeBSD's filesystems not included?

--
Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386
GNU/Linux
   7:00pm  up 207 days, 10:05,  5 users,  load average: 5.00, 5.03, 5.06


Re: Tuning PostgreSQL

From
"Shridhar Daithankar"
Date:
On 21 Jul 2003 at 19:27, Ang Chin Han wrote:
> [1] That is, AFAIK, from our testing. Please, please correct me if I'm
> wrong: has anyone found that different filesystems produce wildly
> different performance for PostgreSQL, FreeBSD's filesystems not included?

Well, when PostgreSQL starts splitting table files after a gig, the filesystem
sure makes a difference. IIRC, from my last test XFS was at least 10-15%
faster than reiserfs for such databases. That was around a year back, with
Mandrake 8.0.


Bye
 Shridhar

--
modesty, n.:    Being comfortable that others will discover your greatness.


Re: Tuning PostgreSQL

From
"Alexander Priem"
Date:
So where can I set the noatime & data=writeback variables? They are not
PostgreSQL settings, but rather Linux settings, right? Where can I find
these?

Kind regards,
Alexander Priem.



Re: Tuning PostgreSQL

From
"Shridhar Daithankar"
Date:
On 21 Jul 2003 at 13:45, Alexander Priem wrote:

> So where can I set the noatime & data=writeback variables? They are not
> PostgreSQL settings, but rather Linux settings, right? Where can I find
> these?

These are typically set in /etc/fstab. They are mount options; see man mount
for more details.

The second setting data=writeback is ext3 specific, IIRC.
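For example, an /etc/fstab entry for the data partition might look like this
(the device and mount point here are only placeholders):

    /dev/sdb1  /var/lib/pgsql  ext3  defaults,noatime,data=writeback  1 2

Keep in mind that data=writeback journals only metadata, not file contents,
so it trades some crash safety of file data for speed.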

HTH

Bye
 Shridhar

--
History tends to exaggerate.  -- Col. Green, "The Savage Curtain", stardate 5906.4


Re: Tuning PostgreSQL

From
"Alexander Priem"
Date:
Thanks, I'll look further into these mount settings.

I was just thinking: the server will have a (RAID) controller containing
128Mb of battery-backed cache memory. This would really speed up inserts to
the disk and would also prevent data loss in case of a power-down.

What would you guys think of not using RAID5 in that case, but just a really
fast 15.000 rpm SCSI-320 disk?

Kind regards,
Alexander.



Re: Tuning PostgreSQL

From
"Roman Fail"
Date:
> What would you guys think of not using RAID5 in that case, but just a really
> fast 15.000 rpm SCSI-320 disk?

 
I'd say you must be able to tolerate losing all the data since your last
database backup.  Your battery-backed cache, rotational speed, and transfer
rate aren't going to help at all when the drive itself degrades and corrupts
data.  If you can really only afford 3 drives, I'd have a single drive with
the OS & WAL on it, and the data on a RAID-1 mirror set using the other 2
drives.  If you need more space for data, or want your OS drives to be
mirrored, it's going to cost more.  See if you can get 2x18GB drives for the
OS and 2x73GB drives for the data.
 
 
You have to consider how much headache that small amount of additional money
is going to save you (and your users) down the road.
 
 
Roman



Re: Tuning PostgreSQL

From
"Alexander Priem"
Date:
That's true, certainly, and with four disks (2x18 and 2x72 or 36), I would
be able to (a) be safe and (b) split the data and WAL.

Hmmm. Seems to me that this setup would be better than one RAID5 with three
36Gb disks, wouldn't you think so? With one RAID5 array, I would still have
the data and the WAL on one volume...

Thanks for all your help so far.

Kind regards,
Alexander Priem.






Re: Tuning PostgreSQL

From
Josh Berkus
Date:
Alexander,

> Hmmm. Seems to me that this setup would be better than one RAID5 with three
> 36Gb disks, wouldn't you think so? With one RAID5 array, I would still have
> the data and the WAL on one volume...

Definitely.   As I've said, my experience with RAID5 is that with less than 5
disks, it performs at around 40% of a single SCSI disk for large read-write
operations on Postgres.

If you have only 3 disks, I'd advocate one disk for WAL and one RAID 1 array
for the database.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Tuning PostgreSQL

From
Vincent van Leeuwen
Date:
On 2003-07-21 09:06:10 -0700, Josh Berkus wrote:
> Alexander,
>
> > Hmmm. Seems to me that this setup would be better than one RAID5 with three
> > 36Gb disks, wouldn't you think so? With one RAID5 array, I would still have
> > the data and the WAL on one volume...
>
> Definitely.   As I've said, my experience with RAID5 is that with less than 5
> disks, it performs at around 40% of a single SCSI disk for large read-write
> operations on Postgres.
>
> If you have only 3 disks, I'd advocate one disk for WAL and one RAID 1 array
> for the database.
>

In this setup your database is still screwed if a single disk (the WAL disk)
stops working. You'll have to revert to your last backup if this happens. The
RAID-1 redundancy on your data disks buys you almost nothing: marginally
better performance and no real redundancy should a single disk fail.

I'd use RAID-5 if you absolutely cannot use more disks, but I would use
RAID-10 or two RAID-1 partitions if you can afford to use 4 disks.

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

Re: Tuning PostgreSQL

From
"Alexander Priem"
Date:
Hi all,

Vincent, you said that using RAID1, you don't have real redundancy. But
RAID1 is mirroring, right? So if one of the two disks should fail, there
should be no data lost, right?

I have been thinking some more. 18Gb drives are cheaper than 36 or 72Gb
drives. I don't know if I can get the money for this, but how would the
following setup sound?

Two 18Gb (15.000rpm) disks in RAID1 array for Operating System + WAL.
Four 18Gb (15.000rpm) disks in RAID5 array for data.

For the same amount of money, I could also get:

Two 36Gb (10.000rpm) disks in RAID1 array for Operating System + WAL.
Five/Six 36Gb (10.000rpm) disks in RAID5 array for data.

Which would be the best of the above? The one with four 15k-rpm disks or the
one with five/six 10k-rpm disks?
Would these configs be better than all disks in one huge RAID5 array? There
are so many possible configs with RAID.......

Kind regards,
Alexander Priem.





Re: Tuning PostgreSQL

From
Vincent van Leeuwen
Date:
On 2003-07-22 09:04:42 +0200, Alexander Priem wrote:
> Hi all,
>
> Vincent, You said that using RAID1, you don't have real redundancy. But
> RAID1 is mirroring, right? So if one of the two disks should fail, there
> should be no data lost, right?
>

Right. But the proposal was a single disk for WAL, without redundancy, and I
argued that wasn't really safe. RAID1 by itself is extremely safe, possibly
even the safest RAID type there is.

> I have been thinking some more. 18Gb drives are cheaper than 36 or 72Gb
> drives. I don't know if I can get the money for this, but how would the
> following setup sound?
>
> Two 18Gb (15.000rpm) disks in RAID1 array for Operating System + WAL.
> Four 18Gb (15.000rpm) disks in RAID5 array for data.
>

Our own testing has shown that a 6-disk RAID-10 array is faster than what you
describe. Of course, this is very much dependent on how many INSERTs/UPDATEs
you generate (which tax your WAL more), so your mileage may vary.

> For the same amount of money, I could also get:
>
> Two 36Gb (10.000rpm) disks in RAID1 array for Operating System + WAL.
> Five/Six 36Gb (10.000rpm) disks in RAID5 array for data.
>

It is said that a higher RPM is particularly useful for a WAL disk. So you
might consider using two 18GB 15K rpm drives for a RAID-1 WAL disk (+OS and
swap), and using 36GB 10K rpm disks in a RAID-5 array if you need that
diskspace.

> Which would be the best of the above? The one with four 15k-rpm disks or the
> one with five/six 10k-rpm disks?
> Would these configs be better than all disks in one huge RAID5 array? There
> are so many possible configs with RAID.......
>

15K rpm disks are significantly faster than 10K rpm disks. If your only
concern is performance, buy 15K rpm disks. If you want more diskspace for your
money, fall back to larger 10K rpm disks.

I personally think separate WAL disks are vastly overrated, since they haven't
shown a big performance gain in our own tests. But as I have said, this is
extremely dependent on the type of load you generate, so only your own tests
can tell you what you should do in this respect.

About RAID types: the fastest RAID type by far is RAID-10. However, this will
cost you a lot of useable diskspace, so it isn't for everyone. You need at
least 4 disks for a RAID-10 array. RAID-5 is a nice compromise if you want as
much useable diskspace as possible and still want to be redundant. RAID-1 is
very useful for small (2-disk) arrays.

If you have the time and are settled on buying 6 disks, I'd test the following
scenarios:
- 6-disk RAID-10 array (should perform best)
- 4-disk RAID-10 array containing data, 2-disk RAID-1 array for WAL, OS, etc
- 4-disk RAID-5 array containing data, 2-disk RAID-1 array for WAL, OS, etc
- 6-disk RAID-5 array (will probably perform worst)


Hope this helps.

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

Re: Tuning PostgreSQL

From
Ron Johnson
Date:
On Mon, 2003-07-21 at 04:33, Shridhar Daithankar wrote:
> Hi Alexander ,
>
> On 21 Jul 2003 at 11:23, Alexander Priem wrote:
[snip]
> > I use ext3 filesystem, which probably is not the best performer, is it?
>
> No. You also need to check ext2, reiser and XFS. There is no agreement between
> users as to what works best. You need to benchmark and decide.

According to Jeremy Allison of Samba: "They used ext3, which is one of the
slowest filesystems on Linux. In a real comparative test, you would use
XFS."
http://www.linuxworld.com/story/32673.htm

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Tuning PostgreSQL

From
"Alexander Priem"
Date:
Wow, I never figured how many different RAID configurations one could think
of   :)

After reading lots of material, forums and of course this mailing list, I
think I am going for a RAID5 configuration of 6 disks (18Gb, 15.000 rpm
each); one of those six disks will be a 'hot spare'. I will just put the OS,
the WAL and the data on one volume. RAID10 is way too expensive   :)

If I understand correctly, this will give great read-performance, but less
write-performance. But since this server will be equipped with an embedded
RAID controller featuring 128Mb of battery-backed cache, I figure that this
controller will negate that (at least somewhat). I will need to find out
whether this cache can be configured so that it will ONLY cache WRITES, not
READS....

Also because of this battery backed cache controller, I will go for the ext2
file system, mounted with 'noatime'. I will use a UPS, so I don't think I
need the journaling of ext3. XFS is not natively supported by RedHat and I
will go for the easy way here   :)

1 Gb of RAM should be enough, I think. That is about the only point that
almost everyone agrees on   :)   Do you think ECC is very important? The
server I have in mind does not support it. Another one does, but it is about
1.000 euros more expensive   :(

One CPU should also be enough.

As for postgresql.conf settings, I think I will start with the following:

max_connections = 128
superuser_reserved_connections = 1
shared_buffers = 8192
max_fsm_relations = 1000
max_fsm_pages = 100000
wal_buffers = 32
sort_mem = 2048
vacuum_mem = 32768
effective_cache_size = 28672 (this one I'm not sure about, maybe this one
needs to be higher)
random_page_cost = 2
geqo_threshold = 20
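For effective_cache_size I used my own back-of-the-envelope method (so treat
it with suspicion): 28672 pages x 8 KB = 224 MB. One way to estimate it on a
running box is to take the free + buffers + cached figures from 'free' and
divide by the 8 KB page size:

    free -b | awk '/^Mem:/ { printf "effective_cache_size = %d\n", ($4+$6+$7)/8192 }'

On a 1 Gb machine that does little besides PostgreSQL, that would probably
suggest a value well above 28672, which is why I suspect it needs to be
higher.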

This pretty much sums it up. What do you think about this config? It may not
be the fastest, but a server like this will cost about 4750 euros, and that
is including an Intel Xeon 2.4GHz cpu and a redundant power supply, WITHOUT
the UPS. Seems very reasonable to me...

Kind regards,
Alexander Priem.




Re: Tuning PostgreSQL

From
Andrew Sullivan
Date:
On Tue, Jul 22, 2003 at 03:27:20PM +0200, Alexander Priem wrote:

> file system, mounted with 'noatime'. I will use a UPS, so I don't think I
> need the journaling of ext3. XFS is not natively supported by RedHat and I

Just in case you're still thinking: why do you suppose that only power
failures lead to system crashes?  Surprise kernel panics due to bad hardware
or OS upgrades with bugs in them, sudden failures because of bad memory, &c:
all these things can also lead to crashes, and though super-redundant
hardware can mitigate that risk, it can't eliminate it completely.  This is
not advice, of course, but for my money, it's a bad idea not to use a
journalled filesystem (or something similar) for production systems.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Tuning PostgreSQL

From
Ron Johnson
Date:
On Tue, 2003-07-22 at 07:53, Alexander Priem wrote:
> Wow, I never figured how many different RAID configurations one could think
[snip]
> Also because of this battery backed cache controller, I will go for the ext2
> file system, mounted with 'noatime'. I will use a UPS, so I don't think I
> need the journaling of ext3.

Oooooo, I don't think I'd do that!!!!!  It's akin to saying, "I don't need
to make backups, because I have RAID[1,5,10,1+0]."

If the power is out for 26 minutes and your UPS only lasts for 25 minutes,
you could be in for a long, painful boot process if the box crashes.  (For
example, the UPS auto-shutdown daemon doesn't work properly, and no one can
get to the console to shut it down properly before the batteries die.)


--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Tuning PostgreSQL

From
"Jim C. Nasby"
Date:
On Tue, Jul 22, 2003 at 03:27:20PM +0200, Alexander Priem wrote:
> Wow, I never figured how many different RAID configurations one could think
> of   :)
>
> After reading lots of material, forums and of course, this mailing-list, I
> think I am going for a RAID5 configuration of 6 disks (18Gb, 15.000 rpm
> each), one of those six disks will be a 'hot spare'. I will just put the OS,
> the WAL and the data on one volume. RAID10 is way too expensive   :)
>
> If I understand correctly, this will give great read-performance, but less
> write-performance. But since this server will be equipped with an embedded
> RAID controller featuring 128Mb of battery-backed cache, I figure that this
> controller will negate that (at least somewhat). I will need to find out
> whether this cache can be configured so that it will ONLY cache WRITES, not
> READS....

I think the bigger issue with RAID5 write performance in a database is
that it hits every spindle. The real performance bottleneck you run into
is latency, especially the latency of positioning the heads. I don't
have any proof of this theory, but I believe this is why moving WAL
and/or temp_db to separate drives from the main database files can be a
big benefit for some applications; not because of disk bandwidth but
because it drastically cuts down the amount of time the heads have to
spend flying around the disk.

Of course, this is also highly dependent on how the filesystem operates. If
it puts your WALs, temp_db, and database files very close to each other on
the drive, splitting them out to separate spindles won't help as much.
--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Tuning PostgreSQL

From
"Alexander Priem"
Date:
OK, another change of plans :)

ext2 seems to be a bad idea. So I'll stick with ext3. Better safe than
sorry...

About the RAID-config: Maybe RAID-10 with six disks is affordable after all.
I would have to take the smallest disks in this case, 18Gb per disk. So six
18Gb disks (15000rpm) would result in a total capacity of 54 Gb, right? This
volume would hold OS, WAL and data, but since RAID10 appears to deliver such
great performance (according to several people), in combination with the
128Mb of battery backed cache, this would be a good solution?

Hmmm. I keep changing my mind about this. My DB would be mostly 'selecting',
but there would also be quite a lot of inserting and updating. But most of
the work would be selects. So would this config be OK?

Kind regards,
Alexander.




Re: Tuning PostgreSQL

From
Ron Johnson
Date:
On Tue, 2003-07-22 at 10:01, Alexander Priem wrote:
> OK, another change of plans :)
>
> ext2 seems to be a bad idea. So I'll stick with ext3. Better safe than
> sorry...

Don't forget noatime!

> About the RAID-config: Maybe RAID-10 with six disks is affordable after all.
> I would have to take the smallest disks in this case, 18Gb per disk. So six
> 18Gb disks (15000rpm) would result in a total capacity of 54 Gb, right? This
> volume would hold OS, WAL and data, but since RAID10 appears to deliver such
> great performance (according to several people), in combination with the
> 128Mb of battery backed cache, this would be a good solution?
>
> Hmmm. I keep changing my mind about this. My DB would be mostly 'selecting',
> but there would also be quite a lot of inserting and updating. But most of
> the work would be selects. So would this config be OK?

Others may disagree, but I'd put the OS and executables on a separate
disk from the db and WAL, and make it an IDE drive, since it's so
much less expensive than SCSI disks.  (Make a copy of the disk, and
if it craps out, pop out the old disk, stick in the new disk, and
fire the box right back up...)

Thus, you'll have an OS/executables disk, and a separate DB disk,
and never the twain shall meet.  Theoretically, you could pick up
those 6 drives and controller, move them to another machine, and
the data should be just as it was on the other box.

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Tuning PostgreSQL

From
Vivek Khera
Date:
>>>>> "AP" == Alexander Priem <ap@cict.nl> writes:

AP> Hmmm. I keep changing my mind about this. My DB would be mostly
AP> 'selecting', but there would also be quite a lot of inserting and
AP> updating. But most of the work would be selects. So would this
AP> config be OK?

I'm about to order a new server.  I haven't decided exactly how many
disks I will get, but my plan is to get an 8-disk RAID10 with 15k RPM
drives.  I don't need the volume, just the speed and number of
spindles, so I'm buying the smallest drives that meet my speed
requirement, probably 18Gb each (sheesh! I remember getting my first
5Mb disk for my 8088 PC in college and thinking that was too much space).

My mix is nearly even read/write, but probably a little biased towards
the reading.

This machine is replacing a 5-disk box that was switched from RAID5 to
4-disk RAID10 for data plus one system disk in January (what a pain
that was to re-index, but that's another story).  The switch from
RAID5 to RAID10 made an enormous improvement in performance.  The
speedup wasn't from recreating the database:  It was restored from a
file-level backup so the actual files were not compacted or secretly
"improved" in any way, other than my occasional reindexing.

So I think your 6-disk RAID10 will be good.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Tuning PostgreSQL

From
Bruno Wolff III
Date:
On Tue, Jul 22, 2003 at 11:40:35 +0200,
  Vincent van Leeuwen <pgsql.spam@vinz.nl> wrote:
>
> About RAID types: the fastest RAID type by far is RAID-10. However, this will
> cost you a lot of useable diskspace, so it isn't for everyone. You need at
> least 4 disks for a RAID-10 array. RAID-5 is a nice compromise if you want as
> much useable diskspace as possible and still want to be redundant. RAID-1 is
> very useful for small (2-disk) arrays.

Note that while raid 10 requires 4 disks, you get the space of 2 disks.
This is the same ratio as for raid 1.

Re: Tuning PostgreSQL

From
"scott.marlowe"
Date:
On Tue, 22 Jul 2003, Jim C. Nasby wrote:

> On Tue, Jul 22, 2003 at 03:27:20PM +0200, Alexander Priem wrote:
> > Wow, I never figured how many different RAID configurations one could think
> > of   :)
> >
> > After reading lots of material, forums and of course, this mailing-list, I
> > think I am going for a RAID5 configuration of 6 disks (18Gb, 15.000 rpm
> > each), one of those six disks will be a 'hot spare'. I will just put the OS,
> > the WAL and the data on one volume. RAID10 is way too expensive   :)
> >
> > If I understand correctly, this will give great read-performance, but less
> > write-performance. But since this server will be equipped with an embedded
> > RAID controller featuring 128Mb of battery-backed cache, I figure that this
> > controller will negate that (at least somewhat). I will need to find out
> > whether this cache can be configured so that it will ONLY cache WRITES, not
> > READS....
>
> I think the bigger issue with RAID5 write performance in a database is
> that it hits every spindle.

This is a common misconception.

If you are writing 4k out to a RAID5 of 10 disks, this is what happens:

(assuming 64k stripes...)
READ old data stripe (64k read)
READ old parity stripe (64k read)
make changes to the data stripe
XOR the old data, the new data, and the old parity to get the new parity stripe
write new parity stripe (64k)
write new data stripe (64k)

So it's not as bad as you might think.  No modern controller (or sw raid
for linux) hits all the spindles anymore for writes.  As you add more
drives to a RAID5, writes actually get faster on average, because there's
less chance of contention for the same drives (remember, parity moves
about in RAID5, so the parity disk isn't a choke point in RAID5 like
it is in RAID4.)
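(Put another way: a small random write costs a fixed four I/Os, two reads
plus two writes, regardless of how many disks are in the array, and the new
parity is computed as new_parity = old_parity XOR old_data XOR new_data.)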

> The real performance bottleneck you run into
> is latency, especially the latency of positioning the heads. I don't
> have any proof to this theory, but I believe this is why moving WAL
> and/or temp_db to seperate drives from the main database files can be a
> big benefit for some applications; not because of disk bandwidth but
> because it drastically cuts down the amount of time the heads have to
> spend flying around the disk.

This is absolutely true.  Moving the heads costs hugely.  While most
modern drives have SEEK times <10 ms, the SETTLE times tend to be about
that as well, followed by an average of about 3 ms of rotational latency
to allow the proper sector to come under the head (10krpm drives rotate once
about every 6 ms.)
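(The math there: a 10k rpm drive makes 10,000 rotations in 60 seconds, i.e.
one rotation every 6 ms, and on average the sector you want is half a
rotation away, hence the ~3 ms.)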



Re: Tuning PostgreSQL

From
Greg Stark
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:

> If you are writing 4k out to a RAID5 of 10 disks, this is what happens:
>
> (assuming 64k stripes...)
> READ old data stripe (64k read)
> READ old parity stripe (64k read)
> make changes to the data stripe
> XOR the old data, the new data, and the old parity to get the new parity stripe
> write new parity stripe (64k)
> write new data stripe (64k)
>
> So it's not as bad as you might think.

The main negative for RAID5 is that it has to do that extra READ. If you're
doing lots of tiny updates then the extra latency to have to go read the
parity block before it can write the parity block out is a real killer. For
that reason people prefer 0+1 for OLTP systems.

But you have to actually test your setup in practice to see if it hurts. A big
data warehousing system will be faster under RAID5 than under RAID1+0 because
of the extra disks in the stripeset. The more disks in the stripeset the more
bandwidth you get.

Even for OLTP systems I've had success with RAID5 or not depending largely on
the quality of the implementation. The Hitachi systems were amazing. They had
enough battery backed cache that the extra latency for the parity read/write
cycle really never showed up at all. But it had a lot more than 128M. I think
it had 1G and could be expanded.

--
greg

Re: Tuning PostgreSQL

From
Ron Johnson
Date:
On Thu, 2003-07-24 at 13:29, Greg Stark wrote:
> "scott.marlowe" <scott.marlowe@ihs.com> writes:
>
> > If you are writing 4k out to a RAID5 of 10 disks, this is what happens:
> >
> > (assuming 64k stripes...)
> > READ old data stripe (64k read)
> > READ old parity stripe (64k read)
> > make changes to the data stripe
> > XOR the old data, the new data, and the old parity to get the new parity stripe
> > write new parity stripe (64k)
> > write new data stripe (64k)
> >
> > So it's not as bad as you might think.
>
> The main negative for RAID5 is that it has to do that extra READ. If you're
> doing lots of tiny updates then the extra latency to have to go read the
> parity block before it can write the parity block out is a real killer. For
> that reason people prefer 0+1 for OLTP systems.
>
> But you have to actually test your setup in practice to see if it hurts. A big
> data warehousing system will be faster under RAID5 than under RAID1+0 because
> of the extra disks in the stripeset. The more disks in the stripeset the more
> bandwidth you get.
>
> Even for OLTP systems I've had success with RAID5 or not depending largely on
> the quality of the implementation. The Hitachi systems were amazing. They had
> enough battery backed cache that the extra latency for the parity read/write
> cycle really never showed up at all. But it had a lot more than 128M. I think
> it had 1G and could be expanded.

Your last paragraph just stole the objection to the 1st paragraph
right out of my mouth, since enough cache will allow it to "batch"
all those tiny updates into big updates.  But those Hitachi controllers
weren't plugged into x86-type boxen, were they?

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Tuning PostgreSQL

From
Andrew McMillan
Date:
On Wed, 2003-07-23 at 00:53, Alexander Priem wrote:
> Wow, I never figured how many different RAID configurations one could think
> of   :)
>
> After reading lots of material, forums and of course, this mailing-list, I
> think I am going for a RAID5 configuration of 6 disks (18Gb, 15.000 rpm
> each), one of those six disks will be a 'hot spare'. I will just put the OS,
> the WAL and the data on one volume. RAID10 is way too expensive   :)

The general heuristic is that RAID-5 is not the way to deal with
databases.  Now surely someone will disagree with me, but as I
understand it RAID-5 has a bottleneck on a single disk for the
(checksum) information.  Bottleneck is not the word you want to hear in
the context of "database server".

RAID-1 (mirroring) or RAID-10 (sort-of-mirrored-RAID-5) is the best
choice.

As far as FS performance goes, a year or two ago I remember someone
doing an evaluation of FS performance for PostgreSQL and they found that
the best performance was...

FAT

Yep: FAT

The reason is that a lot of what the database is doing, especially
guaranteeing writes (WAL) and so forth is best handled through a
filesystem that does not get in the way.  The fundamentals will not have
changed.

It is for this reason that ext2 is very much likely to be better than
ext3.  XFS is possibly (maybe, perhaps) OK, because there are
optimisations in there for databases, but the best optimisation is to
not be there at all.  That's why Oracle want direct IO to disk
partitions so they can implement their own "filesystem" (i.e. record
system... table system...) on a raw partition.

Personally I don't plan to reboot my DB server more than once a year, if
that (even my_laptop currently has 37 days uptime, not including suspend).
On our DB servers I use ext2 (rather than ext3) mounted with noatime, and I
bite the 15 minutes to fsck (once a year) rather than screw general
performance by running a journalling database on top of a journalling FS.
I split pg_xlog onto a separate physical disk, if performance requirements
are extreme.

Catalyst's last significant project was to write the Domain Name
registration system for .nz (using PostgreSQL).  Currently we are
developing the electoral roll for the same country (2.8 million electors
living at 1.4 million addresses).  We use Oracle (or Progress, or MySQL)
if a client demands them, but we use PostgreSQL if we get to choose.
Increasingly we get to choose.  Good.

Regards,
                    Andrew.
--
---------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/         PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201     MOB: +64(21)635-694    OFFICE: +64(4)499-2267
           Survey for nothing with http://survey.net.nz/
---------------------------------------------------------------------


Re: Tuning PostgreSQL

From
"Arjen van der Meijden"
Date:
> Andrew McMillan wrote:
>
> The general heuristic is that RAID-5 is not the way to deal
> with databases.  Now surely someone will disagree with me,
> but as I understand it RAID-5 has a bottleneck on a single
> disk for the
> (checksum) information.  Bottleneck is not the word you want
> to hear in the context of "database server".
That's actually incorrect. There is no single "special" disk in a RAID-5;
you might be mistaking it for RAID-3/4 (where a single disk holds the
checksum/parity). In RAID-5 the checksums are scattered across all the
hard disks.
RAID-5's weak point is write performance, but a decent RAID controller
running RAID-5 can outperform a less decent controller (with the same
hard disks) running RAID-10, on both read and write performance.
With a decent RAID controller you end up with "about the same" write
performance as with RAID-1, but slightly lower read performance.

At least, that's what I was able to gather from some tests of a
colleague of mine with different raid-setups.

> RAID-1 (mirroring) or RAID-10 (sort-of-mirrored-RAID-5) is
> the best choice.
RAID-10 is _not_ similar to RAID-5; it is RAID 1+0, i.e. a stripe of
mirrored sets (RAID-0 is more or less a stripe).

For databases, RAID-10 is supposed to be the fastest, since you get the
advantage of the striping for both reading and writing, while you also
get the advantage of the mirroring for reading.

The main disadvantage of RAID-1 (and also of RAID-10) is the heavy waste
of hard disk space. Another advantage of RAID-5 over RAID-10 is that,
when you don't care about space, RAID-5 is safer with four hard drives
than RAID-10 (i.e. set it up as 3 disks + 1 spare).
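To make the difference concrete, a four-disk sketch (An are the data
blocks of stripe A, P(A) its parity; the exact parity rotation varies by
implementation):

    RAID-5 (parity on all disks)      RAID-10 (stripe of mirrors)
    disk1  disk2  disk3  disk4        disk1  disk2  disk3  disk4
    A1     A2     A3     P(A)         A1     A1     A2     A2
    B1     B2     P(B)   B3           B1     B1     B2     B2
    C1     P(C)   C2     C3           C1     C1     C2     C2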

> As far as FS performance goes, a year or two ago I remember
> someone doing an evaluation of FS performance for PostgreSQL
> and they found that the best performance was...
>
> FAT
>
> Yep: FAT
FAT has a few disadvantages afaik; I wouldn't use it for my database, at
least.

> Personally I don't plan to reboot my DB server more than once
> a year (if that (even my_laptop currently has 37 days uptime,
> not including suspend).  On our DB servers I use ext2 (rather
> than ext3) mounted with noatime, and I bite the 15 minutes to
> fsck (once a year) rather than screw general performance with
> journalling database on top of journalling FS.  I split
> pg_xlog onto a separate physical disk, if performance
> requirements are extreme.
Well, rebooting is not a problem with ext2, but crashing might be... and
normally you don't plan a system crash ;)
Ext3 and XFS handle that much better.

Regards,

Arjen




Re: Tuning PostgreSQL

From
Tom Lane
Date:
"Arjen van der Meijden" <acmmailing@vulcanus.its.tudelft.nl> writes:
> Well, reboting is not a problem with ext2, but crashing might be... And
> normally you don't plan a systemcrash ;)
> Ext3 and xfs handle that much better.

A journaling filesystem is good to use if you can set it to journal
metadata but not file contents.  PG's WAL logic can recover lost file
contents, but we have no way to help out the filesystem if it's lost
metadata.
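
On ext3, for instance, that corresponds to the journalling mode chosen
at mount time (a sketch; the device and mount point are hypothetical):

    # journal metadata only; PostgreSQL's WAL covers the file contents
    mount -o data=writeback,noatime /dev/sdb1 /var/lib/pgsql

(The ext3 default, data=ordered, also journals only metadata, but it
additionally forces data blocks to disk before the metadata that
references them.)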

            regards, tom lane

Re: Tuning PostgreSQL

From
"Balazs Wellisch"
Date:
Since there seem to be a lot of different opinions regarding the various
different RAID configurations I thought I'd post this link to the list:
http://www.storagereview.com/guide2000/ref/hdd/perf/raid/index.html

This is the best resource for information on RAID and hard drive
performance I found online.

I hope this helps.

Balazs





Re: Tuning PostgreSQL

From
Josh Berkus
Date:
Balazs,

> Since there seem to be a lot of different opinions regarding the various
> different RAID configurations I thought I'd post this link to the list:
> http://www.storagereview.com/guide2000/ref/hdd/perf/raid/index.html

Yeah ... this is a really good article.  Made me realize why "stripey"
RAID sucks for OLTP databases, unless you throw a lot of platters at them.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Tuning PostgreSQL

From
Vivek Khera
Date:
>>>>> "GS" == Greg Stark <gsstark@mit.edu> writes:

GS> "scott.marlowe" <scott.marlowe@ihs.com> writes:

GS> But you have to actually test your setup in practice to see if it
GS> hurts. A big data warehousing system will be faster under RAID5
GS> than under RAID1+0 because of the extra disks in the
GS> stripeset. The more disks in the stripeset the more bandwidth you
GS> get.

Anyone have ideas on 14 spindles?  I just ordered a disk subsystem
with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell
PERC3/DC controller (only 128MB cache, though).

My plan was to do RAID10, but I think I'll play with RAID5 again and
see which gives me the best performance.  Unfortunately, it is
difficult to recreate the highly fragmented tables I have now (vacuum
full takes over 14 hours on one of the tables) so I'm not sure how to
best compare them.
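
One way to put a comparable synthetic load on both layouts is
contrib/pgbench; the scale factor and client count here are
illustrative, not tuned for any particular box:

    pgbench -i -s 100 testdb       # initialize tables at scale factor 100
    pgbench -c 50 -t 1000 testdb   # 50 clients, 1000 transactions each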

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Tuning PostgreSQL

From
Ron Johnson
Date:
On Tue, 2003-07-29 at 10:14, Vivek Khera wrote:
> >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes:
>
> GS> "scott.marlowe" <scott.marlowe@ihs.com> writes:
>
> GS> But you have to actually test your setup in practice to see if it
> GS> hurts. A big data warehousing system will be faster under RAID5
> GS> than under RAID1+0 because of the extra disks in the
> GS> stripeset. The more disks in the stripeset the more bandwidth you
> GS> get.
>
> Anyone have ideas on 14 spindles?  I just ordered a disk subsystem
> with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell
> PERC3/DC controller (only 128MB cache, though).

14 drives on one SCSI card, eh?  I'd be worried about saturating
the bus.

Maybe it's an old rule of thumb, but I wouldn't fill a SCSI chain
more than half full.

> My plan was to do RAID10, but I think I'll play with RAID5 again and
> see which gives me the best performance.  Unfortunately, it is
> difficult to recreate the highly fragmented tables I have now (vacuum
> full takes over 14 hours on one of the tables) so I'm not sure how to
> best compare them.

Also IMO: if I needed something *really* high performance, I'd
start with a mobo that has dual PCI buses (133MB/s can get swamped
quickly by U320 devices) or PCI-X (but they're so new...).

Then:
- get dual U320 SCSI cards (one for each PCI bus)
- plug them into dual redundant fast path external storage controllers
  that have, oh, 512MB RAM cache *each*
- dual port the drives, so they plug into both storage controllers

Since I wouldn't put 14 drives on one SCSI chain, double what I
just said, and plug only 7 drives into each controller.

If your app needs One Big Honkin' Device, use the Linux Volume
Manager (LVM) to merge the 2 RAID logical devices into one "super-
logical" device.

Yes, that's lots of money, but are the data, and the speed, important
enough?

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Tuning PostgreSQL

From
Will LaShell
Date:
On Tue, 2003-07-29 at 08:14, Vivek Khera wrote:
> >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes:
>
> GS> "scott.marlowe" <scott.marlowe@ihs.com> writes:
>
> GS> But you have to actually test your setup in practice to see if it
> GS> hurts. A big data warehousing system will be faster under RAID5
> GS> than under RAID1+0 because of the extra disks in the
> GS> stripeset. The more disks in the stripeset the more bandwidth you
> GS> get.
>
> Anyone have ideas on 14 spindles?  I just ordered a disk subsystem
> with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell
> PERC3/DC controller (only 128MB cache, though).
<SNIP>

Hey, one comment on this. With the Dell PERC3/DC you should check the
megaraid-devel list to find the best BIOS settings for maximum
performance.  There have been many comments on it, and trials to get it
going really well.  All told, though, I totally love the LSI MegaRAID
(which is what the PERC3/DC is) controllers. We use the Elite 1650 with
Seagate Cheetah drives for a nice little array.

--Will


Re: Tuning PostgreSQL

From
"scott.marlowe"
Date:
On 29 Jul 2003, Ron Johnson wrote:

> On Tue, 2003-07-29 at 10:14, Vivek Khera wrote:
> > >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes:
> >
> > GS> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> >
> > GS> But you have to actually test your setup in practice to see if it
> > GS> hurts. A big data warehousing system will be faster under RAID5
> > GS> than under RAID1+0 because of the extra disks in the
> > GS> stripeset. The more disks in the stripeset the more bandwidth you
> > GS> get.
> >
> > Anyone have ideas on 14 spindles?  I just ordered a disk subsystem
> > with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell
> > PERC3/DC controller (only 128MB cache, though).
>
> 14 drives on one SCSI card, eh?  I'd be worried about saturating
> the bus.

I'm pretty sure those PERCs are based on the megaraid cards, which can
handle 3 or 4 channels each...

> Maybe it's an old rule of thumb, but I wouldn't fill a SCSI chain
> more than half full.

It's an old rule of thumb, but it still applies, it just takes more drives
to saturate the channel.  Figure ~ 30 to 50 MBytes a second per drive, on
a U320 port it would take 10 drives to saturate it, and considering random
accesses will be much slower than the max ~30 megs a second off the
platter rate, it might take more than the max 14 drives to saturate U320.
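
Spelled out with round numbers (the per-drive rates are the estimates
above, not measurements):

    U320 bus:                          320 MB/s
    sequential, ~32 MB/s per drive:    320 / 32 = ~10 drives to saturate
    random, say ~8 MB/s per drive:     320 / 8  = ~40 drives, far more
                                       than the 14 discussed here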


Re: Tuning PostgreSQL

From
Ron Johnson
Date:
On Tue, 2003-07-29 at 11:18, scott.marlowe wrote:
> On 29 Jul 2003, Ron Johnson wrote:
>
> > [snip]
> >
> > 14 drives on one SCSI card, eh?  I'd be worried about saturating
> > the bus.
>
> I'm pretty sure those PERCs are based on the megaraid cards, which can
> handle 3 or 4 channels each...

Each with 14 devices?  If so, isn't that a concentrated point of
failure, even if the channels are 1/2 full?

> > Maybe it's an old rule of thumb, but I wouldn't fill a SCSI chain
> > more than half full.
>
> It's an old rule of thumb, but it still applies, it just takes more drives
> to saturate the channel.  Figure ~ 30 to 50 MBytes a second per drive, on
> a U320 port it would take 10 drives to saturate it, and considering random
> accesses will be much slower than the max ~30 megs a second off the
> platter rate, it might take more than the max 14 drives to saturate U320.

Ok.  You'd still saturate the 133MB/s PCI bus at 133/30 = 4.4 drives.

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Tuning PostgreSQL

From
"scott.marlowe"
Date:
On 29 Jul 2003, Ron Johnson wrote:

> On Tue, 2003-07-29 at 11:18, scott.marlowe wrote:
> > [snip]
> > >
> > > 14 drives on one SCSI card, eh?  I'd be worried about saturating
> > > the bus.
> >
> > I'm pretty sure those PERCs are based on the megaraid cards, which can
> > handle 3 or 4 channels each...
>
> Each with 14 devices?  If so, isn't that a concentrated point of
> failure, even if the channels are 1/2 full?

Yep.  I've built one once before when BIG hard drives were 9 gigs.  :-)

And it is a point of concentrated failure, which brings me to my favorite
part about the LSI MegaRAID cards (which most / all PERC3s are,
apparently).

If you build a RAID 1+0 or 0+1, you can separate it out so each sub-part is
on its own card, and the other cards keep acting like one big card.
Assuming the bad card isn't killing your PCI bus or draining the 12V rail
or something.

> > > Maybe it's an old rule of thumb, but I wouldn't fill a SCSI chain
> > > more than half full.
> >
> > It's an old rule of thumb, but it still applies, it just takes more drives
> > to saturate the channel.  Figure ~ 30 to 50 MBytes a second per drive, on
> > a U320 port it would take 10 drives to saturate it, and considering random
> > accesses will be much slower than the max ~30 megs a second off the
> > platter rate, it might take more than the max 14 drives to saturate U320.
>
> Ok.  You'd still saturate the 133MB/s PCI bus at 133/30 = 4.4 drives.

But that's seq scan.  For many database applications, random access
performance is much more important.  Imagine 200 people entering
reservations of 8k or less each into a transaction processing engine.
Each transaction's chance to hit an unoccupied spindle is what really
counts.  If there are 30 spindles, each doing a stripe's worth of access all
the time, it's likely to never flood the channel.

If random access is 1/4th the speed of seq scan, then you need to multiply
it by 4 to get the number of drives that'd saturate the PCI bus.
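
Carrying Ron's numbers through under that assumption:

    sequential: 133 MB/s PCI / ~30 MB/s per drive = ~4.4 drives
    random:     ~30 / 4 = ~7.5 MB/s per drive, so 133 / 7.5 = ~18 drives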



Re: Tuning PostgreSQL

From
Ron Johnson
Date:
On Tue, 2003-07-29 at 14:00, scott.marlowe wrote:
> [snip]
> > > > 14 drives on one SCSI card, eh?  I'd be worried about saturating
> > > > the bus.
> > >
> > > I'm pretty sure those PERCs are based on the megaraid cards, which can
> > > handle 3 or 4 channels each...
> >
> > Each with 14 devices?  If so, isn't that a concentrated point of
> > failure, even if the channels are 1/2 full?
>
> Yep.  I've built one once before when BIG hard drives were 9 gigs.  :-)
>
> And it is a point of concentrated failure, which brings me to my favorite
> part about the LSI MegaRAID cards (which most / all PERC3s are,
> apparently).
>
> If you build a RAID 1+0 or 0+1, you can separate it out so each sub-part is
> on its own card, and the other cards keep acting like one big card.
> Assuming the bad card isn't killing your PCI bus or draining the 12V rail
> or something.

Sounds like my kinda card!

Is the cache battery-backed up?

How much cache can you stuff in them?



> > [snip]
> >
> > Ok.  You'd still saturate the 133MB/s PCI bus at 133/30 = 4.4 drives.
>
> But that's seq scan.  For many database applications, random access
> performance is much more important.  Imagine 200 people entering
> reservations of 8k or less each into a transaction processing engine.
> Each transaction's chance to hit an unoccupied spindle is what really
> counts.  If there are 30 spindles, each doing a stripe's worth of access all
> the time, it's likely to never flood the channel.
>
> If random access is 1/4th the speed of seq scan, then you need to multiply
> it by 4 to get the number of drives that'd saturate the PCI bus.

Maybe it's just me, but I've never seen a purely TP system.

Even if you roll off the daily updates to a "reporting database" each
night, some yahoo manager with enough juice to have his way still
wants up-to-the-minute reports...

Better yet, the Access Jockey, who thinks s/he's an SQL whiz but
couldn't JOIN himself out of a paper bag...

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Tuning PostgreSQL

From
"scott.marlowe"
Date:
On 29 Jul 2003, Ron Johnson wrote:

> [snip]
>
> Sounds like my kinda card!
>
> Is the cache battery-backed up?

Yep

> How much cache can you stuff in them?

The old old old school MegaRAID 428 could hold up to 128 Meg.  I'm sure the
new ones can handle 512 Meg or more.

> > [snip]
>
> Maybe it's just me, but I've never seen a purely TP system.

I think most of them are running under TPF on a mainframe in a basement
somewhere, like for airline reservations.  I've never worked on one, but
met one of the guys who runs one, and they use 12 mainframes for 6 live
machines and each live machine has a failover machine behind it in sysplex
mode.  I kept thinking of the giant dinosaurs in Jurassic Park...

> Even if you roll off the daily updates to a "reporting database" each
> night, some yahoo manager with enough juice to have his way still
> wants up-to-the-minute reports...

Just because it's TP doesn't mean it doesn't have real time reporting.
But expensive reports probably do get run at night.

> Better yet, the Access Jockey, who thinks s/he's an SQL whiz but
> couldn't JOIN himself out of a paper bag...

I've seen a few who got joins and unions and what not, but explaining fks
or transactions got me a glazed look... :-)


Re: Tuning PostgreSQL

From
Ron Johnson
Date:
On Tue, 2003-07-29 at 15:09, scott.marlowe wrote:
> On 29 Jul 2003, Ron Johnson wrote:
>
[snip]
> >
> > Maybe it's just me, but I've never seen a purely TP system.
>
> I think most of them are running under TPF on a mainframe in a basement
> somewhere, like for airline reservations.  I've never worked on one, but
> met one of the guys who runs one, and they use 12 mainframes for 6 live
> machines and each live machine has a failover machine behind it in sysplex
> mode.  I kept thinking of the giant dinosaurs in Jurassic Park...

We have something similar running on Alphas and VMS; does about
8M Txn/day.  Anyone who uses E-ZPass in the northeast eventually
gets stuck in our systems.

(Made me fear Big Brother...)

> > Even if you roll off the daily updates to a "reporting database" each
> > night, some yahoo manager with enough juice to have his way still
> > wants up-to-the-minute reports...
>
> Just because it's TP doesn't mean it doesn't have real time reporting.
> But expensive reports probably do get run at night.

Yes, but...  There's always the exception.

> > Better yet, the Access Jockey, who thinks s/he's an SQL whiz but
> > couldn't JOIN himself out of a paper bag...
>
> I've seen a few who got joins and unions and what not, but explaining fks
> or transactions got me a glazed look... :-)

Wow!  They understood joins?  You lucky dog!!!

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Tuning PostgreSQL, pt 2

From
Ron Johnson
Date:
On Tue, 2003-07-29 at 15:38, Ron Johnson wrote:
> On Tue, 2003-07-29 at 15:09, scott.marlowe wrote:
> [snip]
> > I think most of them are running under TPF on a mainframe in a basement
> > somewhere, like for airline reservations.  I've never worked on one, but
> > met one of the guys who runs one, and they use 12 mainframes for 6 live
> > machines and each live machine has a failover machine behind it in sysplex
> > mode.  I kept thinking of the giant dinosaurs in Jurassic Park...
>
> We have something similar running on Alphas and VMS; does about
> 8M Txn/day.  Anyone who uses E-ZPass in the northeast eventually
> gets stuck in our systems.

Oh, forgot to mention:
yes, they are in a 2-deep basement.

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Tuning PostgreSQL

From
Vivek Khera
Date:
>>>>> "RJ" == Ron Johnson <ron.l.johnson@cox.net> writes:

RJ> On Tue, 2003-07-29 at 14:00, scott.marlowe wrote:

RJ> Sounds like my kinda card!

RJ> Is the cache battery-backed up?

yep


RJ> How much cache can you stuff in them?

As per Dell, the max is 128MB, which was a bummer.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

'View'-performance

From
"Alexander Priem"
Date:
Hi all,
 
I am wondering about something: I have a table (let's call it "Table") which will contain a lot of records. Every record has a field named "deleted" which can be either NULL or a date value. When this field is NULL, the record in question may be used by a program. If the field contains a date, the record must be considered "deleted" and cannot be used anymore.
 
The reason why I don't actually delete such records is that I want to be able to reference them from other locations (for history purposes).
 
What I am thinking about is creating two views for this table: Table_View and Table_History. Table_View would contain all records where "Deleted is null". Table_History would just contain all records (Select * From Table).
 
In my program most queries would need to view only the records where deleted is null.
 
Would " Select * from Table_View Where Name='xxx' " perform worse than " Select * from Table Where deleted is null and Name='xxx' " ?
 
I ask this because I would rather not have to type "where deleted is null" in about every query in my program. But I will not use this strategy if it would mean serious performance loss...
 
Thanks in Advance,
Alexander Priem.
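
A minimal sketch of the schema being described (the id and name columns
are illustrative; only "deleted" comes from the question):

    CREATE TABLE "Table" (
        id      serial PRIMARY KEY,
        name    text,
        deleted date    -- NULL = live, a date = logically deleted
    );

    CREATE VIEW Table_View AS      -- live records only
        SELECT * FROM "Table" WHERE deleted IS NULL;

    CREATE VIEW Table_History AS   -- everything, for history purposes
        SELECT * FROM "Table";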

Re: 'View'-performance

From
Tom Lane
Date:
"Alexander Priem" <ap@cict.nl> writes:
> What I am thinking about is creating two views for this table: Table_View
> and Table_History. Table_View would contain all records where "Deleted is
> null". Table_History would just contain all records (Select * From Table).

> Would " Select * from Table_View Where Name='xxx' " perform worse than
> " Select * from Table Where deleted is null and Name='xxx' " ?

They'd be exactly the same (modulo a few extra microseconds/milliseconds
for the query planner to expand the view definition).

            regards, tom lane
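
An easy way to confirm this on a test database is to compare the plans
(names taken from the sketch above):

    EXPLAIN SELECT * FROM Table_View WHERE name = 'xxx';
    EXPLAIN SELECT * FROM "Table" WHERE deleted IS NULL AND name = 'xxx';
    -- both should produce the same plan over "Table"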