Thread: Performance tuning in PostgreSQL

Performance tuning in PostgreSQL

From
Abhishek Sharma
Date:
Hello everybody,



Can someone tell me the various aspects of Hardware for performance tuning
in PostgreSQL 7.3 on a RHL 8 server.


How can I determine the required hardware configuration(CPUs,RAM,Shared
Buffer Space,OS TCP/IP ports) for n number of connections or n number of
users accessing the database?


Please help.



Thanks,
Best Regards.

Abhishek Sharma,


Re: Performance tuning in PostgreSQL

From
dev@archonet.com
Date:
> Can someone tell me the various aspects of Hardware for performance tuning
> in PostgreSQL 7.3 on a RHL 8 server.
>
> How can I determine the required hardware configuration(CPUs,RAM,Shared
> Buffer Space,OS TCP/IP ports) for n number of connections or n number of
> users accessing the database?

The bad news is there's only one way and that's to test. Without knowing
how much data you have, in what tables and how you are querying/updating
it you really can't say much useful.

A few points though.

CPU: PostgreSQL will use multiple CPUs but one query runs on only one CPU.
RAM: more is better, pretty much always
Disk: fast and see the archives for details on placing the WAL on its own
disk.
Tuning: there's a performance mailing list, and some stuff on
techdocs.postgresql.org

Hope that's some use

- Richard Huxton


Re: Performance tuning in PostgreSQL

From
Abhishek Sharma
Date:
Hi,

Tx for your reply.

Is there any way to determine the size of a process per connection.

What is it dependent on ?

What variables affect the size of a process by a user accessing a table in
the database ?

Postgresql can be configured with a --max-backends options which means the
no. of connections which can be established at any given time to the
server,which also means that there will be an equal no. of process.

In my opinion there should be some computation with regards to the amount of
RAM or shared buffer space and the no. of processes and size of these
processes.

I am using 256 MB Ram on a RHL 8 system. I have made the shared buffer space
190 MB i.e.. 75 % of my system. My database is going to be almost 80 GB.




Advice needed.



Tx,
Best regards,

Abhishek.




-----Original Message-----
From: dev@archonet.com [mailto:dev@archonet.com]
Sent: Tuesday, March 25, 2003 11:54 PM
To: Abhishek Sharma
Cc: Pgsql-General
Subject: Re: [GENERAL] Performance tuning in PostgreSQL


> Can someone tell me the various aspects of Hardware for performance tuning
> in PostgreSQL 7.3 on a RHL 8 server.
>
> How can I determine the required hardware configuration(CPUs,RAM,Shared
> Buffer Space,OS TCP/IP ports) for n number of connections or n number of
> users accessing the database?

The bad news is there's only one way and that's to test. Without knowing
how much data you have, in what tables and how you are querying/updating
it you really can't say much useful.

A few points though.

CPU: PostgreSQL will use multiple CPUs but one query runs on only one CPU.
RAM: more is better, pretty much always
Disk: fast and see the archives for details on placing the WAL on its own
disk.
Tuning: there's a performance mailing list, and some stuff on
techdocs.postgresql.org

Hope that's some use

- Richard Huxton


Re: Performance tuning in PostgreSQL

From
"scott.marlowe"
Date:
On Wed, 26 Mar 2003, Abhishek Sharma wrote:

> Is there any way to determine the size of a process per connection.

The delta of a new process is very small.  Since most of the memory it
will be accessing will be shared memory, and since most flavors of unix
just run the code in the same place as the other copies of it, the only
delta would be whatever small memory the individual process needs for
stack and local vars.  It's not much.

23234 postgres  12   0 66676  65M 62192 D    49.4  4.3   0:04 postmaster
23149 postgres   8   0  4216 4216  4136 S     0.0  0.2   1:15 postmaster
23150 postgres   9   0  5196 5196  4156 S     0.0  0.3   0:11 postmaster
23151 postgres   9   0  4780 4780  4148 S     0.0  0.3   0:08 postmaster
 3665 postgres   9   0 10096 9.8M  9024 S     0.0  0.6   0:09 postmaster
 3666 postgres   9   0 10764  10M  9700 S     0.0  0.6   0:21 postmaster

Here we see my box's postmasters that are up and running, some are for
psql, some are for web pages, one is my running an insane 'select * from
bigtable order by random()'

The first number after the 0 column is size, the third one is the shared
memory it's using.  Notice the delta on these is 1 to 4 megabytes or so.
The query doing the heavy lifting is about 4 megs, the others are about 1
meg deltas.

> What is it dependent on ?

EVERYTHING.  I.e. what the backend is doing will determine the amount of
memory it is using.  Plus the settings in postgresql.conf for things like
sort_mem and buffers.

> What variables affect the size of a process by a user accessing a table in
> the database ?
>
> Postgresql can be configured with a --max-backends options which means the
> no. of connections which can be established at any given time to the
> server,which also means that there will be an equal no. of process.
>
> In my opinion there should be some computation with regards to the amount of
> RAM or shared buffer space and the no. of processes and size of these
> processes.

Well, it's not that simple.  After experimenting with postgresql, if you
feel you have a handle on how to compute it, I'm sure it would gladly
accepted as a useful tool by all of the folks who use postgresql.

> I am using 256 MB Ram on a RHL 8 system. I have made the shared buffer space
> 190 MB i.e.. 75 % of my system. My database is going to be almost 80 GB.

Wow, that is seriously underpowered in the RAM department.  My 3 year old
server that handles web/database/ldap services has 1.5Gig and uses about
600 Megs of ram for programs with about 800+Megs for kernel cache and
buffer.

Also, it's not a good idea to just give postgresql huge amounts of buffer
memory.  Generally, the kernel is better at buffering the disk than
postgresql is, and cranking up postgresql's buffers to >50% of the
available RAM means that the kernel will always be playing catch up with
it's buffers, and postgresql will be the only layer buffering.

While we're at it, don't set sort_mem real high either, especially if you
plan on handling lots of users at the same time, as each sort is limited
to sort_mem, which means that a query with three sorts in it could use
3*sort_mem memory, and if that query gets run by a dozen people at once,
then you'd be looking at 3*12*sort_mem usage.  8 meg is a good
intermediate setting for sort_mem for most folks.

Recommendations:  Go to at least 1 Gig of ram.  Give 256 Megs or so to
postgresql buffers.  Anything after that is likely to not make for any
faster performance.  If you can fit in more than 1 gig then do so.  Memory
is your biggest limitation right now.

Use a fast RAID array setup.  Lots of disks in a RAID 5 is a good
compromise of performance and storage space.  Large RAID 0 setups are the
fastest, but a single drive failure can result in all your data being
lost.  >2 disks in a RAID 1 is a good setup for something that is mostly
read (95% reads or more).  Think data warehouse.

The problem is that you're asking how to optimize postgresql but how you
optimize any database is greatly affected by the type of load you're
looking at.  If you run batch files at night on 10G data files, then serve
them read only during the day, your needs are going to be vastly different
than if you are running a dynamic multi-user system with lots of writes
going on.

But you can't go wrong with more RAM.  How much can your server hold?
Memory bandwidth is more important than CPU speed for most postgresql
applications, and the same is true for the drives, having lots of little
fast drives is way better than one or two big slower ones.  SCSI is almost
always faster than IDE, all other things being equal (i.e. my 80 gig IDE
"mass storage" drives are way faster than a 2 Gig Ultra Wide SCSI drive
from 6 years ago would be, but any modern SCSI drive will kick the butt on
my IDE drives.


Re: Performance tuning in PostgreSQL

From
"Daniel R. Anderson"
Date:
<snip>
> SCSI is almost
> always faster than IDE, all other things being equal (i.e. my 80 gig IDE
> "mass storage" drives are way faster than a 2 Gig Ultra Wide SCSI drive
> from 6 years ago would be, but any modern SCSI drive will kick the butt on
> my IDE drives.
</snip>

That's not /entirely/ true.  There was an article on slashdot a while
back about what exactly the differences between IDE and SCSI are.  IDE
has pretty much almost caught up to SCSI in terms of everything except
testing -- i.e. one of the reasons SCSI drives cost so much more is that
they are each run through extensive individual tests to make sure
they're not gonna break down 5 minutes out of the box.

The only other difference, if I remember correctly, was the amount of
drives you could put on the same cable.  I'm going out on a limb here,
but while ATA133 or whatever you're running /needs/ a single cable and
controller to itself SCSI can put several drives on the same cable while
maintaining speed.

So the good news is that if money is tight you could probably justify an
IDE raid, but if you really need that extra reliability SCSI might be
the answer.
--
Daniel R. Anderson
Chief Lab Rat and Helper Monkey
Great Lakes Industries, Inc.
80 Pineview Ave.
Buffalo, NY 14218
(716) 691-5900 x218

"Never let your schooling interfere with your education"
    -- Mark Twain


Re: Performance tuning in PostgreSQL

From
Dennis Gearon
Date:
In General, the rotational speed is higher on SCSCI disks, and this increases
the tranfer rate from the disc, which is the limitation for anything not in the
disk's cache. Given the same areal dinsity, a 15,000 SCSI drive will be 50%
faster in tranfer rate than a 10,000 IDE drive.

Daniel R. Anderson wrote:
> <snip>
>
>>SCSI is almost
>>always faster than IDE, all other things being equal (i.e. my 80 gig IDE
>>"mass storage" drives are way faster than a 2 Gig Ultra Wide SCSI drive
>>from 6 years ago would be, but any modern SCSI drive will kick the butt on
>>my IDE drives.
>
> </snip>
>
> That's not /entirely/ true.  There was an article on slashdot a while
> back about what exactly the differences between IDE and SCSI are.  IDE
> has pretty much almost caught up to SCSI in terms of everything except
> testing -- i.e. one of the reasons SCSI drives cost so much more is that
> they are each run through extensive individual tests to make sure
> they're not gonna break down 5 minutes out of the box.
>
> The only other difference, if I remember correctly, was the amount of
> drives you could put on the same cable.  I'm going out on a limb here,
> but while ATA133 or whatever you're running /needs/ a single cable and
> controller to itself SCSI can put several drives on the same cable while
> maintaining speed.
>
> So the good news is that if money is tight you could probably justify an
> IDE raid, but if you really need that extra reliability SCSI might be
> the answer.


Re: Performance tuning in PostgreSQL

From
"Daniel R. Anderson"
Date:
On Wed, 2003-03-26 at 12:56, Dennis Gearon wrote:
> In General, the rotational speed is higher on SCSCI disks, and this increases
> the tranfer rate from the disc, which is the limitation for anything not in the
> disk's cache. Given the same areal dinsity, a 15,000 SCSI drive will be 50%
> faster in tranfer rate than a 10,000 IDE drive.

For anybody interested I got the story off of slashdot:

http://slashdot.org/article.pl?sid=03/02/21/0553249&mode=thread&tid=137

The claim is that these ATA drives have "SCSI-like specs at 30% less of
the price".  SCSI-LIKE != SCSI though.  :-(

--
Daniel R. Anderson
Chief Lab Rat and Helper Monkey
Great Lakes Industries, Inc.
80 Pineview Ave.
Buffalo, NY 14218
(716) 691-5900 x218

"Never let your schooling interfere with your education"
    -- Mark Twain


Re: Performance tuning in PostgreSQL

From
"Daniel R. Anderson"
Date:
In case you're interested in a do it yourself Fiber channel array check
out:

http://slashdot.org/article.pl?sid=03/03/16/0449201&mode=thread&tid=188&tid=137

(Fiber Channel means up to 10 Gbps!!!!!)

--
Daniel R. Anderson
Chief Lab Rat and Helper Monkey
Great Lakes Industries, Inc.
80 Pineview Ave.
Buffalo, NY 14218
(716) 691-5900 x218

"Never let your schooling interfere with your education"
    -- Mark Twain


Re: Performance tuning in PostgreSQL

From
Dennis Gearon
Date:
I read a recent article on Tom's hardware that said, even given the same speed
of data from the read heads and the same buffer size, a SCSI drive will work
better for a server, and the IDE drive will work better for the desktop. The
caching algorithms are optimised with the assumption that a SCSI drive will BE
on a server and an IDE drive will BE on a desktop.

Daniel R. Anderson wrote:
> On Wed, 2003-03-26 at 12:56, Dennis Gearon wrote:
>
>>In General, the rotational speed is higher on SCSCI disks, and this increases
>>the tranfer rate from the disc, which is the limitation for anything not in the
>>disk's cache. Given the same areal dinsity, a 15,000 SCSI drive will be 50%
>>faster in tranfer rate than a 10,000 IDE drive.
>
>
> For anybody interested I got the story off of slashdot:
>
> http://slashdot.org/article.pl?sid=03/02/21/0553249&mode=thread&tid=137
>
> The claim is that these ATA drives have "SCSI-like specs at 30% less of
> the price".  SCSI-LIKE != SCSI though.  :-(
>


Re: Performance tuning in PostgreSQL

From
Arjen van der Meijden
Date:
I've read something similar indeed.
Aswell as most IDE disks really do have less quality components. Scsi
disks have also smaller, thicker platters to enable them to run more
stable and on higher speeds, larger magnets etc etc.

For those of you who can read Dutch:
http://www.tweakers.net/reviews/362/3 In the second-last paragraph is a
summary of Maxtor's view on the differences between ATA and SCSI.

Regards,

Arjen

> -----Oorspronkelijk bericht-----
> Van: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] Namens Dennis Gearon
> Verzonden: donderdag 27 maart 2003 18:10
> Aan: Daniel R. Anderson
> CC: Pgsql-General
> Onderwerp: Re: [GENERAL] Performance tuning in PostgreSQL
>
>
> I read a recent article on Tom's hardware that said, even
> given the same speed
> of data from the read heads and the same buffer size, a SCSI
> drive will work
> better for a server, and the IDE drive will work better for
> the desktop. The
> caching algorithms are optimised with the assumption that a
> SCSI drive will BE
> on a server and an IDE drive will BE on a desktop.
>
> Daniel R. Anderson wrote:
> > On Wed, 2003-03-26 at 12:56, Dennis Gearon wrote:
> >
> >>In General, the rotational speed is higher on SCSCI disks, and this
> >>increases
> >>the tranfer rate from the disc, which is the limitation for
> anything not in the
> >>disk's cache. Given the same areal dinsity, a 15,000 SCSI
> drive will be 50%
> >>faster in tranfer rate than a 10,000 IDE drive.
> >
> >
> > For anybody interested I got the story off of slashdot:
> >
> >
> http://slashdot.org/article.pl?sid=>
03/02/21/0553249&mode=thread&tid=13
> > 7
> >
> > The claim is that these ATA drives have "SCSI-like specs at
> 30% less
> > of the price".  SCSI-LIKE != SCSI though.  :-(
> >
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo@postgresql.org
>