Re: Performance tuning in PostgreSQL - Mailing list pgsql-general

From scott.marlowe
Subject Re: Performance tuning in PostgreSQL
Date
Msg-id Pine.LNX.4.33.0303260909440.27731-100000@css120.ihs.com
In response to Re: Performance tuning in PostgreSQL  (Abhishek Sharma <abhisheks@dpsl.net>)
Responses Re: Performance tuning in PostgreSQL  ("Daniel R. Anderson" <dan@mathjunkies.com>)
List pgsql-general
On Wed, 26 Mar 2003, Abhishek Sharma wrote:

> Is there any way to determine the size of a process per connection?

The delta of a new process is very small.  Most of the memory it
touches is shared memory, and since most flavors of unix share the
program text among all running copies of a binary, the only delta is
whatever small amount of memory the individual process needs for its
stack and local variables.  It's not much.

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
23234 postgres  12   0 66676  65M 62192 D    49.4  4.3   0:04 postmaster
23149 postgres   8   0  4216 4216  4136 S     0.0  0.2   1:15 postmaster
23150 postgres   9   0  5196 5196  4156 S     0.0  0.3   0:11 postmaster
23151 postgres   9   0  4780 4780  4148 S     0.0  0.3   0:08 postmaster
 3665 postgres   9   0 10096 9.8M  9024 S     0.0  0.6   0:09 postmaster
 3666 postgres   9   0 10764  10M  9700 S     0.0  0.6   0:21 postmaster

Here we see my box's postmasters that are up and running.  Some are for
psql, some are for web pages, and one is running my insane 'select * from
bigtable order by random()'.

The SIZE column (the first number after the column of 0s) is the total
process size in kilobytes, and SHARE is the shared memory it's using.
Notice the delta (SIZE minus SHARE) on these is 1 to 4 megabytes or so.
The query doing the heavy lifting has about a 4 meg delta; the others
are around 1 meg.
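
To put numbers on those deltas, straight from the listing above:

  66676 KB SIZE - 62192 KB SHARE = 4484 KB, ~4.4 megs  (the order-by-random query)
   5196 KB SIZE -  4156 KB SHARE = 1040 KB, ~1 meg     (a mostly idle backend)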

> What is it dependent on?

EVERYTHING.  That is, what the backend is doing determines the amount of
memory it uses, plus the settings in postgresql.conf for things like
sort_mem and shared_buffers.
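
For reference, those knobs live in postgresql.conf and look something
like the lines below.  The values here are just for illustration, not
recommendations; the thing to keep in mind is that shared_buffers is
counted in 8 KB disk buffers while sort_mem is counted in kilobytes:

  shared_buffers = 4096     # 8 KB buffers, so 4096 = 32 megs of shared memory
  sort_mem = 8192           # kilobytes available to each sort, per backend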

> What variables affect the size of a process by a user accessing a table in
> the database?
>
> Postgresql can be configured with a --max-backends option, which sets the
> no. of connections that can be established to the server at any given
> time, which also means that there will be an equal no. of processes.
>
> In my opinion there should be some computation relating the amount of RAM
> or shared buffer space to the no. of processes and the size of those
> processes.

Well, it's not that simple.  After experimenting with postgresql, if you
feel you have a handle on how to compute it, I'm sure it would gladly be
accepted as a useful tool by all of the folks who use postgresql.
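
If you want a starting point for that experiment, the back-of-the-envelope
version is something like:

  worst-case memory ~= (shared_buffers * 8 KB)
                       + max_connections * (backend delta
                                            + sorts per query * sort_mem)

The trouble is that the backend delta (the 1 to 4 meg figures above) and
the sorts-per-query term swing wildly with workload, which is why nobody
has nailed this down into a reliable formula yet.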

> I am using 256 MB RAM on a RHL 8 system. I have made the shared buffer
> space 190 MB, i.e. 75% of my system. My database is going to be almost 80 GB.

Wow, that is seriously underpowered in the RAM department.  My 3-year-old
server that handles web/database/ldap services has 1.5 gig and uses about
600 megs of RAM for programs, with about 800+ megs for kernel cache and
buffers.

Also, it's not a good idea to just give postgresql huge amounts of buffer
memory.  Generally, the kernel is better at buffering the disk than
postgresql is, and cranking postgresql's buffers up past 50% of the
available RAM means the kernel will always be playing catch-up with its
buffers, leaving postgresql as effectively the only layer doing any
buffering.

While we're at it, don't set sort_mem really high either, especially if
you plan on handling lots of users at the same time.  Each individual
sort can use up to sort_mem, which means that a query with three sorts
in it could use 3*sort_mem, and if that query gets run by a dozen people
at once, you'd be looking at 3*12*sort_mem usage.  8 meg is a good
intermediate setting for sort_mem for most folks.
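
Run the numbers on that example and you see the problem:

  3 sorts/query * 12 users * 8 megs of sort_mem = 288 megs

and 288 megs is already more than your whole machine's RAM, even at the
moderate 8 meg setting.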

Recommendations:  Go to at least 1 gig of RAM.  Give 256 megs or so to
postgresql's buffers.  Anything beyond that is unlikely to make
performance any faster.  If you can fit in more than 1 gig, do so.
Memory is your biggest limitation right now.
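
In postgresql.conf terms, assuming the stock 8 KB block size, that 256
megs works out to something like:

  shared_buffers = 32768    # 32768 * 8 KB = 256 megs
  sort_mem = 8192           # the 8 meg middle-of-the-road setting from above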

Use a fast RAID array setup.  Lots of disks in RAID 5 make a good
compromise between performance and storage space.  Large RAID 0 setups
are the fastest, but a single drive failure means losing all your data.
More than 2 disks in a RAID 1 mirror is a good setup for a load that is
mostly reads (95% or more).  Think data warehouse.

The problem is that you're asking how to optimize postgresql, but how you
optimize any database is greatly affected by the type of load you're
looking at.  If you run batch jobs at night against 10 gig data files and
then serve them read-only during the day, your needs are going to be
vastly different from those of a dynamic multi-user system with lots of
writes going on.

But you can't go wrong with more RAM.  How much can your server hold?
Memory bandwidth matters more than CPU speed for most postgresql
applications, and the same goes for the drives: lots of little fast
drives are way better than one or two big, slower ones.  SCSI is almost
always faster than IDE, all other things being equal (i.e. my 80 gig IDE
"mass storage" drives are way faster than a 2 gig Ultra Wide SCSI drive
from 6 years ago would be, but any modern SCSI drive will kick the butt
of my IDE drives).

