Re: performance enhancements for PostgreSQL: update - Mailing list pgsql-general

From Johnson, Shaunn
Subject Re: performance enhancements for PostgreSQL: update
Date
Msg-id 73309C2FDD95D11192E60008C7B1D5BB04C74259@snt452.corp.bcbsm.com
Whole thread Raw
Responses Re: performance enhancements for PostgreSQL: update  ("scott.marlowe" <scott.marlowe@ihs.com>)
List pgsql-general

--thanks for the reply all

--this is what i have so far:

--[snip]

> In an effort to enhance / streamline performance, I've done
> the following:
>
> * memory upgrade from 512M to 1G
> * move RAID5 to scsi drives (10K RPM)
> * set up cron script to vacuum database weekly
> * set number of client connects ( i.e., /usr/bin/postmaster -i -B 128 -N 64
> -d 4 )

--in the startup script, i launch / end postgres this way:

--[snip from startup]

su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -o '-i -B 128 -N 64 -d 2' \
-p /usr/bin/postmaster start >/dev/null "

--and--

su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl stop -D $PGDATA -s -m fast" > /dev/null 2>&1

--[/snip]

--i have been trying to use the postgres.conf file but i haven't
--had much success with it.  will put that up higher in the priority
--food chain.

--can someone explain what the benefits for that (allocating
--more buffer blocks) will be?  rather, can someone post a link
--to the docs that explains why it would benefit me?

You could probably allocate WAY more buffer blocks than that.  I run 4000
on most of my medium weight machines, and with a gig of ram you could get
away with quite a bit more, but you'll need to increads shmmax and shmall
to go very high.   But I don't think that's your major problem.
 
--working on it now ... had to take the '-d 4' part of my start up options
--down to a '-d 2' ... the system was thinking / writing for too long just
--to test a simple query ...

More important, don't start the postmaster that way.  Edit the
$PGDATA/postgresql.conf file, then use the pg_ctl command to start and
stop it.

> But I'm at the point now that I can't kill some jobs.  Yes, I know
> I shouldn't use 'kill' in any forceful way, but just a kill seems to do
> nothing (or, if it is doing something, it's not fast enough
> for the user community and it's stopping production).

--i've seen it first hand why it's not good, but, i got this not
--too long ago:

--[excerpt from old email]
Use kill -9. Do a:

killall -9 postgres
killall -9 postmaster

The tip message

'Don't kill -9 the postmaster'

is old and was added when havoc could be caused by an old
postgres backend process. Consider the scenrio:

1. postmaster started
2. postgres started  (Session #1)
3. postmaster killed (-9)
4. postmaster restarted
5. postgres started (Session #2)

Now there isn't any synchronization between Sesison #1 and
Session #2 at all, which would lead to data corruption. This
scenario was fixed a long time ago (7.1?). The whole 'Don't kill
-9 the postmaster' comment was actually a tongue-in-cheek remark
by me regarding a parallel discussion of RedHat init scripts.
The corruption possibility has long-since been fixed. Since I've
seen FUD claiming PostgreSQL doesn't have sufficient
crash-recovery because of the tip, I suggest the tip be changed to:

'Feel free to kill -9 the postmaster'

--[/excerpt]

--is this true?   perhaps someone can verify this.

You can kill individual backends pretty safely, it's the postmaster you
can't kill -9 safely.  If you see a postgres child running away with all
your memory etc... you can kill -9 that pid pretty safely. 

> I got this from a co-worker:
>
> [snip]
>
> the server is blocking on access to the metadata tables.  not even
> logins are being processed.  I'm not sure what caused the
> problem, but I think a database restart is the best course,
> which I have been trying to do.  Do not kill -9, as it will corrupt  the
> WAL.

--yup.  tried that and nothing changed for about 20 minutes or so ... so
--i had to reboot the server (i *hate* doing that).

If you want to shut down the server and it doesn't seem to respond to
pg_ctl stop, try 'pg_ctl -m fast stop' and see if that works.

--[snip raid errors]

--this is what the  /proc/mdstat says

--[snip from /proc/mdstats]

Personalities :
read_ahead not set
unused devices: <none>

--[/snip from /proc/mdstats]

That looks like a dead drive in your RAID array.  What does 'cat
/proc/mdstat' say about the drive

--i'm doing a linux software raid

(are you using linux software raid, or a hardware controller?)

> * have can I figure out how access to the metadata tables
>   are being stopped? (my guess is the error on the scsi drive, but ... )

Sounds like processes are hanging, and users are just trying to
reconnect over and over and you're running out of connections. 
This is a symptom, not the problem, which is your machine is
having issues.

[snip rest of email]

--i'm thinking more and more that while someone was trying to
--update / insert data into a table the scsi disk stopped writing
--(or just gave a lot of errors while writing).  the thing is, *some* people
--could do work (albeit very little).

--at any rate, i'll have to investigate more later (probably the
--weekend) and stress test the array. 

--also, i hope to get more messages in the log file with a
--better debug level to help track the problem.

--thanks again!

-X

pgsql-general by date:

Previous
From: Justin Clift
Date:
Subject: Looking for a "Linux on Playstation 2" person to compile PostgreSQL RPM's
Next
From: "scott.marlowe"
Date:
Subject: Re: performance enhancements for PostgreSQL: update