Re: performance enhancements for PostgreSQL - Mailing list pgsql-general

From scott.marlowe
Subject Re: performance enhancements for PostgreSQL
Date
Msg-id Pine.LNX.4.33.0211181239490.17412-100000@css120.ihs.com
Whole thread Raw
In response to performance enhancements for PostgreSQL  ("Johnson, Shaunn" <SJohnson6@bcbsm.com>)
List pgsql-general
On Mon, 18 Nov 2002, Johnson, Shaunn wrote:

> Howdy:
>
> I have a Linux server running PostgreSQL 7.2.1. with about 1 Gig of
> memory.  The proc speed is about 1.14 GHz.
>
> I'm getting more and more concerned about how often
> the database gets used and the days (like today) where I
> wonder if buying more memory rather than buying a 2nd CPU
> was such a great idea.
>
> In an effort to enhance / streamline performance, I've done
> the following:
>
> * memory upgrade from 512M to 1G
> * move RAID5 to scsi drives (10K RPM)
> * set up cron script to vacuum database weekly
> * set number of client connects ( i.e., /usr/bin/postmaster -i -B 128 -N 64
> -d 4 )

You could probably allocate WAY more buffer blocks than that.  I run 4000
on most of my medium weight machines, and with a gig of ram you could get
away with quite a bit more, but you'll need to increads shmmax and shmall
to go very high.   But I don't think that's your major problem.

More important, don't start the postmaster that way.  Edit the
$PGDATA/postgresql.conf file, then use the pg_ctl command to start and
stop it.

> But I'm at the point now that I can't kill some jobs.  Yes, I know
> I shouldn't use 'kill' in any forceful way, but just a kill seems to do
> nothing (or, if it is doing something, it's not fast enough
> for the user community and it's stopping production).

You can kill individual backends pretty safely, it's the postmaster you
can't kill -9 safely.  If you see a postgres child running away with all
your memory etc... you can kill -9 that pid pretty safely.

> I got this from a co-worker:
>
> [snip]
>
> the server is blocking on access to the metadata tables.  not even
> logins are being processed.  I'm not sure what caused the
> problem, but I think a database restart is the best course,
> which I have been trying to do.  Do not kill -9, as it will corrupt  the
> WAL.

If you want to shut down the server and it doesn't seem to respond to
pg_ctl stop, try 'pg_ctl -m fast stop' and see if that works.

> I see some errors in the messages file regarding the RAID drives
> (the filesystem where the  database lives)
>
> [snip error]
>
> Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0,
> scsi
> 2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 9f 00 00 40 00
> Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0,
> scsi
> 2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 df 00 00 08 00
> Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0,
> scsi
> 2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 ef 00 00 08 00
> Nov 18 12:33:52 hmp kernel: scsi : aborting command due to timeout : pid 0,
> scsi
> 2, channel 2, id 0, lun 1 Write (10) 20 11 d2 77 27 00 00 40 00
> Nov 18 12:33:52 hmp kernel: scsi : aborting command due to timeout : pid 0,
> scsi
> 2, channel 2, id 0, lun 1 Write (10) 20 11 d2 77 67 00 00 40 00
> Nov 18 12:33:53 hmp kernel: scsi : aborting command due to timeout : pid 0,
> scsi

That looks like a dead drive in your RAID array.  What does 'cat
/proc/mdstat' say about the drive (are you using linux software raid, or a
hardware controller?)

> * have can I figure out how access to the metadata tables
>    are being stopped? (my guess is the error on the scsi drive, but ... )

Sounds like processes are hanging, and users are just trying to reconnect
over and over and you're running out of connections.  This is a symptom,
not the problem, which is your machine is having issues.

> * how to restart PostgreSQL without running the risk of corrupting data?
> * what are the benefits to adding a 2nd CPU over, say, more memory?

pg_ctl -m fast stop

> I'm sorry that I don't have enough information at this time ... I'm getting
> swamped by users as I type this.

Well, good luck.  Write back to let us know how things are going.


pgsql-general by date:

Previous
From: Medi Montaseri
Date:
Subject: Re: pfree() core dump in 7.2.3
Next
From: "Williams, Travis L, NPONS"
Date:
Subject: Re: More time manipulation..