performance enhancements for PostgreSQL - Mailing list pgsql-general

From Johnson, Shaunn
Subject performance enhancements for PostgreSQL
Date
Msg-id 73309C2FDD95D11192E60008C7B1D5BB04C74253@snt452.corp.bcbsm.com
Whole thread Raw
Responses Re: performance enhancements for PostgreSQL  ("scott.marlowe" <scott.marlowe@ihs.com>)
List pgsql-general

Howdy:

I have a Linux server running PostgreSQL 7.2.1. with about 1 Gig of
memory.  The proc speed is about 1.14 GHz. 

I'm getting more and more concerned about how often
the database gets used and the days (like today) where I
wonder if buying more memory rather than buying a 2nd CPU
was such a great idea.

In an effort to enhance / streamline performance, I've done
the following:

* memory upgrade from 512M to 1G
* move RAID5 to scsi drives (10K RPM)
* set up cron script to vacuum database weekly
* set number of client connects ( i.e., /usr/bin/postmaster -i -B 128 -N 64 -d 4 )

But I'm at the point now that I can't kill some jobs.  Yes, I know
I shouldn't use 'kill' in any forceful way, but just a kill seems to do
nothing (or, if it is doing something, it's not fast enough
for the user community and it's stopping production).

I got this from a co-worker:

[snip]

the server is blocking on access to the metadata tables.  not even
logins are being processed.  I'm not sure what caused the
problem, but I think a database restart is the best course,
which I have been trying to do.  Do not kill -9, as it will corrupt  the WAL.

[/snip]

I see some errors in the messages file regarding the RAID drives
(the filesystem where the  database lives)

[snip error]

Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 9f 00 00 40 00
Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 df 00 00 08 00
Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 ef 00 00 08 00
Nov 18 12:33:52 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 77 27 00 00 40 00
Nov 18 12:33:52 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 77 67 00 00 40 00
Nov 18 12:33:53 hmp kernel: scsi : aborting command due to timeout : pid 0, scsi

[/snip error]

my questions are:

* have can I figure out how access to the metadata tables
   are being stopped? (my guess is the error on the scsi drive, but ... )
* how to restart PostgreSQL without running the risk of corrupting data?
* what are the benefits to adding a 2nd CPU over, say, more memory?

I'm sorry that I don't have enough information at this time ... I'm getting
swamped by users as I type this.

Thanks!

-X

pgsql-general by date:

Previous
From: Gregory Seidman
Date:
Subject: Re: More time manipulation..
Next
From: Doug McNaught
Date:
Subject: Re: performance enhancements for PostgreSQL